Git Repo - linux.git/log

Merge tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- fixes of error handling cleanup of metadata accounting with qgroups
   enabled

- fix swapped values for qgroup tracepoints

- fix race when handling full sync flag

- don't start unused worker thread, functionality removed already

* tag 'for-5.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: check for the full sync flag while holding the inode lock during fsync
  Btrfs: fix qgroup double free after failure to reserve metadata for delalloc
  btrfs: tracepoints: Fix bad entry members of qgroup events
  btrfs: tracepoints: Fix wrong parameter order for qgroup events
  btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
  btrfs: don't needlessly create extent-refs kernel thread
  btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
  Btrfs: add missing extents release on file extent cluster relocation error

scripts/nsdeps: use alternative sed delimiter

When doing an out of tree build with O=, the nsdeps script constructs
the absolute pathname of the module source file so that it can insert
MODULE_IMPORT_NS statements in the right place. However, ${srctree}
contains an unescaped path to the source tree, which, when used in a sed
substitution, makes sed complain:

++ sed 's/[^ ]* *//home/jeyu/jeyu-linux\/&/g'
sed: -e expression #1, char 12: unknown option to `s'

The sed substitution command 's' ends prematurely with the forward
slashes in the pathname, and sed errors out when it encounters the 'h',
which is an invalid sed substitution option. To avoid escaping forward
slashes ${srctree}, we can use '|' as an alternative delimiter for
sed instead to avoid this error.

Reviewed-by: Masahiro Yamada <[email protected]>
Reviewed-by: Matthias Maennich <[email protected]>
Tested-by: Matthias Maennich <[email protected]>
Signed-off-by: Jessica Yu <[email protected]>

Merge branch 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull operating performance points (OPP) framework fixes for v5.4
from Viresh Kumar:

"This contains:

- Patch to revert addition of regulator enable/disable in OPP core
  (Marek).
- Remove incorrect lockdep assert (Viresh).
- Fix a kref counting issue (Viresh)."

* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  opp: Reinitialize the list_kref before adding the static OPPs again
  opp: core: Revert "add regulators enable and disable"
  opp: of: drop incorrect lockdep_assert_held()

fs/dax: Fix pmd vs pte conflict detection

Users reported a v5.3 performance regression and inability to establish
huge page mappings. A revised version of the ndctl "dax.sh" huge page
unit test identifies commit 23c84eb78375 "dax: Fix missed wakeup with
PMD faults" as the source.

Update get_unlocked_entry() to check for NULL entries before checking
the entry order, otherwise NULL is misinterpreted as a present pte
conflict. The 'order' check needs to happen before the locked check as
an unlocked entry at the wrong order must fallback to lookup the correct
order.

Reported-by: Jeff Smits <[email protected]>
Reported-by: Doug Nelson <[email protected]>
Cc: <[email protected]>
Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
Reviewed-by: Jan Kara <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

opp: Reinitialize the list_kref before adding the static OPPs again

The list_kref reaches a count of 0 when all the static OPPs are removed,
for example when dev_pm_opp_of_cpumask_remove_table() is called, though
the actual OPP table may not get freed as it may still be referenced by
other parts of the kernel, like from a call to
dev_pm_opp_set_supported_hw(). And if we call
dev_pm_opp_of_cpumask_add_table() again at this point, we must
reinitialize the list_kref otherwise the kernel will hit a WARN() in
kref infrastructure for incrementing a kref with value 0.

Fixes: 11e1a1648298 ("opp: Don't decrement uninitialized list_kref")
Reported-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>

ALSA: hda: Add Tigerlake/Jasperlake PCI ID

Add HD Audio Device PCI ID for the Intel Tigerlake and Jasperlake
platform.

Signed-off-by: Pan Xiuli <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

KVM: nVMX: Don't leak L1 MMIO regions to L2

If the "virtualize APIC accesses" VM-execution control is set in the
VMCS, the APIC virtualization hardware is triggered when a page walk
in VMX non-root mode terminates at a PTE wherein the address of the 4k
page frame matches the APIC-access address specified in the VMCS. On
hardware, the APIC-access address may be any valid 4k-aligned physical
address.

KVM's nVMX implementation enforces the additional constraint that the
APIC-access address specified in the vmcs12 must be backed by
a "struct page" in L1. If not, L0 will simply clear the "virtualize
APIC accesses" VM-execution control in the vmcs02.

The problem with this approach is that the L1 guest has arranged the
vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
VM-execution control is clear in the vmcs12--so that the L2 guest
physical address(es)--or L2 guest linear address(es)--that reference
the L2 APIC map to the APIC-access address specified in the
vmcs12. Without the "virtualize APIC accesses" VM-execution control in
the vmcs02, the APIC accesses in the L2 guest will directly access the
APIC-access page in L1.

When there is no mapping whatsoever for the APIC-access address in L1,
the L2 VM just loses the intended APIC virtualization. However, when
the APIC-access address is mapped to an MMIO region in L1, the L2
guest gets direct access to the L1 MMIO device. For example, if the
APIC-access address specified in the vmcs12 is 0xfee00000, then L2
gets direct access to L1's APIC.

Since this vmcs12 configuration is something that KVM cannot
faithfully emulate, the appropriate response is to exit to userspace
with KVM_INTERNAL_ERROR_EMULATION.

Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
Reported-by: Dan Cross <[email protected]>
Signed-off-by: Jim Mattson <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update

Guest physical APIC ID may not equal to vcpu->vcpu_id in some case.
We may set the wrong physical id in avic_handle_ldr_update as we
always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page
instead.
Export and use kvm_xapic_id here and in avic_handle_apic_id_update
as suggested by Vitaly.

Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'misc' into fixes

cpufreq: Cancel policy update work scheduled before freeing

Scheduled policy update work may end up racing with the freeing of the
policy and unregistering the driver.

One possible race is as below, where the cpufreq_driver is unregistered,
but the scheduled work gets executed at later stage when, cpufreq_driver
is NULL (i.e. after freeing the policy and driver).

Unable to handle kernel NULL pointer dereference at virtual address 0000001c
pgd = (ptrval)
[0000001c] *pgd=80000080204003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP THUMB2
Modules linked in:
CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 5.4.0-rc3-00006-g67f5a8081a4b #86
Hardware name: ARM-Versatile Express
Workqueue: events handle_update
PC is at cpufreq_set_policy+0x58/0x228
LR is at dev_pm_qos_read_value+0x77/0xac
Control: 70c5387d Table: 80203000 DAC: fffffffd
Process kworker/0:1 (pid: 34, stack limit = 0x(ptrval))
(cpufreq_set_policy) from (refresh_frequency_limits.part.24+0x37/0x48)
(refresh_frequency_limits.part.24) from (handle_update+0x2f/0x38)
(handle_update) from (process_one_work+0x16d/0x3cc)
(process_one_work) from (worker_thread+0xff/0x414)
(worker_thread) from (kthread+0xff/0x100)
(kthread) from (ret_from_fork+0x11/0x28)

Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework")
Signed-off-by: Sudeep Holla <[email protected]>
[ rjw: Cancel the work before dropping the QoS requests ]
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

s390/kaslr: add support for R_390_GLOB_DAT relocation type

Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".

Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.

Fixes: 805bc0bc238f ("s390/kernel: build a relocatable kernel")
Cc: <[email protected]> # v5.2+
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Gerald Schaefer <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/zcrypt: fix memleak at release

If a process is interrupted while accessing the crypto device and the
global ap_perms_mutex is contented, release() could return early and
fail to free related resources.

Fixes: 00fab2350e6b ("s390/zcrypt: multiple zcrypt device nodes support")
Cc: <[email protected]> # 4.19
Cc: Harald Freudenberger <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

ALSA: usb-audio: Fix copy&paste error in the validator

The recently introduced USB-audio descriptor validator had a stupid
copy&paste error that may lead to an unexpected overlook of too short
descriptors for processing and extension units. It's likely the cause
of the report triggered by syzkaller fuzzer. Let's fix it.

Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units")
Reported-by: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

perf/aux: Fix AUX output stopping

Commit:

8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'kvm-ppc-fixes-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM fix for 5.4

- Fix a bug in the XIVE code which can cause a host crash.

Merge tag 'kvmarm-fixes-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.4, take #2

Special PMU edition:

- Fix cycle counter truncation
- Fix cycle counter overflow limit on pure 64bit system
- Allow chained events to be actually functional
- Correct sample period after overflow

kvm: clear kvmclock MSR on reset

After resetting the vCPU, the kvmclock MSR keeps the previous value but it is
not enabled. This can be confusing, so fix it.

Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: fix bugon.cocci warnings

Use BUG_ON instead of a if condition followed by BUG.

Generated by: scripts/coccinelle/misc/bugon.cocci

Fixes: 4b526de50e39 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()")
CC: Sean Christopherson <[email protected]>
Signed-off-by: kbuild test robot <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: Remove specialized handling of unexpected exit-reasons

Commit bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit
and handle WAITPKG vmexit") introduced specialized handling of
specific exit-reasons that should not be raised by CPU because
KVM configures VMCS such that they should never be raised.

However, since commit 7396d337cfad ("KVM: x86: Return to userspace
with internal error on unexpected exit reason"), VMX & SVM
exit handlers were modified to generically handle all unexpected
exit-reasons by returning to userspace with internal error.

Therefore, there is no need for specialized handling of specific
unexpected exit-reasons (This specialized handling also introduced
inconsistency for these exit-reasons to silently skip guest instruction
instead of return to userspace on internal-error).

Fixes: bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit")
Signed-off-by: Liran Alon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: fix sync_regs_test with newer gccs

Commit 204c91eff798a ("KVM: selftests: do not blindly clobber registers in
guest asm") was intended to make test more gcc-proof, however, the result
is exactly the opposite: on newer gccs (e.g. 8.2.1) the test breaks with

==== Test Assertion Failure ====
  x86_64/sync_regs_test.c:168: run->s.regs.regs.rbx == 0xBAD1DEA + 1
  pid=14170 tid=14170 - Invalid argument
     1 0x00000000004015b3: main at sync_regs_test.c:166 (discriminator 6)
     2 0x00007f413fb66412: ?? ??:0
     3 0x000000000040191d: _start at ??:?
  rbx sync regs value incorrect 0x1.

Apparently, compile is still free to play games with registers even
when they have variables attached.

Re-write guest code with 'asm volatile' by embedding ucall there and
making sure rbx is preserved.

Fixes: 204c91eff798a ("KVM: selftests: do not blindly clobber registers in guest asm")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported

vmx_dirty_log_test fails on AMD and this is no surprise as it is VMX
specific. Bail early when nested VMX is unsupported.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: consolidate VMX support checks

vmx_* tests require VMX and three of them implement the same check. Move it
to vmx library.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice

vmx_set_nested_state_test() checks if VMX is supported twice: in the very
beginning (and skips the whole test if it's not) and before doing
test_vmx_nested_state(). One should be enough.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled

Don't waste cycles to shrink/grow vCPU halt_poll_ns if host
side polling is disabled.

Acked-by: Marcelo Tosatti <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests: kvm: synchronize .gitignore to Makefile

Because "Untracked files:" are annoying.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID

When the RDPID instruction is supported on the host, enumerate it in
KVM_GET_SUPPORTED_CPUID.

Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Here is a bunch of pin control fixes. I was lagging behind on this
  one, some fixes should have come in earlier, sorry about that.

  Anyways here it is, pretty straight-forward fixes, the Strago fix
  stand out as something serious affecting a lot of machines.

  Summary:
   - Handle multiple instances of Intel chips without complaining.
   - Restore the Intel Strago DMI workaround
   - Make the Armada 37xx handle pins over 32
   - Fix the polarity of the LED group on Armada 37xx
   - Fix an off-by-one bug in the NS2 driver
   - Fix error path for iproc's platform_get_irq()
   - Fix error path on the STMFX driver
   - Fix a typo in the Berlin AS370 driver
   - Fix up misc errors in the Aspeed 2600 BMC support
   - Fix a stray SPDX tag"

* tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups
  pinctrl: aspeed-g6: Fix UART13 group pinmux
  pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively
  pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration
  pinctrl: aspeed-g6: Fix I2C14 SDA description
  pinctrl: aspeed-g6: Sort pins for sanity
  dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups
  pinctrl: berlin: as370: fix a typo s/spififib/spdifib
  pinctrl: armada-37xx: swap polarity on LED group
  pinctrl: stmfx: fix null pointer on remove
  pinctrl: iproc: allow for error from platform_get_irq()
  pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable()
  pinctrl: bcm-iproc: Use SPDX header
  pinctrl: armada-37xx: fix control of pins 32 and up
  pinctrl: cherryview: restore Strago DMI workaround for all versions
  pinctrl: intel: Allocate IRQ chip dynamic

cpuidle: haltpoll: Take 'idle=' override into account

Currenly haltpoll isn't aware of the 'idle=' override, the priority is
'idle=poll' > haltpoll > 'idle=halt'. When 'idle=poll' is used, cpuidle
driver is bypassed but current_driver in sys still shows 'haltpoll'.

When 'idle=halt' is used, haltpoll takes precedence and makes
'idle=halt' have no effect.

Add a check to prevent the haltpoll driver from loading if 'idle=' is
present.

Signed-off-by: Zhenzhong Duan <[email protected]>
Co-developed-by: Joao Martins <[email protected]>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: NFIT: Fix unlock on error in scrub_show()

We change the locking in this function and forgot to update this error
path so we are accidentally still holding the "dev->lockdep_mutex".

Fixes: 87a30e1f05d7 ("driver-core, libnvdimm: Let device subsystems add local lockdep coverage")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: 5.3+ <[email protected]> # 5.3+
Signed-off-by: Rafael J. Wysocki <[email protected]>

tracing: Fix race in perf_trace_buf initialization

A race condition exists while initialiazing perf_trace_buf from
perf_trace_init() and perf_kprobe_init().

      CPU0                                        CPU1
perf_trace_init()
  mutex_lock(&event_mutex)
    perf_trace_event_init()
      perf_trace_event_reg()
        total_ref_count == 0
buf = alloc_percpu()
        perf_trace_buf[i] = buf
        tp_event->class->reg() //fails       perf_kprobe_init()
goto fail                              perf_trace_event_init()
                                                 perf_trace_event_reg()
        fail:
  total_ref_count == 0

                                                   total_ref_count == 0
                                                   buf = alloc_percpu()
                                                   perf_trace_buf[i] = buf
                                                   tp_event->class->reg()
                                                   total_ref_count++

          free_percpu(perf_trace_buf[i])
          perf_trace_buf[i] = NULL

Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0,
causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf
getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring
event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should
fix this race.

The race caused the following bug:

Unable to handle kernel paging request at virtual address 0000003106f2003c
Mem abort info:
   ESR = 0x96000045
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
Data abort info:
   ISV = 0, ISS = 0x00000045
   CM = 0, WnR = 1
user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000
[0000003106f2003c] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000045 [#1] PREEMPT SMP
Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000)
pstate: 80400005 (Nzcv daif +PAN -UAO)
pc : __memset+0x20/0x1ac
lr : memset+0x3c/0x50
sp : ffffffc09319fc50

  __memset+0x20/0x1ac
  perf_trace_buf_alloc+0x140/0x1a0
  perf_trace_sys_enter+0x158/0x310
  syscall_trace_enter+0x348/0x7c0
  el0_svc_common+0x11c/0x368
  el0_svc_handler+0x12c/0x198
  el0_svc+0x8/0xc

Ramdumps showed the following:
  total_ref_count = 3
  perf_trace_buf = (
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL,
      0x0 -> NULL)

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <[email protected]>
Signed-off-by: Prateek Sood <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

x86/cpu/vmware: Fix platform detection VMWARE_PORT macro

The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT
definition, but expects it to be an integer. However, when it was moved
to the new vmware.h include file, it was changed to be a string to better
fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the
platform detection VMWARE_PORT functionality.

Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB
definitions to be integers, and use __stringify() for their stringified
form when needed.

Signed-off-by: Thomas Hellstrom <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm

LLVM's assembler doesn't accept the short form INL instruction:

inl (%%dx)

but instead insists on the output register to be explicitly specified.

This was previously fixed for the VMWARE_PORT macro. Fix it also for
the VMWARE_HYPERCALL macro.

Suggested-by: Sami Tolvanen <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'arm-soc/for-5.4/devicetree-fixes-part2' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoC Device Tree fixes for
5.4, please pull the following:

- Stefan removes the activity LED node from the CM3 DTS since there is
no driver for that LED yet and leds-gpio cannot drive it either

* tag 'arm-soc/for-5.4/devicetree-fixes-part2' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'arm-soc/for-5.4/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.4, please pull the following:

- Stefan fixes the MMC controller bus-width property for the Raspberry Pi
Zero Wireless which was incorrect after a prior refactoring

* tag 'arm-soc/for-5.4/devicetree-fixes' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'davinci-fixes-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into arm/fixes

DaVinci fixes for v5.4
======================
* fix GPIO backlight support on DA850 by enabling the needed config
  in davinci_all_defconfig. This is a fix because the driver and board
  support got converted to use BACKLIGHT_GPIO driver, but defconfig update
  is still missing in v5.4.
* fix for McBSP DMA on DM365

* tag 'davinci-fixes-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
  ARM: davinci_all_defconfig: enable GPIO backlight
  ARM: davinci: dm365: Fix McBSP dma_slave_map entry

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'v5.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

A number of fixes for individual boards like the rockpro64, and Hugsun X99
as well as a fix for the Gru-Kevin display override and fixing the dt-
binding for Theobroma boards to the correct naming that is also actually
used in the wild.

* tag 'v5.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
  arm64: dts: rockchip: Fix usb-c on Hugsun X99 TV Box
  arm64: dts: rockchip: fix RockPro64 sdmmc settings
  arm64: dts: rockchip: fix RockPro64 sdhci settings
  arm64: dts: rockchip: fix RockPro64 vdd-log regulator settings
  dt-bindings: arm: rockchip: fix Theobroma-System board bindings
  arm64: dts: rockchip: fix Rockpro64 RK808 interrupt line

Link: https://lore.kernel.org/r/1599050.HRXuSXmxRg@phil
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'imx-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 5.4:
- Re-enable SNVS power key for imx6q-logicpd board which was accidentally
   disabled by a SoC level change.
- Fix I2C switches on vf610-zii-scu4-aib board by specifying property
   i2c-mux-idle-disconnect.
- A fix on imx-scu API that reads UID from firmware to avoid kernel NULL
   pointer dump.
- A series from Anson to correct i.MX7 GPT and i.MX8 USDHC IPG clock.
- A fix on DRM_MSM Kconfig regression on i.MX5 by adding the option
   explicitly into imx_v6_v7_defconfig.
- Fix ARM regulator states issue for zii-ultra board, which is impacting
   stability of the board.
- A correction on CPU core idle state name for LayerScape LX2160A SoC.

* tag 'imx-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
  arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
  arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
  arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
  ARM: dts: imx7s: Correct GPT's ipg clock source
  ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
  ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
  arm64: dts: lx2160a: Correct CPU core idle state name
  arm64: dts: zii-ultra: fix ARM regulator states
  soc: imx: imx-scu: Getting UID from SCU should have response

Link: https://lore.kernel.org/r/20191017141851.GA22506@dragon
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'omap-for-v5.4/fixes-rc3-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps for v5.4-rc cycle

More fixes for omap variants:

- Update more panel options in omap2plus_defconfig that got changed
  as we moved to use generic LCD panels

- Remove unused twl_keypad for logicpd-torpedo-som to avoid boot
  time warnings. This is only a cosmetic fix, but at least dmesg output
  is now getting more readable after all the fixes to remove pointless
  warnings

- Fix gpu_cm node name as we still have a non-standard node name
  dependency for clocks. This should eventually get fixed by use
  of domain specific compatible property

- Fix use of i2c-mux-idle-disconnect for m3874-iceboard

- Use level interrupt for omap4 & 5 wlcore to avoid lost edge
  interrupts

* tag 'omap-for-v5.4/fixes-rc3-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Use level interrupt for omap4 & 5 wlcore
  ARM: dts: am3874-iceboard: Fix 'i2c-mux-idle-disconnect' usage
  ARM: dts: omap5: fix gpu_cm clock provider name
  ARM: dts: logicpd-torpedo-som: Remove twl_keypad
  ARM: omap2plus_defconfig: Fix selected panels after generic panel changes

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'arm-soc/for-5.4/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM64-based SoCs Device Tree fixes
for 5.4, please pull the following:

- Rayangonda fixes the GPIO pins assignment for the Stringray SoCs

* tag 'arm-soc/for-5.4/devicetree-arm64-fixes' of https://github.com/Broadcom/stblinux:
arm64: dts: Fix gpio to pinmux mapping

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

ARM: 8926/1: v7m: remove register save to stack before svc

r0-r3 & r12 registers are saved & restored, before & after svc
respectively. Intention was to preserve those registers across thread to
handler mode switch.

On v7-M, hardware saves the register context upon exception in AAPCS
complaint way. Restoring r0-r3 & r12 is done from stack location where
hardware saves it, not from the location on stack where these registers
were saved.

To clarify, on stm32f429 discovery board:

1. before svc, sp - 0x90009ff8
2. r0-r3,r12 saved to 0x90009ff8 - 0x9000a00b
3. upon svc, h/w decrements sp by 32 & pushes registers onto stack
4. after svc, sp - 0x90009fd8
5. r0-r3,r12 restored from 0x90009fd8 - 0x90009feb

Above means r0-r3,r12 is not restored from the location where they are
saved, but since hardware pushes the registers onto stack, the registers
are restored correctly.

Note that during register saving to stack (step 2), it goes past
0x9000a000. And it seems, based on objdump, there are global symbols
residing there, and it perhaps can cause issues on a non-XIP Kernel
(on XIP, data section is setup later).

Based on the analysis above, manually saving registers onto stack is at
best no-op and at worst can cause data section corruption. Hence remove
storing of registers onto stack before svc.

Fixes: b70cd406d7fe ("ARM: 8671/1: V7M: Preserve registers across switch from Thread to Handler mode")
Signed-off-by: afzal mohammed <[email protected]>
Acked-by: Vladimir Murzin <[email protected]>
Signed-off-by: Russell King <[email protected]>

Input: st1232 - fix reporting multitouch coordinates

For Sitronix st1633 multi-touch controller driver the coordinates reported
for multiple fingers were wrong, as it was always taking LSB of coordinates
from the first contact data.

Signed-off-by: Dixit Parmar <[email protected]>
Reviewed-by: Martin Kepplinger <[email protected]>
Cc: [email protected]
Fixes: 351e0592bfea ("Input: st1232 - add support for st1633")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204561
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

mmc: mxs: fix flags passed to dmaengine_prep_slave_sg

Since ceeeb99cd821 we no longer abuse the DMA_CTRL_ACK flag for custom
driver use and introduced the MXS_DMA_CTRL_WAIT4END instead. We have not
changed all users to this flag though. This patch fixes it for the
mxs-mmc driver.

Fixes: ceeeb99cd821 ("dmaengine: mxs: rename custom flag")
Signed-off-by: Sascha Hauer <[email protected]>
Tested-by: Fabio Estevam <[email protected]>
Reported-by: Bruno Thomsen <[email protected]>
Tested-by: Bruno Thomsen <[email protected]>
Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>

drm/komeda: Fix typos in komeda_splitter_validate

Fix both the string and the struct member being printed.

Changes since v1:
- Now with a bonus grammar fix, too.

Fixes: 264b9436d23b ("drm/komeda: Enable writeback split support")
Reviewed-by: James Qian Wang (Arm Technology China) <[email protected]>
Signed-off-by: Mihail Atanassov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/komeda: Don't flush inactive pipes

HW doesn't allow flushing inactive pipes and raises an MERR interrupt
if you try to do so. Stop triggering the MERR interrupt in the
middle of a commit by calling drm_atomic_helper_commit_planes
with the ACTIVE_ONLY flag.

Reviewed-by: James Qian Wang (Arm Technology China) <[email protected]>
Signed-off-by: Mihail Atanassov <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

i2c: aspeed: fix master pending state handling

In case of master pending state, it should not trigger a master
command, otherwise data could be corrupted because this H/W shares
the same data buffer for slave and master operations. It also means
that H/W command queue handling is unreliable because of the buffer
sharing issue. To fix this issue, it clears command queue if a
master command is queued in pending state to use S/W solution
instead of H/W command queue handling. Also, it refines restarting
mechanism of the pending master command.

Fixes: 2e57b7cebb98 ("i2c: aspeed: Add multi-master use case support")
Signed-off-by: Jae Hyun Yoo <[email protected]>
Reviewed-by: Brendan Higgins <[email protected]>
Acked-by: Joel Stanley <[email protected]>
Tested-by: Tao Ren <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

Merge tag 'asoc-fix-v5.4-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.4

A collection of fixes that have arrived since the merge window. There
are a small number of core fixes here but they are smaller ones around
error handling.

mmc: cqhci: Commit descriptors before setting the doorbell

Add a write memory barrier to make sure that descriptors are actually
written to memory, before ringing the doorbell.

Signed-off-by: Faiz Abbas <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>

mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C

According to the App note[1] detailing the tuning algorithm, for
temperatures < -20C, the initial tuning value should be min(largest value
in LPW - 24, ceil(13/16 ratio of LPW)). The largest value in LPW is
(max_window + 4 * (max_len - 1)) and not (max_window + 4 * max_len) itself.
Fix this implementation.

[1] http://www.ti.com/lit/an/spraca9b/spraca9b.pdf

Fixes: 961de0a856e3 ("mmc: sdhci-omap: Workaround errata regarding SDR104/HS200 tuning failures (i929)")
Cc: [email protected]
Signed-off-by: Faiz Abbas <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

ALSA: hda/realtek - Add support for ALC711

Support new codec ALC711.

Signed-off-by: Kailang Yang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

perf/aux: Fix tracking of auxiliary trace buffer allocation

The following commit from the v5.4 merge window:

  d44248a41337 ("perf/core: Rework memory accounting in perf_mmap()")

... breaks auxiliary trace buffer tracking.

If I run command 'perf record -e rbd000' to record samples and saving
them in the **auxiliary** trace buffer then the value of 'locked_vm' becomes
negative after all trace buffers have been allocated and released:

During allocation the values increase:

  [52.250027] perf_mmap user->locked_vm:0x87 pinned_vm:0x0 ret:0
  [52.250115] perf_mmap user->locked_vm:0x107 pinned_vm:0x0 ret:0
  [52.250251] perf_mmap user->locked_vm:0x188 pinned_vm:0x0 ret:0
  [52.250326] perf_mmap user->locked_vm:0x208 pinned_vm:0x0 ret:0
  [52.250441] perf_mmap user->locked_vm:0x289 pinned_vm:0x0 ret:0
  [52.250498] perf_mmap user->locked_vm:0x309 pinned_vm:0x0 ret:0
  [52.250613] perf_mmap user->locked_vm:0x38a pinned_vm:0x0 ret:0
  [52.250715] perf_mmap user->locked_vm:0x408 pinned_vm:0x2 ret:0
  [52.250834] perf_mmap user->locked_vm:0x408 pinned_vm:0x83 ret:0
  [52.250915] perf_mmap user->locked_vm:0x408 pinned_vm:0x103 ret:0
  [52.251061] perf_mmap user->locked_vm:0x408 pinned_vm:0x184 ret:0
  [52.251146] perf_mmap user->locked_vm:0x408 pinned_vm:0x204 ret:0
  [52.251299] perf_mmap user->locked_vm:0x408 pinned_vm:0x285 ret:0
  [52.251383] perf_mmap user->locked_vm:0x408 pinned_vm:0x305 ret:0
  [52.251544] perf_mmap user->locked_vm:0x408 pinned_vm:0x386 ret:0
  [52.251634] perf_mmap user->locked_vm:0x408 pinned_vm:0x406 ret:0
  [52.253018] perf_mmap user->locked_vm:0x408 pinned_vm:0x487 ret:0
  [52.253197] perf_mmap user->locked_vm:0x408 pinned_vm:0x508 ret:0
  [52.253374] perf_mmap user->locked_vm:0x408 pinned_vm:0x589 ret:0
  [52.253550] perf_mmap user->locked_vm:0x408 pinned_vm:0x60a ret:0
  [52.253726] perf_mmap user->locked_vm:0x408 pinned_vm:0x68b ret:0
  [52.253903] perf_mmap user->locked_vm:0x408 pinned_vm:0x70c ret:0
  [52.254084] perf_mmap user->locked_vm:0x408 pinned_vm:0x78d ret:0
  [52.254263] perf_mmap user->locked_vm:0x408 pinned_vm:0x80e ret:0

The value of user->locked_vm increases to a limit then the memory
is tracked by pinned_vm.

During deallocation the size is subtracted from pinned_vm until
it hits a limit. Then a larger value is subtracted from locked_vm
leading to a large number (because of type unsigned):

  [64.267797] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x78d
  [64.267826] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x70c
  [64.267848] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x68b
  [64.267869] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x60a
  [64.267891] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x589
  [64.267911] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x508
  [64.267933] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x487
  [64.267952] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x406
  [64.268883] perf_mmap_close mmap_user->locked_vm:0x307 pinned_vm:0x406
  [64.269117] perf_mmap_close mmap_user->locked_vm:0x206 pinned_vm:0x406
  [64.269433] perf_mmap_close mmap_user->locked_vm:0x105 pinned_vm:0x406
  [64.269536] perf_mmap_close mmap_user->locked_vm:0x4 pinned_vm:0x404
  [64.269797] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff84 pinned_vm:0x303
  [64.270105] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff04 pinned_vm:0x202
  [64.270374] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe84 pinned_vm:0x101
  [64.270628] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe04 pinned_vm:0x0

This value sticks for the user until system is rebooted, causing
follow-on system calls using locked_vm resource limit to fail.

Note: There is no issue using the normal trace buffer.

In fact the issue is in perf_mmap_close(). During allocation auxiliary
trace buffer memory is either traced as 'extra' and added to 'pinned_vm'
or trace as 'user_extra' and added to 'locked_vm'. This applies for
normal trace buffers and auxiliary trace buffer.

However in function perf_mmap_close() all auxiliary trace buffer is
subtraced from 'locked_vm' and never from 'pinned_vm'. This breaks the
ballance.

Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: d44248a41337 ("perf/core: Rework memory accounting in perf_mmap()")
Link: https://lkml.kernel.org/r/[email protected]
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'perf-urgent-for-mingo-5.4-20191017' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

perf buildid-cache:

  Adrian Hunter:

  - Fix mode setting in copyfile_mode_ns() when copying /proc/kcore.

perf evlist:

  Andi Kleen:

  - Fix freeing id arrays.

tools headers:

  - Sync sched.h anc kvm.h headers with the kernel sources.

perf jvmti:

  Thomas Richter:

  - Link against tools/lib/ctype.o to have weak strlcpy().

perf annotate:

  Gustavo A. R. Silva:

  - Fix multiple memory and file descriptor leaks, found by coverity.

perf c2c/kmem:

  Yunfeng Ye:

   - Fix leaks in error handling paths in 'perf c2c', 'perf kmem',  found by
     internal static analysis tool.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

opp: core: Revert "add regulators enable and disable"

All the drivers, which use the OPP framework control regulators, which
are already enabled. Typically those regulators are also system critical,
due to providing power to CPU core or system buses. It turned out that
there are cases, where calling regulator_enable() on such boot-enabled
regulator has side-effects and might change its initial voltage due to
performing initial voltage balancing without all restrictions from the
consumers. Until this issue becomes finally solved in regulator core,
avoid calling regulator_enable()/disable() from the OPP framework.

This reverts commit 7f93ff73f7c8c8bfa6be33bcc16470b0b44682aa.

Signed-off-by: Marek Szyprowski <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>

cifs: Fix missed free operations

cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.

CC: Stable <[email protected]>
Fixes: aa081859b10c ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>

CIFS: avoid using MID 0xFFFF

According to MS-CIFS specification MID 0xFFFF should not be used by the
CIFS client, but we actually do. Besides, this has proven to cause races
leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
MID is a 2 byte value easy to reach in CurrentMid which may conflict with
an oplock break notification request coming from server

Signed-off-by: Roberto Bergantinos Corpas <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Reviewed-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
CC: Stable <[email protected]>

cifs: clarify comment about timestamp granularity for old servers

It could be confusing why we set granularity to 1 seconds rather
than 2 seconds (1 second is the max the VFS allows) for these
mounts to very old servers ...

Signed-off-by: Steve French <[email protected]>

cifs: Handle -EINPROGRESS only when noblockcnt is set

We only want to avoid blocking in connect when mounting SMB root
filesystems, otherwise bail out from generic_ip_connect() so cifs.ko
can perform any reconnect failover appropriately.

This fixes DFS failover/reconnection tests in upstream buildbot.

Fixes: 8eecd1c2e5bc ("cifs: Add support for root file systems")
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

PM: QoS: Drop frequency QoS types from device PM QoS

There are no more active users of DEV_PM_QOS_MIN_FREQUENCY and
DEV_PM_QOS_MAX_FREQUENCY device PM QoS request types, so drop them
along with the code supporting them.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>

cpufreq: Use per-policy frequency QoS

Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.

Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).

In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU. Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.

The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.

After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.

Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <[email protected]>
Tested-by: Dmitry Osipenko <[email protected]>
Reported-by: Sudeep Holla <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
Diagnosed-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>

PM: QoS: Introduce frequency QoS

Introduce frequency QoS, based on the "raw" low-level PM QoS, to
represent min and max frequency requests and aggregate constraints.

The min and max frequency requests are to be represented by
struct freq_qos_request objects and the aggregate constraints are to
be represented by struct freq_constraints objects. The latter are
expected to be initialized with the help of freq_constraints_init().

The freq_qos_read_value() helper is defined to retrieve the aggregate
constraints values from a given struct freq_constraints object and
there are the freq_qos_add_request(), freq_qos_update_request() and
freq_qos_remove_request() helpers to manipulate the min and max
frequency requests. It is assumed that the the helpers will not
run concurrently with each other for the same struct freq_qos_request
object, so if that may be the case, their uses must ensure proper
synchronization between them (e.g. through locking).

In addition, freq_qos_add_notifier() and freq_qos_remove_notifier()
are provided to add and remove notifiers that will trigger on aggregate
constraint changes to and from a given struct freq_constraints object,
respectively.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>

Linux 5.4-rc4

Merge tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild fixes from Masahiro Yamada:

- fix a bashism of setlocalversion

- do not use the too new --sort option of tar

* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: substituting --sort in archive creation
  scripts: setlocalversion: fix a bashism
  kbuild: update comment about KBUILD_ALLDIRS

perf/x86/intel/pt: Fix base for single entry topa

Jan reported failing ltp test for PT:

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/tracing/pt_test/pt_test.c

It looks like the reason is this new commit added in this v5.4 merge window:

38bb8d77d0b9 ("perf/x86/intel/pt: Split ToPA metadata and page layout")

which did not keep the TOPA_SHIFT for entry base.

Add it back.

Reported-by: Jan Stancek <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Fixes: 38bb8d77d0b9 ("perf/x86/intel/pt: Split ToPA metadata and page layout")
Link: https://lkml.kernel.org/r/[email protected]
[ Minor changelog edits. ]
Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:

   - Prevent a NULL pointer dereference in the X2APIC code in case of a
     CPU hotplug failure.

   - Prevent boot failures on HP superdome machines by invalidating the
     level2 kernel pagetable entries outside of the kernel area as
     invalid so BIOS reserved space won't be touched unintentionally.

     Also ensure that memory holes are rounded up to the next PMD
     boundary correctly.

   - Enable X2APIC support on Hyper-V to prevent boot failures.

   - Set the paravirt name when running on Hyper-V for consistency

   - Move a function under the appropriate ifdef guard to prevent build
     warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard
  x86/hyperv: Set pv_info.name to "Hyper-V"
  x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
  x86/hyperv: Make vapic support x2apic mode
  x86/boot/64: Round memory hole size up to next PMD page
  x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"A small set of irq chip driver fixes and updates:

   - Update the SIFIVE PLIC interrupt driver to use the fasteoi handler
     to address the shortcomings of the existing flow handling which was
     prone to lose interrupts

   - Use the proper limit for GIC interrupt line numbers

   - Add retrigger support for the recently merged Anapurna Labs Fabric
     interrupt controller to make it complete

   - Enable the ATMEL AIC5 interrupt controller driver on the new
     SAM9X60 SoC"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Switch to fasteoi flow
  irqchip/gic-v3: Fix GIC_LINE_NR accessor
  irqchip/atmel-aic5: Add support for sam9x60 irqchip
  irqchip/al-fic: Add support for irq retrigger

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull hrtimer fixlet from Thomas Gleixner:
"A single commit annotating the lockcless access to timer->base with
READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Annotate lockless access to timer->base

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull stop-machine fix from Thomas Gleixner:
"A single fix, amending stop machine with WRITE/READ_ONCE() to address
the fallout of KCSAN"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Avoid potential race behaviour

KVM: arm64: pmu: Reset sample period on overflow handling

The PMU emulation code uses the perf event sample period to trigger
the overflow detection. This works fine for the *first* overflow
handling, but results in a huge number of interrupts on the host,
unrelated to the number of interrupts handled in the guest (a x20
factor is pretty common for the cycle counter). On a slow system
(such as a SW model), this can result in the guest only making
forward progress at a glacial pace.

It turns out that the clue is in the name. The sample period is
exactly that: a period. And once the an overflow has occured,
the following period should be the full width of the associated
counter, instead of whatever the guest had initially programed.

Reset the sample period to the architected value in the overflow
handler, which now results in a number of host interrupts that is
much closer to the number of interrupts in the guest.

Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing")
Reviewed-by: Andrew Murray <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>

KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event

The current convention for KVM to request a chained event from the
host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED).

But as it turns out, this bit gets set *after* we create the kernel
event that backs our virtual counter, meaning that we never get
a 64bit counter.

Moving the setting to an earlier point solves the problem.

Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
Reviewed-by: Andrew Murray <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>

arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems

Of PMCR_EL0.LC, the ARMv8 ARM says:

"In an AArch64 only implementation, this field is RES 1."

So be it.

Fixes: ab9468340d2bc ("arm64: KVM: Add access handler for PMCR register")
Reviewed-by: Andrew Murray <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>

KVM: arm64: pmu: Fix cycle counter truncation

When a counter is disabled, its value is sampled before the event
is being disabled, and the value written back in the shadow register.

In that process, the value gets truncated to 32bit, which is adequate
for any counter but the cycle counter (defined as a 64bit counter).

This obviously results in a corrupted counter, and things like
"perf record -e cycles" not working at all when run in a guest...
A similar, but less critical bug exists in kvm_pmu_get_counter_value.

Make the truncation conditional on the counter not being the cycle
counter, which results in a minor code reorganisation.

Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters")
Reviewed-by: Andrew Murray <[email protected]>
Reported-by: Julien Thierry <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:
"I was battling a cold after some recent trips, so quite a bit piled up
  meanwhile, sorry about that.

  Highlights:

   1) Fix fd leak in various bpf selftests, from Brian Vazquez.

   2) Fix crash in xsk when device doesn't support some methods, from
      Magnus Karlsson.

   3) Fix various leaks and use-after-free in rxrpc, from David Howells.

   4) Fix several SKB leaks due to confusion of who owns an SKB and who
      should release it in the llc code. From Eric Biggers.

   5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet.

   6) Jumbo packets don't work after resume on r8169, as the BIOS resets
      the chip into non-jumbo mode during suspend. From Heiner Kallweit.

   7) Corrupt L2 header during MPLS push, from Davide Caratti.

   8) Prevent possible infinite loop in tc_ctl_action, from Eric
      Dumazet.

   9) Get register bits right in bcmgenet driver, based upon chip
      version. From Florian Fainelli.

  10) Fix mutex problems in microchip DSA driver, from Marek Vasut.

  11) Cure race between route lookup and invalidation in ipv4, from Wei
      Wang.

  12) Fix performance regression due to false sharing in 'net'
      structure, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits)
  net: reorder 'struct net' fields to avoid false sharing
  net: dsa: fix switch tree list
  net: ethernet: dwmac-sun8i: show message only when switching to promisc
  net: aquantia: add an error handling in aq_nic_set_multicast_list
  net: netem: correct the parent's backlog when corrupted packet was dropped
  net: netem: fix error path for corrupted GSO frames
  macb: propagate errors when getting optional clocks
  xen/netback: fix error path of xenvif_connect_data()
  net: hns3: fix mis-counting IRQ vector numbers issue
  net: usb: lan78xx: Connect PHY before registering MAC
  vsock/virtio: discard packets if credit is not respected
  vsock/virtio: send a credit update when buffer size is changed
  mlxsw: spectrum_trap: Push Ethernet header before reporting trap
  net: ensure correct skb->tstamp in various fragmenters
  net: bcmgenet: reset 40nm EPHY on energy detect
  net: bcmgenet: soft reset 40nm EPHYs before MAC init
  net: phy: bcm7xxx: define soft_reset for 40nm EPHY
  net: bcmgenet: don't set phydev->link from MAC
  net: Update address for MediaTek ethernet driver in MAINTAINERS
  ipv4: fix race condition between route lookup and invalidation
  ...

net: reorder 'struct net' fields to avoid false sharing

Intel test robot reported a ~7% regression on TCP_CRR tests
that they bisected to the cited commit.

Indeed, every time a new TCP socket is created or deleted,
the atomic counter net->count is touched (via get_net(net)
and put_net(net) calls)

So cpus might have to reload a contended cache line in
net_hash_mix(net) calls.

We need to reorder 'struct net' fields to move @hash_mix
in a read mostly cache line.

We move in the first cache line fields that can be
dirtied often.

We probably will have to address in a followup patch
the __randomize_layout that was added in linux-4.13,
since this might break our placement choices.

Fixes: 355b98553789 ("netns: provide pure entropy for net_hash_mix()")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: fix switch tree list

If there are multiple switch trees on the device, only the last one
will be listed, because the arguments of list_add_tail are swapped.

Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: dwmac-sun8i: show message only when switching to promisc

Printing the info message every time more than the max number of mac
addresses are requested generates unnecessary log spam. Showing it only
when the hw is not already in promiscous mode is equally informative
without being annoying.

Signed-off-by: Mans Rullgard <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: add an error handling in aq_nic_set_multicast_list

add an error handling in aq_nic_set_multicast_list, it may not
work when hw_multicast_list_set error; and at the same time
it will remove gcc Wunused-but-set-variable warning.

Signed-off-by: Chenwandun <[email protected]>
Reviewed-by: Igor Russkikh <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'netem-fix-further-issues-with-packet-corruption'

Jakub Kicinski says:

====================
net: netem: fix further issues with packet corruption

This set is fixing two more issues with the netem packet corruption.

First patch (which was previously posted) avoids NULL pointer dereference
if the first frame gets freed due to allocation or checksum failure.
v2 improves the clarity of the code a little as requested by Cong.

Second patch ensures we don't return SUCCESS if the frame was in fact
dropped. Thanks to this commit message for patch 1 no longer needs the
"this will still break with a single-frame failure" disclaimer.
====================

Signed-off-by: David S. Miller <[email protected]>

net: netem: correct the parent's backlog when corrupted packet was dropped

If packet corruption failed we jump to finish_segs and return
NET_XMIT_SUCCESS. Seeing success will make the parent qdisc
increment its backlog, that's incorrect - we need to return
NET_XMIT_DROP.

Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: netem: fix error path for corrupted GSO frames

To corrupt a GSO frame we first perform segmentation. We then
proceed using the first segment instead of the full GSO skb and
requeue the rest of the segments as separate packets.

If there are any issues with processing the first segment we
still want to process the rest, therefore we jump to the
finish_segs label.

Commit 177b8007463c ("net: netem: fix backlog accounting for
corrupted GSO frames") started using the pointer to the first
segment in the "rest of segments processing", but as mentioned
above the first segment may had already been freed at this point.

Backlog corrections for parent qdiscs have to be adjusted.

Fixes: 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames")
Reported-by: kbuild test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Reported-by: Ben Hutchings <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

macb: propagate errors when getting optional clocks

The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver
marks clock as not available if it receives an error when trying to get
a clock. This is wrong, because a clock controller might return
-EPROBE_DEFER if a clock is not available, but will eventually become
available.

In these cases, the driver would probe successfully but will never be
able to adjust the clocks, because the clocks were not available during
probe, but became available later.

For example, the clock controller for the ZynqMP is implemented in the
PMU firmware and the clocks are only available after the firmware driver
has been probed.

Use devm_clk_get_optional() in instead of devm_clk_get() to get the
optional clock and propagate all errors to the calling function.

Signed-off-by: Michael Tretter <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Tested-by: Nicolas Ferre <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

xen/netback: fix error path of xenvif_connect_data()

xenvif_connect_data() calls module_put() in case of error. This is
wrong as there is no related module_get().

Remove the superfluous module_put().

Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down")
Cc: <[email protected]> # 3.12
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: fix mis-counting IRQ vector numbers issue

Currently, the num_msi_left means the vector numbers of NIC,
but if the PF supported RoCE, it contains the vector numbers
of NIC and RoCE(Not expected).

This may cause interrupts lost in some case, because of the
NIC module used the vector resources which belongs to RoCE.

This patch adds a new variable num_nic_msi to store the vector
numbers of NIC, and adjust the default TQP numbers and rss_size
according to the value of num_nic_msi.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"Rather a lot of fixes, almost all affecting mm/"

* emailed patches from Andrew Morton <[email protected]>: (26 commits)
  scripts/gdb: fix debugging modules on s390
  kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
  mm/thp: allow dropping THP from page cache
  mm/vmscan.c: support removing arbitrary sized pages from mapping
  mm/thp: fix node page state in split_huge_page_to_list()
  proc/meminfo: fix output alignment
  mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
  mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
  mm: include <linux/huge_mm.h> for is_vma_temporary_stack
  zram: fix race between backing_dev_show and backing_dev_store
  mm/memcontrol: update lruvec counters in mem_cgroup_move_account
  ocfs2: fix panic due to ocfs2_wq is null
  hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
  mm: memblock: do not enforce current limit for memblock_phys* family
  mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
  mm/gup: fix a misnamed "write" argument, and a related bug
  mm/gup_benchmark: add a missing "w" to getopt string
  ocfs2: fix error handling in ocfs2_setattr()
  mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
  mm/memunmap: don't access uninitialized memmap in memunmap_pages()
  ...

scripts/gdb: fix debugging modules on s390

Currently lx-symbols assumes that module text is always located at
module->core_layout->base, but s390 uses the following layout:

  +------+  <- module->core_layout->base
  | GOT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset
  | PLT  |
  +------+  <- module->core_layout->base + module->arch->plt_offset +
  | TEXT |     module->arch->plt_size
  +------+

Therefore, when trying to debug modules on s390, all the symbol
addresses are skewed by plt_offset + plt_size.

Fix by adding plt_offset + plt_size to module_addr in
load_module_symbols().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Jan Kiszka <[email protected]>
Cc: Kieran Bingham <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register

Attaching uprobe to text section in THP splits the PMD mapped page table
into PTE mapped entries.  On uprobe detach, we would like to regroup PMD
mapped page table entry to regain performance benefit of THP.

However, the regroup is broken For perf_event based trace_uprobe.  This
is because perf_event based trace_uprobe calls uprobe_unregister twice
on close: first in TRACE_REG_PERF_CLOSE, then in
TRACE_REG_PERF_UNREGISTER.  The second call will split the PMD mapped
page table entry, which is not the desired behavior.

Fix this by only use FOLL_SPLIT_PMD for uprobe register case.

Add a WARN() to confirm uprobe unregister never work on huge pages, and
abort the operation when this WARN() triggers.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Song Liu <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/thp: allow dropping THP from page cache

Once a THP is added to the page cache, it cannot be dropped via
/proc/sys/vm/drop_caches. Fix this issue with proper handling in
invalidate_mapping_pages().

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/vmscan.c: support removing arbitrary sized pages from mapping

__remove_mapping() assumes that pages can only be either base pages or
HPAGE_PMD_SIZE. Ask the page what size it is.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: William Kucharski <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/thp: fix node page state in split_huge_page_to_list()

Make sure split_huge_page_to_list() handles the state of shmem THP and
file THP properly.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc/meminfo: fix output alignment

Patch series "Fixes for THP in page cache", v2.

This patch (of 5):

Add extra space for FileHugePages and FilePmdMapped, so the output is
aligned with other rows.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 60fbf0ab5da1 ("mm,thp: stats for file backed THP")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch

mm_init.c needs to include <linux/mman.h> for the definition of
vm_committed_as_batch. Fixes the following sparse warning:

mm/mm_init.c:141:5: warning: symbol 'vm_committed_as_batch' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition

The generic_file_vm_ops is defined in <linux/ramfs.h> so include it to
fix the following warning:

mm/filemap.c:2717:35: warning: symbol 'generic_file_vm_ops' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ben Dooks <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: include <linux/huge_mm.h> for is_vma_temporary_stack

Include <linux/huge_mm.h> for the definition of is_vma_temporary_stack
to fix the following sparse warning:

mm/rmap.c:1673:6: warning: symbol 'is_vma_temporary_stack' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ben Dooks <[email protected]>
Reviewed-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

zram: fix race between backing_dev_show and backing_dev_store

CPU0:        CPU1:
backing_dev_show        backing_dev_store
    ......    ......
    file = zram->backing_dev;
    down_read(&zram->init_lock);    down_read(&zram->init_init_lock)
    file_path(file, ...);    zram->backing_dev = backing_dev;
    up_read(&zram->init_lock);    up_read(&zram->init_lock);

gets the value of zram->backing_dev too early in backing_dev_show, which
resultin the value being NULL at the beginning, and not NULL later.

backtrace:
  d_path+0xcc/0x174
  file_path+0x10/0x18
  backing_dev_show+0x40/0xb4
  dev_attr_show+0x20/0x54
  sysfs_kf_seq_show+0x9c/0x10c
  kernfs_seq_show+0x28/0x30
  seq_read+0x184/0x488
  kernfs_fop_read+0x5c/0x1a4
  __vfs_read+0x44/0x128
  vfs_read+0xa0/0x138
  SyS_read+0x54/0xb4

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chenwandun <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]> [4.14+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memcontrol: update lruvec counters in mem_cgroup_move_account

Mapped, dirty and writeback pages are also counted in per-lruvec stats.
These counters needs update when page is moved between cgroups.

Currently is nobody *consuming* the lruvec versions of these counters and
that there is no user-visible effect.

Link: http://lkml.kernel.org/r/157112699975.7360.1062614888388489788.stgit@buzz
Fixes: 00f3ca2c2d66 ("mm: memcontrol: per-lruvec stats infrastructure")
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: fix panic due to ocfs2_wq is null

mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error.  ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

  Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
  ------------[ cut here ]------------
  Oops: 0002 [#1] SMP NOPTI
  CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G  E
        4.14.148-200.ckv.x86_64 #1
  Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
  task: ffff967af0520000 task.stack: ffffa5f05484000
  RIP: 0010:mutex_lock+0x19/0x20
  Call Trace:
    flush_workqueue+0x81/0x460
    ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
    ocfs2_dismount_volume+0x84/0x400 [ocfs2]
    ocfs2_fill_super+0xa4/0x1270 [ocfs2]
    ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
    mount_bdev+0x17f/0x1c0
    mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yi Li <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

Let's make sure that we only consider online memory (managed by the
buddy) that has initialized memmaps.  ZONE_DEVICE is not applicable.

page_zone() will call page_to_nid(), which will trigger
VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) with CONFIG_PAGE_POISONING
and CONFIG_DEBUG_VM_PGFLAGS when called on uninitialized memmaps.  This
can be the case when an offline memory block (e.g., never onlined) is
spanned by a zone.

Note: As explained by Michal in [1], alloc_contig_range() will verify
the range.  So it boils down to the wrong access in this function.

[1] http://lkml.kernel.org/r/20180423000943 [email protected]

Link: http://lkml.kernel.org/r/[email protected]
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: <[email protected]> [4.13+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memblock: do not enforce current limit for memblock_phys* family

Until commit 92d12f9544b7 ("memblock: refactor internal allocation
functions") the maximal address for memblock allocations was forced to
memblock.current_limit only for the allocation functions returning
virtual address.  The changes introduced by that commit moved the limit
enforcement into the allocation core and as a result the allocation
functions returning physical address also started to limit allocations
to memblock.current_limit.

This caused breakage of etnaviv GPU driver:

  etnaviv etnaviv: bound 130000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
  etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
  etnaviv-gpu 130000.gpu: model: GC2000, revision: 5108
  etnaviv-gpu 130000.gpu: command buffer outside valid memory window
  etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
  etnaviv-gpu 134000.gpu: command buffer outside valid memory window
  etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
  etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0

Restore the behaviour of memblock_phys* family so that these functions
will not enforce memblock.current_limit.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions")
Signed-off-by: Mike Rapoport <[email protected]>
Reported-by: Adam Ford <[email protected]>
Tested-by: Adam Ford <[email protected]> [imx6q-logicpd]
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size

Commit 1a61ab8038e72 ("mm: memcontrol: replace zone summing with
lruvec_page_state()") has made lruvec_page_state to use per-cpu counters
instead of calculating it directly from lru_zone_size with an idea that
this would be more effective.

Tim has reported that this is not really the case for their database
benchmark which is showing an opposite results where lruvec_page_state
is taking up a huge chunk of CPU cycles (about 25% of the system time
which is roughly 7% of total cpu cycles) on 5.3 kernels.  The workload
is running on a larger machine (96cpus), it has many cgroups (500) and
it is heavily direct reclaim bound.

Tim Chen said:

: The problem can also be reproduced by running simple multi-threaded
: pmbench benchmark with a fast Optane SSD swap (see profile below).
:
:
: 6.15%     3.08%  pmbench          [kernel.vmlinux]            [k] lruvec_lru_size
:             |
:             |--3.07%--lruvec_lru_size
:             |          |
:             |          |--2.11%--cpumask_next
:             |          |          |
:             |          |           --1.66%--find_next_bit
:             |          |
:             |           --0.57%--call_function_interrupt
:             |                     |
:             |                      --0.55%--smp_call_function_interrupt
:             |
:             |--1.59%--0x441f0fc3d009
:             |          _ops_rdtsc_init_base_freq
:             |          access_histogram
:             |          page_fault
:             |          __do_page_fault
:             |          handle_mm_fault
:             |          __handle_mm_fault
:             |          |
:             |           --1.54%--do_swap_page
:             |                     swapin_readahead
:             |                     swap_cluster_readahead
:             |                     |
:             |                      --1.53%--read_swap_cache_async
:             |                                __read_swap_cache_async
:             |                                alloc_pages_vma
:             |                                __alloc_pages_nodemask
:             |                                __alloc_pages_slowpath
:             |                                try_to_free_pages
:             |                                do_try_to_free_pages
:             |                                shrink_node
:             |                                shrink_node_memcg
:             |                                |
:             |                                |--0.77%--lruvec_lru_size
:             |                                |
:             |                                 --0.76%--inactive_list_is_low
:             |                                           |
:             |                                            --0.76%--lruvec_lru_size
:             |
:              --1.50%--measure_read
:                        page_fault
:                        __do_page_fault
:                        handle_mm_fault
:                        __handle_mm_fault
:                        do_swap_page
:                        swapin_readahead
:                        swap_cluster_readahead
:                        |
:                         --1.48%--read_swap_cache_async
:                                   __read_swap_cache_async
:                                   alloc_pages_vma
:                                   __alloc_pages_nodemask
:                                   __alloc_pages_slowpath
:                                   try_to_free_pages
:                                   do_try_to_free_pages
:                                   shrink_node
:                                   shrink_node_memcg
:                                   |
:                                   |--0.75%--inactive_list_is_low
:                                   |          |
:                                   |           --0.75%--lruvec_lru_size
:                                   |
:                                    --0.73%--lruvec_lru_size

The likely culprit is the cache traffic the lruvec_page_state_local
generates.  Dave Hansen says:

: I was thinking purely of the cache footprint.  If it's reading
: pn->lruvec_stat_local->count[idx] is three separate cachelines, so 192
: bytes of cache *96 CPUs = 18k of data, mostly read-only.  1 cgroup would
: be 18k of data for the whole system and the caching would be pretty
: efficient and all 18k would probably survive a tight page fault loop in
: the L1.  500 cgroups would be ~90k of data per CPU thread which doesn't
: fit in the L1 and probably wouldn't survive a tight page fault loop if
: both logical threads were banging on different cgroups.
:
: It's just a theory, but it's why I noted the number of cgroups when I
: initially saw this show up in profiles

Fix the regression by partially reverting the said commit and calculate
the lru size explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1a61ab8038e72 ("mm: memcontrol: replace zone summing with lruvec_page_state()")
Signed-off-by: Honglei Wang <[email protected]>
Reported-by: Tim Chen <[email protected]>
Acked-by: Tim Chen <[email protected]>
Tested-by: Tim Chen <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]> [5.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup: fix a misnamed "write" argument, and a related bug

In several routines, the "flags" argument is incorrectly named "write".
Change it to "flags".

Also, in one place, the misnaming led to an actual bug:
"flags & FOLL_WRITE" is required, rather than just "flags".
(That problem was flagged by krobot, in v1 of this patch.)

Also, change the flags argument from int, to unsigned int.

You can see that this was a simple oversight, because the
calling code passes "flags" to the fifth argument:

gup_pgd_range():
    ...
    if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr,
    PGDIR_SHIFT, next, flags, pages, nr))

...which, until this patch, the callees referred to as "write".

Also, change two lines to avoid checkpatch line length
complaints, and another line to fix another oversight
that checkpatch called out: missing "int" on pdshift.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: b798bec4741b ("mm/gup: change write parameter to flags in fast walk")
Signed-off-by: John Hubbard <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Suggested-by: Kirill A. Shutemov <[email protected]>
Suggested-by: Ira Weiny <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup_benchmark: add a missing "w" to getopt string

Even though gup_benchmark.c has code to handle the -w command-line option,
the "w" is not part of the getopt string.  It looks as if it has been
missing the whole time.

On my machine, this leads naturally to the following predictable result:

  $ sudo ./gup_benchmark -w
  ./gup_benchmark: invalid option -- 'w'

...which is fixed with this commit.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: kbuild test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: fix error handling in ocfs2_setattr()

Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before
jumping to do dqput().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chengguang Xu <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>