Git Repo - linux.git/log

Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smp fix from Thomas Gleixner:
"Add warnings to the smp function calls so callers from wrong contexts
get detected"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Warn on function calls from softirq context

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull CONFIG_PREEMPT_RT stub config from Thomas Gleixner:
"The real-time preemption patch set exists for almost 15 years now and
  while the vast majority of infrastructure and enhancements have found
  their way into the mainline kernel, the final integration of RT is
  still missing.

  Over the course of the last few years, we have worked on reducing the
  intrusivenness of the RT patches by refactoring kernel infrastructure
  to be more real-time friendly. Almost all of these changes were
  benefitial to the mainline kernel on their own, so there was no
  objection to integrate them.

  Though except for the still ongoing printk refactoring, the remaining
  changes which are required to make RT a first class mainline citizen
  are not longer arguable as immediately beneficial for the mainline
  kernel. Most of them are either reordering code flows or adding RT
  specific functionality.

  But this now has hit a wall and turned into a classic hen and egg
  problem:

     Maintainers are rightfully wary vs. these changes as they make only
     sense if the final integration of RT into the mainline kernel takes
     place.

  Adding CONFIG_PREEMPT_RT aims to solve this as a clear sign that RT
  will be fully integrated into the mainline kernel. The final
  integration of the missing bits and pieces will be of course done with
  the same careful approach as we have used in the past.

  While I'm aware that you are not entirely enthusiastic about that, I
  think that RT should receive the same treatment as any other widely
  used out of tree functionality, which we have accepted into mainline
  over the years.

  RT has become the de-facto standard real-time enhancement and is
  shipped by enterprise, embedded and community distros. It's in use
  throughout a wide range of industries: telecommunications, industrial
  automation, professional audio, medical devices, data acquisition,
  automotive - just to name a few major use cases.

  RT development is backed by a Linuxfoundation project which is
  supported by major stakeholders of this technology. The funding will
  continue over the actual inclusion into mainline to make sure that the
  functionality is neither introducing regressions, regressing itself,
  nor becomes subject to bitrot. There is also a lifely user community
  around RT as well, so contrary to the grim situation 5 years ago, it's
  a healthy project.

  As RT is still a good vehicle to exercise rarely used code paths and
  to detect hard to trigger issues, you could at least view it as a QA
  tool if nothing else"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
"Mostly bugfixes, but also:

   - s390 support for KVM selftests

   - LAPIC timer offloading to housekeeping CPUs

   - Extend an s390 optimization for overcommitted hosts to all
     architectures

   - Debugging cleanups and improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: x86: Add fixed counters to PMU filter
  KVM: nVMX: do not use dangling shadow VMCS after guest reset
  KVM: VMX: dump VMCS on failed entry
  KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
  KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
  KVM: Boost vCPUs that are delivering interrupts
  KVM: selftests: Remove superfluous define from vmx.c
  KVM: SVM: Fix detection of AMD Errata 1096
  KVM: LAPIC: Inject timer interrupt via posted interrupt
  KVM: LAPIC: Make lapic timer unpinned
  KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
  KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
  kvm: x86: ioapic and apic debug macros cleanup
  kvm: x86: some tsc debug cleanup
  kvm: vmx: fix coccinelle warnings
  x86: kvm: avoid constant-conversion warning
  x86: kvm: avoid -Wsometimes-uninitized warning
  KVM: x86: expose AVX512_BF16 feature to guest
  KVM: selftests: enable pgste option for the linker on s390
  KVM: selftests: Move kvm_create_max_vcpus test to generic code
  ...

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is the final round of mostly small fixes in our initial submit.

  It's mostly minor fixes and driver updates. The only change of note is
  adding a virt_boundary_mask to the SCSI host and host template to
  parametrise this for NVMe devices instead of having them do a call in
  slave_alloc. It's a fairly straightforward conversion except in the
  two NVMe handling drivers that didn't set it who now have a virtual
  infinity parameter added"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
  scsi: megaraid_sas: set an unlimited max_segment_size
  scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
  scsi: IB/srp: set virt_boundary_mask in the scsi host
  scsi: IB/iser: set virt_boundary_mask in the scsi host
  scsi: storvsc: set virt_boundary_mask in the scsi host template
  scsi: ufshcd: set max_segment_size in the scsi host template
  scsi: core: take the DMA max mapping size into account
  scsi: core: add a host / host template field for the virt boundary
  scsi: core: Fix race on creating sense cache
  scsi: sd_zbc: Fix compilation warning
  scsi: libfc: fix null pointer dereference on a null lport
  scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
  scsi: zfcp: fix request object use-after-free in send path causing wrong traces
  scsi: zfcp: fix request object use-after-free in send path causing seqno errors
  scsi: megaraid_sas: Update driver version to 07.710.50.00
  scsi: megaraid_sas: Add module parameter for FW Async event logging
  scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers
  scsi: megaraid_sas: Fix calculation of target ID
  scsi: lpfc: reduce stack size with CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE
  scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade
  ...

Merge tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

- match the directory structure of the linux-libc-dev package to that
   of Debian-based distributions

- fix incorrect include/config/auto.conf generation when Kconfig
   creates it along with the .config file

- remove misleading $(AS) from documents

- clean up precious tag files by distclean instead of mrproper

- add a new coccinelle patch for devm_platform_ioremap_resource
   migration

- refactor module-related scripts to read modules.order instead of
   $(MODVERDIR)/*.mod files to get the list of created modules

- remove MODVERDIR

- update list of header compile-test

- add -fcf-protection=none flag to avoid conflict with the retpoline
   flags when CONFIG_RETPOLINE=y

- misc cleanups

* tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
  kbuild: add -fcf-protection=none when using retpoline flags
  kbuild: update compile-test header list for v5.3-rc1
  kbuild: split out *.mod out of {single,multi}-used-m rules
  kbuild: remove 'prepare1' target
  kbuild: remove the first line of *.mod files
  kbuild: create *.mod with full directory path and remove MODVERDIR
  kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
  kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
  kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
  kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
  scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver
  kbuild: remove duplication from modules.order in sub-directories
  kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
  kbuild: do not create empty modules.order in the prepare stage
  coccinelle: api: add devm_platform_ioremap_resource script
  kbuild: compile-test headers listed in header-test-m as well
  kbuild: remove unused hostcc-option
  kbuild: remove tag files by distclean instead of mrproper
  kbuild: add --hash-style= and --build-id unconditionally
  kbuild: get rid of misleading $(AS) from documents
  ...

Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dcache and mountpoint updates from Al Viro:
"Saner handling of refcounts to mountpoints.

  Transfer the counting reference from struct mount ->mnt_mountpoint
  over to struct mountpoint ->m_dentry. That allows us to get rid of the
  convoluted games with ordering of mount shutdowns.

  The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
  mixed-filesystem shrink lists, which we'll also need for the Slab
  Movable Objects patchset"

* 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the remnants of releasing the mountpoint away from fs_pin
  get rid of detach_mnt()
  make struct mountpoint bear the dentry reference to mountpoint, not struct mount
  Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
  fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
  __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
  nfs: dget_parent() never returns NULL
  ceph: don't open-code the check for dead lockref

smp: Warn on function calls from softirq context

It's clearly documented that smp function calls cannot be invoked from
softirq handling context. Unfortunately nothing enforces that or emits a
warning.

A single function call can be invoked from softirq context only via
smp_call_function_single_async().

The only legit context is task context, so add a warning to that effect.

Reported-by: luferry <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

KVM: x86: Add fixed counters to PMU filter

Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist
fixed counters.

Signed-off-by: Eric Hankland <[email protected]>
[No need to check padding fields for zero. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: do not use dangling shadow VMCS after guest reset

If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01. However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS which has since been freed.

This causes a vmptrld of a NULL pointer on my machime, but Jan reports
the host to hang altogether. Let's see how much this trivial patch fixes.

Reported-by: Jan Kiszka <[email protected]>
Cc: Liran Alon <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: dump VMCS on failed entry

This is useful for debugging, and is ratelimited nowadays.

Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed

If a perf_event creation fails due to any reason of the host perf
subsystem, it has no chance to log the corresponding event for guest
which may cause abnormal sampling data in guest result. In debug mode,
this message helps to understand the state of vPMC and we may not
limit the number of occurrences but not in a spamming style.

Suggested-by: Joe Perches <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup

Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup().

Suggested-by: Paolo Bonzini <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Boost vCPUs that are delivering interrupts

Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during
vcpu wakeup and interrupt delivery), we want to also boost not just
lock holders but also vCPUs that are delivering interrupts. Most
smp_call_function_many calls are synchronous, so the IPI target vCPUs
are also good yield candidates.  This patch introduces vcpu->ready to
boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
(VT-d PI handles voluntary preemption separately, in pi_pre_block).

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M

            vanilla     boosting    improved
1VM          21443       23520         9%
2VM           2800        8000       180%
3VM           1800        3100        72%

Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)

            vanilla     boosting   improved
              1570         4000      155%

w/ boosting + w/ pv sched yield(vanilla)

            vanilla     boosting   improved
              1844         5157      179%

w/o boosting, perf top in VM:

72.33%  [kernel]       [k] smp_call_function_many
  4.22%  [kernel]       [k] call_function_i
  3.71%  [kernel]       [k] async_page_fault

w/ boosting, perf top in VM:

38.43%  [kernel]       [k] smp_call_function_many
  6.31%  [kernel]       [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]       [k] call_function_interrupt

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Marc Zyngier <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Remove superfluous define from vmx.c

The code in vmx.c does not use "program_invocation_name", so there
is no need to "#define _GNU_SOURCE" here.

Signed-off-by: Thomas Huth <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: SVM: Fix detection of AMD Errata 1096

When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is
possible that CPU microcode implementing DecodeAssist will fail
to read bytes of instruction which caused #NPF. This is AMD errata
1096 and it happens because CPU microcode reading instruction bytes
incorrectly attempts to read code as implicit supervisor-mode data
accesses (that is, just like it would read e.g. a TSS), which are
susceptible to SMAP faults. The microcode reads CS:RIP and if it is
a user-mode address according to the page tables, the processor
gives up and returns no instruction bytes.  In this case,
GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly
return 0 instead of the correct guest instruction bytes.

Current KVM code attemps to detect and workaround this errata, but it
has multiple issues:

1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1,
which is required for encountering a SMAP fault.

2) It assumes SMAP faults can only occur when guest CPL==3.
However, in case guest CR4.SMEP=0, the guest can execute an instruction
which reside in a user-accessible page with CPL<3 priviledge. If this
instruction raise a #NPF on it's data access, then CPU DecodeAssist
microcode will still encounter a SMAP violation.  Even though no sane
OS will do so (as it's an obvious priviledge escalation vulnerability),
we still need to handle this semanticly correct in KVM side.

Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easy
triggerable condition and guests usually enable SMAP together with SMEP.
If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt
at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest
instead of #NPF.  We keep this condition to avoid false positives in
the detection of the errata.

In addition, to avoid future confusion and improve code readbility,
include details of the errata in code and not just in commit message.

Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
Cc: Singh Brijesh <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Reviewed-by: Brijesh Singh <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: LAPIC: Inject timer interrupt via posted interrupt

Dedicated instances are currently disturbed by unnecessary jitter due
to the emulated lapic timers firing on the same pCPUs where the
vCPUs reside.  There is no hardware virtual timer on Intel for guest
like ARM, so both programming timer in guest and the emulated timer fires
incur vmexits.  This patch tries to avoid vmexit when the emulated timer
fires, at least in dedicated instance scenario when nohz_full is enabled.

In that case, the emulated timers can be offload to the nearest busy
housekeeping cpus since APICv has been found for several years in server
processors. The guest timer interrupt can then be injected via posted interrupts,
which are delivered by the housekeeping cpu once the emulated timer fires.

The host should tuned so that vCPUs are placed on isolated physical
processors, and with several pCPUs surplus for busy housekeeping.
If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode,
~3% redis performance benefit can be observed on Skylake server, and the
number of external interrupt vmexits drops substantially.  Without patch

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

While with patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kbuild: add -fcf-protection=none when using retpoline flags

The gcc -fcf-protection=branch option is not compatible with
-mindirect-branch=thunk-extern. The latter is used when
CONFIG_RETPOLINE is selected, and this will fail to build with
a gcc which has -fcf-protection=branch enabled by default. Adding
-fcf-protection=none when building with retpoline enabled
prevents such build failures.

Signed-off-by: Seth Forshee <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: update compile-test header list for v5.3-rc1

- Some headers graduated from the blacklist

- hyperv_timer.h joined the header-test when CONFIG_X86=y

- nf_tables*.h joined the header-test when CONFIG_NF_TABLES is
   enabled.

- The entry for nf_tables_offload.h was added to fix build error for
   the combination of CONFIG_NF_TABLES=n and CONFIG_KERNEL_HEADER_TEST=y.

- The entry for iomap.h was added because this header is supposed to
   be included only when CONFIG_BLOCK=y

Signed-off-by: Masahiro Yamada <[email protected]>

Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC defconfig updates from Olof Johansson:
"We keep this in a separate branch to avoid cross-branch conflicts, but
  most of the material here is fairly boring -- some new drivers turned
  on for hardware since they were merged, and some refreshed files due
  to time having moved a lot of entries around"

* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
  ARM: configs: multi_v5: Remove duplicate ASPEED options
  arm64: defconfig: Enable CONFIG_KEYBOARD_SNVS_PWRKEY as module
  ARM: imx_v6_v7_defconfig: Enable CONFIG_ARM_IMX_CPUFREQ_DT
  defconfig: arm64: enable i.MX8 SCU octop driver
  arm64: defconfig: Add i.MX SCU SoC info driver
  arm64: defconfig: Enable CONFIG_QORIQ_THERMAL
  ARM: imx_v6_v7_defconfig: Select CONFIG_NVMEM_SNVS_LPGPR
  arm64: defconfig: ARM_IMX_CPUFREQ_DT=m
  ARM: imx_v6_v7_defconfig: Add TPM PWM support by default
  ARM: imx_v6_v7_defconfig: Enable the OV2680 camera driver
  ARM: imx_v6_v7_defconfig: Enable CONFIG_THERMAL_STATISTICS
  arm64: defconfig: NVMEM_IMX_OCOTP=y for imx8m
  ARM: multi_v7_defconfig: enable STMFX pinctrl support
  arm64 defconfig: enable LVM support
  ARM: configs: multi_v5: Add more ASPEED devices
  arm64: defconfig: Add Tegra194 PCIe driver
  ARM: configs: aspeed: Add new drivers
  ARM: exynos_defconfig: Enable Panfrost and Lima drivers
  ARM: multi_v7_defconfig: Enable Panfrost and Lima drivers
  arm64 defconfig: enable Mellanox cards
  ...

Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM Devicetree updates from Olof Johansson:
"We continue to see a lot of new material. I've highlighted some of it
  below, but there's been more beyond that as well.

  One of the sweeping changes is that many boards have seen their ARM
  Mali GPU devices added to device trees, since the DRM drivers have now
  been merged.

  So, with the caveat that I have surely missed several great
  contributions, here's a collection of the material this time around:

  New SoCs:

   - Mediatek mt8183 (4x Cortex-A73 + 4x Cortex-A53)

   - TI J721E (2x Cortex-A72 + 3x Cortex-R5F + 3 DSPs + MMA)

   - Amlogic G12B (4x Cortex-A73 + 2x Cortex-A53)

  New Boards / platforms:

   - Aspeed BMC support for a number of new server platforms

   - Kontron SMARC SoM (several i.MX6 versions)

   - Novtech's Meerkat96 (i.MX7)

   - ST Micro Avenger96 board

   - Hardkernel ODROID-N2 (Amlogic G12B)

   - Purism Librem5 devkit (i.MX8MQ)

   - Google Cheza (Qualcomm SDM845)

   - Qualcomm Dragonboard 845c (Qualcomm SDM845)

   - Hugsun X99 TV Box (Rockchip RK3399)

   - Khadas Edge/Edge-V/Captain (Rockchip RK3399)

  Updated / expanded boards and platforms:

   - Renesas r7s9210 has a lot of new peripherals added

   - Fixes and polish for Rockchip-based Chromebooks

   - Amlogic G12A has a lot of peripherals added

   - Nvidia Jetson Nano sees various fixes and improvements, and is now
     at feature parity with TX1"

* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (586 commits)
  ARM: dts: gemini: Set DIR-685 SPI CS as active low
  ARM: dts: exynos: Adjust buck[78] regulators to supported values on Arndale Octa
  ARM: dts: exynos: Adjust buck[78] regulators to supported values on Odroid XU3 family
  ARM: dts: exynos: Move Mali400 GPU node to "/soc"
  ARM: dts: exynos: Fix imprecise abort on Mali GPU probe on Exynos4210
  arm64: dts: qcom: qcs404: Add missing space for cooling-cells property
  arm64: dts: rockchip: Fix USB3 Type-C on rk3399-sapphire
  arm64: dts: rockchip: Update DWC3 modules on RK3399 SoCs
  arm64: dts: rockchip: enable rk3328 watchdog clock
  ARM: dts: rockchip: add display nodes for rk322x
  ARM: dts: rockchip: fix vop iommu-cells on rk322x
  arm64: dts: rockchip: Add support for Hugsun X99 TV Box
  arm64: dts: rockchip: Define values for the IPA governor for rock960
  arm64: dts: rockchip: Fix multiple thermal zones conflict in rk3399.dtsi
  arm64: dts: rockchip: add core dtsi file for RK3399Pro SoCs
  arm64: dts: rockchip: improve rk3328-roc-cc rgmii performance.
  Revert "ARM: dts: rockchip: set PWM delay backlight settings for Minnie"
  ARM: dts: rockchip: Configure BT_DEV_WAKE in on rk3288-veyron
  arm64: dts: qcom: sdm845-cheza: add initial cheza dt
  ARM: dts: msm8974-FP2: Add vibration motor
  ...

Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC-related driver updates from Olof Johansson:
"Various driver updates for platforms and a couple of the small driver
  subsystems we merge through our tree:

   - A driver for SCU (system control) on NXP i.MX8QXP

   - Qualcomm Always-on Subsystem messaging driver (AOSS QMP)

   - Qualcomm PM support for MSM8998

   - Support for a newer version of DRAM PHY driver for Broadcom (DPFE)

   - Reset controller support for Bitmain BM1880

   - TI SCI (System Control Interface) support for CPU control on AM654
     processors

   - More TI sysc refactoring and rework"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (84 commits)
  reset: remove redundant null check on pointer dev
  soc: rockchip: work around clang warning
  dt-bindings: reset: imx7: Fix the spelling of 'indices'
  soc: imx: Add i.MX8MN SoC driver support
  soc: aspeed: lpc-ctrl: Fix probe error handling
  soc: qcom: geni: Add support for ACPI
  firmware: ti_sci: Fix gcc unused-but-set-variable warning
  firmware: ti_sci: Use the correct style for SPDX License Identifier
  soc: imx8: Use existing of_root directly
  soc: imx8: Fix potential kernel dump in error path
  firmware/psci: psci_checker: Park kthreads before stopping them
  memory: move jedec_ddr.h from include/memory to drivers/memory/
  memory: move jedec_ddr_data.c from lib/ to drivers/memory/
  MAINTAINERS: Remove myself as qcom maintainer
  soc: aspeed: lpc-ctrl: make parameter optional
  soc: qcom: apr: Don't use reg for domain id
  soc: qcom: fix QCOM_AOSS_QMP dependency and build errors
  memory: tegra: Fix -Wunused-const-variable
  firmware: tegra: Early resume BPMP
  soc/tegra: Select pinctrl for Tegra194
  ...

Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC platform updates from Olof Johansson:
"SoC platform changes. Main theme this merge window:

   - The Netx platform (Netx 100/500) platform is removed by Linus
     Walleij-- the SoC doesn't have active maintainers with hardware,
     and in discussions with the vendor the agreement was that it's OK
     to remove.

   - Russell King has a series of patches that cleans up and refactors
     SA1101 and RiscPC support"

* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
  ARM: stm32: use "depends on" instead of "if" after prompt
  ARM: sa1100: convert to common clock framework
  ARM: exynos: Cleanup cppcheck shifting warning
  ARM: pxa/lubbock: remove lubbock_set_misc_wr() from global view
  ARM: exynos: Only build MCPM support if used
  arm: add missing include platform-data/atmel.h
  ARM: davinci: Use GPIO lookup table for DA850 LEDs
  ARM: OMAP2: drop explicit assembler architecture
  ARM: use arch_extension directive instead of arch argument
  ARM: imx: Switch imx7d to imx-cpufreq-dt for speed-grading
  ARM: bcm: Enable PINCTRL for ARCH_BRCMSTB
  ARM: bcm: Enable ARCH_HAS_RESET_CONTROLLER for ARCH_BRCMSTB
  ARM: riscpc: enable chained scatterlist support
  ARM: riscpc: reduce IRQ handling code
  ARM: riscpc: move RiscPC assembly files from arch/arm/lib to mach-rpc
  ARM: riscpc: parse video information from tagged list
  ARM: riscpc: add ecard quirk for Atomwide 3port serial card
  MAINTAINERS: mvebu: Add git entry
  soc: ti: pm33xx: Add a print while entering RTC only mode with DDR in self-refresh
  ARM: OMAP2+: Make some variables static
  ...

Merge tag 'drm-next-2019-07-19' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"Dave is back in shape, but now family got it so I'm doing the pull.
  Two things worthy of note:

   - nouveau feature pull was way too late, Dave&me decided to not take
     that, so Ben spun up a pull with just the fixes.

   - after some chatting with the arm display maintainers we decided to
     change a bit how that's maintained, for more oversight/review and
     cross vendor collab.

  More details below:

  nouveau:
   - bugfixes
   - TU116 enabling (minor iteration) :w

  amdgpu:
   - large pile of fixes for new hw support this release (navi, vega20)
   - audio hotplug fix
   - bunch of corner cases and small fixes all over for amdgpu/kfd

  komeda:
   - back out some new properties (from this merge window) that needs
     more pondering.

  bochs:
   - fb pitch setup

  core:
   - a new panel quirk
   - misc fixes"

* tag 'drm-next-2019-07-19' of git://anongit.freedesktop.org/drm/drm: (73 commits)
  drm/nouveau/secboot/gp102-: remove WAR for SEC2 RTOS start bug
  drm/nouveau/flcn/gp102-: improve implementation of bind_context() on SEC2/GSP
  drm/nouveau: fix memory leak in nouveau_conn_reset()
  drm/nouveau/dmem: missing mutex_lock in error path
  drm/nouveau/hwmon: return EINVAL if the GPU is powered down for sensors reads
  drm/nouveau: fix bogus GPL-2 license header
  drm/nouveau: fix bogus GPL-2 license header
  drm/nouveau/i2c: Enable i2c pads & busses during preinit
  drm/nouveau/disp/tu102-: wire up scdc parameter setter
  drm/nouveau/core: recognise TU116 chipset
  drm/nouveau/kms: disallow dual-link harder if hdmi connection detected
  drm/nouveau/disp/nv50-: fix center/aspect-corrected scaling
  drm/nouveau/disp/nv50-: force scaler for any non-default LVDS/eDP modes
  drm/nouveau/mcp89/mmu: Use mcp77_mmu_new instead of g84_mmu_new on MCP89.
  drm/amd/display: init res_pool dccg_ref, dchub_ref with xtalin_freq
  drm/amdgpu/pm: remove check for pp funcs in freq sysfs handlers
  drm/amd/display: Force uclk to max for every state
  drm/amdkfd: Remove GWS from process during uninit
  drm/amd/amdgpu: Fix offset for vmid selection in debugfs interface
  drm/amd/powerplay: update vega20 driver if to fit latest SMU firmware
  ...

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Fix missed wake-up race in padata

- Use crypto_memneq in ccp

- Fix version check in ccp

- Fix fuzz test failure in ccp

- Fix potential double free in crypto4xx

- Fix compile warning in stm32

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  crypto: ccp - Fix SEV_VERSION_GREATER_OR_EQUAL
  crypto: ccp/gcm - use const time tag comparison.
  crypto: ccp - memset structure fields to zero before reuse
  crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe
  crypto: stm32/hash - Fix incorrect printk modifier for size_t

Remove references to dead website.

This fell into disrepair a while ago, and the majority of hits to the
snapshots were from bots, so it's more trouble to keep running than it's worth.

Signed-off-by: Dave Jones <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'trace-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Eiichi Tsukata found a small bug from the fixup of the stack code

  Removing ULONG_MAX as the marker for the user stack trace end, made
  the tracing code not know where the end is. The end is now marked with
  a zero (NULL) pointer. Eiichi fixed this in the tracing code"

* tag 'trace-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix user stack trace "??" output

Merge tag 'csky-for-linus-5.3-rc1' of git://github.com/c-sky/csky-linux

Pull arch/csky pupdates from Guo Ren:
"This round of csky subsystem gives two features (ASID algorithm
  update, Perf pmu record support) and some fixups.

  ASID updates:
   - Revert mmu ASID mechanism
   - Add new asid lib code from arm
   - Use generic asid algorithm to implement switch_mm
   - Improve tlb operation with help of asid

  Perf pmu record support:
   - Init pmu as a device
   - Add count-width property for csky pmu
   - Add pmu interrupt support
   - Fix perf record in kernel/user space
   - dt-bindings: Add csky PMU bindings

  Fixes:
   - Fixup no panic in kernel for some traps
   - Fixup some error count in 810 & 860.
   - Fixup abiv1 memset error"

* tag 'csky-for-linus-5.3-rc1' of git://github.com/c-sky/csky-linux:
  csky: Fixup abiv1 memset error
  csky: Improve tlb operation with help of asid
  csky: Use generic asid algorithm to implement switch_mm
  csky: Add new asid lib code from arm
  csky: Revert mmu ASID mechanism
  dt-bindings: csky: Add csky PMU bindings
  dt-bindings: interrupt-controller: Update csky mpintc
  csky: Fixup some error count in 810 & 860.
  csky: Fix perf record in kernel/user space
  csky: Add pmu interrupt support
  csky: Add count-width property for csky pmu
  csky: Init pmu as a device
  csky: Fixup no panic in kernel for some traps
  csky: Select intc & timer drivers

Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
"Fixes and features:

   - A series to introduce a common command line parameter for disabling
     paravirtual extensions when running as a guest in virtualized
     environment

   - A fix for int3 handling in Xen pv guests

   - Removal of the Xen-specific tmem driver as support of tmem in Xen
     has been dropped (and it was experimental only)

   - A security fix for running as Xen dom0 (XSA-300)

   - A fix for IRQ handling when offlining cpus in Xen guests

   - Some small cleanups"

* tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: let alloc_xenballooned_pages() fail if not enough memory free
  xen/pv: Fix a boot up hang revealed by int3 self test
  x86/xen: Add "nopv" support for HVM guest
  x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
  xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete
  x86: Add "nopv" parameter to disable PV extensions
  x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
  xen: remove tmem driver
  Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"
  xen/events: fix binding user event channels to cpus

Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap split/cleanup from Darrick Wong:
"As promised, here's the second part of the iomap merge for 5.3, in
  which we break up iomap.c into smaller files grouped by functional
  area so that it'll be easier in the long run to maintain cohesiveness
  of code units and to review incoming patches. There are no functional
  changes and fs/iomap.c split cleanly.

  Summary:

   - Regroup the fs/iomap.c code by major functional area so that we can
     start development for 5.4 from a more stable base"

* tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: move internal declarations into fs/iomap/
  iomap: move the main iteration code into a separate file
  iomap: move the buffered IO code into a separate file
  iomap: move the direct IO code into a separate file
  iomap: move the SEEK_HOLE code into a separate file
  iomap: move the file mapping reporting code into a separate file
  iomap: move the swapfile code into a separate file
  iomap: start moving code to fs/iomap/

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Assorted stuff"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
perf_event_get(): don't bother with fget_raw()
vfs: update d_make_root() description

Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull adfs updates from Al Viro:
"More ADFS patches from Russell King"

* 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/adfs: add time stamp and file type helpers
  fs/adfs: super: limit idlen according to directory type
  fs/adfs: super: fix use-after-free bug
  fs/adfs: super: safely update options on remount
  fs/adfs: super: correct superblock flags
  fs/adfs: clean up indirect disc addresses and fragment IDs
  fs/adfs: clean up error message printing
  fs/adfs: use %pV for error messages
  fs/adfs: use format_version from disc_record
  fs/adfs: add helper to get filesystem size
  fs/adfs: add helper to get discrecord from map
  fs/adfs: correct disc record structure

Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:
"The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix AF_XDP cq entry leak, from Ilya Maximets.

2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.

3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.

4) Fix handling of neigh timers wrt. entries added by userspace, from
    Lorenzo Bianconi.

5) Various cases of missing of_node_put(), from Nishka Dasgupta.

6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.

7) Various RDS layer fixes, from Gerd Rausch.

8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
    Cong Wang.

9) Fix FIB source validation checks over loopback, also from Cong Wang.

10) Use promisc for unsupported number of filters, from Justin Chen.

11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  tcp: fix tcp_set_congestion_control() use from bpf hook
  ag71xx: fix return value check in ag71xx_probe()
  ag71xx: fix error return code in ag71xx_probe()
  usb: qmi_wwan: add D-Link DWM-222 A2 device ID
  bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
  net: dsa: sja1105: Fix missing unlock on error in sk_buff()
  gve: replace kfree with kvfree
  selftests/bpf: fix test_xdp_noinline on s390
  selftests/bpf: fix "valid read map access into a read-only array 1" on s390
  net/mlx5: Replace kfree with kvfree
  MAINTAINERS: update netsec driver
  ipv6: Unlink sibling route in case of failure
  liquidio: Replace vmalloc + memset with vzalloc
  udp: Fix typo in net/ipv4/udp.c
  net: bcmgenet: use promisc for unsupported filters
  ipv6: rt6_check should return NULL if 'from' is NULL
  tipc: initialize 'validated' field of received packets
  selftests: add a test case for rp_filter
  fib: relax source validation check for loopback packets
  mlxsw: spectrum: Do not process learned records with a dummy FID
  ...

Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:
"The rest of MM and a kernel-wide procfs cleanup.

  Summary of the more significant patches:

   - Patch series "mm/memory_hotplug: Factor out memory block
     devicehandling", v3. David Hildenbrand.

     Some spring-cleaning of the memory hotplug code, notably in
     drivers/base/memory.c

   - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
     Shi.

     Fix /proc/pid/smaps output for THP pages used in shmem.

   - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.

     Bugfix and speedup for kernel/resource.c

   - Patch series "mm: Further memory block device cleanups", David
     Hildenbrand.

     More spring-cleaning of the memory hotplug code.

   - Patch series "mm: Sub-section memory hotplug support". Dan
     Williams.

     Generalise the memory hotplug code so that pmem can use it more
     completely. Then remove the hacks from the libnvdimm code which
     were there to work around the memory-hotplug code's constraints.

   - "proc/sysctl: add shared variables for range check", Matteo Croce.

     We have about 250 instances of

          int zero;
          ...
                  .extra1 = &zero,

     in the tree. This is a tree-wide sweep to make all those private
     "zero"s and "one"s use global variables.

     Alas, it isn't practical to make those two global integers const"

* emailed patches from Andrew Morton <[email protected]>: (38 commits)
  proc/sysctl: add shared variables for range check
  mm: migrate: remove unused mode argument
  mm/sparsemem: cleanup 'section number' data types
  libnvdimm/pfn: stop padding pmem namespaces to section alignment
  libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
  mm/devm_memremap_pages: enable sub-section remap
  mm: document ZONE_DEVICE memory-model implications
  mm/sparsemem: support sub-section hotplug
  mm/sparsemem: prepare for sub-section ranges
  mm: kill is_dev_zone() helper
  mm/hotplug: kill is_dev_zone() usage in __remove_pages()
  mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
  mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
  mm/sparsemem: add helpers track active portions of a section at boot
  mm/sparsemem: introduce a SECTION_IS_EARLY flag
  mm/sparsemem: introduce struct mem_section_usage
  drivers/base/memory.c: get rid of find_memory_block_hinted()
  mm/memory_hotplug: move and simplify walk_memory_blocks()
  mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
  mm: make register_mem_sect_under_node() static
  ...

tracing: Fix user stack trace "??" output

Commit c5c27a0a5838 ("x86/stacktrace: Remove the pointless ULONG_MAX
marker") removes ULONG_MAX marker from user stack trace entries but
trace_user_stack_print() still uses the marker and it outputs unnecessary
"??".

For example:

            less-1911  [001] d..2    34.758944: <user stack trace>
   =>  <00007f16f2295910>
   => ??
   => ??
   => ??
   => ??
   => ??
   => ??
   => ??

The user stack trace code zeroes the storage before saving the stack, so if
the trace is shorter than the maximum number of entries it can terminate
the print loop if a zero entry is detected.

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery")
Signed-off-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge branch 'linux-5.3' of git://github.com/skeggsb/linux into drm-next

nouveau fixes and TU116 enablement.

Signed-off-by: Dave Airlie <[email protected]>
From: Ben Skeggs <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5hZ3B4S9cVTPd2-Ug7dMSasLPJrWMyoDo4MOg8cbXWkA@mail.gmail.com

Merge tag 'drm-next-5.3-2019-07-18' of git://people.freedesktop.org/~agd5f/linux into drm-next

drm-next-5.3-2019-07-18:

amdgpu:
- Navi DC fix for secondary adapters
- Fix Navi flickering with high res panels
- Navi SMU fixes
- Vega20 SMU fixes
- Fixes for audio hotplug on HG systems
- Fix for potential integer overflows on large buffer
migrations
- debugfs fixes for umr
- Various other small fixes

amdkfd:
- Apply noretry setting consistently
- Fix hang in eviction
- Properly clean up GWS on uninit

UAPI:
- clarify a comment on ctx priority

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/nouveau/secboot/gp102-: remove WAR for SEC2 RTOS start bug

Appears to be fixed by "flcn/gp102-: improve implementation of
bind_context() on SEC2/GSP".

Tested on GP10[24678] and GV100.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/flcn/gp102-: improve implementation of bind_context() on SEC2/GSP

Fixes various issues encountered while attempting to initialise ACR.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau: fix memory leak in nouveau_conn_reset()

In nouveau_conn_reset(), if connector->state is true,
__drm_atomic_helper_connector_destroy_state() will be called,
but the memory pointed by asyc isn't freed. Memory leak happens
in the following function __drm_atomic_helper_connector_reset(),
where newly allocated asyc->state will be assigned to connector->state.

So using nouveau_conn_atomic_destroy_state() instead of
__drm_atomic_helper_connector_destroy_state to free the "old" asyc.

Here the is the log showing memory leak.

unreferenced object 0xffff8c5480483c80 (size 192):
  comm "kworker/0:2", pid 188, jiffies 4294695279 (age 53.179s)
  hex dump (first 32 bytes):
    00 f0 ba 7b 54 8c ff ff 00 00 00 00 00 00 00 00  ...{T...........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000005005c0d0>] kmem_cache_alloc_trace+0x195/0x2c0
    [<00000000a122baed>] nouveau_conn_reset+0x25/0xc0 [nouveau]
    [<000000004fd189a2>] nouveau_connector_create+0x3a7/0x610 [nouveau]
    [<00000000c73343a8>] nv50_display_create+0x343/0x980 [nouveau]
    [<000000002e2b03c3>] nouveau_display_create+0x51f/0x660 [nouveau]
    [<00000000c924699b>] nouveau_drm_device_init+0x182/0x7f0 [nouveau]
    [<00000000cc029436>] nouveau_drm_probe+0x20c/0x2c0 [nouveau]
    [<000000007e961c3e>] local_pci_probe+0x47/0xa0
    [<00000000da14d569>] work_for_cpu_fn+0x1a/0x30
    [<0000000028da4805>] process_one_work+0x27c/0x660
    [<000000001d415b04>] worker_thread+0x22b/0x3f0
    [<0000000003b69f1f>] kthread+0x12f/0x150
    [<00000000c94c29b7>] ret_from_fork+0x3a/0x50

Signed-off-by: Yongxin Liu <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/dmem: missing mutex_lock in error path

In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
calling nouveau_dmem_chunk_alloc() as shown when CONFIG_PROVE_LOCKING
is enabled:

[ 1294.871933] =====================================
[ 1294.876656] WARNING: bad unlock balance detected!
[ 1294.881375] 5.2.0-rc3+ #5 Not tainted
[ 1294.885048] -------------------------------------
[ 1294.889773] test-malloc-vra/6299 is trying to release lock (&drm->dmem->mutex) at:
[ 1294.897482] [<ffffffffa01a220f>] nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.905782] but there are no more locks to release!
[ 1294.910690]
[ 1294.910690] other info that might help us debug this:
[ 1294.917249] 1 lock held by test-malloc-vra/6299:
[ 1294.921881]  #0: 0000000016e10454 (&mm->mmap_sem#2){++++}, at: nouveau_svmm_bind+0x142/0x210 [nouveau]
[ 1294.931313]
[ 1294.931313] stack backtrace:
[ 1294.935702] CPU: 4 PID: 6299 Comm: test-malloc-vra Not tainted 5.2.0-rc3+ #5
[ 1294.942786] Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1401 05/21/2018
[ 1294.949590] Call Trace:
[ 1294.952059]  dump_stack+0x7c/0xc0
[ 1294.955469]  ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.962213]  print_unlock_imbalance_bug.cold.52+0xca/0xcf
[ 1294.967641]  lock_release+0x306/0x380
[ 1294.971383]  ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1294.978089]  ? lock_downgrade+0x2d0/0x2d0
[ 1294.982121]  ? find_held_lock+0xac/0xd0
[ 1294.985979]  __mutex_unlock_slowpath+0x8f/0x3f0
[ 1294.990540]  ? wait_for_completion+0x230/0x230
[ 1294.995002]  ? rwlock_bug.part.2+0x60/0x60
[ 1294.999197]  nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
[ 1295.005751]  ? page_mapping+0x98/0x110
[ 1295.009511]  migrate_vma+0xa74/0x1090
[ 1295.013186]  ? move_to_new_page+0x480/0x480
[ 1295.017400]  ? __kmalloc+0x153/0x300
[ 1295.021052]  ? nouveau_dmem_migrate_vma+0xd8/0x1e0 [nouveau]
[ 1295.026796]  nouveau_dmem_migrate_vma+0x157/0x1e0 [nouveau]
[ 1295.032466]  ? nouveau_dmem_init+0x490/0x490 [nouveau]
[ 1295.037612]  ? vmacache_find+0xc2/0x110
[ 1295.041537]  nouveau_svmm_bind+0x1b4/0x210 [nouveau]
[ 1295.046583]  ? nouveau_svm_fault+0x13e0/0x13e0 [nouveau]
[ 1295.051912]  drm_ioctl_kernel+0x14d/0x1a0
[ 1295.055930]  ? drm_setversion+0x330/0x330
[ 1295.059971]  drm_ioctl+0x308/0x530
[ 1295.063384]  ? drm_version+0x150/0x150
[ 1295.067153]  ? find_held_lock+0xac/0xd0
[ 1295.070996]  ? __pm_runtime_resume+0x3f/0xa0
[ 1295.075285]  ? mark_held_locks+0x29/0xa0
[ 1295.079230]  ? _raw_spin_unlock_irqrestore+0x3c/0x50
[ 1295.084232]  ? lockdep_hardirqs_on+0x17d/0x250
[ 1295.088768]  nouveau_drm_ioctl+0x9a/0x100 [nouveau]
[ 1295.093661]  do_vfs_ioctl+0x137/0x9a0
[ 1295.097341]  ? ioctl_preallocate+0x140/0x140
[ 1295.101623]  ? match_held_lock+0x1b/0x230
[ 1295.105646]  ? match_held_lock+0x1b/0x230
[ 1295.109660]  ? find_held_lock+0xac/0xd0
[ 1295.113512]  ? __do_page_fault+0x324/0x630
[ 1295.117617]  ? lock_downgrade+0x2d0/0x2d0
[ 1295.121648]  ? mark_held_locks+0x79/0xa0
[ 1295.125583]  ? handle_mm_fault+0x352/0x430
[ 1295.129687]  ksys_ioctl+0x60/0x90
[ 1295.133020]  ? mark_held_locks+0x29/0xa0
[ 1295.136964]  __x64_sys_ioctl+0x3d/0x50
[ 1295.140726]  do_syscall_64+0x68/0x250
[ 1295.144400]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1295.149465] RIP: 0033:0x7f1a3495809b
[ 1295.153053] Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
[ 1295.171850] RSP: 002b:00007ffef7ed1358 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1295.179451] RAX: ffffffffffffffda RBX: 00007ffef7ed1628 RCX: 00007f1a3495809b
[ 1295.186601] RDX: 00007ffef7ed13b0 RSI: 0000000040406449 RDI: 0000000000000004
[ 1295.193759] RBP: 00007ffef7ed13b0 R08: 0000000000000000 R09: 000000000157e770
[ 1295.200917] R10: 000000000151c010 R11: 0000000000000246 R12: 0000000040406449
[ 1295.208083] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000

Reacquire the lock before continuing to the next page.

Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/hwmon: return EINVAL if the GPU is powered down for sensors reads

fixes bogus values userspace gets from hwmon while the GPU is powered down

Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Rhys Kidd <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau: fix bogus GPL-2 license header

The bulk SPDX addition made all these files into GPL-2.0 licensed files.
However the remainder of the project is MIT-licensed, these files
were simply missing the boiler plate and got caught up in the global update.

Fixes: 96ac6d4351004 (treewide: Add SPDX license identifier - Kbuild)
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau: fix bogus GPL-2 license header

The bulk SPDX addition made all these files into GPL-2.0 licensed files.
However the remainder of the project is MIT-licensed, these files
(primarily header files) were simply missing the boiler plate and got
caught up in the global update.

Fixes: b24413180f5 (License cleanup: add SPDX GPL-2.0 license identifier to files with no license)
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Emil Velikov <[email protected]>
Acked-by: Karol Herbst <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/i2c: Enable i2c pads & busses during preinit

It turns out that while disabling i2c bus access from software when the
GPU is suspended was a step in the right direction with:

commit 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")

We also ended up accidentally breaking the vbios init scripts on some
older Tesla GPUs, as apparently said scripts can actually use the i2c
bus. Since these scripts are executed before initializing any
subdevices, we end up failing to acquire access to the i2c bus which has
left a number of cards with their fan controllers uninitialized. Luckily
this doesn't break hardware - it just means the fan gets stuck at 100%.

This also means that we've always been using our i2c busses before
initializing them during the init scripts for older GPUs, we just didn't
notice it until we started preventing them from being used until init.
It's pretty impressive this never caused us any issues before!

So, fix this by initializing our i2c pad and busses during subdev
pre-init. We skip initializing aux busses during pre-init, as those are
guaranteed to only ever be used by nouveau for DP aux transactions.

Signed-off-by: Lyude Paul <[email protected]>
Tested-by: Marc Meledandri <[email protected]>
Fixes: 342406e4fbba ("drm/nouveau/i2c: Disable i2c bus access after ->fini()")
Cc: [email protected]
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/disp/tu102-: wire up scdc parameter setter

Regs seem valid here still, and tested on TU116.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/core: recognise TU116 chipset

Modesetting only, still waiting on ACR/GR firmware from NVIDIA for Turing
graphics/compute bring-up.

Each subsystem was compared with traces, along with various tests to check
that things generally work as they should, and appears compatible enough
with the current TU117 code to enable support.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/kms: disallow dual-link harder if hdmi connection detected

The fallthrough cases (pre-Fermi) would accidentally allow dual-link pixel
clocks even where they shouldn't be. This leads to a high resolution HDMI
displays, connected via a DVI->HDMI adapter, to fail on the original NV50.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/disp/nv50-: fix center/aspect-corrected scaling

Previously center scaling would get scaling applied to it (when it was
only supposed to center the image), and aspect-corrected scaling did not
always correctly pick whether to reduce width or height for a particular
combination of inputs/outputs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110660
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/disp/nv50-: force scaler for any non-default LVDS/eDP modes

Higher layers tend to add a lot of modes not actually in the EDID, such
as the standard DMT modes. Changing this would be extremely intrusive to
everyone, so just force the scaler more often. There are no practical
cases we're aware of where a LVDS/eDP panel has multiple resolutions
exposed, and i915 already does it this way.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110660
Signed-off-by: Ilia Mirkin <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau/mcp89/mmu: Use mcp77_mmu_new instead of g84_mmu_new on MCP89.

Fix a crash or broken depth testing in all OpenGL applications that use the
depth buffer on MCP89 (GeForce 320M) seen on a MacBook Pro Late 2010.

The bug is tracked in https://bugs.freedesktop.org/show_bug.cgi?id=108500

Signed-off-by: Timo Wiren <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

csky: Fixup abiv1 memset error

Current memset implementation in abiv1 is wrong and it'll cause unalign
access. Just remove it and use the generic one. This patch will cause
performance degradation and we will improve it with a new design in next
patchset.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>

csky: Improve tlb operation with help of asid

There are two generations of tlb operation instruction for C-SKY.
First generation is use mcr register and it need software do more
things, second generation is use specific instructions, eg:
tlbi.va, tlbi.vas, tlbi.alls

We implemented the following functions:

- flush_tlb_range (a range of entries)
- flush_tlb_page (one entry)

Above functions use asid from vma->mm to invalid tlb entries and
we could use tlbi.vas instruction for newest generation csky cpu.

- flush_tlb_kernel_range
- flush_tlb_one

Above functions don't care asid and it invalid the tlb entries only
with vpn and we could use tlbi.vaas instruction for newest generat-
ion csky cpu.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>

csky: Use generic asid algorithm to implement switch_mm

Use linux generic asid/vmid algorithm to implement csky
switch_mm function. The algorithm is from arm and it could
work with SMP system. It'll help reduce tlb flush for
switch_mm in task/vm switch.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>

csky: Add new asid lib code from arm

This patch only contains asid help code from arm for next patch to
use.

The asid allocator use five level check to reduce the cost of
switch_mm.

1. Check if the asid version is the same (it's general)
2. Check reserved_asid which is set in rollover flush_context()
    and key point is to keep the same bit position with the current
    asid version instead of input version.
3. Check if the position of bitmap is free then it could be set &
    used directly.
4. find_next_zero_bit() (a little performance cost)
5. flush_context  (this is the worst cost with increase current asid
    version)

Check is level by level and cost is also higher with the next level.
The reserved_asid and bitmap mechanism prevent unnecessary
find_next_zero_bit().

The atomic 64 bit asid is also suitable for 32-bit system and it
won't cost a lot in 1th 2th 3th level check.

The operation of set/clear mm_cpumask was removed in arm64 compared to
arm32. It seems no side effect on current arm64 system, but from
software meaning it's wrong. Although csky also needn't it, we add it
back for csky.

The asid_per_ctxt is no use for csky and it reserves the lowest bits for
other use, maybe: trust zone ? Ok, just keep it in csky copy.

Seems it also could be used by other archs and it's worth to move asid
code to generic in future.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Julien Grall <[email protected]>

csky: Revert mmu ASID mechanism

Current C-SKY ASID mechanism is from mips and it doesn't work well
with multi-cores. ASID per core mechanism is not suitable for C-SKY
SMP tlb maintain operations, eg: tlbi.vas need share the same asid
in all processors and it'll invalid the tlb entry in all cores with
the same asid.

This patch is prepare for new ASID mechanism.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>

dt-bindings: csky: Add csky PMU bindings

This patch adds the documentation to describe that how to add pmu node in
dts.

Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Cc: Rob Herring <[email protected]>

dt-bindings: interrupt-controller: Update csky mpintc

Add trigger type setting for csky,mpintc. The driver also could
support #interrupt-cells <1> and it wouldn't invalidate existing
DTs. Here we only show the complete format.

Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Cc: Marc Zyngier <[email protected]>

csky: Fixup some error count in 810 & 860.

CK810 pmu only support event with index 0-8 and 0xd; CK860 only
support event 1~4, 0xa~0x1b. So do not register unsupport event
to hardware cache event, which may leader to unknown behavior.

Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>

csky: Fix perf record in kernel/user space

csky_pmu_event_init is called several times during the perf record
initialzation. After configure the event counter in either kernel
space or user space, csky_pmu_event_init is called twice with no
attr specified. Configuration will be overwritten with sampling in
both kernel space and user space. --all-kernel/--all-user is
useless without this patch applied.

Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>

csky: Add pmu interrupt support

This patch add interrupt request and handler for csky pmu.
perf can record on hardware event with this patch applied.

Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>

csky: Add count-width property for csky pmu

The csky pmu counter may have different io width. When the counter is
smaller then 64 bits and counter value is smaller than the old value, it
will result to a extremely large delta value. So the sampled value should
be extend to 64 bits to avoid this, the extension bits base on the
count-width property from dts.

Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>

csky: Init pmu as a device

This patch change the csky pmu initialization from arch init to
device init. The pmu can be configued with information from
device tree(pmu device name, irq number and etc.).

Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>

csky: Fixup no panic in kernel for some traps

These traps couldn't be hanppen in kernel and we must panic there not
send a signal to userspace.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>

csky: Select intc & timer drivers

Let arch help to select interrupt controller's and timer's drivers
instead of people using menuconfig to select. This help the mini system
boot up.

Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>

tcp: fix tcp_set_congestion_control() use from bpf hook

Neal reported incorrect use of ns_capable() from bpf hook.

bpf_setsockopt(...TCP_CONGESTION...)
  -> tcp_set_congestion_control()
   -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
    -> ns_capable_common()
     -> current_cred()
      -> rcu_dereference_protected(current->cred, 1)

Accessing 'current' in bpf context makes no sense, since packets
are processed from softirq context.

As Neal stated : The capability check in tcp_set_congestion_control()
was written assuming a system call context, and then was reused from
a BPF call site.

The fix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lawrence Brakmo <[email protected]>
Reported-by: Neal Cardwell <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Lawrence Brakmo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ag71xx: fix return value check in ag71xx_probe()

In case of error, the function of_get_mac_address() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ag71xx: fix error return code in ag71xx_probe()

Fix to return error code -ENOMEM from the dmam_alloc_coherent() error
handling case instead of 0, as done elsewhere in this function.

Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

proc/sysctl: add shared variables for range check

In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range.  This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                                         old     new   delta
    sysctl_vals                                    -      12     +12
    __kstrtab_sysctl_vals                          -      12     +12
    max                                           14      10      -4
    int_max                                       16       -     -16
    one                                           68       -     -68
    zero                                         128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

[[email protected]: tipc: remove two unused variables]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix net/ipv6/sysctl_net_ipv6.c]
[[email protected]: proc/sysctl: make firmware loader table conditional]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matteo Croce <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: migrate: remove unused mode argument

migrate_page_move_mapping() doesn't use the mode argument. Remove it
and update callers accordingly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: cleanup 'section number' data types

David points out that there is a mixture of 'int' and 'unsigned long'
usage for section number data types. Update the memory hotplug path to
use 'unsigned long' consistently for section numbers.

[[email protected]: fix printk format]
Link: http://lkml.kernel.org/r/156107543656.1329419.11505835211949439815.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: David Hildenbrand <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

libnvdimm/pfn: stop padding pmem namespaces to section alignment

Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time. The kernel will still honor padding established by older kernels.

Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Jeff Moyer <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields

At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data.  While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location.  For fields like, 'flags' and the
'padding' it potentially means that future implementations can not rely on
those fields being zero.

In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero.  Bump the minor version to indicate
it is safe to assume the 'padding' and 'flags' are zero.  Otherwise,
this corruption is expected to benign since all other critical fields
are explicitly initialized.

Note The cc: stable is about spreading this new policy to as many
kernels as possible not fixing an issue in those kernels.  It is not
until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
section alignment" where this improper initialization becomes a problem.
So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
namespaces to section alignment" (which is not tagged for stable), make
sure this pre-requisite is flagged.

Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/devm_memremap_pages: enable sub-section remap

Teach devm_memremap_pages() about the new sub-section capabilities of
arch_{add,remove}_memory(). Effectively, just replace all usage of
align_start, align_end, and align_size with res->start, res->end, and
resource_size(res). The existing sanity check will still make sure that
the two separate remap attempts do not collide within a sub-section (2MB
on x86).

Link: http://lkml.kernel.org/r/156092355542.979959.10060071713397030576.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: document ZONE_DEVICE memory-model implications

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

[[email protected]: update ZONE_DEVICE memory model documentation]
Link: http://lkml.kernel.org/r/156109575458.1409767.1885676287099277666.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Mike Rapoport <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Jonathan Corbet <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: support sub-section hotplug

The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity.

For example the following backtrace is emitted when attempting
arch_add_memory() with physical address ranges that intersect 'System
RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

    # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
    100000000-1ffffffff : System RAM
    200000000-303ffffff : Persistent Memory (legacy)
    304000000-43fffffff : System RAM
    440000000-23ffffffff : Persistent Memory
    2400000000-43bfffffff : Persistent Memory
      2400000000-43bfffffff : namespace2.0

    WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
    [..]
    RIP: 0010:add_pages+0x5c/0x60
    [..]
    Call Trace:
     devm_memremap_pages+0x460/0x6e0
     pmem_attach_disk+0x29e/0x680 [nd_pmem]
     ? nd_dax_probe+0xfc/0x120 [libnvdimm]
     nvdimm_bus_probe+0x66/0x160 [libnvdimm]

It was discovered that the problem goes beyond RAM vs PMEM collisions as
some platform produce PMEM vs PMEM collisions within a given section.
The libnvdimm workaround for that case revealed that the libnvdimm
section-alignment-padding implementation has been broken for a long
while.

A fix for that long-standing breakage introduces as many problems as it
solves as it would require a backward-incompatible change to the
namespace metadata interpretation.  Instead of that dubious route [1],
address the root problem in the memory-hotplug implementation.

Note that EEXIST is no longer treated as success as that is how
sparse_add_section() reports subsection collisions, it was also obviated
by recent changes to perform the request_region() for 'System RAM'
before arch_add_memory() in the add_memory() sequence.

[1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237 [email protected]

[[email protected]: fix deactivate_section for early sections]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: prepare for sub-section ranges

Prepare the memory hot-{add,remove} paths for handling sub-section
ranges by plumbing the starting page frame and number of pages being
handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section().

This is simply plumbing, small cleanups, and some identifier renames.
No intended functional changes.

Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: kill is_dev_zone() helper

Given there are no more usages of is_dev_zone() outside of 'ifdef
CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.

Link: http://lkml.kernel.org/r/156092353211.979959.1489004866360828964.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hotplug: kill is_dev_zone() usage in __remove_pages()

The zone type check was a leftover from the cleanup that plumbed altmap
through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the
vmem_altmap to arch_remove_memory and __remove_pages".

Link: http://lkml.kernel.org/r/156092352642.979959.6664333788149363039.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()

Allow sub-section sized ranges to be added to the memmap.

populate_section_memmap() takes an explict pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmmemap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.

Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal

Sub-section hotplug support reduces the unit of operation of hotplug
from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
(PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
valid_section(), can toggle.

[[email protected]: fix shrink_{zone,node}_span]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/156092351496.979959.12703722803097017492.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: add helpers track active portions of a section at boot

Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
sub-section active bitmask, each bit representing a PMD_SIZE span of the
architecture's memory hotplug section size.

The implications of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and either determine that the
section is an "early section", or read the sub-section active ranges
from the bitmask. The expectation is that the bitmask (subsection_map)
fits in the same cacheline as the valid_section() / early_section()
data, so the incremental performance overhead to pfn_valid() should be
negligible.

The rationale for using early_section() to short-ciruit the
subsection_map check is that there are legacy code paths that use
pfn_valid() at section granularity before validating the pfn against
pgdat data. So, the early_section() check allows those traditional
assumptions to persist while also permitting subsection_map to tell the
truth for purposes of populating the unused portions of early sections
with PMEM and other ZONE_DEVICE mappings.

Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Qian Cai <[email protected]>
Tested-by: Jane Chu <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: introduce a SECTION_IS_EARLY flag

In preparation for sub-section hotplug, track whether a given section
was created during early memory initialization, or later via memory
hotplug. This distinction is needed to maintain the coarse expectation
that pfn_valid() returns true for any pfn within a given section even if
that section has pages that are reserved from the page allocator.

For example one of the of goals of subsection hotplug is to support
cases where the system physical memory layout collides System RAM and
PMEM within a section. Several pfn_valid() users expect to just check
if a section is valid, but they are not careful to check if the given
pfn is within a "System RAM" boundary and instead expect pgdat
information to further validate the pfn.

Rather than unwind those paths to make their pfn_valid() queries more
precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the
traditional expectation that pfn_valid() returns true for all early
sections.

Link: https://lore.kernel.org/lkml/[email protected]/
Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Qian Cai <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparsemem: introduce struct mem_section_usage

Patch series "mm: Sub-section memory hotplug support", v10.

The memory hotplug section is an arbitrary / convenient unit for memory
hotplug.  'Section-size' units have bled into the user interface
('memblock' sysfs) and can not be changed without breaking existing
userspace.  The section-size constraint, while mostly benign for typical
memory hotplug, has and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular.  Recall that pmem
uses devm_memremap_pages(), and subsequently arch_add_memory(), to
allocate a 'struct page' memmap for pmem.  However, it does not use the
'bottom half' of memory hotplug, i.e.  never marks pmem pages online and
never exposes the userspace memblock interface for pmem.  This leaves an
opening to redress the section-size constraint.

To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory().  Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes this physical memory alignment of pmem from one boot to the
next.  Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.

It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug and with a bit more
infrastructure sub-section arch_add_memory() support can be added for
kernel internal usages like devm_memremap_pages().  Here is an analysis
of the current design assumptions in the current code and how they are
addressed in the new implementation:

Current design assumptions:

- Sections that describe boot memory (early sections) are never
   unplugged / removed.

- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
   valid_section() check

- __add_pages() and helper routines assume all operations occur in
   PAGES_PER_SECTION units.

- The memblock sysfs interface only comprehends full sections

New design assumptions:

- Sections are instrumented with a sub-section bitmask to track (on
   x86) individual 2MB sub-divisions of a 128MB section.

- Partially populated early sections can be extended with additional
   sub-sections, and those sub-sections can be removed with
   arch_remove_memory(). With this in place we no longer lose usable
   memory capacity to padding.

- pfn_valid() is updated to look deeper than valid_section() to also
   check the active-sub-section mask. This indication is in the same
   cacheline as the valid_section() so the performance impact is
   expected to be negligible. So far the lkp robot has not reported any
   regressions.

- Outside of the core vmemmap population routines which are replaced,
   other helper routines like shrink_{zone,pgdat}_span() are updated to
   handle the smaller granularity. Core memory hotplug routines that
   deal with online memory are not touched.

- The existing memblock sysfs user api guarantees / assumptions are not
   touched since this capability is limited to !online
   !memblock-sysfs-accessible sections.

Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them.  The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt.  Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem ranges
with other pmem ranges by default [3].  In short, devm_memremap_pages()
has pushed the venerable section-size constraint past the breaking point,
and the simplicity of section-aligned arch_add_memory() is no longer
tenable.

These patches are exposed to the kbuild robot on a subsection-v10 branch
[4], and a preview of the unit test for this functionality is available
on the 'subsection-pending' branch of ndctl [5].

[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237 [email protected]
[3]: https://github.com/pmem/ndctl/issues/76
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10
[5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c

This patch (of 13):

Towards enabling memory hotplug to track partial population of a section,
introduce 'struct mem_section_usage'.

A pointer to a 'struct mem_section_usage' instance replaces the existing
pointer to a 'pageblock_flags' bitmap.  Effectively it adds one more
'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house
a new 'subsection_map' bitmap.  The new bitmap enables the memory
hot{plug,remove} implementation to act on incremental sub-divisions of a
section.

SUBSECTION_SHIFT is defined as global constant instead of per-architecture
value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of
subsection users.  Specifically a common subsection size allows for the
possibility that persistent memory namespace configurations be made
compatible across architectures.

The primary motivation for this functionality is to support platforms that
mix "System RAM" and "Persistent Memory" within a single section, or
multiple PMEM ranges with different mapping lifetimes within a single
section.  The section restriction for hotplug has caused an ongoing saga
of hacks and bugs for devm_memremap_pages() users.

Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to usemap allocation path, there are no
expected behavior changes.

Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/base/memory.c: get rid of find_memory_block_hinted()

No longer needed, let's remove it. Also, drop the "hint" parameter
completely from "find_memory_block_by_id", as nobody needs it anymore.

[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: handle zero-length walks]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Tested-by: Qian Cai <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: move and simplify walk_memory_blocks()

Let's move walk_memory_blocks() to the place where memory block logic
resides and simplify it. While at it, add a type for the callback
function.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns

walk_memory_range() was once used to iterate over sections.  Now, it
iterates over memory blocks.  Rename the function, fixup the
documentation.

Also, pass start+size instead of PFNs, which is what most callers
already have at hand.  (we'll rework link_mem_sections() most probably
soon)

Follow-up patches will rework, simplify, and move walk_memory_blocks()
to drivers/base/memory.c.

Note: walk_memory_blocks() only works correctly right now if the
start_pfn is aligned to a section start.  This is the case right now,
but we'll generalize the function in a follow up patch so the semantics
match the documentation.

[[email protected]: remove unused variable]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: make register_mem_sect_under_node() static

It is only used internally.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/base/memory: use "unsigned long" for block ids

Block ids are just shifted section numbers, so let's also use "unsigned
long" for them, too.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: section numbers use the type "unsigned long"

Patch series "mm: Further memory block device cleanups", v1.

Some further cleanups around memory block devices.  Especially, clean up
and simplify walk_memory_range().  Including some other minor cleanups.

This patch (of 6):

We are using a mixture of "int" and "unsigned long".  Let's make this
consistent by using "unsigned long" everywhere.  We'll do the same with
memory block ids next.

While at it, turn the "unsigned long i" in removable_show() into an int
- sections_per_block is an int.

[[email protected]: s/unsigned long i/unsigned long nr/]
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

resource: avoid unnecessary lookups in find_next_iomem_res()

find_next_iomem_res() shows up to be a source for overhead in dax
benchmarks.

Improve performance by not considering children of the tree if the top
level does not match.  Since the range of the parents should include the
range of the children such check is redundant.

Running sysbench on dax (pmem emulation, with write_cache disabled):

  sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
   --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run

Provides the following results:

events (avg/stddev)
-------------------
  5.2-rc3: 1247669.0000/16075.39
  w/patch: 1286320.5000/16402.72 (+3%)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nadav Amit <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

resource: fix locking in find_next_iomem_res()

Since resources can be removed, locking should ensure that the resource
is not removed while accessing it.  However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.

Keep holding the lock while the data is copied.  While at it, change the
return value to a more informative value.  It is disregarded by the
callers.

[[email protected]: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: thp: fix false negative of shmem vma's THP eligibility

Commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma") introduced THPeligible bit for processes' smaps.  But, when
checking the eligibility for shmem vma, __transparent_hugepage_enabled()
is called to override the result from shmem_huge_enabled().  It may
result in the anonymous vma's THP flag override shmem's.  For example,
running a simple test which create THP for shmem, but with anonymous THP
disabled, when reading the process's smaps, it may show:

  7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
  Size:               4096 kB
  ...
  [snip]
  ...
  ShmemPmdMapped:     4096 kB
  ...
  [snip]
  ...
  THPeligible:    0

And, /proc/meminfo does show THP allocated and PMD mapped too:

  ShmemHugePages:     4096 kB
  ShmemPmdMapped:     4096 kB

This doesn't make too much sense.  The shmem objects should be treated
separately from anonymous THP.  Calling shmem_huge_enabled() with
checking MMF_DISABLE_THP sounds good enough.  And, we could skip stack
and dax vma check since we already checked if the vma is shmem already.

Also check if vma is suitable for THP by calling
transhuge_vma_suitable().

And minor fix to smaps output format and documentation.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
Signed-off-by: Yang Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: thp: make transhuge_vma_suitable available for anonymous THP

transhuge_vma_suitable() was only available for shmem THP, but anonymous
THP has the same check except pgoff check. And, it will be used for THP
eligible check in the later patch, so make it available for all kind of
THPs. This also helps reduce code duplication slightly.

Since anonymous THP doesn't have to check pgoff, so make pgoff check
shmem vma only.

And regroup some functions in include/linux/mm.h to solve compile issue
since transhuge_vma_suitable() needs call vma_is_anonymous() which was
defined after huge_mm.h is included.

[[email protected]: fix typo]
[[email protected]: v4]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/sparse.c: set section nid for hot-add memory

In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
section_to_node_table[]. While for hot-add memory, this is missed.
Without this information, page_to_nid() may not give the right node id.

BTW, current online_pages works because it leverages nid in
memory_block. But the granularity of node id should be mem_section
wide.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section

The parameter is unused, so let's drop it. Memory removal paths should
never care about zones. This is the job of memory offlining and will
require more refactorings.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail

We really don't want anything during memory hotunplug to fail.  We
always pass a valid memory block device, that check can go.  Avoid
allocating memory and eventually failing.  As we are always called under
lock, we can use a static piece of memory.  This avoids having to put
the structure onto the stack, having to guess about the stack size of
callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove.  Memory block devices
with mixed nodes (added during boot) should properly fenced off and
never removed.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: remove memory block devices before arch_remove_memory()

Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: drop MHP_MEMBLOCK_API

No longer needed, the callers of arch_add_memory() can handle this
manually.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory_hotplug: create memory block devices after arch_add_memory()

Only memory to be added to the buddy and to be onlined/offlined by user
space using /sys/devices/system/memory/...  needs (and should have!)
memory block devices.

Factor out creation of memory block devices.  Create all devices after
arch_add_memory() succeeded.  We can later drop the want_memblock
parameter, because it is now effectively stale.

Only after memory block devices have been added, memory can be onlined
by user space.  This implies, that memory is not visible to user space
at all before arch_add_memory() succeeded.

While at it
- use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
- introduce find_memory_block_by_id() to search via block id
- Use find_memory_block_by_id() in init_memory_block() to catch
   duplicates

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>