Git Repo - linux.git/log

powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer

The current Branch History Rolling Buffer (BHRB) code does not check
for any privilege levels before updating the data from BHRB. This
could leak kernel addresses to userspace even when profiling only with
userspace privileges. Add proper checks to prevent it.

Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/perf: Fix kernel address leak via sampling registers

Current code in power_pmu_disable() does not clear the sampling
registers like Sampling Instruction Address Register (SIAR) and
Sampling Data Address Register (SDAR) after disabling the PMU. Since
these are userspace readable and could contain kernel addresses, add
code to explicitly clear the content of these registers.

Also add a "context synchronizing instruction" to enforce no further
updates to these registers as suggested by Power ISA v3.0B. From
section 9.4, on page 1108:

  "If an mtspr instruction is executed that changes the value of a
  Performance Monitor register other than SIAR, SDAR, and SIER, the
  change is not guaranteed to have taken effect until after a
  subsequent context synchronizing instruction has been executed (see
  Chapter 11. "Synchronization Requirements for Context Alterations"
  on page 1133)."

Signed-off-by: Madhavan Srinivasan <[email protected]>
[mpe: Massage change log and add ISA reference]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9

On POWER9, since commit cc3d2940133d ("powerpc/64: Enable use of radix
MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and
HPT bits in the client-architecture-support (CAS) vector, which tells
the hypervisor that we can do either radix or HPT.  According to PAPR,
if we use this combination we are promising to do a H_REGISTER_PROC_TBL
hcall later on to let the hypervisor know whether we are doing radix
or HPT.  We currently do this call if we are doing radix but not if
we are doing HPT.  If the hypervisor is able to support both radix
and HPT guests, it would be entitled to defer allocation of the HPT
until the H_REGISTER_PROC_TBL call, and to fail any attempts to create
HPTEs until the H_REGISTER_PROC_TBL call.  Thus we need to do a
H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may
crash at boot time.

This adds the code to call H_REGISTER_PROC_TBL in this case, before
we attempt to create any HPT entries using H_ENTER.

Fixes: cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9")
Cc: [email protected] # v4.11+
Signed-off-by: Paul Mackerras <[email protected]>
Reviewed-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs

The SLB bad address handler's trap number fixup does not preserve the
low bit that indicates nonvolatile GPRs have not been saved. This
leads save_nvgprs to skip saving them, and subsequent functions and
return from interrupt will think they are saved.

This causes kernel branch-to-garbage debugging to not have correct
registers, can also cause userspace to have its registers clobbered
after a segfault.

Fixes: f0f558b131db ("powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address")
Cc: [email protected] # v4.9+
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

Merge branch 'topic/ppc-kvm' into next

This brings in two series from Paul, one of which touches KVM code and
may need to be merged into the kvm-ppc tree to resolve conflicts.

KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where the contents of the TEXASR can get corrupted while a thread is
in fake suspend state.  The workaround is for the instruction emulation
code to use the value saved at the most recent guest exit in real
suspend mode.  We achieve this by simply not saving the TEXASR into
the vcpu struct on an exit in fake suspend state.  We also have to
take care to set the orig_texasr field only on guest exit in real
suspend state.

This also means that on guest entry in fake suspend state, TEXASR
will be restored to the value it had on the last exit in real suspend
state, effectively counteracting any hardware-caused corruption.  This
works because TEXASR may not be written in suspend state.

With this, the guest might see the wrong values in TEXASR if it reads
it while in suspend state, but will see the correct value in
non-transactional state (e.g. after a treclaim), and treclaim will
work correctly.

With this workaround, the code will actually run slightly faster, and
will operate correctly on systems without the TEXASR bug (since TEXASR
may not be written in suspend state, and is only changed by failure
recording, which will have already been done before we get into fake
suspend state).  Therefore these changes are not made subject to a CPU
feature bit.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode

This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
where a treclaim performed in fake suspend mode can cause subsequent
reads from the XER register to return inconsistent values for the SO
(summary overflow) bit.  The inconsistent SO bit state can potentially
be observed on any thread in the core.  We have to do the treclaim
because that is the only way to get the thread out of suspend state
(fake or real) and into non-transactional state.

The workaround for the bug is to force the core into SMT4 mode before
doing the treclaim.  This patch adds the code to do that, conditional
on the CPU_FTR_P9_TM_XER_SO_BUG feature bit.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode).  Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads.  The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems.  This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.

The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional.  The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated.  The trechkpt
instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present.  The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state.  Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is
handled in two paths.  If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state.  This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active.  If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.

The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0.  The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv: Provide a way to force a core into SMT4 mode

POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily.  This workaround is only needed when
running bare-metal.

This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state.  Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.

To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state.  If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0.  The pnv_power9_force_smt4_catch() function does the following:

1. Set the dont_stop flag for each thread in the core, except
   ourselves (in fact we use an atomic_inc() in case more than
   one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
   requested_psscr field in the paca being 0.  If this is at
   least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
   being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
   we sent a doorbell interrupt and check if they are awake now.

This relies on the following properties:

- Once dont_stop is non-zero, requested_psccr can't go from zero to
  non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
  a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
  and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
  must be in SMT4 mode, since SMT modes are powers of 2.

This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop.  The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.

Because some objected to incurring this extra latency on systems where
the XER[SO] bug is not relevant, I have put the test in
power9_idle_stop inside a feature section.  This means that
pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
probably hang the system.

In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2

This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2
processors which will be used to enable the hypervisor to assist
hardware with the handling of checkpointed register values while the
CPU is in suspend state, in order to work around hardware bugs.  The
hardware assistance for these workarounds introduced a new hardware
bug relating to the XER[SO] bit.  We add a separate feature bit for
this bug in case future chips fix it while still requiring the
hypervisor assistance with suspend state.

When the dt_cpu_ftrs subsystem is in use, the software assistance can
be enabled using a "tm-suspend-hypervisor-assist" node in the device
tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for
the XER[SO] bug.  In the absence of such nodes, a quirk enables both
for POWER9 "Nimbus" DD2.2 processors.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Free up CPU feature bits on 64-bit machines

This moves all the CPU feature bits that are only used on 32-bit
machines to the top 20 bits of the CPU feature word and arranges
for them to be defined only in 32-bit builds.  The features that
are common to 32-bit and 64-bit machines are moved to bits 0-11
of the CPU feature word.  This means that for 64-bit platforms,
bits 44-63 can now be used for new features that only exist on
64-bit machines.  (These bit numbers are counting from the right,
i.e. the LSB is bit 0.)

Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high
16 bits, we have to adjust some assembly code.  Also, CPU_FTR_EMB_HV
moved from the high 16 bits to the low 16 bits.

Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only
64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is
a CPU execution mode as opposed to being a page attribute.

With this we now have 20 free CPU feature bits on 64-bit machines.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Book E: Remove unused CPU_FTR_L2CSR bit

The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the
bit.

The last usage was removed in 86d63363defc ("powerpc/e500mc: Remove
dead L2 flushing code in idle_e500.S") (Jun 2015).

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Use feature bit for RTC presence rather than timebase presence

All PowerPC CPUs other than the original PPC601 have a timebase
register rather than the "real-time clock" (RTC) register that the
PPC601 (and the original POWER and POWER2 CPUs) had. Currently
we have a CPU feature bit to indicate the presence of the timebase,
but it makes more sense to use a bit to indicate the unusual
situation rather than the common situation. This therefore defines
a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and
arranges for it to be set on PPC601 systems.

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

On POWER9, under some circumstances, a broadcast TLB invalidation
might complete before all previous stores have drained, potentially
allowing stale stores from becoming visible after the invalidation.
This works around it by doubling up those TLB invalidations which was
verified by HW to be sufficient to close the risk window.

This will be documented in a yet-to-be-published errata.

Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <[email protected]>
[mpe: Enable the feature in the DT CPU features code for all Power9,
rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/radix: Move the functions that does the actual tlbie closer

No functionality change. Just code movement to ease code changes later

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/radix: Remove unused code

These function are not used in the code. Remove them.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm: Workaround Nest MMU bug with TLB invalidations

On POWER9 the Nest MMU may fail to invalidate some translations when
doing a tlbie "by PID" or "by LPID" that is targeted at the TLB only
and not the page walk cache.

This works around it by forcing such invalidations to escalate to
RIC=2 (full invalidation of TLB *and* PWC) when a coprocessor is in
use for the context.

Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts")
Cc: [email protected] # v4.15+
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Balbir Singh <[email protected]>
[balbirs: fixed spelling and coding style to quiesce checkpatch.pl]
Tested-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm: Add tracking of the number of coprocessors using a context

Currently, when using coprocessors (which use the Nest MMU), we
simply increment the active_cpu count to force all TLB invalidations
to be come broadcast.

Unfortunately, due to an errata in POWER9, we will need to know
more specifically that coprocessors are in use.

This maintains a separate copros counter in the MMU context for
that purpose.

NB. The commit mentioned in the fixes tag below is not at fault for
the bug we're fixing in this commit and the next, but this fix applies
on top the infrastructure it introduced.

Fixes: 03b8abedf4f4 ("cxl: Enable global TLBIs for cxl contexts")
Cc: [email protected] # v4.15+
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Tested-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened

force_external_irq_replay() can be called in the do_IRQ path with
interrupts hard enabled and soft disabled if may_hard_irq_enable() set
MSR[EE]=1. It updates local_paca->irq_happened with a load, modify,
store sequence. If a maskable interrupt hits during this sequence, it
will go to the masked handler to be marked pending in irq_happened.
This update will be lost when the interrupt returns and the store
instruction executes. This can result in unpredictable latencies,
timeouts, lockups, etc.

Fix this by ensuring hard interrupts are disabled before modifying
irq_happened.

This could cause any maskable asynchronous interrupt to get lost, but
it was noticed on P9 SMP system doing RDMA NVMe target over 100GbE,
so very high external interrupt rate and high IPI rate. The hang was
bisected down to enabling doorbell interrupts for IPIs. These provided
an interrupt type that could run at high rates in the do_IRQ path,
stressing the race.

Fixes: 1d607bb3bd60 ("powerpc/irq: Add mechanism to force a replay of interrupts")
Cc: [email protected] # v4.8+
Reported-by: Carol L. Soto <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'

'linux,stdout-path' has been deprecated for some time in favor of
'stdout-path'. Now dtc will warn on occurrences of 'linux,stdout-path'.
Search and replace all the of occurrences with 'stdout-path'.

Signed-off-by: Rob Herring <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: [email protected]
Signed-off-by: Michael Ellerman <[email protected]>

selftests/powerpc: Add process creation benchmark

Signed-off-by: Nicholas Piggin <[email protected]>
[mpe: Add SPDX, and fixup formatting]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Use sizeof(*foo) rather than sizeof(struct foo)

It's slightly less error prone to use sizeof(*foo) rather than
specifying the type.

Signed-off-by: Markus Elfring <[email protected]>
[mpe: Consolidate into one patch, rewrite change log]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Remove unused flush_dcache_phys_range()

The flush_dcache_phys_range() function is no longer used in the
kernel. The last usage was removed in c40785ad305b ("powerpc/dart: Use
a cachable DART").

This patch removes the function and declaration.

Signed-off-by: Matt Brown <[email protected]>
[mpe: Munge change log, include commit that removed last user]
Signed-off-by: Michael Ellerman <[email protected]>

lib/raid6: Build proper raid6test files on powerpc

Previously the raid6 test Makefile did not build the POWER specific files
(altivec and vpermxor).
This patch fixes the bug, so that all appropriate files for powerpc are built.

This patch also fixes the missing and mismatched ifdef statements to allow the
altivec.uc file to be built correctly.

Signed-off-by: Matt Brown <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

This patch uses the vpermxor instruction to optimise the raid6 Q
syndrome. This instruction was made available with POWER8, ISA version
2.07. It allows for both vperm and vxor instructions to be done in a
single instruction. This has been tested for correctness on a ppc64le
vm with a basic RAID6 setup containing 5 drives.

The performance benchmarks are from the raid6test in the
/lib/raid6/test directory. These results are from an IBM Firestone
machine with ppc64le architecture. The benchmark results show a 35%
speed increase over the best existing algorithm for powerpc (altivec).
The raid6test has also been run on a big-endian ppc64 vm to ensure it
also works for big-endian architectures.

Performance benchmarks:
  raid6: altivecx4 gen() 18773 MB/s
  raid6: altivecx8 gen() 19438 MB/s

  raid6: vpermxor4 gen() 25112 MB/s
  raid6: vpermxor8 gen() 26279 MB/s

Signed-off-by: Matt Brown <[email protected]>
Reviewed-by: Daniel Axtens <[email protected]>
[mpe: Add VPERMXOR macro so we can build with old binutils]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/5200: dts: digsy_mtc.dts: fix rv3029 compatible

The proper compatible for rv3029 is microcrystal,rv3029.

Acked-by: Anatolij Gustschin <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/time: stop validating rtc_time in .read_time

The RTC core is always calling rtc_valid_tm after the read_time callback.
It is not necessary to call it just before returning from the callback.

Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

When running virtualised the powerpc kernel is able to run the system
in "compat mode" - which means the kernel and hardware are pretending
to userspace that the CPU is an older version than it actually is.

AT_BASE_PLATFORM is an AUXV entry that we export to userspace for use
when we're running in that mode, which tells userspace the "platform"
string for the real CPU version, as opposed to the faked version.

Although we don't support compat mode when using DT CPU features, and
arguably don't need to set AT_BASE_PLATFORM, the existing cputable
based code always sets it even when we're running bare metal. That
means the lack of AT_BASE_PLATFORM is a user-visible artifact of the
fact that the kernel is using DT CPU features, which we don't want.

So set it in the DT CPU features code also.

This results in eg:
  $ LD_SHOW_AUXV=1 /bin/true | grep "AT_.*PLATFORM"
  AT_PLATFORM:     power9
  AT_BASE_PLATFORM:power9

Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>

powerpc/vas: Add a couple of trace points

Add a couple of trace points in the VAS driver

Signed-off-by: Sukadev Bhattiprolu <[email protected]>
[mpe: Add SPDX tag to new header]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/vas: Fix cleanup when VAS is not configured

When VAS is not configured, unregister the platform driver. Also simplify
cleanup by delaying vas debugfs init until we know VAS is configured.

Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/npu-dma.c: Fix crash after __mmu_notifier_register failure

pnv_npu2_init_context wasn't checking the return code from
__mmu_notifier_register. If __mmu_notifier_register failed, the
npu_context was still assigned to the mm and the caller wasn't given any
indication that things went wrong. Later on pnv_npu2_destroy_context would
be called, which in turn called mmu_notifier_unregister and dropped
mm->mm_count without having incremented it in the first place. This led to
various forms of corruption like mm use-after-free and mm double-free.

__mmu_notifier_register can fail with EINTR if a signal is pending, so
this case can be frequent.

This patch calls opal_npu_destroy_context on the failure paths, and makes
sure not to assign mm->context.npu_context until past the failure points.

Signed-off-by: Mark Hairgrove <[email protected]>
Acked-By: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: Fix timebase synchronization status on P9

The PSL Timebase register is updated by the PSL to maintain the
timebase.

On P9, the Timebase value is only provided by the CAPP as received the
last time a timebase request was performed.

The timebase requests are initiated through the adapter configuration
or application registers.

The specific sysfs entry "/sys/class/cxl/cardxx/psl_timebase_synced"
is now dynamically updated according the content of the PSL Timebase
register.

Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
Signed-off-by: Christophe Lombard <[email protected]>
Reviewed-by: Vaibhav Jain <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: remove radix calls to the slice code

This is a tidy up which removes radix MMU calls into the slice
code.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Use const pointers to cached slice masks where possible

The slice_mask cache was a basic conversion which copied the slice
mask into caller's structures, because that's how the original code
worked. In most cases the pointer can be used directly instead, saving
a copy and an on-stack structure.

On POWER8, this increases vfork+exec+exit performance by 0.3%
and reduces time to mmap+munmap a 64kB page by 2%.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: remove dead code

This code is never compiled in, and it gets broken by the next
patch, so remove it.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Switch to 3-operand slice bitops helpers

This converts the slice_mask bit operation helpers to be the usual
3-operand kind, which allows 2 inputs to set a different output
without an extra copy, which is used in the next patch.

Adds slice_copy_mask, which will be used in the next patch.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: implement slice_check_range_fits

Rather than build slice masks from a range then use that to check for
fit in a candidate mask, implement slice_check_range_fits that checks
if a range fits in a mask directly.

This allows several structures to be removed from stacks, and also we
don't expect a huge range in a lot of these cases, so building and
comparing a full mask is going to be more expensive than testing just
one or two bits of the range.

On POWER8, this increases vfork+exec+exit performance by 0.3%
and reduces time to mmap+munmap a 64kB page by 5%.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: implement a slice mask cache

Calculating the slice mask can become a signifcant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in synch with
the slices psize arrays and slb_addr_limit.

On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice mask caches.

On POWER8, this increases vfork+exec+exit performance by 9.9%
and reduces time to mmap+munmap a 64kB page by 28%.

Reduces time to mmap+munmap by about 10% on 8xx.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: pass pointers to struct slice_mask where possible

Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.

checkstack.pl gives, before:
0x00000d1c slice_get_unmapped_area [slice.o]: 592
0x00001864 is_hugepage_only_range [slice.o]: 448
0x00000754 slice_find_area_topdown [slice.o]: 400
0x00000484 slice_find_area_bottomup.isra.1 [slice.o]: 272
0x000017b4 slice_set_range_psize [slice.o]: 224
0x00000a4c slice_find_area [slice.o]: 128
0x00000160 slice_check_fit [slice.o]: 112

after:
0x00000ad0 slice_get_unmapped_area [slice.o]: 448
0x00001464 is_hugepage_only_range [slice.o]: 288
0x000006c0 slice_find_area [slice.o]: 144
0x0000016c slice_check_fit [slice.o]: 128
0x00000528 slice_find_area_bottomup.isra.2 [slice.o]: 128
0x000013e4 slice_set_range_psize [slice.o]: 128

This increases vfork+exec+exit performance by 1.5%.

Reduces time to mmap+munmap a 64kB page by 17%.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: tidy lpsizes and hpsizes update loops

Make these loops look the same, and change their form so the
important part is not wrapped over so many lines.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Simplify and optimise slice context initialisation

The slice state of an mm gets zeroed then initialised upon exec.
This is the only caller of slice_set_user_psize now, so that can be
removed and instead implement a faster and simplified approach that
requires no locking or checking existing state.

This speeds up vfork+exec+exit performance on POWER8 by 3%.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/xmon: Move empty plpar_set_ciabr() into plpar_wrappers.h

Now that plpar_wrappers.h has an #ifdef PSERIES we can move the empty
version of plpar_set_ciabr() which xmon wants into there.

Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Rename plapr routines to plpar

Back in 2013 we added some hypercall wrappers which misspelled
"plpar" (P-series Logical PARtition) as "plapr".

Visually they're hard to distinguish and it almost doesn't matter, but
it is confusing when grepping to miss some calls because of the typo.

They've also started spreading, so before they take over let's fix
them all to be "plpar".

Signed-off-by: Michael Ellerman <[email protected]>

powerpc/pseries: Make plpar_wrappers.h safe to include when PSERIES=n

Currently plpar_wrappers.h is not safe to include when
CONFIG_PPC_PSERIES=n, or at least it can be depending on other config
options and so on.

Fix that by wrapping the entire content in an ifdef.

Signed-off-by: Michael Ellerman <[email protected]>

powerpc/pseries: Move smp_query_cpu_stopped() etc. out of plpar_wrappers.h

smp_query_cpu_stopped() and related #defines are currently in
plpar_wrappers.h. The function actually does an RTAS call, not an
hcall, and basically has nothing to do with plpar_wrappers.h

Move it into pseries.h, where it can easily be used by the only two
callers in pseries/smp.c and pseries/hotplug-cpu.c.

Signed-off-by: Michael Ellerman <[email protected]>

powerpc/32: Add missing prototypes for (early|machine)_init()

early_init() and machine_init() have no prototype, add one in
asm-prototypes.h.

Fixes the following warnings (treated as error in W=1):
arch/powerpc/kernel/setup_32.c:68:30: error: no previous prototype for ‘early_init’
arch/powerpc/kernel/setup_32.c:99:21: error: no previous prototype for ‘machine_init’

Signed-off-by: Mathieu Malaterre <[email protected]>
[mpe: Move them to asm-prototypes.h, drop other functions]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/32: Make some functions static

These functions can all be static, make it so.

Signed-off-by: Mathieu Malaterre <[email protected]>
[mpe: Combine a patch of Mathieu's with some other static conversions]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Avoid comparison of unsigned long >= 0 in __access_ok()

Rewrite function-like macro into regular static inline function to
avoid a warning during macro expansion.

Fix warning (treated as error in W=1):
./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned expression >= 0 is always true
(((size) == 0) || (((size) - 1) <= ((segment).seg - (addr)))))
^

Suggested-by: Segher Boessenkool <[email protected]>
Signed-off-by: Mathieu Malaterre <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid()

Rewrite comparison since all values compared are of type `unsigned long`.

Instead of using unsigned properties and rewriting the original code as:
(originally suggested by Segher Boessenkool <[email protected]>)

  #define pfn_valid(pfn) \
               (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET))

Prefer a static inline function to make code as readable as possible.

Fix a warning (treated as error in W=1):
  arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
  #define pfn_valid(pfn)  ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr)
                                  ^

Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/prom: Remove warning on array size when empty

When neither CONFIG_ALTIVEC, nor CONFIG_VSX or CONFIG_PPC64 is
defined, the array feature_properties is defined as an empty array,
which in turn triggers the following warning (treated as error on
W=1):

  arch/powerpc/kernel/prom.c: In function ‘check_cpu_feature_properties’:
  arch/powerpc/kernel/prom.c:298:16: error: comparison of unsigned expression < 0 is always false
    for (i = 0; i < ARRAY_SIZE(feature_properties); ++i, ++fp) {
                  ^

Suggested-by: Michael Ellerman <[email protected]>
Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototypes for ppc_select() & ppc_fadvise64_64()

Add missing prototypes for ppc_select() & ppc_fadvise64_64() to header
asm-prototypes.h. Fix the following warnings (treated as errors in W=1)

arch/powerpc/kernel/syscalls.c:87:1: error: no previous prototype for ‘ppc_select’
arch/powerpc/kernel/syscalls.c:119:6: error: no previous prototype for ‘ppc_fadvise64_64’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototypes for hw_breakpoint_handler() & arch_unregister_hw_breakpoint()

In commit 5aae8a537080 ("powerpc, hw_breakpoints: Implement
hw_breakpoints for 64-bit server processors") function
hw_breakpoint_handler() and arch_unregister_hw_breakpoint() were added
without function prototypes in hw_breakpoint.h header.

Fix the following warning(s) (treated as error in W=1):
arch/powerpc/kernel/hw_breakpoint.c:106:6: error: no previous prototype for ‘arch_unregister_hw_breakpoint’
arch/powerpc/kernel/hw_breakpoint.c:209:5: error: no previous prototype for ‘hw_breakpoint_handler’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototypes for sys_sigreturn() & sys_rt_sigreturn()

Two functions did not have a prototype defined in signal.h header. Fix
the following two warnings (treated as errors in W=1):

arch/powerpc/kernel/signal_32.c:1135:6: error: no previous prototype for ‘sys_rt_sigreturn’
arch/powerpc/kernel/signal_32.c:1422:6: error: no previous prototype for ‘sys_sigreturn’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for sys_debug_setcontext()

In commit 81e7009ea46c ("powerpc: merge ppc signal.c and ppc64
signal32.c") the function sys_debug_setcontext was added without a
prototype.

Fix compilation warning (treated as error in W=1):
arch/powerpc/kernel/signal_32.c:1227:5: error: no previous prototype for ‘sys_debug_setcontext’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for init_IRQ()

A function init_IRQ() was added without a prototype declared in header
irq.h. Fix the following warning (treated as error in W=1):

arch/powerpc/kernel/irq.c:662:13: error: no previous prototype for ‘init_IRQ’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for arch_irq_work_raise()

In commit 4f8b50bbbe63 ("irq_work, ppc: Fix up arch hooks") a new
function arch_irq_work_raise() was added without a prototype in header
irq_work.h.

Fix the following warning (treated as error in W=1):
arch/powerpc/kernel/time.c:523:6: error: no previous prototype for ‘arch_irq_work_raise’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for arch_dup_task_struct()

In commit 55ccf3fe3f9a ("fork: move the real prepare_to_copy() users to
arch_dup_task_struct()") a new arch_dup_task_struct() was added
without a prototype declared in thread_info.h header. Fix the
following warning (treated as error in W=1):

arch/powerpc/kernel/process.c:1609:5: error: no previous prototype for ‘arch_dup_task_struct’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for time_init()

The function time_init did not have a prototype defined in the time.h
header. Fix the following warning (treated as error in W=1):

arch/powerpc/kernel/time.c:1068:13: error: no previous prototype for ‘time_init’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for hdec_interrupt

In commit dabe859ec636 ("powerpc: Give hypervisor decrementer interrupts
their own handler") an empty body function was added, but no prototype
was declared. Fix warning (treated as error in W=1):

arch/powerpc/kernel/time.c:629:6: error: no previous prototype for ‘hdec_interrupt’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Add missing prototype for slb_miss_bad_addr()

In commit f0f558b131db ("powerpc/mm: Preserve CFAR value on SLB miss caused
by access to bogus address"), the function slb_miss_bad_addr() was added
without a prototype. This commit adds it.

Fix a warning (treated as error in W=1):
arch/powerpc/kernel/traps.c:1498:6: error: no previous prototype for ‘slb_miss_bad_addr’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/kernel: Make function __giveup_fpu() static

__giveup_fpu() is never called outside process.c, so it can be static.
That also means we don't need an empty definition in switch_to.h

Signed-off-by: Mathieu Malaterre <[email protected]>
[mpe: Also drop the empty version, rewrite change log]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/embedded6xx: Make functions flipper_pic_init() & ug_udbg_putc() static

Change signature of two functions, adding static keyword to prevent the
following two warnings (treated as errors on W=1):

arch/powerpc/platforms/embedded6xx/flipper-pic.c:135:28: error: no previous prototype for ‘flipper_pic_init’
arch/powerpc/platforms/embedded6xx/usbgecko_udbg.c:172:6: error: no previous prototype for ‘ug_udbg_putc’

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/32: Mark both tmp variables as unused

Since the value of `tmp` is never intended to be read, declare both `tmp`
variables as unused. Fix warning (treated as error in W=1):

  arch/powerpc/kernel/signal_32.c: In function ‘sys_swapcontext’:
  arch/powerpc/kernel/signal_32.c:1048:16: error: variable ‘tmp’ set but not used
  arch/powerpc/kernel/signal_32.c: In function ‘sys_debug_setcontext’:
  arch/powerpc/kernel/signal_32.c:1234:16: error: variable ‘tmp’ set but not used

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/32: Move the inline keyword at the beginning of function declaration

The inline keyword was not at the beginning of the function declaration.
Fix the following warning (treated as error in W=1):

  arch/powerpc/lib/sstep.c:283:1: error: ‘inline’ is not at beginning of declaration
   static int nokprobe_inline copy_mem_in(u8 *dest, unsigned long ea, int nb,
  arch/powerpc/lib/sstep.c:388:1: error: ‘inline’ is not at beginning of declaration
   static int nokprobe_inline copy_mem_out(u8 *dest, unsigned long ea, int nb,

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/epapr: Move register keyword at the beginning of declaration

Fix warning for all register unsigned long (0,3-12) that appear during W=1
compilation:

./arch/powerpc/include/asm/epapr_hcalls.h:479:2: warning: ‘register’ is not at beginning of declaration [-Wold-style-declaration]
unsigned long register r[\d] asm("r[\d]");

Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

PCI/hotplug: ppc: correct a php_slot usage after free

In pnv_php_unregister_one(), pnv_php_put_slot() might kfree
php_slot structure. But there is pci_hp_deregister() after
that with php_slot reference.

This patch moves pnv_php_put_slot() to the end of function.

Signed-off-by: Simon Guo <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv/mce: Don't silently restart the machine

On MCE the current code will restart the machine with
ppc_md.restart(). This case was extremely unlikely since
prior to that a skiboot call is made and that resulted in
a checkstop for analysis.

With newer skiboots, on P9 we don't checkstop the box by
default, instead we return back to the kernel to extract
useful information at the time of the MCE. While we still
get this information, this patch converts the restart to
a panic(), so that if configured a dump can be taken and
we can track and probably debug the potential issue causing
the MCE.

Signed-off-by: Balbir Singh <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Reviewed-by: Stewart Smith <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: read PHB indications from the device tree

Configure the P9 XSL_DSNCTL register with PHB indications found
in the device tree, or else use legacy hard-coded values.

Signed-off-by: Philippe Bergheaud <[email protected]>
Reviewed-by: Frederic Barrat <[email protected]>
Reviewed-by: Christophe Lombard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv: Enable tunneled operations

P9 supports PCI tunneled operations (atomics and as_notify). This
patch adds support for tunneled operations on powernv, with a new
API, to be called by device drivers:

pnv_pci_enable_tunnel()
   Enable tunnel operations, tell driver the 16-bit ASN indication
   used by kernel.

pnv_pci_disable_tunnel()
   Disable tunnel operations.

pnv_pci_set_tunnel_bar()
   Tell kernel the Tunnel BAR Response address used by driver.
   This function uses two new OPAL calls, as the PBCQ Tunnel BAR
   register is configured by skiboot.

pnv_pci_get_as_notify_info()
   Return the ASN info of the thread to be woken up.

Signed-off-by: Philippe Bergheaud <[email protected]>
Reviewed-by: Frederic Barrat <[email protected]>
Reviewed-by: Christophe Lombard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv/npu: Fix deadlock in mmio_invalidate()

When sending TLB invalidates to the NPU we need to send extra flushes due
to a hardware issue. The original implementation would lock the all the
ATSD MMIO registers sequentially before unlocking and relocking each of
them sequentially to do the extra flush.

This introduced a deadlock as it is possible for one thread to hold one
ATSD register whilst waiting for another register to be freed while the
other thread is holding that register waiting for the one in the first
thread to be freed.

For example if there are two threads and two ATSD registers:

  Thread A Thread B
  ----------------------
  Acquire 1
  Acquire 2
  Release 1 Acquire 1
  Wait 1 Wait 2

Both threads will be stuck waiting to acquire a register resulting in an
RCU stall warning or soft lockup.

This patch solves the deadlock by refactoring the code to ensure registers
are not released between flushes and to ensure all registers are either
acquired or released together and in order.

Fixes: bbd5ff50afff ("powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD")
Signed-off-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/8xx: fix cpm_cascade() dual end of interrupt

cpm_cascade() doesn't have to call eoi() as it is already called
by handle_fasteoi_irq()

And cpm_get_irq() will always return an unsigned int so the test
is useless

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm: Drop the function native_register_proc_table()

This is left over from the segment table implementation and not getting
called from any where now. Hence just drop it.

Suggested-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: Check if PSL data-cache is available before issue flush request

PSL9D doesn't have a data-cache that needs to be flushed before
resetting the card. However when cxl tries to flush data-cache on such
a card, it times-out as PSL_Control register never indicates flush
operation complete due to missing data-cache. This is usually
indicated in the kernel logs with this message:

"WARNING: cache flush timed out"

To fix this the patch checks PSL_Debug register CDC-Field(BIT:27)
which indicates the absence of a data-cache and sets a flag
'no_data_cache' in 'struct cxl_native' to indicate this. When
cxl_data_cache_flush() is called it checks the flag and if set bails
out early without requesting a data-cache flush operation to the PSL.

Signed-off-by: Vaibhav Jain <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: Remove function write_timebase_ctrl_psl9() for PSL9

For PSL9 the contents of PSL_TB_CTLSTAT register have changed in PSL9
and all of the register is now readonly. Hence we don't need an sl_ops
implementation for 'write_timebase_ctrl' for to populate this register
for PSL9.

Hence this patch removes function write_timebase_ctrl_psl9() and its
references from the code.

Signed-off-by: Vaibhav Jain <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: Enable NORST bit in PSL_DEBUG register for PSL9

We enable the NORST bit by default for debug afu images to prevent
reset of AFU trace-data on a PCI link drop. For production AFU images
this bit is always ignored and PSL gets reconfigured anyways thereby
resetting the trace data. So setting this bit for non-debug images
doesn't have any impact.

Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Christophe Lombard <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/xmon: Clear all breakpoints when xmon is disabled via debugfs

Presently when xmon is disabled by debugfs any existing
instruction/data-access breakpoints set are not disabled. This may
lead to kernel oops when those breakpoints are hit as the necessary
debugger hooks aren't installed.

Hence this patch introduces a new function named clear_all_bpt() which
is called when xmon is disabled via debugfs. The function will
unpatch/clear all the trap and ciabr/dab based breakpoints.

Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
[mpe: Fix build break when CONFIG_DEBUG_FS=n]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/xmon: Setup debugger hooks when first break-point is set

Presently sysrq key for xmon('x') is registered during kernel init
irrespective of the value of kernel param 'xmon'. Thus xmon is enabled
even if 'xmon=off' is passed on the kernel command line. However this
doesn't enable the kernel debugger hooks needed for instruction or
data breakpoints. Thus when a break-point is hit with xmon=off a
kernel oops of the form below is reported:

  Oops: Exception in kernel mode, sig: 5 [#1]
  < snip >
  Trace/breakpoint trap

To fix this the patch checks and enables debugger hooks when an
instruction or data break-point is set via xmon console.

Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
[mpe: Just printf directly, no need for static const char[]]
Signed-off-by: Michael Ellerman <[email protected]>

selftests/powerpc: Skip tm-unavailable if TM is not enabled

Some processor revisions do not support transactional memory, and
additionally kernel support can be disabled. In either case the
tm-unavailable test should be skipped, otherwise it will fail with
a SIGILL.

That commit also sets this selftest to be called through the test
harness as it's done for other TM selftests.

Finally, it avoids using "ping" as a thread name since it's
ambiguous and can be confusing when shown, for instance,
in a kernel backtrace log.

Fixes: 77fad8bfb1d2 ("selftests/powerpc: Check FP/VEC on exception in TM")
Signed-off-by: Gustavo Romero <[email protected]>
Reviewed-by: Cyril Bur <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/via-pmu: Fix section mismatch warning

Make the struct via_pmu_driver const to avoid following warning:

WARNING: vmlinux.o(.data+0x4739c): Section mismatch in reference from the variable via_pmu_driver to the function .init.text:pmu_init()
The variable via_pmu_driver references
the function __init pmu_init()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console

Signed-off-by: Mathieu Malaterre <[email protected]>
Suggested-by: Laurent Vivier <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv/vas: Fix order of cleanup in vas_window_init_dbgdir()

Fix the order of cleanup to ensure we free the name buffer in case
of an error creating 'hvwc' or 'info' files.

Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/powernv/vas: Remove a stray line in Makefile

Remove a bogus line from arch/powerpc/platforms/powernv/Makefile that
was added by commit ece4e51 ("powerpc/vas: Export HVWC to debugfs").

Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

selftest/powerpc: Add test for sigreturn in transaction

Ensure that kernel is throwing away the suspended transaction when
sigreturn() is called otherwise it if fails to restore the signal
frame's TM SPRS.

Signed-off-by: Laurent Dufour <[email protected]>
Reviewed-by: Cyril Bur <[email protected]>
[mpe: Add have_htm() check, minor formatting, add SPDX tag]
Signed-off-by: Michael Ellerman <[email protected]>

macintosh: Add module license to ans-lcd

In kernel 4.15, the modprobe step on my PowerBook G4 started complaining that
there was no module license for ans-lcd.

Signed-off-by: Larry Finger <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

Merge two commits from 'kvm-ppc-fixes' into next

This merges two commits from the `kvm-ppc-fixes` branch into next, as
they fix build breaks we are seeing while testing next.

powerpc/pseries: Fix vector5 in ibm architecture vector table

With ibm,dynamic-memory-v2 and ibm,drc-info coming around the same
time, byte22 in vector5 of ibm architecture vector table got set twice
separately. The end result is that guest kernel isn't advertising
support for ibm,dynamic-memory-v2.

Fix this by removing the duplicate assignment of byte22.

Fixes: 02ef6dd8109b ("powerpc: Enable support for ibm,drc-info devtree property")
Signed-off-by: Bharata B Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/8xx: Increase number of slices to 64

On the 8xx, the minimum slice size is the size of the area
covered by a single PMD entry, ie 4M in 4K pages mode and 64M in
16K pages mode.

This patch increases the number of slices from 16 to 64 on the 8xx.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Allow up to 64 low slices

While the implementation of the "slices" address space allows
a significant amount of high slices, it limits the number of
low slices to 16 due to the use of a single u64 low_slices_psize
element in struct mm_context_t

On the 8xx, the minimum slice size is the size of the area
covered by a single PMD entry, ie 4M in 4K pages mode and 64M in
16K pages mode. This means we could have at least 64 slices.

In order to override this limitation, this patch switches the
handling of low_slices_psize to char array as done already for
high_slices_psize.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx

On the 8xx, the page size is set in the PMD entry and applies to
all pages of the page table pointed by the said PMD entry.

When an app has some regular pages allocated (e.g. see below) and tries
to mmap() a huge page at a hint address covered by the same PMD entry,
the kernel accepts the hint allthough the 8xx cannot handle different
page sizes in the same PMD entry.

10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc
10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc

mmap(0x10080000, 524288, PROT_READ|PROT_WRITE,
     MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x10080000

This results the app remaining forever in do_page_fault()/hugetlb_fault()
and when interrupting that app, we get the following warning:

[162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4
[162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W       4.14.6 #85
[162980.035744] task: c67e2c00 task.stack: c668e000
[162980.035783] NIP:  c000fe18 LR: c00e1eec CTR: c00f90c0
[162980.035830] REGS: c668fc20 TRAP: 0700   Tainted: G W        (4.14.6)
[162980.035854] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24044224 XER: 20000000
[162980.036003]
[162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000
[162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc
[162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48
[162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000
[162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4
[162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150
[162980.036861] Call Trace:
[162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable)
[162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150
[162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4
[162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8
[162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c
[162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc
[162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614
[162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244
[162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88
[162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4
[162980.037781] Instruction dump:
[162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94
[162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094
[162980.038216] ---[ end trace c0ceeca8e7a5800a ]---
[162980.038754] BUG: non-zero nr_ptes on freeing mm: 1
[162985.363322] BUG: non-zero nr_ptes on freeing mm: -1

In order to fix this, this patch uses the address space "slices"
implemented for BOOK3S/64 and enhanced to support PPC32 by the
preceding patch.

This patch modifies the context.id on the 8xx to be in the range
[1:16] instead of [0:15] in order to identify context.id == 0 as
not initialised contexts as done on BOOK3S

This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is
selected for the 8xx

Alltough we could in theory have as many slices as PMD entries, the
current slices implementation limits the number of low slices to 16.
This limitation is not preventing us to fix the initial issue allthough
it is suboptimal. It will be cured in a subsequent patch.

Fixes: 4b91428699477 ("powerpc/8xx: Implement support of hugepages")
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Enhance for supporting PPC32

In preparation for the following patch which will fix an issue on
the 8xx by re-using the 'slices', this patch enhances the
'slices' implementation to support 32 bits CPUs.

On PPC32, the address space is limited to 4Gbytes, hence only the low
slices will be used.

The high slices use bitmaps. As bitmap functions are not prepared to
handle bitmaps of size 0, this patch ensures that bitmap functions
are called only when SLICE_NUM_HIGH is not nul.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: create header files dedicated to slices

In preparation for the following patch which will enhance 'slices'
for supporting PPC32 in order to fix an issue on hugepages on 8xx,
this patch takes out of page*.h all bits related to 'slices' and put
them into newly created slice.h header files.
While common parts go into asm/slice.h, subarch specific
parts go into respective books3s/64/slice.c and nohash/64/slice.c
'slices'

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/slice: Remove intermediate bitmap copy

bitmap_or() and bitmap_andnot() can work properly with dst identical
to src1 or src2. There is no need of an intermediate result bitmap
that is copied back to dst in a second step.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Keep const vars out of writable .sdata

Newer gcc will support "-mno-readonly-in-sdata"[1], which makes sure that
the optimization on PPC32 for variables getting moved into the .sdata
section will not apply to const variables (which must be in .rodata).

This was originally noticed in mm/rodata_test.c when rodata_test_data
was not static:

c0695034 g O .data 00000004 rodata_test_data

After this patch with an updated compiler, this is correctly in .rodata.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82411

Reported-by: Christophe Leroy <[email protected]>
Signed-off-by: Segher Boessenkool <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

Linux 4.16-rc4

Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:

   - Add missing instruction suffixes to assembly code so it can be
     compiled by newer GAS versions without warnings.

   - Switch refcount WARN exceptions to UD2 as we did in general

   - Make the reboot on Intel Edison platforms work

   - A small documentation update so text and sample command match"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation, x86, resctrl: Make text and sample command match
  x86/platform/intel-mid: Handle Intel Edison reboot correctly
  x86/asm: Add instruction suffixes to bitops
  x86/entry/64: Add instruction suffix
  x86/refcounts: Switch to UD2 for exceptions

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/pti fixes from Thomas Gleixner:
"Three fixes related to melted spectrum:

   - Sync the cpu_entry_area page table to initial_page_table on 32 bit.

     Otherwise suspend/resume fails because resume uses
     initial_page_table and triggers a triple fault when accessing the
     cpu entry area.

   - Zero the SPEC_CTL MRS on XEN before suspend to address a
     shortcoming in the hypervisor.

   - Fix another switch table detection issue in objtool"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
  objtool: Fix another switch table detection issue
  x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"A small set of fixes from the timer departement:

   - Add a missing timer wheel clock forward when migrating timers off a
     unplugged CPU to prevent operating on a stale clock base and
     missing timer deadlines.

   - Use the proper shift count to extract data from a register value to
     prevent evaluating unrelated bits

   - Make the error return check in the FSL timer driver work correctly.
     Checking an unsigned variable for less than zero does not really
     work well.

   - Clarify the confusing comments in the ARC timer code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Forward timer base before migrating timers
  clocksource/drivers/arc_timer: Update some comments
  clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
  clocksource/drivers/fsl_ftm_timer: Fix error return checking

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixlet from Thomas Gleixner:
"Just a documentation update for the missing device tree property of
the R-Car M3N interrupt controller"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings/irqchip/renesas-irqc: Document R-Car M3-N support

Merge tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- when NR_CPUS is large, a SRCU structure can significantly inflate
   size of the main filesystem structure that would not be possible to
   allocate by kmalloc, so the kvalloc fallback is used

- improved error handling

- fix endiannes when printing some filesystem attributes via sysfs,
   this is could happen when a filesystem is moved between different
   endianity hosts

- send fixes: the NO_HOLE mode should not send a write operation for a
   file hole

- fix log replay for for special files followed by file hardlinks

- fix log replay failure after unlink and link combination

- fix max chunk size calculation for DUP allocation

* tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix log replay failure after unlink and link combination
  Btrfs: fix log replay failure after linking special file and fsync
  Btrfs: send, fix issuing write op when processing hole in no data mode
  btrfs: use proper endianness accessors for super_copy
  btrfs: alloc_chunk: fix DUP stripe size handling
  btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
  btrfs: handle failure of add_pending_csums
  btrfs: use kvzalloc to allocate btrfs_fs_info

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"A driver fix and a documentation fix (which makes dependency handling
  for the next cycle easier)"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: octeon: Prevent error message on bus error
  dt-bindings: at24: sort manufacturers alphabetically

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"A 4.16 regression fix, three fixes for -stable, and a cleanup fix:

   - During the merge window support for the new ACPI NVDIMM Platform
     Capabilities structure disabled support for "deep flush", a
     force-unit- access like mechanism for persistent memory. Restore
     that mechanism.

   - VFIO like RDMA is yet one more memory registration / pinning
     interface that is incompatible with Filesystem-DAX. Disable long
     term pins of Filesystem-DAX mappings via VFIO.

   - The Filesystem-DAX detection to prevent long terms pins mistakenly
     also disabled Device-DAX pins which are not subject to the same
     block- map collision concerns.

   - Similar to the setup path, softlockup warnings can trigger in the
     shutdown path for large persistent memory namespaces. Teach
     for_each_device_pfn() to perform cond_resched() in all cases.

   - Boaz noticed that the might_sleep() in dax_direct_access() is stale
     as of the v4.15 kernel.

  These have received a build success notification from the 0day robot,
  and the longterm pin fixes have appeared in -next. However, I recently
  rebased the tree to remove some other fixes that need to be reworked
  after review feedback.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  memremap: fix softlockup reports at teardown
  libnvdimm: re-enable deep flush for pmem devices via fsync()
  vfio: disable filesystem-dax page pinning
  dax: fix vma_is_fsdax() helper
  dax: ->direct_access does not sleep anymore