* arm64/for-next/perf:
perf: Switch back to struct platform_driver::remove()
perf: arm_pmuv3: Add support for Samsung Mongoose PMU
dt-bindings: arm: pmu: Add Samsung Mongoose core compatible
perf/dwc_pcie: Fix typos in event names
perf/dwc_pcie: Add support for Ampere SoCs
ARM: pmuv3: Add missing write_pmuacr()
perf/marvell: Marvell PEM performance monitor support
perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control
perf/dwc_pcie: Convert the events with mixed case to lowercase
perf/cxlpmu: Support missing events in 3.1 spec
perf: imx_perf: add support for i.MX91 platform
dt-bindings: perf: fsl-imx-ddr: Add i.MX91 compatible
drivers perf: remove unused field pmu_node
* for-next/gcs: (42 commits)
: arm64 Guarded Control Stack user-space support
kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
arm64/gcs: Fix outdated ptrace documentation
kselftest/arm64: Ensure stable names for GCS stress test results
kselftest/arm64: Validate that GCS push and write permissions work
kselftest/arm64: Enable GCS for the FP stress tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Add GCS signal tests
kselftest/arm64: Add test coverage for GCS mode locking
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Verify the GCS hwcap
arm64: Add Kconfig for Guarded Control Stack (GCS)
arm64/ptrace: Expose GCS via ptrace and core files
arm64/signal: Expose GCS state in signal frames
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/mm: Implement map_shadow_stack()
...
* for-next/misc:
: Miscellaneous patches
arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
arm64/ptrace: Clarify documentation of VL configuration via ptrace
acpi/arm64: remove unnecessary cast
arm64/mm: Change protval as 'pteval_t' in map_range()
arm64: uprobes: Optimize cache flushes for xol slot
acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block()
arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG
arm64/ptdump: Test both PTE_TABLE_BIT and PTE_VALID for block mappings
arm64/mm: Sanity check PTE address before runtime P4D/PUD folding
arm64/mm: Drop setting PTE_TYPE_PAGE in pte_mkcont()
ACPI: GTDT: Tighten the check for the array of platform timer structures
arm64/fpsimd: Fix a typo
arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumers
arm64: Return early when break handler is found on linked-list
arm64/mm: Re-organize arch_make_huge_pte()
arm64/mm: Drop _PROT_SECT_DEFAULT
arm64: Add command-line override for ID_AA64MMFR0_EL1.ECV
arm64: head: Drop SWAPPER_TABLE_SHIFT
arm64: cpufeature: add POE to cpucap_is_possible()
arm64/mm: Change pgattr_change_is_safe() arguments as pteval_t
* for-next/mte:
: Various MTE improvements
selftests: arm64: add hugetlb mte tests
hugetlb: arm64: add mte support
* for-next/stacktrace:
: arm64 stacktrace improvements
arm64: preserve pt_regs::stackframe during exec*()
arm64: stacktrace: unwind exception boundaries
arm64: stacktrace: split unwind_consume_stack()
arm64: stacktrace: report recovered PCs
arm64: stacktrace: report source of unwind data
arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
arm64: use a common struct frame_record
arm64: pt_regs: swap 'unused' and 'pmr' fields
arm64: pt_regs: rename "pmr_save" -> "pmr"
arm64: pt_regs: remove stale big-endian layout
arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
* for-next/hwcap3:
: Add AT_HWCAP3 support for arm64 (also wire up AT_HWCAP4)
arm64: Support AT_HWCAP3
binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4
* for-next/kselftest: (30 commits)
: arm64 kselftest fixes/cleanups
kselftest/arm64: Try harder to generate different keys during PAC tests
kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
kselftest/arm64: Add FPMR coverage to fp-ptrace
kselftest/arm64: Expand the set of ZA writes fp-ptrace does
kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
kselftest/arm64: Enable build of PAC tests with LLVM=1
kselftest/arm64: Check that SVCR is 0 in signal handlers
kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
kselftest/arm64: Fix build with stricter assemblers
kselftest/arm64: Test signal handler state modification in fp-stress
kselftest/arm64: Provide a SIGUSR1 handler in the kernel mode FP stress test
kselftest/arm64: Implement irritators for ZA and ZT
kselftest/arm64: Remove unused ADRs from irritator handlers
kselftest/arm64: Correct misleading comments on fp-stress irritators
kselftest/arm64: Poll less often while waiting for fp-stress children
kselftest/arm64: Increase frequency of signal delivery in fp-stress
kselftest/arm64: Fix encoding for SVE B16B16 test
...
* for-next/crc32:
: Optimise CRC32 using PMULL instructions
arm64/crc32: Implement 4-way interleave using PMULL
arm64/crc32: Reorganize bit/byte ordering macros
arm64/lib: Handle CRC-32 alternative in C code
* for-next/guest-cca:
: Support for running Linux as a guest in Arm CCA
arm64: Document Arm Confidential Compute
virt: arm-cca-guest: TSM_REPORT support for realms
arm64: Enable memory encrypt for Realms
arm64: mm: Avoid TLBI when marking pages as valid
arm64: Enforce bounce buffers for realm DMA
efi: arm64: Map Device with Prot Shared
arm64: rsi: Map unprotected MMIO as decrypted
arm64: rsi: Add support for checking whether an MMIO is protected
arm64: realm: Query IPA size from the RMM
arm64: Detect if in a realm and set RIPAS RAM
arm64: rsi: Add RSI definitions
* for-next/haft:
: Support for arm64 FEAT_HAFT
arm64: pgtable: Warn unexpected pmdp_test_and_clear_young()
arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
arm64: Add support for FEAT_HAFT
arm64: setup: name 'tcr2' register
arm64/sysreg: Update ID_AA64MMFR1_EL1 register
* for-next/scs:
: Dynamic shadow call stack fixes
arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
arm64/scs: Deal with 64-bit relative offsets in FDE frames
arm64/scs: Fix handling of DWARF augmentation data in CIE/FDE frames
Will Deacon [Thu, 14 Nov 2024 09:53:32 +0000 (09:53 +0000)]
arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
Commit 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of
tpidrro_el0 for native tasks") tried to optimise the context switching
of tpidrro_el0 by eliding the clearing of the register when switching
to a native task with kpti enabled, on the erroneous assumption that
the kpti trampoline entry code would already have taken care of the
write.
Although the kpti trampoline does zero the register on entry from a
native task, the check in tls_thread_switch() is on the *next* task and
so we can end up leaving a stale, non-zero value in the register if the
previous task was 32-bit.
Drop the broken optimisation and zero tpidrro_el0 unconditionally when
switching to a native 64-bit task.
Mark Brown [Mon, 11 Nov 2024 16:18:56 +0000 (16:18 +0000)]
kselftest/arm64: Try harder to generate different keys during PAC tests
We very intermittently see failures in the single_thread_different_keys
PAC test. As noted in the comment in the test the PAC field can be quite
narrow so there is a chance of collisions even with different keys with a
chance of 5% for 7 bit keys, and the potential for narrower keys. The test
tries to avoid this by running repeatedly, but only tries 10 times which
even with a 5% chance of collisions isn't enough.
Increase the number of times we attempt to look for collisions by a factor
of 100, this also affects other tests which are following a similar pattern
with running the test repeatedly and either don't care like with
pac_instruction_not_nop or potentially have the same issue like
exec_sign_all.
The PAC tests are very fast, running in a second or two even in emulation,
so the 100x increased cost is mildly irritating but not a huge issue. The
bulk of the overhead is in the exec_sign_all test which does a fork() and
exec() per iteration.
Mark Brown [Mon, 11 Nov 2024 16:18:55 +0000 (16:18 +0000)]
kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
The PAC exec_sign_all() test spawns some child processes, creating pipes
to be stdin and stdout for the child. It cleans up most of the file
descriptors that are created as part of this but neglects to clean up the
parent end of the child stdin and stdout. Add the missing close() calls.
Mark Brown [Wed, 6 Nov 2024 17:41:32 +0000 (17:41 +0000)]
arm64/ptrace: Clarify documentation of VL configuration via ptrace
When we configure SVE, SSVE or ZA via ptrace we allow the user to configure
the vector length and specify any of the flags that are accepted when
configuring via prctl(). This includes the S[VM]E_SET_VL_ONEXEC flag which
defers the configuration of the VL until an exec(). We don't do anything to
limit the provision of register data as part of configuring the _ONEXEC VL
but as a function of the VL enumeration support we do this will be
interpreted using the vector length currently configured for the process.
This is all a bit surprising, and probably we should just not have allowed
register data to be specified with _ONEXEC, but it's our ABI so let's
add some explicit documentation in both the ABI documents and the source
calling out what happens.
The comments are also missing the fact that since SME does not have a
mandatory 128 bit VL it is possible for VL enumeration to result in the
configuration of a higher VL than was requested, cover that too.
Mark Brown [Thu, 7 Nov 2024 01:39:22 +0000 (01:39 +0000)]
kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
When building for streaming SVE the irritator for SVE skips updates of both
P0 and FFR. While FFR is skipped since it might not be present there is no
reason to skip corrupting P0 so switch to an instruction valid in streaming
mode and move the ifdef.
arm64/mm: Change protval as 'pteval_t' in map_range()
pgprot_t has been defined as an encapsulated structure with pteval_t as its
element. Hence it is prudent to use pteval_t as the type instead of via the
size based u64. Besides pteval_t type might be different size later on with
FEAT_D128.
Catalin Marinas [Tue, 12 Nov 2024 14:35:05 +0000 (14:35 +0000)]
kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
Compiling the child_cleanup() function results in:
gcs-stress.c: In function ‘child_cleanup’:
gcs-stress.c:266:75: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
266 | ksft_print_msg("%s: Exited due to signal %d\n",
| ~^
| |
| int
Mark Brown [Tue, 12 Nov 2024 13:08:16 +0000 (13:08 +0000)]
kselftest/arm64: Add FPMR coverage to fp-ptrace
Add coverage for FPMR to fp-ptrace. FPMR can be available independently of
SVE and SME, if SME is supported then FPMR is cleared by entering and
exiting streaming mode. As with other registers we generate random values
to load into the register, we restrict these to bitfields which are always
defined. We also leave bitfields where the valid values are affected by
the set of supported FP8 formats zero to reduce complexity, it is unlikely
that specific bitfields will be affected by ptrace issues.
Mark Brown [Tue, 12 Nov 2024 13:08:15 +0000 (13:08 +0000)]
kselftest/arm64: Expand the set of ZA writes fp-ptrace does
Currently our test for implementable ZA writes is written in a bit of a
convoluted fashion which excludes all changes where we clear SVCR.SM even
though we can actually support that since changing the vector length resets
SVCR. Make the logic more direct, enabling us to actually run these cases.
Mark Brown [Tue, 12 Nov 2024 13:08:14 +0000 (13:08 +0000)]
kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
The assembler portions of fp-ptrace are passed feature flags by the C code
indicating which architectural features are supported. Currently these use
an entire register for each flag which is wasteful and gets cumbersome as
new flags are added. Switch to using flag bits in a single register to make
things easier to maintain.
Mark Brown [Mon, 11 Nov 2024 18:32:58 +0000 (18:32 +0000)]
kselftest/arm64: Enable build of PAC tests with LLVM=1
Currently we don't build the PAC selftests when building with LLVM=1 since
we attempt to test for PAC support in the toolchain before we've set up the
build system to point at LLVM in lib.mk, which has to be one of the last
things in the Makefile.
Since all versions of LLVM supported for use with the kernel have PAC
support we can just sidestep the issue by just assuming PAC is there when
doing a LLVM=1 build.
Mark Brown [Wed, 6 Nov 2024 17:07:51 +0000 (17:07 +0000)]
kselftest/arm64: Check that SVCR is 0 in signal handlers
We don't currently validate that we exit streaming mode and clear ZA when
we enter a signal handler. Add simple checks for this in the SSVE and ZA
tests.
Catalin Marinas [Fri, 8 Nov 2024 13:49:18 +0000 (13:49 +0000)]
kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
While prctl() returns an 'int', the PR_MTE_TCF_MASK is defined as
unsigned long which results in the larger type following a bitwise 'and'
operation. Cast the printf() argument to 'int'.
Mark Brown [Fri, 8 Nov 2024 15:20:46 +0000 (15:20 +0000)]
kselftest/arm64: Fix build with stricter assemblers
While some assemblers (including the LLVM assembler I mostly use) will
happily accept SMSTART as an instruction by default others, specifically
gas, require that any architecture extensions be explicitly enabled.
The assembler SME test programs use manually encoded helpers for the new
instructions but no SMSTART helper is defined, only SM and ZA specific
variants. Unfortunately the irritators that were just added use plain
SMSTART so on stricter assemblers these fail to build:
za-test.S:160: Error: selected processor does not support `smstart'
Switch to using SMSTART ZA via the manually encoded smstart_za macro we
already have defined.
Ard Biesheuvel [Wed, 6 Nov 2024 18:55:16 +0000 (19:55 +0100)]
arm64/scs: Deal with 64-bit relative offsets in FDE frames
In some cases, the compiler may decide to emit DWARF FDE frames with
64-bit signed fields for the code offset and range fields. This may
happen when using the large code model, for instance, which permits
an executable to be spread out over more than 4 GiB of address space.
Whether this is the case can be inferred from the augmentation data in
the CIE frame, so decode this data before processing the FDE frames.
Ard Biesheuvel [Wed, 6 Nov 2024 18:55:15 +0000 (19:55 +0100)]
arm64/scs: Fix handling of DWARF augmentation data in CIE/FDE frames
The dynamic SCS patching code pretends to parse the DWARF augmentation
data in the CIE (header) frame, and handle accordingly when processing
the individual FDE frames based on this CIE frame. However, the boolean
variable is defined inside the loop, and so the parsed value is ignored.
The same applies to the code alignment field, which is also read from
the header but then discarded.
This was never spotted before because Clang is the only compiler that
supports dynamic SCS patching (which is essentially an Android feature),
and the unwind tables it produces are highly uniform, and match the
de facto defaults.
So instead of testing for the 'z' flag in the augmentation data field,
require a fixed augmentation data string of 'zR', and simplify the rest
of the code accordingly.
Also introduce some error codes to specify why the patching failed, and
log it to the kernel console on failure when this happens when loading a
module. (Doing so for vmlinux is infeasible, as the patching is done
extremely early in the boot.)
arm64: uprobes: Optimize cache flushes for xol slot
The profiling of single-thread selftests bench reveals a bottlenect in
caches_clean_inval_pou() on ARM64. On my local testing machine, this
function takes approximately 34% of CPU cycles for trig-uprobe-nop and
trig-uprobe-push.
This patch add a check to avoid unnecessary cache flush when writing
instruction to the xol slot. If the instruction is same with the
existing instruction in slot, there is no need to synchronize D/I cache.
Since xol slot allocation and updates occur on the hot path of uprobe
handling, The upstream kernel running on Kunpeng916 (Hi1616), 4 NUMA
nodes, 64 cores@ 2.4GHz reveals this optimization has obvious gain for
nop and push testcases.
Aleksandr Mishin [Tue, 27 Aug 2024 10:12:39 +0000 (13:12 +0300)]
acpi/arm64: Adjust error handling procedure in gtdt_parse_timer_block()
In case of error in gtdt_parse_timer_block() invalid 'gtdt_frame'
will be used in 'do {} while (i-- >= 0 && gtdt_frame--);' statement block
because do{} block will be executed even if 'i == 0'.
Adjust error handling procedure by replacing 'i-- >= 0' with 'i-- > 0'.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Mark Brown [Thu, 7 Nov 2024 01:39:25 +0000 (01:39 +0000)]
kselftest/arm64: Test signal handler state modification in fp-stress
Currently in fp-stress we test signal delivery to the test threads by
sending SIGUSR2 which simply counts how many signals are delivered. The
test programs now also all have a SIGUSR1 handler which for the threads
doing userspace testing additionally modifies the floating point register
state in the signal handler, verifying that when we return the saved
register state is restored from the signal context as expected. Switch over
to triggering that to validate that we are restoring as expected.
Mark Brown [Thu, 7 Nov 2024 01:39:24 +0000 (01:39 +0000)]
kselftest/arm64: Provide a SIGUSR1 handler in the kernel mode FP stress test
The other stress test programs provide a SIGUSR1 handler which modifies the
live register state in order to validate that signal context is being
restored during signal return. While we can't usefully do this when testing
kernel mode FP usage provide a handler for SIGUSR1 which just counts the
number of signals like we do for SIGUSR2, allowing fp-stress to treat all
the test programs uniformly.
Mark Brown [Thu, 7 Nov 2024 01:39:23 +0000 (01:39 +0000)]
kselftest/arm64: Implement irritators for ZA and ZT
Currently we don't use the irritator signal in our floating point stress
tests so when we added ZA and ZT stress tests we didn't actually bother
implementing any actual action in the handlers, we just counted the signal
deliveries. In preparation for using the irritators let's implement them,
just trivially SMSTOP and SMSTART to reset all bits in the register to 0.
Mark Brown [Thu, 7 Nov 2024 01:39:21 +0000 (01:39 +0000)]
kselftest/arm64: Remove unused ADRs from irritator handlers
The irritator handlers for the fp-stress test programs all use ADR to load
an address into x0 which is then not referenced. Remove these ADRs as they
just cause confusion.
Mark Brown [Thu, 7 Nov 2024 01:39:20 +0000 (01:39 +0000)]
kselftest/arm64: Correct misleading comments on fp-stress irritators
The comments in the handlers for the irritator signal in the test threads
for fp-stress suggest that the irritator will corrupt the register state
observed by the main thread but this is not the case, instead the FPSIMD
and SVE irritators (which are the only ones that are implemented) modify
the current register state which is expected to be overwritten on return
from the handler by the saved register state. Update the comment to reflect
what the handler is actually doing.
Mark Brown [Wed, 30 Oct 2024 00:02:03 +0000 (00:02 +0000)]
kselftest/arm64: Poll less often while waiting for fp-stress children
While fp-stress is waiting for children to start it doesn't send any
signals to them so there is no need for it to have as short an epoll()
timeout as it does when the children are all running. We do still want to
have some timeout so that we can log diagnostics about missing children but
this can be relatively large. On emulated platforms the overhead of running
the supervisor process is quite high, especially during the process of
execing the test binaries.
Implement a longer epoll() timeout during the setup phase, using a 5s
timeout while waiting for children and switching to the signal raise
interval when all the children are started and we start sending signals.
Mark Brown [Wed, 30 Oct 2024 00:02:02 +0000 (00:02 +0000)]
kselftest/arm64: Increase frequency of signal delivery in fp-stress
Currently we only deliver signals to the processes being tested about once
a second, meaning that the signal code paths are subject to relatively
little stress. Increase this frequency substantially to 25ms intervals,
along with some minor refactoring to make this more readily tuneable and
maintain the 1s logging interval. This interval was chosen based on some
experimentation with emulated platforms to avoid causing so much extra load
that the test starts to run into the 45s limit for selftests or generally
completely disconnect the timeout numbers from the
We could increase this if we moved the signal generation out of the main
supervisor thread, though we should also consider that he percentage of
time that we spend interacting with the floating point state is also a
consideration.
Masahiro Yamada [Wed, 6 Nov 2024 16:18:42 +0000 (01:18 +0900)]
arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANG
Commit be2881824ae9 ("arm64/build: Assert for unwanted sections")
introduced an assertion to ensure that the .data.rel.ro section does
not exist.
However, this check does not work when CONFIG_LTO_CLANG is enabled,
because .data.rel.ro matches the .data.[0-9a-zA-Z_]* pattern in the
DATA_MAIN macro.
Uwe Kleine-König [Sun, 27 Oct 2024 18:03:14 +0000 (19:03 +0100)]
perf: Switch back to struct platform_driver::remove()
After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/perf to use .remove(), with
the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.
Yicong Yang [Sat, 2 Nov 2024 10:42:34 +0000 (18:42 +0800)]
arm64: Enable ARCH_HAS_NONLEAF_PMD_YOUNG
With the support of FEAT_HAFT, the NONLEAF_PMD_YOUNG can be enabled
on arm64 since the hardware is capable of updating the AF flag for
PMD table descriptor. Since the AF bit of the table descriptor
shares the same bit position in block descriptors, we only need
to implement arch_has_hw_nonleaf_pmd_young() and select related
configs. The related pmd_young test/update operations keeps the
same with and already implemented for transparent page support.
Currently ARCH_HAS_NONLEAF_PMD_YOUNG is used to improve the
efficiency of lru-gen aging.
Yicong Yang [Sat, 2 Nov 2024 10:42:33 +0000 (18:42 +0800)]
arm64: Add support for FEAT_HAFT
Armv8.9/v9.4 introduces the feature Hardware managed Access Flag
for Table descriptors (FEAT_HAFT). The feature is indicated by
ID_AA64MMFR1_EL1.HAFDBS == 0b0011 and can be enabled by
TCR2_EL1.HAFT so it has a dependency on FEAT_TCR2.
Adds the Kconfig for FEAT_HAFT and support detecting and enabling
the feature. The feature is enabled in __cpu_setup() before MMU on
just like HA. A CPU capability is added to notify the user of the
feature.
Add definition of P{G,4,U,M}D_TABLE_AF bit and set the AF bit
when creating the page table, which will save the hardware
from having to update them at runtime. This will be ignored if
FEAT_HAFT is not enabled.
The AF bit of table descriptors cannot be managed by the software
per spec, unlike the HA. So this should be used only if it's supported
system wide by system_supports_haft().
arm64/ptdump: Test both PTE_TABLE_BIT and PTE_VALID for block mappings
Test both PTE_TABLE_BIT and PTE_VALID for block mappings, similar to KVM S2
ptdump. This ensures consistency in identifying block mappings, both in the
S1 and the S2 page tables. Besides being kernel page tables, there will not
be any unmapped (!PTE_VALID) block mappings.
Ard Biesheuvel [Tue, 5 Nov 2024 09:39:20 +0000 (10:39 +0100)]
arm64/mm: Sanity check PTE address before runtime P4D/PUD folding
The runtime P4D/PUD folding logic assumes that the respective pgd_t* and
p4d_t* arguments are pointers into actual page tables that are part of
the hierarchy being operated on.
This may not always be the case, and we have been bitten once by this
already [0], where the argument was actually a stack variable, and in
this case, the logic does not work at all.
So let's add a VM_BUG_ON() for each case, to ensure that the address of
the provided page table entry is consistent with the address being
translated.
Yicong Yang [Sat, 2 Nov 2024 10:42:32 +0000 (18:42 +0800)]
arm64: setup: name 'tcr2' register
TCR2_EL1 introduced some additional controls besides TCR_EL1. Currently
only PIE is supported and enabled by writing TCR2_EL1 directly if PIE
detected.
Introduce a named register 'tcr2' just like 'tcr' we've already had.
It'll be initialized to 0 and updated if certain feature detected and
needs to be enabled. Touch the TCR2_EL1 registers at last with the
updated 'tcr2' value if FEAT_TCR2 supported by checking
ID_AA64MMFR3_EL1.TCRX. Then we can extend the support of other features
controlled by TCR2_EL1.
Yicong Yang [Sat, 2 Nov 2024 10:42:31 +0000 (18:42 +0800)]
arm64/sysreg: Update ID_AA64MMFR1_EL1 register
Update ID_AA64MMFR1_EL1 register fields definition per DDI0601 (ID092424)
2024-09. ID_AA64MMFR1_EL1.ETS adds definition for FEAT_ETS2 and
FEAT_ETS3. ID_AA64MMFR1_EL1.HAFDBS adds definition for FEAT_HAFT and
FEAT_HDBSS.
arm64/mm: Drop setting PTE_TYPE_PAGE in pte_mkcont()
PTE_TYPE_PAGE bits were being set in pte_mkcont() because PTE_TABLE_BIT
was being cleared in pte_mkhuge(). But after arch_make_huge_pte()
modification in commit f8192813dcbe ("arm64/mm: Re-organize
arch_make_huge_pte()"), which dropped pte_mkhuge() completely, setting
back PTE_TYPE_PAGE bits is no longer necessary. Change pte_mkcont() to
only set PTE_CONT.
Zheng Zengkai [Wed, 16 Oct 2024 09:54:58 +0000 (17:54 +0800)]
ACPI: GTDT: Tighten the check for the array of platform timer structures
As suggested by Marc and Lorenzo, first we need to check whether the
platform_timer entry pointer is within gtdt bounds (< gtdt_end) before
de-referencing what it points at to detect the length of the platform
timer struct and then check that the length of current platform_timer
struct is also valid, i.e. the length is not zero and within gtdt_end.
Now next_platform_timer() only checks against gtdt_end for the entry of
subsequent platform timer without checking the length of it and will
not report error if the check failed and the existing check in function
acpi_gtdt_init() is also not enough.
Modify the for_each_platform_timer() iterator and use it combined with
a dedicated check function platform_timer_valid() to do the check
against table length (gtdt_end) for each element of platform timer
array in function acpi_gtdt_init(), making sure that both their entry
and length actually fit in the table.
Mark Brown [Thu, 31 Oct 2024 19:21:38 +0000 (19:21 +0000)]
arm64/gcs: Fix outdated ptrace documentation
The ptrace documentation for GCS was written prior to the implementation of
clone3() when we still blocked enabling of GCS via ptrace. This restriction
was relaxed as part of implementing clone3() support since we implemented
support for the GCS not being managed by the kernel but the documentation
still mentions the restriction. Update the documentation to reflect what
was merged.
We have not yet merged clone3() itself but all the support other than in
clone() itself is there.
Mark Brown [Tue, 29 Oct 2024 12:34:21 +0000 (12:34 +0000)]
kselftest/arm64: Use ksft_perror() to log MTE failures
The logging in the allocation helpers variously uses ksft_print_msg() with
very intermittent logging of errno and perror() (which won't produce KTAP
conformant output) when logging the result of API calls that set errno.
Standardise on using the ksft_perror() helper in these cases so that more
information is available should the tests fail.
Liao Chang [Thu, 24 Oct 2024 03:41:20 +0000 (03:41 +0000)]
arm64: Return early when break handler is found on linked-list
The search for breakpoint handlers iterate through the entire
linked list. Given that all registered hook has a valid fn field, and no
registered hooks share the same mask and imm. This commit optimize the
efficiency slightly by returning early as a matching handler is found.
Core HugeTLB defines a fallback definition for arch_make_huge_pte(), which
calls platform provided pte_mkhuge(). But if any platform already provides
an override for arch_make_huge_pte(), then it does not need to provide the
helper pte_mkhuge().
arm64 override for arch_make_huge_pte() calls pte_mkhuge() internally, thus
creating an impression, that both of these callbacks are being used in core
HugeTLB and hence required to be defined. This drops off pte_mkhuge() which
was never required to begin with as there could not be any section mappings
at the PTE level. Re-organize arch_make_huge_pte() based on requested page
size and create the entry for the applicable page table level as needed. It
also removes a redundancy of clearing PTE_TABLE_BIT bit followed by setting
both PTE_TABLE_BIT and PTE_VALID bits (via PTE_TYPE_MASK) in the pte, while
creating CONT_PTE_SIZE size entries.
perf/marvell: Marvell PEM performance monitor support
PCI Express Interface PMU includes various performance counters
to monitor the data that is transmitted over the PCIe link. The
counters track various inbound and outbound transactions which
includes separate counters for posted/non-posted/completion TLPs.
Also, inbound and outbound memory read requests along with their
latencies can also be monitored. Address Translation Services(ATS)events
such as ATS Translation, ATS Page Request, ATS Invalidation along with
their corresponding latencies are also supported.
The performance counters are 64 bits wide.
For instance,
perf stat -e ib_tlp_pr <workload>
tracks the inbound posted TLPs for the workload.
perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control
Armv8.9/9.4 PMUv3.9 adds per counter EL0 access controls. Per counter
access is enabled with the UEN bit in PMUSERENR_EL1 register. Individual
counters are enabled/disabled in the PMUACR_EL1 register. When UEN is
set, the CR/ER bits control EL0 write access and must be set to disable
write access.
With the access controls, the clearing of unused counters can be
skipped.
KVM also configures PMUSERENR_EL1 in order to trap to EL2. UEN does not
need to be set for it since only PMUv3.5 is exposed to guests.
-e, --event <event> event selector. use 'perf list' to list available events
Perf tool assumes the event names are either in lower or upper case. This
is also mentioned in
Documentation/ABI/testing/sysfs-bus-event_source-devices-events
"As performance monitoring event names are case
insensitive in the perf tool, the perf tool only looks
for lower or upper case event names in sysfs to avoid
scanning the directory. It is therefore required the
name of the event here is either lower or upper case."
'commit db95ea787bd1 ("arm64: mm: Wire up TCR.DS bit to PTE shareability
fields")' dropped the last reference to symbol _PROT_SECT_DEFAULT, while
transitioning from PMD_SECT_S to PMD_MAYBE_SHARED for PROT_SECT_DEFAULT.
Hence let's just drop that symbol which is now unused.
Mark Brown [Tue, 22 Oct 2024 23:20:45 +0000 (00:20 +0100)]
kselftest/arm64: Log fp-stress child startup errors to stdout
Currently if we encounter an error between fork() and exec() of a child
process we log the error to stderr. This means that the errors don't get
annotated with the child information which makes diagnostics harder and
means that if we miss the exit signal from the child we can deadlock
waiting for output from the child. Improve robustness and output quality
by logging to stdout instead.
Marc Zyngier [Mon, 21 Oct 2024 18:14:34 +0000 (19:14 +0100)]
arm64: Add command-line override for ID_AA64MMFR0_EL1.ECV
It appears that relatively popular hardware out there implements
the CNTPOFF_EL2 variant of FEAT_ECV, advertises it via ID_AA64MMFR0_EL1,
but cannot be bothered to set SCR_EL3.ECVEn to 1.
You would probably think that "this is fine, EL3 will take the
trap on access to CNTPOFF_EL2 and flip the ECVEn bit", as that's
what a semi-decent firmware implementation would do.
But no. None of that. This particular implementation takes the trap,
considers its purpose in life, decides that it has none, and *RESETS*
the system.
Yes, x1e001de, I'm talking about you.
In order to allow this machine to be promoted slightly above the
level of a glorified door-stop, add a new "id_aa64mmfr0.ecv" override.
allowing the kernel to pretend this option was never there.
Steven Price [Thu, 17 Oct 2024 13:14:34 +0000 (14:14 +0100)]
arm64: Document Arm Confidential Compute
Add some documentation on Arm CCA and the requirements for running Linux
as a Realm guest. Also update booting.rst to describe the requirement
for RIPAS RAM.
Sami Mujawar [Thu, 17 Oct 2024 13:14:33 +0000 (14:14 +0100)]
virt: arm-cca-guest: TSM_REPORT support for realms
Introduce an arm-cca-guest driver that registers with
the configfs-tsm module to provide user interfaces for
retrieving an attestation token.
When a new report is requested the arm-cca-guest driver
invokes the appropriate RSI interfaces to query an
attestation token.
The steps to retrieve an attestation token are as follows:
1. Mount the configfs filesystem if not already mounted
mount -t configfs none /sys/kernel/config
2. Generate an attestation token
report=/sys/kernel/config/tsm/report/report0
mkdir $report
dd if=/dev/urandom bs=64 count=1 > $report/inblob
hexdump -C $report/outblob
rmdir $report
Suzuki K Poulose [Thu, 17 Oct 2024 13:14:32 +0000 (14:14 +0100)]
arm64: Enable memory encrypt for Realms
Use the memory encryption APIs to trigger a RSI call to request a
transition between protected memory and shared memory (or vice versa)
and updating the kernel's linear map of modified pages to flip the top
bit of the IPA. This requires that block mappings are not used in the
direct map for realm guests.
Steven Price [Thu, 17 Oct 2024 13:14:31 +0000 (14:14 +0100)]
arm64: mm: Avoid TLBI when marking pages as valid
When __change_memory_common() is purely setting the valid bit on a PTE
(e.g. via the set_memory_valid() call) there is no need for a TLBI as
either the entry isn't changing (the valid bit was already set) or the
entry was invalid and so should not have been cached in the TLB.
Steven Price [Thu, 17 Oct 2024 13:14:30 +0000 (14:14 +0100)]
arm64: Enforce bounce buffers for realm DMA
Within a realm guest it's not possible for a device emulated by the VMM
to access arbitrary guest memory. So force the use of bounce buffers to
ensure that the memory the emulated devices are accessing is in memory
which is explicitly shared with the host.
This adds a call to swiotlb_update_mem_attributes() which calls
set_memory_decrypted() to ensure the bounce buffer memory is shared with
the host. For non-realm guests or hosts this is a no-op.
Suzuki K Poulose [Thu, 17 Oct 2024 13:14:27 +0000 (14:14 +0100)]
arm64: rsi: Add support for checking whether an MMIO is protected
On Arm CCA, with RMM-v1.0, all MMIO regions are shared. However, in
the future, an Arm CCA-v1.0 compliant guest may be run in a lesser
privileged partition in the Realm World (with Arm CCA-v1.1 Planes
feature). In this case, some of the MMIO regions may be emulated
by a higher privileged component in the Realm world, i.e, protected.
Thus the guest must decide today, whether a given MMIO region is shared
vs Protected and create the stage1 mapping accordingly. On Arm CCA, this
detection is based on the "IPA State" (RIPAS == RIPAS_IO). Provide a
helper to run this check on a given range of MMIO.
Also, provide a arm64 helper which may be hooked in by other solutions.
Steven Price [Thu, 17 Oct 2024 13:14:26 +0000 (14:14 +0100)]
arm64: realm: Query IPA size from the RMM
The top bit of the configured IPA size is used as an attribute to
control whether the address is protected or shared. Query the
configuration from the RMM to assertain which bit this is.
Suzuki K Poulose [Thu, 17 Oct 2024 13:14:25 +0000 (14:14 +0100)]
arm64: Detect if in a realm and set RIPAS RAM
Detect that the VM is a realm guest by the presence of the RSI
interface. This is done after PSCI has been initialised so that we can
check the SMCCC conduit before making any RSI calls.
If in a realm then iterate over all memory ensuring that it is marked as
RIPAS RAM. The loader is required to do this for us, however if some
memory is missed this will cause the guest to receive a hard to debug
external abort at some random point in the future. So for a
belt-and-braces approach set all memory to RIPAS RAM. Any failure here
implies that the RAM regions passed to Linux are incorrect so panic()
promptly to make the situation clear.
Mark Brown [Thu, 17 Oct 2024 17:43:31 +0000 (18:43 +0100)]
kselftest/arm64: Fail the overall fp-stress test if any test fails
Currently fp-stress does not report a top level test result if it runs to
completion, it always exits with a return code 0. Use the ksft_finished()
helper to ensure that the exit code for the top level program reports a
failure if any of the individual tests has failed.
Mark Rutland [Mon, 21 Oct 2024 16:44:56 +0000 (17:44 +0100)]
arm64: preserve pt_regs::stackframe during exec*()
When performing an exec*(), there's a transient period before the return
to userspace where any stacktrace will result in a warning triggered by
kunwind_next_frame_record_meta() encountering a struct frame_record_meta
with an unknown type. This can be seen fairly reliably by enabling KASAN
or KFENCE, e.g.
This happens because start_thread_common() zeroes the entirety of
current_pt_regs(), including pt_regs::stackframe::type, changing this
from FRAME_META_TYPE_FINAL to 0 and making the final record invalid.
The stacktrace code will reject this until the next return to userspace,
where a subsequent exception entry will reinitialize the type to
FRAME_META_TYPE_FINAL.
... as before that commit the stacktrace code only expected the final
pt_regs::stackframe to contain zeroes, which was unchanged by
start_thread_common().
A stacktrace could occur at any time, either due to instrumentation or
an exception, and so start_thread_common() must ensure that
pt_regs::stackframe is always valid.
Fix this by changing the way start_thread_common() zeroes and
reinitializes the pt_regs fields:
* The '{regs,pc,pstate}' fields are initialized in one go via a struct
assignment to the user_regs, with start_thread() and
compat_start_thread() modified to pass 'pstate' into
start_thread_common().
* The 'sp' and 'compat_sp' fields are zeroed by the struct assignment in
start_thread_common(), and subsequently overwritten in start_thread()
and compat_start_thread respectively, matching existing behaviour.
* The 'syscallno' field is implicitly preserved while the 'orig_x0'
field is explicitly zeroed, maintaining existing ABI.
* The 'pmr' field is explicitly initialized, as necessary for an exec*()
from a kernel thread, matching existing behaviour.
* The 'stackframe' field is implicitly preserved, with a new comment and
some assertions to ensure we don't accidentally break this in future.
* All other fields are implicitly preserved, and should have no
functional impact:
- 'sdei_ttbr1' is only used for SDEI exception entry/exit, and we
never exec*() inside an SDEI handler.
- 'lockdep_hardirqs' and 'exit_rcu' are only used for EL1 exception
entry/exit, and we never exec*() inside an EL1 exception handler.
While updating compat_start_thread() to pass 'pstate' into
start_thread_common(), I've also updated the logic to remove the
ifdeffery, replacing:
Ard Biesheuvel [Fri, 18 Oct 2024 07:53:51 +0000 (09:53 +0200)]
arm64/crc32: Implement 4-way interleave using PMULL
Now that kernel mode NEON no longer disables preemption, using FP/SIMD
in library code which is not obviously part of the crypto subsystem is
no longer problematic, as it will no longer incur unexpected latencies.
So accelerate the CRC-32 library code on arm64 to use a 4-way
interleave, using PMULL instructions to implement the folding.
On Apple M2, this results in a speedup of 2 - 2.8x when using input
sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
restoring the FP/SIMD register file may not be worth it, so 1k is used
as a threshold for choosing this code path.
The coefficient tables were generated using code provided by Eric. [0]
Ard Biesheuvel [Fri, 18 Oct 2024 07:53:50 +0000 (09:53 +0200)]
arm64/crc32: Reorganize bit/byte ordering macros
In preparation for a new user, reorganize the bit/byte ordering macros
that are used to parameterize the crc32 template code and instantiate
CRC-32, CRC-32c and 'big endian' CRC-32.
Ard Biesheuvel [Fri, 18 Oct 2024 07:53:49 +0000 (09:53 +0200)]
arm64/lib: Handle CRC-32 alternative in C code
In preparation for adding another code path for performing CRC-32, move
the alternative patching for ARM64_HAS_CRC32 into C code. The logic for
deciding whether to use this new code path will be implemented in C too.
Andre Przywara [Fri, 16 Aug 2024 15:32:51 +0000 (16:32 +0100)]
kselftest/arm64: mte: fix printf type warnings about longs
When checking MTE tags, we print some diagnostic messages when the tests
fail. Some variables uses there are "longs", however we only use "%x"
for the format specifier.
Update the format specifiers to "%lx", to match the variable types they
are supposed to print.
Andre Przywara [Fri, 16 Aug 2024 15:32:49 +0000 (16:32 +0100)]
kselftest/arm64: mte: fix printf type warnings about __u64
When printing the signal context's PC, we use a "%lx" format specifier,
which matches the common userland (glibc's) definition of uint64_t as an
"unsigned long". However the structure in question is defined in a
kernel uapi header, which uses a self defined __u64 type, and the arm64
kernel headers define this using "int-ll64.h", so it becomes an
"unsigned long long". This mismatch leads to the usual compiler warning.
The common fix would be to use "PRIx64", but because this is defined by
the userland's toolchain libc headers, it wouldn't match as well. Since
we know the exact type of __u64, just use "%llx" here instead, to silence
this warning.
This also fixes a more severe typo: "$lx" is not a valid format
specifier.
Andre Przywara [Fri, 16 Aug 2024 15:32:47 +0000 (16:32 +0100)]
kselftest/arm64: mte: use string literal for printf-style functions
Using pointers for the format specifier strings in printf-style
functions can create potential security problems, as the number of
arguments to be parsed could vary from call to call. Most compilers
consequently warn about those:
"format not a string literal and no format arguments [-Wformat-security]"
If we only want to print a constant string, we can just use a fixed "%s"
format instead, and pass the string as an argument.
Andre Przywara [Fri, 16 Aug 2024 15:32:46 +0000 (16:32 +0100)]
kselftest/arm64: mte: use proper SKIP syntax
If MTE is not available on a system, we detect this early and skip all
the MTE selftests. However this happens before we print the TAP plan, so
tools parsing the TAP output get confused and report an error.
Use the existing ksft_exit_skip() function to handle this, which uses a
dummy plan to work with tools expecting proper TAP syntax, as described
in the TAP specification.
Andre Przywara [Fri, 16 Aug 2024 15:32:44 +0000 (16:32 +0100)]
kselftest/arm64: signal: drop now redundant GNU_SOURCE definition
The definition of GNU_SOURCE was recently centralised in an upper layer
kselftest Makefile, so the definition in the arm64 signal tests Makefile
is no longer needed. To make things worse, since both definitions are
not strictly identical, the compiler warns about it:
<command-line>: warning: "_GNU_SOURCE" redefined
<command-line>: note: this is the location of the previous definition
Mark Brown [Fri, 4 Oct 2024 20:26:30 +0000 (21:26 +0100)]
arm64: Support AT_HWCAP3
We have filled all 64 bits of AT_HWCAP2 so in order to support discovery of
further features provide the framework to use the already defined AT_HWCAP3
for further CPU features.
Mark Brown [Fri, 4 Oct 2024 20:26:29 +0000 (21:26 +0100)]
binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4
AT_HWCAP3 and AT_HWCAP4 were recently defined for use on PowerPC in commit 3281366a8e79 ("uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector,
entries"). Since we want to start using AT_HWCAP3 on arm64 add support for
exposing both these new hwcaps via binfmt_elf.
Mark Rutland [Thu, 17 Oct 2024 09:25:38 +0000 (10:25 +0100)]
arm64: stacktrace: unwind exception boundaries
When arm64's stack unwinder encounters an exception boundary, it uses
the pt_regs::stackframe created by the entry code, which has a copy of
the PC and FP at the time the exception was taken. The unwinder doesn't
know anything about pt_regs, and reports the PC from the stackframe, but
does not report the LR.
The LR is only guaranteed to contain the return address at function call
boundaries, and can be used as a scratch register at other times, so the
LR at an exception boundary may or may not be a legitimate return
address. It would be useful to report the LR value regardless, as it can
be helpful when debugging, and in future it will be helpful for reliable
stacktrace support.
This patch changes the way we unwind across exception boundaries,
allowing both the PC and LR to be reported. The entry code creates a
frame_record_meta structure embedded within pt_regs, which the unwinder
uses to find the pt_regs. The unwinder can then extract pt_regs::pc and
pt_regs::lr as two separate unwind steps before continuing with a
regular walk of frame records.
When a PC is unwound from pt_regs::lr, dump_backtrace() will log this
with an "L" marker so that it can be identified easily. For example,
an unwind across an exception boundary will appear as follows:
... where the LR points a few instructions before the current PC.
This plays nicely with all the other unwind metadata tracking. With the
ftrace_graph profiler enabled globally, and kretprobes installed on
generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered
by magic-sysrq + L reports:
Mark Rutland [Thu, 17 Oct 2024 09:25:37 +0000 (10:25 +0100)]
arm64: stacktrace: split unwind_consume_stack()
When unwinding stacks, we use unwind_consume_stack() to both find
whether an object (e.g. a frame record) is on an accessible stack *and*
to update the stack boundaries. This works fine today since we only care
about one type of object which does not overlap other objects.
In subsequent patches we'll want to check whether an object (e.g a frame
record) is on the stack and follow this up by accessing a larger object
containing the first (e.g. a pt_regs with an embedded frame record).
To make that pattern easier to implement, this patch reworks
unwind_find_next_stack() and unwind_consume_stack() so that the former
can be used to check if an object is on any accessible stack, and the
latter is purely used to update the stack boundaries.
As unwind_find_next_stack() is modified to also check the stack
currently being unwound, it is renamed to unwind_find_stack().
There should be no functional change as a result of this patch.
Mark Rutland [Thu, 17 Oct 2024 09:25:36 +0000 (10:25 +0100)]
arm64: stacktrace: report recovered PCs
When analysing a stacktrace it can be useful to know whether an unwound
PC has been rewritten by fgraph or kretprobes, as in some situations
these may be suspect or be known to be unreliable.
This patch adds flags to track when an unwind entry has recovered the PC
from fgraph and/or kretprobes, and updates dump_backtrace() to log when
this is the case.
The flags recorded are:
"F" - the PC was recovered from fgraph
"K" - the PC was recovered from kretprobes
These flags are recorded and logged in addition to the original source
of the unwound PC.
For example, with the ftrace_graph profiler enabled globally, and
kretprobes installed on generic_handle_domain_irq() and
do_interrupt_handler(), a backtrace triggered by magic-sysrq + L
reports:
Note that as these flags are reported next to the recovered PC value,
they appear on the callers of instrumented functions. For example
gic_handle_irq() has a "K" marker because generic_handle_domain_irq()
was instrumented with kretprobes and had its return address rewritten.
Mark Rutland [Thu, 17 Oct 2024 09:25:35 +0000 (10:25 +0100)]
arm64: stacktrace: report source of unwind data
When analysing a stacktrace it can be useful to know where an unwound PC
came from, as in some situations certain sources may be suspect or known
to be unreliable. In future it would also be useful to track this so
that certain unwind steps can be performed in a stateful manner. For
example when unwinding across an exception boundary, we'd ideally unwind
pt_regs::pc, then pt_regs::lr, then the next frame record.
This patch adds an enumerated set of unwind sources, tracks this during
the unwind, and updates dump_backtrace() to log these for interesting
unwind steps.
The interesting sources recorded are:
"C" - the PC came from the caller of an unwind function.
"T" - the PC came from thread_saved_pc() for a blocked task.
"P" - the PC came from a pt_regs::pc.
"U" - the PC came from an unknown source (indicates an unwinder error).
... with nothing recorded when the PC came from a frame_record::pc as
this is the vastly common case and logging this would make it difficult
to spot the more interesting cases.
For example, when triggering a backtrace via magic-sysrq + L, the CPU
handling the sysrq will have a backtrace whose first element is the
caller (C) of dump_backtrace():
Mark Rutland [Thu, 17 Oct 2024 09:25:34 +0000 (10:25 +0100)]
arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()
Currently dump_backtrace() can only print the PC value at each step of
the unwind, as this is all the information that arch_stack_walk()
passes to the dump_backtrace_entry() callback.
In future we'd like to print some additional information, such as the
origin of entries (e.g. PC, LR, FP) and/or the reliability thereof.
In preparation for doing so, this patch moves dump_backtrace() over to
kunwind_stack_walk(), which passes the full kunwind_state to the
callback.
There should be no functional change as a result of this patch.
Mark Rutland [Thu, 17 Oct 2024 09:25:33 +0000 (10:25 +0100)]
arm64: use a common struct frame_record
Currently the signal handling code has its own struct frame_record,
the definition of struct pt_regs open-codes a frame record as an array,
and the kernel unwinder hard-codes frame record offsets.
Move to a common struct frame_record that can be used throughout the
kernel.
Mark Rutland [Thu, 17 Oct 2024 09:25:32 +0000 (10:25 +0100)]
arm64: pt_regs: swap 'unused' and 'pmr' fields
In subsequent patches we'll want to add an additional u64 to struct
pt_regs. To make space, this patch swaps the 'unused' and 'pmr' fields,
as the 'pmr' value only requires bits[7:0] and can safely fit into a
u32, which frees up a 64-bit unused field.
The 'lockdep_hardirqs' and 'exit_rcu' fields should eventually be moved
out of pt_regs and managed locally within entry-common.c, so I've left
those as-is for the moment.
There should be no functional change as a result of this patch.
Mark Rutland [Thu, 17 Oct 2024 09:25:31 +0000 (10:25 +0100)]
arm64: pt_regs: rename "pmr_save" -> "pmr"
The pt_regs::pmr_save field is weirdly named relative to all other
pt_regs fields, with a '_save' suffix that doesn't make anything clearer
and only leads to more typing to access the field.
Remove the '_save' suffix.
There should be no functional change as a result of this patch.
Mark Rutland [Thu, 17 Oct 2024 09:25:30 +0000 (10:25 +0100)]
arm64: pt_regs: remove stale big-endian layout
For historical reasons the layout of struct pt_regs depends on the
configured endianness, with the order of the 'syscallno' and 'unused2'
fields varying dependent upon whether __AARCH64EB__ is defined. We no
longer depend on the order of these two fields and can remove the
ifdeffery.
The current conditional layout was introduced in commit:
35d0e6fb4d219d64 ("arm64: syscallno is secretly an int, make it official")
At the time, this was necessary so that the entry assembly could use a
single STP instruction to save the pt_regs::{orig_x0,syscallno} fields,
without logic that was conditional on the endianness of the kernel:
| el0_svc_naked:
| stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
Since that commit, we no longer manipulate pt_regs::orig_x0 from
assembly, and only manipulate pt_regs::syscallno as a 32-bit quantity
early in the kernel_entry assembly:
| /* Not in a syscall by default (el0_svc overwrites for real syscall) */
| .if \el == 0
| mov w21, #NO_SYSCALL
| str w21, [sp, #S_SYSCALLNO]
| .endif
Given the above, there's no longer a need for the layout of
pt_regs::{syscallno,unused2} to depend on the endianness of the kernel.
This patch removes the ifdeffery and places 'syscallno' before 'unused2'
regardless of the endianess of the kernel. At the same time, 'unused2'
is renamed to 'unused', as it is the only unused field within pt_regs.
There should be no functional change as a result of this patch.
Mark Rutland [Thu, 17 Oct 2024 09:25:29 +0000 (10:25 +0100)]
arm64: pt_regs: assert pt_regs is a multiple of 16 bytes
To ensure that the stack is correctly aligned when branching to C code,
we require that struct pt_regs is a multiple of 16 bytes, as noted in a
comment.
Add an explicit assertion for this, so that any accidental violation of
this requirement will be caught by the compiler.
arm64: lib: Use MOPS for copy_page() and clear_page()
Similarly to what was done to the memcpy() routines, make copy_page()
and clear_page() also use the Armv8.8 FEAT_MOPS instructions.
Note: For copy_page() this uses the CPY* instructions instead of CPYF*
as CPYF* doesn't allow src and dst to be equal. It's not clear if
copy_page() needs to allow equal src and dst but it has worked so far
with the current implementation and there is no documentation forbidding
it.
Note, the unoptimized version of copy_page() in assembler.h is left as
it is.