Ard Biesheuvel [Tue, 24 Nov 2020 10:47:19 +0000 (11:47 +0100)]
crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
ecdh_set_secret() casts a void* pointer to a const u64* in order to
feed it into ecc_is_key_valid(). This is not generally permitted by
the C standard, and leads to actual misalignment faults on ARMv6
cores. In some cases, these are fixed up in software, but this still
leads to performance hits that are entirely avoidable.
So let's copy the key into the ctx buffer first, which we will do
anyway in the common case, and which guarantees correct alignment.
crypto: ccree - Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of
letting the code fall through to the next case.
Ard Biesheuvel [Fri, 20 Nov 2020 11:04:33 +0000 (12:04 +0100)]
crypto: tcrypt - include 1420 byte blocks in aead and skcipher benchmarks
WireGuard and IPsec both typically operate on input blocks that are
~1420 bytes in size, given the default Ethernet MTU of 1500 bytes and
the overhead of the VPN metadata.
Many aead and sckipher implementations are optimized for power-of-2
block sizes, and whether they perform well when operating on 1420
byte blocks cannot be easily extrapolated from the performance on
power-of-2 block size. So let's add 1420 bytes explicitly, and round
it up to the next blocksize multiple of the algo in question if it
does not support 1420 byte blocks.
Ard Biesheuvel [Fri, 20 Nov 2020 11:04:32 +0000 (12:04 +0100)]
crypto: tcrypt - permit tcrypt.ko to be builtin
When working on crypto algorithms, being able to run tcrypt quickly
without booting an entire Linux installation can be very useful. For
instance, QEMU/kvm can be used to boot a kernel from the command line,
and having tcrypt.ko builtin would allow tcrypt to be executed to run
benchmarks, or to run tests for algorithms that need to be instantiated
from templates, without the need to make it past the point where the
rootfs is mounted.
So let's relax the requirement that tcrypt can only be built as a module
when CONFIG_EXPERT is enabled.
Ard Biesheuvel [Fri, 20 Nov 2020 11:04:31 +0000 (12:04 +0100)]
crypto: tcrypt - don't initialize at subsys_initcall time
Commit c4741b2305979 ("crypto: run initcalls for generic implementations
earlier") converted tcrypt.ko's module_init() to subsys_initcall(), but
this was unintentional: tcrypt.ko currently cannot be built into the core
kernel, and so the subsys_initcall() gets converted into module_init()
under the hood. Given that tcrypt.ko does not implement a generic version
of a crypto algorithm that has to be available early during boot, there
is no point in running the tcrypt init code earlier than implied by
module_init().
However, for crypto development purposes, we will lift the restriction
that tcrypt.ko must be built as a module, and when builtin, it makes sense
for tcrypt.ko (which does its work inside the module init function) to run
as late as possible. So let's switch to late_initcall() instead.
Weili Qian [Fri, 20 Nov 2020 09:02:31 +0000 (17:02 +0800)]
hwrng: hisi - remove HiSilicon TRNG driver
Driver of HiSilicon true random number generator(TRNG)
is removed from 'drivers/char/hw_random'.
Both 'Kunpeng 920' and 'Kunpeng 930' chips have TRNG,
however, PRNG is only supported by 'Kunpeng 930'.
So, this driver is moved to 'drivers/crypto/hisilicon/trng/'
in the next to enable the two's TRNG better.
Thara Gopinath [Thu, 19 Nov 2020 15:52:31 +0000 (10:52 -0500)]
crypto: qce - Fix SHA result buffer corruption issues
Partial hash was being copied into the final result buffer without the
entire message block processed. Depending on how the end user processes
this result buffer, errors vary from result buffer corruption to result
buffer poisoing. Fix this issue by ensuring that only the final hash value
is copied into the result buffer.
Ard Biesheuvel [Tue, 17 Nov 2020 13:32:14 +0000 (14:32 +0100)]
crypto: aegis128 - expose SIMD code path as separate driver
Wiring the SIMD code into the generic driver has the unfortunate side
effect that the tcrypt testing code cannot distinguish them, and will
therefore not use the latter to fuzz test the former, as it does for
other algorithms.
So let's refactor the code a bit so we can register two implementations:
aegis128-generic and aegis128-simd.
Ard Biesheuvel [Tue, 17 Nov 2020 13:32:13 +0000 (14:32 +0100)]
crypto: aegis128/neon - move final tag check to SIMD domain
Instead of calculating the tag and returning it to the caller on
decryption, use a SIMD compare and min across vector to perform
the comparison. This is slightly more efficient, and removes the
need on the caller's part to wipe the tag from memory if the
decryption failed.
While at it, switch to unsigned int when passing cryptlen and
assoclen - we don't support input sizes where it matters anyway.
Avoid copying the tail block via a stack buffer if the total size
exceeds a single AEGIS block. In this case, we can use overlapping
loads and stores and NEON permutation instructions instead, which
leads to a modest performance improvement on some cores (< 5%),
and is slightly cleaner. Note that we still need to use a stack
buffer if the entire input is smaller than 16 bytes, given that
we cannot use 16 byte NEON loads and stores safely in this case.
Ard Biesheuvel [Tue, 17 Nov 2020 13:32:11 +0000 (14:32 +0100)]
crypto: aegis128 - wipe plaintext and tag if decryption fails
The AEGIS spec mentions explicitly that the security guarantees hold
only if the resulting plaintext and tag of a failed decryption are
withheld. So ensure that we abide by this.
While at it, drop the unused struct aead_request *req parameter from
crypto_aegis128_process_crypt().
Corentin Labbe [Sun, 15 Nov 2020 19:08:07 +0000 (19:08 +0000)]
crypto: sun8i-ce - fix two error path's memory leak
This patch fixes the following smatch warnings:
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-hash.c:412
sun8i_ce_hash_run() warn: possible memory leak of 'result'
Note: "buf" is leaked as well.
Furthermore, in case of ENOMEM, crypto_finalize_hash_request() was not
called which was an error.
Giovanni Cabiddu [Fri, 13 Nov 2020 16:46:42 +0000 (16:46 +0000)]
crypto: qat - add hook to initialize vector routing table
Add an hook to initialize the vector routing table with the default
values before MSIx is enabled.
The new function set_msix_rttable() is called only if present in the
struct adf_hw_device_data of the device. This is to allow for QAT
devices that do not support that functionality.
Giovanni Cabiddu [Fri, 13 Nov 2020 16:46:41 +0000 (16:46 +0000)]
crypto: qat - target fw images to specific AEs
Introduce support for devices that require multiple firmware images.
If a device requires more than a firmware image to operate, load the
image to the appropriate Acceleration Engine (AE).
Zhang Qilong [Fri, 13 Nov 2020 13:17:28 +0000 (21:17 +0800)]
crypto: omap-aes - Fix PM disable depth imbalance in omap_aes_probe
The pm_runtime_enable will increase power disable depth.
Thus a pairing decrement is needed on the error handling
path to keep it balanced according to context.
Fixes: f7b2b5dd6a62a ("crypto: omap-aes - add error check for pm_runtime_get_sync") Signed-off-by: Zhang Qilong <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
Yang Shen [Fri, 13 Nov 2020 09:32:35 +0000 (17:32 +0800)]
crypto: hisilicon/zip - add a work_queue for zip irq
The patch 'irqchip/gic-v3-its: Balance initial LPI affinity across CPUs'
set the IRQ to an uncentain CPU. If an IRQ is bound to the CPU used by the
thread which is sending request, the throughput will be just half.
So allocate a 'work_queue' and set as 'WQ_UNBOUND' to do the back half work
on some different CPUS.
Eric Biggers [Fri, 13 Nov 2020 05:20:21 +0000 (21:20 -0800)]
crypto: sha - split sha.h into sha1.h and sha2.h
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.
This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.
Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.
This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.
crypto: crypto4xx - Replace bitwise OR with logical OR in crypto4xx_build_pd
Clang warns:
drivers/crypto/amcc/crypto4xx_core.c:921:60: warning: operator '?:' has
lower precedence than '|'; '|' will be evaluated first
[-Wbitwise-conditional-parentheses]
(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
drivers/crypto/amcc/crypto4xx_core.c:921:60: note: place parentheses
around the '|' expression to silence this warning
(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
^
)
drivers/crypto/amcc/crypto4xx_core.c:921:60: note: place parentheses
around the '?:' expression to evaluate it first
(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
^
(
1 warning generated.
It looks like this should have been a logical OR so that
PD_CTL_HASH_FINAL gets added to the w bitmask if crypto_tfm_alg_type
is either CRYPTO_ALG_TYPE_AHASH or CRYPTO_ALG_TYPE_AEAD. Change the
operator so that everything works properly.
Ard Biesheuvel [Tue, 10 Nov 2020 09:10:42 +0000 (10:10 +0100)]
crypto: arm64/gcm - move authentication tag check to SIMD domain
Instead of copying the calculated authentication tag to memory and
calling crypto_memneq() to verify it, use vector bytewise compare and
min across vector instructions to decide whether the tag is valid. This
is more efficient, and given that the tag is only transiently held in a
NEON register, it is also safer, given that calculated tags for failed
decryptions should be withheld.
Based on lessons learnt from optimizing the 32-bit version of this driver,
we can simplify the arm64 version considerably, by reordering the final
two stores when the last block is not a multiple of 64 bytes. This removes
the need to use permutation instructions to calculate the elements that are
clobbered by the final overlapping store, given that the store of the
penultimate block now follows it, and that one carries the correct values
for those elements already.
While at it, simplify the overlapping loads as well, by calculating the
address of the final overlapping load upfront, and switching to this
address for every load that would otherwise extend past the end of the
source buffer.
There is no impact on performance, but the resulting code is substantially
smaller and easier to follow.
Jack Xu [Fri, 6 Nov 2020 11:28:07 +0000 (19:28 +0800)]
crypto: qat - allow to target specific AEs
Introduce new API, qat_uclo_set_cfg_ae_mask(), to allow the load of the
firmware image to a subset of Acceleration Engines (AEs). This is
required by the next generation of QAT devices to be able to load
different firmware images to the device.
Jack Xu [Fri, 6 Nov 2020 11:27:59 +0000 (19:27 +0800)]
crypto: qat - add reset CSR and mask to chip info
Add reset CSR offset and mask to chip info since they are different
in new QAT devices. This also simplifies the reset/clrReset functions
by using the reset mask.
Jack Xu [Fri, 6 Nov 2020 11:27:55 +0000 (19:27 +0800)]
crypto: qat - replace check based on DID
Modify condition in qat_uclo_wr_mimage() to use a capability of the
device (sram_visible), rather than the device ID, so the check is not
specific to devices of the same type.
Jack Xu [Fri, 6 Nov 2020 11:27:54 +0000 (19:27 +0800)]
crypto: qat - introduce chip info structure
Introduce the chip info structure which contains device specific
information. The initialization path has been split between common and
hardware specific in order to facilitate the introduction of the next
generation hardware.
Jack Xu [Fri, 6 Nov 2020 11:27:53 +0000 (19:27 +0800)]
crypto: qat - refactor long expressions
Replace long expressions with local variables in the functions
qat_uclo_wr_uimage_page(), qat_uclo_init_globals() and
qat_uclo_init_umem_seg() to improve readability.
Jack Xu [Fri, 6 Nov 2020 11:27:51 +0000 (19:27 +0800)]
crypto: qat - move defines to header files
Move the definition of ICP_QAT_AE_OFFSET, ICP_QAT_CAP_OFFSET,
LOCAL_TO_XFER_REG_OFFSET and ICP_QAT_EP_OFFSET from qat_hal.c to
icp_qat_hal.h to avoid the definition of generation specific constants
in qat_hal.c.
Jack Xu [Fri, 6 Nov 2020 11:27:50 +0000 (19:27 +0800)]
crypto: qat - remove global CSRs helpers
Include the offset of GLOBAL_CSR directly into the enum hal_global_csr
and remove the macros SET_GLB_CSR/GET_GLB_CSR to simplify the global CSR
access.
Jack Xu [Fri, 6 Nov 2020 11:27:49 +0000 (19:27 +0800)]
crypto: qat - refactor AE start
Change the API and the behaviour of the qat_hal_start() function.
With this change, the function starts under the hood all acceleration
engines (AEs) and there is no longer need to call it for each engine.
Jack Xu [Fri, 6 Nov 2020 11:27:46 +0000 (19:27 +0800)]
crypto: qat - add support for relative FW ucode loading
Improve the way micro instructions (FW code) are uploaded to Accelerator
Engines (AEs). If code starts at PC zero (absolute addressing), read
uwords with no relative address. Otherwise, use relative addressing to
the page region.
Jack Xu [Fri, 6 Nov 2020 11:27:41 +0000 (19:27 +0800)]
crypto: qat - fix CSR access
Do not mask the AE number with the AE mask when accessing the AE local
CSRs. Bit 12 of the local CSR address is the start of AE number so just
take out the AE mask here.
Jack Xu [Fri, 6 Nov 2020 11:27:40 +0000 (19:27 +0800)]
crypto: qat - fix status check in qat_hal_put_rel_rd_xfer()
The return value of qat_hal_rd_ae_csr() is always a CSR value and never
a status and should not be stored in the status variable of
qat_hal_put_rel_rd_xfer().
This removes the assignment as qat_hal_rd_ae_csr() is not expected to
fail.
A more comprehensive handling of the theoretical corner case which could
result in a fail will be submitted in a separate patch.
Implement infrastructure for the Multiple Object File (MOF) format
in the firmware loader. This will allow to load a specific firmware
image contained inside an MOF file.
This patch is based on earlier work done by Pingchao Yang.
Herbert Xu [Fri, 6 Nov 2020 06:53:52 +0000 (17:53 +1100)]
crypto: cavium/nitrox - Fix sparse warnings
This patch fixes all the sparse warnings in cavium/nitrox:
- Fix endianness warnings by adding the correct markers to unions.
- Add missing header inclusions for prototypes.
- Move nitrox_sriov_configure prototype into the isr header file.
Ard Biesheuvel [Tue, 3 Nov 2020 16:28:09 +0000 (17:28 +0100)]
crypto: arm/chacha-neon - optimize for non-block size multiples
The current NEON based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it makes sense to consider arbitrary length
inputs as well.
For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.
Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.
This results in the following performance improvements for 1420 byte
blocks, without significant impact on power-of-2 input sizes. (Note
that Raspberry Pi is widely used in combination with a 32-bit kernel,
even though the core is 64-bit capable)
crypto: Kconfig - CRYPTO_MANAGER_EXTRA_TESTS requires the manager
The extra tests in the manager actually require the manager to be
selected too. Otherwise the linker gives errors like:
ld: arch/x86/crypto/chacha_glue.o: in function `chacha_simd_stream_xor':
chacha_glue.c:(.text+0x422): undefined reference to `crypto_simd_disabled_for_test'
Fixes: 2343d1529aff ("crypto: Kconfig - allow tests to be disabled when manager is disabled") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
At the time xts fallback tfm allocation fails the device struct
hasn't been enabled yet in the caam xts tfm's private context.
Fix this by using the device struct from xts algorithm's private context
or, when not available, by replacing dev_err with pr_err.
Fixes: 9d9b14dbe077 ("crypto: caam/jr - add fallback for XTS with more than 8B IV") Fixes: 83e8aa912138 ("crypto: caam/qi - add fallback for XTS with more than 8B IV") Fixes: 36e2d7cfdcf1 ("crypto: caam/qi2 - add fallback for XTS with more than 8B IV") Signed-off-by: Horia Geantă <[email protected]> Reviewed-by: Iuliana Prodan <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
Weili Qian [Sat, 31 Oct 2020 09:07:02 +0000 (17:07 +0800)]
crypto: hisilicon/qm - modify the return type of function
The returns of 'qm_get_hw_error_status' and 'qm_get_dev_err_status'
are values from the hardware registers, which should not be defined
as 'int', so update as 'u32'.
Nigel Christian [Thu, 29 Oct 2020 00:52:17 +0000 (20:52 -0400)]
hwrng: imx-rngc - irq already prints an error
Clean up the check for irq. dev_err() is superfluous as
platform_get_irq() already prints an error. Check for zero
would indicate a bug. Remove curly braces to conform to
styling requirements. Signed-off-by: Nigel Christian <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
Horia Geantă [Wed, 28 Oct 2020 09:03:20 +0000 (11:03 +0200)]
crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback
Loading the module deadlocks since:
-local cbc(aes) implementation needs a fallback and
-crypto API tries to find one but the request_module() resolves back to
the same module
Fix this by changing the module alias for cbc(aes) and
using the NEED_FALLBACK flag when requesting for a fallback algorithm.
Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc encryption path") Signed-off-by: Horia Geantă <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
Ard Biesheuvel [Mon, 26 Oct 2020 23:00:27 +0000 (00:00 +0100)]
crypto: arm64/poly1305-neon - reorder PAC authentication with SP update
PAC pointer authentication signs the return address against the value
of the stack pointer, to prevent stack overrun exploits from corrupting
the control flow. However, this requires that the AUTIASP is issued with
SP holding the same value as it held when the PAC value was generated.
The Poly1305 NEON code got this wrong, resulting in crashes on PAC
capable hardware.
Commit 3f69cc60768b ("crypto: af_alg - Allow arbitrarily long algorithm
names") made the kernel start accepting arbitrarily long algorithm names
in sockaddr_alg. However, the actual length of the salg_name field
stayed at the original 64 bytes.
This is broken because the kernel can access indices >= 64 in salg_name,
which is undefined behavior -- even though the memory that is accessed
is still located within the sockaddr structure. It would only be
defined behavior if the array were properly marked as arbitrary-length
(either by making it a flexible array, which is the recommended way
these days, or by making it an array of length 0 or 1).
We can't simply change salg_name into a flexible array, since that would
break source compatibility with userspace programs that embed
sockaddr_alg into another struct, or (more commonly) declare a
sockaddr_alg like 'struct sockaddr_alg sa = { .salg_name = "foo" };'.
One solution would be to change salg_name into a flexible array only
when '#ifdef __KERNEL__'. However, that would keep userspace without an
easy way to actually use the longer algorithm names.
Instead, add a new structure 'sockaddr_alg_new' that has the flexible
array field, and expose it to both userspace and the kernel.
Make the kernel use it correctly in alg_bind().
This addresses the syzbot report
"UBSAN: array-index-out-of-bounds in alg_bind"
(https://syzkaller.appspot.com/bug?extid=92ead4eb8e26a26d465e).
Use the new crypto_engine_alloc_init_and_set() function to
initialize crypto-engine and enable retry mechanism.
Set the maximum size for crypto-engine software queue based on
Job Ring size (JOBR_DEPTH) and a threshold (reserved for the
non-crypto-API requests that are not passed through crypto-engine).
The callback for do_batch_requests is NULL, since CAAM
doesn't support linked requests.
Eric Biggers [Mon, 26 Oct 2020 16:31:12 +0000 (09:31 -0700)]
crypto: testmgr - WARN on test failure
Currently, by default crypto self-test failures only result in a
pr_warn() message and an "unknown" status in /proc/crypto. Both of
these are easy to miss. There is also an option to panic the kernel
when a test fails, but that can't be the default behavior.
A crypto self-test failure always indicates a kernel bug, however, and
there's already a standard way to report (recoverable) kernel bugs --
the WARN() family of macros. WARNs are noisier and harder to miss, and
existing test systems already know to look for them in dmesg or via
/proc/sys/kernel/tainted.
Therefore, call WARN() when an algorithm fails its self-tests.
Eric Biggers [Mon, 26 Oct 2020 16:17:02 +0000 (09:17 -0700)]
crypto: testmgr - always print the actual skcipher driver name
When alg_test() is called from tcrypt.ko rather than from the algorithm
registration code, "driver" is actually the algorithm name, not the
driver name. So it shouldn't be used in places where a driver name is
wanted, e.g. when reporting a test failure or when checking whether the
driver is the generic driver or not.
Fix this for the skcipher algorithm tests by getting the driver name
from the crypto_skcipher that actually got allocated.
Eric Biggers [Mon, 26 Oct 2020 16:17:01 +0000 (09:17 -0700)]
crypto: testmgr - always print the actual AEAD driver name
When alg_test() is called from tcrypt.ko rather than from the algorithm
registration code, "driver" is actually the algorithm name, not the
driver name. So it shouldn't be used in places where a driver name is
wanted, e.g. when reporting a test failure or when checking whether the
driver is the generic driver or not.
Fix this for the AEAD algorithm tests by getting the driver name from
the crypto_aead that actually got allocated.
Eric Biggers [Mon, 26 Oct 2020 16:17:00 +0000 (09:17 -0700)]
crypto: testmgr - always print the actual hash driver name
When alg_test() is called from tcrypt.ko rather than from the algorithm
registration code, "driver" is actually the algorithm name, not the
driver name. So it shouldn't be used in places where a driver name is
wanted, e.g. when reporting a test failure or when checking whether the
driver is the generic driver or not.
Fix this for the hash algorithm tests by getting the driver name from
the crypto_ahash or crypto_shash that actually got allocated.
Arvind Sankar [Sun, 25 Oct 2020 14:31:18 +0000 (10:31 -0400)]
crypto: lib/sha256 - Unroll SHA256 loop 8 times intead of 64
This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tested on Broadwell Xeon).
Arvind Sankar [Sun, 25 Oct 2020 14:31:17 +0000 (10:31 -0400)]
crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()
The temporary W[] array is currently zeroed out once every call to
sha256_transform(), i.e. once every 64 bytes of input data. Moving it to
sha256_update() instead so that it is cleared only once per update can
save about 2-3% of the total time taken to compute the digest, with a
reasonable memset() implementation, and considerably more (~20%) with a
bad one (eg the x86 purgatory currently uses a memset() coded in C).
The assignments to clear a through h and t1/t2 are optimized out by the
compiler because they are unused after the assignments.
Clearing individual scalar variables is unlikely to be useful, as they
may have been assigned to registers, and even if stack spilling was
required, there may be compiler-generated temporaries that are
impossible to clear in any case.