Ard Biesheuvel [Tue, 5 Jan 2021 16:47:58 +0000 (17:47 +0100)]
crypto: x86/twofish - drop CTR mode implementation
Twofish in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
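As a purely illustrative sketch (not part of the patch), a "ctr(twofish)" request is then served by instantiating the generic CTR template around the bare cipher:

    /* hypothetical caller; the template/cipher composition is resolved by the crypto API */
    #include <crypto/skcipher.h>
    #include <linux/err.h>

    static int example_alloc_ctr_twofish(void)
    {
        struct crypto_skcipher *tfm = crypto_alloc_skcipher("ctr(twofish)", 0, 0);

        if (IS_ERR(tfm))
            return PTR_ERR(tfm);
        crypto_free_skcipher(tfm);
        return 0;
    }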
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:57 +0000 (17:47 +0100)]
crypto: x86/cast6 - drop CTR mode implementation
CAST6 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:56 +0000 (17:47 +0100)]
crypto: x86/cast5 - drop CTR mode implementation
CAST5 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:55 +0000 (17:47 +0100)]
crypto: x86/serpent - drop CTR mode implementation
Serpent in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:54 +0000 (17:47 +0100)]
crypto: x86/camellia - drop CTR mode implementation
Camellia in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:52 +0000 (17:47 +0100)]
crypto: x86/twofish - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Twofish in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.
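For illustration only (assuming the generic "xts" template is resolved on top of the accelerated ECB implementation), callers keep allocating the same algorithm name and the composition happens behind the scenes:

    #include <crypto/skcipher.h>
    #include <linux/err.h>
    #include <linux/types.h>

    static int example_alloc_xts_twofish(void)
    {
        u8 key[64] = { 1 };    /* XTS takes two keys: data key and tweak key */
        struct crypto_skcipher *tfm = crypto_alloc_skcipher("xts(twofish)", 0, 0);

        if (IS_ERR(tfm))
            return PTR_ERR(tfm);
        crypto_skcipher_setkey(tfm, key, sizeof(key));
        crypto_free_skcipher(tfm);
        return 0;
    }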
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:51 +0000 (17:47 +0100)]
crypto: x86/serpent - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Serpent in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:50 +0000 (17:47 +0100)]
crypto: x86/cast6 - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement CAST6 in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:49 +0000 (17:47 +0100)]
crypto: x86/camellia - switch to XTS template
Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Camellia in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.
Kai Ye [Tue, 5 Jan 2021 06:16:42 +0000 (14:16 +0800)]
crypto: hisilicon - add ZIP device using mode parameter
Add an 'uacce_mode' parameter for ZIP, which can be set to 0 (default) or 1.
'0' means ZIP is only registered to kernel crypto, and '1' means it's
registered to both kernel crypto and UACCE.
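A minimal sketch of how such a switch is typically wired up (illustrative, not the driver's exact code):

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    static int uacce_mode;    /* 0: kernel crypto only, 1: kernel crypto + UACCE */
    module_param(uacce_mode, int, 0444);
    MODULE_PARM_DESC(uacce_mode, "0 (default): kernel crypto only; 1: also register with UACCE");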
Kai Ye [Tue, 5 Jan 2021 06:12:03 +0000 (14:12 +0800)]
crypto: hisilicon/qm - SVA bugfixed on Kunpeng920
Kunpeng920 SEC/HPRE/ZIP cannot support running user space SVA and kernel
Crypto at the same time. Therefore, the algorithms should not be registered
to Crypto when user space SVA is enabled.
Jiri Olsa [Mon, 4 Jan 2021 23:02:37 +0000 (00:02 +0100)]
crypto: bcm - Rename struct device_private to bcm_device_private
Renaming 'struct device_private' to 'struct bcm_device_private',
because it clashes with 'struct device_private' from
'drivers/base/base.h'.
While it's not a functional problem, it's causing two distinct
type hierarchies in BTF data. It also breaks build with options:
CONFIG_DEBUG_INFO_BTF=y
CONFIG_CRYPTO_DEV_BCM_SPU=y
Wojciech Ziemba [Mon, 4 Jan 2021 16:55:46 +0000 (16:55 +0000)]
crypto: qat - configure arbiter mapping based on engines enabled
The hardware specific function adf_get_arbiter_mapping() modifies
the static array thrd_to_arb_map to disable mappings for AEs
that are disabled. This static array is used for each device
of the same type. If the AE mask is not identical for all devices
of the same type then the arbiter mapping returned by
adf_get_arbiter_mapping() may be wrong.
This patch fixes this problem by ensuring the static arbiter
mapping is unchanged and the device arbiter mapping is re-calculated
each time based on the static mapping.
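The idea, sketched with illustrative names (this is not the QAT API): leave the static table untouched and derive a per-device copy, masking out entries for acceleration engines that are disabled on this device.

    #include <linux/bitops.h>
    #include <linux/types.h>

    #define EXAMPLE_NUM_AES 12

    static void example_build_arb_mapping(u32 *dev_map, const u32 *static_map,
                                          unsigned long ae_mask)
    {
        int i;

        for (i = 0; i < EXAMPLE_NUM_AES; i++)
            dev_map[i] = test_bit(i, &ae_mask) ? static_map[i] : 0;
    }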
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:50 +0000 (16:55 +0100)]
crypto: aesni - replace function pointers with static branches
Replace the function pointers in the GCM implementation with static branches,
which are based on code patching that occurs only at module load time.
This avoids the severe performance penalty caused by the use of retpolines.
In order to retain the ability to switch between different versions of the
implementation based on the input size on cores that support AVX and AVX2,
use static branches instead of static calls.
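A hedged sketch of the pattern (names are illustrative, not the aesni-intel symbols): a static key flipped once at module load selects the AVX path without any indirect call on the hot path.

    #include <linux/init.h>
    #include <linux/jump_label.h>
    #include <asm/cpufeature.h>

    static DEFINE_STATIC_KEY_FALSE(example_gcm_use_avx);

    static int __init example_init(void)
    {
        if (boot_cpu_has(X86_FEATURE_AVX))
            static_branch_enable(&example_gcm_use_avx);    /* code patched once, at load time */
        return 0;
    }

    static void example_crypt(void)
    {
        if (static_branch_likely(&example_gcm_use_avx)) {
            /* direct call into the AVX routine */
        } else {
            /* direct call into the SSE routine */
        }
    }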
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:49 +0000 (16:55 +0100)]
crypto: aesni - refactor scatterlist processing
Currently, the gcm(aes-ni) driver open codes the scatterlist handling
that is encapsulated by the skcipher walk API. So let's switch to that
instead.
Also, move the handling at the end of gcmaes_crypt_by_sg() that is
dependent on whether we are encrypting or decrypting into the callers,
which always do one or the other.
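A trimmed sketch of the walk pattern being adopted (error handling and AEAD specifics omitted): the walk API hands back virtually mapped chunks, so the driver no longer has to open-code the scatterlist traversal.

    #include <crypto/internal/skcipher.h>

    static int example_process(struct skcipher_request *req)
    {
        struct skcipher_walk walk;
        int err = skcipher_walk_virt(&walk, req, false);

        while (walk.nbytes) {
            /* crypt walk.nbytes bytes from walk.src.virt.addr to walk.dst.virt.addr */
            err = skcipher_walk_done(&walk, 0);    /* 0 bytes left unprocessed */
        }
        return err;
    }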
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:46 +0000 (16:55 +0100)]
crypto: aesni - prevent misaligned buffers on the stack
The GCM mode driver uses 16 byte aligned buffers on the stack to pass
the IV to the asm helpers, but unfortunately, the x86 port does not
guarantee that the stack pointer is 16 byte aligned upon entry in the
first place. Since the compiler is not aware of this, it will not emit
the additional stack realignment sequence that is needed, and so the
alignment is not guaranteed to be more than 8 bytes.
So instead, allocate some padding on the stack, and realign the IV
pointer by hand.
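The realignment trick, as an illustrative sketch: over-allocate the buffer and round the pointer up, rather than relying on the compiler-provided stack alignment.

    #include <linux/kernel.h>
    #include <linux/string.h>
    #include <linux/types.h>

    static void example_call_asm_with_aligned_iv(const u8 *iv_in)
    {
        u8 iv_buf[16 + 15];                    /* 16-byte IV plus alignment padding */
        u8 *iv = PTR_ALIGN(&iv_buf[0], 16);    /* realign by hand */

        memcpy(iv, iv_in, 16);
        /* 'iv' can now be handed to the asm helper that expects 16-byte alignment */
    }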
Ard Biesheuvel [Thu, 31 Dec 2020 16:41:55 +0000 (17:41 +0100)]
crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper
The AES-NI driver implements XTS via the glue helper, which consumes
a struct with sets of function pointers which are invoked on chunks
of input data of the appropriate size, as annotated in the struct.
Let's get rid of this indirection, so that we can perform direct calls
to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e.,
provide a helper which can consume inputs of any size, provided that the
penultimate, full block is passed via the last call if ciphertext stealing
needs to be applied.
This also allows us to enable the XTS mode for i386.
Ard Biesheuvel [Thu, 31 Dec 2020 16:41:54 +0000 (17:41 +0100)]
crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.
Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.
As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on an Intel(R) Core(TM) i7-8650U CPU).
Rob Herring [Thu, 10 Dec 2020 20:03:14 +0000 (14:03 -0600)]
crypto: picoxcell - Remove PicoXcell driver
PicoXcell has had nothing but treewide cleanups for at least the last 8
years and no signs of activity. The most recent activity is a yocto vendor
kernel based on v3.0 in 2015.
On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
SHA-256, and slightly faster than SHA-1. It is also almost three times
as fast as the generic implementation of BLAKE2b:
This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S. At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP). It does
only one block at a time, so it performs well on short messages too.
NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.
Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s. This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it. The best I've been able to do with BLAKE2s
on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.
(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)
For now this BLAKE2b implementation is only wired up to the shash API,
since there is no library API for BLAKE2b yet. However, I've tried to
keep things consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.
Eric Biggers [Wed, 23 Dec 2020 08:10:02 +0000 (00:10 -0800)]
crypto: blake2b - update file comment
The file comment for blake2b_generic.c makes it sound like it's the
reference implementation of BLAKE2b with only minor changes. But it's
actually been changed a lot. Update the comment to make this clearer.
Eric Biggers [Wed, 23 Dec 2020 08:10:01 +0000 (00:10 -0800)]
crypto: blake2b - sync with blake2s implementation
Sync the BLAKE2b code with the BLAKE2s code as much as possible:
- Move a lot of code into new headers <crypto/blake2b.h> and
<crypto/internal/blake2b.h>, and adjust it to be like the
corresponding BLAKE2s code, i.e. like <crypto/blake2s.h> and
<crypto/internal/blake2s.h>.
- Rename constants, e.g. BLAKE2B_*_DIGEST_SIZE => BLAKE2B_*_HASH_SIZE.
- Use a macro BLAKE2B_ALG() to define the shash_alg structs.
- Export blake2b_compress_generic() for use as a fallback.
This makes it much easier to add optimized implementations of BLAKE2b,
as optimized implementations can use the helper functions
crypto_blake2b_{setkey,init,update,final}() and
blake2b_compress_generic(). The ARM implementation will use these.
But this change is also helpful because it eliminates unnecessary
differences between the BLAKE2b and BLAKE2s code, so that the same
improvements can easily be made to both. (The two algorithms are
basically identical, except for the word size and constants.) It also
makes it straightforward to add a library API for BLAKE2b in the future
if/when it's needed.
This change does make the BLAKE2b code slightly more complicated than it
needs to be, as it doesn't actually provide a library API yet. For
example, __blake2b_update() doesn't really need to exist yet; it could
just be inlined into crypto_blake2b_update(). But I believe this is
outweighed by the benefits of keeping the code in sync.
Eric Biggers [Wed, 23 Dec 2020 08:09:59 +0000 (00:09 -0800)]
crypto: arm/blake2s - add ARM scalar optimized BLAKE2s
Add an ARM scalar optimized implementation of BLAKE2s.
NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help. Each NEON instruction would depend on the
previous one, resulting in poor performance.
With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation that runs much faster than the C implementation.
Performance results on Cortex-A7 in cycles per byte using the shash API:
Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,
and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively.
However, optimized BLAKE2s is useful for cases where BLAKE2s is used
instead of BLAKE2b, such as WireGuard.
This new implementation is added in the form of a new module
blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it
provides blake2s_compress_arch() for use by the library API and also
optionally registers the algorithms with the shash API.
Eric Biggers [Wed, 23 Dec 2020 08:09:55 +0000 (00:09 -0800)]
crypto: blake2s - optimize blake2s initialization
If no key was provided, then don't waste time initializing the block
buffer, as its initial contents won't be used.
Also, make crypto_blake2s_init() and blake2s() call a single internal
function __blake2s_init() which treats the key as optional, rather than
conditionally calling blake2s_init() or blake2s_init_key(). This
reduces the compiled code size, as previously both blake2s_init() and
blake2s_init_key() were being inlined into these two callers, except
when the key size passed to blake2s() was a compile-time constant.
These optimizations aren't that significant for BLAKE2s. However, the
equivalent optimizations will be more significant for BLAKE2b, as
everything is twice as big in BLAKE2b. And it's good to keep things
consistent rather than making optimizations for BLAKE2b but not BLAKE2s.
Eric Biggers [Wed, 23 Dec 2020 08:09:54 +0000 (00:09 -0800)]
crypto: blake2s - share the "shash" API boilerplate code
Add helper functions for shash implementations of BLAKE2s to
include/crypto/internal/blake2s.h, taking advantage of
__blake2s_update() and __blake2s_final() that were added by the previous
patch to share more code between the library and shash implementations.
crypto_blake2s_setkey() and crypto_blake2s_init() are usable as
shash_alg::setkey and shash_alg::init directly, while
crypto_blake2s_update() and crypto_blake2s_final() take an extra
'blake2s_compress_t' function pointer parameter. This allows the
implementation of the compression function to be overridden, which is
the only part that optimized implementations really care about.
The new functions are inline functions (similar to those in sha1_base.h,
sha256_base.h, and sm3_base.h) because this avoids needing to add a new
module blake2s_helpers.ko, they aren't *too* long, and this avoids
indirect calls which are expensive these days. Note that they can't go
in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S
from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency.
Finally, use these new helper functions in the x86 implementation of
BLAKE2s. (This part should be a separate patch, but unfortunately the
x86 implementation used the exact same function names like
"crypto_blake2s_update()", so it had to be updated at the same time.)
Eric Biggers [Wed, 23 Dec 2020 08:09:53 +0000 (00:09 -0800)]
crypto: blake2s - move update and final logic to internal/blake2s.h
Move most of blake2s_update() and blake2s_final() into new inline
functions __blake2s_update() and __blake2s_final() in
include/crypto/internal/blake2s.h so that this logic can be shared by
the shash helper functions. This will avoid duplicating this logic
between the library and shash implementations.
Eric Biggers [Wed, 23 Dec 2020 08:09:52 +0000 (00:09 -0800)]
crypto: blake2s - remove unneeded includes
It doesn't make sense for the generic implementation of BLAKE2s to
include <crypto/internal/simd.h> and <linux/jump_label.h>, as these are
things that would only be useful in an architecture-specific
implementation. Remove these unnecessary includes.
Eric Biggers [Wed, 23 Dec 2020 08:09:51 +0000 (00:09 -0800)]
crypto: x86/blake2s - define shash_alg structs using macros
The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size. So, avoid
code duplication by using a macro to define these structs.
Eric Biggers [Wed, 23 Dec 2020 08:09:50 +0000 (00:09 -0800)]
crypto: blake2s - define shash_alg structs using macros
The shash_alg structs for the four variants of BLAKE2s are identical
except for the algorithm name, driver name, and digest size. So, avoid
code duplication by using a macro to define these structs.
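An illustrative sketch of the macro approach (field list trimmed; the real macro fills in the full shash_alg, including the update/final callbacks):

    #include <crypto/internal/hash.h>

    #define EXAMPLE_BLAKE2S_ALG(name, driver_name, digest_size)    \
        {                                                          \
            .base.cra_name        = name,                          \
            .base.cra_driver_name = driver_name,                   \
            .digestsize           = digest_size,                   \
        }

    static struct shash_alg example_algs[] = {
        EXAMPLE_BLAKE2S_ALG("blake2s-128", "blake2s-128-example", 16),
        EXAMPLE_BLAKE2S_ALG("blake2s-256", "blake2s-256-example", 32),
    };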
Matthias Brugger [Fri, 18 Dec 2020 10:57:08 +0000 (11:57 +0100)]
hwrng: iproc-rng200 - Move enable/disable in separate function
We call the same code to enable and disable the block in various
parts of the driver. Put that code into a new function to reduce code
duplication.
Matthias Brugger [Fri, 18 Dec 2020 10:57:07 +0000 (11:57 +0100)]
hwrng: iproc-rng200 - Fix disable of the block.
When trying to disable the block, we bitwise-or the control
register with the value zero. This is confusing, as a bitwise-or with
zero has no effect at all. Drop it, since we already set
the enable bit to zero by applying the inverted RNG_RBGEN_MASK.
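Both iproc-rng200 changes roughly converge on a single helper of the following shape (register offsets and values here are illustrative; only RNG_RBGEN_MASK is taken from the patch description):

    #include <linux/io.h>
    #include <linux/types.h>

    #define EXAMPLE_RNG_CTRL_OFFSET  0x00
    #define EXAMPLE_RNG_RBGEN_MASK   0x00001fff

    static void example_rng200_enable_set(void __iomem *base, bool enable)
    {
        u32 val = ioread32(base + EXAMPLE_RNG_CTRL_OFFSET);

        val &= ~EXAMPLE_RNG_RBGEN_MASK;    /* clearing the bits is enough to disable */
        if (enable)
            val |= 1;                      /* illustrative enable value */
        iowrite32(val, base + EXAMPLE_RNG_CTRL_OFFSET);
    }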
Ard Biesheuvel [Thu, 17 Dec 2020 18:55:16 +0000 (19:55 +0100)]
crypto: arm64/aes-ctr - improve tail handling
Counter mode is a stream cipher chaining mode that is typically used
with inputs of arbitrary length, and so a tail block that is smaller than
a full AES block is the rule rather than the exception.
The current ctr(aes) implementation for arm64 always makes a separate
call into the assembler routine to process this tail block, which is
suboptimal, given that it requires reloading of the AES round keys,
and prevents us from handling this tail block using the 5-way stride
that we use for better performance on deep pipelines.
So let's update the assembler routine so it can handle any input size,
and uses NEON permutation instructions and overlapping loads and stores
to handle the tail block. This results in a ~16% speedup for 1420 byte
blocks on cores with deep pipelines such as ThunderX2.
Ard Biesheuvel [Thu, 17 Dec 2020 18:55:15 +0000 (19:55 +0100)]
crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
Commit 69b6f2e817e5b ("crypto: arm64/aes-neon - limit exposed routines if
faster driver is enabled") intended to hide modes from the plain NEON
driver that are also implemented by the faster bit sliced NEON one if
both are enabled. However, the defined() CPP function does not detect
if the bit sliced NEON driver is enabled as a module. So instead, let's
use IS_ENABLED() here.
Fixes: 69b6f2e817e5b ("crypto: arm64/aes-neon - limit exposed routines if ...")
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
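The distinction the fix relies on, sketched: defined(CONFIG_FOO) is only true for =y (a modular option defines CONFIG_FOO_MODULE instead), whereas IS_ENABLED() covers both cases.

    #include <linux/kconfig.h>

    #if defined(CONFIG_CRYPTO_AES_ARM64_BS)
    /* reached only when the bit sliced driver is built in (=y) */
    #endif

    #if IS_ENABLED(CONFIG_CRYPTO_AES_ARM64_BS)
    /* reached for both =y and =m, which is what is wanted here */
    #endif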
Add HMAC support to the Keem Bay OCS HCU driver, thus making it provide
the following additional transformations:
- hmac(sha256)
- hmac(sha384)
- hmac(sha512)
- hmac(sm3)
The Keem Bay OCS HCU hardware does not allow "context-switch" for HMAC
operations, i.e., it does not support computing a partial HMAC, save its
state and then continue it later. Therefore, full hardware acceleration
is provided only when possible (e.g., when crypto_ahash_digest() is
called); in all other cases hardware acceleration is only partial (OPAD
and IPAD calculation is done in software, while hashing is hardware
accelerated).
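For reference, the software part follows the standard HMAC construction; a minimal sketch of the ipad/opad precomputation (generic HMAC, not the driver's code, and assuming the key has already been reduced to at most one block):

    #include <linux/string.h>
    #include <linux/types.h>

    #define EXAMPLE_BLOCK_SIZE 64    /* e.g. the SHA-256 block size */

    static void example_hmac_pads(u8 *ipad, u8 *opad, const u8 *key, unsigned int keylen)
    {
        unsigned int i;

        memset(ipad, 0, EXAMPLE_BLOCK_SIZE);
        memcpy(ipad, key, keylen);
        memcpy(opad, ipad, EXAMPLE_BLOCK_SIZE);

        for (i = 0; i < EXAMPLE_BLOCK_SIZE; i++) {
            ipad[i] ^= 0x36;
            opad[i] ^= 0x5c;
        }
        /* HMAC(K, m) = H(opad || H(ipad || m)); the two hashes can be offloaded */
    }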
Declan Murphy [Wed, 16 Dec 2020 11:46:36 +0000 (11:46 +0000)]
crypto: keembay - Add Keem Bay OCS HCU driver
Add support for the Hashing Control Unit (HCU) included in the Offload
Crypto Subsystem (OCS) of the Intel Keem Bay SoC, thus enabling
hardware-accelerated hashing on the Keem Bay SoC for the following
algorithms:
- sha256
- sha384
- sha512
- sm3
The driver is composed of two files:
- 'ocs-hcu.c' which interacts with the hardware and abstracts it by
providing an API following the usual paradigm used in hashing drivers
/ libraries (e.g., hash_init(), hash_update(), hash_final(), etc.).
NOTE: this API can block and sleep, since completions are used to wait
for the HW to complete the hashing.
- 'keembay-ocs-hcu-core.c' which exports the functionality provided by
'ocs-hcu.c' as an ahash crypto driver. The crypto engine is used to
provide asynchronous behavior. 'keembay-ocs-hcu-core.c' also takes
care of the DMA mapping of the input sg list.
The driver passes crypto manager self-tests, including the extra tests
(CRYPTO_MANAGER_EXTRA_TESTS=y).
Corentin Labbe [Mon, 14 Dec 2020 20:02:30 +0000 (20:02 +0000)]
crypto: sun4i-ss - fix kmap usage
With the recent kmap change, some tests which were conditional on
CONFIG_DEBUG_HIGHMEM are now enabled by default.
This permits detecting a problem in sun4i-ss's usage of kmap.
sun4i-ss uses two kmaps via sg_miter (one for input, one for output), but
using two kmaps at the same time is hard:
"the ordering has to be correct and with sg_miter that's probably hard to get
right." (quoting tglx)
So the easiest solution is to never have two sg_miter/kmap open at the same time.
After each use of sg_miter, I store the current index so that sg_miter can
be resumed at the right place.
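A hedged sketch of that pattern (illustrative, not the sun4i-ss code): only one sg_miter is open at a time, and the saved offset lets the next pass skip what was already consumed.

    #include <linux/scatterlist.h>

    static void example_consume_some(struct scatterlist *sg, unsigned int nents,
                                     unsigned int *resume_offset)
    {
        struct sg_mapping_iter mi;

        sg_miter_start(&mi, sg, nents, SG_MITER_FROM_SG);
        sg_miter_skip(&mi, *resume_offset);    /* resume where we left off */

        if (sg_miter_next(&mi)) {
            /* use mi.addr / mi.length here */
            *resume_offset += mi.length;
        }

        sg_miter_stop(&mi);    /* drop the kmap before mapping the other side */
    }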
Corentin Labbe [Mon, 14 Dec 2020 20:02:27 +0000 (20:02 +0000)]
crypto: sun4i-ss - IV register does not work on A10 and A13
Allwinner A10 and A13 SoCs have a version of the SS which produces an
invalid IV in the IVx register.
Instead of adding a variant for those, let's convert the SS to produce the
IV directly from the data.
Fixes: 6298e948215f2 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <[email protected]>
Signed-off-by: Corentin Labbe <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Corentin Labbe [Mon, 14 Dec 2020 20:02:26 +0000 (20:02 +0000)]
crypto: sun4i-ss - checking sg length is not sufficient
The optimized cipher function needs a length that is a multiple of 4 bytes,
but it sometimes gets an odd length.
This is because SG data can be stored with an offset.
So the fix is to also check that the offset is aligned to 4 bytes.
Fixes: 6298e948215f2 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Cc: <[email protected]>
Signed-off-by: Corentin Labbe <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
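In spirit, the added check looks like this (illustrative helper, not the patch itself):

    #include <linux/kernel.h>
    #include <linux/scatterlist.h>

    static bool example_sg_usable_by_fast_path(const struct scatterlist *sg)
    {
        /* both the length and the starting offset must be 4-byte aligned */
        return IS_ALIGNED(sg->length, 4) && IS_ALIGNED(sg->offset, 4);
    }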
Remove dev_err() messages after platform_get_irq*() failures.
drivers/crypto/inside-secure/safexcel.c: line 1161 is redundant
because platform_get_irq() already prints an error.
Ard Biesheuvel [Fri, 11 Dec 2020 12:27:15 +0000 (13:27 +0100)]
crypto: remove cipher routines from public crypto API
The cipher routines in the crypto API are mostly intended for templates
implementing skcipher modes generically in software, and shouldn't be
used outside of the crypto subsystem. So move the prototypes and all
related definitions to a new header file under include/crypto/internal.
Also, let's use the new module namespace feature to move the symbol
exports into a new namespace CRYPTO_INTERNAL.
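The namespace mechanics, sketched (the exported symbol shown is illustrative): exports move into the CRYPTO_INTERNAL namespace, and any remaining legitimate user must import it explicitly.

    #include <linux/module.h>

    /* in the crypto core (illustrative):
     * EXPORT_SYMBOL_NS_GPL(crypto_cipher_encrypt_one, CRYPTO_INTERNAL);
     */

    /* in an in-kernel user that still needs the bare cipher API: */
    MODULE_IMPORT_NS(CRYPTO_INTERNAL);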
Ard Biesheuvel [Fri, 11 Dec 2020 12:27:14 +0000 (13:27 +0100)]
chcr_ktls: use AES library for single use cipher
Allocating a cipher via the crypto API only to free it again after using
it to encrypt a single block is unnecessary in cases where the algorithm
is known at compile time. So replace this pattern with a call to the AES
library.
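The library pattern referred to above, as a hedged sketch: expand the key and encrypt a single block directly, with no crypto API allocation or teardown.

    #include <crypto/aes.h>
    #include <linux/string.h>

    static int example_encrypt_one_block(const u8 *key, unsigned int keylen,
                                         const u8 in[AES_BLOCK_SIZE],
                                         u8 out[AES_BLOCK_SIZE])
    {
        struct crypto_aes_ctx ctx;
        int err = aes_expandkey(&ctx, key, keylen);

        if (err)
            return err;
        aes_encrypt(&ctx, out, in);
        memzero_explicit(&ctx, sizeof(ctx));    /* don't leave round keys on the stack */
        return 0;
    }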
Ard Biesheuvel [Tue, 8 Dec 2020 14:34:41 +0000 (15:34 +0100)]
crypto: tcrypt - avoid signed overflow in byte count
The signed long type used for printing the number of bytes processed in
tcrypt benchmarks limits the range to -/+ 2 GiB, which is not sufficient
to cover the performance of common accelerated ciphers such as AES-NI
when benchmarked with sec=1. So switch to u64 instead.
While at it, fix up a missing printk->pr_cont conversion in the AEAD
benchmark.
Ard Biesheuvel [Mon, 7 Dec 2020 23:34:02 +0000 (00:34 +0100)]
crypto: aesni - implement support for cts(cbc(aes))
Follow the same approach as the arm64 driver for implementing a version
of AES-NI in CBC mode that supports ciphertext stealing. This results in
a ~2x speed increase for relatively short inputs (less than 256 bytes),
which is relevant given that AES-CBC with ciphertext stealing is used
for filename encryption in the fscrypt layer. For larger inputs, the
speedup is still significant (~25% on decryption, ~6% on encryption)
MAINTAINERS: crypto: s5p-sss: drop Kamil Konieczny
E-mails to Kamil Konieczny's Samsung address bounce with 550 (User
unknown). Kamil no longer takes care of the Samsung S5P SSS driver, so
remove the invalid email address from:
- mailmap,
- bindings maintainer entries,
- maintainers entry for S5P Security Subsystem crypto accelerator.
Ard Biesheuvel [Sat, 2 Jan 2021 13:59:09 +0000 (14:59 +0100)]
crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
Pavel reports that commit 17858b140bf4 ("crypto: ecdh - avoid unaligned
accesses in ecdh_set_secret()") fixes one problem but introduces another:
the unconditional memcpy() introduced by that commit may overflow the
target buffer if the source data is invalid, which could be the result of
intentional tampering.
So check params.key_size explicitly against the size of the target buffer
before validating the key further.
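The shape of the fix, sketched with an illustrative context structure (not the ecdh code itself): the size is validated against the fixed-size destination before anything is copied.

    #include <linux/errno.h>
    #include <linux/string.h>
    #include <linux/types.h>

    struct example_ecdh_ctx {
        u64 private_key[8];    /* fixed-size target buffer */
    };

    static int example_set_secret(struct example_ecdh_ctx *ctx,
                                  const void *key, unsigned int key_size)
    {
        if (key_size > sizeof(ctx->private_key))
            return -EINVAL;    /* reject tampered/oversized input up front */

        memcpy(ctx->private_key, key, key_size);
        return 0;
    }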
Commit 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block
size multiples") refactored the chacha block handling in the glue code in
a way that may result in the counter increment to be omitted when calling
chacha_block_xor_neon() to process a full block. This violates the skcipher
API, which requires that the output IV is suitable for handling more input
as long as the preceding input has been presented in round multiples of the
block size. Also, the same code is exposed via the chacha library interface
whose callers may actually rely on this increment to occur even for final
blocks that are smaller than the chacha block size.
So increment the counter after calling chacha_block_xor_neon().
Fixes: 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples")
Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Linus Torvalds [Sun, 27 Dec 2020 18:56:33 +0000 (10:56 -0800)]
proc mountinfo: make splice available again
Since commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops") we've required that file operation structures explicitly
enable splice support, rather than falling back to the default handlers.
Most /proc files use the indirect 'struct proc_ops' to describe their
file operations, and were fixed up to support splice earlier in commits 40be821d627c..b24c30c67863, but the mountinfo files interact with the
VFS directly using their own 'struct file_operations' and got missed as
a result.
This adds the necessary support for splice to work for /proc/*/mountinfo
and friends.
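Roughly, the change amounts to wiring the generic splice helper into the file_operations these files define themselves (sketch; hooks not relevant to splice are omitted):

    #include <linux/fs.h>
    #include <linux/seq_file.h>

    static const struct file_operations example_mountinfo_fops = {
        .read_iter   = seq_read_iter,
        .splice_read = generic_file_splice_read,
        .llseek      = seq_lseek,
        /* .open / .release: the seq_file helpers used by the real files */
    };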
Linus Torvalds [Sun, 27 Dec 2020 17:03:41 +0000 (09:03 -0800)]
Merge tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Update/fix two CPU sanity checks in the hotplug and the boot code, and
fix a typo in the Kconfig help text.
[ Context: the first two commits are the result of an ongoing
annotation+review work of (intentional) tick_do_timer_cpu() data
races reported by KCSAN, but the annotations aren't fully cooked
yet ]"
* tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix spelling mistake in Kconfig "fullfill" -> "fulfill"
tick/sched: Remove bogus boot "safety" check
tick: Remove pointless cpu valid check in hotplug code
Linus Torvalds [Sat, 26 Dec 2020 17:19:49 +0000 (09:19 -0800)]
mfd: ab8500-debugfs: Remove extraneous seq_putc
Commit c9a3c4e637ac ("mfd: ab8500-debugfs: Remove extraneous curly
brace") removed a left-over curly brace that caused build failures, but
Joe Perches points out that the subsequent 'seq_putc()' should also be
removed, because the commit that caused all these problems already added
the final '\n' to the seq_printf() above it.
PCI: dwc: Fix inverted condition of DMA mask setup warning
Commit 660c486590aa ("PCI: dwc: Set 32-bit DMA mask for MSI target address
allocation") added dma_mask_set() call to explicitly set 32-bit DMA mask
for MSI message mapping, but for now it throws a warning on ret == 0, while
dma_set_mask() returns 0 in case of success.
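The inverted condition, sketched: since dma_set_mask() returns 0 on success, the warning has to fire on a non-zero return value.

    #include <linux/device.h>
    #include <linux/dma-mapping.h>

    static void example_setup_msi_dma(struct device *dev)
    {
        int ret = dma_set_mask(dev, DMA_BIT_MASK(32));

        if (ret)    /* non-zero means failure; the buggy code warned on success */
            dev_warn(dev, "Failed to set 32-bit DMA mask\n");
    }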
0001:00:00.0 PCI bridge: Molex Incorporated Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
0005:00:00.0 PCI bridge: Molex Incorporated Device 1ad0 (rev a1)
0005:01:00.0 PCI bridge: PLX Technology, Inc. Device 3380 (rev ab)
0005:02:02.0 PCI bridge: PLX Technology, Inc. Device 3380 (rev ab)
0005:03:00.0 USB controller: PLX Technology, Inc. Device 3380 (rev ab)
The problem seems to be dw_pcie_setup_rc() is now called twice before and
after the link up handling. The fix is to move Tegra's link up handling to
.start_link() function like other DWC drivers. Tegra is a bit more
complicated than others as it re-inits the whole DWC controller to retry
the link. With this, the initialization ordering is restored to match the
prior sequence.
clang (quite rightly) complains fairly loudly about the newly added
mpc1_get_mpc_out_mux() function returning an uninitialized value if the
'opp_id' checks don't pass.
This may not happen in practice, but the code really shouldn't return
garbage if the sanity checks don't pass.
So just initialize 'val' to zero to avoid the issue.
Linus Torvalds [Fri, 25 Dec 2020 19:07:34 +0000 (11:07 -0800)]
Merge tag 'perf-tools-2020-12-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Refactor 'perf stat' per CPU/socket/die/thread aggregation fixing use
cases in ARM machines.
- Fix memory leak when synthesizing SDT probes in 'perf probe'.
- Update kernel header copies related to KVM, epoll_pwait, msr-index, and
the powerpc and s390 syscall tables.
* tag 'perf-tools-2020-12-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (24 commits)
perf probe: Fix memory leak when synthesizing SDT probes
perf stat aggregation: Add separate thread member
perf stat aggregation: Add separate core member
perf stat aggregation: Add separate die member
perf stat aggregation: Add separate socket member
perf stat aggregation: Add separate node member
perf stat aggregation: Start using cpu_aggr_id in map
perf cpumap: Drop in cpu_aggr_map struct
perf cpumap: Add new map type for aggregation
perf stat: Replace aggregation ID with a struct
perf cpumap: Add new struct for cpu aggregation
perf cpumap: Use existing allocator to avoid using malloc
perf tests: Improve topology test to check all aggregation types
perf tools: Update s390's syscall.tbl copy from the kernel sources
perf tools: Update powerpc's syscall.tbl copy from the kernel sources
perf s390: Move syscall.tbl check into check-headers.sh
perf powerpc: Move syscall.tbl check to check-headers.sh
tools headers UAPI: Synch KVM's svm.h header with the kernel
tools kvm headers: Update KVM headers from the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
...
Linus Torvalds [Fri, 25 Dec 2020 19:05:32 +0000 (11:05 -0800)]
Merge branch 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux
Pull coccinelle updates from Julia Lawall.
* 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
scripts: coccicheck: Correct usage of make coccicheck
coccinelle: update expiring email addresses
coccinnelle: Remove ptr_ret script
kbuild: do not use scripts/ld-version.sh for checking spatch version
remove boolinit.cocci
Michael Ellerman [Fri, 25 Dec 2020 11:30:58 +0000 (22:30 +1100)]
genirq: Fix export of irq_to_desc() for powerpc KVM
Commit 64a1b95bb9fe ("genirq: Restrict export of irq_to_desc()") removed
the export of irq_to_desc() unless powerpc KVM is being built, because
there is still a use of irq_to_desc() in modular code there.
However it used:
#ifdef CONFIG_KVM_BOOK3S_64_HV
Which doesn't work when that symbol is =m, leading to a build failure:
Linus Torvalds [Fri, 25 Dec 2020 18:54:29 +0000 (10:54 -0800)]
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted patches from previous cycle(s)..."
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix hostfs_open() use of ->f_path.dentry
Make sure that make_create_in_sticky() never sees uninitialized value of dir_mode
fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set
fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()
fs/namespace.c: WARN if mnt_count has become negative
Linus Torvalds [Thu, 24 Dec 2020 22:20:33 +0000 (14:20 -0800)]
Merge tag 'docs-5.11-2' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A small set of late-arriving, small documentation fixes"
* tag 'docs-5.11-2' of git://git.lwn.net/linux:
docs: admin-guide: Fix default value of max_map_count in sysctl/vm.rst
Documentation/submitting-patches: Document the SoB chain
Documentation: process: Correct numbering
docs: submitting-patches: Trivial - fix grammatical error
Linus Torvalds [Thu, 24 Dec 2020 22:16:02 +0000 (14:16 -0800)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Various bug fixes and cleanups for ext4; no new features this cycle"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
ext4: remove unnecessary wbc parameter from ext4_bio_write_page
ext4: avoid s_mb_prefetch to be zero in individual scenarios
ext4: defer saving error info from atomic context
ext4: simplify ext4 error translation
ext4: move functions in super.c
ext4: make ext4_abort() use __ext4_error()
ext4: standardize error message in ext4_protect_reserved_inode()
ext4: remove redundant sb checksum recomputation
ext4: don't remount read-only with errors=continue on reboot
ext4: fix deadlock with fs freezing and EA inodes
jbd2: add a helper to find out number of fast commit blocks
ext4: make fast_commit.h byte identical with e2fsprogs/fast_commit.h
ext4: fix fall-through warnings for Clang
ext4: add docs about fast commit idempotence
ext4: remove the unused EXT4_CURRENT_REV macro
ext4: fix an IS_ERR() vs NULL check
ext4: check for invalid block size early when mounting a file system
ext4: fix a memory leak of ext4_free_data
ext4: delete nonsensical (commented-out) code inside ext4_xattr_block_set()
ext4: update ext4_data_block_valid related comments
...
Linus Torvalds [Thu, 24 Dec 2020 22:08:43 +0000 (14:08 -0800)]
Merge tag 'Smack-for-5.11-io_uring-fix' of git://github.com/cschaufler/smack-next
Pull smack fix from Casey Schaufler:
"Provide a fix for the incorrect handling of privilege in the face of
io_uring's use of kernel threads. That invalidated a long-standing
assumption regarding the privilege of kernel threads.
The fix is simple and safe. It was provided by Jens Axboe and has been
tested"
* tag 'Smack-for-5.11-io_uring-fix' of git://github.com/cschaufler/smack-next:
Smack: Handle io_uring kernel thread privileges
Linus Torvalds [Thu, 24 Dec 2020 22:02:00 +0000 (14:02 -0800)]
Merge tag 'powerpc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Four commits fixing various things in the new C VDSO code
- One fix for a 32-bit VMAP stack bug
- Two minor build fixes
Thanks to Cédric Le Goater, Christophe Leroy, and Will Springer.
* tag 'powerpc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32: Fix vmap stack - Properly set r1 before activating MMU on syscall too
powerpc/vdso: Fix DOTSYM for 32-bit LE VDSO
powerpc/vdso: Don't pass 64-bit ABI cflags to 32-bit VDSO
powerpc/vdso: Block R_PPC_REL24 relocations
powerpc/smp: Add __init to init_big_cores()
powerpc/time: Force inlining of get_tb()
powerpc/boot: Fix build of dts/fsl