Linus Torvalds [Fri, 12 Nov 2021 19:44:31 +0000 (11:44 -0800)]
Merge tag 'rtc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"This includes new ioctls to get and set parameters and in particular
the backup switch mode that is needed for some RTCs to actually enable
the backup voltage (and have a useful RTC).
The same interface can also be used to get the actual features
supported by the RTC so userspace has a better way than trying and
failing.
Summary:
Subsystem:
- Add new ioctl to get and set extra RTC parameters, this includes
backup switch mode
- Expose available features to userspace, in particular, when alarmas
have a resolution of one minute instead of a second.
- Let the core handle those alarms with a minute resolution
New driver:
- MSTAR MSC313 RTC
Drivers:
- Add SPI ID table where necessary
- Add BSM support for rv3028, rv3032 and pcf8523
- s3c: set RTC range
- rx8025: set range, implement .set_offset and .read_offset"
* tag 'rtc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
rtc: rx8025: use .set_offset/.read_offset
rtc: rx8025: use rtc_add_group
rtc: rx8025: clear RTC_FEATURE_ALARM when alarm are not supported
rtc: rx8025: set range
rtc: rx8025: let the core handle the alarm resolution
rtc: rx8025: switch to devm_rtc_allocate_device
rtc: ab8500: let the core handle the alarm resolution
rtc: ab-eoz9: support UIE when available
rtc: ab-eoz9: use RTC_FEATURE_UPDATE_INTERRUPT
rtc: rv3032: let the core handle the alarm resolution
rtc: s35390a: let the core handle the alarm resolution
rtc: handle alarms with a minute resolution
rtc: pcf85063: silence cppcheck warning
rtc: rv8803: fix writing back ctrl in flag register
rtc: s3c: Add time range
rtc: s3c: Extract read/write IO into separate functions
rtc: s3c: Remove usage of devm_rtc_device_register()
rtc: tps80031: Remove driver
rtc: sun6i: Allow probing without an early clock provider
rtc: pcf8523: add BSM support
...
Dave Jones [Fri, 29 Oct 2021 20:57:59 +0000 (16:57 -0400)]
x86/mce: Add errata workaround for Skylake SKX37
Errata SKX37 is word-for-word identical to the other errata listed in
this workaround. I happened to notice this after investigating a CMCI
storm on a Skylake host. While I can't confirm this was the root cause,
spurious corrected errors does sound like a likely suspect.
Steve French [Wed, 10 Nov 2021 09:15:29 +0000 (03:15 -0600)]
smb3: do not setup the fscache_super_cookie until fsinfo initialized
We were calling cifs_fscache_get_super_cookie after tcon but before
we queried the info (QFS_Info) we need to initialize the cookie
properly. Also includes an additional check suggested by Paulo
to make sure we don't initialize super cookie twice.
Sasha Levin [Fri, 12 Nov 2021 15:16:02 +0000 (10:16 -0500)]
tools/lib/lockdep: drop liblockdep
TL;DR: While a tool like liblockdep is useful, it probably doesn't
belong within the kernel tree.
liblockdep attempts to reuse kernel code both directly (by directly
building the kernel's lockdep code) as well as indirectly (by using
sanitized headers). This makes liblockdep an integral part of the
kernel.
It also makes liblockdep quite unique: while other userspace code might
use sanitized headers, it generally doesn't attempt to use kernel code
directly which means that changes on the kernel side of things don't
affect (and break) it directly.
All our workflows and tooling around liblockdep don't support this
uniqueness. Changes that go into the kernel code aren't validated to not
break in-tree userspace code.
liblockdep ended up being very fragile, breaking over and over, to the
point that living in the same tree as the lockdep code lost most of it's
value.
liblockdep should continue living in an external tree, syncing with
the kernel often, in a controllable way.
Paulo Alcantara [Fri, 12 Nov 2021 18:16:08 +0000 (15:16 -0300)]
cifs: fix potential use-after-free bugs
Ensure that share and prefix variables are set to NULL after kfree()
when looping through DFS targets in __tree_connect_dfs_target().
Also, get rid of @ref in __tree_connect_dfs_target() and just pass a
boolean to indicate whether we're handling link targets or not.
Fixes: c88f7dcd6d64 ("cifs: support nested dfs links over reconnect") Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches") Signed-off-by: Paulo Alcantara (SUSE) <[email protected]> Signed-off-by: Steve French <[email protected]>
Steve French [Fri, 12 Nov 2021 15:55:03 +0000 (09:55 -0600)]
cifs: release lock earlier in dequeue_mid error case
In dequeue_mid we can log an error while holding a spinlock,
GlobalMid_Lock. Coverity notes that the error logging
also grabs a lock so it is cleaner (and a bit safer) to
release the GlobalMid_Lock before logging the warning.
Linus Torvalds [Fri, 12 Nov 2021 18:56:25 +0000 (10:56 -0800)]
thermal: int340x: fix build on 32-bit targets
Commit aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot
64 bit RFIM responses") started using 'readq()' to read 64-bit status
responses from the int340x hardware.
That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is
ambiguous, since it's no longer an atomic access. Some hardware might
require 64-bit accesses, and other hardware might want low word first or
high word first.
It's quite likely that the driver isn't relevant in a 32-bit environment
any more, and there's a patch floating around to just make it depend on
X86_64, but let's make it buildable on x86-32 anyway.
The driver previously just read the low 32 bits, so the hardware
certainly is ok with 32-bit reads, and in a little-endian environment
the low word first model is the natural one.
So just add the include for the 'io-64-nonatomic-lo-hi.h' version.
Fixes: aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses") Reported-by: Jakub Kicinski <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Paul Moore [Fri, 12 Nov 2021 17:07:02 +0000 (12:07 -0500)]
net,lsm,selinux: revert the security_sctp_assoc_established() hook
This patch reverts two prior patches, e7310c94024c
("security: implement sctp_assoc_established hook in selinux") and 7c2ef0240e6a ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation. Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.
Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel. In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.
Ming Lei [Fri, 12 Nov 2021 12:47:15 +0000 (20:47 +0800)]
blk-mq: fix filesystem I/O request allocation
submit_bio_checks() may update bio->bi_opf, so we have to initialize
blk_mq_alloc_data.cmd_flags with bio->bi_opf after submit_bio_checks()
returns when allocating new request.
In case of using cached request, fallback to allocate new request if
cached rq isn't compatible with the incoming bio, otherwise change
rq->cmd_flags with incoming bio->bi_opf.
Marc Zyngier [Fri, 12 Nov 2021 14:10:39 +0000 (14:10 +0000)]
of/irq: Don't ignore interrupt-controller when interrupt-map failed
Since 041284181226 ("of/irq: Allow matching of an interrupt-map local
to an interrupt controller"), the irq code favors using an interrupt-map
over a interrupt-controller property if both are available, while the
earlier behaviour was to ignore the interrupt-map altogether.
However, we now end-up with the opposite behaviour, which is to
ignore the interrupt-controller property even if the interrupt-map
fails to match its input. This new behaviour breaks the AmigaOne
X1000 machine, which ships with an extremely "creative" (read:
broken) device tree.
Fix this by allowing the interrupt-controller property to be selected
when interrupt-map fails to match anything.
Guo Ren [Fri, 5 Nov 2021 09:47:48 +0000 (17:47 +0800)]
irqchip/sifive-plic: Fixup EOI failed when masked
When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver,
only the first interrupt is handled, and following interrupts are never
delivered (initially reported in [1]).
That's because the RISC-V PLIC cannot EOI masked interrupts, as explained
in the description of Interrupt Completion in the PLIC spec [2]:
<quote>
The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored.
</quote>
Re-enable the interrupt before completion if it has been masked during
the handling, and remask it afterwards.
The mask/unmask must be implemented, and enable/disable supplement
them if the HW requires something different at startup time. When
irq source is disabled by mask, mpintc could complete irq normally.
Introduction of map_uid made two lookups from outer map to be distinct.
That distinction is only necessary when inner map has an embedded timer.
Otherwise it will make the verifier state pruning to be conservative
which will cause complex programs to hit 1M insn_processed limit.
Tighten map_uid logic to apply to inner maps with timers only.
Magnus Karlsson [Thu, 11 Nov 2021 07:57:07 +0000 (08:57 +0100)]
xsk: Fix crash on double free in buffer pool
Fix a crash in the buffer pool allocator when a buffer is double
freed. It is possible to trigger this behavior not only from a faulty
driver, but also from user space like this: Create a zero-copy AF_XDP
socket. Load an XDP program that will issue XDP_DROP for all
packets. Put the same umem buffer into the fill ring multiple times,
then bind the socket and send some traffic. This will crash the kernel
as the XDP_DROP action triggers one call to xsk_buff_free()/xp_free()
for every packet dropped. Each call will add the corresponding buffer
entry to the free_list and increase the free_list_cnt. Some entries
will have been added multiple times due to the same buffer being
freed. The buffer allocation code will then traverse this broken list
and since the same buffer is in the list multiple times, it will try
to delete the same buffer twice from the list leading to a crash.
The fix for this is just to test that the buffer has not been added
before in xp_free(). If it has been, just return from the function and
do not put it in the free_list a second time.
Note that this bug was not present in the code before the commit
referenced in the Fixes tag. That code used one list entry per
allocated buffer, so multiple frees did not have any side effects. But
the commit below optimized the usage of the pool and only uses a
single entry per buffer in the umem, meaning that multiple
allocations/frees of the same buffer will also only use one entry,
thus leading to the problem.
Ian Rogers [Thu, 4 Nov 2021 06:41:47 +0000 (23:41 -0700)]
perf test: Use macro for "suite" declarations
Currently tests are setup in builtin-test with function pointers. Kunit
exposes tests as a kunit_suite with a null terminated array of test
cases. Use a macro to aid transition from one to the other in later
changes.
perf beauty: Add socket level scnprintf that handles ARCH specific SOL_SOCKET
SOL_SOCKET has a different value according to the architecture, some
have it as 0xffff while all the others have it as 1, so a simple string
array isn't usable, add a scnprintf routine that treats it as a special
case, using the array for other values.
So the simple straight STRARRAY method is not enough as SOL_SOCKET is
'1' in most architectures but some use 0xffff (alpha, mips, parisc and
sparc), so a followup patch will create a specialized scnprintf to cover
that.
So the simple straight STRARRAY method is not enough as SOL_SOCKET is
'1' in most architectures but some use 0xffff (alpha, mips, parisc and
sparc), so a followup patch will create a specialized scnprintf to cover
that.
perf beauty socket: Rename header_dir to uapi_header_dir
Paving the way to pass more headers to be consumed, like
tools/perf/trace/beauty/include/linux/socket.h in addition to the
current tools/include/uapi/linux/in.h, to get the SOL_* defines.
perf beauty: Rename socket_ipproto.sh to socket.sh to hold more socket table generators
To avoid having to add new entries to tools/perf/Makefile.perf prep
socket.sh so that it can generate other socket table generators, such as
the upcoming SOL_ socket level one.
perf beauty: Make all sockaddr files use a common naming scheme
The script that generates the tables was named 'socket.sh', which is
confusing, rename it to sockaddr.sh and make sure the related
Makefile.perf targets also use the 'sockaddr' namespace.
Arnd Bergmann [Sat, 6 Nov 2021 18:42:29 +0000 (19:42 +0100)]
ARM: 9156/1: drop cc-option fallbacks for architecture selection
Naresh and Antonio ran into a build failure with latest Debian
armhf compilers, with lots of output like
tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode
As it turns out, $(cc-option) fails early here when the FPU is not
selected before CPU architecture is selected, as the compiler
option check runs before enabling -msoft-float, which causes
a problem when testing a target architecture level without an FPU:
cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU
Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this
issue, but the fallback logic is already broken because all supported
compilers (gcc-5 and higher) are much more recent than these options,
and building with -march=armv5t as a fallback no longer works.
The best way forward that I see is to just remove all the checks, which
also has the nice side-effect of slightly improving the startup time for
'make'.
The -mtune=marvell-f option was apparently never supported by any mainline
compiler, and the custom Codesourcery gcc build that did support is
now too old to build kernels, so just use -mtune=xscale unconditionally
for those.
This should be safe to apply on all stable kernels, and will be required
in order to keep building them with gcc-11 and higher.
Michał Mirosław [Thu, 4 Nov 2021 16:28:28 +0000 (17:28 +0100)]
ARM: 9155/1: fix early early_iounmap()
Currently __set_fixmap() bails out with a warning when called in early boot
from early_iounmap(). Fix it, and while at it, make the comment a bit easier
to understand.
Bio will be checked at beginning of submit_bio_noacct(). If bio needs
to be throttled, it will start the timer and stop submit bio directly.
Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires.
But in the current process, if bio is throttled, it will still set bio
issue->value by blkcg_bio_issue_init(). This is redundant and may cause
the above use-after-free.
CPU0 CPU1
submit_bio
submit_bio_noacct
submit_bio_checks
blk_throtl_bio()
<=mod_timer(&sq->pending_timer
blk_throtl_dispatch_work_fn
submit_bio_noacct() <= bio have
throttle tag, will throw directly
and bio issue->value will be set
here
bio_endio()
bio_put()
bio_free() <= free this bio
blkcg_bio_issue_init(bio)
<= bio has been freed and
will lead to UAF
return BLK_QC_T_NONE
Paolo Bonzini [Fri, 12 Nov 2021 09:02:24 +0000 (04:02 -0500)]
KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from
Use the same cleanup code independent of whether the cgroup to be
uncharged and unref'd is the source or the destination cgroup. Use a
bool to track whether the destination cgroup has been charged, which also
fixes a bug in the error case: the destination cgroup must be uncharged
only if it does not match the source.
Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Signed-off-by: Paolo Bonzini <[email protected]>
Paolo Bonzini [Fri, 12 Nov 2021 07:53:41 +0000 (02:53 -0500)]
KVM: x86: move guest_pv_has out of user_access section
When UBSAN is enabled, the code emitted for the call to guest_pv_has
includes a call to __ubsan_handle_load_invalid_value. objtool
complains that this call happens with UACCESS enabled; to avoid
the warning, pull the calls to user_access_begin into both arms
of the "if" statement, after the check for guest_pv_has.
David Heidelberg [Fri, 29 Oct 2021 14:24:42 +0000 (16:24 +0200)]
dt-bindings: watchdog: sunxi: fix error in schema
"maxItems" is not needed with an "items" list
Fixes:
$ DT_SCHEMA_FILES=Documentation/devicetree/bindings/watchdog/allwinner,sun4i-a10-wdt.yaml make dtbs_check
Documentation/devicetree/bindings/watchdog/allwinner,sun4i-a10-wdt.yaml: properties:clocks: {'required': ['maxItems']} is not allowed for {'minItems': 1, 'maxItems': 2, 'items': [{'description': 'High-frequency oscillator input, divided internally'}, {'description': 'Low-frequency oscillator input, only found on some variants'}]}
hint: "maxItems" is not needed with an "items" list
from schema $id: http://devicetree.org/meta-schemas/items.yaml#
...
bindings: media: venus: Drop redundant maxItems for power-domain-names
make dt_binding_check:
Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml: ignoring, error in schema: properties: power-domain-names
warning: no schema found in file: Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml
Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml: properties:power-domain-names: {'required': ['maxItems']} is not allowed for {'minItems': 2, 'maxItems': 3, 'items': [{'const': 'venus'}, {'const': 'vcodec0'}, {'const': 'cx'}]}
hint: "maxItems" is not needed with an "items" list
from schema $id: http://devicetree.org/meta-schemas/items.yaml#
Rob Herring [Tue, 9 Nov 2021 16:46:50 +0000 (10:46 -0600)]
clk: versatile: clk-icst: Ensure clock names are unique
Commit 2d3de197a818 ("ARM: dts: arm: Update ICST clock nodes 'reg' and
node names") moved to using generic node names. That results in trying
to register multiple clocks with the same name. Fix this by including
the unit-address in the clock name.
Rob Herring [Tue, 9 Nov 2021 16:46:49 +0000 (10:46 -0600)]
of: Support using 'mask' in making device bus id
Commit 25b892b583cc ("ARM: dts: arm: Update register-bit-led nodes
'reg' and node names") added a 'reg' property to nodes. This change has
the side effect of changing how the kernel generates the device name.
The assumption was a translatable 'reg' address is unique. However, in
the case of the register-bit-led binding (and a few others) that is not
the case. The 'mask' property must also be used in this case to make a
unique device name.
Patrice Chotard [Wed, 10 Nov 2021 15:01:44 +0000 (16:01 +0100)]
dt-bindings: treewide: Update @st.com email address to @foss.st.com
Not all @st.com email address are concerned, only people who have
a specific @foss.st.com email will see their entry updated.
For some people, who left the company, remove their email.
David Heidelberg [Fri, 29 Oct 2021 14:11:33 +0000 (16:11 +0200)]
dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz
clock-frequency is only restricted by the upper limit of 400 kHz.
Found with:
$ DT_SCHEMA_FILES=Documentation/devicetree/bindings/i2c/i2c-imx.yaml make dtbs_check
...
arch/arm64/boot/dts/freescale/imx8mq-librem5-r2.dt.yaml: i2c@30a20000: clock-frequency:0:0: 387000 is not one of [100000, 400000]
From schema: linux/Documentation/devicetree/bindings/i2c/i2c-imx.yaml
...
Namjae Jeon [Tue, 2 Nov 2021 23:08:44 +0000 (08:08 +0900)]
ksmbd: remove smb2_buf_length in smb2_hdr
To move smb2_hdr to smbfs_common, This patch remove smb2_buf_length
variable in smb2_hdr. Also, declare smb2_get_msg function to get smb2
request/response from ->request/response_buf.
Namjae Jeon [Sun, 31 Oct 2021 00:53:50 +0000 (09:53 +0900)]
ksmbd: set unique value to volume serial field in FS_VOLUME_INFORMATION
Steve French reported ksmbd set fixed value to volume serial field in
FS_VOLUME_INFORMATION. Volume serial value needs to be set to a unique
value for client fscache. This patch set crc value that is generated
with share name, path name and netbios name to volume serial.
Jens Axboe [Fri, 12 Nov 2021 00:32:53 +0000 (17:32 -0700)]
io-wq: serialize hash clear with wakeup
We need to ensure that we serialize the stalled and hash bits with the
wait_queue wait handler, or we could be racing with someone modifying
the hashed state after we find it busy, but before we then give up and
wait for it to be cleared. This can cause random delays or stalls when
handling buffered writes for many files, where some of these files cause
hash collisions between the worker threads.
Linus Torvalds [Thu, 11 Nov 2021 23:10:18 +0000 (15:10 -0800)]
Merge tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Revert conversion to struct device.driver instead of struct
pci_dev.driver.
The device.driver is set earlier, and using it caused the PCI core to
call driver PM entry points before .probe() and after .remove(), when
the driver isn't prepared.
This caused NULL pointer dereferences in i2c_designware_pci and
probably other driver issues"
* tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"
Revert "PCI: Remove struct pci_dev->driver"
Damien Le Moal [Mon, 8 Nov 2021 23:45:25 +0000 (08:45 +0900)]
libata: add horkage for missing Identify Device log
ACS-3 introduced the ATA Identify Device Data log as mandatory. A
warning message currently signals to the user if a device does not
report supporting this log page in the log directory page, regardless
of the ATA version of the device. Furthermore, this warning will appear
for all attempts at accessing this missing log page during device
revalidation.
Since it is useless to constantly access the log directory and warn
about this lack of support once we have discovered that the device
does not support this log page, introduce the horkage flag
ATA_HORKAGE_NO_ID_DEV_LOG to mark a device as lacking support for
the Identify Device Data log page. Set this flag when
ata_log_supported() returns false in ata_identify_page_supported().
The warning is printed only if the device ATA level is 10 or above
(ACS-3 or above), and only once on device scan. With this flag set, the
log directory page is not accessed again to test for Identify Device
Data log page support.
Linus Torvalds [Thu, 11 Nov 2021 23:00:04 +0000 (15:00 -0800)]
Merge tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
"This contains initialization fixups, testing improvements, addition of
instruction pointer to data-race reports, and scoped data-race checks"
* tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: selftest: Cleanup and add missing __init
kcsan: Move ctx to start of argument list
kcsan: Support reporting scoped read-write access type
kcsan: Start stack trace with explicit location if provided
kcsan: Save instruction pointer for scoped accesses
kcsan: Add ability to pass instruction pointer of access to reporting
kcsan: test: Fix flaky test case
kcsan: test: Use kunit_skip() to skip tests
kcsan: test: Defer kcsan_test_init() after kunit initialization
Linus Torvalds [Thu, 11 Nov 2021 22:47:32 +0000 (14:47 -0800)]
Merge tag 'apparmor-pr-2021-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"Features
- use per file locks for transactional queries
- update policy management capability checks to work with LSM stacking
Bug Fixes:
- check/put label on apparmor_sk_clone_security()
- fix error check on update of label hname
- fix introspection of of task mode for unconfined tasks
Cleanups:
- avoid -Wempty-body warning
- remove duplicated 'Returns:' comments
- fix doc warning
- remove unneeded one-line hook wrappers
- use struct_size() helper in kzalloc()
- fix zero-length compiler warning in AA_BUG()
- file.h: delete duplicated word
- delete repeated words in comments
- remove repeated declaration"
* tag 'apparmor-pr-2021-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: remove duplicated 'Returns:' comments
apparmor: remove unneeded one-line hook wrappers
apparmor: Use struct_size() helper in kzalloc()
apparmor: fix zero-length compiler warning in AA_BUG()
apparmor: use per file locks for transactional queries
apparmor: fix doc warning
apparmor: Remove the repeated declaration
apparmor: avoid -Wempty-body warning
apparmor: Fix internal policy capable check for policy management
apparmor: fix error check
security: apparmor: delete repeated words in comments
security: apparmor: file.h: delete duplicated word
apparmor: switch to apparmor to internal capable check for policy management
apparmor: update policy capable checks to use a label
apparmor: fix introspection of of task mode for unconfined tasks
apparmor: check/put label on apparmor_sk_clone_security()
Linus Torvalds [Thu, 11 Nov 2021 22:31:47 +0000 (14:31 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
"The post-linux-next material.
7 patches.
Subsystems affected by this patch series (all mm): debug,
slab-generic, migration, memcg, and kasan"
* emailed patches from Andrew Morton <[email protected]>:
kasan: add kasan mode messages when kasan init
mm: unexport {,un}lock_page_memcg
mm: unexport folio_memcg_{,un}lock
mm/migrate.c: remove MIGRATE_PFN_LOCKED
mm: migrate: simplify the file-backed pages validation when migrating its mapping
mm: allow only SLUB on PREEMPT_RT
mm/page_owner.c: modify the type of argument "order" in some functions
Linus Torvalds [Thu, 11 Nov 2021 22:22:05 +0000 (14:22 -0800)]
Merge tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"Only two changes.
One removes the now unused CONFIG_MCPU32 symbol. The other sets a
default for the CONFIG_MEMORY_RESERVE config symbol (this aids
scripting and other automation) so you don't interactively get asked
for a value at configure time.
Summary:
- remove unused CONFIG_MCPU32 symbol
- default CONFIG_MEMORY_RESERVE value (for scripting)"
* tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: Remove MCPU32 config symbol
m68k: set a default value for MEMORY_RESERVE
Robert reported a NULL pointer dereference caused by the PCI core
(local_pci_probe()) calling the i2c_designware_pci driver's
.runtime_resume() method before the .probe() method. i2c_dw_pci_resume()
depends on initialization done by i2c_dw_pci_probe().
Prior to 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
pci_dev->driver"), pci_pm_runtime_resume() avoided calling the
.runtime_resume() method because pci_dev->driver had not been set yet.
2a4d9408c9e8 and b5f9c644eb1b ("PCI: Remove struct pci_dev->driver"),
removed pci_dev->driver, replacing it by device->driver, which *has* been
set by this time, so pci_pm_runtime_resume() called the .runtime_resume()
method when it previously had not.
Revert b5f9c644eb1b ("PCI: Remove struct pci_dev->driver"), which is needed
to revert 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
pci_dev->driver").
2a4d9408c9e8 caused a NULL pointer dereference reported by Robert Święcki.
Details in the revert of that commit.
When BLKRESETZONE ioctl and data read race, the data read leaves stale
page cache. The commit e5113505904e ("block: Discard page cache of zone
reset target range") added page cache truncation to avoid stale page
cache after the ioctl. However, the stale page cache still can be read
during the reset zone operation for the ioctl. To avoid the stale page
cache completely, hold invalidate_lock of the block device file mapping.
Linus Torvalds [Thu, 11 Nov 2021 18:16:33 +0000 (10:16 -0800)]
Merge tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two locking fixes:
- Add mutex protection to ring_buffer_reset()
- Fix deadlock in modify_ftrace_direct_multi()"
* tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/direct: Fix lockup in modify_ftrace_direct_multi
ring-buffer: Protect ring_buffer_reset() from reentrancy
- net: fix possible NULL deref in sock_reserve_memory
- amt: fix error return code in amt_init(); fix stopping the
workqueue
- ax88796c: use the correct ioctl callback
Previous releases - always broken:
- bpf: stop caching subprog index in the bpf_pseudo_func insn
- security: fixups for the security hooks in sctp
- nfc: add necessary privilege flags in netlink layer, limit
operations to admin only
- vsock: prevent unnecessary refcnt inc for non-blocking connect
- net/smc: fix sk_refcnt underflow on link down and fallback
- nfnetlink_queue: fix OOB when mac header was cleared
- can: j1939: ignore invalid messages per standard
- bpf, sockmap:
- fix race in ingress receive verdict with redirect to self
- fix incorrect sk_skb data_end access when src_reg = dst_reg
- strparser, and tls are reusing qdisc_skb_cb and colliding
- ethtool: fix ethtool msg len calculation for pause stats
- vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to
access an unregistering real_dev
- udp6: make encap_rcv() bump the v6 not v4 stats
- drv: prestera: add explicit padding to fix m68k build
- drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge
- drv: mvpp2: fix wrong SerDes reconfiguration order
Misc & small latecomers:
- ipvs: auto-load ipvs on genl access
- mctp: sanity check the struct sockaddr_mctp padding fields
- libfs: support RENAME_EXCHANGE in simple_rename()
- avoid double accounting for pure zerocopy skbs"
* tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits)
selftests/net: udpgso_bench_rx: fix port argument
net: wwan: iosm: fix compilation warning
cxgb4: fix eeprom len when diagnostics not implemented
net: fix premature exit from NAPI state polling in napi_disable()
net/smc: fix sk_refcnt underflow on linkdown and fallback
net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer()
gve: fix unmatched u64_stats_update_end()
net: ethernet: lantiq_etop: Fix compilation error
selftests: forwarding: Fix packet matching in mirroring selftests
vsock: prevent unnecessary refcnt inc for nonblocking connect
net: marvell: mvpp2: Fix wrong SerDes reconfiguration order
net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory
net: stmmac: allow a tc-taprio base-time of zero
selftests: net: test_vxlan_under_vrf: fix HV connectivity test
net: hns3: allow configure ETS bandwidth of all TCs
net: hns3: remove check VF uc mac exist when set by PF
net: hns3: fix some mac statistics is always 0 in device version V2
net: hns3: fix kernel crash when unload VF while it is being reset
net: hns3: sync rx ring head in echo common pull
net: hns3: fix pfc packet number incorrect after querying pfc parameters
...
Linus Torvalds [Thu, 11 Nov 2021 17:44:29 +0000 (09:44 -0800)]
Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is a single fix for 5.16-rc1 to resolve a build problem that came
in through the coresight tree (and as such came in through the
char/misc tree merge in the 5.16-rc1 merge window).
It resolves a build problem with 'allmodconfig' on arm64 and is acked
by the proper subsystem maintainers. It has been in linux-next all
week with no reported problems"
* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
arm64: cpufeature: Export this_cpu_has_cap helper
Linus Torvalds [Thu, 11 Nov 2021 17:40:15 +0000 (09:40 -0800)]
Merge tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small reverts and fixes for USB drivers for issues that
came up during the 5.16-rc1 merge window.
These include:
- two reverts of xhci and USB core patches that are causing problems
in many systems.
- xhci 3.1 enumeration delay fix for systems that were having
problems.
All three of these have been in linux-next all week with no reported
issues"
* tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
Revert "usb: core: hcd: Add support for deferring roothub registration"
Revert "xhci: Set HCD flag to defer primary roothub registration"
Alistair Popple [Thu, 11 Nov 2021 04:32:40 +0000 (20:32 -0800)]
mm/migrate.c: remove MIGRATE_PFN_LOCKED
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect(). If it
wasn't then the a second attempt is made to lock the page. However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity. So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.
Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.
Baolin Wang [Thu, 11 Nov 2021 04:32:37 +0000 (20:32 -0800)]
mm: migrate: simplify the file-backed pages validation when migrating its mapping
There is no need to validate the file-backed page's refcount before
trying to freeze the page's expected refcount, instead we can rely on
the folio_ref_freeze() to validate if the page has the expected refcount
before migrating its mapping.
Moreover we are always under the page lock when migrating the page
mapping, which means nowhere else can remove it from the page cache, so
we can remove the xas_load() validation under the i_pages lock.
Ingo Molnar [Thu, 11 Nov 2021 04:32:33 +0000 (20:32 -0800)]
mm: allow only SLUB on PREEMPT_RT
Memory allocators may disable interrupts or preemption as part of the
allocation and freeing process. For PREEMPT_RT it is important that
these sections remain deterministic and short and therefore don't depend
on the size of the memory to allocate/ free or the inner state of the
algorithm.
Until v3.12-RT the SLAB allocator was an option but involved several
changes to meet all the requirements. The SLUB design fits better with
PREEMPT_RT model and so the SLAB patches were dropped in the 3.12-RT
patchset. Comparing the two allocator, SLUB outperformed SLAB in both
throughput (time needed to allocate and free memory) and the maximal
latency of the system measured with cyclictest during hackbench.
SLOB was never evaluated since it was unlikely that it preforms better
than SLAB. During a quick test, the kernel crashed with SLOB enabled
during boot.
Paolo Bonzini [Thu, 11 Nov 2021 15:52:26 +0000 (10:52 -0500)]
Merge branch 'kvm-sev-move-context' into kvm-master
Add support for AMD SEV and SEV-ES intra-host migration support. Intra
host migration provides a low-cost mechanism for userspace VMM upgrades.
In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant. As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next. In-kernel
hand off makes it possible to move any data that would be
unsafe/impossible for the kernel to hand directly to userspace, and
cannot be reproduced using data that can be handed to userspace.
Vitaly Kuznetsov [Thu, 11 Nov 2021 13:47:33 +0000 (14:47 +0100)]
KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
either num_online_cpus() or num_present_cpus(), s390 has multiple
constants depending on hardware features. On x86, KVM reports an
arbitrary value of '710' which is supposed to be the maximum tested
value but it's possible to test all KVM_MAX_VCPUS even when there are
less physical CPUs available.
Drop the arbitrary '710' value and return num_online_cpus() on x86 as
well. The recommendation will match other architectures and will mean
'no CPU overcommit'.
For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
when the requested vCPU number exceeds it. The static limit of '710'
is quite weird as smaller systems with just a few physical CPUs should
certainly "recommend" less.
Vipin Sharma [Tue, 9 Nov 2021 17:44:26 +0000 (17:44 +0000)]
KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()
Handle #GP on INVPCID due to an invalid type in the common switch
statement instead of relying on the callers (VMX and SVM) to manually
validate the type.
Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
the type before reading the operand from memory, so deferring the
type validity check until after that point is architecturally allowed.
Vipin Sharma [Tue, 9 Nov 2021 17:44:25 +0000 (17:44 +0000)]
KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT
handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
holds the invalidation type. Add a helper to retrieve reg2 from VMX
instruction info to consolidate and document the shift+mask magic.
Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
holdout of open coded bitmap manipulations. Freshen up the SDM/PRM
comment, rename the function to make it abundantly clear the funky
behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
(the previous comment was flat out wrong for x2APIC behavior).
KVM: VMX: Macrofy the MSR bitmap getters and setters
Add builder macros to generate the MSR bitmap helpers to reduce the
amount of copy-paste code, especially with respect to all the magic
numbers needed to calc the correct bit location.
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02. This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.
The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.
The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.
Fix both issues simultaneous as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.
Move the various accessor/mutators that are currently buried in vmx.c
into vmx.h so that they can be shared by the nested code.
KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN
When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
expect that the value we keep internally in KVM wasn't updated.