Shakeel Butt [Fri, 18 Dec 2020 22:01:38 +0000 (14:01 -0800)]
mm, kvm: account kvm_vcpu_mmap to kmemcg
A VCPU of a VM can allocate couple of pages which can be mmap'ed by the
user space application. At the moment this memory is not charged to the
memcg of the VMM. On a large machine running large number of VMs or
small number of VMs having large number of VCPUs, this unaccounted
memory can be very significant. So, charge this memory to the memcg of
the VMM. Please note that lifetime of these allocations corresponds to
the lifetime of the VMM.
Alex Shi [Fri, 18 Dec 2020 22:01:31 +0000 (14:01 -0800)]
mm/memcg: warning on !memcg after readahead page charged
Add VM_WARN_ON_ONCE_PAGE() macro.
Since readahead page is charged on memcg too, in theory we don't have to
check this exception now. Before safely remove them all, add a warning
for the unexpected !memcg.
Alex Shi [Fri, 18 Dec 2020 22:01:28 +0000 (14:01 -0800)]
mm/memcg: bail early from swap accounting if memcg disabled
Patch series "bail out early for memcg disable".
These 2 patches are indepenedent from per memcg lru lock, and may
encounter unexpected warning, so let's move out them from per memcg
lru locking patchset.
This patch (of 2):
We could bail out early when memcg wasn't enabled.
selftests/core: add regression test for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
This test is a minimalized version of the reproducer given by syzbot
(cf. [1]).
After introducing CLOSE_RANGE_CLOEXEC syzbot reported a crash when
CLOSE_RANGE_CLOEXEC is specified in conjunction with
CLOSE_RANGE_UNSHARE. When CLOSE_RANGE_UNSHARE is specified the caller
will receive a private file descriptor table in case their file
descriptor table is currently shared.
For the case where the caller has requested all file descriptors to be
actually closed via e.g. close_range(3, ~0U, 0) the kernel knows that
the caller does not need any of the file descriptors anymore and will
optimize the close operation by only copying all files in the range from
0 to 3 and no others.
However, if the caller requested CLOSE_RANGE_CLOEXEC together with
CLOSE_RANGE_UNSHARE the caller wants to still make use of the file
descriptors so the kernel needs to copy all of them and can't optimize.
The original patch didn't account for this and thus could cause oopses
as evidenced by the syzbot report. Add tests for this regression.
We first create a huge gap in the fd table. When we now call
CLOSE_RANGE_UNSHARE with a shared fd table and and with ~0U as upper
bound the kernel will only copy up to fd1 file descriptors into the new
fd table. If the kernel is buggy and doesn't handle CLOSE_RANGE_CLOEXEC
correctly it will not have copied all file descriptors and we will oops!
This test passes on a fixed kernel and will trigger an oops on a buggy
kernel.
Tobias Klauser [Fri, 18 Dec 2020 14:54:12 +0000 (15:54 +0100)]
selftests/core: fix close_range_test build after XFAIL removal
XFAIL was removed in commit 9847d24af95c ("selftests/harness: Refactor
XFAIL into SKIP") and its use in close_range_test was already replaced
by commit 1d44d0dd61b6 ("selftests: core: use SKIP instead of XFAIL in
close_range_test.c"). However, commit 23afeaeff3d9 ("selftests: core:
add tests for CLOSE_RANGE_CLOEXEC") introduced usage of XFAIL in
TEST(close_range_cloexec). Use SKIP there as well.
close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
After introducing CLOSE_RANGE_CLOEXEC syzbot reported a crash when
CLOSE_RANGE_CLOEXEC is specified in conjunction with CLOSE_RANGE_UNSHARE.
When CLOSE_RANGE_UNSHARE is specified the caller will receive a private
file descriptor table in case their file descriptor table is currently
shared.
For the case where the caller has requested all file descriptors to be
actually closed via e.g. close_range(3, ~0U, 0) the kernel knows that
the caller does not need any of the file descriptors anymore and will
optimize the close operation by only copying all files in the range from
0 to 3 and no others.
However, if the caller requested CLOSE_RANGE_CLOEXEC together with
CLOSE_RANGE_UNSHARE the caller wants to still make use of the file
descriptors so the kernel needs to copy all of them and can't optimize.
The original patch didn't account for this and thus could cause oopses
as evidenced by the syzbot report because it assumed that all fds had
been copied. Fix this by handling the CLOSE_RANGE_CLOEXEC case.
syzbot reported
==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: null-ptr-deref in atomic64_read include/asm-generic/atomic-instrumented.h:837 [inline]
BUG: KASAN: null-ptr-deref in atomic_long_read include/asm-generic/atomic-long.h:29 [inline]
BUG: KASAN: null-ptr-deref in filp_close+0x22/0x170 fs/open.c:1274
Read of size 8 at addr 0000000000000077 by task syz-executor511/8522
Jason Andryuk [Wed, 16 Dec 2020 14:08:38 +0000 (09:08 -0500)]
xen: Kconfig: remove X86_64 depends from XEN_512GB
commit bfda93aee0ec ("xen: Kconfig: nest Xen guest options")
accidentally re-added X86_64 as a dependency to XEN_512GB. It was
originally removed in commit a13f2ef168cb ("x86/xen: remove 32-bit Xen
PV guest support"). Remove it again.
Boris Protopopov [Thu, 17 Dec 2020 20:58:08 +0000 (20:58 +0000)]
Add SMB 2 support for getting and setting SACLs
Fix passing of the additional security info via version
operations. Force new open when getting SACL and avoid
reuse of files that were previously open without
sufficient privileges to access SACLs.
Uwe Kleine-König [Fri, 18 Dec 2020 10:10:54 +0000 (11:10 +0100)]
rtc: pcf2127: only use watchdog when explicitly available
Most boards using the pcf2127 chip (in my bubble) don't make use of the
watchdog functionality and the respective output is not connected. The
effect on such a board is that there is a watchdog device provided that
doesn't work.
So only register the watchdog if the device tree has a "reset-source"
property.
Rasmus Villemoes [Fri, 18 Dec 2020 10:10:53 +0000 (11:10 +0100)]
dt-bindings: rtc: add reset-source property
Some RTCs, e.g. the pcf2127, can be used as a hardware watchdog. But
if the reset pin is not actually wired up, the driver exposes a
watchdog device that doesn't actually work.
Provide a standard binding that can be used to indicate that a given
RTC can perform a reset of the machine, similar to wakeup-source.
Kent Overstreet [Fri, 18 Dec 2020 09:07:11 +0000 (04:07 -0500)]
mm/filemap: fix infinite loop in generic_file_buffered_read()
If iter->count is 0 and iocb->ki_pos is page aligned, this causes
nr_pages to be 0.
Then in generic_file_buffered_read_get_pages() find_get_pages_contig()
returns 0 - because we asked for 0 pages, so we call
generic_file_buffered_read_no_cached_page() which attempts to add a page
to the page cache, which fails with -EEXIST, and then we loop. Oops...
Linus Torvalds [Fri, 18 Dec 2020 20:50:18 +0000 (12:50 -0800)]
Merge tag 'xfs-5.11-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"In this release we add the ability to set a 'needsrepair' flag
indicating that we /know/ the filesystem requires xfs_repair, but
other than that, it's the usual strengthening of metadata validation
and miscellaneous cleanups.
Summary:
- Introduce a "needsrepair" "feature" to flag a filesystem as needing
a pass through xfs_repair. This is key to enabling filesystem
upgrades (in xfs_db) that require xfs_repair to make minor
adjustments to metadata.
- Refactor parameter checking of recovered log intent items so that
we actually use the same validation code as them that generate the
intent items.
- Various fixes to online scrub not reacting correctly to directory
entries pointing to inodes that cannot be igetted.
- Refactor validation helpers for data and rt volume extents.
- Refactor XFS_TRANS_DQ_DIRTY out of existence.
- Fix a longstanding bug where mounting with "uqnoenforce" would
start user quotas in non-enforcing mode but /proc/mounts would
display "usrquota", implying that they are being enforced.
- Don't flag dax+reflink inodes as corruption since that is a valid
(but not fully functional) combination right now.
- Clean up raid stripe validation functions.
- Refactor the inode allocation code to be more straightforward.
- Small prep cleanup for idmapping support.
- Get rid of the xfs_buf_t typedef"
* tag 'xfs-5.11-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (40 commits)
xfs: remove xfs_buf_t typedef
fs/xfs: convert comma to semicolon
xfs: open code updating i_mode in xfs_set_acl
xfs: remove xfs_vn_setattr_nonsize
xfs: kill ialloced in xfs_dialloc()
xfs: spilt xfs_dialloc() into 2 functions
xfs: move xfs_dialloc_roll() into xfs_dialloc()
xfs: move on-disk inode allocation out of xfs_ialloc()
xfs: introduce xfs_dialloc_roll()
xfs: convert noroom, okalloc in xfs_dialloc() to bool
xfs: don't catch dax+reflink inodes as corruption in verifier
xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
xfs: remove unneeded return value check for *init_cursor()
xfs: introduce xfs_validate_stripe_geometry()
xfs: show the proper user quota options
xfs: remove the unused XFS_B_FSB_OFFSET macro
xfs: remove unnecessary null check in xfs_generic_create
xfs: directly return if the delta equal to zero
xfs: check tp->t_dqinfo value instead of the XFS_TRANS_DQ_DIRTY flag
xfs: delete duplicated tp->t_dqinfo null check and allocation
...
Linus Torvalds [Fri, 18 Dec 2020 20:46:43 +0000 (12:46 -0800)]
Merge tag 'ktest-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
"No new features. Just a couple of fixes that I had in my local
repository that fixed issues with sending the result emails"
* tag 'ktest-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Fix the logic for truncating the size of the log file for email
ktest.pl: If size of log is too big to email, email error message
Linus Torvalds [Fri, 18 Dec 2020 20:38:28 +0000 (12:38 -0800)]
Merge tag 'drm-next-2020-12-18' of git://anongit.freedesktop.org/drm/drm
Pull more drm updates from Daniel Vetter:
"UAPI Changes:
- Only enable char/agp uapi when CONFIG_DRM_LEGACY is set
Cross-subsystem Changes:
- vma_set_file helper to make vma->vm_file changing less brittle,
acked by Andrew
Core Changes:
- dma-buf heaps improvements
- pass full atomic modeset state to driver callbacks
- shmem helpers: cached bo by default
- cleanups for fbdev, fb-helpers
- better docs for drm modes and SCALING_FITLER uapi
- ttm: fix dma32 page pool regression
Driver Changes:
- multi-hop regression fixes for amdgpu, radeon, nouveau
- lots of small amdgpu hw enabling fixes (display, pm, ...)
- fixes for imx, mcde, meson, some panels, virtio, qxl, i915, all
fairly minor
- some cleanups for legacy drm/fbdev drivers"
* tag 'drm-next-2020-12-18' of git://anongit.freedesktop.org/drm/drm: (117 commits)
drm/qxl: don't allocate a dma_address array
drm/nouveau: fix multihop when move doesn't work.
drm/i915/tgl: Fix REVID macros for TGL to fetch correct stepping
drm/i915: Fix mismatch between misplaced vma check and vma insert
drm/i915/perf: also include Gen11 in OATAILPTR workaround
Revert "drm/i915: re-order if/else ladder for hpd_irq_setup"
drm/amdgpu/disply: fix documentation warnings in display manager
drm/amdgpu: print mmhub client name for dimgrey_cavefish
drm/amdgpu: set mode1 reset as default for dimgrey_cavefish
drm/amd/display: Add get_dig_frontend implementation for DCEx
drm/radeon: remove h from printk format specifier
drm/amdgpu: remove h from printk format specifier
drm/amdgpu: Fix spelling mistake "Heterogenous" -> "Heterogeneous"
drm/amdgpu: fix regression in vbios reservation handling on headless
drm/amdgpu/SRIOV: Extend VF reset request wait period
drm/amdkfd: correct amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu log.
drm/amd/display: Adding prototype for dccg21_update_dpp_dto()
drm/amdgpu: print what method we are using for runtime pm
drm/amdgpu: simplify logic in atpx resume handling
drm/amdgpu: no need to call pci_ignore_hotplug for _PR3
...
That causes only these 'perf bench' objects to rebuild:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
To pick the changes from:
3ceb6543e9cf6ed8 ("fscrypt: remove kernel-internal constants from UAPI header")
That don't result in any changes in tooling, just addressing this perf
build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
tools headers UAPI: Sync linux/const.h with the kernel headers
To pick up the changes in:
a85cbe6159ffc973 ("uapi: move constants from <linux/kernel.h> to <linux/const.h>")
That causes no changes in tooling, just addresses this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/const.h' differs from latest version at 'include/uapi/linux/const.h'
diff -u tools/include/uapi/linux/const.h include/uapi/linux/const.h
Which causes these parts of tools/perf/ to be rebuilt:
CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
LD /tmp/build/perf/trace/beauty/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
At some point these should just be tables read by perf on demand.
This allows 'perf trace' users to use those strings to translate from
the msr ids provided by the msr: tracepoints.
This addresses this perf tools build warning:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
perf trace beauty: Update copy of linux/socket.h with the kernel sources
This just triggers the rebuilding of the syscall beautifiers that
extract patterns from this file due to this cset:
b713c195d5933227 ("net: provide __sys_shutdown_sock() that takes a socket")
After updating it:
CC /tmp/build/perf/trace/beauty/sockaddr.o
Addressing this perf build warning:
Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Warning: Kernel ABI header at 'tools/include/linux/ctype.h' differs from latest version at 'include/linux/ctype.h'
diff -u tools/include/linux/ctype.h include/linux/ctype.h
And we need to continue using the combination of:
inline __isdigit()
#define isdigit() __isdigit
When the __has_builtin() thing isn't available, as it is a builtin in
older systems with it as a builtin but with compilers not hacinv
__has_builtin(), rendering the __has_builtin() check useless otherwise.
tools headers: Get tools's linux/compiler.h closer to the kernel's
We're cherry picking stuff from the kernel to allow for the other
headers that we keep in sync via tools/perf/check-headers.sh to work,
so introduce linux/compiler_types.h and from there get the compiler
specific stuff.
Boris Protopopov [Fri, 18 Dec 2020 17:30:12 +0000 (11:30 -0600)]
SMB3: Add support for getting and setting SACLs
Add SYSTEM_SECURITY access flag and use with smb2 when opening
files for getting/setting SACLs. Add "system.cifs_ntsd_full"
extended attribute to allow user-space access to the functionality.
Avoid multiple server calls when setting owner, DACL, and SACL.
Linus Torvalds [Fri, 18 Dec 2020 19:08:06 +0000 (11:08 -0800)]
Merge tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
"This is mainly to decouple udelay() and arch_cpu_idle() and simplify
both of them.
Summary:
- Always initialize kernel stack backchain when entering the kernel,
so that unwinding works properly.
- Fix stack unwinder test case to avoid rare interrupt stack
corruption.
- Simplify udelay() and just let it busy loop instead of implementing
a complex logic.
- arch_cpu_idle() cleanup.
- Some other minor improvements"
* tag 's390-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: convert comma to semicolon
s390/idle: allow arch_cpu_idle() to be kprobed
s390/idle: remove raw_local_irq_save()/restore() from arch_cpu_idle()
s390/idle: merge enabled_wait() and arch_cpu_idle()
s390/delay: remove udelay_simple()
s390/irq: select HAVE_IRQ_EXIT_ON_IRQ_STACK
s390/delay: simplify udelay
s390/test_unwind: use timer instead of udelay
s390/test_unwind: fix CALL_ON_STACK tests
s390: make calls to TRACE_IRQS_OFF/TRACE_IRQS_ON balanced
s390: always clear kernel stack backchain before calling functions
Linus Torvalds [Fri, 18 Dec 2020 18:57:27 +0000 (10:57 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"These are some some trivial updates that mostly fix/clean-up code
pushed during the merging window:
- Work around broken GCC 4.9 handling of "S" asm constraint
- Suppress W=1 missing prototype warnings
- Warn the user when a small VA_BITS value cannot map the available
memory
- Drop the useless update to per-cpu cycles"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Work around broken GCC 4.9 handling of "S" constraint
arm64: Warn the user when a small VA_BITS value wastes memory
arm64: entry: suppress W=1 prototype warnings
arm64: topology: Drop the useless update to per-cpu cycles
Linus Torvalds [Fri, 18 Dec 2020 18:43:07 +0000 (10:43 -0800)]
Merge tag 'riscv-for-linus-5.11-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
"We have a handful of new kernel features for 5.11:
- Support for the contiguous memory allocator.
- Support for IRQ Time Accounting
- Support for stack tracing
- Support for strict /dev/mem
- Support for kernel section protection
I'm being a bit conservative on the cutoff for this round due to the
timing, so this is all the new development I'm going to take for this
cycle (even if some of it probably normally would have been OK). There
are, however, some fixes on the list that I will likely be sending
along either later this week or early next week.
There is one issue in here: one of my test configurations
(PREEMPT{,_DEBUG}=y) fails to boot on QEMU 5.0.0 (from April) as of
the .text.init alignment patch.
With any luck we'll sort out the issue, but given how many bugs get
fixed all over the place and how unrelated those features seem my
guess is that we're just running into something that's been lurking
for a while and has already been fixed in the newer QEMU (though I
wouldn't be surprised if it's one of these implicit assumptions we
have in the boot flow). If it was hardware I'd be strongly inclined to
look more closely, but given that users can upgrade their simulators
I'm less worried about it"
* tag 'riscv-for-linus-5.11-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
arm64: Use the generic devmem_is_allowed()
arm: Use the generic devmem_is_allowed()
RISC-V: Use the new generic devmem_is_allowed()
lib: Add a generic version of devmem_is_allowed()
riscv: Fixed kernel test robot warning
riscv: kernel: Drop unused clean rule
riscv: provide memmove implementation
RISC-V: Move dynamic relocation section under __init
RISC-V: Protect all kernel sections including init early
RISC-V: Align the .init.text section
RISC-V: Initialize SBI early
riscv: Enable ARCH_STACKWALK
riscv: Make stack walk callback consistent with generic code
riscv: Cleanup stacktrace
riscv: Add HAVE_IRQ_TIME_ACCOUNTING
riscv: Enable CMA support
riscv: Ignore Image.* and loader.bin
riscv: Clean up boot dir
riscv: Fix compressed Image formats build
RISC-V: Add kernel image sections to the resource tree
Aditya Swarup [Thu, 3 Dec 2020 07:23:58 +0000 (23:23 -0800)]
drm/i915/tgl: Fix REVID macros for TGL to fetch correct stepping
Fix TGL REVID macros to fetch correct display/gt stepping based
on SOC rev id from INTEL_REVID() macro. Previously, we were just
returning the first element of the revid array instead of using
the correct index based on SOC rev id.
Chris Wilson [Wed, 16 Dec 2020 09:29:51 +0000 (09:29 +0000)]
drm/i915: Fix mismatch between misplaced vma check and vma insert
When inserting a VMA, we restrict the placement to the low 4G unless the
caller opts into using the full range. This was done to allow usersapce
the opportunity to transition slowly from a 32b address space, and to
avoid breaking inherent 32b assumptions of some commands.
However, for insert we limited ourselves to 4G-4K, but on verification
we allowed the full 4G. This causes some attempts to bind a new buffer
to sporadically fail with -ENOSPC, but at other times be bound
successfully.
commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
page") suggests that there is a genuine problem with stateless addressing
that cannot utilize the last page in 4G and so we purposefully excluded
it. This means that the quick pin pass may cause us to utilize a buggy
placement.
Chris Wilson [Fri, 27 Nov 2020 14:57:48 +0000 (14:57 +0000)]
Revert "drm/i915: re-order if/else ladder for hpd_irq_setup"
We now use ilk_hpd_irq_setup for all GMCH platforms that do not have
hotplug. These are early gen3 and gen2 devices that now explode on boot
as they try to access non-existent registers.
Dan Carpenter [Thu, 17 Dec 2020 11:01:48 +0000 (14:01 +0300)]
cifs: Delete a stray unlock in cifs_swn_reconnect()
The unlock is done in the caller, this is a stray which leads to a
double unlock bug.
Fixes: bf80e5d4259a ("cifs: Send witness register and unregister commands to userspace daemon") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Samuel Cabrero <[email protected]> Signed-off-by: Steve French <[email protected]>
Linus Torvalds [Fri, 18 Dec 2020 02:07:20 +0000 (18:07 -0800)]
Merge tag 'gpio-v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of the GPIO changes for the v5.11 kernel cycle:
Core changes:
- Retired the old set-up function for GPIO IRQ chips. All chips now
use the template struct gpio_irq_chip and pass that to the core to
be set up alongside the gpio_chip. We can finally get rid of the
old cruft.
- Some refactoring and clean up of the core code.
- Support edge event timestamps to be stamped using REALTIME (wall
clock) timestamps. We have found solid use cases for this, so we
support it.
New drivers:
- MStar MSC313 GPIO driver.
- HiSilicon GPIO driver.
Driver improvements:
- The PCA953x driver now also supports the NXP PCAL9554B/C chips.
- The mockup driver can now be probed from the device tree which is
pretty useful for virtual prototyping of devices.
- The Rcar driver now supports .get_multiple()
- The MXC driver dropped some legacy and became a pure device tree
client.
- The Exar driver was moved over to the IDA interface for
enumerating, and also switched over to using regmap for register
access"
* tag 'gpio-v5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (87 commits)
MAINTAINERS: Remove reference to non-existing file
gpio: hisi: Do not require ACPI for COMPILE_TEST
MAINTAINERS: Add maintainer for HiSilicon GPIO driver
gpio: gpio-hisi: Add HiSilicon GPIO support
gpio: cs5535: Simplify the return expression of cs5535_gpio_probe()
gpiolib: irq hooks: fix recursion in gpiochip_irq_unmask
dt-bindings: mt7621-gpio: convert bindings to YAML format
gpiolib: cdev: Flag invalid GPIOs as used
gpio: put virtual gpio device into their own submenu
drivers: gpio: amd8111: use SPDX-License-Identifier
drivers: gpio: amd8111: prefer dev_err()/dev_info() over raw printk
drivers: gpio: bt8xx: prefer dev_err()/dev_warn() over of raw printk
gpio: Add TODO item for debugfs interface
gpio: just plain warning when nonexisting gpio requested
tools: gpio: add option to report wall-clock time to gpio-event-mon
tools: gpio: add support for reporting realtime event clock to lsgpio
gpiolib: cdev: allow edge event timestamps to be configured as REALTIME
gpio: msc313: MStar MSC313 GPIO driver
dt-bindings: gpio: Binding for MStar MSC313 GPIO controller
dt-bindings: gpio: Add a binding header for the MSC313 GPIO driver
...
Linus Torvalds [Fri, 18 Dec 2020 01:56:44 +0000 (17:56 -0800)]
Merge tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- IRQ handling cleanups
- Support for suspend
- Various fixes for UML specific drivers: ubd, vector, xterm
* tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (32 commits)
um: Fix build w/o CONFIG_PM_SLEEP
um: time-travel: Correct time event IRQ delivery
um: irq/sigio: Support suspend/resume handling of workaround IRQs
um: time-travel: Actually apply "free-until" optimisation
um: chan_xterm: Fix fd leak
um: tty: Fix handling of close in tty lines
um: Monitor error events in IRQ controller
um: allocate a guard page to helper threads
um: support some of ARCH_HAS_SET_MEMORY
um: time-travel: avoid multiple identical propagations
um: Fetch registers only for signals which need them
um: Support suspend to RAM
um: Allow PM with suspend-to-idle
um: time: Fix read_persistent_clock64() in time-travel
um: Simplify os_idle_sleep() and sleep longer
um: Simplify IRQ handling code
um: Remove IRQ_NONE type
um: irq: Reduce irq_reg allocation
um: irq: Clean up and rename struct irq_fd
um: Clean up alarm IRQ chip name
...
Linus Torvalds [Fri, 18 Dec 2020 01:46:34 +0000 (17:46 -0800)]
Merge tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull jffs2, ubi and ubifs updates from Richard Weinberger:
"JFFS2:
- Fix for a remount regression
- Fix for an abnormal GC exit
- Fix for a possible NULL pointer issue while mounting
UBI:
- Add support ECC-ed NOR flash
- Removal of dead code
UBIFS:
- Make node dumping debug code more reliable
- Various cleanups: less ifdefs, less typos
- Fix for an info leak"
* tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: ubifs_dump_node: Dump all branches of the index node
ubifs: ubifs_dump_sleb: Remove unused function
ubifs: Pass node length in all node dumping callers
Revert "ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len"
ubifs: Limit dumping length by size of memory which is allocated for the node
ubifs: Remove the redundant return in dbg_check_nondata_nodes_order
jffs2: Fix NULL pointer dereference in rp_size fs option parsing
ubifs: Fixed print foramt mismatch in ubifs
ubi: Do not zero out EC and VID on ECC-ed NOR flashes
jffs2: remove trailing semicolon in macro definition
ubifs: Fix error return code in ubifs_init_authentication()
ubifs: wbuf: Don't leak kernel memory to flash
ubi: Remove useless code in bytes_str_to_int
ubifs: Fix the printing type of c->big_lpt
jffs2: Allow setting rp_size to zero during remounting
jffs2: Fix ignoring mounting options problem during remounting
jffs2: Fix GC exit abnormally
ubifs: Code cleanup by removing ifdef macro surrounding
jffs2: Fix if/else empty body warnings
ubifs: Delete duplicated words + other fixes
Linus Torvalds [Fri, 18 Dec 2020 01:41:37 +0000 (17:41 -0800)]
Merge tag '5.11-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"The largest part are for support of the newer mount API which has been
needed for cifs/smb3 mounts for a long time due to the new API's
better handling of remount, and better error reporting. There are
three additional small cleanup patches for this being tested, that are
not included yet.
This series also includes addition of support for the SMB3 witness
protocol which can provide important notifications from the server to
client on server address or export or network changes. This can be
useful for example in order to be notified before the failure - when a
server's IP address changes (in the future it will allow us to support
server notifications of when a share is moved).
It also includes three patches for stable e.g. some that better handle
some confusing error messages during session establishment"
* tag '5.11-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: (55 commits)
cifs: update internal module version number
cifs: Fix support for remount when not changing rsize/wsize
cifs: handle "guest" mount parameter
cifs: correct four aliased mount parms to allow use of previous names
cifs: Tracepoints and logs for tracing credit changes.
cifs: fix use after free in cifs_smb3_do_mount()
cifs: fix rsize/wsize to be negotiated values
cifs: Fix some error pointers handling detected by static checker
smb3: remind users that witness protocol is experimental
cifs: update super_operations to show_devname
cifs: fix uninitialized variable in smb3_fs_context_parse_param
cifs: update mnt_cifs_flags during reconfigure
cifs: move update of flags into a separate function
cifs: remove ctx argument from cifs_setup_cifs_sb
cifs: do not allow changing posix_paths during remount
cifs: uncomplicate printing the iocharset parameter
cifs: don't create a temp nls in cifs_setup_ipc
cifs: simplify handling of cifs_sb/ctx->local_nls
cifs: we do not allow changing username/password/unc/... during remount
cifs: add initial reconfigure support
...
Linus Torvalds [Thu, 17 Dec 2020 21:45:24 +0000 (13:45 -0800)]
Merge tag 'net-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - always broken:
- net/smc: fix access to parent of an ib device
- devlink: use _BITUL() macro instead of BIT() in the UAPI header
- handful of mptcp fixes
Previous release - regressions:
- intel: AF_XDP: clear the status bits for the next_to_use descriptor
- dpaa2-eth: fix the size of the mapped SGT buffer
Previous release - always broken:
- mptcp: fix security context on server socket
- ethtool: fix string set id check
- ethtool: fix error paths in ethnl_set_channels()
- lan743x: fix rx_napi_poll/interrupt ping-pong
- qca: ar9331: fix sleeping function called from invalid context bug"
* tag 'net-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (32 commits)
net/sched: sch_taprio: reset child qdiscs before freeing them
nfp: move indirect block cleanup to flower app stop callback
octeontx2-af: Fix undetected unmap PF error check
net: nixge: fix spelling mistake in Kconfig: "Instuments" -> "Instruments"
qlcnic: Fix error code in probe
mptcp: fix pending data accounting
mptcp: push pending frames when subflow has free space
mptcp: properly annotate nested lock
mptcp: fix security context on server socket
net/mlx5: Fix compilation warning for 32-bit platform
mptcp: clear use_ack and use_map when dropping other suboptions
devlink: use _BITUL() macro instead of BIT() in the UAPI header
net: korina: fix return value
net/smc: fix access to parent of an ib device
ethtool: fix error paths in ethnl_set_channels()
nfc: s3fwrn5: Remove unused NCI prop commands
nfc: s3fwrn5: Remove the delay for NFC sleep
phy: fix kdoc warning
tipc: do sanity check payload of a netlink message
use __netdev_notify_peers in hyperv
...
Linus Torvalds [Thu, 17 Dec 2020 21:34:25 +0000 (13:34 -0800)]
Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
setup/handling code.
- Support for KUAP (Kernel User Access Prevention) on systems using the
hashed page table MMU, using memory protection keys.
- Better handling of PowerVM SMT8 systems where all threads of a core
do not share an L2, allowing the scheduler to make better scheduling
decisions.
- Further improvements to our machine check handling.
- Show registers when unwinding interrupt frames during stack traces.
- Improvements to our pseries (PowerVM) partition migration code.
- Several series from Christophe refactoring and cleaning up various
parts of the 32-bit code.
- Other smaller features, fixes & cleanups.
Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.
* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
powerpc: Add config fragment for disabling -Werror
powerpc/configs: Add ppc64le_allnoconfig target
powerpc/powernv: Rate limit opal-elog read failure message
powerpc/pseries/memhotplug: Quieten some DLPAR operations
powerpc/ps3: use dma_mapping_error()
powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
KVM: PPC: fix comparison to bool warning
KVM: PPC: Book3S: Assign boolean values to a bool variable
powerpc: Inline setup_kup()
powerpc/64s: Mark the kuap/kuep functions non __init
KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
powerpc/xive: Improve error reporting of OPAL calls
powerpc/xive: Simplify xive_do_source_eoi()
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
...
Linus Torvalds [Thu, 17 Dec 2020 21:22:17 +0000 (13:22 -0800)]
Merge tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major update to this release is that there's a new arch config
option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Currently, only x86_64 enables it. All the ftrace callbacks now take a
struct ftrace_regs instead of a struct pt_regs. If the architecture
has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
have enough information to read the arguments of the function being
traced, as well as access to the stack pointer.
This way, if a user (like live kernel patching) only cares about the
arguments, then it can avoid using the heavier weight "regs" callback,
that puts in enough information in the struct ftrace_regs to simulate
a breakpoint exception (needed for kprobes).
A new config option that audits the timestamps of the ftrace ring
buffer at most every event recorded.
Ftrace recursion protection has been cleaned up to move the protection
to the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace
to do it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that
lists all the places that triggered the recursion protection of the
function tracer. This will show where things need to be fixed as
recursion slows down the function tracer.
The eval enum mapping updates done at boot up are now offloaded to a
work queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes"
* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Offload eval map updates to a work queue
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
ring-buffer: Add rb_check_bpage in __rb_allocate_pages
ring-buffer: Fix two typos in comments
tracing: Drop unneeded assignment in ring_buffer_resize()
tracing: Disable ftrace selftests when any tracer is running
seq_buf: Avoid type mismatch for seq_buf_init
ring-buffer: Fix a typo in function description
ring-buffer: Remove obsolete rb_event_is_commit()
ring-buffer: Add test to validate the time stamp deltas
ftrace/documentation: Fix RST C code blocks
tracing: Clean up after filter logic rewriting
tracing: Remove the useless value assignment in test_create_synth_event()
livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
MAINTAINERS: assign ./fs/tracefs to TRACING
tracing: Fix some typos in comments
ftrace: Remove unused varible 'ret'
ring-buffer: Add recording of ring buffer recursion into recursed_functions
...
Linus Torvalds [Thu, 17 Dec 2020 21:01:31 +0000 (13:01 -0800)]
Merge tag 'modules-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 5.11 merge window:
- Fix a race condition between systemd/udev and the module loader.
The module loader was sending a uevent before the module was fully
initialized (i.e., before its init function has been called). This
means udev can start processing the module uevent before the module
has finished initializing, and some udev rules expect that the
module has initialized already upon receiving the uevent.
This resulted in some systemd mount units failing if udev processes
the event faster than the module can finish init. This is fixed by
delaying the uevent until after the module has called its init
routine.
- Make the linker array sections for kernel params and module version
attributes more robust by switching to use the alignment of the
type in question.
Namely, linker section arrays will be constructed using the
alignment required by the struct (using __alignof__()) as opposed
to a specific value such as sizeof(void *) or sizeof(long). This is
less likely to cause breakages should the size of the type ever
change (Johan Hovold)
- Fix module state inconsistency by setting it back to GOING when a
module fails to load and is on its way out (Miroslav Benes)
- Some comment and code cleanups (Sergey Shtylyov)"
* tag 'modules-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: delay kobject uevent until after module init call
module: drop semicolon from version macro
init: use type alignment for kernel parameters
params: clean up module-param macros
params: use type alignment for kernel parameters
params: drop redundant "unused" attributes
module: simplify version-attribute handling
module: drop version-attribute alignment
module: fix comment style
module: add more 'kernel-doc' comments
module: fix up 'kernel-doc' comments
module: only handle errors with the *switch* statement in module_sig_check()
module: avoid *goto*s in module_sig_check()
module: merge repetitive strings in module_sig_check()
module: set MODULE_STATE_GOING state when a module fails to load
Linus Torvalds [Thu, 17 Dec 2020 20:52:23 +0000 (12:52 -0800)]
Merge tag 'dmaengine-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"The last dmaengine updates for this year :)
This contains couple of new drivers, new device support and updates to
bunch of drivers.
New drivers/devices:
- Qualcomm ADM driver
- Qualcomm GPI driver
- Allwinner A100 DMA support
- Microchip Sama7g5 support
- Mediatek MT8516 apdma
Updates:
- more updates to idxd driver and support for IAX config
- runtime PM support for dw driver
- TI drivers"
* tag 'dmaengine-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (75 commits)
soc: ti: k3-ringacc: Use correct error casting in k3_ringacc_dmarings_init
dmaengine: ti: k3-udma-glue: Add support for K3 PKTDMA
dmaengine: ti: k3-udma: Initial support for K3 PKTDMA
dmaengine: ti: k3-udma: Add support for BCDMA channel TPL handling
dmaengine: ti: k3-udma: Initial support for K3 BCDMA
soc: ti: k3-ringacc: add AM64 DMA rings support.
dmaengine: ti: Add support for k3 event routers
dmaengine: ti: k3-psil: Add initial map for AM64
dmaengine: ti: k3-psil: Extend psil_endpoint_config for K3 PKTDMA
dt-bindings: dma: ti: Add document for K3 PKTDMA
dt-bindings: dma: ti: Add document for K3 BCDMA
dmaengine: dmatest: Use dmaengine_get_dma_device
dmaengine: doc: client: Update for dmaengine_get_dma_device() usage
dmaengine: Add support for per channel coherency handling
dmaengine: of-dma: Add support for optional router configuration callback
dmaengine: ti: k3-udma-glue: Configure the dma_dev for rings
dmaengine: ti: k3-udma-glue: Get the ringacc from udma_dev
dmaengine: ti: k3-udma-glue: Add function to get device pointer for DMA API
dmaengine: ti: k3-udma: Add support for second resource range from sysfw
dmaengine: ti: k3-udma: Wait for peer teardown completion if supported
...
Linus Torvalds [Thu, 17 Dec 2020 20:15:03 +0000 (12:15 -0800)]
Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
support
- A series of patches to improve readdir performance, particularly
with large directories
- Basic support for using NFS/RDMA with the pNFS files and flexfiles
drivers
- Micro-optimisations for RDMA
- RDMA tracing improvements
Bugfixes:
- Fix a long standing bug with xs_read_xdr_buf() when receiving
partial pages (Dan Aloni)
- Various fixes for getxattr and listxattr, when used over non-TCP
transports
- Fixes for containerised NFS from Sargun Dhillon
- switch nfsiod to be an UNBOUND workqueue (Neil Brown)
- READDIR should not ask for security label information if there is
no LSM policy (Olga Kornievskaia)
- Avoid using interval-based rebinding with TCP in lockd (Calum
Mackay)
- A series of RPC and NFS layer fixes to support the NFSv4.2
READ_PLUS code
- A couple of fixes for pnfs/flexfiles read failover
Cleanups:
- Various cleanups for the SUNRPC xdr code in conjunction with the
READ_PLUS fixes"
* tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
NFSv4/pnfs: Add tracing for the deviceid cache
fs/lockd: convert comma to semicolon
NFSv4.2: fix error return on memory allocation failure
NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet
NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow
NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer
NFSv4.2: decode_read_plus_hole() needs to check the extent offset
NFSv4.2: decode_read_plus_data() must skip padding after data segment
NFSv4.2: Ensure we always reset the result->count in decode_read_plus()
SUNRPC: When expanding the buffer, we may need grow the sparse pages
SUNRPC: Cleanup - constify a number of xdr_buf helpers
SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field
SUNRPC: _copy_to/from_pages() now check for zero length
SUNRPC: Cleanup xdr_shrink_bufhead()
SUNRPC: Fix xdr_expand_hole()
SUNRPC: Fixes for xdr_align_data()
SUNRPC: _shift_data_left/right_pages should check the shift length
...
Linus Torvalds [Thu, 17 Dec 2020 19:53:52 +0000 (11:53 -0800)]
Merge tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The big ticket item here is support for msgr2 on-wire protocol, which
adds the option of full in-transit encryption using AES-GCM algorithm
(myself).
On top of that we have a series to avoid intermittent errors during
recovery with recover_session=clean and some MDS request encoding work
from Jeff, a cap handling fix and assorted observability improvements
from Luis and Xiubo and a good number of cleanups.
Luis also ran into a corner case with quotas which sadly means that we
are back to denying cross-quota-realm renames"
* tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client: (59 commits)
libceph: drop ceph_auth_{create,update}_authorizer()
libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1
libceph, ceph: implement msgr2.1 protocol (crc and secure modes)
libceph: introduce connection modes and ms_mode option
libceph, rbd: ignore addr->type while comparing in some cases
libceph, ceph: get and handle cluster maps with addrvecs
libceph: factor out finish_auth()
libceph: drop ac->ops->name field
libceph: amend cephx init_protocol() and build_request()
libceph, ceph: incorporate nautilus cephx changes
libceph: safer en/decoding of cephx requests and replies
libceph: more insight into ticket expiry and invalidation
libceph: move msgr1 protocol specific fields to its own struct
libceph: move msgr1 protocol implementation to its own file
libceph: separate msgr1 protocol implementation
libceph: export remaining protocol independent infrastructure
libceph: export zero_page
libceph: rename and export con->flags bits
libceph: rename and export con->state states
libceph: make con->state an int
...
Linus Torvalds [Thu, 17 Dec 2020 19:42:48 +0000 (11:42 -0800)]
Merge tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
- Allow unprivileged mounting in a user namespace.
For quite some time the security model of overlayfs has been that
operations on underlying layers shall be performed with the
privileges of the mounting task.
This way an unprvileged user cannot gain privileges by the act of
mounting an overlayfs instance. A full audit of all function calls
made by the overlayfs code has been performed to see whether they
conform to this model, and this branch contains some fixes in this
regard.
- Support running on copied filesystem images by optionally disabling
UUID verification.
- Bug fixes as well as documentation updates.
* tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: unprivieged mounts
ovl: do not get metacopy for userxattr
ovl: do not fail because of O_NOATIME
ovl: do not fail when setting origin xattr
ovl: user xattr
ovl: simplify file splice
ovl: make ioctl() safe
ovl: check privs before decoding file handle
vfs: verify source area in vfs_dedupe_file_range_one()
vfs: move cap_convert_nscap() call into vfs_setxattr()
ovl: fix incorrect extent info in metacopy case
ovl: expand warning in ovl_d_real()
ovl: document lower modification caveats
ovl: warn about orphan metacopy
ovl: doc clarification
ovl: introduce new "uuid=off" option for inodes index feature
ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh
Linus Torvalds [Thu, 17 Dec 2020 19:34:25 +0000 (11:34 -0800)]
Merge tag 'fuse-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Improve performance of virtio-fs in mixed read/write workloads
- Try to revalidate cache before returning EEXIST on exclusive create
- Add a couple of miscellaneous bug fixes as well as some code cleanups
* tag 'fuse-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix bad inode
fuse: support SB_NOSEC flag to improve write performance
fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request
fuse: don't send ATTR_MODE to kill suid/sgid for handle_killpriv_v2
fuse: setattr should set FATTR_KILL_SUIDGID
fuse: set FUSE_WRITE_KILL_SUIDGID in cached write path
fuse: rename FUSE_WRITE_KILL_PRIV to FUSE_WRITE_KILL_SUIDGID
fuse: introduce the notion of FUSE_HANDLE_KILLPRIV_V2
fuse: always revalidate if exclusive create
virtiofs: clean up error handling in virtio_fs_get_tree()
fuse: add fuse_sb_destroy() helper
fuse: simplify get_fuse_conn*()
fuse: get rid of fuse_mount refcount
virtiofs: simplify sb setup
virtiofs fix leak in setup
fuse: launder page should wait for page writeback
Linus Torvalds [Thu, 17 Dec 2020 19:18:00 +0000 (11:18 -0800)]
Merge tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've made more work into per-file compression support.
For example, F2FS_IOC_GET | SET_COMPRESS_OPTION provides a way to
change the algorithm or cluster size per file. F2FS_IOC_COMPRESS |
DECOMPRESS_FILE provides a way to compress and decompress the existing
normal files manually.
There is also a new mount option, compress_mode=fs|user, which can
control who compresses the data.
Chao also added a checksum feature with a mount option so that
we are able to detect any corrupted cluster.
In addition, Daniel contributed casefolding with encryption patch,
which will be used for Android devices.
Summary:
Enhancements:
- add ioctls and mount option to manage per-file compression feature
- support casefolding with encryption
- support checksum for compressed cluster
- avoid IO starvation by replacing mutex with rwsem
- add sysfs, max_io_bytes, to control max bio size
Bug fixes:
- fix use-after-free issue when compression and fsverity are enabled
- fix consistency corruption during fault injection test
- fix data offset for lseek
- get rid of buffer_head which has 32bits limit in fiemap
- fix some bugs in multi-partitions support
- fix nat entry count calculation in shrinker
- fix some stat information
And, we've refactored some logics and fix minor bugs as well"
* tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
f2fs: compress: fix compression chksum
f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
f2fs: fix race of pending_pages in decompression
f2fs: fix to account inline xattr correctly during recovery
f2fs: inline: fix wrong inline inode stat
f2fs: inline: correct comment in f2fs_recover_inline_data
f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()
f2fs: convert to F2FS_*_INO macro
f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size
f2fs: don't allow any writes on readonly mount
f2fs: avoid race condition for shrinker count
f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
f2fs: add compress_mode mount option
f2fs: Remove unnecessary unlikely()
f2fs: init dirty_secmap incorrectly
f2fs: remove buffer_head which has 32bits limit
f2fs: fix wrong block count instead of bytes
f2fs: use new conversion functions between blks and bytes
f2fs: rename logical_to_blk and blk_to_logical
f2fs: fix kbytes written stat for multi-device case
...
Linus Torvalds [Thu, 17 Dec 2020 19:00:37 +0000 (11:00 -0800)]
Merge tag 'for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, reiserfs, quota and writeback updates from Jan Kara:
- a couple of quota fixes (mostly for problems found by syzbot)
- several ext2 cleanups
- one fix for reiserfs crash on corrupted image
- a fix for spurious warning in writeback code
* tag 'for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
writeback: don't warn on an unregistered BDI in __mark_inode_dirty
fs: quota: fix array-index-out-of-bounds bug by passing correct argument to vfs_cleanup_quota_inode()
reiserfs: add check for an invalid ih_entry_count
ext2: Fix fall-through warnings for Clang
fs/ext2: Use ext2_put_page
docs: filesystems: Reduce ext2.rst to one top-level heading
quota: Sanity-check quota file headers on load
quota: Don't overflow quota file offsets
ext2: Remove unnecessary blank
fs/quota: update quota state flags scheme with project quota flags
Davide Caratti [Wed, 16 Dec 2020 18:33:29 +0000 (19:33 +0100)]
net/sched: sch_taprio: reset child qdiscs before freeing them
syzkaller shows that packets can still be dequeued while taprio_destroy()
is running. Let sch_taprio use the reset() function to cancel the advance
timer and drop all skbs from the child qdiscs.
Simon Horman [Wed, 16 Dec 2020 14:57:01 +0000 (15:57 +0100)]
nfp: move indirect block cleanup to flower app stop callback
The indirect block cleanup may cause control messages to be sent
if offloaded flows are present. However, by the time the flower app
cleanup callback is called txbufs are no longer available and attempts
to send control messages result in a NULL-pointer dereference in
nfp_ctrl_tx_one().
This problem may be resolved by moving the indirect block cleanup
to the stop callback, where txbufs are still available.
Colin Ian King [Wed, 16 Dec 2020 12:36:04 +0000 (12:36 +0000)]
octeontx2-af: Fix undetected unmap PF error check
Currently the check for an unmap PF error is always going to be false
because intr_val is a 32 bit int and is being bit-mask checked against
1ULL << 32. Fix this by making intr_val a u64 to match the type at it
is copied from, namely npa_event_context->npa_af_rvu_ge.
Linus Torvalds [Thu, 17 Dec 2020 18:56:27 +0000 (10:56 -0800)]
Merge tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"A few fsnotify fixes from Amir fixing fallout from big fsnotify
overhaul a few months back and an improvement of defaults limiting
maximum number of inotify watches from Waiman"
* tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix events reported to watching parent and child
inotify: convert to handle_inode_event() interface
fsnotify: generalize handle_inode_event()
inotify: Increase default inotify.max_user_watches limit to 1048576
Jakub Kicinski [Thu, 17 Dec 2020 18:25:10 +0000 (10:25 -0800)]
Merge branch 'mptcp-a-bunch-of-assorted-fixes'
Paolo Abeni says:
====================
mptcp: a bunch of assorted fixes
This series pulls a few fixes for the MPTCP datapath.
Most issues addressed here has been recently introduced
with the recent reworks, with the notable exception of
the first patch, which addresses an issue present since
the early days
====================
Paolo Abeni [Wed, 16 Dec 2020 11:48:35 +0000 (12:48 +0100)]
mptcp: fix pending data accounting
When sendmsg() needs to wait for memory, the pending data
is not updated. That causes a drift in forward memory allocation,
leading to stall and/or warnings at socket close time.
This change addresses the above issue moving the pending data
counter update inside the sendmsg() main loop.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Paolo Abeni [Wed, 16 Dec 2020 11:48:34 +0000 (12:48 +0100)]
mptcp: push pending frames when subflow has free space
When multiple subflows are active, we can receive a
window update on subflow with no write space available.
MPTCP will try to push frames on such subflow and will
fail. Pending frames will be pushed only after receiving
a window update on a subflow with some wspace available.
Overall the above could lead to suboptimal aggregate
bandwidth usage.
Instead, we should try to push pending frames as soon as
the subflow reaches both conditions mentioned above.
We can finally enable self-tests with asymmetric links,
as the above makes them finally pass.
Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Paolo Abeni [Wed, 16 Dec 2020 11:48:33 +0000 (12:48 +0100)]
mptcp: properly annotate nested lock
MPTCP closes the subflows while holding the msk-level lock.
While acquiring the subflow socket lock we need to use the
correct nested annotation, or we can hit a lockdep splat
at runtime.
Parav Pandit [Sun, 13 Dec 2020 12:06:41 +0000 (14:06 +0200)]
net/mlx5: Fix compilation warning for 32-bit platform
MLX5_GENERAL_OBJECT_TYPES types bitfield is 64-bit field.
Defining an enum for such bit fields on 32-bit platform results in below
warning.
./include/vdso/bits.h:7:26: warning: left shift count >= width of type [-Wshift-count-overflow]
^
./include/linux/mlx5/mlx5_ifc.h:10716:46: note: in expansion of macro ‘BIT’
MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
^~~
tools headers UAPI: Sync linux/stat.h with the kernel sources
To pick the changes in:
72d1249e2ffdbc34 ("uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT")
That don't cause any change in tooling, just addresses this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/stat.h' differs from latest version at 'include/uapi/linux/stat.h'
diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h
Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
perf test: Make sample-parsing test aware of PERF_SAMPLE_{CODE,DATA}_PAGE_SIZE
To fix this:
$ perf test -v 27
27: Sample parsing :
--- start ---
test child forked, pid 586013
sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating
test child finished with -1
---- end ----
Sample parsing: FAILED!
$
This patchset is still not completely merged, so when adding the
PERF_SAMPLE_CODE_PAGE_SIZE to 'struct perf_sample' we need to add the
bits added in this patch for 'perf_sample.data_page_size'.
Jiri Olsa [Mon, 14 Dec 2020 10:54:48 +0000 (11:54 +0100)]
perf tools: Add support to read build id from compressed elf
Adding support to decompress file before reading build id.
Adding filename__read_build_id and change its current versions to
read_build_id.
Shutting down stderr output of perf list in the shell test:
82: Check open filename arg using perf trace + vfs_getname : Ok
because with decompression code in the place we the
filename__read_build_id function is more verbose in case
of error and the test did not account for that.
Namhyung Kim [Thu, 10 Dec 2020 06:13:01 +0000 (15:13 +0900)]
perf report: Support --header-only for pipe mode
The --header-only checks file header and prints the feature data. But
as pipe mode doesn't have it in the header it prints almost nothing.
Change it to process first few records until it founds HEADER_FEATURE.
Before:
$ perf record -o- true | perf report -i- --header-only
# ========
# captured on : Thu Dec 10 14:34:59 2020
# header version : 1
# data offset : 0
# data size : 0
# feat offset : 0
# ========
#
After:
$ perf record -o- true | perf report -i- --header-only
# ========
# captured on : Thu Dec 10 14:49:11 2020
# header version : 1
# data offset : 0
# data size : 0
# feat offset : 0
# ========
#
# hostname : balhae
# os release : 5.7.17-1xxx
# perf version : 5.10.rc6.gdb0ea13cc741
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
# cpuid : GenuineIntel,6,142,12
# total memory : 16158916 kB
# cmdline : perf record -o- true
# event : name = cycles, , id = { 81, 82, 83, 84, 85, 86, 87, 88 }, size = 120, ...
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: intel_pt = 9, intel_bts = 8, software = 1, power = 20, uprobe = 7, ...
# time of first sample : 0.000000
# time of last sample : 0.000000
# sample duration : 0.000 ms
# MEM_TOPOLOGY info available, use -I to display
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
John Garry [Fri, 4 Dec 2020 11:10:13 +0000 (19:10 +0800)]
perf metricgroup: Split up metricgroup__print()
To aid supporting system event metric groups, break up the function
metricgroup__print() into a part which iterates metrics and a part which
actually "prints" the metric.
Notice that only the results from one PMU are included. Fix the logic of
find_evsel_group() to enable events which apply to multiple PMUs, by
checking if the event pmu_name matches that of the metric event.
With that, "perf stat" now gives:
unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_1/umask=0x81,event=0x22/
unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_0/umask=0x81,event=0x22/
unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_1/umask=0x41,event=0x22/
unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_0/umask=0x41,event=0x22/
Control descriptor is not initialized
unc_cbo_xsnp_response.miss_eviction: 423798310009041001000904100
unc_cbo_xsnp_response.miss_xcore: 218643 10009041001000904100
unc_cbo_xsnp_response.miss_eviction: 425414810009026291000902629
unc_cbo_xsnp_response.miss_xcore: 213352 10009026291000902629
John Garry [Fri, 4 Dec 2020 11:10:10 +0000 (19:10 +0800)]
perf pmu: Add pmu_add_sys_aliases()
Add pmu_add_sys_aliases() to add system PMU events aliases.
For adding system PMU events, iterate through all the events for all SoC
event tables in pmu_sys_event_tables[].
Matches must satisfy both:
- PMU identifier matches event "compat" value
- event "Unit" member must match, same as uncore event aliases matched by
CPUID
perf evsel: Emit warning about kernel not supporting the data page size sample_type bit
Before we had this unhelpful message:
$ perf record --data-page-size sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles:u).
/bin/dmesg | grep -i perf may provide additional information.
$
Add support to the perf_missing_features variable to remember what
caused evsel__open() to fail and then use that information in
evsel__open_strerror().
$ perf record --data-page-size sleep 1
Error:
Asking for the data page size isn't supported by this kernel.
$
Kan Liang [Mon, 30 Nov 2020 17:27:52 +0000 (09:27 -0800)]
tools headers UAPI: Update tools's copy of linux/perf_event.h
To get the changes in:
commit 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE")
commit 995f088efebe ("perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE")
This silences this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h'
differs from latest version at 'include/uapi/linux/perf_event.h'
diff -u tools/include/uapi/linux/perf_event.h
include/uapi/linux/perf_event.h
Jan Kratochvil [Fri, 4 Dec 2020 12:17:02 +0000 (09:17 -0300)]
perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder
elfutils needs to be provided main binary and separate debug info file
respectively. Providing separate debug info file instead of the main
binary is not sufficient.
One needs to try both supplied filename and its possible cache by its
build-id depending on the use case.
perf record: Fix memory leak when using '--user-regs=?' to list registers
When using 'perf record's option '-I' or '--user-regs=' along with
argument '?' to list available register names, memory of variable 'os'
allocated by strdup() needs to be released before __parse_regs()
returns, otherwise memory leak will occur.
Jiri Olsa [Thu, 3 Dec 2020 23:08:36 +0000 (00:08 +0100)]
tools build: Add missing libcap to test-all.bin target
We're missing -lcap in test-all.bin target, so in case it's the only
library missing (if more are missing test-all.bin fails anyway), we will
falsely claim that we detected it and fail build, like:
$ make
...
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... libbfd: [ on ]
... libbfd-buildid: [ on ]
... libcap: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
... disassembler-four-args: [ on ]
...
CC builtin-ftrace.o
In file included from builtin-ftrace.c:29:
util/cap.h:11:10: fatal error: sys/capability.h: No such file or directory
11 | #include <sys/capability.h>
| ^~~~~~~~~~~~~~~~~~
compilation terminated.
Linus Torvalds [Thu, 17 Dec 2020 17:27:57 +0000 (09:27 -0800)]
drm/edid: fix objtool warning in drm_cvt_modes()
Commit 991fcb77f490 ("drm/edid: Fix uninitialized variable in
drm_cvt_modes()") just replaced one warning with another.
The original warning about a possibly uninitialized variable was due to
the compiler not being smart enough to see that the case statement
actually enumerated all possible cases. And the initial fix was just to
add a "default" case that had a single "unreachable()", just to tell the
compiler that that situation cannot happen.
However, that doesn't actually fix the fundamental reason for the
problem: the compiler still doesn't see that the existing case
statements enumerate all possibilities, so the compiler will still
generate code to jump to that unreachable case statement. It just won't
complain about an uninitialized variable any more.
So now the compiler generates code to our inline asm marker that we told
it would not fall through, and end end result is basically random. We
have created a bridge to nowhere.
And then, depending on the random details of just exactly what the
compiler ends up doing, 'objtool' might end up complaining about the
conditional branches (for conditions that cannot happen, and that thus
will never be taken - but if the compiler was not smart enough to figure
that out, we can't expect objtool to do so) going off in the weeds.
So depending on how the compiler has laid out the result, you might see
something like this:
drivers/gpu/drm/drm_edid.o: warning: objtool: do_cvt_mode() falls through to next function drm_mode_detailed.isra.0()
and now you have a truly inscrutable warning that makes no sense at all
unless you start looking at whatever random code the compiler happened
to generate for our bare "unreachable()" statement.
IOW, don't use "unreachable()" unless you have an _active_ operation
that generates code that actually makes it obvious that something is not
reachable (ie an UD instruction or similar).
Solve the "compiler isn't smart enough" problem by just marking one of
the cases as "default", so that even when the compiler doesn't otherwise
see that we've enumerated all cases, the compiler will feel happy and
safe about there always being a valid case that initializes the 'width'
variable.
This also generates better code, since now the compiler doesn't generate
comparisons for five different possibilities (the four real ones and the
one that can't happen), but just for the three real ones and "the rest"
(which is that last one).
A smart enough compiler that sees that we cover all the cases won't care.