printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.
Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts. For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.
However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.
Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.
The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).
Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.
Merge tag 'for-linus-5.7-1' of git://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
"Bug fixes for main IPMI driver, kcs updates
A couple of bug fixes for the main IPMI driver, one functional and two
annotations.
The kcs driver has some significant updates that have been pending for
a while, but I forgot to include in next until a week ago. But this
code is only used by the people who are sending it to me, really, so
it's not a big deal. I did want it to sit in next for at least a week,
and it did result in a fix"
* tag 'for-linus-5.7-1' of git://github.com/cminyard/linux-ipmi:
ipmi: kcs: Fix aspeed_kcs_probe_of_v1()
ipmi: Add missing annotation for ipmi_ssif_lock_cond() and ipmi_ssif_unlock_cond()
ipmi: kcs: aspeed: Implement v2 bindings
ipmi: kcs: Finish configuring ASPEED KCS device before enable
dt-bindings: ipmi: aspeed: Introduce a v2 binding for KCS
ipmi: fix hung processes in __get_guid()
drivers: char: ipmi: ipmi_msghandler: Pass lockdep expression to RCU lists
Merge tag 'drm-next-2020-04-10' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"As expected, more fixes did turn up in the latter part of the week.
The drm_local_map build regression fix is here, along with temporary
disabling of the hugepage work due to some amdgpu related crashes.
Otherwise it's just a bunch of i915, and amdgpu fixes.
legacy:
- fix drm_local_map.offset type
ttm:
- temporarily disable hugepages to debug amdgpu problems.
prime:
- fix sg extraction
amdgpu:
- Various Renoir fixes
- Fix gfx clockgating sequence on gfx10
- RAS fixes
- Avoid MST property creation after registration
- Various cursor/viewport fixes
- Fix a confusing log message about optional firmwares
i915:
- Flush all the reloc_gpu batch (Chris)
- Ignore readonly failures when updating relocs (Chris)
- Fill all the unused space in the GGTT (Chris)
- Return the right vswing table (Jose)
- Don't enable DDI IO power on a TypeC port in TBT mode for ICL+ (Imre)
analogix_dp:
- probe fix
virtio:
- oob fix in object create"
* tag 'drm-next-2020-04-10' of git://anongit.freedesktop.org/drm/drm: (34 commits)
drm/ttm: Temporarily disable the huge_fault() callback
drm/bridge: analogix_dp: Split bind() into probe() and real bind()
drm/legacy: Fix type for drm_local_map.offset
drm/amdgpu/display: fix warning when compiling without debugfs
drm/amdgpu: unify fw_write_wait for new gfx9 asics
drm/amd/powerplay: error out on forcing clock setting not supported
drm/amdgpu: fix gfx hang during suspend with video playback (v2)
drm/amd/display: Check for null fclk voltage when parsing clock table
drm/amd/display: Acknowledge wm_optimized_required
drm/amd/display: Make cursor source translation adjustment optional
drm/amd/display: Calculate scaling ratios on every medium/full update
drm/amd/display: Program viewport when source pos changes for DCN20 hw seq
drm/amd/display: Fix incorrect cursor pos on scaled primary plane
drm/amd/display: change default pipe_split policy for DCN1
drm/amd/display: Translate cursor position by source rect
drm/amd/display: Update stream adjust in dc_stream_adjust_vmin_vmax
drm/amd/display: Avoid create MST prop after registration
drm/amdgpu/psp: dont warn on missing optional TA's
drm/amdgpu: update RAS related dmesg print
drm/amdgpu: resolve mGPU RAS query instability
...
Merge tag 'sound-fix-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes gathered since the previous update.
ALSA core:
- Regression fix for OSS PCM emulation
ASoC:
- Trivial fixes in reg bit mask ops, DAPM, DPCM and topology
- Lots of fixes for Intel-based devices
- Minor fixes for AMD, STM32, Qualcomm, Realtek
Others:
- Fixes for the bugs in mixer handling in HD-audio and ice1724
drivers that were caught by the recent kctl validator
- New quirks for HD-audio and USB-audio
Also this contains a fix for EDD firmware fix, which slipped from
anyone's hands"
* tag 'sound-fix-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (35 commits)
ALSA: hda: Add driver blacklist
ALSA: usb-audio: Add mixer workaround for TRX40 and co
ALSA: hda/realtek - Add quirk for MSI GL63
ALSA: ice1724: Fix invalid access for enumerated ctl items
ALSA: hda: Fix potential access overflow in beep helper
ASoC: cs4270: pull reset GPIO low then high
ALSA: hda/realtek - Add HP new mute led supported for ALC236
ALSA: hda/realtek - Add supported new mute Led for HP
ASoC: rt5645: Add platform-data for Medion E1239T
ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN MPWIN895CL tablet
ASoC: stm32: sai: Add missing cleanup
ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Alpha S
ASoC: Intel: atom: Fix uninitialized variable compiler warning
ASoC: Intel: atom: Check drv->lock is locked in sst_fill_and_send_cmd_unlocked
ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map()
ASoC: SOF: Turn "firmware boot complete" message into a dbg message
ALSA: usb-audio: Add Pioneer DJ DJM-250MK2 quirk
ALSA: pcm: oss: Fix regression by buffer overflow fix (again)
ALSA: pcm: oss: Fix regression by buffer overflow fix
edd: Use scnprintf() for avoiding potential buffer overflow
...
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is a batch of changes that didn't make it in the initial pull
request because the lpfc series had to be rebased to redo an incorrect
split.
It's basically driver updates to lpfc, target, bnx2fc and ufs with the
rest being minor updates except the sr_block_release one which fixes a
use after free introduced by the removal of the global mutex in the
first patch set"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (35 commits)
scsi: core: Add DID_ALLOC_FAILURE and DID_MEDIUM_ERROR to hostbyte_table
scsi: ufs: Use ufshcd_config_pwr_mode() when scaling gear
scsi: bnx2fc: fix boolreturn.cocci warnings
scsi: zfcp: use fallthrough;
scsi: aacraid: do not overwrite retval in aac_reset_adapter()
scsi: sr: Fix sr_block_release()
scsi: aic7xxx: Remove more FreeBSD-specific code
scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
scsi: ufs: set device as active power mode after resetting device
scsi: iscsi: Report unbind session event when the target has been removed
scsi: lpfc: Change default SCSI LUN QD to 64
scsi: libfc: rport state move to PLOGI if all PRLI retry exhausted
scsi: libfc: If PRLI rejected, move rport to PLOGI state
scsi: bnx2fc: Update the driver version to 2.12.13
scsi: bnx2fc: Fix SCSI command completion after cleanup is posted
scsi: bnx2fc: Process the RQE with CQE in interrupt context
scsi: target: use the stack for XCOPY passthrough cmds
scsi: target: increase XCOPY I/O size
scsi: target: avoid per-loop XCOPY buffer allocations
scsi: target: drop xcopy DISK BLOCK LENGTH debug
...
Steve French [Fri, 10 Apr 2020 02:42:18 +0000 (21:42 -0500)]
smb3: enable swap on SMB3 mounts
Add experimental support for allowing a swap file to be on an SMB3
mount. There are use cases where swapping over a secure network
filesystem is preferable. In some cases there are no local
block devices large enough, and network block devices can be
hard to setup and secure. And in some cases there are no
local block devices at all (e.g. with the recent addition of
remote boot over SMB3 mounts).
There are various enhancements that can be added later e.g.:
- doing a mandatory byte range lock over the swapfile (until
the Linux VFS is modified to notify the file system that an open
is for a swapfile, when the file can be opened "DENY_ALL" to prevent
others from opening it).
- pinning more buffers in the underlying transport to minimize memory
allocations in the TCP stack under the fs
- documenting how to create ACLs (on the server) to secure the
swapfile (or adding additional tools to cifs-utils to make it easier)
arch: nios2: Enable the common clk subsystem on Nios2
This patch adds support for common clock framework on Nios2. Clock
framework is commonly used in many drivers, and this patch makes it
available for the entire architecture, not just on a per-driver basis.
Merge tag 'libata-5.7-2020-04-09' of git://git.kernel.dk/linux-block
Pull libata fixes from Jens Axboe:
"A few followup changes/fixes for libata:
- PMP removal fix (Kai-Heng)
- Add remapped NVMe device attribute to sysfs (Kai-Heng)
- Remove redundant assignment (Colin)
- Add yet another Comet Lake ID (Jian-Hong)"
* tag 'libata-5.7-2020-04-09' of git://git.kernel.dk/linux-block:
ahci: Add Intel Comet Lake PCH RAID PCI ID
ata: ahci: Add sysfs attribute to show remapped NVMe device count
ata: ahci-imx: remove redundant assignment to ret
libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
Merge tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here's a set of fixes that should go into this merge window. This
contains:
- NVMe pull request from Christoph with various fixes
- Better discard support for loop (Evan)
- Only call ->commit_rqs() if we have queued IO (Keith)
- blkcg offlining fixes (Tejun)
- fix (and fix the fix) for busy partitions"
* tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block:
block: fix busy device checking in blk_drop_partitions again
block: fix busy device checking in blk_drop_partitions
nvmet-rdma: fix double free of rdma queue
blk-mq: don't commit_rqs() if none were queued
nvme-fc: Revert "add module to ops template to allow module references"
nvme: fix deadlock caused by ANA update wrong locking
nvmet-rdma: fix bonding failover possible NULL deref
loop: Better discard support for block devices
loop: Report EOPNOTSUPP properly
nvmet: fix NULL dereference when removing a referral
nvme: inherit stable pages constraint in the mpath stack device
blkcg: don't offline parent blkcg first
blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it
nvme-tcp: fix possible crash in recv error flow
nvme-tcp: don't poll a non-live queue
nvme-tcp: fix possible crash in write_zeroes processing
nvmet-fc: fix typo in comment
nvme-rdma: Replace comma with a semicolon
nvme-fcloop: fix deallocation of working context
nvme: fix compat address handling in several ioctls
Merge tag 'io_uring-5.7-2020-04-09' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Here's a set of fixes that either weren't quite ready for the first,
or came about from some intensive testing on memcached with 350K+
sockets.
Summary:
- Fixes for races or deadlocks around poll handling
- Don't double account fixed files against RLIMIT_NOFILE
- IORING_OP_OPENAT LFS fix
- Poll retry handling (Bijan)
- Missing finish_wait() for SQPOLL (Hillf)
- Cleanup/split of io_kiocb alloc vs ctx references (Pavel)
- Fixed file unregistration and init fixes (Xiaoguang)
- Various little fixes (Xiaoguang, Pavel, Colin)"
* tag 'io_uring-5.7-2020-04-09' of git://git.kernel.dk/linux-block:
io_uring: punt final io_ring_ctx wait-and-free to workqueue
io_uring: fix fs cleanup on cqe overflow
io_uring: don't read user-shared sqe flags twice
io_uring: remove req init from io_get_req()
io_uring: alloc req only after getting sqe
io_uring: simplify io_get_sqring
io_uring: do not always copy iovec in io_req_map_rw()
io_uring: ensure openat sets O_LARGEFILE if needed
io_uring: initialize fixed_file_data lock
io_uring: remove redundant variable pointer nxt and io_wq_assign_next call
io_uring: fix ctx refcounting in io_submit_sqes()
io_uring: process requests completed with -EAGAIN on poll list
io_uring: remove bogus RLIMIT_NOFILE check in file registration
io_uring: use io-wq manager as backup task if task is exiting
io_uring: grab task reference for poll requests
io_uring: retry poll if we got woken with non-matching mask
io_uring: add missing finish_wait() in io_sq_thread()
io_uring: refactor file register/unregister/update handling
Merge tag 'xfs-5.7-merge-12' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more xfs updates from Darrick Wong:
"As promised last week, this batch changes how xfs interacts with
memory reclaim; how the log batches and throttles log items; how hard
writes near ENOSPC will try to squeeze more space out of the
filesystem; and hopefully fix the last of the umount hangs after a
catastrophic failure.
Summary:
- Validate the realtime geometry in the superblock when mounting
- Refactor a bunch of tricky flag handling in the log code
- Flush the CIL more judiciously so that we don't wait until there
are millions of log items consuming a lot of memory.
- Throttle transaction commits to prevent the xfs frontend from
flooding the CIL with too many log items.
- Account metadata buffers correctly for memory reclaim.
- Mark slabs properly for memory reclaim. These should help reclaim
run more effectively when XFS is using a lot of memory.
- Don't write a garbage log record at unmount time if we're trying to
trigger summary counter recalculation at next mount.
- Don't block the AIL on locked dquot/inode buffers; instead trigger
its backoff mechanism to give the lock holder a chance to finish
up.
- Ratelimit writeback flushing when buffered writes encounter ENOSPC.
- Other minor cleanups.
- Make reflink a synchronous operation when the fs is mounted with
wsync or sync, which means that now we force the log to disk to
record the changes"
* tag 'xfs-5.7-merge-12' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
xfs: reflink should force the log out if mounted with wsync
xfs: factor out a new xfs_log_force_inode helper
xfs: fix inode number overflow in ifree cluster helper
xfs: remove redundant variable assignment in xfs_symlink()
xfs: ratelimit inode flush on buffered write ENOSPC
xfs: return locked status of inode buffer on xfsaild push
xfs: trylock underlying buffer on dquot flush
xfs: remove unnecessary ternary from xfs_create
xfs: don't write a corrupt unmount record to force summary counter recalc
xfs: factor inode lookup from xfs_ifree_cluster
xfs: tail updates only need to occur when LSN changes
xfs: factor common AIL item deletion code
xfs: correctly acount for reclaimable slabs
xfs: Improve metadata buffer reclaim accountability
xfs: don't allow log IO to be throttled
xfs: Throttle commits on delayed background CIL push
xfs: Lower CIL flush limit for large logs
xfs: remove some stale comments from the log code
xfs: refactor unmount record writing
xfs: merge xlog_commit_record with xlog_write_done
...
Merge tag 'acpi-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These prevent a false-positive static checker warning from triggering
in the ACPI EC driver (Rafael Wysocki), fix white space in an ACPI
document (Vilhelm Prytz) and add static annotation to one variable
(Jason Yan)"
* tag 'acpi-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI, x86/boot: make acpi_nobgrt static
Documentation: firmware-guide: ACPI: fix table alignment in namespace.rst
ACPI: EC: Fix up fast path check in acpi_ec_add()
Merge tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"Rework compat ioctl handling in the user space hibernation interface
(Christoph Hellwig) and fix a typo in a function name in the cpuidle
haltpoll driver (Yihao Wu)"
* tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle-haltpoll: Fix small typo
PM / sleep: handle the compat case in snapshot_set_swap_area()
PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper
io_uring: punt final io_ring_ctx wait-and-free to workqueue
We can't reliably wait in io_ring_ctx_wait_and_kill(), since the
task_works list isn't ordered (in fact it's LIFO ordered). We could
either fix this with a separate task_works list for io_uring work, or
just punt the wait-and-free to async context. This ensures that
task_work that comes in while we're shutting down is processed
correctly. If we don't go async, we could have work past the fput()
work for the ring that depends on work that won't be executed until
after we're done with the wait-and-free. But as this operation is
blocking, it'll never get a chance to run.
This was reproduced with hundreds of thousands of sockets running
memcached, haven't been able to reproduce this synthetically.
Dave Airlie [Thu, 9 Apr 2020 20:42:21 +0000 (06:42 +1000)]
Merge tag 'drm-intel-next-fixes-2020-04-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Flush all the reloc_gpu batch (Chris)
- Ignore readonly failures when updating relocs (Chris)
- Fill all the unused space in the GGTT (Chris)
- Return the right vswing table (Jose)
- Don't enable DDI IO power on a TypeC port in TBT mode for ICL+ (Imre)
drm/ttm: Temporarily disable the huge_fault() callback
With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are
errors like:
BUG: non-zero pgtables_bytes on freeing mm
and:
BUG: Bad rss-counter state
with TTM transparent huge-pages.
Until we've figured out what other TTM drivers do differently compared to
vmwgfx, disable the huge_fault() callback, eliminating transhuge
page-table entries.
Merge tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Only a small cleanup this time around: a trivial conversion of
zero-length arrays to flexible arrays"
* tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
kernel: module: Replace zero-length array with flexible-array member
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Ensure that the compiler and linker versions are aligned so that ld
doesn't complain about not understanding a .note.gnu.property section
(emitted when pointer authentication is enabled).
- Force -mbranch-protection=none when the feature is not enabled, in
case a compiler may choose a different default value.
- Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and
rarely enabled.
- Fix checking 16-bit Thumb-2 instructions checking mask in the
emulation of the SETEND instruction (it could match the bottom half
of a 32-bit Thumb-2 instruction).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature
arm64: Always force a branch protection mode when the compiler has one
arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch
init/kconfig: Add LD_VERSION Kconfig
Merge tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"The bulk of this is the series to make CONFIG_COMPAT user-selectable,
it's been around for a long time but was blocked behind the
syscall-in-C series.
Plus there's also a few fixes and other minor things.
Summary:
- A fix for a crash in machine check handling on pseries (ie. guests)
- A small series to make it possible to disable CONFIG_COMPAT, and
turn it off by default for ppc64le where it's not used.
- A few other miscellaneous fixes and small improvements.
Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
Nicholas Piggin, Stephen Boyd, Wen Xiong"
* tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Always build the tm-poison test 64-bit
powerpc: Improve ppc_save_regs()
Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
powerpc/perf: split callchain.c by bitness
powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
powerpc/64: make buildable without CONFIG_COMPAT
powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
powerpc/perf: consolidate read_user_stack_32
powerpc: move common register copy functions from signal_32.c to signal.c
powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
powerpc/ps3: Remove an unneeded NULL check
powerpc/ps3: Remove duplicate error message
powerpc/powernv: Re-enable imc trace-mode in kernel
powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
powerpc/pseries: Fix MCE handling on pseries
selftests/eeh: Skip ahci adapters
powerpc/64s: Fix doorbell wakeup msgclr optimisation
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu update from Greg Ungerer:
"Only a single commit, to remove all use of the obsolete setup_irq()
calls within the m68knommu architecture code"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: Replace setup_irq() by request_irq()
Merge tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains a handful of new features:
- Partial support for the Kendryte K210.
There are still a few outstanding issues that I have patches for,
but I don't actually have a board to test them so they're not
included yet.
- SBI v0.2 support.
- Fixes to support for building with LLVM-based toolchains. The
resulting images are known not to boot yet.
I don't anticipate a part two, but I'll probably have something early
in the RCs to finish up the K210 support"
* tag 'riscv-for-linus-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits)
riscv: create a loader.bin boot image for Kendryte SoC
riscv: Kendryte K210 default config
riscv: Add Kendryte K210 device tree
riscv: Select required drivers for Kendryte SOC
riscv: Add Kendryte K210 SoC support
riscv: Add SOC early init support
riscv: Unaligned load/store handling for M_MODE
RISC-V: Support cpu hotplug
RISC-V: Add supported for ordered booting method using HSM
RISC-V: Add SBI HSM extension definitions
RISC-V: Export SBI error to linux error mapping function
RISC-V: Add cpu_ops and modify default booting method
RISC-V: Move relocate and few other functions out of __init
RISC-V: Implement new SBI v0.2 extensions
RISC-V: Introduce a new config for SBI v0.1
RISC-V: Add SBI v0.2 extension definitions
RISC-V: Add basic support for SBI v0.2
RISC-V: Mark existing SBI as 0.1 SBI.
riscv: Use macro definition instead of magic number
riscv: Add support to dump the kernel page tables
...
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
> (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&pid->wait_pidfd);
> local_irq_disable();
> lock(tasklist_lock);
> lock(&pid->wait_pidfd);
> <Interrupt>
> lock(tasklist_lock);
>
> *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:
The problem is that because wait_pidfd.lock is taken under the tasklist
lock. It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.
Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock. That should be safe and I think the code will eventually
do that. It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.
It is also true that my pre-merge window testing was insufficient. So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things. I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.
It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals. That will require taking pid->lock with irqs disabled.
Pavel Begunkov [Thu, 9 Apr 2020 05:17:59 +0000 (08:17 +0300)]
io_uring: fix fs cleanup on cqe overflow
If completion queue overflow occurs, __io_cqring_fill_event() will
update req->cflags, which is in a union with req->work and happens to
be aliased to req->work.fs. Following io_free_req() ->
io_req_work_drop_env() may get a bunch of different problems (miscount
fs->users, segfault, etc) on cleaning @fs.
Commit 2f62f36e62daec ("x86/xen: Make the boot CPU idle task reliable")
introduced a regression for booting 32 bit Xen PV guests: the address
of the initial stack needs to be a virtual one.
Marek Szyprowski [Tue, 10 Mar 2020 10:34:27 +0000 (11:34 +0100)]
drm/bridge: analogix_dp: Split bind() into probe() and real bind()
Analogix_dp driver acquires all its resources in the ->bind() callback,
what is a bit against the component driver based approach, where the
driver initialization is split into a probe(), where all resources are
gathered, and a bind(), where all objects are created and a compound
driver is initialized.
Extract all the resource related operations to analogix_dp_probe() and
analogix_dp_remove(), then call them before/after registration of the
device components from the main Exynos DP and Rockchip DP drivers. Also
move the plat_data initialization to the probe() to make it available for
the analogix_dp_probe() function.
This fixes the multiple calls to the bind() of the DRM compound driver
when the DP PHY driver is not yet loaded/probed:
Merge tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The main items are:
- support for asynchronous create and unlink (Jeff Layton).
Creates and unlinks are satisfied locally, without waiting for a
reply from the MDS, provided the client has been granted
appropriate caps (new in v15.y.z ("Octopus") release). This can be
a big help for metadata heavy workloads such as tar and rsync.
Opt-in with the new nowsync mount option.
- multiple blk-mq queues for rbd (Hannes Reinecke and myself).
When the driver was converted to blk-mq, we settled on a single
blk-mq queue because of a global lock in libceph and some other
technical debt. These have since been addressed, so allocate a
queue per CPU to enhance parallelism.
- don't hold onto caps that aren't actually needed (Zheng Yan).
This has been our long-standing behavior, but it causes issues with
some active/standby applications (synchronous I/O, stalls if the
standby goes down, etc).
- .snap directory timestamps consistent with ceph-fuse (Luis
Henriques)"
* tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits)
ceph: fix snapshot directory timestamps
ceph: wait for async creating inode before requesting new max size
ceph: don't skip updating wanted caps when cap is stale
ceph: request new max size only when there is auth cap
ceph: cleanup return error of try_get_cap_refs()
ceph: return ceph_mdsc_do_request() errors from __get_parent()
ceph: check all mds' caps after page writeback
ceph: update i_requested_max_size only when sending cap msg to auth mds
ceph: simplify calling of ceph_get_fmode()
ceph: remove delay check logic from ceph_check_caps()
ceph: consider inode's last read/write when calculating wanted caps
ceph: always renew caps if mds_wanted is insufficient
ceph: update dentry lease for async create
ceph: attempt to do async create when possible
ceph: cache layout in parent dir on first sync create
ceph: add new MDS req field to hold delegated inode number
ceph: decode interval_sets for delegated inos
ceph: make ceph_fill_inode non-static
ceph: perform asynchronous unlink if we have sufficient caps
ceph: don't take refs to want mask unless we have all bits
...
Merge tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix failure to copy-up files from certain NFSv4 mounts
- Sort out inconsistencies between st_ino and i_ino (used in /proc/locks)
- Allow consistent (POSIX-y) inode numbering in more cases
- Allow virtiofs to be used as upper layer
- Miscellaneous cleanups and fixes
* tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: document xino expected behavior
ovl: enable xino automatically in more cases
ovl: avoid possible inode number collisions with xino=on
ovl: use a private non-persistent ino pool
ovl: fix WARN_ON nlink drop to zero
ovl: fix a typo in comment
ovl: replace zero-length array with flexible-array member
ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for old
ovl: strict upper fs requirements for remote upper fs
ovl: check if upper fs supports RENAME_WHITEOUT
ovl: allow remote upper
ovl: decide if revalidate needed on a per-dentry basis
ovl: separate detection of remote upper layer from stacked overlay
ovl: restructure dentry revalidation
ovl: ignore failure to copy up unknown xattrs
ovl: document permission model
ovl: simplify i_ino initialization
ovl: factor out helper ovl_get_root()
ovl: fix out of date comment and unreachable code
ovl: fix value of i_ino for lower hardlink corner case
Merge tag 'linux-watchdog-5.7-rc1' of git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add TI K3 RTI watchdog
- add stop_on_reboot parameter to control reboot policy
- wm831x_wdt: Remove GPIO handling
- several small fixes, improvements and clean-ups
* tag 'linux-watchdog-5.7-rc1' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: Add K3 RTI watchdog support
dt-bindings: watchdog: Add support for TI K3 RTI watchdog
watchdog: ziirave_wdt: change name to be more specific
watchdog: orion: use 0 for unset heartbeat
watchdog: npcm: remove whitespaces
watchdog: reset last_hw_keepalive time at start
watchdog: imx2_wdt: Drop .remove callback
watchdog: Add stop_on_reboot parameter to control reboot policy
watchdog: wm831x_wdt: Remove GPIO handling
watchdog: imx7ulp: Remove unused include of init.h
watchdog: imx_sc_wdt: Remove unused includes
watchdog: qcom: Use irq flags from firmware
watchdog: pm8916_wdt: Add system sleep callbacks
watchdog: qcom-wdt: disable pretimeout on timer platform
Merge tag 'tag-chrome-platform-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
cros-usbpd-notify and cros_ec_typec:
- Add a new notification driver that handles and dispatches USB PD
related events to other drivers.
- Add a Type C connector class driver for cros_ec
CrOS EC:
- Introduce a new cros_ec_cmd_xfer_status helper
Sensors/iio:
- A series from Gwendal that adds Cros EC sensor hub FIFO support
Wilco EC:
- Fix a build warning.
- Platform data shouldn't include kernel.h
Misc:
- i2c api conversion complete, with i2c_new_client_device instead of
i2c_new_device in chromeos_laptop.
- Replace zero-length array with flexible-array member in
cros_ec_chardev and wilco_ec
- Update new structure for SPI transfer delays in cros_ec_spi
* tag 'tag-chrome-platform-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (34 commits)
platform/chrome: cros_ec_spi: Wait for USECS, not NSECS
iio: cros_ec: Use Hertz as unit for sampling frequency
iio: cros_ec: Report hwfifo_watermark_max
iio: cros_ec: Expose hwfifo_timeout
iio: cros_ec: Remove pm function
iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO
iio: expose iio_device_set_clock
iio: cros_ec: Move function description to .c file
platform/chrome: cros_ec_sensorhub: Add median filter
platform/chrome: cros_ec_sensorhub: Add code to spread timestmap
platform/chrome: cros_ec_sensorhub: Add FIFO support
platform/chrome: cros_ec_sensorhub: Add the number of sensors in sensorhub
platform/chrome: chromeos_laptop: make I2C API conversion complete
platform/chrome: wilco_ec: event: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_chardev: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_typec: Update port info from EC
platform/chrome: Add Type C connector class driver
platform/chrome: cros_usbpd_notify: Pull PD_HOST_EVENT status
platform/chrome: cros_usbpd_notify: Amend ACPI driver to plat
platform/chrome: cros_usbpd_notify: Add driver data struct
...
Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and dax updates from Dan Williams:
"There were multiple touches outside of drivers/nvdimm/ this round to
add cross arch compatibility to the devm_memremap_pages() interface,
enhance numa information for persistent memory ranges, and add a
zero_page_range() dax operation.
This cycle I switched from the patchwork api to Konstantin's b4 script
for collecting tags (from x86, PowerPC, filesystem, and device-mapper
folks), and everything looks to have gone ok there. This has all
appeared in -next with no reported issues.
Summary:
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach
in the platform-memory subsystem before the platform will consider
them power-fail protected.
- Promote numa_map_to_online_node() to a cross-kernel generic
facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit
test compilation fixups.
- Fixup some flexible-array declarations"
* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
dax: Move mandatory ->zero_page_range() check in alloc_dax()
dax,iomap: Add helper dax_iomap_zero() to zero a range
dax: Use new dax zero page method for zeroing a page
dm,dax: Add dax zero_page_range operation
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dax, pmem: Add a dax operation zero_page_range
pmem: Add functions for reading/writing page to/from pmem
libnvdimm: Update persistence domain value for of_pmem and papr_scm device
tools/test/nvdimm: Fix out of tree build
libnvdimm/region: Fix build error
libnvdimm/region: Replace zero-length array with flexible-array member
libnvdimm/label: Replace zero-length array with flexible-array member
ACPI: NFIT: Replace zero-length array with flexible-array member
libnvdimm/region: Introduce an 'align' attribute
libnvdimm/region: Introduce NDD_LABELING
libnvdimm/namespace: Enforce memremap_compat_align()
libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
libnvdimm: Out of bounds read in __nd_ioctl()
acpi/nfit: improve bounds checking for 'func'
mm/memremap_pages: Introduce memremap_compat_align()
...
Aaron Liu [Tue, 7 Apr 2020 09:46:04 +0000 (17:46 +0800)]
drm/amdgpu: unify fw_write_wait for new gfx9 asics
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM
packet with opration=1.
Evan Quan [Fri, 3 Apr 2020 05:19:14 +0000 (13:19 +0800)]
drm/amd/powerplay: error out on forcing clock setting not supported
For Arcturus, forcing clock to some specific level is not supported
with 54.18 and onwards SMU firmware. As according to firmware team,
they adopt new gfx dpm tuned parameters which can cover all the use
case in a much smooth way. Thus setting through driver interface
is not needed and maybe do a disservice.
drm/amdgpu: fix gfx hang during suspend with video playback (v2)
The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.
v2: Use disable the GFX CGPG instead of RLC safe mode guard.
The commit 2e05ea5cdc1a ("dma-mapping: implement dma_map_single_attrs using
dma_map_page_attrs") removed "dma_debug_page" enum, but missed to update
type2name string table. This causes incorrect displaying of dma allocation
type.
Fix it by removing "page" string from type2name string table and switch to
use named initializers.
dma-direct: fix data truncation in dma_direct_get_required_mask()
The upper 32-bit physical address gets truncated inadvertently
when dma_direct_get_required_mask() invokes phys_to_dma_direct().
This results in dma_addressing_limited() return incorrect value
when used in platforms with LPAE enabled.
Fix it here by explicitly type casting 'max_pfn' to phys_addr_t
in order to prevent overflow of intermediate value while evaluating
'(max_pfn - 1) << PAGE_SHIFT'.
kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
As Documentation/kbuild/llvm.rst implies, building the kernel with a
full set of LLVM tools gets very verbose and unwieldy.
Provide a single switch LLVM=1 to use Clang and LLVM tools instead
of GCC and Binutils. You can pass it from the command line or as an
environment variable.
Please note LLVM=1 does not turn on the integrated assembler. You need
to pass LLVM_IAS=1 to use it. When the upstream kernel is ready for the
integrated assembler, I think we can make it default.
We discussed what we need, and we agreed to go with a simple boolean
flag that switches both target and host tools:
When multiple versions of LLVM are installed, I just thought supporting
LLVM_DIR=/path/to/my/llvm/bin/ might be useful.
CC = $(LLVM_DIR)clang
LD = $(LLVM_DIR)ld.lld
...
However, we can handle this by modifying PATH. So, we decided to not do
this.
- LLVM_SUFFIX
Some distributions (e.g. Debian) package specific versions of LLVM with
naming conventions that use the version as a suffix.
CC = clang$(LLVM_SUFFIX)
LD = ld.lld(LLVM_SUFFIX)
...
will allow a user to pass LLVM_SUFFIX=-11 to use clang-11 etc.,
but the suffixed versions in /usr/bin/ are symlinks to binaries in
/usr/lib/llvm-#/bin/, so this can also be handled by PATH.
The 'AS' variable is unused for building the kernel. Only the remaining
usage is to turn on the integrated assembler. A boolean flag is a better
fit for this purpose.
AS=clang was added for experts. So, I replaced it with LLVM_IAS=1,
breaking the backward compatibility.
Merge tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU support for the TLB range invalidation command in SMMUv3.2
- ARM-SMMU introduction of command batching helpers to batch up CD and
ATC invalidation
- ARM-SMMU support for PCI PASID, along with necessary PCI symbol
exports
- Introduce a generic (actually rename an existing) IOMMU related
pointer in struct device and reduce the IOMMU related pointers
- Some fixes for the OMAP IOMMU driver to make it build on 64bit
architectures
- Various smaller fixes and improvements
* tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (39 commits)
iommu: Move fwspec->iommu_priv to struct dev_iommu
iommu/virtio: Use accessor functions for iommu private data
iommu/qcom: Use accessor functions for iommu private data
iommu/mediatek: Use accessor functions for iommu private data
iommu/renesas: Use accessor functions for iommu private data
iommu/arm-smmu: Use accessor functions for iommu private data
iommu/arm-smmu: Refactor master_cfg/fwspec usage
iommu/arm-smmu-v3: Use accessor functions for iommu private data
iommu: Introduce accessors for iommu private data
iommu/arm-smmu: Fix uninitilized variable warning
iommu: Move iommu_fwspec to struct dev_iommu
iommu: Rename struct iommu_param to dev_iommu
iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
ACPI/IORT: Remove direct access of dev->iommu_fwspec
iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
iommu/virtio: Fix freeing of incomplete domains
iommu/virtio: Fix sparse warning
iommu/vt-d: Add build dependency on IOASID
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"s390:
- nested virtualization fixes
x86:
- split svm.c
- miscellaneous fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: fix crash cleanup when KVM wasn't used
KVM: X86: Filter out the broadcast dest for IPI fastpath
KVM: s390: vsie: Fix possible race when shadowing region 3 tables
KVM: s390: vsie: Fix delivery of addressing exceptions
KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
KVM: nVMX: don't clear mtf_pending when nested events are blocked
KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter
KVM: SVM: Split svm_vcpu_run inline assembly to separate file
KVM: SVM: Move SEV code to separate file
KVM: SVM: Move AVIC code to separate file
KVM: SVM: Move Nested SVM Implementation to nested.c
kVM SVM: Move SVM related files to own sub-directory
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- Some bug fixes
- The new vdpa subsystem with two first drivers
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM"
vdpa: move to drivers/vdpa
virtio: Intel IFC VF driver for VDPA
vdpasim: vDPA device simulator
vhost: introduce vDPA-based backend
virtio: introduce a vDPA based transport
vDPA: introduce vDPA bus
vringh: IOTLB support
vhost: factor out IOTLB
vhost: allow per device message handler
vhost: refine vhost and vringh kconfig
virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
virtio-net: Introduce hash report feature
virtio-net: Introduce RSS receive steering feature
virtio-net: Introduce extended RSC feature
tools/virtio: option to build an out of tree module
Fredrik Strupe [Wed, 8 Apr 2020 11:29:41 +0000 (13:29 +0200)]
arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
For thumb instructions, call_undef_hook() in traps.c first reads a u16,
and if the u16 indicates a T32 instruction (u16 >= 0xe800), a second
u16 is read, which then makes up the the lower half-word of a T32
instruction. For T16 instructions, the second u16 is not read,
which makes the resulting u32 opcode always have the upper half set to
0.
However, having the upper half of instr_mask in the undef_hook set to 0
masks out the upper half of all thumb instructions - both T16 and T32.
This results in trapped T32 instructions with the lower half-word equal
to the T16 encoding of setend (b650) being matched, even though the upper
half-word is not 0000 and thus indicates a T32 opcode.
An example of such a T32 instruction is eaa0b650, which should raise a
SIGILL since T32 instructions with an eaa prefix are unallocated as per
Arm ARM, but instead works as a SETEND because the second half-word is set
to b650.
This patch fixes the issue by extending instr_mask to include the
upper u32 half, which will still match T16 instructions where the upper
half is 0, but not T32 instructions.
mm/gup: Let __get_user_pages_locked() return -EINTR for fatal signal
__get_user_pages_locked() will return 0 instead of -EINTR after commit 4426e945df588 ("mm/gup: allow VM_FAULT_RETRY for multiple times") which
added extra code to allow gup detect fatal signal faster.
Pavel Begunkov [Wed, 8 Apr 2020 05:58:45 +0000 (08:58 +0300)]
io_uring: remove req init from io_get_req()
io_get_req() do two different things: io_kiocb allocation and
initialisation. Move init part out of it and rename into
io_alloc_req(). It's simpler this way and also have better data
locality.
Pavel Begunkov [Wed, 8 Apr 2020 05:58:44 +0000 (08:58 +0300)]
io_uring: alloc req only after getting sqe
As io_get_sqe() split into 2 stage get/consume, get an sqe before
allocating io_kiocb, so no free_req*() for a failure case is needed,
and inline back __io_req_do_free(), which has only 1 user.
Pavel Begunkov [Wed, 8 Apr 2020 05:58:43 +0000 (08:58 +0300)]
io_uring: simplify io_get_sqring
Make io_get_sqring() care only about sqes themselves, not initialising
the io_kiocb. Also, split it into get + consume, that will be helpful in
the future.
Xiaoguang Wang [Wed, 8 Apr 2020 14:29:58 +0000 (22:29 +0800)]
io_uring: do not always copy iovec in io_req_map_rw()
In io_read_prep() or io_write_prep(), io_req_map_rw() takes
struct io_async_rw's fast_iov as argument to call io_import_iovec(),
and if io_import_iovec() uses struct io_async_rw's fast_iov as
valid iovec array, later indeed io_req_map_rw() does not need
to do the memcpy operation, because they are same pointers.
io_uring: ensure openat sets O_LARGEFILE if needed
OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the
OPENAT opcode. Dmitry reports that his test case that compares openat()
and IORING_OP_OPENAT sees failures on large files:
*** sync openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write succeeded
*** sync openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write succeeded
*** io_uring openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write failed: File too large
*** io_uring openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write failed: File too large
Ensure we set O_LARGEFILE, if force_o_largefile() is true.
kbuild: add dummy toolchains to enable all cc-option etc. in Kconfig
Staring v4.18, Kconfig evaluates compiler capabilities, and hides CONFIG
options your compiler does not support. This works well if you configure
and build the kernel on the same host machine.
It is inconvenient if you prepare the .config that is carried to a
different build environment (typically this happens when you package
the kernel for distros) because using a different compiler potentially
produces different CONFIG options than the real build environment.
So, you probably want to make as many options visible as possible.
In other words, you need to create a super-set of CONFIG options that
cover any build environment. If some of the CONFIG options turned out
to be unsupported on the build machine, they are automatically disabled
by the nature of Kconfig.
However, it is not feasible to get a full-featured compiler for every
arch.
This issue was discussed here:
https://lkml.org/lkml/2019/12/9/620
Other than distros, savedefconfig is also a problem. Some arch sub-systems
periodically resync defconfig files. If you use a less-capable compiler
for savedefconfig, options that do not meet 'depends on $(cc-option,...)'
will be forcibly disabled. So, 'make defconfig && make savedefconfig'
may silently change the behavior.
This commit adds a set of dummy toolchains that pretend to support any
feature.
Most of compiler features are tested by cc-option, which simply checks
the exit code of $(CC). The dummy tools are shell scripts that always
exit with 0. So, $(cc-option, ...) is evaluated as 'y'.
Masahiro Yamada [Wed, 11 Mar 2020 22:37:25 +0000 (07:37 +0900)]
kbuild: link lib-y objects to vmlinux forcibly when CONFIG_MODULES=y
Kbuild supports not only obj-y but also lib-y to list objects linked to
vmlinux.
The difference between them is that all the objects from obj-y are
forcibly linked to vmlinux, whereas the objects from lib-y are linked
as needed; if there is no user of a lib-y object, it is not linked.
lib-y is intended to list utility functions that may be called from all
over the place (and may be unused at all), but it is a problem for
EXPORT_SYMBOL(). Even if there is no call-site in the vmlinux, we need
to keep exported symbols for the use from loadable modules.
Commit 7f2084fa55e6 ("[kbuild] handle exports in lib-y objects reliably")
worked around it by linking a dummy object, lib-ksyms.o, which contains
references to all the symbols exported from lib.a in that directory.
It uses the linker script command, EXTERN. Unfortunately, the meaning of
EXTERN of ld.lld is different from that of ld.bfd. Therefore, this does
not work with LD=ld.lld (CBL issue #515).
Anyway, the build rule of lib-ksyms.o is somewhat tricky. So, I want to
get rid of it.
At first, I was thinking of accumulating lib-y objects into obj-y
(or even replacing lib-y with obj-y entirely), but the lib-y syntax
is used beyond the ordinary use in lib/ and arch/*/lib/.
Examples:
- drivers/firmware/efi/libstub/Makefile builds lib.a, which is linked
into vmlinux in the own way (arm64), or linked to the decompressor
(arm, x86).
- arch/alpha/lib/Makefile builds lib.a which is linked not only to
vmlinux, but also to bootloaders in arch/alpha/boot/Makefile.
- arch/xtensa/boot/lib/Makefile builds lib.a for use from
arch/xtensa/boot/boot-redboot/Makefile.
One more thing, adding everything to obj-y would increase the vmlinux
size of allnoconfig (or tinyconfig).
For less impact, I tweaked the destination of lib.a at the top Makefile;
when CONFIG_MODULES=y, lib.a goes to KBUILD_VMLINUX_OBJS, which is
forcibly linked to vmlinux, otherwise lib.a goes to KBUILD_VMLINUX_LIBS
as before.
The size impact for normal usecases is quite small since at lease one
symbol in every lib-y object is eventually called by someone. In case
you are intrested, here are the figures.
Hopefully this is still not a big deal. The per-file trimming with the
static library is not so effective after all.
If fine-grained optimization is desired, some architectures support
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, which trims dead code per-symbol
basis. When LTO is supported in mainline, even better optimization will
be possible.
MIPS: fw: arc: add __weak to prom_meminit and prom_free_prom_memory
As far as I understood, prom_meminit() in arch/mips/fw/arc/memory.c
is overridden by the one in arch/mips/sgi-ip32/ip32-memory.c if
CONFIG_SGI_IP32 is enabled.
The use of EXPORT_SYMBOL in static libraries potentially causes a
problem for the llvm linker [1]. So, I want to forcibly link lib-y
objects to vmlinux when CONFIG_MODULES=y.
As a groundwork, we must fix multiple definitions that have previously
been hidden by lib-y.
The prom_cleanup() in this file is already marked as __weak (because
it is overridden by the one in arch/mips/sgi-ip22/ip22-mc.c).
I think it should be OK to do the same for these two.
kbuild: remove -I$(srctree)/tools/include from scripts/Makefile
I do not like to add an extra include path for every tool with no
good reason. This should be specified per file.
This line was added by commit 6520fe5564ac ("x86, realmode: 16-bit
real-mode code support for relocs tool"), which did not touch
anything else in scripts/. I see no reason to add this.
Also, remove the comment about kallsyms because we do not have any
for the rest of programs.
kbuild: do not pass $(KBUILD_CFLAGS) to scripts/mkcompile_h
scripts/mkcompile_h uses $(CC) only for getting the version string.
I suspected there was a specific reason why the additional flags were
needed, and dug the commit history. This code dates back to at least
2002 [1], but I could not get any more clue.
kbuild: mkcompile_h: Include $LD version in /proc/version
When doing Clang builds of the kernel, it is possible to link with
either ld.bfd (binutils) or ld.lld (LLVM), but it is not possible to
discover this from a running kernel. Add the "$LD -v" output to
/proc/version.
kconfig: qconf: fix the content of the main widget
The port to Qt5 tried to preserve the same way as it used
to work with Qt3 and Qt4. However, at least with newer
versions of Qt5 (5.13), this doesn't work properly.
Change the schema by adding a vertical layout, in order
for it to start working properly again.
Both main config window and the item window have "Option"
name. That sounds weird, and makes harder to debug issues
of a window appearing at the wrong place.
So, change the title to reflect the contents of each
window.
The recommended way to initialize a null string is with
QString(). This is there at least since Qt5.5, with is
when qconf was ported to Qt5.
Fix those warnings:
scripts/kconfig/qconf.cc: In member function ‘void ConfigItem::updateMenu()’:
scripts/kconfig/qconf.cc:158:31: warning: ‘QString::null’ is deprecated: use QString() [-Wdeprecated-declarations]
158 | setText(noColIdx, QString::null);
| ^~~~
In file included from /usr/include/qt5/QtCore/qobject.h:47,
from /usr/include/qt5/QtWidgets/qwidget.h:45,
from /usr/include/qt5/QtWidgets/qmainwindow.h:44,
from /usr/include/qt5/QtWidgets/QMainWindow:1,
from scripts/kconfig/qconf.cc:9:
Currently, we disable -Wtautological-compare, which in turn disables a
bunch of more specific tautological comparison warnings that are useful
for the kernel such as -Wtautological-bitwise-compare. See clang's
documentation below for the other warnings that are suppressed by
-Wtautological-compare. Now that all of the major/noisy warnings have
been fixed, enable -Wtautological-compare so that more issues can be
caught at build time by various continuous integration setups.
-Wtautological-constant-out-of-range-compare is kept disabled under a
normal build but visible at W=1 because there are places in the kernel
where a constant or variable size can change based on the kernel
configuration. These are not fixed in a clean/concise way and the ones
I have audited so far appear to be harmless. It is not a subgroup but
rather just one warning so we do not lose out on much coverage by
default.
Regular files opened with O_NONBLOCK allow read to return after a single
round-trip with the server instead of trying to fill buffer.
Add a few lines in 9p documentation to describe that.
Masahiro Yamada [Thu, 26 Mar 2020 08:01:04 +0000 (17:01 +0900)]
crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'
poly1305-x86_64-cryptogams.S is a generated file, so it should be
cleaned up by 'make clean'.
Assigning it to the variable 'targets' teaches Kbuild that it is a
generated file. However, this line is not evaluated when cleaning
because scripts/Makefile.clean does not include include/config/auto.conf.
Remove the ifneq-conditional, so this file is correctly cleaned up.
Borislav Petkov [Thu, 26 Mar 2020 08:01:02 +0000 (17:01 +0900)]
Documentation/changes: Raise minimum supported binutils version to 2.23
The currently minimum-supported binutils version 2.21 has the problem of
promoting symbols which are defined outside of a section into absolute.
According to Arvind:
binutils-2.21 and -2.22. An x86-64 defconfig will fail with
Invalid absolute R_X86_64_32S relocation: _etext
and after fixing that one, with
Invalid absolute R_X86_64_32S relocation: __end_of_kernel_reserve
Those two versions of binutils have a bug when it comes to handling
symbols defined outside of a section and binutils 2.23 has the proper
fix, see: https://sourceware.org/legacy-ml/binutils/2012-06/msg00155.html
Therefore, up to the fixed version directly, skipping the broken ones.
Currently shipping distros already have the fixed binutils version so
there should be no breakage resulting from this.
For more details about the whole thing, see the thread in Link.
crypto: curve25519 - do not pollute dispatcher based on assembler
Since we're doing a static inline dispatch here, we normally branch
based on whether or not there's an arch implementation. That would have
been fine in general, except the crypto Makefile prior used to turn
things off -- despite the Kconfig -- resulting in us needing to also
hard code various assembler things into the dispatcher too. The horror!
Now that the assembler config options are done by Kconfig, we can get
rid of the inconsistency.
crypto: x86 - rework configuration based on Kconfig
Now that assembler capabilities are probed inside of Kconfig, we can set
up proper Kconfig-based dependencies. We also take this opportunity to
reorder the Makefile, so that items are grouped logically by primitive.
Masahiro Yamada [Thu, 26 Mar 2020 08:00:59 +0000 (17:00 +0900)]
x86: add comments about the binutils version to support code in as-instr
We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").
We have these as-instr tests because binutils 2.21 does not support
them.
When we bump the binutils version next time, this will be a good
hint to find out which one can be dropped.
As for the Clang/LLVM builds, we require very new LLVM version,
so the LLVM integrated assembler supports all of them.
x86: probe assembler capabilities via kconfig instead of makefile
Doing this probing inside of the Makefiles means we have a maze of
ifdefs inside the source code and child Makefiles that need to make
proper decisions on this too. Instead, we do it at Kconfig time, like
many other compiler and assembler options, which allows us to set up the
dependencies normally for full compilation units. In the process, the
ADX test changes to use %eax instead of %r10 so that it's valid in both
32-bit and 64-bit mode.
CONFIG_AS_MOVNTDQA was introduced by commit 0b1de5d58e19 ("drm/i915:
Use SSE4.1 movntdqa to accelerate reads from WC memory").
We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").
I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.
Remove CONFIG_AS_MOVNTDQA, which is always defined.
Masahiro Yamada [Thu, 26 Mar 2020 08:00:55 +0000 (17:00 +0900)]
x86: remove always-defined CONFIG_AS_AVX
CONFIG_AS_AVX was introduced by commit ea4d26ae24e5 ("raid5: add AVX
optimized RAID5 checksumming").
We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").
I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.
Masahiro Yamada [Thu, 26 Mar 2020 08:00:54 +0000 (17:00 +0900)]
x86: remove always-defined CONFIG_AS_SSSE3
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3e6a4 ("x86/raid6:
correctly check for assembler capabilities").
We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").
I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.
Remove CONFIG_AS_SSSE3, which is always defined.
I added ifdef CONFIG_X86 to lib/raid6/algos.c to avoid link errors
on non-x86 architectures.
lib/raid6/algos.c is built not only for the kernel but also for
testing the library code from userspace. I added -DCONFIG_X86 to
lib/raid6/test/Makefile to cator to this usecase.
Masahiro Yamada [Thu, 26 Mar 2020 08:00:53 +0000 (17:00 +0900)]
x86: remove always-defined CONFIG_AS_CFI_SECTIONS
CONFIG_AS_CFI_SECTIONS was introduced by commit 9e565292270a ("x86:
Use .cfi_sections for assembly code").
We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").
I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.
Remove CONFIG_AS_CFI_SECTIONS, which is always defined.
Masahiro Yamada [Thu, 26 Mar 2020 08:00:51 +0000 (17:00 +0900)]
x86: remove always-defined CONFIG_AS_CFI
CONFIG_AS_CFI was introduced by commit e2414910f212 ("[PATCH] x86:
Detect CFI support in the assembler at runtime"), and extended by
commit f0f12d85af85 ("x86_64: Check for .cfi_rel_offset in CFI probe").
We raise the minimal supported binutils version from time to time.
The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum
required binutils version to 2.21").
I confirmed the code in $(call as-instr,...) can be assembled by the
binutils 2.21 assembler and also by LLVM integrated assembler.
Masahiro Yamada [Thu, 26 Mar 2020 08:00:49 +0000 (17:00 +0900)]
lib/raid6/test: fix build on distros whose /bin/sh is not bash
You can build a user-space test program for the raid6 library code,
like this:
$ cd lib/raid6/test
$ make
The command in $(shell ...) function is evaluated by /bin/sh by default.
(or, you can specify the shell by passing SHELL=<shell> from command line)
Currently '>&/dev/null' is used to sink both stdout and stderr. Because
this code is bash-ism, it only works when /bin/sh is a symbolic link to
bash (this is the case on RHEL etc.)
This does not work on Ubuntu where /bin/sh is a symbolic link to dash.
I see lots of
/bin/sh: 1: Syntax error: Bad fd number
and
warning "your version of binutils lacks ... support"