Josef Bacik [Mon, 10 Aug 2020 15:42:31 +0000 (11:42 -0400)]
btrfs: set the lockdep class for log tree extent buffers
These are special extent buffers that get rewound in order to lookup
the state of the tree at a specific point in time. As such they do not
go through the normal initialization paths that set their lockdep class,
so handle them appropriately when they are created and before they are
locked.
Josef Bacik [Mon, 10 Aug 2020 15:42:30 +0000 (11:42 -0400)]
btrfs: set the correct lockdep class for new nodes
When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
in replace_path(), which is weird because we're swapping the reloc root
with the actual target root. Turns out this is because we're using the
root->root_key.objectid as the root id for the newly allocated tree
block when setting the lockdep class, however we need to be using the
actual owner of this new block, which is saved in owner.
The affected path is through btrfs_copy_root as all other callers of
btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid
== root->root_key.objectid .
This happens because we're allocating the scrub workqueues under the
scrub and device list mutex, which brings in a whole host of other
dependencies.
Because the work queue allocation is done with GFP_KERNEL, it can
trigger reclaim, which can lead to a transaction commit, which in turns
needs the device_list_mutex, it can lead to a deadlock. A different
problem for which this fix is a solution.
Fix this by moving the actual allocation outside of the
scrub lock, and then only take the lock once we're ready to actually
assign them to the fs_info. We'll now have to cleanup the workqueues in
a few more places, so I've added a helper to do the refcount dance to
safely free the workqueues.
The problem is we're doing a copy_to_user() while holding tree locks,
which can deadlock if we have to do a page fault for the copy_to_user().
This exists even without my locking changes, so it needs to be fixed.
Rework the search ioctl to do the pre-fault and then
copy_to_user_nofault for the copying.
This problem exists because we have two different rescan threads,
btrfs_uuid_scan_kthread which creates the uuid tree, and
btrfs_uuid_tree_iterate that goes through and updates or deletes any out
of date roots. The problem is they both do things in different order.
btrfs_uuid_scan_kthread() reads the tree_root, and then inserts entries
into the uuid_root. btrfs_uuid_tree_iterate() scans the uuid_root, but
then does a btrfs_get_fs_root() which can read from the tree_root.
It's actually easy enough to not be holding the path in
btrfs_uuid_scan_kthread() when we add a uuid entry, as we already drop
it further down and re-start the search when we loop. So simply move
the path release before we add our entry to the uuid tree.
This also fixes a problem where we're holding a path open after we do
btrfs_end_transaction(), which has it's own problems.
[BUG]
After commit 9afc66498a0b ("btrfs: block-group: refactor how we read one
block group item"), cache->length is being assigned after calling
btrfs_create_block_group_cache. This causes a problem since
set_free_space_tree_thresholds calculates the free-space threshold to
decide if the free-space tree should convert from extents to bitmaps.
The current code calls set_free_space_tree_thresholds with cache->length
being 0, which then makes cache->bitmap_high_thresh zero. This implies
the system will always use bitmap instead of extents, which is not
desired if the block group is not fragmented.
This behavior can be seen by a test that expects to repair systems
with FREE_SPACE_EXTENT and FREE_SPACE_BITMAP, but the current code only
created FREE_SPACE_BITMAP.
[FIX]
Call set_free_space_tree_thresholds after setting cache->length. There
is now a WARN_ON in set_free_space_tree_thresholds to help preventing
the same mistake to happen again in the future.
Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
cpuidle stop state implementation has minor optimizations for P10
where hardware preserves more SPR registers compared to P9. The
current P9 driver works for P10, although does few extra
save-restores. P9 driver can provide the required power management
features like SMT thread folding and core level power savings on a P10
platform.
Until the P10 stop driver is available, revert the commit which allows
for only P9 systems to utilize cpuidle and blocks all idle stop states
for P10. CPU idle states are enabled and tested on the P10 platform
with this fix.
Athira Rajeev [Wed, 26 Aug 2020 06:40:29 +0000 (02:40 -0400)]
powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
IMC trace-mode uses MSR[HV/PR] bits to set the cpumode for the
instruction pointer captured in each sample. The bits are fetched from
the third double word of the trace record. Reading third double word
from IMC trace record should use be64_to_cpu() along with READ_ONCE
inorder to fetch correct MSR[HV/PR] bits. Patch addresses this change.
Currently we are using PERF_RECORD_MISC_HYPERVISOR as cpumode if MSR
HV is 1 and PR is 0 which means the address is from host counter. But
using PERF_RECORD_MISC_HYPERVISOR for host counter data will fail to
resolve the address -> symbol during "perf report" because perf tools
side uses PERF_RECORD_MISC_KERNEL to represent the host counter data.
Therefore, fix the trace imc sample data to use
PERF_RECORD_MISC_KERNEL as cpumode for host kernel information.
powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is
only defined in raw CPUs' power_pmu structs. The "architected" CPUs
use generic_compat_pmu, which does not have this callback, and crashes
occur if a user tries to enable branch stack for an event.
This add a NULL pointer check for bhrb_filter_map() which behaves as
if the callback returned an error.
This does not add the same check for config_bhrb() as the only caller
checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0.
Michael Ellerman [Tue, 25 Aug 2020 09:34:24 +0000 (19:34 +1000)]
powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math
unnecessarily changing MSR") changed some of the handling of floating
point/vector restore.
In particular it caused current->thread.fpexc_mode to be copied into
the current MSR (via msr_check_and_set()), rather than just into
regs->msr (which is moved into MSR on return to userspace).
This can lead to a crash in the kernel if we take a floating point
exception when restoring FPSCR:
Fix it by only loading the fpexc_mode value into regs->msr.
Also add a comment to explain that although VSX is subject to the
value of fpexc_mode, we don't have to handle that separately because
we only allow VSX to be enabled if FP is also enabled.
Randy Dunlap [Mon, 24 Aug 2020 00:31:16 +0000 (17:31 -0700)]
Documentation/powerpc: fix malformed table in syscall64-abi
Fix malformed table warning in powerpc/syscall64-abi.rst by making
two tables and moving the headings.
Documentation/powerpc/syscall64-abi.rst:53: WARNING: Malformed table.
Text in column margin in table line 2.
=========== ============= ========================================
--- For the sc instruction, differences with the ELF ABI ---
r0 Volatile (System call number.)
r3 Volatile (Parameter 1, and return value.)
r4-r8 Volatile (Parameters 2-6.)
cr0 Volatile (cr0.SO is the return error condition.)
cr1, cr5-7 Nonvolatile
lr Nonvolatile
--- For the scv 0 instruction, differences with the ELF ABI ---
r0 Volatile (System call number.)
r3 Volatile (Parameter 1, and return value.)
r4-r8 Volatile (Parameters 2-6.)
=========== ============= ========================================
Michael Ellerman [Fri, 21 Aug 2020 10:49:10 +0000 (20:49 +1000)]
video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
The build is currently broken, if COMPILE_TEST=y and PPC_PMAC=n:
linux/drivers/video/fbdev/controlfb.c: In function ‘control_set_hardware’:
linux/drivers/video/fbdev/controlfb.c:276:2: error: implicit declaration of function ‘btext_update_display’
276 | btext_update_display(p->frame_buffer_phys + CTRLFB_OFF,
| ^~~~~~~~~~~~~~~~~~~~
Fix it by including btext.h whenever CONFIG_BOOTX_TEXT is enabled.
Thomas Gleixner [Wed, 26 Aug 2020 20:21:44 +0000 (22:21 +0200)]
x86/irq: Unbreak interrupt affinity setting
Several people reported that 5.8 broke the interrupt affinity setting
mechanism.
The consolidation of the entry code reused the regular exception entry code
for device interrupts and changed the way how the vector number is conveyed
from ptregs->orig_ax to a function argument.
The low level entry uses the hardware error code slot to push the vector
number onto the stack which is retrieved from there into a function
argument and the slot on stack is set to -1.
The reason for setting it to -1 is that the error code slot is at the
position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
indicates that the entry came via a syscall. If it's not set to a negative
value then a signal delivery on return to userspace would try to restart a
syscall. But there are other places which rely on pt_regs::orig_ax being a
valid indicator for syscall entry.
But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
interrupt affinity setting mechanism, which was overlooked when this change
was made.
Moving interrupts on x86 happens in several steps. A new vector on a
different CPU is allocated and the relevant interrupt source is
reprogrammed to that. But that's racy and there might be an interrupt
already in flight to the old vector. So the old vector is preserved until
the first interrupt arrives on the new vector and the new target CPU. Once
that happens the old vector is cleaned up, but this cleanup still depends
on the vector number being stored in pt_regs::orig_ax, which is now -1.
That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
always false. As a consequence the interrupt is moved once, but then it
cannot be moved anymore because the cleanup of the old vector never
happens.
There would be several ways to convey the vector information to that place
in the guts of the interrupt handling, but on deeper inspection it turned
out that this check is pointless and a leftover from the old affinity model
of X86 which supported multi-CPU affinities. Under this model it was
possible that an interrupt had an old and a new vector on the same CPU, so
the vector match was required.
Under the new model the effective affinity of an interrupt is always a
single CPU from the requested affinity mask. If the affinity mask changes
then either the interrupt stays on the CPU and on the same vector when that
CPU is still in the new affinity mask or it is moved to a different CPU, but
it is never moved to a different vector on the same CPU.
Ergo the cleanup check for the matching vector number is not required and
can be removed which makes the dependency on pt_regs:orig_ax go away.
The remaining check for new_cpu == smp_processsor_id() is completely
sufficient. If it matches then the interrupt was successfully migrated and
the cleanup can proceed.
For paranoia sake add a warning into the vector assignment code to
validate that the assumption of never moving to a different vector on
the same CPU holds.
Ashok Raj [Thu, 27 Aug 2020 04:12:10 +0000 (21:12 -0700)]
x86/hotplug: Silence APIC only after all interrupts are migrated
There is a race when taking a CPU offline. Current code looks like this:
native_cpu_disable()
{
...
apic_soft_disable();
/*
* Any existing set bits for pending interrupt to
* this CPU are preserved and will be sent via IPI
* to another CPU by fixup_irqs().
*/
cpu_disable_common();
{
....
/*
* Race window happens here. Once local APIC has been
* disabled any new interrupts from the device to
* the old CPU are lost
*/
fixup_irqs(); // Too late to capture anything in IRR.
...
}
}
The fix is to disable the APIC *after* cpu_disable_common().
Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.
Cyril Roelandt [Tue, 25 Aug 2020 21:22:31 +0000 (23:22 +0200)]
USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge
This device does not support UAS properly and a similar entry already
exists in drivers/usb/storage/unusual_uas.h. Without this patch,
storage_probe() defers the handling of this device to UAS, which cannot
handle it either.
Tang Bin [Wed, 26 Aug 2020 14:49:31 +0000 (22:49 +0800)]
usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()
If the function platform_get_irq() failed, the negative value
returned will not be detected here. So fix error handling in
exynos_ohci_probe(). And when get irq failed, the function
platform_get_irq() logs an error message, so remove redundant
message here.
Andy Shevchenko [Wed, 26 Aug 2020 19:21:19 +0000 (22:21 +0300)]
USB: gadget: u_f: Unbreak offset calculation in VLAs
Inadvertently the commit b1cd1b65afba ("USB: gadget: u_f: add overflow checks
to VLA macros") makes VLA macros to always return 0 due to different scope of
two variables of the same name. Obviously we need to have only one.
Alan Stern [Wed, 26 Aug 2020 19:46:24 +0000 (15:46 -0400)]
USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D
The Sound Devices MixPre-D audio card suffers from the same defect
as the Sound Devices USBPre2: an endpoint shared between a normal
audio interface and a vendor-specific interface, in violation of the
USB spec. Since the USB core now treats duplicated endpoints as bugs
and ignores them, the audio endpoint isn't available and the card
can't be used for audio capture.
Along the same lines as commit bdd1b147b802 ("USB: quirks: blacklist
duplicate ep on Sound Devices USBPre2"), this patch adds a quirks
entry saying to ignore ep5in for interface 1, leaving it available for
use with standard audio interface 2.
Dan Carpenter [Wed, 26 Aug 2020 11:33:30 +0000 (14:33 +0300)]
dma-pool: Fix an uninitialized variable bug in atomic_pool_expand()
The "page" pointer can be used with out being initialized.
Fixes: d7e673ec2c8e ("dma-pool: Only allocate from CMA when in same memory zone") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
Simon Leiner [Tue, 25 Aug 2020 09:31:53 +0000 (11:31 +0200)]
arm/xen: Add misuse warning to virt_to_gfn
As virt_to_gfn uses virt_to_phys, it will return invalid addresses when
used with vmalloc'd addresses. This patch introduces a warning, when
virt_to_gfn is used in this way.
Simon Leiner [Tue, 25 Aug 2020 09:31:52 +0000 (11:31 +0200)]
xen/xenbus: Fix granting of vmalloc'd memory
On some architectures (like ARM), virt_to_gfn cannot be used for
vmalloc'd memory because of its reliance on virt_to_phys. This patch
introduces a check for vmalloc'd addresses and obtains the PFN using
vmalloc_to_pfn in that case.
Thomas Gleixner [Tue, 25 Aug 2020 15:22:58 +0000 (17:22 +0200)]
XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt
XEN data pointer which contains XEN specific information.
handler data is meant for interrupt handlers and not for storing irq chip
specific information as some devices require handler data to store internal
per interrupt information, e.g. pinctrl/GPIO chained interrupt handlers.
This obviously creates a conflict of interests and crashes the machine
because the XEN pointer is overwritten by the driver pointer.
As the XEN data is not handler specific it should be stored in
irqdesc::irq_data::chip_data instead.
A simple sed s/irq_[sg]et_handler_data/irq_[sg]et_chip_data/ cures that.
Dave Airlie [Thu, 27 Aug 2020 02:44:04 +0000 (12:44 +1000)]
Merge tag 'amd-drm-fixes-5.9-2020-08-26' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amd-drm-fixes-5.9-2020-08-26:
amdgpu:
- Misc display fixes
- Backlight fixes
- MPO fix for DCN1
- Fixes for Sienna Cichlid
- Fixes for Navy Flounder
- Vega SW CTF fixes
- SMU fix for Raven
- Fix a possible overflow in INFO ioctl
- Gfx10 clockgating fix
Dave Airlie [Thu, 27 Aug 2020 02:36:32 +0000 (12:36 +1000)]
Merge tag 'drm-msm-fixes-2020-08-24' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Some fixes for v5.9 plus the one opp/bandwidth scaling patch ("drm:
msm: a6xx: use dev_pm_opp_set_bw to scale DDR") which was not included
in the initial pull due to dependency on patch landing thru OPP tree
Dave Airlie [Thu, 27 Aug 2020 02:34:06 +0000 (12:34 +1000)]
Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes
Two fixes:
One fixes a bad interaction with the DRM scheduler, leading to some dma
fences not getting signalled after hitting the job timeout. The other
one fixes a GPU init regression, as apparently one old core doesn't
likes us reading some of the identification registers.
Jens Axboe [Thu, 27 Aug 2020 00:58:26 +0000 (18:58 -0600)]
io_uring: clear req->result on IOPOLL re-issue
Make sure we clear req->result, which was set to -EAGAIN for retry
purposes, when moving it to the reissue list. Otherwise we can end up
retrying a request more than once, which leads to weird results in
the io-wq handling (and other spots).
Eric Sandeen [Wed, 26 Aug 2020 21:11:58 +0000 (14:11 -0700)]
xfs: fix boundary test in xfs_attr_shortform_verify
The boundary test for the fixed-offset parts of xfs_attr_sf_entry in
xfs_attr_shortform_verify is off by one, because the variable array
at the end is defined as nameval[1] not nameval[].
Hence we need to subtract 1 from the calculation.
This can be shown by:
# touch file
# setfattr -n root.a file
and verifications will fail when it's written to disk.
This only matters for a last attribute which has a single-byte name
and no value, otherwise the combination of namelen & valuelen will
push endp further out and this test won't fail.
Fixes: 1e1bbd8e7ee06 ("xfs: create structure verifier function for shortform xattrs") Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
Brian Foster [Wed, 26 Aug 2020 21:08:27 +0000 (14:08 -0700)]
xfs: fix off-by-one in inode alloc block reservation calculation
The inode chunk allocation transaction reserves inobt_maxlevels-1
blocks to accommodate a full split of the inode btree. A full split
requires an allocation for every existing level and a new root
block, which means inobt_maxlevels is the worst case block
requirement for a transaction that inserts to the inobt. This can
lead to a transaction block reservation overrun when tmpfile
creation allocates an inode chunk and expands the inobt to its
maximum depth. This problem has been observed in conjunction with
overlayfs, which makes frequent use of tmpfiles internally.
The existing reservation code goes back as far as the Linux git repo
history (v2.6.12). It was likely never observed as a problem because
the traditional file/directory creation transactions also include
worst case block reservation for directory modifications, which most
likely is able to make up for a single block deficiency in the inode
allocation portion of the calculation. tmpfile support is relatively
more recent (v3.15), less heavily used, and only includes the inode
allocation block reservation as tmpfiles aren't linked into the
directory tree on creation.
Fix up the inode alloc block reservation macro and a couple of the
block allocator minleft parameters that enforce an allocation to
leave enough free blocks in the AG for a full inobt split.
Brian Foster [Tue, 18 Aug 2020 15:05:58 +0000 (08:05 -0700)]
xfs: finish dfops on every insert range shift iteration
The recent change to make insert range an atomic operation used the
incorrect transaction rolling mechanism. The explicit transaction
roll does not finish deferred operations. This means that intents
for rmapbt updates caused by extent shifts are not logged until the
final transaction commits. Thus if a crash occurs during an insert
range, log recovery might leave the rmapbt in an inconsistent state.
This was discovered by repeated runs of generic/455.
Update insert range to finish dfops on every shift iteration. This
is similar to collapse range and ensures that intents are logged
with the transactions that make associated changes.
Fixes: dd87f87d87fa ("xfs: rework insert range into an atomic operation") Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
Dinghao Liu [Wed, 26 Aug 2020 13:24:58 +0000 (21:24 +0800)]
drm/amd/display: Fix memleak in amdgpu_dm_mode_config_init
When amdgpu_display_modeset_create_props() fails, state and
state->context should be freed to prevent memleak. It's the
same when amdgpu_dm_audio_init() fails.
Wayne Lin [Tue, 18 Aug 2020 03:19:42 +0000 (11:19 +0800)]
drm/amd/display: Retry AUX write when fail occurs
[Why]
In dm_dp_aux_transfer() now, we forget to handle AUX_WR fail cases. We
suppose every write wil get done successfully and hence some AUX
commands might not sent out indeed.
Alex Deucher [Tue, 25 Aug 2020 15:43:45 +0000 (11:43 -0400)]
drm/amdgpu: Fix buffer overflow in INFO ioctl
The values for "se_num" and "sh_num" come from the user in the ioctl.
They can be in the 0-255 range but if they're more than
AMDGPU_GFX_MAX_SE (4) or AMDGPU_GFX_MAX_SH_PER_SE (2) then it results in
an out of bounds read.
drm/amd/powerplay: Fix hardmins not being sent to SMU for RV
[Why]
DC uses these to raise the voltage as needed for higher dispclk/dppclk
and to ensure that we have enough bandwidth to drive the displays.
There's a bug preventing these from actuially sending messages since
it's checking the actual clock (which is 0) instead of the incoming
clock (which shouldn't be 0) when deciding to send the hardmin.
[How]
Check the clocks != 0 instead of the actual clocks.
Samson Tam [Thu, 13 Aug 2020 14:50:21 +0000 (10:50 -0400)]
drm/amd/display: Fix passive dongle mistaken as active dongle in EDID emulation
[Why]
dongle_type is set during dongle connection but for passive dongles,
dongle_type is not set. If user starts with an active dongle and
then switches to a passive dongle, it will still report as an active
dongle. Trying to emulate the wrong connecter type results in display
not lighting up.
[How]
Set dpcd_caps.dongle_type for passive dongles in detect_dp().
[Why]
Revert HDCP disable sequence change that blanks stream before
disabling HDCP. PSP and HW teams are currently investigating the
root cause of why HDCP cannot be disabled before stream blank,
which is expected to work without issues.
Furquan Shaikh [Thu, 20 Aug 2020 07:52:41 +0000 (00:52 -0700)]
drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in amdgpu_dm_update_backlight_caps
In `amdgpu_dm_update_backlight_caps()`, there is a local
`amdgpu_dm_backlight_caps` object that is filled in by
`amdgpu_acpi_get_backlight_caps()`. However, this object is
uninitialized before the call and hence the subsequent check for
aux_support can fail since it is not initialized by
`amdgpu_acpi_get_backlight_caps()` as well. This change initializes
this local `amdgpu_dm_backlight_caps` object to 0.
drm/amd/display: Reject overlay plane configurations in multi-display scenarios
[Why]
These aren't stable on some platform configurations when driving
multiple displays, especially on higher resolution.
In particular the delay in asserting p-state and validating from
x86 outweights any power or performance benefit from the hardware
composition.
Under some configurations this will manifest itself as extreme stutter
or unresponsiveness especially when combined with cursor movement.
[How]
Disable these for now. Exposing overlays to userspace doesn't guarantee
that they'll be able to use them in any and all configurations and it's
part of the DRM contract to have userspace gracefully handle validation
failures when they occur.
Valdiation occurs as part of DC and this in particular affects RV, so
disable this in dcn10_global_validation.
drm/amd/display: use correct scale for actual_brightness
Documentation for sysfs backlight level interface requires that
values in both 'brightness' and 'actual_brightness' files are
interpreted to be in range from 0 to the value given in the
'max_brightness' file.
With amdgpu, max_brightness gives 255, and values written by the user
into 'brightness' are internally rescaled to a wider range. However,
reading from 'actual_brightness' gives the raw register value without
inverse rescaling. This causes issues for various userspace tools such
as PowerTop and systemd that expect the value to be in the correct
range.
Introduce a helper to retrieve internal backlight range. Use it to
reimplement 'convert_brightness' as 'convert_brightness_from_user' and
introduce 'convert_brightness_to_user'.
Linus Torvalds [Wed, 26 Aug 2020 17:58:20 +0000 (10:58 -0700)]
Merge tag 'tty-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a few small TTY/Serial/vt fixes for 5.9-rc3
Included in here are:
- qcom serial fixes
- vt ioctl and core bugfixes
- pl011 serial driver fixes
- 8250 serial driver fixes
- other misc serial driver fixes
and for good measure:
- fbcon fix for syzbot found problem.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: imx: add dependence and build for earlycon
serial: samsung: Removes the IRQ not found warning
serial: 8250: change lock order in serial8250_do_startup()
serial: stm32: avoid kernel warning on absence of optional IRQ
serial: pl011: Fix oops on -EPROBE_DEFER
serial: pl011: Don't leak amba_ports entry on driver register error
serial: 8250_exar: Fix number of ports for Commtech PCIe cards
tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
serial: qcom_geni_serial: Fix recent kdb hang
vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize()
fbcon: prevent user font height or width change from causing potential out-of-bounds access
vt: defer kfree() of vc_screenbuf in vc_do_resize()
Linus Torvalds [Wed, 26 Aug 2020 17:50:50 +0000 (10:50 -0700)]
Merge tag 'char-misc-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc and other driver subsystem fixes for
5.9-rc3.
The majority of these are tiny habanalabs driver fixes, but also in
here are:
- speakup build fixes now that it is out of staging and got exposed
to more build systems all of a sudden
- mei driver fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
habanalabs: correctly report inbound pci region cfg error
habanalabs: check correct vmalloc return code
habanalabs: validate FW file size
habanalabs: fix incorrect check on failed workqueue create
habanalabs: set max power according to card type
habanalabs: proper handling of alloc size in coresight
habanalabs: set clock gating according to mask
habanalabs: verify user input in cs_ioctl_signal_wait
habanalabs: Fix a loop in gaudi_extract_ecc_info()
habanalabs: Fix memory corruption in debugfs
habanalabs: validate packet id during CB parse
habanalabs: Validate user address before mapping
habanalabs: unmap PCI bars upon iATU failure
mei: hdcp: fix mei_hdcp_verify_mprime() input parameter
speakup: only build serialio when ISA is enabled
speakup: Fix wait_for_xmitr for ttyio case
Linus Torvalds [Wed, 26 Aug 2020 17:44:15 +0000 (10:44 -0700)]
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
"Two patches from Vineeth to improve Hyper-V timesync facility"
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv_utils: drain the timesync packets on onchannelcallback
hv_utils: return error if host timesysnc update is stale
Linus Torvalds [Wed, 26 Aug 2020 17:40:09 +0000 (10:40 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio bugfixes from Michael Tsirkin:
"A couple vdpa and vhost bugfixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Avoid warnings about shifts on 32-bit platforms
vhost-iotlb: fix vhost_iotlb_itree_next() documentation
vdpa: ifcvf: free config irq in ifcvf_free_irq()
vdpa: ifcvf: return err when fail to request config irq
Jens Axboe [Wed, 26 Aug 2020 16:36:20 +0000 (10:36 -0600)]
io_uring: make offset == -1 consistent with preadv2/pwritev2
The man page for io_uring generally claims were consistent with what
preadv2 and pwritev2 accept, but turns out there's a slight discrepancy
in how offset == -1 is handled for pipes/streams. preadv doesn't allow
it, but preadv2 does. This currently causes io_uring to return -EINVAL
if that is attempted, but we should allow that as documented.
This change makes us consistent with preadv2/pwritev2 for just passing
in a NULL ppos for streams if the offset is -1.
Sven Schnelle [Thu, 20 Aug 2020 07:48:23 +0000 (09:48 +0200)]
s390: don't trace preemption in percpu macros
Since commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context}
to per-cpu variables") the lockdep code itself uses percpu variables. This
leads to recursions because the percpu macros are calling preempt_enable()
which might call trace_preempt_on().
Martijn Coenen [Tue, 25 Aug 2020 07:18:29 +0000 (09:18 +0200)]
loop: Set correct device size when using LOOP_CONFIGURE
The device size calculation was done before processing the loop
configuration, which meant that the we set the size on the underlying
block device incorrectly in case lo_offset/lo_sizelimit were set in the
configuration. Delay computing the size until we've setup the device
parameters correctly.
Hou Pu [Mon, 10 Aug 2020 12:00:44 +0000 (08:00 -0400)]
nbd: restore default timeout when setting it to zero
If we configured io timeout of nbd0 to 100s. Later after we
finished using it, we configured nbd0 again and set the io
timeout to 0. We expect it would timeout after 30 seconds
and keep retry. But in fact we could not change the timeout
when we set it to 0. the timeout is still the original 100s.
So change the timeout to default 30s when we set it to zero.
It also behaves same as commit 2da22da57348 ("nbd: fix zero
cmd timeout handling v2").
It becomes more important if we were reconfigure a nbd device
and the io timeout it set to zero. Because it could take 30s
to detect the new socket and thus io could be completed more
quickly compared to 100s.
vdpa/mlx5: Avoid warnings about shifts on 32-bit platforms
Clang warns several times when building for 32-bit ARM along the lines
of:
drivers/vdpa/mlx5/net/mlx5_vnet.c:1462:31: warning: shift count >= width
of type [-Wshift-count-overflow]
ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
^~~~~~~~~~~~~~~~~~~~~~~
This is related to the BIT macro, which uses an unsigned long literal,
which is 32-bit on ARM so having a shift equal to or larger than 32 will
cause this warning, such as the above, where VIRTIO_F_VERSION_1 is 32.
To avoid this, use BIT_ULL, which will be an unsigned long long. This
matches the size of the features field throughout this driver, which is
u64 so there should be no functional change.
This patch contains trivial changes for the vhost_iotlb_itree_next()
documentation, fixing the function name and the description of
first argument (@map).
Jason Wang [Thu, 23 Jul 2020 09:12:54 +0000 (17:12 +0800)]
vdpa: ifcvf: free config irq in ifcvf_free_irq()
We don't free config irq in ifcvf_free_irq() which will trigger a
BUG() in pci core since we try to free the vectors that has an
action. Fixing this by recording the config irq in ifcvf_hw structure
and free it in ifcvf_free_irq().
Peter Zijlstra [Fri, 7 Aug 2020 18:53:16 +0000 (20:53 +0200)]
lockdep,trace: Expose tracepoints
The lockdep tracepoints are under the lockdep recursion counter, this
has a bunch of nasty side effects:
- TRACE_IRQFLAGS doesn't work across the entire tracepoint
- RCU-lockdep doesn't see the tracepoints either, hiding numerous
"suspicious RCU usage" warnings.
Pull the trace_lock_*() tracepoints completely out from under the
lockdep recursion handling and completely rely on the trace level
recusion handling -- also, tracing *SHOULD* not be taking locks in any
case.
Nicholas Piggin [Thu, 23 Jul 2020 10:56:14 +0000 (20:56 +1000)]
lockdep: Only trace IRQ edges
Problem:
raw_local_irq_save(); // software state on
local_irq_save(); // software state off
...
local_irq_restore(); // software state still off, because we don't enable IRQs
raw_local_irq_restore(); // software state still off, *whoopsie*
existing instances:
- lock_acquire()
raw_local_irq_save()
__lock_acquire()
arch_spin_lock(&graph_lock)
pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
local_irq_save()
A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now
Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.
So we pick B) and declare any code that ends up doing:
broken. AFAICT this problem has existed forever, the only reason it came
up is because commit: 859d069ee1dd ("lockdep: Prepare for NMI IRQ
state tracking") changed IRQ tracing vs lockdep recursion and the
first instance is fairly common, the other cases hardly ever happen.
Peter Zijlstra [Wed, 12 Aug 2020 10:27:10 +0000 (12:27 +0200)]
cpuidle: Move trace_cpu_idle() into generic code
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and
put it in the generic code, right before disabling RCU. Gets rid of
more trace_*_rcuidle() users.
Peter Zijlstra [Thu, 20 Aug 2020 07:13:30 +0000 (09:13 +0200)]
lockdep: Use raw_cpu_*() for per-cpu variables
Sven reported that commit a21ee6055c30 ("lockdep: Change
hardirq{s_enabled,_context} to per-cpu variables") caused trouble on
s390 because their this_cpu_*() primitives disable preemption which
then lands back tracing.
On the one hand, per-cpu ops should use preempt_*able_notrace() and
raw_local_irq_*(), on the other hand, we can trivialy use raw_cpu_*()
ops for this.
Marco Elver [Thu, 20 Aug 2020 17:20:46 +0000 (19:20 +0200)]
sched: Use __always_inline on is_idle_task()
is_idle_task() may be used from noinstr functions such as
irqentry_enter(). Since the compiler is free to not inline regular
inline functions, switch to using __always_inline.
Marek Szyprowski [Mon, 13 Jul 2020 07:07:08 +0000 (09:07 +0200)]
drm/exynos: gem: Fix sparse warning
kvaddr element of the exynos_gem object points to a memory buffer, thus
it should not have a __iomem annotation. Then, to avoid a warning or
casting on assignment to fbi structure, the screen_buffer element of the
union should be used instead of the screen_base.
Thomas Gleixner [Tue, 25 Aug 2020 23:36:07 +0000 (01:36 +0200)]
Merge tag 'irqchip-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- Revert the wholesale conversion to platform drivers of the pdc, sysirq
and cirq drivers, as it breaks a number of platforms even when the
driver is built-in (probe ordering bites you).
- Prevent interrupt from being lost with the STM32 exti driver
- Fix wake-up interrupts for the MIPS Ingenic driver
- Fix an embarassing typo in the new module helpers, leading to the probe
failing most of the time
- The promised TI firmware rework that couldn't make it into the merge
window due to a very badly managed set of dependencies
Linus Torvalds [Tue, 25 Aug 2020 18:59:23 +0000 (11:59 -0700)]
Merge tag 'm68knommu-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu fix from Greg Ungerer:
"Only a single fix for the binfmt_flat loader (reverting a recent
change)"
* tag 'm68knommu-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
binfmt_flat: revert "binfmt_flat: don't offset the data start"
Jens Axboe [Tue, 25 Aug 2020 18:59:22 +0000 (12:59 -0600)]
io_uring: ensure read requests go through -ERESTART* transformation
We need to call kiocb_done() for any ret < 0 to ensure that we always
get the proper -ERESTARTSYS (and friends) transformation done.
At some point this should be tied into general error handling, so we
can get rid of the various (mostly network) related commands that check
and perform this substitution.
Linus Torvalds [Tue, 25 Aug 2020 18:42:43 +0000 (11:42 -0700)]
Merge tag 'libnvdimm-fix-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Vishal Verma:
"A couple of minor fixes for things merged in 5.9-rc1.
One is an out-of-bounds access caught by KASAN, and the second is a
tweak to some overzealous logging about dax support even for
traditional block devices which was unnecessary"
* tag 'libnvdimm-fix-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: do not print error message for non-persistent memory block device
libnvdimm: KASAN: global-out-of-bounds Read in internal_create_group
Jens Axboe [Tue, 25 Aug 2020 18:27:50 +0000 (12:27 -0600)]
io_uring: don't use poll handler if file can't be nonblocking read/written
There's no point in using the poll handler if we can't do a nonblocking
IO attempt of the operation, since we'll need to go async anyway. In
fact this is actively harmful, as reading from eg pipes won't return 0
to indicate EOF.
Linus Torvalds [Tue, 25 Aug 2020 18:13:29 +0000 (11:13 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- regression fix / revert of a commit that intended to reduce probing
delay by ~50ms, but introduced a race that causes quite a few devices
not to enumerate, or get stuck on first IRQ
- buffer overflow fix in hiddev, from Peilin Ye
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Revert "HID: usbhid: do not sleep when opening device"
HID: hiddev: Fix slab-out-of-bounds write in hiddev_ioctl_usage()
HID: quirks: Always poll three more Lenovo PixArt mice
HID: i2c-hid: Always sleep 60ms after I2C_HID_PWR_ON commands
HID: macally: Constify macally_id_table
HID: cougar: Constify cougar_id_table
Jens Axboe [Tue, 25 Aug 2020 13:58:00 +0000 (07:58 -0600)]
io_uring: fix imbalanced sqo_mm accounting
We do the initial accounting of locked_vm and pinned_vm before we have
setup ctx->sqo_mm, which means we can end up having not accounted the
memory at setup time, but still decrement it when we exit. This causes
an imbalance in the accounting.
Setup ctx->sqo_mm earlier in io_uring_create(), before we do the first
accounting of mm->{locked,pinned}_vm. This also unifies the state
grabbing for the ctx, and eliminates a failure case in
io_sq_offload_start().
PM: sleep: core: Fix the handling of pending runtime resume requests
It has been reported that system-wide suspend may be aborted in the
absence of any wakeup events due to unforseen interactions of it with
the runtume PM framework.
One failing scenario is when there are multiple devices sharing an
ACPI power resource and runtime-resume needs to be carried out for
one of them during system-wide suspend (for example, because it needs
to be reconfigured before the whole system goes to sleep). In that
case, the runtime-resume of that device involves turning the ACPI
power resource "on" which in turn causes runtime-resume requests
to be queued up for all of the other devices sharing it. Those
requests go to the runtime PM workqueue which is frozen during
system-wide suspend, so they are not actually taken care of until
the resume of the whole system, but the pm_runtime_barrier()
call in __device_suspend() sees them and triggers system wakeup
events for them which then cause the system-wide suspend to be
aborted if wakeup source objects are in active use.
Of course, the logic that leads to triggering those wakeup events is
questionable in the first place, because clearly there are cases in
which a pending runtime resume request for a device is not connected
to any real wakeup events in any way (like the one above). Moreover,
it is racy, because the device may be resuming already by the time
the pm_runtime_barrier() runs and so if the driver doesn't take care
of signaling the wakeup event as appropriate, it will be lost.
However, if the driver does take care of that, the extra
pm_wakeup_event() call in the core is redundant.
Accordingly, drop the conditional pm_wakeup_event() call fron
__device_suspend() and make the latter call pm_runtime_barrier()
alone. Also modify the comment next to that call to reflect the new
code and extend it to mention the need to avoid unwanted interactions
between runtime PM and system-wide device suspend callbacks.
ACPI: OSL: Prevent acpi_release_memory() from returning too early
After commit 1757659d022b ("ACPI: OSL: Implement deferred unmapping
of ACPI memory") in some cases acpi_release_memory() may return
before the target memory mappings actually go away, because they
are released asynchronously now.
Prevent it from returning prematurely by making it wait for the next
RCU grace period to elapse, for all of the RCU callbacks to complete
and for all of the scheduled work items to be flushed before
returning.
usb: typec: tcpm: Fix Fix source hard reset response for TDA 2.3.1.1 and TDA 2.3.1.2 failures
The patch addresses the compliance test failures while running TDA
2.3.1.1 and TDA 2.3.1.2 of the "PD Communications Engine USB PD
Compliance MOI" test plan published in https://www.usb.org/usbc.
For a product to be Type-C compliant, it's expected that these tests
are run on usb.org certified Type-C compliance tester as mentioned in
https://www.usb.org/usbc.
While the purpose of TDA 2.3.1.1 and TDA 2.3.1.2 is to verify that
the static and dynamic electrical capabilities of a Source meet the
requirements for each PDO offered, while doing so, the tests also
monitor that the timing of the VBUS waveform versus the messages meets
the requirements for Hard Reset defined in PROT-PROC-HR-TSTR as
mentioned in step 11 of TDA.2.3.1.1 and step 15 of TDA.2.3.1.2.
TDB.2.2.13.1: PROT-PROC-HR-TSTR Procedure and Checks for Tester
Originated Hard Reset
Purpose: To perform the appropriate protocol checks relating to any
circumstance in which the Hard Reset signal is sent by the Tester.
UUT is behaving as source:
The Tester sends a Hard Reset signal.
1. Check VBUS stays within present valid voltage range for
tPSHardReset min (25ms) after last bit of Hard Reset signal.
[PROT_PROC_HR_TSTR_1]
2. Check that VBUS starts to fall below present valid voltage range by
tPSHardReset max (35ms). [PROT_PROC_HR_TSTR_2]
3. Check that VBUS reaches vSafe0V within tSafe0v max (650 ms).
[PROT_PROC_HR_TSTR_3]
4. Check that VBUS starts rising to vSafe5V after a delay of
tSrcRecover (0.66s - 1s) from reaching vSafe0V. [PROT_PROC_HR_TSTR_4]
5. Check that VBUS reaches vSafe5V within tSrcTurnOn max (275ms) of
rising above vSafe0v max (0.8V). [PROT_PROC_HR_TSTR_5] Power Delivery
Compliance Plan 139 6. Check that Source Capabilities are finished
sending within tFirstSourceCap max (250ms) of VBUS reaching vSafe5v
min. [PROT_PROC_HR_TSTR_6].
This is in line with 7.1.5 Response to Hard Resets of the USB Power
Delivery Specification Revision 3.0, Version 1.2,
"Hard Reset Signaling indicates a communication failure has occurred
and the Source Shall stop driving VCONN, Shall remove Rp from the
VCONN pin and Shall drive VBUS to vSafe0V as shown in Figure 7-9. The
USB connection May reset during a Hard Reset since the VBUS voltage
will be less than vSafe5V for an extended period of time. After
establishing the vSafe0V voltage condition on VBUS, the Source Shall
wait tSrcRecover before re-applying VCONN and restoring VBUS to
vSafe5V. A Source Shall conform to the VCONN timing as specified in
[USB Type-C 1.3]."
With the above guidelines from the spec in mind, TCPM does not turn
off VCONN while entering SRC_HARD_RESET_VBUS_OFF. The patch makes TCPM
turn off VCONN while entering SRC_HARD_RESET_VBUS_OFF and turn it back
on while entering SRC_HARD_RESET_VBUS_ON along with vbus instead of
having VCONN on through hardreset.
Also, the spec clearly states that "After establishing the vSafe0V
voltage condition on VBUS", the Source Shall wait tSrcRecover before
re-applying VCONN and restoring VBUS to vSafe5V.
TCPM does not conform to this requirement. If the TCPC driver calls
tcpm_vbus_change with vbus off signal, TCPM right away enters
SRC_HARD_RESET_VBUS_ON without waiting for tSrcRecover.
For TCPC's which are buggy/does not call tcpm_vbus_change, TCPM
assumes that the vsafe0v is instantaneous as TCPM only waits
tSrcRecover instead of waiting for tSafe0v + tSrcRecover.
This patch also fixes this behavior by making sure that TCPM waits for
tSrcRecover before transitioning into SRC_HARD_RESET_VBUS_ON when
tcpm_vbus_change is called by TCPC.
When TCPC does not call tcpm_vbus_change, TCPM assumes the worst case
i.e. tSafe0v + tSrcRecover before transitioning into
SRC_HARD_RESET_VBUS_ON.
The commit 2a6c0b82e651 ("USB: PHY: JZ4770: Add support for new
Ingenic SoCs.") introduced the initialization function for different
chips, but left the relevant code involved in the resetting process
in the original function, resulting in uninitialized variable calls.
Brooke Basile [Tue, 25 Aug 2020 13:07:27 +0000 (09:07 -0400)]
USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()
Some values extracted by ncm_unwrap_ntb() could possibly lead to several
different out of bounds reads of memory. Specifically the values passed
to netdev_alloc_skb_ip_align() need to be checked so that memory is not
overflowed.
Resolve this by applying bounds checking to a number of different
indexes and lengths of the structure parsing logic.
Brooke Basile [Tue, 25 Aug 2020 13:05:08 +0000 (09:05 -0400)]
USB: gadget: u_f: add overflow checks to VLA macros
size can potentially hold an overflowed value if its assigned expression
is left unchecked, leading to a smaller than needed allocation when
vla_group_size() is used by callers to allocate memory.
To fix this, add a test for saturation before declaring variables and an
overflow check to (n) * sizeof(type).
If the expression results in overflow, vla_group_size() will return SIZE_MAX.
Jens Axboe [Mon, 24 Aug 2020 17:45:26 +0000 (11:45 -0600)]
io_uring: revert consumed iov_iter bytes on error
Some consumers of the iov_iter will return an error, but still have
bytes consumed in the iterator. This is an issue for -EAGAIN, since we
rely on a sane iov_iter state across retries.
Fix this by ensuring that we revert consumed bytes, if any, if the file
operations have consumed any bytes from iterator. This is similar to what
generic_file_read_iter() does, and is always safe as we have the previous
bytes count handy already.