Chris Park [Wed, 5 Aug 2020 17:46:40 +0000 (13:46 -0400)]
drm/amd/display: Call DMUB for eDP power control
[Why]
If DMUB is used, LVTMA VBIOS call can be used to control eDP instead of
tranditional transmitter control. Interface is agreed with VBIOS for
eDP to use this new path to program LVTMA registers.
[How]
Create DAL interface to send DMUB command for LVTMA as currently
implemented in VBIOS.
Linus Torvalds [Tue, 18 Aug 2020 21:27:12 +0000 (14:27 -0700)]
Merge tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A bunch of fixes that came in for SPI during the merge window.
Some from ST and others for their controller, one from Lukas for a
race between device addition and controller unregistration and one
from fix from Geert for the DT bindings which unbreaks validation"
* tag 'spi-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
dt-bindings: lpspi: Add missing boolean type for fsl,spi-only-use-cs1-sel
spi: stm32: always perform registers configuration prior to transfer
spi: stm32: fixes suspend/resume management
spi: stm32: fix stm32_spi_prepare_mbr in case of odd clk_rate
spi: stm32: fix fifo threshold level in case of short transfer
spi: stm32h7: fix race condition at end of transfer
spi: stm32: clear only asserted irq flags on interrupt
spi: Prevent adding devices below an unregistering controller
Jens Axboe [Tue, 18 Aug 2020 20:58:33 +0000 (13:58 -0700)]
io_uring: cleanup io_import_iovec() of pre-mapped request
io_rw_prep_async() goes through a dance of clearing req->io, calling
the iovec import, then re-setting req->io. Provide an internal helper
that does the right thing without needing state tweaked to get there.
This enables further cleanups in io_read, io_write, and
io_resubmit_prep(), but that's left for another time.
Guchun Chen [Thu, 13 Aug 2020 06:35:35 +0000 (14:35 +0800)]
drm/amdgpu: fix NULL pointer access issue when unloading driver
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.
Kevin Wang [Thu, 6 Aug 2020 15:41:47 +0000 (23:41 +0800)]
drm/amdgpu: fix uninit-value in arcturus_log_thermal_throttling_event()
when function arcturus_get_smu_metrics_data() call failed,
it will cause the variable "throttler_status" isn't initialized before use.
warning:
powerplay/arcturus_ppt.c:2268:24: warning: ‘throttler_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
2268 | if (throttler_status & logging_label[throttler_idx].feature_mask) {
net: gianfar: Add of_node_put() before goto statement
Every iteration of for_each_available_child_of_node() decrements
reference count of the previous node, however when control
is transferred from the middle of the loop, as in the case of
a return or break or goto, there is no decrement thus ultimately
resulting in a memory leak.
Fix a potential memory leak in gianfar.c by inserting of_node_put()
before the goto statement.
Ganji Aravind [Tue, 18 Aug 2020 15:40:58 +0000 (21:10 +0530)]
cxgb4: Fix race between loopback and normal Tx path
Even after Tx queues are marked stopped, there exists a
small window where the current packet in the normal Tx
path is still being sent out and loopback selftest ends
up corrupting the same Tx ring. So, ensure selftest takes
the Tx lock to synchronize access the Tx ring.
Fixes: 7235ffae3d2c ("cxgb4: add loopback ethtool self-test") Signed-off-by: Ganji Aravind <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Tue, 18 Aug 2020 19:49:13 +0000 (12:49 -0700)]
Merge branch 'sfc-more-EF100-fixes'
Edward Cree says:
====================
sfc: more EF100 fixes
Fix up some bugs in the initial EF100 submission, and re-fix
the hash_valid fix which was incomplete.
The reset bugs are currently hard to trigger; they were found
with an in-progress patch adding ethtool support, whereby
ethtool --reset reliably reproduces them.
====================
Edward Cree [Tue, 18 Aug 2020 12:44:50 +0000 (13:44 +0100)]
sfc: don't free_irq()s if they were never requested
If efx_nic_init_interrupt fails, or was never run (e.g. due to an earlier
failure in ef100_net_open), freeing irqs in efx_nic_fini_interrupt is not
needed and will cause error messages and stack traces.
So instead, only do this if efx_nic_init_interrupt successfully completed,
as indicated by the new efx->irqs_hooked flag.
Fixes: 965b549f3c20 ("sfc_ef100: implement ndo_open/close and EVQ probing") Signed-off-by: Edward Cree <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Edward Cree [Tue, 18 Aug 2020 12:44:18 +0000 (13:44 +0100)]
sfc: null out channel->rps_flow_id after freeing it
If an ef100_net_open() fails, ef100_net_stop() may be called without
channel->rps_flow_id having been written; thus it may hold the address
freed by a previous ef100_net_stop()'s call to efx_remove_filters().
This then causes a double-free when efx_remove_filters() is called
again, leading to a panic.
To prevent this, after freeing it, overwrite it with NULL.
Fixes: a9dc3d5612ce ("sfc_ef100: RX filter table management and related gubbins") Signed-off-by: Edward Cree <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Edward Cree [Tue, 18 Aug 2020 12:43:57 +0000 (13:43 +0100)]
sfc: take correct lock in ef100_reset()
When downing and upping the ef100 filter table, we need to take a write
lock on efx->filter_sem, not just a read lock, because we may kfree()
the table pointers.
Without this, resets cause a WARN_ON from efx_rwsem_assert_write_locked().
Fixes: a9dc3d5612ce ("sfc_ef100: RX filter table management and related gubbins") Signed-off-by: Edward Cree <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Alvin Šipraga [Tue, 18 Aug 2020 08:51:34 +0000 (10:51 +0200)]
macvlan: validate setting of multiple remote source MAC addresses
Remote source MAC addresses can be set on a 'source mode' macvlan
interface via the IFLA_MACVLAN_MACADDR_DATA attribute. This commit
tightens the validation of these MAC addresses to match the validation
already performed when setting or adding a single MAC address via the
IFLA_MACVLAN_MACADDR attribute.
iproute2 uses IFLA_MACVLAN_MACADDR_DATA for its 'macvlan macaddr set'
command, and IFLA_MACVLAN_MACADDR for its 'macvlan macaddr add' command,
which demonstrates the inconsistent behaviour that this commit
addresses:
# ip link add link eth0 name macvlan0 type macvlan mode source
# ip link set link dev macvlan0 type macvlan macaddr add 01:00:00:00:00:00
RTNETLINK answers: Cannot assign requested address
# ip link set link dev macvlan0 type macvlan macaddr set 01:00:00:00:00:00
# ip -d link show macvlan0
5: macvlan0@eth0: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 ...
link/ether 2e:ac:fd:2d:69:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvlan mode source remotes (1) 01:00:00:00:00:00 numtxqueues 1 ...
With this change, the 'set' command will (rightly) fail in the same way
as the 'add' command.
Linus Torvalds [Tue, 18 Aug 2020 19:05:46 +0000 (12:05 -0700)]
Merge tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull ia64 page table fix from Mike Rapoport:
"Fix regression in IA-64 caused by page table allocation refactoring
The refactoring and consolidation of <asm/pgalloc.h> caused regression
on parisc and ia64. The fix for parisc made it into v5.9-rc1 while the
fix ia64 got delayed a bit and here it is"
* tag 'fixes-2020-08-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
arch/ia64: Restore arch-specific pgd_offset_k implementation
Yang Shi [Sat, 15 Aug 2020 04:30:41 +0000 (21:30 -0700)]
mm/memory.c: skip spurious TLB flush for retried page fault
Recently we found regression when running will_it_scale/page_fault3 test
on ARM64. Over 70% down for the multi processes cases and over 20% down
for the multi threads cases. It turns out the regression is caused by
commit 89b15332af7c ("mm: drop mmap_sem before calling
balance_dirty_pages() in write fault").
The test mmaps a memory size file then write to the mapping, this would
make all memory dirty and trigger dirty pages throttle, that upstream
commit would release mmap_sem then retry the page fault. The retried
page fault would see correct PTEs installed then just fall through to
spurious TLB flush. The regression is caused by the excessive spurious
TLB flush. It is fine on x86 since x86's spurious TLB flush is no-op.
We could just skip the spurious TLB flush to mitigate the regression.
brookxu [Fri, 7 Aug 2020 14:01:39 +0000 (22:01 +0800)]
ext4: optimize the implementation of ext4_mb_good_group()
It might be better to adjust the code in two places:
1. Determine whether grp is currupt or not should be placed first.
2. (cr<=2 && free <ac->ac_g_ex.fe_len)should may belong to the crx
strategy, and it may be more appropriate to put it in the
subsequent switch statement block. For cr1, cr2, the conditions
in switch potentially realize the above judgment. For cr0, we
should add (free <ac->ac_g_ex.fe_len) judgment, and then delete
(free / fragments) >= ac->ac_g_ex.fe_len), because cr0 returns
true by default.
This patch causes a regression with quite a few devices, as probing fails
because of the race where the first IRQ is dropped on the floor (after
hid_device_io_start() happens, but before the 50ms timeout passess), and
report descriptor never gets parsed and populated.
As this is just a boot time micro-optimization, let's revert the
patch for 5.9 now, and fix this properly eventually for next merge
window.
Andrii Nakryiko [Tue, 18 Aug 2020 16:44:56 +0000 (09:44 -0700)]
libbpf: Fix build on ppc64le architecture
On ppc64le we get the following warning:
In file included from btf_dump.c:16:0:
btf_dump.c: In function ‘btf_dump_emit_struct_def’:
../include/linux/kernel.h:20:17: error: comparison of distinct pointer types lacks a cast [-Werror]
(void) (&_max1 == &_max2); \
^
btf_dump.c:882:11: note: in expansion of macro ‘max’
m_sz = max(0LL, btf__resolve_size(d->btf, m->type));
^~~
Fix by explicitly casting to __s64, which is a return type from
btf__resolve_size().
drm/msm: add shutdown support for display platform_driver
Define shutdown callback for display drm driver,
so as to disable all the CRTCS when shutdown
notification is received by the driver.
This change will turn off the timing engine so
that no display transactions are requested
while mmu translations are getting disabled
during reboot sequence.
Signed-off-by: Krishna Manikandan <[email protected]>
Changes in v2:
- Remove NULL check from msm_pdev_shutdown (Stephen Boyd)
- Change commit text to reflect when this issue
was uncovered (Sai Prakash Ranjan)
Dmitry Monakhov [Tue, 11 Aug 2020 06:43:40 +0000 (06:43 +0000)]
bfq: fix blkio cgroup leakage v4
Changes from v1:
- update commit description with proper ref-accounting justification
commit db37a34c563b ("block, bfq: get a ref to a group when adding it to a service tree")
introduce leak forbfq_group and blkcg_gq objects because of get/put
imbalance.
In fact whole idea of original commit is wrong because bfq_group entity
can not dissapear under us because it is referenced by child bfq_queue's
entities from here:
-> bfq_init_entity()
->bfqg_and_blkg_get(bfqg);
->entity->parent = bfqg->my_entity
So parent entity can not disappear while child entity is in tree,
and child entities already has proper protection.
This patch revert commit db37a34c563b ("block, bfq: get a ref to a group when adding it to a service tree")
bfq_group leak trace caused by bad commit:
-> blkg_alloc
-> bfq_pq_alloc
-> bfqg_get (+1)
->bfq_activate_bfqq
->bfq_activate_requeue_entity
-> __bfq_activate_entity
->bfq_get_entity
->bfqg_and_blkg_get (+1) <==== : Note1
->bfq_del_bfqq_busy
->bfq_deactivate_entity+0x53/0xc0 [bfq]
->__bfq_deactivate_entity+0x1b8/0x210 [bfq]
-> bfq_forget_entity(is_in_service = true)
entity->on_st_or_in_serv = false <=== :Note2
if (is_in_service)
return; ==> do not touch reference
-> blkcg_css_offline
-> blkcg_destroy_blkgs
-> blkg_destroy
-> bfq_pd_offline
-> __bfq_deactivate_entity
if (!entity->on_st_or_in_serv) /* true, because (Note2)
return false;
-> bfq_pd_free
-> bfqg_put() (-1, byt bfqg->ref == 2) because of (Note2)
So bfq_group and blkcg_gq will leak forever, see test-case below.
##TESTCASE_BEGIN:
#!/bin/bash
max_iters=${1:-100}
#prep cgroup mounts
mount -t tmpfs cgroup_root /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
Fixes: db37a34c563b ("block, bfq: get a ref to a group when adding it to a service tree") Tested-by: Oleksandr Natalenko <[email protected]> Signed-off-by: Dmitry Monakhov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto
the stack as part of machine check delivery is valid or not.
Various drivers copied a code fragment that uses the RIPV bit to
determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED
or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where
RIPV is set as "FATAL").
Reverse the tests so that the error is marked fatal when RIPV is not set.
wired_cmd_repeater_auth_stream_req_in has a variable
length array at the end. we use struct_size() overflow
macro to determine the size for the allocation and sending
size.
This also fixes bug in case number of streams is > 0 in the original
submission. This bug was not triggered as the number of streams is
always one.
Fixes: c56967d674e3 (mei: hdcp: Replace one-element array with flexible-array member) Fixes: 0a1af1b5c18d (misc/mei/hdcp: Verify M_prime) Cc: <[email protected]> # v5.1+: c56967d674e3 (mei: hdcp: Replace one-element array with flexible-array member) Signed-off-by: Tomas Winkler <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Tamseel Shams [Mon, 10 Aug 2020 03:00:21 +0000 (08:30 +0530)]
serial: samsung: Removes the IRQ not found warning
In few older Samsung SoCs like s3c2410, s3c2412
and s3c2440, UART IP is having 2 interrupt lines.
However, in other SoCs like s3c6400, s5pv210,
exynos5433, and exynos4210 UART is having only 1
interrupt line. Due to this, "platform_get_irq(platdev, 1)"
call in the driver gives the following false-positive error:
"IRQ index 1 not found" on newer SoC's.
This patch adds the condition to check for Tx interrupt
only for the those SoC's which have 2 interrupt lines.
serial: 8250: change lock order in serial8250_do_startup()
We have a number of "uart.port->desc.lock vs desc.lock->uart.port"
lockdep reports coming from 8250 driver; this causes a bit of trouble
to people, so let's fix it.
The problem is reverse lock order in two different call paths:
Fix this by changing the order of locks in serial8250_do_startup():
do disable_irq_nosync() first, which grabs desc->lock, and grab
uart->port after that, so that chain #1 and chain #2 have same lock
order.
Full lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
5.4.39 #55 Not tainted
======================================================
swapper/0/0 is trying to acquire lock: ffffffffab65b6c0 (console_owner){-...}, at: console_lock_spinning_enable+0x31/0x57
but task is already holding lock: ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at: __report_bad_irq+0x5b/0xba
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Holger Assmann [Thu, 13 Aug 2020 15:27:57 +0000 (17:27 +0200)]
serial: stm32: avoid kernel warning on absence of optional IRQ
stm32_init_port() of the stm32-usart may trigger a warning in
platform_get_irq() when the device tree specifies no wakeup interrupt.
The wakeup interrupt is usually a board-specific GPIO and the driver
functions correctly in its absence. The mainline stm32mp151.dtsi does
not specify it, so all mainline device trees trigger an unnecessary
kernel warning. Use of platform_get_irq_optional() avoids this.
Lukas Wunner [Thu, 13 Aug 2020 10:52:40 +0000 (12:52 +0200)]
serial: pl011: Fix oops on -EPROBE_DEFER
If probing of a pl011 gets deferred until after free_initmem(), an oops
ensues because pl011_console_match() is called which has been freed.
Fix by removing the __init attribute from the function and those it
calls.
Commit 10879ae5f12e ("serial: pl011: add console matching function")
introduced pl011_console_match() not just for early consoles but
regular preferred consoles, such as those added by acpi_parse_spcr().
Regular consoles may be registered after free_initmem() for various
reasons, one being deferred probing, another being dynamic enablement
of serial ports using a DeviceTree overlay.
Thus, pl011_console_match() must not be declared __init and the
functions it calls mustn't either.
Stack trace for posterity:
Unable to handle kernel paging request at virtual address 80c38b58
Internal error: Oops: 8000000d [#1] PREEMPT SMP ARM
PC is at pl011_console_match+0x0/0xfc
LR is at register_console+0x150/0x468
[<80187004>] (register_console)
[<805a8184>] (uart_add_one_port)
[<805b2b68>] (pl011_register_port)
[<805b3ce4>] (pl011_probe)
[<80569214>] (amba_probe)
[<805ca088>] (really_probe)
[<805ca2ec>] (driver_probe_device)
[<805ca5b0>] (__device_attach_driver)
[<805c8060>] (bus_for_each_drv)
[<805c9dfc>] (__device_attach)
[<805ca630>] (device_initial_probe)
[<805c90a8>] (bus_probe_device)
[<805c95a8>] (deferred_probe_work_func)
Lukas Wunner [Thu, 13 Aug 2020 10:59:54 +0000 (12:59 +0200)]
serial: pl011: Don't leak amba_ports entry on driver register error
pl011_probe() calls pl011_setup_port() to reserve an amba_ports[] entry,
then calls pl011_register_port() to register the uart driver with the
tty layer.
If registration of the uart driver fails, the amba_ports[] entry is not
released. If this happens 14 times (value of UART_NR macro), then all
amba_ports[] entries will have been leaked and driver probing is no
longer possible. (To be fair, that can only happen if the DeviceTree
doesn't contain alias IDs since they cause the same entry to be used for
a given port.) Fix it.
If the number of ports a card has is not explicitly specified, it defaults
to the rightmost 4 bits of the PCI device ID. This is prone to error since
not all PCI device IDs contain a number which corresponds to the number of
ports that card provides.
This particular case involves COMMTECH_4222PCIE, COMMTECH_4224PCIE and
COMMTECH_4228PCIE cards with device IDs 0x0022, 0x0020 and 0x0021.
Currently the multiport cards receive 2, 0 and 1 port instead of 2, 4 and
8 ports respectively.
To fix this, each Commtech Fastcom PCIe card is given a struct where the
number of ports is explicitly specified. This ensures 'board->num_ports'
is used instead of the default 'pcidev->device & 0x0f'.
John Stultz [Tue, 11 Aug 2020 02:50:44 +0000 (02:50 +0000)]
tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
When booting with heavily modularized config, the serial console
may not be able to load until after init when modules that
satisfy needed dependencies have time to load.
Unfortunately, as qcom_geni_console_setup is marked as __init,
the function may have been freed before we get to run it,
causing boot time crashes such as:
The commit e42d6c3ec0c7 ("serial: qcom_geni_serial: Make kgdb work
even if UART isn't console") worked pretty well and I've been doing a
lot of debugging with it. However, recently I typed "dmesg" in kdb
and then held the space key down to scroll through the pagination. My
device hung. This was repeatable and I found that it was introduced
with the aforementioned commit.
It turns out that there are some strange boundary cases in geni where
in some weird situations it will signal RX_LAST but then will put 0 in
RX_LAST_BYTE. This means that the entire last FIFO entry is valid.
This weird corner case is handled in qcom_geni_serial_handle_rx()
where you can see that we only honor RX_LAST_BYTE if RX_LAST is set
_and_ RX_LAST_BYTE is non-zero. If either of these is not true we use
BYTES_PER_FIFO_WORD (4) for the size of the last FIFO word.
Let's fix kgdb. While at it, also use the proper #define for 4.
George Kennedy [Fri, 31 Jul 2020 16:33:12 +0000 (12:33 -0400)]
vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize()
vc_resize() can return with an error after failure. Change VT_RESIZEX ioctl
to save struct vc_data values that are modified and restore the original
values in case of error.
George Kennedy [Fri, 31 Jul 2020 16:33:11 +0000 (12:33 -0400)]
fbcon: prevent user font height or width change from causing potential out-of-bounds access
Add a check to fbcon_resize() to ensure that a possible change to user font
height or user font width will not allow a font data out-of-bounds access.
NOTE: must use original charcount in calculation as font charcount can
change and cannot be used to determine the font data allocated size.
vt: defer kfree() of vc_screenbuf in vc_do_resize()
syzbot is reporting UAF bug in set_origin() from vc_do_resize() [1], for
vc_do_resize() calls kfree(vc->vc_screenbuf) before calling set_origin().
Unfortunately, in set_origin(), vc->vc_sw->con_set_origin() might access
vc->vc_pos when scroll is involved in order to manipulate cursor, but
vc->vc_pos refers already released vc->vc_screenbuf until vc->vc_pos gets
updated based on the result of vc->vc_sw->con_set_origin().
Preserving old buffer and tolerating outdated vc members until set_origin()
completes would be easier than preventing vc->vc_sw->con_set_origin() from
accessing outdated vc members.
Masahiro Yamada [Mon, 17 Aug 2020 16:36:30 +0000 (01:36 +0900)]
kconfig: qconf: fix the popup menu in the ConfigInfoView window
I do not know when ConfigInfoView::createStandardContextMenu() is
called.
Because QTextEdit::createStandardContextMenu() is not virtual,
ConfigInfoView::createStandardContextMenu() cannot override it.
Even if right-click the ConfigInfoView window, the "Show Debug Info"
menu does not show up.
Build up the menu in the constructor, and invoke it from the
contextMenuEvent().
Masahiro Yamada [Mon, 17 Aug 2020 16:36:29 +0000 (01:36 +0900)]
kconfig: qconf: fix signal connection to invalid slots
If you right-click in the ConfigList window, you will see the following
messages in the console:
QObject::connect: No such slot QAction::setOn(bool) in scripts/kconfig/qconf.cc:888
QObject::connect: (sender name: 'config')
QObject::connect: No such slot QAction::setOn(bool) in scripts/kconfig/qconf.cc:897
QObject::connect: (sender name: 'config')
QObject::connect: No such slot QAction::setOn(bool) in scripts/kconfig/qconf.cc:906
QObject::connect: (sender name: 'config')
Right, there is no such slot in QAction. I think this is a typo of
setChecked.
Due to this bug, when you toggled the menu "Option->Show Name/Range/Data"
the state of the context menu was not previously updated. Fix this.
Fixes: d5d973c3f8a9 ("Port xconfig to Qt5 - Put back some of the old implementation(part 2)") Signed-off-by: Masahiro Yamada <[email protected]>
Samuel Thibault [Tue, 4 Aug 2020 16:06:37 +0000 (18:06 +0200)]
speakup: Fix wait_for_xmitr for ttyio case
This was missed while introducing the tty-based serial access.
The only remaining use of wait_for_xmitr with tty-based access is in
spk_synth_is_alive_restart to check whether the synth can be restarted.
With tty-based this is up to the tty layer to cope with the buffering
etc. so we can just say yes.
Bastien Nocera [Tue, 18 Aug 2020 11:04:45 +0000 (13:04 +0200)]
USB: Fix device driver race
When a new device with a specialised device driver is plugged in, the
new driver will be modprobe()'d but the driver core will attach the
"generic" driver to the device.
After that, nothing will trigger a reprobe when the modprobe()'d device
driver has finished initialising, as the device has the "generic"
driver attached to it.
Trigger a reprobe ourselves when new specialised drivers get registered.
JC Kuo [Tue, 11 Aug 2020 09:31:43 +0000 (17:31 +0800)]
usb: host: xhci-tegra: otg usb2/usb3 port init
tegra_xusb_init_usb_phy() should initialize "otg_usb2_port" and
"otg_usb3_port" with -EINVAL because "0" is a valid value
represents usb2 port 0 or usb3 port 0.
Andy Shevchenko [Fri, 14 Aug 2020 18:22:18 +0000 (21:22 +0300)]
usb: hcd: Fix use after free in usb_hcd_pci_remove()
On the removal stage we put a reference to the controller structure and
if it's not used anymore it gets freed, but later we try to dereference
a pointer to a member of that structure.
Copy necessary field to a temporary variable to avoid use after free.
Hans de Goede [Sun, 9 Aug 2020 14:19:04 +0000 (16:19 +0200)]
usb: typec: ucsi: Hold con->lock for the entire duration of ucsi_register_port()
Commit 081da1325d35 ("usb: typec: ucsi: displayport: Fix a potential race
during registration") made the ucsi code hold con->lock in
ucsi_register_displayport(). But we really don't want any interactions
with the connector to run before the port-registration process is fully
complete.
This commit moves the taking of con->lock from ucsi_register_displayport()
into ucsi_register_port() to achieve this.
Hans de Goede [Sun, 9 Aug 2020 14:19:03 +0000 (16:19 +0200)]
usb: typec: ucsi: Rework ppm_lock handling
The ppm_lock really only needs to be hold during 2 functions:
ucsi_reset_ppm() and ucsi_run_command().
Push the taking of the lock down into these 2 functions, renaming
ucsi_run_command() to ucsi_send_command() which was an existing
wrapper already taking the lock for its callers.
This simplifies things for the callers and removes the difference
between ucsi_send_command() and ucsi_run_command() which has led
to various locking bugs in the past.
1. ucsi_handle_connector_change() contains one ucsi_send_command() call,
which takes the ppm_lock for it; and one ucsi_run_command() call which
relies on the caller have taking the ppm_lock.
ucsi_handle_connector_change() does not take the lock, so the
second (ucsi_run_command) calls should also be ucsi_send_command().
2. ucsi_get_pdos() gets called from ucsi_handle_connector_change() which
does not hold the ppm_lock, so it also must use ucsi_send_command().
This commit also adds a WARN_ON(!mutex_is_locked(&ucsi->ppm_lock)); to
ucsi_run_command() to avoid similar problems getting re-introduced in
the future.
Hans de Goede [Sun, 9 Aug 2020 14:19:01 +0000 (16:19 +0200)]
usb: typec: ucsi: Fix AB BA lock inversion
Lockdep reports an AB BA lock inversion between ucsi_init() and
ucsi_handle_connector_change():
AB order:
1. ucsi_init takes ucsi->ppm_lock (it runs with that locked for the
duration of the function)
2. usci_init eventually end up calling ucsi_register_displayport,
which takes ucsi_connector->lock
BA order:
1. ucsi_handle_connector_change work is started, takes ucsi_connector->lock
2. ucsi_handle_connector_change calls ucsi_send_command which takes
ucsi->ppm_lock
The ppm_lock really only needs to be hold during 2 functions:
ucsi_reset_ppm() and ucsi_run_command().
This commit fixes the AB BA lock inversion by making ucsi_init drop the
ucsi->ppm_lock before it starts registering ports; and replacing any
ucsi_run_command() calls after this point with ucsi_send_command()
(which is a wrapper around run_command taking the lock while handling
the command).
Some of the replacing of ucsi_run_command with ucsi_send_command
in the helpers used during port registration also fixes a number of
code paths after registration which call ucsi_run_command() without
holding the ppm_lock:
1. ucsi_altmode_update_active() call in ucsi/displayport.c
2. ucsi_register_altmodes() call from ucsi_handle_connector_change()
(through ucsi_partner_change())
M. Vefa Bicakci [Mon, 10 Aug 2020 16:00:17 +0000 (19:00 +0300)]
usbip: Implement a match function to fix usbip
Commit 88b7381a939d ("USB: Select better matching USB drivers when
available") introduced the use of a "match" function to select a
non-generic/better driver for a particular USB device. This
unfortunately breaks the operation of usbip in general, as reported in
the kernel bugzilla with bug 208267 (linked below).
Upon inspecting the aforementioned commit, one can observe that the
original code in the usb_device_match function used to return 1
unconditionally, but the aforementioned commit makes the usb_device_match
function use identifier tables and "match" virtual functions, if either of
them are available.
Hence, this commit implements a match function for usbip that
unconditionally returns true to ensure that usbip is functional again.
This change has been verified to restore usbip functionality, with a
v5.7.y kernel on an up-to-date version of Qubes OS 4.0, which uses
usbip to redirect USB devices between VMs.
Thanks to Jonathan Dieter for the effort in bisecting this issue down
to the aforementioned commit.
Evgeny Novikov [Wed, 5 Aug 2020 09:06:43 +0000 (12:06 +0300)]
USB: lvtest: return proper error code in probe
lvs_rh_probe() can return some nonnegative value from usb_control_msg()
when it is less than "USB_DT_HUB_NONVAR_SIZE + 2" that is considered as
a failure. Make lvs_rh_probe() return -EINVAL in this case.
Found by Linux Driver Verification project (linuxtesting.org).
Tom Rix [Sat, 1 Aug 2020 15:21:54 +0000 (08:21 -0700)]
USB: cdc-acm: rework notification_buffer resizing
Clang static analysis reports this error
cdc-acm.c:409:3: warning: Use of memory after it is freed
acm_process_notification(acm, (unsigned char *)dr);
There are three problems, the first one is that dr is not reset
The variable dr is set with
if (acm->nb_index)
dr = (struct usb_cdc_notification *)acm->notification_buffer;
But if the notification_buffer is too small it is resized with
if (acm->nb_size) {
kfree(acm->notification_buffer);
acm->nb_size = 0;
}
alloc_size = roundup_pow_of_two(expected_size);
/*
* kmalloc ensures a valid notification_buffer after a
* use of kfree in case the previous allocation was too
* small. Final freeing is done on disconnect.
*/
acm->notification_buffer =
kmalloc(alloc_size, GFP_ATOMIC);
dr should point to the new acm->notification_buffer.
The second problem is any data in the notification_buffer is lost
when the pointer is freed. In the normal case, the current data
is accumulated in the notification_buffer here.
USB: quirks: Add no-lpm quirk for another Raydium touchscreen
There's another Raydium touchscreen needs the no-lpm quirk:
[ 1.339149] usb 1-9: New USB device found, idVendor=2386, idProduct=350e, bcdDevice= 0.00
[ 1.339150] usb 1-9: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 1.339151] usb 1-9: Product: Raydium Touch System
[ 1.339152] usb 1-9: Manufacturer: Raydium Corporation
...
[ 6.450497] usb 1-9: can't set config #1, error -110
Alan Stern [Mon, 10 Aug 2020 18:29:54 +0000 (14:29 -0400)]
USB: yurex: Fix bad gfp argument
The syzbot fuzzer identified a bug in the yurex driver: It passes
GFP_KERNEL as a memory-allocation flag to usb_submit_urb() at a time
when its state is TASK_INTERRUPTIBLE, not TASK_RUNNING:
Michael Roth [Tue, 11 Aug 2020 16:15:44 +0000 (11:15 -0500)]
powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
For a power9 KVM guest with XIVE enabled, running a test loop
where we hotplug 384 vcpus and then unplug them, the following traces
can be seen (generally within a few loops) either from the unplugged
vcpu:
cpu 65 (hwid 65) Ready to die...
Querying DEAD? cpu 66 (66) shows 2
list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 ...
CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
NIP: c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
REGS: c0000009e5a17840 TRAP: 0700 Not tainted (4.18.0-221.el8.ppc64le)
MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28000842 XER: 20040000
...
NIP __list_del_entry_valid+0xac/0x100
LR __list_del_entry_valid+0xa8/0x100
Call Trace:
__list_del_entry_valid+0xa8/0x100 (unreliable)
free_pcppages_bulk+0x1f8/0x940
free_unref_page+0xd0/0x100
xive_spapr_cleanup_queue+0x148/0x1b0
xive_teardown_cpu+0x1bc/0x240
pseries_mach_cpu_die+0x78/0x2f0
cpu_die+0x48/0x70
arch_cpu_idle_dead+0x20/0x40
do_idle+0x2f4/0x4c0
cpu_startup_entry+0x38/0x40
start_secondary+0x7bc/0x8f0
start_secondary_prolog+0x10/0x14
or on the worker thread handling the unplug:
pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
Querying DEAD? cpu 314 (314) shows 2
BUG: Bad page state in process kworker/u768:3 pfn:95de1
cpu 314 (hwid 314) Ready to die...
page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
flags: 0x5ffffc00000000()
raw: 005ffffc000000005deadbeef00001005deadbeef00002000000000000000000
raw: 0000000000000000000000000000000000000000ffffff7f0000000000000000
page dumped because: nonzero mapcount
Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ...
CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
Workqueue: pseries hotplug workque pseries_hp_work_fn
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
bad_page+0x12c/0x1b0
free_pcppages_bulk+0x5bc/0x940
page_alloc_cpu_dead+0x118/0x120
cpuhp_invoke_callback.constprop.5+0xb8/0x760
_cpu_down+0x188/0x340
cpu_down+0x5c/0xa0
cpu_subsys_offline+0x24/0x40
device_offline+0xf0/0x130
dlpar_offline_cpu+0x1c4/0x2a0
dlpar_cpu_remove+0xb8/0x190
dlpar_cpu_remove_by_index+0x12c/0x150
dlpar_cpu+0x94/0x800
pseries_hp_work_fn+0x128/0x1e0
process_one_work+0x304/0x5d0
worker_thread+0xcc/0x7a0
kthread+0x1ac/0x1c0
ret_from_kernel_thread+0x5c/0x80
The latter trace is due to the following sequence:
where drain_pages() in this case is called under the assumption that
the unplugged cpu is no longer executing. To ensure that is the case,
and early call is made to __cpu_die()->pseries_cpu_die(), which runs a
loop that waits for the cpu to reach a halted state by polling its
status via query-cpu-stopped-state RTAS calls. It only polls for 25
iterations before giving up, however, and in the trace above this
results in the following being printed only .1 seconds after the
hotplug worker thread begins processing the unplug request:
pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
Querying DEAD? cpu 314 (314) shows 2
At that point the worker thread assumes the unplugged CPU is in some
unknown/dead state and procedes with the cleanup, causing the race
with the XIVE cleanup code executed by the unplugged CPU.
Fix this by waiting indefinitely, but also making an effort to avoid
spurious lockup messages by allowing for rescheduling after polling
the CPU status and printing a warning if we wait for longer than 120s.
Saurav Kashyap [Thu, 6 Aug 2020 11:10:13 +0000 (04:10 -0700)]
Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
FCoE adapter initialization failed for ISP8021 with the following patch
applied. In addition, reproduction of the issue the patch originally tried
to address has been unsuccessful.
Quinn Tran [Thu, 6 Aug 2020 11:10:12 +0000 (04:10 -0700)]
scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
NVMEAsync command is being submitted to QLA while the same NVMe controller
is in the middle of reset. The reset path has deleted the association and
freed aen_op->fcp_req.private. Add a check for this private pointer before
issuing the command.
Saurav Kashyap [Thu, 6 Aug 2020 11:10:11 +0000 (04:10 -0700)]
scsi: qla2xxx: Check if FW supports MQ before enabling
OS boot during Boot from SAN was stuck at dracut emergency shell after
enabling NVMe driver parameter. For non-MQ support the driver was enabling
MQ. Add a check to confirm if FW supports MQ.
Arun Easi [Thu, 6 Aug 2020 11:10:10 +0000 (04:10 -0700)]
scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba
qla_nvme_register_hba() puts out a warning when there are not enough queue
pairs available for FC-NVME. Just fail the NVME registration rather than a
WARNING + call Trace.
Quinn Tran [Thu, 6 Aug 2020 11:10:07 +0000 (04:10 -0700)]
scsi: qla2xxx: Fix login timeout
Multipath errors were seen during failback due to login timeout. The
remote device sent LOGO, the local host tore down the session and did
relogin. The RSCN arrived indicates remote device is going through failover
after which the relogin is in a 20s timeout phase. At this point the
driver is stuck in the relogin process. Add a fix to delete the session as
part of abort/flush the login.
Quinn Tran [Thu, 6 Aug 2020 11:10:04 +0000 (04:10 -0700)]
scsi: qla2xxx: Flush all sessions on zone disable
On Zone Disable, certain switches would ignore all commands. This causes
timeout for both switch scan command and abort of that command. On
detection of this condition, all sessions will be shutdown.
If we pass in an offset which is larger than PAGE_SIZE, then
page_is_mergeable() thinks it's not mergeable with the previous bio_vec,
leading to a large number of bio_vecs being used. Use a slightly more
obvious test that the two pages are compatible with each other.
Fixes: 52d52d1c98a9 ("block: only allow contiguous page structs in a bio_vec") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Douglas Gilbert [Thu, 13 Aug 2020 15:57:38 +0000 (11:57 -0400)]
scsi: scsi_debug: Fix scp is NULL errors
John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were
mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The
problem was tracked down to a missing critical section on a "short circuit"
path. Namely, the time to process the current command so far has already
exceeded the requested command duration (i.e. the number of nanoseconds in
the ndelay parameter).
The random=1 parameter setting was pivotal in finding this error. The
failure scenario involved first taking that "short circuit" path (due to a
very short command duration) and then taking the more likely
hrtimer_start() path (due to a longer command duration). With random=1 each
command's duration is taken from the uniformly distributed [0..ndelay)
interval. The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).
Steffen Maier [Thu, 13 Aug 2020 15:28:56 +0000 (17:28 +0200)]
scsi: zfcp: Fix use-after-free in request timeout handlers
Before v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use
timer_setup()"), we intentionally only passed zfcp_adapter as context
argument to zfcp_fsf_request_timeout_handler(). Since we only trigger
adapter recovery, it was unnecessary to sync against races between timeout
and (late) completion. Likewise, we only passed zfcp_erp_action as context
argument to zfcp_erp_timeout_handler(). Since we only wakeup an ERP action,
it was unnecessary to sync against races between timeout and (late)
completion.
Meanwhile the timeout handlers get timer_list as context argument and do a
timer-specific container-of to zfcp_fsf_req which can have been freed.
Fix it by making sure that any request timeout handlers, that might just
have started before del_timer(), are completed by using del_timer_sync()
instead. This ensures the request free happens afterwards.
Space time diagram of potential use-after-free:
Basic idea is to have 2 or more pending requests whose timeouts run out at
almost the same time.
Bean Huo [Tue, 11 Aug 2020 14:18:59 +0000 (16:18 +0200)]
scsi: ufs: No need to send Abort Task if the task in DB was cleared
If the bit corresponding to a task in the Doorbell register has been
cleared, no need to poll the status of the task on the device side and to
send an Abort Task TM. Instead, let it directly goto cleanup.
In addition, to keep original debug output, move the goto below the debug
print.
Stanley Chu [Tue, 11 Aug 2020 14:18:58 +0000 (16:18 +0200)]
scsi: ufs: Clean up completed request without interrupt notification
If somehow no interrupt notification is raised for a completed request and
its doorbell bit is cleared by host, UFS driver needs to cleanup its
outstanding bit in ufshcd_abort(). Otherwise, system may behave abnormally
in the following scenario:
After ufshcd_abort() returns, this request will be requeued by SCSI layer
with its outstanding bit set. Any future completed request will trigger
ufshcd_transfer_req_compl() to handle all "completed outstanding bits". At
this time the "abnormal outstanding bit" will be detected and the "requeued
request" will be chosen to execute request post-processing flow. This is
wrong because this request is still "alive".
Adrian Hunter [Tue, 11 Aug 2020 13:39:35 +0000 (16:39 +0300)]
scsi: ufs: Fix interrupt error message for shared interrupts
The interrupt might be shared, in which case it is not an error for the
interrupt handler to be called when the interrupt status is zero, so don't
print the message unless there was enabled interrupt status.
Stanley Chu [Sun, 9 Aug 2020 05:07:34 +0000 (13:07 +0800)]
scsi: ufs: Fix possible infinite loop in ufshcd_hold
In ufshcd_suspend(), after clk-gating is suspended and link is set
as Hibern8 state, ufshcd_hold() is still possibly invoked before
ufshcd_suspend() returns. For example, MediaTek's suspend vops may
issue UIC commands which would call ufshcd_hold() during the command
issuing flow.
Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled,
then ufshcd_hold() may enter infinite loops because there is no
clk-ungating work scheduled or pending. In this case, ufshcd_hold()
shall just bypass, and keep the link as Hibern8 state.