For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the
following code snippet:
...
pud = pud_offset(p4d, addr);
if (sz != PUD_SIZE && pud_none(*pud))
return NULL;
/* hugepage or swap? */
if (pud_huge(*pud) || !pud_present(*pud))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
if (sz != PMD_SIZE && pmd_none(*pmd))
return NULL;
/* hugepage or swap? */
if (pmd_huge(*pmd) || !pmd_present(*pmd))
return (pte_t *)pmd;
...
The following sequence would trigger this bug:
- CPU0: sz = PUD_SIZE and *pud = 0 , continue
- CPU0: "pud_huge(*pud)" is false
- CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
- CPU0: "!pud_present(*pud)" is false, continue
- CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
We must make sure there is exactly one dereference of pud and pmd.
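The single-dereference rule can be sketched in userspace as follows. This is a minimal model, not the kernel code: entry_t, the ENTRY_* bit values, and classify_once() are hypothetical names chosen for illustration. The entry is loaded once into a local snapshot and every test runs against that snapshot, so a concurrent update cannot make two tests disagree.

```c
#include <stdint.h>

/* Hypothetical bit layout for illustration only. */
#define ENTRY_PRESENT 0x1u
#define ENTRY_HUGE    0x80u

typedef uint64_t entry_t;

enum lookup_result { LOOKUP_NULL, LOOKUP_THIS_LEVEL, LOOKUP_DESCEND };

/*
 * Decide what to do at one page-table level from a single snapshot of the
 * entry. is_target_size is nonzero when the requested size matches this level.
 */
static enum lookup_result classify_once(const volatile entry_t *slot,
                                        int is_target_size)
{
    entry_t snap = *slot;   /* the only read of the shared entry */

    if (!is_target_size && snap == 0)
        return LOOKUP_NULL;
    /* hugepage or swap? */
    if ((snap & ENTRY_HUGE) || !(snap & ENTRY_PRESENT))
        return LOOKUP_THIS_LEVEL;
    return LOOKUP_DESCEND;
}
```

With the snapshot, an entry that is empty at the time of the read yields "this level" for a matching size, and an entry that is already a present hugepage also yields "this level"; the mixed outcome from the sequence above cannot happen.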
Kfifo has been written by Stefani Seibold and she's implicitly expected
to Ack any changes to it. She's not however officially listed as kfifo
maintainer which leads to delays in patch review. This patch proposes
to add an explicit entry for kfifo to the MAINTAINERS file.
slub: avoid redzone when choosing freepointer location
Marco Elver reported system crashes when booting with "slub_debug=Z".
The freepointer location (s->offset) was not taking into account that
the "inuse" size that includes the redzone area should not be used by
the freelist pointer. Change the calculation to save the area of the
object that an inline freepointer may be written into.
Waiman Long [Tue, 21 Apr 2020 13:07:55 +0000 (09:07 -0400)]
blk-iocost: Fix error on iocost_ioc_vrate_adj
Systemtap 4.2 is unable to correctly interpret the "u32 (*missed_ppm)[2]"
argument of the iocost_ioc_vrate_adj trace entry defined in
include/trace/events/iocost.h leading to the following error:
/tmp/stapAcz0G0/stap_c89c58b83cea1724e26395efa9ed4939_6321_aux_6.c:78:8:
error: expected ‘;’, ‘,’ or ‘)’ before ‘*’ token
, u32[]* __tracepoint_arg_missed_ppm
That argument type is indeed rather complex and hard to read. Looking
at block/blk-iocost.c, it is just a 2-entry u32 array. Simplifying
the argument to a plain "u32 *missed_ppm" and adjusting the trace
entry accordingly makes the compilation error go away.
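For readers unfamiliar with the two declarator forms, here is a small standalone sketch. The function names are hypothetical and u32 is typedef'd locally; it only shows that both forms reach the same two entries, with the caller passing "&array" for the first and "array" for the second.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Original style: pointer to a whole 2-entry array. */
static u32 sum_via_array_ptr(u32 (*missed_ppm)[2])
{
    return (*missed_ppm)[0] + (*missed_ppm)[1];
}

/* Simplified style: pointer to the first element. */
static u32 sum_via_elem_ptr(u32 *missed_ppm)
{
    return missed_ppm[0] + missed_ppm[1];
}
```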
Ronnie Sahlberg [Tue, 21 Apr 2020 02:37:39 +0000 (12:37 +1000)]
cifs: protect updating server->dstaddr with a spinlock
We use a spinlock while we are reading and accessing the destination address for a server.
We need to also use this spinlock to protect when we are modifying this address from
reconn_set_ipaddr().
signal: Avoid corrupting si_pid and si_uid in do_notify_parent
Christof Meerwald <[email protected]> writes:
> Hi,
>
> this is probably related to commit
> 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.
>
> A simple test program (based on the sample from the signalfd man page):
>
> #include <sys/signalfd.h>
> #include <signal.h>
> #include <unistd.h>
> #include <spawn.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define handle_error(msg) \
> do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int main(int argc, char *argv[])
> {
> sigset_t mask;
> int sfd;
> struct signalfd_siginfo fdsi;
> ssize_t s;
>
> sigemptyset(&mask);
> sigaddset(&mask, SIGCHLD);
>
> if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> handle_error("sigprocmask");
>
> pid_t chldpid;
> char *chldargv[] = { "./sfdclient", NULL };
> posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
>
> sfd = signalfd(-1, &mask, 0);
> if (sfd == -1)
> handle_error("signalfd");
>
> for (;;) {
> s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> if (s != sizeof(struct signalfd_siginfo))
> handle_error("read");
>
> if (fdsi.ssi_signo == SIGCHLD) {
> printf("Got SIGCHLD %d %d %d %d\n",
> fdsi.ssi_status, fdsi.ssi_code,
> fdsi.ssi_uid, fdsi.ssi_pid);
> return 0;
> } else {
> printf("Read unexpected signal\n");
> }
> }
> }
>
>
> and a multi-threaded client to test with:
>
> #include <unistd.h>
> #include <pthread.h>
>
> void *f(void *arg)
> {
> sleep(100);
> }
>
> int main()
> {
> pthread_t t[8];
>
> for (int i = 0; i != 8; ++i)
> {
> pthread_create(&t[i], NULL, f, NULL);
> }
> }
>
> I tried to do a bit of debugging and what seems to be happening is
> that
>
> /* From an ancestor pid namespace? */
> if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
>
> fails inside task_pid_nr_ns because the check for "pid_alive" fails.
>
> This code seems to be called from do_notify_parent and there we
> actually have "tsk != current" (I am assuming both are threads of the
> current process?)
The immediate problem is as Christof noticed that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.
The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie. So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.
So change do_notify_parent to call __send_signal directly.
Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30. This fix only
backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.
Mark Rutland [Tue, 21 Apr 2020 12:10:14 +0000 (13:10 +0100)]
arm64: sync kernel APIAKey when installing
A direct write to an APxxKey_EL1 register requires a context
synchronization event to ensure that indirect reads made by subsequent
instructions (e.g. AUTIASP, PACIASP) observe the new value.
When we initialize the boot task's APIAKey in boot_init_stack_canary()
via ptrauth_keys_switch_kernel() we miss the necessary ISB, and so there
is a window where instructions are not guaranteed to use the new APIAKey
value. This has been observed to result in boot-time crashes where
PACIASP and AUTIASP within a function used a mixture of the old and new
key values.
Fix this by having ptrauth_keys_switch_kernel() synchronize the new key
value with an ISB. At the same time, __ptrauth_key_install() is renamed
to __ptrauth_key_install_nosync() so that it is obvious that this
performs no synchronization itself.
Shengjiu Wang [Tue, 21 Apr 2020 11:28:45 +0000 (19:28 +0800)]
ASoC: wm8960: Fix wrong clock after suspend & resume
After suspend & resume, wm8960_hw_params may be called when
bias_level is not SND_SOC_BIAS_ON, in which case wm8960_configure_clocking
is not called. If the sample rate is changed at that time, the
output clock rate will be incorrect.
So the check that bias_level is SND_SOC_BIAS_ON in wm8960_hw_params
is not necessary, and it causes the above issue.
xhci: Don't clear hub TT buffer on ep0 protocol stall
The default control endpoint ep0 can return a STALL indicating the
device does not support the control transfer requests. This is called
a protocol stall and does not halt the endpoint.
xHC behaves a bit differently. Its internal endpoint state will always
be halted on any stall, even if the device side of the endpoint is not
halted. So we do need to issue the reset endpoint command to clear the
xHC host internal endpoint halt state, but should not request the HS hub
to clear the TT buffer unless the device side of the endpoint is halted.
Clearing the hub TT buffer at protocol stall caused ep0 to become
unresponsive for some FS/LS devices behind HS hubs, and class drivers
failed to set the interface due to timeout:
xhci: prevent bus suspend if a roothub port detected a over-current condition
Suspending the bus and host controller while a port is in an over-current
condition may halt the host.
Also keep the roothub running if over-current is active.
xhci: Fix handling halted endpoint even if endpoint ring appears empty
If a class driver cancels its only URB then the endpoint ring buffer will
appear empty to the xhci driver. xHC hardware may still process cached
TRBs, and complete with a STALL, halting the endpoint.
This halted endpoint was not handled correctly by xhci driver as events on
empty rings were all assumed to be spurious events.
xhci driver refused to restart the ring with EP_HALTED flag set, so class
driver was never informed the endpoint halted even if it queued new URBs.
The host side of the endpoint needs to be reset, and dequeue pointer should
be moved in order to clear the cached TRBs and restart the endpoint.
Small adjustments in finding the new dequeue pointer are needed to support
the case of stall on an empty ring and unknown current TD.
In the reference BIOS implementation, WRDS can be disabled without
disabling WGDS. This happens in most cases where WRDS is disabled,
causing the "WGDS without WRDS" check to trigger and issue an error.
To avoid this issue, we change the check so that we only consider it
an error if the WRDS entry doesn't exist. If the entry (or the
selected profile) is disabled for any other reason, we just silently
ignore WGDS.
Johannes Berg [Fri, 17 Apr 2020 07:08:14 +0000 (10:08 +0300)]
iwlwifi: mvm: fix inactive TID removal return value usage
The function iwl_mvm_remove_inactive_tids() returns bool, so we
should just check "if (ret)", not "if (ret >= 0)" (which would
do nothing useful here). We obviously therefore cannot use the
return value of the function for the free_queue, we need to use
the queue (i) we're currently dealing with instead.
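The pitfall can be reproduced outside the driver. The sketch below uses hypothetical helpers (remove_inactive() and the two pick_free_queue_*() functions are illustrations, not the iwlwifi code): a bool stored in an int is 0 or 1, so "ret >= 0" is always true and the value returned is not a queue index.

```c
#include <stdbool.h>

/* Stand-in for a helper that reports success or failure as a bool. */
static bool remove_inactive(int queue, int inactive_queue)
{
    return queue == inactive_queue;
}

/* Buggy shape: a bool converted to int is 0 or 1, so ">= 0" always holds. */
static int pick_free_queue_buggy(int nqueues, int inactive_queue)
{
    for (int i = 0; i < nqueues; i++) {
        int ret = remove_inactive(i, inactive_queue);

        if (ret >= 0)
            return ret;   /* returns 0 or 1, not a queue index */
    }
    return -1;
}

/* Fixed shape: test the bool and report the queue being examined. */
static int pick_free_queue_fixed(int nqueues, int inactive_queue)
{
    for (int i = 0; i < nqueues; i++) {
        if (remove_inactive(i, inactive_queue))
            return i;
    }
    return -1;
}
```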
Johannes Berg [Fri, 17 Apr 2020 07:08:12 +0000 (10:08 +0300)]
iwlwifi: mvm: limit maximum queue appropriately
Due to some hardware issues, queue 31 isn't usable on devices that have
32 queues (7000, 8000, 9000 families), which is correctly reflected in
the configuration and TX queue initialization.
However, the firmware API and queue allocation code assumes that there
are 32 queues, and if something actually attempts to use #31 this leads
to a NULL-pointer dereference since it's not allocated.
Fix this by limiting to 31 in the IWL_MVM_DQA_MAX_DATA_QUEUE, and also
add some code to catch this earlier in the future, if the configuration
changes perhaps.
Johannes Berg [Fri, 17 Apr 2020 07:08:11 +0000 (10:08 +0300)]
iwlwifi: pcie: indicate correct RB size to device
In the context info, we need to indicate the correct RB size
to the device so that it will not think we have 4k when we
only use 2k. This seems to not have caused any issues right
now, likely because the hardware no longer supports putting
multiple entries into a single RB, and practically all of
the entries should be smaller than 2k.
Nevertheless, it's a bug, and we must advertise the right
size to the device.
Note that right now we can only tell it 2k vs. 4k, so for
the cases where we have more, still use 4k. This needs to
be fixed by the firmware first.
Johannes Berg [Fri, 17 Apr 2020 07:08:09 +0000 (10:08 +0300)]
iwlwifi: pcie: actually release queue memory in TVQM
The iwl_trans_pcie_dyn_txq_free() function only releases the frames
that may be left on the queue by calling iwl_pcie_gen2_txq_unmap(),
but doesn't actually free the DMA ring or byte-count tables for the
queue. This leads to pretty large memory leaks (at least before my
queue size improvements), in particular in monitor/sniffer mode on
channel hopping since this happens on every channel change.
This was also now more evident after the move to a DMA pool for the
byte count tables, showing messages such as
BUG iwlwifi:bc (...): Objects remaining in iwlwifi:bc on __kmem_cache_shutdown()
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=206811.
Chris Packham [Thu, 16 Apr 2020 22:19:08 +0000 (10:19 +1200)]
powerpc/setup_64: Set cache-line-size based on cache-block-size
If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use
the block-size value for both. Per the devicetree spec cache-line-size
is only needed if it differs from the block size.
Originally the code would fallback from block size to line size. An
error message was printed if both properties were missing.
Later the code was refactored to use clearer names and logic but it
inadvertently made line size a required property, meaning on systems
without a line size property we fall back to the default from the
cputable.
On powernv (OPAL) platforms, since the introduction of device tree CPU
features (5a61ef74f269 ("powerpc/64s: Support new device tree binding
for discovering CPU features")), that has led to the wrong value being
used, as the fallback value is incorrect for Power8/Power9 CPUs.
The incorrect values flow through to the VDSO and also to the sysconf
values, SC_LEVEL1_ICACHE_LINESIZE etc.
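The intended fallback order can be sketched as below. This is an assumption-laden model, not the setup_64 code: pick_line_size() is a hypothetical name, and an absent device tree property is modelled as a NULL pointer.

```c
#include <stddef.h>

/*
 * Pick the cache line size. Each pointer is NULL when the corresponding
 * device tree property is absent (a modelling choice for this sketch).
 */
static unsigned int pick_line_size(const unsigned int *line_size_prop,
                                   const unsigned int *block_size_prop,
                                   unsigned int cputable_default)
{
    if (line_size_prop)
        return *line_size_prop;
    if (block_size_prop)
        return *block_size_prop;   /* line size defaults to block size */
    return cputable_default;       /* neither property present */
}
```

The cputable default is only reached when both properties are missing, which is the behaviour the refactoring had lost for systems that provide only a block size.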
Commit 71bc0334a637 ("iwlwifi: check allocated pointer when allocating
conf_tlvs") attempted to fix a typo introduced by commit 17b809c9b22e
("iwlwifi: dbg: move debug data to a struct") but does not implement the
check correctly.
The error handling code in usX2Y_rate_set() may hit a potential NULL
dereference when an error occurs before allocating all us->urb[].
Add a proper NULL check for fixing the corner case.
Gregor Pintar [Mon, 20 Apr 2020 21:40:30 +0000 (23:40 +0200)]
ALSA: usb-audio: Add quirk for Focusrite Scarlett 2i2
Force it to use asynchronous playback.
Same quirk has already been added for Focusrite Scarlett Solo (2nd gen)
with a commit 46f5710f0b88 ("ALSA: usb-audio: Add quirk for Focusrite
Scarlett Solo").
This also seems to prevent regular clicks when playing at 44100Hz
on Scarlett 2i2 (2nd gen). I did not notice any side effects.
Moved both quirks to snd_usb_audioformat_attributes_quirk() as suggested.
Luke Nelson [Sat, 18 Apr 2020 23:26:54 +0000 (16:26 -0700)]
bpf, selftests: Add test for BPF_STX BPF_B storing R10
This patch adds a test to test_verifier that writes the lower 8 bits of
R10 (aka FP) using BPF_B to an array map and reads the result back. The
expected behavior is that the result should be the same as first copying
R10 to R9, and then storing / loading the lower 8 bits of R9.
This test catches a bug that was present in the x86-64 JIT that caused
an incorrect encoding for BPF_STX BPF_B when the source operand is R10.
Luke Nelson [Sat, 18 Apr 2020 23:26:53 +0000 (16:26 -0700)]
bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B
This patch fixes an encoding bug in emit_stx for BPF_B when the source
register is BPF_REG_FP.
The current implementation for BPF_STX BPF_B in emit_stx saves one REX
byte when the operands can be encoded using Mod-R/M alone. The lower 8
bits of registers %rax, %rbx, %rcx, and %rdx can be accessed without using
a REX prefix via %al, %bl, %cl, and %dl, respectively. Other registers,
(e.g., %rsi, %rdi, %rbp, %rsp) require a REX prefix to use their 8-bit
equivalents (%sil, %dil, %bpl, %spl).
The current code checks if the source for BPF_STX BPF_B is BPF_REG_1
or BPF_REG_2 (which map to %rdi and %rsi), in which case it emits the
required REX prefix. However, it misses the case when the source is
BPF_REG_FP (mapped to %rbp).
The result is that BPF_STX BPF_B with BPF_REG_FP as the source operand
will read from register %ch instead of the correct %bpl. This patch fixes
the problem by fixing and refactoring the check on which registers need
the extra REX byte. Since no BPF registers map to %rsp, there is no need
to handle %spl.
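The encoding rule behind this can be illustrated with a small table lookup. This is a sketch rather than the JIT code: the enum and both helpers are hypothetical names, and only the low eight registers are modelled (r8 to r15 always need a REX prefix and are left out).

```c
#include <stdbool.h>

/* x86-64 register numbers as used in the ModR/M byte (low 8 registers). */
enum x86_reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/* True if using the low byte of reg as an operand needs a REX prefix. */
static bool byte_reg_needs_rex(enum x86_reg reg)
{
    return reg >= RSP;   /* %spl, %bpl, %sil, %dil */
}

/* Name of the 8-bit register selected by a 3-bit register field. */
static const char *byte_reg_name(enum x86_reg reg, bool has_rex)
{
    static const char *const legacy[] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
    };
    static const char *const with_rex[] = {
        "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"
    };

    return has_rex ? with_rex[reg] : legacy[reg];
}
```

Looking up RBP without a REX prefix gives "ch", which is the wrong-register symptom described above; with the prefix it gives "bpl".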
check_xadd() can cause check_ptr_to_btf_access() to be executed with
atype==BPF_READ and value_regno==-1 (meaning "just check whether the access
is okay, don't tell me what type it will result in").
Handle that case properly and skip writing type information, instead of
indexing into the registers at index -1 and writing into out-of-bounds
memory.
Note that at least at the moment, you can't actually write through a BTF
pointer, so check_xadd() will reject the program after calling
check_ptr_to_btf_access with atype==BPF_WRITE; but that's after the
verifier has already corrupted memory.
This patch assumes that BTF pointers are not available in unprivileged
programs.
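The shape of the guard can be sketched as follows. The structure and function here are hypothetical simplifications (the real verifier state is far richer); the point is only that a negative value_regno must be treated as "no destination register" before any array indexing.

```c
#define NUM_REGS 11

struct reg_state {
    int type;
};

/*
 * Record the result type of an access. value_regno < 0 means the caller only
 * wants the access validated and has no destination register.
 * Returns 1 if a register was updated, 0 otherwise.
 */
static int record_access_type(struct reg_state regs[NUM_REGS],
                              int value_regno, int type)
{
    if (value_regno < 0)
        return 0;              /* skip: regs[-1] would be out of bounds */
    regs[value_regno].type = type;
    return 1;
}
```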
bpf: Forbid XADD on spilled pointers for unprivileged users
When check_xadd() verifies an XADD operation on a pointer to a stack slot
containing a spilled pointer, check_stack_read() verifies that the read,
which is part of XADD, is valid. However, since the placeholder value -1 is
passed as `value_regno`, check_stack_read() can only return a binary
decision and can't return the type of the value that was read. The intent
here is to verify whether the value read from the stack slot may be used as
a SCALAR_VALUE; but since check_stack_read() doesn't check the type, and
the type information is lost when check_stack_read() returns, this is not
enforced, and a malicious user can abuse XADD to leak spilled kernel
pointers.
Fix it by letting check_stack_read() verify that the value is usable as a
SCALAR_VALUE if no type information is passed to the caller.
To be able to use __is_pointer_value() in check_stack_read(), move it up.
Fix up the expected unprivileged error message for a BPF selftest that,
until now, assumed that unprivileged users can use XADD on stack-spilled
pointers. This also gives us a test for the behavior introduced in this
patch for free.
In theory, this could also be fixed by forbidding XADD on stack spills
entirely, since XADD is a locked operation (for operations on memory with
concurrency) and there can't be any concurrency on the BPF stack; but
Alexei has said that he wants to keep XADD on stack slots working to avoid
changes to the test suite [1].
The following BPF program demonstrates how to leak a BPF map pointer as an
unprivileged user using this bug:
cpumap: Avoid warning when CONFIG_DEBUG_PER_CPU_MAPS is enabled
When the kernel is built with CONFIG_DEBUG_PER_CPU_MAPS, the cpumap code
can trigger a spurious warning if CONFIG_CPUMASK_OFFSTACK is also set. This
happens because in this configuration, NR_CPUS can be larger than
nr_cpumask_bits, so the initial check in cpu_map_alloc() is not sufficient
to guard against hitting the warning in cpumask_check().
Fix this by explicitly checking the supplied key against the
nr_cpumask_bits variable before calling cpu_possible().
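A minimal sketch of the ordering, with hypothetical names: nr_bits stands in for nr_cpumask_bits and the possible[] array stands in for the possible-CPU mask, which is only valid to query for indices below nr_bits.

```c
#include <stdbool.h>

/*
 * Validate a user-supplied CPU index. The range check comes first so the
 * mask is never consulted with an out-of-range index.
 */
static bool key_is_usable(unsigned int key, unsigned int nr_bits,
                          const bool *possible)
{
    if (key >= nr_bits)
        return false;          /* reject before touching the mask */
    return possible[key];
}
```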
Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.
Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE. That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down. That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.
This addresses the problem by catching that case and returning to the guest
instead.
For completeness, this fixes the radix page fault handler in the same
way. For radix this didn't cause any obvious misbehaviour, because we
ended up putting the non-present PTE into the guest's partition-scoped
page tables, leading immediately to another hypervisor data/instruction
storage interrupt, which would go through the page fault path again
and fix things up.
Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Reported-by: David Gibson <[email protected]>
Tested-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
net/mlx5e: Get the latest values from counters in switchdev mode
In the switchdev mode, when running "cat
/sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
accessed to get the latest values. But currently this command can
not get the correct values from ppcnt.
According to the firmware manual, the 802_3 data layout must be set in
the ppcnt register before the 802_3 counters are read.
When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
run, the monitor counters are tested before the 802_3 data layout is
updated in the ppcnt register. The test result decides whether the
802_3 data layout is updated or not.
However, the monitor counters do not support monitoring the rx/tx
stats of 802_3 in switchdev mode, so changes to the rx/tx counters
do not trigger the monitor counters and the 802_3 data layout is
not updated in the ppcnt register. As a result, this command cannot get
the latest values from the ppcnt register with the 802_3 data layout.
Fixes: 5c7e8bbb0257 ("net/mlx5e: Use monitor counters for update stats")
Signed-off-by: Zhu Yanjun <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
net/mlx5: Kconfig: convert imply usage to weak dependency
MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
MLXFW and PCI_HYPERV_INTERFACE.
This was useful to force vxlan, ptp, etc. to be reachable to mlx5
regardless of their config states.
Due to the changes in the cited commit below, the semantics of 'imply'
were changed to not force any restriction on the implied config.
As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
would result in undefined references, as VXLAN now would stay as 'm'.
To fix this we change MLX5_CORE to have a weak dependency on
these modules/configs and make sure they are reachable, by adding:
depends on symbol || !symbol.
For example, with VXLAN=m and MLX5_CORE=y, this will force MLX5_CORE to m.
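The effect of "symbol || !symbol" follows from Kconfig's tristate arithmetic, where n < m < y, "!" maps a value v to (y - v), and "||" takes the maximum. The sketch below models that in C; the enum and function names are hypothetical, and it is a model of the expression only, not of the Kconfig tool.

```c
/* Kconfig tristate values ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_not(enum tristate a)
{
    return (enum tristate)(TRI_Y - a);
}

static enum tristate tri_or(enum tristate a, enum tristate b)
{
    return a > b ? a : b;
}

/* Upper bound that "depends on dep || !dep" places on the dependent symbol. */
static enum tristate weak_dep_limit(enum tristate dep)
{
    return tri_or(dep, tri_not(dep));
}
```

The limit is y when the dependency is n or y, and m only when the dependency is m, which is why a built-in MLX5_CORE is demoted to a module only in the VXLAN=m case.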
net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns
XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
ICOSQ. When the application floods the driver with wakeup requests by
calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
the XSK ICOSQ may overflow.
Multiple NOPs are not required and won't accelerate the process, so
avoid posting a second NOP if there is one already on the way. This way
we also avoid increasing the queue size (which might not help anyway).
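One way to express "post at most one NOP at a time" is a pending flag that is set atomically when the NOP is posted and cleared when it completes. The sketch below uses C11 atomics and hypothetical names (struct wake_queue, wake_request(), wake_complete()); it is not the mlx5e implementation, and the posted counter exists only so the behaviour can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct wake_queue {
    atomic_flag nop_pending;   /* set while a wakeup NOP is in flight */
    unsigned int posted;       /* how many NOPs were actually posted */
};

/* Request a wakeup; returns true only if this call posted the NOP. */
static bool wake_request(struct wake_queue *q)
{
    if (atomic_flag_test_and_set(&q->nop_pending))
        return false;          /* one is already on the way */
    q->posted++;
    return true;
}

/* Called when the posted NOP has completed. */
static void wake_complete(struct wake_queue *q)
{
    atomic_flag_clear(&q->nop_pending);
}
```

However many requests arrive before completion, the queue only ever holds one NOP, so its size does not need to grow with the request rate.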
net/mlx5: Fix failing fw tracer allocation on s390
On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
allocation as done for the firmware tracer will always fail.
Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
that copies the debug data into the trace array and there is no need for
the allocation to be contiguous in physical memory. We can therefore use
kvzalloc() instead of kzalloc() and get rid of the large contiguous
allocation.
Paul Moore [Mon, 20 Apr 2020 20:24:34 +0000 (16:24 -0400)]
audit: check the length of userspace generated audit records
Commit 756125289285 ("audit: always check the netlink payload length
in audit_receive_msg()") fixed a number of missing message length
checks, but forgot to check the length of userspace generated audit
records. The good news is that you need CAP_AUDIT_WRITE to submit
userspace audit records, which is generally only given to trusted
processes, so the impact should be limited.
For an algorithm that does not match the bank, the positive
value EINVAL is returned here. This looks like a typo;
a negative error value should be returned instead.
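Why the sign matters can be shown with a short sketch. It assumes, as is the usual kernel convention, that callers treat only negative return values as errors; is_error() and the two check_bank_*() functions are hypothetical illustrations.

```c
#include <errno.h>

/* Callers in this sketch treat only negative values as failure. */
static int is_error(int rc)
{
    return rc < 0;
}

static int check_bank_buggy(int alg, int bank_alg)
{
    if (alg != bank_alg)
        return EINVAL;     /* positive: slips past "rc < 0" checks */
    return 0;
}

static int check_bank_fixed(int alg, int bank_alg)
{
    if (alg != bank_alg)
        return -EINVAL;    /* negative errno, seen as an error */
    return 0;
}
```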
George Wilson [Fri, 20 Mar 2020 03:27:58 +0000 (23:27 -0400)]
tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send()
tpm_ibmvtpm_send() can fail during PowerVM Live Partition Mobility resume
with an H_CLOSED return from ibmvtpm_send_crq(). The PAPR says, 'The
"partner partition suspended" transport event disables the associated CRQ
such that any H_SEND_CRQ hcall() to the associated CRQ returns H_Closed
until the CRQ has been explicitly enabled using the H_ENABLE_CRQ hcall.'
This patch adds a check in tpm_ibmvtpm_send() for an H_CLOSED return from
ibmvtpm_send_crq() and in that case calls tpm_ibmvtpm_resume() and
retries the ibmvtpm_send_crq() once.
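The retry-once pattern can be sketched in isolation as below. Everything here is a stand-in: RC_CLOSED models H_CLOSED, the callbacks model the send and resume calls, and struct fake_link is a toy transport used only to exercise the logic. Unlike a real driver, this sketch retries only if the resume callback reports success, which is an assumption of the example.

```c
#define RC_OK      0
#define RC_CLOSED  (-1)   /* hypothetical stand-in for H_CLOSED */

typedef int (*send_fn)(void *ctx);
typedef int (*resume_fn)(void *ctx);

/* Send; on a "closed" result, resume the transport and retry exactly once. */
static int send_with_one_retry(send_fn send, resume_fn resume, void *ctx)
{
    int retried = 0;
    int rc;

again:
    rc = send(ctx);
    if (rc == RC_CLOSED && !retried && resume(ctx) == RC_OK) {
        retried = 1;
        goto again;
    }
    return rc;
}

/* Toy transport: "closed" until resumed, or forever if "stuck" is set. */
struct fake_link {
    int closed;
    int stuck;
    int sends;
    int resumes;
};

static int fake_send(void *ctx)
{
    struct fake_link *l = ctx;

    l->sends++;
    return l->closed ? RC_CLOSED : RC_OK;
}

static int fake_resume(void *ctx)
{
    struct fake_link *l = ctx;

    l->resumes++;
    if (!l->stuck)
        l->closed = 0;
    return RC_OK;
}
```

A link that reopens on resume succeeds on the second send; a link that stays closed fails after exactly two sends and one resume, so the retry cannot loop.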
When team mode is changed or set, the team_mode_get() is called to check
whether the mode module is inserted or not. If the mode module is not
inserted, it calls the request_module().
In request_module(), a child process, the "modprobe" process, is
created and the caller waits for the child process to finish.
At this point, the following locks are held.
down_read(&cb_lock); by genl_rcv()
genl_lock(); by genl_rcv_msg()
rtnl_lock(); by team_nl_cmd_options_set()
mutex_lock(&team->lock); by team_nl_team_get()
Concurrently, the team module could be removed by rmmod or "modprobe -r"
The __exit function of team module is team_module_exit(), which calls
team_nl_fini() and it tries to acquire following locks.
down_write(&cb_lock);
genl_lock();
Because of the genl_lock() and cb_lock, this process can't be finished
earlier than request_module() routine.
The problem scenario.
CPU0                                     CPU1
team_mode_get
    request_module()
                                         modprobe -r team_mode_roundrobin
                                                     team <--(B)
        modprobe team <--(A)
                 team_mode_roundrobin
Via request_module(), the "modprobe team_mode_roundrobin" command
is executed. At this point, the modprobe process decides
that the team module should be inserted before team_mode_roundrobin,
because the team module is being removed.
The module infrastructure does not allow insert/remove operations on
the same module to run concurrently.
So (A) waits for (B), but (B) also waits for (A) because of the locks,
and the hang occurs at this point.
Test commands:
while :
do
teamd -d &
killall teamd &
modprobe -rv team_mode_roundrobin &
done
The approach of this patch is to hold the reference count of the team
module if the team module is compiled as a module. If the reference count
of the team module is not zero while request_module() is being called,
the team module will not be removed at that moment,
so the above scenario cannot occur.
David S. Miller [Mon, 20 Apr 2020 19:59:33 +0000 (12:59 -0700)]
Merge branch 'mptcp-fix-races-on-accept'
Paolo Abeni says:
====================
mptcp: fix races on accept()
This series includes some fixes for accept() races which may cause inconsistent
MPTCP socket status and oops. Please see the individual patches for the
technical details.
====================
Paolo Abeni [Mon, 20 Apr 2020 14:25:06 +0000 (16:25 +0200)]
mptcp: drop req socket remote_key* fields
We don't need them, as we can use the current ingress opt
data instead. Setting them in syn_recv_sock() may cause
inconsistent mptcp socket status, as per previous commit.
Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Paolo Abeni [Mon, 20 Apr 2020 14:25:05 +0000 (16:25 +0200)]
mptcp: avoid flipping mp_capable field in syn_recv_sock()
If multiple CPUs race on the same req_sock in syn_recv_sock(),
flipping such field can cause inconsistent child socket status.
When racing, the CPU losing the req ownership may still change
the mptcp request socket mp_capable flag while the CPU owning
the request is cloning the socket, leaving the child socket with
'is_mptcp' set but no 'mp_capable' flag.
Such a socket will stay with its 'conn' field cleared, leading to an oops
in a later mptcp callback.
Address the issue by tracking the fallback status in a local variable.
Problem is that tcp_child_process() calls listen sockets'
sk_data_ready() notification, but it doesn't hold the listener
lock. Another cpu calling close() on the listener will then cause
transition of refcount to 0.
Fetching PTP sync information from mailbox is slow and can take
up to 10 milliseconds. Reduce this unnecessary delay by directly
reading the information from the corresponding registers.
Fixes: 9c33e4208bce ("cxgb4: Add PTP Hardware Clock (PHC) support")
Signed-off-by: Manoj Malviya <[email protected]>
Signed-off-by: Rahul Lakkireddy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Merge tag 'omap-for-v5.6/fixes-rc7-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Boot regression fix for N950/N9
We need to tag RNG as disabled for N950/N9 as it is blocked by the secure
mode. We have a similar change done for N900, but I missed adding it
for N950/N9 with the recent RNG changes.
* tag 'omap-for-v5.6/fixes-rc7-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: OMAP3: disable RNG on N950/N9
The buggy address belongs to the variable:
div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b]
Memory state around the buggy address:
 ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00
 ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa
>ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa
 ^
 ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa
 ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa
==================================================================
Digging into this indeed shows that the clock divider array is
lacking a final fence, and that the clock subsystem goes off into the
weeds. Oh well.
Let's add the empty structure that indicates the end of the array.
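The role of the terminating entry can be sketched with a standalone table. The struct, the table contents, and table_len() are hypothetical and only modelled on the idea of a divider table whose walk stops at an entry with div == 0; without that last entry the loop would keep reading past the end of the array.

```c
#include <stddef.h>

/* A walk over this table stops at the first entry with div == 0. */
struct div_entry {
    unsigned int val;
    unsigned int div;
};

static const struct div_entry example_table[] = {
    { .val = 0, .div = 2 },
    { .val = 1, .div = 3 },
    { .val = 2, .div = 4 },
    { 0 },   /* sentinel: an all-zero entry marks the end of the table */
};

/* Count entries by walking until the sentinel, never past it. */
static size_t table_len(const struct div_entry *t)
{
    size_t n = 0;

    while (t[n].div != 0)
        n++;
    return n;
}
```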
Fixes: bd6f48546b9c ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs")
Signed-off-by: Marc Zyngier <[email protected]>
Cc: Martin Blumenstingl <[email protected]>
Reviewed-by: Martin Blumenstingl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
John Haxby [Sat, 18 Apr 2020 15:30:49 +0000 (16:30 +0100)]
ipv6: fix restrict IPV6_ADDRFORM operation
Commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a
problem found by syzbot, but an unfortunate logic error meant that it
also broke IPV6_ADDRFORM.
Rearrange the checks so that the earlier test is just one of the series
of checks made before moving the socket from IPv6 to IPv4.
vdso/datapage: Use correct clock mode name in comment
While the explanation for time namespace <-> vdso interactions is very
helpful, it uses the wrong name in the comment when describing the clock
mode, making grepping a bit annoying.
This seems like an accidental oversight when moving from VCLOCK_TIMENS
to VDSO_CLOCKMODE_TIMENS. It seems that 660fd04f9317 ("lib/vdso: Prepare
for time namespace support") misspelled VCLOCK_TIMENS as VLOCK_TIMENS,
which explains why it got missed when VCLOCK_TIMENS became
VDSO_CLOCKMODE_TIMENS in 2d6b01bd88cc ("lib/vdso: Move VCLOCK_TIMENS to
vdso_clock_modes").
Fix the warning caused by enabling the autosectionlabel extension in the
kernel Sphinx build:
Documentation/gpu/i915.rst:610: WARNING: duplicate label
gpu/i915:layout, other instance in Documentation/gpu/i915.rst
The autosectionlabel extension adds labels to each section title for
cross-referencing, but forbids identical section titles in a
document. With kernel-doc, this includes section titles in the included
kernel-doc comments.
In the warning message, Sphinx is unable to reference the labels in
their true locations in the kernel-doc comments in source. In this case,
there are "Layout" sections in both gt/intel_workarounds.c and
i915_reg.h. Rename the section in the latter to "File Layout".
drm/i915/display: Load DP_TP_CTL/STATUS offset before use it
Right now dp.regs.dp_tp_ctl/status are only set during the encoder
pre_enable() hook, which causes all reads and writes to those
registers to go to offset 0x0 before pre_enable() is executed.
So if i915 takes over the BIOS state and doesn't do a modeset, any
following link retraining will fail.
In the case that i915 needs to do a modeset, the DDI disable sequence
will write to the wrong register and not disable DP 'Transport Enable' in
DP_TP_CTL, so an HDMI modeset on the same port/transcoder will
not light up the monitor.
So for GENs older than 12, which have those registers at a fixed
port offset range, load the offsets at encoder/port init, while for GEN12
keep setting them at encoder pre_enable() and during HW state
readout.
Chris Wilson [Wed, 15 Apr 2020 17:03:18 +0000 (18:03 +0100)]
drm/i915/gt: Update PMINTRMSK holding fw
If we use a non-forcewaked write to PMINTRMSK, it does not take effect
until much later, if at all, causing a loss of RPS interrupts and no GPU
reclocking, leaving the GPU running at the wrong frequency for long
periods of time.
Douglas Anderson [Tue, 24 Mar 2020 21:48:27 +0000 (14:48 -0700)]
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
While trying to "dd" to the block device for a USB stick, I
encountered a hung task warning (blocked for > 120 seconds). I
managed to come up with an easy way to reproduce this on my system
(where /dev/sdb is the block device for my USB stick) with:
while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done
With my reproduction here are the relevant bits from the hung task
detector:
INFO: task udevd:294 blocked for more than 122 seconds.
...
udevd D 0 294 1 0x00400008
Call trace:
...
mutex_lock_nested+0x40/0x50
__blkdev_get+0x7c/0x3d4
blkdev_get+0x118/0x138
blkdev_open+0x94/0xa8
do_dentry_open+0x268/0x3a0
vfs_open+0x34/0x40
path_openat+0x39c/0xdf4
do_filp_open+0x90/0x10c
do_sys_open+0x150/0x3c8
...
...
Showing all locks held in the system:
...
1 lock held by dd/2798:
#0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
...
dd D 0 2798 2764 0x00400208
Call trace:
...
schedule+0x8c/0xbc
io_schedule+0x1c/0x40
wait_on_page_bit_common+0x238/0x338
__lock_page+0x5c/0x68
write_cache_pages+0x194/0x500
generic_writepages+0x64/0xa4
blkdev_writepages+0x24/0x30
do_writepages+0x48/0xa8
__filemap_fdatawrite_range+0xac/0xd8
filemap_write_and_wait+0x30/0x84
__blkdev_put+0x88/0x204
blkdev_put+0xc4/0xe4
blkdev_close+0x28/0x38
__fput+0xe0/0x238
____fput+0x1c/0x28
task_work_run+0xb0/0xe4
do_notify_resume+0xfc0/0x14bc
work_pending+0x8/0x14
The problem appears related to the fact that my USB disk is terribly
slow and that I have a lot of RAM in my system to cache things.
Specifically my writes seem to be happening at ~15 MB/s and I've got
~4 GB of RAM in my system that can be used for buffering. To write 4
GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.
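The estimate above can be reproduced with a few lines of C. This is only a back-of-the-envelope sketch: flush_seconds() and exceeds_timeout() are hypothetical helpers, not kernel functions, and the inputs are the approximate figures quoted above.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimated seconds needed to write back
 * `dirty_mb` megabytes at a sustained rate of `rate_mb_per_s`.
 */
double flush_seconds(double dirty_mb, double rate_mb_per_s)
{
	assert(rate_mb_per_s > 0.0);	/* a zero rate would never finish */
	return dirty_mb / rate_mb_per_s;
}

/* Nonzero if the estimated flush outlasts a given timeout in seconds. */
int exceeds_timeout(double dirty_mb, double rate_mb_per_s, double timeout_s)
{
	return flush_seconds(dirty_mb, rate_mb_per_s) > timeout_s;
}
```

With the numbers from the report, flush_seconds(4000.0, 15.0) comes to roughly 267 seconds, well past the 120 second threshold mentioned in the hung task warning.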
The 267 second number is a problem because in __blkdev_put() we call
sync_blockdev() while holding the bd_mutex. Any other callers who
want the bd_mutex will be blocked for the whole time.
The problem is made worse because I believe blkdev_put() specifically
tells other tasks (namely udev) to go try to access the device at right
around the same time we're going to hold the mutex for a long time.
Putting some traces around this (after disabling the hung task detector),
I could confirm:
dd: 437.608600: __blkdev_put() right before sync_blockdev() for sdb
udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
dd: 661.468451: __blkdev_put() right after sync_blockdev() for sdb
udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb
A simple fix for this is to realize that sync_blockdev() works fine if
you're not holding the mutex. Also, it's not the end of the world if
you sync a little early (though it can have performance impacts).
Thus we can make a guess that we're going to need to do the sync and
then do it without holding the mutex. We still do one last sync with
the mutex but it should be much, much faster.
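The idea described above can be sketched in user-space C. Everything here is an assumption made for illustration: struct dev_sketch, dev_sync() and dev_put() are hypothetical stand-ins (a pthread mutex plays the role of bd_mutex), and this is not the kernel code itself.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct dev_sketch {
	pthread_mutex_t lock;		/* plays the role of bd_mutex */
	atomic_int openers;		/* number of open references */
	atomic_int dirty;		/* pretend count of dirty pages */
	int flushed_under_lock;		/* pages the locked sync had to write */
};

void dev_init(struct dev_sketch *d, int openers, int dirty)
{
	pthread_mutex_init(&d->lock, NULL);
	atomic_init(&d->openers, openers);
	atomic_init(&d->dirty, dirty);
	d->flushed_under_lock = 0;
}

/* Write back everything that is dirty; returns how much was written. */
int dev_sync(struct dev_sketch *d)
{
	return atomic_exchange(&d->dirty, 0);
}

void dev_put(struct dev_sketch *d)
{
	/*
	 * Unlocked guess: if we look like the last opener, do the slow
	 * sync now. A wrong guess only costs an early or extra sync.
	 */
	if (atomic_load(&d->openers) == 1)
		dev_sync(d);

	pthread_mutex_lock(&d->lock);
	if (atomic_fetch_sub(&d->openers, 1) == 1)
		/* Final sync; normally little is left to write. */
		d->flushed_under_lock = dev_sync(d);
	pthread_mutex_unlock(&d->lock);
}
```

The design choice is that the expensive write-back happens before the lock is taken, so other tasks waiting on the lock are only delayed by the short final sync.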
With this, my hung task warnings for my test case are gone.
Benjamin Lee [Fri, 17 Apr 2020 18:45:38 +0000 (11:45 -0700)]
mei: me: fix irq number stored in hw struct
Commit 261b3e1f2a01 ("mei: me: store irq number in the hw struct.")
stores the irq number in the hw struct before MSI is enabled. This
caused a regression for mei_me_synchronize_irq() waiting for the wrong
irq number. On my laptop this causes a hang on shutdown. Fix the issue
by storing the irq number after enabling MSI.
sound/soc/codecs/wm8900.o: In function `wm8900_i2c_probe':
wm8900.c:(.text+0xa36): undefined reference to `__devm_regmap_init_i2c'
sound/soc/codecs/wm8900.o: In function `wm8900_modinit':
wm8900.c:(.init.text+0xb): undefined reference to `i2c_register_driver'
sound/soc/codecs/wm8900.o: In function `wm8900_exit':
wm8900.c:(.exit.text+0x8): undefined reference to `i2c_del_driver'
sound/soc/codecs/wm8988.o: In function `wm8988_i2c_probe':
wm8988.c:(.text+0x857): undefined reference to `__devm_regmap_init_i2c'
sound/soc/codecs/wm8988.o: In function `wm8988_modinit':
wm8988.c:(.init.text+0xb): undefined reference to `i2c_register_driver'
sound/soc/codecs/wm8988.o: In function `wm8988_exit':
wm8988.c:(.exit.text+0x8): undefined reference to `i2c_del_driver'
sound/soc/codecs/wm8995.o: In function `wm8995_i2c_probe':
wm8995.c:(.text+0x1c4f): undefined reference to `__devm_regmap_init_i2c'
sound/soc/codecs/wm8995.o: In function `wm8995_modinit':
wm8995.c:(.init.text+0xb): undefined reference to `i2c_register_driver'
sound/soc/codecs/wm8995.o: In function `wm8995_exit':
wm8995.c:(.exit.text+0x8): undefined reference to `i2c_del_driver'
vhost is currently broken on some ARM configs.
The reason is that the ring element addresses are passed between
components with different alignment assumptions. Thus, if
guest selects a pointer and host then gets and dereferences
it, then alignment assumed by the host's compiler might be
greater than the actual alignment of the pointer.
compiler on the host from assuming pointer is aligned.
This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration. With this OABI, compiler assumes that
all structures are 4 byte aligned - which is stronger than
virtio guarantees for available and used rings, which are
merely 2 bytes. Thus a guest without -mabi=apcs-gnu running
on top of host with -mabi=apcs-gnu will be broken.
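The mismatch described above can be illustrated in plain user-space C. This is a sketch under stated assumptions: is_aligned() and read_u16_any_alignment() are hypothetical helpers and are not part of vhost or virtio; the patch itself only adds a Kconfig dependency.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if `p` is a multiple of `align` bytes (align must be nonzero). */
int is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p % align) == 0;
}

/*
 * Read a 16-bit value without assuming anything about the pointer's
 * alignment; memcpy() lets the compiler pick a safe access.
 */
uint16_t read_u16_any_alignment(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

An address that is only 2-byte aligned passes is_aligned(p, 2) but fails is_aligned(p, 4), which is exactly the case where a compiler assuming 4-byte structure alignment may generate an access the pointer does not support.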
The correct fix is to force alignment of structures - however
that is an intrusive fix that's best deferred until the next release.
We didn't previously support such ancient systems at all - this surfaced
after vdpa support prompted removing dependency of vhost on
VIRTUALIZATION. So for now, let's just add something along the lines of
depends on !ARM || AEABI
to the virtio Kconfig declaration, and add a comment that it has to do
with struct member alignment.
Note: we can't make VHOST and VHOST_RING themselves have
a dependency since these are selected. Add a new symbol for that.
We should be able to drop this dependency down the road.
Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig")
Suggested-by: Ard Biesheuvel <[email protected]>
Suggested-by: Richard Earnshaw <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Mark Brown [Mon, 20 Apr 2020 13:35:08 +0000 (14:35 +0100)]
Merge series "ASoC: rsnd: multi-SSI setup fixes" from Matthias Blankertz <[email protected]>:
Fix rsnd_dai_call() operations being performed twice for the master SSI
in multi-SSI setups, and fix the rsnd_ssi_stop operation for multi-SSI
setups.
The only visible effect of these issues was some "status check failed"
spam when the rsnd_ssi_stop was called, but overall the code is cleaner
now, and some questionable writes to the SSICR register which did not
lead to any observable misbehaviour but were contrary to the datasheet
are fixed.
Mark:
The first patch kind of reverts my "ASoC: rsnd: Fix parent SSI
start/stop in multi-SSI mode" from a few days ago and achieves the same
effect in a simpler fashion, if you would prefer a clean patch series
based on v5.6 drop me a note.
Greetings,
Matthias
Matthias Blankertz (2):
ASoC: rsnd: Don't treat master SSI in multi SSI setup as parent
ASoC: rsnd: Fix "status check failed" spam for multi-SSI
Mark Brown [Mon, 20 Apr 2020 13:35:07 +0000 (14:35 +0100)]
Merge series "ASoC: meson: fix codec-to-codec link setup" from Jerome Brunet <[email protected]>:
This patchset fixes the problem reported by Marc in this thread [0]
The problem was due to an error in the meson card drivers which had
the "no_pcm" dai_link property set on codec-to-codec links
Gyeongtaek Lee [Sat, 18 Apr 2020 04:13:20 +0000 (13:13 +0900)]
ASoC: dapm: fixup dapm kcontrol widget
A snd_soc_dapm_kcontrol widget created by an autodisable control
should contain the correct on_val, mask and shift, because the value is
set when the widget is powered, and the changed value is applied to the
registers by the following code in dapm_seq_run_coalesced().
mask |= w->mask << w->shift;
if (w->power)
value |= w->on_val << w->shift;
else
value |= w->off_val << w->shift;
The shift on the mask in dapm_kcontrol_data_alloc() is removed to prevent
a double shift.
Also, on_val in dapm_kcontrol_set_value() is modified so that
dapm_seq_run_coalesced() gets the correct value.
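To see why a pre-shifted mask is a problem, here is a minimal sketch built around the coalescing lines quoted above. struct widget_sketch and seq_coalesce() are hypothetical names that keep only the fields used by that fragment; they are not the ASoC definitions.

```c
/* Hypothetical subset of a DAPM widget: only the fields used below. */
struct widget_sketch {
	unsigned int mask;	/* field mask, expected to be unshifted */
	unsigned int shift;	/* bit position of the field */
	unsigned int on_val;
	unsigned int off_val;
	int power;
};

/* Same arithmetic as the quoted fragment from dapm_seq_run_coalesced(). */
void seq_coalesce(const struct widget_sketch *w,
		  unsigned int *mask, unsigned int *value)
{
	*mask |= w->mask << w->shift;
	if (w->power)
		*value |= w->on_val << w->shift;
	else
		*value |= w->off_val << w->shift;
}
```

With mask = 0x1 and shift = 4 the coalesced mask is 0x10 as intended. If the mask stored in the widget had already been shifted (0x10), the same code would produce 0x100, which is the double shift the patch avoids.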
ASoC: rsnd: Fix "status check failed" spam for multi-SSI
Fix the rsnd_ssi_stop function to skip disabling the individual SSIs of
a multi-SSI setup, as the actual stop is performed by rsnd_ssiu_stop_gen2
- the same logic as in rsnd_ssi_start. The attempt to disable these SSIs
was harmless, but caused a "status check failed" message to be printed
for every SSI in the multi-SSI setup.
The disabling of interrupts is still performed, as they are enabled for
all SSIs in rsnd_ssi_init, but care is taken to not accidentally set the
EN bit for an SSI where it was not set by rsnd_ssi_start.
ASoC: rsnd: Don't treat master SSI in multi SSI setup as parent
The master SSI of a multi-SSI setup was attached both to the
RSND_MOD_SSI slot and the RSND_MOD_SSIP slot of the rsnd_dai_stream.
This is not correct wrt. the meaning of being "parent" in the rest of
the SSI code, where it seems to indicate an SSI that provides clock and
word sync but is not transmitting/receiving audio data.
Not treating the multi-SSI master as parent allows removal of various
special cases to the rsnd_ssi_is_parent conditions introduced in commit a09fb3f28a60 ("ASoC: rsnd: Fix parent SSI start/stop in multi-SSI mode").
It also fixes the issue that operations performed via rsnd_dai_call()
were performed twice for the master SSI. This caused some "status check
failed" spam when stopping a multi-SSI stream as the driver attempted to
stop the master SSI twice.
ASoC: meson: gx-card: fix codec-to-codec link setup
Since the addition of commit 9b5db059366a ("ASoC: soc-pcm: dpcm: Only allow
playback/capture if supported"), meson-axg cards which have codec-to-codec
links fail to init and Oops.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000128
Internal error: Oops: 96000044 [#1] PREEMPT SMP
CPU: 3 PID: 1582 Comm: arecord Not tainted 5.7.0-rc1
pc : invalidate_paths_ep+0x30/0xe0
lr : snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
Call trace:
invalidate_paths_ep+0x30/0xe0
snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
dpcm_path_get+0x38/0xd0
dpcm_fe_dai_open+0x70/0x920
snd_pcm_open_substream+0x564/0x840
snd_pcm_open+0xfc/0x228
snd_pcm_capture_open+0x4c/0x78
snd_open+0xac/0x1a8
...
While this error was initially reported on the axg-card type, it also applies
to the gx-card type.
While initializing the links, ASoC treats the codec-to-codec links of this
card type as a DPCM backend. This error eventually leads to the Oops.
Most of the card driver code is shared between DPCM backends and
codec-to-codec links. The property "no_pcm" marking DPCM BE was left set on
codec-to-codec links, leading to this problem. This commit fixes that.
ASoC: meson: axg-card: fix codec-to-codec link setup
Since the addition of commit 9b5db059366a ("ASoC: soc-pcm: dpcm: Only allow
playback/capture if supported"), meson-axg cards which have codec-to-codec
links fail to init and Oops:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000128
Internal error: Oops: 96000044 [#1] PREEMPT SMP
CPU: 3 PID: 1582 Comm: arecord Not tainted 5.7.0-rc1
pc : invalidate_paths_ep+0x30/0xe0
lr : snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
Call trace:
invalidate_paths_ep+0x30/0xe0
snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
dpcm_path_get+0x38/0xd0
dpcm_fe_dai_open+0x70/0x920
snd_pcm_open_substream+0x564/0x840
snd_pcm_open+0xfc/0x228
snd_pcm_capture_open+0x4c/0x78
snd_open+0xac/0x1a8
...
While initializing the links, ASoC treats the codec-to-codec links of this
card type as a DPCM backend. This error eventually leads to the Oops.
Most of the card driver code is shared between DPCM backends and
codec-to-codec links. The property "no_pcm" marking DPCM BE was left set on
codec-to-codec links, leading to this problem. This commit fixes that.
Merge tag 'iio-fixes-for-5.7a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First set of IIO fixes for the 5.7 cycle.
Includes one MAINTAINERS update to avoid people getting a lot of bounce
messages and complaining about it.
* MAINTAINERS
- Drop Stefan Popa's Analog Devices email address in favour of
Michael Hennerich.
* core
- Fix handling of dB sysfs inputs.
- Drop a stray semicolon in macro definition.
* ad5770r
- Fix an off by one in check on maximum number of channels.
* ad7192
- Fix a null pointer de-reference due to the name previously being
retrieved from the spi_get_device_id call which no longer works as
the relevant table was removed.
* ad7797
- Use correct attribute group.
* counter/104-quad-8
- Add locks to prevent some race conditions.
* inv-mpu6050
- Fix issues around suspend / resume clashing with runtime PM.
* stm32-adc
- Fix sleep in invalid context
- Fix id relative path error in device tree binding doc.
* st_lsm6dsx
- Fix a read alignment issue on an untagged FIFO.
- Handle odr for slave to properly compute the FIFO data layout / pattern.
- Flush the HW FIFO before resetting the device to avoid a race on
interrupt line 1.
* st_sensors
- Rely on ODR mask not ODR address to identify if the ODR can be set.
Some devices have an ODR address of 0.
* ti-ads8344
- Byte ordering was wrong - fix it.
* xilinx-xadc
- Fix inverted logic in powering down the second ADC.
- Fix clearing interrupt when enabling the trigger.
- Fix configuration of sequencer when in simultaneous sampling mode.
- Limit initial sampling rate as done for runtime configured ones.
* tag 'iio-fixes-for-5.7a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
MAINTAINERS: remove Stefan Popa's email
iio: adc: ad7192: fix null pointer de-reference crash during probe
iio: core: remove extra semi-colon from devm_iio_device_register() macro
iio: adc: ti-ads8344: properly byte swap value
iio: imu: inv_mpu6050: fix suspend/resume with runtime power
iio: st_sensors: rely on odr mask to know if odr can be set
iio: xilinx-xadc: Make sure not exceed maximum samplerate
iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode
iio: xilinx-xadc: Fix clearing interrupt when enabling trigger
iio: xilinx-xadc: Fix ADC-B powerdown
iio: dac: ad5770r: fix off-by-one check on maximum number of channels
iio: imu: st_lsm6dsx: flush hw FIFO before resetting the device
iio: core: Fix handling of 'dB'
dt-bindings: iio: adc: stm32-adc: fix id relative path
counter: 104-quad-8: Add lock guards - generic interface
iio: imu: st_lsm6dsx: specify slave odr in slv_odr
iio: imu: st_lsm6dsx: fix read misalignment on untagged FIFO
iio: adc: stm32-adc: fix sleep in atomic context
iio:ad7797: Use correct attribute_group
Merge tag 'fixes-for-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
USB: fixes for v5.7-rc2
DWC3 learns how to properly set maxpacket limit and got a fix for a
request completion bug. The raw gadget got a fix for
copy_to/from_user() checks. Atmel got an improvement on vbus
disconnect handling.
We're also adding support for another SoC to the Renesas DRD driver.
* tag 'fixes-for-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: raw-gadget: Fix copy_to/from_user() checks
usb: raw-gadget: fix raw_event_queue_fetch locking
usb: gadget: udc: atmel: Fix vbus disconnect handling
usb: dwc3: gadget: Fix request completion check
usb: dwc3: gadget: Do link recovery for SS and SSP
dt-bindings: usb: renesas,usb3-peri: add r8a77961 support
dt-bindings: usb: renesas,usbhs: add r8a77961 support
dt-bindings: usb: usb-xhci: add r8a77961 support
docs: dt: qcom,dwc3.txt: fix cross-reference for a converted file
usb: dwc3: gadget: Properly set maxpacket limit
usb: dwc3: Fix GTXFIFOSIZ.TXFDEP macro name
usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete
Eric Farman [Wed, 15 Apr 2020 19:03:53 +0000 (21:03 +0200)]
KVM: s390: Fix PV check in deliverable_irqs()
The diag 0x44 handler, which handles a directed yield, goes into
a codepath that does a kvm_for_each_vcpu() and ultimately
deliverable_irqs(). The new check for kvm_s390_pv_cpu_is_protected()
contains an assertion that the vcpu->mutex is held, which isn't going
to be the case in this scenario.
The result is a plethora of these messages if the lock debugging
is enabled, and thus an implication that we have a problem.
WARNING: CPU: 9 PID: 16167 at arch/s390/kvm/kvm-s390.h:239 deliverable_irqs+0x1c6/0x1d0 [kvm]
...snip...
Call Trace:
[<000003ff80429bf2>] deliverable_irqs+0x1ca/0x1d0 [kvm]
([<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm])
[<000003ff8042ba82>] kvm_s390_vcpu_has_irq+0x2a/0xa8 [kvm]
[<000003ff804101e2>] kvm_arch_dy_runnable+0x22/0x38 [kvm]
[<000003ff80410284>] kvm_vcpu_on_spin+0x8c/0x1d0 [kvm]
[<000003ff80436888>] kvm_s390_handle_diag+0x3b0/0x768 [kvm]
[<000003ff80425af4>] kvm_handle_sie_intercept+0x1cc/0xcd0 [kvm]
[<000003ff80422bb0>] __vcpu_run+0x7b8/0xfd0 [kvm]
[<000003ff80423de6>] kvm_arch_vcpu_ioctl_run+0xee/0x3e0 [kvm]
[<000003ff8040ccd8>] kvm_vcpu_ioctl+0x2c8/0x8d0 [kvm]
[<00000001504ced06>] ksys_ioctl+0xae/0xe8
[<00000001504cedaa>] __s390x_sys_ioctl+0x2a/0x38
[<0000000150cb9034>] system_call+0xd8/0x2d8
2 locks held by CPU 2/KVM/16167:
#0: 00000001951980c0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x90/0x8d0 [kvm]
#1: 000000019599c0f0 (&kvm->srcu){....}, at: __vcpu_run+0x4bc/0xfd0 [kvm]
Last Breaking-Event-Address:
[<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm]
irq event stamp: 11967
hardirqs last enabled at (11975): [<00000001502992f2>] console_unlock+0x4ca/0x650
hardirqs last disabled at (11982): [<0000000150298ee8>] console_unlock+0xc0/0x650
softirqs last enabled at (7940): [<0000000150cba6ca>] __do_softirq+0x422/0x4d8
softirqs last disabled at (7929): [<00000001501cd688>] do_softirq_own_stack+0x70/0x80
Considering what's being done here, let's fix this by removing the
mutex assertion rather than acquiring the mutex for every other vcpu.
Hans de Goede [Sun, 19 Apr 2020 15:16:13 +0000 (17:16 +0200)]
ACPI: button: Drop no longer necessary Asus T200TA lid_init_state quirk
Commit 17e5888e4e18 ("x86: Select HARDIRQS_SW_RESEND on x86") fixes
the edge-triggered embedded-controller (EC) IRQ not being replayed after
resume when woken by opening the lid, which gets signaled by the EC.
This means that the lid_init_state=ACPI_BUTTON_LID_INIT_OPEN quirk for
the Asus T200TA is no longer necessary, the lid now works properly
without it, so drop the quirk.
Fixes: 17e5888e4e18 ("x86: Select HARDIRQS_SW_RESEND on x86")
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
sleepgraph:
- force usage of python3 instead of using system default
- fix bugzilla 204773 (https://bugzilla.kernel.org/show_bug.cgi?id=204773)
- fix issue of platform info not being reset in -multi (logs fill up)
- change -ftop call to "pm_suspend", this is one level below state_store
- add -wificheck command to read out the current wifi device details
- change -wifi behavior to poll /proc/net/wireless for wifi connect
- add wifi reconnect time to timeline, include time in summary column
- add "fail on wifi_resume" to timeline and summary when wifi fails
- add a set of commands to collect data before/after suspend in the log
- add "-cmdinfo" command which prints out all the data collected
- check for cmd info tools at start, print found/missing in green/red
- fix kernel suspend time calculation: tool used to look for start of
pm_suspend_console, but the order has changed. latest kernel starts
with ksys_sync, use this instead
- include time spent in mem/disk in the header (same as freeze/standby)
- ignore turbostat 32-bit capability warnings
- print to result.txt when -skiphtml is used, just say result: pass
- don't exit on SIGTSTP, it's a ctrl-Z and the tool may come back
- -multi argument supports duration as well as count: hours, minutes, seconds
- update the -multi status output to be more informative
- -maxfail sets maximum consecutive fails before a -multi run is aborted
- in -summary, ignore dmesg/ftrace/html files that are 0 size
bootgraph:
- force usage of python3 instead of using system default
Hans de Goede [Mon, 13 Apr 2020 13:09:49 +0000 (15:09 +0200)]
ACPI/PCI: pci_link: use extended_irq union member when setting ext-irq shareable
The case ACPI_RESOURCE_TYPE_EXTENDED_IRQ inside acpi_pci_link_set()
is correctly using resource->res.data.extended_irq.foo for most settings,
but for the shareable setting it so far has accidentally been using
resource->res.data.irq.shareable instead of
resource->res.data.extended_irq.shareable.
Note that the old code happens to also work because the shareable field
offset is the same for both the acpi_resource_irq and
acpi_resource_extended_irq structs.
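Why the wrong union member still "worked" can be shown with offsetof(). The structs below are simplified, hypothetical layouts chosen for illustration; they are not the ACPICA definitions, only an assumption that both place shareable at the same offset, as the message states.

```c
#include <stddef.h>

/* Hypothetical, simplified layouts; not the real ACPICA structs. */
struct irq_res_sketch {
	unsigned char descriptor_length;
	unsigned char triggering;
	unsigned char polarity;
	unsigned char shareable;
};

struct ext_irq_res_sketch {
	unsigned char producer_consumer;
	unsigned char triggering;
	unsigned char polarity;
	unsigned char shareable;
	unsigned int interrupts[1];
};

union res_data_sketch {
	struct irq_res_sketch irq;
	struct ext_irq_res_sketch extended_irq;
};

/* Nonzero if both structs keep `shareable` at the same byte offset. */
int shareable_offsets_match(void)
{
	return offsetof(struct irq_res_sketch, shareable) ==
	       offsetof(struct ext_irq_res_sketch, shareable);
}

/* The accidental form: writes through the wrong union member. */
void set_shareable_via_irq(union res_data_sketch *d, unsigned char v)
{
	d->irq.shareable = v;
}
```

Because the offsets match in this sketch, a write through irq.shareable lands in the same byte that extended_irq.shareable reads. That only holds while the two layouts stay in step, which is why using the matching member is the safer form.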
Tomi Valkeinen [Wed, 15 Apr 2020 09:20:06 +0000 (12:20 +0300)]
drm/tidss: fix crash related to accessing freed memory
tidss uses devm_kzalloc to allocate DRM plane, encoder and crtc objects.
This is not correct as the lifetime of those objects should be longer
than the underlying device's.
When unloading the tidss module, the devm_kzalloc'ed objects have already
been freed when tidss_release() is called, and the driver will access
freed memory, possibly causing a crash, a kernel WARN, or other undefined
behavior; KASAN will also report a bug.
ALSA: usb-audio: Add static mapping table for ALC1220-VB-based mobos
TRX40 mobos from MSI and others with ALC1220-VB USB-audio device need
yet more quirks for the proper control names.
This patch provides the mapping table for those boards, correcting the
FU names for volume and mute controls as well as the terminal names
for jack controls. It also improves build_connector_control() not to
add the directional suffix blindly if the string is given from the
mapping table.
With this patch applied, the new UCM profiles will be effective.
ALSA: hda: Remove ASUS ROG Zenith from the blacklist
The commit 3c6fd1f07ed0 ("ALSA: hda: Add driver blacklist") added a
new blacklist for the devices that are known to have empty codecs, and
one of the entries was ASUS ROG Zenith II (PCI SSID 1043:874f).
However, it turned out that the very same PCI SSID is used for the
previous model that does have the valid HD-audio codecs and the change
broke the sound on it.
This patch reverts the corresponding entry as a temporary solution.
Although Zenith II and co will get the empty HD-audio bus again,
it would merely waste resources and won't affect the functionality,
so it's not the end of the world. We'll need to address this later,
e.g. by either switching to DMI string matching or using PCI ID &
SSID pairs.
Brian Geffon [Fri, 17 Apr 2020 17:25:56 +0000 (10:25 -0700)]
mm: Fix MREMAP_DONTUNMAP accounting on VMA merge
When remapping a mapping where a portion of a VMA is remapped
into another portion of the VMA it can cause the VMA to become
split. During the copy_vma operation the VMA can actually
be remerged if it's an anonymous VMA whose pages have not yet
been faulted. This isn't normally a problem because at the end
of the remap the original portion is unmapped causing it to
become split again.
However, MREMAP_DONTUNMAP leaves that original portion in place which
means that the VMA which was split and then remerged is not actually
split at the end of the mremap. This patch fixes a bug where
we don't detect that the VMAs got remerged and we end up
putting back VM_ACCOUNT on the next mapping which is completely
unrelated. When that next mapping is unmapped it results in
incorrectly unaccounting for the memory which was never accounted,
and eventually we will underflow on the memory commitment.
There is also another, similar issue: we're currently
accounting for the number of pages in the new_vma but that's wrong.
We need to account for the length of the remap operation as that's
all that is being added. If there was a mapping already at that
location its commitment would have been adjusted as part of
the munmap at the start of the mremap.
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two build fixes for a couple clk drivers and a fix for the Unisoc
serial clk where we want to keep it on for earlycon"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sprd: don't gate uart console clock
clk: mmp2: fix link error without mmp2
clk: asm9260: fix __clk_hw_register_fixed_rate_with_accuracy typo
task_work_run calls io_wq_submit_work unexpectedly, so it's obvious that
struct callback_head's func member has been changed. After looking into
the code, I found this issue is still due to the union definition:
union {
/*
* Only commands that never go async can use the below fields,
* obviously. Right now only IORING_OP_POLL_ADD uses them, and
* async armed poll handlers for regular commands. The latter
* restore the work, if needed.
*/
struct {
struct callback_head task_work;
struct hlist_node hash_node;
struct async_poll *apoll;
};
struct io_wq_work work;
};
When task_work_run has multiple work items to execute, the work that calls
io_poll_remove_all() always restores req->work for non-poll requests.
But if a non-poll request has already been added to a new
callback_head, a subsequent callback will call io_async_task_func() to
handle this request, which means we should not restore the work
for such a non-poll request. Meanwhile, in io_async_task_func(), we should
drop the submit ref when the req has been canceled.
Fix both issues.
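The clobbering described above can be reproduced with a small stand-alone sketch. All types and handlers here are hypothetical simplifications: the real kernel structs have more fields and different function signatures, and the sketch simply assumes that both views of the union keep their func pointer at the same offset, which is what the observed symptom suggests.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Simplified stand-ins; the real kernel structs have more fields. */
struct callback_head_sketch {
	void *next;
	handler_fn func;
};

struct wq_work_sketch {
	void *list_next;
	handler_fn func;
	unsigned int flags;
};

struct req_sketch {
	union {
		struct {
			struct callback_head_sketch task_work;
			void *apoll;
		};
		struct wq_work_sketch work;
	};
};

int async_task_handler(void) { return 1; }
int wq_submit_handler(void)  { return 2; }

/* Queue the request as task work: fills in the task_work view. */
void queue_task_work(struct req_sketch *req)
{
	req->task_work.next = NULL;
	req->task_work.func = async_task_handler;
}

/* "Restore" the work view, as a cancel path might do. */
void restore_work(struct req_sketch *req)
{
	req->work.list_next = NULL;
	req->work.func = wq_submit_handler;
	req->work.flags = 0;
}
```

After queue_task_work() followed by restore_work(), calling req.task_work.func() runs wq_submit_handler rather than async_task_handler, mirroring how a restore on an already-queued request makes the task work runner call the wrong function.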
Fixes: b1f573bd15fd ("io_uring: restore req->work when canceling poll request")
Signed-off-by: Xiaoguang Wang <[email protected]>
Use io_double_put_req()
Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 and objtool fixes from Thomas Gleixner:
"A set of fixes for x86 and objtool:
objtool:
- Ignore the double UD2 which is emitted in BUG() when
CONFIG_UBSAN_TRAP is enabled.
- Support clang non-section symbols in objtool ORC dump
- Fix switch table detection in .text.unlikely
- Make the BP scratch register warning more robust.
x86:
- Increase microcode maximum patch size for AMD to cope with new CPUs
which have a larger patch size.
- Fix a crash in the resource control filesystem when the removal of
the default resource group is attempted.
- Preserve Code and Data Prioritization enabled state across CPU
hotplug.
- Update split lock cpu matching to use the new X86_MATCH macros.
- Change the split lock enumeration as Intel finally decided that the
IA32_CORE_CAPABILITIES bits are not architectural contrary to what
the SDM claims. !@#%$^!
- Add Tremont CPU models to the split lock detection cpu match.
- Add a missing static attribute to make sparse happy"
* tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/split_lock: Add Tremont family CPU models
x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
x86/resctrl: Preserve CDP enable over CPU hotplug
x86/resctrl: Fix invalid attempt at removing the default resource group
x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
x86/umip: Make umip_insns static
x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
objtool: Make BP scratch register warning more robust
objtool: Fix switch table detection in .text.unlikely
objtool: Support Clang non-section symbols in ORC generation
objtool: Support Clang non-section symbols in ORC dump
objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
Merge tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time namespace fix from Thomas Gleixner:
"An update for the proc interface of time namespaces: Use symbolic
names instead of clockid numbers. The usability nuisance of numbers
was noticed by Michael when polishing the man page"
* tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
Merge tag 'perf-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes and updates from Thomas Gleixner:
- Fix the header line of perf stat output for '--metric-only --per-socket'
- Fix the python build with clang
- The usual tools UAPI header synchronization
* tag 'perf-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools headers: Synchronize linux/bits.h with the kernel sources
tools headers: Adopt verbatim copy of compiletime_assert() from kernel sources
tools headers: Update x86's syscall_64.tbl with the kernel sources
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Update tools's copy of drm.h headers
tools headers kvm: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
tools headers UAPI: Sync linux/mman.h with the kernel
tools headers UAPI: Sync sched.h with the kernel
tools headers: Update linux/vdso.h and grab a copy of vdso/const.h
perf stat: Fix no metric header if --per-socket and --metric-only set
perf python: Check if clang supports -fno-semantic-interposition
tools arch x86: Sync the msr-index.h copy with the kernel sources
Merge tag 'irq-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes/updates for the interrupt subsystem:
- Remove setup_irq() and remove_irq(). All users have been converted
so remove them before new users surface.
- A set of bugfixes for various interrupt chip drivers
- Add a few missing static attributes to address sparse warnings"
* tag 'irq-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-bcm7038-l1: Make bcm7038_l1_of_init() static
irqchip/irq-mvebu-icu: Make legacy_bindings static
irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order
irqchip/sifive-plic: Fix maximum priority threshold value
irqchip/ti-sci-inta: Fix processing of masked irqs
irqchip/mbigen: Free msi_desc on device teardown
irqchip/gic-v4.1: Update effective affinity of virtual SGIs
irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling
genirq: Remove setup_irq() and remove_irq()
Merge tag 'sched-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two fixes for the scheduler:
- Work around an uninitialized variable warning where GCC can't
figure it out.
- Allow 'isolcpus=' to skip unknown subparameters so that older
kernels work with the commandline of a newer kernel. Improve the
error output while at it"
* tag 'sched-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/vtime: Work around an unitialized variable warning
sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters
Merge tag 'core-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU fix from Thomas Gleixner:
"A single bugfix for RCU to prevent taking a lock in NMI context"
* tag 'core-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common()
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous bug fixes and cleanups for ext4, including a fix for
generic/388 in data=journal mode, removing some BUG_ON's, and cleaning
up some compiler warnings"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: convert BUG_ON's to WARN_ON's in mballoc.c
ext4: increase wait time needed before reuse of deleted inode numbers
ext4: remove set but not used variable 'es' in ext4_jbd2.c
ext4: remove set but not used variable 'es'
ext4: do not zeroout extents beyond i_disksize
ext4: fix return-value types in several function comments
ext4: use non-movable memory for superblock readahead
ext4: use matching invalidatepage in ext4_writepage
Merge tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small smb3 fixes: two debug related (helping network tracing for
SMB2 mounts, and the other removing an unintended debug line on
signing failures), and one fixing a performance problem with 64K
pages"
* tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: remove overly noisy debug line in signing errors
cifs: improve read performance for page size 64KB & cache=strict & vers=2.1+
cifs: dump the session id and keys also for SMB2 sessions