Jose Abreu [Wed, 18 Dec 2019 10:17:36 +0000 (11:17 +0100)]
net: stmmac: Determine earlier the size of RX buffer
Split Header feature needs to know the size of RX buffer but current
code is determining it too late. Fix this by moving the RX buffer
computation to earlier stage.
Changes from v2:
- Do not try to align already aligned buffer size
Fixes: 67afd6d1cfdf ("net: stmmac: Add Split Header support and enable it in XGMAC cores") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:35 +0000 (11:17 +0100)]
net: stmmac: selftests: Needs to check the number of Multicast regs
When running the MC and UC filter tests we setup a multicast address
that its expected to be blocked. If the number of available multicast
registers is zero, driver will always pass the multicast packets which
will fail the test.
Check if available multicast addresses is enough before running the
tests.
Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jia-Ju Bai [Wed, 18 Dec 2019 09:21:55 +0000 (17:21 +0800)]
net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()
The kernel may sleep while holding a spinlock.
The function call path (from bottom to top) in Linux 4.19 is:
net/nfc/nci/uart.c, 349:
nci_skb_alloc in nci_uart_default_recv_buf
net/nfc/nci/uart.c, 255:
(FUNC_PTR)nci_uart_default_recv_buf in nci_uart_tty_receive
net/nfc/nci/uart.c, 254:
spin_lock in nci_uart_tty_receive
nci_skb_alloc(GFP_KERNEL) can sleep at runtime.
(FUNC_PTR) means a function pointer is called.
To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC for
nci_skb_alloc().
This bug is found by a static analysis tool STCheck written by myself.
Jens Axboe [Wed, 18 Dec 2019 19:19:41 +0000 (12:19 -0700)]
io_uring: io_wq_submit_work() should not touch req->rw
I've been chasing a weird and obscure crash that was userspace stack
corruption, and finally narrowed it down to a bit flip that made a
stack address invalid. io_wq_submit_work() unconditionally flips
the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work
handler, this isn't valid. Normal read/write operations own that
part of the request, on other types it could be something else.
Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.
Olof Johansson [Wed, 18 Dec 2019 17:56:21 +0000 (09:56 -0800)]
clk: Move clk_core_reparent_orphans() under CONFIG_OF
A recent addition exposed a helper that is only used for CONFIG_OF. Move
it into the CONFIG_OF zone in this file to make the compiler stop
warning about an unused function.
Pavel Begunkov [Wed, 18 Dec 2019 16:53:45 +0000 (19:53 +0300)]
io_uring: don't wait when under-submitting
There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.
Linus Torvalds [Wed, 18 Dec 2019 16:54:15 +0000 (08:54 -0800)]
Merge tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A slightly high amount at this time, but all good and small fixes:
- A PCM core fix that initializes the buffer properly for avoiding
information leaks; it is a long-standing minor problem, but good to
fix better now
- A few ASoC core fixes for the init / cleanup ordering issues that
surfaced after the recent refactoring
- Lots of SOF and topology-related fixes went in, as usual as such
hot topics
- Several ASoC codec and platform-specific small fixes: wm89xx,
realtek, and max98090, AMD, Intel-SST
- A fix for the previous incomplete regression of HD-audio, now
hitting Nvidia HDMI
- A few HD-audio CA0132 codec fixes"
* tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
ALSA: hda - Downgrade error message for single-cmd fallback
ASoC: wm8962: fix lambda value
ALSA: hda: Fix regression by strip mask fix
ALSA: hda/ca0132 - Fix work handling in delayed HP detection
ALSA: hda/ca0132 - Avoid endless loop
ALSA: hda/ca0132 - Keep power on during processing DSP response
ALSA: pcm: Avoid possible info leaks from PCM stream buffers
ASoC: Intel: common: work-around incorrect ACPI HID for CML boards
ASoC: SOF: Intel: split cht and byt debug window sizes
ASoC: SOF: loader: fix snd_sof_fw_parse_ext_data
ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header
ASoC: simple-card: Don't create separate link when platform is present
ASoC: topology: Check return value for soc_tplg_pcm_create()
ASoC: topology: Check return value for snd_soc_add_dai_link()
ASoC: core: only flush inited work during free
ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
ASoC: core: Init pcm runtime work early to avoid warnings
ASoC: Intel: sst: Add missing include <linux/io.h>
ASoC: max98090: fix possible race conditions
ASoC: max98090: exit workaround earlier if PLL is locked
...
The host reports support for the synthetic feature X86_FEATURE_SSBD
when any of the three following hardware features are set:
CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31]
CPUID.80000008H:EBX.AMD_SSBD[bit 24]
CPUID.80000008H:EBX.VIRT_SSBD[bit 25]
Either of the first two hardware features implies the existence of the
IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does
not. Therefore, CPUID.80000008H:EBX.AMD_SSBD[bit 24] should only be
set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or
CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host.
The host reports support for the synthetic feature X86_FEATURE_SSBD
when any of the three following hardware features are set:
CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31]
CPUID.80000008H:EBX.AMD_SSBD[bit 24]
CPUID.80000008H:EBX.VIRT_SSBD[bit 25]
Either of the first two hardware features implies the existence of the
IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does
not. Therefore, CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] should only be
set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or
CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host.
Robin Murphy [Mon, 9 Dec 2019 19:47:25 +0000 (19:47 +0000)]
iommu/dma: Relax locking in iommu_dma_prepare_msi()
Since commit ece6e6f0218b ("iommu/dma-iommu: Split iommu_dma_map_msi_msg()
in two parts"), iommu_dma_prepare_msi() should no longer have to worry
about preempting itself, nor being called in atomic context at all. Thus
we can downgrade the IRQ-safe locking to a simple mutex to avoid angering
the new might_sleep() check in iommu_map().
Hanjun Guo [Wed, 11 Dec 2019 06:43:06 +0000 (14:43 +0800)]
perf/smmuv3: Remove the leftover put_cpu() in error path
In smmu_pmu_probe(), there is put_cpu() in the error path,
which is wrong because we use raw_smp_processor_id() to
get the cpu ID, not get_cpu(), remove it.
While we are at it, kill 'out_cpuhp_err' altogether and
just return err if we fail to add the hotplug instance.
Lu Baolu [Wed, 20 Nov 2019 06:10:16 +0000 (14:10 +0800)]
iommu/vt-d: Remove incorrect PSI capability check
The PSI (Page Selective Invalidation) bit in the capability register
is only valid for second-level translation. Intel IOMMU supporting
scalable mode must support page/address selective IOTLB invalidation
for first-level translation. Remove the PSI capability check in SVA
cache invalidation code.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:40 +0000 (16:14 +0000)]
staging: wfx: fix wrong error message
The driver checks that the number of retries made by the device is
coherent with the rate policy. However, this check make sense only if
the device has returned RETRY_EXCEEDED.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:37 +0000 (16:14 +0000)]
staging: wfx: detect race condition in WEP authentication
Current code has a special case to handle association with WEP. Before
to rework the tx data handling, let's try to detect any possible misuse
of this code.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:36 +0000 (16:14 +0000)]
staging: wfx: ensure that retry policy always fallbacks to MCS0 / 1Mbps
When not using HT mode, minstrel always includes 1Mbps as fallback rate.
But, when using HT mode, this fallback is not included. Yet, it seems
that it could save some frames. So, this patch add it unconditionally.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:34 +0000 (16:14 +0000)]
staging: wfx: fix rate control handling
A tx_retry_policy (the equivalent of a list of ieee80211_tx_rate in
hardware API) is not able to include a rate multiple time. So currently,
the driver merges the identical rates from the policy provided by
minstrel (and it try to do the best choice it can in the associated
flags) before to sent it to firmware.
Until now, when rates are merged, field "count" is set to
max(count1, count2). But, it means that the sum of retries for all rates
could be far less than initial number of retries. So, this patch changes
the value of field "count" to count1 + count2. Thus, sum of all retries
for all rates stay the same.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:30 +0000 (16:14 +0000)]
staging: wfx: fix counter overflow
Some weird behaviors were observed when connection is really good and
packets are small. It appears that sometime, number of packets in queues
can exceed 255 and generate an overflow in field usage_count.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:29 +0000 (16:14 +0000)]
staging: wfx: fix case of lack of tx_retry_policies
In some rare cases, driver may not have any available tx_retry_policies.
In this case, the driver asks to mac80211 to stop sending data. However,
it seems that a race is possible and a few frames can be sent to the
driver. In this case, driver can't wait for free tx_retry_policies since
wfx_tx() must be atomic. So, this patch fix this case by sending these
frames with the special policy number 15.
The firmware normally use policy 15 to send internal frames (PS-poll,
beacons, etc...). So, it is not a so bad fallback.
Jérôme Pouiller [Tue, 17 Dec 2019 16:14:27 +0000 (16:14 +0000)]
staging: wfx: fix the cache of rate policies on interface reset
Device and driver maintain a cache of rate policies (aka.
tx_retry_policy in hardware API).
When hif_reset() is sent to hardware, device resets its cache of rate
policies. In order to keep driver in sync, it is necessary to do the
same on driver.
Note, when driver tries to use a rate policy that has not been defined
on device, data is sent at 1Mbps. So, this patch should fix abnormal
throughput observed sometime after a reset of the interface.
Adrian Hunter [Tue, 17 Dec 2019 09:53:49 +0000 (11:53 +0200)]
mmc: sdhci: Add a quirk for broken command queuing
Command queuing has been reported broken on some systems based on Intel
GLK. A separate patch disables command queuing in some cases.
This patch adds a quirk for broken command queuing, which enables users
with problems to disable command queuing using sdhci module parameters for
quirks.
Adrian Hunter [Tue, 17 Dec 2019 09:53:48 +0000 (11:53 +0200)]
mmc: sdhci: Workaround broken command queuing on Intel GLK
Command queuing has been reported broken on some Lenovo systems based on
Intel GLK. This is likely a BIOS issue, so disable command queuing for
Intel GLK if the BIOS vendor string is "LENOVO".
Yangbo Lu [Mon, 16 Dec 2019 03:18:42 +0000 (11:18 +0800)]
mmc: sdhci-of-esdhc: fix P2020 errata handling
Two previous patches introduced below quirks for P2020 platforms.
- SDHCI_QUIRK_RESET_AFTER_REQUEST
- SDHCI_QUIRK_BROKEN_TIMEOUT_VAL
The patches made a mistake to add them in quirks2 of sdhci_host
structure, while they were defined for quirks.
host->quirks2 |= SDHCI_QUIRK_RESET_AFTER_REQUEST;
host->quirks2 |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;
This patch is to fix them.
host->quirks |= SDHCI_QUIRK_RESET_AFTER_REQUEST;
host->quirks |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL;
Chris Wilson [Tue, 17 Dec 2019 13:47:29 +0000 (13:47 +0000)]
drm/i915/gem: Keep request alive while attaching fences
Since commit e5dadff4b093 ("drm/i915: Protect request retirement with
timeline->mutex"), the request retirement can happen outside of the
struct_mutex serialised only by the timeline->mutex. We drop the
timeline->mutex on submitting the request (i915_request_add) so after
that point, it is liable to be freed. Make sure our local reference is
kept alive until we have finished attaching it to the signalers. (Note
that this erodes the argument that i915_request_add should consume the
reference, but that is a slightly larger patch!)
John Hurley [Tue, 17 Dec 2019 11:28:56 +0000 (11:28 +0000)]
nfp: flower: fix stats id allocation
As flower rules are added, they are given a stats ID based on the number
of rules that can be supported in firmware. Only after the initial
allocation of all available IDs does the driver begin to reuse those that
have been released.
The initial allocation of IDs was modified to account for multiple memory
units on the offloaded device. However, this introduced a bug whereby the
counter that controls the IDs could be decremented before the ID was
assigned (where it is further decremented). This means that the stats ID
could be assigned as -1/0xfffffff which is out of range.
Fix this by only decrementing the main counter after the current ID has
been assigned.
Fixes: 467322e2627f ("nfp: flower: support multiple memory units for filter offloads") Signed-off-by: John Hurley <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Oleksij Rempel [Tue, 17 Dec 2019 06:51:45 +0000 (07:51 +0100)]
net: ag71xx: fix compile warnings
drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_probe':
drivers/net/ethernet/atheros/ag71xx.c:1776:30: warning: passing argument 2 of
'of_get_phy_mode' makes pointer from integer without a cast [-Wint-conversion]
In file included from drivers/net/ethernet/atheros/ag71xx.c:33:
./include/linux/of_net.h:15:69: note: expected 'phy_interface_t *'
{aka 'enum <anonymous> *'} but argument is of type 'int'
Fixes: 0c65b2b90d13c1 ("net: of_get_phy_mode: Change API to solve int/unit warnings") Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Tue, 17 Dec 2019 02:51:03 +0000 (18:51 -0800)]
net: annotate lockless accesses to sk->sk_pacing_shift
sk->sk_pacing_shift can be read and written without lock
synchronization. This patch adds annotations to
document this fact and avoid future syzbot complains.
This might also avoid unexpected false sharing
in sk_pacing_shift_update(), as the compiler
could remove the conditional check and always
write over sk->sk_pacing_shift :
if (sk->sk_pacing_shift != val)
sk->sk_pacing_shift = val;
Ben Hutchings [Tue, 17 Dec 2019 01:57:40 +0000 (01:57 +0000)]
net: qlogic: Fix error paths in ql_alloc_large_buffers()
ql_alloc_large_buffers() has the usual RX buffer allocation
loop where it allocates skbs and maps them for DMA. It also
treats failure as a fatal error.
There are (at least) three bugs in the error paths:
1. ql_free_large_buffers() assumes that the lrg_buf[] entry for the
first buffer that couldn't be allocated will have .skb == NULL.
But the qla_buf[] array is not zero-initialised.
2. ql_free_large_buffers() DMA-unmaps all skbs in lrg_buf[]. This is
incorrect for the last allocated skb, if DMA mapping failed.
3. Commit 1acb8f2a7a9f ("net: qlogic: Fix memory leak in
ql_alloc_large_buffers") added a direct call to dev_kfree_skb_any()
after the skb is recorded in lrg_buf[], so ql_free_large_buffers()
will double-free it.
The bugs are somewhat inter-twined, so fix them all at once:
* Clear each entry in qla_buf[] before attempting to allocate
an skb for it. This goes half-way to fixing bug 1.
* Set the .skb field only after the skb is DMA-mapped. This
fixes the rest.
Fixes: 1357bfcf7106 ("qla3xxx: Dynamically size the rx buffer queue ...") Fixes: 0f8ab89e825f ("qla3xxx: Check return code from pci_map_single() ...") Fixes: 1acb8f2a7a9f ("net: qlogic: Fix memory leak in ql_alloc_large_buffers") Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
sctp: fix memleak on err handling of stream initialization
syzbot reported a memory leak when an allocation fails within
genradix_prealloc() for output streams. That's because
genradix_prealloc() leaves initialized members initialized when the
issue happens and SCTP stack will abort the current initialization but
without cleaning up such members.
The fix here is to always call genradix_free() when genradix_prealloc()
fails, for output and also input streams, as it suffers from the same
issue.
Eric Sandeen [Fri, 6 Dec 2019 16:55:59 +0000 (10:55 -0600)]
fs: call fsnotify_sb_delete after evict_inodes
When a filesystem is unmounted, we currently call fsnotify_sb_delete()
before evict_inodes(), which means that fsnotify_unmount_inodes()
must iterate over all inodes on the superblock looking for any inodes
with watches. This is inefficient and can lead to livelocks as it
iterates over many unwatched inodes.
At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
the inode out out immediately, so anything processed by
fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.
After that, the call to evict_inodes will evict everything else with a
zero refcount.
This should speed things up overall, and avoid livelocks in
fsnotify_unmount_inodes().
Eric Sandeen [Fri, 6 Dec 2019 16:54:23 +0000 (10:54 -0600)]
fs: avoid softlockups in s_inodes iterators
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.
Previous efforts were made in 2 functions, see:
c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() ac05fbb inode: don't softlockup when evicting inodes
but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where it already exists.
One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.
Changbin Du [Wed, 18 Dec 2019 04:51:56 +0000 (20:51 -0800)]
lib/Kconfig.debug: fix some messed up configurations
Some configuration items are messed up during conflict resolving. For
example, STRICT_DEVMEM should not in testing menu, but kunit should.
This patch fixes all of them.
Yang Shi [Wed, 18 Dec 2019 04:51:52 +0000 (20:51 -0800)]
mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG
Since commit 0a432dcbeb32 ("mm: shrinker: make shrinker not depend on
memcg kmem"), shrinkers' idr is protected by CONFIG_MEMCG instead of
CONFIG_MEMCG_KMEM, so it makes no sense to protect shrinker idr replace
with CONFIG_MEMCG_KMEM.
And in the CONFIG_MEMCG && CONFIG_SLOB case, shrinker_idr contains only
shrinker, and it is deferred_split_shrinker. But it is never actually
called, since idr_replace() is never compiled due to the wrong #ifdef.
The deferred_split_shrinker all the time is staying in half-registered
state, and it's never called for subordinate mem cgroups.
Daniel Axtens [Wed, 18 Dec 2019 04:51:49 +0000 (20:51 -0800)]
kasan: don't assume percpu shadow allocations will succeed
syzkaller and the fault injector showed that I was wrong to assume that
we could ignore percpu shadow allocation failures.
Handle failures properly. Merge all the allocated areas back into the
free list and release the shadow, then clean up and return NULL. The
shadow is released unconditionally, which relies upon the fact that the
release function is able to tolerate pages not being present.
Also clean up shadows in the recovery path - currently they are not
released, which leaks a bit of memory.
Daniel Axtens [Wed, 18 Dec 2019 04:51:46 +0000 (20:51 -0800)]
kasan: use apply_to_existing_page_range() for releasing vmalloc shadow
kasan_release_vmalloc uses apply_to_page_range to release vmalloc
shadow. Unfortunately, apply_to_page_range can allocate memory to fill
in page table entries, which is not what we want.
Also, kasan_release_vmalloc is called under free_vmap_area_lock, so if
apply_to_page_range does allocate memory, we get a sleep in atomic bug:
BUG: sleeping function called from invalid context at mm/page_alloc.c:4681
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 15087, name:
apply_to_page_range() takes an address range, and if any parts of it are
not covered by the existing page table hierarchy, it allocates memory to
fill them in.
In some use cases, this is not what we want - we want to be able to
operate exclusively on PTEs that are already in the tables.
Add apply_to_existing_page_range() for this. Adjust the walker
functions for apply_to_page_range to take 'create', which switches them
between the old and new modes.
Andrey Ryabinin [Wed, 18 Dec 2019 04:51:38 +0000 (20:51 -0800)]
kasan: fix crashes on access to memory mapped by vm_map_ram()
With CONFIG_KASAN_VMALLOC=y any use of memory obtained via vm_map_ram()
will crash because there is no shadow backing that memory.
Instead of sprinkling additional kasan_populate_vmalloc() calls all over
the vmalloc code, move it into alloc_vmap_area(). This will fix
vm_map_ram() and simplify the code a bit.
Paul Mackerras [Wed, 18 Dec 2019 00:43:06 +0000 (11:43 +1100)]
KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor
Commit 22945688acd4 ("KVM: PPC: Book3S HV: Support reset of secure
guest") added a call to uv_svm_terminate, which is an ultravisor
call, without any check that the guest is a secure guest or even that
the system has an ultravisor. On a system without an ultravisor,
the ultracall will degenerate to a hypercall, but since we are not
in KVM guest context, the hypercall will get treated as a system
call, which could have random effects depending on what happens to
be in r0, and could also corrupt the current task's kernel stack.
Hence this adds a test for the guest being a secure guest before
doing uv_svm_terminate().
Fixes: 22945688acd4 ("KVM: PPC: Book3S HV: Support reset of secure guest") Signed-off-by: Paul Mackerras <[email protected]>
Jens Axboe [Wed, 18 Dec 2019 02:45:06 +0000 (19:45 -0700)]
io_uring: warn about unhandled opcode
Now that we have all the opcodes handled in terms of command prep and
SQE reuse, add a printk_once() to warn about any potentially new and
unhandled ones.
Jens Axboe [Wed, 18 Dec 2019 02:53:05 +0000 (19:53 -0700)]
io_uring: read opcode and user_data from SQE exactly once
If we defer a request, we can't be reading the opcode again. Ensure that
the user_data and opcode fields are stable. For the user_data we already
have a place for it, for the opcode we can fill a one byte hold and store
that as well. For both of them, assign them when we originally read the
SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data
is switched to req->opcode and req->user_data.
Jens Axboe [Wed, 18 Dec 2019 01:50:29 +0000 (18:50 -0700)]
io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable
If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the timeout remove op into
the prep handling to make it safe for SQE reuse.
Jens Axboe [Wed, 18 Dec 2019 01:45:56 +0000 (18:45 -0700)]
io_uring: make IORING_OP_CANCEL_ASYNC deferrable
If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the async cancel op into
the prep handling to make it safe for SQE reuse.
Jens Axboe [Wed, 18 Dec 2019 01:40:57 +0000 (18:40 -0700)]
io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable
If we defer these commands as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the poll add/remove into
the prep handling to make it safe for SQE reuse.
Pavel Begunkov [Tue, 17 Dec 2019 17:57:05 +0000 (20:57 +0300)]
io_uring: make HARDLINK imply LINK
The rules are as follows, if IOSQE_IO_HARDLINK is specified, then it's a
link and there is no need to set IOSQE_IO_LINK separately, though it
could be there. Add proper check and ensure that IOSQE_IO_HARDLINK
implies IOSQE_IO_LINK.
Jens Axboe [Mon, 16 Dec 2019 18:55:28 +0000 (11:55 -0700)]
io_uring: any deferred command must have stable sqe data
We're currently not retaining sqe data for accept, fsync, and
sync_file_range. None of these commands need data outside of what
is directly provided, hence it can't go stale when the request is
deferred. However, it can get reused, if an application reuses
SQE entries.
Ensure that we retain the information we need and only read the sqe
contents once, off the submission path. Most of this is just moving
code into a prep and finish function.
Jens Axboe [Tue, 10 Dec 2019 21:38:45 +0000 (14:38 -0700)]
io_uring: remove 'sqe' parameter to the OP helpers that take it
We pass in req->sqe for all of them, no need to pass it in as the
request is always passed in. This is a necessary prep patch to be
able to cleanup/fix the request prep path.
Jens Axboe [Mon, 16 Dec 2019 05:13:43 +0000 (22:13 -0700)]
io_uring: fix pre-prepped issue with force_nonblock == true
Some of these code paths assume that any force_nonblock == true issue
is not prepped, but that's not true if we did prep as part of link setup
earlier. Check if we already have an async context allocate before
setting up a new one.
Cleanup the async context setup in general, we have a lot of duplicated
code there.
DT property definitions must be under a 'properties' keyword. This was
missing for 'snps,tso' in an if/then clause. A meta-schema fix will
catch future errors like this.
Fixes: 7db3545aef5f ("dt-bindings: net: stmmac: Convert the binding to a schemas") Cc: "David S. Miller" <[email protected]> Acked-by: Maxime Ripard <[email protected]> Signed-off-by: Rob Herring <[email protected]>
Ioana Ciornei [Mon, 16 Dec 2019 15:32:30 +0000 (17:32 +0200)]
dpaa2-ptp: fix double free of the ptp_qoriq IRQ
Upon reusing the ptp_qoriq driver, the ptp_qoriq_free() function was
used on the remove path to free any allocated resources.
The ptp_qoriq IRQ is among these resources that are freed in
ptp_qoriq_free() even though it is also a managed one (allocated using
devm_request_threaded_irq).
Drop the resource managed version of requesting the IRQ in order to not
trigger a double free of the interrupt as below:
Linus Torvalds [Tue, 17 Dec 2019 21:27:02 +0000 (13:27 -0800)]
Merge tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A mix of regression fixes and regular fixes for stable trees:
- fix swapped error messages for qgroup enable/rescan
- fixes for NO_HOLES feature with clone range
- fix deadlock between iget/srcu lock/synchronize srcu while freeing
an inode
- fix double lock on subvolume cross-rename
- tree log fixes
* fix missing data checksums after replaying a log tree
* also teach tree-checker about this problem
* skip log replay on orphaned roots
- fix maximum devices constraints for RAID1C -3 and -4
- send: don't print warning on read-only mount regarding orphan
cleanup
- error handling fixes"
* tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: send: remove WARN_ON for readonly mount
btrfs: do not leak reloc root if we fail to read the fs root
btrfs: skip log replay on orphaned roots
btrfs: handle ENOENT in btrfs_uuid_tree_iterate
btrfs: abort transaction after failed inode updates in create_subvol
Btrfs: fix hole extent items with a zero size after range cloning
Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
Btrfs: make tree checker detect checksum items with overlapping ranges
Btrfs: fix missing data checksums after replaying a log tree
btrfs: return error pointer from alloc_test_extent_buffer
btrfs: fix devs_max constraints for raid1c3 and raid1c4
btrfs: tree-checker: Fix error format string for size_t
btrfs: don't double lock the subvol_sem for rename exchange
btrfs: handle error in btrfs_cache_block_group
btrfs: do not call synchronize_srcu() in inode_tree_del
Btrfs: fix cloning range with a hole when using the NO_HOLES feature
btrfs: Fix error messages in qgroup_rescan_init
Linus Torvalds [Tue, 17 Dec 2019 21:08:41 +0000 (13:08 -0800)]
Merge tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A small set of fixes for mostly minor issues here, the only real code
ones are Wen Yang's fixes for error handling in the core and Christian
Marussi's list_voltage() change which is a fix for disruptively bad
performance for regulators with continuous voltage control (which are
rare)"
* tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rn5t618: fix module aliases
regulator: max77650: add of_match table
regulator: core: avoid unneeded .list_voltage calls
regulator: s5m8767: Fix a warning message
regulator: core: fix regulator_register() error paths to properly release rdev
regulator: fix use after free issue
Linus Torvalds [Tue, 17 Dec 2019 21:06:31 +0000 (13:06 -0800)]
Merge tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A relatively large set of fixes here, the biggest part of it is for
fallout from the GPIO descriptor rework that affected several of the
devices with usable native chip select support. There's also some new
PCI IDs for Intel Jasper Lake devices.
The conversion to platform_get_irq() in the fsl driver is an
incremental fix for build errors introduced on SPARC by the earlier
fix for error handling in probe in that driver"
* tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
spi: nxp-fspi: Ensure width is respected in spi-mem operations
spi: spi-ti-qspi: Fix a bug when accessing non default CS
spi: fsl: don't map irq during probe
spi: spi-cavium-thunderx: Add missing pci_release_regions()
spi: sprd: Fix the incorrect SPI register
gpiolib: of: Make of_gpio_spi_cs_get_count static
spi: fsl: Handle the single hardwired chipselect case
gpio: Handle counting of Freescale chipselects
spi: fsl: Fix GPIO descriptor support
spi: dw: Correct handling of native chipselect
spi: cadence: Correct handling of native chipselect
spi: pxa2xx: Add support for Intel Jasper Lake
Darrick J. Wong [Wed, 11 Dec 2019 21:19:07 +0000 (13:19 -0800)]
xfs: fix log reservation overflows when allocating large rt extents
Omar Sandoval reported that a 4G fallocate on the realtime device causes
filesystem shutdowns due to a log reservation overflow that happens when
we log the rtbitmap updates. Factor rtbitmap/rtsummary updates into the
the tr_write and tr_itruncate log reservation calculation.
"The following reproducer results in a transaction log overrun warning
for me:
Linus Torvalds [Tue, 17 Dec 2019 19:17:03 +0000 (11:17 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix kexec booting with certain EFI memory map layouts"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage
Linus Torvalds [Tue, 17 Dec 2019 19:11:08 +0000 (11:11 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Add HPET quirks for the Intel 'Coffee Lake H' and 'Ice Lake' platforms"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel: Disable HPET on Intel Ice Lake platforms
x86/intel: Disable HPET on Intel Coffee Lake H platforms
Linus Torvalds [Tue, 17 Dec 2019 19:03:57 +0000 (11:03 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
"These are all perf tooling changes: most of them are fixes.
Note that the large CPU count related fixes go beyond regression
fixes, but the IPI-flood symptoms are severe enough that I think
justifies their inclusion"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description
perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES
libtraceevent: Allow custom libdir path
perf header: Fix false warning when there are no duplicate cache entries
perf metricgroup: Fix printing event names of metric group with multiple events
perf/x86/pmu-events: Fix Kernel_Utilization metric
perf top: Do not bail out when perf_env__read_cpuid() returns ENOSYS
perf arch: Make the default get_cpuid() return compatible error
tools headers kvm: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Update tools's copy of drm.h headers
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
perf inject: Fix processing of ID index for injected instruction tracing
perf report: Bail out --mem-mode if mem info is not available
perf report: Make -F more strict like -s
perf report/top TUI: Replace pr_err() with ui__error()
libtraceevent: Copy pkg-config file to output folder when using O=
libtraceevent: Fix lib installation with O=
perf kvm: Clarify the 'perf kvm' -i and -o command line options
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
perf beauty: Add CLEAR_SIGHAND support for clone's flags arg
...
Linus Torvalds [Tue, 17 Dec 2019 19:00:46 +0000 (11:00 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Tone down mutex debugging complaints, and annotate/fix spinlock
debugging data accesses for KCSAN"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts"
locking/spinlock/debug: Fix various data races
Linus Torvalds [Tue, 17 Dec 2019 18:39:55 +0000 (10:39 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Protect presistent EFI memory reservations from kexec, fix EFIFB early
console, EFI stub graphics output fixes and other misc fixes."
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Don't attempt to map RCI2 config table if it doesn't exist
efi/earlycon: Remap entire framebuffer after page initialization
efi: Fix efi_loaded_image_t::unload type
efi/gop: Fix memory leak in __gop_query32/64()
efi/gop: Return EFI_SUCCESS if a usable GOP was found
efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
efi/memreserve: Register reservations as 'reserved' in /proc/iomem
Recently, there's been some compat ioctl cleanup, in which large
hardcoded lists were replaced with compat_ptr_ioctl. One of these
changes involved removing the random.c hardcoded list entries and adding
a compat ioctl function pointer to the random.c fops. In the process,
urandom was forgotten about, so this commit fixes that oversight.
Daniel Borkmann [Tue, 17 Dec 2019 12:28:16 +0000 (13:28 +0100)]
bpf: Fix cgroup local storage prog tracking
Recently noticed that we're tracking programs related to local storage maps
through their prog pointer. This is a wrong assumption since the prog pointer
can still change throughout the verification process, for example, whenever
bpf_patch_insn_single() is called.
Therefore, the prog pointer that was assigned via bpf_cgroup_storage_assign()
is not guaranteed to be the same as we pass in bpf_cgroup_storage_release()
and the map would therefore remain in busy state forever. Fix this by using
the prog's aux pointer which is stable throughout verification and beyond.
Suwan Kim [Fri, 13 Dec 2019 02:30:55 +0000 (11:30 +0900)]
usbip: Fix error path of vhci_recv_ret_submit()
If a transaction error happens in vhci_recv_ret_submit(), event
handler closes connection and changes port status to kick hub_event.
Then hub tries to flush the endpoint URBs, but that causes infinite
loop between usb_hub_flush_endpoint() and vhci_urb_dequeue() because
"vhci_priv" in vhci_urb_dequeue() was already released by
vhci_recv_ret_submit() before a transmission error occurred. Thus,
vhci_urb_dequeue() terminates early and usb_hub_flush_endpoint()
continuously calls vhci_urb_dequeue().
The root cause of this issue is that vhci_recv_ret_submit()
terminates early without giving back URB when transaction error
occurs in vhci_recv_ret_submit(). That causes the error URB to still
be linked at endpoint list without “vhci_priv".
So, in the case of transaction error in vhci_recv_ret_submit(),
unlink URB from the endpoint, insert proper error code in
urb->status and give back URB.
Suwan Kim [Fri, 13 Dec 2019 02:30:54 +0000 (11:30 +0900)]
usbip: Fix receive error in vhci-hcd when using scatter-gather
When vhci uses SG and receives data whose size is smaller than SG
buffer size, it tries to receive more data even if it acutally
receives all the data from the server. If then, it erroneously adds
error event and triggers connection shutdown.
vhci-hcd should check if it received all the data even if there are
more SG entries left. So, check if it receivces all the data from
the server in for_each_sg() loop.
Erkka Talvitie [Wed, 11 Dec 2019 08:08:39 +0000 (10:08 +0200)]
USB: EHCI: Do not return -EPIPE when hub is disconnected
When disconnecting a USB hub that has some child device(s) connected to it
(such as a USB mouse), then the stack tries to clear halt and
reset device(s) which are _already_ physically disconnected.
The issue has been reproduced with:
CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
SW: U-Boot 2019.07 and kernel 4.19.40.
CPU: HP Proliant Microserver Gen8.
SW: Linux version 4.2.3-300.fc23.x86_64
In this situation there will be error bit for MMF active yet the
CERR equals EHCI_TUNE_CERR + halt. Existing implementation
interprets this as a stall [1] (chapter 8.4.5).
The possible conditions when the MMF will be active + halt
can be found from [2] (Table 4-13).
Fix for the issue is to check whether MMF is active and PID Code is
IN before checking for the stall. If these conditions are true then
it is not a stall.
What happens after the fix is that when disconnecting a hub with
attached device(s) the situation is not interpret as a stall.
zhong jiang [Fri, 13 Dec 2019 12:16:18 +0000 (20:16 +0800)]
usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state'
Fixes the following compile error:
drivers/usb/typec/tcpm/fusb302.o: In function `tcpm_get_current_limit':
fusb302.c:(.text+0x3ee): undefined reference to `extcon_get_state'
fusb302.c:(.text+0x422): undefined reference to `extcon_get_state'
fusb302.c:(.text+0x450): undefined reference to `extcon_get_state'
fusb302.c:(.text+0x48c): undefined reference to `extcon_get_state'
drivers/usb/typec/tcpm/fusb302.o: In function `fusb302_probe':
fusb302.c:(.text+0x980): undefined reference to `extcon_get_extcon_dev'
make: *** [vmlinux] Error 1
It is because EXTCON is build as a module, but FUSB302 is not.
intel_th: msu: Fix window switching without windows
Commit 6cac7866c2741 ("intel_th: msu: Add a sysfs attribute to trigger
window switch") adds a NULL pointer dereference in the case when there are
no windows allocated:
Commit aac8da65174a ("intel_th: msu: Start handling IRQs") implicitly
relies on the use of devm_request_irq() to subsequently free the irqs on
device removal, but in case of the pci_free_irq_vectors() API, the
handlers need to be freed before it is called. Therefore, at the moment
the driver's remove path trips a BUG_ON(irq_has_action()):
Takashi Iwai [Tue, 17 Dec 2019 13:18:32 +0000 (14:18 +0100)]
Merge tag 'asoc-fix-v5.5-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.5
A collection of fixes since the merge window, mostly driver specific but
there's a few in the core that clean up fallout from the refactorings
done in the last cycle.
Sudip Mukherjee [Thu, 12 Dec 2019 13:16:02 +0000 (13:16 +0000)]
tty: link tty and port before configuring it as console
There seems to be a race condition in tty drivers and I could see on
many boot cycles a NULL pointer dereference as tty_init_dev() tries to
do 'tty->port->itty = tty' even though tty->port is NULL.
'tty->port' will be set by the driver and if the driver has not yet done
it before we open the tty device we can get to this situation. By adding
some extra debug prints, I noticed that:
uart_add_one_port() registers the console, as soon as it registers, the
userspace tries to use it and that leads to tty_open() but
uart_add_one_port() has not yet done tty_port_link_device() and so
tty->port is not yet configured when control reaches tty_init_dev().
Further look into the code and tty_port_link_device() is done by
uart_add_one_port(). After registering the console uart_add_one_port()
will call tty_port_register_device_attr_serdev() and
tty_port_link_device() is called from this.
Call add tty_port_link_device() before uart_configure_port() is done and
add a check in tty_port_link_device() so that it only links the port if
it has not been done yet.
Vincent Guittot [Fri, 29 Nov 2019 14:04:47 +0000 (15:04 +0100)]
sched/cfs: fix spurious active migration
The load balance can fail to find a suitable task during the periodic check
because the imbalance is smaller than half of the load of the waiting
tasks. This results in the increase of the number of failed load balance,
which can end up to start an active migration. This active migration is
useless because the current running task is not a better choice than the
waiting ones. In fact, the current task was probably not running but
waiting for the CPU during one of the previous attempts and it had already
not been selected.
When load balance fails too many times to migrate a task, we should relax
the contraint on the maximum load of the tasks that can be migrated
similarly to what is done with cache hotness.
Before the rework, load balance used to set the imbalance to the average
load_per_task in order to mitigate such situation. This increased the
likelihood of migrating a task but also of selecting a larger task than
needed while more appropriate ones were in the list.
Vincent Guittot [Wed, 4 Dec 2019 18:21:40 +0000 (19:21 +0100)]
sched/fair: Fix find_idlest_group() to handle CPU affinity
Because of CPU affinity, the local group can be skipped which breaks the
assumption that statistics are always collected for local group. With
uninitialized local_sgs, the comparison is meaningless and the behavior
unpredictable. This can even end up to use local pointer which is to
NULL in this case.
If the local group has been skipped because of CPU affinity, we return
the idlest group.
Johannes Weiner [Tue, 3 Dec 2019 18:35:24 +0000 (13:35 -0500)]
psi: Fix a division error in psi poll()
The psi window size is a u64 an can be up to 10 seconds right now,
which exceeds the lower 32 bits of the variable. We currently use
div_u64 for it, which is meant only for 32-bit divisors. The result is
garbage pressure sampling values and even potential div0 crashes.
The crashing instruction is trying to divide the observed stall time
by the sampling period. The period, stored in R8, is not 0, but we are
dividing by the lower 32 bits only, which are all 0 in this instance.
We could switch to a 64-bit division, but the period shouldn't be that
big in the first place. It's the time between the last update and the
next scheduled one, and so should always be around 2s and comfortably
fit into 32 bits.
The bug is in the initialization of new cgroups: we schedule the first
sampling event in a cgroup as an offset of sched_clock(), but fail to
initialize the last_update timestamp, and it defaults to 0. That
results in a bogusly large sampling period the first time we run the
sampling code, and consequently we underreport pressure for the first
2s of a cgroup's life. But worse, if sched_clock() is sufficiently
advanced on the system, and the user gets unlucky, the period's lower
32 bits can all be 0 and the sampling division will crash.
Fix this by initializing the last update timestamp to the creation
time of the cgroup, thus correctly marking the start of the first
pressure sampling period in a new cgroup.
Since commit 28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
there is an additional check to ensure that a RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.
Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.
ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")
skips the PT/LBR exclusivity check on CPUs where PT and LBRs coexist, but
also inadvertently skips the active_events bump for PT in that case, which
is a bug. If there aren't any hardware events at the same time as PT, the
PMI handler will ignore PT PMIs, as active_events reads zero in that case,
resulting in the "Uhhuh" spurious NMI warning and PT data loss.
Fix this by always increasing active_events for PT events.
Peter Zijlstra [Fri, 6 Dec 2019 11:50:16 +0000 (12:50 +0100)]
perf/x86: Fix potential out-of-bounds access
UBSAN reported out-of-bound accesses for x86_pmu.event_map(), it's
arguments should be < x86_pmu.max_events. Make sure all users observe
this constraint.
Jerry Snitselaar [Fri, 13 Dec 2019 05:36:42 +0000 (22:36 -0700)]
iommu/vt-d: Allocate reserved region for ISA with correct permission
Currently the reserved region for ISA is allocated with no
permissions. If a dma domain is being used, mapping this region will
fail. Set the permissions to DMA_PTE_READ|DMA_PTE_WRITE.
Jerry Snitselaar [Tue, 10 Dec 2019 18:56:06 +0000 (11:56 -0700)]
iommu: set group default domain before creating direct mappings
iommu_group_create_direct_mappings uses group->default_domain, but
right after it is called, request_default_domain_for_dev calls
iommu_domain_free for the default domain, and sets the group default
domain to a different domain. Move the
iommu_group_create_direct_mappings call to after the group default
domain is set, so the direct mappings get associated with that domain.
Lu Baolu [Wed, 11 Dec 2019 01:40:15 +0000 (09:40 +0800)]
iommu/vt-d: Fix dmar pte read access not set error
If the default DMA domain of a group doesn't fit a device, it
will still sit in the group but use a private identity domain.
When map/unmap/iova_to_phys come through iommu API, the driver
should still serve them, otherwise, other devices in the same
group will be impacted. Since identity domain has been mapped
with the whole available memory space and RMRRs, we don't need
to worry about the impact on it.