Lee Jones [Mon, 2 Nov 2020 18:32:42 +0000 (13:32 -0500)]
Fonts: Replace discarded const qualifier
Commit 6735b4632def ("Fonts: Support FONT_EXTRA_WORDS macros for built-in
fonts") introduced the following error when building rpc_defconfig (only
this build appears to be affected):
`acorndata_8x8' referenced in section `.text' of arch/arm/boot/compressed/ll_char_wr.o:
defined in discarded section `.data' of arch/arm/boot/compressed/font.o
`acorndata_8x8' referenced in section `.data.rel.ro' of arch/arm/boot/compressed/font.o:
defined in discarded section `.data' of arch/arm/boot/compressed/font.o
make[3]: *** [/scratch/linux/arch/arm/boot/compressed/Makefile:191: arch/arm/boot/compressed/vmlinux] Error 1
make[2]: *** [/scratch/linux/arch/arm/boot/Makefile:61: arch/arm/boot/compressed/vmlinux] Error 2
make[1]: *** [/scratch/linux/arch/arm/Makefile:317: zImage] Error 2
The .data section is discarded at link time. Reinstating acorndata_8x8 as
const ensures it is still available after linking. Do the same for the
other 12 built-in fonts as well, for consistency purposes.
Vanshidhar Konda [Fri, 30 Oct 2020 17:30:50 +0000 (10:30 -0700)]
arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4
The current arm64 default config limits max NUMA nodes available on
system to 4 (NODES_SHIFT = 2). Today's arm64 systems can reach or
exceed 16 NUMA nodes. To accomodate current hardware and to fit
NODES_SHIFT within page flags on arm64, increase NODES_SHIFT to 4.
Sagi Grimberg [Thu, 22 Oct 2020 02:15:31 +0000 (10:15 +0800)]
nvme-tcp: avoid repeated request completion
The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.
Sagi Grimberg [Thu, 22 Oct 2020 02:15:23 +0000 (10:15 +0800)]
nvme-rdma: avoid repeated request completion
The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.
Chao Leng [Thu, 22 Oct 2020 02:15:15 +0000 (10:15 +0800)]
nvme-tcp: avoid race between time out and tear down
Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.
To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.
Chao Leng [Thu, 22 Oct 2020 02:15:08 +0000 (10:15 +0800)]
nvme-rdma: avoid race between time out and tear down
Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.
To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.
Alan Stern [Mon, 2 Nov 2020 14:58:21 +0000 (09:58 -0500)]
USB: Add NO_LPM quirk for Kingston flash drive
In Bugzilla #208257, Julien Humbert reports that a 32-GB Kingston
flash drive spontaneously disconnects and reconnects, over and over.
Testing revealed that disabling Link Power Management for the drive
fixed the problem.
This patch adds a quirk entry for that drive to turn off LPM permanently.
A receive callback is queued while the client is still connected
but can still be called after the client was disconnected. Upon
disconnect cl->me_cl is set to NULL, hence we need to check
that ME client is not-NULL in mei_cl_mtu to avoid
null dereference.
Maxime Ripard [Mon, 2 Nov 2020 16:29:08 +0000 (17:29 +0100)]
drm/vc4: drv: Remove unused variable
The commit dcda7c28bff2 ("drm/vc4: kms: Add functions to create the state
objects") removed the last users of the vc4 variable, but didn't remove
that variable resulting in a warning.
Steven Price [Fri, 30 Oct 2020 14:58:33 +0000 (14:58 +0000)]
drm/panfrost: Fix module unload
When unloading the call to pm_runtime_put_sync_suspend() will attempt to
turn the GPU cores off, however panfrost_device_fini() will have turned
the clocks off. This leads to the hardware locking up.
Instead don't call pm_runtime_put_sync_suspend() and instead simply mark
the device as suspended using pm_runtime_set_suspended(). And also
include this on the error path in panfrost_probe().
Boris Brezillon [Sun, 1 Nov 2020 17:40:16 +0000 (18:40 +0100)]
drm/panfrost: Fix a deadlock between the shrinker and madvise path
panfrost_ioctl_madvise() and panfrost_gem_purge() acquire the mappings
and shmem locks in different orders, thus leading to a potential
the mappings lock first.
Commit 5a18e1e0c193b introduced the 'failover_pending' state to track
the "failover pending window" - where we wait for the partner to become
ready (after a transport event) before actually attempting to failover.
i.e window is between following two events:
a. we get a transport event due to a FAILOVER
b. later, we get CRQ_INITIALIZED indicating the partner is
ready at which point we schedule a FAILOVER reset.
and ->failover_pending is true during this window.
If during this window, we attempt to open (or close) a device, we pretend
that the operation succeded and let the FAILOVER reset path complete the
operation.
This is fine, except if the transport event ("a" above) occurs during the
open and after open has already checked whether a failover is pending. If
that happens, we fail the open, which can cause the boot scripts to leave
the interface down requiring administrator to manually bring up the device.
This fix "extends" the failover pending window till we are _actually_
ready to perform the failover reset (i.e until after we get the RTNL
lock). Since open() holds the RTNL lock, we can be sure that we either
finish the open or if the open() fails due to the failover pending window,
we can again pretend that open is done and let the failover complete it.
We could try and block the open until failover is completed but a) that
could still timeout the application and b) Existing code "pretends" that
failover occurred "just after" open succeeded, so marks the open successful
and lets the failover complete the open. So, mark the open successful even
if the transport event occurs before we actually start the open.
The qca8k only supports a switch-wide MTU setting, and the code to take
the max of all ports was only looking at the port currently being set.
Fix to examine all ports.
Sreekanth Reddy [Mon, 2 Nov 2020 07:27:46 +0000 (12:57 +0530)]
scsi: mpt3sas: Fix timeouts observed while reenabling IRQ
While reenabling the IRQ after irq poll there may be small time window
where HBA firmware has posted some replies and raise the interrupts but
driver has not received the interrupts. So we may observe I/O timeouts as
the driver has not processed the replies as interrupts got missed while
reenabling the IRQ.
To fix this issue the driver has to go for one more round of processing the
reply descriptors from reply descriptor post queue after enabling the IRQ.
Hannes Reinecke [Thu, 24 Sep 2020 10:45:59 +0000 (12:45 +0200)]
scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
alua_bus_detach() might be running concurrently with alua_rtpg_work(), so
we might trip over h->sdev == NULL and call BUG_ON(). The correct way of
handling it is to not set h->sdev to NULL in alua_bus_detach(), and call
rcu_synchronize() before the final delete to ensure that all concurrent
threads have left the critical section. Then we can get rid of the
BUG_ON() and replace it with a simple if condition.
Petr Malat [Fri, 30 Oct 2020 13:26:33 +0000 (14:26 +0100)]
sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms
Commit 978aa0474115 ("sctp: fix some type cast warnings introduced since
very beginning")' broke err reading from sctp_arg, because it reads the
value as 32-bit integer, although the value is stored as 16-bit integer.
Later this value is passed to the userspace in 16-bit variable, thus the
user always gets 0 on big-endian platforms. Fix it by reading the __u16
field of sctp_arg union, as reading err field would produce a sparse
warning.
Linus Torvalds [Mon, 26 Oct 2020 20:15:23 +0000 (13:15 -0700)]
tty: make FONTX ioctl use the tty pointer they were actually passed
Some of the font tty ioctl's always used the current foreground VC for
their operations. Don't do that then.
This fixes a data race on fg_console.
Side note: both Michael Ellerman and Jiri Slaby point out that all these
ioctls are deprecated, and should probably have been removed long ago,
and everything seems to be using the KDFONTOP ioctl instead.
In fact, Michael points out that it looks like busybox's loadfont
program seems to have switched over to using KDFONTOP exactly _because_
of this bug (ahem.. 12 years ago ;-).
Linus Torvalds [Mon, 2 Nov 2020 22:47:37 +0000 (14:47 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Subsystems affected by this patch series: mm (memremap, memcg,
slab-generic, kasan, mempolicy, pagecache, oom-kill, pagemap),
kthread, signals, lib, epoll, and core-kernel"
* emailed patches from Andrew Morton <[email protected]>:
kernel/hung_task.c: make type annotations consistent
epoll: add a selftest for epoll timeout race
mm: always have io_remap_pfn_range() set pgprot_decrypted()
mm, oom: keep oom_adj under or at upper limit when printing
kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
mm/truncate.c: make __invalidate_mapping_pages() static
lib/crc32test: remove extra local_irq_disable/enable
ptrace: fix task_join_group_stop() for the case when current is traced
mm: mempolicy: fix potential pte_unmap_unlock pte error
kasan: adopt KUNIT tests to SW_TAGS mode
mm: memcg: link page counters to root if use_hierarchy is false
mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
hugetlb_cgroup: fix reservation accounting
mm/mremap_pages: fix static key devmap_managed_key updates
Ian Rogers [Thu, 29 Oct 2020 22:37:07 +0000 (15:37 -0700)]
libbpf, hashmap: Fix undefined behavior in hash_bits
If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.
Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.
The TI CPTS does not natively support PTPv1, only PTPv2. But, as it
happens, the CPTS can provide HW timestamp for PTPv1 Sync messages, because
CPTS HW parser looks for PTP messageType id in PTP message octet 0 which
value is 0 for PTPv1. As result, CPTS HW can detect Sync messages for PTPv1
and PTPv2 (Sync messageType = 0 for both), but it fails for any other PTPv1
messages (Delay_req/resp) and will return PTP messageType id 0 for them.
The commit e9523a5a32a1 ("net: ethernet: ti: cpsw: enable
HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") added PTPv1 hw timestamping
advertisement by mistake, only to make Linux Kernel "timestamping" utility
work, and this causes issues with only PTPv1 compatible HW/SW - Sync HW
timestamped, but Delay_req/resp are not.
Hence, fix it disabling PTPv1 hw timestamping advertisement, so only PTPv1
compatible HW/SW can properly roll back to SW timestamping.
Zenghui Yu [Thu, 22 Oct 2020 12:24:17 +0000 (20:24 +0800)]
vfio/type1: Use the new helper to find vfio_group
When attaching a new group to the container, let's use the new helper
vfio_iommu_find_iommu_group() to check if it's already attached. There
is no functional change.
Also take this chance to add a missing blank line.
tracing: Make -ENOMEM the default error for parse_synth_field()
parse_synth_field() returns a pointer and requires that errors get
surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
never be used as zero, and if it is, it could cause a false return code and
produce a NULL pointer dereference. It makes no sense to set ret to zero.
Set ret to -ENOMEM (the most common error case), and have any other errors
set it to something else. This removes the need to initialize ret on *every*
error branch.
Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
ring-buffer: Fix recursion protection transitions between interrupt context
The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.
Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.
Cc: [email protected] Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
gfs2: Don't call cancel_delayed_work_sync from within delete work function
Right now, we can end up calling cancel_delayed_work_sync from within
delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup ->
gfs2_cancel_delete_work. When that happens, it will result in a
deadlock. Instead, gfs2_inode_lookup should skip the call to
gfs2_cancel_delete_work when called from delete_work_func (blktype ==
GFS2_BLKST_UNLINKED).
Reported-by: Alexander Ahring Oder Aring <[email protected]> Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: [email protected] # v5.8+ Signed-off-by: Andreas Gruenbacher <[email protected]>
Lukas Bulwahn [Mon, 2 Nov 2020 01:08:10 +0000 (17:08 -0800)]
kernel/hung_task.c: make type annotations consistent
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
removed various __user annotations from function signatures as part of
its refactoring.
It also removed the __user annotation for proc_dohung_task_timeout_secs()
at its declaration in sched/sysctl.h, but not at its definition in
kernel/hung_task.c.
Hence, sparse complains:
kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))
Adjust the annotation at the definition fitting to that refactoring to make
sparse happy again, which also resolves this warning from sparse:
kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
kernel/hung_task.c:277:52: expected void *
kernel/hung_task.c:277:52: got void [noderef] __user *buffer
Jason Gunthorpe [Mon, 2 Nov 2020 01:08:00 +0000 (17:08 -0800)]
mm: always have io_remap_pfn_range() set pgprot_decrypted()
The purpose of io_remap_pfn_range() is to map IO memory, such as a
memory mapped IO exposed through a PCI BAR. IO devices do not
understand encryption, so this memory must always be decrypted.
Automatically call pgprot_decrypted() as part of the generic
implementation.
This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
using io_remap_pfn_range() to expose BAR pages to user space to fail.
The CPU will encrypt access to those BAR pages instead of passing
unencrypted IO directly to the device.
Places not mapping IO should use remap_pfn_range().
mm, oom: keep oom_adj under or at upper limit when printing
For oom_score_adj values in the range [942,999], the current
calculations will print 16 for oom_adj. This patch simply limits the
output so output is inline with docs.
int status;
int thread = waitpid(-1, &status, 0);
assert(thread > 0 && thread != pid);
assert(status == 0x80137f);
return 0;
}
fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().
This is because task_join_group_stop() has 2 problems when current is traced:
1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
can be woken up by debugger and it can clone another thread which
should join the group-stop.
We need to check group_stop_count || SIGNAL_STOP_STOPPED.
2. If SIGNAL_STOP_STOPPED is already set, we should not increment
sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
should stop without another do_notify_parent_cldstop() report.
To clarify, the problem is very old and we should blame
ptrace_init_task(). But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.
queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..
Now that we have KASAN-KUNIT tests integration, it's easy to see that
some KASAN tests are not adopted to the SW_TAGS mode and are failing.
Adjust the allocation size for kasan_memchr() and kasan_memcmp() by
roung it up to OOB_TAG_OFF so the bad access ends up in a separate
memory granule.
Add a new kmalloc_uaf_16() tests that relies on UAF, and a new
kasan_bitops_tags() test that is tailored to tag-based mode, as it's
hard to adopt the existing kmalloc_oob_16() and kasan_bitops_generic()
(renamed from kasan_bitops()) without losing the precision.
Add new kmalloc_uaf_16() and kasan_bitops_uaf() tests that rely on UAFs,
as it's hard to adopt the existing kmalloc_oob_16() and
kasan_bitops_oob() (rename from kasan_bitops()) without losing the
precision.
Disable kasan_global_oob() and kasan_alloca_oob_left/right() as SW_TAGS
mode doesn't instrument globals nor dynamic allocas.
The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.
After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining a objcg
stock, for example. It leads to an eventual underflow of the charge and
triggers a warning.
Fix it by linking all page counters to corresponding root page counters
in the non-hierarchical mode.
Please note, that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more
than 1 level. This patch doesn't change it.
The patch also doesn't affect how the hierarchical mode is working,
which is the only sane and truly supported mode now.
Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!
zhongjiang-ali [Mon, 2 Nov 2020 01:07:30 +0000 (17:07 -0800)]
mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
memcg_page_state will get the specified number in hierarchical memcg, It
should multiply by HPAGE_PMD_NR rather than an page if the item is
NR_ANON_THPS.
Mike Kravetz [Mon, 2 Nov 2020 01:07:27 +0000 (17:07 -0800)]
hugetlb_cgroup: fix reservation accounting
Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and hit the warning below. QEMU with free page hinting
uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
as free by a VM. The reporting granularity is in pageblock granularity.
So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
one huge page in QEMU.
commit 6f42193fd86e ("memremap: don't use a separate devm action for
devmap_managed_enable_get") changed the static key updates such that we
now call devmap_managed_enable_put() without doing the equivalent
devmap_managed_enable_get().
devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
types too. This results in the below warning when switching between
system-ram and devdax mode for devdax namespace.
Vineet Gupta [Fri, 30 Oct 2020 02:18:34 +0000 (19:18 -0700)]
ARC: [plat-hsdk] Remap CCMs super early in asm boot trampoline
ARC HSDK platform stopped booting on released v5.10-rc1, getting stuck
in startup of non master SMP cores.
This was bisected to upstream commit 7fef431be9c9ac25
"(mm/page_alloc: place pages to tail in __free_pages_core())"
That commit itself is harmless, it just exposed a subtle assumption in
our platform code (hence CC'ing linux-mm just as FYI in case some other
arches / platforms trip on it).
The upstream commit is semantically disruptive as it reverses the order
of page allocations (actually it can be good test for hardware
verification to exercise different memory patterns altogether).
For ARC HSDK platform that meant a remapped memory region (pertaining to
unused Closely Coupled Memory) started getting used early for dynamice
allocations, while not effectively remapped on all the cores, triggering
memory error exception on those cores.
The fix is to move the CCM remapping from early platform code to to early core
boot code. And while it is undesirable to riddle common boot code with
platform quirks, there is no other way to do this since the faltering code
involves setting up stack itself so even function calls are not allowed at
that point.
If anyone is interested, all the gory details can be found at Link below.
Vineet Gupta [Tue, 27 Oct 2020 22:01:17 +0000 (15:01 -0700)]
ARC: stack unwinding: avoid indefinite looping
Currently stack unwinder is a while(1) loop which relies on the dwarf
unwinder to signal termination, which in turn relies on dwarf info to do
so. This in theory could cause an infinite loop if the dwarf info was
somehow messed up or the register contents were etc.
This fix thus detects the excessive looping and breaks the loop.
Maor Gottlieb [Wed, 28 Oct 2020 06:50:51 +0000 (08:50 +0200)]
IB/srpt: Fix memory leak in srpt_add_one
Failure in srpt_refresh_port() for the second port will leave MAD
registered for the first one, however, the srpt_add_one() will be marked
as "failed" and SRPT will leak resources for that registered but not used
and released first port.
Unregister the MAD agent for all ports in case of failure.
The patches are related to the software workaround for the A050385 erratum.
The first patch ensures optimal buffer usage for non-erratum scenarios. The
second patch fixes a currently inconsequential discrepancy between the
FMan and Ethernet drivers.
Changes in v3:
- refactor defines for clarity in 1/2
- add more details on the user impact in 1/2
- remove unnecessary inline identifier in 2/2
Changes in v2:
- make the returned value for TX ports explicit in 2/2
- simplify the buf_layout reference in 2/2
====================
Camelia Groza [Mon, 2 Nov 2020 18:34:36 +0000 (20:34 +0200)]
dpaa_eth: fix the RX headroom size alignment
The headroom reserved for received frames needs to be aligned to an
RX specific value. There is currently a discrepancy between the values
used in the Ethernet driver and the values passed to the FMan.
Coincidentally, the resulting aligned values are identical.
Fixes: 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround") Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: Camelia Groza <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Camelia Groza [Mon, 2 Nov 2020 18:34:35 +0000 (20:34 +0200)]
dpaa_eth: update the buffer layout for non-A050385 erratum scenarios
Impose a larger RX private data area only when the A050385 erratum is
present on the hardware. A smaller buffer size is sufficient in all
other scenarios. This enables a wider range of linear Jumbo frame
sizes in non-erratum scenarios, instead of turning to multi
buffer Scatter/Gather frames. The maximum linear frame size is
increased by 128 bytes for non-erratum arm64 platforms.
Cleanup the hardware annotations header defines in the process.
Parav Pandit [Fri, 30 Oct 2020 09:38:03 +0000 (11:38 +0200)]
RDMA: Fix software RDMA drivers for dma mapping error
The commit f959dcd6ddfd ("dma-direct: Fix potential NULL pointer
dereference") made dma_mask as mandetory field to be setup even for
dma_virt_ops based dma devices. The commit in the fixes tag omitted
setting up the dma_mask on virtual devices triggering the below trace when
they were combined during the merge window.
Fix it by setting empty DMA MASK for software based RDMA devices.
Keith Busch [Fri, 30 Oct 2020 17:28:54 +0000 (10:28 -0700)]
Revert "nvme-pci: remove last_sq_tail"
Multiple CPUs may be mapped to the same hctx, allowing mulitple
submission contexts to attempt commit_rqs(). We need to verify we're
not writing the same doorbell value multiple times since that's a spec
violation.
Jakub Kicinski [Mon, 2 Nov 2020 17:43:54 +0000 (09:43 -0800)]
Merge tag 'mac80211-for-net-2020-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple of fixes, for
* HE on 2.4 GHz
* a few issues syzbot found, but we have many more reports :-(
* a regression in nl80211-transported EAPOL frames which had
affected a number of users, from Mathy
* kernel-doc markings in mac80211, from Mauro
* a format argument in reg.c, from Ye Bin
====================
There is no need to specify a "ULL" suffix for "all bits set": "~0" is
sufficient, and works regardless of type. In fact adding the suffix
makes the code more fragile.
PM: runtime: Resume the device earlier in __device_release_driver()
Since the device is resumed from runtime-suspend in
__device_release_driver() anyway, it is better to do that before
looking for busy managed device links from it to consumers, because
if there are any, device_links_unbind_consumers() will be called
and it will cause the consumer devices' drivers to unbind, so the
consumer devices will be runtime-resumed. In turn, resuming each
consumer device will cause the supplier to be resumed and when the
runtime PM references from the given consumer to it are dropped, it
may be suspended. Then, the runtime-resume of the next consumer
will cause the supplier to resume again and so on.
Update the code accordingly.
Signed-off-by: Rafael J. Wysocki <[email protected]> Fixes: 9ed9895370ae ("driver core: Functional dependencies tracking support") Cc: All applicable <[email protected]> # All applicable Tested-by: Xiang Chen <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]>
After commit d12544fb2aa9 ("PM: runtime: Remove link state checks in
rpm_get/put_supplier()") nothing prevents the consumer device's
runtime PM from acquiring additional references to the supplier
device after pm_runtime_clean_up_links() has run (or even while it
is running), so calling this function from __device_release_driver()
may be pointless (or even harmful).
Moreover, it ignores stateless device links, so the runtime PM
handling of managed and stateless device links is inconsistent
because of it, so better get rid of it entirely.
PM: runtime: Drop runtime PM references to supplier on link removal
While removing a device link, drop the supplier device's runtime PM
usage counter as many times as needed to drop all of the runtime PM
references to it from the consumer in addition to dropping the
consumer's link count.
Viresh Kumar [Fri, 30 Oct 2020 07:21:08 +0000 (12:51 +0530)]
cpufreq: schedutil: Don't skip freq update if need_freq_update is set
The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.
For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
Now lets say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to limits change event) and lets say
we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.
Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.
This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.
As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.
We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.
Helge Deller [Fri, 16 Oct 2020 10:13:00 +0000 (12:13 +0200)]
nfsroot: Default mount option should ask for built-in NFS version
Change the nfsroot default mount option to ask for NFSv2 only *if* the
kernel was built with NFSv2 support.
If not, default to NFSv3 or as last choice to NFSv4, depending on actual
kernel config.
swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single
The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t. swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets. Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling
memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
If the allocation must fail, no_iotlb_memory is set.
Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.
When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.
Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.
Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.
ftrace: Handle tracing when switching between context
When an interrupt or NMI comes in and switches the context, there's a delay
from when the preempt_count() shows the update. As the preempt_count() is
used to detect recursion having each context have its own bit get set when
tracing starts, and if that bit is already set, it is considered a recursion
and the function exits. But if this happens in that section where context
has changed but preempt_count() has not been updated, this will be
incorrectly flagged as a recursion.
To handle this case, create another bit call TRANSITION and test it if the
current context bit is already set. Flag the call as a recursion if the
TRANSITION bit is already set, and if not, set it and continue. The
TRANSITION bit will be cleared normally on the return of the function that
set it, or if the current context bit is clear, set it and clear the
TRANSITION bit to allow for another transition between the current context
and an even higher one.
The code that checks recursion will work to only do the recursion check once
if there's nested checks. The top one will do the check, the other nested
checks will see recursion was already checked and return zero for its "bit".
On the return side, nothing will be done if the "bit" is zero.
The problem is that zero is returned for the "good" bit when in NMI context.
This will set the bit for NMIs making it look like *all* NMI tracing is
recursing, and prevent tracing of anything in NMI context!
The simple fix is to return "bit + 1" and subtract that bit on the end to
get the real bit.
Qiujun Huang [Thu, 29 Oct 2020 16:19:05 +0000 (00:19 +0800)]
tracing: Fix out of bounds write in get_trace_buf
The nesting count of trace_printk allows for 4 levels of nesting. The
nesting counter starts at zero and is incremented before being used to
retrieve the current context's buffer. But the index to the buffer uses the
nesting counter after it was incremented, and not its original number,
which in needs to do.
scripts: get_abi.pl: Don't let ABI files to create subtitles
The ReST output should only contain documentation titles
automatically created by the script.
There are two reasons for that:
1) Consistency.
just a handful ABI docs define titles
2) To avoid critical errors.
Docutils (which is the basis for Sphinx) allows a free
assign of documentation title markups. So, one document
could be doing things like:
Level 1
=======
Level 2
-------
While another one could do the reverse:
Level 1
-------
Level 2
=======
But the same document can't mix.
As the output of get_abi.pl will join contents from multiple
files, if they don't define the levels on a consistent errors,
errors like this can happen:
Sphinx parallel build error:
docutils.utils.SystemMessage: /home/rdunlap/lnx/lnx-510-rc2/Documentation/ABI/testing/sysfs-bus-rapidio:2: (SEVERE/4) Title level inconsistent:
Attributes Common for All RapidIO Devices
-----------------------------------------
Which cause some versions of Sphinx to go into an endless
loop.
It should be noticed that an alternative to that would
be to replace all title occurrences by a single markup,
but that will make the parser more complex, and, due to
(1) it would generate an inconsistent output.
So, better to just remove the titles defined at the ABI
files from the output.
Merge tag 'fixes-for-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
USB: fixes for v5.10-rc2
Nothing major as of yet, we're adding support for Intel Alder Lake-S
in dwc3, together with a few fixes that are quite important (memory
leak in raw-gadget, probe crashes in goku_udc, and so on).
Signed-off-by: Felipe Balbi <[email protected]>
* tag 'fixes-for-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: raw-gadget: fix memory leak in gadget_setup
usb: dwc2: Avoid leaving the error_debugfs label unused
usb: dwc3: ep0: Fix delay status handling
usb: gadget: fsl: fix null pointer checking
usb: gadget: goku_udc: fix potential crashes in probe
usb: dwc3: pci: add support for the Intel Alder Lake-S
Maxime Ripard [Thu, 29 Oct 2020 19:01:04 +0000 (20:01 +0100)]
drm/vc4: kms: Add functions to create the state objects
In order to make the vc4_kms_load code and error path a bit easier to
read and extend, add functions to create state objects, and use managed
actions to cleanup if needed.
Maxime Ripard [Thu, 29 Oct 2020 19:01:01 +0000 (20:01 +0100)]
drm/vc4: gem: Add a managed action to cleanup the job queue
The job queue needs to be cleaned up using vc4_gem_destroy, but it's
not used consistently (vc4_drv's bind calls it in its error path, but
doesn't in unbind), and we can make that automatic through a managed
action. Let's remove the requirement to call vc4_gem_destroy.
Maxime Ripard [Thu, 29 Oct 2020 19:00:59 +0000 (20:00 +0100)]
drm/vc4: bo: Add a managed action to cleanup the cache
The BO cache needs to be cleaned up using vc4_bo_cache_destroy, but it's
not used consistently (vc4_drv's bind calls it in its error path, but
doesn't in unbind), and we can make that automatic through a managed
action. Let's remove the requirement to call vc4_bo_cache_destroy.
Qian Cai [Wed, 28 Oct 2020 18:23:34 +0000 (14:23 -0400)]
powerpc/smp: Call rcu_cpu_starting() earlier
The call to rcu_cpu_starting() in start_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows (with CONFIG_PROVE_RCU_LIST=y):
This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.
It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:
It uses a per-CPU variable so that RCU pays attention only to the
first call to rcu_cpu_starting() if there is more than one of them.
This is even intentional, due to there being a generic
arch-independent call to rcu_cpu_starting() in
notify_cpu_starting().
So multiple calls to rcu_cpu_starting() are fine by design.
Qian Cai [Wed, 28 Oct 2020 15:27:17 +0000 (11:27 -0400)]
powerpc/eeh_cache: Fix a possible debugfs deadlock
Lockdep complains that a possible deadlock below in
eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
disabled. Let's just make eeh_addr_cache_show() acquire the lock with
IRQ disabled as well.
Linus Torvalds [Sun, 1 Nov 2020 19:21:26 +0000 (11:21 -0800)]
Merge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Three fixes all related to #DB:
- Handle the BTF bit correctly so it doesn't get lost due to a kernel
#DB
- Only clear and set the virtual DR6 value used by ptrace on user
space triggered #DB. A kernel #DB must leave it alone to ensure
data consistency for ptrace.
- Make the bitmasking of the virtual DR6 storage correct so it does
not lose DR_STEP"
* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
x86/debug: Fix BTF handling
Linus Torvalds [Sun, 1 Nov 2020 19:13:45 +0000 (11:13 -0800)]
Merge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A few fixes for timers/timekeeping:
- Prevent undefined behaviour in the timespec64_to_ns() conversion
which is used for converting user supplied time input to
nanoseconds. It lacked overflow protection.
- Mark sched_clock_read_begin/retry() to prevent recursion in the
tracer
- Remove unused debug functions in the hrtimer and timerlist code"
* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Prevent undefined behaviour in timespec64_to_ns()
timers: Remove unused inline funtion debug_timer_free()
hrtimer: Remove unused inline function debug_hrtimer_free()
time/sched_clock: Mark sched_clock_read_begin/retry() as notrace
Linus Torvalds [Sun, 1 Nov 2020 19:11:38 +0000 (11:11 -0800)]
Merge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp fix from Thomas Gleixner:
"A single fix for stop machine.
Mark functions no trace to prevent a crash caused by recursion when
enabling or disabling a tracer on RISC-V (probably all architectures
which patch through stop machine)"
* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine, rcu: Mark functions as notrace
Linus Torvalds [Sun, 1 Nov 2020 18:05:16 +0000 (10:05 -0800)]
Merge tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes/removals from Greg KH:
"Here's some small fixes for 5.10-rc2 and a big driver removal.
The fixes are for some reported issues in the interconnect and
coresight drivers, nothing major.
The "big" driver removal is the MIC drivers have been asked to be
removed as the hardware never shipped and Intel no longer wants to
maintain something that no one can use. This is welcomed by many as
the DMA usage of these drivers was "interesting" and the security
people were starting to question some issues that were starting to be
found in the codebase.
Note, one of the subsystems for this driver, the "VOP" code, will
probably come back in future kernel versions as it was looking to
potentially solve some PCIe virtualization issues that a number of
other vendors were wanting to solve. But as-is, this codebase didn't
work for anyone else so no actual functionality is being removed.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
coresight: cti: Initialize dynamic sysfs attributes
coresight: Fix uninitialised pointer bug in etm_setup_aux()
coresight: add module license
misc: mic: remove the MIC drivers
interconnect: qcom: use icc_sync state for sm8[12]50
interconnect: qcom: Ensure that the floor bandwidth value is enforced
interconnect: qcom: sc7180: Init BCMs before creating the nodes
interconnect: qcom: sdm845: Init BCMs before creating the nodes
interconnect: Aggregate before setting initial bandwidth
interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM
Linus Torvalds [Sun, 1 Nov 2020 17:59:13 +0000 (09:59 -0800)]
Merge tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and documentation fixes from Greg KH:
"Here is one tiny debugfs change to fix up an API where the last user
was successfully fixed up in 5.10-rc1 (so it couldn't be merged
earlier), and a much larger Documentation/ABI/ update to the files so
they can be automatically parsed by our tools.
The Documentation/ABI/ updates are just formatting issues, small ones
to bring the files into parsable format, and have been acked by
numerous subsystem maintainers and the documentation maintainer. I
figured it was good to get this into 5.10-rc2 to help wih the merge
issues that would arise if these were to stick in linux-next until
5.11-rc1.
The debugfs change has been in linux-next for a long time, and the
Documentation updates only for the last linux-next release"
* tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (40 commits)
scripts: get_abi.pl: assume ReST format by default
docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication
docs: ABI: sysfs-class-backlight: unify ABI documentation
docs: ABI: sysfs-c2port: remove a duplicated entry
docs: ABI: sysfs-class-power: unify duplicated properties
docs: ABI: unify /sys/class/leds/<led>/brightness documentation
docs: ABI: stable: remove a duplicated documentation
docs: ABI: change read/write attributes
docs: ABI: cleanup several ABI documents
docs: ABI: sysfs-bus-nvdimm: use the right format for ABI
docs: ABI: vdso: use the right format for ABI
docs: ABI: fix syntax to be parsed using ReST notation
docs: ABI: convert testing/configfs-acpi to ReST
docs: Kconfig/Makefile: add a check for broken ABI files
docs: abi-testing.rst: enable --rst-sources when building docs
docs: ABI: don't escape ReST-incompatible chars from obsolete and removed
docs: ABI: create a 2-depth index for ABI
docs: ABI: make it parse ABI/stable as ReST-compatible files
docs: ABI: sysfs-uevent: make it compatible with ReST output
docs: ABI: testing: make the files compatible with ReST output
...
Linus Torvalds [Sun, 1 Nov 2020 17:57:24 +0000 (09:57 -0800)]
Merge tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for issues that have been
reported in 5.10-rc1:
- octeon driver fixes
- wfx driver fixes
- memory leak fix in vchiq driver
- fieldbus driver bugfix
- comedi driver bugfix
All of these have been in linux-next with no reported issues"
* tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: fieldbus: anybuss: jump to correct label in an error path
staging: wfx: fix test on return value of gpiod_get_value()
staging: wfx: fix use of uninitialized pointer
staging: mmal-vchiq: Fix memory leak for vchiq_instance
staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
staging: octeon: Drop on uncorrectable alignment or FCS error
staging: octeon: repair "fixed-link" support
Linus Torvalds [Sun, 1 Nov 2020 17:55:36 +0000 (09:55 -0800)]
Merge tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small TTY and Serial driver fixes for reported issues
for 5.10-rc2. They include:
- vt ioctl bugfix for reported problems
- fsl_lpuart serial driver fix
- 21285 serial driver bugfix
All have been in linux-next with no reported issues"
* tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt_ioctl: fix GIO_UNIMAP regression
vt: keyboard, extend func_buf_lock to readers
vt: keyboard, simplify vt_kdgkbsent
tty: serial: fsl_lpuart: LS1021A has a FIFO size of 16 words, like LS1028A
tty: serial: 21285: fix lockup on open
Linus Torvalds [Sun, 1 Nov 2020 17:43:32 +0000 (09:43 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- selftest fix
- force PTE mapping on device pages provided via VFIO
- fix detection of cacheable mapping at S2
- fallback to PMD/PTE mappings for composite huge pages
- fix accounting of Stage-2 PGD allocation
- fix AArch32 handling of some of the debug registers
- simplify host HYP entry
- fix stray pointer conversion on nVHE TLB invalidation
- fix initialization of the nVHE code
- simplify handling of capabilities exposed to HYP
- nuke VCPUs caught using a forbidden AArch32 EL0
x86:
- new nested virtualization selftest
- miscellaneous fixes
- make W=1 fixes
- reserve new CPUID bit in the KVM leaves"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: vmx: remove unused variable
KVM: selftests: Don't require THP to run tests
KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
KVM: selftests: test behavior of unmapped L2 APIC-access address
KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
KVM: x86: replace static const variables with macros
KVM: arm64: Handle Asymmetric AArch32 systems
arm64: cpufeature: upgrade hyp caps to final
arm64: cpufeature: reorder cpus_have_{const, final}_cap()
KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
KVM: arm64: Force PTE mapping on fault resulting in a device mapping
KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
KVM: arm64: Fix masks in stage2_pte_cacheable()
KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
Thomas Gleixner [Sun, 1 Nov 2020 16:54:13 +0000 (17:54 +0100)]
Merge tag 'irqchip-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- A couple of fixes after the IPI as IRQ patches (Kconfig, bcm2836)
- Two SiFive PLIC fixes (irq_set_affinity, hierarchy handling)
- "unmapped events" handling for the ti-sci-inta controller
- Tidying up for the irq-mst driver (static functions, Kconfig)
- Small cleanup in the Renesas irqpin driver
- STM32 exti can now handle LP timer events
Peter Ujfalusi [Tue, 20 Oct 2020 07:32:43 +0000 (10:32 +0300)]
irqchip/ti-sci-inta: Add support for unmapped event handling
The DMA (BCDMA/PKTDMA and their rings/flows) events are under the INTA's
supervision as unmapped events in AM64.
In order to keep the current SW stack working, the INTA driver must replace
the dev_id with it's own when a request comes for BCDMA or PKTDMA
resources.
Implement parsing of the optional "ti,unmapped-event-sources" phandle array
to get the sci-dev-ids of the devices where the unmapped events originate.
Peter Ujfalusi [Tue, 20 Oct 2020 07:32:42 +0000 (10:32 +0300)]
dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling
The new DMA architecture introduced with AM64 introduced new event types:
unampped events.
These events are mapped within INTA in contrast to other K3 devices where
the events with similar function was originating from the UDMAP or ringacc.
The ti,unmapped-event-sources should contain phandle array to the devices
in the system (typically DMA controllers) from where the unmapped events
originate.
irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm
Get rid of the separate flag to indicate if the IRLM bit is present in
the INTC/Interrupt Control Register 0, by considering -1 an invalid
irlm_bit value.
wenxu [Fri, 30 Oct 2020 03:32:08 +0000 (11:32 +0800)]
ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
The tunnel device such as vxlan, bareudp and geneve in the lwt mode set
the outer df only based TUNNEL_DONT_FRAGMENT.
And this was also the behavior for gre device before switching to use
ip_md_tunnel_xmit in commit 962924fa2b7a ("ip_gre: Refactor collect
metatdata mode tunnel xmit to ip_md_tunnel_xmit")
When the ip_gre in lwt mode xmit with ip_md_tunnel_xmi changed the rule and
make the discrepancy between handling of DF by different tunnels. So in the
ip_md_tunnel_xmit should follow the same rule like other tunnels.
Mark Deneen [Fri, 30 Oct 2020 15:58:14 +0000 (15:58 +0000)]
cadence: force nonlinear buffers to be cloned
In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
second device with usb ethernet (r8152) sending ICMP packets. If the packet
was larger than about 220 bytes, the SAMA5 device would "oops" with the
following trace:
kernel BUG at net/core/skbuff.c:1863!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii o
ption usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel o
hci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
CPU: 0 PID: 0 Comm: swapper Tainted: G O 5.9.1-prox2+ #1
Hardware name: Atmel SAMA5
PC is at skb_put+0x3c/0x50
LR is at macb_start_xmit+0x134/0xad0 [macb]
pc : [<c05258cc>] lr : [<bf0ea5b8>] psr: 20070113
sp : c0d01a60 ip : c07232c0 fp : c4250000
r10: c0d03cc8 r9 : 00000000 r8 : c0d038c0
r7 : 00000000 r6 : 00000008 r5 : c59b66c0 r4 : 0000002a
r3 : 8f659eff r2 : c59e9eea r1 : 00000001 r0 : c59b66c0
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c53c7d Table: 2640c059 DAC: 00000051
Process swapper (pid: 0, stack limit = 0x75002d81)
The solution was to force nonlinear buffers to be cloned. This was previously
reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
but never formally submitted as a patch.
This is the third revision, hopefully the formatting is correct this time!
Linus Torvalds [Sat, 31 Oct 2020 21:41:48 +0000 (14:41 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost fixes from Michael Tsirkin:
"Fixes all over the place.
A new UAPI is borderline: can also be considered a new feature but
also seems to be the only way we could come up with to fix addressing
for userspace - and it seems important to switch to it now before
userspace making assumptions about addressing ability of devices is
set in stone"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpasim: allow to assign a MAC address
vdpasim: fix MAC address configuration
vdpa: handle irq bypass register failure case
vdpa_sim: Fix DMA mask
Revert "vhost-vdpa: fix page pinning leakage in error path"
vdpa/mlx5: Fix error return in map_direct_mr()
vhost_vdpa: Return -EFAULT if copy_from_user() fails
vdpa_sim: implement get_iova_range()
vhost: vdpa: report iova range
vdpa: introduce config op to get valid iova range