eventpoll: fix missing wakeup for ovflist in ep_poll_callback
In the event that we add to ovflist, before commit 339ddb53d373
("fs/epoll: remove unnecessary wakeups of nested epoll") we would be
woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.
With that wakeup removed, if we add to ovflist here, we may never wake
up. Rather than adding back the ep_scan_ready_list wakeup - which was
resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback.
We noticed that one of our workloads was missing wakeups starting with 339ddb53d373 and upon manual inspection, this wakeup seemed missing to me.
With this patch added, we no longer see missing wakeups. I haven't yet
tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().
Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use
flags, but incorrectly updated the call in sev_pin_memory(). As the
original coding of this call was correct, revert the change made by that
commit.
If the trapping instruction contains a ':', for a memory access through
segment registers for example, the sed substitution will insert the '*'
marker in the middle of the instruction instead of the line address:
I started to think I had forgotten some quirk of the assembly syntax
before noticing that it was actually coming from the script. Fix it to
add the address marker at the right place for these instructions:
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: __pageblock_pfn_to_page+0x134/0x1c0
Call Trace:
set_zone_contiguous+0x56/0x70
page_alloc_init_late+0x166/0x176
kernel_init_freeable+0xfa/0x255
kernel_init+0xa/0x106
ret_from_fork+0x35/0x40
The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU. Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.
Yafang Shao [Fri, 8 May 2020 01:35:43 +0000 (18:35 -0700)]
mm, memcg: fix error return value of mem_cgroup_css_alloc()
When I run my memcg testcase which creates lots of memcgs, I found
there're unexpected out of memory logs while there're still enough
available free memory. The error log is
The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs,
an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right
errno should be -ENOSPC "No space left on device", which is an
appropriate errno for userspace's failed mkdir.
As the errno really misled me, we should make it right. After this
patch, the error log will be
mkdir: cannot create directory 'foo.65533': No space left on device
Oleg Nesterov [Fri, 8 May 2020 01:35:39 +0000 (18:35 -0700)]
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no
longer works if the sender doesn't have rights to send a signal.
Change __do_notify() to use do_send_sig_info() instead of kill_pid_info()
to avoid check_kill_permission().
This needs the additional notify.sigev_signo != 0 check, shouldn't we
change do_mq_notify() to deny sigev_signo == 0 ?
Linus Torvalds [Thu, 7 May 2020 22:27:11 +0000 (15:27 -0700)]
Merge tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM
enabled
- Fix allocation leaks in bootconfig tool
- Fix a double initialization of a variable
- Fix API bootconfig usage from kprobe boot time events
- Reject NULL location for kprobes
- Fix crash caused by preempt delay module not cleaning up kthread
correctly
- Add vmalloc_sync_mappings() to prevent x86_64 page faults from
recursively faulting from tracing page faults
- Fix comment in gpu/trace kerneldoc header
- Fix documentation of how to create a trace event class
- Make the local tracing_snapshot_instance_cond() function static
* tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tools/bootconfig: Fix resource leak in apply_xbc()
tracing: Make tracing_snapshot_instance_cond() static
tracing: Fix doc mistakes in trace sample
gpu/trace: Minor comment updates for gpu_mem_total tracepoint
tracing: Add a vmalloc_sync_mappings() for safe measure
tracing: Wait for preempt irq delay thread to finish
tracing/kprobes: Reject new event if loc is NULL
tracing/boottime: Fix kprobe event API usage
tracing/kprobes: Fix a double initialization typo
bootconfig: Fix to remove bootconfig data from initrd while boot
Linus Torvalds [Thu, 7 May 2020 22:22:08 +0000 (15:22 -0700)]
Merge tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"ftrace test fixes and a fix to kvm Makefile for relocatable
native/cross builds and installs"
* tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix kvm relocatable native/cross builds and installs
selftests/ftrace: Make XFAIL green color
ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
ftrace/selftests: workaround cgroup RT scheduling issues
Jens Axboe [Thu, 7 May 2020 20:56:15 +0000 (14:56 -0600)]
io_uring: don't use 'fd' for openat/openat2/statx
We currently make some guesses as when to open this fd, but in reality
we have no business (or need) to do so at all. In fact, it makes certain
things fail, like O_PATH.
Remove the fd lookup from these opcodes, we're just passing the 'fd' to
generic helpers anyway. With that, we can also remove the special casing
of fd values in io_req_needs_file(), and the 'fd_non_neg' check that
we have. And we can ensure that we only read sqe->fd once.
This fixes O_PATH usage with openat/openat2, and ditto statx path side
oddities.
gpu/trace: Minor comment updates for gpu_mem_total tracepoint
This change updates the improper comment for the 'size' attribute in the
tracepoint definition. Most gfx drivers pre-fault in physical pages
instead of making virtual allocations. So we drop the 'Virtual' keyword
here and leave this to the implementations.
tracing: Add a vmalloc_sync_mappings() for safe measure
x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu
areas can be complex, to say the least. Mappings may happen at boot up, and
if nothing synchronizes the page tables, those page mappings may not be
synced till they are used. This causes issues for anything that might touch
one of those mappings in the path of the page fault handler. When one of
those unmapped mappings is touched in the page fault handler, it will cause
another page fault, which in turn will cause a page fault, and leave us in
a loop of page faults.
Commit 763802b53a42 ("x86/mm: split vmalloc_sync_all()") split
vmalloc_sync_all() into vmalloc_sync_unmappings() and
vmalloc_sync_mappings(), as on system exit, it did not need to do a full
sync on x86_64 (although it still needed to be done on x86_32). By chance,
the vmalloc_sync_all() would synchronize the page mappings done at boot up
and prevent the per cpu area from being a problem for tracing in the page
fault handler. But when that synchronization in the exit of a task became a
nop, it caused the problem to appear.
tracing: Wait for preempt irq delay thread to finish
Running on a slower machine, it is possible that the preempt delay kernel
thread may still be executing if the module was immediately removed after
added, and this can cause the kernel to crash as the kernel thread might be
executing after its code has been removed.
There's no reason that the caller of the code shouldn't just wait for the
delay thread to finish, as the thread can also be created by a trigger in
the sysfs code, which also has the same issues.
Linus Torvalds [Thu, 7 May 2020 16:50:59 +0000 (09:50 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, mostly for ARM and AMD, and more documentation.
Slightly bigger than usual because I couldn't send out what was
pending for rc4, but there is nothing worrisome going on. I have more
fixes pending for guest debugging support (gdbstub) but I will send
them next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM: selftests: Fix build for evmcs.h
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
docs/virt/kvm: Document configuring and running nested guests
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
KVM: x86: Fixes posted interrupt check for IRQs delivery modes
KVM: SVM: fill in kvm_run->debug.arch.dr[67]
KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
KVM: arm64: Fix 32bit PC wrap-around
KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM: arm64: Save/restore sp_el0 as part of __guest_enter
KVM: arm64: Delete duplicated label in invalid_vector
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
...
Pavel Begunkov [Mon, 4 May 2020 19:39:35 +0000 (22:39 +0300)]
splice: move f_mode checks to do_{splice,tee}()
do_splice() is used by io_uring, as will be do_tee(). Move f_mode
checks from sys_{splice,tee}() to do_{splice,tee}(), so they're
enforced for io_uring as well.
Kristen found a hang in objtool when building with -ffunction-sections.
It was caused by evergreen_pcie_gen2_enable.cold() being laid out
immediately before evergreen_pcie_gen2_enable(). Since their "pfunc" is
always the same, find_jump_table() got into an infinite loop because it
didn't recognize the boundary between the two functions.
Fix that with a new prev_insn_same_sym() helper, which doesn't cross
subfunction boundaries.
bdi_dev_name is not a fast path function, move it out of line. This
prepares for using it from modular callers without having to export
an implementation detail like bdi_unknown_name.
Simplify the bdi name to mirror what we are doing elsewhere, and
drop them name in favor of just using a number. This avoids a
potentially very long bdi name.
Mark Rutland [Tue, 5 May 2020 12:59:30 +0000 (13:59 +0100)]
arm64: hugetlb: avoid potential NULL dereference
The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may
pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL:
| CC arch/arm64/mm/pageattr.o
| CC arch/arm64/mm/hugetlbpage.o
| from arch/arm64/mm/hugetlbpage.c:10:
| arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’:
| ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference]
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
| |arch/arm64/mm/hugetlbpage.c:232:10:
| |./arch/arm64/include/asm/pgtable-types.h:28:24:
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
This can only occur when the kernel cannot allocate a page, and so is
unlikely to happen in practice before other systems start failing.
We can avoid this by bailing out if pmd_alloc() fails, as we do earlier
in the function if pud_alloc() fails.
usb: chipidea: msm: Ensure proper controller reset using role switch API
Currently we check to make sure there is no error state on the extcon
handle for VBUS when writing to the HS_PHY_GENCONFIG_2 register. When using
the USB role-switch API we still need to write to this register absent an
extcon handle.
This patch makes the appropriate update to ensure the write happens if
role-switching is true.
1) Fix reference count leaks in various parts of batman-adv, from Xiyu
Yang.
2) Update NAT checksum even when it is zero, from Guillaume Nault.
3) sk_psock reference count leak in tls code, also from Xiyu Yang.
4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in
fq_codel, from Eric Dumazet.
5) Fix panic in choke_reset(), also from Eric Dumazet.
6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan.
7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet.
8) Fix crash in x25_disconnect(), from Yue Haibing.
9) Don't pass pointer to local variable back to the caller in
nf_osf_hdr_ctx_init(), from Arnd Bergmann.
10) Wireguard should use the ECN decap helper functions, from Toke
Høiland-Jørgensen.
11) Fix command entry leak in mlx5 driver, from Moshe Shemesh.
12) Fix uninitialized variable access in mptcp's
subflow_syn_recv_sock(), from Paolo Abeni.
13) Fix unnecessary out-of-order ingress frame ordering in macsec, from
Scott Dial.
14) IPv6 needs to use a global serial number for dst validation just
like ipv4, from David Ahern.
15) Fix up PTP_1588_CLOCK deps, from Clay McClure.
16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from
Yoshiyuki Kurauchi.
17) Fix a regression in that dsa user port errors should not be fatal,
from Florian Fainelli.
18) Fix iomap leak in enetc driver, from Dejin Zheng.
19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang.
20) Initialize protocol value earlier in neigh code paths when
generating events, from Roman Mashak.
21) netdev_update_features() must be called with RTNL mutex in macsec
driver, from Antoine Tenart.
22) Validate untrusted GSO packets even more strictly, from Willem de
Bruijn.
23) Wireguard decrypt worker needs a cond_resched(), from Jason
Donenfeld.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
wireguard: send/receive: cond_resched() when processing worker ringbuffers
wireguard: socket: remove errant restriction on looping to self
wireguard: selftests: use normal kernel stack size on ppc64
net: ethernet: ti: am65-cpsw-nuss: fix irqs type
ionic: Use debugfs_create_bool() to export bool
net: dsa: Do not leave DSA master with NULL netdev_ops
net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
net: stricter validation of untrusted gso packets
seg6: fix SRH processing to comply with RFC8754
net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
net: dsa: ocelot: the MAC table on Felix is twice as large
net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
net: hsr: fix incorrect type usage for protocol variable
net: macsec: fix rtnl locking issue
net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
...
net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
that the frontend does not need counters, this hw stats type request
never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
the driver to disable the stats, however, if the driver cannot disable
counters, it bails out.
TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
(this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.
Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Lukas Bulwahn [Wed, 6 May 2020 20:29:06 +0000 (22:29 +0200)]
MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
Commit 9b038086f06b ("docs: networking: convert DIM to RST") added a new
file entry to DYNAMIC INTERRUPT MODERATION to the end, and not following
alphabetical order.
So, ./scripts/checkpatch.pl -f MAINTAINERS complains:
WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic
order
#5966: FILE: MAINTAINERS:5966:
+F: lib/dim/
+F: Documentation/networking/net_dim.rst
Reorder the file entries to keep MAINTAINERS nicely ordered.
David S. Miller [Thu, 7 May 2020 03:03:48 +0000 (20:03 -0700)]
Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:
====================
wireguard fixes for 5.7-rc5
With Ubuntu and Debian having backported this into their kernels, we're
finally seeing testing from places we hadn't seen prior, which is nice.
With that comes more fixes:
1) The CI for PPC64 was running with extremely small stacks for 64-bit,
causing spurious crashes in surprising places.
2) There's was an old leftover routing loop restriction, which no longer
makes sense given the queueing architecture, and was causing problems
for people who really did want nested routing.
3) Not yielding our kthread on CONFIG_PREEMPT_VOLUNTARY systems caused
RCU stalls and other issues, reported by Wang Jian, with the fix
suggested by Sultan Alsawaf.
4) Clang spewed warnings in a selftest for CONFIG_IPV6=n, reported by
Arnd Bergmann.
5) A complicated if statement was simplified to an assignment while also
making the likely/unlikely hinting more correct and simple, and
increasing readability, suggested by Sultan.
Patches (2) and (3) have Fixes: lines and are probably good candidates
for stable.
====================
wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
It's very unlikely that send will become true. It's nearly always false
between 0 and 120 seconds of a session, and in most cases becomes true
only between 120 and 121 seconds before becoming false again. So,
unlikely(send) is clearly the right option here.
What happened before was that we had this complex boolean expression
with multiple likely and unlikely clauses nested. Since this is
evaluated left-to-right anyway, the whole thing got converted to
unlikely. So, we can clean this up to better represent what's going on.
wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
Without setting these to NULL, clang complains in certain
configurations that have CONFIG_IPV6=n:
In file included from drivers/net/wireguard/ratelimiter.c:223:
drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized]
ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
^~~~
drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning
struct sk_buff *skb4, *skb6;
^
= NULL
drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized]
ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
^~~~
drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning
struct ipv6hdr *hdr6;
^
We silence this warning by setting the variables to NULL as the warning
suggests.
wireguard: send/receive: cond_resched() when processing worker ringbuffers
Users with pathological hardware reported CPU stalls on CONFIG_
PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning
these workers would never terminate. That turned out not to be okay on
systems without forced preemption, which Sultan observed. This commit
adds a cond_resched() to the bottom of each loop iteration, so that
these workers don't hog the core. Note that we don't need this on the
napi poll worker, since that terminates after its budget is expended.
wireguard: socket: remove errant restriction on looping to self
It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.
At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
wireguard: selftests: use normal kernel stack size on ppc64
While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.
The K3 INTA driver, which is source TX/RX IRQs for CPSW NUSS, defines IRQs
triggering type as EDGE by default, but triggering type for CPSW NUSS TX/RX
IRQs has to be LEVEL as the EDGE triggering type may cause unnecessary IRQs
triggering and NAPI scheduling for empty queues. It was discovered with
RT-kernel.
Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as
IRQF_TRIGGER_HIGH.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: dsa: Do not leave DSA master with NULL netdev_ops
When ndo_get_phys_port_name() for the CPU port was added we introduced
an early check for when the DSA master network device in
dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When
we perform the teardown operation in dsa_master_ndo_teardown() we would
not be checking that cpu_dp->orig_ndo_ops was successfully allocated and
non-NULL initialized.
With network device drivers such as virtio_net, this leads to a NPD as
soon as the DSA switch hanging off of it gets torn down because we are
now assigning the virtio_net device's netdev_ops a NULL pointer.
Vladimir Oltean [Mon, 4 May 2020 19:58:56 +0000 (22:58 +0300)]
net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
This was caused by a poor merge conflict resolution on my side. The
"act = &cls->rule->action.entries[0];" assignment was already present in
the code prior to the patch mentioned below.
Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Syzkaller again found a path to a kernel crash through bad gso input:
a packet with transport header extending beyond skb_headlen(skb).
Tighten validation at kernel entry:
- Verify that the transport header lies within the linear section.
To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.
- Match the gso_type against the ip_proto found by the flow dissector.
Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined
in RFC8754.
RFC8754 (section 4.1) defines the SR source node behavior which encapsulates
packets into an outer IPv6 header and SRH. The SR source node encodes the
full list of Segments that defines the packet path in the SRH. Then, the
first segment from list of Segments is copied into the Destination address
of the outer IPv6 header and the packet is sent to the first hop in its path
towards the destination.
If the Segment list has only one segment, the SR source node can omit the SRH
as he only segment is added in the destination address.
RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not
require the entire SID list to be preserved in the SRH. A reduced SRH does
not contain the first segment of the related SR Policy (the first segment is
the one already in the DA of the IPv6 header), and the Last Entry field is
set to n-2, where n is the number of elements in the SR Policy.
RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to
validate the SRH (S09, S10, S11) which works for both reduced and
non-reduced behaviors.
This patch updates seg6_validate_srh() to validate the SRH as per RFC8754.
====================
FDB fixes for Felix and Ocelot switches
This series fixes the following problems:
- Dynamically learnt addresses never expiring (neither for Ocelot nor
for Felix)
- Half of the FDB not visible in 'bridge fdb show' (for Felix only)
====================
Vladimir Oltean [Sun, 3 May 2020 22:20:26 +0000 (01:20 +0300)]
net: dsa: ocelot: the MAC table on Felix is twice as large
When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC
addresses would appear, sometimes they wouldn't.
Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192
entries on VSC9959 (Felix), so the existing code from the Ocelot common
library only dumped half of Felix's MAC table. They are both organized
as a 4-way set-associative TCAM, so we just need a single variable
indicating the correct number of rows.
Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Wed, 6 May 2020 23:40:14 +0000 (16:40 -0700)]
Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform fix from Benson Leung:
"Fix a resource allocation issue in cros_ec_sensorhub.c"
* tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors
Thomas Gleixner [Tue, 14 Apr 2020 09:07:22 +0000 (11:07 +0200)]
ARM: futex: Address build warning
Stephen reported the following build warning on a ARM multi_v7_defconfig
build with GCC 9.2.1:
kernel/futex.c: In function 'do_futex':
kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
1676 | return oldval == cmparg;
| ~~~~~~~^~~~~~~~~
kernel/futex.c:1652:6: note: 'oldval' was declared here
1652 | int oldval, ret;
| ^~~~~~
introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser()
calling conventions change").
While that change should not make any difference it confuses GCC which
fails to work out that oldval is not referenced when the return value is
not zero.
GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the
early return, the issue is with the assembly macros. GCC fails to detect
that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT
which makes oldval uninteresting. The store to the callsite supplied oldval
pointer is conditional on ret == 0.
The straight forward way to solve this is to make the store unconditional.
Aside of addressing the build warning this makes sense anyway because it
removes the conditional from the fastpath. In the error case the stored
value is uninteresting and the extra store does not matter at all.
Vladimir Oltean [Wed, 6 May 2020 17:48:13 +0000 (20:48 +0300)]
net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
It looks like the sja1105 external timestamping input is not as generic
as we thought. When fed a signal with 50% duty cycle, it will timestamp
both the rising and the falling edge. When fed a short pulse signal,
only the timestamp of the falling edge will be seen in the PTPSYNCTS
register, because that of the rising edge had been overwritten. So the
moral is: don't feed it short pulse inputs.
Luckily this is not a complete deal breaker, as we can still work with
1 Hz square waves. But the problem is that the extts polling period was
not dimensioned enough for this input signal. If we leave the period at
half a second, we risk losing timestamps due to jitter in the measuring
process. So we need to increase it to 4 times per second.
Also, the very least we can do to inform the user is to deny any other
flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both
set.
Fixes: 747e5eb31d59 ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT") Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Wed, 6 May 2020 13:58:30 +0000 (15:58 +0200)]
net: macsec: fix rtnl locking issue
netdev_update_features() must be called with the rtnl lock taken. Not
doing so triggers a warning, as ASSERT_RTNL() is used in
__netdev_update_features(), the first function called by
netdev_update_features(). Fix this.
Fixes: c850240b6c41 ("net: macsec: report real_dev features when HW offloading is enabled") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dan Carpenter [Wed, 6 May 2020 10:16:56 +0000 (13:16 +0300)]
net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
The "info->fs.location" is a u32 that comes from the user via the
ethtool_set_rxnfc() function. We need to check for invalid values to
prevent a buffer overflow.
I copy and pasted this check from the mvpp2_ethtool_cls_rule_ins()
function.
Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dan Carpenter [Wed, 6 May 2020 10:16:22 +0000 (13:16 +0300)]
net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()
The "rss_context" variable comes from the user via ethtool_get_rxfh().
It can be any u32 value except zero. Eventually it gets passed to
mvpp22_rss_ctx() and if it is over MVPP22_N_RSS_TABLES (8) then it
results in an array overflow.
Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge tag 'usb-serial-5.7-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.7-rc5
Here's a fix adding a missing input sanity check and a new modem device
id.
Both have been in linux-next with no reported issues.
* tag 'usb-serial-5.7-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: qcserial: Add DW5816e support
USB: serial: garmin_gps: add sanity checking for data length
Paolo Bonzini [Wed, 6 May 2020 12:09:17 +0000 (08:09 -0400)]
Merge tag 'kvm-s390-master-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix for running nested uner z/VM
There are circumstances when running nested under z/VM that would trigger a
WARN_ON_ONCE. Remove the WARN_ON_ONCE. Long term we certainly want to make this
code more robust and flexible, but just returning instead of WARNING makes
guest bootable again.
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared
as supported. My wild guess is that userspaces like QEMU are using "#ifdef
KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be
wrong because the compilation host may not be the runtime host.
The userspace might still want to keep the old "#ifdef" though to not break the
guest debug on old kernels.
Peter Xu [Mon, 4 May 2020 22:06:07 +0000 (18:06 -0400)]
KVM: selftests: Fix build for evmcs.h
I got this error when building kvm selftests:
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here
I think it's because evmcs.h is included both in a test file and a lib file so
the structs have multiple declarations when linking. After all it's not a good
habit to declare structs in the header files.
Paolo Bonzini [Tue, 5 May 2020 13:40:46 +0000 (09:40 -0400)]
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
Using CPUID data can be useful for the processor compatibility
check, but that's it. Using it to compute guest-reserved bits
can have both false positives (such as LA57 and UMIP which we
are already handling) and false negatives: in particular, with
this patch we don't allow anymore a KVM guest to set CR4.PKE
when CR4.PKE is clear on the host.
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so
that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the
RSB has always clobbered RFLAGS, its current incarnation just happens
clear CF and ZF in the processs. Relying on the macro to clear CF and
ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation:
Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such
that the ZF flag is always set.
docs/virt/kvm: Document configuring and running nested guests
This is a rewrite of this[1] Wiki page with further enhancements. The
doc also includes a section on debugging problems in nested
environments, among other improvements.
Atish Patra [Mon, 4 May 2020 04:03:19 +0000 (21:03 -0700)]
RISC-V: Remove unused code from STRICT_KERNEL_RWX
This patch removes the unused functions set_kernel_text_rw/ro.
Currently, it is not being invoked from anywhere and no other architecture
(except arm) uses this code. Even in ARM, these functions are not invoked
from anywhere currently.
Linus Torvalds [Tue, 5 May 2020 23:29:03 +0000 (16:29 -0700)]
Merge tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
- Avoid loading asus-nb-wmi module on selected laptop models
- Fix S0ix debug support for Jasper Lake PMC
- Few fixes which have been reported by Hulk bot and others
* tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement
platform/x86: intel_pmc_core: avoid unused-function warnings
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
platform/x86/intel-uncore-freq: make uncore_root_kobj static
platform/x86: wmi: Make two functions static
platform/x86: surface3_power: Fix a NULL vs IS_ERR() check in probe
Roman Mashak [Sat, 2 May 2020 01:34:18 +0000 (21:34 -0400)]
neigh: send protocol value in neighbor create notification
When a new neighbor entry has been added, event is generated but it does not
include protocol, because its value is assigned after the event notification
routine has run, so move protocol assignment code earlier.
drm/amd/display: Prevent dpcd reads with passive dongles
[why]
During hotplug, a DP port may be connected to the sink through
passive adapter which does not support DPCD reads. Issuing reads
without checking for this condition will result in errors
[how]
Ensure the link is in aux_mode before initiating operation that result
in a DPCD read.
Sung Lee [Mon, 20 Apr 2020 15:38:30 +0000 (11:38 -0400)]
drm/amd/display: Update DCN2.1 DV Code Revision
[WHY & HOW]
There is a problem in hscale_pixel_rate, the bug
causes DCN to be more optimistic (more likely to underflow)
in upscale cases during prefetch.
This commit ports the fix from DV code to address these issues.
Xiaoguang Wang [Tue, 5 May 2020 08:28:53 +0000 (16:28 +0800)]
io_uring: handle -EFAULT properly in io_uring_setup()
If copy_to_user() in io_uring_setup() failed, we'll leak many kernel
resources, which will be recycled until process terminates. This bug
can be reproduced by using mprotect to set params to PROT_READ. To fix
this issue, refactor io_uring_create() a bit to add a new 'struct
io_uring_params __user *params' parameter and move the copy_to_user()
in io_uring_setup() to io_uring_setup(), if copy_to_user() failed,
we can free kernel resource properly.
Dejin Zheng [Tue, 5 May 2020 02:03:29 +0000 (10:03 +0800)]
net: broadcom: fix a mistake about ioremap resource
Commit d7a5502b0bb8b ("net: broadcom: convert to
devm_platform_ioremap_resource_byname()") will broke this driver.
idm_base and nicpm_base were optional, after this change, they are
mandatory. it will probe fails with -22 when the dtb doesn't have them
defined. so revert part of this commit and make idm_base and nicpm_base
as optional.
Sean Paul [Tue, 14 Apr 2020 19:02:55 +0000 (15:02 -0400)]
drm: Fix HDCP failures when SRM fw is missing
The SRM cleanup in 79643fddd6eb2 ("drm/hdcp: optimizing the srm
handling") inadvertently altered the behavior of HDCP auth when
the SRM firmware is missing. Before that patch, missing SRM was
interpreted as the device having no revoked keys. With that patch,
if the SRM fw file is missing we reject _all_ keys.
This patch fixes that regression by returning success if the file
cannot be found. It also checks the return value from request_srm such
that we won't end up trying to parse the ksv list if there is an error
fetching it.
Tejun Heo [Mon, 4 May 2020 23:27:54 +0000 (19:27 -0400)]
iocost: protect iocg->abs_vdebt with iocg->waitq.lock
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.
However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.
1. The iocg is in the process of being deactivated by the periodic timer.
2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
without anything because it still sees that the iocg is already active.
3. The iocg is deactivated.
4. The bio from #2 is over budget but needs to be forced. It increases
abs_vdebt and goes over the threshold and enables use_delay.
5. IO control is enabled for the iocg's subtree and now IOs are attributed
to the descendant cgroups and the iocg itself no longer issues IOs.
This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.
The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.
This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
Jeffrey Hugo [Thu, 30 Apr 2020 19:05:55 +0000 (00:35 +0530)]
bus: mhi: core: Fix channel device name conflict
When multiple instances of the same MHI product are present in a system,
we can see a splat from mhi_create_devices() - "sysfs: cannot create
duplicate filename".
This is because the device names assigned to the MHI channel devices are
non-unique. They consist of the channel's name, and the channel's pipe
id. For identical products, each instance is going to have the same
set of channel (both in name and pipe id).
To fix this, we prepend the device name of the parent device that the
MHI channels belong to. Since different instances of the same product
should have unique device names, this makes the MHI channel devices for
each product also unique.
Additionally, remove the pipe id from the MHI channel device name. This
is an internal detail to the MHI product that provides little value, and
imposes too much device specific internal details to userspace. It is
expected that channel with a specific name (ie "SAHARA") has a specific
client, and it does not matter what pipe id that channel is enumerated on.
The pipe id is an internal detail between the MHI bus, and the hardware.
The client is not expected to make decisions based on the pipe id, and to
do so would require the client to have intimate knowledge of the hardware,
which is inappropiate as it may violate the layering provided by the MHI
bus. The limitation of doing this is that each product may only have one
instance of a channel by a unique name. This limitation is appropriate
given the usecases of MHI channels.
Jeffrey Hugo [Thu, 30 Apr 2020 19:05:53 +0000 (00:35 +0530)]
bus: mhi: core: Offload register accesses to the controller
When reading or writing MHI registers, the core assumes that the physical
link is a memory mapped PCI link. This assumption may not hold for all
MHI devices. The controller knows what is the physical link (ie PCI, I2C,
SPI, etc), and therefore knows the proper methods to access that link.
The controller can also handle link specific error scenarios, such as
reading -1 when the PCI link went down.
Therefore, it is appropriate that the MHI core requests the controller to
make register accesses on behalf of the core, which abstracts the core
from link specifics, and end up removing an unnecessary assumption.
Jeffrey Hugo [Thu, 30 Apr 2020 19:05:52 +0000 (00:35 +0530)]
bus: mhi: core: Remove link_status() callback
If the MHI core detects invalid data due to a PCI read, it calls into
the controller via link_status() to double check that the link is infact
down. All in all, this is pretty pointless, and racy. There are no good
reasons for this, and only drawbacks.
Its pointless because chances are, the controller is going to do the same
thing to determine if the link is down - attempt a PCI access and compare
the result. This does not make the link status decision any smarter.
Its racy because its possible that the link was down at the time of the
MHI core access, but then recovered before the controller access. In this
case, the controller will indicate the link is not down, and the MHI core
will precede to use a bad value as the MHI core does not attempt to retry
the access.
Retrying the access in the MHI core is a bad idea because again, it is
racy - what if the link is down again? Furthermore, there may be some
higher level state associated with the link status, that is now invalid
because the link went down.
The only reason why the MHI core could see "invalid" data when doing a PCI
access, that is actually valid, is if the register actually contained the
PCI spec defined sentinel for an invalid access. In this case, it is
arguable that the MHI implementation broken, and should be fixed, not
worked around.
Therefore, remove the link_status() callback before anyone attempts to
implement it.
Jeffrey Hugo [Thu, 30 Apr 2020 19:05:51 +0000 (00:35 +0530)]
bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails
Powerdown is necessary if mhi_sync_power_up fails due to a timeout, to
clean up the resources. Otherwise a BUG could be triggered when
attempting to clean up MSIs because the IRQ is still active from a
request_irq().
With the current parsing of mhi_flags, the following statement always
return false:
eob = !!(flags & MHI_EOB);
This is due to the fact that 'enum mhi_flags' starts with index 0 and we
are using direct AND operation to extract each bit. Fix this by using
BIT() macros for defining the flags so that the reset of the code need not
be touched.
Tomas Winkler [Tue, 28 Apr 2020 21:12:00 +0000 (00:12 +0300)]
mei: me: disable mei interface on LBG servers.
Disable the MEI driver on LBG SPS (server) platforms, some corner
flows such as recovery mode does not work, and the driver
doesn't have working use cases.
Joerg Roedel [Mon, 4 May 2020 12:54:13 +0000 (14:54 +0200)]
iommu/amd: Do not flush Device Table in iommu_map_page()
The flush of the Device Table Entries for the domain has already
happened in increase_address_space(), if necessary. Do no flush them
again in iommu_map_page().
Joerg Roedel [Mon, 4 May 2020 12:54:12 +0000 (14:54 +0200)]
iommu/amd: Update Device Table in increase_address_space()
The Device Table needs to be updated before the new page-table root
can be published in domain->pt_root. Otherwise a concurrent call to
fetch_pte might fetch a PTE which is not reachable through the Device
Table Entry.
Joerg Roedel [Mon, 4 May 2020 12:54:11 +0000 (14:54 +0200)]
iommu/amd: Call domain_flush_complete() in update_domain()
The update_domain() function is expected to also inform the hardware
about domain changes. This needs a COMPLETION_WAIT command to be sent
to all IOMMUs which use the domain.
Joerg Roedel [Mon, 4 May 2020 12:54:10 +0000 (14:54 +0200)]
iommu/amd: Do not loop forever when trying to increase address space
When increase_address_space() fails to allocate memory, alloc_pte()
will call it again until it succeeds. Do not loop forever while trying
to increase the address space and just return an error instead.
Joerg Roedel [Mon, 4 May 2020 12:54:09 +0000 (14:54 +0200)]
iommu/amd: Fix race in increase_address_space()/fetch_pte()
The 'pt_root' and 'mode' struct members of 'struct protection_domain'
need to be get/set atomically, otherwise the page-table of the domain
can get corrupted.
Merge the fields into one atomic64_t struct member which can be
get/set atomically.
When both CONFIG_DEBUG_FS and CONFIG_PM_SLEEP are disabled, the
functions that got moved out of the #ifdef section now cause
a warning:
drivers/platform/x86/intel_pmc_core.c:654:13: error: 'pmc_core_lpm_display' defined but not used [-Werror=unused-function]
654 | static void pmc_core_lpm_display(struct pmc_dev *pmcdev, struct device *dev,
| ^~~~~~~~~~~~~~~~~~~~
drivers/platform/x86/intel_pmc_core.c:617:13: error: 'pmc_core_slps0_display' defined but not used [-Werror=unused-function]
617 | static void pmc_core_slps0_display(struct pmc_dev *pmcdev, struct device *dev,
| ^~~~~~~~~~~~~~~~~~~~~~
Rather than add even more #ifdefs here, remove them entirely and
let the compiler work it out, it can actually get rid of all the
debugfs calls without problems as long as the struct member is
there.
The two PM functions just need a __maybe_unused annotations to avoid
another warning instead of the #ifdef.
Fixes: aae43c2bcdc1 ("platform/x86: intel_pmc_core: Relocate pmc_core_*_display() to outside of CONFIG_DEBUG_FS") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
Hans de Goede [Wed, 22 Apr 2020 22:05:59 +0000 (00:05 +0200)]
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
asus-nb-wmi does not add any extra functionality on these Asus
Transformer books. They have detachable keyboards, so the hotkeys are
send through a HID device (and handled by the hid-asus driver) and also
the rfkill functionality is not used on these devices.
Besides not adding any extra functionality, initializing the WMI interface
on these devices actually has a negative side-effect. For some reason
the \_SB.ATKD.INIT() function which asus_wmi_platform_init() calls drives
GPO2 (INT33FC:02) pin 8, which is connected to the front facing webcam LED,
high and there is no (WMI or other) interface to drive this low again
causing the LED to be permanently on, even during suspend.
This commit adds a blacklist of DMI system_ids on which not to load the
asus-nb-wmi and adds these Transformer books to this list. This fixes
the webcam LED being permanently on under Linux.
platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
Jasper Lake uses Icelake PCH IPs and the S0ix debug interfaces are same as
Icelake. It uses SLP_S0_DBG register latch/read interface from Icelake
generation. It doesn't use Tiger Lake LPM debug registers. Change the
Jasper Lake S0ix debug interface to use the ICL reg map.
Fixes: 16292bed9c56 ("platform/x86: intel_pmc_core: Add Atom based Jasper Lake (JSL) platform support") Signed-off-by: Archana Patni <[email protected]> Acked-by: David E. Box <[email protected]> Tested-by: Divagar Mohandass <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
According to the PMC Type C Subsystem (TCSS) Mux programming guide rev
0.6, when a device is transitioning to DP Alternate Mode state, if the
HPD_STATE (bit 7) field in the status update command VDO is set to
HPD_HIGH, the HPD_HIGH field in the Alternate Mode request “mode_data”
field (bit 14) should also be set. Ensure the bit is correctly handled
while issuing the Alternate Mode request.
On some architectures (e.g. arm64) requests for
IO coherent memory may use non-cachable attributes if
the relevant device isn't cache coherent. If these
pages are then remapped into userspace as cacheable,
they may not be coherent with the non-cacheable mappings.
In particular this happens with libusb, when it attempts
to create zero-copy buffers for use by rtl-sdr
(https://github.com/osmocom/rtl-sdr/). On low end arm
devices with non-coherent USB ports, the application will
be unexpectedly killed, while continuing to work fine on
arm machines with coherent USB controllers.
This bug has been discovered/reported a few times over
the last few years. In the case of rtl-sdr a compile time
option to enable/disable zero copy was implemented to
work around it.
Rather than relaying on application specific workarounds,
dma_mmap_coherent() can be used instead of remap_pfn_range().
The page cache/etc attributes will then be correctly set in
userspace to match the kernel mapping.
Alan Stern [Fri, 1 May 2020 20:07:28 +0000 (16:07 -0400)]
USB: core: Fix misleading driver bug report
The syzbot fuzzer found a race between URB submission to endpoint 0
and device reset. Namely, during the reset we call usb_ep0_reinit()
because the characteristics of ep0 may have changed (if the reset
follows a firmware update, for example). While usb_ep0_reinit() is
running there is a brief period during which the pointers stored in
udev->ep_in[0] and udev->ep_out[0] are set to NULL, and if an URB is
submitted to ep0 during that period, usb_urb_ep_type_check() will
report it as a driver bug. In the absence of those pointers, the
routine thinks that the endpoint doesn't exist. The log message looks
like this:
------------[ cut here ]------------
usb 2-1: BOGUS urb xfer, pipe 2 != type 2
WARNING: CPU: 0 PID: 9241 at drivers/usb/core/urb.c:478
usb_submit_urb+0x1188/0x1460 drivers/usb/core/urb.c:478
Now, although submitting an URB while the device is being reset is a
questionable thing to do, it shouldn't count as a driver bug as severe
as submitting an URB for an endpoint that doesn't exist. Indeed,
endpoint 0 always exists, even while the device is in its unconfigured
state.
To prevent these misleading driver bug reports, this patch updates
usb_disable_endpoint() to avoid clearing the ep_in[] and ep_out[]
pointers when the endpoint being disabled is ep0. There's no danger
of leaving a stale pointer in place, because the usb_host_endpoint
structure being pointed to is stored permanently in udev->ep0; it
doesn't get deallocated until the entire usb_device structure does.
Oscar Carter [Fri, 1 May 2020 15:51:18 +0000 (17:51 +0200)]
staging: gasket: Check the return value of gasket_get_bar_index()
Check the return value of gasket_get_bar_index function as it can return
a negative one (-EINVAL). If this happens, a negative index is used in
the "gasket_dev->bar_data" array.
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
In LPAR we will only get an intercept for FC==3 for the PQAP
instruction. Running nested under z/VM can result in other intercepts as
well as ECA_APIE is an effective bit: If one hypervisor layer has
turned this bit off, the end result will be that we will get intercepts for
all function codes. Usually the first one will be a query like PQAP(QCI).
So the WARN_ON_ONCE is not right. Let us simply remove it.
Linus Torvalds [Tue, 5 May 2020 01:55:20 +0000 (18:55 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- Wacom driver functional and regression fixes from Jason Gerecke
- race condition fix in usbhid, found by syzbot and fixed by Alan Stern
- a few device-specific quirks and ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
HID: mcp2221: add gpiolib dependency
HID: i2c-hid: reset Synaptics SYNA2393 on resume
HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT
HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
Revert "HID: wacom: generic: read the number of expected touches on a per collection basis"
HID: alps: ALPS_1657 is too specific; use U1_UNICORN_LEGACY instead
HID: alps: Add AUI1657 device ID
HID: logitech: Add support for Logitech G11 extra keys
HID: multitouch: add eGalaxTouch P80H84 support
HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices
Zong Li [Mon, 4 May 2020 03:54:48 +0000 (11:54 +0800)]
riscv: force __cpu_up_ variables to put in data section
Put __cpu_up_stack_pointer and __cpu_up_task_pointer in data section.
Currently, these two variables are put in bss section, there is a
potential risk that secondary harts get the uninitialized value before
main hart finishing the bss clearing. In this case, all secondary
harts would pass the waiting loop and enable the MMU before main hart
set up the page table.
This issue happens on random booting of multiple harts, which means
it will manifest for BBL and OpenSBI v0.6 (or older version). In OpenSBI
v0.7 (or higher version), we have HSM extension so all the secondary harts
are brought-up by Linux kernel in an orderly fashion. This means we don't
need this change for OpenSBI v0.7 (or higher version).