Junxiao Bi [Fri, 26 Jun 2020 03:29:40 +0000 (20:29 -0700)]
ocfs2: fix value of OCFS2_INVALID_SLOT
In the ocfs2 disk layout, slot number is 16 bits, but in ocfs2
implementation, slot number is 32 bits. Usually this will not cause any
issue, because slot number is converted from u16 to u32, but
OCFS2_INVALID_SLOT was defined as -1, when an invalid slot number from
disk was obtained, its value was (u16)-1, and it was converted to u32.
Then the following checking in get_local_system_inode will be always
skipped:
Junxiao Bi [Fri, 26 Jun 2020 03:29:37 +0000 (20:29 -0700)]
ocfs2: fix panic on nfs server over ocfs2
The following kernel panic was captured when running nfs server over
ocfs2, at that time ocfs2_test_inode_bit() was checking whether one
inode locating at "blkno" 5 was valid, that is ocfs2 root inode, its
"suballoc_slot" was OCFS2_INVALID_SLOT(65535) and it was allocted from
//global_inode_alloc, but here it wrongly assumed that it was got from per
slot inode alloctor which would cause array overflow and trigger kernel
panic.
Junxiao Bi [Fri, 26 Jun 2020 03:29:33 +0000 (20:29 -0700)]
ocfs2: load global_inode_alloc
Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE, that will
make it load during mount. It can be used to test whether some
global/system inodes are valid. One use case is that nfsd will test
whether root inode is valid.
Junxiao Bi [Fri, 26 Jun 2020 03:29:30 +0000 (20:29 -0700)]
ocfs2: avoid inode removal while nfsd is accessing it
Patch series "ocfs2: fix nfsd over ocfs2 issues", v2.
This is a series of patches to fix issues on nfsd over ocfs2. patch 1
is to avoid inode removed while nfsd access it patch 2 & 3 is to fix a
panic issue.
This patch (of 4):
When nfsd is getting file dentry using handle or parent dentry of some
dentry, one cluster lock is used to avoid inode removed from other node,
but it still could be removed from local node, so use a rw lock to avoid
this.
Lianbo Jiang [Fri, 26 Jun 2020 03:29:27 +0000 (20:29 -0700)]
kexec: do not verify the signature without the lockdown or mandatory signature
Signature verification is an important security feature, to protect
system from being attacked with a kernel of unknown origin. Kexec
rebooting is a way to replace the running kernel, hence need be secured
carefully.
In the current code of handling signature verification of kexec kernel,
the logic is very twisted. It mixes signature verification, IMA
signature appraising and kexec lockdown.
If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of
signature, the supported crypto, and key, we don't think this is wrong,
Unless kexec lockdown is executed. IMA is considered as another kind of
signature appraising method.
If kexec kernel image has signature/crypto/key, it has to go through the
signature verification and pass. Otherwise it's seen as verification
failure, and won't be loaded.
Seems kexec kernel image with an unqualified signature is even worse
than those w/o signature at all, this sounds very unreasonable. E.g.
If people get a unsigned kernel to load, or a kernel signed with expired
key, which one is more dangerous?
So, here, let's simplify the logic to improve code readability. If the
KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature
verification is mandated. Otherwise, we lift the bar for any kernel
image.
Vlastimil Babka [Fri, 26 Jun 2020 03:29:24 +0000 (20:29 -0700)]
mm, compaction: make capture control handling safe wrt interrupts
Hugh reports:
"While stressing compaction, one run oopsed on NULL capc->cc in
__free_one_page()'s task_capc(zone): compact_zone_order() had been
interrupted, and a page was being freed in the return from interrupt.
Though you would not expect it from the source, both gccs I was using
(4.8.1 and 7.5.0) had chosen to compile compact_zone_order() with the
".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
callq compact_zone - long after the "current->capture_control =
&capc". An interrupt in between those finds capc->cc NULL (zeroed by
an earlier rep stos).
This could presumably be fixed by a barrier() before setting
current->capture_control in compact_zone_order(); but would also need
more care on return from compact_zone(), in order not to risk leaking
a page captured by interrupt just before capture_control is reset.
Maybe that is the preferable fix, but I felt safer for task_capc() to
exclude the rather surprising possibility of capture at interrupt
time"
I have checked that gcc10 also behaves the same.
The advantage of fix in compact_zone_order() is that we don't add
another test in the page freeing hot path, and that it might prevent
future problems if we stop exposing pointers to uninitialized structures
in current task.
So this patch implements the suggestion for compact_zone_order() with
barrier() (and WRITE_ONCE() to prevent store tearing) for setting
current->capture_control, and prevents page leaking with
WRITE_ONCE/READ_ONCE in the proper order.
Michal Hocko [Fri, 26 Jun 2020 03:29:21 +0000 (20:29 -0700)]
mm: do_swap_page(): fix up the error code
do_swap_page() returns error codes from the VM_FAULT* space. try_charge()
might return -ENOMEM, though, and then do_swap_page() simply returns 0
which means a success.
We almost never return ENOMEM for GFP_KERNEL single page charge. Except
for async OOM handling (oom_disabled v1). So this needs translation to
VM_FAULT_OOM otherwise the the page fault path will not notify the
userspace and wait for an action.
Stafford Horne [Fri, 26 Jun 2020 03:29:17 +0000 (20:29 -0700)]
openrisc: fix boot oops when DEBUG_VM is enabled
Since v5.8-rc1 OpenRISC Linux fails to boot when DEBUG_VM is enabled.
This has been bisected to commit 42fc541404f2 ("mmap locking API: add
mmap_assert_locked() and mmap_assert_write_locked()").
The added locking checks exposed the issue that OpenRISC was not taking
this mmap lock when during page walks for DMA operations. This patch
locks and unlocks the mmap lock for page walking.
Harish [Thu, 25 Jun 2020 16:57:21 +0000 (22:27 +0530)]
selftests/powerpc: Fix build failure in ebb tests
We use OUTPUT directory as TMPOUT for checking no-pie option.
Since commit f2f02ebd8f38 ("kbuild: improve cc-option to clean up all
temporary files") when building powerpc/ from selftests directory, the
OUTPUT directory points to powerpc/pmu/ebb/ and gets removed when
checking for -no-pie option in try-run routine, subsequently build
fails with the following:
$ make -C powerpc
...
TARGET=ebb; BUILD_TARGET=$OUTPUT/$TARGET; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C $TARGET all
make[2]: Entering directory '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb'
make[2]: *** No rule to make target 'Makefile'.
make[2]: Failed to remake makefile 'Makefile'.
make[2]: *** No rule to make target 'ebb.c', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: *** No rule to make target 'ebb_handler.S', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: *** No rule to make target 'trace.c', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: *** No rule to make target 'busy_loop.S', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: Target 'all' not remade because of errors.
Fix this by adding a suffix to the OUTPUT directory so that the
failure is avoided.
1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.
2) The default crypto algorithm selection in Kconfig for IPSEC is out
of touch with modern reality, fix this up. From Eric Biggers.
3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
Nakryiko.
4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
Hangbin Liu.
5) Adjust packet alignment handling in ax88179_178a driver to match
what the hardware actually does. From Jeremy Kerr.
6) register_netdevice can leak in the case one of the notifiers fail,
from Yang Yingliang.
7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.
8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
Oltean.
9) tg3 driver can sleep forever when we get enough EEH errors, fix from
David Christensen.
10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
drivers, from Ciara Loftus.
11) Fix scanning loop break condition in of_mdiobus_register(), from
Florian Fainelli.
12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.
13) Endianness fix in mlxsw, from Ido Schimmel.
14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.
15) Missing bridge mrp configuration validation, from Horatiu Vultur.
16) Fix circular netns references in wireguard, from Jason A. Donenfeld.
17) PTP initialization on recovery is not done properly in qed driver,
from Alexander Lobakin.
18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
from Rahul Lakkireddy.
19) Don't clear bound device TX queue of socket prematurely otherwise we
get problems with ktls hw offloading, from Tariq Toukan.
20) ipset can do atomics on unaligned memory, fix from Russell King.
21) Align ethernet addresses properly in bridging code, from Thomas
Martitz.
22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
rds: transport module should be auto loaded when transport is set
sch_cake: fix a few style nits
sch_cake: don't call diffserv parsing code when it is not needed
sch_cake: don't try to reallocate or unshare skb unconditionally
ethtool: fix error handling in linkstate_prepare_data()
wil6210: account for napi_gro_receive never returning GRO_DROP
hns: do not cast return value of napi_gro_receive to null
socionext: account for napi_gro_receive never returning GRO_DROP
wireguard: receive: account for napi_gro_receive never returning GRO_DROP
vxlan: fix last fdb index during dump of fdb with nhid
sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
tc-testing: avoid action cookies with odd length.
bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
net: dsa: sja1105: fix tc-gate schedule with single element
net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
net: dsa: sja1105: unconditionally free old gating config
net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
net: macb: free resources on failure path of at91ether_open()
net: macb: call pm_runtime_put_sync on failure path
...
====================
sched: A couple of fixes for sch_cake
This series contains a couple of fixes for diffserv handling in sch_cake that
provide a nice speedup (with a somewhat pedantic nit fix tacked on to the end).
Not quite sure about whether this should go to stable; it does provide a nice
speedup, but it's not strictly a fix in the "correctness" sense. I lean towards
including this in stable as well, since our most important consumer of that
(OpenWrt) is likely to backport the series anyway.
====================
I spotted a few nits when comparing the in-tree version of sch_cake with
the out-of-tree one: A redundant error variable declaration shadowing an
outer declaration, and an indentation alignment issue. Fix both of these.
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
sch_cake: don't call diffserv parsing code when it is not needed
As a further optimisation of the diffserv parsing codepath, we can skip it
entirely if CAKE is configured to neither use diffserv-based
classification, nor to zero out the diffserv bits.
Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ilya Ponetayev [Thu, 25 Jun 2020 20:12:07 +0000 (22:12 +0200)]
sch_cake: don't try to reallocate or unshare skb unconditionally
cake_handle_diffserv() tries to linearize mac and network header parts of
skb and to make it writable unconditionally. In some cases it leads to full
skb reallocation, which reduces throughput and increases CPU load. Some
measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core
CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable()
reallocates skb, if skb was allocated in ethernet driver via so-called
'build skb' method from page cache (it was discovered by strange increase
of kmalloc-2048 slab at first).
Obtain DSCP value via read-only skb_header_pointer() call, and leave
linearization only for DSCP bleaching or ECN CE setting. And, as an
additional optimisation, skip diffserv parsing entirely if it is not needed
by the current configuration.
Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Ilya Ponetayev <[email protected]>
[ fix a few style issues, reflow commit message ] Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Michal Kubecek [Wed, 24 Jun 2020 22:09:08 +0000 (00:09 +0200)]
ethtool: fix error handling in linkstate_prepare_data()
When getting SQI or maximum SQI value fails in linkstate_prepare_data(), we
must not return without calling ethnl_ops_complete(dev) as that could
result in imbalance between ethtool_ops ->begin() and ->complete() calls.
Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)") Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Thu, 25 Jun 2020 23:16:49 +0000 (16:16 -0700)]
Merge tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Four small fixes:
- Fix a ringbuffer bug for nested events having time go backwards
- Fix a config dependency for boot time tracing to depend on
synthetic events instead of histograms.
- Fix trigger format parsing to handle multiple spaces
- Fix bootconfig to handle failures in multiple events"
* tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/boottime: Fix kprobe multiple events
tracing: Fix event trigger to accept redundant spaces
tracing/boot: Fix config dependency for synthedic event
ring-buffer: Zero out time extend if it is nested and not absolute
====================
napi_gro_receive caller return value cleanups
In 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
napi_gro_receive()"), the GRO_NORMAL case stopped calling
netif_receive_skb_internal, checking its return value, and returning
GRO_DROP in case it failed. Instead, it calls into
netif_receive_skb_list_internal (after a bit of indirection), which
doesn't return any error. Therefore, napi_gro_receive will never return
GRO_DROP, making handling GRO_DROP dead code.
I emailed the author of 6570bc79c0df on netdev [1] to see if this change
was intentional, but the dlink.ru email address has been disconnected,
and looking a bit further myself, it seems somewhat infeasible to start
propagating return values backwards from the internal machinations of
netif_receive_skb_list_internal.
Taking a look at all the callers of napi_gro_receive, it appears that
three are checking the return value for the purpose of comparing it to
the now never-happening GRO_DROP, and one just casts it to (void), a
likely historical leftover. Every other of the 120 callers does not
bother checking the return value.
And it seems like these remaining 116 callers are doing the right thing:
after calling napi_gro_receive, the packet is now in the hands of the
upper layers of the newtworking, and the device driver itself has no
business now making decisions based on what the upper layers choose to
do. Incrementing stats counters on GRO_DROP seems like a mistake, made
by these three drivers, but not by the remaining 117.
It would seem, therefore, that after rectifying these four callers of
napi_gro_receive, that I should go ahead and just remove returning the
value from napi_gro_receive all together. However, napi_gro_receive has
a function event tracer, and being able to introspect into the
networking stack to see how often napi_gro_receive is returning whatever
interesting GRO status (aside from _DROP) remains an interesting
data point worth keeping for debugging.
So, this series simply gets rid of the return value checking for the
four useless places where that check never evaluates to anything
meaningful.
wil6210: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands. In this case, too, the non-gro path didn't bother checking
the return value. Plus, this had some clunky debugging functions that
duplicated code from elsewhere and was generally pretty messy. So, this
commit cleans that all up too.
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
socionext: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
wireguard: receive: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Roopa Prabhu [Wed, 24 Jun 2020 21:02:36 +0000 (14:02 -0700)]
vxlan: fix last fdb index during dump of fdb with nhid
This patch fixes last saved fdb index in fdb dump handler when
handling fdb's with nhid.
Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
If a socket is set ipv6only, it will still send IPv4 addresses in the
INIT and INIT_ACK packets. This potentially misleads the peer into using
them, which then would cause association termination.
The fix is to not add IPv4 addresses to ipv6only sockets.
Briana Oursler [Wed, 24 Jun 2020 19:29:14 +0000 (12:29 -0700)]
tc-testing: avoid action cookies with odd length.
Update odd length cookie hexstrings in csum.json, tunnel_key.json and
bpf.json to be even length to comply with check enforced in commit 0149dabf2a1b ("tc: m_actions: check cookie hexstring len") in iproute2.
====================
tcp_cubic: fix spurious HYSTART_DELAY on RTT decrease
This series fixes a long-standing bug in the TCP CUBIC
HYSTART_DELAY mechanim recently reported by Mirja Kuehlewind. The
code can cause a spurious exit of slow start in some particular
cases: upon an RTT decrease that happens on the 9th or later ACK
in a round trip. This series fixes the original Hystart code and
also the recent BPF implementation.
====================
Neal Cardwell [Wed, 24 Jun 2020 16:42:03 +0000 (12:42 -0400)]
bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
Apply the fix from:
"tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT"
to the BPF implementation of TCP CUBIC congestion control.
Repeating the commit description here for completeness:
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:
o The first 8 RTT samples in a round trip are 150ms, resulting in a
curr_rtt of 150ms and a delay_min of 150ms.
o The 9th RTT sample is 100ms. The curr_rtt does not change after the
first 8 samples, so curr_rtt remains 150ms. But delay_min can be
lowered at any time, so delay_min falls to 100ms. The code executes
the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
of 100ms, and the curr_rtt is declared far enough above delay_min to
force a (spurious) exit of Slow start.
The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.
Neal Cardwell [Wed, 24 Jun 2020 16:42:02 +0000 (12:42 -0400)]
tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:
o The first 8 RTT samples in a round trip are 150ms, resulting in a
curr_rtt of 150ms and a delay_min of 150ms.
o The 9th RTT sample is 100ms. The curr_rtt does not change after the
first 8 samples, so curr_rtt remains 150ms. But delay_min can be
lowered at any time, so delay_min falls to 100ms. The code executes
the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
of 100ms, and the curr_rtt is declared far enough above delay_min to
force a (spurious) exit of Slow start.
The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.
====================
Fixes for SJA1105 DSA tc-gate action
This small series fixes 2 bugs in the tc-gate implementation:
1. The TAS state machine keeps getting rescheduled even after removing
tc-gate actions on all ports.
2. tc-gate actions with only one gate control list entry are installed
to hardware with an incorrect interval of zero, which makes the
switch erroneously drop those packets (since the configuration is
invalid).
To keep the code palatable, a forward-declaration was avoided by moving
some code around in patch 1/4. I hope that isn't too much of an issue.
====================
Vladimir Oltean [Wed, 24 Jun 2020 13:54:47 +0000 (16:54 +0300)]
net: dsa: sja1105: fix tc-gate schedule with single element
The sja1105_gating_cfg_time_to_interval function does this, as per the
comments:
/* The gate entries contain absolute times in their e->interval field. Convert
* that to proper intervals (i.e. "0, 5, 10, 15" to "5, 5, 5, 5").
*/
To perform that task, it iterates over gating_cfg->entries, at each step
updating the interval of the _previous_ entry. So one interval remains
to be updated at the end of the loop: the last one (since it isn't
"prev" for anyone else).
But there was an erroneous check, that the last element's interval
should not be updated if it's also the only element. I'm not quite sure
why that check was there, but it's clearly incorrect, as a tc-gate
schedule with a single element would get an e->interval of zero,
regardless of the duration requested by the user. The switch wouldn't
even consider this configuration as valid: it will just drop all traffic
that matches the rule.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links") Reported-by: Xiaoliang Yang <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:46 +0000 (16:54 +0300)]
net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
Currently, tas_data->enabled would remain true even after deleting all
tc-gate rules from the switch ports, which would cause the
sja1105_tas_state_machine to get unnecessarily scheduled.
Also, if there were any errors which would prevent the hardware from
enabling the gating schedule, the sja1105_tas_state_machine would
continuously detect and print that, spamming the kernel log, even if the
rules were subsequently deleted.
The rules themselves are _not_ active, because sja1105_init_scheduling
does enough of a job to not install the gating schedule in the static
config. But the virtual link rules themselves are still present.
So call the functions that remove the tc-gate configuration from
priv->tas_data.gating_cfg, so that tas_data->enabled can be set to
false, and sja1105_tas_state_machine will stop from being scheduled.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:45 +0000 (16:54 +0300)]
net: dsa: sja1105: unconditionally free old gating config
Currently sja1105_compose_gating_subschedule is not prepared to be
called for the case where we want to recompute the global tc-gate
configuration after we've deleted those actions on a port.
After deleting the tc-gate actions on the last port, max_cycle_time
would become zero, and that would incorrectly prevent
sja1105_free_gating_config from getting called.
So move the freeing function above the check for the need to apply a new
configuration.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Wed, 24 Jun 2020 13:54:44 +0000 (16:54 +0300)]
net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
It turns out that sja1105_compose_gating_subschedule must also be called
from sja1105_vl_delete, to recalculate the overall tc-gate
configuration. Currently this is not possible without introducing a
forward declaration. So move the function at the top of the file, along
with its dependencies.
Claudiu Beznea [Wed, 24 Jun 2020 10:08:18 +0000 (13:08 +0300)]
net: macb: free resources on failure path of at91ether_open()
DMA buffers were not freed on failure path of at91ether_open().
Along with changes for freeing the DMA buffers the enable/disable
interrupt instructions were moved to at91ether_start()/at91ether_stop()
functions and the operations on at91ether_stop() were done in
their reverse order (compared with how is done in at91ether_start()):
before this patch the operation order on interface open path
was as follows:
1/ alloc DMA buffers
2/ enable tx, rx
3/ enable interrupts
and the order on interface close path was as follows:
1/ disable tx, rx
2/ disable interrupts
3/ free dma buffers.
Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Claudiu Beznea <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Claudiu Beznea [Wed, 24 Jun 2020 10:08:17 +0000 (13:08 +0300)]
net: macb: call pm_runtime_put_sync on failure path
Call pm_runtime_put_sync() on failure path of at91ether_open.
Fixes: e6a41c23df0d ("net: macb: ensure interface is not suspended on at91rm9200") Signed-off-by: Claudiu Beznea <[email protected]> Signed-off-by: David S. Miller <[email protected]>
"ra:0x3ff449e954" is the return address of "call _mcount" in the
prologue of __vdso_gettimeofday(). Without proper relocate, pc jmp
to 0x0000003ff449e000 (vdso map base) with a illegal instruction
trap.
The solution comes from arch/arm64/kernel/vdso/Makefile:
Eddie James [Tue, 9 Jun 2020 20:15:54 +0000 (15:15 -0500)]
i2c: fsi: Fix the port number field in status register
The port number field in the status register was not correct, so fix it.
Fixes: d6ffb6300116 ("i2c: Add FSI-attached I2C master algorithm") Signed-off-by: Eddie James <[email protected]> Signed-off-by: Joel Stanley <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
Vincent Chen [Tue, 23 Jun 2020 05:40:21 +0000 (13:40 +0800)]
riscv: Add extern declarations for vDSO time-related functions
Add extern declarations for vDSO time-related functions to notify the
compiler these functions will be used in somewhere to avoid
"no previous prototype" compile warning.
Vincent Chen [Tue, 23 Jun 2020 01:24:17 +0000 (09:24 +0800)]
clk: sifive: allocate sufficient memory for struct __prci_data
The (struct __prci_data).hw_clks.hws is an array with dynamic elements.
Using struct_size(pd, hw_clks.hws, ARRAY_SIZE(__prci_init_clocks))
instead of sizeof(*pd) to get the correct memory size of
struct __prci_data for sifive/fu540-prci. After applying this
modifications, the kernel runs smoothly with CONFIG_SLAB_FREELIST_RANDOM
enabled on the HiFive unleashed board.
Fixes: 30b8e27e3b58 ("clk: sifive: add a driver for the SiFive FU540 PRCI IP block") Signed-off-by: Vincent Chen <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
Vincent Chen [Tue, 23 Jun 2020 01:13:22 +0000 (09:13 +0800)]
riscv: Add -fPIC option to CFLAGS_vgettimeofday.o
The time related vDSO functions use a variable, vdso_data, to access the
vDSO data page to get the system time information. Because the vdso_data
for CFLAGS_vgettimeofday.o is an external variable defined in vdso.o,
the CFLAGS_vgettimeofday.o should be compiled with -fPIC to ensure
that vdso_data is addressable.
Linus Torvalds [Thu, 25 Jun 2020 20:02:58 +0000 (13:02 -0700)]
Merge tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixlet from Jan Kara:
"A performance improvement to reduce impact of fsnotify for inodes
where it isn't used"
* tag 'fsnotify_for_v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs: Do not check if there is a fsnotify watcher on pseudo inodes
The following patchset contains Netfilter fixes for net, they are:
1) Unaligned atomic access in ipset, from Russell King.
2) Missing module description, from Rob Gill.
3) Patches to fix a module unload causing NULL pointer dereference in
xtables, from David Wilder. For the record, I posting here his cover
letter explaining the problem:
A crash happened on ppc64le when running ltp network tests triggered by
"rmmod iptable_mangle".
See previous discussion in this thread:
https://lists.openwall.net/netdev/2020/06/03/161 .
In the crash I found in iptable_mangle_hook() that
state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference.
net->ipv4.iptable_mangle is set to NULL in +iptable_mangle_net_exit() and
called when ip_mangle modules is unloaded. A rmmod task was found running
in the crash dump. A 2nd crash showed the same problem when running
"rmmod iptable_filter" (net->ipv4.iptable_filter=NULL).
To fix this I added .pre_exit hook in all iptable_foo.c. The pre_exit will
un-register the underlying hook and exit would do the table freeing. The
netns core does an unconditional +synchronize_rcu after the pre_exit hooks
insuring no packets are in flight that have picked up the pointer before
completing the un-register.
These patches include changes for both iptables and ip6tables.
We tested this fix with ltp running iptables01.sh and iptables01.sh -6 a
loop for 72 hours.
4) Add a selftest for conntrack helper assignment, from Florian Westphal.
====================
Thomas Martitz [Thu, 25 Jun 2020 12:26:03 +0000 (14:26 +0200)]
net: bridge: enfore alignment for ethernet address
The eth_addr member is passed to ether_addr functions that require
2-byte alignment, therefore the member must be properly aligned
to avoid unaligned accesses.
Linus Torvalds [Thu, 25 Jun 2020 19:38:09 +0000 (12:38 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Several regression fixes from work that landed in the merge window,
particularly in the mlx5 driver:
- Various static checker and warning fixes
- General bug fixes in rvt, qedr, hns, mlx5 and hfi1
- Several regression fixes related to the ECE and QP changes in last
cycle
- Fixes for a few long standing crashers in CMA, uverbs ioctl, and
xrc"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
IB/hfi1: Add atomic triggered sleep/wakeup
IB/hfi1: Correct -EBUSY handling in tx code
IB/hfi1: Fix module use count flaw due to leftover module put calls
IB/hfi1: Restore kfree in dummy_netdev cleanup
IB/mad: Fix use after free when destroying MAD agent
RDMA/mlx5: Protect from kernel crash if XRC_TGT doesn't have udata
RDMA/counter: Query a counter before release
RDMA/mad: Fix possible memory leak in ib_mad_post_receive_mads()
RDMA/mlx5: Fix integrity enabled QP creation
RDMA/mlx5: Remove ECE limitation from the RAW_PACKET QPs
RDMA/mlx5: Fix remote gid value in query QP
RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path
RDMA/core: Check that type_attrs is not NULL prior access
RDMA/hns: Fix an cmd queue issue when resetting
RDMA/hns: Fix a calltrace when registering MR from userspace
RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
RDMA/cma: Protect bind_list and listen_list while finding matching cm id
RDMA/qedr: Fix KASAN: use-after-free in ucma_event_handler+0x532
RDMA/efa: Set maximum pkeys device attribute
RDMA/rvt: Fix potential memory leak caused by rvt_alloc_rq
...
// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
// Since CWR was sent, this packet should NOT have ECE set
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script
In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window. To avoid that
we can check CWR on pure ACKs received.
v3:
- Add a sequence check to avoid sending an ACK to an ACK
v2:
- Adjusted the comment
- move CWR check before checking for unacknowledged packets
Ard Biesheuvel [Thu, 25 Jun 2020 07:18:16 +0000 (09:18 +0200)]
net: phy: mscc: avoid skcipher API for single block AES encryption
The skcipher API dynamically instantiates the transformation object
on request that implements the requested algorithm optimally on the
given platform. This notion of optimality only matters for cases like
bulk network or disk encryption, where performance can be a bottleneck,
or in cases where the algorithm itself is not known at compile time.
In the mscc case, we are dealing with AES encryption of a single
block, and so neither concern applies, and we are better off using
the AES library interface, which is lightweight and safe for this
kind of use.
Note that the scatterlist API does not permit references to buffers
that are located on the stack, so the existing code is incorrect in
any case, but avoiding the skcipher and scatterlist APIs entirely is
the most straight-forward approach to fixing this.
Jens Axboe [Thu, 25 Jun 2020 18:57:17 +0000 (12:57 -0600)]
Merge branch 'nvme-5.8' of git://git.infradead.org/nvme into block-5.8
Pull NVMe fixes from Christoph.
* 'nvme-5.8' of git://git.infradead.org/nvme:
nvme-multipath: fix bogus request queue reference put
nvme-multipath: fix deadlock due to head->lock
nvme: don't protect ns mutation with ns->head->lock
nvme-multipath: fix deadlock between ana_work and scan_work
nvme: fix possible deadlock when I/O is blocked
nvme-rdma: assign completion vector correctly
nvme-loop: initialize tagset numa value to the value of the ctrl
nvme-tcp: initialize tagset numa value to the value of the ctrl
nvme-pci: initialize tagset numa value to the value of the ctrl
nvme-pci: override the value of the controller's numa node
nvme: set initial value for controller's numa node
Alex Williamson [Thu, 25 Jun 2020 17:04:23 +0000 (11:04 -0600)]
vfio/pci: Fix SR-IOV VF handling with MMIO blocking
SR-IOV VFs do not implement the memory enable bit of the command
register, therefore this bit is not set in config space after
pci_enable_device(). This leads to an unintended difference
between PF and VF in hand-off state to the user. We can correct
this by setting the initial value of the memory enable bit in our
virtualized config space. There's really no need however to
ever fault a user on a VF though as this would only indicate an
error in the user's management of the enable bit, versus a PF
where the same access could trigger hardware faults.
Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Alex Williamson <[email protected]>
Linus Torvalds [Thu, 25 Jun 2020 16:24:28 +0000 (09:24 -0700)]
Merge tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix kernel crash on system call single stepping.
- Make sure early program check handler is executed with DAT on to
avoid an endless program check loop.
- Add __GFP_NOWARN flag to debug feature to avoid user triggerable
allocation failure messages.
* tag 's390-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/debug: avoid kernel warning on too large number of pages
s390/kasan: fix early pgm check handler execution
s390: fix system call single stepping
Linus Torvalds [Thu, 25 Jun 2020 16:15:24 +0000 (09:15 -0700)]
Merge tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes gathered in the last two weeks.
The major changes here are fixes for the recent DPCM regressions found
on i.MX and Qualcomm platforms and fixes for resource leaks in ASoC
DAI registrations.
Other than those are mostly device-specific fixes including the usual
USB- and HD-audio quirks, and a fix for syzkaller case and ID updates
for new Intel platforms"
* tag 'sound-5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits)
ALSA: usb-audio: Fix OOB access of mixer element list
ALSA: usb-audio: add quirk for Samsung USBC Headset (AKG)
ALSA: usb-audio: Add registration quirk for Kingston HyperX Cloud Flight S
ASoC: rockchip: Fix a reference count leak.
ASoC: amd: closing specific instance.
ALSA: hda: Intel: add missing PCI IDs for ICL-H, TGL-H and EKL
ASoC: hdac_hda: fix memleak with regmap not freed on remove
ASoC: SOF: Intel: add PCI IDs for ICL-H and TGL-H
ASoC: SOF: Intel: add PCI ID for CometLake-S
ASoC: Intel: SOF: merge COMETLAKE_LP and COMETLAKE_H
ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems
ALSA: usb-audio: Fix potential use-after-free of streams
ALSA: hda/realtek - Add quirk for MSI GE63 laptop
ASoC: fsl_ssi: Fix bclk calculation for mono channel
ASoC: SOF: Intel: hda: Clear RIRB status before reading WP
ASoC: rt1015: Update rt1015 default register value according to spec modification.
ASoC: qcom: common: set correct directions for dailinks
ASoc: q6afe: add support to get port direction
ASoC: soc-pcm: fix checks for multi-cpu FE dailinks
ASoC: rt5682: Let dai clks be registered whether mclk exists or not
...
Peter Zijlstra [Mon, 15 Jun 2020 16:24:27 +0000 (18:24 +0200)]
rcu: Fixup noinstr warnings
A KCSAN build revealed we have explicit annoations through atomic_*()
usage, switch to arch_atomic_*() for the respective functions.
vmlinux.o: warning: objtool: rcu_nmi_exit()+0x4d: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_dynticks_eqs_enter()+0x25: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_nmi_enter()+0x4f: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: rcu_dynticks_eqs_exit()+0x2a: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: __rcu_is_watching()+0x25: call to __kcsan_check_access() leaves .noinstr.text section
Additionally, without the NOP in instrumentation_begin(), objtool would
not detect the lack of the 'else instrumentation_begin();' branch in
rcu_nmi_enter().
Peter Zijlstra [Thu, 25 Jun 2020 13:55:14 +0000 (15:55 +0200)]
locking/atomics: Provide the arch_atomic_ interface to generic code
Architectures with instrumented (KASAN/KCSAN) atomic operations
natively provide arch_atomic_ variants that are not instrumented.
It turns out that some generic code also requires arch_atomic_ in
order to avoid instrumentation, so provide the arch_atomic_ interface
as a direct map into the regular atomic_ interface for
non-instrumented architectures.
Pavel Begunkov [Thu, 25 Jun 2020 09:37:11 +0000 (12:37 +0300)]
io_uring: fix current->mm NULL dereference on exit
Don't reissue requests from io_iopoll_reap_events(), the task may not
have mm, which ends up with NULL. It's better to kill everything off on
exit anyway.
Pavel Begunkov [Thu, 25 Jun 2020 09:37:10 +0000 (12:37 +0300)]
io_uring: fix hanging iopoll in case of -EAGAIN
io_do_iopoll() won't do anything with a request unless
req->iopoll_completed is set. So io_complete_rw_iopoll() has to set
it, otherwise io_do_iopoll() will poll a file again and again even
though the request of interest was completed long time ago.
Also, remove -EAGAIN check from io_issue_sqe() as it races with
the changed lines. The request will take the long way and be
resubmitted from io_iopoll*().
io_kiocb's result and iopoll_completed")
Fixes: bbde017a32b3 ("io_uring: add memory barrier to synchronize Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
cpuidle: Rearrange s2idle-specific idle state entry code
Implement call_cpuidle_s2idle() in analogy with call_cpuidle()
for the s2idle-specific idle state entry and invoke it from
cpuidle_idle_call() to make the s2idle-specific idle entry code
path look more similar to the "regular" idle entry one.
Sumit Garg [Thu, 4 Jun 2020 10:01:18 +0000 (15:31 +0530)]
kdb: Make kdb_printf() console handling more robust
While rounding up CPUs via NMIs, its possible that a rounded up CPU
maybe holding a console port lock leading to kgdb master CPU stuck in
a deadlock during invocation of console write operations. A similar
deadlock could also be possible while using synchronous breakpoints.
So in order to avoid such a deadlock, set oops_in_progress to encourage
the console drivers to disregard their internal spin locks: in the
current calling context the risk of deadlock is a bigger problem than
risks due to re-entering the console driver. We operate directly on
oops_in_progress rather than using bust_spinlocks() because the calls
bust_spinlocks() makes on exit are not appropriate for this calling
context.
====================
net: bcmgenet: use hardware padding of runt frames
Now that scatter-gather and tx-checksumming are enabled by default
it revealed a packet corruption issue that can occur for very short
fragmented packets.
When padding these frames to the minimum length it is possible for
the non-linear (fragment) data to be added to the end of the linear
header in an SKB. Since the number of fragments is read before the
padding and used afterward without reloading, the fragment that
should have been consumed can be tacked on in place of part of the
padding.
The third commit in this set corrects this by removing the software
padding and allowing the hardware to add the pad bytes if necessary.
The first two commits resolve warnings observed by the kbuild test
robot and are included here for simplicity of application.
====================
Doug Berger [Thu, 25 Jun 2020 01:14:55 +0000 (18:14 -0700)]
net: bcmgenet: use hardware padding of runt frames
When commit 474ea9cafc45 ("net: bcmgenet: correctly pad short
packets") added the call to skb_padto() it should have been
located before the nr_frags parameter was read since that value
could be changed when padding packets with lengths between 55
and 59 bytes (inclusive).
The use of a stale nr_frags value can cause corruption of the
pad data when tx-scatter-gather is enabled. This corruption of
the pad can cause invalid checksum computation when hardware
offload of tx-checksum is also enabled.
Since the original reason for the padding was corrected by
commit 7dd399130efb ("net: bcmgenet: fix skb_len in
bcmgenet_xmit_single()") we can remove the software padding all
together and make use of hardware padding of short frames as
long as the hardware also always appends the FCS value to the
frame.
Fixes: 474ea9cafc45 ("net: bcmgenet: correctly pad short packets") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Doug Berger [Thu, 25 Jun 2020 01:14:54 +0000 (18:14 -0700)]
net: bcmgenet: use __be16 for htons(ETH_P_IP)
The 16-bit value that holds a short in network byte order should
be declared as a restricted big endian type to allow type checks
to succeed during assignment.
Doug Berger [Thu, 25 Jun 2020 01:14:53 +0000 (18:14 -0700)]
net: bcmgenet: re-remove bcmgenet_hfb_add_filter
This function was originally removed by Baoyou Xie in
commit e2072600a241 ("net: bcmgenet: remove unused function in
bcmgenet.c") to prevent a build warning.
Some of the functions removed by Baoyou Xie are now used for
WAKE_FILTER support so his commit was reverted, but this function
is still unused and the kbuild test robot dutifully reported the
warning.
This commit once again removes the remaining unused hfb functions.
Fixes: 14da1510fedc ("Revert "net: bcmgenet: remove unused function in bcmgenet.c"") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Thu, 25 Jun 2020 00:39:30 +0000 (17:39 -0700)]
Merge tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Fix a regression which uses potential uninitialized high 32-bit value
unexpectedly recently observed with specific compiler options"
* tag 'erofs-for-5.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
Florian Westphal [Mon, 22 Jun 2020 08:28:32 +0000 (10:28 +0200)]
selftests: netfilter: add test case for conntrack helper assignment
check that 'nft ... ct helper set <foo>' works:
1. configure ftp helper via nft and assign it to
connections on port 2121
2. check with 'conntrack -L' that the next connection
has the ftp helper attached to it.
David Wilder [Mon, 22 Jun 2020 17:10:13 +0000 (10:10 -0700)]
netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit helpers.
The pre_exit will un-register the underlying hook and .exit will do
the table freeing. The netns core does an unconditional synchronize_rcu
after the pre_exit hooks insuring no packets are in flight that have
picked up the pointer before completing the un-register.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
David Wilder [Mon, 22 Jun 2020 17:10:11 +0000 (10:10 -0700)]
netfilter: iptables: Split ipt_unregister_table() into pre_exit and exit helpers.
The pre_exit will un-register the underlying hook and .exit will do the
table freeing. The netns core does an unconditional synchronize_rcu after
the pre_exit hooks insuring no packets are in flight that have picked up
the pointer before completing the un-register.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Russell King [Wed, 10 Jun 2020 20:51:11 +0000 (21:51 +0100)]
netfilter: ipset: fix unaligned atomic access
When using ip_set with counters and comment, traffic causes the kernel
to panic on 32-bit ARM:
Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
Unhandled fault: alignment exception (0x221) at 0xea08133c
PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
The problem occurs when we try to update the 64-bit counters - the
faulting address above is not 64-bit aligned. The problem occurs
due to the way elements are allocated, for example:
If the element has a requirement for a member to be 64-bit aligned,
and set->dsize is not a multiple of 8, but is a multiple of four,
then every odd numbered elements will be misaligned - and hitting
an atomic64_add() on that element will cause the kernel to panic.
ip_set_elem_len() must return a size that is rounded to the maximum
alignment of any extension field stored in the element. This change
ensures that is the case.
Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment") Signed-off-by: Russell King <[email protected]> Acked-by: Jozsef Kadlecsik <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Bernard Zhao [Sat, 20 Jun 2020 09:11:52 +0000 (17:11 +0800)]
drm/amd: fix potential memleak in err branch
The function kobject_init_and_add alloc memory like:
kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs
->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller
->kmalloc_slab, in err branch this memory not free. If use
kmemleak, this path maybe catched.
These changes are to add kobject_put in kobject_init_and_add
failed branch, fix potential memleak.
Stylon Wang [Fri, 12 Jun 2020 11:04:18 +0000 (19:04 +0800)]
drm/amd/display: Fix ineffective setting of max bpc property
[Why]
Regression was introduced where setting max bpc property has no effect
on the atomic check and final commit. It has the same effect as max bpc
being stuck at 8.
[How]
Correctly propagate max bpc with the new connector state.
Colin Ian King [Wed, 24 Jun 2020 10:13:02 +0000 (11:13 +0100)]
qed: add missing error test for DBG_STATUS_NO_MATCHING_FRAMING_MODE
The error DBG_STATUS_NO_MATCHING_FRAMING_MODE was added to the enum
enum dbg_status however there is a missing corresponding entry for
this in the array s_status_str. This causes an out-of-bounds read when
indexing into the last entry of s_status_str. Fix this by adding in
the missing entry.
Addresses-Coverity: ("Out-of-bounds read"). Fixes: 2d22bc8354b1 ("qed: FW 8.42.2.0 debug features") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
====================
net: phy: call phy_disable_interrupts() in phy_init_hw()
We face an issue with rtl8211f, a pin is shared between INTB and PMEB,
and the PHY Register Accessible Interrupt is enabled by default, so
the INTB/PMEB pin is always active in polling mode case.
As Heiner pointed out "I was thinking about calling
phy_disable_interrupts() in phy_init_hw(), to have a defined init
state as we don't know in which state the PHY is if the PHY driver is
loaded. We shouldn't assume that it's the chip power-on defaults, BIOS
or boot loader could have changed this. Or in case of dual-boot
systems the other OS could leave the PHY in whatever state."
patch1 makes phy_disable_interrupts() non-static so that it could be used
in phy_init_hw() to have a defined init state.
patch2 calls phy_disable_interrupts() in phy_init_hw() to have a
defined init state.
Since v3:
- call phy_disable_interrupts() have interrupts disabled first then
config_init, thank Florian
Since v2:
- Don't export phy_disable_interrupts() but just make it non-static
Since v1:
- EXPORT the correct symbol
====================
Jisheng Zhang [Wed, 24 Jun 2020 07:59:23 +0000 (15:59 +0800)]
net: phy: call phy_disable_interrupts() in phy_init_hw()
Call phy_disable_interrupts() in phy_init_hw() to "have a defined init
state as we don't know in which state the PHY is if the PHY driver is
loaded. We shouldn't assume that it's the chip power-on defaults, BIOS
or boot loader could have changed this. Or in case of dual-boot
systems the other OS could leave the PHY in whatever state." as pointed
out by Heiner.
Jisheng Zhang [Wed, 24 Jun 2020 07:58:24 +0000 (15:58 +0800)]
net: phy: make phy_disable_interrupts() non-static
We face an issue with rtl8211f, a pin is shared between INTB and PMEB,
and the PHY Register Accessible Interrupt is enabled by default, so
the INTB/PMEB pin is always active in polling mode case.
As Heiner pointed out "I was thinking about calling
phy_disable_interrupts() in phy_init_hw(), to have a defined init
state as we don't know in which state the PHY is if the PHY driver is
loaded. We shouldn't assume that it's the chip power-on defaults, BIOS
or boot loader could have changed this. Or in case of dual-boot
systems the other OS could leave the PHY in whatever state."
Make phy_disable_interrupts() non-static so that it could be used in
phy_init_hw() to have a defined init state.
Sascha Hauer [Wed, 24 Jun 2020 07:00:45 +0000 (09:00 +0200)]
net: ethernet: mvneta: Add back interface mode validation
When writing the serdes configuration register was moved to
mvneta_config_interface() the whole code block was removed from
mvneta_port_power_up() in the assumption that its only purpose was to
write the serdes configuration register. As mentioned by Russell King
its purpose was also to check for valid interface modes early so that
later in the driver we do not have to care for unexpected interface
modes.
Add back the test to let the driver bail out early on unhandled
interface modes.
Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy") Signed-off-by: Sascha Hauer <[email protected]> Reviewed-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Sascha Hauer [Wed, 24 Jun 2020 07:00:44 +0000 (09:00 +0200)]
net: ethernet: mvneta: Do not error out in non serdes modes
In mvneta_config_interface() the RGMII modes are catched by the default
case which is an error return. The RGMII modes are valid modes for the
driver, so instead of returning an error add a break statement to return
successfully.
This avoids this warning for non comphy SoCs which use RGMII, like
SolidRun Clearfog:
WARNING: CPU: 0 PID: 268 at drivers/net/ethernet/marvell/mvneta.c:3512 mvneta_start_dev+0x220/0x23c
Fixes: b4748553f53f ("net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy") Signed-off-by: Sascha Hauer <[email protected]> Reviewed-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Daniel Mack [Sat, 20 Jun 2020 19:39:25 +0000 (21:39 +0200)]
dsa: Allow forwarding of redirected IGMP traffic
The driver for Marvell switches puts all ports in IGMP snooping mode
which results in all IGMP/MLD frames that ingress on the ports to be
forwarded to the CPU only.
The bridge code in the kernel can then interpret these frames and act
upon them, for instance by updating the mdb in the switch to reflect
multicast memberships of stations connected to the ports. However,
the IGMP/MLD frames must then also be forwarded to other ports of the
bridge so external IGMP queriers can track membership reports, and
external multicast clients can receive query reports from foreign IGMP
queriers.
Currently, this is impossible as the EDSA tagger sets offload_fwd_mark
on the skb when it unwraps the tagged frames, and that will make the
switchdev layer prevent the skb from egressing on any other port of
the same switch.
To fix that, look at the To_CPU code in the DSA header and make
forwarding of the frame possible for trapped IGMP packets.
Introduce some #defines for the frame types to make the code a bit more
comprehensive.
Lorenzo Bianconi [Tue, 23 Jun 2020 16:33:15 +0000 (18:33 +0200)]
openvswitch: take into account de-fragmentation/gso_size in execute_check_pkt_len
ovs connection tracking module performs de-fragmentation on incoming
fragmented traffic. Take info account if traffic has been de-fragmented
in execute_check_pkt_len action otherwise we will perform the wrong
nested action considering the original packet size. This issue typically
occurs if ovs-vswitchd adds a rule in the pipeline that requires connection
tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action.
Moreover take into account GSO fragment size for GSO packet in
execute_check_pkt_len routine
Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len") Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Wed, 24 Jun 2020 21:26:28 +0000 (14:26 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Fixes all over the place.
This includes a couple of tests that I would normally defer, but since
they have already been helpful in catching some bugs, don't build for
any users at all, and having them upstream makes life easier for
everyone, I think it's ok even at this late stage"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: Use tools/include/list.h instead of stubs
tools/virtio: Reset index in virtio_test --reset.
tools/virtio: Extract virtqueue initialization in vq_reset
tools/virtio: Use __vring_new_virtqueue in virtio_test.c
tools/virtio: Add --reset
tools/virtio: Add --batch=random option
tools/virtio: Add --batch option
virtio-mem: add memory via add_memory_driver_managed()
virtio-mem: silence a static checker warning
vhost_vdpa: Fix potential underflow in vhost_vdpa_mmap()
vdpa: fix typos in the comments for __vdpa_alloc_device()
Linus Torvalds [Wed, 24 Jun 2020 21:19:45 +0000 (14:19 -0700)]
Merge tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fix from Christian Brauner:
"This fixes a regression introduced with 303cc571d107 ("nsproxy: attach
to namespaces via pidfds").
The LTP testsuite reported a regression where users would now see
EBADF returned instead of EINVAL when an fd was passed that referred
to an open file but the file was not a namespace file.
Fix this by continuing to report EINVAL and add a regression test"
* tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: test for setns() EINVAL regression
nsproxy: restore EINVAL for non-namespace file descriptor
Daniel Vetter [Wed, 24 Jun 2020 09:29:10 +0000 (11:29 +0200)]
drm/fb-helper: Fix vt restore
In the past we had a pile of hacks to orchestrate access between fbdev
emulation and native kms clients. We've tried to streamline this, by
always preferring the kms side above fbdev calls when a drm master
exists, because drm master controls access to the display resources.
Unfortunately this breaks existing userspace, specifically Xorg. When
exiting Xorg first restores the console to text mode using the KDSET
ioctl on the vt. This does nothing, because a drm master is still
around. Then it drops the drm master status, which again does nothing,
because logind is keeping additional drm fd open to be able to
orchestrate vt switches. In the past this is the point where fbdev was
restored, as part of the ->lastclose hook on the drm side.
Now to fix this regression we don't want to go back to letting fbdev
restore things whenever it feels like, or to the pile of hacks we've
had before. Instead try and go with a minimal exception to make the
KDSET case work again, and nothing else.
This means that if userspace does a KDSET call when switching between
graphical compositors, there will be some flickering with fbcon
showing up for a bit. But a) that's not a regression and b) userspace
can fix it by improving the vt switching dance - logind should have
all the information it needs.
While pondering all this I'm also wondering wheter we should have a
SWITCH_MASTER ioctl to allow race-free master status handover. But
that's for another day.
The issue happens because the current implementation relies on the netif
txq being stopped to control the flushing of the tx list.
There are two resources that the transmit logic can wait on and stop the
txq:
- SDMA descriptors
- Ring space to hold completions
The ring space is tested on the sending side and relieved when the ring is
consumed in the napi tx reaping.
Unfortunately, that reaping can run conncurrently with the workqueue
flushing of the txlist. If the txq is started just before the workitem
executes, the txlist will never be flushed, leading to the txq being
stuck.
Fix by:
- Adding sleep/wakeup wrappers
* Use an atomic to control the call to the netif routines inside the
wrappers
- Use another atomic to record ring space exhaustion
* Only wakeup when the a ring space exhaustion has happened and it
relieved
Add additional wrappers to clarify the ring space resource handling.
Mike Marciniszyn [Tue, 23 Jun 2020 20:43:22 +0000 (16:43 -0400)]
IB/hfi1: Correct -EBUSY handling in tx code
The current code mishandles -EBUSY in two ways:
- The flow change doesn't test the return from the flush and runs on to
process the current packet racing with the wakeup processing
- The -EBUSY handling for a single packet inserts the tx into the txlist
after the submit call, racing with the same wakeup processing
Fix the first by dropping the skb and returning NETDEV_TX_OK.
Fix the second by insuring the the list entry within the txreq is inited
when allocated. This enables the sleep routine to detect that the txreq
has used the non-list api and queue the packet to the txlist.
Both flaws can lead to having the flushing thread executing in causing two
threads to manipulate the txlist.
IB/hfi1: Fix module use count flaw due to leftover module put calls
When the try_module_get calls were removed from opening and closing of the
i2c debugfs file, the corresponding module_put calls were missed. This
results in an inaccurate module use count that requires a power cycle to
fix.
We need to do some rework on the dummy netdev. Calling the free_netdev()
would normally make sense, and that will be addressed in an upcoming
patch. For now just revert the behavior to what it was before keeping the
unused variable removal part of the patch.
The dd->dumm_netdev is mainly used for packet receiving through
alloc_netdev_mqs() for typical net devices. A a result, it should be freed
with kfree instead of free_netdev() that leads to a crash when unloading
the hfi1 module: