Guo Ren [Tue, 6 Oct 2020 16:49:33 +0000 (16:49 +0000)]
riscv: Fixup bootup failure with HARDENED_USERCOPY
6184358da000 ("riscv: Fixup static_obj() fail") attempted to elide a lockdep
failure by rearranging our kernel image to place all initdata within [_stext,
_end], thus triggering lockdep to treat these as static objects. These objects
are released and eventually reallocated, causing check_kernel_text_object() to
trigger a BUG().
This backs out the change to make [_stext, _end] all-encompassing, instead just
moving initdata. This results in initdata being outside of [__init_begin,
__init_end], which means initdata can't be freed.
Link: https://lore.kernel.org/linux-riscv/1593266228-61125-1-git-send-email-guoren@kernel.org/T/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
[Palmer: Clean up commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Linus Torvalds [Tue, 6 Oct 2020 19:09:29 +0000 (12:09 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix a kernel panic in the AES crypto code caused by a BR tail call not
matching the target BTI instruction (when branch target identification
is enabled)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
crypto: arm64: Use x16 with indirect branch to bti_c
Linus Torvalds [Tue, 6 Oct 2020 19:00:52 +0000 (12:00 -0700)]
Merge tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull another x86 platform driver fix from Hans de Goede:
"One final pdx86 fix for Tablet Mode reporting regressions (which make
the keyboard and touchpad unusable) on various Asus notebooks.
These regressions were caused by the asus-nb-wmi and the intel-vbtn
drivers both receiving recent patches to start reporting Tablet Mode /
to report it on more models.
Due to a miscommunication between Andy and me, Andy's earlier pull-req
only contained the fix for the intel-vbtn driver and not the fix for
the asus-nb-wmi code.
This fix has been tested as a downstream patch in Fedora kernels for
approx two weeks with no problems being reported"
* tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models
Linus Torvalds [Tue, 6 Oct 2020 18:05:44 +0000 (11:05 -0700)]
Merge tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Daniel queued these up last week and I took a long weekend so didn't
get them out, but fixing the OOB access on get font seems like
something we should land and it's cc'ed stable as well.
The other big change is a partial revert for a regression on android
on the clcd fbdev driver, and one other docs fix.
fbdev:
- Re-add FB_ARMCLCD for android
- Fix global-out-of-bounds read in fbcon_get_font()
core:
- Small doc fix"
* tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm:
drm: drm_dsc.h: fix a kernel-doc markup
Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver"
fbcon: Fix global-out-of-bounds read in fbcon_get_font()
Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
Linus Torvalds [Mon, 5 Oct 2020 17:56:22 +0000 (10:56 -0700)]
usermodehelper: reset umask to default before executing user process
Kernel threads intentionally do CLONE_FS in order to follow any changes
that 'init' does to set up the root directory (or cwd).
It is admittedly a bit odd, but it avoids the situation where 'init'
does some extensive setup to initialize the system environment, and then
we execute a usermode helper program, and it uses the original FS setup
from boot time that may be very limited and incomplete.
[ Both Al Viro and Eric Biederman point out that 'pivot_root()' will
follow the root regardless, since it fixes up other users of root (see
chroot_fs_refs() for details), but overmounting root and doing a
chroot() would not. ]
However, Vegard Nossum noticed that the CLONE_FS not only means that we
follow the root and current working directories, it also means we share
umask with whatever init changed it to. That wasn't intentional.
Just reset umask to the original default (0022) before actually starting
the usermode helper program.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 5 Oct 2020 18:26:27 +0000 (11:26 -0700)]
splice: teach splice pipe reading about empty pipe buffers
Tetsuo Handa reports that splice() can return 0 before the real EOF, if
the data in the splice source pipe is an empty pipe buffer. That empty
pipe buffer case doesn't happen in any normal situation, but you can
trigger it by doing a write to a pipe that fails due to a page fault.
Tetsuo has a test-case to show the behavior:
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
int pipe_fd[2] = { -1, -1 };
pipe(pipe_fd);
write(pipe_fd[1], NULL, 4096);
/* This splice() should wait unless interrupted. */
return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
}
which results in
write(5, NULL, 4096) = -1 EFAULT (Bad address)
splice(4, NULL, 3, NULL, 65536, 0) = 0
and this can confuse splice() users into believing they have hit EOF
prematurely.
The issue was introduced when the pipe write code started pre-allocating
the pipe buffers before copying data from user space.
This is modified verion of Tetsuo's original patch.
Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot")
Link:https://lore.kernel.org/linux-fsdevel/
20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Linton [Tue, 6 Oct 2020 16:33:26 +0000 (11:33 -0500)]
crypto: arm64: Use x16 with indirect branch to bti_c
The AES code uses a 'br x7' as part of a function called by
a macro. That branch needs a bti_j as a target. This results
in a panic as seen below. Using x16 (or x17) with an indirect
branch keeps the target bti_c.
Bad mode in Synchronous Abort handler detected on CPU1, code 0x34000003 -- BTI
CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1
pstate:
20400c05 (nzCv daif +PAN -UAO BTYPE=j-)
pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs]
sp :
ffff80001052b730
aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
__xts_crypt+0xb0/0x2dc [aes_neon_bs]
xts_encrypt+0x28/0x3c [aes_neon_bs]
crypto_skcipher_encrypt+0x50/0x84
simd_skcipher_encrypt+0xc8/0xe0
crypto_skcipher_encrypt+0x50/0x84
test_skcipher_vec_cfg+0x224/0x5f0
test_skcipher+0xbc/0x120
alg_test_skcipher+0xa0/0x1b0
alg_test+0x3dc/0x47c
cryptomgr_test+0x38/0x60
Fixes: 0e89640b640d ("crypto: arm64 - Use modern annotations for assembly functions")
Cc: <stable@vger.kernel.org> # 5.6.x-
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Suggested-by: Dave P Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20201006163326.2780619-1-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
David S. Miller [Tue, 6 Oct 2020 13:18:20 +0000 (06:18 -0700)]
Merge tag 'rxrpc-fixes-
20201005' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here are some miscellaneous rxrpc fixes:
(1) Fix the xdr encoding of the contents read from an rxrpc key.
(2) Fix a BUG() for a unsupported encoding type.
(3) Fix missing _bh lock annotations.
(4) Fix acceptance handling for an incoming call where the incoming call
is encrypted.
(5) The server token keyring isn't network namespaced - it belongs to the
server, so there's no need. Namespacing it means that request_key()
fails to find it.
(6) Fix a leak of the server keyring.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 5 Oct 2020 13:48:13 +0000 (06:48 -0700)]
tcp: fix receive window update in tcp_add_backlog()
We got reports from GKE customers flows being reset by netfilter
conntrack unless nf_conntrack_tcp_be_liberal is set to 1.
Traces seemed to suggest ACK packet being dropped by the
packet capture, or more likely that ACK were received in the
wrong order.
wscale=7, SYN and SYNACK not shown here.
This ACK allows the sender to send 1871*128 bytes from seq
51359321 :
New right edge of the window ->
51359321+1871*128=
51598809
09:17:23.389210 IP A > B: Flags [.], ack
51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389212 IP B > A: Flags [.], seq
51422681:
51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408
09:17:23.389214 IP A > B: Flags [.], ack
51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389253 IP B > A: Flags [.], seq
51424089:
51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768
09:17:23.389272 IP A > B: Flags [.], ack
51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389275 IP B > A: Flags [.], seq
51488857:
51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
Receiver now allows to send 606*128=77568 from seq
51521241 :
New right edge of the window ->
51521241+606*128=
51598809
09:17:23.389296 IP A > B: Flags [.], ack
51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389308 IP B > A: Flags [.], seq
51521241:
51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
It seems the sender exceeds RWIN allowance, since
51611353 >
51598809
09:17:23.389346 IP B > A: Flags [.], seq
51553625:
51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728
09:17:23.389356 IP B > A: Flags [.], seq
51611353:
51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040
09:17:23.389367 IP A > B: Flags [.], ack
51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0
netfilter conntrack is not happy and sends RST
09:17:23.389389 IP A > B: Flags [R], seq
92176528, win 0, length 0
09:17:23.389488 IP B > A: Flags [R], seq
174478967, win 0, length 0
Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet.
New right edge of the window ->
51521241+859*128=
51631193
Normally TCP stack handles OOO packets just fine, but it
turns out tcp_add_backlog() does not. It can update the window
field of the aggregated packet even if the ACK sequence
of the last received packet is too old.
Many thanks to Alexandre Ferrieux for independently reporting the issue
and suggesting a fix.
Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anant Thazhemadam [Mon, 5 Oct 2020 13:29:58 +0000 (18:59 +0530)]
net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
When get_registers() fails in set_ethernet_addr(),the uninitialized
value of node_id gets copied over as the address.
So, check the return value of get_registers().
If get_registers() executed successfully (i.e., it returns
sizeof(node_id)), copy over the MAC address using ether_addr_copy()
(instead of using memcpy()).
Else, if get_registers() failed instead, a randomly generated MAC
address is set as the MAC address instead.
Reported-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com
Tested-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com
Acked-by: Petko Manolov <petkan@nucleusys.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 5 Oct 2020 10:01:06 +0000 (12:01 +0200)]
mptcp: more DATA FIN fixes
Currently data fin on data packet are not handled properly:
the 'rcv_data_fin_seq' field is interpreted as the last
sequence number carrying a valid data, but for data fin
packet with valid maps we currently store map_seq + map_len,
that is, the next value.
The 'write_seq' fields carries instead the value subseguent
to the last valid byte, so in mptcp_write_data_fin() we
never detect correctly the last DSS map.
Fixes: 7279da6145bb ("mptcp: Use MPTCP-level flag for sending DATA_FIN")
Fixes: 1a49b2c2a501 ("mptcp: Handle incoming 32-bit DATA_FIN values")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Oct 2020 13:05:47 +0000 (06:05 -0700)]
Merge branch 'Fix-tail-dropping-watermarks-for-Ocelot-switches'
Vladimir Oltean says:
====================
Fix tail dropping watermarks for Ocelot switches
This series adds a missing division by 60, and a warning to prevent that
in the future.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Oct 2020 09:09:12 +0000 (12:09 +0300)]
net: mscc: ocelot: warn when encoding an out-of-bounds watermark value
There is an upper bound to the value that a watermark may hold. That
upper bound is not immediately obvious during configuration, and it
might be possible to have accidental truncation.
Actually this has happened already, add a warning to prevent it from
happening again.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Oct 2020 09:09:11 +0000 (12:09 +0300)]
net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP
Tail dropping is enabled for a port when:
1. A source port consumes more packet buffers than the watermark encoded
in SYS:PORT:ATOP_CFG.ATOP.
AND
2. Total memory use exceeds the consumption watermark encoded in
SYS:PAUSE_CFG:ATOP_TOT_CFG.
The unit of these watermarks is a 60 byte memory cell. That unit is
programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when
written into ATOP, it would get truncated and wrap around.
Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manivannan Sadhasivam [Mon, 5 Oct 2020 07:16:42 +0000 (12:46 +0530)]
net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()
The rcu_read_lock() is not supposed to lock the kernel_sendmsg() API
since it has the lock_sock() in qrtr_sendmsg() which will sleep. Hence,
fix it by excluding the locking for kernel_sendmsg().
While at it, let's also use radix_tree_deref_retry() to confirm the
validity of the pointer returned by radix_tree_deref_slot() and use
radix_tree_iter_resume() to resume iterating the tree properly before
releasing the lock as suggested by Doug.
Fixes: a7809ff90ce6 ("net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks")
Reported-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Tue, 6 Oct 2020 03:40:25 +0000 (20:40 -0700)]
x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
The motivations to go rework memcpy_mcsafe() are that the benefit of
doing slow and careful copies is obviated on newer CPUs, and that the
current opt-in list of CPUs to instrument recovery is broken relative to
those CPUs. There is no need to keep an opt-in list up to date on an
ongoing basis if pmem/dax operations are instrumented for recovery by
default. With recovery enabled by default the old "mcsafe_key" opt-in to
careful copying can be made a "fragile" opt-out. Where the "fragile"
list takes steps to not consume poison across cachelines.
The discussion with Linus made clear that the current "_mcsafe" suffix
was imprecise to a fault. The operations that are needed by pmem/dax are
to copy from a source address that might throw #MC to a destination that
may write-fault, if it is a user page.
So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate
the separate precautions taken on source and destination.
copy_mc_to_kernel() is introduced as a non-SMAP version that does not
expect write-faults on the destination, but is still prepared to abort
with an error code upon taking #MC.
The original copy_mc_fragile() implementation had negative performance
implications since it did not use the fast-string instruction sequence
to perform copies. For this reason copy_mc_to_kernel() fell back to
plain memcpy() to preserve performance on platforms that did not indicate
the capability to recover from machine check exceptions. However, that
capability detection was not architectural and now that some platforms
can recover from fast-string consumption of memory errors the memcpy()
fallback now causes these more capable platforms to fail.
Introduce copy_mc_enhanced_fast_string() as the fast default
implementation of copy_mc_to_kernel() and finalize the transition of
copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'.
With this in place, copy_mc_to_kernel() is fast and recovery-ready by
default regardless of hardware capability.
Thanks to Vivek for identifying that copy_user_generic() is not suitable
as the copy_mc_to_user() backend since the #MC handler explicitly checks
ex_has_fault_handler(). Thanks to the 0day robot for catching a
performance bug in the x86/copy_mc_to_user implementation.
[ bp: Add the "why" for this change from the 0/2th message, massage. ]
Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Reported-by: Erwin Tsaur <erwin.tsaur@intel.com>
Reported-by: 0day robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Erwin Tsaur <erwin.tsaur@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com
Dan Williams [Tue, 6 Oct 2020 03:40:16 +0000 (20:40 -0700)]
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.
Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:
On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.
Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().
Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.
[ bp: Massage a bit. ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
Hans de Goede [Wed, 16 Sep 2020 14:14:39 +0000 (16:14 +0200)]
platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models
Commit
b0dbd97de1f1 ("platform/x86: asus-wmi: Add support for
SW_TABLET_MODE") added support for reporting SW_TABLET_MODE using the
Asus 0x00120063 WMI-device-id to see if various transformer models were
docked into their keyboard-dock (SW_TABLET_MODE=0) or if they were
being used as a tablet.
The new SW_TABLET_MODE support (naively?) assumed that non Transformer
devices would either not support the 0x00120063 WMI-device-id at all,
or would NOT set ASUS_WMI_DSTS_PRESENCE_BIT in their reply when querying
the device-id.
Unfortunately this is not true and we have received many bug reports about
this change causing the asus-wmi driver to always report SW_TABLET_MODE=1
on non Transformer devices. This causes libinput to think that these are
360 degree hinges style 2-in-1s folded into tablet-mode. Making libinput
suppress keyboard and touchpad events from the builtin keyboard and
touchpad. So effectively this causes the keyboard and touchpad to not work
on many non Transformer Asus models.
This commit fixes this by using the existing DMI based quirk mechanism in
asus-nb-wmi.c to allow using the 0x00120063 device-id for reporting
SW_TABLET_MODE on Transformer models and ignoring it on all other models.
Fixes: b0dbd97de1f1 ("platform/x86: asus-wmi: Add support for SW_TABLET_MODE")
Link: https://patchwork.kernel.org/patch/11780901/
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209011
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1876997
Reported-by: Samuel Čavoj <samuel@cavoj.net>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Dave Airlie [Tue, 6 Oct 2020 02:34:58 +0000 (12:34 +1000)]
Merge tag 'drm-misc-fixes-2020-10-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.9:
- Small doc fix.
- Re-add FB_ARMCLCD for android.
- Fix global-out-of-bounds read in fbcon_get_font().
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8585daa2-fcbc-3924-ac4f-e7b5668808e0@linux.intel.com
Linus Torvalds [Mon, 5 Oct 2020 18:54:20 +0000 (11:54 -0700)]
Merge tag 'platform-drivers-x86-v5.9-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
"We have some fixes for Tablet Mode reporting in particular, that users
are complaining a lot about.
Summary:
- Attempt #3 of enabling Tablet Mode reporting w/o regressions
- Improve battery recognition code in ASUS WMI driver
- Fix Kconfig dependency warning for Fujitsu and LG laptop drivers
- Add fixes in Thinkpad ACPI driver for _BCL method and NVRAM polling
- Fix power supply extended topology in Mellanox driver
- Fix memory leak in OLPC EC driver
- Avoid static struct device in Intel PMC core driver
- Add support for the touchscreen found in MPMAN Converter9 2-in-1
- Update MAINTAINERS to reflect the real state of affairs"
* tag 'platform-drivers-x86-v5.9-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
MAINTAINERS: Add Mark Gross and Hans de Goede as x86 platform drivers maintainers
platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting
platform/x86: intel-vbtn: Revert "Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360"
platform/x86: intel_pmc_core: do not create a static struct device
platform/x86: mlx-platform: Fix extended topology configuration for power supply units
platform/x86: pcengines-apuv2: Fix typo on define of AMD_FCH_GPIO_REG_GPIO55_DEVSLP0
platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP
platform/x86: fix kconfig dependency warning for LG_LAPTOP
platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360
platform/x86: asus-wmi: Add BATC battery name to the list of supported
platform/x86: asus-nb-wmi: Revert "Do not load on Asus T100TA and T200TA"
platform/x86: touchscreen_dmi: Add info for the MPMAN Converter9 2-in-1
Documentation: laptops: thinkpad-acpi: fix underline length build warning
Platform: OLPC: Fix memleak in olpc_ec_probe
Linus Torvalds [Mon, 5 Oct 2020 18:27:14 +0000 (11:27 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Make sure SKB control block is in the proper state during IPSEC
ESP-in-TCP encapsulation. From Sabrina Dubroca.
2) Various kinds of attributes were not being cloned properly when we
build new xfrm_state objects from existing ones. Fix from Antony
Antony.
3) Make sure to keep BTF sections, from Tony Ambardar.
4) TX DMA channels need proper locking in lantiq driver, from Hauke
Mehrtens.
5) Honour route MTU during forwarding, always. From Maciej
Żenczykowski.
6) Fix races in kTLS which can result in crashes, from Rohit
Maheshwari.
7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
Jha.
8) Use correct address family in xfrm state lookups, from Herbert Xu.
9) A bridge FDB flush should not clear out user managed fdb entries
with the ext_learn flag set, from Nikolay Aleksandrov.
10) Fix nested locking of netdev address lists, from Taehee Yoo.
11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.
12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.
13) Don't free command entries in mlx5 while comp handler could still be
running, from Eran Ben Elisha.
14) Error flow of request_irq() in mlx5 is busted, due to an off by one
we try to free and IRQ never allocated. From Maor Gottlieb.
15) Fix leak when dumping netlink policies, from Johannes Berg.
16) Sendpage cannot be performed when a page is a slab page, or the page
count is < 1. Some subsystems such as nvme were doing so. Create a
"sendpage_ok()" helper and use it as needed, from Coly Li.
17) Don't leak request socket when using syncookes with mptcp, from
Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net/core: check length before updating Ethertype in skb_mpls_{push,pop}
net: mvneta: fix double free of txq->buf
net_sched: check error pointer in tcf_dump_walker()
net: team: fix memory leak in __team_options_register
net: typhoon: Fix a typo Typoon --> Typhoon
net: hinic: fix DEVLINK build errors
net: stmmac: Modify configuration method of EEE timers
tcp: fix syn cookied MPTCP request socket leak
libceph: use sendpage_ok() in ceph_tcp_sendpage()
scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
tcp: use sendpage_ok() to detect misused .sendpage
nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
net: introduce helper sendpage_ok() in include/linux/net.h
net: usb: pegasus: Proper error handing when setting pegasus' MAC address
net: core: document two new elements of struct net_device
netlink: fix policy dump leak
net/mlx5e: Fix race condition on nhe->n pointer in neigh update
net/mlx5e: Fix VLAN create flow
...
Mark Rutland [Mon, 5 Oct 2020 16:43:03 +0000 (17:43 +0100)]
arm64: initialize per-cpu offsets earlier
The current initialization of the per-cpu offset register is difficult
to follow and this initialization is not always early enough for
upcoming instrumentation with KCSAN, where the instrumentation callbacks
use the per-cpu offset.
To make it possible to support KCSAN, and to simplify reasoning about
early bringup code, let's initialize the per-cpu offset earlier, before
we run any C code that may consume it. To do so, this patch adds a new
init_this_cpu_offset() helper that's called before the usual
primary/secondary start functions. For consistency, this is also used to
re-initialize the per-cpu offset after the runtime per-cpu areas have
been allocated (which can change CPU0's offset).
So that init_this_cpu_offset() isn't subject to any instrumentation that
might consume the per-cpu offset, it is marked with noinstr, preventing
instrumentation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201005164303.21389-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Amit Daniel Kachhap [Fri, 2 Oct 2020 11:56:30 +0000 (17:26 +0530)]
kselftest/arm64: Check mte tagged user address in kernel
Add a testcase to check that user address with valid/invalid
mte tag works in kernel mode. This test verifies that the kernel
API's __arch_copy_from_user/__arch_copy_to_user works by considering
if the user pointer has valid/invalid allocation tags.
In MTE sync mode, file memory read/write and other similar interfaces
fails if a user memory with invalid tag is accessed in kernel. In async
mode no such failure occurs.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-7-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Amit Daniel Kachhap [Fri, 2 Oct 2020 11:56:29 +0000 (17:26 +0530)]
kselftest/arm64: Verify KSM page merge for MTE pages
Add a testcase to check that KSM should not merge pages containing
same data with same/different MTE tag values.
This testcase has one positive tests and passes if page merging
happens according to the above rule. It also saves and restores
any modified ksm sysfs entries.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-6-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Amit Daniel Kachhap [Fri, 2 Oct 2020 11:56:28 +0000 (17:26 +0530)]
kselftest/arm64: Verify all different mmap MTE options
This testcase checks the different unsupported/supported options for mmap
if used with PROT_MTE memory protection flag. These checks are,
* Either pstate.tco enable or prctl PR_MTE_TCF_NONE option should not cause
any tag mismatch faults.
* Different combinations of anonymous/file memory mmap, mprotect,
sync/async error mode and private/shared mappings should work.
* mprotect should not be able to clear the PROT_MTE page property.
Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-5-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Amit Daniel Kachhap [Fri, 2 Oct 2020 11:56:27 +0000 (17:26 +0530)]
kselftest/arm64: Check forked child mte memory accessibility
This test covers the mte memory behaviour of the forked process with
different mapping properties and flags. It checks that all bytes of
forked child memory are accessible with the same tag as that of the
parent and memory accessed outside the tag range causes fault to
occur.
Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-4-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Amit Daniel Kachhap [Fri, 2 Oct 2020 11:56:26 +0000 (17:26 +0530)]
kselftest/arm64: Verify mte tag inclusion via prctl
This testcase verifies that the tag generated with "irg" instruction
contains only included tags. This is done via prtcl call.
This test covers 4 scenarios,
* At least one included tag.
* More than one included tags.
* All included.
* None included.
Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-3-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Amit Daniel Kachhap [Fri, 2 Oct 2020 11:56:25 +0000 (17:26 +0530)]
kselftest/arm64: Add utilities and a test to validate mte memory
This test checks that the memory tag is present after mte allocation and
the memory is accessible with those tags. This testcase verifies all
sync, async and none mte error reporting mode. The allocated mte buffers
are verified for Allocated range (no error expected while accessing
buffer), Underflow range, and Overflow range.
Different test scenarios covered here are,
* Verify that mte memory are accessible at byte/block level.
* Force underflow and overflow to occur and check the data consistency.
* Check to/from between tagged and untagged memory.
* Check that initial allocated memory to have 0 tag.
This change also creates the necessary infrastructure to add mte test
cases. MTE kselftests can use the several utility functions provided here
to add wide variety of mte test scenarios.
GCC compiler need flag '-march=armv8.5-a+memtag' so those flags are
verified before compilation.
The mte testcases can be launched with kselftest framework as,
make TARGETS=arm64 ARM64_SUBTARGETS=mte kselftest
or compiled as,
make -C tools/testing/selftests TARGETS=arm64 ARM64_SUBTARGETS=mte CC='compiler'
Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-2-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
David Howells [Fri, 2 Oct 2020 13:04:51 +0000 (14:04 +0100)]
rxrpc: Fix server keyring leak
If someone calls setsockopt() twice to set a server key keyring, the first
keyring is leaked.
Fix it to return an error instead if the server key keyring is already set.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 30 Sep 2020 18:52:08 +0000 (19:52 +0100)]
rxrpc: The server keyring isn't network-namespaced
The keyring containing the server's tokens isn't network-namespaced, so it
shouldn't be looked up with a network namespace. It is expected to be
owned specifically by the server, so namespacing is unnecessary.
Fixes: a58946c158a0 ("keys: Pass the network namespace into request_key mechanism")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 30 Sep 2020 20:27:18 +0000 (21:27 +0100)]
rxrpc: Fix accept on a connection that need securing
When a new incoming call arrives at an userspace rxrpc socket on a new
connection that has a security class set, the code currently pushes it onto
the accept queue to hold a ref on it for the socket. This doesn't work,
however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
state and discards the ref. This means that the call runs out of refs too
early and the kernel oopses.
By contrast, a kernel rxrpc socket manually pre-charges the incoming call
pool with calls that already have user call IDs assigned, so they are ref'd
by the call tree on the socket.
Change the mode of operation for userspace rxrpc server sockets to work
like this too. Although this is a UAPI change, server sockets aren't
currently functional.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 1 Oct 2020 10:57:40 +0000 (11:57 +0100)]
rxrpc: Fix some missing _bh annotations on locking conn->state_lock
conn->state_lock may be taken in softirq mode, but a previous patch
replaced an outer lock in the response-packet event handling code, and lost
the _bh from that when doing so.
Fix this by applying the _bh annotation to the state_lock locking.
Fixes: a1399f8bb033 ("rxrpc: Call channels should have separate call number spaces")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 8 Sep 2020 21:09:04 +0000 (22:09 +0100)]
rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
If rxrpc_read() (which allows KEYCTL_READ to read a key), sees a token of a
type it doesn't recognise, it can BUG in a couple of places, which is
unnecessary as it can easily get back to userspace.
Fix this to print an error message instead.
Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
Signed-off-by: David Howells <dhowells@redhat.com>
Marc Dionne [Fri, 4 Sep 2020 17:01:24 +0000 (14:01 -0300)]
rxrpc: Fix rxkad token xdr encoding
The session key should be encoded with just the 8 data bytes and
no length; ENCODE_DATA precedes it with a 4 byte length, which
confuses some existing tools that try to parse this format.
Add an ENCODE_BYTES macro that does not include a length, and use
it for the key. Also adjust the expected length.
Note that commit
774521f353e1d ("rxrpc: Fix an assertion in
rxrpc_read()") had fixed a BUG by changing the length rather than
fixing the encoding. The original length was correct.
Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Serge Semin [Fri, 2 Oct 2020 21:16:47 +0000 (00:16 +0300)]
MAINTAINERS: Add maintainer of DW APB SSI driver
Add myself as a maintainer of the Synopsis DesignWare APB SSI driver.
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20201002211648.24320-1-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
Aaron Ma [Fri, 2 Oct 2020 17:09:16 +0000 (01:09 +0800)]
platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
Evaluating ACPI _BCL could fail, then ACPI buffer size will be set to 0.
When reuse this ACPI buffer, AE_BUFFER_OVERFLOW will be triggered.
Re-initialize buffer size will make ACPI evaluate successfully.
Fixes: 46445b6b896fd ("thinkpad-acpi: fix handle locate for video and query of _BCL")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Atish Patra [Thu, 1 Oct 2020 19:04:56 +0000 (12:04 -0700)]
RISC-V: Make sure memblock reserves the memory containing DT
Currently, the memory containing DT is not reserved. Thus, that region
of memory can be reallocated or reused for other purposes. This may result
in corrupted DT for nommu virt board in Qemu. We may not face any issue
in kendryte as DT is embedded in the kernel image for that.
Fixes: 6bd33e1ece52 ("riscv: add nommu support")
Cc: stable@vger.kernel.org
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Linus Torvalds [Sun, 4 Oct 2020 23:04:34 +0000 (16:04 -0700)]
Linux 5.9-rc8
Guillaume Nault [Fri, 2 Oct 2020 19:53:08 +0000 (21:53 +0200)]
net/core: check length before updating Ethertype in skb_mpls_{push,pop}
Openvswitch allows to drop a packet's Ethernet header, therefore
skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true
and mac_len=0. In that case the pointer passed to skb_mod_eth_type()
doesn't point to an Ethernet header and the new Ethertype is written at
unexpected locations.
Fix this by verifying that mac_len is big enough to contain an Ethernet
header.
Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Rix [Sat, 3 Oct 2020 18:51:21 +0000 (11:51 -0700)]
net: mvneta: fix double free of txq->buf
clang static analysis reports this problem:
drivers/net/ethernet/marvell/mvneta.c:3465:2: warning:
Attempt to free released memory
kfree(txq->buf);
^~~~~~~~~~~~~~~
When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs,
it frees without poisoning txq->buf. The error is caught
in the mvneta_setup_txqs() caller which handles the error
by cleaning up all of the txqs with a call to
mvneta_txq_sw_deinit which also frees txq->buf.
Since mvneta_txq_sw_deinit is a general cleaner, all of the
partial cleaning in mvneta_txq_sw_deinit()'s error handling
is not needed.
Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 2 Oct 2020 19:13:34 +0000 (12:13 -0700)]
net_sched: check error pointer in tcf_dump_walker()
Although we take RTNL on dump path, it is possible to
skip RTNL on insertion path. So the following race condition
is possible:
rtnl_lock() // no rtnl lock
mutex_lock(&idrinfo->lock);
// insert ERR_PTR(-EBUSY)
mutex_unlock(&idrinfo->lock);
tc_dump_action()
rtnl_unlock()
So we have to skip those temporary -EBUSY entries on dump path
too.
Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anant Thazhemadam [Sun, 4 Oct 2020 20:55:36 +0000 (02:25 +0530)]
net: team: fix memory leak in __team_options_register
The variable "i" isn't initialized back correctly after the first loop
under the label inst_rollback gets executed.
The value of "i" is assigned to be option_count - 1, and the ensuing
loop (under alloc_rollback) begins by initializing i--.
Thus, the value of i when the loop begins execution will now become
i = option_count - 2.
Thus, when kfree(dst_opts[i]) is called in the second loop in this
order, (i.e., inst_rollback followed by alloc_rollback),
dst_optsp[option_count - 2] is the first element freed, and
dst_opts[option_count - 1] does not get freed, and thus, a memory
leak is caused.
This memory leak can be fixed, by assigning i = option_count (instead of
option_count - 1).
Fixes: 80f7c6683fe0 ("team: add support for per-port options")
Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarkko Sakkinen [Fri, 18 Sep 2020 23:41:19 +0000 (02:41 +0300)]
MAINTAINERS: TPM DEVICE DRIVER: Update GIT
Update Git URL to
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Si-Wei Liu [Sat, 3 Oct 2020 05:02:10 +0000 (01:02 -0400)]
vhost-vdpa: fix page pinning leakage in error path
Pinned pages are not properly accounted particularly when
mapping error occurs on IOTLB update. Clean up dangling
pinned pages for the error path. As the inflight pinned
pages, specifically for memory region that strides across
multiple chunks, would need more than one free page for
book keeping and accounting. For simplicity, pin pages
for all memory in the IOVA range in one go rather than
have multiple pin_user_pages calls to make up the entire
region. This way it's easier to track and account the
pages already mapped, particularly for clean-up in the
error path.
Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1601701330-16837-3-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Si-Wei Liu [Sat, 3 Oct 2020 05:02:09 +0000 (01:02 -0400)]
vhost-vdpa: fix vhost_vdpa_map() on error condition
vhost_vdpa_map() should remove the iotlb entry just added
if the corresponding mapping fails to set up properly.
Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1601701330-16837-2-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Greg Kurz [Sat, 3 Oct 2020 10:02:13 +0000 (12:02 +0200)]
vhost: Don't call log_access_ok() when using IOTLB
When the IOTLB device is enabled, the log_guest_addr that is passed by
userspace to the VHOST_SET_VRING_ADDR ioctl, and which is then written
to vq->log_addr, is a GIOVA. All writes to this address are translated
by log_user() to writes to an HVA, and then ultimately logged through
the corresponding GPAs in log_write_hva(). No logging will ever occur
with vq->log_addr in this case. It is thus wrong to pass vq->log_addr
and log_guest_addr to log_access_vq() which assumes they are actual
GPAs.
Introduce a new vq_log_used_access_ok() helper that only checks accesses
to the log for the used structure when there isn't an IOTLB device around.
Signed-off-by: Greg Kurz <groug@kaod.org>
Link: https://lore.kernel.org/r/160171933385.284610.10189082586063280867.stgit@bahia.lan
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Greg Kurz [Sat, 3 Oct 2020 10:02:03 +0000 (12:02 +0200)]
vhost: Use vhost_get_used_size() in vhost_vring_set_addr()
The open-coded computation of the used size doesn't take the event
into account when the VIRTIO_RING_F_EVENT_IDX feature is present.
Fix that by using vhost_get_used_size().
Fixes: 8ea8cf89e19a ("vhost: support event index")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kurz <groug@kaod.org>
Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Greg Kurz [Sat, 3 Oct 2020 10:01:52 +0000 (12:01 +0200)]
vhost: Don't call access_ok() when using IOTLB
When the IOTLB device is enabled, the vring addresses we get
from userspace are GIOVAs. It is thus wrong to pass them down
to access_ok() which only takes HVAs.
Access validation is done at prefetch time with IOTLB. Teach
vq_access_ok() about that by moving the (vq->iotlb) check
from vhost_vq_access_ok() to vq_access_ok(). This prevents
vhost_vring_set_addr() to fail when verifying the accesses.
No behavior change for vhost_vq_access_ok().
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084
Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Cc: jasowang@redhat.com
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Christophe JAILLET [Fri, 2 Oct 2020 19:47:43 +0000 (21:47 +0200)]
net: typhoon: Fix a typo Typoon --> Typhoon
s/Typoon/Typhoon/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Thu, 1 Oct 2020 17:54:49 +0000 (10:54 -0700)]
net: hinic: fix DEVLINK build errors
Fix many (lots deleted here) build errors in hinic by selecting NET_DEVLINK.
ld: drivers/net/ethernet/huawei/hinic/hinic_hw_dev.o: in function `mgmt_watchdog_timeout_event_handler':
hinic_hw_dev.c:(.text+0x30a): undefined reference to `devlink_health_report'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump':
hinic_devlink.c:(.text+0x1c): undefined reference to `devlink_fmsg_u32_pair_put'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump':
hinic_devlink.c:(.text+0x126): undefined reference to `devlink_fmsg_binary_pair_put'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_hw_reporter_dump':
hinic_devlink.c:(.text+0x1ba): undefined reference to `devlink_fmsg_string_pair_put'
ld: hinic_devlink.c:(.text+0x227): undefined reference to `devlink_fmsg_u8_pair_put'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_alloc':
hinic_devlink.c:(.text+0xaee): undefined reference to `devlink_alloc'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_free':
hinic_devlink.c:(.text+0xb04): undefined reference to `devlink_free'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_register':
hinic_devlink.c:(.text+0xb26): undefined reference to `devlink_register'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_unregister':
hinic_devlink.c:(.text+0xb46): undefined reference to `devlink_unregister'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_create':
hinic_devlink.c:(.text+0xb75): undefined reference to `devlink_health_reporter_create'
ld: hinic_devlink.c:(.text+0xb95): undefined reference to `devlink_health_reporter_create'
ld: hinic_devlink.c:(.text+0xbac): undefined reference to `devlink_health_reporter_destroy'
ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_destroy':
Fixes: 51ba902a16e6 ("net-next/hinic: Initialize hw interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bin Luo <luobin9@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Cc: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vineetha G. Jaya Kumaran [Thu, 1 Oct 2020 15:56:09 +0000 (23:56 +0800)]
net: stmmac: Modify configuration method of EEE timers
Ethtool manual stated that the tx-timer is the "the amount of time the
device should stay in idle mode prior to asserting its Tx LPI". The
previous implementation for "ethtool --set-eee tx-timer" sets the LPI TW
timer duration which is not correct. Hence, this patch fixes the
"ethtool --set-eee tx-timer" to configure the EEE LPI timer.
The LPI TW Timer will be using the defined default value instead of
"ethtool --set-eee tx-timer" which follows the EEE LS timer implementation.
Changelog V2
*Not removing/modifying the eee_timer.
*EEE LPI timer can be configured through ethtool and also the eee_timer
module param.
*EEE TW Timer will be configured with default value only, not able to be
configured through ethtool or module param. This follows the implementation
of the EEE LS Timer.
Fixes: d765955d2ae0 ("stmmac: add the Energy Efficient Ethernet support")
Signed-off-by: Vineetha G. Jaya Kumaran <vineetha.g.jaya.kumaran@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 3 Oct 2020 19:19:23 +0000 (12:19 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Two bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept
KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guest
Linus Torvalds [Sat, 3 Oct 2020 18:57:39 +0000 (11:57 -0700)]
Merge tag 'for-linus-5.9b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"Fix a regression introduced in 5.9-rc3 which caused a system running
as fully virtualized guest under Xen to crash when using legacy
devices like a floppy"
* tag 'for-linus-5.9b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: don't use chip_data for legacy IRQs
Linus Torvalds [Sat, 3 Oct 2020 18:47:35 +0000 (11:47 -0700)]
Merge tag 'usb-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are some small USB and PHY driver fixes for 5.9-rc8
The PHY driver fix resolves an issue found by Dan Carpenter for a
memory leak.
The USB fixes fall into two groups:
- usb gadget fix from Bryan that is a fix for a previous security fix
that showed up in in-the-wild testing
- usb core driver matching bugfixes. This fixes a bug that has
plagued the both the usbip driver and syzbot testing tools this -rc
release cycle. All is now working properly so usbip connections
will work, and syzbot can get back to fuzzing USB drivers properly.
All have been in linux-next for a while with no reported issues"
* tag 'usb-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usbcore/driver: Accommodate usbip
usbcore/driver: Fix incorrect downcast
usbcore/driver: Fix specific driver selection
Revert "usbip: Implement a match function to fix usbip"
USB: gadget: f_ncm: Fix NDP16 datagram validation
phy: ti: am654: Fix a leak in serdes_am654_probe()
Linus Torvalds [Sat, 3 Oct 2020 18:40:22 +0000 (11:40 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some more driver fixes for i2c"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: npcm7xx: Clear LAST bit after a failed transaction.
i2c: cpm: Fix i2c_ram structure
i2c: i801: Exclude device from suspend direct complete optimization
Linus Torvalds [Sat, 3 Oct 2020 18:37:23 +0000 (11:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A couple more driver quirks, now enabling newer trackpoints from
Synaptics for real"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
Input: trackpoint - enable Synaptics trackpoints
Eric Biggers [Sat, 3 Oct 2020 05:21:48 +0000 (22:21 -0700)]
scripts/spelling.txt: fix malformed entry
One of the entries has three fields "mistake||correction||correction"
rather than the expected two fields "mistake||correction". Fix it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200930234359.255295-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Sat, 3 Oct 2020 05:21:45 +0000 (22:21 -0700)]
mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on CMA area, but, there is a missing case and the page on CMA area could
be allocated even if APIs are used. This patch handles this case to fix
the potential issue.
For now, these APIs are used to prevent long-term pinning on the CMA
page. When the long-term pinning is requested on the CMA page, it is
migrated to the non-CMA page before pinning. This non-CMA page is
allocated by using memalloc_nocma_{save/restore} APIs. If APIs doesn't
work as intended, the CMA page is allocated and it is pinned for a long
time. This long-term pin for the CMA page causes cma_alloc() failure
and it could result in wrong behaviour on the device driver who uses the
cma_alloc().
Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
could have the pages on CMA area so we need to skip it if ALLOC_CMA
isn't specified.
Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/1601429472-12599-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Farman [Sat, 3 Oct 2020 05:21:41 +0000 (22:21 -0700)]
mm, slub: restore initial kmem_cache flags
The routine that applies debug flags to the kmem_cache slabs
inadvertantly prevents non-debug flags from being applied to those
same objects. That is, if slub_debug=<flag>,<slab> is specified,
non-debugged slabs will end up having flags of zero, and the slabs
may be unusable.
Fix this by including the input flags for non-matching slabs with the
contents of slub_debug, so that the caches are created as expected
alongside any debugging options that may be requested. With this, we
can remove the check for a NULL slub_debug_string, since it's covered
by the loop itself.
Fixes: e17f1dfba37b ("mm, slub: extend slub_debug syntax for multiple blocks")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lkml.kernel.org/r/20200930161931.28575-1-farman@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Sat, 3 Oct 2020 09:07:59 +0000 (05:07 -0400)]
Merge tag 'kvmarm-fixes-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm64 fixes for 5.9, take #3
- Fix synchronization of VTTBR update on TLB invalidation for nVHE systems
Paolo Bonzini [Tue, 29 Sep 2020 12:31:32 +0000 (08:31 -0400)]
KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept
The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of
the #PF intercept bit in the exception bitmap when they do not match.
This means that, if PFEC_MASK and/or PFEC_MATCH are set, the
hypervisor can get a vmexit for #PF exceptions even when the
corresponding bit is clear in the exception bitmap.
This is unexpected and is promptly detected by a WARN_ON_ONCE.
To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept
is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr).
Reported-by: Qian Cai <cai@redhat.com>>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David S. Miller [Fri, 2 Oct 2020 23:14:21 +0000 (16:14 -0700)]
Merge tag 'mlx5-fixes-2020-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
From: Saeed Mahameed <saeedm@nvidia.com>
====================
This series introduces some fixes to mlx5 driver.
v1->v2:
- Patch #1 Don't return while mutex is held. (Dave)
v2->v3:
- Drop patch #1, will consider a better approach (Jakub)
- use cpu_relax() instead of cond_resched() (Jakub)
- while(i--) to reveres a loop (Jakub)
- Drop old mellanox email sign-off and change the committer email
(Jakub)
Please pull and let me know if there is any problem.
For -stable v4.15
('net/mlx5e: Fix VLAN cleanup flow')
('net/mlx5e: Fix VLAN create flow')
For -stable v4.16
('net/mlx5: Fix request_irqs error flow')
For -stable v5.4
('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU')
('net/mlx5: Avoid possible free of command entry while timeout comp handler')
For -stable v5.7
('net/mlx5e: Fix return status when setting unsupported FEC mode')
For -stable v5.8
('net/mlx5e: Fix race condition on nhe->n pointer in neigh update')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Oct 2020 10:39:44 +0000 (12:39 +0200)]
tcp: fix syn cookied MPTCP request socket leak
If a syn-cookies request socket don't pass MPTCP-level
validation done in syn_recv_sock(), we need to release
it immediately, or it will be leaked.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/89
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Reported-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Oct 2020 22:27:08 +0000 (15:27 -0700)]
Merge branch 'Introduce-sendpage_ok-to-detect-misused-sendpage-in-network-related-drivers'
Coly Li says:
====================
Introduce sendpage_ok() to detect misused sendpage in network related drivers
As Sagi Grimberg suggested, the original fix is refind to a more common
inline routine:
static inline bool sendpage_ok(struct page *page)
{
return (!PageSlab(page) && page_count(page) >= 1);
}
If sendpage_ok() returns true, the checking page can be handled by the
concrete zero-copy sendpage method in network layer.
The v10 series has 7 patches, fixes a WARN_ONCE() usage from v9 series,
- The 1st patch in this series introduces sendpage_ok() in header file
include/linux/net.h.
- The 2nd patch adds WARN_ONCE() for improper zero-copy send in
kernel_sendpage().
- The 3rd patch fixes the page checking issue in nvme-over-tcp driver.
- The 4th patch adds page_count check by using sendpage_ok() in
do_tcp_sendpages() as Eric Dumazet suggested.
- The 5th and 6th patches just replace existing open coded checks with
the inline sendpage_ok() routine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:34 +0000 (16:27 +0800)]
libceph: use sendpage_ok() in ceph_tcp_sendpage()
In libceph, ceph_tcp_sendpage() does the following checks before handle
the page by network layer's zero copy sendpage method,
if (page_count(page) >= 1 && !PageSlab(page))
This check is exactly what sendpage_ok() does. This patch replace the
open coded checks by sendpage_ok() as a code cleanup.
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:33 +0000 (16:27 +0800)]
scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
In iscsci driver, iscsi_tcp_segment_map() uses the following code to
check whether the page should or not be handled by sendpage:
if (!recv && page_count(sg_page(sg)) >= 1 && !PageSlab(sg_page(sg)))
The "page_count(sg_page(sg)) >= 1 && !PageSlab(sg_page(sg)" part is to
make sure the page can be sent to network layer's zero copy path. This
part is exactly what sendpage_ok() does.
This patch uses use sendpage_ok() in iscsi_tcp_segment_map() to replace
the original open coded checks.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Chris Leech <cleech@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:32 +0000 (16:27 +0800)]
drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
In _drbd_send_page() a page is checked by following code before sending
it by kernel_sendpage(),
(page_count(page) < 1) || PageSlab(page)
If the check is true, this page won't be send by kernel_sendpage() and
handled by sock_no_sendpage().
This kind of check is exactly what macro sendpage_ok() does, which is
introduced into include/linux/net.h to solve a similar send page issue
in nvme-tcp code.
This patch uses macro sendpage_ok() to replace the open coded checks to
page type and refcount in _drbd_send_page(), as a code cleanup.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:31 +0000 (16:27 +0800)]
tcp: use sendpage_ok() to detect misused .sendpage
commit
a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab
objects") adds the checks for Slab pages, but the pages don't have
page_count are still missing from the check.
Network layer's sendpage method is not designed to send page_count 0
pages neither, therefore both PageSlab() and page_count() should be
both checked for the sending page. This is exactly what sendpage_ok()
does.
This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused
.sendpage, to make the code more robust.
Fixes: a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:30 +0000 (16:27 +0800)]
nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
send slab pages. But for pages allocated by __get_free_pages() without
__GFP_COMP, which also have refcount as 0, they are still sent by
kernel_sendpage() to remote end, this is problematic.
The new introduced helper sendpage_ok() checks both PageSlab tag and
page_count counter, and returns true if the checking page is OK to be
sent by kernel_sendpage().
This patch fixes the page checking issue of nvme_tcp_try_send_data()
with sendpage_ok(). If sendpage_ok() returns true, send this page by
kernel_sendpage(), otherwise use sock_no_sendpage to handle this page.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:29 +0000 (16:27 +0800)]
net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
If a page sent into kernel_sendpage() is a slab page or it doesn't have
ref_count, this page is improper to send by the zero copy sendpage()
method. Otherwise such page might be unexpected released in network code
path and causes impredictable panic due to kernel memory management data
structure corruption.
This path adds a WARN_ON() on the sending page before sends it into the
concrete zero-copy sendpage() method, if the page is improper for the
zero-copy sendpage() method, a warning message can be observed before
the consequential unpredictable kernel panic.
This patch does not change existing kernel_sendpage() behavior for the
improper page zero-copy send, it just provides hint warning message for
following potential panic due the kernel memory heap corruption.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Cong Wang <amwang@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coly Li [Fri, 2 Oct 2020 08:27:28 +0000 (16:27 +0800)]
net: introduce helper sendpage_ok() in include/linux/net.h
The original problem was from nvme-over-tcp code, who mistakenly uses
kernel_sendpage() to send pages allocated by __get_free_pages() without
__GFP_COMP flag. Such pages don't have refcount (page_count is 0) on
tail pages, sending them by kernel_sendpage() may trigger a kernel panic
from a corrupted kernel heap, because these pages are incorrectly freed
in network stack as page_count 0 pages.
This patch introduces a helper sendpage_ok(), it returns true if the
checking page,
- is not slab page: PageSlab(page) is false.
- has page refcount: page_count(page) is not zero
All drivers who want to send page to remote end by kernel_sendpage()
may use this helper to check whether the page is OK. If the helper does
not return true, the driver should try other non sendpage method (e.g.
sock_no_sendpage()) to handle the page.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Petko Manolov [Fri, 2 Oct 2020 07:56:04 +0000 (10:56 +0300)]
net: usb: pegasus: Proper error handing when setting pegasus' MAC address
v2:
If reading the MAC address from eeprom fail don't throw an error, use randomly
generated MAC instead. Either way the adapter will soldier on and the return
type of set_ethernet_addr() can be reverted to void.
v1:
Fix a bug in set_ethernet_addr() which does not take into account possible
errors (or partial reads) returned by its helpers. This can potentially lead to
writing random data into device's MAC address registers.
Signed-off-by: Petko Manolov <petko.manolov@konsulko.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mauro Carvalho Chehab [Fri, 2 Oct 2020 05:49:45 +0000 (07:49 +0200)]
net: core: document two new elements of struct net_device
As warned by "make htmldocs", there are two new struct elements
that aren't documented:
../include/linux/netdevice.h:2159: warning: Function parameter or member 'unlink_list' not described in 'net_device'
../include/linux/netdevice.h:2159: warning: Function parameter or member 'nested_level' not described in 'net_device'
Fixes: 1fc70edb7d7b ("net: core: add nested_level variable in net_device")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Oct 2020 21:51:34 +0000 (14:51 -0700)]
Merge tag 'pinctrl-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some pin control fixes here. All of them are driver fixes, the Intel
Cherryview being the most interesting one.
- Fix a mux problem for I2C in the MVEBU driver.
- Fix a really hairy inversion problem in the Intel Cherryview
driver.
- Fix the register for the sdc2_clk in the Qualcomm SM8250 driver.
- Check the virtual GPIO boot failur in the Mediatek driver"
* tag 'pinctrl-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mediatek: check mtk_is_virt_gpio input parameter
pinctrl: qcom: sm8250: correct sdc2_clk
pinctrl: cherryview: Preserve CHV_PADCTRL1_INVRXTX_TXDATA flag on GPIOs
pinctrl: mvebu: Fix i2c sda definition for 98DX3236
Linus Torvalds [Fri, 2 Oct 2020 21:48:25 +0000 (14:48 -0700)]
Merge tag 'pci-v5.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix rockchip regression in rockchip_pcie_valid_device() (Lorenzo
Pieralisi)
- Add Pali Rohár as aardvark PCI maintainer (Pali Rohár)
* tag 'pci-v5.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Add Pali Rohár as aardvark PCI maintainer
PCI: rockchip: Fix bus checks in rockchip_pcie_valid_device()
Linus Torvalds [Fri, 2 Oct 2020 21:42:13 +0000 (14:42 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two patches in driver frameworks. The iscsi one corrects a bug induced
by a BPF change to network locking and the other is a regression we
introduced"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()
scsi: target: Fix lun lookup for TARGET_SCF_LOOKUP_LUN_FROM_TAG case
Linus Torvalds [Fri, 2 Oct 2020 21:38:10 +0000 (14:38 -0700)]
Merge tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- fix for async buffered reads if read-ahead is fully disabled (Hao)
- double poll match fix
- ->show_fdinfo() potential ABBA deadlock complaint fix
* tag 'io_uring-5.9-2020-10-02' of git://git.kernel.dk/linux-block:
io_uring: fix async buffered reads when readahead is disabled
io_uring: fix potential ABBA deadlock in ->show_fdinfo()
io_uring: always delete double poll wait entry on match
Linus Torvalds [Fri, 2 Oct 2020 21:34:52 +0000 (14:34 -0700)]
Merge tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Single fix for a ->commit_rqs failure case"
* tag 'block-5.9-2020-10-02' of git://git.kernel.dk/linux-block:
blk-mq: call commit_rqs while list empty but error happen
Heinrich Schuchardt [Fri, 2 Oct 2020 19:06:23 +0000 (21:06 +0200)]
Documentation/x86: Fix incorrect references to zero-page.txt
The file zero-page.txt does not exit. Add links to zero-page.rst
instead.
[ bp: Massage a bit. ]
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201002190623.7489-1-xypron.glpk@gmx.de
Johannes Berg [Fri, 2 Oct 2020 07:46:04 +0000 (09:46 +0200)]
netlink: fix policy dump leak
If userspace doesn't complete the policy dump, we leak the
allocated state. Fix this.
Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peilin Ye [Fri, 2 Oct 2020 14:22:23 +0000 (10:22 -0400)]
block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()
scsi_put_cdrom_generic_arg() is copying uninitialized stack memory to
userspace, since the compiler may leave a 3-byte hole in the middle of
`cgc32`. Fix it by adding a padding field to `struct
compat_cdrom_generic_command`.
Cc: stable@vger.kernel.org
Fixes: f3ee6e63a9df ("compat_ioctl: move CDROM_SEND_PACKET handling into scsi")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: syzbot+85433a479a646a064ab3@syzkaller.appspotmail.com
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vlad Buslov [Sun, 20 Sep 2020 16:59:08 +0000 (19:59 +0300)]
net/mlx5e: Fix race condition on nhe->n pointer in neigh update
Current neigh update event handler implementation takes reference to
neighbour structure, assigns it to nhe->n, tries to schedule workqueue task
and releases the reference if task was already enqueued. This results
potentially overwriting existing nhe->n pointer with another neighbour
instance, which causes double release of the instance (once in neigh update
handler that failed to enqueue to workqueue and another one in neigh update
workqueue task that processes updated nhe->n pointer instead of original
one):
[ 3376.512806] ------------[ cut here ]------------
[ 3376.513534] refcount_t: underflow; use-after-free.
[ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd
grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c
_core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw]
[ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6
[ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0
[ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
[ 3376.536017] RSP: 0018:
ffffc90002a97e30 EFLAGS:
00010286
[ 3376.536793] RAX:
0000000000000000 RBX:
ffff8882de30d648 RCX:
0000000000000000
[ 3376.537718] RDX:
ffff8882f5c28f20 RSI:
ffff8882f5c18e40 RDI:
ffff8882f5c18e40
[ 3376.538654] RBP:
ffff8882cdf56c00 R08:
000000000000c580 R09:
0000000000001a4d
[ 3376.539582] R10:
0000000000000731 R11:
ffffc90002a97ccd R12:
0000000000000000
[ 3376.540519] R13:
ffff8882de30d600 R14:
ffff8882de30d640 R15:
ffff88821e000900
[ 3376.541444] FS:
0000000000000000(0000) GS:
ffff8882f5c00000(0000) knlGS:
0000000000000000
[ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 3376.543545] CR2:
0000556e5504b248 CR3:
00000002c6f10005 CR4:
0000000000770ee0
[ 3376.544483] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 3376.545419] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 3376.546344] PKRU:
55555554
[ 3376.546911] Call Trace:
[ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core]
[ 3376.548299] process_one_work+0x1d8/0x390
[ 3376.548977] worker_thread+0x4d/0x3e0
[ 3376.549631] ? rescuer_thread+0x3e0/0x3e0
[ 3376.550295] kthread+0x118/0x130
[ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70
[ 3376.551675] ret_from_fork+0x1f/0x30
[ 3376.552312] ---[ end trace
d84e8f46d2a77eec ]---
Fix the bug by moving work_struct to dedicated dynamically-allocated
structure. This enabled every event handler to work on its own private
neighbour pointer and removes the need for handling the case when task is
already enqueued.
Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Sun, 13 Sep 2020 15:05:40 +0000 (18:05 +0300)]
net/mlx5e: Fix VLAN create flow
When interface is attached while in promiscuous mode and with VLAN
filtering turned off, both configurations are not respected and VLAN
filtering is performed.
There are 2 flows which add the any-vid rules during interface attach:
VLAN creation table and set rx mode. Each is relaying on the other to
add any-vid rules, eventually non of them does.
Fix this by adding any-vid rules on VLAN creation regardless of
promiscuous mode.
Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Sun, 13 Sep 2020 14:57:23 +0000 (17:57 +0300)]
net/mlx5e: Fix VLAN cleanup flow
Prior to this patch unloading an interface in promiscuous mode with RX
VLAN filtering feature turned off - resulted in a warning. This is due
to a wrong condition in the VLAN rules cleanup flow, which left the
any-vid rules in the VLAN steering table. These rules prevented
destroying the flow group and the flow table.
The any-vid rules are removed in 2 flows, but none of them remove it in
case both promiscuous is set and VLAN filtering is off. Fix the issue by
changing the condition of the VLAN table cleanup flow to clean also in
case of promiscuous mode.
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1
mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1
...
...
------------[ cut here ]------------
FW pages counter is 11560 after reclaiming all pages
WARNING: CPU: 1 PID: 28729 at
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660
mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
mlx5_function_teardown+0x2f/0x90 [mlx5_core]
mlx5_unload_one+0x71/0x110 [mlx5_core]
remove_one+0x44/0x80 [mlx5_core]
pci_device_remove+0x3e/0xc0
device_release_driver_internal+0xfb/0x1c0
device_release_driver+0x12/0x20
pci_stop_bus_device+0x68/0x90
pci_stop_and_remove_bus_device+0x12/0x20
hv_eject_device_work+0x6f/0x170 [pci_hyperv]
? __schedule+0x349/0x790
process_one_work+0x206/0x400
worker_thread+0x34/0x3f0
? process_one_work+0x400/0x400
kthread+0x126/0x140
? kthread_park+0x90/0x90
ret_from_fork+0x22/0x30
---[ end trace
6283bde8d26170dc ]---
Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Wed, 12 Aug 2020 07:44:36 +0000 (10:44 +0300)]
net/mlx5e: Fix return status when setting unsupported FEC mode
Verify the configured FEC mode is supported by at least a single link
mode before applying the command. Otherwise fail the command and return
"Operation not supported".
Prior to this patch, the command was successful, yet it falsely set all
link modes to FEC auto mode - like configuring FEC mode to auto. Auto
mode is the default configuration if a link mode doesn't support the
configured FEC mode.
Fixes: b5ede32d3329 ("net/mlx5e: Add support for FEC modes based on 50G per lane links")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Sun, 9 Aug 2020 09:34:21 +0000 (12:34 +0300)]
net/mlx5e: Fix driver's declaration to support GRE offload
Declare GRE offload support with respect to the inner protocol. Add a
list of supported inner protocols on which the driver can offload
checksum and GSO. For other protocols, inform the stack to do the needed
operations. There is no noticeable impact on GRE performance.
Fixes: 2729984149e6 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Tue, 1 Sep 2020 09:33:18 +0000 (12:33 +0300)]
net/mlx5e: CT, Fix coverity issue
The cited commit introduced the following coverity issue at function
mlx5_tc_ct_rule_to_tuple_nat:
- Memory - corruptions (OVERRUN)
Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte
elements at element index 7 (byte offset 31) using index
"ip6_offset" (which evaluates to 7).
In case of IPv6 destination address rewrite, ip6_offset values are
between 4 to 7, which will cause memory overrun of array
"tuple->ip.src_v6.in6_u.u6_addr32" to array
"tuple->ip.dst_v6.in6_u.u6_addr32".
Fixed by writing the value directly to array
"tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are
between 4 to 7.
Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Mon, 20 Jul 2020 13:53:18 +0000 (16:53 +0300)]
net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU
Prior to this fix, in Striding RQ mode the driver was vulnerable when
receiving packets in the range (stride size - headroom, stride size].
Where stride size is calculated by mtu+headroom+tailroom aligned to the
closest power of 2.
Usually, this filtering is performed by the HW, except for a few cases:
- Between 2 VFs over the same PF with different MTUs
- On bluefield, when the host physical function sets a larger MTU than
the ARM has configured on its representor and uplink representor.
When the HW filtering is not present, packets that are larger than MTU
might be harmful for the RQ's integrity, in the following impacts:
1) Overflow from one WQE to the next, causing a memory corruption that
in most cases is unharmful: as the write happens to the headroom of next
packet, which will be overwritten by build_skb(). In very rare cases,
high stress/load, this is harmful. When the next WQE is not yet reposted
and points to existing SKB head.
2) Each oversize packet overflows to the headroom of the next WQE. On
the last WQE of the WQ, where addresses wrap-around, the address of the
remainder headroom does not belong to the next WQE, but it is out of the
memory region range. This results in a HW CQE error that moves the RQ
into an error state.
Solution:
Add a page buffer at the end of each WQE to absorb the leak. Actually
the maximal overflow size is headroom but since all memory units must be
of the same size, we use page size to comply with UMR WQEs. The increase
in memory consumption is of a single page per RQ. Initialize the mkey
with all MTTs pointing to a default page. When the channels are
activated, UMR WQEs will redirect the RX WQEs to the actual memory from
the RQ's pool, while the overflow MTTs remain mapped to the default page.
Fixes: 73281b78a37a ("net/mlx5e: Derive Striding RQ size from MTU")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Aya Levin [Mon, 3 Aug 2020 13:22:42 +0000 (16:22 +0300)]
net/mlx5e: Fix error path for RQ alloc
Increase granularity of the error path to avoid unneeded free/release.
Fix the cleanup to be symmetric to the order of creation.
Fixes: 0ddf543226ac ("xdp/mlx5: setup xdp_rxq_info")
Fixes: 422d4c401edd ("net/mlx5e: RX, Split WQ objects for different RQ types")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Mon, 31 Aug 2020 18:37:31 +0000 (21:37 +0300)]
net/mlx5: Fix request_irqs error flow
Fix error flow handling in request_irqs which try to free irq
that we failed to request.
It fixes the below trace.
WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60
CPU: 1 PID: 7587 Comm: bash Tainted: G W OE 4.15.15-1.el7MELLANOXsmp-x86_64 #1
Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020
RIP: 0010:free_irq+0x4d/0x60
RSP: 0018:
ffffc9000ef47af0 EFLAGS:
00010282
RAX:
ffff88001476ae00 RBX:
0000000000000655 RCX:
0000000000000000
RDX:
ffff88001476ae00 RSI:
ffffc9000ef47ab8 RDI:
ffff8800398bb478
RBP:
ffff88001476a838 R08:
ffff88001476ae00 R09:
000000000000156d
R10:
0000000000000000 R11:
0000000000000004 R12:
ffff88001476a838
R13:
0000000000000006 R14:
ffff88001476a888 R15:
00000000ffffffe4
FS:
00007efeadd32740(0000) GS:
ffff88047fc40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fc9cc010008 CR3:
00000001a2380004 CR4:
00000000007606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
mlx5_irq_table_create+0x38d/0x400 [mlx5_core]
? atomic_notifier_chain_register+0x50/0x60
mlx5_load_one+0x7ee/0x1130 [mlx5_core]
init_one+0x4c9/0x650 [mlx5_core]
pci_device_probe+0xb8/0x120
driver_probe_device+0x2a1/0x470
? driver_allows_async_probing+0x30/0x30
bus_for_each_drv+0x54/0x80
__device_attach+0xa3/0x100
pci_bus_add_device+0x4a/0x90
pci_iov_add_virtfn+0x2dc/0x2f0
pci_enable_sriov+0x32e/0x420
mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core]
? kstrtoll+0x22/0x70
num_vf_store+0x4b/0x70 [mlx5_core]
kernfs_fop_write+0x102/0x180
__vfs_write+0x26/0x140
? rcu_all_qs+0x5/0x80
? _cond_resched+0x15/0x30
? __sb_start_write+0x41/0x80
vfs_write+0xad/0x1a0
SyS_write+0x42/0x90
do_syscall_64+0x60/0x110
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 24163189da48 ("net/mlx5: Separate IRQ request/free from EQ life cycle")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Fri, 11 Sep 2020 18:48:55 +0000 (11:48 -0700)]
net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible
In case of pci is offline reclaim_pages_cmd() will still try to call
the FW to release FW pages, cmd_exec() in this case will return a silent
success without actually calling the FW.
This is wrong and will cause page leaks, what we should do is to detect
pci offline or command interface un-available before tying to access the
FW and manually release the FW pages in the driver.
In this patch we share the code to check for FW command interface
availability and we call it in sensitive places e.g. reclaim_pages_cmd().
Alternative fix:
1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value,
command success simulation list.
2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd().
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eran Ben Elisha [Mon, 31 Aug 2020 12:04:35 +0000 (15:04 +0300)]
net/mlx5: Add retry mechanism to the command entry index allocation
It is possible that new command entry index allocation will temporarily
fail. The new command holds the semaphore, so it means that a free entry
should be ready soon. Add one second retry mechanism before returning an
error.
Patch "net/mlx5: Avoid possible free of command entry while timeout comp
handler" increase the possibility to bump into this temporarily failure
as it delays the entry index release for non-callback commands.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eran Ben Elisha [Tue, 21 Jul 2020 07:25:52 +0000 (10:25 +0300)]
net/mlx5: poll cmd EQ in case of command timeout
Once driver detects a command interface command timeout, it warns the
user and returns timeout error to the caller. In such case, the entry of
the command is not evacuated (because only real event interrupt is allowed
to clear command interface entry). If the HW event interrupt
of this entry will never arrive, this entry will be left unused forever.
Command interface entries are limited and eventually we can end up without
the ability to post a new command.
In addition, if driver will not consume the EQE of the lost interrupt and
rearm the EQ, no new interrupts will arrive for other commands.
Add a resiliency mechanism for manually polling the command EQ in case of
a command timeout. In case resiliency mechanism will find non-handled EQE,
it will consume it, and the command interface will be fully functional
again. Once the resiliency flow finished, wait another 5 seconds for the
command interface to complete for this command entry.
Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow.
Add an async EQ spinlock to avoid races between resiliency flows and real
interrupts that might run simultaneously.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eran Ben Elisha [Tue, 4 Aug 2020 07:40:21 +0000 (10:40 +0300)]
net/mlx5: Avoid possible free of command entry while timeout comp handler
Upon command completion timeout, driver simulates a forced command
completion. In a rare case where real interrupt for that command arrives
simultaneously, it might release the command entry while the forced
handler might still access it.
Fix that by adding an entry refcount, to track current amount of allowed
handlers. Command entry to be released only when this refcount is
decremented to zero.
Command refcount is always initialized to one. For callback commands,
command completion handler is the symmetric flow to decrement it. For
non-callback commands, it is wait_func().
Before ringing the doorbell, increment the refcount for the real completion
handler. Once the real completion handler is called, it will decrement it.
For callback commands, once the delayed work is scheduled, increment the
refcount. Upon callback command completion handler, we will try to cancel
the timeout callback. In case of success, we need to decrement the callback
refcount as it will never run.
In addition, gather the entry index free and the entry free into a one
flow for all command types release.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Eran Ben Elisha [Thu, 13 Aug 2020 13:55:20 +0000 (16:55 +0300)]
net/mlx5: Fix a race when moving command interface to polling mode
As part of driver unload, it destroys the commands EQ (via FW command).
As the commands EQ is destroyed, FW will not generate EQEs for any command
that driver sends afterwards. Driver should poll for later commands status.
Driver commands mode metadata is updated before the commands EQ is
actually destroyed. This can lead for double completion handle by the
driver (polling and interrupt), if a command is executed and completed by
FW after the mode was changed, but before the EQ was destroyed.
Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee
that only DESTROY_EQ command can be executed during this time period.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Linus Torvalds [Fri, 2 Oct 2020 17:37:08 +0000 (10:37 -0700)]
Merge branch 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
"Several race fixes in epoll"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ep_create_wakeup_source(): dentry name can change under you...
epoll: EPOLL_CTL_ADD: close the race in decision to take fast path
epoll: replace ->visited/visited_list with generation count
epoll: do not insert into poll queues until all sanity checks are done
Linus Torvalds [Fri, 2 Oct 2020 17:13:05 +0000 (10:13 -0700)]
Merge tag 'riscv-for-linus-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"Two fixes for this week:
- The addition of a symbol export for clint_time_val, which has been
inlined into some timex functions and can be used by drivers.
- A fix to avoid calling get_cycles() before the timers have been
probed.
These both only effect !MMU systems"
* tag 'riscv-for-linus-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Check clint_time_val before use
clocksource: clint: Export clint_time_val for modules
Linus Torvalds [Fri, 2 Oct 2020 17:09:40 +0000 (10:09 -0700)]
Merge tag 'for-5.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two more fixes.
One is for a lockdep warning/lockup (also caught by syzbot), that one
has been seen in practice. Regarding the other syzbot reports
mentioned last time, they don't seem to be urgent and reliably
reproducible so they'll be fixed later.
The second fix is for a potential corruption when device replace
finishes and the in-memory state of trim is not copied to the new
device"
* tag 'for-5.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix filesystem corruption after a device replace
btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks
btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing
Linus Torvalds [Fri, 2 Oct 2020 17:05:56 +0000 (10:05 -0700)]
Merge tag 'pm-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix one more issue related to the recent RCU-lockdep changes, a
typo in documentation and add a missing return statement to
intel_pstate.
Specifics:
- Fix up RCU usage for cpuidle on the ARM imx6q platform (Ulf
Hansson)
- Fix typo in the PM documentation (Yoann Congal)
- Add return statement that is missing after recent changes in the
intel_pstate driver (Zhang Rui)"
* tag 'pm-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ARM: imx6q: Fixup RCU usage for cpuidle
Documentation: PM: Fix a reStructuredText syntax error
cpufreq: intel_pstate: Fix missing return statement
Linus Torvalds [Fri, 2 Oct 2020 17:01:00 +0000 (10:01 -0700)]
Merge tag 'staging-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO fixes from Greg KH:
"Here are two small IIO driver fixes for 5.9-rc8 that resolve some
reported issues:
- driver name fixed in one driver
- device name typo fixed
Both have been in linux-next for a while with no reported problems"
* tag 'staging-5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: qcom-spmi-adc5: fix driver name
iio: adc: ad7124: Fix typo in device name
This page took 0.139641 seconds and 4 git commands to generate.