Thomas Gleixner [Wed, 17 Jan 2018 15:01:47 +0000 (16:01 +0100)]
irq/matrix: Spread interrupts on allocation
Keith reported an issue with vector space exhaustion on a server machine
which is caused by the i40e driver allocating 168 MSI interrupts when the
driver is initialized, even when most of these interrupts are not used at
all.
The x86 vector allocation code tries to avoid the immediate allocation with
the reservation mode, but the card uses MSI and does not support MSI entry
masking, which prevents reservation mode and requires immediate vector
allocation.
The matrix allocator is a bit naive and prefers the first CPU in the
cpumask which describes the possible target CPUs for an allocation. That
results in allocating all 168 vectors on CPU0 which later causes vector
space exhaustion when the NVMe driver tries to allocate managed interrupts
on each CPU for the per CPU queues.
Avoid this by finding the CPU which has the lowest vector allocation count
to spread out the non managed interrupt accross the possible target CPUs.
Yossi Kuperman [Wed, 17 Jan 2018 13:52:41 +0000 (15:52 +0200)]
xfrm: Add SA to hardware at the end of xfrm_state_construct()
Current code configures the hardware with a new SA before the state has been
fully initialized. During this time interval, an incoming ESP packet can cause
a crash due to a NULL dereference. More specifically, xfrm_input() considers
the packet as valid, and yet, anti-replay mechanism is not initialized.
Move hardware configuration to the end of xfrm_state_construct(), and mark
the state as valid once the SA is fully initialized.
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
If an invalid CANFD frame is received, from a driver or from a tun
interface, a Kernel warning is generated.
This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
If an invalid CAN frame is received, from a driver or from a tun
interface, a Kernel warning is generated.
This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.
Dave Airlie [Thu, 18 Jan 2018 03:30:22 +0000 (13:30 +1000)]
Merge tag 'drm-misc-fixes-2018-01-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Final 4.15 drm-misc pull:
Just 3 sun4i patches to fix clock computation/checks.
* tag 'drm-misc-fixes-2018-01-17' of git://anongit.freedesktop.org/drm/drm-misc:
drm/sun4i: hdmi: Add missing rate halving check in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate
Dave Airlie [Thu, 18 Jan 2018 03:29:24 +0000 (13:29 +1000)]
Merge branch 'vmwgfx-fixes-4.15' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Last minute fixes for vmwgfx.
One fix for a drm helper warning introduced in 4.15
One important fix for a longer standing memory corruption issue on older
hardware versions.
* 'vmwgfx-fixes-4.15' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: fix memory corruption with legacy/sou connectors
drm/vmwgfx: Fix a boot time warning
Daniel Borkmann [Thu, 18 Jan 2018 00:15:21 +0000 (01:15 +0100)]
bpf: mark dst unknown on inconsistent {s, u}bounds adjustments
syzkaller generated a BPF proglet and triggered a warning with
the following:
0: (b7) r0 = 0
1: (d5) if r0 s<= 0x0 goto pc+0
R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
2: (1f) r0 -= r1
R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
verifier internal error: known but bad sbounds
What happens is that in the first insn, r0's min/max value
are both 0 due to the immediate assignment, later in the jsle
test the bounds are updated for the min value in the false
path, meaning, they yield smin_val = 1, smax_val = 0, and when
ctx pointer is subtracted from r0, verifier bails out with the
internal error and throwing a WARN since smin_val != smax_val
for the known constant.
For min_val > max_val scenario it means that reg_set_min_max()
and reg_set_min_max_inv() (which both refine existing bounds)
demonstrated that such branch cannot be taken at runtime.
In above scenario for the case where it will be taken, the
existing [0, 0] bounds are kept intact. Meaning, the rejection
is not due to a verifier internal error, and therefore the
WARN() is not necessary either.
We could just reject such cases in adjust_{ptr,scalar}_min_max_vals()
when either known scalars have smin_val != smax_val or
umin_val != umax_val or any scalar reg with bounds
smin_val > smax_val or umin_val > umax_val. However, there
may be a small risk of breakage of buggy programs, so handle
this more gracefully and in adjust_{ptr,scalar}_min_max_vals()
just taint the dst reg as unknown scalar when we see ops with
such kind of src reg.
Daniel Borkmann [Wed, 17 Jan 2018 21:36:49 +0000 (22:36 +0100)]
bpf: fix cls_bpf on filter replace
Running the following sequence is currently broken:
# tc qdisc add dev foo clsact
# tc filter replace dev foo ingress prio 1 handle 1 bpf da obj bar.o
# tc filter replace dev foo ingress prio 1 handle 1 bpf da obj bar.o
RTNETLINK answers: Invalid argument
The normal expectation on kernel side is that the second command
succeeds replacing the existing program. However, what happens is
in cls_bpf_change(), we bail out with err in the second run in
cls_bpf_offload(). The EINVAL comes directly in cls_bpf_offload()
when comparing prog vs oldprog's gen_flags. In case of above
replace the new prog's gen_flags are 0, but the old ones are 8,
which means TCA_CLS_FLAGS_NOT_IN_HW is set (e.g. drivers not having
cls_bpf offload).
Fix 102740bd9436 ("cls_bpf: fix offload assumptions after callback
conversion") in the following way: gen_flags from user space passed
down via netlink cannot include status flags like TCA_CLS_FLAGS_IN_HW
or TCA_CLS_FLAGS_NOT_IN_HW as opposed to oldprog that we previously
loaded. Therefore, it doesn't make any sense to include them in the
gen_flags comparison with the new prog before we even attempt to
offload. Thus, lets fix this before 4.15 goes out.
Fixes: 102740bd9436 ("cls_bpf: fix offload assumptions after callback conversion") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Rex Chang [Tue, 16 Jan 2018 20:16:01 +0000 (15:16 -0500)]
Net: ethernet: ti: netcp: Fix inbound ping crash if MTU size is greater than 1500
In the receive queue for 4096 bytes fragments, the page address
set in the SW data0 field of the descriptor is not the one we got
when doing the reassembly in receive. The page structure was retrieved
from the wrong descriptor into SW data0 which is then causing a
page fault when UDP checksum is accessing data above 1500.
Sabrina Dubroca [Tue, 16 Jan 2018 15:04:28 +0000 (16:04 +0100)]
tls: reset crypto_info when do_tls_setsockopt_tx fails
The current code copies directly from userspace to ctx->crypto_send, but
doesn't always reinitialize it to 0 on failure. This causes any
subsequent attempt to use this setsockopt to fail because of the
TLS_CRYPTO_INFO_READY check, eventhough crypto_info is not actually
ready.
This should result in a correctly set up socket after the 3rd call, but
currently it does not:
Sabrina Dubroca [Tue, 16 Jan 2018 15:04:26 +0000 (16:04 +0100)]
tls: fix sw_ctx leak
During setsockopt(SOL_TCP, TLS_TX), if initialization of the software
context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't
reassign ctx->priv_ctx to NULL, so we can't even do another attempt to
set it up on the same socket, as it will fail with -EEXIST.
Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
nvme-pci: take sglist coalescing in dma_map_sg into account
Some iommu implementations can merge physically and/or virtually
contiguous segments inside sg_map_dma. The NVMe SGL support does not take
this into account and will warn because of falling off a loop. Pass the
number of mapped segments to nvme_pci_setup_sgls so that the SGL setup
can take the number of mapped segments into account.
Ilya Lesokhin [Tue, 16 Jan 2018 13:31:52 +0000 (15:31 +0200)]
net/tls: Only attach to sockets in ESTABLISHED state
Calling accept on a TCP socket with a TLS ulp attached results
in two sockets that share the same ulp context.
The ulp context is freed while a socket is destroyed, so
after one of the sockets is released, the second second will
trigger a use after free when it tries to access the ulp context
attached to it.
We restrict the TLS ulp to sockets in ESTABLISHED state
to prevent the scenario above.
This patch moves fs_timeout() actions into an async worker.
Fixes: commit 48257c4f168e5 ("Add fs_enet ethernet network driver, for several embedded platforms") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Kai-Heng Feng [Tue, 16 Jan 2018 08:46:27 +0000 (16:46 +0800)]
r8152: disable RX aggregation on Dell TB16 dock
r8153 on Dell TB15/16 dock corrupts rx packets.
This change is suggested by Realtek. They guess that the XHCI controller
doesn't have enough buffer, and their guesswork is correct, once the RX
aggregation gets disabled, the issue is gone.
ASMedia is currently working on a real sulotion for this issue.
Dell and ODM confirm the bcdDevice and iSerialNumber is unique for TB16.
Note that TB15 has different bcdDevice and iSerialNumber, which are not
unique values. If you still have TB15, please contact Dell to replace it
with TB16.
Linus Torvalds [Wed, 17 Jan 2018 20:30:06 +0000 (12:30 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- A rather involved set of memory hardware encryption fixes to
support the early loading of microcode files via the initrd. These
are larger than what we normally take at such a late -rc stage, but
there are two mitigating factors: 1) much of the changes are
limited to the SME code itself 2) being able to early load
microcode has increased importance in the post-Meltdown/Spectre
era.
- An IRQ vector allocator fix
- An Intel RDT driver use-after-free fix
- An APIC driver bug fix/revert to make certain older systems boot
again
- A pkeys ABI fix
- TSC calibration fixes
- A kdump fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/vector: Fix off by one in error path
x86/intel_rdt/cqm: Prevent use after free
x86/mm: Encrypt the initrd earlier for BSP microcode update
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
x86/mm: Use a struct to reduce parameters for SME PGD mapping
x86/mm: Clean up register saving in the __enc_copy() assembly code
x86/idt: Mark IDT tables __initconst
Revert "x86/apic: Remove init_bsp_APIC()"
x86/mm/pkeys: Fix fill_sig_info_pkey
x86/tsc: Print tsc_khz, when it differs from cpu_khz
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
x86/tsc: Future-proof native_calibrate_tsc()
kdump: Write the correct address of mem_section into vmcoreinfo
Linus Torvalds [Wed, 17 Jan 2018 20:24:42 +0000 (12:24 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two futex fixes: a input parameters robustness fix, and futex race
fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Prevent overflow by strengthen input validation
futex: Avoid violating the 10th rule of futex
Cong Wang [Mon, 15 Jan 2018 19:37:29 +0000 (11:37 -0800)]
tun: fix a memory leak for tfile->tx_array
tfile->tun could be detached before we close the tun fd,
via tun_detach_all(), so it should not be used to check for
tfile->tx_array.
As Jason suggested, we probably have to clean it up
unconditionally both in __tun_deatch() and tun_detach_all(),
but this requires to check if it is initialized or not.
Currently skb_array_cleanup() doesn't have such a check,
so I check it in the caller and introduce a helper function,
it is a bit ugly but we can always improve it in net-next.
Linus Torvalds [Wed, 17 Jan 2018 19:54:56 +0000 (11:54 -0800)]
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti bits and fixes from Thomas Gleixner:
"This last update contains:
- An objtool fix to prevent a segfault with the gold linker by
changing the invocation order. That's not just for gold, it's a
general robustness improvement.
- An improved error message for objtool which spares tearing hairs.
- Make KASAN fail loudly if there is not enough memory instead of
oopsing at some random place later
- RSB fill on context switch to prevent RSB underflow and speculation
through other units.
- Make the retpoline/RSB functionality work reliably for both Intel
and AMD
- Add retpoline to the module version magic so mismatch can be
detected
- A small (non-fix) update for cpufeatures which prevents cpu feature
clashing for the upcoming extra mitigation bits to ease
backporting"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
module: Add retpoline tag to VERMAGIC
x86/cpufeature: Move processor tracing out of scattered features
objtool: Improve error message for bad file argument
objtool: Fix seg fault with gold linker
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/kasan: Panic if there is not enough memory to boot
Russell King [Sat, 13 Jan 2018 12:11:26 +0000 (12:11 +0000)]
ARM: net: bpf: clarify tail_call index
As per 90caccdd8cc0 ("bpf: fix bpf_tail_call() x64 JIT"), the index used
for array lookup is defined to be 32-bit wide. Update a misleading
comment that suggests it is 64-bit wide.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <[email protected]>
Russell King [Sat, 13 Jan 2018 22:38:18 +0000 (22:38 +0000)]
ARM: net: bpf: fix register saving
When an eBPF program tail-calls another eBPF program, it enters it after
the prologue to avoid having complex stack manipulations. This can lead
to kernel oopses, and similar.
Resolve this by always using a fixed stack layout, a CPU register frame
pointer, and using this when reloading registers before returning.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <[email protected]>
Russell King [Sat, 13 Jan 2018 22:51:27 +0000 (22:51 +0000)]
ARM: net: bpf: correct stack layout documentation
The stack layout documentation incorrectly suggests that the BPF JIT
scratch space starts immediately below BPF_FP. This is not correct,
so let's fix the documentation to reflect reality.
Russell King [Sat, 13 Jan 2018 16:10:07 +0000 (16:10 +0000)]
ARM: net: bpf: fix stack alignment
As per 2dede2d8e925 ("ARM EABI: stack pointer must be 64-bit aligned
after a CPU exception") the stack should be aligned to a 64-bit boundary
on EABI systems. Ensure that the eBPF JIT appropraitely aligns the
stack.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <[email protected]>
The ARM assembler for the tail call in this case ends up branching to
instruction 14 instead of instruction 13, resulting in the BPF filter
returning a non-zero value:
For other sequences, the tail call could end up branching midway through
the following BPF instructions, or maybe off the end of the function,
leading to unknown behaviours.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <[email protected]>
Russell King [Sat, 13 Jan 2018 11:35:15 +0000 (11:35 +0000)]
ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUs
Avoid the 'bx' instruction on CPUs that have no support for Thumb and
thus do not implement this instruction by moving the generation of this
opcode to a separate function that selects between:
bx reg
and
mov pc, reg
according to the capabilities of the CPU.
Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Russell King <[email protected]>
Rob Clark [Wed, 17 Jan 2018 15:16:20 +0000 (10:16 -0500)]
drm/vmwgfx: fix memory corruption with legacy/sou connectors
It looks like in all cases 'struct vmw_connector_state' is used. But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed. Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.
i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
On a I2C_SMBUS_I2C_BLOCK_DATA read request, if data->block[0] is
greater than I2C_SMBUS_BLOCK_MAX + 1, the underlying I2C driver writes
data out of the msgbuf1 array boundary.
It is possible from a user application to run into that issue by
calling the I2C_SMBUS ioctl with data.block[0] greater than
I2C_SMBUS_BLOCK_MAX + 1.
This patch makes the code compliant with
Documentation/i2c/dev-interface by raising an error when the requested
size is larger than 32 bytes.
Lixin Wang [Mon, 27 Nov 2017 07:06:55 +0000 (15:06 +0800)]
i2c: core: decrease reference count of device node in i2c_unregister_device
Reference count of device node was increased in of_i2c_register_device,
but without decreasing it in i2c_unregister_device. Then the added
device node will never be released. Fix this by adding the of_node_put.
Ondrej Kozina [Fri, 12 Jan 2018 15:30:32 +0000 (16:30 +0100)]
dm crypt: wipe kernel key copy after IV initialization
Loading key via kernel keyring service erases the internal
key copy immediately after we pass it in crypto layer. This is
wrong because IV is initialized later and we use wrong key
for the initialization (instead of real key there's just zeroed
block).
The bug may cause data corruption if key is loaded via kernel keyring
service first and later same crypt device is reactivated using exactly
same key in hexbyte representation, or vice versa. The bug (and fix)
affects only ciphers using following IVs: essiv, lmk and tcw.
Mikulas Patocka [Wed, 10 Jan 2018 14:32:47 +0000 (09:32 -0500)]
dm integrity: don't store cipher request on the stack
Some asynchronous cipher implementations may use DMA. The stack may
be mapped in the vmalloc area that doesn't support DMA. Therefore,
the cipher request and initialization vector shouldn't be on the
stack.
Fix this by allocating the request and iv with kmalloc.
Milan Broz [Wed, 3 Jan 2018 21:48:59 +0000 (22:48 +0100)]
dm crypt: fix crash by adding missing check for auth key size
If dm-crypt uses authenticated mode with separate MAC, there are two
concatenated part of the key structure - key(s) for encryption and
authentication key.
Add a missing check for authenticated key length. If this key length is
smaller than actually provided key, dm-crypt now properly fails instead
of crashing.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)") Cc: [email protected] # 4.12+ Reported-by: Salah Coronya <[email protected]> Signed-off-by: Milan Broz <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
Joe Thornber [Wed, 20 Dec 2017 09:56:06 +0000 (09:56 +0000)]
dm btree: fix serious bug in btree_split_beneath()
When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:
i) space for a new entry
ii) adjusting the first key entry if the new key is lower than any in the node.
If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them. The root node is
left with 2 entries corresponding to the 2 new nodes.
btree_split_beneath() then adjusts the spine to point to one of the two new
children. This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out. This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.
This is a serious bug, and quite hard to make trigger in normal use. A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593
Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine. Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.
Dennis Yang [Tue, 12 Dec 2017 10:21:40 +0000 (18:21 +0800)]
dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
For btree removal, there is a corner case that a single thread
could takes 6 locks which is more than THIN_MAX_CONCURRENT_LOCKS(5)
and leads to deadlock.
A btree removal might eventually call
rebalance_children()->rebalance3() to rebalance entries of three
neighbor child nodes when shadow_spine has already acquired two
write locks. In rebalance3(), it tries to shadow and acquire the
write locks of all three child nodes. However, shadowing a child
node requires acquiring a read lock of the original child node and
a write lock of the new block. Although the read lock will be
released after block shadowing, shadowing the third child node
in rebalance3() could still take the sixth lock.
(2 write locks for shadow_spine +
2 write locks for the first two child nodes's shadow +
1 write lock for the last child node's shadow +
1 read lock for the last child node)
Radim Krčmář [Wed, 17 Jan 2018 13:59:27 +0000 (14:59 +0100)]
Merge tag 'kvm-arm-fixes-for-v4.15-3-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Fixes for v4.15, Round 3 (v2)
Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
the contigious hint for hugetlbfs; supporting an alternative GICv4 init
sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
Michael Ellerman [Tue, 16 Jan 2018 10:20:05 +0000 (21:20 +1100)]
powerpc/64s: Wire up cpu_show_meltdown()
The recent commit 87590ce6e373 ("sysfs/cpu: Add vulnerability folder")
added a generic folder and set of files for reporting information on
CPU vulnerabilities. One of those was for meltdown:
/sys/devices/system/cpu/vulnerabilities/meltdown
This commit wires up that file for 64-bit Book3S powerpc.
For now we default to "Vulnerable" unless the RFI flush is enabled.
That may not actually be true on all hardware, further patches will
refine the reporting based on the CPU/platform etc. But for now we
default to being pessimists.
The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.
Thomas Gleixner [Tue, 16 Jan 2018 18:59:59 +0000 (19:59 +0100)]
x86/intel_rdt/cqm: Prevent use after free
intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.
BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510
Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510
Andi Kleen [Tue, 16 Jan 2018 20:52:28 +0000 (12:52 -0800)]
module: Add retpoline tag to VERMAGIC
Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.
It doesn't handle the case when retpoline has been runtime disabled. Even
in this case the match of the retcompile status will be enforced. This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.
Woody Suwalski [Wed, 17 Jan 2018 08:07:47 +0000 (09:07 +0100)]
drm/vmwgfx: Fix a boot time warning
The 4.15 vmwgfx driver shows a warning during boot.
It is caused by a mismatch between the result of vmw_enable_vblank()
and what the drm_atomic_helper expects.
Paolo Bonzini [Tue, 16 Jan 2018 15:42:25 +0000 (16:42 +0100)]
x86/cpufeature: Move processor tracing out of scattered features
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
so do not duplicate it in the scattered features word.
Besides being more tidy, this will be useful for KVM when it presents
processor tracing to the guests. KVM selects host features that are
supported by both the host kernel (depending on command line options,
CPU errata, or whatever) and KVM. Whenever a full feature word exists,
KVM's code is written in the expectation that the CPUID bit number
matches the X86_FEATURE_* bit number, but this is not the case for
X86_FEATURE_INTEL_PT.
Michael Cree [Wed, 3 Jan 2018 08:58:00 +0000 (21:58 +1300)]
alpha: extend memset16 to EV6 optimised routines
Commit 92ce4c3ea7c4, "alpha: add support for memset16", renamed
the function memsetw() to be memset16() but neglected to do this for
the EV6 optimised version, thus when building a kernel optimised
for EV6 (or later) link errors result. This extends the memset16
support to EV6.
Linus Torvalds [Wed, 17 Jan 2018 00:47:40 +0000 (16:47 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"We had a few more items creep up over the last week. Given we are in
-rc8, these are obviously limited to bugs that have a big downside and
for which we are certain of the fix.
The first is a straight up oops bug that all you have to do is read
the code to see it's a guaranteed 100% oops bug.
The second is a use-after-free issue. We get away lucky if the queue
we are shutting down is empty, but if it isn't, we can end up oopsing.
We really need to drain the queue before destroying it.
The final one is an issue with bad user input causing us to access our
port array out of bounds. While fixing the array out of bounds issue,
it was noticed that the original code did the same thing twice (the
call to rdma_ah_set_port_num()), so its removal is not balanced by a
readd elsewhere, it was already where it needed to be in addition to
where it didn't need to be.
Summary:
- Oops fix in hfi1 driver
- use-after-free issue in iser-target
- use of user supplied array index without proper checking"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Fix out-of-bound access while querying AH
IB/hfi1: Prevent a NULL dereference
iser-target: Fix possible use-after-free in connection establishment error
Daniel Borkmann [Tue, 16 Jan 2018 22:30:10 +0000 (23:30 +0100)]
bpf: reject stores into ctx via st and xadd
Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.
The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.
Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
Linus Walleij [Tue, 16 Jan 2018 08:51:51 +0000 (09:51 +0100)]
gpio: mmio: Also read bits that are zero
The code for .get_multiple() has bugs:
1. The simple .get_multiple() just reads a register, masks it
and sets the return value. This is not correct: we only want to
assign values (whether 0 or 1) to the bits that are set in the
mask. Fix this by using &= ~mask to clear all bits in the mask
and then |= val & mask to set the corresponding bits from the
read.
2. The bgpio_get_multiple_be() call has a similar problem: it
uses the |= operator to set the bits, so only the bits in the
mask are affected, but it misses to clear all returned bits
from the mask initially, so some bits will be returned
erroneously set to 1.
3. The bgpio_get_set_multiple() again fails to clear the bits
from the mask.
4. find_next_bit() wasn't handled correctly, use a totally
different approach for one function and change the other
function to follow the design pattern of assigning the first
bit to -1, then use bit + 1 in the for loop and < num_iterations
as break condition.
1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers.
2) Memory leak in key_notify_policy(), from Steffen Klassert.
3) Fix overflow with bpf arrays, from Daniel Borkmann.
4) Fix RDMA regression with mlx5 due to mlx5 no longer using
pci_irq_get_affinity(), from Saeed Mahameed.
5) Missing RCU read locking in nl80211_send_iface() when it calls
ieee80211_bss_get_ie(), from Dominik Brodowski.
6) cfg80211 should check dev_set_name()'s return value, from Johannes
Berg.
7) Missing module license tag in 9p protocol, from Stephen Hemminger.
8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike
Maloney.
9) Fix endless loop in netlink extack code, from David Ahern.
10) TLS socket layer sets inverted error codes, resulting in an endless
loop. From Robert Hering.
11) Revert openvswitch erspan tunnel support, it's mis-designed and we
need to kill it before it goes into a real release. From William Tu.
12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
net, sched: fix panic when updating miniq {b,q}stats
qed: Fix potential use-after-free in qed_spq_post()
nfp: use the correct index for link speed table
lan78xx: Fix failure in USB Full Speed
sctp: do not allow the v4 socket to bind a v4mapped v6 address
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
ibmvnic: Fix pending MAC address changes
netlink: extack: avoid parenthesized string constant warning
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
net: Allow neigh contructor functions ability to modify the primary_key
sh_eth: fix dumping ARSTR
Revert "openvswitch: Add erspan tunnel support."
net/tls: Fix inverted error codes to avoid endless loop
ipv6: ip6_make_skb() needs to clear cork.base.dst
sctp: avoid compiler warning on implicit fallthru
net: ipv4: Make "ip route get" match iif lo rules again.
netlink: extack needs to be reset each time through loop
tipc: fix a memory leak in tipc_nl_node_get_link()
ipv6: fix udpv6 sendmsg crash caused by too small MTU
...
Linus Torvalds [Tue, 16 Jan 2018 20:09:36 +0000 (12:09 -0800)]
Merge tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Bring back context level recursive protection in ring buffer.
The simpler counter protection failed, due to a path when tracing
with trace_clock_global() as it could not be reentrant and depended
on the ring buffer recursive protection to keep that from happening.
- Prevent branch profiling when FORTIFY_SOURCE is enabled.
It causes 50 - 60 MB in warning messages. Branch profiling should
never be run on production systems, so there's no reason that it
needs to be enabled with FORTIFY_SOURCE.
* tag 'trace-v4.15-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
ring-buffer: Bring back context level recursive checks
Daniel Borkmann [Mon, 15 Jan 2018 22:12:09 +0000 (23:12 +0100)]
net, sched: fix panic when updating miniq {b,q}stats
While working on fixing another bug, I ran into the following panic
on arm64 by simply attaching clsact qdisc, adding a filter and running
traffic on ingress to it:
Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init()
which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc.
Problem is that this cannot work since they are actually set up right after
the qdisc ->init() callback in qdisc_create(), so first packet going into
sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we
therefore panic.
In order to fix this, allocation of {b,q}stats needs to happen before we
call into ->init(). In net-next, there's already such option through commit d59f5ffa59d8 ("net: sched: a dflt qdisc may be used with per cpu stats").
However, the bug needs to be fixed in net still for 4.15. Thus, include
these bits to reduce any merge churn and reuse the static_flags field to
set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since
there is no other user left. Prashant Bhole ran into the same issue but
for net-next, thus adding him below as well as co-author. Same issue was
also reported by Sandipan Das when using bcc.
Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath")
Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html Reported-by: Sandipan Das <[email protected]> Co-authored-by: Prashant Bhole <[email protected]> Co-authored-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Roland Dreier [Mon, 15 Jan 2018 20:24:49 +0000 (12:24 -0800)]
qed: Fix potential use-after-free in qed_spq_post()
We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
calling qed_spq_add_entry(). The test is fine is the mode is EBLOCK,
but if it isn't then qed_spq_add_entry() might kfree(p_ent).
Jakub Kicinski [Mon, 15 Jan 2018 19:47:53 +0000 (11:47 -0800)]
nfp: use the correct index for link speed table
sts variable is holding link speed as well as state. We should
be using ls to index into ls_to_ethtool.
Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Yuiko Oshino [Mon, 15 Jan 2018 18:24:28 +0000 (13:24 -0500)]
lan78xx: Fix failure in USB Full Speed
Fix initialize the uninitialized tx_qlen to an appropriate value when USB
Full Speed is used.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Yuiko Oshino <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Daniel Borkmann [Tue, 16 Jan 2018 02:46:08 +0000 (03:46 +0100)]
bpf, arm64: fix stack_depth tracking in combination with tail calls
Using dynamic stack_depth tracking in arm64 JIT is currently broken in
combination with tail calls. In prologue, we cache ctx->stack_size and
adjust SP reg for setting up function call stack, and tearing it down
again in epilogue. Problem is that when doing a tail call, the cached
ctx->stack_size might not be the same.
One way to fix the problem with minimal overhead is to re-adjust SP in
emit_bpf_tail_call() and properly adjust it to the current program's
ctx->stack_size. Tested on Cavium ThunderX ARMv8.
Fixes: f1c9eed7f437 ("bpf, arm64: take advantage of stack_depth tracking") Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
Xin Long [Mon, 15 Jan 2018 09:02:00 +0000 (17:02 +0800)]
sctp: do not allow the v4 socket to bind a v4mapped v6 address
The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.
The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.
This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.
Xin Long [Mon, 15 Jan 2018 09:01:36 +0000 (17:01 +0800)]
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.
However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.
This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.
Xin Long [Mon, 15 Jan 2018 09:01:19 +0000 (17:01 +0800)]
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
After introducing sctp_stream structure, sctp uses stream->outcnt as the
out stream nums instead of c.sinit_num_ostreams.
However when users use sinit in cmsg, it only updates c.sinit_num_ostreams
in sctp_sendmsg. At that moment, stream->outcnt is still using previous
value. If it's value is not updated, the sinit_num_ostreams of sinit could
not really work.
This patch is to fix it by updating stream->outcnt and reiniting stream
if stream outcnt has been change by sinit in sendmsg.
Wright Feng [Tue, 16 Jan 2018 09:26:50 +0000 (17:26 +0800)]
brcmfmac: fix CLM load error for legacy chips when user helper is enabled
For legacy chips without CLM blob files, kernel with user helper function
returns -EAGAIN when we request_firmware(), and then driver got failed
when bringing up legacy chips. We expect the CLM blob file for legacy chip
is not existence in firmware path, but the -ENOENT error is transferred to
-EAGAIN in firmware_class.c with user helper.
Because of that, we continue with CLM data currently present in firmware
if getting error from doing request_firmware().
James Hogan [Mon, 15 Jan 2018 21:17:14 +0000 (21:17 +0000)]
ssb: Disable PCI host for PCI_DRIVERS_GENERIC
Since commit d41e6858ba58 ("MIPS: Kconfig: Set default MIPS system type
as generic") changed the default MIPS platform to the "generic"
platform, which uses PCI_DRIVERS_GENERIC instead of PCI_DRIVERS_LEGACY,
various files in drivers/ssb/ have failed to build.
This is particularly due to the existence of struct pci_controller being
dependent on PCI_DRIVERS_LEGACY since commit c5611df96804 ("MIPS: PCI:
Introduce CONFIG_PCI_DRIVERS_LEGACY"), so add that dependency to Kconfig
to prevent these files being built for the "generic" platform including
all{yes,mod}config builds.
Guenter Roeck [Sun, 14 Jan 2018 21:34:02 +0000 (13:34 -0800)]
bcma: Fix 'allmodconfig' and BCMA builds on MIPS targets
Mips builds with BCMA host mode enabled fail in mainline and -next
with:
In file included from include/linux/bcma/bcma.h:10:0,
from drivers/bcma/bcma_private.h:9,
from drivers/bcma/main.c:8:
include/linux/bcma/bcma_driver_pci.h:218:24: error:
field 'pci_controller' has incomplete type
Bisect points to commit d41e6858ba58c ("MIPS: Kconfig: Set default MIPS
system type as generic") as the culprit. Analysis shows that the commmit
changes PCI configuration and enables PCI_DRIVERS_GENERIC. This in turn
disables PCI_DRIVERS_LEGACY. 'struct pci_controller' is, however, only
defined if PCI_DRIVERS_LEGACY is enabled.
Ultimately that means that BCMA_DRIVER_PCI_HOSTMODE depends on
PCI_DRIVERS_LEGACY. Add the missing dependency.
Fixes: fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") Reported-by: Tom St Denis <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
Marc Zyngier [Tue, 16 Jan 2018 10:23:47 +0000 (10:23 +0000)]
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
KVM doesn't follow the SMCCC when it comes to unimplemented calls,
and inject an UNDEF instead of returning an error. Since firmware
calls are now used for security mitigation, they are becoming more
common, and the undef is counter productive.
Instead, let's follow the SMCCC which states that -1 must be returned
to the caller when getting an unknown function number.
Thomas Falcon [Thu, 11 Jan 2018 01:39:52 +0000 (19:39 -0600)]
ibmvnic: Fix pending MAC address changes
Due to architecture limitations, the IBM VNIC client driver is unable
to perform MAC address changes unless the device has "logged in" to
its backing device. Currently, pending MAC changes are handled before
login, resulting in an error and failure to change the MAC address.
Moving that chunk to the end of the ibmvnic_login function, when we are
sure that it was successful, fixes that.
The MAC address can be changed when the device is up or down, so
only check if the device is in a "PROBED" state before setting the
MAC address.
Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters") Signed-off-by: Thomas Falcon <[email protected]> Reviewed-by: John Allen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
can: peak: fix potential bug in packet fragmentation
In some rare conditions when running one PEAK USB-FD interface over
a non high-speed USB controller, one useless USB fragment might be sent.
This patch fixes the way a USB command is fragmented when its length is
greater than 64 bytes and when the underlying USB controller is not a
high-speed one.
Josh Snyder [Mon, 18 Dec 2017 16:15:10 +0000 (16:15 +0000)]
delayacct: Account blkio completion on the correct task
Before commit:
e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
delayacct_blkio_end() was called after context-switching into the task which
completed I/O.
This resulted in double counting: the task would account a delay both waiting
for I/O and for time spent in the runqueue.
With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
In ttwu, we have not yet context-switched. This is more correct, in that
the delay accounting ends when the I/O is complete.
But delayacct_blkio_end() relies on 'get_current()', and we have not yet
context-switched into the task whose I/O completed. This results in the
wrong task having its delay accounting statistics updated.
Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
so that it can update the statistics of the correct task.
Tom Lendacky [Wed, 10 Jan 2018 19:26:34 +0000 (13:26 -0600)]
x86/mm: Encrypt the initrd earlier for BSP microcode update
Currently the BSP microcode update code examines the initrd very early
in the boot process. If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet. Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.
Tom Lendacky [Wed, 10 Jan 2018 19:26:26 +0000 (13:26 -0600)]
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
In preparation for encrypting more than just the kernel, the encryption
support in sme_encrypt_kernel() needs to support 4KB page aligned
encryption instead of just 2MB large page aligned encryption.
Update the routines that populate the PGD to support non-2MB aligned
addresses. This is done by creating PTE page tables for the start
and end portion of the address range that fall outside of the 2MB
alignment. This results in, at most, two extra pages to hold the
PTE entries for each mapping of a range.
Tom Lendacky [Wed, 10 Jan 2018 19:26:16 +0000 (13:26 -0600)]
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
In preparation for encrypting more than just the kernel during early
boot processing, centralize the use of the PMD flag settings based
on the type of mapping desired. When 4KB aligned encryption is added,
this will allow either PTE flags or large page PMD flags to be used
without requiring the caller to adjust.
Tom Lendacky [Wed, 10 Jan 2018 19:26:05 +0000 (13:26 -0600)]
x86/mm: Use a struct to reduce parameters for SME PGD mapping
In preparation for follow-on patches, combine the PGD mapping parameters
into a struct to reduce the number of function arguments and allow for
direct updating of the next pagetable mapping area pointer.
Tom Lendacky [Wed, 10 Jan 2018 19:25:56 +0000 (13:25 -0600)]
x86/mm: Clean up register saving in the __enc_copy() assembly code
Clean up the use of PUSH and POP and when registers are saved in the
__enc_copy() assembly function in order to improve the readability of the code.
Move parameter register saving into general purpose registers earlier
in the code and move all the pushes to the beginning of the function
with corresponding pops at the end.
Josh Poimboeuf [Mon, 15 Jan 2018 14:17:07 +0000 (08:17 -0600)]
objtool: Fix seg fault with gold linker
Objtool segfaults when the gold linker is used with
CONFIG_MODVERSIONS=y and CONFIG_UNWINDER_ORC=y.
With CONFIG_MODVERSIONS=y, the .o file gets passed to the linker before
being passed to objtool. The gold linker seems to strip unused ELF
symbols by default, which confuses objtool and causes the seg fault when
it's trying to generate ORC metadata.
Objtool should really be running immediately after GCC anyway, without a
linker call in between. Change the makefile ordering so that objtool is
called before the linker.
Leon Romanovsky [Fri, 12 Jan 2018 05:58:39 +0000 (07:58 +0200)]
RDMA/mlx5: Fix out-of-bound access while querying AH
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.
==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409
Memory state around the buggy address: ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
in newer versions of gcc:
warning: array initialized from parenthesized string constant
Just remove the parentheses, they're not needed in this context since
anyway since there can be no operator precendence issues or similar.
====================
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
This used to be the previous behavior in older kernels but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path)
and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix neigh
lookup keying over loopback/point-to-point devices)
Not having this results in there being an arp entry for every remote ip
address that the device talks to. Given a fairly active device it can
cause the arp table to become huge and/or having to add/purge large number
of entires to keep within table size thresholds.
$ ip -4 neigh show nud noarp | grep tun | wc -l
55850
Jim Westfall [Sun, 14 Jan 2018 12:18:51 +0000 (04:18 -0800)]
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.
Sergei Shtylyov [Sat, 13 Jan 2018 17:22:01 +0000 (20:22 +0300)]
sh_eth: fix dumping ARSTR
ARSTR is always located at the start of the TSU register region, thus
using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR. Use the correct condition/macro there...
Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations") Signed-off-by: Sergei Shtylyov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field. Thus, this
patch reverts it and later patch will redo it using nested attr.
net/tls: Fix inverted error codes to avoid endless loop
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().
Eric Dumazet [Fri, 12 Jan 2018 06:31:18 +0000 (22:31 -0800)]
ipv6: ip6_make_skb() needs to clear cork.base.dst
In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :
If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.
Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Randy Dunlap [Mon, 15 Jan 2018 19:07:27 +0000 (11:07 -0800)]
tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y
I regularly get 50 MB - 60 MB files during kernel randconfig builds.
These large files mostly contain (many repeats of; e.g., 124,594):
In file included from ../include/linux/string.h:6:0,
from ../include/linux/uuid.h:20,
from ../include/linux/mod_devicetable.h:13,
from ../scripts/mod/devicetable-offsets.c:3:
../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
______f = { \
^
../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
^
../include/linux/string.h:425:2: note: in expansion of macro 'if'
if (p_size == (size_t)-1 && q_size == (size_t)-1)
^
This only happens when CONFIG_FORTIFY_SOURCE=y and
CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
FORTIFY_SOURCE=y.
Lorenzo Colitti [Thu, 11 Jan 2018 09:36:26 +0000 (18:36 +0900)]
net: ipv4: Make "ip route get" match iif lo rules again.
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.
Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.
inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again Signed-off-by: Lorenzo Colitti <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Ahern [Wed, 10 Jan 2018 21:00:39 +0000 (13:00 -0800)]
netlink: extack needs to be reset each time through loop
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.
Mike Maloney [Wed, 10 Jan 2018 17:45:10 +0000 (12:45 -0500)]
ipv6: fix udpv6 sendmsg crash caused by too small MTU
The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers. A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU. For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.
Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.
Guillaume Nault [Wed, 10 Jan 2018 15:24:45 +0000 (16:24 +0100)]
ppp: unlock all_ppp_mutex before registering device
ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().
Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.
However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Reported-and-tested-by: [email protected] Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>