Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Two genirq fixes, plus an irqchip driver error handling fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
irqchip/irq-ls1x: Missing error code in ls1x_intc_of_init()
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
bug"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add rewind_stack_do_exit() to the noreturn list
linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
David S. Miller [Fri, 12 Apr 2019 23:57:23 +0000 (16:57 -0700)]
Merge branch 'rxrpc-fixes'
David Howells says:
====================
rxrpc: Fixes
Here is a collection of fixes for rxrpc:
(1) rxrpc_error_report() needs to call sock_error() to clear the error
code from the UDP transport socket, lest it be unexpectedly revisited
on the next kernel_sendmsg() call. This has been causing all sorts of
weird effects in AFS as the effects have typically been felt by the
wrong RxRPC call.
(2) Allow a kernel user of AF_RXRPC to easily detect if an rxrpc call has
completed.
(3) Allow errors incurred by attempting to transmit data through the UDP
socket to get back up the stack to AFS.
(4) Make AFS use (2) to abort the synchronous-mode call waiting loop if
the rxrpc-level call completed.
(5) Add a missing tracepoint case for tracing abort reception.
(6) Fix detection and handling of out-of-order ACKs.
The rxrpc packet serial number cannot be safely used to compute out of
order ack packets for several reasons:
1. The allocation of serial numbers cannot be assumed to imply the order
by which acks are populated and transmitted. In some rxrpc
implementations, delayed acks and ping acks are transmitted
asynchronously to the receipt of data packets and so may be transmitted
out of order. As a result, they can race with idle acks.
2. Serial numbers are allocated by the rxrpc connection and not the call
and as such may wrap independently if multiple channels are in use.
In any case, what matters is whether the ack packet provides new
information relating to the bounds of the window (the firstPacket and
previousPacket in the ACK data).
Fix this by discarding packets that appear to wind back the window bounds
rather than on serial number procession.
Fixes: 298bc15b2079 ("rxrpc: Only take the rwind and mtu values from latest ACK") Signed-off-by: Jeffrey Altman <[email protected]> Signed-off-by: David Howells <[email protected]> Tested-by: Marc Dionne <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Howells [Fri, 12 Apr 2019 15:34:09 +0000 (16:34 +0100)]
rxrpc: Trace received connection aborts
Trace received calls that are aborted due to a connection abort, typically
because of authentication failure. Without this, connection aborts don't
show up in the trace log.
Marc Dionne [Fri, 12 Apr 2019 15:34:02 +0000 (16:34 +0100)]
afs: Check for rxrpc call completion in wait loop
Check the state of the rxrpc call backing an afs call in each iteration of
the call wait loop in case the rxrpc call has already been terminated at
the rxrpc layer.
Interrupt the wait loop and mark the afs call as complete if the rxrpc
layer call is complete.
There were cases where rxrpc errors were not passed up to afs, which could
result in this loop waiting forever for an afs call to transition to
AFS_CALL_COMPLETE while the rx call was already complete.
Marc Dionne [Fri, 12 Apr 2019 15:33:54 +0000 (16:33 +0100)]
rxrpc: Allow errors to be returned from rxrpc_queue_packet()
Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.
This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.
Marc Dionne [Fri, 12 Apr 2019 15:33:40 +0000 (16:33 +0100)]
rxrpc: Clear socket error
When an ICMP or ICMPV6 error is received, the error will be attached
to the socket (sk_err) and the report function will get called.
Clear any pending error here by calling sock_error().
This would cause the following attempt to use the socket to fail with
the error code stored by the ICMP error, resulting in unexpected errors
with various side effects depending on the context.
Colin Ian King [Fri, 12 Apr 2019 14:13:27 +0000 (15:13 +0100)]
qede: fix write to free'd pointer error and double free of ptp
The err2 error return path calls qede_ptp_disable that cleans up
on an error and frees ptp. After this, the free'd ptp is dereferenced
when ptp->clock is set to NULL and the code falls-through to error
path err1 that frees ptp again.
Fix this by calling qede_ptp_disable and exiting via an error
return path that does not set ptp->clock or kfree ptp.
Addresses-Coverity: ("Write to pointer after free") Fixes: 035744975aec ("qede: Add support for PTP resource locking.") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Colin Ian King [Fri, 12 Apr 2019 13:45:12 +0000 (14:45 +0100)]
vxge: fix return of a free'd memblock on a failed dma mapping
Currently if a pci dma mapping failure is detected a free'd
memblock address is returned rather than a NULL (that indicates
an error). Fix this by ensuring NULL is returned on this error case.
Addresses-Coverity: ("Use after free") Fixes: 528f727279ae ("vxge: code cleanup and reorganization") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
mt76x02: avoid status_list.lock and sta->rate_ctrl_lock dependency
Move ieee80211_tx_status_ext() outside of status_list lock section
in order to avoid locking dependency and possible deadlock reposed by
LOCKDEP in below warning.
Also do mt76_tx_status_lock() just before it's needed.
[ 440.224832] WARNING: possible circular locking dependency detected
[ 440.224833] 5.1.0-rc2+ #22 Not tainted
[ 440.224834] ------------------------------------------------------
[ 440.224835] kworker/u16:28/2362 is trying to acquire lock:
[ 440.224836] 0000000089b8cacf (&(&q->lock)->rlock#2){+.-.}, at: mt76_wake_tx_queue+0x4c/0xb0 [mt76]
[ 440.224842]
but task is already holding lock:
[ 440.224842] 000000002cfedc59 (&(&sta->lock)->rlock){+.-.}, at: ieee80211_stop_tx_ba_cb+0x32/0x1f0 [mac80211]
[ 440.224863]
which lock already depends on the new lock.
Felix Fietkau [Tue, 26 Mar 2019 08:34:20 +0000 (09:34 +0100)]
mt76: mt7603: fix sequence number assignment
If the MT_TXD3_SN_VALID flag is not set in the tx descriptor, the hardware
assigns the sequence number. However, the rest of the code assumes that the
sequence number specified in the 802.11 header gets transmitted.
This was causing issues with the aggregation setup, which worked for the
initial one (where the sequence numbers were still close), but not for
further teardown/re-establishing of sessions.
Additionally, the overwrite of the TID sequence number in WTBL2 was resetting
the hardware assigned sequence numbers, causing them to drift further apart.
Fix this by using the software assigned sequence numbers
udpv6: Check address length before reading address family
KMSAN will complain if valid address length passed to udpv6_pre_connect()
is shorter than sizeof("struct sockaddr"->sa_family) bytes.
(This patch is bogus if it is guaranteed that udpv6_pre_connect() is
always called after checking "struct sockaddr"->sa_family. In that case,
we want a comment why we don't need to check valid address length here.)
net/rds: Check address length before reading address family
syzbot is reporting uninitialized value at rds_connect() [1] and
rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
these functions expect that it is safe to access sockaddr->family field
in order to determine minimal address length for validation.
drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR
shadow was added into shadow_list by amdgpu_bo_create_shadow.
meanwhile, shadow->tbo.mem was not fully configured.
tbo.mem would be fully configured by amdgpu_vm_sdma_map_table until calling amdgpu_vm_clear_bo.
If sriov TDR occurred between amdgpu_bo_create_shadow and amdgpu_vm_sdma_map_table,
amdgpu_device_recover_vram would deal with shadow without tbo.mem.start.
Merge tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"Fix a sparc64 sun4v_pci regression introduced in this merged window,
and a dma-debug stracktrace regression from the big refactor last
merge window"
* tag 'dma-mapping-5.1-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: only skip one stackframe entry
sparc64/pci_sun4v: fix ATU checks for large DMA masks
Merge tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Fix an AMD IOMMU issue where the driver didn't correctly setup the
exclusion range in the hardware registers, resulting in exclusion
ranges being one page too big.
This can cause data corruption of the address of that last page is
used by DMA operations"
* tag 'iommu-fix-v5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Set exclusion range correctly
Merge tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC host fixes from Ulf Hansson:
- alcor: Stabilize data write requests
- sdhci-omap: Fix command error path during tuning
* tag 'mmc-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
mmc: alcor: don't write data before command has completed
Merge tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Well, this one became unpleasantly larger than previous pull requests,
but it's a kind of usual pattern: now it contains a collection of ASoC
fixes, and nothing to worry too much.
The fixes for ASoC core (DAPM, DPCM, topology) are all small and just
covering corner cases. The rest changes are driver-specific, many of
which are for x86 platforms and new drivers like STM32, in addition to
the usual fixups for HD-audio"
* tag 'sound-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (66 commits)
ASoC: wcd9335: Fix missing regmap requirement
ALSA: hda: Fix racy display power access
ASoC: pcm: fix error handling when try_module_get() fails.
ASoC: stm32: sai: fix master clock management
ASoC: Intel: kbl: fix wrong number of channels
ALSA: hda - Add two more machines to the power_save_blacklist
ASoC: pcm: update module refcount if module_get_upon_open is set
ASoC: core: conditionally increase module refcount on component open
ASoC: stm32: fix sai driver name initialisation
ASoC: topology: Use the correct dobj to free enum control values and texts
ALSA: seq: Fix OOB-reads from strlcpy
ASoC: intel: skylake: add remove() callback for component driver
ASoC: cs35l35: Disable regulators on driver removal
ALSA: xen-front: Do not use stream buffer size before it is set
ASoC: rockchip: pdm: change dma burst to 8
ASoC: rockchip: pdm: fix regmap_ops hang issue
ASoC: simple-card: don't select DPCM via simple-audio-card
ASoC: audio-graph-card: don't select DPCM via audio-graph-card
ASoC: tlv320aic32x4: Change author's name
ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
...
Merge tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an ACPICA issue introduced during the 4.20 development cycle and
causing some systems to crash because of leftover operation region
data still maintained after the operation region in question has gone
away (Erik Schmauss)"
* tag 'acpi-5.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: remove address node from global list after method termination
Merge tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Fixes across the driver spectrum this week, the mediatek fbdev support
might be a bit late for this round, but I looked over it and it's not
very large and seems like a useful feature for them.
Otherwise the main thing is a regression fix for i915 5.0 bug that
caused black screens on a bunch of Dell XPS 15s I think, I know at
least Fedora is waiting for this to land, and the udl fix is also for
a regression since 5.0 where unplugging the device would end badly.
core:
- make atomic hooks optional
i915:
- Revert a 5.0 regression where some eDP panels stopped working
- DSI related fixes for platforms up to IceLake
- GVT (regression fix, warning fix, use-after free fix)
amdgpu:
- Cursor fixes
- missing PCI ID fix for KFD
- XGMI fix
- shadow buffer handling after reset fix
udl:
- fix unplugging device crashes.
mediatek:
- stabilise MT2701 HDMI support
- fbdev support
* tag 'drm-fixes-2019-04-12' of git://anongit.freedesktop.org/drm/drm: (32 commits)
gpu: host1x: Fix compile error when IOMMU API is not available
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
drm/i915/dp: revert back to max link rate and lane count on eDP
drm/i915/icl: Fix port disable sequence for mipi-dsi
drm/i915/icl: Ungate ddi clocks before IO enable
drm/mediatek: no change parent rate in round_rate() for MT2701 hdmi phy
drm/mediatek: using new factor for tvdpll for MT2701 hdmi phy
drm/mediatek: remove flag CLK_SET_RATE_PARENT for MT2701 hdmi phy
drm/mediatek: make implementation of recalc_rate() for MT2701 hdmi phy
drm/mediatek: fix the rate and divder of hdmi phy for MT2701
drm/mediatek: fix possible object reference leak
drm/i915: Get power refs in encoder->get_power_domains()
drm/i915: Fix pipe_bpp readout for BXT/GLK DSI
drm/amd/display: Fix negative cursor pos programming (v2)
drm/sun4i: tcon top: Fix NULL/invalid pointer dereference in sun8i_tcon_top_un/bind
drm/udl: add a release method and delay modeset teardown
drm/i915/gvt: Prevent use-after-free in ppgtt_free_all_spt()
drm/i915/gvt: Annotate iomem usage
drm/sun4i: DW HDMI: Lower max. supported rate for H6
Revert "Documentation/gpu/meson: Remove link to meson_canvas.c"
...
Will Deacon [Mon, 8 Apr 2019 11:45:09 +0000 (12:45 +0100)]
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
Rather embarrassingly, our futex() FUTEX_WAKE_OP implementation doesn't
explicitly set the return value on the non-faulting path and instead
leaves it holding the result of the underlying atomic operation. This
means that any FUTEX_WAKE_OP atomic operation which computes a non-zero
value will be reported as having failed. Regrettably, I wrote the buggy
code back in 2011 and it was upstreamed as part of the initial arm64
support in 2012.
The reasons we appear to get away with this are:
1. FUTEX_WAKE_OP is rarely used and therefore doesn't appear to get
exercised by futex() test applications
2. If the result of the atomic operation is zero, the system call
behaves correctly
3. Prior to version 2.25, the only operation used by GLIBC set the
futex to zero, and therefore worked as expected. From 2.25 onwards,
FUTEX_WAKE_OP is not used by GLIBC at all.
Fix the implementation by ensuring that the return value is either 0
to indicate that the atomic operation completed successfully, or -EFAULT
if we encountered a fault when accessing the user mapping.
The exlcusion range limit register needs to contain the
base-address of the last page that is part of the range, as
bits 0-11 of this register are treated as 0xfff by the
hardware for comparisons.
So correctly set the exclusion range in the hardware to the
last page which is _in_ the range.
irq_work_run()
perf_pending_event()
if (@pending_disable)
perf_event_disable_local(); // WHOOPS
The problem exists in generic, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from it's PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.
Adding a irq_work_sync() to event_sched_in() would work for all hardare
PMUs that properly use irq_work_run() but fails for software PMUs.
Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
Dave Airlie [Fri, 12 Apr 2019 03:39:22 +0000 (13:39 +1000)]
Merge tag 'drm-intel-fixes-2019-04-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Revert back to max link rate and lane count on eDP.
- DSI related fixes for all platforms including Ice Lake.
- GVT Fixes including one vGPU display plane size regression fix,
one for preventing use-after-free in ppgtt shadow free function,
and another warning fix for iomem access annotation.
Jason Yan [Fri, 12 Apr 2019 02:09:16 +0000 (10:09 +0800)]
block: fix the return errno for direct IO
If the last bio returned is not dio->bio, the status of the bio will
not assigned to dio->bio if it is error. This will cause the whole IO
status wrong.
ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0]
<idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0]
<idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>]
<idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>]
ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475]
ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
We always have to assign bio->bi_status to dio->bio.bi_status because we
will only check dio->bio.bi_status when we return the whole IO to
the upper layer.
Merge tag 'for-5.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix parsing of compression algorithm when set as a inode property,
this could end up with eg. 'zst' or 'zli' in the value
- don't allow trim on a filesystem with unreplayed log, this could
cause data loss if there are pending updates to the block groups that
would not be subject to trim after replay
* tag 'for-5.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prop: fix vanished compression property after failed set
btrfs: prop: fix zstd compression parameter validation
Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
David Ahern [Tue, 9 Apr 2019 21:23:10 +0000 (14:23 -0700)]
selftests: fib_tests: Fix 'Command line is not complete' errors
A couple of tests are verifying a route has been removed. The helper
expects the prefix as the first part of the expected output. When
checking that a route has been deleted the prefix is empty leading
to an invalid ip command:
$ ip ro ls match
Command line is not complete. Try option "help"
Fix by moving the comparison of expected output and output to a new
function that is used by both check_route and check_route6. Use the
new helper for the 2 checks on route removal.
Also, remove the reset of 'set -x' in route_setup which overrides the
user managed setting.
Fixes: d69faad76584c ("selftests: fib_tests: Add prefix route tests with metric") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dave Airlie [Thu, 11 Apr 2019 20:55:20 +0000 (06:55 +1000)]
Merge tag 'drm-misc-fixes-2019-04-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- core: Make atomic_enable and disable optional for CRTC
- dw-hdmi: Lower max frequency for the Allwinner H6, SCDC configuration
improvements for older controller versions
- omap: a fix for the CEC clock management policy
Andy Duan [Tue, 9 Apr 2019 03:40:56 +0000 (03:40 +0000)]
net: fec: manage ahb clock in runtime pm
Some SOC like i.MX6SX clock have some limits:
- ahb clock should be disabled before ipg.
- ahb and ipg clocks are required for MAC MII bus.
So, move the ahb clock to runtime management together with
ipg clock.
The ability to optimise here relies on compiler being able to optimise
away tail calls to avoid stack overflows. Unfortunately, we are seeing
reports of problems, so let's just revert.
NFSv4.1 fix incorrect return value in copy_file_range
According to the NFSv4.2 spec if the input and output file is the
same file, operation should fail with EINVAL. However, linux
copy_file_range() system call has no such restrictions. Therefore,
in such case let's return EOPNOTSUPP and allow VFS to fallback
to doing do_splice_direct(). Also when copy_file_range is called
on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support
COPY functionality), we also need to return EOPNOTSUPP and
fallback to a regular copy.
Fixes xfstest generic/075, generic/091, generic/112, generic/263
for all NFSv4.x versions.
Tetsuo Handa [Sat, 30 Mar 2019 01:21:07 +0000 (10:21 +0900)]
NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
(which is embedded into user-visible "struct nfs_mount_data" structure)
despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
bytes of AF_INET6 address to rpc_sockaddr2uaddr().
Since "struct nfs_mount_data" structure is user-visible, we can't change
"struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
assuming that everybody is using AF_INET family when passing address via
"struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
br_multicast_start_querier() walks over the port list but it can be
called from a timer with only multicast_lock held which doesn't protect
the port list, so use RCU to walk over it.
Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires") Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Thu, 11 Apr 2019 18:10:34 +0000 (11:10 -0700)]
Merge branch 'thunderx-xdp-mtu'
Matteo Croce says:
====================
Fix thunderx MTU with XDP
The thunderx driver can't use XDP with all MTU values.
This patches sets the right MTU values, and add a check to avoid setting
a wrong value which will not function.
v3: Fix a copy-paste from two functions, tested on proper hardware:
2: enP2p1s0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 1c:1b:0d:0d:52:a4 brd ff:ff:ff:ff:ff:ff
[ 787.019730] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1800.
RTNETLINK answers: Operation not supported
[ 800.574568] nicvf 0002:01:00.1 enP2p1s0v0: Link is Up 10000 Mbps Full duplex
[ 807.248321] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1500.
RTNETLINK answers: Invalid argument
====================
The thunderx driver splits frames bigger than 1530 bytes to multiple
pages, making impossible to run an eBPF program on it.
This leads to a maximum MTU of 1508 if QinQ is in use.
The thunderx driver forbids to load an eBPF program if the MTU is higher
than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
protocols which need some more headroom.
Fixes: 05c773f52b96e ("net: thunderx: Add basic XDP support") Signed-off-by: Matteo Croce <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Commit <26d92e951fe0>
("net/smc: move unhash as early as possible in smc_release()")
fixes one occurrence in the smc code, but the same pattern exists
in other places. This patch covers the remaining occurrences and
makes sure, the unhash operation is done before the smc->clcsock is
released. This avoids a potential use-after-free in smc_diag_dump().
The FLUSH command is used to empty the pnet table. No return code is
expected from the command. Commit a9d8b0b1e3d6 added namespace support
for the pnet table and changed the FLUSH command processing to call
smc_pnet_remove_by_pnetid() to remove the pnet entries. This function
returns -ENOENT when no entry was deleted, which is now the return code
of the FLUSH command. As a result the FLUSH command will return an error
when the pnet table is already empty.
Restore the expected behavior and let FLUSH always return 0.
Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support") Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
that are delivered when out-of-band data arrives on socket fd.
If an SMC socket program makes use of such an fcntl() call, it fails
in case of fallback to TCP-mode. In case of fallback the traffic is
processed with the internal TCP socket. Propagating field "file" from the
SMC socket to the internal TCP socket fixes the issue.
net/smc: wait for pending work before clcsock release_sock
When the clcsock is already released using sock_release() and a pending
smc_listen_work accesses the clcsock than that will fail. Solve this
by canceling and waiting for the work to complete first. Because the
work holds the sock_lock it must make sure that the lock is not hold
before the new helper smc_clcsock_release() is invoked. And before the
smc_listen_work starts working check if the parent listen socket is
still valid, otherwise stop the work early.
With the previous value of 2, afu_dma_map_region was being omitted. I
suspect that the code paths have simply changed since the value of 2 was
chosen a decade ago, but it's also possible that it varies based on which
mapping function was used, compiler inlining choices, etc. In any case,
it's best to err on the side of skipping less.
Merge branch 'nvme-5.1' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph:
"Two nvme fixes for 5.1 - fixing the initial CSN for nvme-fc, and handle
log page offsets properly in the target."
* 'nvme-5.1' of git://git.infradead.org/nvme:
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
Keith Busch [Tue, 9 Apr 2019 16:03:59 +0000 (10:03 -0600)]
nvmet: fix discover log page when offsets are used
The nvme target hadn't been taking the Get Log Page offset parameter
into consideration, and so has been returning corrupted log pages when
offsets are used. Since many tools, including nvme-cli, split the log
request to 4k, we've been breaking discovery log responses when more
than 3 subsystems exist.
Fix the returned data by internally generating the entire discovery
log page and copying only the requested bytes into the user buffer. The
command log page offset type has been modified to a native __le64 to
make it easier to extract the value from a command.
James Smart [Mon, 8 Apr 2019 18:15:19 +0000 (11:15 -0700)]
nvme-fc: correct csn initialization and increments on error
This patch fixes a long-standing bug that initialized the FC-NVME
cmnd iu CSN value to 1. Early FC-NVME specs had the connection starting
with CSN=1. By the time the spec reached approval, the language had
changed to state a connection should start with CSN=0. This patch
corrects the initialization value for FC-NVME connections.
Additionally, in reviewing the transport, the CSN value is assigned to
the new IU early in the start routine. It's possible that a later dma
map request may fail, causing the command to never be sent to the
controller. Change the location of the assignment so that it is
immediately prior to calling the lldd. Add a comment block to explain
the impacts if the lldd were to additionally fail sending the command.
Lin Yi [Wed, 10 Apr 2019 02:23:34 +0000 (10:23 +0800)]
drm/ttm: fix dma_fence refcount imbalance on error path
the ttm_bo_add_move_fence takes a reference to the struct dma_fence, but
failed to release it on the error path, leading to a memory leak.
add dma_fence_put before return when error occur.
Christian König [Tue, 2 Apr 2019 07:26:52 +0000 (09:26 +0200)]
drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
When ttm_put_pages() tries to figure out whether it's dealing with
transparent hugepages, it just reads past the bounds of the pages array
without a check.
v2: simplify the test if enough pages are left in the array (Christian).
Merge tag 'asoc-fix-v5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.1
A few core fixes along with the driver specific ones, mainly fixing
small issues that only affect x86 platforms for various reasons (their
unusual machine enumeration mechanisms mainly, plus a fix for error
handling in topology).
There's some of the driver fixes that look larger than they are, like
the hdmi-codec changes which resulted in an indentation change, and most
of the other large changes are for new drivers like the STM32 changes.
Faiz Abbas [Thu, 11 Apr 2019 08:59:37 +0000 (14:29 +0530)]
mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
commit 5b0d62108b46 ("mmc: sdhci-omap: Add platform specific reset
callback") skips data resets during tuning operation. Because of this,
a data error or data finish interrupt might still arrive after a command
error has been handled and the mrq ended. This ends up with a "mmc0: Got
data interrupt 0x00000002 even though no data operation was in progress"
error message.
Fix this by adding a platform specific callback for sdhci_irq. Mark the
mrq as a failure but wait for a data interrupt instead of calling
finish_mrq().
Stefan Agner [Wed, 10 Apr 2019 22:47:46 +0000 (00:47 +0200)]
gpu: host1x: Fix compile error when IOMMU API is not available
In case the IOMMU API is not available compiling host1x fails with
the following error:
In file included from drivers/gpu/host1x/hw/host1x06.c:27:
drivers/gpu/host1x/hw/channel_hw.c: In function ‘host1x_channel_set_streamid’:
drivers/gpu/host1x/hw/channel_hw.c:118:30: error: implicit declaration of function
‘dev_iommu_fwspec_get’; did you mean ‘iommu_fwspec_free’? [-Werror=implicit-function-declaration]
struct iommu_fwspec *spec = dev_iommu_fwspec_get(channel->dev->parent);
^~~~~~~~~~~~~~~~~~~~
iommu_fwspec_free
Fixes: de5469c21ff9 ("gpu: host1x: Program the channel stream ID") Signed-off-by: Stefan Agner <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
gue tunnels run iptunnel_pull_offloads on received skbs. This can
determine a possible use-after-free accessing guehdr pointer since
the packet will be 'uncloned' running pskb_expand_head if it is a
cloned gso skb (e.g if the packet has been sent though a veth device)
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Hoang Le [Tue, 9 Apr 2019 07:59:24 +0000 (14:59 +0700)]
tipc: missing entries in name table of publications
When binding multiple services with specific type 1Ki, 2Ki..,
this leads to some entries in the name table of publications
missing when listed out via 'tipc name show'.
The problem is at identify zero last_type conditional provided
via netlink. The first is initial 'type' when starting name table
dummping. The second is continuously with zero type (node state
service type). Then, lookup function failure to finding node state
service type in next iteration.
To solve this, adding more conditional to marked as dirty type and
lookup correct service type for the next iteration instead of select
the first service as initial 'type' zero.
Jakub Kicinski [Tue, 9 Apr 2019 00:59:50 +0000 (17:59 -0700)]
net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()
Unlike '&&' operator, the '&' does not have short-circuit
evaluation semantics. IOW both sides of the operator always
get evaluated. Fix the wrong operator in
tls_is_sk_tx_device_offloaded(), which would lead to
out-of-bounds access for for non-full sockets.
Si-Wei Liu [Mon, 8 Apr 2019 23:45:27 +0000 (19:45 -0400)]
failover: allow name change on IFF_UP slave interfaces
When a netdev appears through hot plug then gets enslaved by a failover
master that is already up and running, the slave will be opened
right away after getting enslaved. Today there's a race that userspace
(udev) may fail to rename the slave if the kernel (net_failover)
opens the slave earlier than when the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.
As the failover slave interfaces are not designed to be operated
directly by userspace apps: IP configuration, filter rules with
regard to network traffic passing and etc., should all be done on master
interface. In general, userspace apps only care about the
name of master interface, while slave names are less important as long
as admin users can see reliable names that may carry
other information describing the netdev. For e.g., they can infer that
"ens3nsby" is a standby slave of "ens3", while for a
name like "eth0" they can't tell which master it belongs to.
Historically the name of IFF_UP interface can't be changed because
there might be admin script or management software that is already
relying on such behavior and assumes that the slave name can't be
changed once UP. But failover is special: with the in-kernel
auto-enslavement mechanism, the userspace expectation for device
enumeration and bring-up order is already broken. Previously initramfs
and various userspace config tools were modified to bypass failover
slaves because of auto-enslavement and duplicate MAC address. Similarly,
in case that users care about seeing reliable slave name, the new type
of failover slaves needs to be taken care of specifically in userspace
anyway.
It's less risky to lift up the rename restriction on failover slave
which is already UP. Although it's possible this change may potentially
break userspace component (most likely configuration scripts or
management software) that assumes slave name can't be changed while
UP, it's relatively a limited and controllable set among all userspace
components, which can be fixed specifically to listen for the rename
events on failover slaves. Userspace component interacting with slaves
is expected to be changed to operate on failover master interface
instead, as the failover slave is dynamic in nature which may come and
go at any point. The goal is to make the role of failover slaves less
relevant, and userspace components should only deal with failover master
in the long run.
drm/i915/gvt: Roundup fb->height into tile's height at calucation fb->size
When fb is tiled and fb->height isn't the multiple of tile's height,
the format fb->size = fb->stride * fb->height, will get a smaller size
than the actual size. As the memory height of tiled fb should be multiple
of tile's height.
Fixes: 7f1a93b1f1d1 ("drm/i915/gvt: Correct the calculation of plane size") Reviewed-by: Zhenyu Wang <[email protected]> Signed-off-by: Xiong Zhang <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
Hangbin Liu [Mon, 8 Apr 2019 08:45:17 +0000 (16:45 +0800)]
team: set slave to promisc if team is already in promisc mode
After adding a team interface to bridge, the team interface will enter
promisc mode. Then if we add a new slave to team0, the slave will keep
promisc off. Fix it by setting slave to promisc on if team master is
already in promisc mode, also do the same for allmulti.
v2: add promisc and allmulti checking when delete ports
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski [Wed, 10 Apr 2019 23:23:39 +0000 (16:23 -0700)]
net/tls: fix build without CONFIG_TLS_DEVICE
buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n.
Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx()
wouldn't be compiled either in this case.
Fixes: 35b71a34ada6 ("net/tls: don't leak partially sent record in device mode") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Müller [Mon, 8 Apr 2019 13:33:54 +0000 (15:33 +0200)]
clk: x86: Add system specific quirk to mark clocks as critical
Since commit 648e921888ad ("clk: x86: Stop marking clocks as
CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are
unconditionally gated off. Unfortunately this will break systems where these
clocks are used for external purposes beyond the kernel's knowledge. Fix it
by implementing a system specific quirk to mark the necessary pmc_plt_clks as
critical.
PCI: pciehp: Ignore Link State Changes after powering off a slot
During a safe hot remove, the OS powers off the slot, which may cause a
Data Link Layer State Changed event. The slot has already been set to
OFF_STATE, so that event results in re-enabling the device, making it
impossible to safely remove it.
Clear out the Presence Detect Changed and Data Link Layer State Changed
events when the disabled slot has settled down.
It is still possible to re-enable the device if it remains in the slot
after pressing the Attention Button by pressing it again.
Fixes the problem that Micah reported below: an NVMe drive power button may
not actually turn off the drive.
David S. Miller [Wed, 10 Apr 2019 20:07:02 +0000 (13:07 -0700)]
Merge branch 'tls-leaks'
Jakub Kicinski says:
====================
net: tls: fix memory leaks and freeing skbs
This series fixes two memory issues and a stack overflow.
First two patches are fairly simple leaks. Third patch
partially reverts an optimization made to the strparser
which causes creation of skb->frag_list->skb->frag_list...
chains of 100s of skbs, leading to recursive kfree_skb()
filling up the kernel stack.
====================
This reverts the first part of commit 4e485d06bb8c ("strparser: Call
skb_unclone conditionally"). To build a message with multiple
fragments we need our own root of frag_list. We can't simply
use the frag_list of orig_skb, because it will lead to linking
all orig_skbs together creating very long frag chains, and causing
stack overflow on kfree_skb() (which is called recursively on
the frag_lists).
Jakub Kicinski [Wed, 10 Apr 2019 18:04:31 +0000 (11:04 -0700)]
net/tls: don't leak partially sent record in device mode
David reports that tls triggers warnings related to
sk->sk_forward_alloc not being zero at destruction time:
WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170
When sender fills up the write buffer and dies from
SIGPIPE. This is due to the device implementation
not cleaning up the partially_sent_record.
This is because commit a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
moved the partial record cleanup to the SW-only path.