Jiri Wiesner [Wed, 5 Dec 2018 15:55:29 +0000 (16:55 +0100)]
ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.
Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in running state with a stacktrace similar to:
It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.
The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.
If for some reason an association's fragmentation point is zero,
sctp_datamsg_from_user will try to endlessly try to divide a message
into zero-sized chunks. This eventually causes kernel panic due to
running out of memory.
Although this situation is quite unlikely, it has occurred before as
reported. I propose to add this simple last-ditch sanity check due to
the severity of the potential consequences.
Sam Bobroff [Mon, 3 Dec 2018 00:53:21 +0000 (11:53 +1100)]
drm/ast: Fix connector leak during driver unload
When unloading the ast driver, a warning message is printed by
drm_mode_config_cleanup() because a reference is still held to one of
the drm_connector structs.
Correct this by calling drm_crtc_force_disable_all() in
ast_fbdev_destroy().
Dave Airlie [Thu, 6 Dec 2018 04:08:43 +0000 (14:08 +1000)]
Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Fixes for 4.20:
- Fix banding regression on 6 bpc panels
- Vega20 fix for six 4k displays
- Fix LRU handling in ttm_buffer_object_transfer
- Use proper MC firmware for newer polaris variants
- Vega20 powerplay fixes
- VCN suspend/resume fix for PCO
- Misc other fixes
Dave Airlie [Thu, 6 Dec 2018 04:07:26 +0000 (14:07 +1000)]
Merge tag 'msm-fixes-2018-12-04' of https://gitlab.freedesktop.org/seanpaul/dpu-staging into drm-fixes
- Several related to incorrect error checking/handling (Various)
- Prevent IRQ storm on MDP5 HDMI hotplug (Todor)
- Don't capture crash state if unsupported (Sharat)
- Properly grab vblank reference in atomic wait for commit done (Sean)
put_uprobe() is calling delayed_uprobe_remove() without taking
delayed_uprobe_lock and thus the race sometimes results in a
kernel crash. Fix this by taking delayed_uprobe_lock before
calling delayed_uprobe_remove() from put_uprobe().
Anders Roxell [Fri, 30 Nov 2018 15:08:59 +0000 (16:08 +0100)]
stackleak: Mark stackleak_track_stack() as notrace
Function graph tracing recurses into itself when stackleak is enabled,
causing the ftrace graph selftest to run for up to 90 seconds and
trigger the softlockup watchdog.
Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
200 mcount_get_lr_addr x0 // pointer to function's saved lr
(gdb) bt
\#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
\#1 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#2 0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
\#3 0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507
\#4 0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286
\#5 ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
\#6 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#7 0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
\#8 0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650
\#9 0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205
Rework so we mark stackleak_track_stack as notrace
Anson Huang [Tue, 4 Dec 2018 03:17:45 +0000 (03:17 +0000)]
ARM: imx: update the cpu power up timing setting on i.mx6sx
The sw2iso count should cover ARM LDO ramp-up time,
the MAX ARM LDO ramp-up time may be up to more than
100us on some boards, this patch sets sw2iso to 0xf
(~384us) which is the reset value, and it is much
more safe to cover different boards, since we have
observed that some customer boards failed with current
setting of 0x2.
Fixes: 05136f0897b5 ("ARM: imx: support arm power off in cpuidle for i.mx6sx") Signed-off-by: Anson Huang <[email protected]> Reviewed-by: Fabio Estevam <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
Linus Torvalds [Thu, 6 Dec 2018 01:06:31 +0000 (17:06 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four obvious bug fixes. The vmw_pscsi is so old that it's amazing
no-one noticed before now"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: storvsc: Fix a race in sub-channel creation that can cause panic
scsi: vmw_pscsi: Rearrange code to avoid multiple calls to free_irq during unload
scsi: libiscsi: Fix NULL pointer dereference in iscsi_eh_session_reset
scsi: lpfc: fix block guard enablement on SLI3 adapters
Yuchung Cheng [Wed, 5 Dec 2018 22:38:38 +0000 (14:38 -0800)]
tcp: fix NULL ref in tail loss probe
TCP loss probe timer may fire when the retranmission queue is empty but
has a non-zero tp->packets_out counter. tcp_send_loss_probe will call
tcp_rearm_rto which triggers NULL pointer reference by fetching the
retranmission queue head in its sub-routines.
Add a more detailed warning to help catch the root cause of the inflight
accounting inconsistency.
Eric Dumazet [Wed, 5 Dec 2018 22:24:31 +0000 (14:24 -0800)]
tcp: Do not underestimate rwnd_limited
If available rwnd is too small, tcp_tso_should_defer()
can decide it is worth waiting before splitting a TSO packet.
This really means we are rwnd limited.
Fixes: 5615f88614a4 ("tcp: instrument how long TCP is limited by receive window") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Reviewed-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Edward Cree [Tue, 4 Dec 2018 17:37:57 +0000 (17:37 +0000)]
net: use skb_list_del_init() to remove from RX sublists
list_del() leaves the skb->next pointer poisoned, which can then lead to
a crash in e.g. OVS forwarding. For example, setting up an OVS VXLAN
forwarding bridge on sfc as per:
Fixes: 9af86f933894 ("net: core: fix use-after-free in __netif_receive_skb_list_core") Fixes: 7da517a3bc52 ("net: core: Another step of skb receive list processing") Fixes: a4ca8b7df73c ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()") Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Wed, 5 Dec 2018 23:51:41 +0000 (15:51 -0800)]
Merge tag 'arc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes/updates from Vineet Gupta
- Missing reads{x}()/writes{x}() getting in the way of some drivers [Jose Abreu]
- Builds defaulting to ARCv2 ISA based configsa [Kevin Hilman]
- Misc fixes
* tag 'arc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: io.h: Implement reads{x}()/writes{x}()
ARC: change defconfig defaults to ARCv2
arc: [devboards] Add support of NFSv3 ACL
ARC: mm: fix uninitialised signal code in do_page_fault
ARC: [plat-hsdk] Enable DW APB GPIO support
ARCv2: boot log unaligned access in use
ARC: IOC: panic if kernel was started with previously enabled IOC
ARC: remove redundant 'default n' from Kconfig
David Rientjes [Wed, 5 Dec 2018 23:45:54 +0000 (15:45 -0800)]
mm, thp: restore node-local hugepage allocations
This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").
By not setting __GFP_THISNODE, applications can allocate remote hugepages
when the local node is fragmented or low on memory when either the thp
defrag setting is "always" or the vma has been madvised with
MADV_HUGEPAGE.
Remote access to hugepages often has much higher latency than local pages
of the native page size. On Haswell, ac5b2c18911f was shown to have a
13.9% access regression after this commit for binaries that remap their
text segment to be backed by transparent hugepages.
The intent of ac5b2c18911f is to address an issue where a local node is
low on memory or fragmented such that a hugepage cannot be allocated. In
every scenario where this was described as a fix, there is abundant and
unfragmented remote memory available to allocate from, even with a greater
access latency.
If remote memory is also low or fragmented, not setting __GFP_THISNODE was
also measured on Haswell to have a 40% regression in allocation latency.
Dan Williams [Mon, 3 Dec 2018 18:30:25 +0000 (10:30 -0800)]
acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"
A "short" ARS (address range scrub) instructs the platform firmware to
return known errors. In contrast, a "long" ARS instructs platform
firmware to arrange every data address on the DIMM to be read / checked
for poisoned data.
The conversion of the flags in commit d3abaf43bab8 "acpi, nfit: Fix
Address Range Scrub completion tracking", changed the meaning of passing
'0' to acpi_nfit_ars_rescan(). Previously '0' meant "not short", now '0'
is ARS_REQ_SHORT. Pass ARS_REQ_LONG to restore the expected scrub-type
behavior of user-initiated ARS sessions.
Dan Williams [Sat, 24 Nov 2018 18:47:04 +0000 (10:47 -0800)]
libnvdimm, pfn: Pad pfn namespaces relative to other regions
Commit cfe30b872058 "libnvdimm, pmem: adjust for section collisions with
'System RAM'" enabled Linux to workaround occasions where platform
firmware arranges for "System RAM" and "Persistent Memory" to collide
within a single section boundary. Unfortunately, as reported in this
issue [1], platform firmware can inflict the same collision between
persistent memory regions.
The approach of interrogating iomem_resource does not work in this
case because platform firmware may merge multiple regions into a single
iomem_resource range. Instead provide a method to interrogate regions
that share the same parent bus.
This is a stop-gap until the core-MM can grow support for hotplug on
sub-section boundaries.
Dan Williams [Wed, 5 Dec 2018 22:11:48 +0000 (14:11 -0800)]
tools/testing/nvdimm: Align test resources to 128M
In preparation for libnvdimm growing new restrictions to detect section
conflicts between persistent memory regions, enable nfit_test to
allocate aligned resources. Use a gen_pool to allocate nfit_test's fake
resources in a separate address space from the virtual translation of
the same.
Linus Torvalds [Wed, 5 Dec 2018 21:28:01 +0000 (13:28 -0800)]
Merge tag 'for-linus-20181205' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A bit earlier in the week as usual, but there's a fix here that should
go in sooner rather than later.
Under a combination of circumstance, the direct issue path in blk-mq
could corrupt data. This wasn't easy to hit, but the ones that are
affected by it, seem to hit it pretty easily. Full explanation in the
patch. None of the regular filesystem and storage testing has
triggered it, even though it's been around since 4.19-rc1.
Outside of that, whitelist trim tweak for certain Samsung devices for
libata"
* tag 'for-linus-20181205' of git://git.kernel.dk/linux-block:
blk-mq: fix corruption with direct issue
libata: whitelist all SAMSUNG MZ7KM* solid-state disks
David S. Miller [Wed, 5 Dec 2018 19:46:06 +0000 (11:46 -0800)]
Merge tag 'mac80211-for-davem-2018-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg:
====================
As it's been a while, we have various fixes for
* hwsim
* AP mode (client powersave related)
* CSA/FTM interaction
* a busy loop in IE handling
* and similar
====================
Sakari Ailus [Wed, 5 Dec 2018 17:23:54 +0000 (12:23 -0500)]
media: Add a Kconfig option for the Request API
The Request API is now merged to the kernel but the confidence on the
stability of that API is not great, especially regarding the interaction
with V4L2.
Add a Kconfig option for the API, with a scary-looking warning.
The patch itself disables request creation as well as does not advertise
them as buffer flags. The driver requiring requests (cedrus) now depends
on the Kconfig option as well.
Hans Verkuil [Wed, 5 Dec 2018 11:28:20 +0000 (06:28 -0500)]
media: mpeg2-ctrls.h: move MPEG2 state controls to non-public header
The MPEG2 state controls for the cedrus stateless MPEG2 driver are
not yet stable. Move them out of the public headers into media/mpeg2-ctrls.h.
Eventually, once this has stabilized, they will be moved back to the
public headers.
Unfortunately I had to cast the control type to a u32 in two switch
statements to prevent a compiler warning about a control type define
not being part of the enum.
Linus Torvalds [Wed, 5 Dec 2018 17:58:17 +0000 (09:58 -0800)]
Merge tag 'for-4.20-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A patch in 4.19 introduced a sanity check that was too strict and a
filesystem cannot be mounted.
This happens for filesystems with more than 10 devices and has been
reported by a few users so we need the fix to propagate to stable"
* tag 'for-4.20-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
Linus Torvalds [Wed, 5 Dec 2018 17:51:10 +0000 (09:51 -0800)]
Merge tag 'pm-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Revert a problematic recent commit that attempted to fix a system-wide
suspend issue related to the freezer"
* tag 'pm-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "exec: make de_thread() freezable"
Chris Chiu [Wed, 5 Dec 2018 06:48:56 +0000 (14:48 +0800)]
ALSA: hda/realtek: Fix mic issue on Acer AIO Veriton Z4860G/Z6860G
Acer AIO Veriton Z4860G/Z6860G with the same ALC286 codec has issues
with the input from external microphone. The issue can be fixed by
the fixup ALC286_FIXUP_ACER_AIO_MIC_NO_PRESENCE for Veriton Z4660G.
Chris Chiu [Wed, 5 Dec 2018 06:48:55 +0000 (14:48 +0800)]
ALSA: hda/realtek: Fix mic issue on Acer AIO Veriton Z4660G
Acer AIO Veriton Z4660G with ALC286 codec has issue with the input
from external microphones connecting via 'Front Mic' jack. The fixup
ALC286_FIXUP_ACER_AIO_MIC_NO_PRESENCE enables the jack sensing of
the headset and fix the audio input issue of external microphone.
Chris Chiu [Wed, 5 Dec 2018 06:48:54 +0000 (14:48 +0800)]
ALSA: hda/realtek - Add support for Acer Aspire C24-860 headset mic
The Acer AIO Aspire C24-860 with ALC286 can't detect the headset
microphone. Just like another Acer AIO U27-880, it needs a different
pin value for 0x18 and the headset fixup to make headset mic work.
Chris Chiu [Wed, 5 Dec 2018 06:48:53 +0000 (14:48 +0800)]
ALSA: hda/realtek: ALC286 mic and headset-mode fixups for Acer Aspire U27-880
Acer Aspire U27-880(AIO) with ALC286 codec can not detect headset mic
and internal mic not working either. It needs the similar quirk like
Sony laptops to fix headphone jack sensing and enables use of the
internal microphone.
Unfortunately jack sensing for the headset mic is still not working.
The thermal_zone_of_device_ops structure can be const as it is only
passed as the last argument of thermal_zone_of_sensor_register
and the corresponding parameter is declared as const.
The thermal_zone_of_device_ops structure can be const as it is only
passed as the last argument of devm_thermal_zone_of_sensor_register
and the corresponding parameter is declared as const.
Trond Myklebust [Mon, 3 Dec 2018 23:49:00 +0000 (18:49 -0500)]
SUNRPC: Don't force a redundant disconnection in xs_read_stream()
If the connection is broken, then xs_tcp_state_change() will take care
of scheduling the socket close as soon as appropriate. xs_read_stream()
just needs to report the error.
Trond Myklebust [Tue, 4 Dec 2018 12:52:11 +0000 (07:52 -0500)]
SUNRPC: Fix RPC receive hangs
The RPC code is occasionally hanging when the receive code fails to
empty the socket buffer due to a partial read of the data. When we
convert that to an EAGAIN, it appears we occasionally leave data in the
socket. The fix is to just keep reading until the socket returns
EAGAIN/EWOULDBLOCK.
Faiz Abbas [Wed, 21 Nov 2018 10:33:55 +0000 (16:03 +0530)]
mmc: sdhci-omap: Fix DCRC error handling during tuning
Commit 7d33c3581536 ("mmc: sdhci-omap: Workaround for Errata i802")
disabled DCRC interrupts during tuning. This write to the interrupt
enable register gets overwritten in sdhci_prepare_data() and the
interrupt is not in fact disabled. Fix this by disabling the interrupt
in the host->ier variable.
Jouni Malinen [Wed, 5 Dec 2018 10:55:54 +0000 (12:55 +0200)]
cfg80211: Fix busy loop regression in ieee80211_ie_split_ric()
This function was modified to support the information element extension
case (WLAN_EID_EXTENSION) in a manner that would result in an infinite
loop when going through set of IEs that include WLAN_EID_RIC_DATA and
contain an IE that is in the after_ric array. The only place where this
can currently happen is in mac80211 ieee80211_send_assoc() where
ieee80211_ie_split_ric() is called with after_ric[].
This can be triggered by valid data from user space nl80211
association/connect request (i.e., requiring GENL_UNS_ADMIN_PERM). The
only known application having an option to include WLAN_EID_RIC_DATA in
these requests is wpa_supplicant and it had a bug that prevented this
specific contents from being used (and because of that, not triggering
this kernel bug in an automated test case ap_ft_ric) and now that this
bug is fixed, it has a workaround to avoid this kernel issue.
WLAN_EID_RIC_DATA is currently used only for testing purposes, so this
does not cause significant harm for production use cases.
Tvrtko Ursulin [Wed, 5 Dec 2018 11:33:24 +0000 (11:33 +0000)]
drm/i915: Introduce per-engine workarounds
We stopped re-applying the GT workarounds after engine reset since commit 59b449d5c82a ("drm/i915: Split out functions for different kinds of
workarounds").
Issue with this is that some of the GT workarounds live in the MMIO space
which gets lost during engine resets. So far the registers in 0x2xxx and
0xbxxx address range have been identified to be affected.
This losing of applied workarounds has obvious negative effects and can
even lead to hard system hangs (see the linked Bugzilla).
Rather than just restoring this re-application, because we have also
observed that it is not safe to just re-write all GT workarounds after
engine resets (GPU might be live and weird hardware states can happen),
we introduce a new class of per-engine workarounds and move only the
affected GT workarounds over.
Using the framework introduced in the previous patch, we therefore after
engine reset, re-apply only the workarounds living in the affected MMIO
address ranges.
v2:
* Move Wa_1406609255:icl to engine workarounds as well.
* Rename API. (Chris Wilson)
* Drop redundant IS_KABYLAKE. (Chris Wilson)
* Re-order engine wa/ init so latest platforms are first. (Rodrigo Vivi)
Tvrtko Ursulin [Wed, 5 Dec 2018 11:33:23 +0000 (11:33 +0000)]
drm/i915: Record GT workarounds in a list
To enable later verification of GT workaround state at various stages of
driver lifetime, we record the list of applicable ones per platforms to a
list, from which they are also applied.
The added data structure is a simple array of register, mask and value
items, which is allocated on demand as workarounds are added to the list.
This is a temporary implementation which later in the series gets fused
with the existing per context workaround list handling. It is separated at
this stage since the following patch fixes a bug which needs to be as easy
to backport as possible.
Also, since in the following patch we will be adding a new class of
workarounds (per engine) which can be applied from interrupt context, we
straight away make the provision for safe read-modify-write cycle.
v2:
* Change dev_priv to i915 along the init path. (Chris Wilson)
* API rename. (Chris Wilson)
v3:
* Remove explicit list size tracking in favour of growing the allocation
in power of two chunks. (Chris Wilson)
v4:
Chris Wilson:
* Change wa_list_finish to early return.
* Copy workarounds using the compiler for static checking.
* Do not bother zeroing unused entries.
* Re-order struct i915_wa_list.
mac80211: ignore NullFunc frames in the duplicate detection
NullFunc packets should never be duplicate just like
QoS-NullFunc packets.
We saw a client that enters / exits power save with
NullFunc frames (and not with QoS-NullFunc) despite the
fact that the association supports HT.
This specific client also re-uses a non-zero sequence number
for different NullFunc frames.
At some point, the client had to send a retransmission of
the NullFunc frame and we dropped it, leading to a
misalignment in the power save state.
Fix this by never consider a NullFunc frame as duplicate,
just like we do for QoS NullFunc frames.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=201449
Felix Fietkau [Wed, 28 Nov 2018 21:39:16 +0000 (22:39 +0100)]
mac80211: fix reordering of buffered broadcast packets
If the buffered broadcast queue contains packets, letting new packets bypass
that queue can lead to heavy reordering, since the driver is probably throttling
transmission of buffered multicast packets after beacons.
Keep buffering packets until the buffer has been cleared (and no client
is in powersave mode).
Felix Fietkau [Tue, 13 Nov 2018 19:32:13 +0000 (20:32 +0100)]
mac80211: ignore tx status for PS stations in ieee80211_tx_status_ext
Make it behave like regular ieee80211_tx_status calls, except for the lack of
filtered frame processing.
This fixes spurious low-ack triggered disconnections with powersave clients
connected to an AP.
Peter Shih [Tue, 27 Nov 2018 04:49:50 +0000 (12:49 +0800)]
tty: serial: 8250_mtk: always resume the device in probe.
serial8250_register_8250_port calls uart_config_port, which calls
config_port on the port before it tries to power on the port. So we need
the port to be on before calling serial8250_register_8250_port. Change
the code to always do a runtime resume in probe before registering port,
and always do a runtime suspend in remove.
This basically reverts the change in commit 68e5fc4a255a ("tty: serial:
8250_mtk: use pm_runtime callbacks for enabling"), but still use
pm_runtime callbacks.
Fixes: 68e5fc4a255a ("tty: serial: 8250_mtk: use pm_runtime callbacks for enabling") Signed-off-by: Peter Shih <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
The USB-serial console implementation has never reported the actual
terminal settings used. Despite storing the corresponding cflags in its
struct console, these were never honoured on later tty open() where the
tty termios would be left initialised to the driver defaults.
Unlike the serial console implementation, the USB-serial code calls
subdriver open() already at console setup. While calling set_termios()
and write() before open() looks like it could work for some USB-serial
drivers, others definitely do not expect this, so modelling this after
serial core is going to be intrusive, if at all possible.
Instead, use a (renamed) tty helper to save the termios data used at
console setup so that the tty termios reflects the actual terminal
settings after a subsequent tty open().
Note that the calls to tty_init_termios() (tty_driver_install()) and
tty_save_termios() are serialised using the disconnect mutex.
This specifically fixes a regression that was triggered by a recent
change adding software flow control to the pl2303 driver: a getty trying
to disable flow control while leaving the baud rate unchanged would now
also set the baud rate to the driver default (prior to the flow-control
change this had been a noop).
That commit triggered a new WARN when unloading the module (see at the
end of the commit message). When a class_dev is embedded in a structure
then that class_dev is the thing that controls the lifetime of that
structure, for that reason device managed allocations can't be used here.
See Documentation/kobject.txt.
Revert the above patch, so the struct is allocated using kzalloc and we
have a release function for it that frees the allocated memory, otherwise
it is broken.
Harry Pan [Wed, 28 Nov 2018 16:40:41 +0000 (00:40 +0800)]
usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device
Some lower volume SanDisk Ultra Flair in 16GB, which the VID:PID is
in 0781:5591, will aggressively request LPM of U1/U2 during runtime,
when using this thumb drive as the OS installation key we found the
device will generate failure during U1 exit path making it dropped
from the USB bus, this causes a corrupted installation in system at
the end.
i.e.,
[ 166.918296] hub 2-0:1.0: state 7 ports 7 chg 0000 evt 0004
[ 166.918327] usb usb2-port2: link state change
[ 166.918337] usb usb2-port2: do warm reset
[ 166.970039] usb usb2-port2: not warm reset yet, waiting 50ms
[ 167.022040] usb usb2-port2: not warm reset yet, waiting 200ms
[ 167.276043] usb usb2-port2: status 02c0, change 0041, 5.0 Gb/s
[ 167.276050] usb 2-2: USB disconnect, device number 2
[ 167.276058] usb 2-2: unregistering device
[ 167.276060] usb 2-2: unregistering interface 2-2:1.0
[ 167.276170] xhci_hcd 0000:00:15.0: shutdown urb ffffa3c7cc695cc0 ep1in-bulk
[ 167.284055] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 167.284064] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 33 04 90 00 01 00 00
...
Analyzed the USB trace in the link layer we realized it is because
of the 6-ms timer of tRecoveryConfigurationTimeout which documented
on the USB 3.2 Revision 1.0, the section 7.5.10.4.2 of "Exit from
Recovery.Configuration"; device initiates U1 exit -> Recovery.Active
-> Recovery.Configuration, then the host timer timeout makes the link
transits to eSS.Inactive -> Rx.Detect follows by a Warm Reset.
Interestingly, the other higher volume of SanDisk Ultra Flair sharing
the same VID:PID, such as 64GB, would not request LPM during runtime,
it sticks at U0 always, thus disabling LPM does not affect those thumb
drives at all.
The same odd occures in SanDisk Ultra Fit 16GB, VID:PID in 0781:5583.
Alan Stern [Wed, 28 Nov 2018 16:25:58 +0000 (11:25 -0500)]
USB: Fix invalid-free bug in port_over_current_notify()
Syzbot and KASAN found the following invalid-free bug in
port_over_current_notify():
--------------------------------------------------------------------------
BUG: KASAN: double-free or invalid-free in port_over_current_notify
drivers/usb/core/hub.c:5192 [inline]
BUG: KASAN: double-free or invalid-free in port_event
drivers/usb/core/hub.c:5241 [inline]
BUG: KASAN: double-free or invalid-free in hub_event+0xd97/0x4140
drivers/usb/core/hub.c:5384
The problem is caused by use of a static array to store
environment-string pointers. When the routine is called by multiple
threads concurrently, the pointers from one thread can overwrite those
from another.
The solution is to use an ordinary automatic array instead of a static
array.
Young Xiao [Wed, 28 Nov 2018 08:06:53 +0000 (08:06 +0000)]
staging: rtl8712: Fix possible buffer overrun
In commit 8b7a13c3f404 ("staging: r8712u: Fix possible buffer
overrun") we fix a potential off by one by making the limit smaller.
The better fix is to make the buffer larger. This makes it match up
with the similar code in other drivers.
Bin Liu [Mon, 12 Nov 2018 15:43:22 +0000 (09:43 -0600)]
dmaengine: cppi41: delete channel from pending list when stop channel
The driver defines three states for a cppi channel.
- idle: .chan_busy == 0 && not in .pending list
- pending: .chan_busy == 0 && in .pending list
- busy: .chan_busy == 1 && not in .pending list
There are cases in which the cppi channel could be in the pending state
when cppi41_dma_issue_pending() is called after cppi41_runtime_suspend()
is called.
cppi41_stop_chan() has a bug for these cases to set channels to idle state.
It only checks the .chan_busy flag, but not the .pending list, then later
when cppi41_runtime_resume() is called the channels in .pending list will
be transitioned to busy state.
Removing channels from the .pending list solves the problem.
Lucas Stach [Tue, 6 Nov 2018 03:40:33 +0000 (03:40 +0000)]
dmaengine: imx-sdma: implement channel termination via worker
The dmaengine documentation states that device_terminate_all may be
asynchronous and need not wait for the active transfers to stop.
This allows us to move most of the functionality currently implemented
in the sdma channel termination function to run in a worker, outside
of any atomic context. Moving this out of atomic context has two
benefits: we can now sleep while waiting for the channel to terminate,
instead of busy waiting and the freeing of the dma descriptors happens
with IRQs enabled, getting rid of a warning in the dma mapping code.
As the termination is now async, we need to implement the
device_synchronize dma engine function which simply waits for the
worker to finish its execution.
Lucas Stach [Tue, 6 Nov 2018 03:40:28 +0000 (03:40 +0000)]
Revert "dmaengine: imx-sdma: alloclate bd memory from dma pool"
This reverts commit fe5b85c656bc. The SDMA engine needs the descriptors to
be contiguous in memory. As the dma pool API is only able to provide a
single descriptor per alloc invocation there is no guarantee that multiple
descriptors satisfy this requirement. Also the code in question is broken
as it only allocates memory for a single descriptor, without looking at the
number of descriptors required for the transfer, leading to out-of-bounds
accesses when the descriptors are written.
Masahiro Yamada [Wed, 5 Dec 2018 06:27:19 +0000 (15:27 +0900)]
x86/build: Fix compiler support check for CONFIG_RETPOLINE
It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.
Once you are hit by the error about non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.
The easiest fix is to move this check to the "archprepare" like
this commit did:
829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler")
Matthew Wilcox [Fri, 30 Nov 2018 16:05:06 +0000 (11:05 -0500)]
dax: Fix unlock mismatch with updated API
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.
In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry it is possible that the pfn
performed to do the lookup is misaligned to the value retrieved in the
Xarray.
Change the api to return the unlock cookie from dax_lock_page() and pass
it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
assuming that the page was PMD-aligned if the entry was a PMD entry with
signatures like:
Russell King [Fri, 9 Nov 2018 17:01:05 +0000 (17:01 +0000)]
thermal: armada: fix legacy resource fixup
When the armada thermal module is inserted, removed and then reinserted,
the system panics as per the messages below. The reason is that "edit"
a live resource in the resource tree twice, and end up with it pointing
to some other hardware.
Editing live resources (resources that are part of the registered
resource tree) is not permissible - the resource tree is an ordered
set of resources, sorted by start address, and when a new resource is
inserted, it is validated that it (a) fits within its parent resource
and (b) does not overlap a neighbouring resource.
Get rid of this resource editing. We can instead adjust the return
value from ioremap() as ioremap() deals with the creation of page-
based mappings - provided the adjustment does not cross a page
boundary.
Baruch Siach [Tue, 4 Dec 2018 14:03:52 +0000 (16:03 +0200)]
net: mvpp2: fix detection of 10G SFP modules
The mvpp2_phylink_validate() relies on the interface field of
phylink_link_state to determine valid link modes. However, when called
from phylink_sfp_module_insert() this field in not initialized. The
default switch case then excludes 10G link modes. This allows 10G SFP
modules that are detected correctly to be configured at max rate of
2.5G.
Catch the uninitialized PHY mode case, and allow 10G rates.
Russell King [Fri, 9 Nov 2018 16:44:14 +0000 (16:44 +0000)]
thermal: armada: fix legacy validity test sense
Commit 8c0e64ac4075 ("thermal: armada: get rid of the ->is_valid()
pointer") removed the unnecessary indirection through a function
pointer, but in doing so, also removed the negation operator too:
- if (priv->data->is_valid && !priv->data->is_valid(priv)) {
+ if (armada_is_valid(priv)) {
which results in:
armada_thermal f06f808c.thermal: Temperature sensor reading not valid
armada_thermal f2400078.thermal: Temperature sensor reading not valid
armada_thermal f4400078.thermal: Temperature sensor reading not valid
at boot, or whenever the "temp" sysfs file is read. Replace the
negation operator.
Fixes: 8c0e64ac4075 ("thermal: armada: get rid of the ->is_valid() pointer") Signed-off-by: Russell King <[email protected]> Signed-off-by: Eduardo Valentin <[email protected]>
ethernet: fman: fix wrong of_node_put() in probe function
After getting a reference to the platform device's of_node the probe
function ends up calling of_find_matching_node() using the node as an
argument. The function takes care of decreasing the refcount on it. We
are then incorrectly decreasing the refcount on that node again.
This patch removes the unwarranted call to of_node_put().
Fixes: 414fd46e7762 ("fsl/fman: Add FMan support") Signed-off-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Fabio Estevam [Fri, 30 Nov 2018 10:31:29 +0000 (08:31 -0200)]
ARM: dts: imx7d-pico: Describe the Wifi clock
The Wifi chip should be clocked by a 32kHz clock coming from i.MX7D
CLKO2 output pin, so describe the pinmux and clock hierarchy in the
device tree to allow the Wifi chip to be properly clocked.
Managed to successfully test Wifi with such change. Used the standard
nvram.txt file provided by TechNexion, which selects an external 32kHz
clock for the Wifi chip by default.
Jens Axboe [Wed, 5 Dec 2018 03:06:48 +0000 (20:06 -0700)]
blk-mq: fix corruption with direct issue
If we attempt a direct issue to a SCSI device, and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already setup SG tables etc for this particular command. If we later
merge with this request, then the old tables are no longer valid. Once
we issue the IO, we only read/write the original part of the request,
not the new state of it.
This causes data corruption, and is most often noticed with the file
system complaining about the just read data being invalid:
[ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
because most of it is garbage...
This doesn't happen from the normal issue path, as we will simply defer
the request to the hardware queue dispatch list if we fail. Once it's on
the dispatch list, we never merge with it.
Fix this from the direct issue path by flagging the request as
REQ_NOMERGE so we don't change the size of it before issue.
See also:
https://bugzilla.kernel.org/show_bug.cgi?id=201685
While trying to use the dma_mmap_*() interface, it was noticed that this
interface returns strange values when passed an incorrect length.
If neither of the if() statements fire then the return value is
uninitialized. In the worst case it returns 0 which means the caller
will think the function succeeded.
Vladimir Murzin [Fri, 23 Nov 2018 11:25:21 +0000 (12:25 +0100)]
ARM: 8815/1: V7M: align v7m_dma_inv_range() with v7 counterpart
Chris has discovered and reported that v7_dma_inv_range() may corrupt
memory if address range is not aligned to cache line size.
Since the whole cache-v7m.S was lifted form cache-v7.S the same
observation applies to v7m_dma_inv_range(). So the fix just mirrors
what has been done for v7 with a little specific of M-class.
Chris Cole [Fri, 23 Nov 2018 11:20:45 +0000 (12:20 +0100)]
ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling
This patch addresses possible memory corruption when
v7_dma_inv_range(start_address, end_address) address parameters are not
aligned to whole cache lines. This function issues "invalidate" cache
management operations to all cache lines from start_address (inclusive)
to end_address (exclusive). When start_address and/or end_address are
not aligned, the start and/or end cache lines are first issued "clean &
invalidate" operation. The assumption is this is done to ensure that any
dirty data addresses outside the address range (but part of the first or
last cache lines) are cleaned/flushed so that data is not lost, which
could happen if just an invalidate is issued.
The problem is that these first/last partial cache lines are issued
"clean & invalidate" and then "invalidate". This second "invalidate" is
not required and worse can cause "lost" writes to addresses outside the
address range but part of the cache line. If another component writes to
its part of the cache line between the "clean & invalidate" and
"invalidate" operations, the write can get lost. This fix is to remove
the extra "invalidate" operation when unaligned addressed are used.
A kernel module is available that has a stress test to reproduce the
issue and a unit test of the updated v7_dma_inv_range(). It can be
downloaded from
http://ftp.sageembedded.com/outgoing/linux/cache-test-20181107.tgz.
v7_dma_inv_range() is call by dmac_[un]map_area(addr, len, direction)
when the direction is DMA_FROM_DEVICE. One can (I believe) successfully
argue that DMA from a device to main memory should use buffers aligned
to cache line size, because the "clean & invalidate" might overwrite
data that the device just wrote using DMA. But if a driver does use
unaligned buffers, at least this fix will prevent memory corruption
outside the buffer.
drm/amd/display: Fix overflow/truncation from strncpy.
[Why]
New GCC warnings for stringop-truncation and stringop-overflow help
catch common misuse of strncpy. This patch suppresses these warnings
by fixing bugs identified by them.
[How]
Since the parameter passed for name in amdpgu_dm_create_common_mode has
no fixed length, if the string is >= DRM_DISPLAY_MODE_LEN then
mode->name will not be null-terminated.
The truncation in fill_audio_info won't actually occur (and the string
will be null-terminated since the buffer is initialized to zero), but
the warning can be suppressed by using the proper buffer size.
This patch fixes both issues by using the real size for the buffer and
making use of strscpy (which always terminates).
wentalou [Mon, 3 Dec 2018 02:49:50 +0000 (10:49 +0800)]
drm/amdgpu: enlarge maximum waiting time of KIQ
KIQ in VF’s init delayed by another VF’s reset,
which would cause late_init failed occasionally.
MAX_KIQ_REG_TRY enlarged from 20 to 80 would fix this issue.
Darrick J. Wong [Sun, 2 Dec 2018 16:38:07 +0000 (08:38 -0800)]
iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read. This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.
In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads. This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permit total success or total failure of a directio
operation. The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid avoid reading more data
than there is space in the pipe.
Therefore, it's no longer necessary to simulate the short directio, so
remove the hack from iomap.
Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <[email protected]> Ranted-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
Linus Torvalds [Tue, 4 Dec 2018 17:10:39 +0000 (09:10 -0800)]
Merge branch 'parisc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"On parisc, use -ffunction-sections compiler option when building
32-bit kernel modules to avoid sysfs-warnings when loading such
modules.
This got broken with kernel v4.18"
* 'parisc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Enable -ffunction-sections for modules on 32-bit kernel
Darrick J. Wong [Fri, 30 Nov 2018 18:37:49 +0000 (10:37 -0800)]
splice: don't read more than available pipe space
In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read. This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.
In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.
The brokenness is compounded by splice_direct_to_actor immediately
bailing on do_splice_to returning <= 0 without ever calling ->actor
(which empties out the pipe), so if userspace calls back we'll EFAULT
again on the full pipe, and nothing ever gets copied.
Therefore, teach splice_direct_to_actor to clamp its requests to the
amount of free space in the pipe and remove the simulated short read
from the iomap directio code.
Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <[email protected]> Ranted-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
Darrick J. Wong [Fri, 30 Nov 2018 20:32:38 +0000 (12:32 -0800)]
vfs: allow some remap flags to be passed to vfs_clone_file_range
In overlayfs, ovl_remap_file_range calls vfs_clone_file_range on the
lower filesystem's inode, passing through whatever remap flags it got
from its caller. Since vfs_copy_file_range first tries a filesystem's
remap function with REMAP_FILE_CAN_SHORTEN, this can get passed through
to the second vfs_copy_file_range call, and this isn't an issue.
Change the WARN_ON to look only for the DEDUP flag.
Eric Sandeen [Fri, 30 Nov 2018 15:55:57 +0000 (07:55 -0800)]
xfs: fix inverted return from xfs_btree_sblock_verify_crc
xfs_btree_sblock_verify_crc is a bool so should not be returning
a failaddr_t; worse, if xfs_log_check_lsn fails it returns
__this_address which looks like a boolean true (i.e. success)
to the caller.
(interestingly xfs_btree_lblock_verify_crc doesn't have the issue)
Darrick J. Wong [Tue, 27 Nov 2018 19:01:43 +0000 (11:01 -0800)]
xfs: fix PAGE_MASK usage in xfs_free_file_space
In commit e53c4b598, I *tried* to teach xfs to force writeback when we
fzero/fpunch right up to EOF so that if EOF is in the middle of a page,
the post-EOF part of the page gets zeroed before we return to userspace.
Unfortunately, I missed the part where PAGE_MASK is ~(PAGE_SIZE - 1),
which means that we totally fail to zero if we're fpunching and EOF is
within the first page. Worse yet, the same PAGE_MASK thinko plagues the
filemap_write_and_wait_range call, so we'd initiate writeback of the
entire file, which (mostly) masked the thinko.
Drop the tricky PAGE_MASK and replace it with correct usage of PAGE_SIZE
and the proper rounding macros.
Fixes: e53c4b598 ("xfs: ensure post-EOF zeroing happens after zeroing part of a file") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
David S. Miller [Tue, 4 Dec 2018 16:47:44 +0000 (08:47 -0800)]
phy: Revert toggling reset changes.
This reverts:
ef1b5bf506b1 ("net: phy: Fix not to call phy_resume() if PHY is not attached") 8c85f4b81296 ("net: phy: micrel: add toggling phy reset if PHY is not attached")
Andrew Lunn informs me that there are alternative efforts
underway to fix this more properly.
Daniel Borkmann [Tue, 4 Dec 2018 16:22:02 +0000 (17:22 +0100)]
Merge branch 'bpf-verifier-resilience'
Alexei Starovoitov says:
====================
Three patches to improve verifier ability to handle pathological bpf
programs with a lot of branches:
- make sure prog_load syscall can be aborted
- improve branch taken analysis
- introduce per-insn complexity limit for unprivileged programs
====================
malicious bpf program may try to force the verifier to remember
a lot of distinct verifier states.
Put a limit to number of per-insn 'struct bpf_verifier_state'.
Note that hitting the limit doesn't reject the program.
It potentially makes the verifier do more steps to analyze the program.
It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner
instead of spending cpu time walking long link list.
The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs
with slight increase in number of "steps" it takes to successfully verify
the programs:
before after
bpf_lb-DLB_L3.o 1940 1940
bpf_lb-DLB_L4.o 3089 3089
bpf_lb-DUNKNOWN.o 1065 1065
bpf_lxc-DDROP_ALL.o 28052 | 28162
bpf_lxc-DUNKNOWN.o 35487 | 35541
bpf_netdev.o 10864 10864
bpf_overlay.o 6643 6643
bpf_lcx_jit.o 38437 38437
But it also makes malicious program to be rejected in 0.4 seconds vs 6.5
Hence apply this limit to unprivileged programs only.
pathological bpf programs may try to force verifier to explode in
the number of branch states:
20: (d5) if r1 s<= 0x24000028 goto pc+0
21: (b5) if r0 <= 0xe1fa20 goto pc+2
22: (d5) if r1 s<= 0x7e goto pc+0
23: (b5) if r0 <= 0xe880e000 goto pc+0
24: (c5) if r0 s< 0x2100ecf4 goto pc+0
25: (d5) if r1 s<= 0xe880e000 goto pc+1
26: (c5) if r0 s< 0xf4041810 goto pc+0
27: (d5) if r1 s<= 0x1e007e goto pc+0
28: (b5) if r0 <= 0xe86be000 goto pc+0
29: (07) r0 += 16614
30: (c5) if r0 s< 0x6d0020da goto pc+0
31: (35) if r0 >= 0x2100ecf4 goto pc+0
Teach verifier to recognize always taken and always not taken branches.
This analysis is already done for == and != comparison.
Expand it to all other branches.
It also helps real bpf programs to be verified faster:
before after
bpf_lb-DLB_L3.o 2003 1940
bpf_lb-DLB_L4.o 3173 3089
bpf_lb-DUNKNOWN.o 1080 1065
bpf_lxc-DDROP_ALL.o 29584 28052
bpf_lxc-DUNKNOWN.o 36916 35487
bpf_netdev.o 11188 10864
bpf_overlay.o 6679 6643
bpf_lcx_jit.o 39555 38437
bpf: check pending signals while verifying programs
Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.
The warning gets triggered by an ancient lockdep check in the freezer:
(gdb) list *0xffffffff812ece06
0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57).
52 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
53 * If try_to_freeze causes a lockdep warning it means the caller may deadlock
54 */
55 static inline bool try_to_freeze_unsafe(void)
56 {
57 might_sleep();
58 if (likely(!freezing(current)))
59 return false;
60 return __refrigerator(false);
61 }
I reviewed the ->cred_guard_mutex code, and the mutex is held across all
of exec() - and we always did this.
But there's this recent -rc4 commit:
> Chanho Min (1):
> exec: make de_thread() freezable
Qu Wenruo [Fri, 23 Nov 2018 01:06:36 +0000 (09:06 +0800)]
btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
[BUG]
A completely valid btrfs will refuse to mount, with error message like:
BTRFS critical (device sdb2): corrupt leaf: root=2 block=239681536 slot=172 \
bg_start=12018974720 bg_len=10888413184, invalid block group size, \
have 10888413184 expect (0, 10737418240]
This has been reported several times as the 4.19 kernel is now being
used. The filesystem refuses to mount, but is otherwise ok and booting
4.18 is a workaround.
Btrfs check returns no error, and all kernels used on this fs is later
than 2011, which should all have the 10G size limit commit.
[CAUSE]
For a 12 devices btrfs, we could allocate a chunk larger than 10G due to
stripe stripe bump up.
Aaro Koskinen [Mon, 19 Nov 2018 23:14:00 +0000 (01:14 +0200)]
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
Since v2.6.22 or so there has been reports [1] about OMAP MMC being
broken on OMAP15XX based hardware (OMAP5910 and OMAP310). The breakage
seems to have been caused by commit 46a6730e3ff9 ("mmc-omap: Fix
omap to use MMC_POWER_ON") that changed clock enabling to be done
on MMC_POWER_ON. This can happen multiple times in a row, and on 15XX
the hardware doesn't seem to like it and the MMC just stops responding.
Fix by memorizing the power mode and do the init only when necessary.
The commit broke some selinux-testsuite cases, and it looks like there's no
straightforward fix keeping the direction of this patch, so revert for now.
The original patch was trying to fix the consistency of permission checks, and
not an observed bug. So reverting should be safe.
Wolfram Sang [Mon, 26 Nov 2018 13:38:13 +0000 (14:38 +0100)]
mmc: core: use mrq->sbc when sending CMD23 for RPMB
When sending out CMD23 in the blk preparation, the comment there
rightfully says:
* However, it is not sufficient to just send CMD23,
* and avoid the final CMD12, as on an error condition
* CMD12 (stop) needs to be sent anyway. This, coupled
* with Auto-CMD23 enhancements provided by some
* hosts, means that the complexity of dealing
* with this is best left to the host. If CMD23 is
* supported by card and host, we'll fill sbc in and let
* the host deal with handling it correctly.
Let's do this behaviour for RPMB as well, and not send CMD23
independently. Otherwise IP cores (like Renesas SDHI) may timeout
because of automatic CMD23/CMD12 handling.
Masami Hiramatsu [Thu, 23 Aug 2018 17:16:12 +0000 (02:16 +0900)]
kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
After copy_optimized_instructions() copies several instructions
to the working buffer it tries to fix up the real RIP address, but it
adjusts the RIP-relative instruction with an incorrect RIP address
for the 2nd and subsequent instructions due to a bug in the logic.
This will break the kernel pretty badly (with likely outcomes such as
a kernel freeze, a crash, or worse) because probed instructions can refer
to the wrong data.
For example putting kprobes on cpumask_next() typically hits this bug.
cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
(in this case nr_cpumask_bits is an alias of nr_cpu_ids):
This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
the original *nr_cpu_ids. This leads to a kernel freeze because
cpumask_next() always returns 0 and for_each_cpu() never ends.
Fix this by adding 'len' correctly to the real RIP address while
copying.