Eric Biggers [Fri, 8 Dec 2017 15:13:28 +0000 (15:13 +0000)]
509: fix printing uninitialized stack memory when OID is empty
Callers of sprint_oid() do not check its return value before printing
the result. In the case where the OID is zero-length, -EBADMSG was
being returned without anything being written to the buffer, resulting
in uninitialized stack memory being printed. Fix this by writing
"(bad)" to the buffer in the cases where -EBADMSG is returned.
Fixes: 4f73175d0375 ("X.509: Add utility functions to render OIDs as strings") Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
Eric Biggers [Fri, 8 Dec 2017 15:13:28 +0000 (15:13 +0000)]
X.509: fix buffer overflow detection in sprint_oid()
In sprint_oid(), if the input buffer were to be more than 1 byte too
small for the first snprintf(), 'bufsize' would underflow, causing a
buffer overflow when printing the remainder of the OID.
Fortunately this cannot actually happen currently, because no users pass
in a buffer that can be too small for the first snprintf().
Regardless, fix it by checking the snprintf() return value correctly.
For consistency also tweak the second snprintf() check to look the same.
Eric Biggers [Fri, 8 Dec 2017 15:13:27 +0000 (15:13 +0000)]
X.509: reject invalid BIT STRING for subjectPublicKey
Adding a specially crafted X.509 certificate whose subjectPublicKey
ASN.1 value is zero-length caused x509_extract_key_data() to set the
public key size to SIZE_MAX, as it subtracted the nonexistent BIT STRING
metadata byte. Then, x509_cert_parse() called kmemdup() with that bogus
size, triggering the WARN_ON_ONCE() in kmalloc_slab().
This appears to be harmless, but it still must be fixed since WARNs are
never supposed to be user-triggerable.
Fix it by updating x509_cert_parse() to validate that the value has a
BIT STRING metadata byte, and that the byte is 0 which indicates that
the number of bits in the bitstring is a multiple of 8.
It would be nice to handle the metadata byte in asn1_ber_decoder()
instead. But that would be tricky because in the general case a BIT
STRING could be implicitly tagged, and/or could legitimately have a
length that is not a whole number of bytes.
Eric Biggers [Fri, 8 Dec 2017 15:13:27 +0000 (15:13 +0000)]
ASN.1: check for error from ASN1_OP_END__ACT actions
asn1_ber_decoder() was ignoring errors from actions associated with the
opcodes ASN1_OP_END_SEQ_ACT, ASN1_OP_END_SET_ACT,
ASN1_OP_END_SEQ_OF_ACT, and ASN1_OP_END_SET_OF_ACT. In practice, this
meant the pkcs7_note_signed_info() action (since that was the only user
of those opcodes). Fix it by checking for the error, just like the
decoder does for actions associated with the other opcodes.
This bug allowed users to leak slab memory by repeatedly trying to add a
specially crafted "pkcs7_test" key (requires CONFIG_PKCS7_TEST_KEY).
In theory, this bug could also be used to bypass module signature
verification, by providing a PKCS#7 message that is misparsed such that
a signature's ->authattrs do not contain its ->msgdigest. But it
doesn't seem practical in normal cases, due to restrictions on the
format of the ->authattrs.
Eric Biggers [Fri, 8 Dec 2017 15:13:27 +0000 (15:13 +0000)]
ASN.1: fix out-of-bounds read when parsing indefinite length item
In asn1_ber_decoder(), indefinitely-sized ASN.1 items were being passed
to the action functions before their lengths had been computed, using
the bogus length of 0x80 (ASN1_INDEFINITE_LENGTH). This resulted in
reading data past the end of the input buffer, when given a specially
crafted message.
Fix it by rearranging the code so that the indefinite length is resolved
before the action is called.
This bug was originally found by fuzzing the X.509 parser in userspace
using libFuzzer from the LLVM project.
KASAN report (cleaned up slightly):
BUG: KASAN: slab-out-of-bounds in memcpy ./include/linux/string.h:341 [inline]
BUG: KASAN: slab-out-of-bounds in x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
Read of size 128 at addr ffff880035dd9eaf by task keyctl/195
Eric Biggers [Fri, 8 Dec 2017 15:13:27 +0000 (15:13 +0000)]
KEYS: add missing permission check for request_key() destination
When the request_key() syscall is not passed a destination keyring, it
links the requested key (if constructed) into the "default" request-key
keyring. This should require Write permission to the keyring. However,
there is actually no permission check.
This can be abused to add keys to any keyring to which only Search
permission is granted. This is because Search permission allows joining
the keyring. keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING)
then will set the default request-key keyring to the session keyring.
Then, request_key() can be used to add keys to the keyring.
Both negatively and positively instantiated keys can be added using this
method. Adding negative keys is trivial. Adding a positive key is a
bit trickier. It requires that either /sbin/request-key positively
instantiates the key, or that another thread adds the key to the process
keyring at just the right time, such that request_key() misses it
initially but then finds it in construct_alloc_key().
Fix this bug by checking for Write permission to the keyring in
construct_get_dest_keyring() when the default keyring is being used.
We don't do the permission check for non-default keyrings because that
was already done by the earlier call to lookup_user_key(). Also,
request_key_and_link() is currently passed a 'struct key *' rather than
a key_ref_t, so the "possessed" bit is unavailable.
We also don't do the permission check for the "requestor keyring", to
continue to support the use case described by commit 8bbf4976b59f
("KEYS: Alter use of key instantiation link-to-keyring argument") where
/sbin/request-key recursively calls request_key() to add keys to the
original requestor's destination keyring. (I don't know of any users
who actually do that, though...)
Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key") Cc: <[email protected]> # v2.6.13+ Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
Eric Biggers [Fri, 8 Dec 2017 15:13:27 +0000 (15:13 +0000)]
KEYS: remove unnecessary get/put of explicit dest_keyring
In request_key_and_link(), in the case where the dest_keyring was
explicitly specified, there is no need to get another reference to
dest_keyring before calling key_link(), then drop it afterwards. This
is because by definition, we already have a reference to dest_keyring.
This change is useful because we'll be making
construct_get_dest_keyring() able to return an error code, and we don't
want to have to handle that error here for no reason.
Yousuk Seung [Thu, 7 Dec 2017 21:41:34 +0000 (13:41 -0800)]
tcp: invalidate rate samples during SACK reneging
Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.
In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.
This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.
can: peak/pcie_fd: fix potential bug in restarting tx queue
Don't rely on can_get_echo_skb() return value to wake the network tx
queue up: can_get_echo_skb() returns 0 if the echo array slot was not
occupied, but also when the DLC of the released echo frame was 0.
Martin Kelly [Tue, 5 Dec 2017 19:15:50 +0000 (11:15 -0800)]
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).
This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.
Martin Kelly [Tue, 5 Dec 2017 19:15:49 +0000 (11:15 -0800)]
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).
This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.
Martin Kelly [Tue, 5 Dec 2017 19:15:48 +0000 (11:15 -0800)]
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).
This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.
Martin Kelly [Tue, 5 Dec 2017 19:15:47 +0000 (11:15 -0800)]
can: ems_usb: cancel urb on -EPIPE and -EPROTO
In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).
This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.
Martin Kelly [Tue, 5 Dec 2017 18:34:03 +0000 (10:34 -0800)]
can: mcba_usb: cancel urb on -EPROTO
When we unplug the device, we can see both -EPIPE and -EPROTO depending
on exact timing and what system we run on. If we continue to resubmit
URBs, they will immediately fail, and they can cause stalls, especially
on slower CPUs.
Fix this by not resubmitting on -EPROTO, as we already do on -EPIPE.
Dave Airlie [Thu, 7 Dec 2017 22:17:53 +0000 (08:17 +1000)]
Merge tag 'drm-misc-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
regression fix for vc4 + rpm stable fix for analogix bridge
* tag 'drm-misc-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-misc:
drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
Dave Airlie [Thu, 7 Dec 2017 22:17:09 +0000 (08:17 +1000)]
Merge tag 'drm-intel-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix for fd.o bug #103997 CNL eDP + HDMI causing a machine hard hang (James)
- Fix to allow suspending with a wedged GPU to hopefully unwedge it (Chris)
- Fix for Gen2 vblank timestap/frame counter jumps (Ville)
- Revert of a W/A for enabling FBC on CNL/GLK for certain images
and sizes (Rodrigo)
- Lockdep fix for i915 userptr code (Chris)
gvt-fixes-2017-12-06
- Fix invalid hw reg read value for vGPU (Xiong)
- Fix qemu warning on PCI ROM bar missing (Changbin)
- Workaround preemption regression (Zhenyu)
* tag 'drm-intel-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-intel:
Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
drm/i915/gvt: set max priority for gvt context
drm/i915/gvt: Don't mark vgpu context as inactive when preempted
drm/i915/gvt: Limit read hw reg to active vgpu
drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
drm/i915/gvt: Emulate PCI expansion ROM base address register
drm/i915/cnl: Mask previous DDI - PLL mapping
drm/i915: Fix vblank timestamp/frame counter jumps on gen2
drm/i915: Skip switch-to-kernel-context on suspend when wedged
Dave Airlie [Thu, 7 Dec 2017 22:15:09 +0000 (08:15 +1000)]
Merge tag 'exynos-drm-fixes-for-v4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
- fix page fault issue due to using wrong device object in prime import.
- drop NONCONTIG flag without IOMMU support.
- remove unnecessary members and declaration.
* tag 'exynos-drm-fixes-for-v4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: remove unnecessary function declaration
drm/exynos: remove unnecessary descrptions
drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
drm/exynos: Fix dma-buf import
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fixing broken uapi for BPF tracing programs for s390 and arm64
architectures due to pt_regs being in-kernel only, and not part
of uapi right now. A wrapper is added that exports pt_regs in
an asm-generic way. For arm64 this maps to existing user_pt_regs
structure and for s390 a user_pt_regs structure exporting the
beginning of pt_regs is added and uapi-exported, thus fixing the
BPF issues seen in perf (and BPF selftests), all from Hendrik.
====================
Bjørn Mork [Wed, 6 Dec 2017 19:21:24 +0000 (20:21 +0100)]
usbnet: fix alignment for frames with no ethernet header
The qmi_wwan minidriver support a 'raw-ip' mode where frames are
received without any ethernet header. This causes alignment issues
because the skbs allocated by usbnet are "IP aligned".
Fix by allowing minidrivers to disable the additional alignment
offset. This is implemented using a per-device flag, since the same
minidriver also supports 'ethernet' mode.
Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Reported-and-tested-by: Jay Foster <[email protected]> Signed-off-by: Bjørn Mork <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Ahern [Thu, 7 Dec 2017 04:09:12 +0000 (20:09 -0800)]
netlink: Relax attr validation for fixed length types
Commit 28033ae4e0f5 ("net: netlink: Update attr validation to require
exact length for some types") requires attributes using types NLA_U* and
NLA_S* to have an exact length. This change is exposing bugs in various
userspace commands that are sending attributes with an invalid length
(e.g., attribute has type NLA_U8 and userspace sends NLA_U32). While
the commands are clearly broken and need to be fixed, users are arguing
that the sudden change in enforcement is breaking older commands on
newer kernels for use cases that otherwise "worked".
Relax the validation to print a warning mesage similar to what is done
for messages containing extra bytes after parsing.
Fixes: 28033ae4e0f5 ("net: netlink: Update attr validation to require exact length for some types") Signed-off-by: David Ahern <[email protected]> Reviewed-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced new exit point in ipxip6_rcv. however rcu_read_unlock is
missing there. this diff is fixing this
v1->v2:
instead of doing rcu_read_unlock in place, we are going to "drop"
section (to prevent skb leakage)
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: Nikita V. Shirokov <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Thu, 7 Dec 2017 18:53:05 +0000 (13:53 -0500)]
Merge branch 'mv88e6xxx-error-patch-fixes'
Andrew Lunn says:
====================
mv88e6xxx error patch fixes
While trying to bring up a new PHY on a board, i exercised the error
paths a bit, and discovered some bugs. The unwind for interrupt
handling deadlocks, and the MDIO code hits a BUG() when a registered
MDIO device is freed without first being unregistered.
====================
Andrew Lunn [Thu, 7 Dec 2017 00:05:57 +0000 (01:05 +0100)]
net: dsa: mv88e6xxx: Unregister MDIO bus on error path
The MDIO busses need to be unregistered before they are freed,
otherwise BUG() is called. Add a call to the unregister code if the
registration fails, since we can have multiple busses, of which some
may correctly register before one fails. This requires moving the code
around a little.
Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") Signed-off-by: Andrew Lunn <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Andrew Lunn [Thu, 7 Dec 2017 00:05:56 +0000 (01:05 +0100)]
net: dsa: mv88e6xxx: Fix interrupt masking on removal
When removing the interrupt handling code, we should mask the
generation of interrupts. The code however unmasked all
interrupts. This can then cause a new interrupt. We then get into a
deadlock where the interrupt thread is waiting to run, and the code
continues, trying to remove the interrupt handler, which means waiting
for the thread to complete. On a UP machine this deadlocks.
Fix so we really mask interrupts in the hardware. The same error is
made in the error path when install the interrupt handling code.
Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt") Signed-off-by: Andrew Lunn <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Tobias Jordan [Wed, 6 Dec 2017 14:23:23 +0000 (15:23 +0100)]
net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
add appropriate calls to clk_disable_unprepare() by jumping to out_mdio
in case orion_mdio_probe() returns -EPROBE_DEFER.
Found by Linux Driver Verification project (linuxtesting.org).
Fixes: 3d604da1e954 ("net: mvmdio: get and enable optional clock") Signed-off-by: Tobias Jordan <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
weiping zhang [Wed, 6 Dec 2017 13:59:16 +0000 (21:59 +0800)]
virtio_mmio: add cleanup for virtio_mmio_probe
As mentioned at drivers/base/core.c:
/*
* NOTE: _Never_ directly free @dev after calling this function, even
* if it returned an error! Always use put_device() to give up the
* reference initialized in this function instead.
*/
so we don't free vm_dev until vm_dev.dev.release be called.
Arnd Bergmann [Wed, 6 Dec 2017 13:17:17 +0000 (14:17 +0100)]
ARM: omap2: hide omap3_save_secure_ram on non-OMAP3 builds
In configurations without CONFIG_OMAP3 but with secure RAM support,
we now run into a link failure:
arch/arm/mach-omap2/omap-secure.o: In function `omap3_save_secure_ram':
omap-secure.c:(.text+0x130): undefined reference to `save_secure_ram_context'
The omap3_save_secure_ram() function is only called from the OMAP34xx
power management code, so we can simply hide that function in the
appropriate #ifdef.
Fixes: d09220a887f7 ("ARM: OMAP2+: Fix SRAM virt to phys translation for save_secure_ram_context") Acked-by: Tony Lindgren <[email protected]> Tested-by: Dan Murphy <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
Rob Herring [Thu, 9 Nov 2017 22:26:12 +0000 (16:26 -0600)]
arm: dts: nspire: Add missing #phy-cells to usb-nop-xceiv
"usb-nop-xceiv" is using the phy binding, but is missing #phy-cells
property. This is probably because the binding was the precursor to the phy
binding.
Fixes the following warning in nspire dts files:
Warning (phys_property): Missing property '#phy-cells' in node ...
Marek Szyprowski [Tue, 21 Nov 2017 07:49:36 +0000 (08:49 +0100)]
drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
get_modes() callback might be called asynchronously from the DRM core and
it is not synchronized with bridge_enable(), which sets proper runtime PM
state of the main DP device. Fix this by calling pm_runtime_get_sync()
before calling drm_get_edid(), which in turn calls drm_dp_i2c_xfer() and
analogix_dp_transfer() to ensure that main DP device is runtime active
when doing any access to its registers.
This fixes the following kernel issue on Samsung Exynos5250 Snow board:
Unhandled fault: imprecise external abort (0x406) at 0x00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: : 406 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 62 Comm: kworker/0:2 Not tainted 4.13.0-rc2-00364-g4a97a3da420b #3357
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events output_poll_execute
task: edc14800 task.stack: edcb2000
PC is at analogix_dp_transfer+0x15c/0x2fc
LR is at analogix_dp_transfer+0x134/0x2fc
pc : [<c0468538>] lr : [<c0468510>] psr: 60000013
sp : edcb3be8 ip : 0000002a fp : 00000001
r10: 00000000 r9 : edcb3cd8 r8 : edcb3c40
r7 : 00000000 r6 : edd3b380 r5 : edd3b010 r4 : 00000064
r3 : 00000000 r2 : f0ad3000 r1 : edcb3c40 r0 : edd3b010
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 4000406a DAC: 00000051
Process kworker/0:2 (pid: 62, stack limit = 0xedcb2210)
Stack: (0xedcb3be8 to 0xedcb4000)
[<c0468538>] (analogix_dp_transfer) from [<c0424ba4>] (drm_dp_i2c_do_msg+0x8c/0x2b4)
[<c0424ba4>] (drm_dp_i2c_do_msg) from [<c0424e64>] (drm_dp_i2c_xfer+0x98/0x214)
[<c0424e64>] (drm_dp_i2c_xfer) from [<c057b2d8>] (__i2c_transfer+0x140/0x29c)
[<c057b2d8>] (__i2c_transfer) from [<c057b4a4>] (i2c_transfer+0x70/0xe4)
[<c057b4a4>] (i2c_transfer) from [<c0441de4>] (drm_do_probe_ddc_edid+0xb4/0x114)
[<c0441de4>] (drm_do_probe_ddc_edid) from [<c0441e5c>] (drm_probe_ddc+0x18/0x28)
[<c0441e5c>] (drm_probe_ddc) from [<c0445728>] (drm_get_edid+0x124/0x2d4)
[<c0445728>] (drm_get_edid) from [<c0465ea0>] (analogix_dp_get_modes+0x90/0x114)
[<c0465ea0>] (analogix_dp_get_modes) from [<c0425e8c>] (drm_helper_probe_single_connector_modes+0x198/0x68c)
[<c0425e8c>] (drm_helper_probe_single_connector_modes) from [<c04325d4>] (drm_setup_crtcs+0x1b4/0xd18)
[<c04325d4>] (drm_setup_crtcs) from [<c04344a8>] (drm_fb_helper_hotplug_event+0x94/0xd0)
[<c04344a8>] (drm_fb_helper_hotplug_event) from [<c0425a50>] (drm_kms_helper_hotplug_event+0x24/0x28)
[<c0425a50>] (drm_kms_helper_hotplug_event) from [<c04263ec>] (output_poll_execute+0x6c/0x174)
[<c04263ec>] (output_poll_execute) from [<c0136f18>] (process_one_work+0x188/0x3fc)
[<c0136f18>] (process_one_work) from [<c01371f4>] (worker_thread+0x30/0x4b8)
[<c01371f4>] (worker_thread) from [<c013daf8>] (kthread+0x128/0x164)
[<c013daf8>] (kthread) from [<c0108510>] (ret_from_fork+0x14/0x24)
Code: 0a000002ea000009e25440010a00004a (e59537c8)
---[ end trace cddc7919c79f7878 ]---
Kalle Valo [Thu, 7 Dec 2017 13:50:34 +0000 (15:50 +0200)]
Merge tag 'iwlwifi-for-kalle-2017-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes
Third batch of iwlwifi patches intended for 4.15.
* Tell mac80211 when the MAC has been stripped (9000 series);
* Tell mac80211 when the IVC has been stripped (9000 series);
* Add 2 new PCI IDs, one for 9000 and one for 22000;
* Fix a queue hang due during ROC.
HSD says "WA withdrawn. It was causing corruption with some images.
WA is not strictly necessary since this bug just causes loss of FBC
compression with some sizes and images, but doesn't break anything."
Boris Brezillon [Wed, 22 Nov 2017 20:39:28 +0000 (21:39 +0100)]
drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
With CONFIG_REFCOUNT_FULL enabled, refcount_inc() complains when it's
passed a refcount object that has its counter set to 0. In this driver,
this is a valid use case since we want to increment ->usecnt only when
the BO object starts to be used by real HW components and this is
definitely not the case when the BO is created.
Fix the problem by using refcount_inc_not_zero() instead of
refcount_inc() and fallback to refcount_set(1) when
refcount_inc_not_zero() returns false. Note that this 2-steps operation
is not racy here because the whole section is protected by a mutex
which guarantees that the counter does not change between the
refcount_inc_not_zero() and refcount_set() calls.
Chris Wilson [Wed, 22 Nov 2017 17:26:21 +0000 (17:26 +0000)]
drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
We don't need struct_mutex to initialise userptr (it just allocates a
workqueue for itself etc), but we do need struct_mutex later on in
i915_gem_init() in order to feed requests onto the HW.
This should break the chain
[ 385.697902] ======================================================
[ 385.697907] WARNING: possible circular locking dependency detected
[ 385.697913] 4.14.0-CI-Patchwork_7234+ #1 Tainted: G U
[ 385.697917] ------------------------------------------------------
[ 385.697922] perf_pmu/2631 is trying to acquire lock:
[ 385.697927] (&mm->mmap_sem){++++}, at: [<ffffffff811bfe1e>] __might_fault+0x3e/0x90
[ 385.697941]
but task is already holding lock:
[ 385.697946] (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.697957]
which lock already depends on the new lock.
Heiko Carstens [Wed, 6 Dec 2017 15:11:27 +0000 (16:11 +0100)]
s390: fix compat system call table
When wiring up the socket system calls the compat entries were
incorrectly set. Not all of them point to the corresponding compat
wrapper functions, which clear the upper 33 bits of user space
pointers, like it is required.
Fixes: 977108f89c989 ("s390: wire up separate socketcalls system calls") Cc: <[email protected]> # v4.3+ Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
Linus Torvalds [Thu, 7 Dec 2017 02:33:17 +0000 (18:33 -0800)]
Merge tag 'for_linus-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull kgdb fixes from Jason Wessel:
- Fix long standing problem with kdb kallsyms_symbol_next() return
value
- Add new co-maintainer Daniel Thompson
* tag 'for_linus-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kgdb/kdb/debug_core: Add co-maintainer Daniel Thompson
kdb: Fix handling of kallsyms_symbol_next() return value
Linus Torvalds [Thu, 7 Dec 2017 02:23:27 +0000 (18:23 -0800)]
proc: show si_ptr in /proc/<pid>/timers without hashing
It's a user pointer, and while the permissions of the file are pretty
questionable (should it really be readable to everybody), hashing the
pointer isn't going to be the solution.
We should take a closer look at more of the /proc/<pid> file permissions
in general. Sure, we do want many of them to often be readable (for
'ps' and friends), but I think we should probably do a few conversions
from S_IRUGO to S_IRUSR.
Linus Torvalds [Thu, 7 Dec 2017 02:16:20 +0000 (18:16 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu fixes from Greg Ungerer:
"There are two fixes here. One to add a missing linker section to the
m68k architecture linker scripts, the other to fix a defconfig build
problem"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k/defconfig: fix stmark2 broken local compilation
m68k: add missing SOFTIRQENTRY_TEXT linker section
Linus Torvalds [Thu, 7 Dec 2017 01:47:29 +0000 (17:47 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- make CR4 handling irq-safe, which bug vmware guests ran into
- don't crash on early IRQs in Xen guests
- don't crash secondary CPU bringup if #UD assisted WARN()ings are
triggered
- make X86_BUG_FXSAVE_LEAK optional on newer AMD CPUs that have the fix
- fix AMD Fam17h microcode loading
- fix broadcom_postcore_init() if ACPI is disabled
- fix resume regression in __restore_processor_context()
- fix Sparse warnings
- fix a GCC-8 warning
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Change time() prototype to match __vdso_time()
x86: Fix Sparse warnings about non-static functions
x86/power: Fix some ordering bugs in __restore_processor_context()
x86/PCI: Make broadcom_postcore_init() check acpi_disabled
x86/microcode/AMD: Add support for fam17h microcode loading
x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD
x86/idt: Load idt early in start_secondary
x86/xen: Support early interrupts in xen pv guests
x86/tlb: Disable interrupts when changing CR4
x86/tlb: Refactor CR4 setting and shadow write
Linus Torvalds [Thu, 7 Dec 2017 01:43:26 +0000 (17:43 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"This includes a fix for the add_wait_queue() queue ordering brown
paperbag bug, plus PELT accounting fixes for cgroups scheduling
artifacts"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Update and fix the runnable propagation rule
sched/wait: Fix add_wait_queue() behavioral change
Linus Torvalds [Thu, 7 Dec 2017 01:41:24 +0000 (17:41 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This includes perf namespace support kernel side fixes, plus an
accumulated set of perf tooling fixes - including UAPI header
synchronization that should make the perf build less noisy"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
tooling/headers: Synchronize updated s390 and x86 UAPI headers
tools headers: Syncronize mman.h ABI header
tools headers: Synchronize prctl.h ABI header
tools headers: Synchronize KVM arch ABI headers
tools headers: Synchronize drm/i915_drm.h
tools headers uapi: Synchronize drm/drm.h
tools headers: Synchronize perf_event.h header
tools headers: Synchronize kernel ABI headers wrt SPDX tags
tools/headers: Synchronize kernel x86 UAPI headers
perf intel-pt: Bring instruction decoder files into line with the kernel
perf test: Fix test 21 for s390x
perf bench numa: Fixup discontiguous/sparse numa nodes
perf top: Use signal interface for SIGWINCH handler
perf top: Fix window dimensions change handling
perf: Fix header.size for namespace events
perf top: Ignore kptr_restrict when not sampling the kernel
perf record: Ignore kptr_restrict when not sampling the kernel
perf report: Ignore kptr_restrict when not sampling the kernel
perf evlist: Add helper to check if attr.exclude_kernel is set in all evsels
perf test shell: Fix test case probe libc's inet_pton on s390x
...
Linus Torvalds [Thu, 7 Dec 2017 01:39:44 +0000 (17:39 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull lockdep fix from Ingo Molnar:
"Fix a possible NULL dereference for the (rare) case when a task
doesn't have ->xhlocks space allocated due to kmalloc() OOM-ing"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix possible NULL deref
Marek Szyprowski [Wed, 22 Nov 2017 13:14:47 +0000 (14:14 +0100)]
drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
When no IOMMU is available, all GEM buffers allocated by Exynos DRM driver
are contiguous, because of the underlying dma_alloc_attrs() function
provides only such buffers. In such case it makes no sense to keep
BO_NONCONTIG flag for the allocated GEM buffers. This allows to avoid
failures for buffer contiguity checks in the subsequent operations on GEM
objects.
Marek Szyprowski [Mon, 30 Oct 2017 07:28:09 +0000 (08:28 +0100)]
drm/exynos: Fix dma-buf import
When IOMMU support was enabled, dma-buf import in Exynos DRM was broken
since commit f43c35966a5a ("drm/exynos: use real device for DMA-mapping
operations") due to using wrong struct device in drm_gem_prime_import()
function. This patch fixes following kernel BUG caused by incorrect buffer
mapping to DMA address space:
exynos-sysmmu 14650000.sysmmu: 14450000.mixer: PAGE FAULT occurred at 0xb2e00000
------------[ cut here ]------------
kernel BUG at drivers/iommu/exynos-iommu.c:449!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4-next-20171016-00033-g990d723669fd #3165
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
task: c0e0b7c0 task.stack: c0e00000
PC is at exynos_sysmmu_irq+0x1d0/0x24c
LR is at exynos_sysmmu_irq+0x154/0x24c
------------[ cut here ]------------
Reported-by: Marian Mihailescu <[email protected]> Fixes: f43c35966a5a ("drm/exynos: use real device for DMA-mapping operations") Signed-off-by: Marek Szyprowski <[email protected]> Reviewed-by: Tobias Jakobi <[email protected]> Signed-off-by: Inki Dae <[email protected]>
Linus Torvalds [Wed, 6 Dec 2017 23:47:51 +0000 (15:47 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Two fixes: use bool type consistently, plus a irq_matrix_available()
bugfix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdesc: Use bool return type instead of int
genirq/matrix: Fix the precedence fix for real
Nikolay Borisov [Fri, 1 Dec 2017 09:19:42 +0000 (11:19 +0200)]
btrfs: Fix possible off-by-one in btrfs_search_path_in_tree
The name char array passed to btrfs_search_path_in_tree is of size
BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes
are in the range of [0, 4079]. Currently the code uses the define but this
represents an off-by-one.
Implications:
Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be
written to extra space, not some padding that could be provided by the
allocator.
btrfs-progs store the arguments on stack, but kernel does own copy of
the ioctl buffer and the off-by-one overwrite does not affect userspace,
but the ending 0 might be lost.
Kernel ioctl buffer is allocated dynamically so we're overwriting
somebody else's memory, and the ioctl is privileged if args.objectid is
not 256. Which is in most cases, but resolving a subvolume stored in
another directory will trigger that path.
Before this patch the buffer was one byte larger, but then the -1 was
not added.
Fixes: ac8e9819d71f907 ("Btrfs: add search and inode lookup ioctls") Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: David Sterba <[email protected]>
[ added implications ] Signed-off-by: David Sterba <[email protected]>
Omar Sandoval [Wed, 6 Dec 2017 06:54:02 +0000 (22:54 -0800)]
Btrfs: disable FUA if mounted with nobarrier
I was seeing disk flushes still happening when I mounted a Btrfs
filesystem with nobarrier for testing. This is because we use FUA to
write out the first super block, and on devices without FUA support, the
block layer translates FUA to a flush. Even on devices supporting true
FUA, using FUA when we asked for no barriers is surprising.
Jeff Mahoney [Tue, 21 Nov 2017 18:58:49 +0000 (13:58 -0500)]
btrfs: handle errors while updating refcounts in update_ref_for_cow
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans) the assumption that
btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
been false. The qgroup operations call into btrfs_search_slot
and friends and can now return the full spectrum of error codes.
Fortunately, the fix here is easy since update_ref_for_cow failing
is already handled so we just need to bail early with the error
code.
Justin Maggard [Mon, 30 Oct 2017 22:29:10 +0000 (15:29 -0700)]
btrfs: Fix quota reservation leak on preallocated files
Commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to")
changed the behavior of __btrfs_buffered_write() so that it first tries
to get a data space reservation, and then skips the relatively expensive
nocow check if the reservation succeeded.
If we have quotas enabled, the data space reservation also includes a
quota reservation. But in the rewrite case, the space has already been
accounted for in qgroups. So btrfs_check_data_free_space() increases
the quota reservation, but it never gets decreased when the data
actually gets written and overwrites the pre-existing data. So we're
left with both the qgroup and qgroup reservation accounting for the same
space.
This commit adds the missing btrfs_qgroup_free_data() call in the case
of BTRFS_ORDERED_PREALLOC extents.
Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to") Signed-off-by: Justin Maggard <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
Linus Torvalds [Wed, 6 Dec 2017 23:20:51 +0000 (15:20 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Misc fixes: world-readable pointer removal from sysfs, a ESRT kfree()
bug fix and a comment update"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Add comment to avoid future expanding of sysfs systab
efi/esrt: Use memunmap() instead of kfree() to free the remapping
efi: Move some sysfs files to be read-only by root
Linus Torvalds [Wed, 6 Dec 2017 22:53:32 +0000 (14:53 -0800)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"Two fixes:
- objtool cross-build fixes
- removal of an obsolete CPU-hotplug state name from comments"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix 64-bit build on 32-bit host
cpu/hotplug: Fix state name in takedown_cpu() comment
Daniel Thompson [Mon, 2 Mar 2015 14:13:36 +0000 (14:13 +0000)]
kdb: Fix handling of kallsyms_symbol_next() return value
kallsyms_symbol_next() returns a boolean (true on success). Currently
kdb_read() tests the return value with an inequality that
unconditionally evaluates to true.
This is fixed in the obvious way and, since the conditional branch is
supposed to be unreachable, we also add a WARN_ON().
of: overlay: Fix (un)locking in of_overlay_apply()
The special overlay mutex is taken first, hence it should be released
last in the error path.
of_resolve_phandles() must be called with of_mutex held. Without it, a
node and new phandle could be added via of_attach_node(), making the max
phandle wrong.
free_overlay_changeset() must be called with of_mutex held, if any
non-trivial cleanup is to be done.
Hence move "mutex_lock(&of_mutex)" up, as suggested by Frank, and merge
the two tail statements of the success and error paths, now they became
identical.
Note that while the two mutexes are adjacent, we still need both:
__of_changeset_apply_notify(), which is called by __of_changeset_apply()
unlocks of_mutex, then does notifications then locks of_mutex. So the
mutex get released in the middle of of_overlay_apply()
Fixes: f948d6d8b792bb90 ("of: overlay: avoid race condition between applying multiple overlays") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Frank Rowand <[email protected]> Signed-off-by: Rob Herring <[email protected]>
of: overlay: Fix memory leak in of_overlay_apply() error path
If of_resolve_phandles() fails, free_overlay_changeset() is called in
the error path. However, that function returns early if the list hasn't
been initialized yet, before freeing the object.
Explicitly calling kfree() instead would solve that issue. However, that
complicates matter, by having to consider which of two different methods
to use to dispose of the same object.
Hence make free_overlay_changeset() consider initialization state of the
different parts of the object, making it always safe to call (once!) to
dispose of a (partially) initialized overlay_changeset:
- Only destroy the changeset if the list was initialized,
- Make init_overlay_changeset() store the ID in ovcs->id on success,
to avoid calling idr_remove() with an error value or an already
released ID.
Reported-by: Colin King <[email protected]> Fixes: f948d6d8b792bb90 ("of: overlay: avoid race condition between applying multiple overlays") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Frank Rowand <[email protected]> Signed-off-by: Rob Herring <[email protected]>
Mikulas Patocka [Sat, 2 Dec 2017 22:17:44 +0000 (16:17 -0600)]
objtool: Fix 64-bit build on 32-bit host
The new ORC unwinder breaks the build of a 64-bit kernel on a 32-bit
host. Building the kernel on a i386 or x32 host fails with:
orc_dump.c: In function 'orc_dump':
orc_dump.c:105:26: error: passing argument 2 of 'elf_getshdrnum' from incompatible pointer type [-Werror=incompatible-pointer-types]
if (elf_getshdrnum(elf, &nr_sections)) {
^
In file included from /usr/local/include/gelf.h:32:0,
from elf.h:22,
from warn.h:26,
from orc_dump.c:20:
/usr/local/include/libelf.h:304:12: note: expected 'size_t * {aka unsigned int *}' but argument is of type 'long unsigned int *'
extern int elf_getshdrnum (Elf *__elf, size_t *__dst);
^~~~~~~~~~~~~~
orc_dump.c:190:17: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'Elf64_Sxword {aka long long int}' [-Werror=format=]
printf("%s+%lx:", name, rela.r_addend);
~~^ ~~~~~~~~~~~~~
%llx
Fix the build failure.
Another problem is that if the user specifies HOSTCC or HOSTLD
variables, they are ignored in the objtool makefile. Change the
Makefile to respect these variables.
Document the recommended presence of a device-specific compatible value,
and list examples that are already in use or soon will be.
This will allow checkpatch to validate compatible values in DTS.
Update the example to match current best practices (generic node name,
specific compatible value first).
Chris Dion [Wed, 6 Dec 2017 15:50:28 +0000 (10:50 -0500)]
net_sched: use macvlan real dev trans_start in dev_trans_start()
Macvlan devices are similar to vlans and do not update their
own trans_start. In order for arp monitoring to work for a bond device
when the slaves are macvlans, obtain its real device.
Arnd Bergmann [Mon, 4 Dec 2017 15:01:55 +0000 (16:01 +0100)]
x86/vdso: Change time() prototype to match __vdso_time()
gcc-8 warns that time() is an alias for __vdso_time() but the two
have different prototypes:
arch/x86/entry/vdso/vclock_gettime.c:327:5: error: 'time' alias between functions of incompatible types 'int(time_t *)' {aka 'int(long int *)'} and 'time_t(time_t *)' {aka 'long int(long int *)'} [-Werror=attribute-alias]
int time(time_t *t)
^~~~
arch/x86/entry/vdso/vclock_gettime.c:318:16: note: aliased declaration here
I could not figure out whether this is intentional, but I see that
changing it to return time_t avoids the warning.
Returning 'int' from time() is also a bit questionable, as it causes an
overflow in y2038 even on 64-bit architectures that use a 64-bit time_t
type. On 32-bit architecture with 64-bit time_t, time() should always
be implement by the C library by calling a (to be added) clock_gettime()
variant that takes a sufficiently wide argument.
Dave Airlie [Wed, 6 Dec 2017 20:27:13 +0000 (06:27 +1000)]
Merge branch 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
ttm and license fixes
* 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux:
drm/ttm: swap consecutive allocated pooled pages v4
drm/ttm: swap consecutive allocated cached pages v3
drm/ttm: roundup the shrink request to prevent skip huge pool
drm/ttm: add page order support in ttm_pages_put
drm/ttm: add set_pages_wb for handling page order more than zero
drm/ttm: add page order in page pool
drm/ttm: use NUM_PAGES_TO_ALLOC always
drm/amdgpu: add license to files where it was missing
drm/amdgpu: add license to Makefiles
net: thunderx: Fix TCP/UDP checksum offload for IPv4 pkts
Offload IP header checksum to NIC.
This fixes a previous patch which disabled checksum offloading
for both IPv4 and IPv6 packets. So L3 checksum offload was
getting disabled for IPv4 pkts. And HW is dropping these pkts
for some reason.
Without this patch, IPv4 TSO appears to be broken:
WIthout this patch I get ~16kbyte/s, with patch close to 2mbyte/s
when copying files via scp from test box to my home workstation.
Looking at tcpdump on sender it looks like hardware drops IPv4 TSO skbs.
This patch restores performance for me, ipv6 looks good too.
Dave Martin [Wed, 6 Dec 2017 16:45:47 +0000 (16:45 +0000)]
arm64/sve: Avoid dereference of dead task_struct in KVM guest entry
When deciding whether to invalidate FPSIMD state cached in the cpu,
the backend function sve_flush_cpu_state() attempts to dereference
__this_cpu_read(fpsimd_last_state). However, this is not safe:
there is no guarantee that this task_struct pointer is still valid,
because the task could have exited in the meantime.
This means that we need another means to get the appropriate value
of TIF_SVE for the associated task.
This patch solves this issue by adding a cached copy of the TIF_SVE
flag in fpsimd_last_state, which we can check without dereferencing
the task pointer.
In particular, although this patch is not a KVM fix per se, this
means that this check is now done safely in the KVM world switch
path (which is currently the only user of this code).
Linus Torvalds [Wed, 6 Dec 2017 18:49:14 +0000 (10:49 -0800)]
Merge tag 'sound-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All fixes are small and for stable:
- a PCM ioctl race fix
- yet another USB-audio hardening for malicious descriptors
- Realtek ALC257 codec support"
* tag 'sound-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: prevent UAF in snd_pcm_info
ALSA: hda/realtek - New codec support for ALC257
ALSA: usb-audio: Add check return value for usb_string()
ALSA: usb-audio: Fix out-of-bound error
ALSA: seq: Remove spurious WARN_ON() at timer check
Colin Ian King [Wed, 6 Dec 2017 17:33:58 +0000 (17:33 +0000)]
x86: Fix Sparse warnings about non-static functions
Functions x86_vector_debug_show(), uv_handle_nmi() and uv_nmi_setup_common()
are local to the source and do not need to be in global scope, so make them
static.
Dave Young [Wed, 6 Dec 2017 09:50:10 +0000 (09:50 +0000)]
efi: Add comment to avoid future expanding of sysfs systab
/sys/firmware/efi/systab shows several different values, it breaks sysfs
one file one value design. But since there are already userspace tools
depend on it eg. kexec-tools so add code comment to alert future expanding
of this file.
Vincent Guittot [Thu, 16 Nov 2017 14:21:52 +0000 (15:21 +0100)]
sched/fair: Update and fix the runnable propagation rule
Unlike running, the runnable part can't be directly propagated through
the hierarchy when we migrate a task. The main reason is that runnable
time can be shared with other sched_entities that stay on the rq and
this runnable time will also remain on prev cfs_rq and must not be
removed.
Instead, we can estimate what should be the new runnable of the prev
cfs_rq and check that this estimation stay in a possible range. The
prop_runnable_sum is a good estimation when adding runnable_sum but
fails most often when we remove it. Instead, we could use the formula
below instead:
50816c48997a ("sched/wait: Standardize internal naming of wait-queue entries")
... unintentionally changed the behavior of add_wait_queue() from
inserting the wait entry at the head of the wait queue to the tail
of the wait queue.
Beyond a negative performance impact this change in behavior
theoretically also breaks wait queues which mix exclusive and
non-exclusive waiters, as non-exclusive waiters will not be
woken up if they are queued behind enough exclusive waiters.
Brendan Jackman [Wed, 6 Dec 2017 10:59:11 +0000 (10:59 +0000)]
cpu/hotplug: Fix state name in takedown_cpu() comment
CPUHP_AP_SCHED_MIGRATE_DYING doesn't exist, it looks like this was
supposed to refer to CPUHP_AP_SCHED_STARTING's teardown callback,
i.e. sched_cpu_dying().
Will Deacon [Wed, 6 Dec 2017 10:51:12 +0000 (10:51 +0000)]
arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
enter_lazy_tlb is called when a kernel thread rides on the back of
another mm, due to a context switch or an explicit call to unuse_mm
where a call to switch_mm is elided.
In these cases, it's important to keep the saved ttbr value up to date
with the active mm, otherwise we can end up with a stale value which
points to a potentially freed page table.
This patch implements enter_lazy_tlb for arm64, so that the saved ttbr0
is kept up-to-date with the active mm for kernel threads.
Will Deacon [Wed, 6 Dec 2017 10:42:10 +0000 (10:42 +0000)]
arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm
update_saved_ttbr0 mandates that mm->pgd is not swapper, since swapper
contains kernel mappings and should never be installed into ttbr0. However,
this means that callers must avoid passing the init_mm to update_saved_ttbr0
which in turn can cause the saved ttbr0 value to be out-of-date in the context
of the idle thread. For example, EFI runtime services may leave the saved ttbr0
pointing at the EFI page table, and kernel threads may end up with stale
references to freed page tables.
This patch changes update_saved_ttbr0 so that the init_mm points the saved
ttbr0 value to the empty zero page, which always exists and never contains
valid translations. EFI and switch can then call into update_saved_ttbr0
unconditionally.
Dave Martin [Wed, 6 Dec 2017 16:45:46 +0000 (16:45 +0000)]
arm64: fpsimd: Abstract out binding of task's fpsimd context to the cpu.
There is currently some duplicate logic to associate current's
FPSIMD context with the cpu when loading FPSIMD state into the cpu
regs.
Subsequent patches will update that logic, so in order to ensure it
only needs to be done in one place, this patch factors the relevant
code out into a new function fpsimd_bind_to_cpu().
Dave Martin [Tue, 5 Dec 2017 14:56:42 +0000 (14:56 +0000)]
arm64: fpsimd: Prevent registers leaking from dead tasks
Currently, loading of a task's fpsimd state into the CPU registers
is skipped if that task's state is already present in the registers
of that CPU.
However, the code relies on the struct fpsimd_state * (and by
extension struct task_struct *) to unambiguously identify a task.
There is a particular case in which this doesn't work reliably:
when a task exits, its task_struct may be recycled to describe a
new task.
Consider the following scenario:
1) Task P loads its fpsimd state onto cpu C.
per_cpu(fpsimd_last_state, C) := P;
P->thread.fpsimd_state.cpu := C;
2) Task X is scheduled onto C and loads its fpsimd state on C.
per_cpu(fpsimd_last_state, C) := X;
X->thread.fpsimd_state.cpu := C;
3) X exits, causing X's task_struct to be freed.
4) P forks a new child T, which obtains X's recycled task_struct.
T == X.
T->thread.fpsimd_state.cpu == C (inherited from P).
5) T is scheduled on C.
T's fpsimd state is not loaded, because
per_cpu(fpsimd_last_state, C) == T (== X) &&
T->thread.fpsimd_state.cpu == C.
(This is the check performed by fpsimd_thread_switch().)
So, T gets X's registers because the last registers loaded onto C
were those of X, in (2).
This patch fixes the problem by ensuring that the sched-in check
fails in (5): fpsimd_flush_task_state(T) is called when T is
forked, so that T->thread.fpsimd_state.cpu == C cannot be true.
This relies on the fact that T is not schedulable until after
copy_thread() completes.
Once T's fpsimd state has been loaded on some CPU C there may still
be other cpus D for which per_cpu(fpsimd_last_state, D) ==
&X->thread.fpsimd_state. But D is necessarily != C in this case,
and the check in (5) must fail.
An alternative fix would be to do refcounting on task_struct. This
would result in each CPU holding a reference to the last task whose
fpsimd state was loaded there. It's not clear whether this is
preferable, and it involves higher overhead than the fix proposed
in this patch. It would also move all the task_struct freeing
work into the context switch critical section, or otherwise some
deferred cleanup mechanism would need to be introduced, neither of
which seems obviously justified.
Cc: <[email protected]> Fixes: 005f78cd8849 ("arm64: defer reloading a task's FPSIMD state to userland resume") Signed-off-by: Dave Martin <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]>
[will: word-smithed the comment so it makes more sense] Signed-off-by: Will Deacon <[email protected]>
Radim Krčmář [Thu, 30 Nov 2017 18:05:45 +0000 (19:05 +0100)]
KVM: x86: fix APIC page invalidation
Implementation of the unpinned APIC page didn't update the VMCS address
cache when invalidation was done through range mmu notifiers.
This became a problem when the page notifier was removed.
Re-introduce the arch-specific helper and call it from ...range_start.