Linus Torvalds [Thu, 24 Jan 2019 20:07:18 +0000 (09:07 +1300)]
Merge tag 'for-5.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM crypt's parsing of extended IV arguments.
- Fix DM thinp's discard passdown to properly account for extra
reference that is taken to guard against reallocating a block before
a discard has been issued.
- Fix bio-based DM's redundant IO accounting that was occurring for
bios that must be split due to the nature of the DM target (e.g.
dm-stripe, dm-thinp, etc).
* tag 'for-5.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: add missing trace_block_split() to __split_and_process_bio()
dm: fix dm_wq_work() to only use __split_and_process_bio() if appropriate
dm: fix redundant IO accounting for bios that need splitting
dm: fix clone_bio() to trigger blk_recount_segments()
dm thin: fix passdown_double_checking_shared_status()
dm crypt: fix parsing of extended IV arguments
Damien Le Moal [Thu, 24 Jan 2019 09:20:13 +0000 (18:20 +0900)]
uapi: fix ioctl documentation
The description of the BLKGETNRZONES zoned block device ioctl was not
added as a comment together with this ioctl definition in commit 65e4e3eee83d7 ("block: Introduce BLKGETNRZONES ioctl"). Add its
description here.
Bart Van Assche [Wed, 23 Jan 2019 19:05:57 +0000 (11:05 -0800)]
blk-wbt: Declare local functions static
This patch silences the following sparse and gcc warnings:
CHECK block/blk-wbt.c
block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static?
block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static?
CC block/blk-wbt.o
block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes]
void wbt_issue(struct rq_qos *rqos, struct request *rq)
^~~~~~~~~
block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes]
void wbt_requeue(struct rq_qos *rqos, struct request *rq)
^~~~~~~~~~~
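The fix relies on the usual C rule: a function referenced only inside one translation unit gets internal linkage, so neither sparse nor -Wmissing-prototypes expects a prototype in a header. A minimal sketch with hypothetical names (struct rq_example, issue_example(), example_ops), assuming the callbacks are reached only through an ops table, in the way wbt_issue() and wbt_requeue() are wired up:

```c
#include <assert.h>

/* Hypothetical request type standing in for struct request. */
struct rq_example {
	int issued;
	int requeued;
};

/*
 * Only referenced through the ops table below, so internal linkage is enough.
 * Without 'static', sparse and gcc -Wmissing-prototypes would ask for a
 * prototype in a header.
 */
static void issue_example(struct rq_example *rq)
{
	rq->issued++;
}

static void requeue_example(struct rq_example *rq)
{
	rq->requeued++;
}

/* Hypothetical ops table: the only external view of the callbacks. */
struct ops_example {
	void (*issue)(struct rq_example *rq);
	void (*requeue)(struct rq_example *rq);
};

static const struct ops_example example_ops = {
	.issue = issue_example,
	.requeue = requeue_example,
};
```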
Linus Torvalds [Thu, 24 Jan 2019 16:59:22 +0000 (05:59 +1300)]
Merge tag 'ceph-for-5.0-rc4' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix for a potential use-after-free, a patch to close a (mostly
benign) race in the messenger and a licence clarification for quota.c"
* tag 'ceph-for-5.0-rc4' of git://github.com/ceph/ceph-client:
ceph: quota: cleanup license mess
libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
ceph: clear inode pointer when snap realm gets dropped by its inode
Linus Torvalds [Thu, 24 Jan 2019 16:55:26 +0000 (05:55 +1300)]
Merge tag 'sound-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A significant amount of fixes at this time, mostly for covering the
recent ASoC issues.
- Fixes for the missing ASoC driver initialization with non-deferred
probes; these triggered other problems in chain, which resulted in
yet more fix commits
- DaVinci runtime PM fix; the diff looks large but it's just a code
shuffling
- Various fixes for ASoC Intel drivers: a regression in HD-A HDMI,
Kconfig dependency, machine driver adjustments, PLL fix.
- Other ASoC driver-specific stuff including the trivial fixes caught
by static analysis
- Usual HD-audio quirks"
* tag 'sound-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits)
ALSA: hda - Add mute LED support for HP ProBook 470 G5
ASoC: amd: Fix potential NULL pointer dereference
ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
ASoC: rt5514-spi: Fix potential NULL pointer dereference
ASoC: dapm: change snprintf to scnprintf for possible overflow
ASoC: rt5682: Fix PLL source register definitions
ASoC: core: Don't defer probe on optional, NULL components
ASoC: core: Make snd_soc_find_component() more robust
ASoC: soc-core: fix init platform memory handling
ASoC: intel: skl: Fix display power regression
ALSA: hda/realtek - Fix typo for ALC225 model
ASoC: soc-core: Hold client_mutex around soc_init_dai_link()
ASoC: Intel: Boards: move the codec PLL configuration to _init
ASoC: soc-core: defer card probe until all component is added to list
ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
ASoC: ti: davinci-mcasp: Move context save/restore to runtime_pm callbacks
ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized
ASoC: rt5682: Fix recording no sound issue
ASoC: Intel: atom: Make PCI dependency explicit
...
Thomas Gleixner [Thu, 17 Jan 2019 23:14:23 +0000 (00:14 +0100)]
smb3: Cleanup license mess
Precise and non-ambiguous license information is important. The recently
added aegis header file has a SPDX license identifier, which is nice, but
at the same time it has a contradictory license boiler plate text.
SPDX-License-Identifier: GPL-2.0
versus
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
Oh well.
Assume that the SPDX identifier is correct: according to the x86/hyper-v
contributions from Microsoft, GPL v2 only is the usual license.
Remove the boiler plate as it is wrong and even if correct it is redundant.
Pavel Shilovsky [Thu, 17 Jan 2019 16:21:24 +0000 (08:21 -0800)]
CIFS: Fix possible hang during async MTU reads and writes
When doing MTU I/O we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit, which is not
enough even for a reopen alone: we need at least 2 credits
if the durable handle reconnect fails. There may also be
other operations at the same time, including compound
ones, which require 3 credits each. Fix this
by leaving 8 credits, which is enough to cover most
scenarios.
Was able to reproduce this when server was configured
to give out fewer credits than usual.
The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.
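The reservation rule can be sketched as follows. The reserve of 8 credits comes from the text above; the helper name mtu_credits_to_take() and the "take nothing while only the reserve is left" policy are assumptions for illustration, not the cifs code:

```c
#include <assert.h>

/* Credits held back for reopens and compound requests (8, per the text). */
#define MTU_RESERVED_CREDITS 8u

/*
 * Hypothetical helper: how many of the 'wanted' credits an MTU read/write may
 * take out of 'available'. Returns 0 when only the reserve is left, meaning
 * the caller should wait for the server to grant more.
 */
static unsigned int mtu_credits_to_take(unsigned int available, unsigned int wanted)
{
	if (available <= MTU_RESERVED_CREDITS)
		return 0;
	available -= MTU_RESERVED_CREDITS;
	return wanted < available ? wanted : available;
}
```

With 10 credits granted, an MTU request wanting 4 gets only 2, and the remaining 8 stay available for reopens and compound operations.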
Colin Ian King [Wed, 16 Jan 2019 16:28:59 +0000 (16:28 +0000)]
cifs: fix memory leak of an allocated cifs_ntsd structure
The call to SMB2_query_acl can allocate memory for pntsd and still
return a failure, which comes from its call to query_info.
This occurs when query_info allocates the structure and then in
query_info the call to smb2_validate_and_copy_iov fails. Currently the
failure just returns without kfree'ing pntsd hence causing a memory
leak.
Currently, *data is allocated only if it is not already pointing to a
buffer, so it must be kfree'd only if it was allocated in query_info; the
fix adds an allocated flag to track this. Also set *dlen to zero on
an error just to be safe, since *data is kfree'd.
Also set errno to -ENOMEM if the allocation of *data fails.
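A userspace sketch of the allocated-flag pattern described above. query_info_sketch() and validate_and_copy() are hypothetical stand-ins for query_info() and smb2_validate_and_copy_iov(), and malloc()/free() stand in for kmalloc()/kfree():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the validation step that may fail. */
static int validate_and_copy(const void *src, size_t len, void *dst)
{
	if (src == NULL)
		return -EINVAL;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy a response into *data. If *data is NULL it is allocated here, and
 * only then is it freed on error, so a caller-provided buffer is never freed.
 */
static int query_info_sketch(const void *resp, size_t len, void **data, size_t *dlen)
{
	bool allocated = false;
	int rc;

	if (*data == NULL) {
		*data = malloc(len);
		if (*data == NULL)
			return -ENOMEM;
		allocated = true;
	}

	rc = validate_and_copy(resp, len, *data);
	if (rc) {
		if (allocated) {
			free(*data);
			*data = NULL;
		}
		*dlen = 0;	/* never report a length for a buffer we may have freed */
		return rc;
	}
	*dlen = len;
	return 0;
}
```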
Deepa Dinamani [Thu, 24 Jan 2019 08:29:20 +0000 (00:29 -0800)]
Input: input_event - fix the CONFIG_SPARC64 mixup
Arnd Bergmann pointed out that CONFIG_* cannot be used in a uapi header.
Override with an equivalent conditional.
Fixes: 2e746942ebac ("Input: input_event - provide override for sparc64")
Fixes: 152194fe9c3f ("Input: extend usable life of event timestamps to 2106 on 32 bit systems")
Signed-off-by: Deepa Dinamani
Signed-off-by: Dmitry Torokhov
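CONFIG_* symbols come from Kconfig and are not defined when userspace includes a uapi header, so the test has to use compiler-predefined macros instead. A minimal sketch, assuming GCC/Clang define __sparc__ and __arch64__ for 64-bit SPARC targets; EXAMPLE_IS_SPARC64 and built_for_sparc64() are hypothetical names:

```c
#include <assert.h>

/*
 * Wrong for a uapi header: "#ifdef CONFIG_SPARC64" is never true for
 * userspace builds. Compiler-predefined macros work for both kernel and
 * userspace consumers.
 */
#if defined(__sparc__) && defined(__arch64__)
#define EXAMPLE_IS_SPARC64 1	/* 64-bit SPARC: take the sparc64 override */
#else
#define EXAMPLE_IS_SPARC64 0	/* everything else: the generic definition */
#endif

static int built_for_sparc64(void)
{
	return EXAMPLE_IS_SPARC64;
}
```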
Chris Wilson [Wed, 23 Jan 2019 13:51:55 +0000 (13:51 +0000)]
drm/i915/execlists: Mark up priority boost on preemption
Record the priority boost we are giving to the preempted client or else we
may end up in a situation where the priority queue no longer matches the
request priority order and so we can end up in an infinite loop of
preempting the same pair of requests.
Hannes Reinecke [Wed, 9 Jan 2019 08:45:15 +0000 (09:45 +0100)]
nvme-multipath: drop optimization for static ANA group IDs
Bit 6 in the ANACAP field is used to indicate that the ANA group ID
doesn't change while the namespace is attached to the controller.
There is an optimisation in the code to only allocate space
for the ANA group header, as the namespace list won't change and
hence would not need to be refreshed.
However, this optimisation was never carried over to the actual
workflow, which always assumes that the buffer is large enough
to hold the ANA header _and_ the namespace list.
So drop this optimisation and always allocate enough space.
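With the optimisation gone, the buffer is always sized for the header, every group descriptor and the namespace ID list. A sketch of that arithmetic, assuming the NVMe ANA log page layout (16-byte header, 32-byte group descriptors, 4-byte NSIDs); ana_log_size() and the constants are hypothetical, not the nvme-multipath code:

```c
#include <assert.h>
#include <stddef.h>

/* Field sizes from the NVMe ANA log page layout (assumed here). */
#define ANA_RSP_HDR_SIZE    16u	/* change count, number of descriptors, reserved */
#define ANA_GROUP_DESC_SIZE 32u	/* group ID, number of NSIDs, change count, state */
#define ANA_NSID_SIZE        4u	/* one 32-bit NSID per namespace */

/* Hypothetical helper: header + every group descriptor + the full NSID list. */
static size_t ana_log_size(size_t nr_groups, size_t max_namespaces)
{
	return ANA_RSP_HDR_SIZE + nr_groups * ANA_GROUP_DESC_SIZE +
	       max_namespaces * ANA_NSID_SIZE;
}
```

For example, 2 groups and 10 namespaces need 16 + 2 * 32 + 10 * 4 = 120 bytes, whereas a header-only allocation would be just 16.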
Raju Rangoju [Thu, 3 Jan 2019 17:35:31 +0000 (23:05 +0530)]
nvmet-rdma: fix null dereference under heavy load
Under heavy load if we don't have any pre-allocated rsps left, we
dynamically allocate a rsp, but we are not actually allocating memory
for nvme_completion (rsp->req.rsp). In such a case, accessing pointer
fields (req->rsp->status) in nvmet_req_init() will result in a crash.
To fix this, allocate the memory for nvme_completion by calling
nvmet_rdma_alloc_rsp().
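The invariant is that every rsp handed to request setup owns a completion buffer, whether it came from the pre-allocated pool or from the fallback path. A simplified userspace sketch; struct rsp_sketch, struct cqe_sketch, get_rsp() and alloc_rsp() are hypothetical stand-ins for the nvmet-rdma objects and nvmet_rdma_alloc_rsp():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for nvme_completion and the rsp object. */
struct cqe_sketch {
	unsigned short status;
};

struct rsp_sketch {
	struct cqe_sketch *cqe;	/* must be valid before request init writes ->status */
	int from_pool;
};

/* Allocate the completion an rsp needs (the nvmet_rdma_alloc_rsp() analogue). */
static int alloc_rsp(struct rsp_sketch *r)
{
	r->cqe = calloc(1, sizeof(*r->cqe));
	return r->cqe ? 0 : -1;
}

/* Take an rsp from the pool, or allocate one when the pool is exhausted. */
static struct rsp_sketch *get_rsp(struct rsp_sketch *pool, size_t *nfree)
{
	struct rsp_sketch *r;

	if (*nfree > 0) {
		r = &pool[--*nfree];	/* pool entries were set up with alloc_rsp() */
		r->from_pool = 1;
		return r;
	}

	r = malloc(sizeof(*r));
	if (!r)
		return NULL;
	if (alloc_rsp(r) != 0) {	/* the step the buggy fallback path skipped */
		free(r);
		return NULL;
	}
	r->from_pool = 0;
	return r;
}
```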
Sagi Grimberg [Sat, 19 Jan 2019 00:43:24 +0000 (16:43 -0800)]
nvme-rdma: rework queue maps handling
If the device supports fewer queues than requested (because it has fewer
completion vectors), we might hit a bug because nvme_rdma_map_queues
ignores this (it overrides the maps' nr_queues with the user opts).
Instead, keep track of how many default/read/poll queues we actually
allocated (rather than how many the user asked for) and use that to
assign our queue mappings.
Fixes: b65bb777ef22 (" nvme-rdma: support separate queue maps for read and write")
Reported-by: Saleem, Shiraz
Reviewed-by: Christoph Hellwig
Signed-off-by: Sagi Grimberg
Signed-off-by: Jens Axboe
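The idea is to build the queue maps from the counts that were actually allocated. A sketch of deriving those counts, assuming a simple policy where default queues are served first from the available completion vectors and polled queues need no vector; struct io_queue_counts and alloc_queue_counts() are hypothetical:

```c
#include <assert.h>

/* Hypothetical record of how many queues of each type were really created. */
struct io_queue_counts {
	unsigned int nr_default;
	unsigned int nr_read;
	unsigned int nr_poll;
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Cap the user's request by the device's completion vectors. The returned
 * counts, not the user options, are what the queue maps should be built from.
 */
static struct io_queue_counts alloc_queue_counts(unsigned int vectors,
						 struct io_queue_counts want)
{
	struct io_queue_counts got;

	got.nr_default = min_u(want.nr_default, vectors);
	vectors -= got.nr_default;
	got.nr_read = min_u(want.nr_read, vectors);
	got.nr_poll = want.nr_poll;	/* assumption: polled queues need no vector */
	return got;
}
```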
Sagi Grimberg [Tue, 8 Jan 2019 09:01:30 +0000 (01:01 -0800)]
nvme-tcp: fix timeout handler
Currently, we have several problems with the timeout
handler:
1. If we timeout on the controller establishment flow, we will hang
because we don't execute the error recovery (and we shouldn't because
the create_ctrl flow needs to fail and cleanup on its own)
2. We might also hang if we get a disconnect on a queue while the
controller is already deleting. This racy flow can cause the controller
disable/shutdown admin command to hang.
We cannot complete a timed out request from the timeout handler without
mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work).
So we serialize it in the timeout handler and tear down the I/O and admin
queues to guarantee that no one races with us to complete the
request.
Sagi Grimberg [Tue, 8 Jan 2019 08:53:22 +0000 (00:53 -0800)]
nvme-rdma: fix timeout handler
Currently, we have several problems with the timeout
handler:
1. If we timeout on the controller establishment flow, we will hang
because we don't execute the error recovery (and we shouldn't because
the create_ctrl flow needs to fail and cleanup on its own)
2. We might also hang if we get a disconnect on a queue while the
controller is already deleting. This racy flow can cause the controller
disable/shutdown admin command to hang.
We cannot complete a timed out request from the timeout handler without
mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work).
So we serialize it in the timeout handler and tear down the I/O and admin
queues to guarantee that no one races with us to complete the
request.
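Both timeout-handler fixes rest on the same rule: a timed-out request may only be completed while holding the lock that teardown also takes, so it is completed exactly once. A pthread-based sketch with hypothetical names (struct ctrl_sketch, complete_once(), error_recovery(), timeout_handler()); the real drivers tear down the I/O and admin queues rather than flipping a flag:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical controller: one lock shared by timeout handling and teardown. */
struct ctrl_sketch {
	pthread_mutex_t teardown_lock;
};

struct req_sketch {
	bool done;
	int completions;	/* how many times the request was completed */
};

/* Must be called with teardown_lock held. */
static void complete_once(struct req_sketch *rq)
{
	if (!rq->done) {
		rq->done = true;
		rq->completions++;
	}
}

/* Error recovery: tearing down the queues cancels in-flight requests. */
static void error_recovery(struct ctrl_sketch *c, struct req_sketch *rq)
{
	pthread_mutex_lock(&c->teardown_lock);
	complete_once(rq);
	pthread_mutex_unlock(&c->teardown_lock);
}

/* Timeout handler: serialize with teardown instead of completing unlocked. */
static void timeout_handler(struct ctrl_sketch *c, struct req_sketch *rq)
{
	pthread_mutex_lock(&c->teardown_lock);
	complete_once(rq);	/* stands in for tearing down the I/O and admin queues */
	pthread_mutex_unlock(&c->teardown_lock);
}
```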
Xen-swiotlb hooks into the arm/arm64 arch code through a copy of the DMA
mapping operations stored in the struct device arch data.
Switching arm64 to use the direct calls for the merged DMA direct /
swiotlb code broke this scheme. Replace the indirect calls with
direct-calls in xen-swiotlb as well to fix this problem.
It turns out that my hope that we could just remove the code that
exposes the cache residency status from mincore() was too optimistic.
There are various random users that want it, and one example would be
the Netflix database cluster maintenance. To quote Josh Snyder:
"For Netflix, losing accurate information from the mincore syscall
would lengthen database cluster maintenance operations from days to
months. We rely on cross-process mincore to migrate the contents of a
page cache from machine to machine, and across reboots.
To do this, I wrote and maintain happycache [1], a page cache
dumper/loader tool. It is quite similar in architecture to pgfincore,
except that it is agnostic to workload. The gist of happycache's
operation is "produce a dump of residence status for each page, do
some operation, then reload exactly the same pages which were present
before." happycache is entirely dependent on accurate reporting of the
in-core status of file-backed pages, as accessed by another process.
We primarily use happycache with Cassandra, which (like Postgres +
pgfincore) relies heavily on OS page cache to reduce disk accesses.
Because our workloads never experience a cold page cache, we are able
to provision hardware for a peak utilization level that is far lower
than the hypothetical "every query is a cache miss" peak.
A database warmed by happycache can be ready for service in seconds
(bounded only by the performance of the drives and the I/O subsystem),
with no period of in-service degradation. By contrast, putting a
database in service without a page cache entails a potentially
unbounded period of degradation (at Netflix, the time to populate a
single node's cache via natural cache misses varies by workload from
hours to weeks). If a single node upgrade were to take weeks, then
upgrading an entire cluster would take months. Since we want to apply
security upgrades (and other things) on a somewhat tighter schedule,
we would have to develop more complex solutions to provide the same
functionality already provided by mincore.
At the bottom line, happycache is designed to benignly exploit the
same information leak documented in the paper [2]. I think it makes
perfect sense to remove cross-process mincore functionality from
unprivileged users, but not to remove it entirely"
We do have an alternate approach that limits the cache residency
reporting only to processes that have write permissions to the file, so
we can fix the original information leak issue that way. It involves
_adding_ code rather than removing it, which is sad, but hey, at least
we haven't found any users that would find the restrictions
unacceptable.
So revert the optimistic first approach to make room for that alternate
fix instead.
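For reference, this is roughly how a tool like happycache observes page cache residency from userspace: map the file and ask mincore(2) which pages are resident (bit 0 of each vector byte). A minimal sketch; resident_pages() is a hypothetical helper, and under the alternate fix described above the report may not reflect real residency for files the caller cannot write to:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of 'path' resident in the page cache,
 * or return -1 on error.
 */
static long resident_pages(const char *path)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct stat st;
	long count = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {
		close(fd);
		return 0;	/* mmap() of length 0 is invalid */
	}

	size_t len = (size_t)st.st_size;
	size_t npages = (len + page - 1) / page;
	unsigned char *vec = malloc(npages);
	void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	if (vec != NULL && addr != MAP_FAILED && mincore(addr, len, vec) == 0) {
		count = 0;
		for (size_t i = 0; i < npages; i++)
			count += vec[i] & 1;	/* bit 0: page is resident */
	}
	if (addr != MAP_FAILED)
		munmap(addr, len);
	free(vec);
	close(fd);
	return count;
}
```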
Linus Torvalds [Wed, 23 Jan 2019 20:00:19 +0000 (09:00 +1300)]
Merge tag 'for-linus-5.0' of git://github.com/cminyard/linux-ipmi
Pull IPMI fixes from Corey Minyard:
"I missed the merge window, which wasn't really important at the time
as there was nothing that critical that I had for 5.0.
However, I say that, and then a number of critical fixes come in:
- ipmi: fix use-after-free of user->release_barrier.rda
- ipmi: Prevent use-after-free in deliver_response
- ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
which are obvious candidates for 5.0. Then there is:
- ipmi:ssif: Fix handling of multi-part return messages
which is less critical, but it still has some off-by-one things that
are not great, so it seemed appropriate. Some machines are broken
without it. Then:
- ipmi: Don't initialize anything in the core until something uses it
It turns out that using SRCU causes large chunks of memory to be used
on big iron machines, even if IPMI is never used. This was causing
some issues for people on those machines.
Everything here is destined for stable"
* tag 'for-linus-5.0' of git://github.com/cminyard/linux-ipmi:
ipmi: Don't initialize anything in the core until something uses it
ipmi: fix use-after-free of user->release_barrier.rda
ipmi: Prevent use-after-free in deliver_response
ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
ipmi:ssif: Fix handling of multi-part return messages
Linus Torvalds [Wed, 23 Jan 2019 19:58:01 +0000 (08:58 +1300)]
Merge tag 's390-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- Do not claim to run under z/VM if the hypervisor can not be
identified
- Fix crashes due to outdated ASCEs in CR1
- Avoid a deadlock in regard to CPU hotplug
- Really fix the vdso mapping issue for compat tasks
- Avoid crash on restart due to an incorrect stack address
* tag 's390-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
s390/vdso: correct vdso mapping for compat tasks
s390/smp: fix CPU hotplug deadlock with CPU rescan
s390/mm: always force a load of the primary ASCE on context switch
s390/early: improve machine detection
Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Edward Cree [Tue, 22 Jan 2019 19:02:17 +0000 (19:02 +0000)]
sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
Use a bitmap to keep track of which partition types we've already seen;
for duplicates, return -EEXIST from efx_ef10_mtd_probe_partition() and
thus skip adding that partition.
Duplicate partitions occur because of the A/B backup scheme used by newer
sfc NICs. Prior to this patch they cause sysfs_warn_dup errors because
they have the same name, causing us not to expose any MTDs at all.
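A sketch of the duplicate-suppression logic: the first partition of a given type is accepted and any later one with the same type (the A/B backup copy) gets -EEXIST. MAX_PART_TYPES, struct seen_types and probe_partition() are hypothetical; in-kernel code would use DECLARE_BITMAP() with test_and_set_bit():

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical upper bound on the partition type values tracked here. */
#define MAX_PART_TYPES 256u
#define BITS_PER_WORD_SKETCH (sizeof(unsigned long) * CHAR_BIT)

struct seen_types {
	unsigned long bits[(MAX_PART_TYPES + BITS_PER_WORD_SKETCH - 1) /
			   BITS_PER_WORD_SKETCH];
};

/* Set the bit for 'type' and report whether it was already set. */
static bool test_and_set_type(struct seen_types *s, unsigned int type)
{
	unsigned long mask = 1UL << (type % BITS_PER_WORD_SKETCH);
	unsigned long *word = &s->bits[type / BITS_PER_WORD_SKETCH];
	bool was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/* Probe step: a repeated type is skipped instead of registered again. */
static int probe_partition(struct seen_types *s, unsigned int type)
{
	if (type >= MAX_PART_TYPES)
		return -EINVAL;
	if (test_and_set_type(s, type))
		return -EEXIST;
	return 0;	/* the caller would register the MTD here */
}
```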
Haiyang Zhang [Tue, 15 Jan 2019 00:51:44 +0000 (00:51 +0000)]
hv_netvsc: Fix hash key value reset after other ops
Changing the MTU, channels, or buffer sizes calls netvsc_attach() and
rndis_set_subchannel(), which always reset the hash key to its default
value. That overrides any hash key changed previously. This patch
fixes the problem by saving the hash key, then restoring it when we
re-add the netvsc device.
Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
Signed-off-by: Haiyang Zhang
Reviewed-by: Michael Kelley
[sl: fix up subject line]
Signed-off-by: Sasha Levin
Haiyang Zhang [Tue, 15 Jan 2019 00:51:43 +0000 (00:51 +0000)]
hv_netvsc: Refactor assignments of struct netvsc_device_info
These assignments occur in multiple places. The patch refactors them
into a function for simplicity. It also allocates the struct on the heap
for future expansion.
Simon Horman [Wed, 23 Jan 2019 11:14:52 +0000 (12:14 +0100)]
ravb: expand rx descriptor data to accommodate hw checksum
EtherAVB may provide a checksum of packet data appended to packet data. In
order to allow this checksum to be received by the host descriptor data
needs to be enlarged by 2 bytes to accommodate the checksum.
In the case of MTU-sized packets without a VLAN tag the
checksum were already accommodated by virtue of the space reserved for the
VLAN tag. However, a packet of MTU-size with a VLAN tag consumed all
packet data space provided by a descriptor leaving no space for the
trailing checksum.
This was not detected by the driver which incorrectly used the last two
bytes of packet data as the checksum and truncate the packet by two bytes.
This resulted all such packets being dropped.
A workaround is to disable RX checksum offload:
# ethtool -K eth0 rx off
This patch resolves this problem by increasing the size available for
packet data in RX descriptors by two bytes.
Tested on R-Car E3 (r8a77990) ES1.0 based Ebisu-4D board
v2
* Use sizeof(__sum16) directly rather than adding a driver-local
#define for the size of the checksum provided by the hw (2 bytes).
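The arithmetic behind the fix, as a sketch: a VLAN-tagged MTU-sized frame fills the space reserved for packet data (assumed here to be MTU + Ethernet header + one VLAN tag), so the 2-byte hardware checksum only fits once that space grows by sizeof(__sum16). rx_data_size() and the *_SKETCH constants are hypothetical, the sketch assumes the FCS is not stored, and the actual ravb buffer constants may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH  14u	/* destination MAC + source MAC + EtherType */
#define VLAN_HLEN_SKETCH  4u	/* one 802.1Q tag */

/* Hypothetical: bytes of descriptor data needed for one received frame. */
static size_t rx_data_size(size_t mtu, int hw_csum)
{
	size_t len = mtu + ETH_HLEN_SKETCH + VLAN_HLEN_SKETCH;

	if (hw_csum)
		len += sizeof(uint16_t);	/* trailing hw checksum (__sum16 in-kernel) */
	return len;
}
```

For a 1500-byte MTU this gives 1518 bytes without the checksum and 1520 with it; those two bytes are what the driver previously read from the end of the packet data.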
Corey Minyard [Thu, 20 Dec 2018 22:50:23 +0000 (16:50 -0600)]
ipmi: Don't initialize anything in the core until something uses it
The IPMI driver was recently modified to use SRCU, but it turns out
this uses a chunk of percpu memory, even if IPMI is never used.
So modify things to only initialize on first use. There was already
code to sort of handle this for handling init races, so piggyback
on top of that, and simplify it in the process.
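The lazy-initialization pattern can be sketched with a mutex and a flag: every entry point calls ensure_initialized() first, and the expensive setup runs only once. All names are hypothetical; core_init() stands in for the SRCU and other setup that the IPMI core now defers:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool initialized;
static int init_calls;	/* counts real initializations, for illustration */

/* Hypothetical stand-in for setup that costs memory even if never used. */
static int core_init(void)
{
	init_calls++;
	return 0;
}

/* Called at the start of every entry point; setup happens on first use only. */
static int ensure_initialized(void)
{
	int rc = 0;

	pthread_mutex_lock(&init_lock);
	if (!initialized) {
		rc = core_init();
		if (rc == 0)
			initialized = true;	/* a failed init is retried next time */
	}
	pthread_mutex_unlock(&init_lock);
	return rc;
}
```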
The user->release_barrier.rda is freed in ipmi_destroy_user() while the
refcount is not yet zero, so when acquire_ipmi_user() later uses
user->release_barrier.rda in __srcu_read_lock(), it causes an oops.
Fix this by calling cleanup_srcu_struct() when the refcount is zero.
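The rule behind the fix is that state which readers may still use must be released on the final put, not at destroy time. A single-threaded sketch with hypothetical names; barrier stands in for the SRCU structure, and freeing it on the last put plays the role of cleanup_srcu_struct() (real code would use an atomic refcount such as a kref):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user object; 'barrier' stands in for the SRCU state. */
struct user_sketch {
	int refcount;	/* plain int: this sketch is single-threaded */
	int *barrier;
};

static void user_get(struct user_sketch *u)
{
	u->refcount++;
}

/*
 * Drop a reference. The barrier is released only with the last reference,
 * so a reader that still holds one can keep using it after destroy.
 */
static bool user_put(struct user_sketch **up)
{
	struct user_sketch *u = *up;

	if (--u->refcount > 0)
		return false;
	free(u->barrier);	/* the cleanup_srcu_struct() analogue */
	free(u);
	*up = NULL;
	return true;
}
```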
Fred Klassen [Sat, 19 Jan 2019 22:28:18 +0000 (14:28 -0800)]
ipmi: Prevent use-after-free in deliver_response
Some IPMI modules (e.g. ibmpex_msg_handler()) will have ipmi_usr_hdlr
handlers that call ipmi_free_recv_msg() directly. This will essentially
kfree(msg), leading to use-after-free.
This does not happen in the ipmi_devintf module, which will queue the
message and run ipmi_free_recv_msg() later.
BUG: KASAN: use-after-free in deliver_response+0x12f/0x1b0
Read of size 8 at addr ffff888a7bf20018 by task ksoftirqd/3/27
CPU: 3 PID: 27 Comm: ksoftirqd/3 Tainted: G O 4.19.11-amd64-ani99-debug #12.0.1.601133+pv
Hardware name: AppNeta r1000/X11SPW-TF, BIOS 2.1a-AP 09/17/2018
Call Trace:
dump_stack+0x92/0xeb
print_address_description+0x73/0x290
kasan_report+0x258/0x380
deliver_response+0x12f/0x1b0
? ipmi_free_recv_msg+0x50/0x50
deliver_local_response+0xe/0x50
handle_one_recv_msg+0x37a/0x21d0
handle_new_recv_msgs+0x1ce/0x440
...
Allocated by task 9885:
kasan_kmalloc+0xa0/0xd0
kmem_cache_alloc_trace+0x116/0x290
ipmi_alloc_recv_msg+0x28/0x70
i_ipmi_request+0xb4a/0x1640
ipmi_request_settime+0x1b8/0x1e0
...
Fix this by sanitizing channel and addr->channel before using them to
index user->intf->addrinfo and intf->addrinfo, correspondingly.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
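Sanitizing an index in the spirit of the kernel's array_index_nospec() means clamping it with a branch-free mask, so a mispredicted bounds check cannot steer a speculative load out of bounds. A portable sketch of the generic mask computation, assuming index and size are below LONG_MAX and that converting and right-shifting negative longs behaves as on GCC and Clang. Without the kernel's compiler barrier this is an illustration, not a mitigation:

```c
#include <assert.h>
#include <limits.h>

/*
 * ~0UL when index < size, 0 otherwise, computed without a branch:
 * (index | (size - 1 - index)) has its top bit set exactly when index is
 * out of range, and the arithmetic shift smears the inverted bit.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >>
	       (sizeof(long) * CHAR_BIT - 1);
}

/* Clamp an untrusted index: out-of-range values become 0. */
static unsigned long sanitize_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```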
Using the {0} construct as a generic initializer is perfectly fine in C,
however due to a bug in old gcc there is a warning:
+ /kisskb/src/drivers/vfio/pci/vfio_pci_nvlink2.c: warning: (near
initialization for 'cap.header') [-Wmissing-braces]: => 181:9
Since for whatever reason we still want to compile the modern kernel
with such an old gcc without warnings, this changes the capabilities
initialization.
The gcc bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
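One way to avoid the old-gcc -Wmissing-braces warning while still zero-filling the structure is a designated initializer. A sketch with a hypothetical capability layout whose first member is itself a struct, which is the shape that triggers the warning with "= { 0 }":

```c
#include <assert.h>

/* Hypothetical capability layout: the first member is a nested struct. */
struct cap_header_sketch {
	unsigned short id;
	unsigned short version;
	unsigned int next;
};

struct cap_sketch {
	struct cap_header_sketch header;
	unsigned long long tgt;
};

static struct cap_sketch make_cap(unsigned long long tgt)
{
	/*
	 * "= { 0 }" would be valid C, but old gcc warns because 'header' is a
	 * nested aggregate. Members not named below are still zero-initialized.
	 */
	struct cap_sketch cap = {
		.header.id = 1,
		.header.version = 1,
		.tgt = tgt,
	};
	return cap;
}
```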
Weinan Li [Tue, 22 Jan 2019 05:46:27 +0000 (13:46 +0800)]
drm/i915/gvt: release shadow batch buffer and wa_ctx before destroy one workload
GVT-g will shadow the privileged batch buffer and the indirect context
during command scan, move the release process into
intel_vgpu_destroy_workload() to ensure the resources are recycled
properly.
Fixes: 0cce2823ed37 ("drm/i915/gvt/kvmgt:Refine error handling for prepare_execlist_workload")
Reviewed-by: Zhenyu Wang
Signed-off-by: Weinan Li
Signed-off-by: Zhenyu Wang
Andrew Lunn [Mon, 21 Jan 2019 18:08:49 +0000 (19:08 +0100)]
net: phy: Fixup GPLv2+ SPDX tags based on license text
A few PHY drivers have the GPLv2+ license text. They then either have
a MODULE_LICENSE() of GPLv2 only, or an SPDX tag of GPLv2 only.
Since the license text is much easier to understand than either the
SPDX tag or the MODULE_LICENSE, use it as the definitive source of the
licence, and fixup the others when there are contradictions.
Stefan Agner [Mon, 21 Jan 2019 14:58:47 +0000 (15:58 +0100)]
net: fec: get regulator optional
According to the device tree binding the phy-supply property is
optional. Use the regulator_get_optional API accordingly. The
code already handles NULL just fine.
This gets rid of the following warning:
fec 2188000.ethernet: 2188000.ethernet supply phy not found, using dummy regulator
Lubomir Rintel [Mon, 21 Jan 2019 13:54:20 +0000 (14:54 +0100)]
net/ipv6: lower the level of "link is not ready" messages
This message gets logged far too often for how interesting it is.
Most distributions nowadays configure NetworkManager to use randomly
generated MAC addresses for Wi-Fi network scans. The interfaces end up
being periodically brought down for the address change. When they're
subsequently brought back up, the message is logged, eventually flooding
the log.
Perhaps the message is not all that helpful: it seems to be more
interesting to hear when addrconf actually starts, not when it does
not. Let's lower its level.
Marc Gonzalez [Tue, 22 Jan 2019 17:29:22 +0000 (18:29 +0100)]
scsi: ufs: Use explicit access size in ufshcd_dump_regs
memcpy_fromio() doesn't provide any control over access size. For example,
on arm64, it is implemented using readb and readq. This may trigger a
synchronous external abort:
Ewan D. Milne [Thu, 17 Jan 2019 16:14:45 +0000 (11:14 -0500)]
scsi: lpfc: nvmet: avoid hang / use-after-free when destroying targetport
We cannot wait on a completion object in the lpfc_nvme_targetport structure
in the _destroy_targetport() code path because the NVMe/fc transport will
free that structure immediately after the .targetport_delete() callback.
This results in a use-after-free, and a hang if slub_debug=FZPU is enabled.
Ewan D. Milne [Thu, 17 Jan 2019 16:14:44 +0000 (11:14 -0500)]
scsi: lpfc: nvme: avoid hang / use-after-free when destroying localport
We cannot wait on a completion object in the lpfc_nvme_lport structure in
the _destroy_localport() code path because the NVMe/fc transport will free
that structure immediately after the .localport_delete() callback. This
results in a use-after-free, and a hang if slub_debug=FZPU is enabled.
scsi: communicate max segment size to the DMA mapping code
When a host driver sets a maximum segment size we should not only propagate
that setting to the block layer, which can merge segments, but also to the
DMA mapping layer which can merge segments as well.
Fixes: 50c2e9107f ("scsi: introduce a max_segment_size host_template parameters")
Signed-off-by: Christoph Hellwig
Signed-off-by: Martin K. Petersen
Yangbo Lu [Mon, 21 Jan 2019 06:26:37 +0000 (14:26 +0800)]
net: dpaa2: improve PTP Kconfig option
Convert to "imply" instead of "select" for selecting the PTP_1588_CLOCK
driver. This removes the hard dependency between
the PTP clock subsystem and ethernet drivers.
This patch also sets "default y" for building the dpaa2 ptp driver, to
provide users with a ptp clock by default.
David S. Miller [Wed, 23 Jan 2019 01:30:39 +0000 (17:30 -0800)]
Merge branch 'qed-Error-recovery-process'
Michal Kalderon says:
====================
qed*: Error recovery process
Parity errors might happen in the device's memories due to momentary bit
flips which are caused by radiation.
Errors that are not correctable initiate a process kill event, which blocks
the device access towards the host and the network, and a recovery process
is started in the management FW and in the driver.
This series adds the support of this process in the qed core module and in
the qede driver (patches 2 & 3).
Patch 1 in the series revises the load sequence, to avoid PCI errors that
might be observed during a recovery process.
====================
Tomer Tayar [Sun, 20 Jan 2019 09:36:39 +0000 (11:36 +0200)]
qede: Error recovery process
This patch adds the error recovery process in the qede driver.
The process includes a partial/customized driver unload and load, which
allows it to look like a short suspend period to the kernel while
preserving the net devices' state.
Tomer Tayar [Sun, 20 Jan 2019 09:36:38 +0000 (11:36 +0200)]
qed: Add infrastructure for error detection and recovery
This patch adds the detection and handling of a parity error ("process kill
event"), including the update of the protocol drivers, and the prevention
of any HW access that will lead to device access towards the host while
recovery is in progress.
It also provides the means for the protocol drivers to trigger a recovery
process on their decision.
Tomer Tayar [Sun, 20 Jan 2019 09:36:37 +0000 (11:36 +0200)]
qed: Revise load sequence to avoid PCI errors
Initiating final cleanup after an ungraceful driver unload can lead to bad
PCI accesses towards the host.
This patch revises the load sequence so final cleanup is sent while the
internal master enable is cleared, to prevent the host accesses, and clears
the internal error indications just before enabling the internal master
enable.
Jakub Kicinski [Tue, 22 Jan 2019 22:47:19 +0000 (14:47 -0800)]
net/ipv6: don't return positive numbers when nothing was dumped
in6_dump_addrs() returns a positive 1 if there was nothing to dump.
This return value can not be passed as return from inet6_dump_addr()
as is, because it will confuse rtnetlink, resulting in NLMSG_DONE
never getting set:
$ ip addr list dev lo
EOF on netlink
Dump terminated
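The fix amounts to never propagating the positive "nothing to dump" value out of the dump callback. A sketch with hypothetical names, assuming (per the usual netlink dump convention) that a positive return is read as "more data follows", while 0 or a negative errno ends the dump:

```c
#include <assert.h>

/* Hypothetical inner helper: returns 1 when there was nothing to dump. */
static int dump_addrs_sketch(int naddrs)
{
	return naddrs == 0 ? 1 : 0;
}

/* Outer dump callback: report 0 or a negative errno, never the positive value. */
static int dump_addr_sketch(int naddrs)
{
	int err = dump_addrs_sketch(naddrs);

	if (err > 0)
		err = 0;	/* "nothing to dump" is not "more to come" */
	return err;
}
```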
Linus Torvalds [Wed, 23 Jan 2019 01:02:14 +0000 (14:02 +1300)]
Merge tag 'linux-kselftest-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fixes to rtc, seccomp and other tests"
* tag 'linux-kselftest-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/seccomp: Abort without user notification support
selftests: gpio-mockup-chardev: Check asprintf() for error
selftests: seccomp: use LDLIBS instead of LDFLAGS
selftests/vm/gup_benchmark.c: match gup struct to kernel
tools/testing/selftests/x86/unwind_vdso.c: Remove duplicate header
x86/mpx/selftests: fix spelling mistake "succeded" -> "succeeded"
selftests: rtc: rtctest: add alarm test on minute boundary
selftests: rtc: rtctest: fix alarm tests
Tejun Heo [Tue, 12 Dec 2017 16:38:30 +0000 (08:38 -0800)]
writeback: synchronize sync(2) against cgroup writeback membership switches
sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes. For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.
This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.
v2: Fixed misplaced rwsem init. Spotted by Jiufei.
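The locking scheme can be sketched with a reader-writer lock: membership switches take it shared, since switches of different inodes may run concurrently, and sync takes it exclusive while it collects the wbs to write back, so no inode can change wb in the middle. A userspace sketch with hypothetical names; which side takes the lock shared is our reading of the design, and the kernel uses an rw_semaphore (wb_switch_rwsem):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical inode: 'wb' identifies its current writeback domain. */
struct inode_sketch {
	int wb;
};

static pthread_rwlock_t wb_switch_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Membership switch: shared, each switch touches only its own inode. */
static void switch_inode_wb(struct inode_sketch *inode, int new_wb)
{
	pthread_rwlock_rdlock(&wb_switch_lock);
	inode->wb = new_wb;
	pthread_rwlock_unlock(&wb_switch_lock);
}

/* sync: exclusive, so no inode can jump wbs while work is being split. */
static size_t sync_collect(const struct inode_sketch *inodes, size_t n, int *wb_of)
{
	pthread_rwlock_wrlock(&wb_switch_lock);
	for (size_t i = 0; i < n; i++)
		wb_of[i] = inodes[i].wb;	/* stable view of every inode's wb */
	pthread_rwlock_unlock(&wb_switch_lock);
	return n;
}
```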
Lorenzo Bianconi [Fri, 18 Jan 2019 11:05:39 +0000 (12:05 +0100)]
net: ip_gre: use erspan key field for tunnel lookup
Use the ERSPAN key header field as the tunnel key in the gre_parse_header
routine, since the ERSPAN protocol sets the key field of the external GRE
header to 0, resulting in a tunnel lookup failure in ip6gre_err.
In addition, remove the key field parsing and pskb_may_pull check in
erspan_rcv and ip6erspan_rcv.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Lorenzo Bianconi
Signed-off-by: David S. Miller
Thomas Gleixner [Fri, 18 Jan 2019 10:49:58 +0000 (11:49 +0100)]
net: sun: cassini: Cleanup license conflict
The recent addition of SPDX license identifiers to the files in
drivers/net/ethernet/sun created a licensing conflict.
The cassini driver files contain a proper license notice:
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of the
* License, or (at your option) any later version.
but the SPDX change added:
SPDX-License-Identifier: GPL-2.0
So the file got tagged GPL v2 only while in fact it is licensed under GPL
v2 or later.
It's nice that people care about the SPDX tags, but they need to be more
careful about it. Not everything under (the) sun belongs to ...
Fix up the SPDX identifier and remove the boiler plate text as it is
redundant.
Linus Torvalds [Tue, 22 Jan 2019 18:16:05 +0000 (07:16 +1300)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- descriptor parsing regression fix for devices that have more than 16
collections, from Peter Hutterer (and followup cleanup from Philipp
Zabel)
- quirk for Goodix touchpad
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: core: simplify active collection tracking
HID: i2c-hid: Disable runtime PM on Goodix touchpad
HID: core: replace the collection tree pointers with indices
Thomas Gleixner [Thu, 17 Jan 2019 23:14:25 +0000 (00:14 +0100)]
vfio/pci: Cleanup license mess
The recently added nvlink2 VFIO driver introduced a license conflict in two
files. In both cases the SPDX license identifier is:
SPDX-License-Identifier: GPL-2.0+
but the files also contain the following license boilerplate text:
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation
The latter is GPL-2.0-only and not GPL-2.0+.
Looking deeper, the nvlink source file is derived from vfio_pci_igd.c, which
is also licensed under GPL-2.0-only, and it can be assumed that the file was
copied and modified. As the original file is licensed GPL-2.0-only, it's not
possible to relicense derivative work to GPL-2.0-or-later.
Fix the SPDX identifier and remove the boilerplate as it is redundant.
Ming Lei [Tue, 22 Jan 2019 08:20:17 +0000 (16:20 +0800)]
block: cover another queue enter recursion via BIO_QUEUE_ENTERED
Besides blk_queue_split(), bio_split() is also used for splitting bios, and
the remaining bio is often resubmitted to the queue via generic_make_request().
So the same queue enter recursion exists in this case too. Unfortunately,
commit cd4a4ae4683dc2 doesn't help this case.
This patch covers the above case by setting BIO_QUEUE_ENTERED before calling
q->make_request_fn.
In theory, the per-bio flag is used to simulate a stack variable, so it is
fine to clear it after q->make_request_fn returns, especially since the
same bio can't be submitted from another context.
On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently
not allowed to create blocks for an empty inode. This confusion comes
from trying to bit shift a negative number, so check the size of the
inode first.
The problem is most visible for hfsplus, because the fallback to
buffered I/O doesn't happen and the write fails with EIO. This is in
part the fault of the module, because it gives a wrong return value on
->get_block(); that will be fixed in a separate patch.
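A minimal user-space sketch of the ordering fix, assuming kernel-like types: a signed 64-bit inode size (as with `loff_t`) and an unsigned block number (as with `sector_t`). The function `may_create_block()` and its parameters are hypothetical, not the direct-io code itself; the point is only that `i_size` is tested before `(i_size - 1) >> blkbits` is used. With these types, an empty inode would otherwise yield -1, which converts to a huge unsigned value in the comparison and makes every block look like it lies inside the file.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the DIO_SKIP_HOLES decision: a direct write may
 * allocate a block only if it lies beyond the current i_size. Holes
 * inside i_size must not be filled here; the caller falls back to
 * buffered I/O instead.
 */
static bool may_create_block(bool is_write, bool skip_holes,
                             uint64_t fs_startblk, int64_t i_size,
                             unsigned int blkbits)
{
    bool create = is_write;

    if (skip_holes) {
        /*
         * Test i_size first. For an empty inode, (i_size - 1) is -1;
         * compared against an unsigned block number it would convert
         * to a huge value, so every block would look "inside" the file.
         */
        if (i_size > 0 &&
            fs_startblk <= (uint64_t)((i_size - 1) >> blkbits))
            create = false;
    }
    return create;
}
```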
In a previous commit we switched from a d_alloc_name() + d_lookup()
combination to setup a new dentry and find potential duplicates to the more
idiomatic lookup_one_len(). As far as I understand, this also means we need
to switch from d_add() to d_instantiate() since lookup_one_len() will
create a new dentry when it doesn't find an existing one and add the new
dentry to the hash queues. So we only need to call d_instantiate() to
connect the dentry to the inode and turn it into a positive dentry.
If we were to use d_add(), we would see stack traces indicating that the
same dentry was added twice over the same inode.
The binderfs_binder_ctl_create() call is a no-op on subsequent calls and
the first call is done before we unlock the superblock. Hence, there is no
need to take inode_lock() in there. Let's remove it.
Al pointed out that first calling kill_litter_super() before cleaning up
info is more correct since destroying info doesn't depend on the state of
the dentries and inodes. That the opposite remains true is not guaranteed.
- switch from d_alloc_name() + d_lookup() to lookup_one_len():
Instead of using d_alloc_name() and then doing a d_lookup() with the
allocated dentry to find whether a device with the name we're trying to
create already exists, switch to using lookup_one_len(). The latter will
either return the existing dentry or a new one.
- switch from kmalloc() + strscpy() to kmemdup():
Use a more idiomatic way to copy the name for the new dentry that
userspace gave us.
Al pointed out that on binderfs_fill_super() error
deactivate_locked_super() will call binderfs_kill_super() so all of the
freeing and putting we currently do in binderfs_fill_super() is unnecessary
and buggy. Let's simply return errors and let binderfs_kill_super() take
care of cleaning up on error.
- make binderfs control dentry immutable:
We don't allow unlinking it, since it is crucial for binderfs to be
usable, but allowing it to be renamed would make the unlink restriction
trivial to bypass. So prevent renaming too and simply treat the control
dentry as immutable.
- add is_binderfs_control_device() helper:
Take the opportunity and turn the check for the control dentry into a
separate helper is_binderfs_control_device() since it's now used in two
places.
- simplify binderfs_rename():
Instead of hand-rolling our custom version of simple_rename(), just dumb
the whole function down to first check whether we're trying to rename the
control dentry. If we are, return -EPERM to the caller; if not, call
simple_rename().
We allow more than 255 binderfs binder devices to be created since there
are workloads that require more than that. If we use __u8 we'll overflow
after 255. So let's use a __u32.
Note that there's no released kernel with binderfs out there so this is
not a regression.
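To illustrate why the field width matters, a minimal sketch with hypothetical helpers `next_id_u8()` and `next_id_u32()` (not binderfs code): a counter stored in an unsigned 8-bit field wraps back to 0 after 255, so a later device would silently reuse an existing number, while a 32-bit field keeps counting.

```c
#include <stdint.h>

/* Hypothetical: hand out the next device number from an 8-bit field. */
static uint8_t next_id_u8(uint8_t cur)
{
    /* The sum is computed as int, then truncated to 8 bits on return:
     * 255 + 1 becomes 0, reusing an existing number. */
    return (uint8_t)(cur + 1);
}

/* Same counter in a 32-bit field: ample headroom past 255 devices. */
static uint32_t next_id_u32(uint32_t cur)
{
    return cur + 1;
}
```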
Liam Mark [Fri, 18 Jan 2019 18:37:44 +0000 (10:37 -0800)]
staging: android: ion: Support cpu access during dma_buf_detach
Often userspace doesn't know when the kernel will be calling dma_buf_detach
on the buffer.
If userspace starts its CPU access at the same time as the sg list is being
freed, it could end up accessing the sg list after it has been freed.
Thread A                                Thread B
- DMA_BUF_IOCTL_SYNC IOCTL
 - ion_dma_buf_begin_cpu_access
  - list_for_each_entry
                                        - ion_dma_buf_detatch
                                         - free_duped_table
   - dma_sync_sg_for_cpu
Fix this by getting the ion_buffer lock before freeing the sg table memory.
Fixes: 2a55e7b5e544 ("staging: android: ion: Call dma_map_sg for syncing and mapping")
Signed-off-by: Liam Mark <[email protected]>
Acked-by: Laura Abbott <[email protected]>
Acked-by: Andrew F. Davis <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Uwe Kleine-König [Fri, 11 Jan 2019 11:20:41 +0000 (12:20 +0100)]
can: flexcan: fix NULL pointer exception during bringup
Commit cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX")
introduced a loop letting i run up to (including) ARRAY_SIZE(regs->mb)
and in the body accessed regs->mb[i] which is an out-of-bounds array
access that then resulted in an access to a reserved register area.
Later this was changed by commit 0517961ccdf1 ("can: flexcan: Add
provision for variable payload size") to iterate a bit differently, but
it still runs one iteration too many, resulting in a call to
flexcan_get_mb(priv, priv->mb_count)
which results in a WARN_ON and then a NULL pointer exception. This
only affects devices compatible with "fsl,p1010-flexcan",
"fsl,imx53-flexcan", "fsl,imx35-flexcan", "fsl,imx25-flexcan",
"fsl,imx28-flexcan", so newer i.MX SoCs are not affected.
Fixes: cbffaf7aa09e ("can: flexcan: Always use last mailbox for TX")
Signed-off-by: Uwe Kleine-König <[email protected]>
Cc: linux-stable <[email protected]> # >= 4.20
Signed-off-by: Marc Kleine-Budde <[email protected]>
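The off-by-one can be illustrated outside the driver. In this sketch, `struct mb`, `get_mb()` and `MB_COUNT` are hypothetical stand-ins for the flexcan mailbox array, `flexcan_get_mb()` and `priv->mb_count`; an out-of-range lookup returns NULL, loosely mirroring the WARN_ON path described above. The loop bound is `i < MB_COUNT`, never `i <= MB_COUNT`.

```c
#include <stddef.h>

#define MB_COUNT 16 /* hypothetical number of mailboxes */

struct mb {
    unsigned int can_ctrl;
};

static struct mb mailboxes[MB_COUNT];

/* Hypothetical bounds-checked accessor: index MB_COUNT is out of range. */
static struct mb *get_mb(unsigned int i)
{
    if (i >= MB_COUNT)
        return NULL;
    return &mailboxes[i];
}

/* Initialise mailboxes from @first to the end; returns how many. */
static unsigned int init_mailboxes(unsigned int first)
{
    unsigned int touched = 0;

    /* The last valid index is MB_COUNT - 1, so "<=" here would run
     * one iteration too many and dereference the NULL from get_mb(). */
    for (unsigned int i = first; i < MB_COUNT; i++) {
        struct mb *mb = get_mb(i);

        mb->can_ctrl = 0;
        touched++;
    }
    return touched;
}
```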
Oliver Hartkopp [Sun, 13 Jan 2019 18:31:43 +0000 (19:31 +0100)]
can: bcm: check timer values before ktime conversion
Kyungtae Kim detected a potential integer overflow in bcm_[rx|tx]_setup()
when the conversion into ktime multiplies the given value with NSEC_PER_USEC
(1000).
Add a check for the given tv_usec so that the value stays below one second.
Additionally, limit the tv_sec value to a reasonable value for CAN-related
use cases of 400 days and ensure that no value is negative.
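A sketch of such a validity check, assuming a timeval-like struct with signed `tv_sec` and `tv_usec` fields; `struct bcm_tv`, `bcm_tv_is_invalid()`, `bcm_tv_to_ns()` and `BCM_TIMER_SEC_MAX` are names chosen here for illustration. Once the bounds hold, the largest possible value (400 days is 34,560,000 s, about 3.5e16 ns) fits comfortably in a signed 64-bit integer, so the multiplication by NSEC_PER_USEC can no longer overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC      1000000L
#define NSEC_PER_USEC     1000LL
#define NSEC_PER_SEC      1000000000LL
/* 400 days, the upper bound described above */
#define BCM_TIMER_SEC_MAX (400L * 24 * 60 * 60)

struct bcm_tv {          /* hypothetical timeval-like user input */
    long tv_sec;
    long tv_usec;
};

/* Reject negative values, microseconds >= 1 s, and over-long timers. */
static bool bcm_tv_is_invalid(const struct bcm_tv *tv)
{
    return tv->tv_sec < 0 || tv->tv_sec > BCM_TIMER_SEC_MAX ||
           tv->tv_usec < 0 || tv->tv_usec >= USEC_PER_SEC;
}

/* Call only after validation; the result then cannot overflow. */
static int64_t bcm_tv_to_ns(const struct bcm_tv *tv)
{
    return (int64_t)tv->tv_sec * NSEC_PER_SEC +
           (int64_t)tv->tv_usec * NSEC_PER_USEC;
}
```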
Manfred Schlaegl [Wed, 19 Dec 2018 18:39:58 +0000 (19:39 +0100)]
can: dev: __can_get_echo_skb(): fix bogus check for non-existing skb by removing it
This patch reverts commit 7da11ba5c506
("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")
After this change was introduced, we encountered the following new error
message on various i.MX platforms (flexcan):
| flexcan 53fc8000.can can0: __can_get_echo_skb: BUG! Trying to echo non
| existing skb: can_priv::echo_skb[0]
The introduction of the message was a mistake because
priv->echo_skb[idx] = NULL is perfectly valid in the following case: if
CAN_RAW_LOOPBACK is disabled (setsockopt) in applications, the pkt_type
of the tx skbs given to can_put_echo_skb is set to PACKET_LOOPBACK. In
this case can_put_echo_skb will not set priv->echo_skb[idx]. It is
therefore kept NULL.
As an additional argument for the revert: the order of checking and using
idx was changed, so idx is used to access an array element before its
boundaries are checked.
Signed-off-by: Manfred Schlaegl <[email protected]>
Fixes: 7da11ba5c506 ("can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb")
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
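A sketch of the ordering the revert restores, using hypothetical names: `struct echo_ring` stands in for the `can_priv::echo_skb` array and `get_echo()` for `__can_get_echo_skb()`. The index is bounds-checked before the array is touched, and an empty (NULL) slot is treated as a normal outcome rather than an error, since with loopback disabled nothing was stored there.

```c
#include <stddef.h>

#define ECHO_SKB_MAX 4 /* hypothetical ring size */

struct echo_ring {
    void *echo_skb[ECHO_SKB_MAX]; /* NULL when nothing was queued */
};

/*
 * Hand back the echo buffer for @idx and clear the slot.
 * Returns NULL for an out-of-range index or an empty slot.
 */
static void *get_echo(struct echo_ring *ring, unsigned int idx)
{
    void *skb;

    /* Check the boundary first; only then index the array. */
    if (idx >= ECHO_SKB_MAX)
        return NULL;

    skb = ring->echo_skb[idx];
    /* An empty slot is valid (e.g. loopback disabled): no error report. */
    ring->echo_skb[idx] = NULL;
    return skb;
}
```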
Priit Laes [Tue, 22 Jan 2019 07:32:32 +0000 (09:32 +0200)]
drm/sun4i: hdmi: Fix usage of TMDS clock
Although the TMDS clock is required for HDMI to function properly,
nobody called clk_prepare_enable() on it. This fixes reference counting
issues and makes sure the clock is running when it needs to be running.
Because the TMDS clock is the parent of the DDC clock, the TMDS clock
was turned on/off for each EDID probe, causing spurious failures
for certain HDMI/DVI screens.
Linus Torvalds [Mon, 21 Jan 2019 18:27:17 +0000 (07:27 +1300)]
Merge tag 'iommu-fixes-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"One fix only for now: Fix probe deferral in iommu/of code (broke with
recent changes to iommu_ops->add_device invocation)"
* tag 'iommu-fixes-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/of: Fix probe-deferral
Dan Williams [Sat, 19 Jan 2019 18:55:04 +0000 (10:55 -0800)]
acpi/nfit: Fix command-supported detection
The _DSM function number validation only happens to succeed when the
generic Linux command number translation corresponds with a
DSM-family-specific function number. This breaks NVDIMM-N
implementations that correctly implement _LSR, _LSW, and _LSI, but do
not happen to publish support for DSM function numbers 4, 5, and 6.
Recall that support for the _LS{I,R,W} family of methods results in the
DIMM being marked as supporting those command numbers at
acpi_nfit_register_dimms() time. The DSM function mask is only used for
ND_CMD_CALL support of non-NVDIMM_FAMILY_INTEL devices.
Dan Williams [Sat, 19 Jan 2019 16:45:56 +0000 (08:45 -0800)]
libnvdimm/security: Require nvdimm_security_setup_events() to succeed
The following warning:
ACPI0012:00: security event setup failed: -19
...is meant to capture exceptional failures of sysfs_get_dirent(),
however it will also fail in the common case when security support is
disabled. A few issues:
1/ A dev_warn() report for a common case is too chatty
2/ The setup of this notifier is generic, no need for it to be driven
from the nfit driver, it can exist completely in the core.
3/ If it fails for any reason besides security support being disabled,
that's fatal and should abort DIMM activation. Userspace may hang if
it never gets overwrite notifications.
4/ The dirent needs to be released.
Move the call to the core 'dimm' driver, make it conditional on security
support being active, make it fatal for the exceptional case, add the
missing sysfs_put() at device disable time.
Fixes: 7d988097c546 ("...Add security DSM overwrite support")
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Dave Jiang [Tue, 15 Jan 2019 01:41:04 +0000 (18:41 -0700)]
nfit_test: fix security state pull for nvdimm security nfit_test
The override status function needs to be updated to use the proper
request parameter in order to get the security state.
Fixes: 3c13e2ac747a ("...Add test support for Intel nvdimm security DSMs")
Reported-by: Vishal Verma <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
tty: Handle problem if line discipline does not have receive_buf
Some tty line disciplines do not have a receive_buf callback, so
properly check for that before calling it. If they do not have this
callback, just eat the character quietly, as we can't fail this call.
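A minimal sketch of the pattern, with hypothetical types: `struct ldisc_ops` models a line discipline's operation table in which `receive_buf` may be NULL, and `deliver_char()` calls it only when present, otherwise dropping the character because this path cannot report a failure. `count_rx()` is an illustrative receiver.

```c
#include <stddef.h>

/* Hypothetical operation table; receive_buf is optional. */
struct ldisc_ops {
    void (*receive_buf)(const unsigned char *cp, int count);
};

/* Feed one character to the line discipline, if it can accept it. */
static void deliver_char(const struct ldisc_ops *ops, unsigned char ch)
{
    if (ops->receive_buf)
        ops->receive_buf(&ch, 1);
    /* else: quietly eat the character; the caller cannot be failed. */
}

/* Illustrative receiver that counts delivered characters. */
static int received;

static void count_rx(const unsigned char *cp, int count)
{
    (void)cp;
    received += count;
}
```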
Mike Snitzer [Thu, 17 Jan 2019 15:48:01 +0000 (10:48 -0500)]
dm: fix redundant IO accounting for bios that need splitting
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da84354 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
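A toy model of the accounting arithmetic, assuming 512-byte sectors, so a 32k chunk is 64 sectors. `submit()` and the `accounted` counter are hypothetical: every entry accounts the whole bio (as start_io_acct() does), the bio is cut at the next chunk boundary, and the remainder is resubmitted recursively. Subtracting the remainder before resubmitting keeps the total equal to the original bio size; without that subtraction, a chunk-aligned 128k bio would be counted as 256 + 192 + 128 + 64 = 640 sectors instead of 256.

```c
#include <stdint.h>

#define CHUNK_SECTORS 64u /* 32 KiB chunk / 512-byte sectors */

static uint64_t accounted; /* sectors charged to the IO stats */

static void submit(uint64_t start, uint64_t sectors)
{
    /* Account the whole bio on entry, like start_io_acct(). */
    accounted += sectors;

    /* Length up to the next chunk boundary. */
    uint64_t boundary = (start / CHUNK_SECTORS + 1) * CHUNK_SECTORS;
    uint64_t len = boundary - start;

    if (len >= sectors)
        return; /* fits within one chunk: no split */

    /*
     * The remainder is resubmitted and accounted again on re-entry,
     * so remove it from the stats first (the fix).
     */
    uint64_t rest = sectors - len;
    accounted -= rest;
    submit(start + len, rest);
}
```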
Mike Snitzer [Wed, 16 Jan 2019 23:53:26 +0000 (18:53 -0500)]
dm: fix clone_bio() to trigger blk_recount_segments()
Switching DM's clone_bio() to bio_trim() fixes the fact that clone_bio()
wasn't clearing BIO_SEG_VALID the way bio_trim() does; clearing that flag
is what triggers blk_recount_segments() via bio_phys_segments().