Dave Airlie [Wed, 12 Apr 2017 23:56:05 +0000 (09:56 +1000)]
Merge branch 'linux-4.11' of git://github.com/skeggsb/linux into drm-fixes
GP107 modesetting support (just recognising the chipset, no other changes until 4.12)
a couple of regression fixes, one of them a rather serious double-free issue that appeared in 4.10.
* 'linux-4.11' of git://github.com/skeggsb/linux:
drm/nouveau: initial support (display-only) for GP107
drm/nouveau/kms/nv50: fix double dma_fence_put() when destroying plane state
drm/nouveau/kms/nv50: fix setting of HeadSetRasterVertBlankDmi method
drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
drm/nouveau/mpeg: mthd returns true on success now
Dave Airlie [Wed, 12 Apr 2017 23:13:04 +0000 (09:13 +1000)]
Merge tag 'drm-intel-fixes-2017-04-12' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
drm/i915 fixes for v4.11-rc7
one rcu related fix, and a few GVT fixes.
* tag 'drm-intel-fixes-2017-04-12' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Don't call synchronize_rcu_expedited under struct_mutex
drm/i915: Suspend GuC prior to GPU Reset during GEM suspend
drm/i915/gvt: set the correct default value of CTX STATUS PTR
drm/i915/gvt: Fix firmware loading interface for GVT-g golden HW state
drm/i915: Use a dummy timeline name for a signaled fence
drm/i915: Ironlake do_idle_maps w/a may be called w/o struct_mutex
drm/i915/gvt: remove the redundant info NULL check
drm/i915/gvt: adjust mem size for low resolution type
drm/i915: Avoid lock dropping between rescheduling
drm/i915/gvt: exclude cfg space from failsafe mode
drm/i915/gvt: Activate/de-activate vGPU in mdev ops.
drm/i915/execlists: Wrap tail pointer after reset tweaking
drm/i915/perf: remove user triggerable warn
drm/i915/perf: destroy stream on sample_flags mismatch
drm/i915: Align "unfenced" tiled access on gen2, early gen3
Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.
Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.
Inserting a page table entry may trigger an allocation while we are
holding a read lock to keep the device instance alive for the duration
of the fault. Use srcu for this keep-alive protection.
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Cc: <[email protected]> Signed-off-by: Dan Williams <[email protected]>
Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:
This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.
Fixes: 6a923934c33 (Revert "ipv6: Revert optional address flusing on ifdown.") Signed-off-by: Rabin Vincent <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore:
"One more small audit fix, this should be the last for v4.11.
Seth Forshee noticed a problem where the audit retry queue wasn't
being flushed properly when audit was enabled and the audit daemon
wasn't running; this patches fixes the problem (see the commit
description for more details on the change).
Both Seth and I have tested this and everything looks good"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: make sure we don't let the retry queue grow without bounds
- Fix iscsi-target + iser-target queue-full handling in order to
support iw_cxgb4 RNICs. (Potnuri Bharat Teja + Sagi Grimberg)
- Fix ALUA transition state race between multiple initiator (Mike
Christie)
- Drop work-around for legacy GlobalSAN initiator, to allow QLogic
57840S + 579xx offload HBAs to work out-of-the-box in MSFT
environments. (Martin Svec + Arun Easi)
Note that a number are CC'ed for stable, and although the queue-full
bug-fixes required for iser-target to work with iw_cxgb4 aren't CC'ed
here, they'll be posted to Greg-KH separately"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
iscsi-target: Drop work-around for legacy GlobalSAN initiator
target: Fix ALUA transition state race between multiple initiators
iser-target: avoid posting a recv buffer twice
iser-target: Fix queue-full response handling
iscsi-target: Propigate queue_data_in + queue_status errors
target: Fix unknown fabric callback queue-full errors
tcmu: Fix wrongly calculating of the base_command_size
tcmu: Fix possible overwrite of t_data_sg's last iov[]
target: Avoid mappedlun symlink creation during lun shutdown
iscsi-target: Fix TMR reference leak during session shutdown
usb: gadget: Correct usb EP argument for BOT status request
tcmu: Allow cmd_time_out to be set to zero (disabled)
Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This contains fixes for two long standing subtle bugs:
- kthread_bind() on a new kthread binds it to specific CPUs and
prevents userland from messing with the affinity or cgroup
membership. Unfortunately, for cgroup membership, there's a window
between kthread creation and kthread_bind*() invocation where the
kthread can be moved into a non-root cgroup by userland.
Depending on what controllers are in effect, this can assign the
kthread unexpected attributes. For example, in the reported case,
workqueue workers ended up in a non-root cpuset cgroups and had
their CPU affinities overridden. This broke workqueue invariants
and led to workqueue stalls.
Fixed by closing the window between kthread creation and
kthread_bind() as suggested by Oleg.
- There was a bug in cgroup mount path which could allow two
competing mount attempts to attach the same cgroup_root to two
different superblocks.
This was caused by mishandling return value from kernfs_pin_sb().
Fixed"
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: avoid attaching a cgroup root to two different superblocks
cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Two libata fixes.
One to disable hotplug on VT6420 which never worked properly. The
other reverts an earlier patch which disabled the second port on
SB600/700. There were some confusions due to earlier datasheets which
incorrectly indicated that the second port is not implemented on both
SB600 and 700"
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
sata_via: Enable hotplug only on VT6421
Revert "pata_atiixp: Don't use unconnected secondary port on SB600/SB700"
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- revert of a commit that switched all Synaptics touchpads over to be
driven by hid-rmi. It turns out that this caused several user-visible
regressions, and therefore we revert back to the original state
before all the reported issues have been fixed.
- a new uclogic device ID addition, from Xiaolei Yu.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
Revert "HID: rmi: Handle all Synaptics touchpads using hid-rmi"
HID: uclogic: add support for Ugee Tablet EX07S
The problem is that the bridge's VLAN group is created after setting the
default PVID, when registering the netdevice and executing its
ndo_init().
Fix this by changing the order of both operations, so that
br_changelink() is only processed after the netdevice is registered,
when the VLAN group is already initialized.
While the bridge driver implements an ndo_init(), it was missing a
symmetric ndo_uninit(), causing the different de-initialization
operations to be scattered around its dellink() and destructor().
Implement a symmetric ndo_uninit() and remove the overlapping operations
from its dellink() and destructor().
This is a prerequisite for the next patch, as it allows us to have a
proper cleanup upon changelink() failure during the bridge's newlink().
Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()") Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION
On a dual controller setup with multipath enabled, some MEDIUM ERRORs
caused both paths to be failed, thus I/O got queued/blocked since the
'queue_if_no_path' feature is enabled by default on IPR controllers.
This example disabled 'queue_if_no_path' so the I/O failure is seen at
the sg_dd program. Notice that after the sg_dd test-case, both paths
are in 'failed' state, and both path/priority groups are in 'enabled'
state (not 'active') -- which would block I/O with 'queue_if_no_path'.
This is not the desired behavior. The dm-multipath explicitly checks
for the MEDIUM ERROR case (and a few others) so not to fail the path
(e.g., I/O to other sectors could potentially happen without problems).
See dm-mpath.c :: do_end_io_bio() -> noretry_error() !->! fail_path().
The problem trace is:
1) ipr_scsi_done() // SENSE KEY/CHECK CONDITION detected, go to..
2) ipr_erp_start() // ipr_is_gscsi() and masked_ioasc OK, go to..
3) ipr_gen_sense() // masked_ioasc is IPR_IOASC_MED_DO_NOT_REALLOC,
// so set DID_PASSTHROUGH.
4) scsi_decide_disposition() // check for DID_PASSTHROUGH and return
// early on, faking a DID_OK.. *instead*
// of reaching scsi_check_sense().
// Had it reached the latter, that would
// set host_byte to DID_MEDIUM_ERROR.
5) scsi_finish_command()
6) scsi_io_completion()
7) __scsi_error_from_host_byte() // That would be converted to -ENODATA
<...>
8) dm_softirq_done()
9) multipath_end_io()
10) do_end_io()
11) noretry_error() // And that is checked in dm-mpath :: noretry_error()
// which would cause fail_path() not to be called.
With this patch applied, the I/O is failed but the paths are not. This
multipath device continues accepting more I/O requests without blocking.
(and notice the different host byte/driver byte handling per SCSI layer).
# dmesg
[...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
[...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 40 00
[...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
[...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
[...] blk_update_request: critical medium error, dev sdaf, sector 0
[...] blk_update_request: critical medium error, dev dm-6, sector 0
[...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
[...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 10 00
[...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
[...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
[...] blk_update_request: critical medium error, dev sdaf, sector 0
[...] blk_update_request: critical medium error, dev dm-6, sector 0
[...] Buffer I/O error on dev dm-6, logical block 0, async page read
During a PCI error recovery, if aac_check_health() is not aware that a
PCI error happened and we have an offline PCI channel, it might trigger
some errors (like NULL pointer dereference) and inhibit the error
recovery process to complete.
This patch makes the health check procedure aware of PCI channel issues,
and in case of error recovery process, the function
aac_adapter_check_health() returns -1 and let the recovery process to
complete successfully. This patch was tested on upstream kernel
v4.11-rc5 in PowerPC ppc64le architecture with adapter 9005:028d
(VID:DID) - the error recovery procedure was able to recover fine.
Liu Bo [Mon, 10 Apr 2017 19:36:26 +0000 (12:36 -0700)]
Btrfs: fix potential use-after-free for cloned bio
KASAN reports that there is a use-after-free case of bio in btrfs_map_bio.
If we need to submit IOs to several disks at a time, the original bio
would get cloned and mapped to the destination disk, but we really should
use the original bio instead of a cloned bio to do the sanity check
because cloned bios are likely to be freed by its endio.
Liu Bo [Fri, 7 Apr 2017 20:11:10 +0000 (13:11 -0700)]
Btrfs: fix segmentation fault when doing dio read
Commit 2dabb3248453 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
introduced this bug during iterating bio pages in dio read's endio hook,
and it could end up with segment fault of the dio reading task.
So the reason is 'if (nr_sectors--)', and it makes the code assume that
there is one more block in the same page, so page offset is increased and
the bio which is created to repair the bad block then has an incorrect
bvec.bv_offset, and a later access of the page content would throw a
segmentation fault.
This also adds ASSERT to check page offset against page size.
Johannes Berg [Tue, 11 Apr 2017 10:10:58 +0000 (12:10 +0200)]
bpf: reference may_access_skb() from __bpf_prog_run()
It took me quite some time to figure out how this was linked,
so in order to save the next person the effort of finding it
add a comment in __bpf_prog_run() that indicates what exactly
determines that a program can access the ctx == skb.
drm/i915: Don't call synchronize_rcu_expedited under struct_mutex
Only call synchronize_rcu_expedited after unlocking struct_mutex to
avoid deadlock because the workqueues depend on struct_mutex.
>From original patch by Andrea:
synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
hang until its own workqueues are run. The i915 gem workqueues will
wait on the struct_mutex to be released. So we cannot wait for a
quiescent state using those rcu primitives while holding the
struct_mutex or it creates a circular lock dependency resulting in
kernel hangs (which is reproducible but goes undetected by lockdep).
drm/i915: Suspend GuC prior to GPU Reset during GEM suspend
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().
Mika Westerberg [Mon, 10 Apr 2017 10:16:33 +0000 (13:16 +0300)]
pinctrl: cherryview: Add a quirk to make Acer Chromebook keyboard work again
After commit 47c950d10202 ("pinctrl: cherryview: Do not add all
southwest and north GPIOs to IRQ domain") the driver does not add all
GPIOs to the irqdomain. The reason for that is that those GPIOs cannot
generate IRQs at all, only GPEs (General Purpose Events). This causes
Linux virtual IRQ numbering to change.
However, it seems some CYAN Chromebooks, including Acer Chromebook
hardcodes these Linux IRQ numbers in the ACPI tables of the machine.
Since the numbering is different now, the IRQ meant for keyboard does
not match the Linux virtual IRQ number anymore making the keyboard
non-functional.
Work this around by adding special quirk just for these machines where
we add back all GPIOs to the irqdomain. Rest of the Cherryview/Braswell
based machines will not be affected by the change.
Jiri Olsa [Tue, 11 Apr 2017 07:14:46 +0000 (09:14 +0200)]
x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
The schemata lock is released before freeing the resource's temporary
tmp_cbms allocation. That's racy versus another write which allocates and
uses new temporary storage, resulting in memory leaks, freeing in use
memory, double a free or any combination of those.
CIFS: store results of cifs_reopen_file to avoid infinite wait
This fixes Continuous Availability when errors during
file reopen are encountered.
cifs_user_readv and cifs_user_writev would wait for ever if
results of cifs_reopen_file are not stored and for later inspection.
In fact, results are checked and, in case of errors, a chain
of function calls leading to reads and writes to be scheduled in
a separate thread is skipped.
These threads will wake up the corresponding waiters once reads
and writes are done.
However, given the return value is not stored, when rc is checked
for errors a previous one (always zero) is inspected instead.
This leads to pending reads/writes added to the list, making
cifs_user_readv and cifs_user_writev wait for ever.
STATUS_BAD_NETWORK_NAME can be received during node failover,
causing the flag to be set and making the reconnect thread
always unsuccessful, thereafter.
Once the only place where it is set is removed, the remaining
bits are rendered moot.
Removing it does not prevent "mount" from failing when a non
existent share is passed.
What happens when the share really ceases to exist while the
share is mounted is undefined now as much as it was before.
Mark Syms [Tue, 29 Nov 2016 11:36:46 +0000 (11:36 +0000)]
CIFS: handle guest access errors to Windows shares
Commit 1a967d6c9b39c226be1b45f13acd4d8a5ab3dc44 ("correctly to
anonymous authentication for the NTLM(v2) authentication") introduces
a regression in handling errors related to attempting a guest
connection to a Windows share which requires authentication. This
should result in a permission denied error but actually causes the
kernel module to enter a never-ending loop trying to follow a DFS
referal which doesn't exist.
The base cause of this is the failure now occurs later in the process
during tree connect and not at the session setup setup and all errors
in tree connect are interpreted as needing to follow the DFS paths
which isn't in this case correct. So, check the returned error against
EACCES and fail if this is returned error.
I've tested v3.10, v4.4, master, master+your patch using default options
(empty or no user "NU") and user=abc (U).
NT_LOGON_FAILURE in session setup: LF
This is what you seem to have in 3.10.
NT_ACCESS_DENIED in tree connect to the share: AD
This is what you get before your infinite loop.
| NU U
--------------------------------
3.10 | LF LF
4.4 | LF LF
master | AD LF
master+patch | AD LF
No infinite DFS loop :(
All these issues result in mount failing very fast with permission denied.
I guess it could be from either the Windows version or the share/folder
ACL. A deeper analysis of the packets might reveal more.
In any case I did not notice any issues for on a basic DFS setup with
the patch so I don't think it introduced any regressions, which is
probably all that matters. It still bothers me a little I couldn't hit
the bug.
I've included kernel output w/ debugging output and network capture of
my tests if anyone want to have a look at it. (master+patch = ml-guestfix).
Pavel Shilovsky [Mon, 10 Apr 2017 17:31:33 +0000 (10:31 -0700)]
CIFS: Fix null pointer deref during read resp processing
Currently during receiving a read response mid->resp_buf can be
NULL when it is being passed to cifs_discard_remaining_data() from
cifs_readv_discard(). Fix it by always passing server->smallbuf
instead and initializing mid->resp_buf at the end of read response
processing.
Dan Williams [Fri, 7 Apr 2017 19:25:52 +0000 (12:25 -0700)]
libnvdimm: band aid btt vs clear poison locking
The following warning results from holding a lane spinlock,
preempt_disable(), or the btt map spinlock and then trying to take the
reconfig_mutex to walk the poison list and potentially add new entries.
As a minimal fix, disable error clearing when the BTT is enabled for the
namespace. For the final fix a larger rework of the poison list locking
is needed.
Note that this is not a problem in the blk case since that path never
calls nvdimm_clear_poison().
Dan Williams [Fri, 7 Apr 2017 16:47:24 +0000 (09:47 -0700)]
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
Holding the reconfig_mutex over a potential userspace fault sets up a
lockdep dependency chain between filesystem-DAX and the libnvdimm ioctl
path. Move the user access outside of the lock.
[ INFO: possible circular locking dependency detected ]
4.11.0-rc3+ #13 Tainted: G W O
-------------------------------------------------------
fallocate/16656 is trying to acquire lock:
(&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffa00080b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
but task is already holding lock:
(jbd2_handle){++++..}, at: [<ffffffff813b4944>] start_this_handle+0x104/0x460
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Ondrej Zary [Fri, 31 Mar 2017 18:35:42 +0000 (20:35 +0200)]
sata_via: Enable hotplug only on VT6421
Commit 57e5568fda27 ("sata_via: Implement hotplug for VT6421") adds
hotplug IRQ handler for VT6421 but enables hotplug on all chips. This
is a bug because it causes "irq xx: nobody cared" error on VT6420 when
hot-(un)plugging a drive:
Seems that VT6420 can do hotplug too (there's no documentation) but the
comments say that SCR register access (required for detecting hotplug
events) can cause problems on these chips.
For now, just keep hotplug disabled on anything other than VT6421.
Zefan Li [Fri, 7 Apr 2017 08:51:55 +0000 (16:51 +0800)]
cgroup: avoid attaching a cgroup root to two different superblocks
Run this:
touch file0
for ((; ;))
{
mount -t cpuset xxx file0
}
And this concurrently:
touch file1
for ((; ;))
{
mount -t cpuset xxx file1
}
We'll trigger a warning like this:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
percpu_ref_kill_and_confirm called more than once on css_release!
CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
Call Trace:
dump_stack+0x63/0x84
__warn+0xd1/0xf0
warn_slowpath_fmt+0x5f/0x80
percpu_ref_kill_and_confirm+0x92/0xb0
cgroup_kill_sb+0x95/0xb0
deactivate_locked_super+0x43/0x70
deactivate_super+0x46/0x60
...
---[ end trace a79f61c2a2633700 ]---
Here's a race:
Thread A Thread B
cgroup1_mount()
# alloc a new cgroup root
cgroup_setup_root()
cgroup1_mount()
# no sb yet, returns NULL
kernfs_pin_sb()
# but succeeds in getting the refcnt,
# so re-use cgroup root
percpu_ref_tryget_live()
# alloc sb with cgroup root
cgroup_do_mount()
cgroup_kill_sb()
# alloc another sb with same root
cgroup_do_mount()
cgroup_kill_sb()
We end up using the same cgroup root for two different superblocks,
so percpu_ref_kill() will be called twice on the same root when the
two superblocks are destroyed.
We should fix to make sure the superblock pinning is really successful.
Follow-up patches will revert 07ec51480b5e ("virtio_pci: use shared
interrupts for virtqueues") that triggered the problem so no need for
this one anymore.
x86/vdso: Ensure vdso32_enabled gets set to valid values only
vdso_enabled can be set to arbitrary integer values via the kernel command
line 'vdso32=' parameter or via 'sysctl abi.vsyscall32'.
load_vdso32() only maps VDSO if vdso_enabled == 1, but ARCH_DLINFO_IA32
merily checks for vdso_enabled != 0. As a consequence the AT_SYSINFO_EHDR
auxiliary vector for the VDSO_ENTRY is emitted with a NULL pointer which
causes a segfault when the application tries to use the VDSO.
Restrict the valid arguments on the command line and the sysctl to 0 and 1.
Paul Moore [Mon, 10 Apr 2017 15:16:59 +0000 (11:16 -0400)]
audit: make sure we don't let the retry queue grow without bounds
The retry queue is intended to provide a temporary buffer in the case
of transient errors when communicating with auditd, it is not meant
as a long life queue, that functionality is provided by the hold
queue.
This patch fixes a problem identified by Seth where the retry queue
could grow uncontrollably if an auditd instance did not connect to
the kernel to drain the queues. This commit fixes this by doing the
following:
* Make sure we always call auditd_reset() if we decide the connection
with audit is really dead. There were some cases in
kauditd_hold_skb() where we did not reset the connection, this patch
relocates the reset calls to kauditd_thread() so all the error
conditions are caught and the connection reset. As a side effect,
this means we could move auditd_reset() and get rid of the forward
definition at the top of kernel/audit.c.
* We never checked the status of the auditd connection when
processing the main audit queue which meant that the retry queue
could grow unchecked. This patch adds a call to auditd_reset()
after the main queue has been processed if auditd is not connected,
the auditd_reset() call will make sure the retry and hold queues are
correctly managed/flushed so that the retry queue remains reasonable.
Chanwoo Choi [Wed, 22 Mar 2017 08:03:31 +0000 (17:03 +0900)]
pinctrl: samsung: Add missing part for PINCFG_TYPE_DRV of Exynos5433
The commit 1259feddd0f8("pinctrl: samsung: Fix the width of
PINCFG_TYPE_DRV bitfields for Exynos5433") already fixed
the different width of PINCFG_TYPE_DRV from previous Exynos SoC.
However wrong merge conflict resolution was chosen in commit 7f36f5d11cda ("Merge tag 'v4.10-rc6' into devel") effectively dropping
the changes for PINCFG_TYPE_DRV. Re-do them here.
The macro EXYNOS_PIN_BANK_EINTW is no longer used so remove it.
Fixes: 7f36f5d11cda ("Merge tag 'v4.10-rc6' into devel") Signed-off-by: Chanwoo Choi <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
Fixes: cd8ae85299d5 ("tcp: provide SYN headers for passive connections") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"This is a set of CIFS/SMB3 fixes for stable.
There is another set of four SMB3 reconnect fixes for stable in
progress but they are still being reviewed/tested, so didn't want to
wait any longer to send these five below"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Reset TreeId to zero on SMB2 TREE_CONNECT
CIFS: Fix build failure with smb2
Introduce cifs_copy_file_range()
SMB3: Rename clone_range to copychunk_range
Handle mismatched open calls
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A number of ARM fixes:
- prevent oopses caused by dma_get_sgtable() and declared DMA
coherent memory
- fix boot failure on nommu caused by ID_PFR1 access
- a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
arm: kprobes: Align stack to 8-bytes in test code
arm: kprobes: Fix the return address of multiple kretprobes
arm: kprobes: Skip single-stepping in recursing path if possible
arm: kprobes: Allow to handle reentered kprobe on single-stepping
Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are 3 small fixes for 4.11-rc6.
One resolves a reported issue with sysfs files that NeilBrown found,
one is a documenatation fix for the stable kernel rules, and the last
is a small MAINTAINERS file update for kernfs"
* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
MAINTAINERS: separate out kernfs maintainership
sysfs: be careful of error returns from ops->show()
Documentation: stable-kernel-rules: fix stable-tag format
Merge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver rfixes from Greg KH:
"Here are a number of small IIO and staging driver fixes for 4.11-rc6.
Nothing big here, just iio fixes for reported issues, and an ashmem
fix for a very old bug that has been reported by a number of Android
vendors"
* tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
iio: hid-sensor-attributes: Fix sensor property setting failure.
iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
iio: st_pressure: initialize lps22hb bootime
iio: bmg160: reset chip when probing
iio: cros_ec_sensors: Fix return value to get raw and calibbias data.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"statx followup fixes and a fix for stack-smashing on alpha"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
alpha: fix stack smashing in old_adjtimex(2)
statx: Include a mask for stx_attributes in struct statx
statx: Reserve the top bit of the mask for future struct expansion
xfs: report crtime and attribute flags to statx
ext4: Add statx support
statx: optimize copy of struct statx to userspace
statx: remove incorrect part of vfs_statx() comment
statx: reject unknown flags when using NULL path
Documentation/filesystems: fix documentation for ->getattr()
netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
We should use proper RCU list APIs to manipulate help->expectations,
as we can dump the conntrack's expectations via nfnetlink, i.e. in
ctnetlink_exp_ct_dump_table(), where only rcu_read_lock is acquired.
So for list traversal, use hlist_for_each_entry_rcu; for list add/del,
use hlist_add_head_rcu and hlist_del_rcu.
netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
For IPCTNL_MSG_EXP_GET, if the CTA_EXPECT_MASTER attr is specified, then
the NLM_F_DUMP request will dump the expectations related to this
connection tracking.
But we forget to check whether the conntrack has nf_conn_help or not,
so if nfct_help(ct) is NULL, oops will happen:
netfilter: make it safer during the inet6_dev->addr_list traversal
inet6_dev->addr_list is protected by inet6_dev->lock, so only using
rcu_read_lock is not enough, we should acquire read_lock_bh(&idev->lock)
before the inet6_dev->addr_list traversal.
netfilter: ctnetlink: make it safer when checking the ct helper name
One CPU is doing ctnetlink_change_helper(), while another CPU is doing
unhelp() at the same time. So even if help->helper is not NULL at first,
the later statement strcmp(help->helper->name, ...) may still access
the NULL pointer.
So we must use rcu_read_lock and rcu_dereference to avoid such _bad_
thing happen.
Fixes: f95d7a46bc57 ("netfilter: ctnetlink: Fix regression in CTA_HELP processing") Signed-off-by: Liping Zhang <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Gao Feng [Wed, 29 Mar 2017 11:11:27 +0000 (19:11 +0800)]
netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
When invoke __nf_conntrack_helper_find, it needs the rcu lock to
protect the helper module which would not be unloaded.
Now there are two caller nf_conntrack_helper_try_module_get and
ctnetlink_create_expect which don't hold rcu lock. And the other
callers left like ctnetlink_change_helper, ctnetlink_create_conntrack,
and ctnetlink_glue_attach_expect, they already hold the rcu lock
or spin_lock_bh.
Remove the rcu lock in functions nf_ct_helper_expectfn_find_by_name
and nf_ct_helper_expectfn_find_by_symbol. Because they return one pointer
which needs rcu lock, so their caller should hold the rcu lock, not in
these two functions.
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here's a pull request for 4.11-rc, fixing a set of issues mostly
centered around the new scheduling framework. These have been brewing
for a while, but split up into what we absolutely need in 4.11, and
what we can defer until 4.12. These are well tested, on both single
queue and multiqueue setups, and with and without shared tags. They
fix several hangs that have happened in testing.
This is obviously larger than I would have preferred at this point in
time, but I don't think we can shave much off this and still get the
desired results.
In detail, this pull request contains:
- a set of five fixes for NVMe, mostly from Christoph and one from
Roland.
- a series from Bart, fixing issues with dm-mq and SCSI shared tags
and scheduling. Note that one of those patches commit messages may
read like an optimization, but it is in fact an important fix for
queue restarts in particular.
- a series from Omar, most importantly fixing a hang with multiple
hardware queues when we fail to get a driver tag. Another important
fix in there is for resizing hardware queues, which nbd does when
handling multiple sockets for one connection.
- fixing an imbalance in putting the ctx for hctx request allocations
from Minchan"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: Restart a single queue if tag sets are shared
dm rq: Avoid that request processing stalls sporadically
scsi: Avoid that SCSI queues get stuck
blk-mq: Introduce blk_mq_delay_run_hw_queue()
blk-mq: remap queues when adding/removing hardware queues
blk-mq-sched: fix crash in switch error path
blk-mq-sched: set up scheduler tags when bringing up new queues
blk-mq-sched: refactor scheduler initialization
blk-mq: use the right hctx when getting a driver tag fails
nvmet: fix byte swap in nvmet_parse_io_cmd
nvmet: fix byte swap in nvmet_execute_write_zeroes
nvmet: add missing byte swap in nvmet_get_smart_log
nvme: add missing byte swap in nvme_setup_discard
nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
block: do not put mq context in blk_mq_alloc_request_hctx
Merge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"This late fix for pin control is hopefully the last I send this cycle.
The problem was detected early in the v4.11 release cycle and there
has been some back and forth on how to solve it. Sadly the proper fix
arrives late, but at least not too late.
An issue was detected with pin control on the Freescale i.MX after the
refactorings for more general group and function handling.
We now have the proper fix for this"
* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.11:
Headed to stable:
- disable HFSCR[TM] if TM is not supported, fixes a potential host
kernel crash triggered by a hostile guest, but only in
configurations that no one uses
- don't try to fix up misaligned load-with-reservation instructions
- fix flush_(d|i)cache_range() called from modules on little endian
kernels
- add missing global TLB invalidate if cxl is active
- fix missing preempt_disable() in crc32c-vpmsum
And a fix for selftests build changes that went in this release:
- selftests/powerpc: Fix standalone powerpc build
Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
Paul Mackerras"
* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
powerpc/mm: Add missing global TLB invalidate if cxl is active
powerpc/64: Fix flush_(d|i)cache_range() called from modules
powerpc: Don't try to fix up misaligned load-with-reservation instructions
powerpc: Disable HFSCR[TM] if TM is not supported
selftests/powerpc: Fix standalone powerpc build
Chris Salls [Sat, 8 Apr 2017 06:48:11 +0000 (23:48 -0700)]
mm/mempolicy.c: fix error handling in set_mempolicy and mbind.
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
David S. Miller [Sat, 8 Apr 2017 15:48:50 +0000 (08:48 -0700)]
Merge tag 'linux-can-fixes-for-4.12-20170404' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-04-04
this is a pull request of two patches for net/master.
The first patch by Markus Marb fixes a register read access in the ifi driver.
The second patch by Geert Uytterhoeven for the rcar driver remove the printing
of a kernel virtual address.
====================
sysfs: be careful of error returns from ops->show()
ops->show() can return a negative error code.
Commit 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.
Commit 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.
This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.
I can reproduce it reliably only by putting delay like
usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state
Johan Hovold [Mon, 3 Apr 2017 13:53:34 +0000 (15:53 +0200)]
Documentation: stable-kernel-rules: fix stable-tag format
A patch documenting how to specify which kernels a particular fix should
be backported to (seemingly) inadvertently added a minus sign after the
kernel version. This particular stable-tag format had never been used
prior to this patch, and was neither present when the patch in question
was first submitted (it was added in v2 without any comment).
Drop the minus sign to avoid any confusion.
Fixes: fdc81b7910ad ("stable_kernel_rules: Add clause about specification of kernel versions to patch.") Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
pppol2tp_getsockopt() doesn't take into account the error code returned
by pppol2tp_tunnel_getsockopt() or pppol2tp_session_getsockopt(). If
error occurs there, pppol2tp_getsockopt() continues unconditionally and
reports erroneous values.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
pppol2tp_setsockopt() unconditionally overwrites the error value
returned by pppol2tp_tunnel_setsockopt() or
pppol2tp_session_setsockopt(), thus hiding errors from userspace.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
vfs_llseek will check whether the file mode has
FMODE_LSEEK, no return failure. But ashmem can be
lseek, so add FMODE_LSEEK to ashmem file.
Comment From Greg Hackmann:
ashmem_llseek() passes the llseek() call through to the backing
shmem file. 91360b02ab48 ("ashmem: use vfs_llseek()") changed
this from directly calling the file's llseek() op into a VFS
layer call. This also adds a check for the FMODE_LSEEK bit, so
without that bit ashmem_llseek() now always fails with -ESPIPE.
Pull sparc fixes from David Miller:
"Several fixes here, mostly having to due with either build errors or
memory corruptions depending upon whether you have THP enabled or not"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused wp_works_ok macro
sparc32: Export vac_cache_size to fix build error
sparc64: Fix memory corruption when THP is enabled
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
arch/sparc: Avoid DCTI Couples
sparc64: kern_addr_valid regression
sparc64: Add support for 2G hugepages
sparc64: Fix size check in huge_pte_alloc
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
KVM: nVMX: initialize PML fields in vmcs02
KVM: nVMX: do not leak PML full vmexit to L1
KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
KVM: arm64: Ensure LRs are clear when they should be
kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
KVM: s390: remove change-recording override support
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Pull audit cleanup from Paul Moore:
"A week later than I had hoped, but as promised, here is the audit
uninline-fix we talked about during the last audit pull request.
The patch is slightly different than what we originally discussed as
it made more sense to keep the audit_signal_info() function in
auditsc.c rather than move it and bunch of other related
variables/definitions into audit.c/audit.h.
At some point in the future I need to look at how the audit code is
organized across kernel/audit*, I suspect we could do things a bit
better, but it doesn't seem like a -rc release is a good place for
that ;)
Regardless, this patch passes our tests without problem and looks good
for v4.11"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_signal_info() into kernel/auditsc.c
* emailed patches from Andrew Morton <[email protected]>:
mm: move pcp and lru-pcp draining into single wq
mailmap: update Yakir Yang email address
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
dax: fix radix tree insertion race
mm, thp: fix setting of defer+madvise thp defrag mode
ptrace: fix PTRACE_LISTEN race corrupting task->state
vmlinux.lds: add missing VMLINUX_SYMBOL macros
mm/page_alloc.c: fix print order in show_free_areas()
userfaultfd: report actual registered features in fdinfo
mm: fix page_vma_mapped_walk() for ksm pages
Michal Hocko [Fri, 7 Apr 2017 23:05:05 +0000 (16:05 -0700)]
mm: move pcp and lru-pcp draining into single wq
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches. This seems more than necessary because both can run
on a single WQ. Both do not block on locks requiring a memory
allocation nor perform any allocations themselves. We will save one
rescuer thread this way.
On the other hand drain_all_pages() queues work on the system wq which
doesn't have rescuer and so this depend on memory allocation (when all
workers are stuck allocating and new ones cannot be created).
Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().
This worker should be using WQ_RECLAIM as well in order to guarantee a
forward progress. We can reuse the same one as for lru draining and
vmstat.
Ross Zwisler [Fri, 7 Apr 2017 23:04:57 +0000 (16:04 -0700)]
dax: fix radix tree insertion race
While running generic/340 in my test setup I hit the following race. It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.
Thread 1 Thread 2
-------- --------
dax_iomap_pmd_fault()
grab_mapping_entry()
spin_lock_irq()
get_unlocked_mapping_entry()
'entry' is NULL, can't call lock_slot()
spin_unlock_irq()
radix_tree_preload()
dax_iomap_pmd_fault()
grab_mapping_entry()
spin_lock_irq()
get_unlocked_mapping_entry()
...
lock_slot()
spin_unlock_irq()
dax_pmd_insert_mapping()
<inserts a PMD mapping>
spin_lock_irq()
__radix_tree_insert() fails with -EEXIST
<fall back to 4k fault, and die horribly
when inserting a 4k entry where a PMD exists>
The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index. For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.
So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.
This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop. It used to fail within the first 10 iterations.
David Rientjes [Fri, 7 Apr 2017 23:04:54 +0000 (16:04 -0700)]
mm, thp: fix setting of defer+madvise thp defrag mode
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the
kernel due to the name similarity and the out-of-order way the string is
checked in defrag_store().
Check the string in the correct order so that
TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for
"defer+madvise".
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.
Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviusly it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"
When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:
kernel/extable.c:169: undefined reference to `__start_ro_after_init'
kernel/extable.c:169: undefined reference to `__end_ro_after_init'
The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.
Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.