When I introduced the lastuse member I made a subtle error because it was
returned as an absolute value but that is meaningless to user-space as it
doesn't allow to see how old exactly an entry is. Let's make it similar to
how the bridge returns such values and make it relative to "now" (jiffies).
This allows us to show the actual age of the entries and is much more
useful (e.g. user-space daemons can age out entries, iproute2 can display
the lastuse properly).
Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age") Reported-by: Satish Ashok <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter
We were missing check for 25G and 100G while checking port speed,
which lead to less number of queues getting allocated for 25G & 100G
adapters and leading to low throughput. Adding the missing check for
both NIC and vNIC driver.
Also fixes port advertisement for 25G and 100G in ethtool output.
Russell Currey [Wed, 14 Sep 2016 06:37:17 +0000 (16:37 +1000)]
powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
Commit 5958d19a143e checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags. This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.
The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.
Revert these cases to the previous behaviour, adding a new helper function
to do so. This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.
Fixes: 5958d19a143e ("Fix incorrect PE reservation attempt on some 64-bit BARs") Reported-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Russell Currey <[email protected]> Tested-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Al Viro [Tue, 20 Sep 2016 19:07:42 +0000 (20:07 +0100)]
fix fault_in_multipages_...() on architectures with no-op access_ok()
Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around. Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).
However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().
Since any wraparound means EFAULT there, the fix is trivial - turn
those
while (uaddr <= end)
...
into
if (unlikely(uaddr > end))
return -EFAULT;
do
...
while (uaddr <= end);
fffffc0000f3b1a8 is a module address. Modules may have compiled in
strings which could get copied to userspace. In this instance, it
looks like "." which matches with a size of 1 byte. Extend the
is_vmalloc_addr check to be is_vmalloc_or_module_addr to cover
all possible cases.
Paul Burton [Tue, 13 Sep 2016 16:53:35 +0000 (17:53 +0100)]
irqchip/mips-gic: Fix local interrupts
Since the device hierarchy domain was added by commit c98c1822ee13
("irqchip/mips-gic: Add device hierarchy domain"), GIC local interrupts
have been broken.
Users attempting to setup a per-cpu local IRQ, for example the GIC timer
clock events code in drivers/clocksource/mips-gic-timer.c, the
setup_percpu_irq function would refuse with -EINVAL because the GIC
irqchip driver never called irq_set_percpu_devid so the
IRQ_PER_CPU_DEVID flag was never set for the IRQ. This happens because
irq_set_percpu_devid was being called from the gic_irq_domain_map
function which is no longer called.
Doing only that runs into further problems because gic_dev_domain_alloc
set the struct irq_chip for all interrupts, local or shared, to
gic_level_irq_controller despite that only being suitable for shared
interrupts. The typical outcome of this is that gic_level_irq_controller
callback functions are called for local interrupts, and then hwirq
number calculations overflow & the driver ends up attempting to access
some invalid register with an address calculated from an invalid hwirq
number. Best case scenario is that this then leads to a bus error. This
is fixed by abstracting the setup of the hwirq & chip to a new function
gic_setup_dev_chip which is used by both the root GIC IRQ domain & the
device domain.
Finally, decoding local interrupts failed because gic_dev_domain_alloc
only called irq_domain_alloc_irqs_parent for shared interrupts. Local
ones were therefore never associated with hwirqs in the root GIC IRQ
domain and the virq in gic_handle_local_int would always be 0. This is
fixed by calling irq_domain_alloc_irqs_parent unconditionally & having
gic_irq_domain_alloc handle both local & shared interrupts, which is
easy due to the aforementioned abstraction of chip setup into
gic_setup_dev_chip.
This fixes use of the MIPS GIC timer for clock events, which has been
broken since c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy
domain") but hadn't been noticed due to a silent fallback to the MIPS
coprocessor 0 count/compare clock events device.
Dave Airlie [Tue, 20 Sep 2016 20:28:25 +0000 (06:28 +1000)]
Merge branch 'for-next' of http://git.agner.ch/git/linux-drm-fsl-dcu into drm-next
fsl-dcu fixes.
* 'for-next' of http://git.agner.ch/git/linux-drm-fsl-dcu:
drm/fsl-dcu: disable clock on error path
drm/fsl-dcu: use PTR_ERR_OR_ZERO() to simplify the code
drm/fsl-dcu: fix endian issue when using clk_register_divider
This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.
Matt Fleming [Mon, 19 Sep 2016 12:09:09 +0000 (13:09 +0100)]
x86/efi: Only map RAM into EFI page tables if in mixed-mode
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.
While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.
Matt Fleming [Tue, 20 Sep 2016 13:26:21 +0000 (14:26 +0100)]
x86/mm/pat: Prevent hang during boot when mapping pages
There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.
This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.
commit 742563777e8d ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().
Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.
Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.
The ioctl name and description on the documentation block don't
match the ioctl being defined. This was probably overlooked while
renaming the ioctls during the sync file destaging. This patch
provides a more accurate description of what the ioctl actually does.
Ray Strode [Fri, 9 Sep 2016 15:09:05 +0000 (11:09 -0400)]
drm/qxl: reapply cursor after SetCrtc calls
The qxl driver currently destroys and recreates the
qxl "primary" any time the first crtc is set.
A side-effect of destroying the primary is mouse state
associated with the crtc is lost, which leads to
disappearing mouse cursors on wayland sessions.
This commit changes the driver to reapply the cursor
any time SetCrtc is called. It achieves this by keeping
a reference to the cursor bo on the qxl_crtc struct.
Gerd Hoffmann [Tue, 16 Dec 2014 11:29:59 +0000 (12:29 +0100)]
bochs: ignore device if there isn't enougth memory
The qemu stdvga can be configured with a wide range of video memory,
from 1 MB to 256 MB (default is 16 MB). In case it is configured
with only 1 or 2 MB it isn't really usable with bochsdrm, due to
depths other than 32bpp not being supported so that isn't enough
memory for a reasonable sized framebuffer. So skip the device
and let vgacon or vesafb+fbcon handle the it.
dma-buf/sync_file: Increment refcount of fence when all are signaled.
When we merge several fences, if all of them are signaled already, we
still keep one of them. So instead of using add_fence(), which will not
increase the refcount of signaled fences, we should explicitly call
fence_get() for the fence we are keeping.
This patch fixes a kernel panic that can be triggered by creating a fence
that is expired (or increasing the timeline until it expires), then
creating a merged fence out of it, and deleting the merged fence. This
will make the original expired fence's refcount go to zero.
Imre Deak [Wed, 14 Sep 2016 10:04:13 +0000 (13:04 +0300)]
drm/i915: Unlock PPS registers after GPU reset
Reapply the PPS register unlock workaround after GPU reset on platforms
where the reset clobbers the display HW state. This at least gets rid of
the related WARN during LVDS encoder enabling on PNV.
In atomic mode the crtc_xxx (eg crtc_hdisplay) members of the mode
structure may be unset before calling atomic_check/commit for planes.
Instead of, use xxx members which are actually set.
drm/sti: use different notifier_block for each pipe
Each pipe shall have its own notifier block to manage the vblank event.
This fixes issues where a client registered on given pipe is later
abusively notified of events on the other pipe.
When a drm_plane is being disabled, its ->crtc member is set to NULL
before the .atomic_disable() func is called.
To get the crtc of the plane, read old_state->crtc instead of
drm_plane->crtc
Do not rely on plane->status to define whether this is the first update
but rather check for gdp->vtg.
This avoids multiple and unwanted calls to sti_vtg_register_client()
which breaks the kernel scheduler.
Fabien Dessenne [Wed, 24 Aug 2016 10:12:37 +0000 (12:12 +0200)]
drm/sti: run hqvdp init sequence only once
Do not rely on plane->status to define whether this is the first update
but rather check for hqvdp->xp70_initialized bit status.
This avoids multiple and unwanted calls to sti_vtg_register_client()
which breaks the kernel scheduler.
Ville Syrjälä [Mon, 19 Sep 2016 13:33:53 +0000 (16:33 +0300)]
drm/sti: Fix sparse warnings
drm/sti/sti_mixer.c:361:6: warning: symbol 'sti_mixer_set_matrix' was not declared. Should it be static?
drm/sti/sti_gdp.c:476:5: warning: symbol 'sti_gdp_field_cb' was not declared. Should it be static?
drm/sti/sti_gdp.c:885:24: warning: symbol 'sti_gdp_plane_helpers_funcs' was not declared. Should it be static?
drm/sti/sti_cursor.c:348:24: warning: symbol 'sti_cursor_plane_helpers_funcs' was not declared. Should it be static?
drm/sti/sti_compositor.c:28:28: warning: symbol 'stih407_compositor_data' was not declared. Should it be static?
drm/sti/sti_compositor.c:49:28: warning: symbol 'stih416_compositor_data' was not declared. Should it be static?
drm/sti/sti_vtg.c:75:1: warning: symbol 'vtg_lookup' was not declared. Should it be static?
drm/sti/sti_vtg.c:476:24: warning: symbol 'sti_vtg_driver' was not declared. Should it be static?
drm/sti/sti_dvo.c:109:5: warning: symbol 'dvo_awg_generate_code' was not declared. Should it be static?
drm/sti/sti_dvo.c:602:24: warning: symbol 'sti_dvo_driver' was not declared. Should it be static?
drm/sti/sti_vtac.c:209:24: warning: symbol 'sti_vtac_driver' was not declared. Should it be static?
drm/sti/sti_tvout.c:914:24: warning: symbol 'sti_tvout_driver' was not declared. Should it be static?
drm/sti/sti_hqvdp.c:786:5: warning: symbol 'sti_hqvdp_vtg_cb' was not declared. Should it be static?
drm/sti/sti_hqvdp.c:1253:24: warning: symbol 'sti_hqvdp_plane_helpers_funcs' was not declared. Should it be static?
drm/sti/sti_hqvdp.c:1292:5: warning: symbol 'sti_hqvdp_bind' was not declared. Should it be static?
drm/sti/sti_hqvdp.c:1385:24: warning: symbol 'sti_hqvdp_driver' was not declared. Should it be static?
drm/sti/sti_drv.c:143:6: warning: symbol 'sti_drm_dbg_cleanup' was not declared. Should it be static?
Shawn Lee [Mon, 19 Sep 2016 10:35:26 +0000 (13:35 +0300)]
drm/i915/backlight: setup backlight pwm alternate increment on backlight enable
Backlight enable is supposed to do a full setup of the backlight. We
were missing the PWM alternate increment bit in the south chicken
registers on lpt+ pch. This potentially caused a PWM frequency change
when the chicken register value was lost e.g. on suspend.
v2 by Jani, rebase on the patch caching alt increment
Commit fe56b9e6a8d95 ("qed: Add module with basic common support")
has introduced a stack corruption during probe, where filling a
local struct with data to be sent to management firmware is incorrectly
filled; The data is written outside of the struct and corrupts
the stack.
Changes from v1:
----------------
- Correct the value written [Caught by David Laight]
Fixes: fe56b9e6a8d95 ("qed: Add module with basic common support") Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Andrew Lunn [Sun, 18 Sep 2016 19:17:19 +0000 (21:17 +0200)]
MAINTAINERS: Add an entry for the core network DSA code
The core distributed switch architecture code currently does not have
a MAINTAINERS entry, which results in some contributions not landing
in the right peoples inbox.
Vincent Bernat [Sun, 18 Sep 2016 15:46:07 +0000 (17:46 +0200)]
net: ipv6: fallback to full lookup if table lookup is unsuitable
Commit 8c14586fc320 ("net: ipv6: Use passed in table for nexthop
lookups") introduced a regression: insertion of an IPv6 route in a table
not containing the appropriate connected route for the gateway but which
contained a non-connected route (like a default gateway) fails while it
was previously working:
$ ip link add eth0 type dummy
$ ip link set up dev eth0
$ ip addr add 2001:db8::1/64 dev eth0
$ ip route add ::/0 via 2001:db8::5 dev eth0 table 20
$ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
RTNETLINK answers: No route to host
$ ip -6 route show table 20
default via 2001:db8::5 dev eth0 metric 1024 pref medium
After this patch, we get:
$ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
$ ip -6 route show table 20
2001:db8:cafe::1 via 2001:db8::6 dev eth0 metric 1024 pref medium
default via 2001:db8::5 dev eth0 metric 1024 pref medium
With display pixel clocks we want to have the closest possible clock
rate, to minimize timing and refresh rate skews. Whether the actual
clock rate is higher or lower than the requested rate is less important.
Also check candidates against the requested rate, rather than the
ideal parent rate, the varying dividers also influence the difference
between the requested rate and the rounded rate.
David S. Miller [Tue, 20 Sep 2016 02:10:25 +0000 (22:10 -0400)]
Merge branch 'mlx5-fixes'
Or Gerlitz says:
====================
mlx5 fixes to 4.8-rc6
This series series has a fix from Roi to memory corruption bug in
the bulk flow counters code and two late and hopefully last fixes
from me to the new eswitch offloads code.
Series done over net commit 37dd348 "bna: fix crash in bnad_get_strings()"
====================
Or Gerlitz [Sun, 18 Sep 2016 15:20:29 +0000 (18:20 +0300)]
net/mlx5: E-Switch, Handle mode change failures
E-switch mode changes involve creating HW tables, potentially allocating
netdevices, etc, and things can fail. Add an attempt to rollback to the
existing mode when changing to the new mode fails. Only if rollback fails,
getting proper SRIOV functionality requires module unload or sriov
disablement/enablement.
Or Gerlitz [Sun, 18 Sep 2016 15:20:28 +0000 (18:20 +0300)]
net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
When enablement of the SRIOV e-switch in certain mode (switchdev or legacy)
fails, we must set the mode to none. Otherwise, we'll run into double free
based crashes when further attempting to deal with the e-switch (such
as when disabling sriov or unloading the driver).
Roi Dayan [Sun, 18 Sep 2016 15:20:27 +0000 (18:20 +0300)]
net/mlx5: Fix flow counter bulk command out mailbox allocation
The FW command output length should be only the length of struct
mlx5_cmd_fc_bulk out field. Failing to do so will cause the memcpy
call which is invoked later in the driver to write over wrong memory
address and corrupt kernel memory which results in random crashes.
This bug was found using the kernel address sanitizer (kasan).
Fixes: a351a1b03bf1 ('net/mlx5: Introduce bulk reading of flow counters') Signed-off-by: Roi Dayan <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
gic_raise_softirq() walks the list of cpus using for_each_cpu(), it calls
gic_compute_target_list() which advances the iterator by the number of
CPUs in the cluster.
If gic_compute_target_list() reaches the last CPU it leaves the iterator
pointing at the last CPU. This means the next time round the for_each_cpu()
loop cpumask_next() will be called with an invalid CPU.
This triggers a warning when built with CONFIG_DEBUG_PER_CPU_MAPS:
[ 3.077738] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000
[ 3.077943] CPU1: Booted secondary processor [410fd0f0]
[ 3.078542] ------------[ cut here ]------------
[ 3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[ 3.078812] Modules linked in:
[ 3.078869]
[ 3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[ 3.078994] Hardware name: Foundation-v8A (DT)
[ 3.079059] task: ffff80087a1a0080 task.stack: ffff80087a19c000
[ 3.079145] PC is at gic_raise_softirq+0x12c/0x170
[ 3.079226] LR is at gic_raise_softirq+0xa4/0x170
[ 3.079296] pc : [<ffff0000083ead24>] lr : [<ffff0000083eac9c>] pstate: 200001c9
[ 3.081139] Call trace:
[ 3.081202] Exception stack(0xffff80087a19fbe0 to 0xffff80087a19fd10)
Avoid updating the iterator if the next call to cpumask_next() would
cause the for_each_cpu() loop to exit.
There is no change to gic_raise_softirq()'s behaviour, (cpumask_next()s
eventual call to _find_next_bit() will return early as start >= nbits),
this patch just silences the warning.
* emailed patches from Andrew Morton <[email protected]>:
rapidio/rio_cm: avoid GFP_KERNEL in atomic context
Revert "ocfs2: bump up o2cb network protocol version"
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
cgroup: duplicate cgroup reference when cloning sockets
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
ocfs2: fix double unlock in case retry after free truncate log
fanotify: fix list corruption in fanotify_get_response()
fsnotify: add a way to stop queueing events on group shutdown
ocfs2: fix trans extend while free cached blocks
ocfs2: fix trans extend while flush truncate log
ipc/shm: fix crash if CONFIG_SHMEM is not set
mm: fix the page_swap_info() BUG_ON check
autofs: use dentry flags to block walks during expire
MAINTAINERS: update email for VLYNQ bus entry
mm: avoid endless recursion in dump_page()
mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
khugepaged: fix use-after-free in collapse_huge_page()
MAINTAINERS: Maik has moved
ocfs2/dlm: fix race between convert and migration
mem-hotplug: don't clear the only node in new_node_page()
rapidio/rio_cm: avoid GFP_KERNEL in atomic context
As reported by Alexey Khoroshilov (https://lkml.org/lkml/2016/9/9/737):
riocm_send_close() is called from rio_cm_shutdown() under
spin_lock_bh(idr_lock), but riocm_send_close() uses a GFP_KERNEL
allocation.
Fix by taking riocm_send_close() outside of spinlock protected code.
Junxiao Bi [Mon, 19 Sep 2016 21:44:44 +0000 (14:44 -0700)]
Revert "ocfs2: bump up o2cb network protocol version"
This reverts commit 38b52efd218b ("ocfs2: bump up o2cb network protocol
version").
This commit made rolling upgrade fail. When one node is upgraded to new
version with this commit, the remaining nodes will fail to establish
connections to it, then the application like VMs on the remaining nodes
can't be live migrated to the upgraded one. This will cause an outage.
Since negotiate hb timeout behavior didn't change without this commit,
so revert it.
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
If we punch a hole on a reflink such that following conditions are met:
1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
(hole range > MAX_CONTIG_BYTES(1MB)),
we dont COW the first cluster starting at the start offset. But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out. This will modify the
cluster in place and zero it in the source too.
Fix this by skipping this cluster in such a scenario.
To reproduce:
1. Create a random file of say 10 MB
xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out).
dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
Johannes Weiner [Mon, 19 Sep 2016 21:44:38 +0000 (14:44 -0700)]
cgroup: duplicate cgroup reference when cloning sockets
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.
Johannes Weiner [Mon, 19 Sep 2016 21:44:36 +0000 (14:44 -0700)]
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.
The problem turns out to be the per-cpu charge cache. Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for all
memory that is being tracked. Those caches are safe against scheduling
preemption, but not against interrupts - such as the newly added packet
receive path. When cache draining is interrupted by network RX taking
pages out of the cache, the resuming drain operation will put references
of in-use pages, thus causing the imbalance.
Disable IRQs during all per-cpu charge cache operations.
Joseph Qi [Mon, 19 Sep 2016 21:44:33 +0000 (14:44 -0700)]
ocfs2: fix double unlock in case retry after free truncate log
If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.
Jan Kara [Mon, 19 Sep 2016 21:44:30 +0000 (14:44 -0700)]
fanotify: fix list corruption in fanotify_get_response()
fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).
However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock. Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.
Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().
Junxiao Bi [Mon, 19 Sep 2016 21:44:24 +0000 (14:44 -0700)]
ocfs2: fix trans extend while free cached blocks
The root cause of this issue is the same with the one fixed by the last
patch, but this time credits for allocator inode and group descriptor
may not be consumed before trans extend.
Junxiao Bi [Mon, 19 Sep 2016 21:44:21 +0000 (14:44 -0700)]
ocfs2: fix trans extend while flush truncate log
Every time, ocfs2_extend_trans() included a credit for truncate log
inode, but as that inode had been managed by jbd2 running transaction
first time, it will not consume that credit until
jbd2_journal_restart().
Since total credits to extend always included the un-consumed ones,
there will be more and more un-consumed credit, at last
jbd2_journal_restart() will fail due to credit number over the half of
max transction credit.
The following error was caught when unlinking a large file with many
extents:
Commit c01d5b300774 ("shmem: get_unmapped_area align huge page") makes
use of shm_get_unmapped_area() in shm_file_operations() unconditional to
CONFIG_MMU.
As Tony Battersby pointed this can lead NULL-pointer dereference on
machine with CONFIG_MMU=y and CONFIG_SHMEM=n. In this case ipc/shm is
backed by ramfs which doesn't provide f_op->get_unmapped_area for
configurations with MMU.
The solution is to provide dummy f_op->get_unmapped_area for ramfs when
CONFIG_MMU=y, which just call current->mm->get_unmapped_area().
Commit 62c230bc1790 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages") replaced the
swap_aops dirty hook from __set_page_dirty_no_writeback() with
swap_set_page_dirty().
For normal cases without these special SWP flags code path falls back to
__set_page_dirty_no_writeback() so the behaviour is expected to be the
same as before.
But swap_set_page_dirty() makes use of the page_swap_info() helper to
get the swap_info_struct to check for the flags like SWP_FILE,
SWP_BLKDEV etc as desired for those features. This helper has
BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
set_page_dirty_lock() path.
For the set_page_dirty() path which is often needed for cases to be
called from irq context, kswapd() can toggle the flag behind the back
while the call is getting executed when system is low on memory and
heavy swapping is ongoing.
This ends up with undesired kernel panic.
This patch just moves the check outside the helper to its users
appropriately to fix kernel panic for the described path. Couple of
users of helpers already take care of SwapCache condition so I skipped
them.
Ian Kent [Mon, 19 Sep 2016 21:44:12 +0000 (14:44 -0700)]
autofs: use dentry flags to block walks during expire
Somewhere along the way the autofs expire operation has changed to hold
a spin lock over expired dentry selection. The autofs indirect mount
expired dentry selection is complicated and quite lengthy so it isn't
appropriate to hold a spin lock over the operation.
Commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") added a
might_sleep() to dput() causing a WARN_ONCE() about this usage to be
issued.
But the spin lock doesn't need to be held over this check, the autofs
dentry info. flags are enough to block walks into dentrys during the
expire.
I've left the direct mount expire as it is (for now) because it is much
simpler and quicker than the indirect mount expire and adding spin lock
release and re-aquires would do nothing more than add overhead.
dump_page() uses page_mapcount() to get mapcount of the page.
page_mapcount() has VM_BUG_ON_PAGE(PageSlab(page)) as mapcount doesn't
make sense for slab pages and the field in struct page used for other
information.
It leads to recursion if dump_page() called for slub page and DEBUG_VM
is enabled:
mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
Currently, khugepaged does not permit swapin if there are enough young
pages in a THP. The problem is when a THP does not have enough young
pages, khugepaged leaks mapped ptes.
khugepaged: fix use-after-free in collapse_huge_page()
hugepage_vma_revalidate() tries to re-check if we still should try to
collapse small pages into huge one after the re-acquiring mmap_sem.
The problem Dmitry Vyukov reported[1] is that the vma found by
hugepage_vma_revalidate() can be suitable for huge pages, but not the
same vma we had before dropping mmap_sem. And dereferencing original
vma can lead to fun results..
Let's use vma hugepage_vma_revalidate() found instead of assuming it's the
same as what we had before the lock was dropped.
Joseph Qi [Mon, 19 Sep 2016 21:43:55 +0000 (14:43 -0700)]
ocfs2/dlm: fix race between convert and migration
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks if lockres master has changed to identify whether new master has
finished recovery or not. This will introduce a race that right after
old master does umount ( means master will change), a new convert
request comes.
In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Li Zhong [Mon, 19 Sep 2016 21:43:52 +0000 (14:43 -0700)]
mem-hotplug: don't clear the only node in new_node_page()
Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest
neighbor node when mem-offline") introduced new_node_page() for memory
hotplug.
In new_node_page(), the nid is cleared before calling
__alloc_pages_nodemask(). But if it is the only node of the system, and
the first round allocation fails, it will not be able to get memory from
an empty nodemask, and will trigger oom.
The patch checks whether it is the last node on the system, and if it
is, then don't clear the nid in the nodemask.
scripts/faddr2line: improve on base path filtering a bit
Due to our compiler include directives, the build pathnames for header
files often end up being of the form "$srcdir/./include/linux/xyz.h",
which ends up having that extra "." path component after the build base
in it.
Teach faddr2line to skip that too, to make code generated in inline
functions in header files match the filename for the regular C files.
Rabin Vincent pointed out that I can't make a stricter regexp match by
using the " at " prefix for the pathname, because that ends up being
locale-dependent. But this does require that the path match be preceded
by a space, to make it a bit more strict (that matters mainly if we
didn't find any base_dir at all, and we only end up with the "./" part
of the match)
blk-throttle: Extend slice if throttle group is not empty
Right now, if slice is expired, we start a new slice. If a bio is
queued, we keep on extending slice by throtle_slice interval (100ms).
This worked well as long as pending timer function got executed with-in
few milli seconds of scheduled time. But looks like with recent changes
in timer subsystem, slack can be much longer depending on the expiry time
of the scheduled timer.
commit 500462a9de65 ("timers: Switch to a non-cascading wheel")
This means, by the time timer function gets executed, it is possible the
delay from scheduled time is more than 100ms. That means current code
will conclude that existing slice has expired and a new one needs to
be started. New slice will be 100ms by default and that will not be
sufficient to meet rate requirement of group given the bio size and
bio will not be dispatched and we will start a new timer function to
wait. And when that timer expires, same process will repeat and we
will wait again and this can easily be an infinite loop.
Solve this issue by starting a new slice only if throttle gropup is
empty. If it is not empty, that means there should be an active slice
going on. Ideally it should not be expired but given the slack, it is
possible that it has expired.
Dave Airlie [Mon, 19 Sep 2016 20:24:26 +0000 (06:24 +1000)]
Merge tag 'imx-drm-next-2016-09-19' of git://git.pengutronix.de/git/pza/linux into drm-next
imx-drm active plane reconfiguration, cleanup, FSU/IC/IRT/VDIC support
- add active plane reconfiguration support (v4),
use the atomic_disable callback
- stop calling disable_plane manually in the plane destroy path
- let mode cleanup destroy mode objects on driver unbind
- drop deprecated load/unload drm_driver ops
- add exclusive fence to plane state, so the atomic helper can
wait on them, remove the open-coded fence wait from imx-drm
- add low level deinterlacer (VDIC) support
- add support for channel linking via the frame synchronisation unit (FSU)
- add queued image conversion support for memory-to-memory scaling, rotation,
and color space conversion, using IC and IRT.
* tag 'imx-drm-next-2016-09-19' of git://git.pengutronix.de/git/pza/linux:
gpu: ipu-v3: Add queued image conversion support
gpu: ipu-v3: Add ipu_rot_mode_is_irt()
gpu: ipu-v3: fix a possible NULL dereference
drm/imx: parallel-display: detach bridge or panel on unbind
drm/imx: imx-ldb: detach bridge on unbind
drm/imx: imx-ldb: detach panel on unbind
gpu: ipu-v3: Add FSU channel linking support
gpu: ipu-v3: Add Video Deinterlacer unit
drm/imx: add exclusive fence to plane state
drm/imx: fold ipu_plane_disable into ipu_disable_plane
drm/imx: don't destroy mode objects manually on driver unbind
drm/imx: drop deprecated load/unload drm_driver ops
drm/imx: don't call disable_plane in plane destroy path
drm/imx: Add active plane reconfiguration support
drm/imx: Use DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODESET flag
drm/imx: ipuv3-crtc: Use the callback ->atomic_disable instead of ->disable
gpu: ipu-v3: Do not wait for DMFC FIFO to clear when disabling DMFC channel
Dave Airlie [Mon, 19 Sep 2016 20:23:22 +0000 (06:23 +1000)]
Merge tag 'drm-intel-next-2016-09-19' of git://anongit.freedesktop.org/drm-intel into drm-next
- refactor the sseu code (Imre)
- refine guc dmesg output (Dave Gordon)
- more vgpu work
- more skl wm fixes (Lyude)
- refactor dpll code in prep for upfront link training (Jim Bride et al)
- consolidate all platform feature checks into intel_device_info (Carlos Santa)
- refactor elsp/execlist submission as prep for re-submission after hang
recovery and eventually scheduling (Chris Wilson)
- allow synchronous gpu reset handling, to remove tricky/impossible/fragile
error recovery code (Chris Wilson)
- prep work for nonblocking (execlist) submission, using fences to track
depencies and drive elsp submission (Chris Wilson)
- partial error recover/resubmission of non-guilty batches after hangs (Chris Wilson)
- full dma-buf implicit fencing support (Chris Wilson)
- dp link training fixes (Jim, Dhinkaran, Navare, ...)
- obey dp branch device pixel rate/bpc/clock limits (Mika Kahola), needed for
many vga dongles
- bunch of small cleanups and polish all over, as usual
[airlied: printing macros collided]
* tag 'drm-intel-next-2016-09-19' of git://anongit.freedesktop.org/drm-intel: (163 commits)
drm/i915: Update DRIVER_DATE to 20160919
drm: Fix DisplayPort branch device ID kernel-doc
drm/i915: use NULL for NULL pointers
drm/i915: do not use 'false' as a NULL pointer
drm/i915: make intel_dp_compute_bpp static
drm: Add DP branch device info on debugfs
drm/i915: Update bits per component for display info
drm/i915: Check pixel rate for DP to VGA dongle
drm/i915: Read DP branch device SW revision
drm/i915: Read DP branch device HW revision
drm/i915: Cleanup DisplayPort AUX channel initialization
drm: Read DP branch device id
drm: Helper to read max bits per component
drm: Helper to read max clock rate
drm: Drop VGA from bpc definitions
drm: Add missing DP downstream port types
drm/i915: Add ddb size field to device info structure
drm/i915/guc: general tidying up (submission)
drm/i915/guc: general tidying up (loader)
drm/i915: clarify PMINTRMSK/pm_intr_keep usage
...
Dave Airlie [Mon, 19 Sep 2016 20:17:38 +0000 (06:17 +1000)]
Merge branch 'drm-next-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-next
More radeon and amdgpu changes for 4.9. Highlights:
- Initial SI support for amdgpu (controlled by a Kconfig option)
- misc ttm cleanups
- runtimepm fixes
- S3/S4 fixes
- power improvements
- lots of code cleanups and optimizations
* 'drm-next-4.9' of git://people.freedesktop.org/~agd5f/linux: (151 commits)
drm/ttm: remove cpu_address member from ttm_tt
drm/radeon/radeon_device: remove unused function
drm/amdgpu: clean function declarations in amdgpu_ttm.c up
drm/amdgpu: use the new ring ib and dma frame size callbacks (v2)
drm/amdgpu/vce3: add ring callbacks for ib and dma frame size
drm/amdgpu/vce2: add ring callbacks for ib and dma frame size
drm/amdgpu/vce: add common ring callbacks for ib and dma frame size
drm/amdgpu/uvd6: add ring callbacks for ib and dma frame size
drm/amdgpu/uvd5: add ring callbacks for ib and dma frame size
drm/amdgpu/uvd4.2: add ring callbacks for ib and dma frame size
drm/amdgpu/sdma3: add ring callbacks for ib and dma frame size
drm/amdgpu/sdma2.4: add ring callbacks for ib and dma frame size
drm/amdgpu/cik_sdma: add ring callbacks for ib and dma frame size
drm/amdgpu/si_dma: add ring callbacks for ib and dma frame size
drm/amdgpu/gfx8: add ring callbacks for ib and dma frame size
drm/amdgpu/gfx7: add ring callbacks for ib and dma frame size
drm/amdgpu/gfx6: add ring callbacks for ib and dma frame size
drm/amdgpu/ring: add an interface to get dma frame and ib size
drm/amdgpu/sdma3: drop unused functions
drm/amdgpu/gfx6: drop gds_switch callback
...
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a potential weakness in IPsec CBC IV generation, as well as
a number of issues that arose out of an OOM crash on ARM with CTR-mode
AES"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/aes-ctr - fix NULL dereference in tail processing
crypto: arm/aes-ctr - fix NULL dereference in tail processing
crypto: skcipher - Fix blkcipher walk OOM crash
crypto: echainiv - Replace chaining with multiplication
Ville Syrjälä [Mon, 19 Sep 2016 13:33:54 +0000 (16:33 +0300)]
drm/sun4i: Fix sparse warnings
drm/sun4i/sun4i_tv.c:181:21: warning: symbol 'ntsc_video_levels' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:185:21: warning: symbol 'pal_video_levels' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:189:21: warning: symbol 'ntsc_burst_levels' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:193:21: warning: symbol 'pal_burst_levels' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:197:20: warning: symbol 'ntsc_color_gains' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:201:20: warning: symbol 'pal_color_gains' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:205:26: warning: symbol 'ntsc_resync_parameters' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:209:26: warning: symbol 'pal_resync_parameters' was not declared. Should it be static?
drm/sun4i/sun4i_tv.c:213:16: warning: symbol 'tv_modes' was not declared. Should it be static?
Merge tag 'drm-fixes-for-4.8-rc7' of git://people.freedesktop.org/~airlied/linux
Pull exynos and one stable ABI fix from Dave Airlie:
"One important drm 32/64 ABI fix came in so I'll dequeue what I have,
the rest is just exynos runtime pm fixes, but the net removal of code
seems like a win to me.
I'm going to be sporadic this week due to school holidays, so if
anything urgent turns up, Daniel will take care of it"
* tag 'drm-fixes-for-4.8-rc7' of git://people.freedesktop.org/~airlied/linux:
drm: Only use compat ioctl for addfb2 on X86/IA64
Subject: [PATCH, RESEND] drm: exynos: avoid unused function warning
drm/exynos: g2d: fix system and runtime pm integration
drm/exynos: rotator: fix system and runtime pm integration
drm/exynos: gsc: fix system and runtime pm integration
drm/exynos: fimc: fix system and runtime pm integration
exynos-drm: Fix unsupported GEM memory type error message to be clear
Markus Elfring [Sun, 18 Sep 2016 15:00:52 +0000 (17:00 +0200)]
drm/amdgpu: Use kmalloc_array() in amdgpu_debugfs_gca_config_read()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
When the loop predicating timeout parameter passed happens to
not be a multiple of 20 the unsigned integer will overflow and
the loop will become unbounded.