Arnd Bergmann [Thu, 10 Nov 2016 16:44:52 +0000 (17:44 +0100)]
infiniband: shut up a maybe-uninitialized warning
Some configurations produce this harmless warning when built with gcc
-Wmaybe-uninitialized:
infiniband/core/cma.c: In function 'cma_get_net_dev':
infiniband/core/cma.c:1242:12: warning: 'src_addr_storage.sin_addr.s_addr' may be used uninitialized in this function [-Wmaybe-uninitialized]
I previously reported this for the powerpc64 defconfig, but have now
reproduced the same thing for x86 as well, using gcc-5 or higher.
The code looks correct to me, and this change just rearranges it by
making sure we always initialize the entire address structure to make the
warning disappear. My first approach added an initialization at the
time of the declaration, which Doug commented may be too costly, so I
hope this version doesn't add overhead.
Arnd Bergmann [Thu, 10 Nov 2016 16:44:51 +0000 (17:44 +0100)]
crypto: aesni: shut up -Wmaybe-uninitialized warning
The rfc4106 encrypt/decrypt helper functions cause an annoying
false-positive warning in allmodconfig if we turn on
-Wmaybe-uninitialized warnings again:
arch/x86/crypto/aesni-intel_glue.c: In function ‘helper_rfc4106_decrypt’:
include/linux/scatterlist.h:67:31: warning: ‘dst_sg_walk.sg’ may be used uninitialized in this function [-Wmaybe-uninitialized]
The problem seems to be that the compiler doesn't track the state of the
'one_entry_in_sg' variable across the kernel_fpu_begin/kernel_fpu_end
section.
This takes the easy way out by adding a bogus initialization, which
should be harmless enough to get the patch into v4.9 so we can turn on
this warning again by default without producing useless output. A
follow-up patch for v4.10 rearranges the code to make the warning go
away.
Arnd Bergmann [Thu, 10 Nov 2016 16:44:50 +0000 (17:44 +0100)]
rc: print correct variable for z8f0811
A recent rework accidentally left a debugging printk untouched while
changing the meaning of the variables, leading to an uninitialized
variable being printed:
drivers/media/i2c/ir-kbd-i2c.c: In function 'get_key_haup_common':
drivers/media/i2c/ir-kbd-i2c.c:62:2: error: 'toggle' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This prints the correct one instead, as we did before the patch.
Sean Young [Thu, 10 Nov 2016 16:44:49 +0000 (17:44 +0100)]
dib0700: fix nec repeat handling
When receiving a nec repeat, ensure the correct scancode is repeated
rather than a random value from the stack. This removes the need for
the bogus uninitialized_var() and also fixes the warnings:
drivers/media/usb/dvb-usb/dib0700_core.c: In function ‘dib0700_rc_urb_completion’:
drivers/media/usb/dvb-usb/dib0700_core.c:679: warning: ‘protocol’ may be used uninitialized in this function
[sean addon: So after writing the patch and submitting it, I've bought the
hardware on ebay. Without this patch you get random scancodes
on nec repeats, which the patch indeed fixes.]
Arnd Bergmann [Thu, 10 Nov 2016 16:44:48 +0000 (17:44 +0100)]
s390: pci: don't print uninitialized data for debugging
gcc correctly warns about an incorrect use of the 'pa' variable in case
we pass an empty scatterlist to __s390_dma_map_sg:
arch/s390/pci/pci_dma.c: In function '__s390_dma_map_sg':
arch/s390/pci/pci_dma.c:309:13: warning: 'pa' may be used uninitialized in this function [-Wmaybe-uninitialized]
This adds a bogus initialization to the function to sanitize the debug
output. I would have preferred a solution without the initialization,
but I only got the report from the kbuild bot after turning on the
warning again, and didn't manage to reproduce it myself.
Arnd Bergmann [Thu, 10 Nov 2016 16:44:46 +0000 (17:44 +0100)]
x86: apm: avoid uninitialized data
apm_bios_call() can fail, and return a status in its argument structure.
If that status however is zero during a call from
apm_get_power_status(), we end up using data that may have never been
set, as reported by "gcc -Wmaybe-uninitialized":
arch/x86/kernel/apm_32.c: In function ‘apm’:
arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here
arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here
This changes the function to return "APM_NO_ERROR" here, which makes the
code more robust to broken BIOS versions, and avoids the warning.
Arnd Bergmann [Thu, 10 Nov 2016 16:44:45 +0000 (17:44 +0100)]
NFSv4.1: work around -Wmaybe-uninitialized warning
A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use if
we enable -Wmaybe-uninitialized again:
fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized]
gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair results
in a nonzero return value here. Using PTR_ERR_OR_ZERO() instead makes
this clear to the compiler.
Fixes: e09c978aae5b ("NFSv4.1: Fix Oopsable condition in server callback races")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
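The PTR_ERR_OR_ZERO() idiom described above can be sketched in userspace C
roughly as follows. This is only an illustration under stated assumptions:
is_err(), ptr_err(), err_ptr() and ptr_err_or_zero() are simplified stand-ins
for the kernel's err.h helpers, and struct slot, lookup_slot() and
slot_get_seqid() are hypothetical names, not the NFS code itself.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's err.h helpers. */
#define MAX_ERRNO 4095

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* One expression yields either a negative errno or 0. */
static inline long ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? ptr_err(ptr) : 0;
}

struct slot {
	unsigned int seq_nr;
};

/* Hypothetical lookup: an error pointer for an out-of-range slot id. */
static struct slot *lookup_slot(struct slot *table, unsigned int nslots,
				unsigned int id)
{
	if (id >= nslots)
		return err_ptr(-E2BIG);
	return &table[id];
}

/*
 * The output argument is written only when ret == 0, and ret comes from a
 * single expression, which is easier for the compiler to follow than a
 * separate IS_ERR() test followed by PTR_ERR().
 */
static int slot_get_seqid(struct slot *table, unsigned int nslots,
			  unsigned int id, unsigned int *seq_nr)
{
	struct slot *slot = lookup_slot(table, nslots, id);
	int ret = (int)ptr_err_or_zero(slot);

	if (ret)
		return ret;
	*seq_nr = slot->seq_nr;
	return 0;
}
```

A caller then only reads the sequence number when slot_get_seqid() returned 0.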
Arnd Bergmann [Thu, 10 Nov 2016 16:44:44 +0000 (17:44 +0100)]
Kbuild: enable -Wmaybe-uninitialized warning for "make W=1"
Traditionally, we have always had warnings about uninitialized variables
enabled, as this is part of -Wall, and generally a good idea [1], but it
also always produced false positives, mainly because this is a variation
of the halting problem and provably impossible to get right in all cases
[2].
Various people have identified cases that are particularly bad for false
positives, and in commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized
when building with -Os"), I turned off the warning for any build that
was done with CC_OPTIMIZE_FOR_SIZE. This drastically reduced the number
of false positive warnings in the default build but unfortunately had
the side effect of turning the warning off completely in 'allmodconfig'
builds, which in turn let a lot of warnings (both actual bugs and
remaining false positives) go in unnoticed.
Commit 877417e6ffb9 ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") enabled the warning again for allmodconfig builds in v4.7,
and in v4.8-rc1 I had finally managed to address all warnings I get in
an ARM allmodconfig build and most other maybe-uninitialized warnings
for ARM randconfig builds.
However, commit 6e8d666e9253 ("Disable "maybe-uninitialized" warning
globally") was merged at the same time and disabled it completely for
all configurations, because of false-positive warnings on x86 that I had
not addressed until then. This caused a lot of actual bugs to get
merged into mainline, and I sent several dozen patches for these during
the v4.9 development cycle. Most of these are actual bugs, some are for
correct code that is safe because it is only called under external
constraints that make it impossible to run into the case that gcc sees,
and in a few cases gcc is just stupid and finds something that can
obviously never happen.
I have now done a few thousand randconfig builds on x86 and collected
all patches that I needed to address every single warning I got (I can
provide the combined patch for the other warnings if anyone is
interested), so I hope we can get the warning back and let people catch
the actual bugs earlier.
This reverts the change to disable the warning completely and for now
brings it back at the "make W=1" level, so we can get it merged into
mainline without introducing false positives. A follow-up patch enables
it on all levels unless some configuration option turns it off because
of false-positives.
Chris Wilson [Thu, 10 Nov 2016 18:46:47 +0000 (10:46 -0800)]
lib/stackdepot: export save/fetch stack for drivers
Some drivers would like to record stacktraces in order to aid leak
tracing. As stackdepot already provides a facility for only storing the
unique traces, thereby reducing the memory required, export that
functionality for use by drivers.
The code was originally created for KASAN and moved under lib in commit cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot
for SLAB") so that it could be shared with mm/. In turn, we want to
share it now with drivers.
Jakub Kicinski [Thu, 10 Nov 2016 18:46:44 +0000 (10:46 -0800)]
mm: kmemleak: scan .data.ro_after_init
Limit the number of kmemleak false positives by including
.data.ro_after_init in memory scanning. To achieve this we need to add
symbols for start and end of the section to the linker scripts.
The problem was uncovered by commit 56989f6d8568 ("genetlink: mark
families as __ro_after_init").
Greg Thelen [Thu, 10 Nov 2016 18:46:41 +0000 (10:46 -0800)]
memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB
While testing OBJFREELIST_SLAB integration with pagealloc, we found a
bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
CFLGS_OBJFREELIST_SLAB. When that happens, critical allocations needed
for loading drivers or creating new caches fail.
The original kmem_cache is created early making OFF_SLAB not possible.
When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
is enabled it will try to enable it first under certain conditions.
Given kmem_cache(sys) reuses the original flag, you can have both flags
at the same time resulting in allocation failures and odd behaviors.
This fix discards allocator specific flags from memcg before calling
create_cache.
The bug exists since 4.6-rc1 and affects testing debug pagealloc
configurations.
Andrey Ryabinin [Thu, 10 Nov 2016 18:46:38 +0000 (10:46 -0800)]
coredump: fix unfreezable coredumping task
It may not be possible to freeze a coredumping task when it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they got a chance to complete 'core_state->startup'.
Inability to freeze a task during suspend will cause suspend to fail.
Also CRIU uses cgroup freezer during dump operation. So with an
unfreezable task the CRIU dump will fail because it waits for a
transition from 'FREEZING' to 'FROZEN' state which will never happen.
Use freezer_do_not_count() to tell freezer to ignore coredumping task
while it waits for core_state->startup completion.
Eryu Guan [Thu, 10 Nov 2016 18:46:35 +0000 (10:46 -0800)]
mm/filemap: don't allow partially uptodate page for pipes
Starting with the 4.9-rc1 kernel, I noticed test failures of
sendfile(2) and splice(2) (sendfile0N and splice01 from LTP) when
testing on sub-page block size filesystems (tested both XFS and ext4):
these syscalls start to return EIO in the tests.
This is because in sub-page block size cases we don't need the whole
page to be uptodate; it is enough that the part we care about is uptodate
(if fs has ->is_partially_uptodate defined).
But page_cache_pipe_buf_confirm() doesn't have the ability to check the
partially-uptodate case, it needs the whole page to be uptodate. So it
returns EIO in this case.
This is a regression introduced by commit 82c156f85384 ("switch
generic_file_splice_read() to use of ->read_iter()"). Prior to the
change, generic_file_splice_read() didn't allow partially-uptodate pages
either, so it worked fine.
Fix it by skipping the partially-uptodate check if we're working on a
pipe in do_generic_file_read(), so we read the whole page from disk as
long as the page is not uptodate.
I think the other way to fix it is to add the ability to check & allow
partially-uptodate page to page_cache_pipe_buf_confirm(), but that is
much harder to do and seems to gain little.
Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly
allocated huge page.
If a reservation was associated with the huge page, alloc_huge_page()
consumed the reservation while allocating. When the newly allocated
page is freed in free_huge_page(), it will increment the global
reservation count. However, the reservation entry in the reserve map
will remain.
This is not an issue for shared mappings as the entry in the reserve map
indicates a reservation exists. But, an entry in a private mapping
reserve map indicates the reservation was consumed and no longer exists.
This results in an inconsistency between the reserve map and the global
reservation count. This 'leaks' a reserved huge page.
Create a new routine restore_reserve_on_error() to restore the reserve
entry in these specific error paths. This routine makes use of a new
function vma_add_reservation() which will add a reserve entry for a
specific address/page.
In general, these error paths were rarely (if ever) taken on most
architectures. However, powerpc contained arch specific code that
resulted in an extra fault and execution of these error paths on all
private mappings.
Junxiao Bi [Thu, 10 Nov 2016 18:46:29 +0000 (10:46 -0800)]
ocfs2: fix not enough credit panic
The following panic was caught when running the ocfs2 disconfig single test
(block size 512 and cluster size 8192). ocfs2_journal_dirty() returned
-ENOSPC, which means the credits were used up.
The total credit should include 3 times of "num_dx_leaves" from
ocfs2_dx_dir_rebalance(), because 2 times will be consumed in
ocfs2_dx_dir_transfer_leaf() and 1 time will be consumed in
ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
ocfs2_dx_dir_format_cluster(). But only 2 times "num_dx_leaves" is
included in ocfs2_dx_dir_rebalance_credits(); fix it.
This can cause a read-only fs (v4.1+) or a panic on mainline Linux,
depending on the mount option.
Hans de Goede [Thu, 10 Nov 2016 18:46:26 +0000 (10:46 -0800)]
Revert "console: don't prefer first registered if DT specifies stdout-path"
This reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path").
The reverted commit changes existing behavior on which many ARM boards
rely. Many ARM small-board computers, such as the Raspberry Pi, have
both a video output and a serial console. Depending on whether the user
is using the device as a more regular computer or as a headless device,
we need to have the console on either one or the other.
Many users rely on the kernel behavior of the console being present on
both outputs. Before the reverted commit, the console setup with no
console= kernel arguments on an ARM board which sets stdout-path in dt
would look like this:
[root@localhost ~]# cat /proc/consoles
ttyS0 -W- (EC p a) 4:64
tty0 -WU (E p ) 4:1
Whereas after the reverted commit, it looks like this:
[root@localhost ~]# cat /proc/consoles
ttyS0 -W- (EC p a) 4:64
This commit reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path") restoring the original
behavior.
Jann Horn [Thu, 10 Nov 2016 18:46:19 +0000 (10:46 -0800)]
swapfile: fix memory corruption via malformed swapfile
When root activates a swap partition whose header has the wrong
endianness, nr_badpages elements of badpages are swabbed before
nr_badpages has been checked, leading to a buffer overrun of up to 8GB.
This normally is not a security issue because it can only be exploited
by root (more specifically, a process with CAP_SYS_ADMIN or the ability
to modify a swap file/partition), and such a process can already e.g.
modify swapped-out memory of any other userspace process on the system.
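The ordering problem described above can be sketched as follows: the count
has to be byte-swapped and validated before it is used as a loop bound over
the array. This is a minimal userspace illustration, not the swapfile code;
struct swap_header, MAX_BADPAGES, swab32() and swap_header_fix_endianness()
are hypothetical names and sizes chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BADPAGES 4	/* hypothetical limit, tiny on purpose */

/* Hypothetical, much simplified on-disk header. */
struct swap_header {
	uint32_t version;
	uint32_t last_page;
	uint32_t nr_badpages;
	uint32_t badpages[MAX_BADPAGES];
};

static uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* Returns 0 on success, -1 if the header claims more entries than fit. */
static int swap_header_fix_endianness(struct swap_header *h)
{
	uint32_t i;

	h->version = swab32(h->version);
	h->last_page = swab32(h->last_page);
	h->nr_badpages = swab32(h->nr_badpages);

	/* Validate the count before it bounds any access to badpages[]. */
	if (h->nr_badpages > MAX_BADPAGES)
		return -1;

	for (i = 0; i < h->nr_badpages; i++)
		h->badpages[i] = swab32(h->badpages[i]);
	return 0;
}
```

With the check in place, a header that claims 1000 bad pages is rejected
before the loop runs, so nothing past the end of the array is touched.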
Shiraz Hashim [Thu, 10 Nov 2016 18:46:16 +0000 (10:46 -0800)]
mm/cma.c: check the max limit for cma allocation
CMA allocation request size is represented by size_t that gets truncated
when same is passed as int to bitmap_find_next_zero_area_off.
We observe that during fuzz testing when cma allocation request is too
high, bitmap_find_next_zero_area_off still returns success due to the
truncation. This leads to kernel crash, as subsequent code assumes that
requested memory is available.
Fail cma allocation in case the request breaches the corresponding cma
region size.
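The check described above might look roughly like this: compare the request
against the region size while it is still a size_t, before any narrower type
is involved. This is a sketch under assumptions; struct cma_region and
cma_request_ok() are hypothetical names, not the mm/cma.c interface.

```c
#include <stddef.h>

/* Hypothetical stand-in for a CMA region: its size in pages. */
struct cma_region {
	unsigned long count;
};

/*
 * Returns 0 if the request may proceed, -1 if it exceeds the region.
 * The comparison is done in the wide type, so an oversized request cannot
 * turn into a small, "valid" value through truncation.
 */
static int cma_request_ok(const struct cma_region *cma, size_t count)
{
	if (count > cma->count)
		return -1;
	return 0;
}
```

Only after this test passes is it safe to hand the count to a helper that
takes a narrower integer type.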
Alexey Dobriyan [Thu, 10 Nov 2016 18:46:13 +0000 (10:46 -0800)]
scripts/bloat-o-meter: fix SIGPIPE
Fix piping output to a program which quickly exits (read: head -n1)
$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux | head -n1
add/remove: 0/0 grow/shrink: 9/60 up/down: 124/-305 (-181)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
Hugh Dickins [Thu, 10 Nov 2016 18:46:11 +0000 (10:46 -0800)]
shmem: fix pageflags after swapping DMA32 object
If shmem_alloc_page() does not set PageLocked and PageSwapBacked, then
shmem_replace_page() needs to do so for itself. Without this, it puts
newpage on the wrong lru, re-unlocks the unlocked newpage, and system
descends into "Bad page" reports and freeze; or if CONFIG_DEBUG_VM=y, it
hits an earlier VM_BUG_ON_PAGE(!PageLocked), depending on config.
But shmem_replace_page() is not a common path: it's only called when
swapin (or swapoff) finds the page was already read into an unsuitable
zone: usually all zones are suitable, but gem objects for a few drm
devices (gma500, omapdrm, crestline, broadwater) require zone DMA32 if
there's more than 4GB of ram.
Turns out kmemleak is right. We now allocate the frontswap map
depending on the kernel config (and no longer on the enablement)
swapfile.c:
[...]
if (IS_ENABLED(CONFIG_FRONTSWAP))
frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));
but later on this is passed along
--> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
and ignored if frontswap is disabled
--> frontswap_init(p->type, frontswap_map);
static inline void frontswap_init(unsigned type, unsigned long *map)
{
if (frontswap_enabled())
__frontswap_init(type, map);
}
Thing is, that frontswap map is never freed.
The leakage is relatively not that bad, because swapon is an infrequent
and privileged operation. However, if the first frontswap backend is
registered after a swap type has been already enabled, it will WARN_ON
in frontswap_register_ops() and frontswap will not be available for the
swap type.
Fix this by making sure the map is assigned by frontswap_init() as long
as CONFIG_FRONTSWAP is enabled.
Paolo Bonzini [Fri, 11 Nov 2016 10:13:36 +0000 (11:13 +0100)]
Merge tag 'kvm-arm-for-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for v4.9-rc4
- Kick the vcpu when a pending interrupt becomes pending again
- Prevent access to invalid interrupt registers
- Invalidate TLBs when two vcpus from the same VM share a CPU
Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.
We can resolve this problem by doing our error and wakeup checking
(particularly, for the async_error flag) after waiting for children to
suspend, instead of before. This also helps align the logic for the noirq and
late suspend cases with the logic in __device_suspend().
It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)
Fixes: de377b397272 (PM / sleep: Asynchronous threads for suspend_late)
Fixes: 28b6fd6e3779 (PM / sleep: Asynchronous threads for suspend_noirq)
Reported-by: Jeffy Chen <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Al Viro [Thu, 10 Nov 2016 23:32:13 +0000 (18:32 -0500)]
splice: remove detritus from generic_file_splice_read()
i_size check is a leftover from the horrors that used to play with
the page cache in that function. With the switch to ->read_iter(),
it's neither needed nor correct - for gfs2 it ends up being buggy,
since i_size is not guaranteed to be correct until later (inside
->read_iter()).
Dave Airlie [Thu, 10 Nov 2016 23:09:57 +0000 (09:09 +1000)]
Merge tag 'imx-drm-fixes-2016-11-10' of git://git.pengutronix.de/git/pza/linux into drm-fixes
imx-drm: fix possible hangup when disabling crtcs
- only ever disable the display controller (DC) module after all plane
IDMAC channels are stopped. This fixes a regression introduced by the
atomic modeset conversion.
* tag 'imx-drm-fixes-2016-11-10' of git://git.pengutronix.de/git/pza/linux:
drm/imx: disable planes before DC
Dave Airlie [Thu, 10 Nov 2016 22:58:57 +0000 (08:58 +1000)]
Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Regression fix for powerplay on some iceland boards.
* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: implement get_clock_by_type for iceland.
drm/amd/powerplay/smu7: fix checks in smu7_get_evv_voltages (v2)
drm/amd/powerplay: update phm_get_voltage_evv_on_sclk for iceland
drm/amd/powerplay: propagate errors in phm_get_voltage_evv_on_sclk
Ilya Dryomov [Tue, 8 Nov 2016 14:15:24 +0000 (15:15 +0100)]
libceph: initialize last_linger_id with a large integer
osdc->last_linger_id is a counter for lreq->linger_id, which is used
for watch cookies. Starting with a large integer should ease the task
of telling apart kernel and userspace clients.
Yan, Zheng [Wed, 9 Nov 2016 08:42:48 +0000 (16:42 +0800)]
libceph: fix legacy layout decode with pool 0
If your data pool was pool 0, ceph_file_layout_from_legacy()
transformed that to -1 unconditionally, which broke upgrades.
We only want to do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t. If any fields
are set, though, we trust the fl_pgpool to be a valid pool.
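The rule described above can be sketched as follows: pool 0 is mapped to -1
only when every other field is zero as well. This is an illustration under
assumptions; struct legacy_layout, struct file_layout and
layout_from_legacy() are hypothetical simplifications with host-endian
fields, not the libceph definitions.

```c
#include <stdint.h>

/* Hypothetical, simplified legacy layout (host-endian for the sketch). */
struct legacy_layout {
	uint32_t stripe_unit;
	uint32_t stripe_count;
	uint32_t object_size;
	uint32_t pg_pool;
};

/* Hypothetical in-memory layout; pool_id == -1 means "no pool". */
struct file_layout {
	uint32_t stripe_unit;
	uint32_t stripe_count;
	uint32_t object_size;
	int64_t pool_id;
};

static void layout_from_legacy(struct file_layout *fl,
			       const struct legacy_layout *legacy)
{
	fl->stripe_unit = legacy->stripe_unit;
	fl->stripe_count = legacy->stripe_count;
	fl->object_size = legacy->object_size;
	fl->pool_id = legacy->pg_pool;

	/* Only a fully zeroed legacy layout is treated as "no layout". */
	if (fl->stripe_unit == 0 && fl->stripe_count == 0 &&
	    fl->object_size == 0 && fl->pool_id == 0)
		fl->pool_id = -1;
}
```

A layout with real striping parameters and pg_pool == 0 therefore keeps pool
0, while an all-zero layout still ends up with pool_id == -1.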
Yan, Zheng [Wed, 9 Nov 2016 08:47:54 +0000 (16:47 +0800)]
ceph: use default file splice read callback
Splice read/write implementation changed recently. When using
generic_file_splice_read(), iov_iter with type == ITER_PIPE is
passed to filesystem's read_iter callback. But ceph_sync_read()
can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
expects pages from page cache).
Fixing ceph_sync_read() requires a big patch. So use default
splice read callback for now.
Jiri Pirko [Thu, 10 Nov 2016 11:31:05 +0000 (12:31 +0100)]
mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces
Until now, tables with the same id in multiple net namespaces were squashed
into a single virtual router. That is not only incorrect, it also causes
error messages when trying to use the RALUE register to do a double remove
of FIB entries, like this one:
Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL),
and the infrastructure is not yet prepared to handle netnamespaces, just
ignore FIB notification events for non-init namespaces. That is fine to
do since we don't need to offload them.
Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Jiri Pirko [Thu, 10 Nov 2016 11:31:04 +0000 (12:31 +0100)]
mlxsw: spectrum_router: Fix handling of neighbour structure
__neigh_create function works in a different way than assumed.
It passes "n" as a parameter to ndo_neigh_construct. But this "n" might
be destroyed right away before __neigh_create() returns in case there is
already another neighbour struct in the hashtable with the same dev and
primary key. That is not expected by mlxsw_sp_router_neigh_construct()
and the stored "n" points to freed memory, eventually leading to crash.
Fix this by doing tight 1:1 coupling between the neighbour struct and the
internal driver neigh_entry. That allows narrowing down the key in the
internal driver hashtable to do lookups by "n" only.
Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Thu, 10 Nov 2016 17:55:26 +0000 (12:55 -0500)]
Merge branch 'qed-fixes'
Yuval Mintz says:
====================
qed: Fix RoCE infrastructure
This series fixes 2 basic issues with RoCE support,
one handles a missing configuration in the initial infrastructure
support while the other is a regression introduced by one of the
initial fix submissions.
====================
Ram Amrani [Wed, 9 Nov 2016 20:48:44 +0000 (22:48 +0200)]
qed: Correct rdma params configuration
A previous fix broke RoCE support, as the rdma_pf_params are now
being set into the parameters only after the params are already assigned
into the hw-function.
Fixes: 0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Ram Amrani [Wed, 9 Nov 2016 20:48:43 +0000 (22:48 +0200)]
qed: configure ll2 RoCE v1/v2 flavor correctly
Currently RoCE v2 won't operate with RDMA CM due to missing setting of
the roce-flavour in the ll2 configuration.
This patch properly sets the flavour, and deletes incorrect HSI
that doesn't [yet] exist.
Shawn Lin [Thu, 10 Nov 2016 17:14:37 +0000 (11:14 -0600)]
PCI: rockchip: Add three new resets as required properties
pm_rst, aclk_rst and pclk_rst were controlled by ROM code, so in theory
software did not need to control them again. But that didn't work properly,
so we do need to do it again and add enough delay between the assert of
pm_rst and the deassert of pm_rst. The SoC integrated with this
controller, rk3399, is still under MP test internally, so backward
compatibility won't be a big deal.
ipv4: update comment to document GSO fragmentation cases.
This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.
Chuck Lever [Mon, 7 Nov 2016 21:16:24 +0000 (16:16 -0500)]
xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect
When a LOCALINV WR is flushed, the frmr is marked STALE, then
frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs
are then recovered when frwr_op_map hunts for an INVALID frmr to
use.
All other cases that need frmr recovery leave that SGL DMA-mapped.
The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL.
To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs,
alter the recovery logic (rather than the hot frwr_op_unmap_sync
path) to distinguish among these cases. This solution also takes
care of the case where multiple LOCAL_INV WRs are issued for the
same rpcrdma_req, some complete successfully, but some are flushed.
Johan Hovold [Tue, 8 Nov 2016 12:10:57 +0000 (13:10 +0100)]
USB: cdc-acm: fix TIOCMIWAIT
The TIOCMIWAIT implementation would return -EINVAL if any of the three
supported signals were included in the mask.
Instead of returning an error in case TIOCM_CTS is included, simply
drop the mask check completely, which is in accordance with how other
drivers implement this ioctl.
David Ahern [Wed, 9 Nov 2016 17:07:26 +0000 (09:07 -0800)]
net: tcp response should set oif only if it is L3 master
Lorenzo noted an Android unit test failed due to e0d56fdd7342:
"The expectation in the test was that the RST replying to a SYN sent to a
closed port should be generated with oif=0. In other words it should not
prefer the interface where the SYN came in on, but instead should follow
whatever the routing table says it should do."
Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
that the oif in the flow is set to the skb_iif only if skb_iif is an L3
master.
Allan Chou [Tue, 8 Nov 2016 22:08:01 +0000 (16:08 -0600)]
Net Driver: Add Cypress GX3 VID=04b4 PID=3610.
Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
Bridge Controller (Vendor=04b4 ProdID=3610).
Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
with the Kensington SD4600P USB-C Universal Dock with Power,
which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
Controller.
A similar patch was signed-off and tested-by Allan Chou
<[email protected]> on 2015-12-01.
Allan verified his similar patch on x86 Linux kernel 4.1.6 system
with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.
The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:
1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
support the set element updates from the packet path. From Liping
Zhang.
2) Fix leak when nft_expr_clone() fails, from Liping Zhang.
3) Fix a race when inserting new elements to the set hash from the
packet path, also from Liping.
4) Handle segmented TCP SIP packets properly, basically avoid that the
INVITE in the allow header create bogus expectations by performing
stricter SIP message parsing, from Ulrich Weber.
5) nft_parse_u32_check() should return signed integer for errors, from
John Linville.
6) Fix wrong allocation instead of connlabels, allocate 16 instead of
32 bytes, from Florian Westphal.
7) Fix compilation breakage when building the ip_vs_sync code with
CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.
8) Destroy the new set if the transaction object cannot be allocated,
also from Liping Zhang.
9) Use device to route duplicated packets via nft_dup only when set by
the user, otherwise packets may not follow the right route, again
from Liping.
10) Fix wrong maximum genetlink attribute definition in IPVS, from
WANG Cong.
11) Ignore untracked conntrack objects from xt_connmark, from Florian
Westphal.
12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC
via CT target, otherwise we cannot use the h.245 helper, from
Florian.
13) Revisit garbage collection heuristic in the new workqueue-based
timer approach for conntrack to evict objects earlier, again from
Florian.
14) Fix crash in nf_tables when inserting an element into a verdict map,
from Liping Zhang.
====================
Mathias Krause [Mon, 7 Nov 2016 22:22:19 +0000 (23:22 +0100)]
rtnl: reset calcit fptr in rtnl_unregister()
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.
This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.
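The idea above, clearing every callback slot on unregister so that no stale
function pointer survives, can be sketched like this. It is an illustration
only; the handler table, NR_MSGTYPES, the typedefs and the sample callbacks
are hypothetical and do not mirror the rtnetlink data structures.

```c
#include <stddef.h>

#define NR_MSGTYPES 4	/* hypothetical table size */

typedef int (*doit_fn)(int msg);
typedef int (*dumpit_fn)(int msg);
typedef int (*calcit_fn)(int msg);

struct msg_handler {
	doit_fn doit;
	dumpit_fn dumpit;
	calcit_fn calcit;
};

static struct msg_handler handlers[NR_MSGTYPES];

/* Sample callbacks, present only so the sketch can be exercised. */
static int sample_doit(int msg) { return msg; }
static int sample_calcit(int msg) { return 2 * msg; }

static int handler_register(unsigned int type, doit_fn doit,
			    dumpit_fn dumpit, calcit_fn calcit)
{
	if (type >= NR_MSGTYPES)
		return -1;
	handlers[type].doit = doit;
	handlers[type].dumpit = dumpit;
	handlers[type].calcit = calcit;
	return 0;
}

/* Reset all three pointers, including calcit, so none is left dangling. */
static int handler_unregister(unsigned int type)
{
	if (type >= NR_MSGTYPES)
		return -1;
	handlers[type].doit = NULL;
	handlers[type].dumpit = NULL;
	handlers[type].calcit = NULL;
	return 0;
}
```

After handler_unregister(), a later lookup finds NULL in every slot instead
of a pointer into code that may no longer exist.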
Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.
DRBD as an external module has been around since the kernel 2.4 days.
We used to be compatible with 2.4 and very early 2.6 kernels,
we used to use
rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
tcp_sendmsg() used to totally ignore the size parameter.
 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.
Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time". Apparently that was unlikely enough for most, so this went
unnoticed for years.
Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/
This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.
It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).
It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
we dropped the comment block immediately preceding the kernel_sendmsg().
Arnd Bergmann [Mon, 7 Nov 2016 21:09:07 +0000 (22:09 +0100)]
vxlan: hide unused local variable
A bugfix introduced a harmless warning in v4.9-rc4:
drivers/net/vxlan.c: In function 'vxlan_group_used':
drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]
This hides the variable inside of the same #ifdef that is
around its user. The extraneous initialization, which was
accidentally introduced in the same commit, is removed
at the same time.
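The pattern can be sketched as follows, assuming a hypothetical HAVE_V6 switch in place of the real config option and made-up struct and function names: the variable is declared under the same guard as its only user, so a build without that feature never sees an unused variable.

```c
/* Hypothetical feature switch standing in for the real config option. */
#ifndef HAVE_V6
#define HAVE_V6 0
#endif

struct sock_pair {
	int v4_users;
	int v6_users;
};

int group_used(const struct sock_pair *p, int family_is_v6)
{
	const int *sock4 = &p->v4_users;
#if HAVE_V6
	/* Declared inside the same guard as its only user. */
	const int *sock6 = &p->v6_users;
#endif

	if (!family_is_v6)
		return *sock4 > 0;
#if HAVE_V6
	return *sock6 > 0;
#else
	return 0;	/* no v6 support compiled in */
#endif
}
```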
John Allen [Mon, 7 Nov 2016 20:27:28 +0000 (14:27 -0600)]
ibmvnic: Start completion queue negotiation at server-provided optimum values
Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.
David Ahern [Mon, 7 Nov 2016 20:03:09 +0000 (12:03 -0800)]
net: icmp_route_lookup should use rt dev to determine L3 domain
icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.
Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Wed, 9 Nov 2016 23:45:36 +0000 (18:45 -0500)]
Merge branch 'qcom-emac-pause'
Timur Tabi says:
====================
net: qcom/emac: ensure that pause frames are enabled
The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it. Therefore, we
need to enable flow control and force the phy to pass pause frames.
====================
Timur Tabi [Mon, 7 Nov 2016 16:51:40 +0000 (10:51 -0600)]
net: qcom/emac: configure the external phy to allow pause frames
Pause frames are used to enable flow control. A MAC can send and
receive pause frames in order to throttle traffic. However, the PHY
must be configured to allow those frames to pass through.
Heikki Krogerus [Thu, 3 Nov 2016 14:21:26 +0000 (16:21 +0200)]
ACPI / platform: Add support for build-in properties
We have a couple of drivers, acpi_apd.c and acpi_lpss.c,
that need to pass extra built-in properties to the devices
they create. Previously the drivers added those properties
to the struct device which is a member of struct
acpi_device, but that does not work. Those properties need
to be assigned to the struct device of the platform device
instead in order for them to become available to the
drivers.
To fix this, this patch changes the acpi_create_platform_device()
function to take a struct property_entry pointer as a parameter.
Dave Airlie [Wed, 9 Nov 2016 22:37:52 +0000 (08:37 +1000)]
Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
3 more amdgpu fixes.
* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: return false instead of -EINVAL
drm/amdgpu/powerplay/smu7: fix unintialized data usage
drm/amdgpu: fix crash in acp_hw_fini
Dave Airlie [Wed, 9 Nov 2016 22:37:01 +0000 (08:37 +1000)]
Merge tag 'drm-intel-fixes-2016-11-09' of git://anongit.freedesktop.org/drm-intel into drm-fixes
i915 fixes, include Sandybridge rendering regression fix.
* tag 'drm-intel-fixes-2016-11-09' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Limit Valleyview and earlier to only using mappable scanout
drm/i915: Round tile chunks up for constructing partial VMAs
drm/i915/dp: Extend BDW DP audio workaround to GEN9 platforms
drm/i915/dp: BDW cdclk fix for DP audio
drm/i915/vlv: Prevent enabling hpd polling in late suspend
drm/i915: Respect alternate_ddc_pin for all DDI ports
Thomas Gleixner [Wed, 9 Nov 2016 15:35:51 +0000 (16:35 +0100)]
x86/cpu: Deal with broken firmware (VMWare/XEN)
Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.
The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.
There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.
The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used for printout purposes.
If the physical package IDs differ, yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.
Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:
[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....
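The sanitizing step described above could look roughly like this in isolation. This is only a sketch: struct cpu_ids and sanitize_ids() are hypothetical names, and the firmware values are passed in directly instead of being read from the ACPI/MP tables.

```c
#include <stdio.h>

/* Hypothetical per-CPU record of the ids obtained from CPUID. */
struct cpu_ids {
	unsigned int initial_apicid;
	unsigned int phys_pkg_id;
};

/* Compare the CPUID-derived ids with the firmware ones, complain on a
 * mismatch and fall back to the firmware values. Returns the number of
 * fields that had to be fixed up (0, 1 or 2). */
int sanitize_ids(int cpu, struct cpu_ids *c,
		 unsigned int fw_apicid, unsigned int fw_pkg_id)
{
	int fixed = 0;

	if (c->initial_apicid != fw_apicid) {
		fprintf(stderr,
			"[Firmware Bug]: CPU%d: APIC id mismatch. Firmware: %u CPUID: %u\n",
			cpu, fw_apicid, c->initial_apicid);
		c->initial_apicid = fw_apicid;
		fixed++;
	}
	if (c->phys_pkg_id != fw_pkg_id) {
		fprintf(stderr,
			"[Firmware Bug]: CPU%d: Using firmware package id %u instead of %u\n",
			cpu, fw_pkg_id, c->phys_pkg_id);
		c->phys_pkg_id = fw_pkg_id;
		fixed++;
	}
	return fixed;
}
```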
Linus Torvalds [Wed, 9 Nov 2016 19:39:02 +0000 (11:39 -0800)]
Merge tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a largish pull-request, as we've got a bunch of pending
ASoC fixes at this time. One noticeable change is the removal of error
directive in uapi/sound/asoc.h. We found that the API has been already
used on Chromebooks, so we need to support it even now.
A slight big LOC is found in Qualcomm lpass driver, but the rest are
all small and easy fixes for ASoC drivers (sti, sun4i, Realtek codecs,
Intel, tas571x, etc) in addition to the patches to harden the ALSA
core proc file accesses"
* tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
ALSA: info: Return error for invalid read/write
ALSA: info: Limit the proc text input size
ASoC: samsung: spdif: Fix DMA filter initialization
ASoC: sun4i-codec: Enable bus clock after getting GPIO
ASoC: lpass-cpu: add module licence and description
ASoC: lpass-platform: Fix broken pcm data usage
ASoC: sun4i-codec: return error code instead of NULL when create_card fails
ASoC: hdmi-codec: Fix hdmi_of_xlate_dai_name when #sound-dai-cells = <0>
ASoC: samsung: get access to DMA engine early to defer probe properly
ASoC: da7219: Connect output enable register to DAIOUT
ASoC: Intel: Skylake: Fix to turn off hdmi power on probe failure
ASoC: sti-sas: enable fast io for regmap
ASoC: sti: fix channel status update after playback start
ASoC: PXA: Brownstone needs I2C
ASoC: Intel: Skylake: Always acquire runtime pm ref on unload
ASoC: Intel: Atom: add terminate entry for dmi_system_id tables
ASoC: rt298: fix jack type detect error
ASoC: rt5663: fix a debug statement
ASoC: cs4270: fix DAPM stream name mismatch
ASoC: Intel: haswell depends on sst-firmware
...
Linus Torvalds [Wed, 9 Nov 2016 19:36:43 +0000 (11:36 -0800)]
Merge tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fix from Mike Marshall:
"We recently refactored the Orangefs debugfs code. The refactor seemed
to trigger [email protected]'s static tester to find a possible
double-free in the code.
While designing the fix we saw a condition under which the buffer
being freed could also be overflowed.
We also realized how to rebuild the related debugfs file's "contents"
(a string) without deleting and re-creating the file.
This fix should eliminate the possible double-free, the potential
overflow and improve code readability"
* tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: clean up debugfs
Linus Torvalds [Wed, 9 Nov 2016 19:09:40 +0000 (11:09 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Two bug fixes
- a memory alignment fix in the s390 only hypfs code
- a fix for the generic percpu code that caused ftrace to break on
s390. This is not relevant for x86 but for all architectures that
use the generic percpu code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
percpu: use notrace variant of preempt_disable/preempt_enable
s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment
Rafał Miłecki [Mon, 7 Nov 2016 12:53:27 +0000 (13:53 +0100)]
net: bgmac: fix reversed checks for clock control flag
This fixes a regression introduced by the patch adding feature flags. It was
already reported and a fix followed (and was accepted), but that fix appears
to have been incorrect: instead of fixing the reversed condition it broke a
good one.
This patch was verified to actually fix SoC hangs caused by bgmac on
BCM47186B0.
Fixes: db791eb2970b ("net: ethernet: bgmac: convert to feature flags")
Fixes: 4af1474e6198 ("net: bgmac: Fix errant feature flag check")
Cc: Jon Mason <[email protected]>
Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
We received two reports of BUG_ON in bnad_txcmpl_process() where
hw_consumer_index appeared to be ahead of producer_index. Out of order
write/read of these variables could explain these reports.
bnad_start_xmit(), as a producer of tx descriptors, has a few memory
barriers sprinkled around writes to producer_index and the device's
doorbell but they're not paired with anything in bnad_txcmpl_process(), a
consumer.
Since we are synchronizing with a device, we must use mandatory barriers,
not smp_*. Also, I didn't see the purpose of the last smp_mb() in
bnad_start_xmit().
net-ipv6: on device mtu change do not add mtu to mtu-less routes
Routes can specify an mtu explicitly or inherit the mtu from
the underlying device - this inheritance is implemented in
dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().
Currently changing the mtu of a device adds mtu explicitly
to routes using that device.
i.e.
# ip link set dev lo mtu 65536
# ip -6 route add local 2000::1 dev lo
# ip -6 route get 2000::1
local 2000::1 dev lo table local src ... metric 1024 pref medium
# ip link set dev lo mtu 65535
# ip -6 route get 2000::1
local 2000::1 dev lo table local src ... metric 1024 mtu 65535 pref medium
# ip link set dev lo mtu 65536
# ip -6 route get 2000::1
local 2000::1 dev lo table local src ... metric 1024 mtu 65536 pref medium
# ip -6 route del local 2000::1
After this patch the route entry no longer changes unless it already has an mtu.
There is no need: this inheritance is already done in ip6_mtu()
# ip link set dev lo mtu 65536
# ip -6 route add local 2000::1 dev lo
# ip -6 route add local 2000::2 dev lo mtu 2000
# ip -6 route get 2000::1; ip -6 route get 2000::2
local 2000::1 dev lo table local src ... metric 1024 pref medium
local 2000::2 dev lo table local src ... metric 1024 mtu 2000 pref medium
# ip link set dev lo mtu 65535
# ip -6 route get 2000::1; ip -6 route get 2000::2
local 2000::1 dev lo table local src ... metric 1024 pref medium
local 2000::2 dev lo table local src ... metric 1024 mtu 2000 pref medium
# ip link set dev lo mtu 1501
# ip -6 route get 2000::1; ip -6 route get 2000::2
local 2000::1 dev lo table local src ... metric 1024 pref medium
local 2000::2 dev lo table local src ... metric 1024 mtu 1501 pref medium
# ip link set dev lo mtu 65536
# ip -6 route get 2000::1; ip -6 route get 2000::2
local 2000::1 dev lo table local src ... metric 1024 pref medium
local 2000::2 dev lo table local src ... metric 1024 mtu 65536 pref medium
# ip -6 route del local 2000::1
# ip -6 route del local 2000::2
This is desirable because changing device mtu and then resetting it
to the previous value shouldn't change the user visible routing table.
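A simplified model that reproduces the second transcript above. The struct and function names are hypothetical, and the exact update rule (in particular the "route mtu equals the old device mtu" case) is an assumption inferred from the 1501 -> 65536 step, not a copy of the kernel code.

```c
struct route {
	unsigned int mtu;	/* 0 means no explicit mtu: inherit from device */
};

/* Effective mtu, in the spirit of the inheritance done by ip6_mtu(). */
unsigned int route_effective_mtu(const struct route *rt, unsigned int dev_mtu)
{
	return rt->mtu ? rt->mtu : dev_mtu;
}

/* Model of a device mtu change from old_dev_mtu to new_dev_mtu. */
void route_dev_mtu_change(struct route *rt, unsigned int old_dev_mtu,
			  unsigned int new_dev_mtu)
{
	if (!rt->mtu)
		return;		/* mtu-less route: never gains an explicit mtu */

	/* Shrink routes that no longer fit, and let routes that were pinned
	 * at the old device mtu follow the device (assumed rule). */
	if (rt->mtu >= new_dev_mtu || rt->mtu == old_dev_mtu)
		rt->mtu = new_dev_mtu;
}
```

Walking the model through 65536 -> 65535 -> 1501 -> 65536 leaves the mtu-less route untouched, while the route created with mtu 2000 goes 2000, 1501, 65536, as in the transcript.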
Do not send the next message in sendmmsg for partial sendmsg
invocations.
sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.
For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.
Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.
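The intended control flow can be modelled in userspace like this. All names here are hypothetical, and counting the partially sent message in the return value is an assumption of the sketch; the point is only that the loop must not start the next message after a short send.

```c
#include <stddef.h>

struct msg {
	const char *data;
	size_t len;
	size_t sent;	/* filled in: bytes actually sent */
};

/* Hypothetical single-message sender: bytes sent, or < 0 on error. */
typedef long (*send_one_fn)(const char *data, size_t len);

/* Returns the number of messages handed to the stream (a final partial
 * one included), or a negative error if the very first send fails. */
int send_many(send_one_fn send_one, struct msg *msgs, int n)
{
	int done = 0;

	for (int i = 0; i < n; i++) {
		long rv = send_one(msgs[i].data, msgs[i].len);

		if (rv < 0)
			return done ? done : (int)rv;
		msgs[i].sent = (size_t)rv;
		done++;
		if ((size_t)rv < msgs[i].len)
			break;	/* partial send: do not start the next message */
	}
	return done;
}

/* Mock stream: the first call sends one byte, later calls send everything. */
static int mock_calls;

long mock_send(const char *data, size_t len)
{
	(void)data;
	if (mock_calls++ == 0 && len > 0)
		return 1;
	return (long)len;
}
```

With the two messages "abcd" and "efgh", the mock sends only "a" and the loop stops, so the stream never becomes "aefgh".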
Gao Feng [Fri, 4 Nov 2016 02:28:49 +0000 (10:28 +0800)]
driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.
When there is no existing macvlan port in lowdev, a new macvlan port
is created. But it is not destroyed when something fails later,
which causes a memory leak.
Now add a flag to indicate whether a new macvlan port was created.
Sumit Saxena [Wed, 9 Nov 2016 10:59:42 +0000 (02:59 -0800)]
scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
This patch fixes a regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").
The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).
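A generic illustration of this class of bug, using hypothetical macro names and a made-up channel limit rather than the driver's actual definitions: without the outer parentheses, the conditional operator absorbs whatever expression surrounds the macro.

```c
#define MAX_PD_CHANNELS 2	/* hypothetical channel split */

/* Unparenthesized: the ?: leaks into the surrounding expression. */
#define IS_LOGICAL_BAD(ch)  (ch) < MAX_PD_CHANNELS ? 0 : 1
/* Fully parenthesized: behaves like a single value. */
#define IS_LOGICAL_GOOD(ch) (((ch) < MAX_PD_CHANNELS) ? 0 : 1)

int check_bad(int enabled, int ch)
{
	/* Parses as (enabled && ch < MAX_PD_CHANNELS) ? 0 : 1 */
	return enabled && IS_LOGICAL_BAD(ch);
}

int check_good(int enabled, int ch)
{
	/* Parses as enabled && (ch < MAX_PD_CHANNELS ? 0 : 1) */
	return enabled && IS_LOGICAL_GOOD(ch);
}
```

With enabled == 0 the good variant always yields 0, while the bad variant yields 1 regardless of the channel.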
Yazen Ghannam [Tue, 8 Nov 2016 08:35:06 +0000 (09:35 +0100)]
x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.
For example, the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.
Other things like EDAC use cpu_llc_id so they will be b0rked too.
So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.
The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.
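In isolation, the derivation described above amounts to a single shift. The helper names and the CCX_SHIFT constant below are made up for the sketch; only the shift by 3 comes from the text.

```c
/* Number of low APIC ID bits below the Core Complex ID, per the text. */
#define CCX_SHIFT 3

unsigned int llc_id_from_apicid(unsigned int apicid)
{
	return apicid >> CCX_SHIFT;
}

/* Two CPUs share a last level cache iff their derived ids match. */
int shares_llc(unsigned int apicid_a, unsigned int apicid_b)
{
	return llc_id_from_apicid(apicid_a) == llc_id_from_apicid(apicid_b);
}
```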
Namhyung Kim [Tue, 8 Nov 2016 13:08:33 +0000 (22:08 +0900)]
perf hists: Fix column length on --hierarchy
Markus reported that there's a weird behavior on perf top --hierarchy
regarding the column length.
Looking at the code, I found some dubious code which causes the symptoms.
When the --hierarchy option is used, the last column length might be
inaccurate since it skips updating the length on leaf entries.
I cannot remember why it did that; it looks like a leftover from a previous
version during development.
Anyway, updating the column length often is not harmful. So let's move
the code out.
Namhyung Kim [Tue, 8 Nov 2016 13:08:32 +0000 (22:08 +0900)]
perf hists browser: Fix column indentation on --hierarchy
When horizontal scrolling is used in hierarchy mode, the rightmost
column has unnecessary indentation. Actually it's needed only if some
of the left (overhead) columns are shown.
Namhyung Kim [Tue, 8 Nov 2016 13:08:31 +0000 (22:08 +0900)]
perf hists browser: Show folded sign properly on --hierarchy
When horizontal scrolling is used in hierarchy mode, the folded sign
disappears at the rightmost column.
Committer note:
To test it, run 'perf top --hierarchy' and see the '+' symbol at the first
column. Then press the right arrow key: the '+' symbol will disappear.
This patch fixes that.
Lucas Stach [Tue, 8 Nov 2016 16:04:10 +0000 (17:04 +0100)]
drm/imx: disable planes before DC
If the DC clock is disabled before the attached IDMACs are properly
stopped the IDMACs may hang the IPU or even the whole system.
Make sure the IDMACs are in safe state by disabling the planes before
removal of the DC clock.
Also set the atomic parameter to false to stop calling the atomic_begin
hook, which does nothing useful as we immediately afterwards turn off
vblank interrupts and possibly send the pending vblank event.
Fixes: 33f14235302f (drm/imx: atomic phase 1: Use transitional atomic CRTC and plane helpers)
Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>
scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove
If a command is aborted in the kernel but not in the adapter, it might be
considered complete and its DMA memory released, but it is still alive in
the adapter, which will trigger an invalid DMA access upon its completion
(in the DMA operations to deliver the command response to the driver).
On powerpc platforms with IOMMU/EEH capabilities, the problem is observed
during PCI device removal with ongoing IO requests -- which might trigger
an EEH event very often, pointing to a 'TCE Request Page Access Error'.
In that path, which is qla2x00_remove_one(), the commands are aborted in
qla2x00_abort_all_cmds(), which does not perform an abort in the adapter
as is done in qla2xxx_eh_abort() for example.
So, this patch changes qla2x00_abort_all_cmds() to abort commands in the
adapter too, with a call to qla2xxx_eh_abort(), which already implements
all the logic to submit abort requests and handle responses.
scsi: qla2xxx: do not queue commands when unloading
When the driver is unloading, in qla2x00_remove_one(), there is a single
call/point in time to abort ongoing commands, qla2x00_abort_all_cmds(),
which is still several steps away from the call to scsi_remove_host().
If more commands continue to arrive and be processed during that
interval, when the driver is tearing down and releasing its structures,
it might potentially hit an oops due to invalid memory access:
Unable to handle kernel paging request for data at address 0x00000138
<...>
NIP [d000000004700a40] qla2xxx_queuecommand+0x80/0x3f0 [qla2xxx]
LR [d000000004700a10] qla2xxx_queuecommand+0x50/0x3f0 [qla2xxx]
So, fail commands in qla2xxx_queuecommand() if the UNLOADING bit is set.
The task data is memset to zero before task_release_itt() is called, so
the DDP context information is lost, resulting in incorrect DDP
resource cleanup. To fix this, call task_release_itt() before the memset.
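The ordering issue in miniature, with hypothetical types and helpers (struct task_data, release_ddp() and task_cleanup() are not the driver's names): the release step has to run while the context it depends on is still intact.

```c
#include <string.h>

/* Hypothetical per-task state; ddp_handle stands in for the DDP context. */
struct task_data {
	int ddp_handle;
	int in_use;
};

static int last_released;	/* records what the release step saw */

void release_ddp(struct task_data *t)
{
	if (t->ddp_handle)
		last_released = t->ddp_handle;
}

void task_cleanup(struct task_data *t)
{
	release_ddp(t);			/* needs the context still stored in *t */
	memset(t, 0, sizeof(*t));	/* only now wipe the task data */
}
```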
Liping Zhang [Sun, 6 Nov 2016 06:40:01 +0000 (14:40 +0800)]
netfilter: nf_tables: fix oops when inserting an element into a verdict map
Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
    map sourcemap {
        type ipv4_addr : verdict;
    }
    chain postrouting {
        ip saddr vmap @sourcemap accept
    }
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----
Nicolas Dichtel says:
After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
remove timed-out entries"), netlink conntrack deletion events may be
sent with a huge delay.
and indeed, this isn't optimal at all. Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge. But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.
We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).
So, after some discussion with Nicolas:
1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)
2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries
3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead
4. want a reasonably short time to detect a timed-out entry when the
system goes idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.
The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.
Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').
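The adaptive delay from point 4 can be sketched as below. The constants and the function name are hypothetical, and the exact step and halving rule are assumptions chosen for illustration rather than the values used by the conntrack gc worker.

```c
#define GC_DELAY_MAX  1000u	/* hypothetical upper bound, in ticks */
#define GC_DELAY_STEP 1u	/* hypothetical growth per idle scan */

/* Given the current delay and the number of entries evicted by the scan
 * that just finished, return the delay before the next partial scan. */
unsigned int gc_next_delay(unsigned int cur, unsigned int evicted)
{
	if (evicted)
		return cur / 2;		/* stale entries seen: scan sooner */
	if (cur + GC_DELAY_STEP > GC_DELAY_MAX)
		return GC_DELAY_MAX;	/* idle: back off, but keep the bound */
	return cur + GC_DELAY_STEP;
}
```

Repeated evictions drive the delay towards zero (a scan on every tick), while an idle table slowly backs off to the maximum.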
Florian Westphal [Sat, 29 Oct 2016 01:01:50 +0000 (03:01 +0200)]
netfilter: connmark: ignore skbs with magic untracked conntrack objects
The (percpu) untracked conntrack entries can end up with nonzero connmarks.
The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).
Untracked conntrack objects are an implementation detail; we might as well
use an invalid magic address instead to tell INVALID and UNTRACKED apart.
Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.
Bjorn Helgaas [Tue, 8 Nov 2016 20:25:24 +0000 (14:25 -0600)]
PCI: Don't attempt to claim shadow copies of ROM
If we're using a shadow copy of a PCI device ROM, the shadow copy is in RAM
and the device never sees accesses to it and doesn't respond to it. We
don't have to route the shadow range to the PCI device, and the device
doesn't have to claim the range.
Previously we treated the shadow copy as though it were the ROM BAR, and we
failed to claim it because the region wasn't routed to the device:
pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
pci_bus 0000:01: Allocating resources
pci 0000:01:00.0: can't claim BAR 6 [mem 0x000c0000-0x000dffff]: no compatible bridge window
The failure path of pcibios_allocate_dev_rom_resource() cleared out the
resource start address, which also caused the following ioremap() warning:
WARNING: CPU: 0 PID: 116 at /build/linux-akdJXO/linux-4.8.0/arch/x86/mm/ioremap.c:121 __ioremap_caller+0x1ec/0x370
ioremap on RAM at 0x0000000000000000 - 0x000000000001ffff
Handle an option ROM shadow copy as RAM, without trying to insert it into
the iomem resource tree.
This fixes a regression caused by 0c0e0736acad ("PCI: Set ROM shadow
location in arch code, not in PCI core"), which appeared in v4.6. The
regression causes video device initialization to fail. This was reported
on AMD Turks, but it likely affects others as well.
Yuriy Kolerov [Tue, 8 Nov 2016 07:08:32 +0000 (10:08 +0300)]
ARCv2: MCIP: Use IDU_M_DISTRI_DEST mode if there is only 1 destination core
ARC linux uses 2 distribution modes for common interrupts: round robin
mode (IDU_M_DISTRI_RR) and a simple destination mode (IDU_M_DISTRI_DEST).
The first one is used when more than 1 core may handle a common interrupt
and the second one is used when only 1 core may handle a common interrupt.
However, idu_irq_set_affinity() always sets IDU_M_DISTRI_RR for all affinity
values, even though there is no sense in setting that mode if only 1 core
must handle a common interrupt.
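The selection can be sketched as a pure function over an affinity bitmask. The enum, its values and the function name are hypothetical stand-ins for the IDU_M_DISTRI_* modes, and treating an empty mask as round robin is an arbitrary choice of the sketch.

```c
/* Hypothetical stand-ins for IDU_M_DISTRI_RR and IDU_M_DISTRI_DEST. */
enum distri_mode {
	DISTRI_RR,	/* round robin among several cores */
	DISTRI_DEST	/* single destination core */
};

enum distri_mode pick_distri_mode(unsigned int cpu_mask)
{
	/* Exactly one bit set: the mask is non-zero and clearing its
	 * lowest set bit leaves nothing behind. */
	if (cpu_mask && !(cpu_mask & (cpu_mask - 1)))
		return DISTRI_DEST;
	return DISTRI_RR;
}
```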
Yuriy Kolerov [Tue, 8 Nov 2016 07:08:31 +0000 (10:08 +0300)]
ARC: IRQ: Do not use hwirq as virq and vice versa
This came up when reviewing code to address missing IRQ affinity
setting in AXS103 platform and/or implementing hierarchical IRQ domains
- smp_ipi_irq_setup() callers pass hwirq but it in turn calls
request_percpu_irq() which expects a linux virq. So invoke
irq_find_mapping() to do the conversion
(also make this explicit in the code by renaming the args appropriately)
- idu_of_init()/idu_cascade_isr() were similarly using linux virq where
hwirq is expected, so do the conversion using irqd_to_hwirq() helper
Signed-off-by: Yuriy Kolerov <[email protected]>
[vgupta: made changelog a bit more concise]
Signed-off-by: Vineet Gupta <[email protected]>
Linus Torvalds [Tue, 8 Nov 2016 18:07:13 +0000 (10:07 -0800)]
Merge tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- Four patches from Robin Murphy fix several issues with the recently
merged generic DT-bindings support for arm-smmu drivers
- A fix for a dead-lock issue in the VT-d driver, which shows up on
iommu hotplug
* tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
iommu/arm-smmu: Fix out-of-bounds dereference
iommu/arm-smmu: Check that iommu_fwspecs are ours
iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s
iommu/arm-smmu: Work around ARM DMA configuration
Noam Camus [Tue, 8 Nov 2016 09:58:23 +0000 (11:58 +0200)]
ARC: [plat-eznps] remove IPI clear from SMP operations
Today we register the plat_smp_ops.clear() method, which actually
acks the IPI.
However, this is already taken care of by our irqchip driver, specifically
by the irq_chip.irq_eoi() method.
That is exactly the point where it should be done, so no special handling
is needed in plat_smp_ops.clear().