Mika Kuoppala [Tue, 7 Jun 2016 14:19:11 +0000 (17:19 +0300)]
drm/i915/kbl: Add WaDisableGafsUnitClkGating
We need to disable clock gating in this unit to work around
hardware issue causing possible corruption/hang.
v2: name the bit (Ville)
v3: leave the fix enabled for 2227050 and set correct bit (Matthew)
v4: Split out the skl part in separate commit for easier backport
Mika Kuoppala [Tue, 7 Jun 2016 14:18:58 +0000 (17:18 +0300)]
drm/i915: Mimic skl with WaForceEnableNonCoherent
Past evidence with system hangs and hsds tie
WaForceEnableNonCoherent and WaDisableHDCInvalidation to
WaForceContextSaveRestoreNonCoherent. Documentation
states that WaForceContextSaveRestoreNonCoherent would
not be needed on skl past E0 but evidence proved otherwise. See
commit <510650e8b2ab> ("drm/i915/skl: Fix spurious gpu hang with gt3/gt4
revs"). In this scope consider kbl to be skl with a bigger revision than
E0 so play it safe and bind these two workarounds to the
WaForceContextSaveRestoreNonCoherent, and apply to all gen9.
* emailed patches from Andrew Morton <[email protected]>:
m32r: fix build warning about putc
mm: workingset: printk missing log level, use pr_info()
mm: thp: refix false positive BUG in page_move_anon_rmap()
mm: rmap: call page_check_address() with sync enabled to avoid racy check
mm: thp: move pmd check inside ptl for freeze_page()
vmlinux.lds: account for destructor sections
gcov: add support for gcc version >= 6
mm, meminit: ensure node is online before checking whether pages are uninitialised
mm, meminit: always return a valid node from early_pfn_to_nid
kasan/quarantine: fix bugs on qlist_move_cache()
uapi: export lirc.h header
madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
Revert "scripts/gdb: add documentation example for radix tree"
Revert "scripts/gdb: add a Radix Tree Parser"
scripts/gdb: Perform path expansion to lx-symbol's arguments
scripts/gdb: add constants.py to .gitignore
scripts/gdb: rebuild constants.py on dependancy change
scripts/gdb: silence 'nothing to do' message
kasan: add newline to messages
mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Round three of 4.7 rc fixes:
- two fixes for hfi1
- two fixes for i40iw
- one omission correction in the port table counter arrays"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
i40iw: Enable remote access rights for stag allocation
i40iw: do not print unitialized variables in error message
IB core: Add port_xmit_wait counter
IB/hfi1: Fix sleep inside atomic issue in init_asic_data
IB/hfi1: Correct issues with sc5 computation
i40e: use valid online CPU on q_vector initialization
Currently, the q_vector initialization routine sets the affinity_mask
of a q_vector based on v_idx value. Meaning a loop iterates on v_idx,
which is an incremental value, and the cpumask is created based on
this value.
This is a problem in systems with multiple logical CPUs per core (like in
SMT scenarios). If we disable some logical CPUs, by turning SMT off for
example, we will end up with a sparse cpu_online_mask, i.e., only the first
CPU in a core is online, and incremental filling in q_vector cpumask might
lead to multiple offline CPUs being assigned to q_vectors.
Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.
In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1) where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.
This patch changes the way q_vector's affinity_mask is created: it iterates
on v_idx, but consumes the CPU index from the cpu_online_mask instead of
just using the v_idx incremental value.
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Four driver bugfixes for the I2C subsystem"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mux: reg: wrong condition checked for of_address_to_resource return value
i2c: tegra: Correct error path in probe
i2c: remove __init from i2c_register_board_info()
i2c: qup: Fix wrong value of index variable
Paolo Abeni [Wed, 15 Jun 2016 13:37:59 +0000 (15:37 +0200)]
ixgbe: napi_poll must return the work done
Currently the function ixgbe_poll() returns 0 when it clean completely
the rx rings, but this foul budget accounting in core code.
Fix this returning the actual work done, capped to weight - 1, since
the core doesn't allow to return the full budget when the driver modifies
the napi status
Alexander Duyck [Tue, 14 Jun 2016 22:45:42 +0000 (15:45 -0700)]
i40e/i40evf: Fix i40e_rx_checksum
There are a couple of issues I found in i40e_rx_checksum while doing some
recent testing. As a result I have found the Rx checksum logic is pretty
much broken and returning that the checksum is valid for tunnels in cases
where it is not.
First the inner types are not the correct values to use to test for if a
tunnel is present or not. In addition the inner protocol types are not a
bitmask as such performing an OR of the values doesn't make sense. I have
instead changed the code so that the inner protocol types are used to
determine if we report CHECKSUM_UNNECESSARY or not. For anything that does
not end in UDP, TCP, or SCTP it doesn't make much sense to report a
checksum offload since it won't contain a checksum anyway.
This leaves us with the need to set the csum_level based on some value.
For that purpose I am using the tunnel_type field. If the tunnel type is
GRENAT or greater then this means we have a GRE or UDP tunnel with an inner
header. In the case of GRE or UDP we will have a possible checksum present
so for this reason it should be safe to set the csum_level to 1 to indicate
that we are reporting the state of the inner header.
Merge tag 'drm-fixes-for-v4.7-rc8-vmware' of git://people.freedesktop.org/~airlied/linux
Pull drm vmware fixes from Dave Airlie:
"These are some fixes for the vmware graphics driver, that fix some
black screen issues on at least Ubuntu 16.04, I think VMware would
like to get these in so stable can pick them up ASAP"
* tag 'drm-fixes-for-v4.7-rc8-vmware' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Fix error paths when mapping framebuffer
drm/vmwgfx: Fix corner case screen target management
drm/vmwgfx: Delay pinning fbdev framebuffer until after mode set
drm/vmwgfx: Check pin count before attempting to move a buffer
drm/ttm: Make ttm_bo_mem_compat available
drm/vmwgfx: Add an option to change assumed FB bpp
drm/vmwgfx: Work around mode set failure in 2D VMs
drm/vmwgfx: Add a check to handle host message failure
Merge tag 'drm-fixes-for-v4.7-rc8' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"These are just some i915 and amdgpu fixes that shows up, the amdgpu
ones are polaris fixes, and the i915 one is a major regression fix"
* tag 'drm-fixes-for-v4.7-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: fix power distribution issue for Polaris10 XT
drm/amdgpu: Add a missing register to Polaris golden setting
drm/i915: Ignore panel type from OpRegion on SKL
drm/i915: Update ifdeffery for mutex->owner
mm: thp: refix false positive BUG in page_move_anon_rmap()
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
worth: the syzkaller fuzzer hit it again. It's still wrong for some THP
cases, because linear_page_index() was never intended to apply to
addresses before the start of a vma.
That's easily fixed with a signed long cast inside linear_page_index();
and Dmitry has tested such a patch, to verify the false positive. But
why extend linear_page_index() just for this case? when the avoidance in
page_move_anon_rmap() has already grown ugly, and there's no reason for
the check at all (nothing else there is using address or index).
And one more thing: should the compound_head(page) be done inside or
outside page_move_anon_rmap()? It's usually pushed down to the lowest
level nowadays (and mm/memory.c shows no other explicit use of it), so I
think it's better done in page_move_anon_rmap() than by caller.
mm: rmap: call page_check_address() with sync enabled to avoid racy check
The previous patch addresses the race between split_huge_pmd_address()
and someone changing the pmd. The fix is only for splitting of normal
thp (i.e. pmd-mapped thp,) and for splitting of pte-mapped thp there
still is the similar race.
For splitting pte-mapped thp, the pte's conversion is done by
try_to_unmap_one(TTU_MIGRATION). This function checks
page_check_address() to get the target pte, but it can return NULL under
some race, leading to VM_BUG_ON() in freeze_page(). Fortunately,
page_check_address() already has an argument to decide whether we do a
quick/racy check or not, so let's flip it when called from
freeze_page().
mm: thp: move pmd check inside ptl for freeze_page()
I found a race condition triggering VM_BUG_ON() in freeze_page(), when
running a testcase with 3 processes:
- process 1: keep writing thp,
- process 2: keep clearing soft-dirty bits from virtual address of process 1
- process 3: call migratepages for process 1,
I'm not sure of the full scenario of the reproduction, but my debug
showed that split_huge_pmd_address(freeze=true) returned without running
main code of pmd splitting because pmd_present(*pmd) in precheck somehow
returned 0. If this happens, the subsequent try_to_unmap() fails and
returns non-zero (because page_mapcount() still > 0), and finally
VM_BUG_ON() fires. This patch tries to fix it by prechecking pmd state
inside ptl.
If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or gold linker is used, gcc emits
.ctors/.dtors and .text.startup/.text.exit sections instead of
.init_array/.fini_array. .dtors section is not explicitly accounted in
the linker script and messes vvar/percpu layout.
mm, meminit: ensure node is online before checking whether pages are uninitialised
early_page_uninitialised looks up an arbitrary PFN. While a machine
without node 0 will boot with "mm, page_alloc: Always return a valid
node from early_pfn_to_nid", it works because it assumes that nodes are
always in PFN order. This is not guaranteed so this patch adds
robustness by always checking if the node being checked is online.
mm, meminit: always return a valid node from early_pfn_to_nid
early_pfn_to_nid can return node 0 if a PFN is invalid on machines that
has no node 0. A machine with only node 1 was observed to crash with
the following message:
The problem is that early_page_uninitialised uses the early_pfn_to_nid
helper which returns node 0 for invalid PFNs. No caller of
early_pfn_to_nid cares except early_page_uninitialised. This patch has
early_pfn_to_nid always return a valid node.
Joonsoo Kim [Thu, 14 Jul 2016 19:07:17 +0000 (12:07 -0700)]
kasan/quarantine: fix bugs on qlist_move_cache()
There are two bugs on qlist_move_cache(). One is that qlist's tail
isn't set properly. curr->next can be NULL since it is singly linked
list and NULL value on tail is invalid if there is one item on qlist.
Another one is that if cache is matched, qlist_put() is called and it
will set curr->next to NULL. It would cause to stop the loop
prematurely.
These problems come from complicated implementation so I'd like to
re-implement it completely. Implementation in this patch is really
simple. Iterate all qlist_nodes and put them to appropriate list.
Unfortunately, I got this bug sometime ago and lose oops message. But,
the bug looks trivial and no need to attach oops.
This is a fixup for commit b7be755733dc ("[media] bz#75751: Move
internal header file lirc.h to uapi/"). It moved the header to the
right place, but it forgot to add it at Kbuild. So, despite being at
uapi, it is not copied to the right place.
madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
madvise_free_huge_pmd should return 0 if the fallback PTE operations are
required. In madvise_free_huge_pmd, if part pages of THP are discarded,
the THP will be split and fallback PTE operations should be used if
splitting succeeds. But the original code will make fallback PTE
operations skipped, after splitting succeeds. Fix that via make
madvise_free_huge_pmd return 0 after splitting successfully, so that the
fallback PTE operations will be done.
Revert "scripts/gdb: add documentation example for radix tree"
This reverts commit 9b5580359a84 ("scripts/gdb: add documentation
example for radix tree")
The python implementation of radix tree was merged at the same time as a
refactoring of the radix tree implementation and doesn't work. The
feature is being reverted, thus we revert the documentation as well.
This reverts commit e127a73d41ac ("scripts/gdb: add a Radix Tree
Parser")
The python implementation of radix-tree was merged at the same time as
the radix-tree system was heavily reworked from commit e9256efcc8e3
("radix-tree: introduce radix_tree_empty") to 3bcadd6fa6c4 ("radix-tree:
free up the bottom bit of exceptional entries for reuse") and no longer
functions, but also prevents other gdb scripts from loading.
This functionality has not yet hit a release, so simply remove it for
now
Nikolay Borisov [Thu, 14 Jul 2016 19:07:04 +0000 (12:07 -0700)]
scripts/gdb: Perform path expansion to lx-symbol's arguments
Python doesn't do automatic expansion of paths. In case one passes path
of the from ~/foo/bar the gdb scripts won't automatically expand that
and as a result the symbols files won't be loaded.
Fix this by explicitly expanding all paths which begin with "~"
scripts/gdb: rebuild constants.py on dependancy change
The autogenerated constants.py file was only being built on the initial
call, and if the constants.py.in file changed. As we are utilising the
CPP hooks, we can successfully use the call if_changed_dep rules to
determine when to rebuild the file based on it's inclusions.
The constants.py generation, involves a rule to link into the main
makefile. This rule has no command and generates a spurious warning
message in the build logs when CONFIG_SCRIPTS_GDB is enabled.
David Rientjes [Thu, 14 Jul 2016 19:06:50 +0000 (12:06 -0700)]
mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
It's possible to isolate some freepages in a pageblock and then fail
split_free_page() due to the low watermark check. In this case, we hit
VM_BUG_ON() because the freeing scanner terminated early without a
contended lock or enough freepages.
This should never have been a VM_BUG_ON() since it's not a fatal
condition. It should have been a VM_WARN_ON() at best, or even handled
gracefully.
Regardless, we need to terminate anytime the full pageblock scan was not
done. The logic belongs in isolate_freepages_block(), so handle its
state gracefully by terminating the pageblock loop and making a note to
restart at the same pageblock next time since it was not possible to
complete the scan this time.
Dave Airlie [Fri, 15 Jul 2016 03:51:55 +0000 (13:51 +1000)]
Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes
A bunch of vmwgfx fixes that fix a black screen issue on latest distros/hw combos.
* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Fix error paths when mapping framebuffer
drm/vmwgfx: Fix corner case screen target management
drm/vmwgfx: Delay pinning fbdev framebuffer until after mode set
drm/vmwgfx: Check pin count before attempting to move a buffer
drm/ttm: Make ttm_bo_mem_compat available
drm/vmwgfx: Add an option to change assumed FB bpp
drm/vmwgfx: Work around mode set failure in 2D VMs
drm/vmwgfx: Add a check to handle host message failure
Dave Airlie [Thu, 14 Jul 2016 23:19:14 +0000 (09:19 +1000)]
Merge tag 'drm-intel-fixes-2016-07-14' of git://anongit.freedesktop.org/drm-intel into drm-fixes
I've also realized that a pile of hang fixes for kbl landed in next, and
no one thought of backporting it to 4.7 - kbl has lost prelim_hw_support
tagging in 4.7-rc1 already. Mika is prepping a topic branch for those,
will send you a separate pull request since it's quite a bit (but should
be all well restricted to kbl code, so similar to polaris in amdgpu).
* tag 'drm-intel-fixes-2016-07-14' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Ignore panel type from OpRegion on SKL
drm/i915: Update ifdeffery for mutex->owner
bonding: set carrier off for devices created through netlink
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after
register_netdevice") moved netif_carrier_off() from bond_init() to
bond_create(), but the latter is called only for initial default
devices and ones created through sysfs:
$ modprobe bonding
$ echo +bond1 > /sys/class/net/bonding_masters
$ ip link add bond2 type bond
$ grep "MII Status" /proc/net/bonding/*
/proc/net/bonding/bond0:MII Status: down
/proc/net/bonding/bond1:MII Status: down
/proc/net/bonding/bond2:MII Status: up
Ensure that carrier is initially off also for devices created through
netlink.
Dave Airlie [Thu, 14 Jul 2016 23:17:39 +0000 (09:17 +1000)]
Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Two more polaris fixes.
* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: fix power distribution issue for Polaris10 XT
drm/amdgpu: Add a missing register to Polaris golden setting
Andrew Duggan [Thu, 14 Jul 2016 16:35:44 +0000 (09:35 -0700)]
Input: synaptics-rmi4 - use of_get_child_by_name() to fix refcount
Calling of_find_node_by_name() assumes that the caller has incremented
the refcount of the of_node being passed in. Currently, the caller is
not incrementing the refcount of the of_node which results in the node
being prematurely freed when of_find_node_by_name() calls of_node_put()
on it. Instead use of_get_child_by_name() which does not call put on the
of_node.
Revert "Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE"
This reverts commit 5f7e5445a2de848c66d2d80ba5479197e8287c33 because
removal of input_mt_report_slot_state() means we no longer generate
tracking IDs for the reported contacts.
commit c9711ec5250b ("mtd: nand: omap: Clean up device tree support")
removes the check for the old elm phandle binding.
Add it again to keep backward compatibility.
Fixes: commit c9711ec5250b ("mtd: nand: omap: Clean up device tree support") Signed-off-by: Teresa Remmet <[email protected]> Signed-off-by: Brian Norris <[email protected]>
Keith Busch [Wed, 13 Jul 2016 17:45:02 +0000 (11:45 -0600)]
nvme: Remove RCU namespace protection
We can't sleep with RCU read lock held, but we need to do potentially
blocking stuff to namespace queues when iterating the list. This patch
removes the RCU locking and holds a mutex instead.
To prevent deadlocks, this patch removes holding the mutex during
namespace scanning and removal. The unlocked namespace scanning is made
safe by holding a reference to the namespace being scanned.
List iteration that does IO has to be unlocked to allow error recovery.
The caller must ensure the list can not be manipulated during such an
event, so this patch adds a comment explaining this requirement to the
only function that iterates an unlocked list. All callers currently
meet this requirement, so no further changes required.
List iterations that do not do IO can safely use the lock since it couldn't
block recovery from missing forced IO completions.
Reported-by: Ming Lin <mlin at kernel.org>
[fixes 0bf77e9 nvme: switch to RCU freeing the namespace] Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Ville Syrjälä [Tue, 12 Jul 2016 12:00:37 +0000 (15:00 +0300)]
drm/i915: Ignore panel type from OpRegion on SKL
Dell XPS 13 9350 apparently doesn't like it when we use the panel type
from OpRegion. The OpRegion panel type (0) tells us to use use low
vswing for eDP, whereas the VBT panel type (2) tells us to use normal
vswing. The problem is that low vswing results in some display flickers.
Since no one seems to know how this stuff is supposed to be handled,
let's just ignore the OpRegion panel type on SKL for now.
v2: Print the panel type correctly in the debug output
Chris Wilson [Mon, 11 Jul 2016 13:46:17 +0000 (14:46 +0100)]
drm/i915: Update ifdeffery for mutex->owner
In commit 7608a43d8f2e ("locking/mutexes: Use MUTEX_SPIN_ON_OWNER when
appropriate") the owner field in the mutex was updated from being
dependent upon CONFIG_SMP to using optimistic spin. Update our peek
function to suite.
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This contains three commits to fix memory corruption bugs with certain
Apple AirPort cards, plus a fix for a X86_BUG() ID definitions collision
bug in asm/cpufeatures.h"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/quirks: Add early quirk to reset Apple AirPort card
x86/quirks: Reintroduce scanning of secondary buses
x86/quirks: Apply nvidia_bugs quirk only on root bus
x86/cpu: Fix duplicated X86_BUG(9) macro
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"Fix an objtool false positive plus an UP kernel memory corruption bug
on certain configs"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble
objtool: Fix STACK_FRAME_NON_STANDARD macro checking for function symbols
David S. Miller [Wed, 13 Jul 2016 18:53:41 +0000 (11:53 -0700)]
Merge branch 'sk_filter-trim-limit'
Willem de Bruijn says:
====================
limit sk_filter trim to payload
Sockets can apply a filter to incoming packets to drop or trim them.
Fix two codepaths that call skb_pull/__skb_pull after sk_filter
without checking for packet length.
Reading beyond skb->tail after trimming happens in more codepaths, but
safety of reading in the linear segment is based on minimum allocation
size (MAX_HEADER, GRO_MAX_HEAD, ..).
====================
Willem de Bruijn [Tue, 12 Jul 2016 22:18:57 +0000 (18:18 -0400)]
dccp: limit sk_filter trim to payload
Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.
A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.
Willem de Bruijn [Tue, 12 Jul 2016 22:18:56 +0000 (18:18 -0400)]
rose: limit sk_filter trim to payload
Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.
Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.
Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.
This patch set provides two trivial fixes for the tx timeout series lately
applied into net 4.7.
From Daniel, detect stuck queues due to BQL
From Mohamad, fix tx timeout watchdog false alarm
Hopefully those two fixes will make it to -stable, assuming 3947ca185999 ('net/mlx5e: Implement ndo_tx_timeout callback') was also backported to -stable.
====================
net/mlx5e: start/stop all tx queues upon open/close netdev
Start all tx queues (including inactive ones) when opening the netdev.
Stop all tx queues (including inactive ones) when closing the netdev.
This is a workaround for the tx timeout watchdog false alarm issue in
which the netdev watchdog is polling all the tx queues which may include
inactive queues and thus once lowering the real tx queues number
(ethtool -L) it will generate tx timeout watchdog false alarms.
Fixes: 3947ca185999 ('net/mlx5e: Implement ndo_tx_timeout callback') Signed-off-by: Mohamad Haj Yahia <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Thomas Gleixner [Tue, 12 Jul 2016 16:33:56 +0000 (18:33 +0200)]
sched/core: Correct off by one bug in load migration calculation
The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
account that the function is now called from a thread running on the outgoing
CPU. As a result a cpu unplug leakes a load of 1 into the global load
accounting mechanism.
Fix it by adjusting for the currently running thread which calls
calc_load_migrate().
Merge tag 'media/v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Two regression fixes:
- a regression when handling VIDIOC_CROPCAP at the media core;
- a regression at adv7604 that was ignoring pad number in subdev ops"
* tag 'media/v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] adv7604: Don't ignore pad number in subdev DV timings pad operations
[media] v4l2-ioctl: fix stupid mistake in cropcap condition
Thomas Gleixner [Tue, 12 Jul 2016 19:59:23 +0000 (21:59 +0200)]
cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble
Xiaolong Ye reported lock debug warnings triggered by the following commit:
8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")
The bug is the following: the cpuhp_bp_states[] array is cut short when
CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
and happily scribble outside of the array bounds...
We need to store them in case that the state is unregistered so we can invoke
the teardown function. That's independent of CONFIG_SMP. Make sure the array
is large enough.
David S. Miller [Wed, 13 Jul 2016 06:13:01 +0000 (23:13 -0700)]
Merge branch 'ethoc-fixes'
Florian Fainelli says:
====================
net: ethoc: Error path and transmit fixes
This patch series contains two patches for the ethoc driver while testing on a
TS-7300 board where ethoc is provided by an on-board FPGA.
First patch was cooked after chasing crashes with invalid resources passed to
the driver.
Second patch was cooked after seeing that an interface configured with IP
192.168.2.2 was sending ARP packets for 192.168.0.0, no wonder why it could not
work.
I don't have access to any other platform using an ethoc interface so
it could be good to some testing on Xtensa for instance.
Changes in v3:
- corrected the error path if skb_put_padto() fails, thanks to Max
for spotting this!
Changes in v2:
- fixed the first commit message
====================
Even though the hardware can be doing zero padding, we want the SKB to
be going out on the wire with the appropriate size. This fixes packet
truncations observed with e.g: ARP packets.
In case any operation fails before we can successfully go the point
where we would register a MDIO bus, we would be going to an error label
which involves unregistering then freeing this yet to be created MDIO
bus. Update all error paths to go to label free which is the only one
valid until either the clock is enabled, or the MDIO bus is allocated
and registered. This fixes kernel oops observed while trying to
dereference the MDIO bus structure which is not yet allocated.
Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge tag 'acpi-urgent-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"One ACPI EC driver regression fix (code ordering) and three reverts of
ACPICA commits, one that introduced a problem and two unsuccessful
attempted fixes on top of it.
Specifics:
- Fix a recent regression in the ACPI EC driver introduced by a fix
of another problem that uncovered a latent code ordering issue in
the driver (Lv Zheng).
- Revert a recent ACPICA commit that attempted to address a lock
ordering issue introduced by a previous fix, but caused Dell
Precision 5510 to fail to boot, revert that previous fix too and
finally revert the commit that caused the original problem (a
deadlock in the ACPICA code) to happen (Rafael Wysocki)"
* tag 'acpi-urgent-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI 2.0 / AML: Improve module level execution by moving the If/Else/While execution to per-table basis"
Revert "ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading"
Revert "ACPICA: Namespace: Fix namespace/interpreter lock ordering"
ACPI / EC: Fix code ordering issue in ec_remove_handlers()
During commit b54b8c2d6e3c
("net: ezchip: adapt driver to little endian architecture")
adapting to little endian architecture,
zeroing of controller was left out.
Merge tag 'qcom-smd-list-voltage' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"Fix qcom-smd list voltage issues for msm8974
This commit looks like a cleanup but in fact by causing the core to go
down some simplified code paths for noop regulators it avoids a boot
time crash for msm8974 platforms which was introduced in v4.7. It has
been in -next for a while, the issues in mainline for these platforms
weren't flagged up to me until yesterday (I think it took some time to
figure out what was going wrong)"
* tag 'qcom-smd-list-voltage' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom_smd: Remove list_voltage callback for rpm_smps_ldo_ops_fixed
Nicolas Iooss [Sat, 25 Jun 2016 15:55:07 +0000 (17:55 +0200)]
i40iw: do not print unitialized variables in error message
i40iw_create_cqp() printed the contents of variables maj_err and min_err
in an error message before they could be initialized (by calling
dev->cqp_ops->cqp_create).
Tadeusz Struk [Wed, 6 Jul 2016 21:14:47 +0000 (17:14 -0400)]
IB/hfi1: Fix sleep inside atomic issue in init_asic_data
The critical section should protect only the list traversal
and dd->asic_data modification, not the memory allocation.
The fix pulls the allocation out of the critical section.
There are several computatations of the sc in the
ud receive routine.
Besides the code duplication, all are wrong when the
sc is greater than 15. In that case the code incorrectly
or's a 1 into the computed sc instead of 1 shifted left
by 4.
Fix precomputed sc5 by using an already implemented routine
hdr2sc() and deleting flawed duplicated code.
netfilter: conntrack: skip clash resolution if nat is in place
The clash resolution is not easy to apply if the NAT table is
registered. Even if no NAT rules are installed, the nul-binding ensures
that a unique tuple is used, thus, the packet that loses race gets a
different source port number, as described by:
Clash resolution with NAT is also problematic if addresses/port range
ports are used since the conntrack that wins race may describe a
different mangling that we may have earlier applied to the packet via
nf_nat_setup_info().
Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Signed-off-by: Pablo Neira Ayuso <[email protected]> Tested-by: Marc Dionne <[email protected]>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
posix_acl: de-union a_refcount and a_rcu
nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
Use the right predicate in ->atomic_open() instances
Jon Paul Maloy [Mon, 11 Jul 2016 20:08:37 +0000 (16:08 -0400)]
tipc: reset all unicast links when broadcast send link fails
In test situations with many nodes and a heavily stressed system we have
observed that the transmission broadcast link may fail due to an
excessive number of retransmissions of the same packet. In such
situations we need to reset all unicast links to all peers, in order to
reset and re-synchronize the broadcast link.
In this commit, we add a new function tipc_bearer_reset_all() to be used
in such situations. The function scans across all bearers and resets all
their pertaining links.
Jon Paul Maloy [Mon, 11 Jul 2016 20:08:36 +0000 (16:08 -0400)]
tipc: ensure correct broadcast send buffer release when peer is lost
After a new receiver peer has been added to the broadcast transmission
link, we allow immediate transmission of new broadcast packets, trusting
that the new peer will not accept the packets until it has received the
previously sent unicast broadcast initialiation message. In the same
way, the sender must not accept any acknowledges until it has itself
received the broadcast initialization from the peer, as well as
confirmation of the reception of its own initialization message.
Furthermore, when a receiver peer goes down, the sender has to produce
the missing acknowledges from the lost peer locally, in order ensure
correct release of the buffers that were expected to be acknowledged by
the said peer.
In a highly stressed system we have observed that contact with a peer
may come up and be lost before the above mentioned broadcast initial-
ization and confirmation have been received. This leads to the locally
produced acknowledges being rejected, and the non-acknowledged buffers
to linger in the broadcast link transmission queue until it fills up
and the link goes into permanent congestion.
In this commit, we remedy this by temporarily setting the corresponding
broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
state to true before we issue the local acknowledges. This ensures that
those acknowledges will always be accepted. The mentioned state values
are restored immediately afterwards when the link is reset.
Jon Paul Maloy [Mon, 11 Jul 2016 20:08:35 +0000 (16:08 -0400)]
tipc: extend broadcast link initialization criteria
At first contact between two nodes, an endpoint might sometimes have
time to send out a LINK_PROTOCOL/STATE packet before it has received
the broadcast initialization packet from the peer, i.e., before it has
received a valid broadcast packet number to add to the 'bc_ack' field
of the protocol message.
This means that the peer endpoint will receive a protocol packet with an
invalid broadcast acknowledge value of 0. Under unlucky circumstances
this may lead to the original, already received acknowledge value being
overwritten, so that the whole broadcast link goes stale after a while.
We fix this by delaying the setting of the link field 'bc_peer_is_up'
until we know that the peer really has received our own broadcast
initialization message. The latter is always sent out as the first
unicast message on a link, and always with seqeunce number 1. Because
of this, we only need to look for a non-zero unicast acknowledge value
in the arriving STATE messages, and once that is confirmed we know we
are safe and can set the mentioned field. Before this moment, we must
ignore all broadcast acknowledges from the peer.
r8152: Add support for setting pass through MAC address on RTL8153-AD
The RTL8153-AD supports a persistent system specific MAC address.
This means a device plugged into two different systems with host side
support will show different (but persistent) MAC addresses.
This information for the system's persistent MAC address is burned in when
the system HW is built and available under \_SB.AMAC in the DSDT at runtime.
This technology is currently implemented in the Dell TB15 and WD15 Type-C
docks. More information is available here:
http://www.dell.com/support/article/us/en/04/SLN301147
sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send
Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
as a control message to TCP. Since __sock_cmsg_send does not
support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
hence breaks pulse audio over TCP.
SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
but they semantically belong to SOL_UNIX. Since all
cmsg-processing functions including sock_cmsg_send ignore control
messages of other layers, it is best to ignore SCM_RIGHTS
and SCM_CREDENTIALS for consistency (and also for fixing pulse
audio over TCP).
Problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag, catched with
netlink fuzzer.
Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.
Eric Dumazet [Sun, 10 Jul 2016 08:04:02 +0000 (10:04 +0200)]
tcp: make challenge acks less predictable
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Michal Kubeček [Fri, 8 Jul 2016 15:52:33 +0000 (17:52 +0200)]
udp: prevent bugcheck if filter truncates packet too much
If socket filter truncates an udp packet below the length of UDP header
in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
kernel is configured that way) can be easily enforced by an unprivileged
user which was reported as CVE-2016-6162. For a reproducer, see
http://seclists.org/oss-sec/2016/q3/8
Colin Ian King [Fri, 8 Jul 2016 15:42:48 +0000 (16:42 +0100)]
bnxt_en: initialize rc to zero to avoid returning garbage
rc is not initialized so it can contain garbage if it is not
set by the call to bnxt_read_sfp_module_eeprom_info. Ensure
garbage is not returned by initializing rc to 0.