Jakub Kicinski [Wed, 8 Jun 2016 19:11:04 +0000 (20:11 +0100)]
net: cls_u32: be more strict about skip-sw flag for knodes
Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).
This patch fixes the knode handling.
Wei Fang [Tue, 7 Jun 2016 06:53:56 +0000 (14:53 +0800)]
scsi: fix race between simultaneous decrements of ->host_failed
sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.
It will lead to permanently inequality between ->host_failed and
->host_busy, and scsi error handler thread won't start running. IO
errors after that won't be handled.
Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
the strategy handler to fix this race.
Dave Airlie [Thu, 9 Jun 2016 02:32:09 +0000 (12:32 +1000)]
Merge tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux into drm-fixes
This pull request brings in vblank/pageflip fixes I had hoped to see
merged before 4.7rc1, plus two new fixes that have come in since then.
* tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux:
drm/vc4: Make pageflip completion handling more robust.
drm/vc4: Fix ioctl permissions for render nodes.
drm/vc4: Return -EBUSY if there's already a pending flip event.
drm/vc4: Fix drm_vblank_put/get imbalance in page flip path.
drm/vc4: Fix get_vblank_counter with proper no-op for Linux 4.4+
Dave Airlie [Thu, 9 Jun 2016 02:30:29 +0000 (12:30 +1000)]
Merge branch 'linux-4.7' of git://github.com/skeggsb/linux into drm-fixes
Fixes for two issues reported by KASAN, a display engine hang due to
incorrect BIOS table parsing, and incorrect LTC interrupt handling on
Maxwell which could lead to a never-ending interrupt storm.
* 'linux-4.7' of git://github.com/skeggsb/linux:
drm/nouveau/disp/sor/gm107: training pattern registers are like gm200
drm/nouveau/disp/sor/gf119: both links use the same training register
drm/nouveau/core: swap the order of imem/fb
drm/nouveau/fbcon: fix out-of-bounds memory accesses
drm/nouveau/gr/gf100-: update sm error decoding from gk20a nvgpu headers
drm/nouveau/ltc/gm107-: fix typo in the address of NV_PLTCG_LTC0_LTS0_INTR
drm/nouveau/bios/disp: fix handling of "match any protocol" entries
Stefan Agner [Fri, 3 Jun 2016 21:21:34 +0000 (14:21 -0700)]
drm/fsl-dcu: use flat regmap cache
Using flat regmap cache instead of RB-tree to avoid the following
lockdep warning on driver load:
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x15c/0x160()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
The RB-tree regmap cache needs to allocate new space on first
writes. However, allocations in an atomic context (e.g. when a
spinlock is held) are not allowed. The function regmap_write
calls map->lock, which acquires a spinlock in the fast_io case.
Since the FSL DCU driver uses MMIO, the regmap bus of type
regmap_mmio is being used which has fast_io set to true.
Use flat regmap cache and specify max register to be large
enouth to cover all registers available in LS1021a and Vybrids
register space.
Lenovo Thinkpad devices T460, T460s, T460p, T560, X260 use
HKEY version 0x200 without adaptive keyboard.
HKEY version 0x200 has method MHKA with one parameter value.
Passing parameter value 1 will get hotkey_all_mask (the same like
HKEY version 0x100 without parameter). Passing parameter value 2 to
MHKA method will retrieve hotkey_all_adaptive_mask. If 0 is returned in
that case there is no adaptive keyboard available.
Eric Dumazet [Wed, 8 Jun 2016 13:19:45 +0000 (06:19 -0700)]
net_sched: add missing paddattr description
"make htmldocs" complains otherwise:
.//net/core/gen_stats.c:65: warning: No description found for parameter 'padattr'
.//net/core/gen_stats.c:101: warning: No description found for parameter 'padattr'
Jakub Sitnicki [Wed, 8 Jun 2016 13:13:34 +0000 (15:13 +0200)]
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.
If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from on an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.
To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:
1) Set up a transformation and lower the MTU to trigger fragmentation
# ip xfrm policy add dir out src ::1 dst ::1 \
tmpl src ::1 dst ::1 proto esp spi 1
# ip xfrm state add src ::1 dst ::1 \
proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
# ip link set dev lo mtu 1500
2) Monitor the packet flow and set up an UDP sink
# tcpdump -ni lo -ttt &
# socat udp6-listen:12345,fork /dev/null &
4) Compare it to a non-connected socket
# perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
What happens in step (3) is:
1) when connecting the socket in __ip6_datagram_connect(), we
perform an XFRM lookup, miss the flow cache, create an XFRM
bundle, and cache the destination,
2) afterwards, when sending the datagram, we perform an XFRM lookup,
again, miss the flow cache (due to mismatch of flowi6_iif and
flowi6_oif, which is an issue of its own), and recreate an XFRM
bundle based on the cached (and already transformed) destination.
To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.
The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
twice.
Guillaume Nault [Wed, 8 Jun 2016 10:59:17 +0000 (12:59 +0200)]
l2tp: fix configuration passed to setup_udp_tunnel_sock()
Unused fields of udp_cfg must be all zeros. Otherwise
setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
callbacks with garbage, eventually resulting in panic when used by
udp_gro_receive().
v2: use empty initialiser instead of "{ NULL }" to avoid relying on
first field's type.
Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Bob Liu [Tue, 31 May 2016 08:59:17 +0000 (16:59 +0800)]
xen-blkfront: fix resume issues after a migration
After a migrate to another host (which may not have multiqueue
support), the number of rings (block hardware queues)
may be changed and the ring info structure will also be reallocated.
This patch fixes two related bugs:
* call blk_mq_update_nr_hw_queues() to make blk-core know the number
of hardware queues have been changed.
* Don't store rinfo pointer to hctx->driver_data, because rinfo may be
reallocated so use hctx->queue_num to get the rinfo structure instead.
Bob Liu [Tue, 7 Jun 2016 14:43:15 +0000 (10:43 -0400)]
xen-blkfront: don't call talk_to_blkback when already connected to blkback
Sometimes blkfront may twice receive blkback_changed() notification
(XenbusStateConnected) after migration, which will cause
talk_to_blkback() to be called twice too and confuse xen-blkback.
The flow is as follow:
blkfront blkback
blkfront_resume()
> talk_to_blkback()
> Set blkfront to XenbusStateInitialised
front changed()
> Connect()
> Set blkback to XenbusStateConnected
blkback_changed()
> Skip talk_to_blkback()
because frontstate == XenbusStateInitialised
> blkfront_connect()
> Set blkfront to XenbusStateConnected
-----
And here we get another XenbusStateConnected notification leading
to:
-----
blkback_changed()
> because now frontstate != XenbusStateInitialised
talk_to_blkback() is also called again
> blkfront state changed from
XenbusStateConnected to XenbusStateInitialised
(Which is not correct!)
front_changed():
> Do nothing because blkback
already in XenbusStateConnected
Now blkback is in XenbusStateConnected but blkfront is still
in XenbusStateInitialised - leading to no disks.
Poking of the XenbusStateConnected state is allowed (to deal with
block disk change) and has to be dealt with. The most likely
cause of this bug are custom udev scripts hooking up the disks
and then validating the size.
Mel Gorman [Wed, 8 Jun 2016 13:25:22 +0000 (14:25 +0100)]
futex: Calculate the futex key based on a tail page for file-based futexes
Mike Galbraith reported that the LTP test case futex_wake04 was broken
by commit 65d8fc777f6d ("futex: Remove requirement for lock_page()
in get_futex_key()").
This test case uses futexes backed by hugetlbfs pages and so there is an
associated inode with a futex stored on such pages. The problem is that
the key is being calculated based on the head page index of the hugetlbfs
page and not the tail page.
Prior to the optimisation, the page lock was used to stabilise mappings and
pin the inode is file-backed which is overkill. If the page was a compound
page, the head page was automatically looked up as part of the page lock
operation but the tail page index was used to calculate the futex key.
After the optimisation, the compound head is looked up early and the page
lock is only relied upon to identify truncated pages, special pages or a
shmem page moving to swapcache. The head page is looked up because without
the page lock, special care has to be taken to pin the inode correctly.
However, the tail page is still required to calculate the futex key so
this patch records the tail page.
On vanilla 4.6, the output of the test case is;
futex_wake04 0 TINFO : Hugepagesize 2097152
futex_wake04 1 TFAIL : futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.
Josef Bacik [Wed, 8 Jun 2016 14:32:10 +0000 (10:32 -0400)]
nbd: pass the nbd pointer for flags debugfs
We were passing in &nbd for the private data in debugfs_create_file() for the
flags entry. We expect it to just be nbd, fix this so we get proper output from
this debugfs entry.
drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_send_msg()+0x107: duplicate frame pointer save
drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_host_get_guestinfo()+0x252: duplicate frame pointer save
To quote Linus:
"The reason is that VMW_PORT_HB_OUT() uses a magic instruction sequence
(a "rep outsb") to communicate with the hypervisor (it's a virtual GPU
driver for vmware), and %rbp is part of the communication. So the
inline asm does a save-and-restore of the frame pointer around the
instruction sequence.
I actually find the objtool warning to be quite reasonable, so it's
not exactly a false positive, since in this case it actually does
point out that the frame pointer won't be reliable over that
instruction sequence.
But in this particular case it just ends up being the wrong thing -
the code is what it is, and %rbp just can't have the frame information
due to annoying magic calling conventions."
Silence the warnings by telling objtool to ignore the two functions
which use the VMW_PORT_HB_{IN,OUT} macros.
Robin Murphy [Tue, 7 Jun 2016 17:44:48 +0000 (18:44 +0100)]
drivers: of: Fix of_pci.h header guard
The compilation of of_pci.c is governed by CONFIG_OF_PCI, but the
corresponding declarations in of_pci.h are inconsistently guarded by
CONFIG_OF, with the result that if CONFIG_PCI is disabled for an OF
platform, the dangling external declarations are still active and the
inline stub definitions not. So far this has managed to go unnoticed
since it happens that the only references to these functions are from
code which itself depends on CONFIG_PCI or CONFIG_OF_PCI.
Fix this with the appropriate config guard so that any new callers
outside PCI-specific code don't start unexpectedly breaking under
certain configs.
The problem is that it tries to update the 'sched_schedstats' static key
before jump labels have been initialized.
Changing jump_label_init() to be called earlier before
parse_early_param() wouldn't fix it: it would still fail trying to
poke_text() because mm isn't yet initialized.
Instead, just create a temporary '__sched_schedstats' variable which can
be copied to the static key later during sched_init() after jump labels
have been initialized.
Josh Poimboeuf [Fri, 3 Jun 2016 22:58:40 +0000 (17:58 -0500)]
sched/debug: Fix /proc/sched_debug regression
Commit:
cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
... introduced a bug when CONFIG_SCHEDSTATS is enabled and the
runtime tunable is disabled (which is the default).
The wait-time, sum-exec, and sum-sleep fields are missing from the
/proc/sched_debug file in the runnable_tasks section.
Fix it with a new schedstat_val() macro which returns the field value
when schedstats is enabled and zero otherwise. The macro works with
both SCHEDSTATS and !SCHEDSTATS. I put the macro in stats.h since it
might end up being useful in other places.
There is no way to end up in _free_event() with event::pmu being NULL.
The latter is initialized in event allocation path and remains set
forever. In case of allocation failure, the error path doesn't use
_free_event().
Having the check, however, suggests that it is possible to have a
event::pmu==NULL situation in _free_event() and confuses the robots.
Peter Zijlstra [Wed, 8 Jun 2016 08:19:51 +0000 (10:19 +0200)]
locking/qspinlock: Fix spin_unlock_wait() some more
While this prior commit:
54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.
So while the 2 locks crossed problem:
spin_lock(A) spin_lock(B)
if (!spin_is_locked(B)) spin_unlock_wait(A)
foo() foo();
... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:
flag = 1;
smp_mb(); spin_lock()
spin_unlock_wait() if (!flag)
// add to lockless list
// iterate lockless list
... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and loosing the
guarantee.
This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.
It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.
The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
with what looks like the wrong parameter. The write_lock_regs
function takes a pointer to the registers, not the bcm_kona_gpio
structure.
Fix the warning, and probably bug by changing the function to
pass reg_base instead of kona_gpio, fixing the following warning:
drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
(different address spaces)
expected void [noderef] <asn:2>*reg_base
got struct bcm_kona_gpio *kona_gpio
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:2>*reg_base
got struct bcm_kona_gpio *kona_gpio
Borislav Petkov [Wed, 1 Jun 2016 10:04:28 +0000 (12:04 +0200)]
x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models
We need to reenable the topology extensions CPUID leafs on newer models
too, if BIOS has disabled them, as we rely on them to get proper compute
unit topology.
regulator: qcom_smd: add regulator ops for pm8941 lnldo
After "regulator: qcom_smd: add list_voltage callback" patch adding
pm8941 lnldo regulators would bug on list_voltages as it is a fixed
regulator without any linear range.
This patch fixes that issue by adding dedicated ops for pm8941 lnldo
without list_voltages callback.
Dave Hansen [Fri, 3 Jun 2016 00:19:27 +0000 (17:19 -0700)]
x86/cpu/intel: Introduce macros for Intel family numbers
Problem:
We have a boatload of open-coded family-6 model numbers. Half of
them have these model numbers in hex and the other half in
decimal. This makes grepping for them tons of fun, if you were
to try.
Solution:
Consolidate all the magic numbers. Put all the definitions in
one header.
The names here are closely derived from the comments describing
the models from arch/x86/events/intel/core.c. We could easily
make them shorter by doing things like s/SANDYBRIDGE/SNB/, but
they seemed fine even with the longer versions to me.
Do not take any of these names too literally, like "DESKTOP"
or "MOBILE". These are all colloquial names and not precise
descriptions of everywhere a given model will show up.
Linus Walleij [Wed, 8 Jun 2016 08:29:48 +0000 (10:29 +0200)]
leds: handle suspend/resume in heartbeat trigger
The following phenomena was observed: when suspending the
system, sometimes the heartbeat LED was left on, glowing and
wasting power while the rest of the system is asleep, also
disturbing power dissapation measures on the odd suspend
cycle when it's left on.
Clearly this is not how we want the heartbeat trigger to
work: it should turn off and leave the LED off during
system suspend.
This removes the heartbeat trigger when preparing suspend and
restores it during resume. The trigger code will make sure all
LEDs are left in OFF state after removing the trigger, and
will re-enable the trigger on all LEDs after resuming.
Tony Makkiel [Wed, 18 May 2016 16:22:45 +0000 (17:22 +0100)]
leds: core: Fix brightness setting upon hardware blinking enabled
Commit 76931edd54f8 ("leds: fix brightness changing when software blinking
is active") changed the semantics of led_set_brightness() which according
to the documentation should disable blinking upon any brightness setting.
Moreover it made it different for soft blink case, where it was possible
to change blink brightness, and for hardware blink case, where setting
any brightness greater than 0 was ignored.
While the change itself is against the documentation claims, it was driven
also by the fact that timer trigger remained active after turning blinking
off. Fixing that would have required major refactoring in the led-core,
led-class, and led-triggers because of cyclic dependencies.
Finally, it has been decided that allowing for brightness change during
blinking is beneficial as it can be accomplished without disturbing
blink rhythm.
The change in brightness setting semantics will not affect existing
LED class drivers that implement blink_set op thanks to the LED_BLINK_SW
flag introduced by this patch. The flag state will be from now on checked
in led_set_brightness() which will allow to distinguish between software
and hardware blink mode. In the latter case the control will be passed
directly to the drivers which apply their semantics on brightness set,
which is disable the blinking in case of most such drivers. New drivers
will apply new semantics and just change the brightness while hardware
blinking is on, if possible.
The issue was smuggled by subsequent LED core improvements, which modified
the code that originally introduced the problem.
Fixes: f1e80c07416a ("leds: core: Add two new LED_BLINK_ flags") Signed-off-by: Tony Makkiel <[email protected]> Signed-off-by: Jacek Anaszewski <[email protected]>
Will Deacon [Tue, 7 Jun 2016 16:55:15 +0000 (17:55 +0100)]
arm64: mm: always take dirty state from new pte in ptep_set_access_flags
Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:
| This patch breaks swapping for me.
| In the broken case, you'll see either systemd cpu time spike (because
| it's stuck in a page fault loop) or the system hang (because the
| application owning the screen is stuck in a page fault loop).
It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:
1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
3. A read faults due to the missing access flag
4. ptep_set_access_flags is called with dirty = 0, due to the read fault
5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
6. A write faults, but pte_write is true so we get stuck
The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.
Linus Walleij [Wed, 8 Jun 2016 08:58:20 +0000 (10:58 +0200)]
gpio: include <linux/io-mapping.h> in gpiolib-of
When enabling the gpiolib for all archs a build robot came
up with this:
All errors (new ones prefixed by >>):
drivers/gpio/gpiolib-of.c: In function 'of_mm_gpiochip_add_data':
>> drivers/gpio/gpiolib-of.c:317:2: error: implicit declaration of
function 'iounmap' [-Werror=implicit-function-declaration]
iounmap(mm_gc->regs);
^~~~~~~
cc1: some warnings being treated as errors
Fix this by including <linux/io-mapping.h> explicitly.
Fixes: 296ad4acb8ef ("gpio: remove deps on ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
gpiolib relies on the reference counters to clean up the gpio_device
structure.
Although the number of get/put is properly aligned on gpiolib.c
itself, it does not take into consideration how the referece counters
are affected by other external functions such as cdev_add and device_add.
Because of this, after the last call to put_device, the reference counter
has a value of +3, therefore never calling gpiodevice_release.
Due to the fact that some of the device has already been cleaned on
gpiochip_remove, the library will end up OOPsing the kernel (e.g. a call
to of_gpiochip_find_and_xlate).
Helmut Grohne [Fri, 3 Jun 2016 12:15:32 +0000 (14:15 +0200)]
gpio: zynq: initialize clock even without CONFIG_PM
When the PM initialization was moved in the commit referenced below, the
code enabling the clock was removed from the probe function. On
CONFIG_PM=y kernels, this is not a problem as the pm resume hook enables
the clock, but when power management is disabled, all those pm_*
functions are noops and the clock is never enabled resulting in a
dysfunctional gpio controller.
Put the clock initialization back to support CONFIG_PM=n.
Cc: [email protected] Signed-off-by: Helmut Grohne <[email protected]> Fixes: 3773c195d387 ("gpio: zynq: Do PM initialization earlier to support gpio hogs") Signed-off-by: Linus Walleij <[email protected]>
gpio: 104-dio-48e: Fix control port offset computation off-by-one error
There are only two control ports, each controlling three distinct I/O
ports. To compute the control port address offset for a respective I/O
port, the I/O port address offset should be divided by 3; dividing by 2
may result in not only the wrong address offset but possibly also an
out-of-bounds array memory access for a non-existent third control port.
Fixes: 1b06d64f7374 ("gpio: Add GPIO support for the ACCES 104-DIO-48E") Signed-off-by: William Breathitt Gray <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
Ben Dooks [Tue, 7 Jun 2016 18:27:51 +0000 (19:27 +0100)]
net-sysfs: fix missing <linux/of_net.h>
The of_find_net_device_by_node() function is defined in
<linux/of_net.h> but not included in the .c file that
implements it. Fix the following warning by including the
header:
net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static?
Toshiaki Makita [Tue, 7 Jun 2016 10:14:17 +0000 (19:14 +0900)]
bridge: Don't insert unnecessary local fdb entry on changing mac address
The missing br_vlan_should_use() test caused creation of an unneeded
local fdb entry on changing mac address of a bridge device when there is
a vlan which is configured on a bridge port but not on the bridge
device.
Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables") Signed-off-by: Toshiaki Makita <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge tag 'iio-fixes-for-4.7a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First round of iio fixes for the 4.7 cycle.
A slightly bumper set due to travel delaying the pull request and a fair few
issues with the recent merge window patches. Patches all over the place.
The st-sensors one is probably the most involved, but definitly solves the
issues seen. Note there are some other issues around that handler
(and the fact that a lot of boards tie a level interrupt chip to an
edge interrupt only irq chip). These are not regressions however, so
will turn up the slow route.
* core
- iio_trigger_attach_pollfunc had some really badly wrong error handling.
Another nasty triggered whilst chasing down issues with the st sensors
rework below.
* ad5592r
- fix an off by one error when allocating channels.
* am2315
- a stray mutex unlock before we ever take the lock.
* apds9960
- missing a parent in the driver model (which should be the i2c device).
Result is it doesn't turn up under /sys/bus/i2c/devices which some
userspace code uses for repeatable device identification.
* as3935
- ABI usage bug which meant a processed value was reported as raw. Now
reporting scale as well to ensure userspace has the info it needs.
- Don't return processed value via the buffer - it doesn't conform to
the ABI and will overflow in some cases.
- Fix a wrongly sized buffer which would overflow trashing part of the
stack. Also move it onto the heap as part of the fix.
* bh1780
- a missing return after write in debugfs lead to an incorrect read and
a null pointer dereference.
- dereferencing the wrong pointer in suspend and resume leading to
unpredictable results.
- assign a static name to avoid accidentally ending up with no name if
loaded via device tree.
* bmi160
- output data rate for the accelerometer was incorrectly reported. Fix it.
- writing the output data rate was also wrong due to reverse parameters.
* bmp280
- error message for wrong chip ID gave the wrong expected value.
* hdc100x
- mask for writing the integration time was wrong allowin g us to get
'stuck' in a particular value with no way back.
- temperature reported in celsius rather than millicelsius as per the
ABI.
- Get rid of some incorrect data shifting which lead to readings being
rather incorrect.
* max44000
- drop scale attribute for proximity as it is an unscaled value (depends
on what is in range rather than anything knowable at the detector).
* st-pressure
- ABI compliance fixes - units were wrong.
* st-sensors
- We introduced some nasty issues with the recent switch over to a
a somewhat threaded handler in that we broke using a software trigger
with these devices. Now do it properly. It's a larger patch than ideal
for a fix, but the logic is straight forward.
- Make sure the trigger is initialized before requesting the interrupt.
This matters now the interrupt can be shared. Before it was ugly and wrong
but short of flakey hardware could not be triggered.
- Hammer down the dataready pin at boot - otherwise with really
unlucky timing things could get interestingly wedged requiring a hard power
down of the chip.
usb: echi-hcd: Add ehci_setup check before echi_shutdown
This patch protects system from crashing at shutdown in
cases where usb host is not added yet from OTG controller driver.
As ehci_setup() not done yet, so stop accessing registers or
variables initialized as part of ehci_setup().
The use case is simple, for boards like DB410c where the usb host
or device functionality is decided based on the micro-usb cable
presence. If the board boots up with micro-usb connected, the
OTG driver like echi-msm would not add the usb host by default.
However a system shutdown would go and access registers and
uninitialized variables, resulting in below crash.
This patch fixes a suspend/resume issue where the driver is blindly
calling ehci_suspend/resume functions when the ehci hasn't been setup.
This results in a crash during suspend/resume operations.
Thierry Reding [Thu, 26 May 2016 15:23:30 +0000 (17:23 +0200)]
usb: host: ehci-tegra: Avoid getting the same reset twice
Starting with commit 0b52297f2288 ("reset: Add support for shared reset
controls") there is a reference count for reset control assertions. The
goal is to allow resets to be shared by multiple devices and an assert
will take effect only when all instances have asserted the reset.
In order to preserve backwards-compatibility, all reset controls become
exclusive by default. This is to ensure that reset_control_assert() can
immediately assert in hardware.
However, this new behaviour triggers the following warning in the EHCI
driver for Tegra:
[ 3.365019] ------------[ cut here ]------------
[ 3.369639] WARNING: CPU: 0 PID: 1 at drivers/reset/core.c:187 __of_reset_control_get+0x16c/0x23c
[ 3.382151] Modules linked in:
[ 3.385214] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-next-20160503 #140
[ 3.392769] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 3.399046] [<c010fa50>] (unwind_backtrace) from [<c010b120>] (show_stack+0x10/0x14)
[ 3.406787] [<c010b120>] (show_stack) from [<c0347dcc>] (dump_stack+0x90/0xa4)
[ 3.414007] [<c0347dcc>] (dump_stack) from [<c011f4fc>] (__warn+0xe8/0x100)
[ 3.420964] [<c011f4fc>] (__warn) from [<c011f5c4>] (warn_slowpath_null+0x20/0x28)
[ 3.428525] [<c011f5c4>] (warn_slowpath_null) from [<c03cc8cc>] (__of_reset_control_get+0x16c/0x23c)
[ 3.437648] [<c03cc8cc>] (__of_reset_control_get) from [<c0526858>] (tegra_ehci_probe+0x394/0x518)
[ 3.446600] [<c0526858>] (tegra_ehci_probe) from [<c04516d8>] (platform_drv_probe+0x4c/0xb0)
[ 3.455029] [<c04516d8>] (platform_drv_probe) from [<c044fe78>] (driver_probe_device+0x1ec/0x330)
[ 3.463892] [<c044fe78>] (driver_probe_device) from [<c0450074>] (__driver_attach+0xb8/0xbc)
[ 3.472320] [<c0450074>] (__driver_attach) from [<c044e1ec>] (bus_for_each_dev+0x68/0x9c)
[ 3.480489] [<c044e1ec>] (bus_for_each_dev) from [<c044f338>] (bus_add_driver+0x1a0/0x218)
[ 3.488743] [<c044f338>] (bus_add_driver) from [<c0450768>] (driver_register+0x78/0xf8)
[ 3.496738] [<c0450768>] (driver_register) from [<c010178c>] (do_one_initcall+0x40/0x170)
[ 3.504909] [<c010178c>] (do_one_initcall) from [<c0c00ddc>] (kernel_init_freeable+0x158/0x1f8)
[ 3.513600] [<c0c00ddc>] (kernel_init_freeable) from [<c0810784>] (kernel_init+0x8/0x114)
[ 3.521770] [<c0810784>] (kernel_init) from [<c0107778>] (ret_from_fork+0x14/0x3c)
[ 3.529361] ---[ end trace 4bda87dbe4ecef8a ]---
The reason is that Tegra SoCs have three EHCI controllers, each with a
separate reset line. However the first controller contains UTMI pads
configuration registers that are shared with its siblings and that are
reset as part of the first controller's reset. There is special code in
the driver to assert and deassert this shared reset at probe time, and
it does so irrespective of which controller is probed first to ensure
that these shared registers are reset before any of the controllers are
initialized. Unfortunately this means that if the first controller gets
probed first, it will request its own reset line and will subsequently
request the same reset line again (temporarily) to perform the reset.
This used to work fine before the above-mentioned commit, but now
triggers the new WARN.
Work around this by making sure we reuse the controller's reset if the
controller happens to be the first controller.
Thierry Reding [Thu, 26 May 2016 15:23:29 +0000 (17:23 +0200)]
usb: host: ehci-tegra: Grab the correct UTMI pads reset
There are three EHCI controllers on Tegra SoCs, each with its own reset
line. However, the first controller contains a set of UTMI configuration
registers that are shared with its siblings. These registers will only
be reset as part of the first controller's reset. For proper operation
it must be ensured that the UTMI configuration registers are reset
before any of the EHCI controllers are enabled, irrespective of the
probe order.
Commit a47cc24cd1e5 ("USB: EHCI: tegra: Fix probe order issue leading to
broken USB") introduced code that ensures the first controller is always
reset before setting up any of the controllers, and is never again reset
afterwards.
This code, however, grabs the wrong reset. Each EHCI controller has two
reset controls attached: 1) the USB controller reset and 2) the UTMI
pads reset (really the first controller's reset). In order to reset the
UTMI pads registers the code must grab the second reset, but instead it
grabbing the first.
Sudip Mukherjee [Mon, 30 May 2016 13:46:33 +0000 (19:16 +0530)]
USB: mos7720: delete parport
parport subsystem has introduced parport_del_port() to delete a port
when it is going away. Without parport_del_port() the registered port
will not be unregistered.
To reproduce and verify the error:
Command to be used is : ls /sys/bus/parport/devices
1) without the device attached there is no output as there is no
registered parport.
2) Attach the device, and the command will show "parport0".
3) Remove the device and the command still shows "parport0".
4) Attach the device again and we get "parport1".
With the patch applied:
1) without the device attached there is no output as there is no
registered parport.
2) Attach the device, and the command will show "parport0".
3) Remove the device and there is no output as "parport0" is now
removed.
4) Attach device again to get "parport0" again.
Michał Pecio [Tue, 7 Jun 2016 10:34:45 +0000 (12:34 +0200)]
USB: OHCI: Don't mark EDs as ED_OPER if scheduling fails
Since ed_schedule begins with marking the ED as "operational",
the ED may be left in such state even if scheduling actually
fails.
This allows future submission attempts to smuggle this ED to the
hardware behind the scheduler's back and without linking it to
the ohci->eds_in_use list.
The former causes bandwidth saturation and data loss on isoc
endpoints, the latter crashes the kernel when attempt is made
to unlink such ED from this list.
Fix ed_schedule to update ED state only on successful return.
powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
In some of the radix TLB flush routines, we use a local to store the
mm->context.id, AKA the PID.
Currently we use an int, but the PID is unsigned long, so large values
of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which
means all our comparisons against that value can never be true.
This means we'll issue TLB flushes when we shouldn't on radix enabled
machines.
Fix it by using an unsigned long for the local. Discovered by Coverity.
Linus Torvalds [Wed, 8 Jun 2016 03:41:36 +0000 (20:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is
4.6, d_walk - 3.2+.
The atomic_open() and coredump ones are regressions from this window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: fix dumping through pipes
fix a regression in atomic_open()
fix d_walk()/non-delayed __d_free() race
autofs braino fix for do_last()
fix EOPENSTALE bug in do_last()
Al Viro [Wed, 8 Jun 2016 01:53:51 +0000 (21:53 -0400)]
fix a regression in atomic_open()
open("/foo/no_such_file", O_RDONLY | O_CREAT) on should fail with
EACCES when /foo is not writable; failing with ENOENT is obviously
wrong. That got broken by a braino introduced when moving the
creat_error logics from atomic_open() to lookup_open(). Easy to
fix, fortunately.
Al Viro [Wed, 8 Jun 2016 01:26:55 +0000 (21:26 -0400)]
fix d_walk()/non-delayed __d_free() race
Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.
Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
When turbo is disabled, the ->set_policy() interface is broken.
For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in 1500000 KHz
This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.
One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition the intel_pstate
sysfs limits, shows percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.
Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw : Subject & changelog, rewrite in fewer lines of code ] Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
Wolfram Sang [Mon, 6 Jun 2016 16:48:38 +0000 (18:48 +0200)]
of: fix autoloading due to broken modalias with no 'compatible'
Because of an improper dereference, a stray 'C' character was output to
the modalias when no 'compatible' was specified. This is the case for
some old PowerMac drivers which only set the 'name' property. Fix it to
let them match again.
powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
The recent commit 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support
to ibm,client-architecture-support call") added a new PVR mask & value
to the start of the ibm_architecture_vec[] array.
However it missed the fact that further down in the array, we hard code
the offset of one of the fields, and then at boot use that value to
patch the value in the array. This means every update to the array must
also update the #define, ugh.
This means that on pseries machines we will misreport to firmware the
number of cores we support, by a factor of threads_per_core.
Fix it for now by updating the #define.
Fixes: 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call") Cc: [email protected] # v4.0+ Signed-off-by: Michael Ellerman <[email protected]>
The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:
1) Fix missing alignment in next offset calculation for standard
targets, introduced in the previous merge window, patch from
Florian Westphal.
2) Fix to correct the handling of outgoing connections which use the
SIP-pe such that the binding of a real-server is updated when needed.
This was an omission from changes introduced by Marco Angaroni in
the previous merge window too, to allow handling of outgoing
connections by the SIP-pe. Patch and report came via Simon Horman.
====================
Daniel Borkmann [Mon, 6 Jun 2016 20:50:39 +0000 (22:50 +0200)]
net: sched: fix tc_should_offload for specific clsact classes
When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.
Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support") Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support") Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
WANG Cong [Mon, 6 Jun 2016 16:54:30 +0000 (09:54 -0700)]
act_police: fix a crash during removal
The police action is using its own code to initialize tcf hash
info, which makes us to forgot to initialize a->hinfo correctly.
Fix this by calling the helper function tcf_hash_create() directly.
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Mon, 6 Jun 2016 16:12:39 +0000 (09:12 -0700)]
fq_codel: return non zero qlen in class dumps
We properly scan the flow list to count number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.
This set fixes two small issues with error codes I noticed
in cls_u32. Second patch could be viewed as user space API
change but that portion of API is not part of any release,
yet.
Colin Ian King [Mon, 6 Jun 2016 15:08:41 +0000 (16:08 +0100)]
gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Fix clang build warning:
./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is
used as a header guard here, followed by #define of a different
macro [-Wheader-guard]
fix by defining _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Linus Torvalds [Tue, 7 Jun 2016 23:24:44 +0000 (16:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"This finally removes the CLK_IS_ROOT flag by picking up the last few
stragglers that didn't get merged by anyone this time around.
Better to do it now than wait for another one to pop up. There's also
a minor maintainers update and a Kconfig fix"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: nxp: Select MFD_SYSCON for creg driver
MAINTAINERS: Add file patterns for clock device tree bindings
clk: Remove CLK_IS_ROOT flag
clk: microchip: Remove CLK_IS_ROOT
powerpc/512x: clk: Remove CLK_IS_ROOT
vexpress/spc: Remove CLK_IS_ROOT
Michael Chan [Mon, 6 Jun 2016 06:37:16 +0000 (02:37 -0400)]
bnxt_en: Simplify VLAN receive logic.
Since both CTAG and STAG rx acceleration must be enabled together, we
only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before
calling __vlan_hwaccel_put_tag().
Michael Chan [Mon, 6 Jun 2016 06:37:15 +0000 (02:37 -0400)]
bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.
The hardware can only be set to strip or not strip both the VLAN CTAG and
STAG. It cannot strip one and not strip the other. Add logic to
bnxt_fix_features() to toggle both feature flags when the user is toggling
one of them.
Michael Chan [Mon, 6 Jun 2016 06:37:14 +0000 (02:37 -0400)]
bnxt_en: Fix tx push race condition.
Set the is_push flag in the software BD before the tx data is pushed to
the chip. It is possible to get the tx interrupt as soon as the tx data
is pushed. The tx handler will not handle the event properly if the
is_push flag is not set and it will crash.
H. Peter Anvin [Wed, 6 Apr 2016 00:01:33 +0000 (17:01 -0700)]
x86, build: copy ldlinux.c32 to image.iso
For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.
This three part patchset fixes bugs in synchronization between
rds_tcp_accept_one() and the rds-tcp send/recv path.
Patch 1 ensures that the lock_sock() is taken appropriately
and the RDS datagram reassembly state is reset to synchronize
with the receive path.
Patch 2 ensures that partially sent RDS datagrams will get
retransmitted after rds_tcp_accept_one() switches sockets.
Patch 3 fixes a race window which would prematurely re-enable
rds_send_xmit() before the rds_tcp_connection setup has been
completed in rds_tcp_accept_one().
====================
RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()
The send path needs to be quiesced before resetting callbacks from
rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
this using the c_state and RDS_IN_XMIT bit following the pattern
used by rds_conn_shutdown(). However this leaves the possibility
of a race window as shown in the sequence below
take t_conn_lock in rds_tcp_conn_connect
send outgoing syn to peer
drop t_conn_lock in rds_tcp_conn_connect
incoming from peer triggers rds_tcp_accept_one, conn is
marked CONNECTING
wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
call rds_tcp_reset_callbacks
[.. race-window where incoming syn-ack can cause the conn
to be marked UP from rds_tcp_state_change ..]
lock_sock called from rds_tcp_reset_callbacks, and we set
t_sock to null
As soon as the conn is marked UP in the race-window above, rds_send_xmit()
threads will proceed to rds_tcp_xmit and may encounter a null-pointer
deref on the t_sock.
Given that rds_tcp_state_change() is invoked in softirq context, whereas
rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
after lock_sock could result in a deadlock with tcp_sendmsg, this
commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks
When we switch a connection's sockets in rds_tcp_rest_callbacks,
any partially sent datagram must be retransmitted on the new
socket so that the receiver can correctly reassmble the RDS
datagram. Use rds_send_reset() which is designed for this purpose.
RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely
When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed. Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.
This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.
Eric Dumazet [Sat, 4 Jun 2016 19:55:13 +0000 (12:55 -0700)]
fq_codel: fix NET_XMIT_CN behavior
My prior attempt to fix the backlogs of parents failed.
If we return NET_XMIT_CN, our parents wont increase their backlog,
so our qdisc_tree_reduce_backlog() should take this into account.
v2: Florian Westphal pointed out that we could drop the packet,
so we need to save qdisc_pkt_len(skb) in a temp variable before
calling fq_codel_drop()
Daniel Borkmann [Sat, 4 Jun 2016 18:50:59 +0000 (20:50 +0200)]
bpf, trace: use READ_ONCE for retrieving file ptr
In bpf_perf_event_read() and bpf_perf_event_output(), we must use
READ_ONCE() for fetching the struct file pointer, which could get
updated concurrently, so we must prevent the compiler from potential
refetching.
We already do this with tail calls for fetching the related bpf_prog,
but not so on stored perf events. Semantics for both are the same
with regards to updates.
Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Tue, 7 Jun 2016 17:04:35 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric Biederman:
"This contains two small but significant fixes to fs/namespace.c.
The first adds a filesystem refcount drop on error. The second
corrects a test in fs_fully_visible which could be abused to allow
mounting of proc or sysfs, when that should not be allowed.
To keep myself honest I have tested to ensure the incorrect test in
fs_fully_visible actually allows improper mounting of proc before the
fix and that when fixed the improper mounting is not allowed"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mnt: fs_fully_visible test the proper mount for MNT_LOCKED
mnt: If fs_fully_visible fails call put_filesystem.
Erez Shitrit [Sat, 4 Jun 2016 12:15:19 +0000 (15:15 +0300)]
IB/IPoIB: Don't update neigh validity for unresolved entries
ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send. This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it. If the queue for this neighbor is full, then don't update the
alive timestamp. That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.
Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
Achiad Shochat [Sat, 4 Jun 2016 12:15:37 +0000 (15:15 +0300)]
IB/mlx5: Fix alternate path code
Userspace flag IBV_QP_ALT_PATH is supposed to set the alternate path
including fields alt_pkey_index and alt_timeout.
Added IB_QP_PKEY_INDEX and IB_QP_TIMEOUT to the attribute mask when
calling mlx5_set_path for the alternate path to force setting the
alt_pkey_index and alt_timeout values.
Noa Osherovich [Sat, 4 Jun 2016 12:15:36 +0000 (15:15 +0300)]
IB/mlx5: Fix pkey_index length in the QP path record
Pkey index fields in the QP context path record are extended to 16
bits, as required by IB spec (version 1.3).
This change affects all QP commands which include path records.
To enable this change, moved the free adaptive routing flag bit
(free_ar) to the most significant byte of the QP path record.
Noa Osherovich [Sat, 4 Jun 2016 12:15:31 +0000 (15:15 +0300)]
IB/mlx5: Limit query HCA clock
When PAGE_SIZE is larger than 4K, the user shouldn't be able to query
the HCA core clock. This counter is within 4KB boundary and the
user-space shall not read information that's after this boundary.