drm/amdgpu: add pipeline sync while vmid switch in same ctx
Since vmid-mgr supports vmid sharing in one vm, the same ctx could
get different vmids for two emits without vm flush, vm_flush could
be done in another ring.
Prasun Maiti [Mon, 6 Jun 2016 14:34:19 +0000 (20:04 +0530)]
wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
iwpriv app uses iw_point structure to send data to Kernel. The iw_point
structure holds a pointer. For compatibility Kernel converts the pointer
as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers
may use iw_handler_def.private_args to populate iwpriv commands instead
of iw_handler_def.private. For those case, the IOCTLs from
SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl().
Accordingly when the filled up iw_point structure comes from 32 bit
iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends
it to driver. So, the driver may get the invalid data.
The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
Johannes Berg [Thu, 9 Jun 2016 07:40:55 +0000 (09:40 +0200)]
cfg80211: remove get/set antenna and tx power warnings
Since set_tx_power and set_antenna are frequently implemented
without the matching get_tx_power/get_antenna, we shouldn't
have added warnings for those. Remove them.
The remaining ones are correct and need to be implemented
symmetrically for correct operation.
Cc: [email protected] Fixes: de3bb771f471 ("cfg80211: add more warnings for inconsistent ops") Signed-off-by: Johannes Berg <[email protected]>
Shweta Choudaha [Wed, 8 Jun 2016 19:15:43 +0000 (20:15 +0100)]
ip6gre: Allow live link address change
The ip6 GRE tap device should not be forced to down state to change
the mac address and should allow live address change for tap device
similar to ipv4 gre.
These are incremental changes from v1 of cls_u32 fixes.
First patch is reposted in its entirety, patch 2 is an
incremental change from patch 2 of the original series.
====================
Jakub Kicinski [Wed, 8 Jun 2016 19:11:04 +0000 (20:11 +0100)]
net: cls_u32: be more strict about skip-sw flag for knodes
Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).
This patch fixes the knode handling.
Dave Airlie [Thu, 9 Jun 2016 02:32:09 +0000 (12:32 +1000)]
Merge tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux into drm-fixes
This pull request brings in vblank/pageflip fixes I had hoped to see
merged before 4.7rc1, plus two new fixes that have come in since then.
* tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux:
drm/vc4: Make pageflip completion handling more robust.
drm/vc4: Fix ioctl permissions for render nodes.
drm/vc4: Return -EBUSY if there's already a pending flip event.
drm/vc4: Fix drm_vblank_put/get imbalance in page flip path.
drm/vc4: Fix get_vblank_counter with proper no-op for Linux 4.4+
Dave Airlie [Thu, 9 Jun 2016 02:30:29 +0000 (12:30 +1000)]
Merge branch 'linux-4.7' of git://github.com/skeggsb/linux into drm-fixes
Fixes for two issues reported by KASAN, a display engine hang due to
incorrect BIOS table parsing, and incorrect LTC interrupt handling on
Maxwell which could lead to a never-ending interrupt storm.
* 'linux-4.7' of git://github.com/skeggsb/linux:
drm/nouveau/disp/sor/gm107: training pattern registers are like gm200
drm/nouveau/disp/sor/gf119: both links use the same training register
drm/nouveau/core: swap the order of imem/fb
drm/nouveau/fbcon: fix out-of-bounds memory accesses
drm/nouveau/gr/gf100-: update sm error decoding from gk20a nvgpu headers
drm/nouveau/ltc/gm107-: fix typo in the address of NV_PLTCG_LTC0_LTS0_INTR
drm/nouveau/bios/disp: fix handling of "match any protocol" entries
Stefan Agner [Fri, 3 Jun 2016 21:21:34 +0000 (14:21 -0700)]
drm/fsl-dcu: use flat regmap cache
Using flat regmap cache instead of RB-tree to avoid the following
lockdep warning on driver load:
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x15c/0x160()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
The RB-tree regmap cache needs to allocate new space on first
writes. However, allocations in an atomic context (e.g. when a
spinlock is held) are not allowed. The function regmap_write
calls map->lock, which acquires a spinlock in the fast_io case.
Since the FSL DCU driver uses MMIO, the regmap bus of type
regmap_mmio is being used which has fast_io set to true.
Use flat regmap cache and specify max register to be large
enouth to cover all registers available in LS1021a and Vybrids
register space.
Eric Dumazet [Wed, 8 Jun 2016 13:19:45 +0000 (06:19 -0700)]
net_sched: add missing paddattr description
"make htmldocs" complains otherwise:
.//net/core/gen_stats.c:65: warning: No description found for parameter 'padattr'
.//net/core/gen_stats.c:101: warning: No description found for parameter 'padattr'
Jakub Sitnicki [Wed, 8 Jun 2016 13:13:34 +0000 (15:13 +0200)]
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.
If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from on an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.
To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:
1) Set up a transformation and lower the MTU to trigger fragmentation
# ip xfrm policy add dir out src ::1 dst ::1 \
tmpl src ::1 dst ::1 proto esp spi 1
# ip xfrm state add src ::1 dst ::1 \
proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
# ip link set dev lo mtu 1500
2) Monitor the packet flow and set up an UDP sink
# tcpdump -ni lo -ttt &
# socat udp6-listen:12345,fork /dev/null &
4) Compare it to a non-connected socket
# perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
What happens in step (3) is:
1) when connecting the socket in __ip6_datagram_connect(), we
perform an XFRM lookup, miss the flow cache, create an XFRM
bundle, and cache the destination,
2) afterwards, when sending the datagram, we perform an XFRM lookup,
again, miss the flow cache (due to mismatch of flowi6_iif and
flowi6_oif, which is an issue of its own), and recreate an XFRM
bundle based on the cached (and already transformed) destination.
To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.
The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
twice.
Guillaume Nault [Wed, 8 Jun 2016 10:59:17 +0000 (12:59 +0200)]
l2tp: fix configuration passed to setup_udp_tunnel_sock()
Unused fields of udp_cfg must be all zeros. Otherwise
setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
callbacks with garbage, eventually resulting in panic when used by
udp_gro_receive().
v2: use empty initialiser instead of "{ NULL }" to avoid relying on
first field's type.
Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Bob Liu [Tue, 31 May 2016 08:59:17 +0000 (16:59 +0800)]
xen-blkfront: fix resume issues after a migration
After a migrate to another host (which may not have multiqueue
support), the number of rings (block hardware queues)
may be changed and the ring info structure will also be reallocated.
This patch fixes two related bugs:
* call blk_mq_update_nr_hw_queues() to make blk-core know the number
of hardware queues have been changed.
* Don't store rinfo pointer to hctx->driver_data, because rinfo may be
reallocated so use hctx->queue_num to get the rinfo structure instead.
Bob Liu [Tue, 7 Jun 2016 14:43:15 +0000 (10:43 -0400)]
xen-blkfront: don't call talk_to_blkback when already connected to blkback
Sometimes blkfront may twice receive blkback_changed() notification
(XenbusStateConnected) after migration, which will cause
talk_to_blkback() to be called twice too and confuse xen-blkback.
The flow is as follow:
blkfront blkback
blkfront_resume()
> talk_to_blkback()
> Set blkfront to XenbusStateInitialised
front changed()
> Connect()
> Set blkback to XenbusStateConnected
blkback_changed()
> Skip talk_to_blkback()
because frontstate == XenbusStateInitialised
> blkfront_connect()
> Set blkfront to XenbusStateConnected
-----
And here we get another XenbusStateConnected notification leading
to:
-----
blkback_changed()
> because now frontstate != XenbusStateInitialised
talk_to_blkback() is also called again
> blkfront state changed from
XenbusStateConnected to XenbusStateInitialised
(Which is not correct!)
front_changed():
> Do nothing because blkback
already in XenbusStateConnected
Now blkback is in XenbusStateConnected but blkfront is still
in XenbusStateInitialised - leading to no disks.
Poking of the XenbusStateConnected state is allowed (to deal with
block disk change) and has to be dealt with. The most likely
cause of this bug are custom udev scripts hooking up the disks
and then validating the size.
Mel Gorman [Wed, 8 Jun 2016 13:25:22 +0000 (14:25 +0100)]
futex: Calculate the futex key based on a tail page for file-based futexes
Mike Galbraith reported that the LTP test case futex_wake04 was broken
by commit 65d8fc777f6d ("futex: Remove requirement for lock_page()
in get_futex_key()").
This test case uses futexes backed by hugetlbfs pages and so there is an
associated inode with a futex stored on such pages. The problem is that
the key is being calculated based on the head page index of the hugetlbfs
page and not the tail page.
Prior to the optimisation, the page lock was used to stabilise mappings and
pin the inode is file-backed which is overkill. If the page was a compound
page, the head page was automatically looked up as part of the page lock
operation but the tail page index was used to calculate the futex key.
After the optimisation, the compound head is looked up early and the page
lock is only relied upon to identify truncated pages, special pages or a
shmem page moving to swapcache. The head page is looked up because without
the page lock, special care has to be taken to pin the inode correctly.
However, the tail page is still required to calculate the futex key so
this patch records the tail page.
On vanilla 4.6, the output of the test case is;
futex_wake04 0 TINFO : Hugepagesize 2097152
futex_wake04 1 TFAIL : futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.
Josef Bacik [Wed, 8 Jun 2016 14:32:10 +0000 (10:32 -0400)]
nbd: pass the nbd pointer for flags debugfs
We were passing in &nbd for the private data in debugfs_create_file() for the
flags entry. We expect it to just be nbd, fix this so we get proper output from
this debugfs entry.
drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_send_msg()+0x107: duplicate frame pointer save
drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_host_get_guestinfo()+0x252: duplicate frame pointer save
To quote Linus:
"The reason is that VMW_PORT_HB_OUT() uses a magic instruction sequence
(a "rep outsb") to communicate with the hypervisor (it's a virtual GPU
driver for vmware), and %rbp is part of the communication. So the
inline asm does a save-and-restore of the frame pointer around the
instruction sequence.
I actually find the objtool warning to be quite reasonable, so it's
not exactly a false positive, since in this case it actually does
point out that the frame pointer won't be reliable over that
instruction sequence.
But in this particular case it just ends up being the wrong thing -
the code is what it is, and %rbp just can't have the frame information
due to annoying magic calling conventions."
Silence the warnings by telling objtool to ignore the two functions
which use the VMW_PORT_HB_{IN,OUT} macros.
Robin Murphy [Tue, 7 Jun 2016 17:44:48 +0000 (18:44 +0100)]
drivers: of: Fix of_pci.h header guard
The compilation of of_pci.c is governed by CONFIG_OF_PCI, but the
corresponding declarations in of_pci.h are inconsistently guarded by
CONFIG_OF, with the result that if CONFIG_PCI is disabled for an OF
platform, the dangling external declarations are still active and the
inline stub definitions not. So far this has managed to go unnoticed
since it happens that the only references to these functions are from
code which itself depends on CONFIG_PCI or CONFIG_OF_PCI.
Fix this with the appropriate config guard so that any new callers
outside PCI-specific code don't start unexpectedly breaking under
certain configs.
The problem is that it tries to update the 'sched_schedstats' static key
before jump labels have been initialized.
Changing jump_label_init() to be called earlier before
parse_early_param() wouldn't fix it: it would still fail trying to
poke_text() because mm isn't yet initialized.
Instead, just create a temporary '__sched_schedstats' variable which can
be copied to the static key later during sched_init() after jump labels
have been initialized.
Josh Poimboeuf [Fri, 3 Jun 2016 22:58:40 +0000 (17:58 -0500)]
sched/debug: Fix /proc/sched_debug regression
Commit:
cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
... introduced a bug when CONFIG_SCHEDSTATS is enabled and the
runtime tunable is disabled (which is the default).
The wait-time, sum-exec, and sum-sleep fields are missing from the
/proc/sched_debug file in the runnable_tasks section.
Fix it with a new schedstat_val() macro which returns the field value
when schedstats is enabled and zero otherwise. The macro works with
both SCHEDSTATS and !SCHEDSTATS. I put the macro in stats.h since it
might end up being useful in other places.
There is no way to end up in _free_event() with event::pmu being NULL.
The latter is initialized in event allocation path and remains set
forever. In case of allocation failure, the error path doesn't use
_free_event().
Having the check, however, suggests that it is possible to have a
event::pmu==NULL situation in _free_event() and confuses the robots.
Peter Zijlstra [Wed, 8 Jun 2016 08:19:51 +0000 (10:19 +0200)]
locking/qspinlock: Fix spin_unlock_wait() some more
While this prior commit:
54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.
So while the 2 locks crossed problem:
spin_lock(A) spin_lock(B)
if (!spin_is_locked(B)) spin_unlock_wait(A)
foo() foo();
... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:
flag = 1;
smp_mb(); spin_lock()
spin_unlock_wait() if (!flag)
// add to lockless list
// iterate lockless list
... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and loosing the
guarantee.
This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.
It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.
The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
with what looks like the wrong parameter. The write_lock_regs
function takes a pointer to the registers, not the bcm_kona_gpio
structure.
Fix the warning, and probably bug by changing the function to
pass reg_base instead of kona_gpio, fixing the following warning:
drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
(different address spaces)
expected void [noderef] <asn:2>*reg_base
got struct bcm_kona_gpio *kona_gpio
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:2>*reg_base
got struct bcm_kona_gpio *kona_gpio
Borislav Petkov [Wed, 1 Jun 2016 10:04:28 +0000 (12:04 +0200)]
x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models
We need to reenable the topology extensions CPUID leafs on newer models
too, if BIOS has disabled them, as we rely on them to get proper compute
unit topology.
regulator: qcom_smd: add regulator ops for pm8941 lnldo
After "regulator: qcom_smd: add list_voltage callback" patch adding
pm8941 lnldo regulators would bug on list_voltages as it is a fixed
regulator without any linear range.
This patch fixes that issue by adding dedicated ops for pm8941 lnldo
without list_voltages callback.
Dave Hansen [Fri, 3 Jun 2016 00:19:27 +0000 (17:19 -0700)]
x86/cpu/intel: Introduce macros for Intel family numbers
Problem:
We have a boatload of open-coded family-6 model numbers. Half of
them have these model numbers in hex and the other half in
decimal. This makes grepping for them tons of fun, if you were
to try.
Solution:
Consolidate all the magic numbers. Put all the definitions in
one header.
The names here are closely derived from the comments describing
the models from arch/x86/events/intel/core.c. We could easily
make them shorter by doing things like s/SANDYBRIDGE/SNB/, but
they seemed fine even with the longer versions to me.
Do not take any of these names too literally, like "DESKTOP"
or "MOBILE". These are all colloquial names and not precise
descriptions of everywhere a given model will show up.
Will Deacon [Tue, 7 Jun 2016 16:55:15 +0000 (17:55 +0100)]
arm64: mm: always take dirty state from new pte in ptep_set_access_flags
Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:
| This patch breaks swapping for me.
| In the broken case, you'll see either systemd cpu time spike (because
| it's stuck in a page fault loop) or the system hang (because the
| application owning the screen is stuck in a page fault loop).
It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:
1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
3. A read faults due to the missing access flag
4. ptep_set_access_flags is called with dirty = 0, due to the read fault
5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
6. A write faults, but pte_write is true so we get stuck
The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.
Linus Walleij [Wed, 8 Jun 2016 08:58:20 +0000 (10:58 +0200)]
gpio: include <linux/io-mapping.h> in gpiolib-of
When enabling the gpiolib for all archs a build robot came
up with this:
All errors (new ones prefixed by >>):
drivers/gpio/gpiolib-of.c: In function 'of_mm_gpiochip_add_data':
>> drivers/gpio/gpiolib-of.c:317:2: error: implicit declaration of
function 'iounmap' [-Werror=implicit-function-declaration]
iounmap(mm_gc->regs);
^~~~~~~
cc1: some warnings being treated as errors
Fix this by including <linux/io-mapping.h> explicitly.
Fixes: 296ad4acb8ef ("gpio: remove deps on ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB") Reported-by: kbuild test robot <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
gpiolib relies on the reference counters to clean up the gpio_device
structure.
Although the number of get/put is properly aligned on gpiolib.c
itself, it does not take into consideration how the referece counters
are affected by other external functions such as cdev_add and device_add.
Because of this, after the last call to put_device, the reference counter
has a value of +3, therefore never calling gpiodevice_release.
Due to the fact that some of the device has already been cleaned on
gpiochip_remove, the library will end up OOPsing the kernel (e.g. a call
to of_gpiochip_find_and_xlate).
Helmut Grohne [Fri, 3 Jun 2016 12:15:32 +0000 (14:15 +0200)]
gpio: zynq: initialize clock even without CONFIG_PM
When the PM initialization was moved in the commit referenced below, the
code enabling the clock was removed from the probe function. On
CONFIG_PM=y kernels, this is not a problem as the pm resume hook enables
the clock, but when power management is disabled, all those pm_*
functions are noops and the clock is never enabled resulting in a
dysfunctional gpio controller.
Put the clock initialization back to support CONFIG_PM=n.
Cc: [email protected] Signed-off-by: Helmut Grohne <[email protected]> Fixes: 3773c195d387 ("gpio: zynq: Do PM initialization earlier to support gpio hogs") Signed-off-by: Linus Walleij <[email protected]>
gpio: 104-dio-48e: Fix control port offset computation off-by-one error
There are only two control ports, each controlling three distinct I/O
ports. To compute the control port address offset for a respective I/O
port, the I/O port address offset should be divided by 3; dividing by 2
may result in not only the wrong address offset but possibly also an
out-of-bounds array memory access for a non-existent third control port.
Fixes: 1b06d64f7374 ("gpio: Add GPIO support for the ACCES 104-DIO-48E") Signed-off-by: William Breathitt Gray <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
Ben Dooks [Tue, 7 Jun 2016 18:27:51 +0000 (19:27 +0100)]
net-sysfs: fix missing <linux/of_net.h>
The of_find_net_device_by_node() function is defined in
<linux/of_net.h> but not included in the .c file that
implements it. Fix the following warning by including the
header:
net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static?
Toshiaki Makita [Tue, 7 Jun 2016 10:14:17 +0000 (19:14 +0900)]
bridge: Don't insert unnecessary local fdb entry on changing mac address
The missing br_vlan_should_use() test caused creation of an unneeded
local fdb entry on changing mac address of a bridge device when there is
a vlan which is configured on a bridge port but not on the bridge
device.
Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables") Signed-off-by: Toshiaki Makita <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
In some of the radix TLB flush routines, we use a local to store the
mm->context.id, AKA the PID.
Currently we use an int, but the PID is unsigned long, so large values
of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which
means all our comparisons against that value can never be true.
This means we'll issue TLB flushes when we shouldn't on radix enabled
machines.
Fix it by using an unsigned long for the local. Discovered by Coverity.
Linus Torvalds [Wed, 8 Jun 2016 03:41:36 +0000 (20:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is
4.6, d_walk - 3.2+.
The atomic_open() and coredump ones are regressions from this window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: fix dumping through pipes
fix a regression in atomic_open()
fix d_walk()/non-delayed __d_free() race
autofs braino fix for do_last()
fix EOPENSTALE bug in do_last()
Al Viro [Wed, 8 Jun 2016 01:53:51 +0000 (21:53 -0400)]
fix a regression in atomic_open()
open("/foo/no_such_file", O_RDONLY | O_CREAT) on should fail with
EACCES when /foo is not writable; failing with ENOENT is obviously
wrong. That got broken by a braino introduced when moving the
creat_error logics from atomic_open() to lookup_open(). Easy to
fix, fortunately.
Al Viro [Wed, 8 Jun 2016 01:26:55 +0000 (21:26 -0400)]
fix d_walk()/non-delayed __d_free() race
Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.
Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
When turbo is disabled, the ->set_policy() interface is broken.
For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in 1500000 KHz
This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.
One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition the intel_pstate
sysfs limits, shows percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.
Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw : Subject & changelog, rewrite in fewer lines of code ] Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
Wolfram Sang [Mon, 6 Jun 2016 16:48:38 +0000 (18:48 +0200)]
of: fix autoloading due to broken modalias with no 'compatible'
Because of an improper dereference, a stray 'C' character was output to
the modalias when no 'compatible' was specified. This is the case for
some old PowerMac drivers which only set the 'name' property. Fix it to
let them match again.
powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
The recent commit 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support
to ibm,client-architecture-support call") added a new PVR mask & value
to the start of the ibm_architecture_vec[] array.
However it missed the fact that further down in the array, we hard code
the offset of one of the fields, and then at boot use that value to
patch the value in the array. This means every update to the array must
also update the #define, ugh.
This means that on pseries machines we will misreport to firmware the
number of cores we support, by a factor of threads_per_core.
Fix it for now by updating the #define.
Fixes: 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call") Cc: [email protected] # v4.0+ Signed-off-by: Michael Ellerman <[email protected]>
The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:
1) Fix missing alignment in next offset calculation for standard
targets, introduced in the previous merge window, patch from
Florian Westphal.
2) Fix to correct the handling of outgoing connections which use the
SIP-pe such that the binding of a real-server is updated when needed.
This was an omission from changes introduced by Marco Angaroni in
the previous merge window too, to allow handling of outgoing
connections by the SIP-pe. Patch and report came via Simon Horman.
====================
Daniel Borkmann [Mon, 6 Jun 2016 20:50:39 +0000 (22:50 +0200)]
net: sched: fix tc_should_offload for specific clsact classes
When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.
Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support") Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support") Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
WANG Cong [Mon, 6 Jun 2016 16:54:30 +0000 (09:54 -0700)]
act_police: fix a crash during removal
The police action is using its own code to initialize tcf hash
info, which makes us to forgot to initialize a->hinfo correctly.
Fix this by calling the helper function tcf_hash_create() directly.
Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Mon, 6 Jun 2016 16:12:39 +0000 (09:12 -0700)]
fq_codel: return non zero qlen in class dumps
We properly scan the flow list to count number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.
This set fixes two small issues with error codes I noticed
in cls_u32. Second patch could be viewed as user space API
change but that portion of API is not part of any release,
yet.
Colin Ian King [Mon, 6 Jun 2016 15:08:41 +0000 (16:08 +0100)]
gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Fix clang build warning:
./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is
used as a header guard here, followed by #define of a different
macro [-Wheader-guard]
fix by defining _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Linus Torvalds [Tue, 7 Jun 2016 23:24:44 +0000 (16:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"This finally removes the CLK_IS_ROOT flag by picking up the last few
stragglers that didn't get merged by anyone this time around.
Better to do it now than wait for another one to pop up. There's also
a minor maintainers update and a Kconfig fix"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: nxp: Select MFD_SYSCON for creg driver
MAINTAINERS: Add file patterns for clock device tree bindings
clk: Remove CLK_IS_ROOT flag
clk: microchip: Remove CLK_IS_ROOT
powerpc/512x: clk: Remove CLK_IS_ROOT
vexpress/spc: Remove CLK_IS_ROOT
Michael Chan [Mon, 6 Jun 2016 06:37:16 +0000 (02:37 -0400)]
bnxt_en: Simplify VLAN receive logic.
Since both CTAG and STAG rx acceleration must be enabled together, we
only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before
calling __vlan_hwaccel_put_tag().
Michael Chan [Mon, 6 Jun 2016 06:37:15 +0000 (02:37 -0400)]
bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.
The hardware can only be set to strip or not strip both the VLAN CTAG and
STAG. It cannot strip one and not strip the other. Add logic to
bnxt_fix_features() to toggle both feature flags when the user is toggling
one of them.
Michael Chan [Mon, 6 Jun 2016 06:37:14 +0000 (02:37 -0400)]
bnxt_en: Fix tx push race condition.
Set the is_push flag in the software BD before the tx data is pushed to
the chip. It is possible to get the tx interrupt as soon as the tx data
is pushed. The tx handler will not handle the event properly if the
is_push flag is not set and it will crash.
H. Peter Anvin [Wed, 6 Apr 2016 00:01:33 +0000 (17:01 -0700)]
x86, build: copy ldlinux.c32 to image.iso
For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.
This three part patchset fixes bugs in synchronization between
rds_tcp_accept_one() and the rds-tcp send/recv path.
Patch 1 ensures that the lock_sock() is taken appropriately
and the RDS datagram reassembly state is reset to synchronize
with the receive path.
Patch 2 ensures that partially sent RDS datagrams will get
retransmitted after rds_tcp_accept_one() switches sockets.
Patch 3 fixes a race window which would prematurely re-enable
rds_send_xmit() before the rds_tcp_connection setup has been
completed in rds_tcp_accept_one().
====================
RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()
The send path needs to be quiesced before resetting callbacks from
rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
this using the c_state and RDS_IN_XMIT bit following the pattern
used by rds_conn_shutdown(). However this leaves the possibility
of a race window as shown in the sequence below
take t_conn_lock in rds_tcp_conn_connect
send outgoing syn to peer
drop t_conn_lock in rds_tcp_conn_connect
incoming from peer triggers rds_tcp_accept_one, conn is
marked CONNECTING
wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
call rds_tcp_reset_callbacks
[.. race-window where incoming syn-ack can cause the conn
to be marked UP from rds_tcp_state_change ..]
lock_sock called from rds_tcp_reset_callbacks, and we set
t_sock to null
As soon as the conn is marked UP in the race-window above, rds_send_xmit()
threads will proceed to rds_tcp_xmit and may encounter a null-pointer
deref on the t_sock.
Given that rds_tcp_state_change() is invoked in softirq context, whereas
rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
after lock_sock could result in a deadlock with tcp_sendmsg, this
commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks
When we switch a connection's sockets in rds_tcp_rest_callbacks,
any partially sent datagram must be retransmitted on the new
socket so that the receiver can correctly reassmble the RDS
datagram. Use rds_send_reset() which is designed for this purpose.
RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely
When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed. Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.
This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.
Eric Dumazet [Sat, 4 Jun 2016 19:55:13 +0000 (12:55 -0700)]
fq_codel: fix NET_XMIT_CN behavior
My prior attempt to fix the backlogs of parents failed.
If we return NET_XMIT_CN, our parents wont increase their backlog,
so our qdisc_tree_reduce_backlog() should take this into account.
v2: Florian Westphal pointed out that we could drop the packet,
so we need to save qdisc_pkt_len(skb) in a temp variable before
calling fq_codel_drop()
Daniel Borkmann [Sat, 4 Jun 2016 18:50:59 +0000 (20:50 +0200)]
bpf, trace: use READ_ONCE for retrieving file ptr
In bpf_perf_event_read() and bpf_perf_event_output(), we must use
READ_ONCE() for fetching the struct file pointer, which could get
updated concurrently, so we must prevent the compiler from potential
refetching.
We already do this with tail calls for fetching the related bpf_prog,
but not so on stored perf events. Semantics for both are the same
with regards to updates.
Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Tue, 7 Jun 2016 17:04:35 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric Biederman:
"This contains two small but significant fixes to fs/namespace.c.
The first adds a filesystem refcount drop on error. The second
corrects a test in fs_fully_visible which could be abused to allow
mounting of proc or sysfs, when that should not be allowed.
To keep myself honest I have tested to ensure the incorrect test in
fs_fully_visible actually allows improper mounting of proc before the
fix and that when fixed the improper mounting is not allowed"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
mnt: fs_fully_visible test the proper mount for MNT_LOCKED
mnt: If fs_fully_visible fails call put_filesystem.
Erez Shitrit [Sat, 4 Jun 2016 12:15:19 +0000 (15:15 +0300)]
IB/IPoIB: Don't update neigh validity for unresolved entries
ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send. This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it. If the queue for this neighbor is full, then don't update the
alive timestamp. That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.
Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
Achiad Shochat [Sat, 4 Jun 2016 12:15:37 +0000 (15:15 +0300)]
IB/mlx5: Fix alternate path code
Userspace flag IBV_QP_ALT_PATH is supposed to set the alternate path
including fields alt_pkey_index and alt_timeout.
Added IB_QP_PKEY_INDEX and IB_QP_TIMEOUT to the attribute mask when
calling mlx5_set_path for the alternate path to force setting the
alt_pkey_index and alt_timeout values.
Noa Osherovich [Sat, 4 Jun 2016 12:15:36 +0000 (15:15 +0300)]
IB/mlx5: Fix pkey_index length in the QP path record
Pkey index fields in the QP context path record are extended to 16
bits, as required by IB spec (version 1.3).
This change affects all QP commands which include path records.
To enable this change, moved the free adaptive routing flag bit
(free_ar) to the most significant byte of the QP path record.