Git Repo - linux.git/log

net: cls_u32: be more strict about skip-sw flag for knodes

Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).
This patch fixes the knode handling.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cls_u32: catch all hardware offload errors

Errors reported by u32_replace_hw_hnode() were not propagated.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Sridhar Samudrala <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

scsi: fix race between simultaneous decrements of ->host_failed

sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.

It will lead to permanently inequality between ->host_failed and
->host_busy, and scsi error handler thread won't start running. IO
errors after that won't be handled.

Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
the strategy handler to fix this race.

Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh")
Cc: [email protected]
Signed-off-by: Wei Fang <[email protected]>
Reviewed-by: James Bottomley <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

Merge tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux into drm-fixes

This pull request brings in vblank/pageflip fixes I had hoped to see
merged before 4.7rc1, plus two new fixes that have come in since then.

* tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux:
  drm/vc4: Make pageflip completion handling more robust.
  drm/vc4: Fix ioctl permissions for render nodes.
  drm/vc4: Return -EBUSY if there's already a pending flip event.
  drm/vc4: Fix drm_vblank_put/get imbalance in page flip path.
  drm/vc4: Fix get_vblank_counter with proper no-op for Linux 4.4+

drm/omap: fix unused variable warning in dsi & hdmi

Signed-off-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

Merge branch 'linux-4.7' of git://github.com/skeggsb/linux into drm-fixes

Fixes for two issues reported by KASAN, a display engine hang due to
incorrect BIOS table parsing, and incorrect LTC interrupt handling on
Maxwell which could lead to a never-ending interrupt storm.

* 'linux-4.7' of git://github.com/skeggsb/linux:
  drm/nouveau/disp/sor/gm107: training pattern registers are like gm200
  drm/nouveau/disp/sor/gf119: both links use the same training register
  drm/nouveau/core: swap the order of imem/fb
  drm/nouveau/fbcon: fix out-of-bounds memory accesses
  drm/nouveau/gr/gf100-: update sm error decoding from gk20a nvgpu headers
  drm/nouveau/ltc/gm107-: fix typo in the address of NV_PLTCG_LTC0_LTS0_INTR
  drm/nouveau/bios/disp: fix handling of "match any protocol" entries

drm/fsl-dcu: use flat regmap cache

Using flat regmap cache instead of RB-tree to avoid the following
lockdep warning on driver load:
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x15c/0x160()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))

The RB-tree regmap cache needs to allocate new space on first
writes. However, allocations in an atomic context (e.g. when a
spinlock is held) are not allowed. The function regmap_write
calls map->lock, which acquires a spinlock in the fast_io case.
Since the FSL DCU driver uses MMIO, the regmap bus of type
regmap_mmio is being used which has fast_io set to true.

Use flat regmap cache and specify max register to be large
enouth to cover all registers available in LS1021a and Vybrids
register space.

Signed-off-by: Stefan Agner <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: [email protected]

Merge branch 'misc-fixes-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7

Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7

platform/x86: Drop duplicate dependencies on X86

The whole menu depends on X86 so there is no point in repeating this
dependency on individual driver entries.

Signed-off-by: Jean Delvare <[email protected]>
Signed-off-by: Darren Hart <[email protected]>

thinkpad_acpi: Add support for HKEY version 0x200

Lenovo Thinkpad devices T460, T460s, T460p, T560, X260 use
HKEY version 0x200 without adaptive keyboard.

HKEY version 0x200 has method MHKA with one parameter value.
Passing parameter value 1 will get hotkey_all_mask (the same like
HKEY version 0x100 without parameter). Passing parameter value 2 to
MHKA method will retrieve hotkey_all_adaptive_mask. If 0 is returned in
that case there is no adaptive keyboard available.

Signed-off-by: Dennis Wassenberg <[email protected]>
Signed-off-by: Lyude <[email protected]>
Tested-by: Lyude <[email protected]>
Tested-by: Marco Trevisan <[email protected]>
Acked-by: Henrique de Moraes Holschuh <[email protected]>
[dvhart: Keep MHKA error string on one line in new and existing pr_err calls]
Signed-off-by: Darren Hart <[email protected]>

ideapad_laptop: Add an event for mic mute hotkey

Newer ideapads support a new mic hotkey implemented via an ACPI
interface. This patch converts the mic mute event to a keycode
KEY_MICMUTE.

Signed-off-by: Alex Hung <[email protected]>
Acked-by: Ike Panhc <[email protected]>
Signed-off-by: Darren Hart <[email protected]>

ARM: dts: socfpga: Add missing PHY phandle

Add missing PHY phandle into the DT, otherwise the stmmac code won't
detect the PHY correctly anymore.

Signed-off-by: Marek Vasut <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Signed-off-by: Dinh Nguyen <[email protected]>

sfc: report supported link speeds on SFP connections

7000-series SFC NICs connected with an SFP+ module currently fail to
report any supported link speeds.

Reported-by: Jarod Wilson <[email protected]>
Signed-off-by: Bert Kenward <[email protected]>
Reviewed-by: Jarod Wilson <[email protected]>
Tested-by: Jarod Wilson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_sched: add missing paddattr description

"make htmldocs" complains otherwise:

.//net/core/gen_stats.c:65: warning: No description found for parameter 'padattr'
.//net/core/gen_stats.c:101: warning: No description found for parameter 'padattr'

Fixes: 9854518ea04d ("sched: align nlattr properly when needed")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Skip XFRM lookup if dst_entry in socket cache is valid

At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.

If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from on an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.

To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:

  1) Set up a transformation and lower the MTU to trigger fragmentation
    # ip xfrm policy add dir out src ::1 dst ::1 \
      tmpl src ::1 dst ::1 proto esp spi 1
    # ip xfrm state add src ::1 dst ::1 \
      proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
    # ip link set dev lo mtu 1500

  2) Monitor the packet flow and set up an UDP sink
    # tcpdump -ni lo -ttt &
    # socat udp6-listen:12345,fork /dev/null &

  3) Send a datagram that needs fragmentation with a connected socket
    # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
    2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
    00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
    00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
    00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
    (^ ICMPv6 Parameter Problem)
    00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136

  4) Compare it to a non-connected socket
    # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
    00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
    00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)

What happens in step (3) is:

  1) when connecting the socket in __ip6_datagram_connect(), we
     perform an XFRM lookup, miss the flow cache, create an XFRM
     bundle, and cache the destination,

  2) afterwards, when sending the datagram, we perform an XFRM lookup,
     again, miss the flow cache (due to mismatch of flowi6_iif and
     flowi6_oif, which is an issue of its own), and recreate an XFRM
     bundle based on the cached (and already transformed) destination.

To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.

The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
twice.

Joint work with Hannes Frederic Sowa.

Reported-by: Jan Tluka <[email protected]>
Signed-off-by: Jakub Sitnicki <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: fix configuration passed to setup_udp_tunnel_sock()

Unused fields of udp_cfg must be all zeros. Otherwise
setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
callbacks with garbage, eventually resulting in panic when used by
udp_gro_receive().

[   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
[   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
[   72.696530] Oops: 0011 [#1] SMP KASAN
[   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\
essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
[   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
[   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
[   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
[   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
[   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
[   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
[   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
[   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
[   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
[   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
[   72.696530] Stack:
[   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
[   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
[   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
[   72.696530] Call Trace:
[   72.696530]  <IRQ>
[   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
[   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
[   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
[   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
[   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
[   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
[   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
[   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
[   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
[   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
[   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
[   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
[   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
[   72.696530]  <EOI>
[   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
[   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
[   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
[   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
[   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
[   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
[   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
[   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
[   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530]  RSP <ffff880035f87bc0>
[   72.696530] CR2: ffff880033f87d78
[   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
[   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
[   72.696530] Kernel Offset: disabled
[   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

v2: use empty initialiser instead of "{ NULL }" to avoid relying on
    first field's type.

Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

xen-blkfront: fix resume issues after a migration

After a migrate to another host (which may not have multiqueue
support), the number of rings (block hardware queues)
may be changed and the ring info structure will also be reallocated.

This patch fixes two related bugs:
* call blk_mq_update_nr_hw_queues() to make blk-core know the number
of hardware queues have been changed.
* Don't store rinfo pointer to hctx->driver_data, because rinfo may be
reallocated so use hctx->queue_num to get the rinfo structure instead.

Signed-off-by: Bob Liu <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

xen-blkfront: don't call talk_to_blkback when already connected to blkback

Sometimes blkfront may twice receive blkback_changed() notification
(XenbusStateConnected) after migration, which will cause
talk_to_blkback() to be called twice too and confuse xen-blkback.

The flow is as follow:
   blkfront                                        blkback
blkfront_resume()
> talk_to_blkback()
  > Set blkfront to XenbusStateInitialised
                                                front changed()
                                                 > Connect()
                                                  > Set blkback to XenbusStateConnected

blkback_changed()
> Skip talk_to_blkback()
   because frontstate == XenbusStateInitialised
> blkfront_connect()
  > Set blkfront to XenbusStateConnected

-----
And here we get another XenbusStateConnected notification leading
to:
-----
blkback_changed()
> because now frontstate != XenbusStateInitialised
   talk_to_blkback() is also called again
  > blkfront state changed from
  XenbusStateConnected to XenbusStateInitialised
    (Which is not correct!)

front_changed():
                                                 > Do nothing because blkback
                                                   already in XenbusStateConnected

Now blkback is in XenbusStateConnected but blkfront is still
in XenbusStateInitialised - leading to no disks.

Poking of the XenbusStateConnected state is allowed (to deal with
block disk change) and has to be dealt with. The most likely
cause of this bug are custom udev scripts hooking up the disks
and then validating the size.

Signed-off-by: Bob Liu <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

futex: Calculate the futex key based on a tail page for file-based futexes

Mike Galbraith reported that the LTP test case futex_wake04 was broken
by commit 65d8fc777f6d ("futex: Remove requirement for lock_page()
in get_futex_key()").

This test case uses futexes backed by hugetlbfs pages and so there is an
associated inode with a futex stored on such pages. The problem is that
the key is being calculated based on the head page index of the hugetlbfs
page and not the tail page.

Prior to the optimisation, the page lock was used to stabilise mappings and
pin the inode is file-backed which is overkill. If the page was a compound
page, the head page was automatically looked up as part of the page lock
operation but the tail page index was used to calculate the futex key.

After the optimisation, the compound head is looked up early and the page
lock is only relied upon to identify truncated pages, special pages or a
shmem page moving to swapcache. The head page is looked up because without
the page lock, special care has to be taken to pin the inode correctly.
However, the tail page is still required to calculate the futex key so
this patch records the tail page.

On vanilla 4.6, the output of the test case is;

futex_wake04    0  TINFO  :  Hugepagesize 2097152
futex_wake04    1  TFAIL  :  futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs.

With the patch applied

futex_wake04    0  TINFO  :  Hugepagesize 2097152
futex_wake04    1  TPASS  :  Hi hydra, thread2 awake!

Fixes: 65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()"
Reported-and-tested-by: Mike Galbraith <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

cxgb4: Add device id of T540-BT adapter

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nbd: pass the nbd pointer for flags debugfs

We were passing in &nbd for the private data in debugfs_create_file() for the
flags entry. We expect it to just be nbd, fix this so we get proper output from
this debugfs entry.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

objtool, drm/vmwgfx: Fix "duplicate frame pointer save" warning

objtool reports the following warnings:

  drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_send_msg()+0x107: duplicate frame pointer save
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: warning: objtool: vmw_host_get_guestinfo()+0x252: duplicate frame pointer save

To quote Linus:

"The reason is that VMW_PORT_HB_OUT() uses a magic instruction sequence
  (a "rep outsb") to communicate with the hypervisor (it's a virtual GPU
  driver for vmware), and %rbp is part of the communication. So the
  inline asm does a save-and-restore of the frame pointer around the
  instruction sequence.

  I actually find the objtool warning to be quite reasonable, so it's
  not exactly a false positive, since in this case it actually does
  point out that the frame pointer won't be reliable over that
  instruction sequence.

  But in this particular case it just ends up being the wrong thing -
  the code is what it is, and %rbp just can't have the frame information
  due to annoying magic calling conventions."

Silence the warnings by telling objtool to ignore the two functions
which use the VMW_PORT_HB_{IN,OUT} macros.

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: DRI <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/20160526184343.fdtjjjg67smmeekt@treble
Signed-off-by: Ingo Molnar <[email protected]>

drivers: of: Fix of_pci.h header guard

The compilation of of_pci.c is governed by CONFIG_OF_PCI, but the
corresponding declarations in of_pci.h are inconsistently guarded by
CONFIG_OF, with the result that if CONFIG_PCI is disabled for an OF
platform, the dangling external declarations are still active and the
inline stub definitions not. So far this has managed to go unnoticed
since it happens that the only references to these functions are from
code which itself depends on CONFIG_PCI or CONFIG_OF_PCI.

Fix this with the appropriate config guard so that any new callers
outside PCI-specific code don't start unexpectedly breaking under
certain configs.

Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: Add vendor prefix for TechNexion

TechNexion designs and manufactures embedded computing systems:
http://www.technexion.com/

Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

sched/debug: Fix 'schedstats=enable' cmdline option

The 'schedstats=enable' option doesn't work, and also produces the
following warning during boot:

  WARNING: CPU: 0 PID: 0 at /home/jpoimboe/git/linux/kernel/jump_label.c:61 static_key_slow_inc+0x8c/0xa0
  static_key_slow_inc used before call to jump_label_init
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc1+ #25
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   0000000000000086 3ae3475a4bea95d4 ffffffff81e03da8 ffffffff8143fc83
   ffffffff81e03df8 0000000000000000 ffffffff81e03de8 ffffffff810b1ffb
   0000003d00000096 ffffffff823514d0 ffff88007ff197c8 0000000000000000
  Call Trace:
   [<ffffffff8143fc83>] dump_stack+0x85/0xc2
   [<ffffffff810b1ffb>] __warn+0xcb/0xf0
   [<ffffffff810b207f>] warn_slowpath_fmt+0x5f/0x80
   [<ffffffff811e9c0c>] static_key_slow_inc+0x8c/0xa0
   [<ffffffff810e07c6>] static_key_enable+0x16/0x40
   [<ffffffff8216d633>] setup_schedstats+0x29/0x94
   [<ffffffff82148a05>] unknown_bootoption+0x89/0x191
   [<ffffffff810d8617>] parse_args+0x297/0x4b0
   [<ffffffff82148d61>] start_kernel+0x1d8/0x4a9
   [<ffffffff8214897c>] ? set_init_arg+0x55/0x55
   [<ffffffff82148120>] ? early_idt_handler_array+0x120/0x120
   [<ffffffff821482db>] x86_64_start_reservations+0x2f/0x31
   [<ffffffff82148427>] x86_64_start_kernel+0x14a/0x16d

The problem is that it tries to update the 'sched_schedstats' static key
before jump labels have been initialized.

Changing jump_label_init() to be called earlier before
parse_early_param() wouldn't fix it: it would still fail trying to
poke_text() because mm isn't yet initialized.

Instead, just create a temporary '__sched_schedstats' variable which can
be copied to the static key later during sched_init() after jump labels
have been initialized.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/453775fe3433bed65731a583e228ccea806d18cd.1465322027.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

sched/debug: Fix /proc/sched_debug regression

Commit:

  cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")

... introduced a bug when CONFIG_SCHEDSTATS is enabled and the
runtime tunable is disabled (which is the default).

The wait-time, sum-exec, and sum-sleep fields are missing from the
/proc/sched_debug file in the runnable_tasks section.

Fix it with a new schedstat_val() macro which returns the field value
when schedstats is enabled and zero otherwise.  The macro works with
both SCHEDSTATS and !SCHEDSTATS.  I put the macro in stats.h since it
might end up being useful in other places.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/bcda7c2790cf2ccbe586a28c02dd7b6fe7749a2b.1464994423.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

perf/core: Remove a redundant check

There is no way to end up in _free_event() with event::pmu being NULL.
The latter is initialized in event allocation path and remains set
forever. In case of allocation failure, the error path doesn't use
_free_event().

Having the check, however, suggests that it is possible to have a
event::pmu==NULL situation in _free_event() and confuses the robots.

This patch gets rid of the check.

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/1465303455-26032-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <[email protected]>

locking/qspinlock: Fix spin_unlock_wait() some more

While this prior commit:

  54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")

... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.

So while the 2 locks crossed problem:

spin_lock(A) spin_lock(B)
if (!spin_is_locked(B)) spin_unlock_wait(A)
  foo() foo();

... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:

flag = 1;
smp_mb(); spin_lock()
spin_unlock_wait() if (!flag)
  // add to lockless list
// iterate lockless list

... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and loosing the
guarantee.

This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.

It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Giovanni Gherdovich <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Pan Xinhui <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected] # v4.2 and later
Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
Signed-off-by: Ingo Molnar <[email protected]>

gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings

The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
with what looks like the wrong parameter. The write_lock_regs
function takes a pointer to the registers, not the bcm_kona_gpio
structure.

Fix the warning, and probably bug by changing the function to
pass reg_base instead of kona_gpio, fixing the following warning:

drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
  (different address spaces)
  expected void [noderef] <asn:2>*reg_base
  got struct bcm_kona_gpio *kona_gpio
  warning: incorrect type in argument 1 (different address spaces)
  expected void [noderef] <asn:2>*reg_base
  got struct bcm_kona_gpio *kona_gpio

Cc: [email protected]
Signed-off-by: Ben Dooks <[email protected]>
Acked-by: Ray Jui <[email protected]>
Reviewed-by: Markus Mayer <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models

We need to reenable the topology extensions CPUID leafs on newer models
too, if BIOS has disabled them, as we rely on them to get proper compute
unit topology.

Make the printk a once thing, while at it.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rui Huang <[email protected]>
Cc: Sherry Hurwitz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

gpio: select ANON_INODES

The build servers found that gpiolib is using ANON_INODES but
has forgotten to select it. Fix this.

Reported-by: kbuild test robot <[email protected]>
Fixes: 521a2ad6f862 ("gpio: add userspace ABI for GPIO line information")
Signed-off-by: Linus Walleij <[email protected]>

regulator: qcom_smd: add regulator ops for pm8941 lnldo

After "regulator: qcom_smd: add list_voltage callback" patch adding
pm8941 lnldo regulators would bug on list_voltages as it is a fixed
regulator without any linear range.
This patch fixes that issue by adding dedicated ops for pm8941 lnldo
without list_voltages callback.

Signed-off-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Cc: [email protected] # v4.6

regulator: qcom_smd: add list_voltage callback

This patch adds support to list_voltage callback, so that consumers
like mmc core, can get information of supported voltage range.

Without this patch there is no way for mmc core to know this voltage range.

Signed-off-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Cc: [email protected] # v4.6

x86/cpu/intel: Introduce macros for Intel family numbers

Problem:

We have a boatload of open-coded family-6 model numbers.  Half of
them have these model numbers in hex and the other half in
decimal.  This makes grepping for them tons of fun, if you were
to try.

Solution:

Consolidate all the magic numbers.  Put all the definitions in
one header.

The names here are closely derived from the comments describing
the models from arch/x86/events/intel/core.c.  We could easily
make them shorter by doing things like s/SANDYBRIDGE/SNB/, but
they seemed fine even with the longer versions to me.

Do not take any of these names too literally, like "DESKTOP"
or "MOBILE".  These are all colloquial names and not precise
descriptions of everywhere a given model will show up.

Signed-off-by: Dave Hansen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Doug Thompson <[email protected]>
Cc: Eduardo Valentin <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Rajneesh Bhardwaj <[email protected]>
Cc: Souvik Kumar Chakravarty <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Vishwanath Somayaji <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

leds: handle suspend/resume in heartbeat trigger

The following phenomena was observed: when suspending the
system, sometimes the heartbeat LED was left on, glowing and
wasting power while the rest of the system is asleep, also
disturbing power dissapation measures on the odd suspend
cycle when it's left on.

Clearly this is not how we want the heartbeat trigger to
work: it should turn off and leave the LED off during
system suspend.

This removes the heartbeat trigger when preparing suspend and
restores it during resume. The trigger code will make sure all
LEDs are left in OFF state after removing the trigger, and
will re-enable the trigger on all LEDs after resuming.

Cc: [email protected]
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Signed-off-by: Jacek Anaszewski <[email protected]>

leds: core: Fix brightness setting upon hardware blinking enabled

Commit 76931edd54f8 ("leds: fix brightness changing when software blinking
is active") changed the semantics of led_set_brightness() which according
to the documentation should disable blinking upon any brightness setting.
Moreover it made it different for soft blink case, where it was possible
to change blink brightness, and for hardware blink case, where setting
any brightness greater than 0 was ignored.

While the change itself is against the documentation claims, it was driven
also by the fact that timer trigger remained active after turning blinking
off. Fixing that would have required major refactoring in the led-core,
led-class, and led-triggers because of cyclic dependencies.

Finally, it has been decided that allowing for brightness change during
blinking is beneficial as it can be accomplished without disturbing
blink rhythm.

The change in brightness setting semantics will not affect existing
LED class drivers that implement blink_set op thanks to the LED_BLINK_SW
flag introduced by this patch. The flag state will be from now on checked
in led_set_brightness() which will allow to distinguish between software
and hardware blink mode. In the latter case the control will be passed
directly to the drivers which apply their semantics on brightness set,
which is disable the blinking in case of most such drivers. New drivers
will apply new semantics and just change the brightness while hardware
blinking is on, if possible.

The issue was smuggled by subsequent LED core improvements, which modified
the code that originally introduced the problem.

Fixes: f1e80c07416a ("leds: core: Add two new LED_BLINK_ flags")
Signed-off-by: Tony Makkiel <[email protected]>
Signed-off-by: Jacek Anaszewski <[email protected]>

arm64: mm: always take dirty state from new pte in ptep_set_access_flags

Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:

| This patch breaks swapping for me.
| In the broken case, you'll see either systemd cpu time spike (because
| it's stuck in a page fault loop) or the system hang (because the
| application owning the screen is stuck in a page fault loop).

It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:

  1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
  2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
  3. A read faults due to the missing access flag
  4. ptep_set_access_flags is called with dirty = 0, due to the read fault
  5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
  6. A write faults, but pte_write is true so we get stuck

The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.

Cc: <[email protected]> # 4.3+
Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Reviewed-by: Catalin Marinas <[email protected]>
Reported-by: Alexander Graf <[email protected]>
Tested-by: Alexander Graf <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

gpio: include <linux/io-mapping.h> in gpiolib-of

When enabling the gpiolib for all archs a build robot came
up with this:

All errors (new ones prefixed by >>):

   drivers/gpio/gpiolib-of.c: In function 'of_mm_gpiochip_add_data':
>> drivers/gpio/gpiolib-of.c:317:2: error: implicit declaration of
   function 'iounmap' [-Werror=implicit-function-declaration]
     iounmap(mm_gc->regs);
     ^~~~~~~
   cc1: some warnings being treated as errors

Fix this by including <linux/io-mapping.h> explicitly.

Fixes: 296ad4acb8ef ("gpio: remove deps on ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB")
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

gpiolib: Fix unaligned used of reference counters

gpiolib relies on the reference counters to clean up the gpio_device
structure.

Although the number of get/put is properly aligned on gpiolib.c
itself, it does not take into consideration how the referece counters
are affected by other external functions such as cdev_add and device_add.

Because of this, after the last call to put_device, the reference counter
has a value of +3, therefore never calling gpiodevice_release.

Due to the fact that some of the device has already been cleaned on
gpiochip_remove, the library will end up OOPsing the kernel (e.g. a call
to of_gpiochip_find_and_xlate).

Cc: [email protected]
Signed-off-by: Ricardo Ribalda Delgado <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

gpiolib: Fix NULL pointer deference

Under some circumstances, a gpiochip might be half cleaned from the
gpio_device list.

This patch makes sure that the chip pointer is still valid, before
calling the match function.

[  104.088296] BUG: unable to handle kernel NULL pointer dereference at
0000000000000090
[  104.089772] IP: [<ffffffff813d2045>] of_gpiochip_find_and_xlate+0x15/0x80
[  104.128273] Call Trace:
[  104.129802]  [<ffffffff813d2030>] ? of_parse_own_gpio+0x1f0/0x1f0
[  104.131353]  [<ffffffff813cd910>] gpiochip_find+0x60/0x90
[  104.132868]  [<ffffffff813d21ba>] of_get_named_gpiod_flags+0x9a/0x120
...
[  104.141586]  [<ffffffff8163d12b>] gpio_led_probe+0x11b/0x360

Cc: [email protected]
Signed-off-by: Ricardo Ribalda Delgado <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

gpio: zynq: initialize clock even without CONFIG_PM

When the PM initialization was moved in the commit referenced below, the
code enabling the clock was removed from the probe function. On
CONFIG_PM=y kernels, this is not a problem as the pm resume hook enables
the clock, but when power management is disabled, all those pm_*
functions are noops and the clock is never enabled resulting in a
dysfunctional gpio controller.

Put the clock initialization back to support CONFIG_PM=n.

Cc: [email protected]
Signed-off-by: Helmut Grohne <[email protected]>
Fixes: 3773c195d387 ("gpio: zynq: Do PM initialization earlier to support gpio hogs")
Signed-off-by: Linus Walleij <[email protected]>

gpio: 104-dio-48e: Fix control port offset computation off-by-one error

There are only two control ports, each controlling three distinct I/O
ports. To compute the control port address offset for a respective I/O
port, the I/O port address offset should be divided by 3; dividing by 2
may result in not only the wrong address offset but possibly also an
out-of-bounds array memory access for a non-existent third control port.

Fixes: 1b06d64f7374 ("gpio: Add GPIO support for the ACCES 104-DIO-48E")
Signed-off-by: William Breathitt Gray <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

net-sysfs: fix missing <linux/of_net.h>

The of_find_net_device_by_node() function is defined in
<linux/of_net.h> but not included in the .c file that
implements it. Fix the following warning by including the
header:

net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static?

Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bridge: Don't insert unnecessary local fdb entry on changing mac address

The missing br_vlan_should_use() test caused creation of an unneeded
local fdb entry on changing mac address of a bridge device when there is
a vlan which is configured on a bridge port but not on the bridge
device.

Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'iio-fixes-for-4.7a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus

Jonathan writes:

First round of iio fixes for the 4.7 cycle.

A slightly bumper set due to travel delaying the pull request and a fair few
issues with the recent merge window patches.  Patches all over the place.
The st-sensors one is probably the most involved, but definitly solves the
issues seen.  Note there are some other issues around that handler
(and the fact that a lot of boards tie a level interrupt chip to an
edge interrupt only irq chip).  These are not regressions however, so
will turn up the slow route.

* core
  - iio_trigger_attach_pollfunc had some really badly wrong error handling.
  Another nasty triggered whilst chasing down issues with the st sensors
  rework below.
* ad5592r
  - fix an off by one error when allocating channels.
* am2315
  - a stray mutex unlock before we ever take the lock.
* apds9960
  - missing a parent in the driver model (which should be the i2c device).
  Result is it doesn't turn up under /sys/bus/i2c/devices which some
  userspace code uses for repeatable device identification.
* as3935
  - ABI usage bug which meant a processed value was reported as raw. Now
    reporting scale as well to ensure userspace has the info it needs.
  - Don't return processed value via the buffer - it doesn't conform to
    the ABI and will overflow in some cases.
  - Fix a wrongly sized buffer which would overflow trashing part of the
    stack.  Also move it onto the heap as part of the fix.
* bh1780
  - a missing return after write in debugfs lead to an incorrect read and
    a null pointer dereference.
  - dereferencing the wrong pointer in suspend and resume leading to
    unpredictable results.
  - assign a static name to avoid accidentally ending up with no name if
    loaded via device tree.
* bmi160
  - output data rate for the accelerometer was incorrectly reported. Fix it.
  - writing the output data rate was also wrong due to reverse parameters.
* bmp280
  - error message for wrong chip ID gave the wrong expected value.
* hdc100x
  - mask for writing the integration time was wrong allowin g us to get
  'stuck' in a particular value with no way back.
  - temperature reported in celsius rather than millicelsius as per the
  ABI.
  - Get rid of some incorrect data shifting which lead to readings being
  rather incorrect.
* max44000
  - drop scale attribute for proximity as it is an unscaled value (depends
    on what is in range rather than anything knowable at the detector).
* st-pressure
  - ABI compliance fixes - units were wrong.
* st-sensors
  - We introduced some nasty issues with the recent switch over to a
  a somewhat threaded handler in that we broke using a software trigger
  with these devices.  Now do it properly.  It's a larger patch than ideal
  for a fix, but the logic is straight forward.
  - Make sure the trigger is initialized before requesting the interrupt.
  This matters now the interrupt can be shared. Before it was ugly and wrong
  but short of flakey hardware could not be triggered.
  - Hammer down the dataready pin at boot - otherwise with really
  unlucky timing things could get interestingly wedged requiring a hard power
  down of the chip.

usb: host: ehci-msm: Conditionally call ehci suspend/resume

This patch fixes a suspend/resume issue where the driver is blindly
calling ehci_suspend/resume functions when the ehci hasn't been setup.
This results in a crash during suspend/resume operations.

Signed-off-by: Andy Gross <[email protected]>
Tested-by: Pramod Gurav <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

MAINTAINERS: Add file patterns for usb device tree bindings

Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: host: ehci-tegra: Avoid getting the same reset twice

Starting with commit 0b52297f2288 ("reset: Add support for shared reset
controls") there is a reference count for reset control assertions. The
goal is to allow resets to be shared by multiple devices and an assert
will take effect only when all instances have asserted the reset.

In order to preserve backwards-compatibility, all reset controls become
exclusive by default. This is to ensure that reset_control_assert() can
immediately assert in hardware.

However, this new behaviour triggers the following warning in the EHCI
driver for Tegra:

[    3.365019] ------------[ cut here ]------------
[    3.369639] WARNING: CPU: 0 PID: 1 at drivers/reset/core.c:187 __of_reset_control_get+0x16c/0x23c
[    3.382151] Modules linked in:
[    3.385214] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-next-20160503 #140
[    3.392769] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[    3.399046] [<c010fa50>] (unwind_backtrace) from [<c010b120>] (show_stack+0x10/0x14)
[    3.406787] [<c010b120>] (show_stack) from [<c0347dcc>] (dump_stack+0x90/0xa4)
[    3.414007] [<c0347dcc>] (dump_stack) from [<c011f4fc>] (__warn+0xe8/0x100)
[    3.420964] [<c011f4fc>] (__warn) from [<c011f5c4>] (warn_slowpath_null+0x20/0x28)
[    3.428525] [<c011f5c4>] (warn_slowpath_null) from [<c03cc8cc>] (__of_reset_control_get+0x16c/0x23c)
[    3.437648] [<c03cc8cc>] (__of_reset_control_get) from [<c0526858>] (tegra_ehci_probe+0x394/0x518)
[    3.446600] [<c0526858>] (tegra_ehci_probe) from [<c04516d8>] (platform_drv_probe+0x4c/0xb0)
[    3.455029] [<c04516d8>] (platform_drv_probe) from [<c044fe78>] (driver_probe_device+0x1ec/0x330)
[    3.463892] [<c044fe78>] (driver_probe_device) from [<c0450074>] (__driver_attach+0xb8/0xbc)
[    3.472320] [<c0450074>] (__driver_attach) from [<c044e1ec>] (bus_for_each_dev+0x68/0x9c)
[    3.480489] [<c044e1ec>] (bus_for_each_dev) from [<c044f338>] (bus_add_driver+0x1a0/0x218)
[    3.488743] [<c044f338>] (bus_add_driver) from [<c0450768>] (driver_register+0x78/0xf8)
[    3.496738] [<c0450768>] (driver_register) from [<c010178c>] (do_one_initcall+0x40/0x170)
[    3.504909] [<c010178c>] (do_one_initcall) from [<c0c00ddc>] (kernel_init_freeable+0x158/0x1f8)
[    3.513600] [<c0c00ddc>] (kernel_init_freeable) from [<c0810784>] (kernel_init+0x8/0x114)
[    3.521770] [<c0810784>] (kernel_init) from [<c0107778>] (ret_from_fork+0x14/0x3c)
[    3.529361] ---[ end trace 4bda87dbe4ecef8a ]---

The reason is that Tegra SoCs have three EHCI controllers, each with a
separate reset line. However the first controller contains UTMI pads
configuration registers that are shared with its siblings and that are
reset as part of the first controller's reset. There is special code in
the driver to assert and deassert this shared reset at probe time, and
it does so irrespective of which controller is probed first to ensure
that these shared registers are reset before any of the controllers are
initialized. Unfortunately this means that if the first controller gets
probed first, it will request its own reset line and will subsequently
request the same reset line again (temporarily) to perform the reset.
This used to work fine before the above-mentioned commit, but now
triggers the new WARN.

Work around this by making sure we reuse the controller's reset if the
controller happens to be the first controller.

Cc: Philipp Zabel <[email protected]>
Cc: Hans de Goede <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: host: ehci-tegra: Grab the correct UTMI pads reset

There are three EHCI controllers on Tegra SoCs, each with its own reset
line. However, the first controller contains a set of UTMI configuration
registers that are shared with its siblings. These registers will only
be reset as part of the first controller's reset. For proper operation
it must be ensured that the UTMI configuration registers are reset
before any of the EHCI controllers are enabled, irrespective of the
probe order.

Commit a47cc24cd1e5 ("USB: EHCI: tegra: Fix probe order issue leading to
broken USB") introduced code that ensures the first controller is always
reset before setting up any of the controllers, and is never again reset
afterwards.

This code, however, grabs the wrong reset. Each EHCI controller has two
reset controls attached: 1) the USB controller reset and 2) the UTMI
pads reset (really the first controller's reset). In order to reset the
UTMI pads registers the code must grab the second reset, but instead it
grabbing the first.

Fixes: a47cc24cd1e5 ("USB: EHCI: tegra: Fix probe order issue leading to broken USB")
Acked-by: Jon Hunter <[email protected]>
Cc: [email protected]
Signed-off-by: Thierry Reding <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: mos7720: delete parport

parport subsystem has introduced parport_del_port() to delete a port
when it is going away. Without parport_del_port() the registered port
will not be unregistered.
To reproduce and verify the error:
Command to be used is : ls /sys/bus/parport/devices
1) without the device attached there is no output as there is no
registered parport.
2) Attach the device, and the command will show "parport0".
3) Remove the device and the command still shows "parport0".
4) Attach the device again and we get "parport1".

With the patch applied:
1) without the device attached there is no output as there is no
registered parport.
2) Attach the device, and the command will show "parport0".
3) Remove the device and there is no output as "parport0" is now
removed.
4) Attach device again to get "parport0" again.

Cc: <[email protected]> # 4.2+
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: OHCI: Don't mark EDs as ED_OPER if scheduling fails

Since ed_schedule begins with marking the ED as "operational",
the ED may be left in such state even if scheduling actually
fails.

This allows future submission attempts to smuggle this ED to the
hardware behind the scheduler's back and without linking it to
the ohci->eds_in_use list.

The former causes bandwidth saturation and data loss on isoc
endpoints, the latter crashes the kernel when attempt is made
to unlink such ED from this list.

Fix ed_schedule to update ED state only on successful return.

Signed-off-by: Michal Pecio <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

powerpc/mm/hash: Compute the segment size correctly for ISA 3.0

PowerISA 3.0 encodes the segment size in the second half of hash page
table entry. Update hpte_decode() accordingly.

Fixes: 50de596de8be ("powerpc/mm/hash: Add support for Power9 Hash")
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT

In some of the radix TLB flush routines, we use a local to store the
mm->context.id, AKA the PID.

Currently we use an int, but the PID is unsigned long, so large values
of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which
means all our comparisons against that value can never be true.

This means we'll issue TLB flushes when we shouldn't on radix enabled
machines.

Fix it by using an unsigned long for the local. Discovered by Coverity.

Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
[mpe: Write change log]
Signed-off-by: Michael Ellerman <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is
  4.6, d_walk - 3.2+.

  The atomic_open() and coredump ones are regressions from this window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coredump: fix dumping through pipes
  fix a regression in atomic_open()
  fix d_walk()/non-delayed __d_free() race
  autofs braino fix for do_last()
  fix EOPENSTALE bug in do_last()

hwmon: (lm90) use proper type for update_interval

The code handles this variable always as unsigned, so adapt the type.

Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>

hwmon: (ina2xx) Document compatible for INA231

Document the compatible for INA231 sensor.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>

hwmon: (fam15h_power) Disable preemption when reading registers

We need to read a bunch of registers on each compute unit and possibly
on the current CPU too. Disable preemption around it. Otherwise, you
get:

  BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/327
  caller is read_registers+0x6a/0x110 [fam15h_power]
  CPU: 3 PID: 327 Comm: systemd-udevd Not tainted 4.7.0-rc1+ #4
  Hardware name: HP HP EliteBook 745 G3/807E, BIOS N73 Ver. 01.08 01/28/2016
  ...

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Rui Huang <[email protected]>
Cc: Sherry Hurwitz <[email protected]>
Cc: Guenter Roeck <[email protected]>
Acked-by: Huang Rui <[email protected]>
Tested-by: Huang Rui <[email protected]>
Fixes: fa7943449943 ("hwmon: (fam15h_power) Add compute unit accumulated power")
Signed-off-by: Guenter Roeck <[email protected]>

coredump: fix dumping through pipes

The offset in the core file used to be tracked with ->written field of
the coredump_params structure. The field was retired in favour of
file->f_pos.

However, ->f_pos is not maintained for pipes which leads to breakage.

Restore explicit tracking of the offset in coredump_params. Introduce
->pos field for this purpose since ->written was already reused.

Fixes: a00839395103 ("get rid of coredump_params->written").
Reported-by: Zbigniew Jędrzejewski-Szmek <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fix a regression in atomic_open()

open("/foo/no_such_file", O_RDONLY | O_CREAT) on should fail with
EACCES when /foo is not writable; failing with ENOENT is obviously
wrong. That got broken by a braino introduced when moving the
creat_error logics from atomic_open() to lookup_open(). Easy to
fix, fortunately.

Spotted-by: "Yan, Zheng" <[email protected]>
Tested-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fix d_walk()/non-delayed __d_free() race

Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.

Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.

Cc: [email protected] # v3.2+ (and watch out for __d_materialise_dentry())
Signed-off-by: Al Viro <[email protected]>

cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo

When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in 1500000 KHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.

One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition the intel_pstate
sysfs limits, shows percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.

Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()

The limits->max_perf is rounded_up but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While here also added a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee2788141 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw : Subject & changelog ]
Cc: 4.4+ <[email protected]> # 4.4+
Signed-off-by: Rafael J. Wysocki <[email protected]>

of: fix autoloading due to broken modalias with no 'compatible'

Because of an improper dereference, a stray 'C' character was output to
the modalias when no 'compatible' was specified. This is the case for
some old PowerMac drivers which only set the 'name' property. Fix it to
let them match again.

Reported-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Tested-by: Mathieu Malaterre <[email protected]>
Cc: Philipp Zabel <[email protected]>
Cc: Andreas Schwab <[email protected]>
Fixes: 6543becf26fff6 ("mod/file2alias: make modalias generation safe for cross compiling")
Cc: [email protected] # v3.9+
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added

The recent commit 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support
to ibm,client-architecture-support call") added a new PVR mask & value
to the start of the ibm_architecture_vec[] array.

However it missed the fact that further down in the array, we hard code
the offset of one of the fields, and then at boot use that value to
patch the value in the array. This means every update to the array must
also update the #define, ugh.

This means that on pseries machines we will misreport to firmware the
number of cores we support, by a factor of threads_per_core.

Fix it for now by updating the #define.

Fixes: 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call")
Cc: [email protected] # v4.0+
Signed-off-by: Michael Ellerman <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:

1) Fix missing alignment in next offset calculation for standard
   targets, introduced in the previous merge window, patch from
   Florian Westphal.

2) Fix to correct the handling of outgoing connections which use the
   SIP-pe such that the binding of a real-server is updated when needed.
   This was an omission from changes introduced by Marco Angaroni in
   the previous merge window too, to allow handling of outgoing
   connections by the SIP-pe. Patch and report came via Simon Horman.
====================

Signed-off-by: David S. Miller <[email protected]>

tcp: record TLP and ER timer stats in v6 stats

The v6 tcp stats scan do not provide TLP and ER timer information
correctly like the v4 version . This patch fixes that.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: fix tc_should_offload for specific clsact classes

When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.

Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

act_police: fix a crash during removal

The police action is using its own code to initialize tcf hash
info, which makes us to forgot to initialize a->hinfo correctly.
Fix this by calling the helper function tcf_hash_create() directly.

This patch fixed the following crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
PGD d3c34067 PUD d3e18067 PMD 0
Oops: 0000 [#1] SMP
CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
RIP: 0010:[<ffffffff810c099f>]  [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
RSP: 0000:ffff88011b203c80  EFLAGS: 00010002
RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
Stack:
  ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
  0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
  ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
Call Trace:
  <IRQ>
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
  [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
  [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
  [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
  [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
  [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
  [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
  [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
  [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
  [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
  [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fq_codel: return non zero qlen in class dumps

We properly scan the flow list to count number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.

Fixes: 640158536632 ("net: sched: restrict use of qstats qlen")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: John Fastabend <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'u32-hwoffload-fixes'

Jakub Kicinski says:

====================
cls_u32 hardware offload fixes

This set fixes two small issues with error codes I noticed
in cls_u32. Second patch could be viewed as user space API
change but that portion of API is not part of any release,
yet.

Compile tested only.
====================

Signed-off-by: David S. Miller <[email protected]>

net: cls_u32: be more strict about skip-sw flag

Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cls_u32: fix error code for invalid flags

'err' variable is not set in this test, we would return whatever
previous test set 'err' to.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Sridhar Samudrala <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Fix clang build warning:

./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is
used as a header guard here, followed by #define of a different
macro [-Wheader-guard]

fix by defining _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"This finally removes the CLK_IS_ROOT flag by picking up the last few
  stragglers that didn't get merged by anyone this time around.

  Better to do it now than wait for another one to pop up.  There's also
  a minor maintainers update and a Kconfig fix"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: nxp: Select MFD_SYSCON for creg driver
  MAINTAINERS: Add file patterns for clock device tree bindings
  clk: Remove CLK_IS_ROOT flag
  clk: microchip: Remove CLK_IS_ROOT
  powerpc/512x: clk: Remove CLK_IS_ROOT
  vexpress/spc: Remove CLK_IS_ROOT

net: fec: fix spelling mistakes and add missing newline

trivial fix to spelling mistakes and add missing newline in pr_err
messages

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes.

Fix a race condition and VLAN rx acceleration logic.
====================

Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Simplify VLAN receive logic.

Since both CTAG and STAG rx acceleration must be enabled together, we
only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before
calling __vlan_hwaccel_put_tag().

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.

The hardware can only be set to strip or not strip both the VLAN CTAG and
STAG. It cannot strip one and not strip the other. Add logic to
bnxt_fix_features() to toggle both feature flags when the user is toggling
one of them.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Fix tx push race condition.

Set the is_push flag in the software BD before the tx data is pushed to
the chip. It is possible to get the tx interrupt as soon as the tx data
is pushed. The tx handler will not handle the event properly if the
is_push flag is not set and it will crash.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86, build: copy ldlinux.c32 to image.iso

For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.

Signed-off-by: H. Peter Anvin <[email protected]>
Cc: Linux Stable Tree <[email protected]>

rxrpc: fix ptr_ret.cocci warnings

net/rxrpc/rxkad.c:1165:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: David Howells <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'rds-packet-assembly-fixes'

Sowmini Varadhan says:

====================
RDS: TCP: socket locking RDS packet assembly fixes

This three part patchset fixes bugs in synchronization between
rds_tcp_accept_one() and the rds-tcp send/recv path.

Patch 1 ensures that the lock_sock() is taken appropriately
and the RDS datagram reassembly state is reset to synchronize
with the receive path.

Patch 2 ensures that partially sent RDS datagrams will get
retransmitted after rds_tcp_accept_one() switches sockets.

Patch 3 fixes a race window which would prematurely re-enable
rds_send_xmit() before the rds_tcp_connection setup has been
completed in rds_tcp_accept_one().
====================

Signed-off-by: David S. Miller <[email protected]>

RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()

The send path needs to be quiesced before resetting callbacks from
rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
this using the c_state and RDS_IN_XMIT bit following the pattern
used by rds_conn_shutdown(). However this leaves the possibility
of a race window as shown in the sequence below
    take t_conn_lock in rds_tcp_conn_connect
    send outgoing syn to peer
    drop t_conn_lock in rds_tcp_conn_connect
    incoming from peer triggers rds_tcp_accept_one, conn is
marked CONNECTING
    wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
    call rds_tcp_reset_callbacks
    [.. race-window where incoming syn-ack can cause the conn
to be marked UP from rds_tcp_state_change ..]
    lock_sock called from rds_tcp_reset_callbacks, and we set
t_sock to null
As soon as the conn is marked UP in the race-window above, rds_send_xmit()
threads will proceed to rds_tcp_xmit and may encounter a null-pointer
deref on the t_sock.

Given that rds_tcp_state_change() is invoked in softirq context, whereas
rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
after lock_sock could result in a deadlock with tcp_sendmsg, this
commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks

When we switch a connection's sockets in rds_tcp_rest_callbacks,
any partially sent datagram must be retransmitted on the new
socket so that the receiver can correctly reassmble the RDS
datagram. Use rds_send_reset() which is designed for this purpose.

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely

When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed. Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.

This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fq_codel: fix NET_XMIT_CN behavior

My prior attempt to fix the backlogs of parents failed.

If we return NET_XMIT_CN, our parents wont increase their backlog,
so our qdisc_tree_reduce_backlog() should take this into account.

v2: Florian Westphal pointed out that we could drop the packet,
so we need to save qdisc_pkt_len(skb) in a temp variable before
calling fq_codel_drop()

Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Reported-by: Stas Nichiporovich <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: WANG Cong <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpf, trace: use READ_ONCE for retrieving file ptr

In bpf_perf_event_read() and bpf_perf_event_output(), we must use
READ_ONCE() for fetching the struct file pointer, which could get
updated concurrently, so we must prevent the compiler from potential
refetching.

We already do this with tail calls for fetching the related bpf_prog,
but not so on stored perf events. Semantics for both are the same
with regards to updates.

Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull userns fixes from Eric Biederman:
"This contains two small but significant fixes to fs/namespace.c.

  The first adds a filesystem refcount drop on error.  The second
  corrects a test in fs_fully_visible which could be abused to allow
  mounting of proc or sysfs, when that should not be allowed.

  To keep myself honest I have tested to ensure the incorrect test in
  fs_fully_visible actually allows improper mounting of proc before the
  fix and that when fixed the improper mounting is not allowed"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mnt: fs_fully_visible test the proper mount for MNT_LOCKED
  mnt: If fs_fully_visible fails call put_filesystem.

block: missing bio_put following submit_bio_wait

submit_bio_wait() gives the caller an opportunity to examine
struct bio and so expects the caller to issue the put_bio()

This fixes a memory leak reported by a few people in 4.7-rc2
kmemleak report after 9082e87bfbf8 ("block: remove struct bio_batch")

Signed-off-by: Shaun Tancheff <[email protected]>
Tested-by: Catalin Marinas <[email protected]>
Tested-by: Larry [email protected]
Tested-by: David Drysdale <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

IB/IPoIB: Don't update neigh validity for unresolved entries

ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send.  This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it.  If the queue for this neighbor is full, then don't update the
alive timestamp.  That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.

Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Erez Shitrit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix alternate path code

Userspace flag IBV_QP_ALT_PATH is supposed to set the alternate path
including fields alt_pkey_index and alt_timeout.
Added IB_QP_PKEY_INDEX and IB_QP_TIMEOUT to the attribute mask when
calling mlx5_set_path for the alternate path to force setting the
alt_pkey_index and alt_timeout values.

Fixes: bf24481a3a7c4 ('IB/mlx5: Consider alternate path in pkey ...')
Signed-off-by: Achiad Shochat <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Reviewed-by: Jack Morgenstein <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix pkey_index length in the QP path record

Pkey index fields in the QP context path record are extended to 16
bits, as required by IB spec (version 1.3).
This change affects all QP commands which include path records.

To enable this change, moved the free adaptive routing flag bit
(free_ar) to the most significant byte of the QP path record.

Fixes: e126ba97dba9e ('mlx5: Add driver for Mellanox Connect-IB ...')
Signed-off-by: Noa Osherovich <[email protected]>
Reviewed-by: Jack Morgenstein <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix entries check in mlx5_ib_resize_cq

Verify that number of entries is less than device capability.
Add an appropriate warning message for error flow.

Fixes: bde51583f49b ('IB/mlx5: Add support for resize CQ')
Signed-off-by: Majd Dibbiny <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix entries checks in mlx5_ib_create_cq

Number of entries shouldn't be greater than the device's max
capability. This should be checked before rounding the entries number
to power of two.

Fixes: 51ee86a4af639 ('IB/mlx5: Fix check of number of entries...')
Signed-off-by: Majd Dibbiny <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Check BlueFlame HCA support

BlueFlame support is reported only for PFs when the HCA capability is
on.

Fixes: 938fe83c8dcbb ('net/mlx5_core: New device capabilities...')
Signed-off-by: Majd Dibbiny <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix returned values of query QP

Some variables were not initialized properly: max_recv_wr,
max_recv_sge, max_send_wr, qp_context and max_inline_data.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB...')
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Limit query HCA clock

When PAGE_SIZE is larger than 4K, the user shouldn't be able to query
the HCA core clock. This counter is within 4KB boundary and the
user-space shall not read information that's after this boundary.

Fixes: b368d7cb8ceb7 ('IB/mlx5: Add hca_core_clock_offset to...')
Signed-off-by: Majd Dibbiny <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx5: Fix FW version diaplay in sysfs

Add a 4-digit padding to show FW version in proper format.

Fixes: 9603b61de1eee ('mlx5: Move pci device handling from...')
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Noa Osherovich <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>