Git Repo - linux.git/log

infiniband: shut up a maybe-uninitialized warning

Some configurations produce this harmless warning when built with gcc
-Wmaybe-uninitialized:

  infiniband/core/cma.c: In function 'cma_get_net_dev':
  infiniband/core/cma.c:1242:12: warning: 'src_addr_storage.sin_addr.s_addr' may be used uninitialized in this function [-Wmaybe-uninitialized]

I previously reported this for the powerpc64 defconfig, but have now
reproduced the same thing for x86 as well, using gcc-5 or higher.

The code looks correct to me, and this change just rearranges it by
making sure we always initialize the entire address structure to make the
warning disappear.  My first approach added an initialization at the
time of the declaration, which Doug commented may be too costly, so I
hope this version doesn't add overhead.
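The "initialize the whole structure" idea can be sketched in userspace. This uses a hypothetical helper fill_src_addr(), not the actual cma.c code. The entire sockaddr_storage is cleared before any family-specific field is written, so sin_addr has a defined value on every path gcc can see. Clearing unconditionally is the kind of cost Doug was worried about, so the real patch rearranges the existing code instead and may differ in detail.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill in a source address for the given family.
 * Zeroing the whole storage first means no member is ever left
 * uninitialized, even on paths gcc cannot prove are unreachable.
 */
static void fill_src_addr(struct sockaddr_storage *ss, int family,
			  in_addr_t v4_addr)
{
	memset(ss, 0, sizeof(*ss));

	if (family == AF_INET) {
		struct sockaddr_in *sin = (struct sockaddr_in *)ss;

		sin->sin_family = AF_INET;
		sin->sin_addr.s_addr = v4_addr;
	}
	/* Any other family: the storage stays all-zero (AF_UNSPEC). */
}
```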

Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.7-rc6/buildall.powerpc.ppc64_defconfig.log.passed
Link: https://patchwork.kernel.org/patch/9212825/
Acked-by: Haggai Eran <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

crypto: aesni: shut up -Wmaybe-uninitialized warning

The rfc4106 encrypt/decrypt helper functions cause an annoying
false-positive warning in allmodconfig if we turn on
-Wmaybe-uninitialized warnings again:

  arch/x86/crypto/aesni-intel_glue.c: In function ‘helper_rfc4106_decrypt’:
  include/linux/scatterlist.h:67:31: warning: ‘dst_sg_walk.sg’ may be used uninitialized in this function [-Wmaybe-uninitialized]

The problem seems to be that the compiler doesn't track the state of the
'one_entry_in_sg' variable across the kernel_fpu_begin/kernel_fpu_end
section.

This takes the easy way out by adding a bogus initialization, which
should be harmless enough to get the patch into v4.9 so we can turn on
this warning again by default without producing useless output.  A
follow-up patch for v4.10 rearranges the code to make the warning go
away.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

rc: print correct variable for z8f0811

A recent rework accidentally left a debugging printk untouched while
changing the meaning of the variables, leading to an uninitialized
variable being printed:

drivers/media/i2c/ir-kbd-i2c.c: In function 'get_key_haup_common':
drivers/media/i2c/ir-kbd-i2c.c:62:2: error: 'toggle' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This prints the correct one instead, as we did before the patch.

Fixes: 00bb820755ed ("[media] rc: Hauppauge z8f0811 can decode RC6")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dib0700: fix nec repeat handling

When receiving a nec repeat, ensure the correct scancode is repeated
rather than a random value from the stack.  This removes the need for
the bogus uninitialized_var() and also fixes the warnings:

    drivers/media/usb/dvb-usb/dib0700_core.c: In function ‘dib0700_rc_urb_completion’:
    drivers/media/usb/dvb-usb/dib0700_core.c:679: warning: ‘protocol’ may be used uninitialized in this function

[sean addon: So after writing the patch and submitting it, I've bought the
             hardware on ebay. Without this patch you get random scancodes
             on nec repeats, which the patch indeed fixes.]
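The repeat handling can be sketched with a hypothetical struct nec_state that remembers the last decoded scancode. On a repeat frame, the stored value is reported instead of a stack variable that was never written. This only illustrates the idea. The actual patch may instead hand repeats to the rc core (for example via rc_repeat()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver state kept across URB completions. */
struct nec_state {
	bool have_last;		/* has a full code been received yet? */
	uint32_t last_scancode;
};

/*
 * Returns true and sets *scancode if a key should be reported.
 * A repeat frame reuses the last full scancode; a repeat that
 * arrives before any full frame is ignored instead of reporting
 * whatever happened to be on the stack.
 */
static bool nec_handle_frame(struct nec_state *st, bool is_repeat,
			     uint32_t decoded, uint32_t *scancode)
{
	if (is_repeat) {
		if (!st->have_last)
			return false;
		*scancode = st->last_scancode;
		return true;
	}

	st->last_scancode = decoded;
	st->have_last = true;
	*scancode = decoded;
	return true;
}
```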

Signed-off-by: Sean Young <[email protected]>
Tested-by: Sean Young <[email protected]>
Cc: [email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

s390: pci: don't print uninitialized data for debugging

gcc correctly warns about an incorrect use of the 'pa' variable in case
we pass an empty scatterlist to __s390_dma_map_sg:

  arch/s390/pci/pci_dma.c: In function '__s390_dma_map_sg':
  arch/s390/pci/pci_dma.c:309:13: warning: 'pa' may be used uninitialized in this function [-Wmaybe-uninitialized]

This adds a bogus initialization to the function to sanitize the debug
output.  I would have preferred a solution without the initialization,
but I only got the report from the kbuild bot after turning on the
warning again, and didn't manage to reproduce it myself.
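Here is a reduced illustration of the warning, with a hypothetical map_entries() and struct sg_entry standing in for __s390_dma_map_sg() and the real scatterlist. `pa` is only assigned inside the loop, so an empty list would leave it unset when the debug output reads it. The explicit initialization gives that path a defined value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scatterlist entry: physical address and length. */
struct sg_entry {
	uint64_t phys;
	size_t len;
};

/*
 * Sums the mapped length and reports the last physical address seen.
 * Without "pa = 0", an empty list (n == 0) would leave pa unset when
 * it is printed below.
 */
static size_t map_entries(const struct sg_entry *sg, size_t n,
			  uint64_t *last_pa)
{
	uint64_t pa = 0;	/* the "bogus" initialization */
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		pa = sg[i].phys;
		total += sg[i].len;
	}

	fprintf(stderr, "debug: total=%zu last pa=%#llx\n", total,
		(unsigned long long)pa);
	*last_pa = pa;
	return total;
}
```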

Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Sebastian Ott <[email protected]>
Acked-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nios2: fix timer initcall return value

When called more than twice, the nios2_time_init() function returns an
uninitialized value, as detected by gcc -Wmaybe-uninitialized:

arch/nios2/kernel/time.c: warning: 'ret' may be used uninitialized in this function

This makes it return '0' here, matching the comment above the function.
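The call-counting pattern can be sketched with hypothetical setup_clockevent()/setup_clocksource() stand-ins. The first timer instance becomes the clockevent, the second the clocksource, and every further instance is unused. That last case now returns 0 explicitly instead of falling through with `ret` unset.

```c
#include <assert.h>

/* Hypothetical stand-ins for the clockevent/clocksource setup. */
static int setup_clockevent(void) { return 0; }
static int setup_clocksource(void) { return 0; }

/*
 * First call: clockevent. Second call: clocksource. Any further timer
 * instance is unused, so ret is set to 0 instead of being left
 * uninitialized.
 */
static int timer_init(void)
{
	static int num_called;
	int ret;

	switch (num_called) {
	case 0:
		ret = setup_clockevent();
		break;
	case 1:
		ret = setup_clocksource();
		break;
	default:
		ret = 0;	/* extra instances are ignored */
		break;
	}

	num_called++;
	return ret;
}
```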

Acked-by: Ley Foon Tan <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: apm: avoid uninitialized data

apm_bios_call() can fail and return a status in its argument structure.
However, if that status is zero during a call from
apm_get_power_status(), we end up using data that may never have been
set, as reported by "gcc -Wmaybe-uninitialized":

  arch/x86/kernel/apm_32.c: In function ‘apm’:
  arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here
  arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here

This changes the function to return "APM_NO_ERROR" here, which makes the
code more robust to broken BIOS versions, and avoids the warning.
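The shape of the fix can be modeled with hypothetical names and status values (ST_SUCCESS, ST_NO_ERROR and a simulated BIOS that fails without setting an error code). A failure whose status is zero would otherwise be indistinguishable from success, so the getter reports a distinct nonzero code and the caller never reads the unset outputs. The actual constants and code in apm_32.c differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ST_SUCCESS	0x00	/* hypothetical: outputs are valid */
#define ST_NO_ERROR	0x53	/* hypothetical: call failed, no status */

/* Hypothetical BIOS call result. */
struct bios_call {
	uint32_t ebx, ecx, edx;	/* outputs, only valid on success */
	uint32_t err;		/* status reported on failure */
};

/* Simulated broken BIOS: "fails" without setting an error code. */
static bool bios_call_fails_silently(struct bios_call *call)
{
	call->err = 0;
	return true;		/* true means the call failed */
}

static int get_power_status(struct bios_call *call, uint16_t *status)
{
	if (bios_call_fails_silently(call)) {
		/*
		 * err == 0 would otherwise map to ST_SUCCESS and make
		 * the caller read outputs that were never set, so
		 * report a distinct nonzero code instead.
		 */
		if (!call->err)
			return ST_NO_ERROR;
		return call->err & 0xff;
	}
	*status = (uint16_t)call->ebx;
	return ST_SUCCESS;
}
```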

Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Jiri Kosina <[email protected]>
Reviewed-by: Luis R. Rodriguez <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

NFSv4.1: work around -Wmaybe-uninitialized warning

A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use if
we enable -Wmaybe-uninitialized again:

fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized]

gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair results
in a nonzero return value here. Using PTR_ERR_OR_ZERO() instead makes
this clear to the compiler.
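A simplified userspace re-creation of the include/linux/err.h helpers makes the point concrete. PTR_ERR_OR_ZERO() folds the IS_ERR()/PTR_ERR() pair into one value that is zero exactly when the pointer is valid, so a caller that only reads its output on a zero return is easy for gcc to follow. lookup_seq() and its table are hypothetical stand-ins for the slot lookup in nfs4session.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified userspace re-creation of the helpers in include/linux/err.h. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr) { return IS_ERR_VALUE(ptr); }

/* One expression: 0 for a valid pointer, the negative errno otherwise. */
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical lookup: slots 1..3 exist, anything else is -ENOENT. */
static int lookup_seq(unsigned int slot, unsigned int *seq)
{
	static unsigned int table[4] = { 0, 11, 22, 33 };
	void *p = (slot > 0 && slot < 4) ? (void *)&table[slot]
					 : ERR_PTR(-ENOENT);
	int ret = PTR_ERR_OR_ZERO(p);

	if (!ret)
		*seq = *(unsigned int *)p;	/* only written on success */
	return ret;
}
```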

Fixes: e09c978aae5b ("NFSv4.1: Fix Oopsable condition in server callback races")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Kbuild: enable -Wmaybe-uninitialized warning for "make W=1"

Traditionally, we have always had warnings about uninitialized variables
enabled, as this is part of -Wall, and generally a good idea [1], but it
also always produced false positives, mainly because this is a variation
of the halting problem and provably impossible to get right in all cases
[2].

Various people have identified cases that are particularly bad for false
positives, and in commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized
when building with -Os"), I turned off the warning for any build that
was done with CC_OPTIMIZE_FOR_SIZE.  This drastically reduced the number
of false positive warnings in the default build but unfortunately had
the side effect of turning the warning off completely in 'allmodconfig'
builds, which in turn let a lot of warnings (both actual bugs and
remaining false positives) go in unnoticed.

Commit 877417e6ffb9 ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") enabled the warning again for allmodconfig builds in v4.7,
and in v4.8-rc1 I had finally managed to address all warnings I get in
an ARM allmodconfig build and most other maybe-uninitialized warnings
for ARM randconfig builds.

However, commit 6e8d666e9253 ("Disable "maybe-uninitialized" warning
globally") was merged at the same time and disabled it completely for
all configurations, because of false-positive warnings on x86 that I had
not addressed until then.  This caused a lot of actual bugs to get
merged into mainline, and I sent several dozen patches for these during
the v4.9 development cycle.  Most of these are actual bugs, some are for
correct code that is safe because it is only called under external
constraints that make it impossible to run into the case that gcc sees,
and in a few cases gcc is just stupid and finds something that can
obviously never happen.

I have now done a few thousand randconfig builds on x86 and collected
all patches that I needed to address every single warning I got (I can
provide the combined patch for the other warnings if anyone is
interested), so I hope we can get the warning back and let people catch
the actual bugs earlier.

This reverts the change to disable the warning completely and for now
brings it back at the "make W=1" level, so we can get it merged into
mainline without introducing false positives.  A follow-up patch enables
it on all levels unless some configuration option turns it off because
of false-positives.

Link: https://rusty.ozlabs.org/?p=232
Link: https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/stackdepot: export save/fetch stack for drivers

Some drivers would like to record stacktraces in order to aid leak
tracing.  As stackdepot already provides a facility for only storing the
unique traces, thereby reducing the memory required, export that
functionality for use by drivers.

The code was originally created for KASAN and moved under lib in commit
cd11016e5f521 ("mm, kasan: stackdepot implementation.  Enable stackdepot
for SLAB") so that it could be shared with mm/.  In turn, we want to
share it now with drivers.
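The "only store unique traces" idea is essentially a hash table that hands out a handle per distinct trace and returns the existing handle for duplicates. The sketch below uses hypothetical names (depot_store(), depot_fetch(), a fixed-size pool). It is not the kernel's stackdepot API, which allocates its storage differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES	16
#define MAX_RECORDS	64
#define NR_BUCKETS	32

/* Hypothetical record: one unique stack trace, chained per bucket. */
struct record {
	uint32_t hash;
	unsigned int nr;
	uintptr_t frames[MAX_FRAMES];
	int next;			/* index of next record, or -1 */
};

static struct record pool[MAX_RECORDS];
static int nr_records;
static int buckets[NR_BUCKETS];	/* head index + 1, 0 = empty */

/* FNV-1a over the raw frame addresses. */
static uint32_t hash_frames(const uintptr_t *f, unsigned int nr)
{
	const unsigned char *p = (const unsigned char *)f;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < nr * sizeof(*f); i++)
		h = (h ^ p[i]) * 16777619u;
	return h;
}

/*
 * Returns a handle (>= 1) for the trace, reusing the existing record
 * if the same trace was stored before; 0 if the pool is full.
 */
static int depot_store(const uintptr_t *frames, unsigned int nr)
{
	uint32_t h;
	int i;

	if (nr > MAX_FRAMES)
		nr = MAX_FRAMES;
	h = hash_frames(frames, nr);

	for (i = buckets[h % NR_BUCKETS] - 1; i >= 0; i = pool[i].next)
		if (pool[i].hash == h && pool[i].nr == nr &&
		    !memcmp(pool[i].frames, frames, nr * sizeof(*frames)))
			return i + 1;		/* duplicate: share it */

	if (nr_records == MAX_RECORDS)
		return 0;

	i = nr_records++;
	pool[i].hash = h;
	pool[i].nr = nr;
	memcpy(pool[i].frames, frames, nr * sizeof(*frames));
	pool[i].next = buckets[h % NR_BUCKETS] - 1;
	buckets[h % NR_BUCKETS] = i + 1;
	return i + 1;
}

/* Looks up a previously stored trace by handle. */
static const uintptr_t *depot_fetch(int handle, unsigned int *nr)
{
	if (handle < 1 || handle > nr_records)
		return NULL;
	*nr = pool[handle - 1].nr;
	return pool[handle - 1].frames;
}
```

A driver would store the handle (an int here) in its own object instead of a full copy of the trace, which is where the memory saving comes from.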

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chris Wilson <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: kmemleak: scan .data.ro_after_init

Limit the number of kmemleak false positives by including
.data.ro_after_init in memory scanning. To achieve this we need to add
symbols for start and end of the section to the linker scripts.
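Scanning a section between start/end symbols can be sketched in userspace, assuming a GNU toolchain on ELF. Objects are placed in a custom section, and the scanner walks every pointer-sized word between the section's bounds looking for a reference, roughly as kmemleak scans a data area. GNU ld and LLD synthesize `__start_<name>`/`__stop_<name>` only for section names that are valid C identifiers. `.data.ro_after_init` is not one, so the kernel needs explicit symbols in its linker scripts. The section name and helpers below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Place pointers in a custom section. Because the name is a valid C
 * identifier, the linker provides __start_/__stop_ symbols for it.
 */
#define demo_ro_after_init \
	__attribute__((section("ro_after_init_demo"), used))

void *obj_a demo_ro_after_init = NULL;
void *obj_b demo_ro_after_init = NULL;

extern void *__start_ro_after_init_demo[];
extern void *__stop_ro_after_init_demo[];

/* Returns 1 if any word in the section holds the target pointer. */
static int section_references(const void *target)
{
	/* Compiler barrier: make earlier stores visible to the raw scan. */
	__asm__ __volatile__("" ::: "memory");

	for (void **p = __start_ro_after_init_demo;
	     p < __stop_ro_after_init_demo; p++)
		if (*p == target)
			return 1;
	return 0;
}
```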

The problem was uncovered by commit 56989f6d8568 ("genetlink: mark
families as __ro_after_init").

Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg: prevent memcg caches from being both OFF_SLAB & OBJFREELIST_SLAB

While testing OBJFREELIST_SLAB integration with pagealloc, we found a
bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
CFLGS_OBJFREELIST_SLAB. When this happens, critical allocations needed
for loading drivers or creating new caches fail.

The original kmem_cache is created early, making OFF_SLAB impossible.
When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
is enabled it will try to enable it first under certain conditions.
Given kmem_cache(sys) reuses the original flag, you can have both flags
at the same time resulting in allocation failures and odd behaviors.

This fix discards allocator-specific flags from memcg before calling
create_cache.
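The flag problem and the masking fix can be modeled with a hypothetical flag layout and a simplified pick_mgmt() that chooses a management type. A root cache created early gets OBJFREELIST. A clone that inherits those flags and is created once OFF_SLAB is possible ends up with both. Stripping allocator-internal bits first lets the allocator pick exactly one. The constants and helpers are illustrative, not the kernel's.

```c
#include <assert.h>

/* Hypothetical flag layout, loosely modeled on the SLAB flags. */
#define CACHE_USER_FLAGS	0x00ffu	/* flags the creator may pass */
#define CFLGS_OFF_SLAB		0x0100u	/* allocator-internal */
#define CFLGS_OBJFREELIST	0x0200u	/* allocator-internal */

/* Simplified allocator choice: OFF_SLAB when possible, else OBJFREELIST. */
static unsigned int pick_mgmt(unsigned int flags, int off_slab_possible)
{
	return flags | (off_slab_possible ? CFLGS_OFF_SLAB : CFLGS_OBJFREELIST);
}

/*
 * When cloning a root cache for a memcg, keep only creator-visible
 * flags so the clone does not inherit the root's management type.
 */
static unsigned int memcg_clone_flags(unsigned int root_flags)
{
	return root_flags & CACHE_USER_FLAGS;
}
```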

The bug has existed since 4.6-rc1 and affects testing of debug pagealloc
configurations.

Fixes: b03a017bebc4 ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Thelen <[email protected]>
Signed-off-by: Thomas Garnier <[email protected]>
Tested-by: Thomas Garnier <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: fix unfreezable coredumping task

It may not be possible to freeze a coredumping task while it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they get a chance to complete 'core_state->startup'.

Inability to freeze a task during suspend will cause suspend to fail.
CRIU also uses the cgroup freezer during the dump operation, so with an
unfreezable task the CRIU dump will fail because it waits for a
transition from the 'FREEZING' to the 'FROZEN' state, which will never happen.

Use freezer_do_not_count() to tell the freezer to ignore the coredumping
task while it waits for core_state->startup completion.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap: don't allow partially uptodate page for pipes

Starting with the 4.9-rc1 kernel, I noticed some test failures of
sendfile(2) and splice(2) (sendfile0N and splice01 from LTP) when
testing on sub-page block size filesystems (tested on both XFS and
ext4): these syscalls started to return EIO in the tests, e.g.

  sendfile02    1  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 26, got: -1
  sendfile02    2  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 24, got: -1
  sendfile02    3  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 22, got: -1
  sendfile02    4  TFAIL  :  sendfile02.c:133: sendfile(2) failed to return expected value, expected: 20, got: -1

This is because, in sub-page block size cases, we don't need the whole
page to be uptodate; it is enough for the part we care about to be
uptodate (if the fs has ->is_partially_uptodate defined).

But page_cache_pipe_buf_confirm() cannot handle the partially-uptodate
case; it needs the whole page to be uptodate, so it returns EIO in this
case.

This is a regression introduced by commit 82c156f85384 ("switch
generic_file_splice_read() to use of ->read_iter()").  Prior to that
change, generic_file_splice_read() didn't allow partially-uptodate pages
either, so it worked fine.

Fix it by skipping the partially-uptodate check if we're working on a
pipe in do_generic_file_read(), so we read the whole page from disk as
long as the page is not uptodate.
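The read-side decision described above can be sketched with a hypothetical struct page_state and can_use_page() helper, not the real do_generic_file_read() logic. A partially uptodate page is accepted only when the fs supports ->is_partially_uptodate and the destination is not a pipe. Otherwise the whole page is read from disk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the page state the read path looks at. */
struct page_state {
	bool uptodate;		/* whole page valid */
	bool range_uptodate;	/* requested sub-range valid */
};

/*
 * Decide whether the page can be used without reading it from disk.
 * Pipes are excluded from the partial case because the pipe buffer's
 * confirm step later insists on a fully uptodate page.
 */
static bool can_use_page(const struct page_state *pg, bool fs_has_partial,
			 bool to_pipe)
{
	if (pg->uptodate)
		return true;
	if (to_pipe || !fs_has_partial)
		return false;	/* read the whole page from disk */
	return pg->range_uptodate;
}
```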

I think the other way to fix it is to teach page_cache_pipe_buf_confirm()
to check for and allow partially-uptodate pages, but that is much harder
to do and seems to gain little.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Eryu Guan <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hugetlb: fix huge page reservation leak in private mapping error paths

Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly
allocated huge page.

If a reservation was associated with the huge page, alloc_huge_page()
consumed the reservation while allocating.  When the newly allocated
page is freed in free_huge_page(), it will increment the global
reservation count.  However, the reservation entry in the reserve map
will remain.

This is not an issue for shared mappings as the entry in the reserve map
indicates a reservation exists.  But, an entry in a private mapping
reserve map indicates the reservation was consumed and no longer exists.
This results in an inconsistency between the reserve map and the global
reservation count.  This 'leaks' a reserved huge page.

Create a new routine restore_reserve_on_error() to restore the reserve
entry in these specific error paths.  This routine makes use of a new
function vma_add_reservation() which will add a reserve entry for a
specific address/page.

In general, these error paths were rarely (if ever) taken on most
architectures.  However, powerpc contained arch-specific code that
resulted in an extra fault and execution of these error paths on all
private mappings.

Fixes: 67961f9db8c4 ("mm/hugetlb: fix huge page reserve accounting for private mappings")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Reported-by: Jan Stancek <[email protected]>
Tested-by: Jan Stancek <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Kirill A . Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: fix not enough credit panic

The following panic was caught when running the ocfs2 disconfig single
test (block size 512 and cluster size 8192).  ocfs2_journal_dirty()
returned -ENOSPC, which means the credits were used up.

The total credits should include three times "num_dx_leaves" from
ocfs2_dx_dir_rebalance(), because two times will be consumed in
ocfs2_dx_dir_transfer_leaf() and one time will be consumed in
ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
ocfs2_dx_dir_format_cluster().  But only two times is included in
ocfs2_dx_dir_rebalance_credits(); fix it.
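The credit arithmetic can be written out with a hypothetical helper, not ocfs2_dx_dir_rebalance_credits() itself. Each dx leaf is journaled twice while its entries are transferred and once more while the new cluster is formatted, so the per-leaf cost is 3, not 2. With, say, 16 leaves that is 48 credits rather than 32.

```c
#include <assert.h>

/* Journal writes per dx leaf, as described above. */
#define TRANSFER_LEAF_WRITES	2	/* ocfs2_dx_dir_transfer_leaf() */
#define FORMAT_CLUSTER_WRITES	1	/* ocfs2_dx_dir_format_cluster() */

/* Hypothetical helper: leaf-related credits for one rebalance. */
static int dx_rebalance_leaf_credits(int num_dx_leaves)
{
	return (TRANSFER_LEAF_WRITES + FORMAT_CLUSTER_WRITES) * num_dx_leaves;
}
```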

This can result in a read-only fs (v4.1+) or a panic on mainline Linux,
depending on the mount options.

  ------------[ cut here ]------------
  kernel BUG at fs/ocfs2/journal.c:775!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
  CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
  task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000
  RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
  RSP: 0018:ffff8800a7d4b6d8  EFLAGS: 00010286
  RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9
  RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460
  RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460
  R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e
  R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070
  FS:  00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0
  Call Trace:
    ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2]
    ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2]
    ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2]
    ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2]
    ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2]
    ocfs2_mknod+0x5a2/0x1400 [ocfs2]
    ocfs2_create+0x73/0x180 [ocfs2]
    vfs_create+0xd8/0x100
    lookup_open+0x185/0x1c0
    do_last+0x36d/0x780
    path_openat+0x92/0x470
    do_filp_open+0x4a/0xa0
    do_sys_open+0x11a/0x230
    SyS_open+0x1e/0x20
    system_call_fastpath+0x12/0x71
  Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
  RIP  ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
  ---[ end trace 91ac5312a6ee1288 ]---
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Junxiao Bi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joseph Qi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "console: don't prefer first registered if DT specifies stdout-path"

This reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path").

The reverted commit changes existing behavior on which many ARM boards
rely.  Many ARM small-board computers, like the Raspberry Pi, have
both a video output and a serial console.  Depending on whether the user
is using the device as a more regular computer or as a headless device,
we need to have the console on one or the other.

Many users rely on the kernel behavior of the console being present on
both outputs.  Before the reverted commit, the console setup with no
console= kernel arguments on an ARM board which sets stdout-path in DT
would look like this:

  [root@localhost ~]# cat /proc/consoles
  ttyS0                -W- (EC p a)    4:64
  tty0                 -WU (E  p  )    4:1

Whereas after the reverted commit, it looks like this:

  [root@localhost ~]# cat /proc/consoles
  ttyS0                -W- (EC p a)    4:64

This commit reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path") restoring the original
behavior.

Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: hwpoison: fix thp split handling in memory_failure()

When memory_failure() runs on a thp tail page after pmd is split, we
trigger the following VM_BUG_ON_PAGE():

   page:ffffd7cd819b0040 count:0 mapcount:0 mapping:         (null) index:0x1
   flags: 0x1fffc000400000(hwpoison)
   page dumped because: VM_BUG_ON_PAGE(!page_count(p))
   ------------[ cut here ]------------
   kernel BUG at /src/linux-dev/mm/memory-failure.c:1132!

memory_failure() passed refcount and page lock from tail page to head
page, which is not needed because we can pass any subpage to
split_huge_page().

Fixes: 61f5d698cc97 ("mm: re-enable THP")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: <[email protected]> [4.5+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

swapfile: fix memory corruption via malformed swapfile

When root activates a swap partition whose header has the wrong
endianness, nr_badpages elements of badpages are swabbed before
nr_badpages has been checked, leading to a buffer overrun of up to 8GB.

This normally is not a security issue because it can only be exploited
by root (more specifically, a process with CAP_SYS_ADMIN or the ability
to modify a swap file/partition), and such a process can already e.g.
modify swapped-out memory of any other userspace process on the system.
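The ordering issue can be sketched with a hypothetical header layout and capacity (MAX_BADPAGES, struct swap_hdr). For a foreign-endian header, the count itself is byte-swapped and validated against the array size first, and only then used as the bound for swapping the entries. The real swapfile.c structures and limits differ.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BADPAGES	100	/* hypothetical capacity of badpages[] */

/* Hypothetical on-disk header fragment. */
struct swap_hdr {
	uint32_t nr_badpages;
	uint32_t badpages[MAX_BADPAGES];
};

static uint32_t swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/*
 * Convert a foreign-endian header in place. The swapped count is
 * checked before it is used as a loop bound, so a malformed header
 * cannot drive the loop past the end of badpages[].
 */
static int swab_header(struct swap_hdr *h)
{
	uint32_t n = swab32(h->nr_badpages);

	if (n > MAX_BADPAGES)
		return -1;	/* malformed: would overrun badpages[] */

	h->nr_badpages = n;
	for (uint32_t i = 0; i < n; i++)
		h->badpages[i] = swab32(h->badpages[i]);
	return 0;
}
```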

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jann Horn <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Jerome Marchand <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/cma.c: check the max limit for cma allocation

The CMA allocation request size is represented by a size_t, which gets
truncated when it is passed as an int to bitmap_find_next_zero_area_off().

During fuzz testing we observed that, when a cma allocation request is
too large, bitmap_find_next_zero_area_off() still returns success due to
the truncation.  This leads to a kernel crash, as subsequent code assumes
that the requested memory is available.

Fail the cma allocation if the request exceeds the corresponding cma
region size.
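The guard can be sketched with a hypothetical struct cma_region and cma_request_ok(). The size_t count is compared with the region size before it is ever narrowed to an int. Without that check, a request of 2^32 + 5 pages on a 64-bit build becomes 5 once converted to a 32-bit int (GCC reduces such conversions modulo 2^32), and the bitmap search "succeeds" for a tiny request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CMA region: size in pages. */
struct cma_region {
	size_t count;
};

/*
 * Reject requests that cannot fit the region while the value is
 * still a full-width size_t, before any narrowing to int.
 */
static bool cma_request_ok(const struct cma_region *cma, size_t count)
{
	if (count == 0 || count > cma->count)
		return false;
	return true;
}
```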

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Shiraz Hashim <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/bloat-o-meter: fix SIGPIPE

Fix piping the output to a program which exits quickly (read: head -n1):

$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux | head -n1
add/remove: 0/0 grow/shrink: 9/60 up/down: 124/-305 (-181)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

Link: http://lkml.kernel.org/r/20161028204618.GA29923@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Matt Mackall <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

shmem: fix pageflags after swapping DMA32 object

If shmem_alloc_page() does not set PageLocked and PageSwapBacked, then
shmem_replace_page() needs to do so for itself. Without this, it puts
newpage on the wrong lru, re-unlocks the unlocked newpage, and the system
descends into "Bad page" reports and a freeze; or, if CONFIG_DEBUG_VM=y, it
hits an earlier VM_BUG_ON_PAGE(!PageLocked), depending on config.

But shmem_replace_page() is not a common path: it's only called when
swapin (or swapoff) finds the page was already read into an unsuitable
zone: usually all zones are suitable, but gem objects for a few drm
devices (gma500, omapdrm, crestline, broadwater) require zone DMA32 if
there's more than 4GB of ram.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: <[email protected]> [4.8.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, frontswap: make sure allocated frontswap map is assigned

Christian Borntraeger reports:

With commit 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to
static key") kmemleak complains about a memory leak in swapon:

    unreferenced object 0x3e09ba56000 (size 32112640):
      comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
         __vmalloc_node_range+0x194/0x2d8
         vzalloc+0x58/0x68
         SyS_swapon+0xd60/0x12f8
         system_call+0xd6/0x270

It turns out kmemleak is right.  We now allocate the frontswap map
depending on the kernel config (and no longer on the enablement):

  swapfile.c:
  [...]
      if (IS_ENABLED(CONFIG_FRONTSWAP))
                frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));

but later on this is passed along
  --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);

and ignored if frontswap is disabled
  --> frontswap_init(p->type, frontswap_map);

  static inline void frontswap_init(unsigned type, unsigned long *map)
  {
        if (frontswap_enabled())
                __frontswap_init(type, map);
  }

The thing is, that frontswap map is never freed.

The leak itself is not that bad, because swapon is an infrequent
and privileged operation.  However, if the first frontswap backend is
registered after a swap type has already been enabled, it will WARN_ON
in frontswap_register_ops() and frontswap will not be available for the
swap type.

Fix this by making sure the map is assigned by frontswap_init() as long
as CONFIG_FRONTSWAP is enabled.
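The fix can be modeled with hypothetical names and no real backend. The init path records the map for the swap type whenever frontswap support is built in, so swapoff can free it and a backend registered later finds it in place. Only the backend call depends on a backend being present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SWAPFILES	4

/* Hypothetical per-swap-type state. */
static unsigned long *frontswap_maps[MAX_SWAPFILES];
static bool backend_registered;
static int backend_inits;	/* counts calls into the backend */

/*
 * Take ownership of the map unconditionally; previously it was only
 * recorded when frontswap was enabled, so it leaked otherwise.
 */
static void frontswap_init_sketch(unsigned int type, unsigned long *map)
{
	frontswap_maps[type] = map;
	if (backend_registered)
		backend_inits++;	/* stand-in for the backend's init */
}

/* Swapoff path: the map is always reachable here, so free it. */
static void swapoff_sketch(unsigned int type)
{
	free(frontswap_maps[type]);
	frontswap_maps[type] = NULL;
}
```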

Fixes: 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to static key")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Christian Borntraeger <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: remove extra newline from allocation stall warning

Commit 63f53dea0c98 ("mm: warn about allocations which stall for too
long") mistakenly embedded "\n" in the format string, resulting in strange
output:

  [  722.876655] kworker/0:1: page alloction stalls for 160001ms, order:0
  [  722.876656] , mode:0x2400000(GFP_NOIO)
  [  722.876657] CPU: 0 PID: 6966 Comm: kworker/0:1 Not tainted 4.8.0+ #69
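The split line comes from the caller's message being followed by a ", mode:..." suffix that the warn helper appends on the same line. The sketch below uses a hypothetical warn_alloc_sketch() that formats into a buffer, modeled on the log above. A caller format ending in "\n" pushes the suffix onto its own line, and dropping that "\n" yields a single record.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int warn_alloc_sketch(char *buf, size_t len, unsigned int mode,
			     const char *fmt, ...)
	__attribute__((format(printf, 4, 5)));

/*
 * Hypothetical warn helper: formats the caller's message, then appends
 * the allocation mode and terminates the line itself.
 * Returns 0 on success, -1 on formatting error or truncation.
 */
static int warn_alloc_sketch(char *buf, size_t len, unsigned int mode,
			     const char *fmt, ...)
{
	va_list ap;
	int n, m;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= len)
		return -1;

	m = snprintf(buf + n, len - (size_t)n, ", mode:%#x\n", mode);
	if (m < 0 || (size_t)m >= len - (size_t)n)
		return -1;
	return 0;
}
```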

Link: http://lkml.kernel.org/r/1476026219-7974-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'kvm-arm-for-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for v4.9-rc4

- Kick the vcpu when a pending interrupt becomes pending again
- Prevent access to invalid interrupt registers
- Invalidate TLBs when two vcpus from the same VM share a CPU

perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake

Several uncore IMC PCI IDs are missing for Intel SkyLake.

Add the PCI IDs for SkyLake Y, U, H and S platforms.
Rename the ID macros for 0x191f and 0x190c.

The corresponding bug:

https://bugzilla.kernel.org/show_bug.cgi?id=187301

The related datasheets are also attached to the bug entry for permanent reference.

Reported-by: Ben Widawsky <[email protected]>
Tested-by: Ben Widawsky <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

sched/deadline: Fix typo in a comment

In the comment:

        /*
         * The task might have changed its scheduling policy to something
         * different than SCHED_DEADLINE (through switched_fromd_dl()).
         */

s/fromd/from/

Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Luca Abeni <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/5408b3b3f9ee197a7b7f10fb834341100a4f2c88.1478599881.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'linus' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <[email protected]>

PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.

We can resolve this problem by doing our error and wakeup checking
(particularly, for the async_error flag) after waiting for children to
suspend, instead of before. This also helps align the logic for the noirq and
late suspend cases with the logic in __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 (PM / sleep: Asynchronous threads for suspend_late)
Fixes: 28b6fd6e3779 (PM / sleep: Asynchronous threads for suspend_noirq)
Reported-by: Jeffy Chen <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

splice: remove detritus from generic_file_splice_read()

i_size check is a leftover from the horrors that used to play with
the page cache in that function. With the switch to ->read_iter(),
it's neither needed nor correct - for gfs2 it ends up being buggy,
since i_size is not guaranteed to be correct until later (inside
->read_iter()).

Spotted-by: Abhi Das <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Merge tag 'imx-drm-fixes-2016-11-10' of git://git.pengutronix.de/git/pza/linux into drm-fixes

imx-drm: fix possible hangup when disabling crtcs

- only ever disable the display controller (DC) module after all plane
  IDMAC channels are stopped. This fixes a regression introduced by the
  atomic modeset conversion.

* tag 'imx-drm-fixes-2016-11-10' of git://git.pengutronix.de/git/pza/linux:
  drm/imx: disable planes before DC

Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Regression fix for powerplay on some iceland boards.

* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/powerplay: implement get_clock_by_type for iceland.
  drm/amd/powerplay/smu7: fix checks in smu7_get_evv_voltages (v2)
  drm/amd/powerplay: update phm_get_voltage_evv_on_sclk for iceland
  drm/amd/powerplay: propagate errors in phm_get_voltage_evv_on_sclk

drm/udl: make control msg static const. (v2)

Thou shalt not send control msgs from the stack;
does that mean I can send them from the RO memory area?

It looks like the answer is no, so here's
v2, which kmemdups.
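A minimal userspace sketch of the pattern, assuming malloc()/memcpy() as a stand-in for the kernel's kmemdup(): the constant template is copied into a fresh heap buffer, and only that copy is handed to the transfer. The template bytes, `memdup_sketch()`, `fake_xfer()` and `send_ctrl_msg()` are all hypothetical illustrations, not the udl driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical control-message template; stands in for the driver's bytes. */
static const unsigned char ctrl_msg_template[4] = { 0x01, 0x02, 0x03, 0x04 };

/* Userspace stand-in for the kernel's kmemdup(): heap copy of a buffer. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Hypothetical transfer callback: rejects the read-only template itself
 * and accepts only a copy with the same contents. */
static int fake_xfer(void *buf, size_t len)
{
	if (buf == (const void *)ctrl_msg_template)
		return -1;
	return memcmp(buf, ctrl_msg_template, len) == 0 ? 0 : -1;
}

/* Build a heap copy, hand it to the transfer, then free it. */
static int send_ctrl_msg(int (*xfer)(void *buf, size_t len))
{
	size_t len = sizeof(ctrl_msg_template);
	void *buf = memdup_sketch(ctrl_msg_template, len);
	int ret;

	if (!buf)
		return -1;
	ret = xfer(buf, len);
	free(buf);
	return ret;
}
```

The template can stay `static const`; the copy is what lives in memory suitable for the transfer.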

Reported-by: poma
Tested-by: poma <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

PCI: VMD: Update filename to reflect move

Updating MAINTAINERS to reflect the new location of the VMD driver.

Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

libceph: initialize last_linger_id with a large integer

osdc->last_linger_id is a counter for lreq->linger_id, which is used
for watch cookies. Starting with a large integer should ease the task
of telling apart kernel and userspace clients.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: fix legacy layout decode with pool 0

If your data pool was pool 0, ceph_file_layout_from_legacy()
transformed that to -1 unconditionally, which broke upgrades.
We only want to do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t. If any fields
are set, though, we trust the fl_pgpool to be a valid pool.
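A minimal sketch of the decode rule described above. The struct is a hypothetical, simplified stand-in for the legacy layout (the real one has more fields and little-endian types); only the "all zero means no pool, otherwise trust the pool field" logic is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the legacy on-wire layout. */
struct legacy_layout {
	uint32_t stripe_unit;
	uint32_t stripe_count;
	uint32_t object_size;
	uint32_t pg_pool;
};

/* Returns -1 only for a fully zeroed layout; otherwise the pool field is
 * trusted, including pool 0. */
static int64_t legacy_layout_pool(const struct legacy_layout *l)
{
	if (l->stripe_unit == 0 && l->stripe_count == 0 &&
	    l->object_size == 0 && l->pg_pool == 0)
		return -1;              /* "no layout" */
	return (int64_t)l->pg_pool; /* pool 0 is a valid pool */
}
```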

Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
Link: http://tracker.ceph.com/issues/17825
Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: use default file splice read callback

The splice read/write implementation changed recently. When using
generic_file_splice_read(), an iov_iter with type == ITER_PIPE is
passed to the filesystem's read_iter callback. But ceph_sync_read()
can't serve an ITER_PIPE iov_iter correctly (an ITER_PIPE iov_iter
expects pages from the page cache).

Fixing ceph_sync_read() requires a big patch, so use the default
splice read callback for now.

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

drm/amd/powerplay: implement get_clock_by_type for iceland.

Iceland uses pptable v0.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Merge remote-tracking branch 'mkp-scsi/4.9/scsi-fixes' into fixes

Merge branch 'mlxsw-fixes'

Jiri Pirko says:

====================
mlxsw: Couple of router fixes

v1->v2:
- patch2:
- use net_eq
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces

Until now, tables with the same id in multiple network namespaces were
squashed into a single virtual router. That is not only incorrect, it also
causes error messages when the RALUE register is used to remove the same
FIB entries twice, like this one:

mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter))

Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL),
and the infrastructure is not yet prepared to handle netnamespaces, just
ignore FIB notification events for non-init namespaces. That is safe to
do since we don't need to offload them.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Fix handling of neighbour structure

The __neigh_create() function works in a different way than assumed.
It passes "n" as a parameter to ndo_neigh_construct. But this "n" might
be destroyed right away, before __neigh_create() returns, in case there is
already another neighbour struct in the hashtable with the same dev and
primary key. That is not expected by mlxsw_sp_router_neigh_construct(),
and the stored "n" points to freed memory, eventually leading to a crash.

Fix this by tightly coupling the neighbour struct 1:1 with the
internal driver neigh_entry. That allows narrowing down the key in the
internal driver hashtable so that lookups are done by "n" only.

Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qed-fixes'

Yuval Mintz says:

====================
qed: Fix RoCE infrastructure

This series fixes 2 basic issues with RoCE support:
one handles a missing configuration in the initial infrastructure
support, while the other is a regression introduced by one of the
initial fix submissions.
====================

Signed-off-by: David S. Miller <[email protected]>

qed: Correct rdma params configuration

The previous fix broke RoCE support, as the rdma_pf_params are now
being set into the parameters only after the params were already
assigned to the hw-function.

Fixes: 0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: configure ll2 RoCE v1/v2 flavor correctly

Currently RoCE v2 won't operate with RDMA CM due to the missing setting
of the RoCE flavour in the ll2 configuration.
This patch properly sets the flavour, and deletes incorrect HSI
that doesn't [yet] exist.

Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

arm64: dts: rockchip: add three new resets for rk3399 PCIe controller

pm_rst, aclk_rst and pclk_rst should be controlled by the driver, so we
need to add these three resets for the PCIe controller.

Signed-off-by: Shawn Lin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Heiko Stuebner <[email protected]>

PCI: rockchip: Add three new resets as required properties

pm_rst, aclk_rst and pclk_rst were controlled by ROM code, so in theory
software didn't need to control them again. But that didn't work properly,
so we do need to control them again and add enough delay between the
assert and the deassert of pm_rst. The SoC integrated with this
controller, rk3399, is still under MP test internally, so the backward
compatibility won't be a big deal.

Signed-off-by: Shawn Lin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Acked-by: Rob Herring <[email protected]>

ipv4: update comment to document GSO fragmentation cases.

This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.

Suggested-by: Shmulik Ladkani <[email protected]>
Reviewed-by: Shmulik Ladkani <[email protected]>
Signed-off-by: Lance Richardson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/amd/powerplay/smu7: fix checks in smu7_get_evv_voltages (v2)

Only check if the tables exist in relevant configs. This
fixes a failure on V0 tables.

v2: fix version check as suggested by Rex

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Rex Zhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: update phm_get_voltage_evv_on_sclk for iceland

Was missing the handling for iceland.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Rex Zhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: propagate errors in phm_get_voltage_evv_on_sclk

Missing for one case.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Rex Zhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect

When a LOCALINV WR is flushed, the frmr is marked STALE, then
frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs
are then recovered when frwr_op_map hunts for an INVALID frmr to
use.

All other cases that need frmr recovery leave that SGL DMA-mapped.
The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL.

To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs,
alter the recovery logic (rather than the hot frwr_op_unmap_sync
path) to distinguish among these cases. This solution also takes
care of the case where multiple LOCAL_INV WRs are issued for the
same rpcrdma_req, some complete successfully, but some are flushed.
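A minimal sketch of the recovery rule described above: recovery DMA-unmaps the SGL only when the hot path has not already done so for a flushed LOCAL_INV. The state names, `struct frmr_model` and the counting `dma_unmap_sgl()` stand-in are hypothetical; the counter just makes a double unmap visible.

```c
#include <assert.h>

/* Hypothetical MR states, loosely modeled on the description above. */
enum frmr_state { FRMR_IS_INVALID, FRMR_IS_VALID, FRMR_FLUSHED_LI };

struct frmr_model {
	enum frmr_state state;
	int unmap_count;   /* how many times the SGL was DMA-unmapped */
};

/* Stand-in for DMA-unmapping the SGL; counts calls to expose double unmaps. */
static void dma_unmap_sgl(struct frmr_model *f)
{
	f->unmap_count++;
}

/* Hot path: a flushed LOCAL_INV is unmapped here and marked as such. */
static void unmap_sync_flushed(struct frmr_model *f)
{
	dma_unmap_sgl(f);
	f->state = FRMR_FLUSHED_LI;
}

/* Recovery: unmap only MRs whose SGL is still mapped. */
static void recover(struct frmr_model *f)
{
	if (f->state != FRMR_FLUSHED_LI)
		dma_unmap_sgl(f);
	f->state = FRMR_IS_INVALID;
}
```

Putting the distinction in the recovery path keeps the hot unmap path unchanged, which matches the trade-off the message describes.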

Reported-by: Vasco Steinmetz <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Vasco Steinmetz <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

ppdev: fix double-free of pp->pdev->name

free_pardevice() is called by parport_unregister_device() and already frees
pp->pdev->name, so don't try to free it again.

This bug causes kernel crashes.

I found and verified this with KASAN and some added pr_emerg()s:

[   60.316568] pp_release: pp->pdev->name == ffff88039cb264c0
[   60.316692] free_pardevice: freeing par_dev->name at ffff88039cb264c0
[   60.316706] pp_release: kfree(ffff88039cb264c0)
[   60.316714] ==========================================================
[   60.316722] BUG: Double free or freeing an invalid pointer
[   60.316731] Unexpected shadow byte: 0xFB
[   60.316801] Object at ffff88039cb264c0, in cache kmalloc-32 size: 32
[   60.316813] Allocated:
[   60.316824] PID = 1695
[   60.316869] Freed:
[   60.316880] PID = 1695
[   60.316935] ==========================================================

Signed-off-by: Jann Horn <[email protected]>
Acked-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: cdc-acm: fix TIOCMIWAIT

The TIOCMIWAIT implementation would return -EINVAL if any of the three
supported signals were included in the mask.

Instead of returning an error in case TIOCM_CTS is included, simply
drop the mask check completely, which is in accordance with how other
drivers implement this ioctl.
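A sketch reconstructed from the description above, not copied from the driver: the old check rejected exactly the masks that contained one of the supported signals, and the fix drops validation entirely. It assumes a Linux/glibc system where `<sys/ioctl.h>` provides the `TIOCM_*` modem-line bits.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>   /* TIOCM_* modem-line bits on Linux */

/* Reconstruction of the inverted check: any supported signal in the
 * mask produced -EINVAL. */
static int buggy_mask_check(unsigned long mask)
{
	if (mask & (TIOCM_DSR | TIOCM_RI | TIOCM_CD))
		return -EINVAL;
	return 0;
}

/* After the fix the mask is not validated at all, as in other drivers. */
static int fixed_mask_check(unsigned long mask)
{
	(void)mask;
	return 0;
}
```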

Fixes: 5a6a62bdb925 ("cdc-acm: add TIOCMIWAIT")
Cc: stable <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: tcp response should set oif only if it is L3 master

Lorenzo noted an Android unit test failed due to e0d56fdd7342:
"The expectation in the test was that the RST replying to a SYN sent to a
closed port should be generated with oif=0. In other words it should not
prefer the interface where the SYN came in on, but instead should follow
whatever the routing table says it should do."

Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
that the oif in the flow is set to the skb_iif only if skb_iif is an L3
master.
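A minimal sketch of the selection rule: the reply flow's oif is pinned to the ingress interface only when that interface is an L3 master (e.g. a VRF device); otherwise it stays 0 so the routing table decides. `struct net_device_model` and `reply_oif()` are hypothetical illustrations, not the kernel's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal device model: only the fields this choice needs. */
struct net_device_model {
	int ifindex;
	bool is_l3_master;   /* e.g. a VRF device */
};

/* 0 means "no oif constraint": let the routing table choose. */
static int reply_oif(const struct net_device_model *iif)
{
	return (iif && iif->is_l3_master) ? iif->ifindex : 0;
}
```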

Fixes: e0d56fdd7342 ("net: l3mdev: remove redundant calls")
Reported-by: Lorenzo Colitti <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Tested-by: Lorenzo Colitti <[email protected]>
Acked-by: Lorenzo Colitti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
Bridge Controller (Vendor=04b4 ProdID=3610).

Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
with the Kensington SD4600P USB-C Universal Dock with Power,
which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
Controller.

A similar patch was signed-off and tested-by Allan Chou
<[email protected]> on 2015-12-01.

Allan verified his similar patch on x86 Linux kernel 4.1.6 system
with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.

Tested-by: Allan Chou <[email protected]>
Tested-by: Chris Roth <[email protected]>
Tested-by: Artjom Simon <[email protected]>
Signed-off-by: Allan Chou <[email protected]>
Signed-off-by: Chris Roth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:

1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
   support the set element updates from the packet path. From Liping
   Zhang.

2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

3) Fix a race when inserting new elements to the set hash from the
   packet path, also from Liping.

4) Handle segmented TCP SIP packets properly, basically preventing the
   INVITE in the Allow header from creating bogus expectations by
   performing stricter SIP message parsing, from Ulrich Weber.

5) nft_parse_u32_check() should return signed integer for errors, from
   John Linville.

6) Fix wrong allocation for connlabels: allocate 16 instead of
   32 bytes, from Florian Westphal.

7) Fix compilation breakage when building the ip_vs_sync code with
   CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

8) Destroy the new set if the transaction object cannot be allocated,
   also from Liping Zhang.

9) Use device to route duplicated packets via nft_dup only when set by
   the user, otherwise packets may not follow the right route, again
   from Liping.

10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

12) Allow the use of conntrack helpers that are registered with
    NFPROTO_UNSPEC via the CT target; otherwise we cannot use the H.245
    helper, from Florian.

13) Revisit garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.
====================

Signed-off-by: David S. Miller <[email protected]>

rtnl: reset calcit fptr in rtnl_unregister()

To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is not an issue so far, as only the rtnl core registers a netlink
handler with a calcit hook, and that handler is never unregistered. It
may become one if new code makes use of the calcit hook.
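A minimal sketch of the idea: unregistering a message type clears every hook in its slot, calcit included, so no dangling function pointer survives. `struct rtnl_link_model` and the dummy handlers are hypothetical stand-ins for the rtnl handler table.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*doit_fn)(void);
typedef int (*dumpit_fn)(void);
typedef unsigned int (*calcit_fn)(void);

/* Hypothetical per-message-type handler slot with the three hooks. */
struct rtnl_link_model {
	doit_fn doit;
	dumpit_fn dumpit;
	calcit_fn calcit;
};

/* Dummy handlers used only to populate a slot. */
static int dummy_doit(void) { return 0; }
static int dummy_dumpit(void) { return 0; }
static unsigned int dummy_calcit(void) { return 0; }

static void unregister_handlers(struct rtnl_link_model *link)
{
	link->doit = NULL;
	link->dumpit = NULL;
	link->calcit = NULL;   /* the hook the fix also resets */
}
```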

Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher <[email protected]>
Cc: Greg Rose <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drbd: Fix kernel_sendmsg() usage - potential NULL deref

Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.

DRBD as an external module has been around since the kernel 2.4 days.
We used to be compatible with 2.4 and very early 2.6 kernels, and
we used to use
rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
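A userspace sketch of why the size argument must track iov.iov_len across partial sends. `fake_sendmsg()` is a hypothetical transport that sends at most `limit` bytes per call and fails when asked for more than the single iovec holds, standing in for the kernel iterator running off the end of the iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical transport: sends at most 'limit' bytes, errors out when
 * asked for more bytes than the single iovec holds. */
static long fake_sendmsg(const struct iovec *iov, size_t size, size_t limit)
{
	if (size > iov->iov_len)
		return -1;
	return (long)(size < limit ? size : limit);
}

/* Returns total bytes sent, or -1 on error. use_iov_len selects the fixed
 * behaviour (pass iov.iov_len) vs. the buggy one (pass the original size). */
static long send_all(void *buf, size_t size, size_t limit, int use_iov_len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = size };
	size_t sent = 0;

	while (sent < size) {
		size_t req = use_iov_len ? iov.iov_len : size;
		long rv = fake_sendmsg(&iov, req, limit);

		if (rv <= 0)
			return -1;
		sent += (size_t)rv;
		iov.iov_base = (char *)iov.iov_base + rv;
		iov.iov_len -= (size_t)rv;
	}
	return (long)sent;
}
```

The buggy variant only fails after a short send, which matches the message's point that the error stays hidden until a send returns rv < size.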

tcp_sendmsg() used to totally ignore the size parameter.
57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.

Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time". Apparently that was unlikely enough for most, so this went
unnoticed for years.

Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/

This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.

It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).

It requires a small "conflict" resolution for v4.4 and earlier, because
in v4.5 we dropped the comment block immediately preceding the kernel_sendmsg().

Fixes: b411b3637fa7 ("The DRBD driver")
Cc: <[email protected]> # 2.6.33.x-
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reported-by: Christoph Lechleitner <[email protected]>
Tested-by: Christoph Lechleitner <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
[changed oneliner to be "obvious" without context; more verbose message]
Signed-off-by: Lars Ellenberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

vxlan: hide unused local variable

A bugfix introduced a harmless warning in v4.9-rc4:

drivers/net/vxlan.c: In function 'vxlan_group_used':
drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]

This hides the variable inside of the same #ifdef that is
around its user. The extraneous initialization is removed
at the same time; it was accidentally introduced in the
same commit.
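A minimal sketch of the pattern: the IPv6-only local is declared under the same conditional as its only user, so a build without IPv6 has no unused variable. `HYPOTHETICAL_IPV6` stands in for the kernel's IPv6 config check, and the structs are hypothetical models, not the vxlan code.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL_IPV6 stands in for the kernel's IPv6 config option. */
struct sock_model { int refs; };

struct vxlan_model {
	struct sock_model *sock4;
#ifdef HYPOTHETICAL_IPV6
	struct sock_model *sock6;
#endif
};

static bool group_used(const struct vxlan_model *v, const struct sock_model *sk)
{
	const struct sock_model *sock4;
#ifdef HYPOTHETICAL_IPV6
	const struct sock_model *sock6;   /* same #ifdef as its user */
#endif

	sock4 = v->sock4;
	if (sock4 == sk)
		return true;
#ifdef HYPOTHETICAL_IPV6
	sock6 = v->sock6;
	if (sock6 == sk)
		return true;
#endif
	return false;
}
```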

Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Start completion queue negotiation at server-provided optimum values

Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.
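A toy model of why the starting point matters, under a hypothetical protocol in which the server either accepts the requested queue count or counters with the number it can allocate. Starting at the server's own estimate (the opt_* value) settles in one round instead of two. `struct server_model`, `server_request()` and `negotiate()` are illustrations only, not the ibmvnic CRQ protocol.

```c
#include <assert.h>

/* Hypothetical server: it can actually allocate 'avail' queues. */
struct server_model {
	unsigned int avail;
	unsigned int rounds;   /* request/response exchanges so far */
};

/* The server accepts the request or counters with what it can do. */
static unsigned int server_request(struct server_model *s, unsigned int want)
{
	s->rounds++;
	return want <= s->avail ? want : s->avail;
}

/* Negotiate starting from 'start'; returns the agreed queue count. */
static unsigned int negotiate(struct server_model *s, unsigned int start)
{
	unsigned int want = start;

	for (;;) {
		unsigned int got = server_request(s, want);

		if (got == want)
			return got;
		want = got;   /* renegotiate at the server's counter-offer */
	}
}
```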

Signed-off-by: John Allen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: icmp_route_lookup should use rt dev to determine L3 domain

icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.
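A minimal sketch of the change in where the L3 domain is taken from: the route's device, which is expected to be present, instead of skb->dev, which may be NULL here. All struct names and the `l3mdev_ifindex` field are hypothetical simplifications of the kernel's skb/rtable/net_device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal models; l3mdev_ifindex == 0 means no L3 master. */
struct dev_model { int ifindex; int l3mdev_ifindex; };
struct rt_model  { struct dev_model *dev; };
struct skb_model { struct dev_model *dev; struct rt_model *rt; };

/* Use the route's device rather than skb->dev, which may be NULL. */
static int icmp_l3_domain(const struct skb_model *skb)
{
	const struct dev_model *dev = skb->rt ? skb->rt->dev : NULL;

	return dev ? dev->l3mdev_ifindex : 0;
}
```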

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qcom-emac-pause'

Timur Tabi says:

====================
net: qcom/emac: ensure that pause frames are enabled

The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it. Therefore, we
need to enable flow control and force the phy to pass pause frames.
====================

Signed-off-by: David S. Miller <[email protected]>

net: qcom/emac: enable flow control if requested

If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.
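One common way to decide which pause directions the MAC should enable is the standard full-duplex resolution of the local and link-partner pause advertisements (IEEE 802.3 Annex 28B, the same rules as the kernel's mii_resolve_flowctrl_fdx()). The sketch below shows that resolution; the emac driver may derive its settings differently, and the `FC_*` names are hypothetical.

```c
#include <assert.h>

/* Pause bits of the MII advertisement registers (IEEE 802.3 clause 28). */
#define ADV_PAUSE_CAP  0x0400
#define ADV_PAUSE_ASYM 0x0800

#define FC_TX 0x1   /* MAC may generate pause frames */
#define FC_RX 0x2   /* MAC should honour received pause frames */

/* Full-duplex pause resolution from local and link-partner advertisements. */
static unsigned int resolve_flowctrl(unsigned int lcl, unsigned int rmt)
{
	if (lcl & rmt & ADV_PAUSE_CAP)
		return FC_TX | FC_RX;
	if (lcl & rmt & ADV_PAUSE_ASYM) {
		if (lcl & ADV_PAUSE_CAP)
			return FC_RX;
		if (rmt & ADV_PAUSE_CAP)
			return FC_TX;
	}
	return 0;
}
```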

Signed-off-by: Timur Tabi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qcom/emac: configure the external phy to allow pause frames

Pause frames are used to enable flow control. A MAC can send and
receive pause frames in order to throttle traffic. However, the PHY
must be configured to allow those frames to pass through.

Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Timur Tabi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ACPI / platform: Add support for build-in properties

We have a couple of drivers, acpi_apd.c and acpi_lpss.c,
that need to pass extra built-in properties to the devices
they create. Previously the drivers added those properties
to the struct device which is a member of the struct
acpi_device, but that does not work. Those properties need
to be assigned to the struct device of the platform device
instead in order for them to become available to the
drivers.

To fix this, this patch changes the acpi_create_platform_device()
function to take a struct property_entry pointer as a parameter.

Fixes: 20a875e2e86e (serial: 8250_dw: Add quirk for APM X-Gene SoC)
Signed-off-by: Heikki Krogerus <[email protected]>
Tested-by: Yazen Ghannam <[email protected]>
Tested-by: Jérôme de Bretagne <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

3 more amdgpu fixes.

* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/powerplay: return false instead of -EINVAL
  drm/amdgpu/powerplay/smu7: fix unintialized data usage
  drm/amdgpu: fix crash in acp_hw_fini

Merge tag 'drm-intel-fixes-2016-11-09' of git://anongit.freedesktop.org/drm-intel into drm-fixes

i915 fixes, including a Sandybridge rendering regression fix.

* tag 'drm-intel-fixes-2016-11-09' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Limit Valleyview and earlier to only using mappable scanout
  drm/i915: Round tile chunks up for constructing partial VMAs
  drm/i915/dp: Extend BDW DP audio workaround to GEN9 platforms
  drm/i915/dp: BDW cdclk fix for DP audio
  drm/i915/vlv: Prevent enabling hpd polling in late suspend
  drm/i915: Respect alternate_ddc_pin for all DDI ports

x86/cpu: Deal with broken firmware (VMWare/XEN)

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables, does not work on those systems, because the CPUID-initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used for printout purposes.

If the physical package IDs differ, yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.
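A simplified model of the sanitizing step, assuming the package id is the APIC id shifted right by a hypothetical per-system `shift` (the number of APIC id bits for cores/threads within a package). On a mismatch it warns and keeps the firmware package id. With `shift = 0` (one core per socket) and CPU1 reporting firmware APIC id 1 vs CPUID APIC id 2, it prints the two messages shown in the dmesg output below. This is an illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: package id = APIC id >> shift. */
static unsigned int apic_to_pkg(unsigned int apicid, unsigned int shift)
{
	return apicid >> shift;
}

/* Compare the CPUID view with the firmware (ACPI/MP) view and fall back
 * to the firmware package id on mismatch. */
static unsigned int sanitize_pkg_id(int cpu, unsigned int fw_apicid,
				    unsigned int cpuid_apicid, unsigned int shift)
{
	unsigned int fw_pkg = apic_to_pkg(fw_apicid, shift);
	unsigned int cpuid_pkg = apic_to_pkg(cpuid_apicid, shift);

	if (fw_apicid != cpuid_apicid)
		printf("[Firmware Bug]: CPU%d: APIC id mismatch. Firmware: %u CPUID: %u\n",
		       cpu, fw_apicid, cpuid_apicid);
	if (fw_pkg != cpuid_pkg) {
		printf("[Firmware Bug]: CPU%d: Using firmware package id %u instead of %u\n",
		       cpu, fw_pkg, cpuid_pkg);
		return fw_pkg;
	}
	return cpuid_pkg;
}
```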

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <[email protected]>,
Reported-by: M. Vefa Bicakci <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <[email protected]>

Merge tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This became a largish pull-request, as we've got a bunch of pending
  ASoC fixes at this time. One noticeable change is the removal of error
  directive in uapi/sound/asoc.h. We found that the API has been already
  used on Chromebooks, so we need to support it even now.

  A slightly bigger LOC count is found in the Qualcomm lpass driver, but
  the rest are all small and easy fixes for ASoC drivers (sti, sun4i,
  Realtek codecs, Intel, tas571x, etc.) in addition to the patches to
  harden the ALSA core proc file accesses"

* tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
  ALSA: info: Return error for invalid read/write
  ALSA: info: Limit the proc text input size
  ASoC: samsung: spdif: Fix DMA filter initialization
  ASoC: sun4i-codec: Enable bus clock after getting GPIO
  ASoC: lpass-cpu: add module licence and description
  ASoC: lpass-platform: Fix broken pcm data usage
  ASoC: sun4i-codec: return error code instead of NULL when create_card fails
  ASoC: hdmi-codec: Fix hdmi_of_xlate_dai_name when #sound-dai-cells = <0>
  ASoC: samsung: get access to DMA engine early to defer probe properly
  ASoC: da7219: Connect output enable register to DAIOUT
  ASoC: Intel: Skylake: Fix to turn off hdmi power on probe failure
  ASoC: sti-sas: enable fast io for regmap
  ASoC: sti: fix channel status update after playback start
  ASoC: PXA: Brownstone needs I2C
  ASoC: Intel: Skylake: Always acquire runtime pm ref on unload
  ASoC: Intel: Atom: add terminate entry for dmi_system_id tables
  ASoC: rt298: fix jack type detect error
  ASoC: rt5663: fix a debug statement
  ASoC: cs4270: fix DAPM stream name mismatch
  ASoC: Intel: haswell depends on sst-firmware
  ...

Merge tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs fix from Mike Marshall:
"We recently refactored the Orangefs debugfs code. The refactor seemed
  to trigger [email protected]'s static tester to find a possible
  double-free in the code.

  While designing the fix we saw a condition under which the buffer
  being freed could also be overflowed.

  We also realized how to rebuild the related debugfs file's "contents"
  (a string) without deleting and re-creating the file.

  This fix should eliminate the possible double-free, the potential
  overflow and improve code readability"

* tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: clean up debugfs

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Two bug fixes

   - a memory alignment fix in the s390 only hypfs code

   - a fix for the generic percpu code that caused ftrace to break on
     s390. This is not relevant for x86 but for all architectures that
     use the generic percpu code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  percpu: use notrace variant of preempt_disable/preempt_enable
  s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment

net: bgmac: fix reversed checks for clock control flag

This fixes a regression introduced by the patch adding feature flags. It
was already reported and a patch followed (it got accepted), but it appears
it was incorrect. Instead of fixing the reversed condition, it broke a good
one.

This patch was verified to actually fix SoC hangs caused by bgmac on
BCM47186B0.

Fixes: db791eb2970b ("net: ethernet: bgmac: convert to feature flags")
Fixes: 4af1474e6198 ("net: bgmac: Fix errant feature flag check")
Cc: Jon Mason <[email protected]>
Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bna: Add synchronization for tx ring.

We received two reports of BUG_ON in bnad_txcmpl_process() where
hw_consumer_index appeared to be ahead of producer_index. Out of order
write/read of these variables could explain these reports.

bnad_start_xmit(), as a producer of tx descriptors, has a few memory
barriers sprinkled around writes to producer_index and the device's
doorbell but they're not paired with anything in bnad_txcmpl_process(), a
consumer.

Since we are synchronizing with a device, we must use mandatory barriers,
not smp_*. Also, I didn't see the purpose of the last smp_mb() in
bnad_start_xmit().
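A userspace analog of the pairing the message asks for: the producer writes the descriptor before publishing the new index with a release store, and the consumer reads the index with an acquire load before reading descriptors. It uses C11 `<stdatomic.h>`, which only orders CPU-to-CPU accesses. As the message notes, code synchronizing with a device needs mandatory barriers instead, so treat this as an illustration of the pairing, not bna code. `RING_SIZE` and `struct ring` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical; power of two */

/* Single-producer/single-consumer ring; indices increase monotonically. */
struct ring {
	uint32_t desc[RING_SIZE];
	_Atomic uint32_t prod;
	_Atomic uint32_t cons;
};

static int ring_produce(struct ring *r, uint32_t v)
{
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_relaxed);
	uint32_t c = atomic_load_explicit(&r->cons, memory_order_acquire);

	if (p - c == RING_SIZE)
		return -1;                      /* full */
	r->desc[p % RING_SIZE] = v;             /* write the descriptor first */
	atomic_store_explicit(&r->prod, p + 1, memory_order_release);
	return 0;
}

static int ring_consume(struct ring *r, uint32_t *out)
{
	uint32_t c = atomic_load_explicit(&r->cons, memory_order_relaxed);
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (c == p)
		return -1;                      /* empty */
	*out = r->desc[c % RING_SIZE];          /* safe: pairs with the release */
	atomic_store_explicit(&r->cons, c + 1, memory_order_release);
	return 0;
}
```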

Signed-off-by: Benjamin Poirier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net/mlx4_en: Fix panic during reboot"

This reverts commit 9d2afba058722d40cc02f430229c91611c0e8d16.

The original issue could arise if an external module
tried calling our "ethtool_ops" without checking whether it still
exists.

The right way of solving it is simply to do the check on
the caller side.
Currently, no action is required as there's no such use case.

Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-ipv6: on device mtu change do not add mtu to mtu-less routes

Routes can specify an mtu explicitly or inherit the mtu from
the underlying device - this inheritance is implemented in
dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().

Currently changing the mtu of a device adds mtu explicitly
to routes using that device.

For example, before this patch:
  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1

After this patch the route entry no longer changes unless it already has an mtu.
There is no need to add one: this inheritance is already done in ip6_mtu().

  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route add local 2000::2 dev lo mtu 2000
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 1501
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1
  # ip -6 route del local 2000::2

This is desirable because changing device mtu and then resetting it
to the previous value shouldn't change the user visible routing table.
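The update rule implied by the transcript above can be sketched as
follows. This is a hypothetical model, not ip6 routing code: an mtu of 0
stands for "no explicit mtu, inherit from the device", and the rule for
raising an explicit mtu (only when it was tracking the old device mtu) is
inferred from the 1501 -> 65536 step.

```c
/* Hypothetical route model: mtu == 0 means "no explicit mtu, inherit
 * from the device" (the ip6_mtu() path). */
struct route6 {
	unsigned int mtu;
};

/* Sketch of the update rule on a device mtu change. */
static void route_on_dev_mtu_change(struct route6 *rt, unsigned int old_dev_mtu,
				    unsigned int new_dev_mtu)
{
	/* Mtu-less routes keep inheriting; never add an explicit mtu. */
	if (rt->mtu == 0)
		return;
	/* Lower an explicit mtu that no longer fits, or raise one that was
	 * tracking the old device mtu (assumption inferred from the example). */
	if (rt->mtu >= new_dev_mtu || rt->mtu == old_dev_mtu)
		rt->mtu = new_dev_mtu;
}
```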

Signed-off-by: Maciej Żenczykowski <[email protected]>
CC: Eric Dumazet <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sock: fix sendmmsg for partial sendmsg

Do not send the next message in sendmmsg for partial sendmsg
invocations.

sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.

For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.

Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.
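A minimal sketch of the fixed loop, assuming a hypothetical fake_stream
that accepts a limited number of bytes per call to stand in for a partial
sendmsg(). The helpers are illustrative and are not the kernel's
__sys_sendmmsg().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stream: call number i accepts at most budget[i] bytes. */
struct fake_stream {
	char data[64];
	size_t len;
	size_t budget[4];
	size_t calls;
};

static size_t fake_send(struct fake_stream *s, const char *buf, size_t n)
{
	size_t room = sizeof(s->data) - s->len;
	size_t take = s->calls < 4 ? s->budget[s->calls] : n;

	s->calls++;
	if (take > n)
		take = n;
	if (take > room)
		take = room;
	memcpy(s->data + s->len, buf, take);
	s->len += take;
	return take;
}

/* Sketch of the fixed loop: stop after a partial send instead of moving
 * on to the next message. Returns the number of messages processed,
 * counting the partially sent one. */
static size_t send_many(struct fake_stream *s, const char *const msgs[], size_t count)
{
	for (size_t i = 0; i < count; i++) {
		size_t n = strlen(msgs[i]);

		if (fake_send(s, msgs[i], n) < n)
			return i + 1;	/* partial: the caller must resend the tail */
	}
	return count;
}
```

With the first call accepting only one byte, the loop stops after "a"
instead of appending "efgh".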

Fixes: 228e548e6020 ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Maciej Żenczykowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.

When there is no existing macvlan port in lowdev, a new macvlan port
is created. But it is not destroyed when something fails later,
which causes a memory leak.

Now add a flag to indicate whether a new macvlan port was created.
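The fix follows a common error-path pattern, sketched here with
hypothetical types (struct port, struct lowdev, newlink()) rather than the
macvlan driver's own: record whether this call created the port, and undo
the creation only in that case.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types, not the macvlan driver's structures. */
struct port { int refs; };
struct lowdev { struct port *port; };

static int live_ports;	/* counts allocations so a leak is visible */

static struct port *port_create(struct lowdev *dev)
{
	dev->port = calloc(1, sizeof(*dev->port));
	if (dev->port)
		live_ports++;
	return dev->port;
}

static void port_destroy(struct lowdev *dev)
{
	free(dev->port);
	dev->port = NULL;
	live_ports--;
}

/* Sketch: only destroy the port on failure if this call created it. */
static int newlink(struct lowdev *dev, bool later_step_fails)
{
	bool create = false;

	if (!dev->port) {
		if (!port_create(dev))
			return -1;
		create = true;
	}
	if (later_step_fails)
		goto err;
	dev->port->refs++;
	return 0;
err:
	if (create)
		port_destroy(dev);
	return -1;
}
```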

Signed-off-by: Gao Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression

This patch will fix regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").

The problem was that the MEGASAS_IS_LOGICAL macro did not have enclosing
parentheses, and as a result the driver ended up exposing a lot of
non-existent SCSI devices (all SCSI commands to channels 1,2,3 were
returned as SUCCESS-DID_OK by the driver).
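The effect of missing outer parentheses on a function-like macro can be
shown with a small sketch. The macro names, the MAX_PD_CHANNELS value and
the `filter && IS_LOGICAL(ch)` call site are hypothetical; the exact
expression in the driver may differ, but the precedence problem is the
same.

```c
/* Illustrative stand-in; the value is an assumption. */
#define MAX_PD_CHANNELS 2

/* Without outer parentheses, ?: binds more loosely than the operators
 * around the macro at the call site. */
#define IS_LOGICAL_BROKEN(ch) (ch < MAX_PD_CHANNELS) ? 0 : 1
#define IS_LOGICAL_FIXED(ch)  (((ch) < MAX_PD_CHANNELS) ? 0 : 1)

/* Hypothetical call site of the form "filter && IS_LOGICAL(ch)". */
static int check_broken(int filter, int ch)
{
	/* Expands to: (filter && ch < 2) ? 0 : 1 */
	return filter && IS_LOGICAL_BROKEN(ch);
}

static int check_fixed(int filter, int ch)
{
	/* Expands to: filter && ((ch < 2) ? 0 : 1) */
	return filter && IS_LOGICAL_FIXED(ch);
}
```

With filter == 0 the broken version returns 1 for every channel, because
the && is absorbed into the condition of the ?:.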

[mkp: clarified patch description]

Fixes: 1e793f6fc0db920400574211c48f9157a37e3945
Reported-by: Jens Axboe <[email protected]>
CC: [email protected]
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Sumit Saxena <[email protected]>
Tested-by: Sumit Saxena <[email protected]>
Reviewed-by: Tomas Henzl <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems

cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we generate the funniest
thread_siblings masks. Then, when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern, which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

The APIC ID is present in APICx020; bits 3 and above contain the core
complex, node and socket IDs.

The LLC is at the core complex level, so we can find a unique cpu_llc_id
by right-shifting the APIC ID by 3, because the least significant bit of
the result is then the core complex ID.
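A sketch contrasting the two derivations. The old formula here is a
paraphrase of the description above (a zero-based socket field minus 1)
with a hypothetical field width `bits`, not a copy of the kernel code; the
new one is the right shift by 3 described above.

```c
#include <stdint.h>

/* Illustrative paraphrase of the buggy derivation; `bits` is hypothetical. */
static uint16_t llc_id_old(uint32_t apicid, unsigned int bits)
{
	uint32_t socket_id = (apicid >> bits) - 1;	/* underflows for socket 0 */
	uint32_t ccx_id = (apicid & ((1u << bits) - 1)) >> 3;

	return (uint16_t)((socket_id << 3) | ccx_id);
}

/* Fixed derivation: bits 3 and above hold core complex, node and socket
 * IDs, so dropping the low 3 bits gives a value unique per core complex. */
static uint16_t llc_id_new(uint32_t apicid)
{
	return (uint16_t)(apicid >> 3);
}
```

For APIC ID 0 (socket 0), the old derivation yields 0xFFF8 rather than 0.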

Tested-by: Borislav Petkov <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: <[email protected]> # v4.4..
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

perf hists: Fix column length on --hierarchy

Markus reported that there's a weird behavior on perf top --hierarchy
regarding the column length.

Looking at the code, I found some dubious code that causes the symptom.
When the --hierarchy option is used, the last column length might be
inaccurate since it skips updating the length for leaf entries.

I cannot remember why it does that; it looks like a leftover from a
previous version during development.

Anyway, updating the column length often is not harmful, so let's move
the code out.

Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 1a3906a7e6b9 ("perf hists: Resort hist entries with hierarchy")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hists browser: Fix column indentation on --hierarchy

When horizontal scrolling is used in hierarchy mode, the rightmost
column has unnecessary indentation. It is actually needed only if some
of the left (overhead) columns are shown.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hists browser: Show folded sign properly on --hierarchy

When horizontal scrolling is used in hierarchy mode, the folded sign
disappears at the rightmost column.

Committer note:

To test it, run 'perf top --hierarchy', see the '+' symbol in the first
column, then press the right arrow key: the '+' symbol will disappear.
This patch fixes that.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Move 'width -= 2' invariant to right after the if/else ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hists browser: Fix indentation of folded sign on --hierarchy

It should indent by 2 spaces: one for the folded sign and one for the whitespace after it.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hist browser: Fix hierarchy column counts

The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.

But it calculates the number of columns incorrectly when hierarchy mode
is enabled, so repeatedly pressing the RIGHT key can make the output
disappear.

In the hierarchy mode, all sort keys are collapsed into a single column,
so this needs to be taken into account when calculating the number of columns.
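A sketch of the corrected count, using a hypothetical helper that is not
perf's API: in hierarchy mode the sort keys contribute a single column.

```c
#include <stdbool.h>

/* Hypothetical helper: number of columns available for horizontal
 * scrolling. In hierarchy mode all sort keys share one column. */
static unsigned int visible_columns(unsigned int overhead_cols,
				    unsigned int sort_keys, bool hierarchy)
{
	if (hierarchy)
		return overhead_cols + (sort_keys ? 1 : 0);
	return overhead_cols + sort_keys;
}
```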

Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

drm/imx: disable planes before DC

If the DC clock is disabled before the attached IDMACs are properly
stopped, the IDMACs may hang the IPU or even the whole system.

Make sure the IDMACs are in safe state by disabling the planes before
removal of the DC clock.

Also set the atomic parameter to false to stop calling the atomic_begin
hook, which does nothing useful as we immediately afterwards turn off
vblank interrupts and possibly send the pending vblank event.

Fixes: 33f14235302f ("drm/imx: atomic phase 1: Use transitional atomic CRTC and plane helpers")
Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>

scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove

If a command is aborted in the kernel but not in the adapter, it might be
considered complete and its DMA memory released, but it is still alive in
the adapter, which will trigger an invalid DMA access upon its completion
(in the DMA operations to deliver the command response to the driver).

On powerpc platforms with IOMMU/EEH capabilities, the problem is observed
during PCI device removal with ongoing IO requests -- which might trigger
an EEH event very often, pointing to a 'TCE Request Page Access Error'.

In that path, which is qla2x00_remove_one(), the commands are aborted in
qla2x00_abort_all_cmds(), which does not perform an abort in the adapter
as is done in qla2xxx_eh_abort() for example.

So, this patch changes qla2x00_abort_all_cmds() to abort commands in the
adapter too, with a call to qla2xxx_eh_abort(), which already implements
all the logic to submit abort requests and handle responses.

Reported-by: Naresh Bannoth <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: do not queue commands when unloading

When the driver is unloading, in qla2x00_remove_one(), there is a single
call/point in time to abort ongoing commands, qla2x00_abort_all_cmds(),
which is still several steps away from the call to scsi_remove_host().

If more commands continue to arrive and be processed during that
interval, when the driver is tearing down and releasing its structures,
it might potentially hit an oops due to invalid memory access:

    Unable to handle kernel paging request for data at address 0x00000138
    <...>
    NIP [d000000004700a40] qla2xxx_queuecommand+0x80/0x3f0 [qla2xxx]
    LR [d000000004700a10] qla2xxx_queuecommand+0x50/0x3f0 [qla2xxx]

So, fail commands in qla2xxx_queuecommand() if the UNLOADING bit is set.
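A minimal sketch of the check, with a hypothetical host structure and an
atomic flag standing in for the UNLOADING bit; completing the rejected
command with an error status is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; only the flag mirrors the UNLOADING bit above. */
struct host_sketch {
	atomic_bool unloading;
	unsigned int queued;
};

/* Sketch: refuse new commands once teardown has started, instead of
 * touching structures that may already be released. */
static bool queuecommand_sketch(struct host_sketch *h)
{
	if (atomic_load(&h->unloading))
		return false;	/* fail the command */
	h->queued++;
	return true;
}
```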

Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: libcxgbi: fix incorrect DDP resource cleanup

Task data is memset to zero before task_release_itt() is called, so the
DDP context information is lost, resulting in incorrect DDP resource
cleanup. To fix this, call task_release_itt() before the memset.
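The ordering fix in miniature, with hypothetical names: the release step
must read the DDP context before memset() clears it.

```c
#include <string.h>

/* Hypothetical task layout; ddp_tag stands in for the DDP context. */
struct task_sketch {
	unsigned int ddp_tag;	/* 0 means "no DDP resources held" */
	char payload[16];
};

static unsigned int released_tag;	/* records what the release saw */

static void release_itt_sketch(struct task_sketch *t)
{
	released_tag = t->ddp_tag;	/* would free resources named by the tag */
}

/* Fixed order: release while the context is intact, then clear. */
static void cleanup_task_sketch(struct task_sketch *t)
{
	release_itt_sketch(t);
	memset(t, 0, sizeof(*t));
}
```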

Signed-off-by: Varun Prakash <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

netfilter: nf_tables: fix oops when inserting an element into a verdict map

Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
   map sourcemap {
     type ipv4_addr : verdict;
   }

   chain postrouting {
     ip saddr vmap @sourcemap accept
   }
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----

results in a kernel oops:
BUG: unable to handle kernel paging request at 0000000000001344
IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
[...]
Call Trace:
  [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
  [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
  [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
  [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
  [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
  [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
  [<ffffffff8132c400>] ? nla_validate+0x60/0x80
  [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

This happens because we forget to fill in the net pointer in bind_ctx, so
dereferencing it may cause a kernel crash.

Reported-by: Dalegaard <[email protected]>
Signed-off-by: Liping Zhang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: refine gc worker heuristics

Nicolas Dichtel says:
  After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
  remove timed-out entries"), netlink conntrack deletion events may be
  sent with a huge delay.

Nicolas further points at this line:

  goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);

and indeed, this isn't optimal at all.  The rationale here was to ensure
that we don't block other work items for too long, even if
nf_conntrack_htable_size is huge.  But in order to have some guarantee
about the maximum time period in which a scan of the full conntrack table
completes, we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.

We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).

So, after some discussion with Nicolas:

1. We want a hard guarantee that the entire table is scanned at least once every X s
-> need to scan a fraction of the table (get rid of the upper bound)

2. We don't want to eat cycles on an idle or very busy system
-> increase the interval if we did not evict any entries

3. We don't want to block other work items for too long
-> make the fraction really small, and prefer a small scan interval instead

4. We want a reasonably short time until we detect a timed-out entry when
the system goes idle after a burst of traffic, while not doing scans
all the time.
-> Store the next gc scan delay in the worker, increasing the delay when no
eviction happened and shrinking it when we see timed-out entries.

The old gc interval is turned into a maximum; scans can now happen
every jiffy if stale entries are present.

Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').
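One possible reading of points 1-4, as a sketch: the constants, the
halving/doubling policy and the function names are assumptions, not the
values or code used by the conntrack gc worker. The slice size is rounded
up so that GC_DIV consecutive runs always cover the whole table.

```c
/* Hypothetical constants; the real values and policy may differ. */
#define GC_DIV          64u	/* scan 1/64 of the table per run */
#define GC_MAX_INTERVAL 1024u	/* upper bound on the delay, in jiffies */

/* Fixed fraction with no upper bound (point 1), rounded up so that
 * GC_DIV runs examine every bucket at least once. */
static unsigned int gc_buckets_per_run(unsigned int htable_size)
{
	unsigned int n = (htable_size + GC_DIV - 1) / GC_DIV;

	return n ? n : 1;
}

/* Back off when nothing was evicted (point 2); shrink the delay, down to
 * one jiffy, when timed-out entries were found (point 4). */
static unsigned int gc_next_interval(unsigned int cur, unsigned int evicted)
{
	if (evicted)
		return cur > 1 ? cur / 2 : 1;
	return cur * 2 < GC_MAX_INTERVAL ? cur * 2 : GC_MAX_INTERVAL;
}
```

Under these assumptions, the worst-case time for a full pass over the
table is GC_DIV * GC_MAX_INTERVAL jiffies.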

Reported-by: Nicolas Dichtel <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: fix CT target for UNSPEC helpers

Thomas reports it's not possible to attach the H.245 helper:

iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
iptables: No chain/target/match by that name.
xt_CT: No such helper "H.245"

This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.

We should treat UNSPEC as a wildcard and ignore the l3num instead.
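A sketch of the wildcard rule. The enum values mirror NFPROTO_UNSPEC,
NFPROTO_IPV4 and NFPROTO_IPV6, while the struct and function are
hypothetical and not the nf_conntrack_helper API.

```c
#include <stdbool.h>
#include <string.h>

/* Values mirror the kernel's NFPROTO_* constants. */
enum { PROTO_UNSPEC = 0, PROTO_IPV4 = 2, PROTO_IPV6 = 10 };

struct helper_sketch {
	const char *name;
	unsigned char l3num;
};

/* A helper registered as UNSPEC matches any requested family;
 * otherwise the families must be equal. */
static bool helper_matches(const struct helper_sketch *h, const char *name,
			   unsigned char l3num)
{
	if (strcmp(h->name, name) != 0)
		return false;
	return h->l3num == PROTO_UNSPEC || h->l3num == l3num;
}
```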

Reported-by: Thomas Woerner <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: connmark: ignore skbs with magic untracked conntrack objects

The (percpu) untracked conntrack entries can end up with nonzero connmarks.

The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).

Untracked conntrack objects are an implementation detail; we might as well
use an invalid magic address instead to tell INVALID and UNTRACKED apart.

Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.
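A sketch of the check, using hypothetical stand-ins for the skb, the
conntrack object and the untracked dummy (a single static object here
rather than a per-cpu one).

```c
#include <stddef.h>

/* Hypothetical types: a per-packet conntrack pointer that may point at
 * a shared "untracked" dummy object. */
struct ct_sketch { unsigned int mark; };
struct skb_sketch { struct ct_sketch *nfct; };

static struct ct_sketch untracked_dummy;

/* Treat the untracked dummy exactly like "no conntrack". */
static struct ct_sketch *skb_ct(const struct skb_sketch *skb)
{
	if (skb->nfct == &untracked_dummy)
		return NULL;
	return skb->nfct;
}

/* The dummy's mark is therefore never written. */
static void set_connmark(struct skb_sketch *skb, unsigned int mark)
{
	struct ct_sketch *ct = skb_ct(skb);

	if (ct)
		ct->mark = mark;
}
```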

Reported-by: XU Tianwen <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr

family.maxattr is the maximum index for policy[]; the size of
ops[] is determined with ARRAY_SIZE().
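A sketch of the distinction, using a hypothetical attribute enum that
follows the usual netlink *_MAX convention: policy[] has MAX + 1 entries
indexed by attribute type, while the number of ops is unrelated and comes
from ARRAY_SIZE().

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical attribute enum following the usual netlink pattern. */
enum {
	CMD_ATTR_UNSPEC,
	CMD_ATTR_SERVICE,
	CMD_ATTR_DEST,
	__CMD_ATTR_MAX,
};
#define CMD_ATTR_MAX (__CMD_ATTR_MAX - 1)

struct policy_sketch { int type; };
struct op_sketch { int cmd; };

/* Indexed by attribute type: MAX + 1 slots, maxattr is the last index. */
static const struct policy_sketch policy[CMD_ATTR_MAX + 1];

/* The ops table has its own length, unrelated to the attribute space. */
static const struct op_sketch ops[] = { {1}, {2}, {3}, {4}, {5} };

static const unsigned int family_maxattr = CMD_ATTR_MAX;
static const size_t family_n_ops = ARRAY_SIZE(ops);
```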

Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

PCI: Don't attempt to claim shadow copies of ROM

If we're using a shadow copy of a PCI device ROM, the shadow copy is in RAM
and the device never sees accesses to it and doesn't respond to it.  We
don't have to route the shadow range to the PCI device, and the device
doesn't have to claim the range.

Previously we treated the shadow copy as though it were the ROM BAR, and we
failed to claim it because the region wasn't routed to the device:

  pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
  pci_bus 0000:01: Allocating resources
  pci 0000:01:00.0: can't claim BAR 6 [mem 0x000c0000-0x000dffff]: no compatible bridge window

The failure path of pcibios_allocate_dev_rom_resource() cleared out the
resource start address, which also caused the following ioremap() warning:

  WARNING: CPU: 0 PID: 116 at /build/linux-akdJXO/linux-4.8.0/arch/x86/mm/ioremap.c:121 __ioremap_caller+0x1ec/0x370
  ioremap on RAM at 0x0000000000000000 - 0x000000000001ffff

Handle an option ROM shadow copy as RAM, without trying to insert it into
the iomem resource tree.

This fixes a regression caused by 0c0e0736acad ("PCI: Set ROM shadow
location in arch code, not in PCI core"), which appeared in v4.6.  The
regression causes video device initialization to fail.  This was reported
on AMD Turks, but it likely affects others as well.

Fixes: 0c0e0736acad ("PCI: Set ROM shadow location in arch code, not in PCI core")
Reported-and-tested-by: Vecu Bosseur <[email protected]>
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627496
Link: https://bugzilla.kernel.org/show_bug.cgi?id=175391
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1352272
Signed-off-by: Bjorn Helgaas <[email protected]>
CC: [email protected] # v4.6+

ARCv2: MCIP: Use IDU_M_DISTRI_DEST mode if there is only 1 destination core

ARC linux uses 2 distribution modes for common interrupts: round robin
mode (IDU_M_DISTRI_RR) and a simple destination mode (IDU_M_DISTRI_DEST).
The first one is used when more than one core may handle a common interrupt
and the second one is used when only one core may handle a common interrupt.

However, idu_irq_set_affinity() always sets IDU_M_DISTRI_RR, whatever the
affinity value. There is no sense in using that mode if only one core
must handle a common interrupt.
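A sketch of the mode selection; the enum and helper names are
hypothetical and do not reproduce the IDU register encoding.

```c
#include <stdbool.h>

/* Hypothetical encodings, not the real IDU register values. */
enum distri_mode { DISTRI_RR, DISTRI_DEST };

/* True if exactly one bit (one core) is set in the affinity mask. */
static bool single_core(unsigned int mask)
{
	return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Destination mode for a single core, round robin otherwise. */
static enum distri_mode pick_distri_mode(unsigned int affinity_mask)
{
	return single_core(affinity_mask) ? DISTRI_DEST : DISTRI_RR;
}
```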

Signed-off-by: Yuriy Kolerov <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

ARC: IRQ: Do not use hwirq as virq and vice versa

This came up when reviewing code to address missing IRQ affinity
setting in the AXS103 platform and/or implementing hierarchical IRQ domains.

- smp_ipi_irq_setup() callers pass a hwirq, but it in turn calls
  request_percpu_irq(), which expects a Linux virq. So invoke
  irq_find_mapping() to do the conversion
  (also make this explicit in the code by renaming the args appropriately)

- idu_of_init()/idu_cascade_isr() were similarly using a Linux virq where
  a hwirq is expected, so do the conversion using the irqd_to_hwirq() helper

Signed-off-by: Yuriy Kolerov <[email protected]>
[vgupta: made changelog a bit concise]
Signed-off-by: Vineet Gupta <[email protected]>

Merge tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

- Four patches from Robin Murphy fix several issues with the recently
   merged generic DT-bindings support for arm-smmu drivers

- A fix for a dead-lock issue in the VT-d driver, which shows up on
   iommu hotplug

* tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
  iommu/arm-smmu: Fix out-of-bounds dereference
  iommu/arm-smmu: Check that iommu_fwspecs are ours
  iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s
  iommu/arm-smmu: Work around ARM DMA configuration

ARC: [plat-eznps] set default baud for early console

For CONFIG_SERIAL_EARLYCON we need 800MHz for the NPS SoC.
The early console driver uses BASE_BAUD and does not use the dtb.

The default of 50MHz is NOT good for the NPS SoC.
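Worked arithmetic, assuming an 8250-style UART with 16x oversampling
where BASE_BAUD is the UART clock divided by 16 (an assumption about the
NPS UART). The helper functions are illustrative.

```c
/* Assumption: BASE_BAUD = UART input clock / 16 (16x oversampling). */
static unsigned long base_baud(unsigned long uartclk_hz)
{
	return uartclk_hz / 16;
}

/* Divisor the early console would program for a target baud rate. */
static unsigned long baud_divisor(unsigned long base, unsigned long baud)
{
	return base / baud;
}

/* Baud rate actually produced for a given clock and divisor. */
static unsigned long actual_baud(unsigned long uartclk_hz, unsigned long divisor)
{
	return uartclk_hz / (16 * divisor);
}
```

At 115200 baud, a BASE_BAUD derived from 50MHz gives divisor 27; with an
800MHz clock that divisor produces about 1851851 baud, roughly 16 times
too fast.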

Signed-off-by: Noam Camus <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

ARC: [plat-eznps] remove IPI clear from SMP operations

Today we register a plat_smp_ops.clear() method, which actually
acks the IPI.
However, this is already taken care of by our irqchip driver,
specifically by the irq_chip.irq_eoi() method.
That is exactly the right time for it to be done, so no special
handling is needed in plat_smp_ops.clear().

Signed-off-by: Noam Camus <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>