Git Repo - linux.git/log

Revert "console: don't prefer first registered if DT specifies stdout-path"

This reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path").

The reverted commit changes existing behavior on which many ARM boards
rely.  Many ARM small-board-computers, like e.g.  the Raspberry Pi have
both a video output and a serial console.  Depending on whether the user
is using the device as a more regular computer; or as a headless device
we need to have the console on either one or the other.

Many users rely on the kernel behavior of the console being present on
both outputs, before the reverted commit the console setup with no
console= kernel arguments on an ARM board which sets stdout-path in dt
would look like this:

  [root@localhost ~]# cat /proc/consoles
  ttyS0                -W- (EC p a)    4:64
  tty0                 -WU (E  p  )    4:1

Where as after the reverted commit, it looks like this:

  [root@localhost ~]# cat /proc/consoles
  ttyS0                -W- (EC p a)    4:64

This commit reverts commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path") restoring the original
behavior.

Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Thorsten Leemhuis <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: hwpoison: fix thp split handling in memory_failure()

When memory_failure() runs on a thp tail page after pmd is split, we
trigger the following VM_BUG_ON_PAGE():

   page:ffffd7cd819b0040 count:0 mapcount:0 mapping:         (null) index:0x1
   flags: 0x1fffc000400000(hwpoison)
   page dumped because: VM_BUG_ON_PAGE(!page_count(p))
   ------------[ cut here ]------------
   kernel BUG at /src/linux-dev/mm/memory-failure.c:1132!

memory_failure() passed refcount and page lock from tail page to head
page, which is not needed because we can pass any subpage to
split_huge_page().

Fixes: 61f5d698cc97 ("mm: re-enable THP")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: <[email protected]> [4.5+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

swapfile: fix memory corruption via malformed swapfile

When root activates a swap partition whose header has the wrong
endianness, nr_badpages elements of badpages are swabbed before
nr_badpages has been checked, leading to a buffer overrun of up to 8GB.

This normally is not a security issue because it can only be exploited
by root (more specifically, a process with CAP_SYS_ADMIN or the ability
to modify a swap file/partition), and such a process can already e.g.
modify swapped-out memory of any other userspace process on the system.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jann Horn <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Jerome Marchand <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/cma.c: check the max limit for cma allocation

CMA allocation request size is represented by size_t that gets truncated
when same is passed as int to bitmap_find_next_zero_area_off.

We observe that during fuzz testing when cma allocation request is too
high, bitmap_find_next_zero_area_off still returns success due to the
truncation. This leads to kernel crash, as subsequent code assumes that
requested memory is available.

Fail cma allocation in case the request breaches the corresponding cma
region size.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Shiraz Hashim <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/bloat-o-meter: fix SIGPIPE

Fix piping output to a program which quickly exits (read: head -n1)

$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux | head -n1
add/remove: 0/0 grow/shrink: 9/60 up/down: 124/-305 (-181)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

Link: http://lkml.kernel.org/r/20161028204618.GA29923@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Matt Mackall <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

shmem: fix pageflags after swapping DMA32 object

If shmem_alloc_page() does not set PageLocked and PageSwapBacked, then
shmem_replace_page() needs to do so for itself. Without this, it puts
newpage on the wrong lru, re-unlocks the unlocked newpage, and system
descends into "Bad page" reports and freeze; or if CONFIG_DEBUG_VM=y, it
hits an earlier VM_BUG_ON_PAGE(!PageLocked), depending on config.

But shmem_replace_page() is not a common path: it's only called when
swapin (or swapoff) finds the page was already read into an unsuitable
zone: usually all zones are suitable, but gem objects for a few drm
devices (gma500, omapdrm, crestline, broadwater) require zone DMA32 if
there's more than 4GB of ram.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: <[email protected]> [4.8.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, frontswap: make sure allocated frontswap map is assigned

Christian Borntraeger reports:

With commit 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to
static key") kmemleak complains about a memory leak in swapon

    unreferenced object 0x3e09ba56000 (size 32112640):
      comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
         __vmalloc_node_range+0x194/0x2d8
         vzalloc+0x58/0x68
         SyS_swapon+0xd60/0x12f8
         system_call+0xd6/0x270

Turns out kmemleak is right.  We now allocate the frontswap map
depending on the kernel config (and no longer on the enablement)

  swapfile.c:
  [...]
      if (IS_ENABLED(CONFIG_FRONTSWAP))
                frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));

but later on this is passed along
  --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);

and ignored if frontswap is disabled
  --> frontswap_init(p->type, frontswap_map);

  static inline void frontswap_init(unsigned type, unsigned long *map)
  {
        if (frontswap_enabled())
                __frontswap_init(type, map);
  }

Thing is, that frontswap map is never freed.

The leakage is relatively not that bad, because swapon is an infrequent
and privileged operation.  However, if the first frontswap backend is
registered after a swap type has been already enabled, it will WARN_ON
in frontswap_register_ops() and frontswap will not be available for the
swap type.

Fix this by making sure the map is assigned by frontswap_init() as long
as CONFIG_FRONTSWAP is enabled.

Fixes: 8ea1d2a1985a ("mm, frontswap: convert frontswap_enabled to static key")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Christian Borntraeger <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: remove extra newline from allocation stall warning

Commit 63f53dea0c98 ("mm: warn about allocations which stall for too
long") by error embedded "\n" in the format string, resulting in strange
output.

  [  722.876655] kworker/0:1: page alloction stalls for 160001ms, order:0
  [  722.876656] , mode:0x2400000(GFP_NOIO)
  [  722.876657] CPU: 0 PID: 6966 Comm: kworker/0:1 Not tainted 4.8.0+ #69

Link: http://lkml.kernel.org/r/1476026219-7974-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'kvm-arm-for-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for v4.9-rc4

- Kick the vcpu when a pending interrupt becomes pending again
- Prevent access to invalid interrupt registers
- Invalid TLBs when two vcpus from the same VM share a CPU

perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake

Several uncore IMC PCI IDs are missed for Intel SkyLake.

Add the PCI IDs for SkyLake Y, U, H and S platforms.
Rename the ID macros for 0x191f and 0x190c.

The corresponding bug:

https://bugzilla.kernel.org/show_bug.cgi?id=187301

The related datasheets are also attached in the bug entry for permanent reference.

Reported-by: Ben Widawsky <[email protected]>
Tested-by: Ben Widawsky <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails

Consider two devices, A and B, where B is a child of A, and B utilizes
asynchronous suspend (it does not matter whether A is sync or async). If
B fails to suspend_noirq() or suspend_late(), or is interrupted by a
wakeup (pm_wakeup_pending()), then it aborts and sets the async_error
variable. However, device A does not (immediately) check the async_error
variable; it may continue to run its own suspend_noirq()/suspend_late()
callback. This is bad.

We can resolve this problem by doing our error and wakeup checking
(particularly, for the async_error flag) after waiting for children to
suspend, instead of before. This also helps align the logic for the noirq and
late suspend cases with the logic in __device_suspend().

It's easy to observe this erroneous behavior by, for example, forcing a
device to sleep a bit in its suspend_noirq() (to ensure the parent is
waiting for the child to complete), then return an error, and watch the
parent suspend_noirq() still get called. (Or similarly, fake a wakeup
event at the right (or is it wrong?) time.)

Fixes: de377b397272 (PM / sleep: Asynchronous threads for suspend_late)
Fixes: 28b6fd6e3779 (PM / sleep: Asynchronous threads for suspend_noirq)
Reported-by: Jeffy Chen <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

splice: remove detritus from generic_file_splice_read()

i_size check is a leftover from the horrors that used to play with
the page cache in that function. With the switch to ->read_iter(),
it's neither needed nor correct - for gfs2 it ends up being buggy,
since i_size is not guaranteed to be correct until later (inside
->read_iter()).

Spotted-by: Abhi Das <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Merge tag 'imx-drm-fixes-2016-11-10' of git://git.pengutronix.de/git/pza/linux into drm-fixes

imx-drm: fix possible hangup when disabling crtcs

- only ever disable the display controller (DC) module after all plane
  IDMAC channels are stopped. This fixes a regression introduced by the
  atomic modeset conversion.

* tag 'imx-drm-fixes-2016-11-10' of git://git.pengutronix.de/git/pza/linux:
  drm/imx: disable planes before DC

Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Regression fix for powerplay on some iceland boards.

* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/powerplay: implement get_clock_by_type for iceland.
  drm/amd/powerplay/smu7: fix checks in smu7_get_evv_voltages (v2)
  drm/amd/powerplay: update phm_get_voltage_evv_on_sclk for iceland
  drm/amd/powerplay: propagate errors in phm_get_voltage_evv_on_sclk

drm/udl: make control msg static const. (v2)

Thou shall not send control msg from the stack,
does that mean I can send it from the RO memory area?

and it looks like the answer is no, so here's
v2 which kmemdups.

Reported-by: poma
Tested-by: poma <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

PCI: VMD: Update filename to reflect move

Updating MAINTAINERS to reflect the new location of the VMD driver.

Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

libceph: initialize last_linger_id with a large integer

osdc->last_linger_id is a counter for lreq->linger_id, which is used
for watch cookies. Starting with a large integer should ease the task
of telling apart kernel and userspace clients.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: fix legacy layout decode with pool 0

If your data pool was pool 0, ceph_file_layout_from_legacy()
transform that to -1 unconditionally, which broke upgrades.
We only want do that for a fully zeroed ceph_file_layout,
so that it still maps to a file_layout_t. If any fields
are set, though, we trust the fl_pgpool to be a valid pool.

Fixes: 7627151ea30bc ("libceph: define new ceph_file_layout structure")
Link: http://tracker.ceph.com/issues/17825
Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: use default file splice read callback

Splice read/write implementation changed recently. When using
generic_file_splice_read(), iov_iter with type == ITER_PIPE is
passed to filesystem's read_iter callback. But ceph_sync_read()
can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
expects pages from page cache).

Fixing ceph_sync_read() requires a big patch. So use default
splice read callback for now.

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

drm/amd/powerplay: implement get_clock_by_type for iceland.

iceland use pptable v0.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Merge remote-tracking branch 'mkp-scsi/4.9/scsi-fixes' into fixes

Merge branch 'mlxsw-fixes'

Jiri Pirko says:

====================
mlxsw: Couple of router fixes

v1->v2:
- patch2:
- use net_eq
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces

Since now, the table with same id in multiple netnamespaces were squashed
to a single virtual router. That is not only incorrect, it also causes
error messages when trying to use RALUE register to do double remove
of FIB entries, like this one:

mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter))

Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL),
and the infrastructure is not yet prepared to handle netnamespaces, just
ignore FIB notification events for non-init namespaces. That is clear to
do since we don't need to offload them.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Fix handling of neighbour structure

__neigh_create function works in a different way than assumed.
It passes "n" as a parameter to ndo_neigh_construct. But this "n" might
be destroyed right away before __neigh_create() returns in case there is
already another neighbour struct in the hashtable with the same dev and
primary key. That is not expected by mlxsw_sp_router_neigh_construct()
and the stored "n" points to freed memory, eventually leading to crash.

Fix this by doing tight 1:1 coupling between neighbour struct and
internal driver neigh_entry. That allows to narrow down the key in
internal driver hashtable to do lookups by "n" only.

Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qed-fixes'

Yuval Mintz says:

====================
qed: Fix RoCE infrastructure

This series fixes 2 basic issues with RoCE support,
one handles a missing configuration in the initial infrastructure
support while the other is a regression introduced by one of the
initial fix submissions.
====================

Signed-off-by: David S. Miller <[email protected]>

qed: Correct rdma params configuration

Previous fix has broken RoCE support as the rdma_pf_params are now
being set into the parameters only after the params are alrady assigned
into the hw-function.

Fixes: 0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: configure ll2 RoCE v1/v2 flavor correctly

Currently RoCE v2 won't operate with RDMA CM due to missing setting of
the roce-flavour in the ll2 configuration.
This patch properly sets the flavour, and deletes incorrect HSI
that doesn't [yet] exist.

Fixes: abd49676c707 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

arm64: dts: rockchip: add three new resets for rk3399 PCIe controller

pm_rst, aclk_rst and pclk_rst should be controlled by driver, so we
need to add these three resets for PCIe controller.

Signed-off-by: Shawn Lin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Heiko Stuebner <[email protected]>

PCI: rockchip: Add three new resets as required properties

pm_rst, aclk_rst, pclk_rst was controlled by ROM code so the software
wasn't needed to control it again in theory. But it didn't work properly,
so we do need to do it again and add enough delay between the assert of
pm_rst and the deassert of pm_rst. The Soc intergrated with this
controller, rk3399, is still under MP test internally, so the backward
compatibility won't be a big deal.

Signed-off-by: Shawn Lin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Acked-by: Rob Herring <[email protected]>

ipv4: update comment to document GSO fragmentation cases.

This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.

Suggested-by: Shmulik Ladkani <[email protected]>
Reviewed-by: Shmulik Ladkani <[email protected]>
Signed-off-by: Lance Richardson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/amd/powerplay/smu7: fix checks in smu7_get_evv_voltages (v2)

Only check if the tables exist in relevant configs. This
fixes a failure on V0 tables.

v2: fix version check as suggested by Rex

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Rex Zhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: update phm_get_voltage_evv_on_sclk for iceland

Was missing the handling for iceland.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Rex Zhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: propagate errors in phm_get_voltage_evv_on_sclk

Missing for one case.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=185681
https://bugs.freedesktop.org/show_bug.cgi?id=98357

Reviewed-by: Rex Zhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect

When a LOCALINV WR is flushed, the frmr is marked STALE, then
frwr_op_unmap_sync DMA-unmaps the frmr's SGL. These STALE frmrs
are then recovered when frwr_op_map hunts for an INVALID frmr to
use.

All other cases that need frmr recovery leave that SGL DMA-mapped.
The FRMR recovery path unconditionally DMA-unmaps the frmr's SGL.

To avoid DMA unmapping the SGL twice for flushed LOCAL_INV WRs,
alter the recovery logic (rather than the hot frwr_op_unmap_sync
path) to distinguish among these cases. This solution also takes
care of the case where multiple LOCAL_INV WRs are issued for the
same rpcrdma_req, some complete successfully, but some are flushed.

Reported-by: Vasco Steinmetz <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Vasco Steinmetz <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

ppdev: fix double-free of pp->pdev->name

free_pardevice() is called by parport_unregister_device() and already frees
pp->pdev->name, don't try to do it again.

This bug causes kernel crashes.

I found and verified this with KASAN and some added pr_emerg()s:

[   60.316568] pp_release: pp->pdev->name == ffff88039cb264c0
[   60.316692] free_pardevice: freeing par_dev->name at ffff88039cb264c0
[   60.316706] pp_release: kfree(ffff88039cb264c0)
[   60.316714] ==========================================================
[   60.316722] BUG: Double free or freeing an invalid pointer
[   60.316731] Unexpected shadow byte: 0xFB
[   60.316801] Object at ffff88039cb264c0, in cache kmalloc-32 size: 32
[   60.316813] Allocated:
[   60.316824] PID = 1695
[   60.316869] Freed:
[   60.316880] PID = 1695
[   60.316935] ==========================================================

Signed-off-by: Jann Horn <[email protected]>
Acked-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: cdc-acm: fix TIOCMIWAIT

The TIOCMIWAIT implementation would return -EINVAL if any of the three
supported signals were included in the mask.

Instead of returning an error in case TIOCM_CTS is included, simply
drop the mask check completely, which is in accordance with how other
drivers implement this ioctl.

Fixes: 5a6a62bdb925 ("cdc-acm: add TIOCMIWAIT")
Cc: stable <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: tcp response should set oif only if it is L3 master

Lorenzo noted an Android unit test failed due to e0d56fdd7342:
"The expectation in the test was that the RST replying to a SYN sent to a
closed port should be generated with oif=0. In other words it should not
prefer the interface where the SYN came in on, but instead should follow
whatever the routing table says it should do."

Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
that the oif in the flow is set to the skb_iif only if skb_iif is an L3
master.

Fixes: e0d56fdd7342 ("net: l3mdev: remove redundant calls")
Reported-by: Lorenzo Colitti <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Tested-by: Lorenzo Colitti <[email protected]>
Acked-by: Lorenzo Colitti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
Bridge Controller (Vendor=04b4 ProdID=3610).

Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
with the Kensington SD4600P USB-C Universal Dock with Power,
which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
Controller.

A similar patch was signed-off and tested-by Allan Chou
<[email protected]> on 2015-12-01.

Allan verified his similar patch on x86 Linux kernel 4.1.6 system
with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.

Tested-by: Allan Chou <[email protected]>
Tested-by: Chris Roth <[email protected]>
Tested-by: Artjom Simon <[email protected]>
Signed-off-by: Allan Chou <[email protected]>
Signed-off-by: Chris Roth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:

1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
   support the set element updates from the packet path. From Liping
   Zhang.

2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

3) Fix a race when inserting new elements to the set hash from the
   packet path, also from Liping.

4) Handle segmented TCP SIP packets properly, basically avoid that the
   INVITE in the allow header create bogus expectations by performing
   stricter SIP message parsing, from Ulrich Weber.

5) nft_parse_u32_check() should return signed integer for errors, from
   John Linville.

6) Fix wrong allocation instead of connlabels, allocate 16 instead of
   32 bytes, from Florian Westphal.

7) Fix compilation breakage when building the ip_vs_sync code with
   CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

8) Destroy the new set if the transaction object cannot be allocated,
   also from Liping Zhang.

9) Use device to route duplicated packets via nft_dup only when set by
   the user, otherwise packets may not follow the right route, again
   from Liping.

10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC
    via CT target, otherwise we cannot use the h.245 helper, from
    Florian.

13) Revisit garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.
====================

Signed-off-by: David S. Miller <[email protected]>

rtnl: reset calcit fptr in rtnl_unregister()

To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.

Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher <[email protected]>
Cc: Greg Rose <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drbd: Fix kernel_sendmsg() usage - potential NULL deref

Don't pass a size larger than iov_len to kernel_sendmsg().
Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
returns with rv < size.

DRBD as external module has been around in the kernel 2.4 days already.
We used to be compatible to 2.4 and very early 2.6 kernels,
we used to use
rv = sock_sendmsg(sock, &msg, iov.iov_len);
then later changed to
rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
when we should have used
rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);

tcp_sendmsg() used to totally ignore the size parameter.
57be5bd ip: convert tcp_sendmsg() to iov_iter primitives
changes that, and exposes our long standing error.

Even with this error exposed, to trigger the bug, we would need to have
an environment (config or otherwise) causing us to not use sendpage()
for larger transfers, a failing connection, and have it fail "just at the
right time". Apparently that was unlikely enough for most, so this went
unnoticed for years.

Still, it is known to trigger at least some of these,
and suspected for the others:
[0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
[1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
[2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
[3] https://ubuntuforums.org/showthread.php?t=2336150
[4] http://e2.howsolveproblem.com/i/1175162/

This should go into 4.9,
and into all stable branches since and including v4.0,
which is the first to contain the exposing change.

It is correct for all stable branches older than that as well
(which contain the DRBD driver; which is 2.6.33 and up).

It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
we dropped the comment block immediately preceding the kernel_sendmsg().

Fixes: b411b3637fa7 ("The DRBD driver")
Cc: <[email protected]> # 2.6.33.x-
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reported-by: Christoph Lechleitner <[email protected]>
Tested-by: Christoph Lechleitner <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
[changed oneliner to be "obvious" without context; more verbose message]
Signed-off-by: Lars Ellenberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

vxlan: hide unused local variable

A bugfix introduced a harmless warning in v4.9-rc4:

drivers/net/vxlan.c: In function 'vxlan_group_used':
drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]

This hides the variable inside of the same #ifdef that is
around its user. The extraneous initialization is removed
at the same time, it was accidentally introduced in the
same commit.

Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Start completion queue negotiation at server-provided optimum values

Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.

Signed-off-by: John Allen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: icmp_route_lookup should use rt dev to determine L3 domain

icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qcom-emac-pause'

Timur Tabi says:

====================
net: qcom/emac: ensure that pause frames are enabled

The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it. Therefore, we
need to enable flow control and force the phy to pass pause frames.
====================

Signed-off-by: David S. Miller <[email protected]>

net: qcom/emac: enable flow control if requested

If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.

Signed-off-by: Timur Tabi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qcom/emac: configure the external phy to allow pause frames

Pause frames are used to enable flow control. A MAC can send and
receive pause frames in order to throttle traffic. However, the PHY
must be configured to allow those frames to pass through.

Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Timur Tabi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ACPI / platform: Add support for build-in properties

We have a couple of drivers, acpi_apd.c and acpi_lpss.c,
that need to pass extra build-in properties to the devices
they create. Previously the drivers added those properties
to the struct device which is member of the struct
acpi_device, but that does not work. Those properties need
to be assigned to the struct device of the platform device
instead in order for them to become available to the
drivers.

To fix this, this patch changes acpi_create_platform_device
function to take struct property_entry pointer as parameter.

Fixes: 20a875e2e86e (serial: 8250_dw: Add quirk for APM X-Gene SoC)
Signed-off-by: Heikki Krogerus <[email protected]>
Tested-by: Yazen Ghannam <[email protected]>
Tested-by: Jérôme de Bretagne <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

3 more amdgpu fixes.

* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/powerplay: return false instead of -EINVAL
  drm/amdgpu/powerplay/smu7: fix unintialized data usage
  drm/amdgpu: fix crash in acp_hw_fini

Merge tag 'drm-intel-fixes-2016-11-09' of git://anongit.freedesktop.org/drm-intel into drm-fixes

i915 fixes, include Sandybridge rendering regression fix.

* tag 'drm-intel-fixes-2016-11-09' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Limit Valleyview and earlier to only using mappable scanout
  drm/i915: Round tile chunks up for constructing partial VMAs
  drm/i915/dp: Extend BDW DP audio workaround to GEN9 platforms
  drm/i915/dp: BDW cdclk fix for DP audio
  drm/i915/vlv: Prevent enabling hpd polling in late suspend
  drm/i915: Respect alternate_ddc_pin for all DDI ports

x86/cpu: Deal with broken firmware (VMWare/XEN)

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used printout purposes.

If the physical package IDs differ yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <[email protected]>,
Reported-by: M. Vefa Bicakci <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <[email protected]>

Merge tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This became a largish pull-request, as we've got a bunch of pending
  ASoC fixes at this time. One noticeable change is the removal of error
  directive in uapi/sound/asoc.h. We found that the API has been already
  used on Chromebooks, so we need to support it even now.

  A slight big LOC is found in Qualcomm lpass driver, but the rest are
  all small and easy fixes for ASoC drivers (sti, sun4i, Realtek codecs,
  Intel, tas571x, etc) in addition to the patches to harden the ALSA
  core proc file accesses"

* tag 'sound-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
  ALSA: info: Return error for invalid read/write
  ALSA: info: Limit the proc text input size
  ASoC: samsung: spdif: Fix DMA filter initialization
  ASoC: sun4i-codec: Enable bus clock after getting GPIO
  ASoC: lpass-cpu: add module licence and description
  ASoC: lpass-platform: Fix broken pcm data usage
  ASoC: sun4i-codec: return error code instead of NULL when create_card fails
  ASoC: hdmi-codec: Fix hdmi_of_xlate_dai_name when #sound-dai-cells = <0>
  ASoC: samsung: get access to DMA engine early to defer probe properly
  ASoC: da7219: Connect output enable register to DAIOUT
  ASoC: Intel: Skylake: Fix to turn off hdmi power on probe failure
  ASoC: sti-sas: enable fast io for regmap
  ASoC: sti: fix channel status update after playback start
  ASoC: PXA: Brownstone needs I2C
  ASoC: Intel: Skylake: Always acquire runtime pm ref on unload
  ASoC: Intel: Atom: add terminate entry for dmi_system_id tables
  ASoC: rt298: fix jack type detect error
  ASoC: rt5663: fix a debug statement
  ASoC: cs4270: fix DAPM stream name mismatch
  ASoC: Intel: haswell depends on sst-firmware
  ...

Merge tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs fix from Mike Marshall:
"We recently refactored the Orangefs debugfs code. The refactor seemed
  to trigger [email protected]'s static tester to find a possible
  double-free in the code.

  While designing the fix we saw a condition under which the buffer
  being freed could also be overflowed.

  We also realized how to rebuild the related debugfs file's "contents"
  (a string) without deleting and re-creating the file.

  This fix should eliminate the possible double-free, the potential
  overflow and improve code readability"

* tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: clean up debugfs

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Two bug fixes

   - a memory alignment fix in the s390 only hypfs code

   - a fix for the generic percpu code that caused ftrace to break on
     s390. This is not relevant for x86 but for all architectures that
     use the generic percpu code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  percpu: use notrace variant of preempt_disable/preempt_enable
  s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment

net: bgmac: fix reversed checks for clock control flag

This fixes regression introduced by patch adding feature flags. It was
already reported and patch followed (it got accepted) but it appears it
was incorrect. Instead of fixing reversed condition it broke a good one.

This patch was verified to actually fix SoC hanges caused by bgmac on
BCM47186B0.

Fixes: db791eb2970b ("net: ethernet: bgmac: convert to feature flags")
Fixes: 4af1474e6198 ("net: bgmac: Fix errant feature flag check")
Cc: Jon Mason <[email protected]>
Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bna: Add synchronization for tx ring.

We received two reports of BUG_ON in bnad_txcmpl_process() where
hw_consumer_index appeared to be ahead of producer_index. Out of order
write/read of these variables could explain these reports.

bnad_start_xmit(), as a producer of tx descriptors, has a few memory
barriers sprinkled around writes to producer_index and the device's
doorbell but they're not paired with anything in bnad_txcmpl_process(), a
consumer.

Since we are synchronizing with a device, we must use mandatory barriers,
not smp_*. Also, I didn't see the purpose of the last smp_mb() in
bnad_start_xmit().

Signed-off-by: Benjamin Poirier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net/mlx4_en: Fix panic during reboot"

This reverts commit 9d2afba058722d40cc02f430229c91611c0e8d16.

The original issue would possibly exist if an external module
tried calling our "ethtool_ops" without checking if it still
exists.

The right way of solving it is by simply doing the check in
the caller side.
Currently, no action is required as there's no such use case.

Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net-ipv6: on device mtu change do not add mtu to mtu-less routes

Routes can specify an mtu explicitly or inherit the mtu from
the underlying device - this inheritance is implemented in
dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().

Currently changing the mtu of a device adds mtu explicitly
to routes using that device.

ie.
  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1

After this patch the route entry no longer changes unless it already has an mtu.
There is no need: this inheritance is already done in ip6_mtu()

  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route add local 2000::2 dev lo mtu 2000
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 1501
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1
  # ip -6 route del local 2000::2

This is desirable because changing device mtu and then resetting it
to the previous value shouldn't change the user visible routing table.

Signed-off-by: Maciej Żenczykowski <[email protected]>
CC: Eric Dumazet <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sock: fix sendmmsg for partial sendmsg

Do not send the next message in sendmmsg for partial sendmsg
invocations.

sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.

For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.

Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.

Fixes: 228e548e6020 ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Maciej Żenczykowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.

When there is no existing macvlan port in lowdev, one new macvlan port
would be created. But it doesn't be destoried when something failed later.
It casues some memleak.

Now add one flag to indicate if new macvlan port is created.

Signed-off-by: Gao Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression

This patch will fix regression caused by commit 1e793f6fc0db ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").

The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).

[mkp: clarified patch description]

Fixes: 1e793f6fc0db920400574211c48f9157a37e3945
Reported-by: Jens Axboe <[email protected]>
CC: [email protected]
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Sumit Saxena <[email protected]>
Tested-by: Sumit Saxena <[email protected]>
Reviewed-by: Tomas Henzl <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems

cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.

The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.

Tested-by: Borislav Petkov <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: <[email protected]> # v4.4..
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 3849e91f571d ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

perf hists: Fix column length on --hierarchy

Markus reported that there's a weird behavior on perf top --hierarchy
regarding the column length.

Looking at the code, I found a dubious code which affects the symptoms.
When --hierarchy option is used, the last column length might be
inaccurate since it skips to update the length on leaf entries.

I cannot remember why it did and looks like a leftover from previous
version during the development.

Anyway, updating the column length often is not harmful. So let's move
the code out.

Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 1a3906a7e6b9 ("perf hists: Resort hist entries with hierarchy")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hists browser: Fix column indentation on --hierarchy

When horizontall scrolling is used in hierarchy mode, the the right most
column has unnecessary indentation. Actually it's needed only if some
of left (overhead) columns were shown.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hists browser: Show folded sign properly on --hierarchy

When horizontal scrolling is used in hierarchy mode, the folded signed
disappears at the right most column.

Committer note:

To test it, run 'perf top --hierarchy, see the '+' symbol at the first
column, then press the right arrow key, the '+' symbol will disappear,
this patch fixes that.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Move 'width -= 2' invariant to right after the if/else ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hists browser: Fix indentation of folded sign on --hierarchy

It should indent 2 spaces for folded sign and a whitespace.

Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Markus Trippelsdorf <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf hist browser: Fix hierarchy column counts

The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.

But it calculate the number of columns incorrectly when hierarchy mode
is enabled so that keep pressing RIGHT key can make the output
disappeared.

In the hierarchy mode, all sort keys are collapsed into a single column,
so it needs to be applied when calculating column numbers.

Reported-and-Tested-by: Markus Trippelsdorf <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

drm/imx: disable planes before DC

If the DC clock is disabled before the attached IDMACs are properly
stopped the IDMACs may hang the IPU or even the whole system.

Make sure the IDMACs are in safe state by disabling the planes before
removal of the DC clock.

Also set the atomic parameter to false to stop calling the atomic_begin
hook, which does nothing useful as we immediately afterwards turn off
vblank interrupts and possibly send the pending vblank event.

Fixes: 33f14235302f (drm/imx: atomic phase 1: Use transitional atomic
CRTC and plane helpers)
Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>

scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove

If a command is aborted in the kernel but not in the adapter, it might be
considered complete and its DMA memory released, but it is still alive in
the adapter, which will trigger an invalid DMA access upon its completion
(in the DMA operations to deliver the command response to the driver).

On powerpc platforms with IOMMU/EEH capabilities, the problem is observed
during PCI device removal with ongoing IO requests -- which might trigger
an EEH event very often, pointing to a 'TCE Request Page Access Error'.

In that path, which is qla2x00_remove_one(), the commands are aborted in
qla2x00_abort_all_cmds(), which does not perform an abort in the adapter
as is done in qla2xxx_eh_abort() for example.

So, this patch changes qla2x00_abort_all_cmds() to abort commands in the
adapter too, with a call to qla2xxx_eh_abort(), which already implements
all the logic to submit abort requests and handle responses.

Reported-by: Naresh Bannoth <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: qla2xxx: do not queue commands when unloading

When the driver is unloading, in qla2x00_remove_one(), there is a single
call/point in time to abort ongoing commands, qla2x00_abort_all_cmds(),
which is still several steps away from the call to scsi_remove_host().

If more commands continue to arrive and be processed during that
interval, when the driver is tearing down and releasing its structures,
it might potentially hit an oops due to invalid memory access:

    Unable to handle kernel paging request for data at address 0x00000138
    <...>
    NIP [d000000004700a40] qla2xxx_queuecommand+0x80/0x3f0 [qla2xxx]
    LR [d000000004700a10] qla2xxx_queuecommand+0x50/0x3f0 [qla2xxx]

So, fail commands in qla2xxx_queuecommand() if the UNLOADING bit is set.

Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: libcxgbi: fix incorrect DDP resource cleanup

Before calling task_release_itt() task data is memset to zero because of
which DDP context information is lost resulting in incorrect DDP
resource cleanup, to fix this call task_release_itt() before memset.

Signed-off-by: Varun Prakash <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

netfilter: nf_tables: fix oops when inserting an element into a verdict map

Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
   map sourcemap {
     type ipv4_addr : verdict;
   }

   chain postrouting {
     ip saddr vmap @sourcemap accept
   }
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----

results in a kernel oops:
BUG: unable to handle kernel paging request at 0000000000001344
IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
[...]
Call Trace:
  [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
  [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
  [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
  [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
  [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
  [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
  [<ffffffff8132c400>] ? nla_validate+0x60/0x80
  [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

Because we forget to fill the net pointer in bind_ctx, so dereferencing
it may cause kernel crash.

Reported-by: Dalegaard <[email protected]>
Signed-off-by: Liping Zhang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: refine gc worker heuristics

Nicolas Dichtel says:
  After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
  remove timed-out entries"), netlink conntrack deletion events may be
  sent with a huge delay.

Nicolas further points at this line:

  goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);

and indeed, this isn't optimal at all.  Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge.  But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.

We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).

So, after some discussion with Nicolas:

1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)

2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries

3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead

4. Want reasonable short time where we detect timed-out entry when
system went idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.

The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.

Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').

Reported-by: Nicolas Dichtel <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: fix CT target for UNSPEC helpers

Thomas reports its not possible to attach the H.245 helper:

iptables -t raw -A PREROUTING -p udp -j CT --helper H.245
iptables: No chain/target/match by that name.
xt_CT: No such helper "H.245"

This is because H.245 registers as NFPROTO_UNSPEC, but the CT target
passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get.

We should treat UNSPEC as wildcard and ignore the l3num instead.

Reported-by: Thomas Woerner <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: connmark: ignore skbs with magic untracked conntrack objects

The (percpu) untracked conntrack entries can end up with nonzero connmarks.

The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).

Untracked conntrack objects are implementation detail, we might as well use
invalid magic address instead to tell INVALID and UNTRACKED apart.

Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.

Reported-by: XU Tianwen <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr

family.maxattr is the max index for policy[], the size of
ops[] is determined with ARRAY_SIZE().

Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

PCI: Don't attempt to claim shadow copies of ROM

If we're using a shadow copy of a PCI device ROM, the shadow copy is in RAM
and the device never sees accesses to it and doesn't respond to it.  We
don't have to route the shadow range to the PCI device, and the device
doesn't have to claim the range.

Previously we treated the shadow copy as though it were the ROM BAR, and we
failed to claim it because the region wasn't routed to the device:

  pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
  pci_bus 0000:01: Allocating resources
  pci 0000:01:00.0: can't claim BAR 6 [mem 0x000c0000-0x000dffff]: no compatible bridge window

The failure path of pcibios_allocate_dev_rom_resource() cleared out the
resource start address, which also caused the following ioremap() warning:

  WARNING: CPU: 0 PID: 116 at /build/linux-akdJXO/linux-4.8.0/arch/x86/mm/ioremap.c:121 __ioremap_caller+0x1ec/0x370
  ioremap on RAM at 0x0000000000000000 - 0x000000000001ffff

Handle an option ROM shadow copy as RAM, without trying to insert it into
the iomem resource tree.

This fixes a regression caused by 0c0e0736acad ("PCI: Set ROM shadow
location in arch code, not in PCI core"), which appeared in v4.6.  The
regression causes video device initialization to fail.  This was reported
on AMD Turks, but it likely affects others as well.

Fixes: 0c0e0736acad ("PCI: Set ROM shadow location in arch code, not in PCI core")
Reported-and-tested-by: Vecu Bosseur <[email protected]>
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627496
Link: https://bugzilla.kernel.org/show_bug.cgi?id=175391
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1352272
Signed-off-by: Bjorn Helgaas <[email protected]>
CC: [email protected] # v4.6+

ARCv2: MCIP: Use IDU_M_DISTRI_DEST mode if there is only 1 destination core

ARC linux uses 2 distribution modes for common interrupts: round robin
mode (IDU_M_DISTRI_RR) and a simple destination mode (IDU_M_DISTRI_DEST).
The first one is used when more than 1 cores may handle a common interrupt
and the second one is used when only 1 core may handle a common interrupt.

However idu_irq_set_affinity() always sets IDU_M_DISTRI_RR for all affinity
values. But there is no sense in setting of such mode if only 1 core must
handle a common interrupt.

Signed-off-by: Yuriy Kolerov <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

ARC: IRQ: Do not use hwirq as virq and vice versa

This came up when reviewing code to address missing IRQ affinity
setting in AXS103 platform and/or implementing hierarchical IRQ domains

- smp_ipi_irq_setup() callers pass hwirq but in turn calls
  request_percpu_irq() which expects a linux virq. So invoke
  irq_find_mapping() to do the conversion
  (also explicitify this in code by renaming the args appropriately)

- idu_of_init()/idu_cascade_isr() were similarly using linux virq where
  hwirq is expected, so do the conversion using irqd_to_hwirq() helper

Signed-off-by: Yuriy Kolerov <[email protected]>
[vgupta: made changelog a bit concise a bit]
Signed-off-by: Vineet Gupta <[email protected]>

Merge tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

- Four patches from Robin Murphy fix several issues with the recently
   merged generic DT-bindings support for arm-smmu drivers

- A fix for a dead-lock issue in the VT-d driver, which shows up on
   iommu hotplug

* tag 'iommu-fixes-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
  iommu/arm-smmu: Fix out-of-bounds dereference
  iommu/arm-smmu: Check that iommu_fwspecs are ours
  iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s
  iommu/arm-smmu: Work around ARM DMA configuration

ARC: [plat-eznps] set default baud for early console

For CONFIG_SERIAL_EARLYCON we need 800MHz for NPS SoC
The early console driver uses BASE_BAUD and not using dtb.

The default of 50MHz is NOT good for NPS SoC.

Signed-off-by: Noam Camus <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

ARC: [plat-eznps] remove IPI clear from SMP operations

Today we register to plat_smp_ops.clear() method which actually
is acking the IPI.
However this is already taking care by our irqchip driver specifically
by the irq_chip.irq_eoi() method.
This is perfect timing where it should be done and no special handling
is needed at plat_smp_ops.clear().

Signed-off-by: Noam Camus <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

Revert "ARC: build: retire old toggles"

This has caused a bunch of build failures at a few sites, with GNU
2015.12 and older as the assembler seems to need -mlock to be able to
grok llock/scond instructions for ARC700 builds.
different places since the
older tools still seem to release
of tools which most people are using seem to trip with the -mlock flag
not being passed.

This reverts commit c3005475889c7c730638f95d13be3360f0b33e98.

drm/amd/powerplay: return false instead of -EINVAL

Returning -EINVAL from a bool-returning function
phm_check_smc_update_required_for_display_configuration has an unexpected
effect of returning true, which is probably not what was intended.
Replace -EINVAL by false.

The only place this function is called from is
psm_adjust_power_state_dynamic in
drivers/gpu/drm/amd/powerplay/eventmgr/psm.c:106:

if (!equal || phm_check_smc_update_required_for_display_configuration(hwmgr)) {
phm_apply_state_adjust_rules(hwmgr, requested, pcurrent);
phm_set_power_state(hwmgr, &pcurrent->hardware, &requested->hardware);
hwmgr->current_ps = requested;
}

It seems to expect a boolean value here.

This issue has been found using the following Coccinelle semantic patch
written by Peter Senna Tschudin:
<smpl>
@@
identifier f;
constant C;
typedef bool;
@@
bool f (...){
<+...
* return -C;
...+>
}
</smpl>

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Andrew Shadura <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/powerplay/smu7: fix unintialized data usage

A recent bugfix replaced an out-of-bounds access with direct
use of unintialized data:

drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c: In function 'smu7_patch_limits_vddc':
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:2033:6: error: 'vddc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:2146:11: note: 'vddc' was declared here
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:2033:6: error: 'vddci' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:2146:17: note: 'vddci' was declared here
uint32_t vddc, vddci;

This initializes the data as before using the correct type.

Fixes: 77f7f71f5be1 ("drm/amdgpu/powerplay/smu7: fix static checker warning")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

genirq: Use irq type from irqdata instead of irqdesc

The type flags in the irq descriptor are there for historical reasons and
only updated via irq_modify_status() or irq_set_type(). Both functions also
update the type flags in irqdata. __setup_irq() is the only left over user
of the type flags in the irq descriptor.

If __setup_irq() is called with empty irq type flags, then the type flags
are retrieved from irqdata. If an interrupt is shared, then the type flags
are compared with the type flags stored in the irq descriptor.

On x86 the ioapic does not have a irq_set_type() callback because the type
is defined in the BIOS tables and cannot be changed. The type is stored in
irqdata at setup time without updating the type data in the irq
descriptor. As a result the comparison described above fails.

There is no point in updating the irq descriptor flags because the only
relevant storage is irqdata. Use the type flags from irqdata for both
retrieval and comparison in __setup_irq() instead.

Aside of that the print out in case of non matching type flags has the old
and new type flags arguments flipped. Fix that as well.

For correctness sake the flags stored in the irq descriptor should be
removed, but this is beyond the scope of this bugfix and will be done in a
later patch.

Fixes: 4b357daed698 ("genirq: Look-up trigger type if not specified by caller")
Reported-and-tested-by: Mika Westerberg <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Jon Hunter <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611072020360.3501@nanos
Signed-off-by: Thomas Gleixner <[email protected]>

iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path

It turns out that the disable_dmar_iommu() code-path tried
to get the device_domain_lock recursivly, which will
dead-lock when this code runs on dmar removal. Fix both
code-paths that could lead to the dead-lock.

Fixes: 55d940430ab9 ('iommu/vt-d: Get rid of domain->iommu_lock')
Signed-off-by: Joerg Roedel <[email protected]>

iommu/arm-smmu: Fix out-of-bounds dereference

When we iterate a master's config entries, what we generally care
about is the entry's stream map index, rather than the entry index
itself, so it's nice to have the iterator automatically assign the
former from the latter. Unfortunately, booting with KASAN reveals
the oversight that using a simple comma operator results in the
entry index being dereferenced before being checked for validity,
so we always access one element past the end of the fwspec array.

Flip things around so that the check always happens before the index
may be dereferenced.

Fixes: adfec2e709d2 ("iommu/arm-smmu: Convert to iommu_fwspec")
Reported-by: Mark Rutland <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/arm-smmu: Check that iommu_fwspecs are ours

We seem to have forgotten to check that iommu_fwspecs actually belong to
us before we go ahead and dereference their private data. Oops.

Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration support")
Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/arm-smmu: Don't inadvertently reject multiple SMMUv3s

We now delay installing our per-bus iommu_ops until we know an SMMU has
successfully probed, as they don't serve much purpose beforehand, and
doing so also avoids fights between multiple IOMMU drivers in a single
kernel. However, the upshot of passing the return value of bus_set_iommu()
back from our probe function is that if there happens to be more than
one SMMUv3 device in a system, the second and subsequent probes will
wind up returning -EBUSY to the driver core and getting torn down again.

Avoid re-setting ops if ours are already installed, so that any genuine
failures stand out.

Fixes: 08d4ca2a672b ("iommu/arm-smmu: Support non-PCI devices with SMMUv3")
CC: Lorenzo Pieralisi <[email protected]>
CC: Hanjun Guo <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/arm-smmu: Work around ARM DMA configuration

The 32-bit ARM DMA configuration code predates the IOMMU core's default
domain functionality, and instead relies on allocating its own domains
and attaching any devices using the generic IOMMU binding to them.
Unfortunately, it does this relatively early on in the creation of the
device, before we've seen our add_device callback, which leads us to
attempt to operate on a half-configured master.

To avoid a crash, check for this situation on attach, but refuse to
play, as there's nothing we can do. This at least allows VFIO to keep
working for people who update their 32-bit DTs to the generic binding,
albeit with a few (innocuous) warnings from the DMA layer on boot.

Signed-off-by: Robin Murphy <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

ALSA: info: Return error for invalid read/write

Currently the ALSA proc handler allows read or write even if the proc
file were write-only or read-only. It's mostly harmless, does thing
but allocating memory and ignores the input/output. But it doesn't
tell user about the invalid use, and it's confusing and inconsistent
in comparison with other proc files.

This patch adds some sanity checks and let the proc handler returning
an -EIO error when the invalid read/write is performed.

Cc: <[email protected]> # v4.2+
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: info: Limit the proc text input size

The ALSA proc handler allows currently the write in the unlimited size
until kmalloc() fails. But basically the write is supposed to be only
for small inputs, mostly for one line inputs, and we don't have to
handle too large sizes at all. Since the kmalloc error results in the
kernel warning, it's better to limit the size beforehand.

This patch adds the limit of 16kB, which must be large enough for the
currently existing code.

Cc: [email protected] # v4.2+
Signed-off-by: Takashi Iwai <[email protected]>

percpu: use notrace variant of preempt_disable/preempt_enable

Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like
events do") added a couple of this_cpu_read calls to the ftrace code.

On x86 this is not a problem, since it has single instructions to read
percpu data. Other architectures which use the generic variant now
have additional preempt_disable and preempt_enable calls in the core
ftrace code. This may lead to recursive calls and in result to a dead
machine, e.g. if preemption and debugging options are enabled.

To fix this use the notrace variant of preempt_disable and
preempt_enable within the generic percpu code.

Reported-and-bisected-by: Sebastian Ott <[email protected]>
Tested-by: Sebastian Ott <[email protected]>
Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
Signed-off-by: Heiko Carstens <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

fib_trie: Correct /proc/net/route off by one error

The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.

In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key. In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid. With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.

Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft <[email protected]>
Reported-by: Jason Baron <[email protected]>
Tested-by: Jason Baron <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Documentation: networking: dsa: Update tagging protocols

Add Qualcomm QCA tagging introduced in cafdc45c9 to the
list of supported protocols.

Signed-off-by: Fabian Mewes <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

virtio-net: drop legacy features in virtio 1 mode

Virtio 1.0 spec says VIRTIO_F_ANY_LAYOUT and VIRTIO_NET_F_GSO are
legacy-only feature bits. Do not negotiate them in virtio 1 mode. Note
this is a spec violation so we need to backport it to stable/downstream
kernels.

Cc: [email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: icmp6_send should use dst dev to determine L3 domain

icmp6_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have a dst set.
Update icmp6_send to use the dst on the skb to determine L3 domain.

Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()

Fix the following warn:

fs/nfs/nfs4session.c: In function ‘nfs4_slot_seqid_in_use’:
fs/nfs/nfs4session.c:203:54: warning: ‘cur_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (nfs4_slot_get_seqid(tbl, slotid, &cur_seq) == 0 &&
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
      cur_seq == seq_nr && test_bit(slotid, tbl->used_slots))
      ~~~~~~~~~~~~~~~~~

Signed-off-by: Shuah Khan <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

NFS: Don't print a pNFS error if we aren't using pNFS

We used to check for a valid layout type id before verifying pNFS flags
as an indicator for if we are using pNFS.  This changed in 3132e49ece
with the introduction of multiple layout types, since now we are passing
an array of ids instead of just one.  Since then, users have been seeing
a KERN_ERR printk show up whenever mounting NFS v4 without pNFS.  This
patch restores the original behavior of exiting set_pnfs_layoutdriver()
early if we aren't using pNFS.

Fixes 3132e49ece ("pnfs: track multiple layout types in fsinfo
structure")
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>