Git Repo - linux.git/log

Merge tag 'drm/tegra/for-4.16-rc7-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-fixes

drm/tegra: Fixes for v4.16-rc7

This contains two small fixes for the alpha blending support that was
merged into v4.16-rc1 and a fix for connector reference leaks caused by
the fact that display pipelines are no longer automatically disabled if
the framebuffer is removed.

Furthermore this contains a fix for a crash on IOMMU detach at driver
unbind time and a regulator enable/disable unbalance fix.

* tag 'drm/tegra/for-4.16-rc7-fixes' of git://anongit.freedesktop.org/tegra/linux:
  drm/tegra: Shutdown on driver unbind
  drm/tegra: dsi: Don't disable regulator on ->exit()
  drm/tegra: dc: Detach IOMMU group from domain only once
  drm/tegra: plane: Correct legacy blending
  drm/tegra: plane: Fix RGB565 format on older Tegra

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A late collection of fixes for regressions seen this release cycle.
  Normally I send this earlier than now but real life got in the way.
  Things are back to normal now.

  There's the normal set of SoC driver fixes: i.MX boot warning, TI
  display clks, allwinner clk ops being wrong (fun), driver probe
  badness on error paths, correctness fix for the new aspeed driver, and
  even a fix for a race condition in the bcm2835 clk driver.

  At the core framework level we also got some fixes for the clk phase
  API caching at the wrong time, better handling of the enabled state of
  orphan clks, and a fix for a newly introduced bug in how we handle
  rate calculations for pass-through clks"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: bcm2835: Protect sections updating shared registers
  clk: bcm2835: Fix ana->maskX definitions
  clk: aspeed: Prevent reset if clock is enabled
  clk: aspeed: Fix is_enabled for certain clocks
  clk: qcom: msm8916: Fix return value check in qcom_apcs_msm8916_clk_probe()
  clk: hisilicon: hi3660：Fix potential NULL dereference in hi3660_stub_clk_probe()
  clk: fix determine rate error with pass-through clock
  clk: migrate the count of orphaned clocks at init
  clk: update cached phase to respect the fact when setting phase
  clk: ti: am43xx: add set-rate-parent support for display clkctrl clock
  clk: ti: am33xx: add set-rate-parent support for display clkctrl clock
  clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
  clk: imx51-imx53: Fix UART4/5 registration on i.MX50 and i.MX53
  clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops

kbuild: disable clang's default use of -fmerge-all-constants

Prasad reported that he has seen crashes in BPF subsystem with netd
on Android with arm64 in the form of (note, the taint is unrelated):

  [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001
  [ 4134.820925] Mem abort info:
  [ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
  [ 4135.016736]   SET = 0, FnV = 0
  [ 4135.119820]   EA = 0, S1PTW = 0
  [ 4135.201431] Data abort info:
  [ 4135.301388]   ISV = 0, ISS = 0x00000021
  [ 4135.359599]   CM = 0, WnR = 0
  [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000
  [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000
  [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP
  [ 4135.674610] Modules linked in:
  [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S      W       4.14.19+ #1
  [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000
  [ 4135.731599] PC is at bpf_prog_add+0x20/0x68
  [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
  [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145
  [ 4135.769062] sp : ffffff801d4e3ce0
  [...]
  [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000)
  [ 4136.273746] Call trace:
  [...]
  [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006
  [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584
  [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68
  [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c
  [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c
  [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88

Android's netd was basically pinning the uid cookie BPF map in BPF
fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
again resulting in above panic. Issue is that the map was wrongly
identified as a prog! Above kernel was compiled with clang 4.0,
and it turns out that clang decided to merge the bpf_prog_iops and
bpf_map_iops into a single memory location, such that the two i_ops
could then not be distinguished anymore.
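
The failure mode can be illustrated with a minimal userspace sketch.
The struct and function names below (struct ops, prog_ops, map_ops,
classify) are hypothetical stand-ins rather than the kernel's own code;
the only assumption carried over from the report is that two constant
objects with identical contents are told apart purely by their address:

```c
#include <assert.h>

/* Hypothetical stand-in for struct inode_operations; not kernel code. */
struct ops {
	int unused;
};

/* Two distinct constant objects with identical (all-zero) contents. */
static const struct ops prog_ops = { 0 };
static const struct ops map_ops = { 0 };

enum obj_kind { KIND_UNKNOWN, KIND_PROG, KIND_MAP };

/* Identify the object type purely by the address of its ops table. */
enum obj_kind classify(const struct ops *ops)
{
	if (ops == &prog_ops)
		return KIND_PROG;
	if (ops == &map_ops)
		return KIND_MAP;
	return KIND_UNKNOWN;
}

/*
 * Holds whenever the two objects have distinct addresses, as C requires.
 * If a compiler merged them into one location, a map would be reported
 * as a prog, because the first comparison would already match.
 */
void check_distinct_ops(void)
{
	assert(classify(&prog_ops) == KIND_PROG);
	assert(classify(&map_ops) == KIND_MAP);
}
```

Because the prog comparison comes first, merging the two objects would
make every map look like a prog, which matches the symptom described
above.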

Reason for this miscompilation is that clang has the more aggressive
-fmerge-all-constants enabled by default. In fact, clang source code
has a comment about it in lib/AST/ExprConstant.cpp on why it is okay
to do so:

  Pointers with different bases cannot represent the same object.
  (Note that clang defaults to -fmerge-all-constants, which can
  lead to inconsistent results for comparisons involving the address
  of a constant; this generally doesn't matter in practice.)

The issue never appeared with gcc however, since gcc does not enable
-fmerge-all-constants by default and even *explicitly* states in
its option description that using this flag results in non-conforming
behavior, quote from man gcc:

  Languages like C or C++ require each variable, including multiple
  instances of the same variable in recursive calls, to have distinct
  locations, so using this option results in non-conforming behavior.

There are also various clang bug reports open on that matter [1],
where clang developers acknowledge the non-conforming behavior,
and refer to disabling it with -fno-merge-all-constants. But even
if this gets fixed in clang today, there are already users out there
that triggered this. Thus, fix this issue by explicitly adding
-fno-merge-all-constants to the kernel's Makefile to generically
disable this optimization, since potentially other places in the
kernel could subtly break as well.

Note, there is also a flag called -fmerge-constants (not supported
by clang), which is more conservative and only applies to strings
and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In
gcc's code, the two flags -fmerge-{all-,}constants share the same
variable internally, so when disabling it via -fno-merge-all-constants,
then we really don't merge any const data (e.g. strings), and text
size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o).

  $ gcc -fverbose-asm -O2 foo.c -S -o foo.S
    -> foo.S lists -fmerge-constants under options enabled
  $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S
    -> foo.S doesn't list -fmerge-constants under options enabled
  $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S
    -> foo.S lists -fmerge-constants under options enabled

Thus, as a workaround we need to set both -fno-merge-all-constants
*and* -fmerge-constants in the Makefile in order for text size to
stay as is.

  [1] https://bugs.llvm.org/show_bug.cgi?id=18538

Reported-by: Prasad Sodagudi <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Chenbo Feng <[email protected]>
Cc: Richard Smith <[email protected]>
Cc: Chandler Carruth <[email protected]>
Cc: [email protected]
Tested-by: Prasad Sodagudi <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Not much exciting here, almost entirely syzkaller fixes.

  This is going to be an ongoing theme for some time, I think. Both
  Google and Mellanox are now running syzkaller on different parts of
  the user API.

  Summary:

   - Many bug fixes related to syzkaller from Leon Romanovsky. These are
     still for the mlx driver and ucma interface.

   - Fix a situation with port reuse for iWarp, discovered during
     scale-up testing

   - Bug fixes for the profile and restrack patches accepted during this
     merge window

   - Compile warning cleanups from Arnd, this is apparently the last
     warning to make 32 bit builds quiet"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/ucma: Ensure that CM_ID exists prior to access it
  RDMA/verbs: Remove restrack entry from XRCD structure
  RDMA/ucma: Fix use-after-free access in ucma_close
  RDMA/ucma: Check AF family prior resolving address
  infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
  infiniband: qplib_fp: fix pointer cast
  IB/mlx5: Fix cleanup order on unload
  RDMA/ucma: Don't allow join attempts for unsupported AF family
  RDMA/ucma: Fix access to non-initialized CM_ID object
  RDMA/core: Do not use invalid destination in determining port reuse
  RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
  IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
  IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:

- one driver patch (qla2xxx) which fixes a problem caused by an
   existing regression fix (FCP discovery is failing)

- one generic fix to a longstanding bug in libsas that causes I/O to
   the device to eventually hang in the face of ATA error recovery.

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Remove FC_NO_LOOP_ID for FCP and FC-NVMe Discovery
  scsi: libsas: defer ata device eh commands to libata

Merge tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix from Bruce Fields:
"Just one fix for an occasional panic from Jeff Layton"

* tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove blocked locks on client teardown

bpf: skip unnecessary capability check

The current check statement in the BPF syscall does a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path triggers unnecessary security hooks on capability checking
and causes false alarms about an unprivileged process trying to get
CAP_SYS_ADMIN access. This can be resolved by simply switching the order
of the two checks, since CAP_SYS_ADMIN is not required anyway if the
unprivileged bpf syscall is allowed.
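
The effect of the reordering can be sketched in userspace C. The
function capable_sys_admin() and the counter capable_calls are
hypothetical stand-ins for the kernel's capability check and the
security hook it triggers; only the order of the two operands reflects
the change described above:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel symbols named above. */
int sysctl_unprivileged_bpf_disabled;
int capable_calls;	/* counts how often the capability hook fires */

static bool capable_sys_admin(void)
{
	capable_calls++;	/* models the security hook being triggered */
	return false;		/* the caller is unprivileged */
}

/* Old order: the capability is checked first, even when it is not needed. */
int bpf_check_old(void)
{
	if (!capable_sys_admin() && sysctl_unprivileged_bpf_disabled)
		return -EPERM;
	return 0;
}

/* New order: short-circuit evaluation skips the capability check. */
int bpf_check_new(void)
{
	if (sysctl_unprivileged_bpf_disabled && !capable_sys_admin())
		return -EPERM;
	return 0;
}

/* With unprivileged BPF allowed, only the old order fires the hook. */
void demo_check_order(void)
{
	sysctl_unprivileged_bpf_disabled = 0;

	capable_calls = 0;
	assert(bpf_check_old() == 0 && capable_calls == 1);

	capable_calls = 0;
	assert(bpf_check_new() == 0 && capable_calls == 0);
}
```

Both orders return the same result in every case; the difference is
only whether the capability check runs when its answer is irrelevant.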

Signed-off-by: Chenbo Feng <[email protected]>
Acked-by: Lorenzo Colitti <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf, doc: add description wrt native/bpf clang target and pointer size

As this recently came up on netdev [0], let's add it to the BPF devel doc.

[0] https://www.spinics.net/lists/netdev/msg489612.html

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programs

Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value")
added helper bpf_perf_prog_read_value so that perf_event type program
can read event counter and enabled/running time.
This commit, however, introduced a bug which allows this helper
for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value
needs to access perf_event through its bpf_perf_event_data_kern type context,
which is not available for tracepoint type program.

This patch fixes the issue by separating bpf_func_proto between tracepoint
and perf_event type programs and removing bpf_perf_prog_read_value
from the tracepoint func prototype.

Fixes: 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value")
Reported-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches

Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when
CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected in that
case.

However, it does not fail on other architectures, which have a different
JIT compiler design. So, test_bpf has started to fail to load on those.

After this fix, test_bpf loads fine on both x86_64 and ppc64el.

Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Reviewed-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

kvm/x86: fix icebp instruction handling

The undocumented 'icebp' instruction (aka 'int1') works pretty much like
'int3' in the absence of in-circuit probing equipment (except,
obviously, that it raises #DB instead of raising #BP), and is used by
some validation test-suites as such.

But Andy Lutomirski noticed that his test suite acted differently in kvm
than on bare hardware.

The reason is that kvm used an inexact test for the icebp instruction:
it just assumed that an all-zero VM exit qualification value meant that
the VM exit was due to icebp.

That is not unlike the guess that do_debug() does for the actual
exception handling case, but it's purely a heuristic, not an absolute
rule. do_debug() does it because it wants to ascribe _some_ reasons to
the #DB that happened, and an empty %dr6 value means that 'icebp' is the
most likely cause and we have no better information.

But kvm can just do it right, because unlike the do_debug() case, kvm
actually sees the real reason for the #DB in the VM-exit interruption
information field.

So instead of relying on an inexact heuristic, just use the actual VM
exit information that says "it was 'icebp'".
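
A minimal sketch of the difference between the two tests follows. The
bit layout (interruption type in bits 10:8, valid bit 31, type 5 for a
privileged software exception) is taken from the Intel SDM; the
function names and the example values are illustrative assumptions,
not a copy of the kvm code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VM-exit interruption information layout, per the Intel SDM. */
#define INTR_INFO_INTR_TYPE_MASK	0x00000700u	/* bits 10:8 */
#define INTR_INFO_VALID_MASK		0x80000000u	/* bit 31 */
#define INTR_TYPE_HARD_EXCEPTION	(3u << 8)
#define INTR_TYPE_PRIV_SW_EXCEPTION	(5u << 8)	/* raised by icebp */
#define DB_VECTOR			1u

/* Exact test: the exit was caused by a privileged software exception. */
bool is_icebp(uint32_t intr_info)
{
	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
		== (INTR_TYPE_PRIV_SW_EXCEPTION | INTR_INFO_VALID_MASK);
}

/* Inexact heuristic: an all-zero exit qualification is assumed to be icebp. */
bool looks_like_icebp(uint64_t exit_qualification)
{
	return exit_qualification == 0;
}

/* Hypothetical exit: a hardware #DB that happens to carry no qualification. */
void demo_icebp(void)
{
	uint32_t hw_db = INTR_INFO_VALID_MASK | INTR_TYPE_HARD_EXCEPTION |
			 DB_VECTOR;

	assert(looks_like_icebp(0));	/* the heuristic guesses icebp */
	assert(!is_icebp(hw_db));	/* the exact test does not */
}
```

The exact test depends only on what the hardware reported, so it stays
correct for any #DB whose exit qualification happens to be zero.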

Right now the 'icebp' instruction isn't technically documented by Intel,
but that will hopefully change. The special "privileged software
exception" information _is_ actually mentioned in the Intel SDM, even
though the cause of it isn't enumerated.

Reported-by: Andy Lutomirski <[email protected]>
Tested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

RDMA/ucma: Ensure that CM_ID exists prior to access it

Before UCMA commands are accessed, the context should be initialized
and connected to a CM_ID with ucma_create_id(). If the user skips
this step, they can provide an invalid ctx without a CM_ID and cause
multiple NULL dereferences.

There are also situations where create_id can race with other user
accesses; ensure that the context is only shared with other threads
once it is fully initialized, to avoid the races.
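
The idea of rejecting a context that has no CM_ID yet can be sketched
as follows. The structures, the table and the ucma_lookup() helper are
simplified, hypothetical stand-ins and do not reproduce the ucma code;
the chosen error codes are assumptions as well:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ucma structures. */
struct cm_id {
	int port;
};

struct ucma_context {
	int id;
	struct cm_id *cm_id;	/* NULL until creation has fully completed */
};

#define MAX_CTX 4
struct ucma_context *ctx_table[MAX_CTX];

/* Look up a context; refuse ones that are missing or not yet initialized. */
int ucma_lookup(int id, struct ucma_context **out)
{
	struct ucma_context *ctx;

	if (id < 0 || id >= MAX_CTX)
		return -ENOENT;
	ctx = ctx_table[id];
	if (!ctx)
		return -ENOENT;
	if (!ctx->cm_id)
		return -EINVAL;	/* no CM_ID: do not hand it to a command */
	*out = ctx;
	return 0;
}

/* A half-initialized context is rejected until its CM_ID is attached. */
void demo_ucma_lookup(void)
{
	struct cm_id id = { 1 };
	struct ucma_context ctx = { 0, NULL };
	struct ucma_context *found = NULL;

	ctx_table[0] = &ctx;
	assert(ucma_lookup(0, &found) == -EINVAL);

	ctx.cm_id = &id;
	assert(ucma_lookup(0, &found) == 0 && found == &ctx);

	ctx_table[0] = NULL;
}
```

Checking at lookup time means every command handler that obtains its
context through the lookup is covered without needing its own NULL
check.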

[  109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  109.090315] IP: ucma_connect+0x138/0x1d0
[  109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0
[  109.095384] Oops: 0000 [#1] SMP KASAN PTI
[  109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G    B 4.16.0-rc1-00062-g2975d5de6428 #45
[  109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[  109.105943] RIP: 0010:ucma_connect+0x138/0x1d0
[  109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246
[  109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2
[  109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297
[  109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb
[  109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000
[  109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118
[  109.126221] FS:  00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[  109.129468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0
[  109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  109.142057] Call Trace:
[  109.144160]  ? ucma_listen+0x110/0x110
[  109.146386]  ? wake_up_q+0x59/0x90
[  109.148853]  ? futex_wake+0x10b/0x2a0
[  109.151297]  ? save_stack+0x89/0xb0
[  109.153489]  ? _copy_from_user+0x5e/0x90
[  109.155500]  ucma_write+0x174/0x1f0
[  109.157933]  ? ucma_resolve_route+0xf0/0xf0
[  109.160389]  ? __mod_node_page_state+0x1d/0x80
[  109.162706]  __vfs_write+0xc4/0x350
[  109.164911]  ? kernel_read+0xa0/0xa0
[  109.167121]  ? path_openat+0x1b10/0x1b10
[  109.169355]  ? fsnotify+0x899/0x8f0
[  109.171567]  ? fsnotify_unmount_inodes+0x170/0x170
[  109.174145]  ? __fget+0xa8/0xf0
[  109.177110]  vfs_write+0xf7/0x280
[  109.179532]  SyS_write+0xa1/0x120
[  109.181885]  ? SyS_read+0x120/0x120
[  109.184482]  ? compat_start_thread+0x60/0x60
[  109.187124]  ? SyS_read+0x120/0x120
[  109.189548]  do_syscall_64+0xeb/0x250
[  109.192178]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[  109.194725] RIP: 0033:0x7fabb61ebe99
[  109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99
[  109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004
[  109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000
[  109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0
[  109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0
[  109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f
b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff
31 c0 <66> 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7
[  109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80
[  109.226256] CR2: 0000000000000020

Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

ipv6: old_dport should be a __be16 in __ip6_datagram_connect()

Fixes: 2f987a76a977 ("net: ipv6: keep sk status consistent after datagram connect failure")
Signed-off-by: Stefano Brivio <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Acked-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-fixes-for-4.16-20180319' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2018-03-19

this is a pull request of one patch for net/master.

The patch is by Andri Yngvason and fixes a potential use-after-free bug
in the cc770 driver introduced in the previous pull-request.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-mv88e6xxx-some-fixes'

Uwe Kleine-König says:

====================
net: dsa: mv88e6xxx: some fixes

these patches target net-next and got approved by Andrew Lunn.

Compared to (implicit) v1, I dropped the patch whose correctness I could
not verify because I lack the documentation. Andrew already took care of
that in a patch that is now adfccf118211 in net-next.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix interrupt name for g2 irq

This changes the respective line in /proc/interrupts from

49: x x mv88e6xxx-g1 7 Edge mv88e6xxx-g1

to

49: x x mv88e6xxx-g1 7 Edge mv88e6xxx-g2

which makes more sense.

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix typo in a comment

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix name of switch 88E6141

The switch name is emitted in the kernel log, so having the right name
there is nice.

Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6141")
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-Adapt-driver-to-upcoming-firmware-versions'

Ido Schimmel says:

====================
mlxsw: Adapt driver to upcoming firmware versions

The first two patches make sure that reserved fields are set to zero, as
required by the device's programmer's reference manual (PRM).

Last two patches prevent the driver from performing an invalid operation
that is going to be denied by upcoming firmware versions.
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Do not invalidate already invalid ACL groups

When a new ACL group is created its region (ACL) list is initially
empty. Thus, the call to mlxsw_sp_acl_tcam_group_update() would
basically invalidate an already invalid (non-existent) group.

Remove the unnecessary call and make the function symmetric to its del()
counterpart.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Adapt ACL configuration to new firmware versions

The driver currently creates empty ACL groups, binds them to the
requested port and then fills them with actual ACLs that point to TCAM
regions.

However, empty ACL groups are considered invalid and upcoming firmware
versions are going to forbid their binding.

Work around this limitation by only performing the binding after the
first ACL was added to the group.
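
The deferred binding can be modelled with a small sketch. The struct
and helpers below are hypothetical and much simpler than the mlxsw
driver; they only show the ordering, namely that a bind request is
remembered and carried out once the group holds its first ACL:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an ACL group; not the mlxsw structures. */
struct acl_group {
	int acl_count;		/* ACLs currently in the group */
	bool bind_requested;	/* a port asked for this group */
	bool bound;		/* the binding was actually performed */
};

/* Remember the bind request, but only bind a non-empty (valid) group. */
void group_bind(struct acl_group *g)
{
	g->bind_requested = true;
	if (g->acl_count > 0)
		g->bound = true;
}

/* The first ACL makes the group valid, so a pending bind can proceed. */
void group_add_acl(struct acl_group *g)
{
	g->acl_count++;
	if (g->bind_requested && !g->bound)
		g->bound = true;
}

/* An empty group is never bound; the bind happens with the first ACL. */
void demo_deferred_bind(void)
{
	struct acl_group g = { 0, false, false };

	group_bind(&g);
	assert(!g.bound);

	group_add_acl(&g);
	assert(g.bound);
}
```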

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: Reserved field in mbox profile shouldn't be set

There is no need to set some of the fields within 'mbox_config_profile',
since they are reserved and capability mask should be set to zero.

Signed-off-by: Tal Bar <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: pci: Set mbox dma addresses to zero when not used

Some of the opcodes don't use in, out or both mboxes. In such cases, the
mbox address is a reserved field and FW expects it to be zero.

Signed-off-by: Shalom Toledo <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: gemini: fix memory leak

cppcheck report:
[drivers/net/ethernet/cortina/gemini.c:543]: (error) Memory leak: skb_tab

Signed-off-by: Igor Pylypiv <[email protected]>
Acked-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: arc: Fix a potential memory leak if an optional regulator is deferred

If the optional regulator is deferred, we must release some resources.
They will be re-allocated when the probe function is called again.

Fixes: 6eacf31139bf ("ethernet: arc: Add support for Rockchip SoC layer device tree bindings")
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

devlink: Remove redundant free on error path

The current code performs an unneeded free. Remove the redundant skb
freeing on the error path.

Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Arkadi Sharshevsky <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vmxnet3: remove unused flag "rxcsum" from struct vmxnet3_adapter

Signed-off-by: Igor Pylypiv <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx5: Remove call to ida_pre_get

The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
reason. The driver uses ida_simple_get() which will call ida_pre_get()
by itself and there's no need to use ida_pre_get() unless using
ida_get_new().

Signed-off-by: Matthew Wilcox <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
Here are a few more important Bluetooth driver fixes for the 4.16
kernel.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <[email protected]>

drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'

If we can not get the HDMI DDC clock, we still need to free some
resources before returning.

Fixes: 939d749ad664 ("drm/sun4i: hdmi: Add support for controller hardware variants")
Reviewed-by: Chen-Yu Tsai <[email protected]>
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/5e0084af4ad57e9eea3bca5bd8e2e95970cd6714.1521413031.git.christophe.jaillet@wanadoo.fr

drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'

If we can not allocate the HDMI encoder regmap, we still need to free some
resources before returning.

Fixes: 4b1c924b1fc1 ("drm/sun4i: hdmi: create a regmap for later use")
Reviewed-by: Chen-Yu Tsai <[email protected]>
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/14c42391e1b562c7495bda6ad6fa1d24ec8dc052.1521413031.git.christophe.jaillet@wanadoo.fr

Merge branch 'phy-relax-error-checking'

Grygorii Strashko says:

====================
net: phy: relax error checking when creating sysfs link netdev->phydev

Some ethernet drivers (like TI CPSW) may connect and manage more than one
net PHY per netdevice. As a result, such drivers produce a warning during
system boot and fail to connect the second PHY to the netdevice when the
PHYLIB framework tries to create the sysfs link netdev->phydev for the
second PHY in phy_attach_direct(), because a sysfs link with the same name
has already been created for the first PHY.
As a result, the second CPSW external port becomes unusable.
This regression was introduced by commits:
5568363f0cb3 ("net: phy: Create sysfs reciprocal links for attached_dev/phydev")
a3995460491d ("net: phy: Relax error checking on sysfs_create_link()")

Patch 1: exports sysfs_create_link_nowarn() as preparation for Patch 2.
Patch 2: relaxes error checking when the PHYLIB framework creates the sysfs
link netdev->phydev in phy_attach_direct(), suppresses the warning by using
sysfs_create_link_nowarn() and adds an error message instead, so link
creation failure is no longer fatal and the system can continue working,
which fixes the TI CPSW issue and makes boot logs accessible
in case of NFS boot, for example.

This can be stable material 4.13+.

Changes in v2:
- commit messages updated.

v1:
https://patchwork.ozlabs.org/cover/886058/
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: relax error checking when creating sysfs link netdev->phydev

Some ethernet drivers (like TI CPSW) may connect and manage more than one
net PHY per netdevice. As a result, such drivers produce a warning during
system boot and fail to connect the second PHY to the netdevice when the
PHYLIB framework tries to create the sysfs link netdev->phydev for the
second PHY in phy_attach_direct(), because a sysfs link with the same name
has already been created for the first PHY. As a result, the second CPSW
external port becomes unusable.

Fix it by relaxing error checking when the PHYLIB framework creates the
sysfs link netdev->phydev in phy_attach_direct(), suppressing the warning
by using sysfs_create_link_nowarn() and adding an error message instead.
After this change, failure to create the links (phy->netdev and
netdev->phy) is no longer fatal and the system can continue working,
which fixes the TI CPSW issue.

Cc: Florian Fainelli <[email protected]>
Cc: Andrew Lunn <[email protected]>
Fixes: a3995460491d ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sysfs: symlink: export sysfs_create_link_nowarn()

sysfs_create_link_nowarn() is going to be used in a subsequent patch by
the phylib framework, which can be built as a module. Hence, export
sysfs_create_link_nowarn() to avoid build errors.

Cc: Florian Fainelli <[email protected]>
Cc: Andrew Lunn <[email protected]>
Fixes: a3995460491d ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.

If the BIOS sets up an MST output and the hardware state readout code sees
this as an SST configuration, then when disabling the encoder we end up
calling the ->post_disable_dp() hook instead of the MST version.
Consequently, we write to the DP_SET_POWER dpcd to set it to the D3 state.
Further along, when we try to enable the encoder in MST mode, the
POWER_UP_PHY transaction fails to power up the MST hub. This results in
continuous link training failures which keep the system busy, delaying
boot. We could identify the BIOS MST boot discrepancy and handle it
accordingly, but a simple way to solve this is to write to the
DP_SET_POWER dpcd for MST too.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105470
Cc: Ville Syrjälä <[email protected]>
Cc: Jani Nikula <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Reported-by: Laura Abbott <[email protected]>
Cc: [email protected]
Fixes: 5ea2355a100a ("drm/i915/mst: Use MST sideband message transactions for dpms control")
Tested-by: Laura Abbott <[email protected]>
Signed-off-by: Dhinakaran Pandiyan <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ad260ab32a4d94fa974f58262f8000472d34fd5b)
Signed-off-by: Rodrigo Vivi <[email protected]>

Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"Two commits to fix the following subtle cgroup2 behavior bugs:

   - cpu.max was rejecting config when it shouldn't

   - thread mode enable was allowed when it shouldn't"

* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix rule checking for threaded mode switching
  sched, cgroup: Don't reject lower cpu.max on ancestors

ACPI / watchdog: Fix off-by-one error at resource assignment

The resource allocation in the WDAT watchdog has an off-by-one error: it
sets the end one byte past the actual end address. This may eventually
lead to unexpected resource conflicts.
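
The arithmetic behind this kind of error can be shown with a short
sketch. It assumes the kernel convention that a resource's end address
is inclusive; struct res and the helpers are hypothetical and do not
reproduce the WDAT driver code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified resource; 'end' is inclusive by convention. */
struct res {
	uint64_t start;
	uint64_t end;
};

/* Size of an inclusive range. */
uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/* Off-by-one: treats 'end' as exclusive and claims one byte too many. */
struct res make_res_buggy(uint64_t start, uint64_t len)
{
	struct res r = { start, start + len };
	return r;
}

/* Correct: the last byte of the range is start + len - 1. */
struct res make_res_fixed(uint64_t start, uint64_t len)
{
	struct res r = { start, start + len - 1 };
	return r;
}

/* A 4-byte register at 0x1000 must end at 0x1003, not 0x1004. */
void demo_res_end(void)
{
	struct res bad = make_res_buggy(0x1000, 4);
	struct res good = make_res_fixed(0x1000, 4);

	assert(res_size(&bad) == 5);
	assert(good.end == 0x1003 && res_size(&good) == 4);
}
```

The extra byte matters when two registers are adjacent: the buggy range
for the first one overlaps the start of the second, which is how a
conflict can arise.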

Fixes: 058dfc767008 (ACPI / watchdog: Add support for WDAT hardware watchdog)
Cc: 4.9+ <[email protected]> # 4.9+
Signed-off-by: Takashi Iwai <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fixes from Tejun Heo:
"Two low-impact workqueue commits.

  One fixes workqueue creation error path and the other removes the
  unused cancel_work()"

* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove unused cancel_work()
  workqueue: use put_device() instead of kfree()

Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

Pull percpu fixes from Tejun Heo:
"Late percpu pull request for v4.16-rc6.

   - percpu allocator pool replenishing no longer triggers OOM or
     warning messages.

     Also, the alloc interface now understands __GFP_NORETRY and
     __GFP_NOWARN. This is to allow avoiding OOMs from userland
     triggered actions like bpf map creation.

     Also added cond_resched() in alloc loop.

   - percpu allocation can now be interrupted by kill sigs to avoid
     deadlocking OOM killer.

   - Added Dennis Zhou as a co-maintainer.

     He has rewritten the area map allocator, understands most of the
     code base and has been responsive for all bug reports"

* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods
  mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  percpu: include linux/sched.h for cond_resched()
  percpu: add a schedule point in pcpu_balance_workfn()
  percpu: allow select gfp to be passed to underlying allocators
  percpu: add __GFP_NORETRY semantics to the percpu balancing path
  percpu: match chunk allocator declarations with definitions
  percpu: add Dennis Zhou as a percpu co-maintainer

Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"I sat on them too long and it's quite a few this late, but nothing has
  a wide blast area. The changes are...

   - Fix corner cases in SG command handling.

   - Recent introduction of default powersaving mode config option
     exposed several devices with broken powersaving behaviors. A number
     of patches to update the blacklist accordingly.

   - Fix a kernel panic on SAS hotplug.

   - Other misc and device specific updates"

* 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version
  libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions
  libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs
  libata: Enable queued TRIM for Samsung SSD 860
  PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L
  ahci: Add PCI-id for the Highpoint Rocketraid 644L card
  ata: do not schedule hot plug if it is a sas host
  libata: disable LPM for Crucial BX100 SSD 500GB drive
  libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs
  libata: update documentation for sysfs interfaces
  ata: sata_rcar: Remove unused variable in sata_rcar_init_controller()
  libata: transport: cleanup documentation of sysfs interface
  sata_rcar: Reset SATA PHY when Salvator-X board resumes
  libata: don't try to pass through NCQ commands to non-NCQ devices
  libata: remove WARN() for DMA or PIO command without data
  libata: fix length validation of ATAPI-relayed SCSI commands
  ata: libahci: fix comment indentation
  ahci: Add check for device presence (PCIe hot unplug) in ahci_stop_engine()
  libata: Fix compile warning with ATA_DEBUG enabled

nfsd: remove blocked locks on client teardown

We had some reports of panics in nfsd4_lm_notify, and that showed a
nfs4_lockowner that had outlived its so_client.

Ensure that we walk any leftover lockowners after tearing down all of
the stateids, and remove any blocked locks that they hold.

With this change, we also don't need to walk the nbl_lru on nfsd_net
shutdown, as that will happen naturally when we tear down the clients.

Fixes: 76d348fadff5 (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks)
Reported-by: Frank Sorenson <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Cc: [email protected] # 4.9
Signed-off-by: J. Bruce Fields <[email protected]>

Merge branch 'bpf-sockmap-ulp'

John Fastabend says:

====================
This series adds a BPF hook for sendmsg and sendfile by using
the ULP infrastructure and sockmap. A simple pseudocode example
would be,

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
                &obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY);

After the above snippet any socket attached to the map would run
msg_prog on sendmsg and sendfile system calls.

Three additional helpers are added bpf_msg_apply_bytes(),
bpf_msg_cork_bytes(), and bpf_msg_pull_data(). With
bpf_msg_apply_bytes BPF programs can tell the infrastructure how
many bytes the given verdict should apply to. This has two cases.
First, if a BPF program applies a verdict to fewer bytes than in the
current sendmsg/sendfile msg, this will apply the verdict to the
first N bytes of the message then run the BPF program again with
data pointers recalculated to the N+1 byte. The second case is the
BPF program applies a verdict to more bytes than the current sendmsg
or sendfile system call. In this case the infrastructure will cache
the verdict and apply it to future sendmsg/sendfile calls until the
byte limit is reached. This avoids the overhead of running BPF
programs on large payloads.

The helper bpf_msg_cork_bytes() handles a different case where
a BPF program can not reach a verdict on a msg until it receives
more bytes AND the program doesn't want to forward the packet
until it is known to be "good". The example case being a user
(albeit a dumb one probably) sends a N byte header in 1B system
calls. The BPF program can call bpf_msg_cork_bytes with the
required byte limit to reach a verdict and then the program will
only be called again once N bytes are received.

The last helper added in this series is bpf_msg_pull_data(). It
is used to pull data in for modification or reading. Similar to
how skb_pull_data() works, msg_pull_data can be used to access data
not in the initial (data_start, data_end) range. For sendpage()
calls this is needed if any data is accessed because the BPF
sendpage hook initializes the data_start and data_end pointers to
zero. We do this because sendpage data is shared with the user
and can be modified during or after the BPF verdict possibly
invalidating any verdict the BPF program decides. For sendmsg
the data is already copied by the sendmsg bpf infrastructure so
we only copy the data if the user requests a data range that is
not already linearized. This happens if the user requests larger
blocks of data that are not in a single scatterlist element. The
common case seems to be accessing headers which normally are
in the first scatterlist element and already linearized.

For more examples please review the sample program. There are
examples for all the actions and helpers there.

Patches 1-8 implement the above sockmap/BPF infrastructure. The
remaining patches flesh out some minimal selftests and the sample
sockmap program. The sockmap sample program is the main vehicle
for testing this infrastructure and will be moved into selftests
shortly. The final patch in this series is a simple shell script
to run a set of tests. These are the tests I run after any changes
to sockmap. The next task on the list after this series is to
push those into selftests so we can avoid manually testing.

Couple notes on future items in the pipeline,

  0. move sample sockmap programs into selftests (noted above)
  1. add additional support for tcp flags, most are ignored now.
  2. add a Documentation/bpf/sockmap file with these details
  3. support stacked ULP types to allow this and ktls to cooperate
  4. Ingress flag support, redirect only supports egress here. The
     other redirect helpers support ingress and egress flags.
  5. add optimizations, I cut a few optimizations here in the
     first iteration of the code for later study/implementation

-v3 updates
  : u32 data pointers in msg_md changed to void *
  : page_address NULL check and flag verification in msg_pull_data
  : remove old note in commit msg that is no longer relevant
  : remove enum sk_msg_action, it's not used anywhere
  : fixup test_verifier W -> DW insn to account for data pointers
  : unintentionally dropped a smap_stop_tx() call in sockmap.c

I propagated the ACKs forward because above changes were small
one/two line changes.

-v2 updates (discussion):

Dave noticed that sendpage call was previously (in v1) running
on the data directly. This allowed users to potentially modify
the data after or during the BPF program. However doing a copy
automatically even if the data is not accessed has measurable
performance impact. So we added another helper modeled after
the existing skb_pull_data() helper to allow users to selectively
pull data from the msg. This is also useful in the sendmsg case
when users need to access data outside the first scatterlist
element or across scatterlist boundaries.

While doing this I also unified the sendmsg and sendfile handlers
a bit. Originally the sendfile call was optimized for never
touching the data. I've decided for a first submission to drop
this optimization and we can add it back later. It introduced
unnecessary complexity, at least for a first posting, for a
use case I have not entirely fleshed out yet. When the use
case is deployed we can add it back if needed. Then we can
review concrete performance deltas as well on real-world
use-cases/applications.

Lastly, I reorganized the patches a bit. Now all sockmap
changes are in a single patch and each helper gets its own
patch. This, at least IMO, makes it easier to review because
sockmap changes are not spread across the patch series. On
the other hand now apply_bytes, cork_bytes logic is only
activated later in the series. But that should be OK.
====================

Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap test script

This adds the test script I am currently using to validate
the latest sockmap changes. Shortly sockmap will be ported
to selftests and these will be run from the infrastructure
there. Until then add the script here so we have a coverage
checklist when porting into selftests.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap sample test for bpf_msg_pull_data

This adds an option to test the msg_pull_data helper. This
uses two options txmsg_start and txmsg_end to let the user
specify start and end bytes to pull.

The options can be used with txmsg_apply, txmsg_cork options
as well as with any of the basic tests, txmsg, txmsg_redir and
txmsg_drop (plus noisy variants) to run pull_data inline with
those tests. By giving user direct control over the variables
we can easily do negative testing as well as positive tests.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap add SK_DROP tests

Add tests for SK_DROP.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap sample support for bpf_msg_cork_bytes()

Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes each verdict should apply to.

Similar to apply_bytes() tests these can be run as a stand-alone test
when used without other options or inline with other tests by using
the txmsg_cork option along with any of the basic tests txmsg,
txmsg_redir, txmsg_drop.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap, add sample option to test apply_bytes helper

This adds an option to test the apply_bytes helper. This option lets
the user specify an int on the command line specifying how much data
each verdict should apply to.

When this is set a map entry is set with the bytes input by the user
and then the specified program --txmsg or --txmsg_redir will use the
value and set the applied data. If no other option is set then a
default --txmsg_apply program is run. This program will drop pkts
if an error is detected on the bytes map lookup. Useful to verify
the map lookup and apply helper are working and causing a hard
error if it is not.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap sample, add data verification option

To verify data is not being dropped or corrupted this adds an option
to verify test-patterns on recv.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap sample, add sendfile test

To exercise TX ULP sendpage implementation we need a test that does
a sendfile. Add sendfile test option here.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap sample, add option to attach SK_MSG program

Add sockmap option to use SK_MSG program types.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG

Test read and writes for BPF_PROG_TYPE_SK_MSG.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: add map tests for BPF_PROG_TYPE_SK_MSG

Add map tests to attach BPF_PROG_TYPE_SK_MSG types to a sockmap.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sk_msg program helper bpf_sk_msg_pull_data

Currently, if a bpf sk msg program is run the program
can only parse data that the (start,end) pointers already
consumed. For sendmsg hooks this is likely the first
scatterlist element. For sendpage this will be the range
(0,0) because the data is shared with userspace and by
default we want to avoid allowing userspace to modify
data while (or after) BPF verdict is being decided.

To support pulling in additional bytes for parsing use
a new helper bpf_sk_msg_pull(start, end, flags) which
works similar to cls tc logic. This helper will attempt
to point the data start pointer at 'start' bytes offset
into msg and data end pointer at 'end' bytes offset into
message.

After basic sanity checks to ensure 'start' <= 'end' and
'end' <= msg_length there are a few cases we need to
handle.

First the sendmsg hook has already copied the data from
userspace and has exclusive access to it. Therefore, it
is not necessary to copy the data. However, it may
be required. After finding the scatterlist element with
'start' offset byte in it there are two cases. One the
range (start,end) is entirely contained in the sg element
and is already linear. All that is needed is to update the
data pointers, no allocate/copy is needed. The other case
is (start, end) crosses sg element boundaries. In this
case we allocate a block of size 'end - start' and copy
the data to linearize it.

Next sendpage hook has not copied any data in initial
state so that data pointers are (0,0). In this case we
handle it similar to the above sendmsg case except the
allocation/copy must always happen. Then when sending
the data we have possibly three memory regions that
need to be sent, (0, start - 1), (start, end), and
(end + 1, msg_length). This is required to ensure any
writes by the BPF program are correctly transmitted.

Lastly this operation will invalidate any previous
data checks so BPF programs will have to revalidate
pointers after making this BPF call.
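
The sendmsg-side decision described above can be sketched as a small
userspace model. This is not the kernel implementation: the enum, the
function name and the array-of-lengths view of the scatterlist are all
hypothetical, and 'end' is treated as an exclusive offset here. The
sendpage case, where the copy always happens, is not modeled.

```c
#include <stddef.h>

enum pull_result {
	PULL_INVALID,	/* range fails the sanity checks */
	PULL_LINEAR,	/* range sits inside one element: update pointers only */
	PULL_COPY,	/* range crosses elements: allocate and copy */
};

/*
 * Classify a (start, end) byte range against a message split into 'n'
 * elements whose lengths are given in 'elem_len'. Offsets are byte
 * offsets into the whole message.
 */
static enum pull_result classify_pull(const size_t *elem_len, size_t n,
				      size_t start, size_t end)
{
	size_t msg_len = 0, off = 0, i;

	for (i = 0; i < n; i++)
		msg_len += elem_len[i];

	/* Sanity checks: start <= end and end <= msg_length. */
	if (start > end || end > msg_len)
		return PULL_INVALID;

	/* Find the element that contains the 'start' offset. */
	for (i = 0; i < n; i++) {
		if (start < off + elem_len[i])
			return end <= off + elem_len[i] ? PULL_LINEAR
							: PULL_COPY;
		off += elem_len[i];
	}

	/* start == msg_len: an empty range at the very end of the message. */
	return PULL_LINEAR;
}
```

With elements of 10 and 20 bytes, the range (0, 8) stays linear while
(5, 15) crosses the element boundary and needs the allocate/copy path.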

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap, add msg_cork_bytes() helper

In the case where we need a specific number of bytes before a
verdict can be assigned, even if the data spans multiple sendmsg
or sendfile calls, the BPF program may use msg_cork_bytes().

The extreme case is a user can call sendmsg repeatedly with
1-byte msg segments. Obviously, this is bad for performance but
is still valid. If the BPF program needs N bytes to validate
a header it can use msg_cork_bytes to specify N bytes and the
BPF program will not be called again until N bytes have been
accumulated. The infrastructure will attempt to coalesce data
if possible so in many cases (most of my use cases at least) the
data will be in a single scatterlist element with data pointers
pointing to start/end of the element. However, this is dependent
on available memory so is not guaranteed. So BPF programs must
validate data pointer ranges, but this is the case anyways to
convince the verifier the accesses are valid.
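
The 1-byte sendmsg example can be walked through with a simplified
userspace model. It only counts bytes and program invocations; the
struct, the function and the toy "program" that wants 'need' bytes
are hypothetical and do not reflect the kernel data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket cork state. */
struct cork_state {
	size_t cork_bytes;	/* bytes the program asked to wait for */
	size_t queued;		/* bytes accumulated so far */
	unsigned int prog_runs;	/* how often the program was invoked */
};

/*
 * Account for one send of 'len' bytes. Returns true when the program
 * ran during this call. 'need' is the header size the toy program
 * wants to see before it can reach a verdict.
 */
static bool cork_send(struct cork_state *st, size_t len, size_t need)
{
	st->queued += len;

	/* Still corked: keep buffering without invoking the program. */
	if (st->cork_bytes && st->queued < st->cork_bytes)
		return false;

	st->cork_bytes = 0;
	st->prog_runs++;

	/* Toy program: too little data for a verdict, so request a cork. */
	if (st->queued < need) {
		st->cork_bytes = need;
		return true;
	}

	/* Verdict reached; the queued data would be sent or dropped here. */
	st->queued = 0;
	return true;
}
```

Sending a 4-byte header one byte at a time invokes the program on the
first byte (where it corks) and again only once all 4 bytes are queued.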

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: sockmap, add bpf_msg_apply_bytes() helper

A single sendmsg or sendfile system call can contain multiple logical
messages that a BPF program may want to read and apply a verdict. But,
without an apply_bytes helper any verdict on the data applies to all
bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
care to read the first N bytes of a msg. If the payload is large say
MB or even GB setting up and calling the BPF program repeatedly for
all bytes, even though the verdict is already known, creates
unnecessary overhead.

To allow BPF programs to control how many bytes a given verdict
applies to we implement a bpf_msg_apply_bytes() helper. When called
from within a BPF program this sets a counter, internal to the
BPF infrastructure, that applies the last verdict to the next N
bytes. If the N is smaller than the current data being processed
from a sendmsg/sendfile call, the first N bytes will be sent and
the BPF program will be re-run with start_data pointing to the N+1
byte. If N is larger than the current data being processed the
BPF verdict will be applied to multiple sendmsg/sendfile calls
until N bytes are consumed.

Note1 if a socket closes with apply_bytes counter non-zero this
is not a problem because data is not being buffered for N bytes
and is sent as it's received.
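
The counter behaviour described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
kernel code: the state struct, the toy program that always claims 'n'
bytes and the handle_send() loop are hypothetical.

```c
#include <stddef.h>

enum verdict { V_PASS, V_DROP };

/* Hypothetical per-socket state for the apply_bytes counter. */
struct apply_state {
	size_t apply_bytes;	/* bytes still covered by the cached verdict */
	enum verdict cached;	/* last verdict returned by the program */
	unsigned int prog_runs;	/* how often the program was invoked */
};

/* Toy policy: always pass and claim the next 'n' bytes for the verdict. */
static enum verdict toy_prog(struct apply_state *st, size_t n)
{
	st->prog_runs++;
	st->apply_bytes = n;
	return V_PASS;
}

/* Process one send call of 'len' bytes. */
static void handle_send(struct apply_state *st, size_t len, size_t n)
{
	while (len > 0) {
		size_t chunk = len;

		/* Run the program only when no cached verdict remains. */
		if (st->apply_bytes == 0)
			st->cached = toy_prog(st, n);

		/* A non-zero counter limits how far this verdict reaches. */
		if (st->apply_bytes) {
			if (st->apply_bytes < chunk)
				chunk = st->apply_bytes;
			st->apply_bytes -= chunk;
		}

		len -= chunk;	/* 'chunk' bytes are handled per st->cached */
	}
}
```

A 100-byte send with n = 40 runs the program three times and leaves 20
bytes on the counter, so a following 10-byte send reuses the cached
verdict without running the program again.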

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

This implements a BPF ULP layer to allow policy enforcement and
monitoring at the socket layer. In order to support this a new
program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
the sendmsg/sendpage hook. To attach the policy to sockets a
sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.

Similar to previous sockmap usages when a sock is added to a
sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT
program type attached then the BPF ULP layer is created on the
socket and the attached BPF_PROG_TYPE_SK_MSG program is run for
every msg in sendmsg case and page/offset in sendpage case.

BPF_PROG_TYPE_SK_MSG Semantics/API:

BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and
SK_DROP. Returning SK_DROP frees the copied data in the sendmsg
case and in the sendpage case leaves the data untouched. Both cases
return -EACCES to the user. Returning SK_PASS will allow the msg to
be sent.

In the sendmsg case data is copied into kernel space buffers before
running the BPF program. The kernel space buffers are stored in a
scatterlist object where each element is a kernel memory buffer.
Some effort is made to coalesce data from the sendmsg call here.
For example a sendmsg call with many one byte iov entries will
likely be pushed into a single entry. The BPF program is run with
data pointers (start/end) pointing to the first sg element.

In the sendpage case data is not copied. We opt not to copy the
data by default here, because the BPF infrastructure does not
know what bytes will be needed nor when they will be needed. So
copying all bytes may be wasteful. Because of this the initial
start/end data pointers are (0,0). Meaning no data can be read or
written. This avoids reading data that may be modified by the
user. A new helper is added later in this series if reading and
writing the data is needed. The helper call will do a copy by
default so that the page is exclusively owned by the BPF call.

The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg
in the sendmsg() case and the entire page/offset in the sendpage case.
This avoids ambiguity on how to handle mixed return codes in the
sendmsg case. Again a helper is added later in the series if
a verdict needs to apply to multiple system calls and/or only
a subpart of the currently being processed message.

The helper msg_redirect_map() can be used to select the socket to
send the data on. This is used similar to existing redirect use
cases. This allows policy to redirect msgs.

Pseudo code simple example:

The basic logic to attach a program to a socket is as follows,

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
                &obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

Adding sockets to the map is done in the normal way,

  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY);

After the above any socket attached to "my_sock_map", in this case
'fd', will run the BPF msg verdict program (msg_prog) on every
sendmsg and sendpage system call.

For a complete example see BPF selftests or sockmap samples.

Implementation notes:

It seemed the simplest, to me at least, to use a refcnt to ensure
psock is not lost across the sendmsg copy into the sg, the bpf program
running on the data in sg_data, and the final pass to the TCP stack.
Some performance testing may show a better method to do this and avoid
the refcnt cost, but for now use the simpler method.

Another item that will come after basic support is in place is
supporting MSG_MORE flag. At the moment we call sendpages even if
the MSG_MORE flag is set. An enhancement would be to collect the
pages into a larger scatterlist and pass down the stack. Notice that
bpf_tcp_sendmsg() could support this with some additional state saved
across sendmsg calls. I built the code to support this without having
to do refactoring work. Other features TBD include ZEROCOPY and the
TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series
shortly.

Future work could improve size limits on the scatterlist rings used
here. Currently, we use MAX_SKB_FRAGS simply because this was being
used already in the TLS case. Future work could extend the kernel sk
APIs to tune this depending on workload. This is a trade-off
between memory usage and throughput performance.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

net: generalize sk_alloc_sg to work with scatterlist rings

The current implementation of sk_alloc_sg expects scatterlist to always
start at entry 0 and complete at entry MAX_SKB_FRAGS.

Future patches will want to support starting at arbitrary offset into
scatterlist so add an additional sg_start parameter and then default
to the current values in TLS code paths.
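
For illustration only, the index arithmetic for a ring that may start
at an arbitrary slot looks roughly like the sketch below. RING_SIZE is
a placeholder for MAX_SKB_FRAGS and both helpers are hypothetical;
this is not the sk_alloc_sg() code.

```c
#include <stddef.h>

#define RING_SIZE 8	/* placeholder; the kernel code uses MAX_SKB_FRAGS */

/* Advance a ring index by one slot, wrapping at the end of the array. */
static size_t ring_next(size_t i)
{
	return (i + 1) % RING_SIZE;
}

/* Count the free slots when used entries occupy [start, end) on the ring. */
static size_t ring_free(size_t start, size_t end)
{
	size_t used = (end + RING_SIZE - start) % RING_SIZE;

	return RING_SIZE - used;
}
```

A caller that always starts at entry 0, as the TLS code paths do, is
the special case start == 0 of the same arithmetic.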

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG

When do_tcp_sendpages() is called from within the kernel and we know
the data has no references from the user side, we can omit the
SKBTX_SHARED_FRAG flag. This patch adds an internal flag,
NO_SKBTX_SHARED_FRAG, that can be used to omit setting
SKBTX_SHARED_FRAG.

The flag is not exposed to userspace because the sendpage call from
the splice logic masks out all bits except MSG_MORE.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

sockmap: convert refcnt to an atomic refcnt

The sockmap refcnt up until now has been wrapped in the
sk_callback_lock(). So it has not actually needed any locking of its
own. The counter itself tracks the lifetime of the psock object.
Sockets in a sockmap have a lifetime that is independent of the
map they are part of. This is possible because a single socket may
be in multiple maps. When this happens we can only release the
psock data associated with the socket when the refcnt reaches
zero. There are three possible paths that decrement the sock
reference. First, through the normal sockmap process, the user deletes
the socket from the map. Second, the map is removed and all sockets
in the map are removed; the delete path is similar to case 1. The third
case is an asynchronous socket event such as closing the socket. The
last case handles removing sockets that are no longer available.
For completeness, although inc does not pose any problems in this
patch series, the inc case only happens when a psock is added to a
map.

Next we plan to add another socket prog type to handle policy and
monitoring on the TX path. When we do this however we will need to
keep a reference count open across the sendmsg/sendpage call and
holding the sk_callback_lock() here (on every send) seems less than
ideal, also it may sleep in cases where we hit memory pressure.
Instead of dealing with these issues in some clever way simply make
the reference counting a refcount_t type and do proper atomic ops.
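
The lifetime rule (release only when the count reaches zero, with no
outer lock) can be sketched in userspace with C11 atomics. The struct
and helpers below are hypothetical stand-ins and do not use the kernel
refcount API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the psock object. */
struct psock_like {
	atomic_uint refcnt;
	bool released;
};

static void psock_init(struct psock_like *p)
{
	atomic_init(&p->refcnt, 1);
	p->released = false;
}

/* Take a reference, e.g. when the socket is added to another map. */
static void psock_get(struct psock_like *p)
{
	atomic_fetch_add(&p->refcnt, 1);
}

/* Drop a reference; whoever drops the last one releases the object. */
static void psock_put(struct psock_like *p)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&p->refcnt, 1) == 1)
		p->released = true;	/* real code would free state here */
}
```

Any of the three decrement paths can call the put helper in any order;
only the call that takes the count from 1 to 0 performs the release.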

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

sock: make static tls function alloc_sg generic sock helper

The TLS ULP module builds scatterlists from a sock using
page_frag_refill(). This is going to be useful for other ULPs
so move it into sock file for more general use.

In the process remove useless goto at end of while loop.

Signed-off-by: John Fastabend <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

RDMA/verbs: Remove restrack entry from XRCD structure

XRCD object is not implemented in the restrack, so let's remove it.

Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/ucma: Fix use-after-free access in ucma_close

The error in ucma_create_id() left ctx in the list of contexts belonging
to the ucma file descriptor. An attempt to close this file descriptor
causes use-after-free accesses while iterating over that list.

Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Sean Hefty <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-03-19

This series contains updates to i40e and i40evf only.

Alex fixes a potential deadlock in the configure_clsflower function in
i40evf, where we exit with the "IN_CRITICAL_TASK" bit set while
notifying the PF of flower filters.

Jan fixed an issue where it was possible to set a mode that is not
allowed which resulted in link being down, so fixed the parity between
i40e_set_link_ksettings() and i40e_get_link_ksettings().

Patryk fixes a bug where a backplane device was allowing the setting of
link settings, which is not allowed.

Shiraz fixes a crash when entering S3 because the client interface was
freeing the MSIx vectors while they are still in use.

Jake fixes up a function header comment to document a newly added
parameter. Also cleaned up flags that were never used.

Doug fixes the incorrect return type for i40e_aq_add_cloud_filters().
====================

Signed-off-by: David S. Miller <[email protected]>

percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods

percpu_ref internally uses sched-RCU to implement the percpu -> atomic
mode switching and the documentation suggested that this could be
depended upon.  This doesn't seem like a good idea.

* percpu_ref uses sched-RCU which has different grace periods than
  regular RCU.  Users may combine percpu_ref with regular RCU usage and
  incorrectly believe that regular RCU grace periods are performed by
  percpu_ref.  This can lead to, for example, use-after-free due to
  premature freeing.

* percpu_ref has a grace period when switching from percpu to atomic
  mode.  It doesn't have one between the last put and release.  This
  distinction is subtle and can lead to surprising bugs.

* percpu_ref allows starting in and switching to atomic mode manually
  for debugging and other purposes.  This means that there may not be
  any grace periods from kill to release.

This patch makes it clear that the grace periods are percpu_ref's
internal implementation detail and can't be depended upon by the
users.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

i40e: Fix the polling mechanism of GLGEN_RSTAT.DEVSTATE

This fixes the polling mechanism of GLGEN_RSTAT.DEVSTATE in the
PF Reset path when Global Reset is in progress. While the driver
is polling for the end of the PF Reset and the Global Reset is
triggered, abandon the PF Reset path and prepare for the
upcoming Global Reset.

Signed-off-by: Paweł Jabłoński <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40evf: remove flags that are never used

These flags were defined, but there is no use within the driver code, so
we don't need to keep them.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Prevent setting link speed on I40E_DEV_ID_25G_B

Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list that we check
against.

Signed-off-by: Patryk Małek <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Fix incorrect return types

Fix return types from i40e_status to enum i40e_status_code.

Signed-off-by: Doug Dziggel <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: add doxygen comment for new mode parameter

A recent patch updated the signature for i40e_aq_set_switch_config() to
add a new parameter 'mode'. It forgot to document the parameter in the
doxygen function header comment. Add the parameter to the function
description now.

Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Close client on suspend and restore client MSIx on resume

During suspend client MSIx vectors are freed while they are still
in use causing a crash on entering S3.

Fix this by calling client close before freeing up its MSIx vectors.
Also update the client MSIx vectors on resume before client
open is called.

Fixes commit b980c0634fe5 ("i40e: shutdown all IRQs and disable MSI-X
when suspended")

Reported-by: Stefan Assmann <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Prevent setting link speed on KX_X722

Changing link settings on backplane devices shouldn't be allowed.
This patch adds one more device ID to the list of devices for which
we reject such changes.

Signed-off-by: Patryk Małek <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Properly check allowed advertisement capabilities

The i40e_set_link_ksettings and i40e_get_link_ksettings use different
codepaths to check available and supported advertisement modes. This
creates scenarios where it's possible to set a mode that's not allowed,
resulting in a link down.

Fix setting advertisement in i40e_set_link_ksettings by calling
i40e_get_link_ksettings to check what modes are allowed.

Signed-off-by: Jan Sokolowski <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()

When memory is scarce and few percpu pages are left,
pcpu_balance_workfn() holds pcpu_alloc_mutex for a long
time (it allocates memory itself and waits for memory
reclaim). If tasks doing pcpu_alloc() are chosen by the
OOM killer, they can't exit, because they are waiting
for the mutex.

Make pcpu_alloc() respond to fatal signals by using
mutex_lock_killable() when the GFP flags allow it. This
guarantees that a task does not miss SIGKILL from the
OOM killer.

Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

percpu: include linux/sched.h for cond_resched()

The microblaze build broke due to a missing declaration for the
recently added cond_resched() invocation. Let's include linux/sched.h
explicitly.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: kbuild test robot <[email protected]>

i40evf: Reorder configure_clsflower to avoid deadlock on error

While doing some code review I noticed that we can get into a state where
we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of
flower filters. This patch is meant to address that plus tweak the ordering
of the while loop waiting on it slightly so that we don't wait an extra
period after we have failed for the last time.
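
The two points above can be illustrated with a minimal userspace sketch.
It is not the driver code: in_critical_task, wait_one_period(),
acquire_critical() and configure_filter() are hypothetical names, and a
C11 atomic_flag stands in for the "IN_CRITICAL_TASK" bit. The loop gives
up before sleeping again after the final failed attempt, and every exit
path taken after the bit is acquired clears it.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the "IN_CRITICAL_TASK" bit (hypothetical model). */
static atomic_flag in_critical_task = ATOMIC_FLAG_INIT;

/* Counts simulated waits so the loop ordering can be observed. */
static int sleeps;

static void wait_one_period(void)
{
    sleeps++;   /* a real driver would sleep here */
}

/* Try to take the bit; give up after max_tries failed attempts. */
static bool acquire_critical(int max_tries)
{
    int tries_left = max_tries;

    while (atomic_flag_test_and_set(&in_critical_task)) {
        if (--tries_left <= 0)
            return false;   /* last attempt failed: do not wait again */
        wait_one_period();
    }
    return true;
}

static void release_critical(void)
{
    atomic_flag_clear(&in_critical_task);
}

/* Every path taken after a successful acquire goes through "out". */
static int configure_filter(bool request_is_invalid)
{
    int err = 0;

    if (!acquire_critical(3))
        return -EAGAIN;

    if (request_is_invalid) {
        err = -EINVAL;
        goto out;           /* error path still clears the bit */
    }
    /* ... queue the filter request here ... */
out:
    release_critical();
    return err;
}
```

With the bit already held by someone else, acquire_critical(3) makes
three attempts but waits only twice.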

Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

clk: bcm2835: Protect sections updating shared registers

CM_PLLx and A2W_XOSC_CTRL registers are accessed by different clock
handlers and must be accessed with ->regs_lock held.
Update the sections where this protection is missing.

Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio domain clocks")
Cc: <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: bcm2835: Fix ana->maskX definitions

ana->maskX values are already '~'-ed in bcm2835_pll_set_rate(). Remove
the '~' in the definition to fix ANA setup.

Note that this commit fixes a long standing bug preventing one from
using an HDMI display if it's plugged after the FW has booted Linux.
This is because PLLH is used by the HDMI encoder to generate the pixel
clock.
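
The effect of inverting a mask twice can be shown with a small
standalone sketch. The field position, the macro names and
update_field() are made up for illustration and are not the driver's
definitions; the only thing taken from the description above is that the
update code applies '~' to the mask itself.

```c
#include <stdint.h>

/* Hypothetical 4-bit field occupying bits 4..7 of a control word. */
#define FIELD_MASK_PLAIN    0x000000f0u      /* bits that belong to the field */
#define FIELD_MASK_NEGATED  (~0x000000f0u)   /* buggy style: already inverted */

/*
 * Read-modify-write helper that inverts the mask itself before
 * clearing, then ORs in the new field value.
 */
static uint32_t update_field(uint32_t reg, uint32_t mask, uint32_t set)
{
    reg &= ~mask;   /* intended: clear only the field bits */
    reg |= set;     /* then program the new value */
    return reg;
}
```

Starting from 0x12345678 and setting 0x20, the plain mask gives
0x12345628 (only the field changes). The pre-negated mask gives
0x00000070: the double inversion keeps the old field bits and clears
everything else in the word.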

Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio domain clocks")
Cc: <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

drm/amd/display: fix dereferencing possible ERR_PTR()

This patch fixes a static checker warning caused by commit
36cc549d5986 ("drm/amd/display: disable CRTCs with
NULL FB on their primary plane (V2)").

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Shirish S <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Refine disable VGA

The bad case does not follow the usual pattern: vga1 is not enabled as usual, but vga2,3,4 is on.

Signed-off-by: Clark Zheng <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version

When commit 9c7be59fc519af ("libata: Apply NOLPM quirk to Crucial MX100
512GB SSDs") was added it inherited the ATA_HORKAGE_NO_NCQ_TRIM quirk
from the existing "Crucial_CT*MX100*" entry, but that entry sets model_rev
to "MU01", whereas the entry adding the NOLPM quirk sets it to NULL.

This means that after this commit we now apply the NO_NCQ_TRIM quirk to
all "Crucial_CT512MX100*" SSDs even if they have the fixed "MU02"
firmware. This commit splits the "Crucial_CT512MX100*" quirk into 2
quirks, one for the "MU01" firmware and one for all other firmware
versions, so that we once again only apply the NO_NCQ_TRIM quirk to the
"MU01" firmware version.
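
A simplified userspace model of the split is sketched below. It assumes
a first-match-wins table and uses POSIX fnmatch() as a stand-in for the
kernel's glob matching; the QUIRK_* flags, struct quirk_entry and
lookup_quirks() are illustrative names, and other flags carried by the
real entries are left out.

```c
#include <fnmatch.h>
#include <stddef.h>

#define QUIRK_NO_NCQ_TRIM (1u << 0)   /* stand-in for ATA_HORKAGE_NO_NCQ_TRIM */
#define QUIRK_NOLPM       (1u << 1)   /* stand-in for ATA_HORKAGE_NOLPM */

struct quirk_entry {
    const char *model_num;   /* glob pattern for the model string */
    const char *model_rev;   /* glob pattern for the firmware, NULL = any */
    unsigned int quirks;
};

/* The more specific 512GB entries come first; the first match wins. */
static const struct quirk_entry quirk_table[] = {
    { "Crucial_CT512MX100*", "MU01", QUIRK_NO_NCQ_TRIM | QUIRK_NOLPM },
    { "Crucial_CT512MX100*", NULL,   QUIRK_NOLPM },
    { "Crucial_CT*MX100*",   "MU01", QUIRK_NO_NCQ_TRIM },
    { NULL, NULL, 0 }
};

static unsigned int lookup_quirks(const char *model, const char *rev)
{
    const struct quirk_entry *e;

    for (e = quirk_table; e->model_num; e++) {
        if (fnmatch(e->model_num, model, 0) != 0)
            continue;   /* model does not match this entry */
        if (e->model_rev && fnmatch(e->model_rev, rev, 0) != 0)
            continue;   /* entry is limited to another firmware */
        return e->quirks;
    }
    return 0;           /* no quirks apply */
}
```

In this model a 512GB drive with "MU02" firmware gets only the NOLPM
flag, while the same drive with "MU01" gets both flags.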

Fixes: 9c7be59fc519af ("libata: Apply NOLPM quirk to ... MX100 512GB SSDs")
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions

Commit b17e5729a630 ("libata: disable LPM for Crucial BX100 SSD 500GB
drive") introduced an ATA_HORKAGE_NOLPM quirk for Crucial BX100 500GB SSDs
but limited this to the MU02 firmware version. According to:
http://www.crucial.com/usa/en/support-ssd-firmware

MU02 is the last version, so there are no newer possibly fixed versions
and if the MU02 version has broken LPM then the MU01 almost certainly
also has broken LPM, so this commit changes the quirk to apply to all
firmware versions.

Fixes: b17e5729a630 ("libata: disable LPM for Crucial BX100 SSD 500GB...")
Cc: [email protected]
Cc: Kai-Heng Feng <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs

There have been reports of the Crucial M500 480GB model not working
with LPM set to min_power / med_power_with_dipm level.

It has not been tested with medium_power, but that typically has no
measurable power-savings.

Note the reporter's Crucial_CT480M500SSD3 has a firmware version of MU03
and there is a MU05 update available, but that update does not mention any
LPM fixes in its changelog, so the quirk matches all firmware versions.

In my experience the LPM problems with (older) Crucial SSDs seem to be
limited to higher capacity versions of the SSDs (different firmware?),
so this commit adds a NOLPM quirk for the 480 and 960GB versions of the
M500, to avoid LPM causing issues with these SSDs.

Cc: [email protected]
Reported-and-tested-by: Martin Steigerwald <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

can: cc770: Fix use after free in cc770_tx_interrupt()

This fixes use after free introduced by the last cc770 patch.

Signed-off-by: Andri Yngvason <[email protected]>
Fixes: 746201235b3f ("can: cc770: Fix queue stall & dropped RTR reply")
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

Revert "ACPI / battery: Add quirk for Asus GL502VSK and UX305LA"

Revert commit c68f0676ef7d ("ACPI / battery: Add quirk for Asus
GL502VSK and UX305LA") and commit 4446823e2573 ("ACPI / battery: Add
quirk for Asus UX360UA and UX410UAK").

On a great many Asus products, the battery is sometimes reported as
charging or discharging even when it is full and you are on AC power.
This change quirked the kernel to avoid advertising the discharging
state when this happens on 4 laptop models, under the belief that
this was incorrect information.  I presume it originates from reports
by users who are confused that their battery status icon says that it
is discharging.

However, the reported information is indeed correct, and the quirk
approach taken is inadequate and more thought is needed first.
Specifically:

1. It only quirks discharging state, not charging

2. There are so many different Asus products and DMI naming variants
    within those product families that behave this way; Linux could
    grow to quirk hundreds of products and still not even be close to
    "winning" this battle.

3. Asus previously clarified that this behaviour is intentional. The
    platform will periodically do a partial discharge/charge cycle
    when the battery is full, because this is one way to extend the
    lifetime of the battery (leaving a battery at 100% charge and
    unused will decrease its usable capacity over time).

    My understanding is that any decent consumer product will have
    this behaviour, but it appears that Asus is different in that
    they expose this info through ACPI.

    However, the behaviour seems correct. The ACPI spec does not
    suggest that the platform should hide the truth.  It lets you
    report that the battery is full of charge, and discharging, and
    with external power connected; and Asus does this.

4. In terms of not confusing the user, this seems like something that
    could/should be handled by userspace, which can also detect these
    same (accurate) conditions in the general case.

Revert this quirk before it gets included in a release, while we look
for better approaches.

Signed-off-by: Daniel Drake <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

drm/tegra: Shutdown on driver unbind

Since commit 846c7dfc1193 ("drm/atomic: Try to preserve the crtc enabled
state in drm_atomic_remove_fb, v2."), removing the last framebuffer will
no longer disable the corresponding pipeline, which causes the KMS core
to complain about leaked connectors on driver unbind.

Fix this by calling drm_atomic_helper_shutdown() on driver unbind, which
will cause all display pipelines to be shut down and therefore drop the
extra references on the connectors.

Signed-off-by: Thierry Reding <[email protected]>

drm/tegra: dsi: Don't disable regulator on ->exit()

The regulator is controlled as part of runtime PM, so it should not be
additionally disabled from the ->exit() callback.

Signed-off-by: Thierry Reding <[email protected]>

drm/tegra: dc: Detach IOMMU group from domain only once

Detaching from an IOMMU group multiple times can lead to a crash. This
could potentially be fixed in the IOMMU driver, but it's easy to avoid
the subsequent detach operations in this driver, so do that as well.

Signed-off-by: Thierry Reding <[email protected]>

Linux 4.16-rc6

dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node

The #sound-dai-cells DT property is required to describe the link between
the HDMI IP block and the SoC's audio subsystem.

Signed-off-by: Sylwester Nawrocki <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Inki Dae <[email protected]>

selftests: pmtu: Drop prints to kernel log from pmtu_vti6_link_change_mtu

Reported-by: David Ahern <[email protected]>
Fixes: 1fad59ea1c34 ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test")
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mv88e6xxx-auto-phy-intr'

Andrew Lunn says:

====================
Automatic PHY interrupts

Now that the mv88e6xxx driver either installs an interrupt handler, or
polls for interrupts, it is possible to always handle PHY interrupts,
rather than have phylib perform the polling. This speeds up detection
of link changes and reduces the load on the MDIO bus, which is
beneficial for PTP.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add MDIO interrupts for internal PHYs

When registering an MDIO bus, it is possible to pass an array of
interrupts, one per address on the bus. phylib will then associate the
interrupt to the PHY device, if no other interrupt is provided.

Some of the global2 interrupts are PHY interrupts. Place them into the
MDIO bus structure.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add number of internal PHYs

Add to the info structure the number of internal PHYs, if they generate
interrupts. Some of the older generations of switches have internal
PHYs, but no interrupt registers. In this case, set the count to zero.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add missing g1 IRQ numbers

With the recent change to polling for interrupts, it is important that
the number of global 1 interrupts is listed. Without it, the driver
requests an interrupt domain for zero interrupts, which returns
EINVAL, and the probe fails.

Add two missing entries.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix missing register lock in serdes_get_stats

We can hit the register lock not held assertion with the following path:

[   34.170631] mv88e6085 0.1:00: Switch registers lock not held!
[   34.176510] CPU: 0 PID: 950 Comm: ethtool Not tainted 4.16.0-rc4 #143
[   34.182985] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[   34.189519] Backtrace:
[   34.192033] [<8010c4b4>] (dump_backtrace) from [<8010c788>] (show_stack+0x20/0x24)
[   34.199680]  r6:9f5dc010 r5:00000011 r4:9f5dc010 r3:00000000
[   34.205434] [<8010c768>] (show_stack) from [<80679d38>] (dump_stack+0x24/0x28)
[   34.212719] [<80679d14>] (dump_stack) from [<804844a8>] (mv88e6xxx_read+0x70/0x7c)
[   34.220376] [<80484438>] (mv88e6xxx_read) from [<804870dc>] (mv88e6xxx_port_get_cmode+0x34/0x4c)
[   34.229257]  r5:a09cd128 r4:9ee31d07
[   34.232880] [<804870a8>] (mv88e6xxx_port_get_cmode) from [<80487e6c>] (mv88e6352_port_has_serdes+0x24/0x64)
[   34.242690]  r4:9f5dc010
[   34.245309] [<80487e48>] (mv88e6352_port_has_serdes) from [<804880b8>] (mv88e6352_serdes_get_stats+0x28/0x12c)
[   34.255389]  r4:00000001
[   34.257973] [<80488090>] (mv88e6352_serdes_get_stats) from [<804811e8>] (mv88e6xxx_get_ethtool_stats+0xb0/0xc0)
[   34.268156]  r10:00000000 r9:00000000 r8:00000000 r7:a09cd020 r6:00000001 r5:9f5dc01c
[   34.276052]  r4:9f5dc010
[   34.278631] [<80481138>] (mv88e6xxx_get_ethtool_stats) from [<8064f740>] (dsa_slave_get_ethtool_stats+0xbc/0xc4)

mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats() which calls both
chip->info->ops->stats_get_stats(), which holds the register lock, and
chip->info->ops->serdes_get_stats(), which does not. Call
chip->info->ops->serdes_get_stats() with the register lock held to
avoid such assertions.
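
The locking rule can be modelled in a few lines of standalone C. This is
a single-threaded sketch with made-up names (struct chip, chip_read(),
get_stats_unlocked(), get_stats_locked()); a boolean stands in for the
register mutex and a counter stands in for the "lock not held" warning.

```c
#include <stdbool.h>

/* Hypothetical model of a switch chip with a register lock. */
struct chip {
    bool reg_lock_held;     /* stands in for the register mutex */
    int lock_violations;    /* counts "lock not held" reports */
    int serdes_counter;     /* pretend hardware statistic */
};

static void reg_lock(struct chip *c)   { c->reg_lock_held = true; }
static void reg_unlock(struct chip *c) { c->reg_lock_held = false; }

/* Low-level accessor: complains when called without the lock. */
static int chip_read(struct chip *c)
{
    if (!c->reg_lock_held)
        c->lock_violations++;
    return c->serdes_counter;
}

/* The per-chip op itself takes no lock; it relies on its caller. */
static int serdes_get_stats(struct chip *c)
{
    return chip_read(c);
}

/* Before the fix: the op is reached without the lock held. */
static int get_stats_unlocked(struct chip *c)
{
    return serdes_get_stats(c);
}

/* After the fix: the caller holds the lock around the op. */
static int get_stats_locked(struct chip *c)
{
    int value;

    reg_lock(c);
    value = serdes_get_stats(c);
    reg_unlock(c);
    return value;
}
```

Both variants return the same statistic; only the unlocked one bumps
lock_violations.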

Fixes: 436fe17d273b ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fec: Fix unbalanced PM runtime calls

When unbinding/removing the driver, we will run into the following warnings:

[  259.655198] fec 400d1000.ethernet: 400d1000.ethernet supply phy not found, using dummy regulator
[  259.665065] fec 400d1000.ethernet: Unbalanced pm_runtime_enable!
[  259.672770] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Invalid MAC address: 00:00:00:00:00:00
[  259.683062] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Using random MAC address: f2:3e:93:b7:29:c1
[  259.696239] libphy: fec_enet_mii_bus: probed

Avoid these warnings by balancing the runtime PM calls during fec_drv_remove().

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum updates:

   - Iron out the last late microcode loading issues by actually
     checking whether new microcode is present and preventing the CPU
     synchronization from running into a timeout-induced hang.

   - Remove Skylake C2 from the microcode blacklist according to the
     latest Intel documentation

   - Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
     not. Enhance the selftests to catch that kind of issue

   - Annotate indirect calls/jumps for objtool on 32bit. This is not a
     functional issue, but for consistency's sake it's the right thing
     to do.

   - Fix a jump label build warning observed on SPARC64, which uses 32bit
     storage for the code location that is cast to a 64bit pointer w/o
     extending it to 64bit first.

   - Add two new cpufeature bits. Not really an urgent issue, but
     provides them for both x86 and x86/kvm work. No impact on the
     current kernel"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Fix CPU synchronization routine
  x86/microcode: Attempt late loading only when new microcode is present
  x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
  jump_label: Fix sparc64 warning
  x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
  x86/vm86/32: Fix POPF emulation
  selftests/x86/entry_from_vm86: Add test cases for POPF
  selftests/x86/entry_from_vm86: Exit with 1 if we fail
  x86/cpufeatures: Add Intel PCONFIG cpufeature
  x86/cpufeatures: Add Intel Total Memory Encryption cpufeature

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
"A single fix for vmalloc_fault() which uses p*d_huge() unconditionally
  whether CONFIG_HUGETLBFS is set or not. In case of CONFIG_HUGETLBFS=n
  this results in a crash as p*d_huge() returns 0 in that case"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix vmalloc_fault to use pXd_large

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"Three fixes for irq chip drivers:

   - Make sure the allocations in the GIC-V3 ITS driver are large enough
     to accommodate the interrupt space

   - Fix a misplaced __iomem annotation which causes a splat of 26
     sparse warnings

   - Remove an unused function in the IMX GPCV2 driver which causes
     build warnings"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-imx-gpcv2: Remove unused function
  irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
  irqchip/gic-v3-its: Fix misplaced __iomem annotations

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fix from Thomas Gleixner:
"A single fix to prevent partially initialized pointers in mixed mode
  (64bit kernel on 32bit UEFI)"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub/tpm: Initialize pointer variables to zero for mixed mode