Git Repo - linux.git/log

mpls: fix uninitialized in_label var warning in mpls_getroute

Fix the below warning generated by static checker:
net/mpls/af_mpls.c:2111 mpls_getroute()
error: uninitialized symbol 'in_label'."

Fixes: 397fc9e5cefe ("mpls: route get support")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

doc: SKB_GSO_[IPIP|SIT] have been replaced

Those enum values don't exist anymore.

Fixes: 7e13318daa4a ("net: define gso types for IPx over IPv4 and IPv6")
CC: Tom Herbert <[email protected]>
Signed-off-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: avoid NETDEV_CHANGEMTU event when unregistering slave

As Hongjun/Nicolas summarized in their original patch:

"
When a device changes from one netns to another, it's first unregistered,
then the netns reference is updated and the dev is registered in the new
netns. Thus, when a slave moves to another netns, it is first
unregistered. This triggers a NETDEV_UNREGISTER event which is caught by
the bonding driver. The driver calls bond_release(), which calls
dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in
the old netns).
"

This is a very special case, because the device is being unregistered
no one should still care about the NETDEV_CHANGEMTU event triggered
at this point, we can avoid broadcasting this event on this path,
and avoid touching inetdev_event()/addrconf_notify() path.

It requires to export __dev_set_mtu() to bonding driver.

Reported-by: Hongjun Li <[email protected]>
Reported-by: Nicolas Dichtel <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'rds-tcp-sock_graft-leak'

Sowmini Varadhan says:

====================
rds-tcp: sock_graft() leak

Following up on the discussion at
https://www.spinics.net/lists/netdev/msg442859.html
- make rds_tcp_accept_one() call sock_create_lite()
- add a WARN_ON() to sock_graft()

Tested by running an infinite while() loop that does
(module-load; rds-stress; module-unload) and monitors
TCP slabinfo while the test is running.
====================

Signed-off-by: David S. Miller <[email protected]>

net/sock: add WARN_ON(parent->sk) in sock_graft()

sock_graft() unilaterally sets up parent->sk based on the
assumption that the existing parent->sk is null. If this
condition is not true, then the existing parent->sk would
be leaked, so add a WARN_ON() to alert callers who may fall
in this category.

Signed-off-by: Sowmini Varadhan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rds: tcp: use sock_create_lite() to create the accept socket

There are two problems with calling sock_create_kern() from
rds_tcp_accept_one()
1. it sets up a new_sock->sk that is wasteful, because this ->sk
is going to get replaced by inet_accept() in the subsequent ->accept()
2. The new_sock->sk is a leaked reference in sock_graft() which
expects to find a null parent->sk

Avoid these problems by calling sock_create_lite().

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'hns-fixes'

Lin Yun Sheng says:

====================
Bugfixs for hns ethernet driver

This patchset fix skb used after free and C45 op code issues
in hns driver.

Patch V2:
1. Remove ndev->feature checking in TX description patch.
2. Add Fixes: Tag in patch description.
====================

Signed-off-by: David S. Miller <[email protected]>

net: hns: Fix a skb used after free bug

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK,
which cause hns_nic_net_xmit to use a freed skb.

BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
[17659.112635]      alloc_debug_processing+0x18c/0x1a0
[17659.117208]      __slab_alloc+0x52c/0x560
[17659.120909]      kmem_cache_alloc_node+0xac/0x2c0
[17659.125309]      __alloc_skb+0x6c/0x260
[17659.128837]      tcp_send_ack+0x8c/0x280
[17659.132449]      __tcp_ack_snd_check+0x9c/0xf0
[17659.136587]      tcp_rcv_established+0x5a4/0xa70
[17659.140899]      tcp_v4_do_rcv+0x27c/0x620
[17659.144687]      tcp_prequeue_process+0x108/0x170
[17659.149085]      tcp_recvmsg+0x940/0x1020
[17659.152787]      inet_recvmsg+0x124/0x180
[17659.156488]      sock_recvmsg+0x64/0x80
[17659.160012]      SyS_recvfrom+0xd8/0x180
[17659.163626]      __sys_trace_return+0x0/0x4
[17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
[17659.174000]      free_debug_processing+0x1d4/0x2c0
[17659.178486]      __slab_free+0x240/0x390
[17659.182100]      kmem_cache_free+0x24c/0x270
[17659.186062]      kfree_skbmem+0xa0/0xb0
[17659.189587]      __kfree_skb+0x28/0x40
[17659.193025]      napi_gro_receive+0x168/0x1c0
[17659.197074]      hns_nic_rx_up_pro+0x58/0x90
[17659.201038]      hns_nic_rx_poll_one+0x518/0xbc0
[17659.205352]      hns_nic_common_poll+0x94/0x140
[17659.209576]      net_rx_action+0x458/0x5e0
[17659.213363]      __do_softirq+0x1b8/0x480
[17659.217062]      run_ksoftirqd+0x64/0x80
[17659.220679]      smpboot_thread_fn+0x224/0x310
[17659.224821]      kthread+0x150/0x170
[17659.228084]      ret_from_fork+0x10/0x40

BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
[17751.080490]      __slab_alloc+0x52c/0x560
[17751.084188]      kmem_cache_alloc+0x244/0x280
[17751.088238]      __build_skb+0x40/0x150
[17751.091764]      build_skb+0x28/0x100
[17751.095115]      __alloc_rx_skb+0x94/0x150
[17751.098900]      __napi_alloc_skb+0x34/0x90
[17751.102776]      hns_nic_rx_poll_one+0x180/0xbc0
[17751.107097]      hns_nic_common_poll+0x94/0x140
[17751.111333]      net_rx_action+0x458/0x5e0
[17751.115123]      __do_softirq+0x1b8/0x480
[17751.118823]      run_ksoftirqd+0x64/0x80
[17751.122437]      smpboot_thread_fn+0x224/0x310
[17751.126575]      kthread+0x150/0x170
[17751.129838]      ret_from_fork+0x10/0x40
[17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
[17751.139951]      free_debug_processing+0x1d4/0x2c0
[17751.144436]      __slab_free+0x240/0x390
[17751.148051]      kmem_cache_free+0x24c/0x270
[17751.152014]      kfree_skbmem+0xa0/0xb0
[17751.155543]      __kfree_skb+0x28/0x40
[17751.159022]      napi_gro_receive+0x168/0x1c0
[17751.163074]      hns_nic_rx_up_pro+0x58/0x90
[17751.167041]      hns_nic_rx_poll_one+0x518/0xbc0
[17751.171358]      hns_nic_common_poll+0x94/0x140
[17751.175585]      net_rx_action+0x458/0x5e0
[17751.179373]      __do_softirq+0x1b8/0x480
[17751.183076]      run_ksoftirqd+0x64/0x80
[17751.186691]      smpboot_thread_fn+0x224/0x310
[17751.190826]      kthread+0x150/0x170
[17751.194093]      ret_from_fork+0x10/0x40

Fixes: 13ac695e7ea1 ("net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem")
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: lipeng <[email protected]>
Reported-by: Jun He <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Fix a wrong op phy C45 code

As the user manual described, the second step to write to C45 phy
by mdio should be data, but not address. Here we should fix this
issue.

Fixes: 5b904d39406a ("net: add Hisilicon Network Subsystem MDIO support")
Signed-off-by: Yunsheng Lin <[email protected]>
Reviewed-by: lipeng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: macb: Adding Support for Jumbo Frames up to 10240 Bytes in SAMA5D3

As per the SAMA5D3 device specification it supports Jumbo frames.
But the suggested flag and length of bytes it supports was not updated
in this driver config_structure.
The maximum jumbo frames the device supports :
10240 bytes as per the device spec.

While changing the MTU value greater than 1500, it threw error:
sudo ifconfig eth1 mtu 9000
SIOCSIFMTU: Invalid argument

Add this support to driver so that it works as expected and designed.

Signed-off-by: vishnuvardhan <[email protected]>
[[email protected]: modify slightly commit msg]
Signed-off-by: Nicolas Ferre <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors

Consistently use types provided by <linux/types.h> to fix the following
linux/sched/types.h userspace compilation errors:

  /usr/include/linux/sched/types.h:57:2: error: unknown type name 'u32'
    u32 size;
  ...
  u64 sched_period;

Signed-off-by: Dmitry V. Levin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected] # v4.12
Fixes: e2d1e2aec572 ("sched/headers: Move various ABI definitions to <uapi/linux/sched/types.h>")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

kprobes: Ensure that jprobe probepoints are at function entry

Similar to commit 90ec5e89e393c ("kretprobes: Ensure probe location is
at function entry"), ensure that the jprobe probepoint is at function
entry.

Signed-off-by: Naveen N. Rao <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/a4525af6c5a42df385efa31251246cf7cca73598.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <[email protected]>

kprobes: Simplify register_jprobes()

Re-factor jprobe registration functions as the current version is
getting too unwieldy. Move the actual jprobe registration to
register_jprobe() and re-organize code accordingly.

Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/089cae4bfe73767f765291ee0e6fb0c3d240e5f1.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <[email protected]>

kprobes: Rename [arch_]function_offset_within_entry() to [arch_]kprobe_on_func_entry()

Rename function_offset_within_entry() to scope it to kprobe namespace by
using kprobe_ prefix, and to also simplify it.

Suggested-by: Ingo Molnar <[email protected]>
Suggested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/3aa6c7e2e4fb6e00f3c24fa306496a66edb558ea.1499443367.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <[email protected]>

locking/qspinlock: Explicitly include asm/prefetch.h

In architectures that use qspinlock, like x86, prefetch is loaded
indirectly via the asm/qspinlock.h include. On other architectures, like
OpenRISC, which may want to use asm-generic/qspinlock.h the built will
fail without the asm/prefetch.h include.

Fix this by including directly.

Signed-off-by: Stafford Horne <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

objtool: Fix sibling call detection logic

With some configs, objtool reports the following warning:

  arch/x86/kernel/ftrace.o: warning: objtool: ftrace_modify_code_direct()+0x2d: sibling call from callable instruction with modified stack frame

The instruction it's complaining about isn't actually a sibling call.
It's just a normal jump to an address inside the function.  Objtool
thought it was a sibling call because the instruction's jump_dest wasn't
initialized because the function was supposed to be ignored due to its
use of sync_core().

Objtool ended up validating the function instead of ignoring it because
it didn't properly recognize a sibling call to the function.  So fix the
sibling call logic.  Also add a warning to catch ignored functions being
validated so we'll get a more useful error message next time.

Reported-by: Mike Galbraith <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/96cc8ecbcdd8cb29ddd783817b4af918a6a171b0.1499437107.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull read/write fix from Al Viro:
"file_start_write()/file_end_write() got mixed into vfs_iter_write() by
  accident; that's a deadlock for all existing callers - they already do
  that, some - quite a bit outside.

  Easily fixed, fortunately"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  move file_{start,end}_write() out of do_iter_write()

Merge branch 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter hardening from Al Viro:
"This is the iov_iter/uaccess/hardening pile.

  For one thing, it trims the inline part of copy_to_user/copy_from_user
  to the minimum that *does* need to be inlined - object size checks,
  basically. For another, it sanitizes the checks for iov_iter
  primitives. There are 4 groups of checks: access_ok(), might_fault(),
  object size and KASAN.

   - access_ok() had been verified by whoever had set the iov_iter up.
     However, that has happened in a function far away, so proving that
     there's no path to actual copying bypassing those checks is hard
     and proving that iov_iter has not been buggered in the meanwhile is
     also not pleasant. So we want those redone in actual
     copyin/copyout.

   - might_fault() is better off consolidated - we know whether it needs
     to be checked as soon as we enter iov_iter primitive and observe
     the iov_iter flavour. No need to wait until the copyin/copyout. The
     call chains are short enough to make sure we won't miss anything -
     in fact, it's more robust that way, since there are cases where we
     do e.g. forced fault-in before getting to copyin/copyout. It's not
     quite what we need to check (in particular, combination of
     iovec-backed and set_fs(KERNEL_DS) is almost certainly a bug, not a
     cause to skip checks), but that's for later series. For now let's
     keep might_fault().

   - KASAN checks belong in copyin/copyout - at the same level where
     other iov_iter flavours would've hit them in memcpy().

   - object size checks should apply to *all* iov_iter flavours, not
     just iovec-backed ones.

  There are two groups of primitives - one gets the kernel object
  described as pointer + size (copy_to_iter(), etc.) while another gets
  it as page + offset + size (copy_page_to_iter(), etc.)

  For the first group the checks are best done where we actually have a
  chance to find the object size. In other words, those belong in inline
  wrappers in uio.h, before calling into iov_iter.c. Same kind as we
  have for inlined part of copy_to_user().

  For the second group there is no object to look at - offset in page is
  just a number, it bears no type information. So we do them in the
  common helper called by iov_iter.c primitives of that kind. All it
  currently does is checking that we are not trying to access outside of
  the compound page; eventually we might want to add some sanity checks
  on the page involved.

  So the things we need in copyin/copyout part of iov_iter.c do not
  quite match anything in uaccess.h (we want no zeroing, we *do* want
  access_ok() and KASAN and we want no might_fault() or object size
  checks done on that level). OTOH, these needs are simple enough to
  provide a couple of helpers (static in iov_iter.c) doing just what we
  need..."

* 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  iov_iter: saner checks on copyin/copyout
  iov_iter: sanity checks for copy to/from page primitives
  iov_iter/hardening: move object size checks to inlined part
  copy_{to,from}_user(): consolidate object size checks
  copy_{from,to}_user(): move kasan checks and might_fault() out-of-line

exec: Limit arg stack to at most 75% of _STK_LIM

To avoid pathological stack usage or the need to special-case setuid
execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
  that I have for this cycle. Some of the earlier patches in this pile
  may look trivial but they are prerequisites for later patches in the
  series.

  The aim of this set is to improve how we track and report writeback
  errors to userland. Most applications that care about data integrity
  will periodically call fsync/fdatasync/msync to ensure that their
  writes have made it to the backing store.

  For a very long time, we have tracked writeback errors using two flags
  in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
  writeback error occurs (via mapping_set_error) and are cleared as a
  side-effect of filemap_check_errors (as you noted yesterday). This
  model really sucks for userland.

  Only the first task to call fsync (or msync or fdatasync) will see the
  error. Any subsequent task calling fsync on a file will get back 0
  (unless another writeback error occurs in the interim). If I have
  several tasks writing to a file and calling fsync to ensure that their
  writes got stored, then I need to have them coordinate with one
  another. That's difficult enough, but in a world of containerized
  setups that coordination may even not be possible.

  But wait...it gets worse!

  The calls to filemap_check_errors can be buried pretty far down in the
  call stack, and there are internal callers of filemap_write_and_wait
  and the like that also end up clearing those errors. Many of those
  callers ignore the error return from that function or return it to
  userland at nonsensical times (e.g. truncate() or stat()). If I get
  back -EIO on a truncate, there is no reason to think that it was
  because some previous writeback failed, and a subsequent fsync() will
  (incorrectly) return 0.

  This pile aims to do three things:

   1) ensure that when a writeback error occurs that that error will be
      reported to userland on a subsequent fsync/fdatasync/msync call,
      regardless of what internal callers are doing

   2) report writeback errors on all file descriptions that were open at
      the time that the error occurred. This is a user-visible change,
      but I think most applications are written to assume this behavior
      anyway. Those that aren't are unlikely to be hurt by it.

   3) document what filesystems should do when there is a writeback
      error. Today, there is very little consistency between them, and a
      lot of cargo-cult copying. We need to make it very clear what
      filesystems should do in this situation.

  To achieve this, the set adds a new data type (errseq_t) and then
  builds new writeback error tracking infrastructure around that. Once
  all of that is in place, we change the filesystems to use the new
  infrastructure for reporting wb errors to userland.

  Note that this is just the initial foray into cleaning up this mess.
  There is a lot of work remaining here:

   1) convert the rest of the filesystems in a similar fashion. Once the
      initial set is in, then I think most other fs' will be fairly
      simple to convert. Hopefully most of those can in via individual
      filesystem trees.

   2) convert internal waiters on writeback to use errseq_t for
      detecting errors instead of relying on the AS_* flags. I have some
      draft patches for this for ext4, but they are not quite ready for
      prime time yet.

  This was a discussion topic this year at LSF/MM too. If you're
  interested in the gory details, LWN has some good articles about this:

      https://lwn.net/Articles/718734/
      https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty

xfs: don't crash on unexpected holes in dir/attr btrees

In quite a few places we call xfs_da_read_buf with a mappedbno that we
don't control, then assume that the function passes back either an error
code or a buffer pointer.  Unfortunately, if mappedbno == -2 and bno
maps to a hole, we get a return code of zero and a NULL buffer, which
means that we crash if we actually try to use that buffer pointer.  This
happens immediately when we set the buffer type for transaction context.

Therefore, check that we have no error code and a non-NULL bp before
trying to use bp.  This patch is a follow-up to an incomplete fix in
96a3aefb8ffde231 ("xfs: don't crash if reading a directory results in an
unexpected hole").

Signed-off-by: Darrick J. Wong <[email protected]>

Merge tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling fixes from Jeff Layton:
"The main rationale for all of these changes is to tighten up writeback
  error reporting to userland. There are many ways now that writeback
  errors can be lost, such that fsync/fdatasync/msync return 0 when
  writeback actually failed.

  This pile contains a small set of cleanups and writeback error
  handling fixes that I was able to break off from the main pile (#2).

  Two of the patches in this pile are trivial. The exceptions are the
  patch to fix up error handling in write_one_page, and the patch to
  make JFS pay attention to write_one_page errors"

* tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fs: remove call_fsync helper function
  mm: clean up error handling in write_one_page
  JFS: do not ignore return code from write_one_page()
  mm: drop "wait" parameter from write_one_page()

Merge tag 'cifs-bug-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"First set of CIFS/SMB3 fixes for the merge window. Also improves POSIX
  character mapping for SMB3"

* tag 'cifs-bug-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: fix circular locking dependency
  cifs: set oparms.create_options rather than or'ing in CREATE_OPEN_BACKUP_INTENT
  cifs: Do not modify mid entry after submitting I/O in cifs_call_async
  CIFS: add SFM mapping for 0x01-0x1F
  cifs: hide unused functions
  cifs: Use smb 2 - 3 and cifsacl mount options getacl functions
  cifs: prototype declaration and definition for smb 2 - 3 and cifsacl mount options
  CIFS: add CONFIG_CIFS_DEBUG_KEYS to dump encryption keys
  cifs: set mapping error when page writeback fails in writepage or launder_pages
  SMB3: Enable encryption for SMB3.1.1

Merge tag 'gfs2-4.13.fixes.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 fix from Bob Peterson:
"Sorry for the additional merge request, but Andreas discovered this
  problem soon after you processed our last gfs2 merge.

  This fixes a regression introduced by a patch we did in mid-2015
  (commit 88ffbf3e037e: "GFS2: Use resizable hash table for glocks"), so
  best to get it fixed. Some code was reverted that should not have
  been.

  The patch from Andreas Gruenbacher just re-adds code that had been
  there originally"

* tag 'gfs2-4.13.fixes.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix glock rhashtable rcu bug

dentry name snapshots

take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified). In either case the pointer to stable
string is stored into the same structure.

dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().

Intended use:
struct name_snapshot s;

take_dentry_name_snapshot(&s, dentry);
...
access s.name
...
release_dentry_name_snapshot(&s);

Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.

Signed-off-by: Al Viro <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security layer fixes from James Morris:
"Bugfixes for TPM and SELinux"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  IB/core: Fix static analysis warning in ib_policy_change_task
  IB/core: Fix uninitialized variable use in check_qp_port_pkey_settings
  tpm: do not suspend/resume if power stays on
  tpm: use tpm2_pcr_read() in tpm2_do_selftest()
  tpm: use tpm_buf functions in tpm2_pcr_read()
  tpm_tis: make ilb_base_addr static
  tpm: consolidate the TPM startup code
  tpm: Enable CLKRUN protocol for Braswell systems
  tpm/tpm_crb: fix priv->cmd_size initialisation
  tpm: fix a kernel memory leak in tpm-sysfs.c
  tpm: Issue a TPM2_Shutdown for TPM2 devices.
  Add "shutdown" to "struct class".

net: Update networking MAINTAINERS entry.

James and Patrick haven't been active in years.

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild thin archives updates from Masahiro Yamada:
"Thin archives migration by Nicholas Piggin.

  THIN_ARCHIVES has been available for a while as an optional feature
  only for PowerPC architecture, but we do not need two different
  intermediate-artifact schemes.

  Using thin archives instead of conventional incremental linking has
  various advantages:

   - save disk space for builds

   - speed-up building a little

   - fix some link issues (for example, allyesconfig on ARM) due to more
     flexibility for the final linking

   - work better with dead code elimination we are planning

  As discussed before, this migration has been done unconditionally so
  that any problems caused by this will show up with "git bisect".

  With testing with 0-day and linux-next, some architectures actually
  showed up problems, but they were trivial and all fixed now"

* tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  tile: remove unneeded extra-y in Makefile
  kbuild: thin archives make default for all archs
  x86/um: thin archives build fix
  tile: thin archives fix linking
  ia64: thin archives fix linking
  sh: thin archives fix linking
  kbuild: handle libs-y archives separately from built-in.o archives
  kbuild: thin archives use P option to ar
  kbuild: thin archives final link close --whole-archives option
  ia64: remove unneeded extra-y in Makefile.gate
  tile: fix dependency and .*.cmd inclusion for incremental build
  sparc64: Use indirect calls in hamming weight stubs

Merge tag 'kbuild-misc-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull misc Kbuild updates from Masahiro Yamada:

- Use more portable shebang for Perl scripts

- Remove trailing spaces from GCC version in kernel log

- Make initramfs generation deterministic

* tag 'kbuild-misc-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: create deterministic initramfs directory listings
  scripts/mkcompile_h: Remove trailing spaces from compiler version
  scripts: Switch to more portable Perl shebang

Merge tag 'kbuild-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- Clean up Makefiles and scripts

- Improve clang support

- Remove unneeded genhdr-y syntax

- Remove unneeded cc-option-align macro

- Introduce __cc-option macro and use it to fix x86 boot code compiler
   flags

* tag 'kbuild-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: improve comments on KBUILD_SRC
  x86/build: Specify stack alignment for clang
  x86/build: Use __cc-option for boot code compiler options
  kbuild: Add __cc-option macro
  kbuild: remove cc-option-align
  kbuild: replace genhdr-y with generated-y
  kbuild: clang: Disable 'address-of-packed-member' warning
  kbuild: remove duplicated arch/*/include/generated/uapi include path
  kbuild: speed up checksyscalls.sh
  kbuild: simplify silent build (-s) detection

Merge tag 'linux-kselftest-4.13-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest updates from Shuah Khan:
"This update consists of:

   - TAP13 framework and changes to some tests to convert to TAP13.
     Converting kselftest output to standard format will help identify
     run to run differences and pin point failures easily. TAP13 format
     has been in use for several years and the output is human friendly.

     Please find the specification:
       https://testanything.org/tap-version-13-specification.html

     Credit goes to Tim Bird for recommending TAP13 as a suitable
     format, and to Grag KH for kick starting the work with help from
     Paul Elder and Alice Ferrazzi

     The first phase of the TAp13 conversion is included in this update.
     Future updates will include updates to rest of the tests.

   - Masami Hiramatsu fixed ftrace to run on 4.9 stable kernels.

   - Kselftest documnetation has been converted to ReST format. Document
     now has a new home under Documentation/dev-tools.

   - kselftest_harness.h is now available for general use as a result of
     Mickaël Salaün's work.

   - Several fixes to skip and/or fail tests gracefully on older
     releases"

* tag 'linux-kselftest-4.13-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (48 commits)
  selftests: membarrier: use ksft_* var arg msg api
  selftests: breakpoints: breakpoint_test_arm64: convert test to use TAP13
  selftests: breakpoints: step_after_suspend_test use ksft_* var arg msg api
  selftests: breakpoint_test: use ksft_* var arg msg api
  kselftest: add ksft_print_msg() function to output general information
  kselftest: make ksft_* output functions variadic
  selftests/capabilities: Fix the test_execve test
  selftests: intel_pstate: add .gitignore
  selftests: fix memory-hotplug test
  selftests: add missing test name in memory-hotplug test
  selftests: check percentage range for memory-hotplug test
  selftests: check hot-pluggagble memory for memory-hotplug test
  selftests: typo correction for memory-hotplug test
  selftests: ftrace: Use md5sum to take less time of checking logs
  tools/testing/selftests/sysctl: Add pre-check to the value of writes_strict
  kselftest.rst: do some adjustments after ReST conversion
  selftest/net/Makefile: Specify output with $(OUTPUT)
  selftest/intel_pstate/aperf: Use LDLIBS instead of LDFLAGS
  selftest/memfd/Makefile: Fix build error
  selftests: lib: Skip tests on missing test modules
  ...

Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
"Openrisc fixes for this 4.13 merge window, there is not really much
  here:

   - include cleanups, one with should reduce build time slightly

   - switch to new toolchain to new (>2 year old) toolchain prefix"

* tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
  openrisc: defconfig: Cleanup from old Kconfig options
  openrisc: explicitly include linux/bug.h in asm/fixmap.h
  openrisc: Switch to use export.h instead of module.h
  openrisc: Change toolchain from or32- to or1k-

Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
"Highlights include:

   - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.

   - Platform support for FSP2 (476fpe) board

   - Enable ZONE_DEVICE on 64-bit server CPUs.

   - Generic & powerpc spin loop primitives to optimise busy waiting

   - Convert VDSO update function to use new update_vsyscall() interface

   - Optimisations to hypercall/syscall/context-switch paths

   - Improvements to the CPU idle code on Power8 and Power9.

  As well as many other fixes and improvements.

  Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
  Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
  Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
  Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
  Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
  Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
  Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
  Bauermann, Yang Li"

* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
  powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
  powerpc/mm/hash: Implement mark_rodata_ro() for hash
  powerpc/vmlinux.lds: Align __init_begin to 16M
  powerpc/lib/code-patching: Use alternate map for patch_instruction()
  powerpc/xmon: Add patch_instruction() support for xmon
  powerpc/kprobes/optprobes: Use patch_instruction()
  powerpc/kprobes: Move kprobes over to patch_instruction()
  powerpc/mm/radix: Fix execute permissions for interrupt_vectors
  powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
  powerpc/64s: Blacklist rtas entry/exit from kprobes
  powerpc/64s: Blacklist functions invoked on a trap
  powerpc/64s: Un-blacklist system_call() from kprobes
  powerpc/64s: Move system_call() symbol to just after setting MSR_EE
  powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
  powerpc/64s: Convert .L__replay_interrupt_return to a local label
  powerpc64/elfv1: Only dereference function descriptor for non-text symbols
  cxl: Export library to support IBM XSL
  powerpc/dts: Use #include "..." to include local DT
  powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
  ...

vfs: fix flock compat thinko

Michael Ellerman reported that commit 8c6657cb50cb ("Switch flock
copyin/copyout primitives to copy_{from,to}_user()") broke his
networking on a bunch of PPC machines (64-bit kernel, 32-bit userspace).

The reason is a brown-paper bug by that commit, which had the arguments
to "copy_flock_fields()" in the wrong order, breaking the compat
handling for file locking. Apparently very few people run 32-bit user
space on x86 any more, so the PPC people got the honor of noticing this
"feature".

Michael also sent a minimal diff that just changed the order of the
arguments in that macro.

This is not that minimal diff.

This not only changes the order of the arguments in the macro, it also
changes them to be pointers (to be consistent with all the other uses of
those pointers), and makes the functions that do all of this also have
the proper "const" attribution on the source pointers in order to make
issues like that (using the source as a destination) be really obvious.

Reported-by: Michael Ellerman <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'usb-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some remaining USB fixes for 4.13-rc1. They were originally
  scheduled for 4.12-final, but I didn't send them to you in time.
  Because of that, they were in a separate branch from the larger USB
  set of patches, so here they are in a separate pull request.

  Nothing major here a all, just three small patches:

   - some usb-serial new device ids
   - xhci bugfix for some crazy AMD hardware

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'usb-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: Limit USB2 port wake support for AMD Promontory hosts
  USB: serial: qcserial: new Sierra Wireless EM7305 device ID
  USB: serial: option: add two Longcheer device ids

Merge tag 'backlight-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
"Core Framework:
   - Report correct error status to user

  Fix-ups:
   - Move Backlight headers out of I2C (adp8860, adp8870)"

* tag 'backlight-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  video: adp8870: move header file out of I2C realm
  backlight: adp8860: Move header file out of I2C realm
  backlight: Report error on failure

Merge (most of) tag 'mfd-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
"New Drivers:
   - Intel Cherry Trail Whiskey Cove PMIC
   - TI LP87565 PMIC

  New Device Support:
   - Add support for Cannonlake to intel-lpss-pci
   - Add support for Simatic IOT2000 to intel_quark_i2c_gpio

  New Functionality:
   - Add Regulator support (axp20x)

  Fix-ups:
   - Rework IRQ handling (intel_soc_pmic_bxtwc, rtsx_pcr, cros_ec)
   - Remove unused/unwelcome code (ipaq-micro, wm831x-core, da9062-core)
   - Provide deregistration on unbind (rn5t618)
   - Rework DT code/documentation (arizona)
   - Constify things (fsl-imx25-tsadc)
   - MAINTAINERS updates (DA9062/61)
   - Kconfig configuration adaptions (INTEL_SOC_PMIC, MFD_AXP20X_I2C)
   - Switch to DMI matching (intel_quark_i2c_gpio)
   - Provide an appropriate level of error checking (wm831x-{i2c,spi},
     twl4030-irq, tc6393xb)
   - Make use of devm_* (resource handling) calls (intel_soc_pmic_bxtwc,
     stm32-timers, atmel-flexcom, cros_ec, fsl-imx25-tsadc,
     exynos-lpass, palmas, qcom-spmi-pmic, smsc-ece1099,
     motorola-cpcap)"

[ Skipped the last commit in that series that added eight thousand
  lines of pointless repeated register definitions.  - Linus ]

* tag 'mfd-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (38 commits)
  mfd: Add LP87565 PMIC support
  mfd: cros_ec: Free IRQ on exit
  dt-bindings: vendor-prefixes: Add arctic to vendor prefix
  mfd: da9061: Fix to remove BBAT_CONT register from chip model
  mfd: da9061: Fix to remove BBAT_CONT register from chip model
  mfd: axp20x-i2c: Document that this must be builtin on x86
  mfd: Add Cherry Trail Whiskey Cove PMIC driver
  mfd: tc6393xb: Handle return value of clk_prepare_enable
  mfd: intel_quark_i2c_gpio: Add support for SIMATIC IOT2000 platform
  mfd: intel_quark_i2c_gpio: Use dmi_system_id table for retrieving frequency
  mfd: motorola-cpcap: Use devm_of_platform_populate()
  mfd: smsc-ece: Use devm_of_platform_populate()
  mfd: qcom-spmi-pmic: Use devm_of_platform_populate()
  mfd: palmas: Use devm_of_platform_populate()
  mfd: exynos: Use devm_of_platform_populate()
  mfd: fsl-imx25: Use devm_of_platform_populate()
  mfd: cros_ec: Use devm_of_platform_populate()
  mfd: atmel: Use devm_of_platform_populate()
  mfd: stm32-timers: Use devm_of_platform_populate()
  mfd: intel_soc_pmic: Select designware i2c-bus driver
  ...

Merge tag 'gpio-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.13 series.

  Some administrativa:

  I have a slew of 8250 serial patches and the new IOT2040 serial+GPIO
  driver coming in through this tree, along with a whole bunch of Exar
  8250 fixes. These are ACKed by Greg and also hit drivers/platform/*
  where they are ACKed by Andy Shevchenko.

  Speaking about drivers/platform/* there is also a bunch of ACPI stuff
  coming through that route, again ACKed by Andy.

  The MCP23S08 changes are coming in here as well. You already have the
  commits in your tree, so this is just a result of sharing an immutable
  branch between pin control and GPIO.

  Core:
   - Export add/remove for lookup tables so that modules can export GPIO
     descriptor tables.
   - Handle GPIO sleep states: it is now possible to flag that a GPIO
     line may loose its state during suspend/resume of the system to
     save power. This is used in the Wolfson Micro Arizona driver.
   - ACPI-based GPIO was tightened up a lot around the edges.
   - Use bitmap_fill() to speed up a loop.

  New drivers:
   - Exar XRA1403 SPI-based GPIO.
   - MVEBU driver now supports Armada 7K and 8K.
   - LP87565 PMIC GPIO.
   - Renesas R-CAR R8A7743 (RZ/G1M).
   - The new IOT2040 8250 serial/GPIO also comes in through this
     changeset.

  Substantial driver changes:
   - Seriously fix the Exar 8250 GPIO portions to work.
   - The MCP23S08 was moved out to a pin control driver.
   - Convert MEVEBU to use regmap for register access.
   - Drop Vulcan support from the Broadcom driver.
   - Serious cleanup and improvement of the mockup driver, giving us a
     better test coverage.

  Misc:
   - Lots of janitorial clean up.
   - A bunch of documentation fixes"

* tag 'gpio-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (70 commits)
  serial: exar: Add support for IOT2040 device
  gpio-exar/8250-exar: Make set of exported GPIOs configurable
  platform: Accept const properties
  serial: exar: Factor out platform hooks
  gpio-exar/8250-exar: Rearrange gpiochip parenthood
  gpio: exar: Fix iomap request
  gpio-exar/8250-exar: Do not even instantiate a GPIO device for Commtech cards
  serial: uapi: Add support for bus termination
  gpio: rcar: Add R8A7743 (RZ/G1M) support
  gpio: gpio-wcove: Fix GPIO control register offset calculation
  gpio: lp87565: Add support for GPIO
  gpio: dwapb: fix missing first irq for edgeboth irq type
  MAINTAINERS: Take maintainership for GPIO ACPI support
  gpio: exar: Fix reading of directions and values
  gpio: exar: Allocate resources on behalf of the platform device
  gpio-exar/8250-exar: Fix passing in of parent PCI device
  gpio: mockup: use devm_kcalloc() where applicable
  gpio: mockup: add myself as author
  gpio: mockup: improve the error message
  gpio: mockup: don't return magic numbers from probe()
  ...

openrisc: defconfig: Cleanup from old Kconfig options

Remove old, dead Kconfig option INET_LRO. It is gone since
commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library").

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Stafford Horne <[email protected]>

openrisc: explicitly include linux/bug.h in asm/fixmap.h

openrisc's asm/fixmap.h uses the BUG() and BUG_ON() macros but relies on
implict inclusion of linux/bug.h which means that changes in other
headers could break the build. Thus, add an explicit include.

Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: Stafford Horne <[email protected]>

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
"This time we've got one core change to introduce a bulk clk_get API,
  some new clk drivers and updates for old ones. The diff is pretty
  spread out across a handful of different SoC clk drivers for Broadcom,
  TI, Qualcomm, Renesas, Rockchip, Samsung, and Allwinner, mostly due to
  the introduction of new drivers.

  Core:
   - New clk bulk get APIs
   - Clk divider APIs gained the ability to consider a different parent
     than the current one

  New Drivers:
   - Renesas r8a779{0,1,2,4} CPG/MSSR
   - TI Keystone SCI firmware controlled clks and OMAP4 clkctrl
   - Qualcomm IPQ8074 SoCs
   - Cortina Systems Gemini (SL3516/CS3516)
   - Rockchip rk3128 SoCs
   - Allwinner A83T clk control units
   - Broadcom Stingray SoCs
   - CPU clks for Mediatek MT8173/MT2701/MT7623 SoCs

  Removed Drivers:
   - Old non-DT version of the Realview clk driver

  Updates:
   - Renesas Kconfig/Makefile cleanups
   - Amlogic CEC EE clk support
   - Improved Armada 7K/8K cp110 clk support
   - Rockchip clk id exposing, critical clk markings
   - Samsung converted to clk_hw registration APIs
   - Fixes for Samsung exynos5420 audio clks
   - USB2 clks for Hisilicon hi3798cv200 SoC and video/camera clks for
     hi3660"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (147 commits)
  clk: gemini: Read status before using the value
  clk: scpi: error when clock fails to register
  clk: at91: Add sama5d2 suspend/resume
  gpio: dt-bindings: Add documentation for gpio controllers on Armada 7K/8K
  clk: keystone: TI_SCI_PROTOCOL is needed for clk driver
  clk: samsung: audss: Fix silent hang on Exynos4412 due to disabled EPLL
  clk: uniphier: provide NAND controller clock rate
  clk: hisilicon: add usb2 clocks for hi3798cv200 SoC
  clk: Add Gemini SoC clock controller
  clk: iproc: Remove __init marking on iproc_pll_clk_setup()
  clk: bcm: Add clocks for Stingray SOC
  dt-bindings: clk: Extend binding doc for Stingray SOC
  clk: mediatek: export cpu multiplexer clock for MT8173 SoCs
  clk: mediatek: export cpu multiplexer clock for MT2701/MT7623 SoCs
  clk: mediatek: add missing cpu mux causing Mediatek cpufreq can't work
  clk: renesas: cpg-mssr: Use of_device_get_match_data() helper
  clk: hi6220: add acpu clock
  clk: zx296718: export I2S mux clocks
  clk: imx7d: create clocks behind rawnand clock gate
  clk: hi3660: Set PPLL2 to 2880M
  ...

lightnvm: pblk: remove unnecessary checks

Remove unnecessary checks when freeing dma memory in the completion
path.

Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

lightnvm: pblk: control I/O flow also on tear down

When removing a pblk instance, control the write I/O flow to the
controller as we do in the fast path.

Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

virtio-net: fix leaking of ctx array

Fixes: commit d45b897b11ea ("virtio_net: allow specifying context for rx")
Signed-off-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'pci/host-tango' into next

* pci/host-tango:
PCI: tango: Add Sigma Designs Tango SMP8759 PCIe host bridge support
PCI: Add DT binding for Sigma Designs Tango PCIe controller

Conflicts:
drivers/pci/host/Kconfig
drivers/pci/host/Makefile

PCI: tango: Add Sigma Designs Tango SMP8759 PCIe host bridge support

This driver is required to work around several hardware bugs in the PCIe
controller.

The SMP8759 does not support legacy interrupts or IO space.

Signed-off-by: Marc Gonzalez <[email protected]>
[bhelgaas: add CONFIG_BROKEN dependency, various cleanups]
Signed-off-by: Bjorn Helgaas <[email protected]>

gfs2: Fix glock rhashtable rcu bug

Before commit 88ffbf3e03 "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu. This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe. Bring back the original
code for freeing glocks via call_rcu.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Bob Peterson <[email protected]>
Cc: [email protected] # 4.3+

f2fs: avoid deadlock caused by lock order of page and lock_op

- punch_hole
- fill_zero
  - f2fs_lock_op
  - get_new_data_page
   - lock_page

- f2fs_write_data_pages
- lock_page
- do_write_data_page
  - f2fs_lock_op

Signed-off-by: Jaegeuk Kim <[email protected]>

Merge tag 'devicetree-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:

- vsprintf format specifier %pOF for device_node's. This will enable us
   to stop storing the full node names. Conversion of users will happen
   next cycle.

- Update documentation to point to DT specification instead of ePAPR.

- Split out graph and property functions to a separate file.

- New of-graph functions for ALSA

- Add vendor prefixes for RISC-V, Linksys, iWave Systems, Roofull,
   Itead, and BananaPi.

- Improve dtx_diff utility filename printing.

* tag 'devicetree-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (32 commits)
  of: document /sys/firmware/fdt
  dt-bindings: Add RISC-V vendor prefix
  vsprintf: Add %p extension "%pOF" for device tree
  of: find_node_by_full_name rewrite to compare each level
  of: use kbasename instead of open coding
  dt-bindings: thermal: add file extension to brcm,ns-thermal
  of: update ePAPR references to point to Devicetree Specification
  scripts/dtc: dtx_diff - Show real file names in diff header
  of: detect invalid phandle in overlay
  of: be consistent in form of file mode
  of: make __of_attach_node() static
  of: address.c header comment typo
  of: fdt.c header comment typo
  of: make of_fdt_is_compatible() static
  dt-bindings: display-timing.txt convert non-ascii characters to ascii
  Documentation: remove overlay-notes reference to non-existent file
  dt-bindings: usb: exynos-usb: Add missing required VDD properties
  dt-bindings: Add vendor prefix for Linksys
  MAINTAINERS: add device tree ABI documentation file
  of: Add vendor prefix for iWave Systems Technologies Pvt. Ltd
  ...

f2fs: use spin_{,un}lock_irq{save,restore}

generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

irq event stamp: 83391573
loop: Write error at byte offset 438729728, length 1024.
hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
loop: Write error at byte offset 438860288, length 1536.
softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
loop: Write error at byte offset 438990848, length 2048.
================================
WARNING: inconsistent lock state
4.12.0-rc2+ #30 Tainted: G           O
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
{HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
irq event stamp: 1957420
hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

stack backtrace:
CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ #30
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
EIP: _raw_spin_unlock_irq+0x2d/0x50
EFLAGS: 00000296 CPU: 2
EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
EIP: 0xb7786c61
EFLAGS: 00000293 CPU: 2
EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1d1879 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: [email protected]
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: relax migratepage for atomic written page

In order to avoid lock contention for atomic written pages, we'd better give
EBUSY in f2fs_migrate_page when mode is asynchronous. We expect it will be
released soon as transaction commits.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: don't count inode block in in-memory inode.i_blocks

Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.

This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

Revert "f2fs: fix to clean previous mount option when remount_fs"

Don't clear old mount option before parse new option during ->remount_fs
like other generic filesystems.

This reverts commit 26666c8a4366debae30ae37d0688b2bec92d196a.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: do not set LOST_PINO for renamed dir

After renaming a directory, fsck could detect unmatched pino. The scenario
can be reproduced as the following:

$ mkdir /bar/subbar /foo
$ rename /bar/subbar /foo

Then fsck will report:
[ASSERT] (__chk_dots_dentries:1182) --> Bad inode number[0x3] for '..', parent parent ino is [0x4]

Rename sets LOST_PINO for old_inode. However, the flag cannot be cleared,
since dir is written back with CP. So, let's get rid of LOST_PINO for a
renamed dir and fix the pino directly at the end of rename.

Signed-off-by: Sheng Yong <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: do not set LOST_PINO for newly created dir

Since directories will be written back with checkpoint and fsync a
directory will always write CP, there is no need to set LOST_PINO
after creating a directory.

Signed-off-by: Sheng Yong <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: skip ->writepages for {mete,node}_inode during recovery

Skip ->writepages in prior to ->writepage for {meta,node}_inode during
recovery, hence unneeded loop in ->writepages can be avoided.

Moreover, check SBI_POR_DOING earlier while writebacking pages.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce __check_sit_bitmap

After we introduce discard thread, discard command can be issued
concurrently with data allocating, this patch adds new function to
heck sit bitmap to ensure that userdata was invalid in which on-going
discard command covered.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: stop gc/discard thread in prior during umount

This patch resolves kernel panic for xfstests/081, caused by recent f2fs_bug_on

f2fs: add f2fs_bug_on in __remove_discard_cmd

For fixing, we will stop gc/discard thread in prior in ->kill_sb in order to
avoid referring and releasing race among them.

Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: introduce reserved_blocks in sysfs

In this patch, we add a new sysfs interface, with it, we can control
number of reserved blocks in system which could not be used by user,
it enable f2fs to let user to configure for adjusting over-provision
ratio dynamically instead of changing it by mkfs.

So we can expect it will help to reserve more free space for relieving
GC in both filesystem and flash device.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: avoid redundant f2fs_flush after remount

create_flush_cmd_control will create redundant issue_flush_thread after each
remount with flush_merge option.

Signed-off-by: Yunlong Song <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: report # of free inodes more precisely

If the partition is small, we don't need to report total # of inodes including
hidden free nodes.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration

Pull mailbox updates from Jassi Brar:

- Minor improvement : avoid requiring unnecessary startup/shutdown
   callback that many drivers seem to not need

- New controller driver for Qualcomm's APCS IPC

* 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: Introduce Qualcomm APCS IPC driver
  dt-bindings: mailbox: Introduce Qualcomm APCS global binding
  mailbox: Make startup and shutdown ops optional

Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
"libnvdimm updates for the latest ACPI and UEFI specifications. This
  pull request also includes new 'struct dax_operations' enabling to
  undo the abuse of copy_user_nocache() for copy operations to pmem.

  The dax work originally missed 4.12 to address concerns raised by Al.

  Summary:

   - Introduce the _flushcache() family of memory copy helpers and use
     them for persistent memory write operations on x86. The
     _flushcache() semantic indicates that the cache is either bypassed
     for the copy operation (movnt) or any lines dirtied by the copy
     operation are written back (clwb, clflushopt, or clflush).

   - Extend dax_operations with ->copy_from_iter() and ->flush()
     operations. These operations and other infrastructure updates allow
     all persistent memory specific dax functionality to be pushed into
     libnvdimm and the pmem driver directly. It also allows dax-specific
     sysfs attributes to be linked to a host device, for example:
     /sys/block/pmem0/dax/write_cache

   - Add support for the new NVDIMM platform/firmware mechanisms
     introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
     namespace label format, extensions to the address-range-scrub
     command set, new error injection commands, and a new BTT
     (block-translation-table) layout. These updates support inter-OS
     and pre-OS compatibility.

   - Fix a longstanding memory corruption bug in nfit_test.

   - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
     capable.

   - Miscellaneous fixes and small updates across libnvdimm and the nfit
     driver.

  Acknowledgements that came after the branch was pushed: commit
  6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
  <[email protected]>"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
  libnvdimm, namespace: record 'lbasize' for pmem namespaces
  acpi/nfit: Issue Start ARS to retrieve existing records
  libnvdimm: New ACPI 6.2 DSM functions
  acpi, nfit: Show bus_dsm_mask in sysfs
  libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
  acpi, nfit: Enable DSM pass thru for root functions.
  libnvdimm: passthru functions clear to send
  libnvdimm, btt: convert some info messages to warn/err
  libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
  libnvdimm: fix the clear-error check in nsio_rw_bytes
  libnvdimm, btt: fix btt_rw_page not returning errors
  acpi, nfit: quiet invalid block-aperture-region warnings
  libnvdimm, btt: BTT updates for UEFI 2.7 format
  acpi, nfit: constify *_attribute_group
  libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
  libnvdimm, pmem, dax: export a cache control attribute
  dax: convert to bitmask for flags
  dax: remove default copy_from_iter fallback
  libnvdimm, nfit: enable support for volatile ranges
  libnvdimm, pmem: fix persistence warning
  ...

xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN

XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days. Unfortunately, the constant establishing
this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.

The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.

Since this is an on-disk format constraint, put the define in the XFS
namespace and move everything over to use the new name.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS

All four SERVER_LUMINOUS feature bits are implemented, switch it on!

NEW_OSDOP_ENCODING doesn't mean much for the client (it signifies
support for MOSDOp v6) but needs to be enabled in order to get the
latest (currently v25) pg_pool_t.

Signed-off-by: Ilya Dryomov <[email protected]>
Acked-by: Sage Weil <[email protected]>

libceph: osd_state is 32 bits wide in luminous

Signed-off-by: Ilya Dryomov <[email protected]>

crush: remove an obsolete comment

Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50.

Signed-off-by: Ilya Dryomov <[email protected]>

crush: crush_init_workspace starts with struct crush_work

It is not just a pointer to crush_work, it is the whole structure.
That is not a problem since it only contains a pointer. But it will
be a problem if new data members are added to crush_work.

Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()

If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
dbe36e08be00c6519a8c89718dd47b0219c20516.

Signed-off-by: Ilya Dryomov <[email protected]>

crush: implement weight and id overrides for straw2

bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: apply_upmap()

Previously, pg_to_raw_osds() didn't filter for existent OSDs because
raw_to_up_osds() would filter for "up" ("up" is predicated on "exists")
and raw_to_up_osds() was called directly after pg_to_raw_osds(). Now,
with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds()
output can affect apply_upmap(). Introduce remove_nonexistent_osds()
to deal with that.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: compute actual pgid in ceph_pg_to_up_acting_osds()

Move raw_pg_to_pg() call out of get_temp_osds() and into
ceph_pg_to_up_acting_osds(), for upcoming apply_upmap().

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: pg_upmap[_items] infrastructure

pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)

Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: ceph_decode_skip_* helpers

Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.

Replace skip_name_map() with ceph_decode_skip_map as an example.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: kill __{insert,lookup,remove}_pg_mapping()

Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping().

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: introduce and switch to decode_pg_mapping()

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: don't pass pgid by value

Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping
counterparts: take const struct ceph_pg *.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: respect RADOS_BACKOFF backoffs

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: make DEFINE_RB_* helpers more general

Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id,
compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare()
respectively.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: avoid unnecessary pi lookups in calc_target()

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: use target pi for calc_target() calculations

For luminous and beyond we are encoding the actual spgid, which
requires operating with the correct pg_num, i.e. that of the target
pool.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: always populate t->target_{oid,oloc} in calc_target()

need_check_tiering logic doesn't make a whole lot of sense. Drop it
and apply tiering unconditionally on every calc_target() call instead.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: make sure need_resend targets reflect latest map

Otherwise we may miss events like PG splits, pool deletions, etc when
we get multiple incremental maps at once. Because check_pool_dne() can
now be fed an unlinked request, finish_request() needed to be taught to
handle unlinked requests.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: delete from need_resend_linger before check_linger_pool_dne()

When processing a map update consisting of multiple incrementals, we
may end up running check_linger_pool_dne() on a lingering request that
was previously added to need_resend_linger list.  If it is concluded
that the target pool doesn't exist, the request is killed off while
still on need_resend_linger list, which leads to a crash on a NULL
lreq->osd in kick_requests():

    libceph: linger_id 18446462598732840961 pool does not exist
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: ceph_osdc_handle_map+0x4ae/0x870

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: resend on PG splits if OSD has RESEND_ON_SPLIT

Note that ceph_osd_request_target fields are updated regardless of
RESEND_ON_SPLIT.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: drop need_resend from calc_target()

Replace it with more fine-grained bools to separate updating
ceph_osd_request_target fields and the decision to resend.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: MOSDOp v8 encoding (actual spgid + full hash)

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: ceph_connection_operations::reencode_message() method

Give upper layers a chance to reencode the message after the connection
is negotiated and ->peer_features is set. OSD client will use this to
support both luminous and pre-luminous OSDs (in a single cluster): the
former need MOSDOp v8; the latter will continue to be sent MOSDOp v4.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: encode_{pgid,oloc}() helpers

Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: introduce ceph_spg, ceph_pg_to_primary_shard()

Store both raw pgid and actual spgid in ceph_osd_request_target.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: new pi->last_force_request_resend

The old (v15) pi->last_force_request_resend has been repurposed to
make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do
obey pi->last_force_request_resend resend on splits. See ceph.git
commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend
ops on split").

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: fold [l]req->last_force_resend into ceph_osd_request_target

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: support SERVER_JEWEL feature bits

Only MON_STATEFUL_SUB, really. MON_ROUTE_OSDMAP and
OSDSUBOP_NO_SNAPCONTEXT are irrelevant.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: advertise support for OSD_POOLRESEND

The code has been in place since commit 63244fa123a7 ("libceph:
introduce ceph_osd_request_target, calc_target()"), and, with the
ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now
in working order.

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: handle non-empty dest in ceph_{oloc,oid}_copy()

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: new features macros

Signed-off-by: Ilya Dryomov <[email protected]>

libceph: remove ceph_sanitize_features() workaround

Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a.

Signed-off-by: Ilya Dryomov <[email protected]>

ceph: update ceph_dentry_info::lease_session when necessary

Current code does not update ceph_dentry_info::lease_session once
it is set. If auth mds of corresponding dentry changes, dentry lease
keeps in an invalid state.

Signed-off-by: "Yan, Zheng" <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: new mount option that specifies fscache uniquifier

Current ceph uses FSID as primary index key of fscache data. This
allows ceph to retain cached data across remount. But this causes
problem (kernel opps, fscache does not support sharing data) when
a filesystem get mounted several times (with fscache enabled, with
different mount options).

The fix is adding a new mount option, which specifies uniquifier
for fscache.

Signed-off-by: "Yan, Zheng" <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: avoid accessing freeing inode in ceph_check_delayed_caps()

Signed-off-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>