Git Repo - linux.git/log

rxrpc: Express protocol timeouts in terms of RTT

Express protocol timeouts for data retransmission and deferred ack
generation in terms on RTT rather than specified timeouts once we have
sufficient RTT samples.

For the moment, this requires just one RTT sample to be able to use this
for ack deferral and two for data retransmission.

The data retransmission timeout is set at RTT*1.5 and the ACK deferral
timeout is set at RTT.

Note that the calculated timeout is limited to a minimum of 4ns to make
sure it doesn't happen too quickly.

Signed-off-by: David Howells <[email protected]>

rxrpc: Don't transmit DELAY ACKs immediately on proposal

Don't transmit a DELAY ACK immediately on proposal when the Rx window is
rotated, but rather defer it to the work function. This means that we have
a chance to queue/consume more received packets before we actually send the
DELAY ACK, or even cancel it entirely, thereby reducing the number of
packets transmitted.

We do, however, want to continue sending other types of packet immediately,
particularly REQUESTED ACKs, as they may be used for RTT calculation by the
other side.

Signed-off-by: David Howells <[email protected]>

rxrpc: Fix call timeouts

Fix the rxrpc call expiration timeouts and make them settable from
userspace.  By analogy with other rx implementations, there should be three
timeouts:

(1) "Normal timeout"

     This is set for all calls and is triggered if we haven't received any
     packets from the peer in a while.  It is measured from the last time
     we received any packet on that call.  This is not reset by any
     connection packets (such as CHALLENGE/RESPONSE packets).

     If a service operation takes a long time, the server should generate
     PING ACKs at a duration that's substantially less than the normal
     timeout so is to keep both sides alive.  This is set at 1/6 of normal
     timeout.

(2) "Idle timeout"

     This is set only for a service call and is triggered if we stop
     receiving the DATA packets that comprise the request data.  It is
     measured from the last time we received a DATA packet.

(3) "Hard timeout"

     This can be set for a call and specified the maximum lifetime of that
     call.  It should not be specified by default.  Some operations (such
     as volume transfer) take a long time.

Allow userspace to set/change the timeouts on a call with sendmsg, using a
control message:

RXRPC_SET_CALL_TIMEOUTS

The data to the message is a number of 32-bit words, not all of which need
be given:

u32 hard_timeout; /* sec from first packet */
u32 idle_timeout; /* msec from packet Rx */
u32 normal_timeout; /* msec from data Rx */

This can be set in combination with any other sendmsg() that affects a
call.

Signed-off-by: David Howells <[email protected]>

rxrpc: Split the call params from the operation params

When rxrpc_sendmsg() parses the control message buffer, it places the
parameters extracted into a structure, but lumps together call parameters
(such as user call ID) with operation parameters (such as whether to send
data, send an abort or accept a call).

Split the call parameters out into their own structure, a copy of which is
then embedded in the operation parameters struct.

The call parameters struct is then passed down into the places that need it
instead of passing the individual parameters. This allows for extra call
parameters to be added.

Signed-off-by: David Howells <[email protected]>

rxrpc: Delay terminal ACK transmission on a client call

Delay terminal ACK transmission on a client call by deferring it to the
connection processor. This allows it to be skipped if we can send the next
call instead, the first DATA packet of which will implicitly ack this call.

Signed-off-by: David Howells <[email protected]>

rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls

Provide a different lockdep key for rxrpc_call::user_mutex when the call is
made on a kernel socket, such as by the AFS filesystem.

The problem is that lockdep registers a false positive between userspace
calling the sendmsg syscall on a user socket where call->user_mutex is held
whilst userspace memory is accessed whereas the AFS filesystem may perform
operations with mmap_sem held by the caller.

In such a case, the following warning is produced.

======================================================
WARNING: possible circular locking dependency detected
4.14.0-fscache+ #243 Tainted: G            E
------------------------------------------------------
modpost/16701 is trying to acquire lock:
(&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]

but task is already holding lock:
(&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++}:
       __might_fault+0x61/0x89
       _copy_from_iter_full+0x40/0x1fa
       rxrpc_send_data+0x8dc/0xff3
       rxrpc_do_sendmsg+0x62f/0x6a1
       rxrpc_sendmsg+0x166/0x1b7
       sock_sendmsg+0x2d/0x39
       ___sys_sendmsg+0x1ad/0x22b
       __sys_sendmsg+0x41/0x62
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

-> #2 (&call->user_mutex){+.+.}:
       __mutex_lock+0x86/0x7d2
       rxrpc_new_client_call+0x378/0x80e
       rxrpc_kernel_begin_call+0xf3/0x154
       afs_make_call+0x195/0x454 [kafs]
       afs_vl_get_capabilities+0x193/0x198 [kafs]
       afs_vl_lookup_vldb+0x5f/0x151 [kafs]
       afs_create_volume+0x2e/0x2f4 [kafs]
       afs_mount+0x56a/0x8d7 [kafs]
       mount_fs+0x6a/0x109
       vfs_kern_mount+0x67/0x135
       do_mount+0x90b/0xb57
       SyS_mount+0x72/0x98
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

-> #1 (k-sk_lock-AF_RXRPC){+.+.}:
       lock_sock_nested+0x74/0x8a
       rxrpc_kernel_begin_call+0x8a/0x154
       afs_make_call+0x195/0x454 [kafs]
       afs_fs_get_capabilities+0x17a/0x17f [kafs]
       afs_probe_fileserver+0xf7/0x2f0 [kafs]
       afs_select_fileserver+0x83f/0x903 [kafs]
       afs_fetch_status+0x89/0x11d [kafs]
       afs_iget+0x16f/0x4f8 [kafs]
       afs_mount+0x6c6/0x8d7 [kafs]
       mount_fs+0x6a/0x109
       vfs_kern_mount+0x67/0x135
       do_mount+0x90b/0xb57
       SyS_mount+0x72/0x98
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

-> #0 (&vnode->io_lock){+.+.}:
       lock_acquire+0x174/0x19f
       __mutex_lock+0x86/0x7d2
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
       __clear_user+0x3d/0x60
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

other info that might help us debug this:

Chain exists of:
  &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&call->user_mutex);
                               lock(&mm->mmap_sem);
  lock(&vnode->io_lock);

*** DEADLOCK ***

1 lock held by modpost/16701:
#0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486

stack backtrace:
CPU: 0 PID: 16701 Comm: modpost Tainted: G            E   4.14.0-fscache+ #243
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
dump_stack+0x67/0x8e
print_circular_bug+0x341/0x34f
check_prev_add+0x11f/0x5d4
? add_lock_to_list.isra.12+0x8b/0x8b
? add_lock_to_list.isra.12+0x8b/0x8b
? __lock_acquire+0xf77/0x10b4
__lock_acquire+0xf77/0x10b4
lock_acquire+0x174/0x19f
? afs_begin_vnode_operation+0x33/0x77 [kafs]
__mutex_lock+0x86/0x7d2
? afs_begin_vnode_operation+0x33/0x77 [kafs]
? afs_begin_vnode_operation+0x33/0x77 [kafs]
? afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_fetch_data+0x80/0x12a [kafs]
afs_readpages+0x314/0x405 [kafs]
__do_page_cache_readahead+0x203/0x2ba
? filemap_fault+0x179/0x54d
filemap_fault+0x179/0x54d
__do_fault+0x17/0x60
__handle_mm_fault+0x6d7/0x95c
handle_mm_fault+0x24e/0x2a3
__do_page_fault+0x301/0x486
do_page_fault+0x236/0x259
page_fault+0x22/0x30
RIP: 0010:__clear_user+0x3d/0x60
RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
padzero+0x1c/0x2b
load_elf_binary+0x785/0xdc7
search_binary_handler+0x81/0x1ff
do_execveat_common.isra.14+0x600/0x888
do_execve+0x1f/0x21
SyS_execve+0x28/0x2f
do_syscall_64+0x89/0x1be
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fdb6009ee07
RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000

Signed-off-by: David Howells <[email protected]>

rxrpc: Don't set upgrade by default in sendmsg()

Don't set upgrade by default when creating a call from sendmsg(). This is
a holdover from when I was testing the code.

Signed-off-by: David Howells <[email protected]>

rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing

The caller of rxrpc_accept_call() must release the lock on call->user_mutex
returned by that function.

Signed-off-by: David Howells <[email protected]>

s390: fix alloc_pgste check in init_new_context again

git commit badb8bb983e9 "fix alloc_pgste check in init_new_context" fixed
the problem of 'current->mm == NULL' in init_new_context back in 2011.

git commit 3eabaee998c7 "KVM: s390: allow sie enablement for multi-
threaded programs" completely removed the check against alloc_pgste.

git commit 23fefe119ceb "s390/kvm: avoid global config of vm.alloc_pgste=1"
re-added a check against the alloc_pgste flag but without the required
check for current->mm != NULL.

For execve() called by a kernel thread init_new_context() reads from
((struct mm_struct *) NULL)->context.alloc_pgste to decide between
2K vs 4K page tables. If the bit happens to be set for the init process
it will be created with large page tables. This decision is inherited by
all the children of init, this waste quite some memory.

Re-add the check for 'current->mm != NULL'.

Fixes: 23fefe119ceb ("s390/kvm: avoid global config of vm.alloc_pgste=1")
Signed-off-by: Martin Schwidefsky <[email protected]>

s390/disassembler: correct disassembly lines alignment

176.718956 Krnl Code: 00000000004d38b0: a54c0018        llihh   %r4,24
176.718956    00000000004d38b4: b9080014        agr     %r1,%r4
           ^
Using a tab to align disassembly lines which follow the first line with
"Krnl Code: " doesn't always work, e.g. if there is a prefix (timestamp
or syslog prefix) which is not 8 chars aligned. Go back to alignment
with spaces.

Fixes: b192571d1ae3 ("s390/disassembler: increase show_code buffer size")
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

sched/debug: Fix task state recording/printout

The recent conversion of the task state recording to use task_state_index()
broke the sched_switch tracepoint task state output.

task_state_index() returns surprisingly an index (0-7) which is then
printed with __print_flags() applying bitmasks. Not really working and
resulting in weird states like 'prev_state=t' instead of 'prev_state=I'.

Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a
bitmask from the return value of task_state_index() and store it in
entry->prev_state, which makes __print_flags() work as expected.

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos
Signed-off-by: Ingo Molnar <[email protected]>

x86/decoder: Add new TEST instruction pattern

The kbuild test robot reported this build warning:

  Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c

  Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
  Warning: objdump says 3 bytes, but insn_get_length() says 2
  Warning: decoded and checked 1569014 instructions with 1 warnings

This sequence seems to be a new instruction not in the opcode map in the Intel SDM.

The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"

In that table, opcodes listed by the index REG bits as:

  000         001       010 011  100        101        110         111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX

So, it seems TEST Ib is assigned to 001.

Add the new pattern.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho.

2) bpf offload bug fixes from Jakub Kicinski.

3) Fix bpf verifier to NOP out code which is dead at run time because
    due to branch pruning the verifier will not explore such
    instructions. From Alexei Starovoitov.

4) Fix crash when deleting secondary chains in packet scheduler
    classifier. From Roman Kapl.

5) Fix buffer management bugs in smc, from Ursula Braun.

6) Fix regression in anycast route handling, from David Ahern.

7) Fix link settings regression in r8169, from Tobias Jakobi.

8) Add back enough UFO support so that live migration still works, from
    Willem de Bruijn.

9) Linearize enough packet data for the full extent to which the ipvlan
    code will inspect the packet headers, from Gao Feng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  ipvlan: Fix insufficient skb linear check for ipv6 icmp
  ipvlan: Fix insufficient skb linear check for arp
  geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
  net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
  net: accept UFO datagrams from tuntap and packet
  net: realtek: r8169: implement set_link_ksettings()
  net: ipv6: Fixup device for anycast routes during copy
  net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
  net/smc: use sk_rcvbuf as start for rmb creation
  ipv6: Do not consider linkdown nexthops during multipath
  net: sched: fix crash when deleting secondary chains
  net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
  bpf: fix branch pruning logic
  bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO
  bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO
  bpf: remove explicit handling of 0 for arg2 in bpf_probe_read
  bpf: introduce ARG_PTR_TO_MEM_OR_NULL
  i40evf: Use smp_rmb rather than read_barrier_depends
  fm10k: Use smp_rmb rather than read_barrier_depends
  igb: Use smp_rmb rather than read_barrier_depends
  ...

Merge tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
"Fix two issues resulting from the dell-smbios refactoring and
  introduction of the dell-smbios-wmi dispatcher.

  The first ensures a proper error code is returned when kzalloc fails.

  The second avoids an issue in older Dell BIOS implementations which
  would fail if the more complex calls were made by limiting those
  platforms to the simple calls such as those used by the existing
  dell-laptop and dell-wmi drivers, preserving their functionality prior
  to the addition of the dell-smbios-wmi dispatcher"

* tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: dell-laptop: fix error return code in dell_init()
  platform/x86: dell-smbios-wmi: Disable userspace interface if missing hotfix

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two basic fixes: one for the sparse problem with the blacklist flags
  and another for a hang forever in bnx2i"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: Use 'blist_flags_t' for scsi_devinfo flags
  scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort

Merge tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"All commits found here are small fixes for regression or stable:

   - PCM timestamp behavior fix that could be seen as a regression

   - Remove spurious WARN_ON() from ALSA timer 32bit compat ioctl

   - HD-audio HDMI/DP channel mapping fix for 32bit archs

   - Fix the previous fix for HD-audio initialization code

   - More hardening USB-audio against malicious USB descriptors

   - HD-audio quirks/fixes (Realtek codec, AMD controller)

   - Missing help text for the recent Intel SST kconfig change"

* tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: Add Raven PCI ID
  ALSA: hda/realtek - Fix ALC700 family no sound issue
  ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization
  ALSA: usb-audio: Add sanity checks in v2 clock parsers
  ALSA: usb-audio: Fix potential zero-division at parsing FU
  ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
  ALSA: usb-audio: Add sanity checks to FE parser
  ALSA: timer: Remove kernel warning at compat ioctl error paths
  ALSA: pcm: update tstamp only if audio_tstamp changed
  ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon
  ALSA: hda: Fix too short HDMI/DP chmap reporting
  ALSA: usb-audio: uac1: Invalidate ctl on interrupt
  ALSA: hda/realtek - Fix ALC275 no sound issue
  ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL

Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux

Pull more drm updates from Dave Airlie:
"Fixes/cleanups for rc1, non-desktop flags for VR

   - remove the MSM dt-bindings file Rob managed to push in the previous
     pull.

   - add a property/edid quirk to denote HMD devices, I had these
     hanging around for a few weeks and Keith had done some work on
     them, they are fairly self contained and small, and only affect
     people using HTC Vive VR headsets so far.

   - amdgpu, tegra, tilcdc, fsl fixes

   - some imx-drm cleanups I missed, these seemed pretty small, and no
     reason to hold off.

  I have one TTM regression fix (fixes bochs-vga in qemu) sitting
  locally awaiting review I'll probably send that in a separate pull
  request tomorrow"

* tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits)
  dt-bindings: remove file that was added accidentally
  drm/edid: quirk HTC vive headset as non-desktop. [v2]
  drm/fb: add support for not enabling fbcon on non-desktop displays [v2]
  drm: add connector info/property for non-desktop displays [v2]
  drm/amdgpu: fix rmmod KCQ disable failed error
  drm/amdgpu: fix kernel hang when starting VNC server
  drm/amdgpu: don't skip attributes when powerplay is enabled
  drm/amd/pp: fix typecast error in powerplay.
  drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support
  drm/tegra: sor: Reimplement pad clock
  Revert "drm/radeon: dont switch vt on suspend"
  drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence
  drm/amd/powerplay: fix unfreeze level smc message for smu7
  drm/amdgpu:fix memleak
  drm/amdgpu:fix memleak in takedown
  drm/amd/pp: fix dpm randomly failed on Vega10
  drm/amdgpu: set f_mapping on exported DMA-bufs
  drm/amdgpu: Properly allocate VM invalidate eng v2
  drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
  drm/fsl-dcu: avoid disabling pixel clock twice on suspend
  ...

Merge tag 'docs-4.15-2' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"A few late-arriving docs updates that have no real reason to wait.

  There's a new "Co-Developed-by" tag described by Greg, and a build
  enhancement from Willy to generate docs warnings during a kernel build
  (but only when additional warnings have been requested in general)"

* tag 'docs-4.15-2' of git://git.lwn.net/linux:
  Add optional check for bad kernel-doc comments
  Documentation: fix profile= options in kernel-parameters.txt
  documentation/svga.txt: update outdated file
  kokr/memory-barriers.txt: Fix typo in paring example
  kokr/memory-barriers/txt: Replace uses of "transitive"
  Documentation/process: add Co-Developed-by: tag for patches with multiple authors

Merge branch 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull keys update from James Morris:
"There's nothing too controversial here:

   - Doc fix for keyctl_read().

   - time_t -> time64_t replacement.

   - Set the module licence on things to prevent tainting"

* 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  pkcs7: Set the module licence to prevent tainting
  security: keys: Replace time_t with time64_t for struct key_preparsed_payload
  security: keys: Replace time_t/timespec with time64_t
  KEYS: fix in-kernel documentation for keyctl_read()

Merge tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
"No features this time, just minor cleanups and bug fixes.

  Cleanups:
   - fix spelling mistake: "resoure" -> "resource"
   - remove unused redundant variable stop
   - Fix bool initialization/comparison

  Bug Fixes:
   - initialized returned struct aa_perms
   - fix leak of null profile name if profile allocation fails
   - ensure that undecidable profile attachments fail
   - fix profile attachment for special unconfined profiles
   - fix locking when creating a new complain profile.
   - fix possible recursive lock warning in __aa_create_ns"

* tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: fix possible recursive lock warning in __aa_create_ns
  apparmor: fix locking when creating a new complain profile.
  apparmor: fix profile attachment for special unconfined profiles
  apparmor: ensure that undecidable profile attachments fail
  apparmor: fix leak of null profile name if profile allocation fails
  apparmor: remove unused redundant variable stop
  apparmor: Fix bool initialization/comparison
  apparmor: initialized returned struct aa_perms
  apparmor: fix spelling mistake: "resoure" -> "resource"

powerpc/kexec: Fix kexec/kdump in P9 guest kernels

The code that cleans up the IAMR/AMOR before kexec'ing failed to
remember that when we're running as a guest AMOR is not writable, it's
hypervisor privileged.

They symptom is that the kexec stops before entering purgatory and
nothing else is seen on the console. If you examine the state of the
system all threads will be in the 0x700 program check handler.

Fix it by making the write to AMOR dependent on HV mode.

Fixes: 1e2a516e89fc ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR")
Cc: [email protected] # v4.10+
Reported-by: Yilin Zhang <[email protected]>
Debugged-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Tested-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

crypto: af_alg - remove locking in async callback

The code paths protected by the socket-lock do not use or modify the
socket in a non-atomic fashion. The actions pertaining the socket do not
even need to be handled as an atomic operation. Thus, the socket-lock
can be safely ignored.

This fixes a bug regarding scheduling in atomic as the callback function
may be invoked in interrupt context.

In addition, the sock_hold is moved before the AIO encrypt/decrypt
operation to ensure that the socket is always present. This avoids a
tiny race window where the socket is unprotected and yet used by the AIO
operation.

Finally, the release of resources for a crypto operation is moved into a
common function of af_alg_free_resources.

Cc: <[email protected]>
Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
Reported-by: Romain Izard <[email protected]>
Signed-off-by: Stephan Mueller <[email protected]>
Tested-by: Romain Izard <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

crypto: algif_aead - skip SGL entries with NULL page

The TX SGL may contain SGL entries that are assigned a NULL page. This
may happen if a multi-stage AIO operation is performed where the data
for each stage is pointed to by one SGL entry. Upon completion of that
stage, af_alg_pull_tsgl will assign NULL to the SGL entry.

The NULL cipher used to copy the AAD from TX SGL to the destination
buffer, however, cannot handle the case where the SGL starts with an SGL
entry having a NULL page. Thus, the code needs to advance the start
pointer into the SGL to the first non-NULL entry.

This fixes a crash visible on Intel x86 32 bit using the libkcapi test
suite.

Cc: <[email protected]>
Fixes: 72548b093ee38 ("crypto: algif_aead - copy AAD from src to dst")
Signed-off-by: Stephan Mueller <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

blk-wbt: fix comments typo

Signed-off-by: weiping zhang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-wbt: move wbt_clear_stat to common place in wbt_done

wbt_done call wbt_clear_stat no matter current stat was tracked
or not, move it to common place.

Signed-off-by: weiping zhang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-sysfs: remove NULL pointer checking in queue_wb_lat_store

wbt_init doesn't set q->rq_wb to NULL, if wbt_init return 0,
so check return value is enough, remove NULL checking.

Signed-off-by: weiping zhang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-wbt: remove duplicated setting in wbt_init

rwb->wc and rwb->queue_depth were overwritten by wbt_set_write_cache and
wbt_set_queue_depth, remove the default setting.

Signed-off-by: weiping zhang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'nvme-4.15' of git://git.infradead.org/nvme into for-linus

Pull NVMe fixes from Christoph:

"A couple nvme fixes for 4.15:

- expand the queue ready fix that we only had for RDMA to also cover FC and
loop by moving it to common code (Sagi)
- fix an array out of bounds in the PCIe HMB code (Minwoo Im)
- two new device quirks (Jeff Lien and Kai-Heng Feng)
- static checkers fixes (Keith Busch)
- FC target refcount fix (James Smart)
- A trivial spelling fix in new code (Colin Ian King)"

Merge tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

4.15 merge window fixes 1

* tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc:
drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
drm/vc4: Account for interrupts in flight

Merge tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

drm/i915 fixes for v4.15

* tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915: Fix init_clock_gating for resume
  drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
  drm/i915: Clear breadcrumb node when cancelling signaling
  drm/i915/gvt: ensure -ve return value is handled correctly
  drm/i915: Re-register PMIC bus access notifier on runtime resume
  drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2

Merge tag 'drm-misc-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Fix crtc_id in page_flip event.

* tag 'drm-misc-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-misc:
drm/vblank: Pass crtc_id to page_flip_ioctl.

drm/ttm: don't attempt to use hugepages if dma32 requested (v2)

The commit below introduced thp support for ttm allocations, however it didn't
take into account the case where dma32 was requested. Some drivers always request
dma32, and the bochs driver is one of those.

This fixes an oops:

[   30.108507] ------------[ cut here ]------------
[   30.108920] kernel BUG at ./include/linux/gfp.h:408!
[   30.109356] invalid opcode: 0000 [#1] SMP
[   30.109700] Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp mrp stp llc virtio_net
[   30.115605]  virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[   30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted 4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[   30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   30.118866] task: ffff923a77e03380 task.stack: ffffa78182228000
[   30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[   30.119810] RSP: 0000:ffffa7818222bba8 EFLAGS: 00010202
[   30.120250] RAX: 0000000000000001 RBX: 00000000014382c6 RCX: 0000000000000006
[   30.120840] RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000000000
[   30.121443] RBP: ffff923a760d6000 R08: 0000000000000000 R09: 0000000000000006
[   30.122039] R10: 0000000000000040 R11: 0000000000000300 R12: ffff923a729273c0
[   30.122629] R13: 0000000000000000 R14: 0000000000000000 R15: ffff923a7483d400
[   30.123223] FS:  00007fe48da7dac0(0000) GS:ffff923a7cc00000(0000) knlGS:0000000000000000
[   30.123896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.124373] CR2: 00007fe457b73000 CR3: 0000000078313000 CR4: 00000000000006f0
[   30.124968] Call Trace:
[   30.125186]  ttm_pool_populate+0x19b/0x400 [ttm]
[   30.125578]  ttm_bo_vm_fault+0x325/0x570 [ttm]
[   30.125964]  __do_fault+0x19/0x11e
[   30.126255]  __handle_mm_fault+0xcd3/0x1260
[   30.126609]  handle_mm_fault+0x14c/0x310
[   30.126947]  __do_page_fault+0x28c/0x530
[   30.127282]  do_page_fault+0x32/0x270
[   30.127593]  async_page_fault+0x22/0x30
[   30.127922] RIP: 0033:0x7fe48aae39a8
[   30.128225] RSP: 002b:00007ffc21c4d928 EFLAGS: 00010206
[   30.128664] RAX: 00007fe457b73000 RBX: 000055cd4c1041a0 RCX: 00007fe457b73040
[   30.129259] RDX: 0000000000300000 RSI: 0000000000000000 RDI: 00007fe457b73000
[   30.129855] RBP: 0000000000000300 R08: 000000000000000c R09: 0000000100000000
[   30.130457] R10: 0000000000000001 R11: 0000000000000246 R12: 000055cd4c1041a0
[   30.131054] R13: 000055cd4bdfe990 R14: 000055cd4c104110 R15: 0000000000000400
[   30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff <0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[   30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: ffffa7818222bba8
[   30.133836] ---[ end trace d4f1deb60784f40a ]---

v2: handle free path as well.

Reported-by: Laura Abbott <[email protected]>
Reported-by: Adam Williamson <[email protected]>
Fixes: 0284f1ead87463bc17cf5e81a24fc65c052486f3 (drm/ttm: add transparent huge page support for cached allocations v2)
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

Merge tag 'keys-next-20171123' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next-keys

Merge keys subsystem changes from David Howells, for v4.15.

x86/PCI: Remove unused HyperTransport interrupt support

There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left. Remove the unused entry point and all the
supporting code.

See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt
support").

Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com

x86/umip: Fix insn_get_code_seg_params()'s return value

In order to save on redundant structs definitions
insn_get_code_seg_params() was made to return two 4-bit values in a char
but clang complains:

  arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char'
  changes value from 132 to -124 [-Wconstant-conversion]
                  return INSN_CODE_SEG_PARAMS(4, 8);
                  ~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
  ./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS'
  #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))

Those two values do get picked apart afterwards the opposite way of how
they were ORed so wrt to the LSByte, the return value is the same.

But this function returns -EINVAL in the error case, which is an int. So
make it return an int which is the native word size anyway and thus fix
the clang warning.

Reported-by: Kees Cook <[email protected]>
Reported-by: Nick Desaulniers <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

x86/boot/KASLR: Remove unused variable

There are two variables "rc" in mem_avoid_memmap. One at the top of the
function and another one inside the while() loop. Drop the outer one as it
is unused. Cleanup some whitespace damage while at it.

Signed-off-by: Chao Fan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

genirq/matrix: Make - vs ?: Precedence explicit

Noticed with a Clang build. This improves the readability of the ?:
expression, as it has lower precedence than the - expression. Show
explicitly that - is evaluated first.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/20171122205645.GA27125@beast

irqchip/imgpdc: Use resource_size function on resource object

drivers/irqchip/irq-imgpdc.c:327:20-23: WARNING: Suspicious code.
resource_size is maybe missing with res_regs

Generated by: scripts/coccinelle/api/resource_size.cocci

Signed-off-by: Vasyl Gomonovych <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

irqchip/qcom: Fix u32 comparison with value less than zero

The comparison of u32 nregs being less than zero is never true since
nregs is unsigned. Fix this by making nregs a signed integer.

Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: [email protected]
Cc: Jason Cooper <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge branch 'ipvlan-Fix-insufficient-skb-linear-check'

Gao Feng says:

====================
ipvlan: Fix insufficient skb linear check

The current ipvlan codes use pskb_may_pull to get the skb linear header in
func ipvlan_get_L3_hdr, but the size isn't enough for arp and ipv6 icmp.
So it may access the unexpected momory in ipvlan_addr_lookup.
====================

Signed-off-by: David S. Miller <[email protected]>

ipvlan: Fix insufficient skb linear check for ipv6 icmp

In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to
make sure the skb header has enough linear room for ipv6 header. But it
would use the latter memory directly without linear check when it is icmp.
So it still may access the unepxected memory in ipvlan_addr_lookup.

Now invoke the pskb_may_pull again if it is ipv6 icmp.

Signed-off-by: Gao Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipvlan: Fix insufficient skb linear check for arp

In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to
make sure the skb header has enough linear room for arp header. But it
would access the arp payload in func ipvlan_addr_lookup. So it still may
access the unepxected memory.

Now use arp_hdr_len(port->dev) instead of the arp header as the param.

Signed-off-by: Gao Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6

Stefano pointed that configure or show UDP_ZERO_CSUM6_RX/TX info doesn't
make sense if we haven't enabled CONFIG_IPV6. Fix it by adding
if IS_ENABLED(CONFIG_IPV6) check.

Fixes: abe492b4f50c ("geneve: UDP checksum configuration via netlink")
Fixes: fd7eafd02121 ("geneve: fix fill_info when link down")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-for-davem-2017-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.15

First set of fixes for 4.15. Most important here is the iwlwifi fix
for scan command firmware interface change.

ath10k

* fix CCMP-256, GCMP and GCMP-256 in raw mode, it was never working

wcn36xx

* fix device tree node search

iwlwifi

* fix a regression with firmware API change of scan cmd (introduced in
firmware version 34)

* add a bunch of PCI IDs and fix configuration structs for A000 devices

* fix the exported firmware name strings for 9000 and A000 devices
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Fixes 2017-11-21

This series contains fixes for igb/vf, ixgbe/vf, i40e/vf and fm10k.

Jake fixes a regression issue with older firmware, where we were using
the NVM lock to synchronize NVM reads for all devices and firmware
versions, yet this caused issues with older firmware prior to version
1.5.  Fixed this by only grabbing the lock for newer devices and firmware
version 1.5 or newer.

Zijie Pan fixes the calculation of the i40e VF MAC addresses, where it was
possible to increment to the next MAC entry without calling
i40e_add_mac_filter().

Amritha removes the upper limit of 64 queues on a channel VSI since the
upper bound is determined by the VSI's num_queue_pairs.

Filip fixes an issue during FLR resets, where should have been checking
for upcoming core reset and if so, just return with I40E_ERR_NOT_READY.

Alan fixes the notifying clients of l2 parameters by copying the
parameters to the client instance struct and re-organizes the priority
in which the client tasks fire so that if the flag for notifying l2
params is set, it will trigger before the client open task.  Also fixed
the promiscuous settings after reset for all the VSI's.

Brian King from IBM fixes an issue seen on Power systems which would
result in skb list corruption and eventual kernel oops.  Brian
provides the same fix for nearly all our drivers, to replace the
read_barrier_depends with smp_rmb() to ensure loads are ordered with
respect to the load of tx_buffer->next_to_watch.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY

The PHY on BCM7278 has an additional bit that needs to be cleared:
IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out
of suspend/resume cycles.

Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2017-11-23

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Several BPF offloading fixes, from Jakub. Among others:

    - Limit offload to cls_bpf and XDP program types only.
    - Move device validation into the driver and don't make
      any assumptions about the device in the classifier due
      to shared blocks semantics.
    - Don't pass offloaded XDP program into the driver when
      it should be run in native XDP instead. Offloaded ones
      are not JITed for the host in such cases.
    - Don't destroy device offload state when moved to
      another namespace.
    - Revert dumping offload info into user space for now,
      since ifindex alone is not sufficient. This will be
      redone properly for bpf-next tree.

2) Fix test_verifier to avoid using bpf_probe_write_user()
   helper in test cases, since it's dumping a warning into
   kernel log which may confuse users when only running tests.
   Switch to use bpf_trace_printk() instead, from Yonghong.

3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
   before it becomes uabi, from Gianluca. More specifically:

    - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
      by bpf_csum_diff(), where the argument is either a
      valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
      then enforces a valid pointer in case of non-0 size
      or a valid pointer or NULL in case of size 0. Given
      that, the semantics for ARG_PTR_TO_MEM in combination
      with ARG_CONST_SIZE_OR_ZERO are now such that in case
      of size 0, the pointer must always be valid and cannot
      be NULL. This fix in semantics allows for bpf_probe_read()
      to drop the recently added size == 0 check in the helper
      that would become part of uabi otherwise once released.
      At the same time we can then fix bpf_probe_read_str() and
      bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
      instead of ARG_CONST_SIZE in order to fix recently
      reported issues by Arnaldo et al, where LLVM optimizes
      two boundary checks into a single one for unknown
      variables where the verifier looses track of the variable
      bounds and thus rejects valid programs otherwise.

4) A fix for the verifier for the case when it detects
   comparison of two constants where the branch is guaranteed
   to not be taken at runtime. Verifier will rightfully prune
   the exploration of such paths, but we still pass the program
   to JITs, where they would complain about using reserved
   fields, etc. Track such dead instructions and sanitize
   them with mov r0,r0. Rejection is not possible since LLVM
   may generate them for valid C code and doesn't do as much
   data flow analysis as verifier. For bpf-next we might
   implement removal of such dead code and adjust branches
   instead. Fix from Alexei.
====================

Signed-off-by: David S. Miller <[email protected]>

net: accept UFO datagrams from tuntap and packet

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -p -u 8000 &
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 < payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: realtek: r8169: implement set_link_ksettings()

Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially
implemented the new ethtool API, by replacing get_settings()
with get_link_ksettings(). This breaks ethtool, since the
userspace tool (according to the new API specs) never tries
the legacy set() call, when the new get() call succeeds.

All attempts to chance some setting from userspace result in:
> Cannot set new settings: Operation not supported

Implement the missing set() call.

Signed-off-by: Tobias Jakobi <[email protected]>
Tested-by: Holger Hoffstätte <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipv6: Fixup device for anycast routes during copy

Florian reported a breakage with anycast routes due to commit
4832c30d5458 ("net: ipv6: put host and anycast routes on device with
address"). Prior to this commit anycast routes were added against the
loopback device causing repetitive route entries with no insight into
why they existed. e.g.:
  $ ip -6 ro ls  table local type anycast
  anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
  anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
  anycast fe80:: dev lo proto kernel metric 0 pref medium
  anycast fe80:: dev lo proto kernel metric 0 pref medium

The point of commit 4832c30d5458 is to add the routes using the device
with the address which is causing the route to be added. e.g.,:
  $ ip -6 ro ls  table local type anycast
  anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
  anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
  anycast fe80:: dev eth2 proto kernel metric 0 pref medium
  anycast fe80:: dev eth1 proto kernel metric 0 pref medium

For traffic to work as it did before, the dst device needs to be switched
to the loopback when the copy is created similar to local routes.

Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'smc-fixes-for-smc-buffer-handling'

Ursula Braun says:

====================
net/smc: fixes for smc buffer handling

here are 2 cleanup patches for smc buffer handling.
====================

Signed-off-by: David S. Miller <[email protected]>

net/smc: Fix preinitialization of buf_desc in __smc_buf_create()

With gcc-4.1.2:

    net/smc/smc_core.c: In function ‘__smc_buf_create’:
    net/smc/smc_core.c:567: warning: ‘bufsize’ may be used uninitialized in this function

Indeed, if the for-loop is never executed, bufsize is used
uninitialized.  In addition, buf_desc is stored for later use, while it
is still a NULL pointer.

Before, error handling was done by checking if buf_desc is non-NULL.
The cleanup changed this to an error check, but forgot to update the
preinitialization of buf_desc to an error pointer.

Update the preinitializatin of buf_desc to fix this.

Fixes: b33982c3a6838d13 ("net/smc: cleanup function __smc_buf_create()")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: use sk_rcvbuf as start for rmb creation

Commit 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
merged handling of SMC receive and send buffers. It introduced sk_buf_size
as merged start value for size determination. But since sk_buf_size is not
used at all, sk_sndbuf is erroneously used as start for rmb creation.
This patch makes sure, sk_buf_size is really used as intended, and
sk_rcvbuf is used as start value for rmb creation.

Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: Ursula Braun <[email protected]>
Reviewed-by: Hans Wippel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Do not consider linkdown nexthops during multipath

When the 'ignore_routes_with_linkdown' sysctl is set, we should not
consider linkdown nexthops during route lookup.

While the code correctly verifies that the initially selected route
('match') has a carrier, it does not perform the same check in the
subsequent multipath selection, resulting in a potential packet loss.

In case the chosen route does not have a carrier and the sysctl is set,
choose the initially selected route.

Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: David Ahern <[email protected]>
Acked-by: Andy Gospodarek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: fix crash when deleting secondary chains

If you flush (delete) a filter chain other than chain 0 (such as when
deleting the device), the kernel may run into a use-after-free. The
chain refcount must not be decremented unless we are sure we are done
with the chain.

To reproduce the bug, run:
    ip link add dtest type dummy
    tc qdisc add dev dtest ingress
    tc filter add dev dtest chain 1  parent ffff: flower
    ip link del dtest

Introduced in: commit f93e1cdcf42c ("net/sched: fix filter flushing"),
but unless you have KAsan or luck, you won't notice it until
commit 0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()")

Fixes: f93e1cdcf42c ("net/sched: fix filter flushing")
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Roman Kapl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE

This change resolves a new compile-time warning
when built as a loadable module:

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/phy/cortina.o
see include/linux/module.h for more information

This adds the license as "GPL", which matches the header of the file.

MODULE_DESCRIPTION and MODULE_AUTHOR are also added.

Signed-off-by: Jesse Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-linus-timers-conversion-final-v4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into timers/urgent

Pull the last batch of manual timer conversions from Kees Cook:

- final batch of "non trivial" timer conversions (multi-tree dependencies,
   things Coccinelle couldn't handle, etc).

- treewide conversions via Coccinelle, in 4 steps:
   - DEFINE_TIMER() functions converted to struct timer_list * argument
   - init_timer() -> setup_timer()
   - setup_timer() -> timer_setup()
   - setup_timer() -> timer_setup() (with a single embedded structure)

- deprecated timer API removals (init_timer(), setup_*timer())

- finalization of new API (remove global casts)

kbuild: drop $(extra-y) from real-objs-y

$(real-objs-y) in only used in scripts/Makefile.build to form
"targets", but $(extra-y) is added to "targets" in another line.
We do not need to add $(extra-y) twice.

Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: clean up *.i and *.lst patterns by make clean

*.i and *.lst are supported by the single target build. Clean up them.

Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: rpm: prompt to use "rpm-pkg" if "rpm" target is used

The "rpm" has been kept for backward compatibility since pre-git era.
I am planning to remove it after the Linux 4.18 release.  Annouce the
end of the support, prompting to use "rpm-pkg" instead.

If you use "rpm", it will work like "rpm-pkg", but warning messages
will be displayed as follows:

  WARNING: "rpm" target will be removed after Linux 4.18
           Please use "rpm-pkg" instead.

Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: pkg: use --transform option to prefix paths in tar

For rpm-pkg and deb-pkg, a source tar file is created.  All paths in
the archive must be prefixed with the base name of the tar so that
everything is contained in the directory when you extract it.

Currently, scripts/package/Makefile uses a symlink for that, and
removes it after the tar is created.

If you terminate the build during the tar creation, the symlink is
left over.  Then, at the next package build, you will see a warning
like follows:

  ln: '.' and 'kernel-4.14.0+/.' are the same file

It is possible to fix it by adding -n (--no-dereference) option to
the "ln" command, but a cleaner way is to use --transform option
of "tar" command.  This option is GNU extension, but it should not
hurt to use it in the Linux build system.

The 'S' flag is needed to exclude symlinks from the path fixup.
Without it, symlinks in the kernel are broken.

Signed-off-by: Masahiro Yamada <[email protected]>

coccinelle: fix parallel build with CHECK=scripts/coccicheck

The command "make -j8 C=1 CHECK=scripts/coccicheck" produces
lots of "coccicheck failed" error messages.

Julia Lawall explained the Coccinelle behavior as follows:
"The problem on the Coccinelle side is that it uses a subdirectory
with the name of the semantic patch to store standard output and
standard error for the different threads.  I didn't want to use a
name with the pid, so that one could easily find this information
while Coccinelle is running.  Normally the subdirectory is cleaned
up when Coccinelle completes, so there is only one of them at a time.
Maybe it is best to just add the pid.  There is the risk that these
subdirectories will accumulate if Coccinelle crashes in a way such
that they don't get cleaned up, but Coccinelle could print a warning
if it detects this case, rather than failing."

When scripts/coccicheck is used as CHECK tool and -j option is given
to Make, the whole of build process runs in parallel.  So, multiple
processes try to get access to the same subdirectory.

I notice spatch creates the subdirectory only when it runs in parallel
(i.e. --jobs <N> is given and <N> is greater than 1).

Setting NPROC=1 is a reasonable solution; spatch does not create the
subdirectory.  Besides, ONLINE=1 mode takes a single file input for
each spatch invocation, so there is no reason to parallelize it in
the first place.

Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Julia Lawall <[email protected]>

kconfig/symbol.c: use correct pointer type argument for sizeof

sym_arr is of type struct symbol **.
So in malloc we need sizeof(struct symbol *).

The problem was indicated by coccinelle.

Signed-off-by: Heinrich Schuchardt <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

mmc: sdhci-msm: Optionally wait for signal level changes

Not all instances of the SDCC core supports changing signal voltage and
as such will not generate a power interrupt when the software attempts
to change the voltage. This results in probing the eMMC on some devices
to take over 2 minutes.

Check that the SWITCHABLE_SIGNALING_VOLTAGE bit in MCI_GENERICS is set
before waiting for the power interrupt.

Cc: Sahitya Tummala <[email protected]>
Cc: Vijay Viswanath <[email protected]>
Fixes: c0309b3803fe ("mmc: sdhci-msm: Add sdhci msm register write APIs which wait for pwr irq")
Signed-off-by: Bjorn Andersson <[email protected]>
Tested-by: Luca Weiss <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

mmc: block: Ensure that debugfs files are removed

The card is not necessarily being removed, but the debugfs files must be
removed when the driver is removed, otherwise they will continue to exist
after unbinding the card from the driver. e.g.

  # echo "mmc1:0001" > /sys/bus/mmc/drivers/mmcblk/unbind
  # cat /sys/kernel/debug/mmc1/mmc1\:0001/ext_csd
  [  173.634584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
  [  173.643356] IP: mmc_ext_csd_open+0x5e/0x170

A complication is that the debugfs_root may have already been removed, so
check for that too.

Fixes: 627c3ccfb46a ("mmc: debugfs: Move block debugfs into block module")
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Cc: [email protected] # 4.14+
Signed-off-by: Ulf Hansson <[email protected]>

mmc: core: Do not leave the block driver in a suspended state

The block driver must be resumed if the mmc bus fails to suspend the card.

Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Cc: [email protected] # v3.19+
Signed-off-by: Ulf Hansson <[email protected]>

mmc: block: Check return value of blk_get_request()

blk_get_request() can fail, always check the return value.

Fixes: 0493f6fe5bde ("mmc: block: Move boot partition locking into a driver op")
Fixes: 3ecd8cf23f88 ("mmc: block: move multi-ioctl() to use block layer")
Fixes: 614f0388f580 ("mmc: block: move single ioctl() commands to block requests")
Fixes: 627c3ccfb46a ("mmc: debugfs: Move block debugfs into block module")
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Cc: [email protected] # 4.13+
Signed-off-by: Ulf Hansson <[email protected]>

mmc: block: Fix missing blk_put_request()

Ensure blk_get_request() is paired with blk_put_request().

Fixes: 0493f6fe5bde ("mmc: block: Move boot partition locking into a driver op")
Fixes: 627c3ccfb46a ("mmc: debugfs: Move block debugfs into block module")
Signed-off-by: Adrian Hunter <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Cc: [email protected] # 4.13+
Signed-off-by: Ulf Hansson <[email protected]>

powerpc/powernv: Fix kexec crashes caused by tlbie tracing

Rebooting into a new kernel with kexec fails in trace_tlbie() which is
called from native_hpte_clear(). This happens if the running kernel
has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints
always execute few RCU checks regardless of whether tracing is on or
off. We are already in the last phase of kexec sequence in real mode
with HILE_BE set. At this point the RCU check ends up in
RCU_LOCKDEP_WARN and causes kexec to fail.

Fix this by not calling trace_tlbie() from native_hpte_clear().

mpe: It's not safe to call trace points at this point in the kexec
path, even if we could avoid the RCU checks/warnings. The only
solution is to not call them.

Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
Cc: [email protected] # v4.13+
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Reported-by: Aneesh Kumar K.V <[email protected]>
Suggested-by: Michael Ellerman <[email protected]>
Acked-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: Check if vphb exists before iterating over AFU devices

During an eeh a kernel-oops is reported if no vPHB is allocated to the
AFU. This happens as during AFU init, an error in creation of vPHB is
a non-fatal error. Hence afu->phb should always be checked for NULL
before iterating over it for the virtual AFU pci devices.

This patch fixes the kenel-oops by adding a NULL pointer check for
afu->phb before it is dereferenced.

Fixes: 9e8df8a21963 ("cxl: EEH support")
Cc: [email protected] # v4.3+
Signed-off-by: Vaibhav Jain <[email protected]>
Acked-by: Andrew Donnellan <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

drm/vblank: Pass crtc_id to page_flip_ioctl.

We added crtc_id to the atomic ioctl, but forgot to add it for vblank
and page flip events. Commit bd386e518056 ("drm: Reorganize
drm_pending_event to support future event types [v2]") added it to
the vblank event, but page flip event was still missing.

Correct this and add a test for making sure we always set crtc_id correctly.

Fixes: bd386e518056 ("drm: Reorganize drm_pending_event to support future event types [v2]")
Fixes: 5db06a8a98f5 ("drm: Pass CRTC ID in userspace vblank events")
Cc: Daniel Stone <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # v4.12+
Reviewed-by: Daniel Vetter <[email protected]> #irc
Testcase: igt/kms_vblank/crtc_id
Signed-off-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

bpf: fix branch pruning logic

when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

ALSA: hda: Add Raven PCI ID

This commit adds PCI ID for Raven platform

Signed-off-by: Vijendar Mukunda <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

nvme-pci: add quirk for delay before CHK RDY for WDC SN200

And increase the existing delay to cover this device as well.

Cc: [email protected]
Signed-off-by: Jeff Lien <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

ALSA: hda/realtek - Fix ALC700 family no sound issue

It maybe the typo for ALC700 support patch.
To fix the bit value on this patch.

Fixes: 6fbae35a3170 ("ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703")
Signed-off-by: Kailang Yang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Merge tag 'pwm/for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
"The changes for this release include power management improvements for
  the pwm-img driver, support for the backup mode on pwm-atmel-tcb as
  well as support for more hardware with the R-Car and Mediatek drivers.

  To round things off there's a bit of cleanup for sunxi and stm32-lp"

* tag 'pwm/for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: stm32-lp: Remove pwm_is_enabled() check before calling pwm_disable()
  pwm: mediatek: Add MT2712/MT7622 support
  pwm: sunxi: Use of_device_get_match_data()
  pwm: atmel-tcb: Support backup mode
  dt-bindings: pwm: Add R-Car D3 device tree bindings
  pwm: img: Add runtime PM
  pwm: img: Add suspend / resume handling

Merge tag 'rtc-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
"There is nothing scary this cycle, mostly driver fixes and updates.

  The core fix has been in for a while and has been tested on multiple
  kernel revisions by multiple teams.

  Core:
   - Fix setting the alarm to the next expiring timer

  New drivers:
   - Mediatek MT7622 RTC
   - NXP PCF85363
   - Spreadtrum SC27xx PMIC RTC

  Drivers updates:
   - Use generic nvmem to expose the Non volatile ram for ds1305,
     ds1511, m48t86 and omap
   - abx80x: solve possible race condition at probe
   - armada38x: support trimming the RTC oscillator
   - at91rm9200: fix reading the alarm value at boot
   - ds1511: allow waking platform
   - m41t80: rework square wave output
   - pcf8523: support trimming the RTC oscillator
   - pcf8563: fix clock output rate
   - pl031: make interrupt optional
   - xgene: fix suspend/resume"

* tag 'rtc-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
  dt-bindings: rtc: imxdi: Improve the bindings text
  rtc: sc27xx: Add Spreadtrum SC27xx PMIC RTC driver
  dt-bindings: rtc: Add Spreadtrum SC27xx RTC documentation
  rtc: at91rm9200: fix reading alarm value
  rtc: at91rm9200: stop calculating yday in at91_rtc_readalarm
  rtc: sysfs: Use time64_t variables to set time/alarm
  rtc: xgene: mark PM functions as __maybe_unused
  rtc: xgene: Fix suspend/resume
  rtc: pcf8563: don't alway enable the alarm
  rtc: pcf8563: fix output clock rate
  rtc: rx8010: Fix for incorrect return value
  rtc: rx8010: Specify correct address for RX8010_RESV31
  rtc: rx8010: Remove duplicate define
  rtc: m41t80: remove unneeded checks from m41t80_sqw_set_rate
  rtc: m41t80: avoid i2c read in m41t80_sqw_is_prepared
  rtc: m41t80: avoid i2c read in m41t80_sqw_recalc_rate
  rtc: m41t80: fix m41t80_sqw_round_rate return value
  rtc: m41t80: m41t80_sqw_set_rate should return 0 on success
  rtc: add support for NXP PCF85363 real-time clock
  rtc: omap: Support scratch registers
  ...

x86/entry/64: Add missing irqflags tracing to native_load_gs_index()

Running this code with IRQs enabled (where dummy_lock is a spinlock):

static void check_load_gs_index(void)
{
/* This will fail. */
load_gs_index(0xffff);

spin_lock(&dummy_lock);
spin_unlock(&dummy_lock);
}

Will generate a lockdep warning.  The issue is that the actual write
to %gs would cause an exception with IRQs disabled, and the exception
handler would, as an inadvertent side effect, update irqflag tracing
to reflect the IRQs-off status.  native_load_gs_index() would then
turn IRQs back on and return with irqflag tracing still thinking that
IRQs were off.  The dummy lock-and-unlock causes lockdep to notice the
error and warn.

Fix it by adding the missing tracing.

Apparently nothing did this in a context where it mattered.  I haven't
tried to find a code path that would actually exhibit the warning if
appropriately nasty user code were running.

I suspect that the security impact of this bug is very, very low --
production systems don't run with lockdep enabled, and the warning is
mostly harmless anyway.

Found during a quick audit of the entry code to try to track down an
unrelated bug that Ingo found in some still-in-development code.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd

Pull MTD updates from Richard Weinberger:
"General changes:
   -  Unconfuse get_unmapped_area and point/unpoint driver methods
   -  New partition parser: sharpslpart
   -  Kill GENERIC_IO
   -  Various fixes

  NAND changes:
   -  Add a flag to mark NANDs that require 3 address cycles to encode a
      page address
   -  Set a default ECC/free layout when NAND_ECC_NONE is requested
   -  Fix a bug in panic_nand_write()
   -  Another batch of cleanups for the denali driver
   -  Fix PM support in the atmel driver
   -  Remove support for platform data in the omap driver
   -  Fix subpage write in the omap driver
   -  Fix irq handling in the mtk driver
   -  Change link order of mtk_ecc and mtk_nand drivers to speed up boot
      time
   -  Change log level of ECC error messages in the mxc driver
   -  Patch the pxa3xx driver to support Armada 8k platforms
   -  Add BAM DMA support to the qcom driver
   -  Convert gpio-nand to the GPIO desc API
   -  Fix ECC handling in the mt29f driver

  SPI-NOR changes:
   -  Introduce system power management support
   -  New mechanism to select the proper .quad_enable() hook by JEDEC
      ID, when needed, instead of only by manufacturer ID
   -  Add support to new memory parts from Gigadevice, Winbond, Macronix
      and Everspin
   -  Maintainance for Cadence, Intel, Mediatek and STM32 drivers"

*  tag 'for-linus-20171120' of git://git.infradead.org/linux-mtd: (85 commits)
  mtd: Avoid probe failures when mtd->dbg.dfs_dir is invalid
  mtd: sharpslpart: Add sharpslpart partition parser
  mtd: Add sanity checks in mtd_write/read_oob()
  mtd: remove the get_unmapped_area method
  mtd: implement mtd_get_unmapped_area() using the point method
  mtd: chips/map_rom.c: implement point and unpoint methods
  mtd: chips/map_ram.c: implement point and unpoint methods
  mtd: mtdram: properly handle the phys argument in the point method
  mtd: mtdswap: fix spelling mistake: 'TRESHOLD' -> 'THRESHOLD'
  mtd: slram: use memremap() instead of ioremap()
  kconfig: kill off GENERIC_IO option
  mtd: Fix C++ comment in include/linux/mtd/mtd.h
  mtd: constify mtd_partition
  mtd: plat-ram: Replace manual resource management by devm
  mtd: nand: Fix writing mtdoops to nand flash.
  mtd: intel-spi: Add Intel Lewisburg PCH SPI super SKU PCI ID
  mtd: nand: mtk: fix infinite ECC decode IRQ issue
  mtd: spi-nor: Add support for mr25h128
  mtd: nand: mtk: change the compile sequence of mtk_nand.o and mtk_ecc.o
  mtd: spi-nor: enable 4B opcodes for mx66l51235l
  ...

Merge tag 'xfs-4.15-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

- Fix a memory leak in the new in-core extent map

- Refactor the xfs_dev_t conversions for easier xfsprogs porting

* tag 'xfs-4.15-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: abstract out dev_t conversions
xfs: fix memory leak in xfs_iext_free_last_leaf

Merge branch 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull mode_t whack-a-mole from Al Viro:
"For all internal uses we want umode_t, which is arch-independent;
  mode_t (or __kernel_mode_t, for that matter) is wrong outside of
  userland ABI.

  Unfortunately, that crap keeps coming back and needs to be put down
  from time to time..."

* 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mode_t whack-a-mole: task_dump_owner()

Merge branch '9p-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 9p filesystemfixes from Al Viro:
"Several 9p fixes"

* '9p-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: Fix missing commas in mount options
  net/9p: Switch to wait_event_killable()
  fs/9p: Compare qid.path in v9fs_test_inode

kbuild: Set KBUILD_CFLAGS before incl. arch Makefile

Set the clang KBUILD_CFLAGS up before including arch/ Makefiles,
so that ld-options (etc.) can work correctly.

This fixes errors with clang such as ld-options trying to CC
against your host architecture, but LD trying to link against
your target architecture.

Signed-off-by: Chris Fries <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Matthias Kaehlcke <[email protected]>
Tested-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

dt-bindings: remove file that was added accidentally

I think this snuck in when I applied the patch for f97decac5f4c (didn't
apply cleanly, required some manual applying + git-add). It is unused
and shouldn't be here. My bad.

Fixes: f97decac5f4c "drm/msm: Support multiple ringbuffers"
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts

This fixes two errors that prevent a guest using the HPT MMU from
successfully migrating to a POWER9 host in radix MMU mode, or resizing
its HPT when running on a radix host.

The first bug was that commit 8dc6cca556e4 ("KVM: PPC: Book3S HV:
Don't rely on host's page size information", 2017-09-11) missed two
uses of hpte_base_page_size(), one in the HPT rehashing code and
one in kvm_htab_write() (which is used on the destination side in
migrating a HPT guest). Instead we use kvmppc_hpte_base_page_shift().
Having the shift count means that we can use left and right shifts
instead of multiplication and division in a few places.

Along the way, this adds a check in kvm_htab_write() to ensure that the
page size encoding in the incoming HPTEs is recognized, and if not
return an EINVAL error to userspace.

The second bug was that kvm_htab_write was performing some but not all
of the functions of kvmhv_setup_mmu(), resulting in the destination VM
being left in radix mode as far as the hardware is concerned. The
simplest fix for now is make kvm_htab_write() call
kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does.
In future it would be better to refactor the code more extensively
to remove the duplication.

Fixes: 8dc6cca556e4 ("KVM: PPC: Book3S HV: Don't rely on host's page size information")
Fixes: 7a84084c6054 ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9")
Reported-by: Suraj Jitindar Singh <[email protected]>
Tested-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

drm/edid: quirk HTC vive headset as non-desktop. [v2]

This uses the EDID info from my HTC Vive to mark it as
non-desktop.

v2: Change description from non-standard to non-desktop

Reviewed-by: Keith Packard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm/fb: add support for not enabling fbcon on non-desktop displays [v2]

We don't want fbcon to get used on non-desktop dislays,
don't pass them as enabled connectors to the fb helper setup.

This prevents my HMD from getting disorted fbcon, and from
affecting other displays console.

v2: Change description from non-standard to non-desktop

Reviewed-by: Keith Packard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm: add connector info/property for non-desktop displays [v2]

This adds the infrastructure needed to quirk displays
using edid and to mark them a non-desktop.

A non-desktop display is one which shouldn't normally be included
as a part of a desktop environment.

This is meant to cover head mounted devices like HTC Vive.

v2: Change description from non-standard to non-desktop, add docs

Reviewed-by: Keith Packard <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
fixup docs

Merge tag 'tilcdc-4.15-fixes' of https://github.com/jsarha/linux into drm-next

tilcdc fixes for v4.15

* tag 'tilcdc-4.15-fixes' of https://github.com/jsarha/linux:
drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support

Merge branch 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux into drm-next

more misc amdgpu fixes.

* 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: fix rmmod KCQ disable failed error
  drm/amdgpu: fix kernel hang when starting VNC server
  drm/amdgpu: don't skip attributes when powerplay is enabled
  drm/amd/pp: fix typecast error in powerplay.
  Revert "drm/radeon: dont switch vt on suspend"
  drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence
  drm/amd/powerplay: fix unfreeze level smc message for smu7
  drm/amdgpu:fix memleak
  drm/amdgpu:fix memleak in takedown

Merge tag 'imx-drm-next-2017-10-18' of git://git.pengutronix.de/git/pza/linux into drm-next

drm/imx: various cleanups

- Switch to drm_*_get/put() helpers
- Use correct parallel-display connector enum: DPI instead of VGA
- Remove incorrect unit name from device tree binding documentation example
- Remove an unused variable

* tag 'imx-drm-next-2017-10-18' of git://git.pengutronix.de/git/pza/linux:
  gpu: ipu-v3: ipu-dc: Remove unused 'di' variable
  dt-bindings: fsl-imx-drm: Remove incorrect "@di0" usage
  drm/imx: parallel-display: use correct connector enum
  drm/imx: switch to drm_*_get(), drm_*_put() helpers

Merge tag 'drm/tegra/for-4.15-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux into drm-next

drm/tegra: Fixes for v4.15-rc1

This includes an update to the SOR pad clock programming needed because
of some changes that went in through the clock tree.

* tag 'drm/tegra/for-4.15-rc1-fixes' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: sor: Reimplement pad clock

Merge branch 'bpf-fix-null-arg-semantics'

Gianluca Borello says:

====================
This set includes some fixes in semantics and usability issues that emerged
recently, and would be good to have them in net before the next release.

In particular, ARG_CONST_SIZE_OR_ZERO semantics was recently changed in
commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
semantics") with the goal of letting the compiler generate simpler code
that the verifier can more easily accept.

To handle this change in semantics, a few checks in some helpers were
added, like in commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2
type to ARG_CONST_SIZE_OR_ZERO"), and those checks are less than ideal
because once they make it into a released kernel bpf programs can start
relying on them, preventing the possibility of being removed later on.

This patch tries to fix the issue by introducing a new argument type
ARG_PTR_TO_MEM_OR_NULL that can be used for helpers that can receive a
<NULL, 0> tuple. By doing so, we can fix the semantics of the other helpers
that don't need <NULL, 0> and can just handle <!NULL, 0>, allowing the code
to get rid of those checks.
====================

Signed-off-by: Daniel Borkmann <[email protected]>

bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO

Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
the compiler generates optimized BPF code when checking boundaries of an
argument from C code. A typical example of this optimized code can be
generated using the bpf_perf_event_output helper when operating on variable
memory:

/* len is a generic scalar */
if (len > 0 && len <= 0x7fff)
bpf_perf_event_output(ctx, &perf_map, 0, buf, len);

110: (79) r5 = *(u64 *)(r10 -40)
111: (bf) r1 = r5
112: (07) r1 += -1
113: (25) if r1 > 0x7ffe goto pc+6
114: (bf) r1 = r6
115: (18) r2 = 0xffff94e5f166c200
117: (b7) r3 = 0
118: (bf) r4 = r7
119: (85) call bpf_perf_event_output#25
R5 min value is negative, either use unsigned or 'var &= const'

With this code, the verifier loses track of the variable.

Replacing arg5 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
avoids this quite common case which leads to usability issues, and the
compiler generates code that the verifier can more easily test:

if (len <= 0x7fff)
bpf_perf_event_output(ctx, &perf_map, 0, buf, len);

or

bpf_perf_event_output(ctx, &perf_map, 0, buf, len & 0x7fff);

No changes to the bpf_perf_event_output helper are necessary since it can
handle a case where size is 0, and an empty frame is pushed.

Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Gianluca Borello <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO

Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO
semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way
the compiler generates optimized BPF code when checking boundaries of an
argument from C code. A typical example of this optimized code can be
generated using the bpf_probe_read_str helper when operating on variable
memory:

/* len is a generic scalar */
if (len > 0 && len <= 0x7fff)
bpf_probe_read_str(p, len, s);

251: (79) r1 = *(u64 *)(r10 -88)
252: (07) r1 += -1
253: (25) if r1 > 0x7ffe goto pc-42
254: (bf) r1 = r7
255: (79) r2 = *(u64 *)(r10 -88)
256: (bf) r8 = r4
257: (85) call bpf_probe_read_str#45
R2 min value is negative, either use unsigned or 'var &= const'

With this code, the verifier loses track of the variable.

Replacing arg2 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it
avoids this quite common case which leads to usability issues, and the
compiler generates code that the verifier can more easily test:

if (len <= 0x7fff)
bpf_probe_read_str(p, len, s);

or

bpf_probe_read_str(p, len & 0x7fff, s);

No changes to the bpf_probe_read_str helper are necessary since
strncpy_from_unsafe itself immediately returns if the size passed is 0.

Signed-off-by: Gianluca Borello <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: remove explicit handling of 0 for arg2 in bpf_probe_read

Commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to
ARG_CONST_SIZE_OR_ZERO") changed arg2 type to ARG_CONST_SIZE_OR_ZERO to
simplify writing bpf programs by taking advantage of the new semantics
introduced for ARG_CONST_SIZE_OR_ZERO which allows <!NULL, 0> arguments.

In order to prevent the helper from actually passing a NULL pointer to
probe_kernel_read, which can happen when <NULL, 0> is passed to the helper,
the commit also introduced an explicit check against size == 0.

After the recent introduction of the ARG_PTR_TO_MEM_OR_NULL type,
bpf_probe_read can not receive a pair of <NULL, 0> arguments anymore, thus
the check is not needed anymore and can be removed, since probe_kernel_read
can correctly handle a <!NULL, 0> call. This also fixes the semantics of
the helper before it gets officially released and bpf programs start
relying on this check.

Fixes: 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO")
Signed-off-by: Gianluca Borello <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: introduce ARG_PTR_TO_MEM_OR_NULL

With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper
argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
and the verifier can prove the value of this next argument is 0. However,
most helpers are just interested in handling <!NULL, 0>, so forcing them to
deal with <NULL, 0> makes the implementation of those helpers more
complicated for no apparent benefits, requiring them to explicitly handle
those corner cases with checks that bpf programs could start relying upon,
preventing the possibility of removing them later.

Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.

Currently, the only helper that needs this is bpf_csum_diff_proto(), so
change arg1 and arg3 to this new type as well.

Also add a new battery of tests that explicitly test the
!ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
various <NULL, 0> variations are focused on bpf_csum_diff, so cover also
other helpers.

Signed-off-by: Gianluca Borello <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

block: remove useless assignment in bio_split

Remove useless assignment to the variable "split" because the variable is
unconditionally assigned later.

Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

null_blk: fix dev->badblocks leak

null_alloc_dev() allocates memory for dev->badblocks, but cleanup
currently only occurs in the configfs release codepath, missing a number
of other places.

This bug was found running the blktests block/010 test, alongside
kmemleak:
rapido1:/blktests# ./check block/010
...
rapido1:/blktests# echo scan > /sys/kernel/debug/kmemleak
[  306.966708] kmemleak: 32 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
rapido1:/blktests# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88001f86d000 (size 4096):
  comm "modprobe", pid 231, jiffies 4294892415 (age 318.252s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814b0379>] kmemleak_alloc+0x49/0xa0
    [<ffffffff810f180f>] kmem_cache_alloc+0x9f/0xe0
    [<ffffffff8124e45f>] badblocks_init+0x2f/0x60
    [<ffffffffa0019fae>] 0xffffffffa0019fae
    [<ffffffffa0021273>] nullb_device_badblocks_store+0x63/0x130 [null_blk]
    [<ffffffff810004cd>] do_one_initcall+0x3d/0x170
    [<ffffffff8109fe0d>] do_init_module+0x56/0x1e9
    [<ffffffff8109ebd7>] load_module+0x1c47/0x26a0
    [<ffffffff8109f819>] SyS_finit_module+0xa9/0xd0
    [<ffffffff814b4f60>] entry_SYSCALL_64_fastpath+0x13/0x94

Fixes: 2f54a613c942 ("nullb: badbblocks support")
Reviewed-by: Shaohua Li <[email protected]>
Signed-off-by: David Disseldorp <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features

I got the logic wrong in the DT CPU features code when I added the
Power9 DD2.1 feature. We should be setting the bit if we detect a
DD2.1, not clearing it if we detect a DD2.0.

This code isn't actually exercised at the moment so nothing is
actually broken.

Fixes: 3ffa9d9e2a7c ("powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature")
Signed-off-by: Michael Ellerman <[email protected]>