Git Repo - linux.git/log

Merge branch 'next-smack' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull smack update from James Morris:
"One small change for Automotive Grade Linux"

* 'next-smack' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Smack: Handle CGROUP2 in the same way that CGROUP

alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering

memory-barriers.txt has been updated with the following requirement.

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Current writeX() and iowriteX() implementations on alpha are not
satisfying this requirement as the barrier is after the register write.

Move mb() in writeX() and iowriteX() functions to guarantee that HW
observes memory changes before performing register operations.

Signed-off-by: Sinan Kaya <[email protected]>
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: Implement CPU vulnerabilities sysfs functions.

Implement the CPU vulnerabilty show functions for meltdown, spectre_v1
and spectre_v2 on Alpha.

Tests on XP1000 (EV67/667MHz) and ES45 (EV68CB/1.25GHz) show them
to be vulnerable to Meltdown and Spectre V1. In the case of
Meltdown I saw a 1 to 2% success rate in reading bytes on the
XP1000 and 50 to 60% success rate on the ES45. (This compares to
99.97% success reported for Intel CPUs.) Report EV6 and later
CPUs as vulnerable.

Tests on PWS600au (EV56/600MHz) for Spectre V1 attack were
unsuccessful (though I did not try particularly hard) so mark EV4
through to EV56 as not vulnerable.

Signed-off-by: Michael Cree <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: rtc: stop validating rtc_time in .read_time

The RTC core is always calling rtc_valid_tm after the read_time callback.
It is not necessary to call it just before returning from the callback.

Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: rtc: remove unused set_mmss ops

The .set_mmss and .setmmss64 ops are only called when the RTC is not
providing an implementation for the .set_time callback.

On alpha, .set_time is provided so .set_mmss64 is never called. Remove the
unused code.

Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull alpha syscall cleanups from Al Viro:
"A couple of SYSCALL_DEFINE conversions and removal of pointless (and
  bitrotted) piece stuck in ret_from_kernel_thread since the
  kernel_exceve/kernel_thread conversions six years ago"

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  alpha: get rid of pointless insn in ret_from_kernel_thread
  alpha: switch pci syscalls to SYSCALL_DEFINE

Merge branch 'misc.sparc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull sparc syscall cleanups from Al Viro:
"sparc syscall stuff - killing pointless wrappers, conversions to
  {COMPAT_,}SYSCALL_DEFINE"

* 'misc.sparc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sparc: get rid of asm wrapper for nis_syscall()
  sparc: switch compat {f,}truncate64() to COMPAT_SYSCALL_DEFINE
  sparc: switch compat pread64 and pwrite64 to COMPAT_SYSCALL_DEFINE
  convert compat sync_file_range() to COMPAT_SYSCALL_DEFINE
  switch sparc_remap_file_pages() to SYSCALL_DEFINE
  sparc: get rid of memory_ordering(2) wrapper
  sparc: trivial conversions to {COMPAT_,}SYSCALL_DEFINE()
  sparc: bury a zombie extern that had been that way for twenty years
  sparc: get rid of remaining SIGN... wrappers
  sparc: kill useless SIGN... wrappers
  sparc: get rid of sys_sparc_pipe() wrappers

treewide: fix up files incorrectly marked executable

Joe Perches noted that we have a few source files that for some
inexplicable reason (read: I'm too lazy to even go look at the history)
are marked executable:

  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c
  drivers/net/ethernet/cadence/macb_ptp.c

A simple git command line to show executable C/asm/header files is this:

    git ls-files -s '*.[chsS]' | grep '^100755'

and then you can fix them up with scripting by just feeding that output
into:

    | cut -f2 | xargs chmod -x

and commit it.

Which is exactly what this commit does.

Reported-by: Joe Perches <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'i2c/for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:

-I2C core now reports proper OF style module alias. I'd like to repeat
  the note from the commit msg here (Thanks, Javier!):

      NOTE: This patch may break out-of-tree drivers that were relying
            on this behavior, and only had an I2C device ID table even
            when the device was registered via OF.

            There are no remaining drivers in mainline that do this, but
            out-of-tree drivers have to be fixed and define a proper OF
            device ID table to have module auto-loading working.

- new driver for the SynQuacer I2C controller

- major refactoring of the QUP driver

- the piix4 driver now uses request_muxed_region which should fix a
   long standing resource conflict with the sp5100_tco watchdog

- a bunch of small core & driver improvements

* 'i2c/for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (53 commits)
  i2c: add support for Socionext SynQuacer I2C controller
  dt-bindings: i2c: add binding for Socionext SynQuacer I2C
  i2c: Update i2c_trace_msg static key to modern api
  i2c: fix parameter of trace_i2c_result
  i2c: imx: avoid taking clk_prepare mutex in PM callbacks
  i2c: imx: use clk notifier for rate changes
  i2c: make i2c_check_addr_validity() static
  i2c: rcar: fix mask value of prohibited bit
  dt-bindings: i2c: document R8A77965 bindings
  i2c: pca-platform: drop gpio from platform data
  i2c: pca-platform: use device_property_read_u32
  i2c: pca-platform: unconditionally use devm_gpiod_get_optional
  sh: sh7785lcr: add GPIO lookup table for i2c controller reset
  i2c: qup: reorganization of driver code to remove polling for qup v2
  i2c: qup: reorganization of driver code to remove polling for qup v1
  i2c: qup: send NACK for last read sub transfers
  i2c: qup: fix buffer overflow for multiple msg of maximum xfer len
  i2c: qup: change completion timeout according to transfer length
  i2c: qup: use the complete transfer length to choose DMA mode
  i2c: qup: proper error handling for i2c error in BAM mode
  ...

Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
"Notable changes:

   - Support for 4PB user address space on 64-bit, opt-in via mmap().

   - Removal of POWER4 support, which was accidentally broken in 2016
     and no one noticed, and blocked use of some modern instructions.

   - Workarounds so that the hypervisor can enable Transactional Memory
     on Power9.

   - A series to disable the DAWR (Data Address Watchpoint Register) on
     Power9.

   - More information displayed in the meltdown/spectre_v1/v2 sysfs
     files.

   - A vpermxor (Power8 Altivec) implementation for the raid6 Q
     Syndrome.

   - A big series to make the allocation of our pacas (per cpu area),
     kernel page tables, and per-cpu stacks NUMA aware when using the
     Radix MMU on Power9.

  And as usual many fixes, reworks and cleanups.

  Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
  Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
  Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
  Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
  Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
  Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
  Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
  Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
  Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
  Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
  Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
  Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
  Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"

* tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
  powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
  powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
  powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
  powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
  Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
  powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
  powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
  cxl: Fix possible deadlock when processing page faults from cxllib
  powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
  powerpc/mm/radix: Update command line parsing for disable_radix
  powerpc/mm/radix: Parse disable_radix commandline correctly.
  powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
  powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
  powerpc/mm/keys: Update documentation and remove unnecessary check
  powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
  powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
  powerpc/powernv: Always stop secondaries before reboot/shutdown
  powerpc: hard disable irqs in smp_send_stop loop
  powerpc: use NMI IPI for smp_send_stop
  powerpc/powernv: Fix SMT4 forcing idle code
  ...

Merge tag 'leaks-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tobin/leaks

Pull leaking-addresses updates from Tobin Harding:
"This set represents improvements to the scripts/leaking_addresses.pl
  script.

  The major improvement is that with this set applied the script
  actually runs in a reasonable amount of time (less than a minute on a
  standard stock Ubuntu user desktop). Also, we have a second maintainer
  now and a tree hosted on kernel.org

  We do a few code clean ups. We fix the command help output. Handling
  of the vsyscall address range is fixed to check the whole range
  instead of just the start/end addresses. We add support for 5 page
  table levels (suggested on LKML). We use a system command to get the
  machine architecture instead of using Perl. Calling this command for
  every regex comparison is what previously choked the script, caching
  the result of this call gave the major speed improvement. We add
  support for scanning 32-bit kernels using the user/kernel memory
  split. Path skipping code refactored and simplified (meaning easier
  script configuration). We remove version numbering. We add a variable
  name to improve readability of a regex and finally we check filenames
  for leaking addresses.

  Currently script scans /proc/PID for all PID. With this set applied we
  only scan for PID==1. It was observed that on an idle system files
  under /proc/PID are predominantly the same for all processes. Also it
  was noted that the script does not scan _all_ the kernel since it only
  scans active processes. Scanning only for PID==1 makes explicit the
  inherent flaw in the script that the scan is only partial and also
  speeds things up"

* tag 'leaks-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tobin/leaks:
  MAINTAINERS: Update LEAKING_ADDRESSES
  leaking_addresses: check if file name contains address
  leaking_addresses: explicitly name variable used in regex
  leaking_addresses: remove version number
  leaking_addresses: skip '/proc/1/syscall'
  leaking_addresses: skip all /proc/PID except /proc/1
  leaking_addresses: cache architecture name
  leaking_addresses: simplify path skipping
  leaking_addresses: do not parse binary files
  leaking_addresses: add 32-bit support
  leaking_addresses: add is_arch() wrapper subroutine
  leaking_addresses: use system command to get arch
  leaking_addresses: add support for 5 page table levels
  leaking_addresses: add support for kernel config file
  leaking_addresses: add range check for vsyscall memory
  leaking_addresses: indent dependant options
  leaking_addresses: remove command examples
  leaking_addresses: remove mention of kptr_restrict
  leaking_addresses: fix typo function not called

Merge tag 'linux-kselftest-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest update from Shuah Khan:
"This Kselftest update for 4.17-rc1 consists of:

   - Test build error fixes

   - Fixes to prevent intel_pstate from building on non-x86 systems.

   - New test for ion with vgem driver.

   - Change to print the test name to /dev/kmsg to add context to kernel
     failures if any uncovered from running the test.

   - Kselftest framework enhancements to add KSFT_TAP_LEVEL environment
     variable to prevent nested TAP headers being printed in the
     Kselftest output.

     Nested TAP13 headers could cause problems for some parsers. This
     change suppresses the nested headers from test programs and test
     shell scripts with changes to framework and Makefiles without
     changing the tests"

* tag 'linux-kselftest-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/intel_pstate: Fix build rule for x86
  selftests: Print the test we're running to /dev/kmsg
  selftests/seccomp: Allow get_metadata to XFAIL
  selftests/android/ion: Makefile: fix build error
  selftests: futex Makefile add top level TAP header echo to RUN_TESTS
  selftests: Makefile set KSFT_TAP_LEVEL to prevent nested TAP headers
  selftests: lib.mk set KSFT_TAP_LEVEL to prevent nested TAP headers
  selftests: kselftest framework: add handling for TAP header level
  selftests: ion: Add simple test with the vgem driver
  selftests: ion: Remove some prints

Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull general security layer updates from James Morris:

- Convert security hooks from list to hlist, a nice cleanup, saving
   about 50% of space, from Sargun Dhillon.

- Only pass the cred, not the secid, to kill_pid_info_as_cred and
   security_task_kill (as the secid can be determined from the cred),
   from Stephen Smalley.

- Close a potential race in kernel_read_file(), by making the file
   unwritable before calling the LSM check (vs after), from Kees Cook.

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security: convert security hooks to use hlist
  exec: Set file unwritable before LSM check
  usb, signal, security: only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill

net_sched: fix a missing idr_remove() in u32_delete_key()

When we delete a u32 key via u32_delete_key(), we forget to
call idr_remove() to remove its handle from IDR.

Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
Reported-by: Marcin Kabiesz <[email protected]>
Tested-by: Marcin Kabiesz <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache updates from David Howells:
"Three patches that fix some of AFS's usage of fscache:

   (1) Need to invalidate the cache if a foreign data change is detected
       on the server.

   (2) Move the vnode ID uniquifier (equivalent to i_generation) from
       the auxiliary data to the index key to prevent a race between
       file delete and a subsequent file create seeing the same index
       key.

   (3) Need to retire cookies that correspond to files that we think got
       deleted on the server.

  Four patches to fix some things in fscache and cachefiles:

   (4) Fix a couple of checker warnings.

   (5) Correctly indicate to the end-of-operation callback whether an
       operation completed or was cancelled.

   (6) Add a check for multiple cookie relinquishment.

   (7) Fix a path through the asynchronous write that doesn't wake up a
       waiter for a page if the cache decides not to write that page,
       but discards it instead.

  A couple of patches to add tracepoints to fscache and cachefiles:

   (8) Add tracepoints for cookie operators, object state machine
       execution, cachefiles object management and cachefiles VFS
       operations.

   (9) Add tracepoints for fscache operation management and page
       wrangling.

  And then three development patches:

  (10) Attach the index key and auxiliary data to the cookie, pass this
       information through various fscache-netfs API functions and get
       rid of the callbacks to the netfs to get it.

       This means that the cache can get at this information, even if
       the netfs goes away. It also means that the cache can be lazy in
       updating the coherency data.

  (11) Pass the object data size through various fscache-netfs API
       rather than calling back to the netfs for it, and store the value
       in the object.

       This makes it easier to correctly resize the object, as the size
       is updated on writes to the cache, rather than calling back out
       to the netfs.

  (12) Maintain a catalogue of allocated cookies. This makes it possible
       to catch cookie collision up front rather than down in the bowels
       of the cache being run from a service thread from the object
       state machine.

       This will also make it possible in the future to reconnect to a
       cookie that's not gone dead yet because it's waiting for
       finalisation of the storage and also make it possible to bring
       cookies online if the cache is added after the cookie has been
       obtained"

* tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache: Maintain a catalogue of allocated cookies
  fscache: Pass object size in rather than calling back for it
  fscache: Attach the index key and aux data to the cookie
  fscache: Add more tracepoints
  fscache: Add tracepoints
  fscache: Fix hanging wait on page discarded by writeback
  fscache: Detect multiple relinquishment of a cookie
  fscache: Pass the correct cancelled indications to fscache_op_complete()
  fscache, cachefiles: Fix checker warnings
  afs: Be more aggressive in retiring cached vnodes
  afs: Use the vnode ID uniquifier in the cache key not the aux data
  afs: Invalidate cache on server data change

nfit, address-range-scrub: add module option to skip initial ars

After attempting to quickly retrieve known errors the kernel proceeds to
kick off a long running ARS. Add a module option to disable this
behavior at initialization time, or at new region discovery time.
Otherwise, ARS can be started manually regardless of the state of this
setting.

Co-developed-by: Dave Jiang <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

nfit, address-range-scrub: rework and simplify ARS state machine

ARS is an operation that can take 10s to 100s of seconds to find media
errors that should rarely be present. If the platform crashes due to
media errors in persistent memory, the expectation is that the BIOS will
report those known errors in a 'short' ARS request.

A 'short' ARS request asks platform firmware to return an ARS payload
with all known errors, but without issuing a 'long' scrub. At driver
init a short request is issued to all PMEM ranges before registering
regions. Then, in the background, a long ARS is scheduled for each
region.

The ARS implementation is simplified to centralize ARS completion work
in the ars_complete() helper. The timeout is removed since there is no
facility to cancel ARS, and this otherwise arranges for system init to
never be blocked waiting for a 'long' ARS. The ars_state flags are used
to coordinate ARS requests from driver init, ARS requests from
userspace, and ARS requests in response to media error notifications.

Given that there is no notification of ARS completion the implementation
still needs to poll. It backs off exponentially to a maximum poll period
of 30 minutes.

Suggested-by: Toshi Kani <[email protected]>
Co-developed-by: Dave Jiang <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

nfit, address-range-scrub: determine one platform max_ars value

acpi_nfit_query_poison() is awkward in that it requires an nfit_spa
argument in order to determine what max_ars value to use. Instead probe
for the minimum max_ars across all scrub-capable ranges in the system
and drop the nfit_spa argument.

This enables a larger rework / simplification of the ARS state machine
whereby the status can be retrieved once and then iterated over all
address ranges to reap completions.

Signed-off-by: Dan Williams <[email protected]>

powerpc/powernv: Create platform devs for nvdimm buses

Scan the devicetree for an nvdimm-bus compatible and create
a platform device for them.

Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

doc/devicetree: Persistent memory region bindings

Add device-tree binding documentation for the nvdimm region driver.

Cc: [email protected]
Signed-off-by: Oliver O'Halloran <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

libnvdimm: Add device-tree based driver

This patch adds peliminary device-tree bindings for persistent memory
regions. The driver registers a libnvdimm bus for each pmem-region
node and each address range under the node is converted to a region
within that bus.

Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

libnvdimm: Add of_node to region and bus descriptors

We want to be able to cross reference the region and bus devices
with the device tree node that they were spawned from. libNVDIMM
handles creating the actual devices for these internally, so we
need to pass in a pointer to the relevant node in the descriptor.

Signed-off-by: Oliver O'Halloran <[email protected]>
Acked-by: Dan Williams <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

libnvdimm, region: quiet region probe

The message about constraining number of online cpus to be less than or
equal to ND_MAX_LANES (256) is only useful for block-aperture
configurations and BTT. Make it debug since it is only relevant when
debugging performance.

Signed-off-by: Dan Williams <[email protected]>

ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation

The commit 02a5d6925cd3 ("ALSA: pcm: Avoid potential races between OSS
ioctls and read/write") split the PCM preparation code to a locked
version, and it added a sanity check of runtime->oss.prepare flag
along with the change. This leaded to an endless loop when the stream
gets XRUN: namely, snd_pcm_oss_write3() and co call
snd_pcm_oss_prepare() without setting runtime->oss.prepare flag and
the loop continues until the PCM state reaches to another one.

As the function is supposed to execute the preparation
unconditionally, drop the invalid state check there.

The bug was triggered by syzkaller.

Fixes: 02a5d6925cd3 ("ALSA: pcm: Avoid potential races between OSS ioctls and read/write")
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: usb-audio: Add sanity checks in UAC3 clock parsers

The UAC3 clock parser codes lack of the sanity checks for malformed
descriptors like UAC2 parser does. Without it, the driver may lead to
a potential crash.

Fixes: 9a2fe9b801f5 ("ALSA: usb: initial USB Audio Device Class 3.0 support")
Tested-by: Ruslan Bilovol <[email protected]>
Reviewed-by: Ruslan Bilovol <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: usb-audio: More strict sanity checks for clock parsers

The sanity checks introduced for malformed descriptors loosely check
the given descriptor size, although the size greater than the defined
description is invalid.  It was due to a concern of any funky firmware
in the actual products.  But this doesn't look hitting, and any sane
products must have the defined descriptors.

So in this patch, we make the validators more strict, allowing only
with the defined descriptor sizes.  The value in clock selector
validator is corrected from 5 to 7 to count the two unlisted fields
after baCSourceID[].

Suggested-by: Ruslan Bilovol <[email protected]>
Reviewed-by: Ruslan Bilovol <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: usb-audio: Refactor clock finder helpers

There are lots of open-coded functions to find a clock source,
selector and multiplier. Now there are both v2 and v3, so six
variants.

This patch refactors the code to use a common helper for the main
loop, and define each validator function for each target.
There is no functional change.

Fixes: 9a2fe9b801f5 ("ALSA: usb: initial USB Audio Device Class 3.0 support")
Reviewed-by: Ruslan Bilovol <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

libnvdimm, namespace: use a safe lookup for dimm device name

The following NULL dereference results from incorrectly assuming that
ndd is valid in this print:

  struct nvdimm_drvdata *ndd = to_ndd(&nd_region->mapping[i]);

  /*
   * Give up if we don't find an instance of a uuid at each
   * position (from 0 to nd_region->ndr_mappings - 1), or if we
   * find a dimm with two instances of the same uuid.
   */
  dev_err(&nd_region->dev, "%s missing label for %pUb\n",
                  dev_name(ndd->dev), nd_label->uuid);

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: nd_region_register_namespaces+0xd67/0x13c0 [libnvdimm]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 43 PID: 673 Comm: kworker/u609:10 Not tainted 4.16.0-rc4+ #1
[..]
RIP: 0010:nd_region_register_namespaces+0xd67/0x13c0 [libnvdimm]
[..]
Call Trace:
  ? devres_add+0x2f/0x40
  ? devm_kmalloc+0x52/0x60
  ? nd_region_activate+0x9c/0x320 [libnvdimm]
  nd_region_probe+0x94/0x260 [libnvdimm]
  ? kernfs_add_one+0xe4/0x130
  nvdimm_bus_probe+0x63/0x100 [libnvdimm]

Switch to using the nvdimm device directly.

Fixes: 0e3b0d123c8f ("libnvdimm, namespace: allow multiple pmem...")
Cc: <[email protected]>
Reported-by: Dave Jiang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

libnvdimm, dimm: fix dpa reservation vs uninitialized label area

At initialization time the 'dimm' driver caches a copy of the memory
device's label area and reserves address space for each of the
namespaces defined.

However, as can be seen below, the reservation occurs even when the
index blocks are invalid:

nvdimm nmem0: nvdimm_init_config_data: len: 131072 rc: 0
nvdimm nmem0: config data size: 131072
nvdimm nmem0: __nd_label_validate: nsindex0 labelsize 1 invalid
nvdimm nmem0: __nd_label_validate: nsindex1 labelsize 1 invalid
nvdimm nmem0: : pmem-6025e505: 0x1000000000 @ 0xf50000000 reserve <-- bad

Gate dpa reservation on the presence of valid index blocks.

Cc: <[email protected]>
Fixes: 4a826c83db4e ("libnvdimm: namespace indices: read and validate")
Reported-by: Krzysztof Rusocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

- Adopt iommu_unmap_fast() interface to type1 backend
   (Suravee Suthikulpanit)

- mdev sample driver fixup (Shunyong Yang)

- More efficient PFN mapping handling in type1 backend
   (Jason Cai)

- VFIO device ioeventfd interface (Alex Williamson)

- Tag new vfio-platform sub-maintainer (Alex Williamson)

* tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
  MAINTAINERS: vfio/platform: Update sub-maintainer
  vfio/pci: Add ioeventfd support
  vfio/pci: Use endian neutral helpers
  vfio/pci: Pull BAR mapping setup from read-write path
  vfio/type1: Improve memory pinning process for raw PFN mapping
  vfio-mdev/samples: change RDI interrupt condition
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull fw_cfg, vhost updates from Michael Tsirkin:
"This cleans up the qemu fw cfg device driver.

  On top of this, vmcore is dumped there on crash to help debugging
  with kASLR enabled.

  Also included are some fixes in vhost"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: add vsock compat ioctl
  vhost: fix vhost ioctl signature to build with clang
  fw_cfg: write vmcoreinfo details
  crash: export paddr_vmcoreinfo_note()
  fw_cfg: add DMA register
  fw_cfg: add a public uapi header
  fw_cfg: handle fw_cfg_read_blob() error
  fw_cfg: remove inline from fw_cfg_read_blob()
  fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
  fw_cfg: fix sparse warning reading FW_CFG_ID
  fw_cfg: fix sparse warnings with fw_cfg_file
  fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
  ptr_ring: fix build

Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

- move pci_uevent_ers() out of pci.h (Michael Ellerman)

- skip ASPM common clock warning if BIOS already configured it (Sinan
   Kaya)

- fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)

- remove last user of pci_get_bus_and_slot() and the function itself
   (Sinan Kaya)

- add decoding for 16 GT/s link speed (Jay Fang)

- add interfaces to get max link speed and width (Tal Gilboa)

- add pcie_bandwidth_capable() to compute max supported link bandwidth
   (Tal Gilboa)

- add pcie_bandwidth_available() to compute bandwidth available to
   device (Tal Gilboa)

- add pcie_print_link_status() to log link speed and whether it's
   limited (Tal Gilboa)

- use PCI core interfaces to report when device performance may be
   limited by its slot instead of doing it in each driver (Tal Gilboa)

- fix possible cpqphp NULL pointer dereference (Shawn Lin)

- rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
   hotplug (Mika Westerberg)

- add support for PCI I/O port space that's neither directly accessible
   via CPU in/out instructions nor directly mapped into CPU physical
   memory space. This is fairly intrusive and includes minor changes to
   interfaces used for I/O space on most platforms (Zhichang Yuan, John
   Garry)

- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
   John Garry)

- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)

- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
   (Shawn Lin)

- report quirk timings with dev_info (Bjorn Helgaas)

- report quirks that take longer than 10ms (Bjorn Helgaas)

- add and use Altera Vendor ID (Johannes Thumshirn)

- tidy Makefiles and comments (Bjorn Helgaas)

- don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
   ia64, and mn10300 with x86 (Bjorn Helgaas)

- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
   Lawler)

- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)

- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
   Helgaas)

- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)

- remove portdrv link order dependency (Bjorn Helgaas)

- remove support for unused VC portdrv service (Bjorn Helgaas)

- simplify portdrv feature permission checking (Bjorn Helgaas)

- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
   Helgaas)

- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)

- use cached AER capability offset (Frederick Lawler)

- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)

- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)

- use generic pci_mmap_resource_range() instead of powerpc and xtensa
   arch-specific versions (David Woodhouse)

- support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)

- remove System and Video ROM reservations on sparc (Bjorn Helgaas)

- probe for device reset support during enumeration instead of runtime
   (Bjorn Helgaas)

- add ACS quirk for Ampere (née APM) root ports (Feng Kan)

- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
   Vincent-Cross)

- protect device restore with device lock (Sinan Kaya)

- handle failure of FLR gracefully (Sinan Kaya)

- handle CRS (config retry status) after device resets (Sinan Kaya)

- skip various config reads for SR-IOV VFs as an optimization
   (KarimAllah Ahmed)

- consolidate VPD code in vpd.c (Bjorn Helgaas)

- add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)

- add DT support for R-Car r8a7743 (Biju Das)

- fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
   bridge driver that causes a general protection fault (Dexuan Cui)

- fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
   (Dexuan Cui)

- fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
   (Dexuan Cui)

- make several structures static (Fengguang Wu)

- increase number of MSI IRQs supported by Synopsys DesignWare bridges
   from 32 to 256 (Gustavo Pimentel)

- implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
   API from DesignWare drivers (Gustavo Pimentel)

- add Tegra power management support (Manikanta Maddireddy)

- add Tegra loadable module support (Manikanta Maddireddy)

- handle 64-bit BARs correctly in endpoint support (Niklas Cassel)

- support optional regulator for HiSilicon STB (Shawn Guo)

- use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)

- support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)

* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
  MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
  HISI LPC: Add ACPI support
  ACPI / scan: Do not enumerate Indirect IO host children
  ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
  HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
  of: Add missing I/O range exception for indirect-IO devices
  PCI: Apply the new generic I/O management on PCI IO hosts
  PCI: Add fwnode handler as input param of pci_register_io_range()
  PCI: Remove __weak tag from pci_register_io_range()
  MAINTAINERS: Add missing /drivers/pci/cadence directory entry
  fm10k: Report PCIe link properties with pcie_print_link_status()
  net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
  net/mlx5: Report PCIe link properties with pcie_print_link_status()
  net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
  PCI: Add pcie_print_link_status() to log link speed and whether it's limited
  PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
  misc: pci_endpoint_test: Handle 64-bit BARs properly
  PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
  PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
  PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
  ...

Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
"Doug and I are at a conference next week so if another PR is sent I
  expect it to only be bug fixes. Parav noted yesterday that there are
  some fringe case behavior changes in his work that he would like to
  fix, and I see that Intel has a number of rc looking patches for HFI1
  they posted yesterday.

  Parav is again the biggest contributor by patch count with his ongoing
  work to enable container support in the RDMA stack, followed by Leon
  doing syzkaller inspired cleanups, though most of the actual fixing
  went to RC.

  There is one uncomfortable series here fixing the user ABI to actually
  work as intended in 32 bit mode. There are lots of notes in the commit
  messages, but the basic summary is we don't think there is an actual
  32 bit kernel user of drivers/infiniband for several good reasons.

  However we are seeing people want to use a 32 bit user space with 64
  bit kernel, which didn't completely work today. So in fixing it we
  required a 32 bit rxe user to upgrade their userspace. rxe users are
  still already quite rare and we think a 32 bit one is non-existing.

   - Fix RDMA uapi headers to actually compile in userspace and be more
     complete

   - Three shared with netdev pull requests from Mellanox:

      * 7 patches, mostly to net with 1 IB related one at the back).
        This series addresses an IRQ performance issue (patch 1),
        cleanups related to the fix for the IRQ performance problem
        (patches 2-6), and then extends the fragmented completion queue
        support that already exists in the net side of the driver to the
        ib side of the driver (patch 7).

      * Mostly IB, with 5 patches to net that are needed to support the
        remaining 10 patches to the IB subsystem. This series extends
        the current 'representor' framework when the mlx5 driver is in
        switchdev mode from being a netdev only construct to being a
        netdev/IB dev construct. The IB dev is limited to raw Eth queue
        pairs only, but by having an IB dev of this type attached to the
        representor for a switchdev port, it enables DPDK to work on the
        switchdev device.

      * All net related, but needed as infrastructure for the rdma
        driver

   - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers

   - SRP performance updates

   - IB uverbs write path cleanup patch series from Leon

   - Add RDMA_CM support to ib_srpt. This is disabled by default. Users
     need to set the port for ib_srpt to listen on in configfs in order
     for it to be enabled
     (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)

   - TSO and Scatter FCS support in mlx4

   - Refactor of modify_qp routine to resolve problems seen while
     working on new code that is forthcoming

   - More refactoring and updates of RDMA CM for containers support from
     Parav

   - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
     memory' user API features

   - Infrastructure updates for the new IOCTL interface, based on
     increased usage

   - ABI compatibility bug fixes to fully support 32 bit userspace on 64
     bit kernel as was originally intended. See the commit messages for
     extensive details

   - Syzkaller bugs and code cleanups motivated by them"

* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
  IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
  IB/mlx5: Device memory mr registration support
  net/mlx5: Mkey creation command adjustments
  IB/mlx5: Device memory support in mlx5_ib
  net/mlx5: Query device memory capabilities
  IB/uverbs: Add device memory registration ioctl support
  IB/uverbs: Add alloc/free dm uverbs ioctl support
  IB/uverbs: Add device memory capabilities reporting
  IB/uverbs: Expose device memory capabilities to user
  RDMA/qedr: Fix wmb usage in qedr
  IB/rxe: Removed GID add/del dummy routines
  RDMA/qedr: Zero stack memory before copying to user space
  IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
  IB/mlx5: Add information for querying IPsec capabilities
  IB/mlx5: Add IPsec support for egress and ingress
  {net,IB}/mlx5: Add ipsec helper
  IB/mlx5: Add modify_flow_action_esp verb
  IB/mlx5: Add implementation for create and destroy action_xfrm
  IB/uverbs: Introduce ESP steering match filter
  IB/uverbs: Add modify ESP flow_action
  ...

Merge tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration

Pull mailbox updates from Jassi Brar:

- New Hi3660 mailbox driver

- Fix TEGRA Kconfig warning

- Broadcom: use dma_pool_zalloc instead of dma_pool_alloc+memset

* tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: Add support for Hi3660 mailbox
  dt-bindings: mailbox: Introduce Hi3660 controller binding
  mailbox: tegra: relax TEGRA_HSP_MBOX Kconfig dependencies
  maillbox: bcm-flexrm-mailbox: Use dma_pool_zalloc()

MAINTAINERS: Update LEAKING_ADDRESSES

MAINTAINERS is out of date for leaking_addresses.pl. There is now a tree on
kernel.org for development of this script. We have a second maintainer now,
thanks Tycho. Development of this scripts was started on kernel-hardening
mailing list so let's keep it there.

Update maintainer details; Add mailing list, kernel.org hosted tree, and second
maintainer.

Signed-off-by: Tobin C. Harding <[email protected]>

MIPS: BCM47XX: Use standard reset button for Luxul XWR-1750

The original patch submitted for support of the Luxul XWR-1750 used a
non-standard button handler for the reset button. This patch will allow
using the standard KEY_RESTART

Signed-off-by: Dan Haab <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Hauke Mehrtens <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/18981/
Signed-off-by: James Hogan <[email protected]>

leaking_addresses: check if file name contains address

Sometimes files may be created by using output from printk. As the scan
traverses the directory tree we should parse each path name and check if
it is leaking an address.

Add check for leaking address on each path name.

Suggested-by: Tycho Andersen <[email protected]>
Acked-by: Tycho Andersen <[email protected]>
Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: explicitly name variable used in regex

Currently sub routine may_leak_address() is checking regex against Perl
special variable $_ which is _fortunately_ being set correctly in a loop
before this sub routine is called. We already have declared a variable
to hold this value '$line' we should use it.

Use $line in regex match instead of implicit $_

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: remove version number

We have git now, we don't need a version number. This was originally
added because leaking_addresses.pl shamelessly (and mindlessly) copied
checkpatch.pl

Remove version number from script.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: skip '/proc/1/syscall'

The pointers listed in /proc/1/syscall are user pointers, and negative
syscall args will show up like kernel addresses.

For example

/proc/31808/syscall: 0 0x3 0x55b107a38180 0x2000 0xffffffffffffffb0 \
0x55b107a302d0 0x55b107a38180 0x7fffa313b8e8 0x7ff098560d11

Skip parsing /proc/1/syscall

Suggested-by: Tycho Andersen <[email protected]>
Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: skip all /proc/PID except /proc/1

When the system is idle it is likely that most files under /proc/PID
will be identical for various processes. Scanning _all_ the PIDs under
/proc is unnecessary and implies that we are thoroughly scanning /proc.
This is _not_ the case because there may be ways userspace can trigger
creation of /proc files that leak addresses but were not present during
a scan. For these two reasons we should exclude all PID directories
under /proc except '1/'

Exclude all /proc/PID except /proc/1.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: cache architecture name

Currently we are repeatedly calling `uname -m`.  This is causing the
script to take a long time to run (more than 10 seconds to parse
/proc/kallsyms).  We can use Perl state variables to cache the result of
the first call to `uname -m`.  With this change in place the script
scans the whole kernel in under a minute.

Cache machine architecture in state variable.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: simplify path skipping

Currently script has multiple configuration arrays. This is confusing,
evident by the fact that a bunch of the entries are in the wrong place.
We can simplify the code by just having a single array for absolute
paths to skip and a single array for file names to skip wherever they
appear in the scanned directory tree. There are also currently multiple
subroutines to handle the different arrays, we can reduce these to a
single subroutine also.

Simplify the path skipping code.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: do not parse binary files

Currently script parses binary files. Since we are scanning for
readable kernel addresses there is no need to parse binary files. We
can use Perl to check if file is binary and skip parsing it if so.

Do not parse binary files.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: add 32-bit support

Currently script only supports x86_64 and ppc64.  It would be nice to be
able to scan 32-bit machines also.  We can add support for 32-bit
architectures by modifying how we check for false positives, taking
advantage of the page offset used by the kernel, and using the correct
regular expression.

Support for 32-bit machines is enabled by the observation that the kernel
addresses on 32-bit machines are larger [in value] than the page offset.
We can use this to filter false positives when scanning the kernel for
leaking addresses.

Programmatic determination of the running architecture is not
immediately obvious (current 32-bit machines return various strings from
`uname -m`).  We therefore provide a flag to enable scanning of 32-bit
kernels.  Also we can check the kernel config file for the offset and if
not found default to 0xc0000000.  A command line option to parse in the
page offset is also provided.  We do automatically detect architecture
if running on ix86.

Add support for 32-bit kernels.  Add a command line option for page
offset.

Suggested-by: Kaiwan N Billimoria <[email protected]>
Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: add is_arch() wrapper subroutine

Currently there is duplicate code when checking the architecture type.
We can remove the duplication by implementing a wrapper function
is_arch().

Implement and use wrapper function is_arch().

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: use system command to get arch

Currently script uses Perl to get the machine architecture. This can be
erroneous since Perl uses the architecture of the machine that Perl was
compiled on not the architecture of the running machine. We should use
the systems `uname` command instead.

Use `uname -m` instead of Perl to get the machine architecture.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: add support for 5 page table levels

Currently script only supports 4 page table levels because of the way
the kernel address regular expression is crafted. We can do better than
this. Using previously added support for kernel configuration options we
can get the number of page table levels defined by
CONFIG_PGTABLE_LEVELS. Using this value a correct regular expression can
be crafted. This only supports 5 page tables on x86_64.

Add support for 5 page table levels on x86_64.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: add support for kernel config file

Features that rely on the ability to get kernel configuration options
are ready to be implemented in script. In preparation for this we can
add support for kernel config options as a separate patch to ease
review.

Add support for locating and parsing kernel configuration file.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: add range check for vsyscall memory

Currently script checks only first and last address in the vsyscall
memory range. We can do better than this. When checking for false
positives against $match, we can convert $match to a hexadecimal value
then check if it lies within the range of vsyscall addresses.

Check whole range of vsyscall addresses when checking for false
positive.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: indent dependant options

A number of the command line options to script are dependant on the
option --input-raw being set. If we indent these options it makes
explicit this dependency.

Indent options dependant on --input-raw.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: remove command examples

Currently help output includes command examples. These were cute when we
first started development of this script but are unnecessary.

Remove command examples.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: remove mention of kptr_restrict

leaking_addresses.pl can be run with kptr_restrict==0 now, we don't need
the comment about setting kptr_restrict any more.

Remove comment suggesting setting kptr_restrict.

Signed-off-by: Tobin C. Harding <[email protected]>

leaking_addresses: fix typo function not called

Currently code uses a check against an undefined variable because the
variable is a sub routine name and is not evaluated.

Evaluate subroutine; add parenthesis to sub routine name.

Signed-off-by: Tobin C. Harding <[email protected]>

pstore: fix crypto dependencies without compression

Commit 58eb5b670747 ("pstore: fix crypto dependencies") fixed up the crypto
dependencies but missed the case when no compression is selected.

With CONFIG_PSTORE=y, CONFIG_PSTORE_COMPRESS=n and CONFIG_CRYPTO=m we see
the following link error:

fs/pstore/platform.o: In function `pstore_register':
(.text+0x1b1): undefined reference to `crypto_has_alg'
(.text+0x205): undefined reference to `crypto_alloc_base'
fs/pstore/platform.o: In function `pstore_unregister':
(.text+0x3b0): undefined reference to `crypto_destroy_tfm'

Fix this by checking at compile-time if CONFIG_PSTORE_COMPRESS is enabled.

Fixes: 58eb5b670747 ("pstore: fix crypto dependencies")
Signed-off-by: Tobias Regnery <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Kees Cook <[email protected]>

Merge tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
"A bigger than usual pull request for SELinux, 13 patches (lucky!)
  along with a scary looking diffstat.

  Although if you look a bit closer, excluding the usual minor
  tweaks/fixes, there are really only two significant changes in this
  pull request: the addition of proper SELinux access controls for SCTP
  and the encapsulation of a lot of internal SELinux state.

  The SCTP changes are the result of a multi-month effort (maybe even a
  year or longer?) between the SELinux folks and the SCTP folks to add
  proper SELinux controls. A special thanks go to Richard for seeing
  this through and keeping the effort moving forward.

  The state encapsulation work is a bit of janitorial work that came out
  of some early work on SELinux namespacing. The question of namespacing
  is still an open one, but I believe there is some real value in the
  encapsulation work so we've split that out and are now sending that up
  to you"

* tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: wrap AVC state
  selinux: wrap selinuxfs state
  selinux: fix handling of uninitialized selinux state in get_bools/classes
  selinux: Update SELinux SCTP documentation
  selinux: Fix ltp test connect-syscall failure
  selinux: rename the {is,set}_enforcing() functions
  selinux: wrap global selinux state
  selinux: fix typo in selinux_netlbl_sctp_sk_clone declaration
  selinux: Add SCTP support
  sctp: Add LSM hooks
  sctp: Add ip option support
  security: Add support for SCTP security hooks
  netlabel: If PF_INET6, check sk_buff ip header version

Merge tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"We didn't have anything to send for v4.16, but we're back with a
  little more than usual for v4.17.

  Eleven patches in total, most fall into the small fix category, but
  there are three non-trivial changes worth calling out:

   - the audit entry filter is being removed after deprecating it for
     quite a while (years of no one really using it because it turns out
     to be not very practical)

   - created our own version of "__mutex_owner()" because the locking
     folks were upset we were using theirs

   - improved our handling of kernel command line parameters to make
     them more forgiving

   - we fixed auditing of symlink operations

  Everything passes the audit-testsuite and as of a few minutes ago it
  merges well with your tree"

* tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: add refused symlink to audit_names
  audit: remove path param from link denied function
  audit: link denied should not directly generate PATH record
  audit: make ANOM_LINK obey audit_enabled and audit_dummy_context
  audit: do not panic on invalid boot parameter
  audit: track the owner of the command mutex ourselves
  audit: return on memory error to avoid null pointer dereference
  audit: bail before bug check if audit disabled
  audit: deprecate the AUDIT_FILTER_ENTRY filter
  audit: session ID should not set arch quick field pointer
  audit: update bugtracker and source URIs

Merge tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
"This cycle was almost entirely improvements to the pstore compression
  options, noted below:

   - Add lz4hc and 842 to pstore compression options (Geliang Tang)

   - Refactor to use crypto compression API (Geliang Tang)

   - Fix up Kconfig dependencies for compression (Arnd Bergmann)

   - Allow for run-time compression selection

   - Remove stack VLA usage"

* tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: fix crypto dependencies
  pstore: Use crypto compress API
  pstore/ram: Do not use stack VLA for parity workspace
  pstore: Select compression at runtime
  pstore: Avoid size casts for 842 compression
  pstore: Add lz4hc and 842 compression support

Merge branch 'akpm' (patches from Andrew)

Merge updates from Andrew Morton:

- a few misc things

- ocfs2 updates

- the v9fs maintainers have been missing for a long time. I've taken
   over v9fs patch slinging.

- most of MM

* emailed patches from Andrew Morton <[email protected]>: (116 commits)
  mm,oom_reaper: check for MMF_OOM_SKIP before complaining
  mm/ksm: fix interaction with THP
  mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
  headers: untangle kmemleak.h from mm.h
  include/linux/mmdebug.h: make VM_WARN* non-rvals
  mm/page_isolation.c: make start_isolate_page_range() fail if already isolated
  mm: change return type to vm_fault_t
  mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
  mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
  kernel/fork.c: detect early free of a live mm
  mm: make counting of list_lru_one::nr_items lockless
  mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static
  block_invalidatepage(): only release page if the full page was invalidated
  mm: kernel-doc: add missing parameter descriptions
  mm/swap.c: remove @cold parameter description for release_pages()
  mm/nommu: remove description of alloc_vm_area
  zram: drop max_zpage_size and use zs_huge_class_size()
  zsmalloc: introduce zs_huge_class_size()
  mm: fix races between swapoff and flush dcache
  fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
  ...

make lookup_one_len() safe to use with directory locked shared

Signed-off-by: Al Viro <[email protected]>

new helper: __lookup_slow()

lookup_slow() sans locking/unlocking the directory

Signed-off-by: Al Viro <[email protected]>

merge common parts of lookup_one_len{,_unlocked} into common helper

Signed-off-by: Al Viro <[email protected]>

Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd

Pull MTD updates from Boris Brezillon:
"MTD Core:
   - Remove support for asynchronous erase (not implemented by any of
     the existing drivers anyway)
   - Remove Cyrille from the list of SPI NOR and MTD maintainers
   - Fix kernel doc headers
   - Allow users to define the partitions parsers they want to test
     through a DT property (compatible of the partitions subnode)
   - Remove the bfin-async-flash driver (the only architecture using it
     has been removed)
   - Fix pagetest test
   - Add extra checks in mtd_erase()
   - Simplify the MTD partition creation logic and get rid of
     mtd_add_device_partitions()

  MTD Drivers:
   - Add endianness information to the physmap DT binding
   - Add Eon EN29LV400A IDs to JEDEC probe logic
   - Use %*ph where appropriate

  SPI NOR Drivers:
   - Make fsl-quaspi assign different names to MTD devices connected to
     the same QSPI controller
   - Remove an unneeded driver.bus assigned in the fsl-qspi driver

  NAND Core:
   - Prepare arrival of the SPI NAND subsystem by implementing a generic
     (interface-agnostic) layer to ease manipulation of NAND devices
   - Move onenand code base to the drivers/mtd/nand/ dir
   - Rework timing mode selection
   - Provide a generic way for NAND chip drivers to flag a specific
     GET/SET FEATURE operation as supported/unsupported
   - Stop embedding ONFI/JEDEC param page in nand_chip

  NAND Drivers:
   - Rework/cleanup of the mxc driver
   - Various cleanups in the vf610 driver
   - Migrate the fsmc and vf610 to ->exec_op()
   - Get rid of the pxa driver (replaced by marvell_nand)
   - Support ->setup_data_interface() in the GPMI driver
   - Fix probe error path in several drivers
   - Remove support for unused hw_syndrome mode in sunxi_nand
   - Various minor improvements"

* tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd: (89 commits)
  dt-bindings: fsl-quadspi: Add the example of two SPI NOR
  mtd: fsl-quadspi: Distinguish the mtd device names
  mtd: nand: Fix some function description mismatches in core.c
  mtd: fsl-quadspi: Remove unneeded driver.bus assignment
  mtd: rawnand: marvell: Rename ->ecc_clk into ->core_clk
  mtd: rawnand: s3c2410: enhance the probe function error path
  mtd: rawnand: tango: fix probe function error path
  mtd: rawnand: sh_flctl: fix the probe function error path
  mtd: rawnand: omap2: fix the probe function error path
  mtd: rawnand: mxc: fix probe function error path
  mtd: rawnand: denali: fix probe function error path
  mtd: rawnand: davinci: fix probe function error path
  mtd: rawnand: cafe: fix probe function error path
  mtd: rawnand: brcmnand: fix probe function error path
  mtd: rawnand: sunxi: Stop supporting ECC_HW_SYNDROME mode
  mtd: rawnand: marvell: Fix clock resource by adding a register clock
  mtd: ftl: Use DIV_ROUND_UP()
  mtd: Fix some function description mismatches in mtdcore.c
  mtd: physmap_of: update struct map_info's swap as per map requirement
  dt-bindings: mtd-physmap: Add endianness supports
  ...

Merge tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- DM core passthrough ioctl fix to retain reference to DM table, and
   that table's block devices, while issuing the ioctl to one of those
   block devices.

- DM core passthrough ioctl fix to _not_ override the fmode_t used to
   issue the ioctl. Overriding by using the fmode_t that the block
   device was originally open with during DM table load is a liability.

- Add DM core support for secure erase forwarding and update the DM
   linear and DM striped targets to support them.

- A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
   same, write zeroes) for targets that make use of the non-splitting IO
   variant (as is done for multipath or thinp when layered directly on
   NVMe).

- Allow DM targets to return a payload in response to a DM message that
   they are sent. This is useful for DM targets that would like to
   provide statistics data in response to DM messages.

- Update DM bufio to support non-power-of-2 block sizes. Numerous other
   related changes prepare the DM bufio code for this support.

- Fix DM crypt to use a bounded amount of memory across the entire
   system. This is to avoid OOM that can otherwise occur in response to
   certain pathological IO workloads (e.g. discarding a large DM crypt
   device).

- Add a 'check_at_most_once' feature to the DM verity target to allow
   verity to be used on mobile devices that have very limited resources.

- Fix the DM integrity target to fail early if a keyed algorithm (e.g.
   HMAC) is to be used but the key isn't set.

- Add non-power-of-2 support to the DM unstripe target.

- Eliminate the use of a Variable Length Array in the DM stripe target.

- Update the DM log-writes target to record metadata (REQ_META flag).

- DM raid fixes for its nosync status and some variable range issues.

* tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
  dm: remove fmode_t argument from .prepare_ioctl hook
  dm: hold DM table for duration of ioctl rather than use blkdev_get
  dm raid: fix parse_raid_params() variable range issue
  dm verity: make verity_for_io_block static
  dm verity: add 'check_at_most_once' option to only validate hashes once
  dm bufio: don't embed a bio in the dm_buffer structure
  dm bufio: support non-power-of-two block sizes
  dm bufio: use slab cache for dm_buffer structure allocations
  dm bufio: reorder fields in dm_buffer structure
  dm bufio: relax alignment constraint on slab cache
  dm bufio: remove code that merges slab caches
  dm bufio: get rid of slab cache name allocations
  dm bufio: move dm-bufio.h to include/linux/
  dm bufio: delete outdated comment
  dm: add support for secure erase forwarding
  dm: backfill abnormal IO support to non-splitting IO submission
  dm raid: fix nosync status
  dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
  dm stripe: get rid of a Variable Length Array (VLA)
  dm log writes: record metadata flag for better flags record
  ...

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Assorted stuff, including Christoph's I_DIRTY patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: move I_DIRTY_INODE to fs.h
  ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
  fs: fold open_check_o_direct into do_dentry_open
  vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
  vfs: make sure struct filename->iname is word-aligned
  get rid of pointless includes of fs_struct.h
  [poll] annotate SAA6588_CMD_POLL users

net: phy: marvell: Enable interrupt function on LED2 pin

The LED2[2]/INTn pin on Marvell 88E1318S as well as 88E1510/12/14/18 needs
to be configured to be usable as interrupt not only when WOL is enabled,
but whenever we rely on interrupts from the PHY.

Signed-off-by: Esben Haabendal <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

kvm: x86: fix a prototype warning

Make the function static to avoid a

warning: no previous prototype for ‘vmx_enable_tdp’

Signed-off-by: Peng Hao <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-04-06

This series contains a couple of fixes for the new ice driver.

Wei Yongjun fixes the return error code for error case during init.

Anirudh fixes the incorrect use of ARRAY_SIZE() in the ice ethtool code
and fixed "for" loop calculations.
====================

Signed-off-by: David S. Miller <[email protected]>

ARM: sa1100/simpad: switch simpad CF to use gpiod APIs

Switch simpad's CF implementation to use the gpiod APIs. The inverted
detection is handled using gpiolib's native inversion abilities.

Signed-off-by: Russell King <[email protected]>

ARM: sa1100/shannon: convert to generic CF sockets

Convert shannon to use the generic CF socket support.

Signed-off-by: Russell King <[email protected]>

ARM: sa1100/nanoengine: convert to generic CF sockets

Convert nanoengine to use the generic CF socket support.
Makefile fix from Arnd Bergmann <[email protected]>.

Signed-off-by: Russell King <[email protected]>

ice: Bug fixes in ethtool code

1) Return correct size from ice_get_regs_len.
2) Fix incorrect use of ARRAY_SIZE in ice_get_regs.

Fixes: fcea6f3da546 (ice: Add stats and ethtool support)
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Fix error return code in ice_init_hw()

Fix to return error code ICE_ERR_NO_MEMORY from the alloc error
handling case instead of 0, as done elsewhere in this function.

Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology")
Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Anirudh Venkataramanan <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

Merge remote-tracking branch 'lorenzo/pci/cadence' into next

* lorenzo/pci/cadence:
MAINTAINERS: Add missing /drivers/pci/cadence directory entry

fscache: Maintain a catalogue of allocated cookies

Maintain a catalogue of allocated cookies so that cookie collisions can be
handled properly. For the moment, this just involves printing a warning
and returning a NULL cookie to the caller of fscache_acquire_cookie(), but
in future it might make sense to wait for the old cookie to finish being
cleaned up.

This requires the cookie key to be stored attached to the cookie so that we
still have the key available if the netfs relinquishes the cookie. This is
done by an earlier patch.

The catalogue also renders redundant fscache_netfs_list (used for checking
for duplicates), so that can be removed.

Signed-off-by: David Howells <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Tested-by: Steve Dickson <[email protected]>

fscache: Pass object size in rather than calling back for it

Pass the object size in to fscache_acquire_cookie() and
fscache_write_page() rather than the netfs providing a callback by which it
can be received. This makes it easier to update the size of the object
when a new page is written that extends the object.

The current object size is also passed by fscache to the check_aux
function, obviating the need to store it in the aux data.

Signed-off-by: David Howells <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Tested-by: Steve Dickson <[email protected]>

init, tracing: Have printk come through the trace events for initcall_debug

With trace events set before and after the initcall function calls, instead
of having a separate routine for printing out the initcalls when
initcall_debug is specified on the kernel command line, have the code
register a callback to the tracepoints where the initcall trace events are.

This removes the need for having a separate function to do the initcalls as
the tracepoint callbacks can handle the printk. It also includes other
initcalls that are not called by the do_one_initcall() which includes
console and security initcalls.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>

init, tracing: instrument security and console initcall trace events

Trace events have been added around the initcall functions defined in
init/main.c. But console and security have their own initcalls. This adds
the trace events associated for those initcall functions.

Link: http://lkml.kernel.org/r/[email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Abderrahmane Benbachir <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

init, tracing: Add initcall trace events

Being able to trace the start and stop of initcalls is useful to see where
the timings are an issue. There is already an "initcall_debug" parameter,
but that can cause a large overhead itself, as the printing of the
information may take longer than the initcall functions.

Adding in a start and finish trace event around the initcall functions, as
well as a trace event that records the level of the initcalls, one can get a
much finer measurement of the times and interactions of the initcalls
themselves, as trace events are much lighter than printk()s.

Suggested-by: Abderrahmane Benbachir <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Add rcu dereference annotation for test func that touches filter->prog

A boot up test function update_pred_fn() dereferences filter->prog without
the proper rcu annotation.

To do this, we must also take the event_mutex first. Normally, this isn't
needed because this test function can not race with other use cases that
touch the event filters (it is disabled if any events are enabled).

Reported-by: kbuild test robot <[email protected]>
Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Add rcu dereference annotation for filter->prog

ftrace_function_set_filter() referenences filter->prog without annotation
and sparse complains about it. It needs a rcu_dereference_protected()
wrapper.

Reported-by: kbuild test robot <[email protected]>
Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Fixup logic inversion on setting trace_global_clock defaults

In commit 932066a15335 ("tracing: Default to using trace_global_clock if
sched_clock is unstable"), the logic for deciding to override the
default clock if unstable was reversed from the earlier posting. I was
trying to reduce the width of the message by using an early return
rather than a if-block, but reverted back to using the if-block and
accidentally left the predicate inverted.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 932066a15335 ("tracing: Default to using trace_global_clock if sched_clock is unstable")
Signed-off-by: Chris Wilson <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Hide global trace clock from lockdep

Function tracing can trace in NMIs and such. If the TSC is determined
to be unstable, the tracing clock will switch to the global clock on
boot up, unless "trace_clock" is specified on the kernel command line.

The global clock disables interrupts to access sched_clock_cpu(), and in
doing so can be done within lockdep internals (because of function
tracing and NMIs). This can trigger false lockdep splats.

The trace_clock_global() is special, best not to trace the irq logic
within it.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

ring-buffer: Add set/clear_current_oom_origin() during allocations

As si_mem_available() can say there is enough memory even though the memory
available is not useable by the ring buffer, it is best to not kill innocent
applications because the ring buffer is taking up all the memory while it is
trying to allocate a great deal of memory.

If the allocator is user space (because kernel threads can also increase the
size of the kernel ring buffer on boot up), then after si_mem_available()
says there is enough memory, set the OOM killer to kill the current task if
an OOM triggers during the allocation.

Link: http://lkml.kernel.org/r/[email protected]
Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

ring-buffer: Check if memory is available before allocation

The ring buffer is made up of a link list of pages. When making the ring
buffer bigger, it will allocate all the pages it needs before adding to the
ring buffer, and if it fails, it frees them and returns an error. This makes
increasing the ring buffer size an all or nothing action. When this was
first created, the pages were allocated with "NORETRY". This was to not
cause any Out-Of-Memory (OOM) actions from allocating the ring buffer. But
NORETRY was too strict, as the ring buffer would fail to expand even when
there's memory available, but was taken up in the page cache.

Commit 848618857d253 ("tracing/ring_buffer: Try harder to allocate") changed
the allocating from NORETRY to RETRY_MAYFAIL. The RETRY_MAYFAIL would
allocate from the page cache, but if there was no memory available, it would
simple fail the allocation and not trigger an OOM.

This worked fine, but had one problem. As the ring buffer would allocate one
page at a time, it could take up all memory in the system before it failed
to allocate and free that memory. If the allocation is happening and the
ring buffer allocates all memory and then tries to take more than available,
its allocation will not trigger an OOM, but if there's any allocation that
happens someplace else, that could trigger an OOM, even though once the ring
buffer's allocation fails, it would free up all the previous memory it tried
to allocate, and allow other memory allocations to succeed.

Commit d02bd27bd33dd ("mm/page_alloc.c: calculate 'available' memory in a
separate function") separated out si_mem_availble() as a separate function
that could be used to see how much memory is available in the system. Using
this function to make sure that the ring buffer could be allocated before it
tries to allocate pages we can avoid allocating all memory in the system and
making it vulnerable to OOMs if other allocations are taking place.

Link: http://lkml.kernel.org/r/[email protected]
CC: [email protected]
Cc: [email protected]
Fixes: 848618857d253 ("tracing/ring_buffer: Try harder to allocate")
Requires: d02bd27bd33dd ("mm/page_alloc.c: calculate 'available' memory in a separate function")
Reported-by: Zhaoyang Huang <[email protected]>
Tested-by: Joel Fernandes <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

lockdep: Add print_irqtrace_events() to __warn

Running a test on a x86_32 kernel I triggered a bug that an interrupt
disable/enable isn't being catched by lockdep. At least knowing where the
last one was found would be helpful, but the warnings that are produced do
not show this information. Even without debugging lockdep, having the WARN()
display the last place hard and soft irqs were enabled or disabled is
valuable.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>

vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK)

Commit 841a915d20c7b2 ("printf: Do not have bprintf dereference pointers")
would preprocess various pointers that are dereferenced in the bprintf()
because the recording and printing are done at two different times. Some
pointers stayed dereferenced in the ring buffer because user space could
handle them (namely "%pS" and friends). Pointers that are not dereferenced
should not be processed immediately but instead just saved directly.

Cc: [email protected]
Fixes: 841a915d20c7b2 ("printf: Do not have bprintf dereference pointers")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Uninitialized variable in create_tracing_map_fields()

Smatch complains that idx can be used uninitialized when we check if
(idx < 0). It has to be the first iteration through the loop and the
HIST_FIELD_FL_STACKTRACE bit has to be clear and the HIST_FIELD_FL_VAR
bit has to be set to reach the bug.

Link: http://lkml.kernel.org/r/20180328114815.GC29050@mwanda
Fixes: 30350d65ac56 ("tracing: Add variable support to hist triggers")
Acked-by: Tom Zanussi <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Make sure variable string fields are NULL-terminated

The strncpy() currently being used for variable string fields can
result in a lack of termination if the string length is equal to the
field size. Use the safer strscpy() instead, which will guarantee
termination.

Link: http://lkml.kernel.org/r/fb97c1e518fb358c12a4057d7445ba2c46956cd7.1522256721.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Add action comparisons when testing matching hist triggers

Actions also need to be considered when checking for matching triggers
- triggers differing only by action should be allowed, but currently
aren't because the matching check ignores the action and erroneously
returns -EEXIST.

Add and call an actions_match() function to address that.

Here's an example using onmatch() actions.  The first -EEXIST shouldn't
occur because the onmatch() is different in the second wakeup_latency()
param.  The second -EEXIST shouldn't occur because it's a different
action (in this case, it doesn't have an action, so shouldn't be seen
as being the same and therefore rejected).

In the after case, both are correctly accepted (and trying to add one of
them again returns -EEXIST as it should).

before:

  # echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events
  # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
  # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: File exists
  # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
-su: echo: write error: File exists

after:

  # echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events
  # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
  # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Link: http://lkml.kernel.org/r/a7fd668b87ec10736c8f016ac4279c8480d50c2b.1522256721.git.tom.zanussi@linux.intel.com
Tested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Don't add flag strings when displaying variable references

Variable references should never have flags appended when displayed -
prevent that from happening.

Before:

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:...

After:

  hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:...

Link: http://lkml.kernel.org/r/913318a5610ef6b24af2522575f671fa6ee19b6b.1522256721.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Fix display of hist trigger expressions containing timestamps

When displaying hist triggers, variable references that have the
timestamp field flag set are erroneously displayed as common_timestamp
rather than the variable reference.  Additionally, timestamp
expressions are displayed in the same way.  Fix this by forcing the
timestamp flag handling to follow variable reference and expression
handling.

Before:

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs:...

After:

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
  hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:...

Link: http://lkml.kernel.org/r/92746b06be67499c2a6217bd55395b350ad18fad.1522256721.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

ftrace: Drop a VLA in module_exists()

Avoid a VLA by using a real constant expression instead of a variable.
The compiler should be able to optimize the original code and avoid using
an actual VLA. Anyway this change is useful because it will avoid a false
positive with -Wvla, it might also help the compiler generating better
code.

Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Salvatore Mesoraca <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Mention trace_clock=global when warning about unstable clocks

Mention the alternative of adding trace_clock=global to the kernel
command line when we detect that we've used an unstable clock across a
suspend/resume cycle.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chris Wilson <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Default to using trace_global_clock if sched_clock is unstable

Across suspend, we may see a very large drift in timestamps if the sched
clock is unstable, prompting the global trace's ringbuffer code to warn
and suggest switching to the global clock. Preempt this request by
detecting when the sched clock is unstable (determined during
late_initcall) and automatically switching the default clock over to
trace_global_clock.

This should prevent requiring user interaction to resolve warnings such
as:

    Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857
    If you just came from a suspend/resume,
    please switch to the trace global clock:
    echo global > /sys/kernel/debug/tracing/trace_clock

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chris Wilson <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

mm,oom_reaper: check for MMF_OOM_SKIP before complaining

I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP).  We don't need to complain when
exit_mmap() already set MMF_OOM_SKIP.

  Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: unable to reap pid:7558 (a.out)
  a.out           D13272  7558   6931 0x00100084
  Call Trace:
   schedule+0x2d/0x80
   rwsem_down_write_failed+0x2bb/0x440
   call_rwsem_down_write_failed+0x13/0x20
   down_write+0x49/0x60
   unlink_file_vma+0x28/0x50
   free_pgtables+0x36/0x100
   exit_mmap+0xbb/0x180
   mmput+0x50/0x110
   copy_process.part.41+0xb61/0x1fe0
   _do_fork+0xe6/0x560
   do_syscall_64+0x74/0x230
   entry_SYSCALL_64_after_hwframe+0x42/0xb7

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/ksm: fix interaction with THP

This patch fixes a corner case for KSM.  When two pages belong or
belonged to the same transparent hugepage, and they should be merged,
KSM fails to split the page, and therefore no merging happens.

This bug can be reproduced by:
* making sure ksm is running (in case disabling ksmtuned)
* enabling transparent hugepages
* allocating a THP-aligned 1-THP-sized buffer
  e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
* filling it with the same values
  e.g. memset(p, 42, 1<<21)
* performing madvise to make it mergeable
  e.g. madvise(p, 1<<21, MADV_MERGEABLE)
* waiting for KSM to perform a few scans

The expected outcome is that the all the pages get merged (1 shared and
the rest sharing); the actual outcome is that no pages get merged (1
unshared and the rest volatile)

The reason of this behaviour is that we increase the reference count
once for both pages we want to merge, but if they belong to the same
hugepage (or compound page), the reference counter used in both cases is
the one of the head of the compound page.  This means that
split_huge_page will find a value of the reference counter too high and
will fail.

This patch solves this problem by testing if the two pages to merge
belong to the same hugepage when attempting to merge them.  If so, the
hugepage is split safely.  This means that the hugepage is not split if
not necessary.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Claudio Imbrenda <[email protected]>
Co-authored-by: Gerald Schaefer <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t

This fixes a warning shown when phys_addr_t is 32-bit int when compiling
with clang:

  mm/memblock.c:927:15: warning: implicit conversion from 'unsigned long long'
        to 'phys_addr_t' (aka 'unsigned int') changes value from
        18446744073709551615 to 4294967295 [-Wconstant-conversion]
                                  r->base : ULLONG_MAX;
                                            ^~~~~~~~~~
  ./include/linux/kernel.h:30:21: note: expanded from macro 'ULLONG_MAX'
  #define ULLONG_MAX      (~0ULL)
                           ^~~~~

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Agner <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

headers: untangle kmemleak.h from mm.h

Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
reason.  It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
don't already #include it.  Also remove <linux/kmemleak.h> from source
files that do not use it.

This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
be good to run it through the 0day bot for other $ARCHes.  I have
neither the horsepower nor the storage space for the other $ARCHes.

Update: This patch has been extensively build-tested by both the 0day
bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
for which patches are included here (in v2).

[ slab.h is the second most used header file after module.h; kernel.h is
  right there with slab.h. There could be some minor error in the
  counting due to some #includes having comments after them and I didn't
  combine all of those. ]

[[email protected]: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Reported-by: Michael Ellerman <[email protected]> [2 build failures]
Reported-by: Fengguang Wu <[email protected]> [2 build failures]
Reviewed-by: Andrew Morton <[email protected]>
Cc: Wei Yongjun <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Mimi Zohar <[email protected]>
Cc: John Johansen <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/mmdebug.h: make VM_WARN* non-rvals

At present the construct

if (VM_WARN(...))

will compile OK with CONFIG_DEBUG_VM=y and will fail with
CONFIG_DEBUG_VM=n.  The reason is that VM_{WARN,BUG}* have always been
special wrt.  {WARN/BUG}* and never generate any code when DEBUG_VM is
disabled.  So we cannot really use it in conditionals.

We considered changing things so that this construct works in both cases
but that might cause unwanted code generation with CONFIG_DEBUG_VM=n.
It is safer and simpler to make the build fail in both cases.

[[email protected]: changelog]
Signed-off-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>