Git Repo - linux.git/log

powerpc/powernv: Fix endian issues with sensor code

One OPAL call and one device tree property needed byte swapping.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

md/bitmap: don't abuse i_writecount for bitmap files.

md bitmap code currently tries to use i_writecount to stop any other
process from writing to out bitmap file. But that is really an abuse
and has bit-rotted so locking is all wrong.

So discard that - root should be allowed to shoot self in foot.

Still use it in a much less intrusive way to stop the same file being
used as bitmap on two different array, and apply other checks to
ensure the file is at least vaguely usable for bitmap storage
(is regular, is open for write. Support for ->bmap is already checked
elsewhere).

Reported-by: Al Viro <[email protected]>
Signed-off-by: NeilBrown <[email protected]>

Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
"Highlights:
   - server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
   - xdr fixes (a larger xdr rewrite has been posted but I decided it
     would be better to queue it up for 3.16).
   - miscellaneous fixes and cleanup from all over (thanks especially to
     Kinglong Mee)"

* 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
  nfsd4: don't create unnecessary mask acl
  nfsd: revert v2 half of "nfsd: don't return high mode bits"
  nfsd4: fix memory leak in nfsd4_encode_fattr()
  nfsd: check passed socket's net matches NFSd superblock's one
  SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
  NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
  SUNRPC: New helper for creating client with rpc_xprt
  NFSD: Free backchannel xprt in bc_destroy
  NFSD: Clear wcc data between compound ops
  nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
  nfsd4: fix nfs4err_resource in 4.1 case
  nfsd4: fix setclientid encode size
  nfsd4: remove redundant check from nfsd4_check_resp_size
  nfsd4: use more generous NFS4_ACL_MAX
  nfsd4: minor nfsd4_replay_cache_entry cleanup
  nfsd4: nfsd4_replay_cache_entry should be static
  nfsd4: update comments with obsolete function name
  rpc: Allow xdr_buf_subsegment to operate in-place
  NFSD: Using free_conn free connection
  SUNRPC: fix memory leak of peer addresses in XPRT
  ...

tracepoint: Simplify tracepoint module search

Instead of copying the num_tracepoints and tracepoints_ptrs from
the module structure to the tp_mod structure, which only uses it to
find the module associated to tracepoints of modules that are coming
and going, simply copy the pointer to the module struct to the tracepoint
tp_module structure.

Also removed un-needed brackets around an if statement.

Link: http://lkml.kernel.org/r/[email protected]
Acked-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints

Register/unregister tracepoint probes with struct tracepoint pointer
rather than tracepoint name.

This change, which vastly simplifies tracepoint.c, has been proposed by
Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux
size.

From this point on, the tracers need to pass a struct tracepoint pointer
to probe register/unregister. A probe can now only be connected to a
tracepoint that exists. Moreover, tracers are responsible for
unregistering the probe before the module containing its associated
tracepoint is unloaded.

   text    data     bss     dec     hex filename
10443444        4282528 10391552        25117524        17f4354 vmlinux.orig
10434930        4282848 10391552        25109330        17f2352 vmlinux

Link: http://lkml.kernel.org/r/[email protected]
CC: Ingo Molnar <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Frank Ch. Eigler <[email protected]>
CC: Johannes Berg <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
[ SDR - fixed return val in void func in tracepoint_module_going() ]
Signed-off-by: Steven Rostedt <[email protected]>

Merge branch 'akpm' (incoming from Andrew)

Merge a few more patches from Andrew Morton:
"A few leftovers"

* emailed patches from Andrew Morton <[email protected]>:
  fs/ncpfs/dir.c: fix indenting in ncp_lookup()
  ncpfs/inode.c: fix mismatch printk formats and arguments
  ncpfs: remove now unused PRINTK macro
  ncpfs: convert PPRINTK to ncp_vdbg
  ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
  ncpfs: Add pr_fmt and convert printks to pr_<level>
  arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()
  lib/percpu_counter.c: fix bad percpu counter state during suspend
  autofs4: check dev ioctl size before allocating
  mm: vmscan: do not swap anon pages just because free+file is low

fs/ncpfs/dir.c: fix indenting in ncp_lookup()

My static checker suggests adding curly braces here. Probably that was
the intent, but actually the code works the same either way. I've just
changed the indenting and left the code as-is.

Signed-off-by: Dan Carpenter <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Acked-by: Dave Chiluk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ncpfs/inode.c: fix mismatch printk formats and arguments

Conversions to ncp_dbg showed some format/argument mismatches so fix
them.

Signed-off-by: Joe Perches <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ncpfs: remove now unused PRINTK macro

Uses are gone, remove the macro.

Signed-off-by: Joe Perches <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ncpfs: convert PPRINTK to ncp_vdbg

Use a more current logging style.

Convert the paranoia debug statement to vdbg.
Remove the embedded function names as dynamic_debug can do that.

Signed-off-by: Joe Perches <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg

Use a more current logging style and enable use of dynamic debugging.

Remove embedded function names, dynamic debug can add this instead.

Signed-off-by: Joe Perches <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ncpfs: Add pr_fmt and convert printks to pr_<level>

Convert to a more current logging style.

Add pr_fmt to prefix with "ncpfs: ".
Remove the embedded function names and use "%s: ", __func__

Some previously unprefixed messages now have "ncpfs: "

Signed-off-by: Joe Perches <[email protected]>
Cc: Petr Vandrovec <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()

Kmemcheck should use the preferred interface for parsing command line
arguments, kstrto*(), rather than sscanf() itself. Use it
appropriately.

Signed-off-by: David Rientjes <[email protected]>
Cc: Vegard Nossum <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/percpu_counter.c: fix bad percpu counter state during suspend

I got a bug report yesterday from Laszlo Ersek in which he states that
his kvm instance fails to suspend.  Laszlo bisected it down to this
commit 1cf7e9c68fe8 ("virtio_blk: blk-mq support") where virtio-blk is
converted to use the blk-mq infrastructure.

After digging a bit, it became clear that the issue was with the queue
drain.  blk-mq tracks queue usage in a percpu counter, which is
incremented on request alloc and decremented when the request is freed.
The initial hunt was for an inconsistency in blk-mq, but everything
seemed fine.  In fact, the counter only returned crazy values when
suspend was in progress.

When a CPU is unplugged, the percpu counters merges that CPU state with
the general state.  blk-mq takes care to register a hotcpu notifier with
the appropriate priority, so we know it runs after the percpu counter
notifier.  However, the percpu counter notifier only merges the state
when the CPU is fully gone.  This leaves a state transition where the
CPU going away is no longer in the online mask, yet it still holds
private values.  This means that in this state, percpu_counter_sum()
returns invalid results, and the suspend then hangs waiting for
abs(dead-cpu-value) requests to complete which of course will never
happen.

Fix this by clearing the state earlier, so we never have a case where
the CPU isn't in online mask but still holds private state.  This bug
has been there since forever, I guess we don't have a lot of users where
percpu counters needs to be reliable during the suspend cycle.

Signed-off-by: Jens Axboe <[email protected]>
Reported-by: Laszlo Ersek <[email protected]>
Tested-by: Laszlo Ersek <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

autofs4: check dev ioctl size before allocating

There wasn't any check of the size passed from userspace before trying
to allocate the memory required.

This meant that userspace might request more space than allowed,
triggering an OOM.

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: vmscan: do not swap anon pages just because free+file is low

Page reclaim force-scans / swaps anonymous pages when file cache drops
below the high watermark of a zone in order to prevent what little cache
remains from thrashing.

However, on bigger machines the high watermark value can be quite large
and when the workload is dominated by a static anonymous/shmem set, the
file set might just be a small window of used-once cache. In such
situations, the VM starts swapping heavily when instead it should be
recycling the no longer used cache.

This is a longer-standing problem, but it's more likely to trigger after
commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy")
because file pages can no longer accumulate in a single zone and are
dispersed into smaller fractions among the available zones.

To resolve this, do not force scan anon when file pages are low but
instead rely on the scan/rotation ratios to make the right prediction.

Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Cc: <[email protected]> [3.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/block/loop.c: ratelimit error messages

Metric tons of high speed spew is not helpful when things go pear shaped.
systemd lost its mind, forgot how to stop services it insists on being
sole manager of, massive printk() flood ensued, box eventually died.

[16206.684000] loop: Write error at byte offset 11412291584, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
[16206.684000] loop: Write error at byte offset 13155434496, length 4096.
[16206.684000] loop: Write error at byte offset 13155438592, length 4096.
[16206.684000] loop: Write error at byte offset 13155442688, length 4096.
[16206.684000] loop: Write error at byte offset 13960736768, length 4096.
[16206.684000] loop: Write error at byte offset 14229172224, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.
[16206.684000] loop: Write error at byte offset 14766043136, length 4096.
[16206.684000] loop: Write error at byte offset 15034478592, length 4096.
[16206.684000] systemd-journald[1758]: /dev/kmsg buffer overrun, some messages lost.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge remote-tracking branches 'asoc/fix/rt5640', 'asoc/fix/samsung', 'asoc/fix/tlv320aic23' and 'asoc/fix/warn' into asoc-linus

Merge remote-tracking branches 'asoc/fix/alc5632', 'asoc/fix/cs42l52', 'asoc/fix/cs42xxx8', 'asoc/fix/da732x', 'asoc/fix/davinci', 'asoc/fix/fsl-sai', 'asoc/fix/fsl-ssi' and 'asoc/fix/max98090' into asoc-linus

Merge tag 'asoc-v3.15-4' into asoc-linus

ASoC: Final updates for v3.15 merge window

A few more updates from last week - use of the tdm_slot mapping from
Xiubo plus a few smaller fixes and cleanups.

# gpg: Signature made Mon 31 Mar 2014 10:49:42 BST using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"

Merge tag 'asoc-v3.15-3' into asoc-linus

ASoC: Updates for v3.15

A few more updates for the merge window:

- Fixes for the simple-card DAI format DT mess.
- A new driver for Cirrus cs42xx8 devices.
- DT support for a couple more devices.
- A revert of a previous buggy fix for soc-pcm, plus a few more fixes
   and cleanups.

# gpg: Signature made Sun 23 Mar 2014 16:56:11 GMT using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"

Merge tag 'asoc-v3.15-2' into asoc-linus

ASoC: Updates for v3.15

This is mostly a few additional fixes from Lars-Peter, a new driver and
cleaning up a git failure with merging the Intel branch (combined with
an xargs failure to pay attention to error codes).  The history lists a
bunch of additional commits for the branch but the content of those
commits is actually present already but not recorded in history due to
git failing.  Unfortunately xargs is used in the merge script and it
doesn't do a good job of noticing errors from the commands it invokes.

# gpg: Signature made Thu 13 Mar 2014 14:25:44 GMT using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"

Merge tag 'asoc-v3.15' into asoc-linus

ASoC: Updates for v3.15

Quite a busy release for ASoC this time, more on janitorial work than
exciting new features but welcome nontheless:

- Lots of cleanups from Takashi for enumerations; the original API for
   these was error prone so he's refactored lots of code to use more
   modern APIs which avoid issues.
- Elimination of the ASoC level wrappers for I2C and SPI moving us
   closer to converting to regmap completely and avoiding some
   randconfig hassle.
- Provide both manually and transparently locked DAPM APIs rather than
   a mix of the two fixing some concurrency issues.
- Start converting CODEC drivers to use separate bus interface drivers
   rather than having them all in one file helping avoid dependency
   issues.
- DPCM support for Intel Haswell and Bay Trail platforms.
- Lots of work on improvements for simple-card, DaVinci and the Renesas
   rcar drivers.
- New drivers for Analog Devices ADAU1977, TI PCM512x and parts of the
   CSR SiRF SoC.

# gpg: Signature made Wed 12 Mar 2014 23:05:45 GMT using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"
# gpg:                 aka "Mark Brown <[email protected]>"

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull more networking updates from David Miller:

1) If a VXLAN interface is created with no groups, we can crash on
    reception of packets.  Fix from Mike Rapoport.

2) Missing includes in CPTS driver, from Alexei Starovoitov.

3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
    and Dan Carpenter.

4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers.  From
    Josh Boyer.

5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
    Borkmann.

6) Byte-Queue-Limit enabled drivers aren't handled properly in
    AF_PACKET transmit path, also from Daniel Borkmann.

    Same problem exists in pktgen, and Daniel fixed it there too.

7) Fix resource leaks in driver probe error paths of new sxgbe driver,
    from Francois Romieu.

8) Truesize of SKBs can gradually get more and more corrupted in NAPI
    packet recycling path, fix from Eric Dumazet.

9) Fix uniprocessor netfilter build, from Florian Westphal.  In the
    longer term we should perhaps try to find a way for ARRAY_SIZE() to
    work even with zero sized array elements.

10) Fix crash in netfilter conntrack extensions due to mis-estimation of
    required extension space.  From Andrey Vagin.

11) Since we commit table rule updates before trying to copy the
    counters back to userspace (it's the last action we perform), we
    really can't signal the user copy with an error as we are beyond the
    point from which we can unwind everything.  This causes all kinds of
    use after free crashes and other mysterious behavior.

    From Thomas Graf.

12) Restore previous behvaior of div/mod by zero in BPF filter
    processing.  From Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net: sctp: wake up all assocs if sndbuf policy is per socket
  isdnloop: several buffer overflows
  netdev: remove potentially harmful checks
  pktgen: fix xmit test for BQL enabled devices
  net/at91_ether: avoid NULL pointer dereference
  tipc: Let tipc_release() return 0
  at86rf230: fix MAX_CSMA_RETRIES parameter
  mac802154: fix duplicate #include headers
  sxgbe: fix duplicate #include headers
  net: filter: be more defensive on div/mod by X==0
  netfilter: Can't fail and free after table replacement
  xen-netback: Trivial format string fix
  net: bcmgenet: Remove unnecessary version.h inclusion
  net: smc911x: Remove unused local variable
  bonding: Inactive slaves should keep inactive flag's value
  netfilter: nf_tables: fix wrong format in request_module()
  netfilter: nf_tables: set names cannot be larger than 15 bytes
  netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
  netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
  netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
  ...

Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull more staging patches from Greg KH:
"Here are some more staging patches for 3.15-rc1.

  They include a late-submission of a wireless driver that a bunch of
  people seem to have the hardware for now.  As it's stand-alone, it
  should be fine (now passes the 0-day random build bot tests).

  There are also some fixes for the unisys drivers, as they were causing
  havoc on a number of different machines.  To resolve all of those
  issues, we just mark the driver as BROKEN now, and we can fix it up
  "properly" over time"

* tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8723au: The 8723 only has two paths
  Staging: unisys: mark drivers as BROKEN
  Staging: unisys: verify that a control channel exists
  staging: unisys: Add missing close parentheses in filexfer.c
  staging: r8723au: Fix build problem when RFKILL is not selected
  staging: r8723au: Fix randconfig build errors
  staging: r8723au: Turn on build of new driver
  staging: r8723au: Additional source patches
  staging: r8723au: Add source files for new driver - part 4
  staging: r8723au: Add source files for new driver - part 3
  staging: r8723au: Add source files for new driver - part 2
  staging: r8723au: Add source files for new driver - part 1

Merge branch 'acpi-config'

* acpi-config:
ACPI: Update the ACPI spec information in Kconfig

ACPI: Update the ACPI spec information in Kconfig

The UEFI Forum included the ACPI spec in its portfolio in October 2013
and will host future spec iterations, following the ACPI v5.0a release.

A UEFI Forum working group named ACPI Specification Working Group (ASWG)
has been established to handle future ACPI developments, any UEFI member
can join the group and contribute to ACPI specification.

So update the ownership and developers for ACPI in Kconfig accordingly,
and add another website link to ACPI specification too.

Signed-off-by: Hanjun Guo <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull second set of arm64 updates from Catalin Marinas:
"A second pull request for this merging window, mainly with fixes and
  docs clarification:

   - Documentation clarification on CPU topology and booting
     requirements
   - Additional cache flushing during boot (needed in the presence of
     external caches or under virtualisation)
   - DMA range invalidation fix for non cache line aligned buffers
   - Build failure fix with !COMPAT
   - Kconfig update for STRICT_DEVMEM"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix DMA range invalidation for cache line unaligned buffers
  arm64: Add missing Kconfig for CONFIG_STRICT_DEVMEM
  arm64: fix !CONFIG_COMPAT build failures
  Revert "arm64: virt: ensure visibility of __boot_cpu_mode"
  arm64: Relax the kernel cache requirements for boot
  arm64: Update the TCR_EL1 translation granule definitions for 16K pages
  ARM: topology: Make it clear that all CPUs need to be described

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull second set of s390 patches from Martin Schwidefsky:
"The second part of Heikos uaccess rework, the page table walker for
  uaccess is now a thing of the past (yay!)

  The code change to fix the theoretical TLB flush problem allows us to
  add a TLB flush optimization for zEC12, this machine has new
  instructions that allow to do CPU local TLB flushes for single pages
  and for all pages of a specific address space.

  Plus the usual bug fixing and some more cleanup"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uaccess: rework uaccess code - fix locking issues
  s390/mm,tlb: optimize TLB flushing for zEC12
  s390/mm,tlb: safeguard against speculative TLB creation
  s390/irq: Use defines for external interruption codes
  s390/irq: Add defines for external interruption codes
  s390/sclp: add timeout for queued requests
  kvm/s390: also set guest pages back to stable on kexec/kdump
  lcs: Add missing destroy_timer_on_stack()
  s390/tape: Add missing destroy_timer_on_stack()
  s390/tape: Use del_timer_sync()
  s390/3270: fix crash with multiple reset device requests
  s390/bitops,atomic: add missing memory barriers
  s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6

net: sctp: wake up all assocs if sndbuf policy is per socket

SCTP charges chunks for wmem accounting via skb->truesize in
sctp_set_owner_w(), and sctp_wfree() respectively as the
reverse operation. If a sender runs out of wmem, it needs to
wait via sctp_wait_for_sndbuf(), and gets woken up by a call
to __sctp_write_space() mostly via sctp_wfree().

__sctp_write_space() is being called per association. Although
we assign sk->sk_write_space() to sctp_write_space(), which
is then being done per socket, it is only used if send space
is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
is set and therefore not invoked in sock_wfree().

Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
again when transmitting packet") fixed an issue where in case
sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again
unless it is interrupted by a signal. However, a still
remaining issue is that if net.sctp.sndbuf_policy=0, that is
accounting per socket, and one-to-many sockets are in use,
the reclaimed write space from sctp_wfree() is 'unfairly'
handed back on the server to the association that is the lucky
one to be woken up again via __sctp_write_space(), while
the remaining associations are never be woken up again
(unless by a signal).

The effect disappears with net.sctp.sndbuf_policy=1, that
is wmem accounting per association, as it guarantees a fair
share of wmem among associations.

Therefore, if we have reclaimed memory in case of per socket
accounting, wake all related associations to a socket in a
fair manner, that is, traverse the socket association list
starting from the current neighbour of the association and
issue a __sctp_write_space() to everyone until we end up
waking ourselves. This guarantees that no association is
preferred over another and even if more associations are
taken into the one-to-many session, all receivers will get
messages from the server and are not stalled forever on
high load. This setting still leaves the advantage of per
socket accounting in touch as an association can still use
up global limits if unused by others.

Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Thomas Graf <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: Vlad Yasevich <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ARM: 8016/1: Check cpu id in pj4_cp0_init.

Check cpu id in pj4_cp0_init. So for no-PJ4 V7 cpus,
pj4_cpu0_init just return.
This fix will help to make the all the V7 cpus(PJ4 and no-PJ4)
can use code.

Signed-off-by: Chao Xie <[email protected]>
Reviewed-by: Kevin Hilman <[email protected]>
Tested-by: Kevin Hilman <[email protected]>
Tested-by: Stephen Warren <[email protected]>
Reviewed-by: Stephen Warren <[email protected]>
Tested-by: Matt Porter <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 8015/1: Add cpu_is_pj4 to distinguish PJ4 because it has some differences with V7

The patch add cpu_is_pj4 at arch/arm/include/asm/cputype.h
PJ4 has some differences with V7, for example the coprocessor.
To disinguish this kind of situation. cpu_is_pj4 is needed.

Signed-off-by: Chao Xie <[email protected]>
Reviewed-by: Kevin Hilman <[email protected]>
Tested-by: Kevin Hilman <[email protected]>
Tested-by: Stephen Warren <[email protected]>
Reviewed-by: Stephen Warren <[email protected]>
Tested-by: Matt Porter <[email protected]>
Signed-off-by: Russell King <[email protected]>

Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
"Highlights:

   - drm:

     Generic display port aux features, primary plane support, drm
     master management fixes, logging cleanups, enforced locking checks
     (instead of docs), documentation improvements, minor number
     handling cleanup, pseudofs for shared inodes.

   - ttm:

     add ability to allocate from both ends

   - i915:

     broadwell features, power domain and runtime pm, per-process
     address space infrastructure (not enabled)

   - msm:

     power management, hdmi audio support

   - nouveau:

     ongoing GPU fault recovery, initial maxwell support, random fixes

   - exynos:

     refactored driver to clean up a lot of abstraction, DP support
     moved into drm, LVDS bridge support added, parallel panel support

   - gma500:

     SGX MMU support, SGX irq handling, asle irq work fixes

   - radeon:

     video engine bringup, ring handling fixes, use dp aux helpers

   - vmwgfx:

     add rendernode support"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
  DRM: armada: fix corruption while loading cursors
  drm/dp_helper: don't return EPROTO for defers (v2)
  drm/bridge: export ptn3460_init function
  drm/exynos: remove MODULE_DEVICE_TABLE definitions
  ARM: dts: exynos4412-trats2: enable exynos/fimd node
  ARM: dts: exynos4210-trats: enable exynos/fimd node
  ARM: dts: exynos4412-trats2: add panel node
  ARM: dts: exynos4210-trats: add panel node
  ARM: dts: exynos4: add MIPI DSI Master node
  drm/panel: add S6E8AA0 driver
  ARM: dts: exynos4210-universal_c210: add proper panel node
  drm/panel: add ld9040 driver
  panel/ld9040: add DT bindings
  panel/s6e8aa0: add DT bindings
  drm/exynos: add DSIM driver
  exynos/dsim: add DT bindings
  drm/exynos: disallow fbdev initialization if no device is connected
  drm/mipi_dsi: create dsi devices only for nodes with reg property
  drm/mipi_dsi: add flags to DSI messages
  Skip intel_crt_init for Dell XPS 8700
  ...

isdnloop: several buffer overflows

There are three buffer overflows addressed in this patch.

1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
then copy it into a 60 character buffer.  I have made the destination
buffer 64 characters and I'm changed the sprintf() to a snprintf().

2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
character buffer so we have 54 characters.  The ->eazlist[] is 11
characters long.  I have modified the code to return if the source
buffer is too long.

3) In isdnloop_command() the cbuf[] array was 60 characters long but the
max length of the string then can be up to 79 characters.  I made the
cbuf array 80 characters long and changed the sprintf() to snprintf().
I also removed the temporary "dial" buffer and changed it to use "p"
directly.

Unfortunately, we pass the "cbuf" string from isdnloop_command() to
isdnloop_writecmd() which truncates anything over 60 characters to make
it fit in card->omsg[].  (It can accept values up to 255 characters so
long as there is a '\n' character every 60 characters).  For now I have
just fixed the memory corruption bug and left the other problems in this
driver alone.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

include/linux/syscalls.h: add sys_renameat2() prototype

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/coccinelle: Use PTR_ERR_OR_ZERO

PTR_RET is deprecated. Do not recommend its usage anymore.
Use PTR_ERR_OR_ZERO instead.

Signed-off-by: Sachin Kamat <[email protected]>
Signed-off-by: Michal Marek <[email protected]>

scripts/bootgraph.pl: Add graphic header

Adding -header + help function like other .pl in /scripts.

Signed-off-by: Fabian Frederick <[email protected]>
Signed-off-by: Michal Marek <[email protected]>

ALSA: ice1712: Fix boundary checks in PCM pointer ops

PCM pointer callbacks in ice1712 driver check the buffer size boundary
wrongly between bytes and frames.  This leads to PCM core warnings
like:
   snd_pcm_update_hw_ptr0: 105 callbacks suppressed
   ALSA pcm_lib.c:352 BUG: pcmC3D0c:0, pos = 5461, buffer size = 5461, period size = 2730

This patch fixes these checks to be placed after the proper unit
conversions.

Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

scripts: objdiff: detect object code changes between two commits

objdiff is useful when doing large code cleanups.  For example, when
removing checkpatch warnings and errors from new drivers in the staging
tree.

objdiff can be used in conjunction with a git rebase to confirm that
each commit made no changes to the resulting object code.  It has the
same return values as diff(1).

This was written specifically to support adding the skein and threefish
cryto drivers to the staging tree.  I needed a programmatic way to
confirm that commits changing >90% of the lines didn't inadvertently
change the code.

Temporary files (objdump output) are stored in

  /path/to/linux/.tmp_objdiff

'make mrproper' will remove this directory.

Signed-off-by: Jason Cooper <[email protected]>
Signed-off-by: Michal Marek <[email protected]>

ARM: add missing system_misc.h include to process.c

arm_pm_restart(), arm_pm_idle() and soft_restart() are all declared in
system_misc.h, but this file is not included in process.c. Add this
missing include. Found via sparse:

arch/arm/kernel/process.c:98:6: warning: symbol 'soft_restart' was not declared. Should it be static?
arch/arm/kernel/process.c:127:6: warning: symbol 'arm_pm_restart' was not declared. Should it be static?
arch/arm/kernel/process.c:134:6: warning: symbol 'arm_pm_idle' was not declared. Should it be static?

Signed-off-by: Russell King <[email protected]>

[media] gpsca: remove the risk of a division by zero

As reported by Coverity, there's a potential risk of a division
by zero on some calls to jpeg_set_qual(), if quality is zero.

As quality can't be 0 or lower than that, adds an extra clause
to cover this special case.

Coverity reports: CID#11922280, CID#11922293, CID#11922295

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

[media] stk1160: warrant a NUL terminated string

strncpy() doesn't warrant a NUL terminated string. Use
strlcpy() instead.

Fixes Coverity bug CID#1195195.

Acked-by: Ezequiel Garcia <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

backlight: lm3639: Use devm_backlight_device_register()

Change to use devm_backlight_device_register() for simple cleanup.

Signed-off-by: Daniel Jeong <[email protected]>
Acked-by: Jingoo Han <[email protected]>
Signed-off-by: Lee Jones <[email protected]>

backlight: gpio-backlight: Add DT support

Signed-off-by: Denis Carikli <[email protected]>
Acked-by: Jingoo Han <[email protected]>
Signed-off-by: Lee Jones <[email protected]>

backlight: core: Replace kfree with put_device

As per the comments on device_register, we shouldn't call kfree()
right after a device_register() failure. Instead call put_device(),
which in turn will call bl_device_release resulting in a kfree to the
full structure.

Signed-off-by: Levente Kurusa <[email protected]>
Acked-by: Jingoo Han <[email protected]>
Signed-off-by: Lee Jones <[email protected]>

ASoC: davinci-mcasp: Fix bit clock polarity settings

IB_NF, NB_IF and IB_IF configured the bc polarity incorrectly. The receive
polarity was set to the same edge as the TX in these cases.

Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

ASoC: samsung: Fix build on multiplatform

PCM and S/PDIF drivers referenced mach headers for a trivial
data structure. This caused build errors on multiplatform builds
as machine headers are not accessible from driver files. Move the data
structure definition to the driver header and remove the dependency.
While at it rename the structure to avoid multiple definition errors
as the same structure is also used by the platform code.

Signed-off-by: Sachin Kamat <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

ASoC: fsl_sai: Fix Bit Clock Polarity configurations

The BCP bit in TCR4/RCR4 register rules as followings:
  0 Bit clock is active high with drive outputs on rising edge
    and sample inputs on falling edge.
  1 Bit clock is active low with drive outputs on falling edge
    and sample inputs on rising edge.

For all formats currently supported in the fsl_sai driver, they're exactly
sending data on the falling edge and sampling on the rising edge.

However, the driver clears this BCP bit for all of them which results click
noise when working with SGTL5000 and big noise with WM8962.

Thus this patch corrects the BCP settings for all the formats here to fix
the nosie issue.

Signed-off-by: Nicolin Chen <[email protected]>
Acked-by: Xiubo Li <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

Merge branches 'pm-wakeup' and 'pm-domains'

* pm-wakeup:
PM / wakeup: Correct presence vs. emptiness of wakeup_* attributes

* pm-domains:
PM / domains: Add pd_ignore_unused to keep power domains enabled

Merge branches 'acpi-cleanup', 'acpi-thermal', 'acpi-video' and 'acpi-dock'

* acpi-cleanup:
  ACPI: Clean up memory allocations

* acpi-thermal:
  ACPI / thermal: Fix wrong variable usage in debug statement

* acpi-video:
  ACPI / video: Favor native backlight interface for ThinkPad Helix

* acpi-dock:
  ACPI / dock: Drop dock_device_ids[] table

Merge branch 'pm-cpufreq'

* pm-cpufreq:
  cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
  cpufreq: create another field .flags in cpufreq_frequency_table
  cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
  cpufreq: don't print value of .driver_data from core
  cpufreq: ia64: don't set .driver_data to index
  cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
  cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
  cpufreq: powernv: cpufreq driver for powernv platform
  cpufreq: at32ap: don't declare local variable as static
  cpufreq: loongson2_cpufreq: don't declare local variable as static
  cpufreq: unicore32: fix typo issue for 'clk'
  cpufreq: exynos: Disable on multiplatform build

Merge branch 'pm-cpuidle'

* pm-cpuidle:
  cpuidle: sysfs: Export target residency information
  intel_idle: fine-tune IVT residency targets
  tools/power turbostat: Run on Broadwell
  tools/power turbostat: simplify output, add Avg_MHz
  intel_idle: Add CPU model 54 (Atom N2000 series)
  intel_idle: support Bay Trail
  intel_idle: allow sparse sub-state numbering, for Bay Trail
  ACPI idle: permit sparse C-state sub-state numbers

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-cpuidle

Pull intel_idle and turbostat material for v3.15-rc1 from Len Brown.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  intel_idle: fine-tune IVT residency targets
  tools/power turbostat: Run on Broadwell
  tools/power turbostat: simplify output, add Avg_MHz
  intel_idle: Add CPU model 54 (Atom N2000 series)
  intel_idle: support Bay Trail
  intel_idle: allow sparse sub-state numbering, for Bay Trail
  ACPI idle: permit sparse C-state sub-state numbers

Merge branch 'cpu-hotplug'

* cpu-hotplug:
arm, kvm: fix double lock on cpu_add_remove_lock

arm, kvm: fix double lock on cpu_add_remove_lock

Commit 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
holds the lock before calling the two functions:

kvm_vgic_hyp_init()
kvm_timer_hyp_init()

and both the two functions are calling register_cpu_notifier()
to register cpu notifier, so cause double lock on cpu_add_remove_lock.

Considered that both two functions are only called inside
kvm_arch_init() with holding cpu_add_remove_lock, so simply use
__register_cpu_notifier() to fix the problem.

Fixes: 8146875de7d4 (arm, kvm: Fix CPU hotplug callback registration)
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

spi: qup: Depend on ARCH_QCOM

Commit 8fc1b0f87d9f ("ARM: qcom: Split Qualcomm support into legacy and
multiplatform") removed Kconfig symbol ARCH_MSM_DT. But that commit
left one (optional) dependency on ARCH_MSM_DT untouched.

Three Kconfig symbols used to depend on ARCH_MSM_DT: ARCH_MSM8X60,
ARCH_MSM8960, and ARCH_MSM8974. These three symbols now depend on
ARCH_QCOM. So it appears this driver needs to depend on ARCH_QCOM too.

Signed-off-by: Paul Bolle <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Signed-off-by: Mark Brown <[email protected]>

arm64: Fix DMA range invalidation for cache line unaligned buffers

If the buffer needing cache invalidation for inbound DMA does start or
end on a cache line aligned address, we need to use the non-destructive
clean&invalidate operation. This issue was introduced by commit
7363590d2c46 (arm64: Implement coherent DMA API based on swiotlb).

Signed-off-by: Catalin Marinas <[email protected]>
Reported-by: Jon Medhurst (Tixy) <[email protected]>

cpuidle: sysfs: Export target residency information

From user space, there is no way to know the target residency for each idle
state. If we want to write tools to measure the accuracy of the idle state
selection from the governor, we need this info.

As the exit latency is exported through sysfs, exporting the target residency
in the same place makes sense.

Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h

fsl_soc.h was included twice.

Signed-off-by: Sachin Kamat <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ALSA: hda - Do not assign streams in reverse order

Currently stream numbers are assigned in reverse order.

Unfortunately commit 7546abfb8e1f9933b5 ("ALSA: hda - Increment
default stream numbers for AMD HDMI controllers") assumed this was not
the case (specifically, it had the "old cards had single device only"
=> "extra unused stream numbers do not matter" assumption), causing
non-working audio regressions for AMD Radeon HDMI users.

Change the stream numbers to be assigned in forward order.

The benefit is that regular audio playback will still work even if the
assumed stream count is too high, downside is that a too high stream
count may remain hidden.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77002
Reported-by: Christian Güdel <[email protected]>
Signed-off-by: Anssi Hannula <[email protected]>
Tested-by: Christian Güdel <[email protected]> # 3.14
Cc: Alex Deucher <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek - Add eapd shutup to ALC283

Add eapd shutup function to alc283_shutup.
It could avoid pop noise from speaker.

Signed-off-by: Kailang Yang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek - Change model name alias for ChromeOS

Chrome OS was use model name of alc283-dac-wcaps for loading model as default.
Change the model name to same as model name of Chrome OS for future support.

Signed-off-by: Kailang Yang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

mmc: sdhci-acpi: Intel SDIO has broken card detect

Intel SDIO has broken card detect so add a quirk to reflect that.

Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Ulf Hansson <[email protected]>
Signed-off-by: Chris Ball <[email protected]>

thermal: rcar-thermal: update thermal zone only when temperature changes

Avoid updating the thermal zone in case an IRQ was triggered but the
temperature didn't effectively change.
Note this is not a driver issue.
Below is a captured debug trace illustrating the purpose of this patch:
out of 8 thermal zone updates, only 2 are actually necessary.

[   41.120000] rcar_thermal_work(): cctemp=25000
[   41.120000] rcar_thermal_work(): nctemp=30000
[   41.120000] rcar_thermal_work(): temp is now 30000C, update thermal zone
[   58.990000] rcar_thermal_work(): cctemp=30000
[   58.990000] rcar_thermal_work(): nctemp=30000
[   58.990000] rcar_thermal_work(): same temp, do not update thermal zone
[   59.290000] rcar_thermal_work(): cctemp=30000
[   59.290000] rcar_thermal_work(): nctemp=30000
[   59.290000] rcar_thermal_work(): same temp, do not update thermal zone
[   59.590000] rcar_thermal_work(): cctemp=30000
[   59.590000] rcar_thermal_work(): nctemp=30000
[   59.590000] rcar_thermal_work(): same temp, do not update thermal zone
[   59.890000] rcar_thermal_work(): cctemp=30000
[   59.890000] rcar_thermal_work(): nctemp=30000
[   59.890000] rcar_thermal_work(): same temp, do not update thermal zone
[   60.190000] rcar_thermal_work(): cctemp=30000
[   60.190000] rcar_thermal_work(): nctemp=30000
[   60.190000] rcar_thermal_work(): same temp, do not update thermal zone
[   60.490000] rcar_thermal_work(): cctemp=30000
[   60.490000] rcar_thermal_work(): nctemp=30000
[   60.490000] rcar_thermal_work(): same temp, do not update thermal zone
[   60.790000] rcar_thermal_work(): cctemp=30000
[   60.790000] rcar_thermal_work(): nctemp=35000
[   60.790000] rcar_thermal_work(): temp is now 35000C, update thermal zone

I suspect this may be due to sensor sampling accuracy / fluctuation,
but no formal proof.

Signed-off-by: Patrick Titiano <[email protected]>
Acked-by: Kuninori Morimoto <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>

thermal: rcar-thermal: fix same mask applied twice

Mask is already applied preceding the if statement.
Remove the second mask.

Signed-off-by: Patrick Titiano <[email protected]>
Acked-by: Kuninori Morimoto <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>

thermal: ti-soc-thermal: Use SIMPLE_DEV_PM_OPS macro

Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.

Signed-off-by: Jingoo Han <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>

thermal: imx: update formula for thermal sensor

Thermal sensor used to need two calibration points which are
in fuse map to get a slope for converting thermal sensor's raw
data to real temperature in degree C. Due to the chip calibration
limitation, hardware team provides an universal formula to get
real temperature from internal thermal sensor raw data:

Slope = 0.4297157 - (0.0015976 * 25C fuse);

Update the formula, as there will be no hot point calibration
data in fuse map from now on.

Signed-off-by: Anson Huang <[email protected]>
Acked-by: Shawn Guo <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
"various cleanups for ext2, ext3, udf, isofs, a documentation update
  for quota, and a fix of a race in reiserfs readdir implementation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: fix race in readdir
  ext2: acl: remove unneeded include of linux/capability.h
  ext3: explicitly remove inode from orphan list after failed direct io
  fs/isofs/inode.c add __init to init_inodecache()
  ext3: Speedup WB_SYNC_ALL pass
  fs/quota/Kconfig: Update filesystems
  ext3: Update outdated comment before ext3_ordered_writepage()
  ext3: Update PF_MEMALLOC handling in ext3_write_inode()
  ext2/3: use prandom_u32() instead of get_random_bytes()
  ext3: remove an unneeded check in ext3_new_blocks()
  ext3: remove unneeded check in ext3_ordered_writepage()
  fs: Mark function as static in ext3/xattr_security.c
  fs: Mark function as static in ext3/dir.c
  fs: Mark function as static in ext2/xattr_security.c
  ext3: Add __init macro to init_inodecache
  ext2: Add __init macro to init_inodecache
  udf: Add __init macro to init_inodecache
  fs: udf: parse_options: blocksize check

Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild changes from Michal Marek:
- cleanups in the main Makefiles and Documentation/DocBook/Makefile
- make O=...  directory is automatically created if needed
- mrproper/distclean removes the old include/linux/version.h to make
   life easier when bisecting across the commit that moved the version.h
   file

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: docbook: fix the include error when executing "make help"
  kbuild: create a build directory automatically for out-of-tree build
  kbuild: remove redundant '.*.cmd' pattern from make distclean
  kbuild: move "quote" to Kbuild.include to be consistent
  kbuild: docbook: use $(obj) and $(src) rather than specific path
  kbuild: unconditionally clobber include/linux/version.h on distclean
  kbuild: docbook: specify KERNELDOC dependency correctly
  kbuild: docbook: include cmd files more simply
  kbuild: specify build_docproc as a phony target

Merge tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC changes from Vineet Gupta:
- Support for external initrd from Noam
- Fix broken serial console in nsimosci Virtual Platform
- Reuse of ENTRY/END assembler macros across hand asm code
- Other minor fixes here and there

* tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [nsimosci] Unbork console
  ARC: [nsimosci] Change .dts to use generic 8250 UART
  ARC: [SMP] General Fixes
  ARC: Remove unused DT template file
  ARC: [clockevent] simplify timer ISR
  ARC: [clockevent] can't be SoC specific
  ARC: Remove ARC_HAS_COH_RTSC
  ARC: switch to generic ENTRY/END assembler annotations
  ARC: support external initrd
  ARC: add uImage to .gitignore
  ARC: [arcfpga] Fix __initconst data const-correctness

DRM: armada: fix corruption while loading cursors

Loading cursors to the LCD controller's SRAM can be corrupted when the
configured pixel clock is relatively slow. This seems to be caused
when we write back-to-back to the SRAM registers.

There doesn't appear to be any status register we can read to check
when an access has completed.

Inserting a dummy read between the writes appears to fix the problem.

Cc: <[email protected]> # 3.13
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

Merge tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen build fix from David Vrabel:
"Fix arm build of drivers/xen/events/

The merge of irq-core-for-linus branch broke it"

* tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Xen: do hv callback accounting only on x86

Merge branch 'akpm' (incoming from Andrew)

Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things

* emailed patches from Andrew Morton <[email protected]>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...

MAINTAINERS: update Intel C600 SAS driver maintainers

Signed-off-by: Lukasz Dorau <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Maciej Patelczyk <[email protected]>
Cc: Artur Paszkiewicz <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ufs: remove unused ufs_super_block_third pointer

Pointer 'usb3' to struct ufs_super_block_third acquired via
ubh_get_usb_third() is never used in function
ufs_read_cylinder_structures(). Thus remove it.

Detected by Coverity: CID 139939.

Signed-off-by: Christian Engelmayer <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ufs: remove unused ufs_super_block_second pointer

Pointer 'usb2' to struct ufs_super_block_second acquired via
ubh_get_usb_second() is never used in function ufs_statfs(). Thus
remove it.

Detected by Coverity: CID 139940.

Signed-off-by: Christian Engelmayer <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ufs: remove unused ufs_super_block_first pointer

Remove occurences of unused pointers to struct ufs_super_block_first
that were acquired via ubh_get_usb_first().

Detected by Coverity: CID 139929 - CID 139936, CID 139940.

Signed-off-by: Christian Engelmayer <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ufs/super.c: add __init to init_inodecache()

init_inodecache is only called by __init init_ufs_fs.

Signed-off-by: Fabian Frederick <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

doc/kernel-parameters.txt: add early_ioremap_debug

Add description of early_ioremap_debug kernel parameter.

Signed-off-by: Mark Salter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Young <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: add early_ioremap support

Add support for early IO or memory mappings which are needed before the
normal ioremap() is usable. This also adds fixmap support for permanent
fixed mappings such as that used by the earlyprintk device register
region.

Signed-off-by: Mark Salter <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Young <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: initialize pgprot info earlier in boot

Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.

The new fixmap and early_ioremap support also needs to use these macros
before paging_init() is called. This patch moves the init_mem_pgprot()
call out of paging_init() and into setup_arch() so that pgprot_default
gets initialized in time for fixmap and early_ioremap.

Signed-off-by: Mark Salter <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Young <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: use generic early_ioremap

Move x86 over to the generic early ioremap implementation.

Signed-off-by: Mark Salter <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: create generic early_ioremap() support

This patch creates a generic implementation of early_ioremap() support
based on the existing x86 implementation. early_ioremp() is useful for
early boot code which needs to temporarily map I/O or memory regions
before normal mapping functions such as ioremap() are available.

Some architectures have optional MMU. In the no-MMU case, the remap
functions simply return the passed in physical address and the unmap
functions do nothing.

Signed-off-by: Mark Salter <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86/mm: sparse warning fix for early_memremap

This patch series takes the common bits from the x86 early ioremap
implementation and creates a generic implementation which may be used by
other architectures.  The early ioremap interfaces are intended for
situations where boot code needs to make temporary virtual mappings
before the normal ioremap interfaces are available.  Typically, this
means before paging_init() has run.

This patch (of 6):

There's a lot of sparse warnings for code like below: void *a =
early_memremap(phys_addr, size);

early_memremap intend to map kernel memory with ioremap facility, the
return pointer should be a kernel ram pointer instead of iomem one.

For making the function clearer and supressing sparse warnings this patch
do below two things:
1. cast to (__force void *) for the return value of early_memremap
2. add early_memunmap function and pass (__force void __iomem *) to iounmap

From Boris:
  "Ingo told me yesterday, it makes sense too.  I'd guess we can try it.
   FWIW, all callers of early_memremap use the memory they get remapped
   as normal memory so we should be safe"

Signed-off-by: Dave Young <[email protected]>
Signed-off-by: Mark Salter <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lglock: map to spinlock when !CONFIG_SMP

When the system has only one CPU, lglock is effectively a spinlock; map
it directly to spinlock to eliminate the indirection and duplicate code.

In addition to removing overhead, this drops 1.6k of code with a
defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.

Signed-off-by: Josh Triplett <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Howells <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

percpu: add preemption checks to __this_cpu ops

We define a check function in order to avoid trouble with the include
files. Then the higher level __this_cpu macros are modified to invoke
the preemption check.

[[email protected]: coding-style fixes]
Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Tejun Heo <[email protected]>
Tested-by: Grygorii Strashko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

vmstat: use raw_cpu_ops to avoid false positives on preemption checks

vm counters are allowed to be racy. Use raw_cpu_ops to avoid the
local_irq_disable overhead and to avoid preemption checks which will be
added to the __this_cpu operations.

[[email protected]: Add comment. Again.]
Signed-off-by: Christoph Lameter <[email protected]>
Reported-by: Sergey Senozhatsky <[email protected]>
Cc: Dave Chinner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

slub: use raw_cpu_inc for incrementing statistics

Statistics are not critical to the operation of the allocation but
should also not cause too much overhead.

When __this_cpu_inc is altered to check if preemption is disabled this
triggers. Use raw_cpu_inc to avoid the checks. Using this_cpu_ops may
cause interrupt disable/enable sequences on various arches which may
significantly impact allocator performance.

[[email protected]: add comment]
Signed-off-by: Christoph Lameter <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

net: replace __this_cpu_inc in route.c with raw_cpu_inc

The RT_CACHE_STAT_INC macro triggers the new preemption checks
for __this_cpu ops.

I do not see any other synchronization that would allow the use of a
__this_cpu operation here however in commit dbd2915ce87e ("[IPV4]:
RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
raw_smp_processor_id() here because "we do not care" about races.  In
the past we agreed that the price of disabling interrupts here to get
consistent counters would be too high.  These counters may be inaccurate
due to race conditions.

The use of __this_cpu op improves the situation already from what commit
dbd2915ce87e did since the single instruction emitted on x86 does not
allow the race to occur anymore.  However, non x86 platforms could still
experience a race here.

Trace:

  __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    __ip_route_output_key+0x575/0x8c0
    ip_route_output_flow+0x27/0x70
    udp_sendmsg+0x825/0xa20
    inet_sendmsg+0x85/0xc0
    sock_sendmsg+0x9c/0xd0
    ___sys_sendmsg+0x37c/0x390
    __sys_sendmsg+0x49/0x90
    SyS_sendmsg+0x12/0x20
    tracesys+0xe1/0xe6

Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

modules: use raw_cpu_write for initialization of per cpu refcount.

The initialization of a structure is not subject to synchronization.
The use of __this_cpu would trigger a false positive with the additional
preemption checks for __this_cpu ops.

So simply disable the check through the use of raw_cpu ops.

Trace:

  __this_cpu_write operation in preemptible [00000000] code: modprobe/286
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    dump_stack+0x4e/0x82
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    load_module+0xcfd/0x2650
    SyS_init_module+0xa6/0xd0
    tracesys+0xe1/0xe6

Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Rusty Russell <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: use raw_cpu ops for determining current NUMA node

With the preempt checking logic for __this_cpu_ops we will get false
positives from locations in the code that use numa_node_id.

Before the __this_cpu ops where introduced there were no checks for
preemption present either.  smp_raw_processor_id() was used.  See

  http://www.spinics.net/lists/linux-numa/msg00641.html

Therefore we need to use raw_cpu_read here to avoid false postives.

Note that this issue has been discussed in prior years.  If the process
changes nodes after retrieving the current numa node then that is
acceptable since most uses of numa_node etc are for optimization and not
for correctness.

There were suggestions to implement a raw_numa_node_id in order to do
preempt checks for numa_node_id as well.  But I think we better defer
that to another patch since that would mean investigating how
numa_node_id() is used throughout the kernel which would increase the
scope of this patchset significantly.  After all preemption was never
checked before when numa_node_id() was used.

Some sample traces:

__this_cpu_read operation in preemptible [00000000] code: login/1456
caller is __this_cpu_preempt_check+0x2b/0x2d
CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185
Call Trace:
  dump_stack+0x4e/0x82
  check_preemption_disabled+0xc5/0xe0
  __this_cpu_preempt_check+0x2b/0x2d
  get_task_policy+0x1d/0x49
  get_vma_policy+0x14/0x76
  alloc_pages_vma+0x35/0xff
  handle_mm_fault+0x290/0x73b
  __do_page_fault+0x3fe/0x44d
  do_page_fault+0x9/0xc
  page_fault+0x22/0x30
  generic_file_aio_read+0x38e/0x624
  do_sync_read+0x54/0x73
  vfs_read+0x9d/0x12a
  SyS_read+0x47/0x7e
  cstar_dispatch+0x7/0x23

caller is __this_cpu_preempt_check+0x2b/0x2d
CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185
Call Trace:
  dump_stack+0x4e/0x82
  check_preemption_disabled+0xc5/0xe0
  __this_cpu_preempt_check+0x2b/0x2d
  alloc_pages_current+0x8f/0xbc
  __page_cache_alloc+0xb/0xd
  __do_page_cache_readahead+0xf4/0x219
  ra_submit+0x1c/0x20
  ondemand_readahead+0x28c/0x2b4
  page_cache_sync_readahead+0x38/0x3a
  generic_file_aio_read+0x261/0x624
  do_sync_read+0x54/0x73
  vfs_read+0x9d/0x12a
  SyS_read+0x47/0x7e
  cstar_dispatch+0x7/0x23

Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

percpu: add raw_cpu_ops

The kernel has never been audited to ensure that this_cpu operations are
consistently used throughout the kernel.  The code generated in many
places can be improved through the use of this_cpu operations (which
uses a segment register for relocation of per cpu offsets instead of
performing address calculations).

The patch set also addresses various consistency issues in general with
the per cpu macros.

A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
   because checks are skipped. This is typically shown through a raw_
   prefix. So this patch set changes the places where __this_cpu_ptr()
   is used to raw_cpu_ptr().

B. There has been the long term wish by some that __this_cpu operations
   would check for preemption. However, there are cases where preemption
   checks need to be skipped. This patch set adds raw_cpu operations that
   do not check for preemption and then adds preemption checks to the
   __this_cpu operations.

C. The use of __get_cpu_var is always a reference to a percpu variable
   that can also be handled via a this_cpu operation. This patch set
   replaces all uses of __get_cpu_var with this_cpu operations.

D. We can then use this_cpu RMW operations in various places replacing
   sequences of instructions by a single one.

E. The use of this_cpu operations throughout will allow other arches than
   x86 to implement optimized references and RMV operations to work with
   per cpu local data.

F. The use of this_cpu operations opens up the possibility to
   further optimize code that relies on synchronization through
   per cpu data.

The patch set works in a couple of stages:

I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
    Also converts the existing __this_cpu_xx_# primitive in the x86
    code to raw_cpu_xx_#.

II. Patch 2-4 use the raw_cpu operations in places that would give
     us false positives once they are enabled.

III. Patch 5 adds preemption checks to __this_cpu operations to allow
    checking if preemption is properly disabled when these functions
    are used.

IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
   with this_cpu_ptr. They do not depend on any changes to the percpu
   code. No preemption tests are skipped if they are applied.

V. Patches 21-46 are conversion patches that use this_cpu operations
   in various kernel subsystems/drivers or arch code.

VI.  Patches 47/48 (not included in this series) remove no longer used
    functions (__this_cpu_ptr and __get_cpu_var).  These should only be
    applied after all the conversion patches have made it and after we
    have done additional passes through the kernel to ensure that none of
    the uses of these functions remain.

This patch (of 46):

The patches following this one will add preemption checks to __this_cpu
ops so we need to have an alternative way to use this_cpu operations
without preemption checks.

raw_cpu_ops will be the basis for all other ops since these will be the
operations that do not implement any checks.

Primitive operations are renamed by this patch from __this_cpu_xxx to
raw_cpu_xxxx.

Also change the uses of the x86 percpu primitives in preempt.h.
These depend directly on asm/percpu.h (header #include nesting issue).

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Bryan Wu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: David Daney <[email protected]>
Cc: David Miller <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Dipankar Sarma <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Hans-Christian Egtvedt <[email protected]>
Cc: Hedi Berriche <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Mike Frysinger <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Nicolas Pitre <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Russell King <[email protected]>
Cc: Russell King <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Wim Van Sebroeck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

slub: fix leak of 'name' in sysfs_slab_add

The failure paths of sysfs_slab_add don't release the allocation of
'name' made by create_unique_id() a few lines above the context of the
diff below. Create a common exit path to make it more obvious what
needs freeing.

[[email protected]: free the name only if !unmergeable]
Signed-off-by: Dave Jones <[email protected]>
Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

slub: rework sysfs layout for memcg caches

Currently, we try to arrange sysfs entries for memcg caches in the same
manner as for global caches.  Apart from turning /sys/kernel/slab into a
mess when there are a lot of kmem-active memcgs created, it actually
does not work properly - we won't create more than one link to a memcg
cache in case its parent is merged with another cache.  For instance, if
A is a root cache merged with another root cache B, we will have the
following sysfs setup:

  X
  A -> X
  B -> X

where X is some unique id (see create_unique_id()).  Now if memcgs M and
N start to allocate from cache A (or B, which is the same), we will get:

  X
  X:M
  X:N
  A -> X
  B -> X
  A:M -> X:M
  A:N -> X:N

Since B is an alias for A, we won't get entries B:M and B:N, which is
confusing.

It is more logical to have entries for memcg caches under the
corresponding root cache's sysfs directory.  This would allow us to keep
sysfs layout clean, and avoid such inconsistencies like one described
above.

This patch does the trick.  It creates a "cgroup" kset in each root
cache kobject to keep its children caches there.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

slub: adjust memcg caches when creating cache alias

Otherwise, kzalloc() called from a memcg won't clear the whole object.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg, slab: do not destroy children caches if parent has aliases

Currently we destroy children caches at the very beginning of
kmem_cache_destroy().  This is wrong, because the root cache will not
necessarily be destroyed in the end - if it has aliases (refcount > 0),
kmem_cache_destroy() will simply decrement its refcount and return.  In
this case, at best we will get a bunch of warnings in dmesg, like this
one:

  kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
  CPU: 1 PID: 7139 Comm: modprobe Tainted: G    B   W    3.13.0+ #117
  Call Trace:
    dump_stack+0x49/0x5b
    kmem_cache_destroy+0xdf/0xf0
    kmem_cache_destroy_memcg_children+0x97/0xc0
    kmem_cache_destroy+0xf/0xf0
    xfs_mru_cache_uninit+0x21/0x30 [xfs]
    exit_xfs_fs+0x2e/0xc44 [xfs]
    SyS_delete_module+0x198/0x1f0
    system_call_fastpath+0x16/0x1b

At worst - if kmem_cache_destroy() will race with an allocation from a
memcg cache - the kernel will panic.

This patch fixes this by moving children caches destruction after the
check if the cache has aliases.  Plus, it forbids destroying a root
cache if it still has children caches, because each children cache keeps
a reference to its parent.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg, slab: unregister cache from memcg before starting to destroy it

Currently, memcg_unregister_cache(), which deletes the cache being
destroyed from the memcg_slab_caches list, is called after
__kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to
destroy the cache.

As a result, one can access a partially destroyed cache while traversing
a memcg_slab_caches list, which can have deadly consequences (for
instance, cache_show() called for each cache on a memcg_slab_caches list
from mem_cgroup_slabinfo_read() will dereference pointers to already
freed data).

To fix this, let's move memcg_unregister_cache() before the cache
destruction process beginning, issuing memcg_register_cache() on failure.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg, slab: separate memcg vs root cache creation paths

Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
memcg-only and except-for-memcg calls. To clean this up, let's move the
code responsible for memcg cache creation to a separate function.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg, slab: cleanup memcg cache creation

This patch cleans up the memcg cache creation path as follows:

- Move memcg cache name creation to a separate function to be called
  from kmem_cache_create_memcg().  This allows us to get rid of the mutex
  protecting the temporary buffer used for the name formatting, because
  the whole cache creation path is protected by the slab_mutex.

- Get rid of memcg_create_kmem_cache().  This function serves as a proxy
  to kmem_cache_create_memcg().  After separating the cache name creation
  path, it would be reduced to a function call, so let's inline it.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg, slab: never try to merge memcg caches

When a kmem cache is created (kmem_cache_create_memcg()), we first try to
find a compatible cache that already exists and can handle requests from
the new cache, i.e.  has the same object size, alignment, ctor, etc.  If
there is such a cache, we do not create any new caches, instead we simply
increment the refcount of the cache found and return it.

Currently we do this procedure not only when creating root caches, but
also for memcg caches.  However, there is no point in that, because, as
every memcg cache has exactly the same parameters as its parent and cache
merging cannot be turned off in runtime (only on boot by passing
"slub_nomerge"), the root caches of any two potentially mergeable memcg
caches should be merged already, i.e.  it must be the same root cache, and
therefore we couldn't even get to the memcg cache creation, because it
already exists.

The only exception is boot caches - they are explicitly forbidden to be
merged by setting their refcount to -1.  There are currently only two of
them - kmem_cache and kmem_cache_node, which are used in slab internals (I
do not count kmalloc caches as their refcount is set to 1 immediately
after creation).  Since they are prevented from merging preliminary I
guess we should avoid to merge their children too.

So let's remove the useless code responsible for merging memcg caches.

Signed-off-by: Vladimir Davydov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Glauber Costa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>