Git Repo - linux.git/log

Hexagon: add IOMEM and _relaxed IO macros

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: switch to using the device type for IO mappings

Uncached mappings on our architecture can still have side effects,
such as canceled/replayed transactions; the device type prevents
this.

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: don't print info for offline CPU's

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: add support for single-stepping (v4+)

Hardware single-step is only available on v4 and later
architectures.

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: use correct work mask when checking for more work

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: add support for additional exceptions

Add multi-reg-write and unaligned-PC exceptions.

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: fix initial page table setup prior to jump to VA

Use the exact number of pages needed to be mapped pre-VA-jump,
then map 896MB afterwards, which the arch mem init will fix up.

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: remove keyring related call

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: check to if we will overflow the signal stack

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: Signal and return path fixes

This fixes the return value of sigreturn, moves the work-pending check
into a C routine for readability, and fixes the loop for multiple pending
signals. Based on feedback from Al Viro.

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: add support for new v4+ registers

Add support for a couple of new v4+ registers, along with
newer pt_regs save/restore.

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: add individual register access for switch_stack

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: use defines for MIN_KERNEL_SEG calculation

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: use GENERIC_CPU_DEVICES

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: change arch version config to allow comparisons

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: add support for ARCH_PFN_OFFSET

Add support for loading the kernel at a physical offset. The
offset should still be 4M aligned.
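A minimal C sketch of the two rules involved, assuming 4 KB pages (PAGE_SHIFT of 12) and assuming ARCH_PFN_OFFSET is the page frame number of the first byte of RAM. That is how the generic FLATMEM model uses it, subtracting it from a PFN to index mem_map. The helper names are hypothetical, not the Hexagon code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KB pages; Hexagon kernels can be configured with other sizes. */
#define SKETCH_PAGE_SHIFT 12
/* The 4M load alignment the commit message requires. */
#define SKETCH_LOAD_ALIGN (4UL << 20)

/* Hypothetical: is the physical load offset 4M aligned? */
static bool phys_offset_ok(uint32_t phys_offset)
{
    return (phys_offset & (SKETCH_LOAD_ALIGN - 1)) == 0;
}

/* Hypothetical: PFN of the first RAM page, i.e. the role ARCH_PFN_OFFSET plays. */
static uint32_t pfn_offset(uint32_t phys_offset)
{
    return phys_offset >> SKETCH_PAGE_SHIFT;
}

/* FLATMEM-style lookup: index into mem_map once RAM no longer starts at 0. */
static uint32_t pfn_to_mem_map_index(uint32_t pfn, uint32_t phys_offset)
{
    return pfn - pfn_offset(phys_offset);
}
```

With a 4M load offset, the first PFN is 0x400 and maps to mem_map index 0. An offset that is not 4M aligned is rejected.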

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: fix __atomic_add_unless

Signed-off-by: Richard Kuo <[email protected]>

Hexagon: clean up generic headers in Kbuild

Signed-off-by: Richard Kuo <[email protected]>

Merge branch 'akpm' (incoming from Andrew)

Merge third batch of fixes from Andrew Morton:
"Most of the rest.  I still have two large patchsets against AIO and
  IPC, but they're a bit stuck behind other trees and I'm about to
  vanish for six days.

   - random fixlets
   - inotify
   - more of the MM queue
   - show_stack() cleanups
   - DMI update
   - kthread/workqueue things
   - compat cleanups
   - epoll updates
   - binfmt updates
   - nilfs2
   - hfs
   - hfsplus
   - ptrace
   - kmod
   - coredump
   - kexec
   - rbtree
   - pids
   - pidns
   - pps
   - semaphore tweaks
   - some w1 patches
   - relay updates
   - core Kconfig changes
   - sysrq tweaks"

* emailed patches from Andrew Morton <[email protected]>: (109 commits)
  Documentation/sysrq: fix inconstistent help message of sysrq key
  ethernet/emac/sysrq: fix inconstistent help message of sysrq key
  sparc/sysrq: fix inconstistent help message of sysrq key
  powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
  ARM/etm/sysrq: fix inconstistent help message of sysrq key
  power/sysrq: fix inconstistent help message of sysrq key
  kgdb/sysrq: fix inconstistent help message of sysrq key
  lib/decompress.c: fix initconst
  notifier-error-inject: fix module names in Kconfig
  kernel/sys.c: make prctl(PR_SET_MM) generally available
  UAPI: remove empty Kbuild files
  menuconfig: print more info for symbol without prompts
  init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
  kconfig menu: move Virtualization drivers near other virtualization options
  Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  relay: use macro PAGE_ALIGN instead of FIX_SIZE
  kernel/relay.c: move FIX_SIZE macro into relay.c
  kernel/relay.c: remove unused function argument actor
  drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
  drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
  ...

Merge tag 'md-3.10' of git://neil.brown.name/md

Pull md fixes from NeilBrown:
"A mixed bag of little fixes.  No real new functionality here.  Several
  patches are tagged for -stable."

* tag 'md-3.10' of git://neil.brown.name/md:
  MD: ignore discard request for hard disks of hybid raid1/raid10 array
  md: bad block list should default to disabled.
  md: raid1/raid10 md devices leak memory when stopping
  DM RAID: Add message/status support for changing sync action
  MD: Export 'md_reap_sync_thread' function
  md: don't update metadata when stopping a read-only array.
  md: Allow devices to be re-added to a read-only array.
  md/raid10: Allow skipping recovery when clean arrays are assembled
  MD: Fix typos in MD documentation
  md/raid5: avoid an extra write when writing to a known-bad-block.
  md/raid5: Change or of some order to improve efficiency.
  md: use set_bit_le and clear_bit_le
  md: HOT_DISK_REMOVE shouldn't make a read-auto device active.
  md: use common code for all calls to ->hot_remove_disk()
  md: never update metadata when array is read-only.

Documentation/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the sysrq documentation.
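The help format these patches converge on is "name(key)", with the registered lower-case key in parentheses. The following is a small userspace sketch of a checker for that convention. help_msg_matches_key() is hypothetical and is not part of the sysrq code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: the help text ends in "(k)", where k is exactly
 * the (lower-case) key the handler is registered under.
 */
static bool help_msg_matches_key(const char *help_msg, char key)
{
    const char *open = strrchr(help_msg, '(');

    /* Need "(", one character, then ")". */
    if (open == NULL || open[1] == '\0' || open[2] != ')')
        return false;
    return open[1] == key && !isupper((unsigned char)open[1]);
}
```

Under this check, "debug(g)" passes for key 'g', and the old "reBoot" style fails for key 'b' because there is no parenthesized key.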

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ethernet/emac/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the ethernet emac sysrq key: "emac(c)"

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Josh Boyer <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

sparc/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the sparc sysrq key: "global-regs(y)"

Signed-off-by: zhangwei(Jovi) <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

powerpc/xmon/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the powerpc xmon sysrq key: "xmon(x)"

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ARM/etm/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the ARM etm sysrq key: "etm-buffer-dump(v)"
(This patch also uses "-" instead of spaces to separate the words
in each sysrq key's help text.)

Signed-off-by: zhangwei(Jovi) <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

power/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the power off sysrq key: "poweroff(o)"

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kgdb/sysrq: fix inconstistent help message of sysrq key

Currently, the help message of /proc/sysrq-trigger highlights its
upper-case characters, like below:

SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...

This may lead users to trigger sysrq with the upper-case character,
which is inconsistent with the lower-case key that is actually
registered.

This inconsistent help message will cause even more confusion if the
26 upper-case letters are put into use in the future.

This patch fixes the kgdb sysrq key: "debug(g)"

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Jason Wessel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/decompress.c: fix initconst

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

notifier-error-inject: fix module names in Kconfig

The Kconfig help text for MEMORY_NOTIFIER_ERROR_INJECT and
OF_RECONFIG_NOTIFIER_ERROR_INJECT has mismatched module names.

Signed-off-by: Akinobu Mita <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/sys.c: make prctl(PR_SET_MM) generally available

The purpose of this patch is to allow privileged processes to set
their own per-memory memory-region fields:

      start_code, end_code, start_data, end_data, start_brk, brk,
      start_stack, arg_start, arg_end, env_start, env_end.

This functionality is needed by any application or package that needs to
reconstruct Linux processes, that is, to start them in any way other than
by means of an "execve()" from an executable file.  This includes:

1. Restoring processes from a checkpoint-file (by all potential
   user-level checkpointing packages, not only CRIU's).
2. Restarting processes on another node after process migration.
3. Starting duplicated copies of a running process (for reliability
   and high availability).
4. Starting a process from an executable format that is not supported
   by Linux, thus requiring a "manual execve" by a user-level utility.
5. Similarly, starting a process from a networked and/or encrypted
   executable that, for confidentiality, licensing or other reasons,
   may not be written to the local file-systems.

The code that does that was already included in the Linux kernel by the
CRIU group, in the form of "prctl(PR_SET_MM)", but prior to this was
enclosed within their private "#ifdef CONFIG_CHECKPOINT_RESTORE", which is
normally disabled.  The patch removes those ifdefs.
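A restorer sets each of the fields listed above with prctl(PR_SET_MM, <field>, value, 0, 0). The following is a minimal, Linux-only sketch of one such call. reset_brk_to_current() is hypothetical: it re-sets brk to its current value, whereas a real restorer would pass the value saved in its checkpoint image. The call requires CAP_SYS_RESOURCE, so an unprivileged run is expected to fail with EPERM.

```c
#define _GNU_SOURCE            /* for sbrk() */
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>         /* PR_SET_MM, PR_SET_MM_BRK */
#include <unistd.h>

/*
 * Hypothetical helper: set mm->brk to the current program break.
 * Returns 0 on success or -errno on failure (e.g. -EPERM without
 * CAP_SYS_RESOURCE).
 */
static int reset_brk_to_current(void)
{
    void *cur = sbrk(0);                 /* current program break */

    if (cur == (void *)-1)
        return -errno;
    /* Unused trailing arguments must be zero. */
    if (prctl(PR_SET_MM, PR_SET_MM_BRK, (unsigned long)cur, 0UL, 0UL) != 0)
        return -errno;
    return 0;
}
```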

Signed-off-by: Amnon Shiloh <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

UAPI: remove empty Kbuild files

Remove empty Kbuild files, as they cause problems with the patch program,
which removes files that become empty.

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

menuconfig: print more info for symbol without prompts

When we search for a config symbol that has no prompt, its position in
the Kconfig file and its dependencies are not printed.  This can be
inconvenient, especially when it's set to n and we want to find out
why.

The following is an example:

before:

Symbol: GENERIC_SMP_IDLE_THREAD [=y]
Type  : boolean
  Selected by: X86 [=y]

after:

Symbol: GENERIC_SMP_IDLE_THREAD [=y]
Type  : boolean
  Defined at arch/Kconfig:213
  Selected by: X86 [=y]

Signed-off-by: Weng Meiling <[email protected]>
Signed-off-by: Libo Chen <[email protected]>
Cc: Michal Marek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display

The kconfig language requires that dependent options all follow the
menuconfig symbol in order to be collapsed below it.  Recently some hidden
options were added below the EXPERT menuconfig, but did not depend on
EXPERT (because hidden options can't).  This broke the display.  So
re-order all these options, and while we're here stick the PCI quirks
under the EXPERT menu (since it isn't sitting with any related options).

Before this commit, we get:
[*] Configure standard kernel features (expert users)  --->
[ ] Sysctl syscall support
[*] Load all symbols for debugging/ksymoops
...
[ ] Embedded system

Now we get the older (and correct) behavior:
[*] Configure standard kernel features (expert users)  --->
[ ] Embedded system
And if you go into the expert menu you get the expert options:
[ ] Sysctl syscall support
[*] Load all symbols for debugging/ksymoops
...

Signed-off-by: Mike Frysinger <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Cc: zhangwei(Jovi) <[email protected]>
Cc: Michal Marek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kconfig menu: move Virtualization drivers near other virtualization options

Make virtualization drivers be logically grouped together (physically
near each other) in the kconfig menu by moving "Virtualization drivers"
to be near "Virtio drivers", Microsoft Hyper-V, and Xen driver support.

This is just a user-friendly, visual search change.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Alexander Graf <[email protected]>
Cc: Stuart Yoder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS

The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files. Arnd Bergmann noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using a pre-4.4 gcc.

To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.

Also, make the text a bit more generic by stating that this option
enables compile-time checks, so we can cover architectures that emit
warnings as well as ones that emit errors. The details of how an
architecture decided to implement the checks aren't as important as the
concept of compile-time checking of copy_from_user() calls.

While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.

Signed-off-by: Stephen Boyd <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Acked-by: Helge Deller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

relay: use macro PAGE_ALIGN instead of FIX_SIZE

The FIX_SIZE macro is currently the same as PAGE_ALIGN, so use PAGE_ALIGN
instead.

Thanks to Andrew for finding this.
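The equivalence can be checked mechanically. This sketch assumes FIX_SIZE was defined as ((((x) - 1) & PAGE_MASK) + PAGE_SIZE) and uses 4 KB pages. SK_PAGE_SIZE and SK_PAGE_MASK are local stand-ins for the kernel's PAGE_SIZE and PAGE_MASK.

```c
#include <assert.h>

/* Assumption: 4 KB pages. */
#define SK_PAGE_SIZE 4096UL
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))

/* FIX_SIZE as it is assumed to have been defined in relay.h. */
#define FIX_SIZE(x) ((((x) - 1) & SK_PAGE_MASK) + SK_PAGE_SIZE)
/* PAGE_ALIGN: round up to the next page boundary. */
#define SK_PAGE_ALIGN(x) (((x) + SK_PAGE_SIZE - 1) & SK_PAGE_MASK)

/*
 * Returns 1 if both macros agree for every x in [from, to].
 * With unsigned arithmetic, x == 0 gives 0 for both (FIX_SIZE wraps).
 * `to` must be below ULONG_MAX, or the loop never ends.
 */
static int fix_size_equals_page_align(unsigned long from, unsigned long to)
{
    for (unsigned long x = from; x <= to; x++)
        if (FIX_SIZE(x) != SK_PAGE_ALIGN(x))
            return 0;
    return 1;
}
```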

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/relay.c: move FIX_SIZE macro into relay.c

It's better to place the FIX_SIZE macro in relay.c instead of relay.h.

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/relay.c: remove unused function argument actor

Currently, the argument `actor' is never used in the relay reading path,
so remove it.

Signed-off-by: zhangwei(Jovi) <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()

Use platform_device_put() instead of platform_device_unregister() if
platform_device_add() fails, and use platform_device_del() in the
error-handling path after platform_device_add() succeeds.
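The order matters because platform_device_unregister() is platform_device_del() followed by platform_device_put(), so it is only correct once platform_device_add() has succeeded. The unwinding order described above can be modelled in userspace as below. The fake_device_*() functions and add_slave() are hypothetical stand-ins with counters, not the kernel API. The "later step" stands for whatever the real driver does after platform_device_add().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a platform device: a refcount and a registered flag. */
struct fake_pdev { int refs; int registered; };

static int puts_called, dels_called;

static struct fake_pdev *fake_device_alloc(void)      /* ~ platform_device_alloc() */
{
    struct fake_pdev *p = calloc(1, sizeof(*p));
    if (p)
        p->refs = 1;
    return p;
}

static int fake_device_add(struct fake_pdev *p, int fail) /* ~ platform_device_add() */
{
    if (fail)
        return -1;
    p->registered = 1;
    return 0;
}

static void fake_device_del(struct fake_pdev *p)       /* ~ platform_device_del() */
{
    p->registered = 0;
    dels_called++;
}

static void fake_device_put(struct fake_pdev *p)       /* ~ platform_device_put() */
{
    puts_called++;
    if (--p->refs == 0)
        free(p);
}

/* fail_at: 0 = no failure, 1 = add fails, 2 = a later step fails. */
static int add_slave(int fail_at)
{
    int ret;
    struct fake_pdev *pdev = fake_device_alloc();

    if (!pdev)
        return -1;

    ret = fake_device_add(pdev, fail_at == 1);
    if (ret)
        goto add_failed;          /* never added: only drop the reference */

    if (fail_at == 2) {           /* e.g. creating a sysfs file failed */
        ret = -1;
        goto later_failed;
    }
    return 0;                     /* device stays registered for the slave's lifetime */

later_failed:
    fake_device_del(pdev);        /* undo the add ... */
add_failed:
    fake_device_put(pdev);        /* ... then undo the alloc */
    return ret;
}
```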

Signed-off-by: Wei Yongjun <[email protected]>
Cc: Evgeniy Polyakov <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()

Use platform_device_put() instead of platform_device_unregister() if
platform_device_add() fails, and use platform_device_del() in the
error-handling path after platform_device_add() succeeds.

Signed-off-by: Wei Yongjun <[email protected]>
Cc: Evgeniy Polyakov <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/w1/slaves/w1_ds2780.c: fix the error handling in w1_ds2780_add_slave()

Use platform_device_put() instead of platform_device_unregister() if
platform_device_add() fails, and use platform_device_del() in the
error-handling path after platform_device_add() succeeds.

Signed-off-by: Wei Yongjun <[email protected]>
Cc: Evgeniy Polyakov <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/w1/slaves/w1_bq27000.c: fix the error handling in w1_bq27000_add_slave()

Use platform_device_put() instead of platform_device_unregister() if
platform_device_add() fails, and also check the return value of
platform_device_add_data().

Signed-off-by: Wei Yongjun <[email protected]>
Cc: Evgeniy Polyakov <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Neil Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/memstick/host/r592.c: make r592_pm_ops static

r592_pm_ops is not exported. Also, CONFIG_PM_SLEEP is used to
remove unnecessary ifdefs.

Signed-off-by: Jingoo Han <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

semaphore: use `bool' type for semaphore_waiter's up

Signed-off-by: liguang <[email protected]>
Cc: Jiri Kosina <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

semaphore: use unlikely() for down's timeout

Signed-off-by: liguang <[email protected]>
Cc: Jiri Kosina <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

pps: pps_kc_hardpps_lock can be static

drivers/pps/kc.c:37:1: sparse: symbol 'pps_kc_hardpps_lock' was not declared. Should it be static?
drivers/pps/kc.c:39:19: sparse: symbol 'pps_kc_hardpps_dev' was not declared. Should it be static?
drivers/pps/kc.c:40:5: sparse: symbol 'pps_kc_hardpps_mode' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <[email protected]>
Cc: Rodolfo Giometti <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

pps: hide more configuration symbols behind CONFIG_PPS

Make CONFIG_PPS_DEBUG and CONFIG_NTP_PPS be hidden if CONFIG_PPS is not
selected, so that we are not prompted for these configuration options if
CONFIG_PPS is not set.

Signed-off-by: Florian Fainelli <[email protected]>
Cc: Rodolfo Giometti <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

aoe: replace kmalloc and then memcpy with kmemdup

Signed-off-by: Mihnea Dobrescu-Balaur <[email protected]>
Cc: Ed Cashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nbd: increase default and max request sizes

Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned. This patch also allows the max request size to be increased
(via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.

The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)

Especially in high-latency networks, a larger request size can make a dramatic
difference.
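The following C sketch shows the arithmetic behind both points. The helper names are hypothetical. "Round trips" are approximated as one per request, and max_sectors_kb is converted using 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>

#define KB 1024UL

/* Requests needed to move `total` bytes if each carries at most `max_req` bytes. */
static unsigned long requests_needed(unsigned long total, unsigned long max_req)
{
    return (total + max_req - 1) / max_req;   /* round up */
}

/* Is a request size a whole number of 4 KB pages? */
static bool is_4k_aligned(unsigned long bytes)
{
    return bytes % (4 * KB) == 0;
}

/* max_sectors_kb is in KB; the block layer counts 512-byte sectors. */
static unsigned long kb_to_sectors(unsigned long kb)
{
    return kb * 2;
}
```

For a 1 MB transfer, 127KB requests take 9 requests, 128KB requests take 8, and the new 32MB ceiling allows a single request.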

Signed-off-by: Paul Clements <[email protected]>
Signed-off-by: Michal Belczyk <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

pid_namespace.c/.h: simplify defines

Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, since we can
simplify the define PID_MAP_ENTRIES by using BITS_PER_PAGE.
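The simplified define amounts to a round-up division by BITS_PER_PAGE. This sketch assumes 4 KB pages and takes the PID limit as a parameter. In the kernel, that limit is PID_MAX_LIMIT, which depends on the configuration (for example 4 * 1024 * 1024 on typical 64-bit builds). pidmap_entries() is a hypothetical helper, not the kernel macro.

```c
#include <assert.h>

/* Assumption: 4 KB pages. */
#define SK_PAGE_SIZE 4096UL
/* One bit per PID, so a bitmap page covers PAGE_SIZE * 8 PIDs. */
#define BITS_PER_PAGE (SK_PAGE_SIZE * 8)

/* Bitmap pages needed to cover `pid_max_limit` PIDs, rounded up. */
static unsigned long pidmap_entries(unsigned long pid_max_limit)
{
    return (pid_max_limit + BITS_PER_PAGE - 1) / BITS_PER_PAGE;
}
```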

[[email protected]: kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined]
Signed-off-by: Raphael S.Carvalho <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/pid.c: improve flow of a loop inside alloc_pidmap.

find_next_offset() searches for an available "cleared bit" in the
respective pid bitmap (page), returning its offset if one is found and
a value equal to BITS_PER_PAGE otherwise.

If find_next_offset() doesn't find any available bit, there is no
point in calling mk_pid(); it only wastes CPU cycles.

Therefore, it is better to call mk_pid() only after the check
(offset < BITS_PER_PAGE) succeeds.  Another point: if (offset <
BITS_PER_PAGE) fails, mk_pid() would be called again afterwards
anyway.
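The reordered loop can be sketched in userspace as follows. This is a simplified model with toy sizes and a bool array in place of a real bitmap. It skips the kernel's wrap-around and pid_max handling. The struct and function names only mirror the kernel's and are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sizes, assumed for illustration (the kernel uses PAGE_SIZE * 8 bits per page). */
#define BITS_PER_MAP 64
#define NR_MAPS 2

struct toy_pidmap { bool used[BITS_PER_MAP]; };

/* Like find_next_offset(): first clear bit at or after off, else BITS_PER_MAP. */
static int find_next_offset(const struct toy_pidmap *map, int off)
{
    while (off < BITS_PER_MAP && map->used[off])
        off++;
    return off;
}

/* Like mk_pid(): turn (map index, offset) into a PID number. */
static int mk_pid(int map_index, int off)
{
    return map_index * BITS_PER_MAP + off;
}

/* Allocate the first free PID at or after `start`; -1 if none is left. */
static int alloc_pid_sketch(struct toy_pidmap maps[NR_MAPS], int start)
{
    int i = start / BITS_PER_MAP;
    int off = start % BITS_PER_MAP;

    for (; i < NR_MAPS; i++, off = 0) {
        off = find_next_offset(&maps[i], off);
        if (off < BITS_PER_MAP) {       /* check first ... */
            maps[i].used[off] = true;
            return mk_pid(i, off);      /* ... only then build the PID */
        }
    }
    return -1;
}
```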

[[email protected]: simplify code]
Signed-off-by: Raphael S. Carvalho <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Serge Hallyn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

rbtree_test: add __init/__exit annotations

Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

rbtree_test: add extra rbtree integrity check

Account for the rbtree having at least 2**bh(v)-1 internal nodes.

While this can be seen as a consequence of other checks, Michel states
that it nicely sums up what the other properties are for.
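The property behind the new check is that a red-black subtree rooted at v contains at least 2^bh(v) - 1 internal nodes, where bh(v) is its black height. The userspace sketch below follows the shape of the check as we understand it: count the nodes, count the black nodes along the leftmost path, and compare. struct node and the helpers are hypothetical stand-ins for struct rb_node and the test module's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { bool red; struct node *left, *right; };

static int count_nodes(const struct node *n)
{
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}

/* Black nodes along the leftmost path, root included. */
static int black_path_count(const struct node *n)
{
    int count = 0;

    for (; n; n = n->left)
        count += !n->red;
    return count;
}

/* A tree of black height bh must hold at least 2^bh - 1 nodes. */
static bool enough_nodes(const struct node *root)
{
    return count_nodes(root) >= (1 << black_path_count(root)) - 1;
}
```

A black root with two red children (bh = 1, 3 nodes) passes. A left-leaning chain of three black nodes (bh = 3, only 3 nodes where at least 7 are required) is flagged.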

Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kexec: Use min() and min_t() to simplify logic

Simplify the logic of variable assignments.

[[email protected]: replace min_t with min, remove unneeded casts]
Signed-off-by: Zhang Yanfei <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kexec: fix wrong types of some local variables

The types of the following local variables:

- ubytes/mbytes in kimage_load_crash_segment()/kimage_load_normal_segment()

- r in vmcoreinfo_append_str()

are wrong, so fix them.

Signed-off-by: Zhang Yanfei <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

exec: do not abuse ->cred_guard_mutex in threadgroup_lock()

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice: the scope of
this lock in do_execve() is huge.

And, as Dave pointed out, this can lead to deadlock because we have the
following dependencies:

do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex
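The three dependencies above close a cycle, which is what makes the deadlock possible. The following is a simplified, lockdep-style illustration, not lockdep's actual implementation. It records "acquired B while holding A" edges between the three locks and reports whether any edge can be closed back on itself. The enum and functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum lock { CRED_GUARD, I_MUTEX, CGROUP_MUTEX, NR_LOCKS };

/* edge[a][b] is true when b was acquired while a was held. */
static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can `from` reach `to` along recorded edges? */
static bool reaches(int from, int to, bool seen[NR_LOCKS])
{
    if (from == to)
        return true;
    if (seen[from])
        return false;
    seen[from] = true;
    for (int next = 0; next < NR_LOCKS; next++)
        if (edge[from][next] && reaches(next, to, seen))
            return true;
    return false;
}

/* Deadlock is possible if some edge a -> b has a path b -> ... -> a. */
static bool has_cycle(void)
{
    for (int a = 0; a < NR_LOCKS; a++)
        for (int b = 0; b < NR_LOCKS; b++)
            if (edge[a][b]) {
                bool seen[NR_LOCKS] = { false };
                if (reaches(b, a, seen))
                    return true;
            }
    return false;
}
```

Removing the cgroup_mutex -> cred_guard_mutex edge, which is what dropping ->cred_guard_mutex from threadgroup_lock() amounts to, breaks the cycle.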

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held; this could
obviously deadlock with the exiting leader if the writer is active, so it
calls threadgroup_change_end() before schedule().

Reported-by: Dave Jones <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

set_task_comm: kill the pointless memset() + wmb()

set_task_comm() does memset() + wmb() before strlcpy().  This buys
nothing and, to add to the confusion, the comment is wrong.

- We do not need memset() to be "safe from non-terminating string
  reads": the final char is always zero and we never change it.

- wmb() is paired with nothing; it cannot prevent printing a
  mixture of old and new data unless the reader takes the lock.

Signed-off-by: Oleg Nesterov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: John Stultz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs, proc: truncate /proc/pid/comm writes to first TASK_COMM_LEN bytes

Currently, a write to a procfs file will return the number of bytes
successfully written.  If the actual string is longer than this, the
remainder of the string will not be written and userspace will
complete the operation by issuing additional write()s.

Hence

$ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

results in

$ cat /proc/$$/comm
pqrs

because TASK_COMM_LEN == 16, so the final four bytes were written by a
second write().  This is obviously an undesired result and not
equivalent to prctl(PR_SET_NAME).  The implementation should not need to
know the definition of TASK_COMM_LEN.

This patch truncates the string to the first TASK_COMM_LEN bytes and
returns the full length of the string as the number of bytes written,
so the second write() is suppressed.

$ cat /proc/$$/comm
abcdefghijklmno
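Both behaviours can be reproduced with a small userspace model. comm_write() stands in for the /proc/<pid>/comm write handler, and echo_n() stands in for a shell that keeps calling write() until every byte is consumed. Both are hypothetical. The model stores at most TASK_COMM_LEN - 1 bytes so that the final byte stays '\0'.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16   /* as stated in the commit message */

static char comm[TASK_COMM_LEN];

/*
 * Model of the write handler. It stores at most TASK_COMM_LEN - 1 bytes.
 * old_behavior != 0: report only the bytes stored (userspace retries the rest).
 * old_behavior == 0: report the whole count, so no second write() happens.
 */
static size_t comm_write(const char *buf, size_t count, int old_behavior)
{
    size_t n = count < TASK_COMM_LEN - 1 ? count : TASK_COMM_LEN - 1;

    memcpy(comm, buf, n);
    comm[n] = '\0';
    return old_behavior ? n : count;
}

/* Like `echo -n s > /proc/self/comm`: loop until all bytes are consumed. */
static void echo_n(const char *s, int old_behavior)
{
    size_t len = strlen(s), done = 0;

    while (done < len)
        done += comm_write(s + done, len - done, old_behavior);
}
```

With the old behaviour, the 19-byte string takes two writes and leaves "pqrs". With the new behaviour, a single write leaves the first 15 characters.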

Signed-off-by: David Rientjes <[email protected]>
Acked-by: John Stultz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: change wait_for_dump_helpers() to use wait_event_interruptible()

wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
wait_event-like loop.  This is not needed and in fact is not strictly
correct; we can/should do this only once, after we change
pipe->writers.  We could even check whether it becomes zero.

Change this code to use wait_event_interruptible(); this can also
help to make this wait freezable.

With this patch we check pipe->readers without pipe_lock(), which is
fine.  Once we see pipe->readers == 1 we know that the handler
decremented the counter, and this is all we need.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Mandeep Singh Baines <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: factor out the setting of PF_DUMPCORE

Cleanup: every linux_binfmt->core_dump() sets PF_DUMPCORE; move this into
zap_threads(), which is called by do_coredump().

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Mandeep Singh Baines <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: introduce dump_interrupted()

Based on a discussion with Mandeep.

Change dump_write(), dump_seek() and do_coredump() to check
signal_pending() and abort if it is true.  dump_seek() does this only
before f_op->llseek(), otherwise it relies on dump_write().

We need this change to ensure that the coredump won't delay suspend, and
to ensure it reacts to SIGKILL "quickly enough"; a core dump can take a
lot of time.  In particular this can help the oom-killer.

We add a new trivial helper, dump_interrupted(), to hold the comments and
to simplify the potential freezer changes.  Perhaps it will have more
callers.

Ideally it should do try_to_freeze() but then we need the unpleasant
changes in dump_write() and wait_for_dump_helpers().  It is not trivial to
change dump_write() to restart if f_op->write() fails because of
freezing().  We need to handle the short writes, we need to clear
TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
it to check PF_DUMPCORE).  And if the buggy f_op->write() sets
TIF_SIGPENDING we can not distinguish this case from the race with
freeze_task() + __thaw_task().

So we simply accept the fact that the freezer can truncate a core-dump, but
at least you can reliably suspend.  Hopefully we can tolerate this
unlikely case; the necessary complications aren't worth the trouble.
But if we decide to make the coredumping freezable later we can do this on
top of this change.
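The following is a userspace analog of the pattern. A single helper checks for a pending signal before each chunk of output, and the dump is abandoned if one is found. Here, a flag set by a signal handler stands in for signal_pending(). dump_write_sketch() and the flag are hypothetical. The real dump_write() writes to a file through f_op->write().

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set from a signal handler; stands in for signal_pending(current). */
static volatile sig_atomic_t pending;

static void on_signal(int sig)
{
    (void)sig;
    pending = 1;
}

/* Analog of dump_interrupted(): the one place that decides to give up. */
static int dump_interrupted(void)
{
    return pending != 0;
}

/*
 * Analog of dump_write(): copy `len` bytes in `chunk`-sized pieces,
 * checking for a pending signal before each piece. Returns the number
 * of bytes written; fewer than `len` means the dump was abandoned.
 */
static size_t dump_write_sketch(char *dst, const char *src, size_t len, size_t chunk)
{
    size_t done = 0;

    while (done < len) {
        if (dump_interrupted())
            break;
        size_t n = len - done < chunk ? len - done : chunk;
        memcpy(dst + done, src + done, n);
        done += n;
    }
    return done;
}
```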

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Mandeep Singh Baines <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: sanitize the setting of signal->group_exit_code

Now that the coredumping process can be SIGKILL'ed, the setting of
->group_exit_code in do_coredump() can race with complete_signal() and
SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
SIGKILL | 0x80.

But the main problem is that it is not clear to me what we should do if
binfmt->core_dump() succeeds but SIGKILL was sent; that is why this patch
comes as a separate change.

This patch adds 0x80 if ->core_dump() succeeds and the process was not
killed. But perhaps we can (should?) re-set ->group_exit_code, changed by
SIGKILL, back to "siginfo->si_signo |= 0x80" when core_dumped == T.

Signed-off-by: Oleg Nesterov <[email protected]>
Tested-by: Mandeep Singh Baines <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: ensure that SIGKILL always kills the dumping thread

prepare_signal() blesses SIGKILL sent to the dumping process but this
signal can be "lost" anyway.  The problem is that complete_signal() sees
SIGNAL_GROUP_EXIT and skips the "kill them all" logic.  And even if the
dumping process is single-threaded (so the target is always "correct"),
the group-wide SIGKILL is not recorded in task->pending and thus
__fatal_signal_pending() won't be true.  A multi-threaded case has even
more problems.

And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
right to me.  This coredumping process is not exiting yet, it can do a lot
of work dumping the core.

With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT; we set
signal->group_exit_task instead.  This makes signal_group_exit() true and
thus should equally close the races with exit/exec/stop, but allows the
dumping thread to be killed reliably.

Notes:
- It is not clear what we should do with ->group_exit_code
  if the dumper was killed, see the next change.

- we need more (hopefully straightforward) changes to ensure
  that SIGKILL actually interrupts the coredump. Basically we
  need to check __fatal_signal_pending() in dump_write() and
  dump_seek().

Signed-off-by: Oleg Nesterov <[email protected]>
Tested-by: Mandeep Singh Baines <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: only SIGKILL should interrupt the coredumping task

There are 2 well known and ancient problems with coredump/signals, and a
lot of related bug reports:

- do_coredump() clears TIF_SIGPENDING but of course this can't help
  if, say, SIGCHLD comes after that.

  In this case the coredump can fail unexpectedly. See for example
  wait_for_dump_helper()->signal_pending() check but there are other
  reasons.

- At the same time, dumping a huge core on slow media can take a
  lot of time/resources and there is no way to kill the coredumping
  task reliably. In particular this is not oom_kill-friendly.

This patch tries to fix the 1st problem and prepares for the next
changes.

We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
that this process dumps the core.  prepare_signal() checks this flag and
nacks any signal except SIGKILL.
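A hypothetical model of the new check in prepare_signal(): while the group is dumping, only SIGKILL gets through. The flag value and `struct signal_model` are assumptions for illustration; the real flag and signal_struct are defined in the kernel headers.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define SIGNAL_GROUP_COREDUMP 0x00000100u

struct signal_model {
	unsigned int flags;
};

/* Returns true if the signal should be delivered. While the group dumps
 * core, every signal except SIGKILL is nacked, so it cannot set
 * TIF_SIGPENDING and make the dump fail unexpectedly. */
static bool prepare_signal_sketch(const struct signal_model *sig, int signo)
{
	if (sig->flags & SIGNAL_GROUP_COREDUMP)
		return signo == SIGKILL;
	return true;    /* normal path; stop/continue handling omitted */
}
```

For example, a SIGCHLD arriving from a dump helper is dropped during the dump, while a SIGKILL from the oom-killer still passes.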

Note that this check tries to be conservative; in the long term we should
probably treat the SIGNAL_GROUP_EXIT case equally, but this needs more
discussion.  See marc.info/?l=linux-kernel&m=120508897917439

Notes:
- recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
  The patch assumes that dump_write/etc paths should never
  call it, but we can change it as well.

- There is another source of TIF_SIGPENDING, freezer. This
  will be addressed separately.

Signed-off-by: Oleg Nesterov <[email protected]>
Tested-by: Mandeep Singh Baines <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kmod: remove call_usermodehelper_fns()

When this function returns -ENOMEM, the caller cannot determine whether
the cleanup function was called. Nobody is using it anymore, so let's
remove it.

Signed-off-by: Lucas De Marchi <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Howells <[email protected]>
Cc: James Morris <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

usermodehelper: split remaining calls to call_usermodehelper_fns()

These are the only users of call_usermodehelper_fns(). This function
leaves the caller unable to determine whether the cleanup was called. Even
though the cleanup pointer is NULL in these places, convert them to use the
separate call_usermodehelper_setup() + call_usermodehelper_exec()
functions so we can remove the _fns variant.

Signed-off-by: Lucas De Marchi <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Howells <[email protected]>
Cc: James Morris <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coredump: remove trailing whitespace

Signed-off-by: Lucas De Marchi <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Howells <[email protected]>
Cc: James Morris <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

KEYS: split call to call_usermodehelper_fns()

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). If the latter fails with -ENOMEM, the
cleanup function may not be called; in that case we would miss a call to
key_put().

Signed-off-by: Lucas De Marchi <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Acked-by: David Howells <[email protected]>
Acked-by: James Morris <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kmod: split call to call_usermodehelper_fns()

Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
calling call_usermodehelper_fns(). If the latter returns -ENOMEM, the
cleanup function may not have been called; in that case we would not free
argv and module_name.

Signed-off-by: Lucas De Marchi <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Howells <[email protected]>
Cc: James Morris <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

usermodehelper: export call_usermodehelper_exec() and call_usermodehelper_setup()

call_usermodehelper_setup() + call_usermodehelper_exec() need to be
called instead of call_usermodehelper_fns() when the cleanup function
must run even when an ENOMEM error occurs. With
call_usermodehelper_fns(), the caller can't tell whether the cleanup
function was called or not.
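The contract the split API gives callers can be sketched in userspace. This is a hypothetical model (`helper_setup()` and `helper_exec()` are stand-ins, not the kernel functions): a failed setup never runs the cleanup, while exec always runs it, so the caller knows exactly who releases the data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct helper_info {
	void (*cleanup)(void *data);
	void *data;
};

/* Returns NULL on (simulated) -ENOMEM; cleanup is NOT called then. */
static struct helper_info *helper_setup(void (*cleanup)(void *), void *data,
					bool simulate_enomem)
{
	struct helper_info *info = simulate_enomem ? NULL : malloc(sizeof(*info));

	if (!info)
		return NULL;
	info->cleanup = cleanup;
	info->data = data;
	return info;
}

/* Always runs cleanup and frees info, on success and on failure alike. */
static int helper_exec(struct helper_info *info, int simulated_error)
{
	if (info->cleanup)
		info->cleanup(info->data);
	free(info);
	return simulated_error;
}

static int cleanups;
static void count_cleanup(void *data) { (void)data; cleanups++; }

/* Caller pattern: the data is released exactly once on every path. */
static int run_helper(bool enomem_in_setup, int exec_error)
{
	struct helper_info *info = helper_setup(count_cleanup, NULL,
						enomem_in_setup);

	if (!info) {
		count_cleanup(NULL);   /* caller releases, e.g. key_put() */
		return -ENOMEM;
	}
	return helper_exec(info, exec_error);
}
```

Whichever step fails, `cleanups` ends at exactly 1, which is the guarantee the combined call could not give.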

[[email protected]: export call_usermodehelper_setup() to modules]
Signed-off-by: Lucas De Marchi <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: David Howells <[email protected]>
Cc: James Morris <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest: add a test case for PTRACE_PEEKSIGINFO

* Dump signals from process-wide and per-thread queues with
  different sizes of buffers.
* Check error paths for buffers with restricted permissions: part of
  the buffer, or the whole buffer, is read-only.
* Try to get a nonexistent signal.

Signed-off-by: Andrew Vagin <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: David Howells <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: "Michael Kerrisk (man-pages)" <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Pedro Alves <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ptrace: add ability to retrieve signals without removing from a queue (v4)

This patch adds a new ptrace request PTRACE_PEEKSIGINFO.

This request is used to retrieve information about pending signals
starting with the specified sequence number.  The siginfo_t structures are
copied from the child into the buffer starting at "data".

The argument "addr" is a pointer to struct ptrace_peeksiginfo_args:

  struct ptrace_peeksiginfo_args {
          u64 off;    /* from which siginfo to start */
          u32 flags;
          s32 nr;     /* how many siginfos to take */
  };

"nr" has type "s32" because ptrace() returns "long", which has 32 bits on
i386, and negative values are used for errors.

Currently there is only one flag, PTRACE_PEEKSIGINFO_SHARED, for dumping
signals from the process-wide queue.  If this flag is not set, signals are
read from the per-thread queue.

The request PTRACE_PEEKSIGINFO returns the number of dumped signals.  If a
signal with the specified sequence number doesn't exist, ptrace returns
zero.  An error is returned only if no signal has been dumped.

Errors:
EINVAL - one or more specified flags are not supported or nr is negative
EFAULT - buf or addr is outside your accessible address space.
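A minimal sketch of how a tracer might page through a queue with PTRACE_PEEKSIGINFO, advancing `off` by the returned count until ptrace returns 0. It assumes kernel headers that provide struct ptrace_peeksiginfo_args (3.10 or later) and a tracee that is already attached and stopped; it uses the raw syscall to avoid mixing <sys/ptrace.h> with <linux/ptrace.h>. `count_pending_signals()` and the batch size are hypothetical.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/ptrace.h>       /* PTRACE_PEEKSIGINFO, ptrace_peeksiginfo_args */

#define PEEK_BATCH 16

/* u64 off + u32 flags + s32 nr */
static_assert(sizeof(struct ptrace_peeksiginfo_args) == 16, "args layout");

/* Count pending signals of a stopped tracee without dequeuing them.
 * shared != 0 selects the process-wide queue. Returns -1 on error. */
static long count_pending_signals(pid_t pid, int shared)
{
	siginfo_t buf[PEEK_BATCH];
	struct ptrace_peeksiginfo_args args = {
		.off = 0,
		.flags = shared ? PTRACE_PEEKSIGINFO_SHARED : 0,
		.nr = PEEK_BATCH,
	};
	long total = 0;

	for (;;) {
		/* The return value is the number of siginfos copied. */
		long n = syscall(SYS_ptrace, PTRACE_PEEKSIGINFO, pid, &args, buf);

		if (n < 0)
			return -1;      /* e.g. ESRCH if pid is not our tracee */
		if (n == 0)
			return total;   /* no signal at sequence number args.off */
		total += n;
		args.off += n;          /* continue with the next sequence number */
	}
}
```

Called on a process the caller is not tracing, the request fails and the helper returns -1.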

The resulting siginfo contains the kernel part of si_code, which is
usually stripped, but it's required for queuing the same siginfo back
when restoring pending signals.

This functionality is required for checkpointing pending signals.  Pedro
Alves suggested using it in "gdb" to peek at pending signals.  gdb already
uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
dequeued.  This functionality allows gdb to look at the pending signals
which were not reported yet.

The prototype of this code was developed by Oleg Nesterov.

Signed-off-by: Andrew Vagin <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: David Howells <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: "Michael Kerrisk (man-pages)" <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Pedro Alves <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: remove duplicated message prefix in hfsplus_block_free()

Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: add error propagation to __hfsplus_ext_write_extent()

__hfsplus_ext_write_extent() suppresses errors coming from
hfs_brec_find(). The patch implements error code propagation.

Signed-off-by: Alexey Khoroshilov <[email protected]>
Reviewed-by: Vyacheslav Dubeyko <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfs/hfsplus: convert printks to pr_<level>

Use a more current logging style.

Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
hfsplus now uses "hfsplus: " for all messages.
Coalesce formats.
Prefix debugging messages too.
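A userspace sketch of how pr_fmt() works: the prefix is glued on by string-literal concatenation at compile time. `KBUILD_MODNAME` is normally supplied by the kernel build, and writing into `logbuf` with snprintf() in place of printk() is an assumption made only so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed stand-ins: the kernel build defines KBUILD_MODNAME, and the
 * real pr_err() expands to printk(KERN_ERR pr_fmt(fmt), ...). */
#define KBUILD_MODNAME "hfsplus"
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

static char logbuf[256];

/* "hfsplus: " is concatenated with fmt before snprintf() ever runs. */
#define pr_err(fmt, ...) \
	snprintf(logbuf, sizeof(logbuf), pr_fmt(fmt), ##__VA_ARGS__)
```

Because pr_fmt() is defined once at the top of each file, every message in that file picks up the same prefix without repeating it in each format string.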

Signed-off-by: Joe Perches <[email protected]>
Cc: Vyacheslav Dubeyko <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfs/hfsplus: convert dprint to hfs_dbg

Use a more current logging style.

Rename macro and uses.
Add do {} while (0) to macro.
Add DBG_ to macro.
Add and use hfs_dbg_cont variant where appropriate.
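A sketch of the macro pattern described above, under assumed DBG_* values and with snprintf() into a buffer standing in for printk(KERN_DEBUG ...): `DBG_##flg` pastes the category name onto the prefix, and `do { } while (0)` keeps the macro a single statement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical category bits and mask; the real values live in the
 * hfs/hfsplus headers. */
#define DBG_BNODE_REFS 0x00000001
#define DBG_EXTENT     0x00000020
#define DBG_MASK       (DBG_EXTENT)

static char dbgbuf[256];

/* do { } while (0) makes the expansion one statement, so it is safe in
 * an unbraced if/else; DBG_##flg turns EXTENT into DBG_EXTENT. */
#define hfs_dbg(flg, fmt, ...)                                           \
	do {                                                             \
		if (DBG_##flg & DBG_MASK)                                \
			snprintf(dbgbuf, sizeof(dbgbuf), fmt,            \
				 ##__VA_ARGS__);                         \
	} while (0)
```

Callers write `hfs_dbg(EXTENT, ...)` rather than spelling out the DBG_ prefix, and disabled categories compile down to a constant-false test.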

Signed-off-by: Joe Perches <[email protected]>
Cc: Vyacheslav Dubeyko <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: fix warnings in fs/hfsplus/bfind.c

fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
(1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
(2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]

[[email protected]: make the workaround more explicit]
Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfs: add error checking for hfs_find_init()

hfs_find_init() may fail with ENOMEM, but there are places where the
return value is not checked. The consequences can be very unpleasant,
e.g. kfree() of an uninitialized pointer and inappropriate mutex unlocking.

The patch adds checks for errors in hfs_find_init().
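A hypothetical userspace model of the failure mode (`find_init()` and `find_exit()` are stand-ins for hfs_find_init()/hfs_find_exit()): on -ENOMEM the search key stays unassigned and the lock is not taken, so the caller has to return before reaching the exit path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct find_data {
	char *search_key;
	bool locked;            /* stands in for the B-tree mutex */
};

static int find_init(struct find_data *fd, size_t keylen, bool simulate_enomem)
{
	char *ptr = simulate_enomem ? NULL : malloc(keylen);

	if (!ptr)
		return -ENOMEM; /* fd->search_key left untouched, no lock taken */
	fd->search_key = ptr;
	fd->locked = true;
	return 0;
}

static void find_exit(struct find_data *fd)
{
	assert(fd->locked);     /* unlocking a lock we never took is the bug */
	free(fd->search_key);
	fd->locked = false;
}

/* Fixed caller: checks the init result and returns early, so it never
 * frees a stale pointer or drops an unheld lock. */
static int lookup(bool simulate_enomem)
{
	struct find_data fd;
	int err = find_init(&fd, 64, simulate_enomem);

	if (err)
		return err;
	/* ... search the B-tree ... */
	find_exit(&fd);
	return 0;
}
```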

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <[email protected]>
Reviewed-by: Vyacheslav Dubeyko <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: remove unneeded test in nilfs_writepage()

page->mapping->host cannot be NULL in nilfs_writepage(), so remove the
unneeded test.

This fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage()
error: we previously assumed 'inode' could be null (see line 195)".

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: fix using of PageLocked() in nilfs_clear_dirty_page()

Change test_bit(PG_locked, &page->flags) to PageLocked().

Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption

The NILFS2 driver remounts itself in RO mode when it discovers metadata
corruption (for example, a broken bmap).  Usually, though, this happens
after file system operations have already been performed before the
remount.

As a result, the NILFS2 driver can be in RO mode while dirty pages are
still present in modified inodes' address spaces.  The flush kernel thread
then tries to flush these dirty pages forever.  Possible side effects
are: (1) the flush kernel thread occupies 50% - 99% of CPU time; (2) the
system can't be shut down without manually switching off the power.

SYMPTOMS:
(1) System log contains error message: "Remounting filesystem read-only".
(2) The flush kernel thread occupies 50% - 99% of CPU time.
(3) The system can't be shut down without manually switching off the power.

REPRODUCTION PATH:
(1) Create a volume group named "unencrypted" using the vgcreate utility.
(2) Run script (prepared by Anthony Doggett <[email protected]>):

  ----------------[BEGIN SCRIPT]--------------------
  #!/bin/bash

  VG=unencrypted
  #apt-get install nilfs-tools darcs
  lvcreate --size 2G --name ntest $VG
  mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
  mkdir /var/tmp/n
  mkdir /var/tmp/n/ntest
  mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
  mkdir /var/tmp/n/ntest/thedir
  cd /var/tmp/n/ntest/thedir
  sleep 2
  date
  darcs init
  sleep 2
  dmesg|tail -n 5
  date
  darcs whatsnew || true
  date
  sleep 2
  dmesg|tail -n 5
  ----------------[END SCRIPT]--------------------

(3) Try to shut down the system.

REPRODUCIBILITY: 100%

FIX:

This patch implements a check of the NILFS2 mount state in the
nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page()
methods.  If a read-only mount state is detected, all dirty pages are
simply discarded and a warning message is written to the system log.

[[email protected]: fix printk warning]
Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
Cc: Anthony Doggett <[email protected]>
Cc: ARAI Shun-ichi <[email protected]>
Cc: Piotr Szymaniak <[email protected]>
Cc: Zahid Chowdhury <[email protected]>
Cc: Elmer Zhang <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

i2o: check copy_from_user() size parameter

Limit the size of the copy so we don't corrupt memory. Hopefully this
can only be called by root, but fixing this makes the static checkers
happier.
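The general shape of such a fix, sketched in userspace with memcpy() standing in for copy_from_user(): bound the caller-supplied length by the destination size before copying. Clamping is shown here as an assumption; rejecting oversized requests with -EINVAL is an equally common choice, and this is not the i2o code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size bytes, whatever length the caller asked for.
 * Returns the number of bytes copied. */
static size_t bounded_copy(void *dst, size_t dst_size,
			   const void *src, size_t user_len)
{
	size_t n = user_len < dst_size ? user_len : dst_size;

	memcpy(dst, src, n);
	return n;
}
```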

Signed-off-by: Dan Carpenter <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Masanari Iida <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dmi_scan: refactor dmi_scan_machine(), {smbios,dmi}_present()

Move the calls to memcpy_fromio() up into the loop in
dmi_scan_machine(), and move the signature checks back down into
dmi_decode(). We need to check at 16-byte intervals but keep a 32-byte
buffer for an SMBIOS entry, so shift the buffer after each iteration.

Merge smbios_present() into dmi_present(), so we look for an SMBIOS
signature at the beginning of the given buffer and then for a DMI
signature at an offset of 16 bytes.
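A simplified sketch of the 16-byte stepping with a 32-byte window, scanning a plain byte array instead of the BIOS area via memcpy_fromio(). Checksum validation and table decoding are omitted, and `dmi_scan_sketch()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the "_DMI_" signature, or -1. *smbios is set to 1
 * when "_SM_" sits 16 bytes earlier (SMBIOS entry point), else 0. */
static long dmi_scan_sketch(const unsigned char *area, size_t len, int *smbios)
{
	unsigned char buf[32];

	memset(buf, 0, 16);                    /* no previous block yet */
	for (size_t q = 0; q + 16 <= len; q += 16) {
		/* Newest 16 bytes go into the upper half of the window. */
		memcpy(buf + 16, area + q, 16);
		if (memcmp(buf + 16, "_DMI_", 5) == 0) {
			*smbios = memcmp(buf, "_SM_", 4) == 0;
			return (long)q;
		}
		memcpy(buf, buf + 16, 16);     /* slide the window down */
	}
	return -1;
}
```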

[[email protected]: use proper buf type in dmi_present()]
Signed-off-by: Ben Hutchings <[email protected]>
Reported-by: Tim McGrath <[email protected]>
Tested-by: Tim Mcgrath <[email protected]>
Cc: Zhenzhong Duan <[email protected]>
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

binfmt_elf: PIE: make PF_RANDOMIZE check comment more accurate

The comment I originally added in commit a3defbe5c337 ("binfmt_elf: fix
PIE execution with randomization disabled") is not really 100% accurate:
sysctl is not the only way PF_RANDOMIZE can be forcibly unset at
runtime.

Another option, of course, is direct modification of the personality
flags (e.g. running through the setarch wrapper).

Make the comment more explicit and accurate.
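In userspace, the persona flag that setarch manipulates can be inspected with personality(2); passing 0xffffffff queries it without changing it. In binfmt_elf, PF_RANDOMIZE is set at exec time only if ADDR_NO_RANDOMIZE is clear and randomize_va_space is non-zero. A minimal sketch (`randomization_disabled()` is a hypothetical helper):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/personality.h>

/* True if the current persona asks the kernel not to randomize the
 * address space of the next exec'd program. */
static bool randomization_disabled(void)
{
	int persona = personality(0xffffffff);  /* query only */

	assert(persona != -1);
	return (persona & ADDR_NO_RANDOMIZE) != 0;
}
```

`setarch $(uname -m) -R prog` sets ADDR_NO_RANDOMIZE before exec'ing prog, so inside prog this helper would return true even with the sysctl left at its default.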

Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: make binfmt support for #! scripts modular and removable

Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support
for interpreted scripts starting with "#!"; allow compiling out that
support, or building it as a module. Embedded systems running exclusively
compiled binaries could leave this support out, and systems that don't
need scripts before mounting the root filesystem can build this as a
module.

Signed-off-by: Josh Triplett <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

epoll: cleanup: use RCU_INIT_POINTER when nulling

It is always safe to use RCU_INIT_POINTER to NULL a pointer. This results
in slightly smaller/faster code.
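An analogy in C11 atomics (not the kernel RCU API): publishing a pointer to initialized data needs release ordering, which is what rcu_assign_pointer() provides, while storing NULL publishes no data, so a relaxed store suffices; that is the reasoning behind RCU_INIT_POINTER(). Grace periods and freeing are deliberately omitted from this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item { int value; };

static _Atomic(struct item *) shared_ptr;

/* Release store: a reader that sees the pointer also sees it->value. */
static void publish(struct item *it)
{
	atomic_store_explicit(&shared_ptr, it, memory_order_release);
}

/* Storing NULL makes no data visible, so there is nothing to order. */
static void unpublish(void)
{
	atomic_store_explicit(&shared_ptr, NULL, memory_order_relaxed);
}

static int read_value(int fallback)
{
	/* Acquire stands in for rcu_dereference()'s dependency ordering. */
	struct item *it = atomic_load_explicit(&shared_ptr, memory_order_acquire);

	return it ? it->value : fallback;
}
```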

Signed-off-by: Eric Wong <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

epoll: cleanup: hoist out f_op->poll calls

This reduces the amount of code inside the ready list iteration loops for
better readability IMHO.

Signed-off-by: Eric Wong <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

epoll: lock ep->mtx in ep_free to silence lockdep

Technically we do not need to hold ep->mtx during ep_free since we are
certain there are no other users of ep at that point. However, lockdep
complains with a "suspicious rcu_dereference_check() usage!" message; so
lock the mutex before ep_remove to silence the warning.

Signed-off-by: Eric Wong <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Arve Hjønnevåg <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

epoll: use RCU to protect wakeup_source in epitem

This prevents wakeup_source destruction when a user hits the item with
EPOLL_CTL_MOD while ep_poll_callback is running.

Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2"

Signed-off-by: Eric Wong <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Arve Hjønnevåg <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

epoll: trim epitem by one cache line

It is common for epoll users to have thousands of epitems, so saving a
cache line on every allocation leads to large memory savings.

Since epitem allocations are cache-aligned, reducing sizeof(struct
epitem) from 136 bytes to 128 bytes will allow it to squeeze under a
cache line boundary on x86_64.

Via /sys/kernel/slab/eventpoll_epi, I see the following changes on my
x86_64 Core2 Duo (which has 64-byte cache alignment):

object_size : 192 => 128
objs_per_slab: 21 => 32

Also, add a BUILD_BUG_ON() to check for future accidental breakage.
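The numbers above follow from rounding the object size up to the 64-byte cache alignment and dividing a slab into such objects. A worked check, assuming a one-page (4096-byte) slab:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
static size_t round_up_pow2(size_t size, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Objects that fit in one slab once each is cache-aligned. */
static size_t objs_per_slab(size_t obj_size, size_t align, size_t slab_bytes)
{
	return slab_bytes / round_up_pow2(obj_size, align);
}
```

136 bytes rounds up to 192, giving 4096 / 192 = 21 objects; 128 bytes needs no padding, giving 32. The BUILD_BUG_ON() in the patch plays the role of C11's `_Static_assert`, failing the build if struct epitem grows past the limit again.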

[[email protected]: use __packed, for all architectures]
Signed-off-by: Eric Wong <[email protected]>
Cc: Davide Libenzi <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/timer.c: move some non timer related syscalls to kernel/sys.c

Andrew Morton noted:

akpm3:/usr/src/25> grep SYSCALL kernel/timer.c
SYSCALL_DEFINE1(alarm, unsigned int, seconds)
SYSCALL_DEFINE0(getpid)
SYSCALL_DEFINE0(getppid)
SYSCALL_DEFINE0(getuid)
SYSCALL_DEFINE0(geteuid)
SYSCALL_DEFINE0(getgid)
SYSCALL_DEFINE0(getegid)
SYSCALL_DEFINE0(gettid)
SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info)
COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)

Only one of those should be in kernel/timer.c. Who wrote this thing?

[[email protected]: coding-style fixes]
Signed-off-by: Stephen Rothwell <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/timer.c: convert compat_sys_sysinfo to COMPAT_SYSCALL_DEFINE

Signed-off-by: Stephen Rothwell <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/compat.c: make do_sysinfo() static

The only use outside of kernel/timer.c was in kernel/compat.c, so move
compat_sys_sysinfo() next to sys_sysinfo() in kernel/timer.c.

Signed-off-by: Stephen Rothwell <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Al Viro <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

binfmt_misc: reuse string_unescape_inplace()

The string_unescape_inplace() function decodes strings in a generic
way. Let's use it.

Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

dynamic_debug: reuse generic string_unescape function

There is a kernel function that does this job in a generic way. Let's use it.

Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Jason Baron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

staging: speakup: remove custom string_unescape_any_inplace

There is a generic implementation of the function to unescape strings.

Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Samuel Thibault <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: William Hubbs <[email protected]>
Cc: Chris Brannon <[email protected]>
Cc: Kirk Reiser <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/string_helpers: introduce generic string_unescape

There are several places in the kernel where modules unescape input to
convert C-style escape sequences into byte codes.

The patch provides a generic implementation of this approach. Test cases are
also included in the patch.
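A simplified sketch of in-place unescaping that handles only \n, \t, \\, \", octal (\ooo) and hex (\xHH). It is not the kernel's string_unescape(), whose flags select which escape classes are decoded; `unescape_inplace()` is a hypothetical name. Decoding in place is safe because every escape consumes at least as many input bytes as it emits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower(c) - 'a' + 10;   /* caller checked isxdigit() */
}

/* Decode escapes in place; unknown escapes are kept verbatim.
 * Returns the new length. */
static size_t unescape_inplace(char *buf)
{
	char *src = buf, *dst = buf;

	while (*src) {
		if (*src != '\\' || src[1] == '\0') {
			*dst++ = *src++;
			continue;
		}
		src++;                          /* skip the backslash */
		switch (*src) {
		case 'n':  *dst++ = '\n'; src++; break;
		case 't':  *dst++ = '\t'; src++; break;
		case '\\': *dst++ = '\\'; src++; break;
		case '"':  *dst++ = '"';  src++; break;
		case 'x':
			if (isxdigit((unsigned char)src[1])) {
				int v = 0, n = 0;

				src++;
				while (n < 2 && isxdigit((unsigned char)*src)) {
					v = v * 16 + hexval((unsigned char)*src++);
					n++;
				}
				*dst++ = (char)v;
			} else {
				*dst++ = '\\';  /* "\x" without digits */
			}
			break;
		default:
			if (*src >= '0' && *src <= '7') {
				int v = 0, n = 0;

				while (n < 3 && *src >= '0' && *src <= '7') {
					v = v * 8 + (*src++ - '0');
					n++;
				}
				*dst++ = (char)v;
			} else {
				*dst++ = '\\';  /* unknown escape */
			}
		}
	}
	*dst = '\0';
	return (size_t)(dst - buf);
}
```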

[[email protected]: clarify comment]
[[email protected]: export get_random_int() to modules]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Samuel Thibault <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: William Hubbs <[email protected]>
Cc: Chris Brannon <[email protected]>
Cc: Kirk Reiser <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/smp.c: cleanups

We sometimes use "struct call_single_data *data" and sometimes "struct
call_single_data *csd". Use "csd" consistently.

We sometimes use "struct call_function_data *data" and sometimes "struct
call_function_data *cfd". Use "cfd" consistently.

Also, avoid some 80-col layout tricks.

Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/fs.h: disable preempt when acquire i_size_seqcount write lock

Two rt tasks are bound to one CPU core.

The higher priority rt task B preempts the lower priority rt task A, which
has already taken the write seqcount lock; when task B then tries to
acquire the read seqcount lock, it is doomed to lock up.

rt task A with lower priority: call write
i_size_write                                        rt task B with higher priority: call sync, and preempt task A
  write_seqcount_begin(&inode->i_size_seqcount);    i_size_read
  inode->i_size = i_size;                             read_seqcount_begin <-- lockup here...

So disabling preemption while holding every i_size_seqcount *write* lock
cures the problem.
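A single-writer seqcount sketch using C11 atomics (not the kernel's seqcount_t) that shows where the lockup comes from: a reader spins while the counter is odd, so if it preempts the writer on the same CPU and never yields, the counter never becomes even again.

```c
#include <assert.h>
#include <stdatomic.h>

/* The counter is odd while a write is in progress. */
struct seq_size {
	atomic_uint seq;
	atomic_llong size;
};

static void size_write(struct seq_size *s, long long v)
{
	/* The fix disables preemption around this section, so a same-CPU
	 * reader can never run while seq is odd. */
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->size, v, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release); /* even */
}

static long long size_read(struct seq_size *s)
{
	unsigned int start;
	long long v;

	do {
		/* Spins while a write is in progress. */
		while ((start = atomic_load_explicit(&s->seq,
						     memory_order_acquire)) & 1)
			;
		v = atomic_load_explicit(&s->size, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&s->seq, memory_order_relaxed) != start);
	return v;
}
```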

Signed-off-by: Fan Du <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/smp.c: remove 'priv' of call_single_data

The 'priv' field is redundant; we can pass data via 'info'.

Signed-off-by: liguang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>