Git Repo - linux.git/log

fat: fix incorrect function comment

fat_search_long() returns 0 on success, -ENOENT/ENOMEM on failure.
Change the function comment accordingly.

While at it, fix some trivial typos.

Signed-off-by: Ravishankar N <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Documentation: ABI: remove testing/sysfs-devices-node

This file is already documented in the stable ABI (see commit
5bbe1ec11fcf).

Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: fix inconsistent lock state

Lockdep found an inconsistent lock state when rcu is processing delayed
work in softirq.  Currently, kernel is using spin_lock/spin_unlock to
protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
context.

Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.

  =================================
  [ INFO: inconsistent lock state ]
  3.7.0 #36 Not tainted
  ---------------------------------
  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
   (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
  {SOFTIRQ-ON-W} state was registered at:
     __lock_acquire+0x8ae/0xca0
     lock_acquire+0x199/0x200
     _raw_spin_lock+0x41/0x50
     proc_alloc_inum+0x4c/0xd0
     alloc_mnt_ns+0x49/0xc0
     create_mnt_ns+0x25/0x70
     mnt_init+0x161/0x1c7
     vfs_caches_init+0x107/0x11a
     start_kernel+0x348/0x38c
     x86_64_start_reservations+0x131/0x136
     x86_64_start_kernel+0x103/0x112
  irq event stamp: 2993422
  hardirqs last  enabled at (2993422):  _raw_spin_unlock_irqrestore+0x55/0x80
  hardirqs last disabled at (2993421):  _raw_spin_lock_irqsave+0x29/0x70
  softirqs last  enabled at (2993394):  _local_bh_enable+0x13/0x20
  softirqs last disabled at (2993395):  call_softirq+0x1c/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(proc_inum_lock);
    <Interrupt>
      lock(proc_inum_lock);

   *** DEADLOCK ***

  no locks held by swapper/1/0.

  stack backtrace:
  Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
  Call Trace:
   <IRQ>  [<ffffffff810a40f1>] ? vprintk_emit+0x471/0x510
    print_usage_bug+0x2a5/0x2c0
    mark_lock+0x33b/0x5e0
    __lock_acquire+0x813/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_free_inum+0x1c/0x50
    free_pid_ns+0x1c/0x50
    put_pid_ns+0x2e/0x50
    put_pid+0x4a/0x60
    delayed_put_pid+0x12/0x20
    rcu_process_callbacks+0x462/0x790
    __do_softirq+0x1b4/0x3b0
    call_softirq+0x1c/0x30
    do_softirq+0x59/0xd0
    irq_exit+0x54/0xd0
    smp_apic_timer_interrupt+0x95/0xa3
    apic_timer_interrupt+0x72/0x80
    cpuidle_enter_tk+0x10/0x20
    cpuidle_enter_state+0x17/0x50
    cpuidle_idle_call+0x287/0x520
    cpu_idle+0xba/0x130
    start_secondary+0x2b3/0x2bc

Signed-off-by: Xiaotian Feng <[email protected]>
Cc: Al Viro <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors

Commit 263a523d18bc ("linux/kernel.h: Fix warning seen with W=1 due to
change in DIV_ROUND_CLOSEST") fixes a warning seen with W=1 due to
change in DIV_ROUND_CLOSEST.

Unfortunately, the C compiler converts divide operations with unsigned
divisors to unsigned, even if the dividend is signed and negative (for
example, -10 / 5U = 858993457).  The C standard says "If one operand has
unsigned int type, the other operand is converted to unsigned int", so
the compiler is not to blame.  As a result, DIV_ROUND_CLOSEST(0, 2U) and
similar operations now return bad values, since the automatic conversion
of expressions such as "0 - 2U/2" to unsigned was not taken into
account.

Fix by checking for the divisor variable type when deciding which
operation to perform.  This fixes DIV_ROUND_CLOSEST(0, 2U), but still
returns bad values for negative dividends divided by unsigned divisors.
Mark the latter case as unsupported.

One observed effect of this problem is that the s2c_hwmon driver reports
a value of 4198403 instead of 0 if the ADC reads 0.

Other impact is unpredictable.  Problem is seen if the divisor is an
unsigned variable or constant and the dividend is less than (divisor/2).

Signed-off-by: Guenter Roeck <[email protected]>
Reported-by: Juergen Beisert <[email protected]>
Tested-by: Juergen Beisert <[email protected]>
Cc: Jean Delvare <[email protected]>
Cc: <[email protected]> [3.7.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg: don't register hotcpu notifier from ->css_alloc()

Commit 648bb56d076b ("cgroup: lock cgroup_mutex in cgroup_init_subsys()")
made cgroup_init_subsys() grab cgroup_mutex before invoking
->css_alloc() for the root css.  Because memcg registers hotcpu notifier
from ->css_alloc() for the root css, this introduced circular locking
dependency between cgroup_mutex and cpu hotplug.

Fix it by moving hotcpu notifier registration to a subsys initcall.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.7.0-rc4-work+ #42 Not tainted
  -------------------------------------------------------
  bash/645 is trying to acquire lock:
   (cgroup_mutex){+.+.+.}, at: [<ffffffff8110c5b7>] cgroup_lock+0x17/0x20

  but task is already holding lock:
   (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

-> #1 (cpu_hotplug.lock){+.+.+.}:
         lock_acquire+0x97/0x1e0
         mutex_lock_nested+0x61/0x3b0
         get_online_cpus+0x3c/0x60
         rebuild_sched_domains_locked+0x1b/0x70
         cpuset_write_resmask+0x298/0x2c0
         cgroup_file_write+0x1ef/0x300
         vfs_write+0xa8/0x160
         sys_write+0x52/0xa0
         system_call_fastpath+0x16/0x1b

-> #0 (cgroup_mutex){+.+.+.}:
         __lock_acquire+0x14ce/0x1d20
         lock_acquire+0x97/0x1e0
         mutex_lock_nested+0x61/0x3b0
         cgroup_lock+0x17/0x20
         cpuset_handle_hotplug+0x1b/0x560
         cpuset_update_active_cpus+0xe/0x10
         cpuset_cpu_inactive+0x47/0x50
         notifier_call_chain+0x66/0x150
         __raw_notifier_call_chain+0xe/0x10
         __cpu_notify+0x20/0x40
         _cpu_down+0x7e/0x2f0
         cpu_down+0x36/0x50
         store_online+0x5d/0xe0
         dev_attr_store+0x18/0x30
         sysfs_write_file+0xe0/0x150
         vfs_write+0xa8/0x160
         sys_write+0x52/0xa0
         system_call_fastpath+0x16/0x1b
  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(cpu_hotplug.lock);
                                 lock(cgroup_mutex);
                                 lock(cpu_hotplug.lock);
    lock(cgroup_mutex);

   *** DEADLOCK ***

  5 locks held by bash/645:
   #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8123bab8>] sysfs_write_file+0x48/0x150
   #1:  (s_active#42){.+.+.+}, at: [<ffffffff8123bb38>] sysfs_write_file+0xc8/0x150
   #2:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81079277>] cpu_hotplug_driver_lock+0x1
+7/0x20
   #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81093157>] cpu_maps_update_begin+0x17/0x20
   #4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109300f>] cpu_hotplug_begin+0x2f/0x60

  stack backtrace:
  Pid: 645, comm: bash Not tainted 3.7.0-rc4-work+ #42
  Call Trace:
   print_circular_bug+0x28e/0x29f
   __lock_acquire+0x14ce/0x1d20
   lock_acquire+0x97/0x1e0
   mutex_lock_nested+0x61/0x3b0
   cgroup_lock+0x17/0x20
   cpuset_handle_hotplug+0x1b/0x560
   cpuset_update_active_cpus+0xe/0x10
   cpuset_cpu_inactive+0x47/0x50
   notifier_call_chain+0x66/0x150
   __raw_notifier_call_chain+0xe/0x10
   __cpu_notify+0x20/0x40
   _cpu_down+0x7e/0x2f0
   cpu_down+0x36/0x50
   store_online+0x5d/0xe0
   dev_attr_store+0x18/0x30
   sysfs_write_file+0xe0/0x150
   vfs_write+0xa8/0x160
   sys_write+0x52/0xa0
   system_call_fastpath+0x16/0x1b

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: warn on uapi #includes that #include <uapi/...

Avoid specifying internal uapi #include paths with uapi/... as
userspace should not use and never see that.

Neaten message line wrapping above.

Signed-off-by: Joe Perches <[email protected]>
Acked-by: David Howells <[email protected]>
Acked-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

revert "rtc: recycle id when unloading a rtc driver"

Revert commit 2830a6d20139df2198d63235df7957712adb28e5.

We already perform the ida_simple_remove() in rtc_device_release(),
which is an appropriate place. Commit 2830a6d20 ("rtc: recycle id when
unloading a rtc driver") caused the kernel to emit

ida_remove called for id=0 which is not allocated.

warnings when rtc_device_release() tries to release an alread-released
ID.

Let's restore things to their previous state and then work out why
Vincent's kernel wasn't calling rtc_device_release() - presumably a bug
in a specific sub-driver.

Reported-by: Lothar Waßmann <[email protected]>
Acked-by: Alexander Holler <[email protected]>
Cc: Vincent Palatin <[email protected]>
Cc: <[email protected]> [3.7.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: clean up transparent hugepage sysfs error messages

Clarify error messages and correct a few typos in the transparent hugepage
sysfs init code.

Signed-off-by: Jeremy Eder <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method

Add an error message for the case of failure of sync fs in
delayed_sync_fs() method.

Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: rework processing of hfs_btree_write() returned error

Add to hfs_btree_write() a return of -EIO on failure of b-tree node
searching. Also add logic ofor processing errors from hfs_btree_write()
in hfsplus_system_write_inode() with a message about b-tree writing
failure.

[[email protected]: reduce scope of `err', print errno on error]
Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Acked-by: Hin-Tak Leung <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: rework processing errors in hfsplus_free_extents()

Currently, it doesn't process error codes from the hfsplus_block_free()
call in hfsplus_free_extents() method. Add some error code processing.

Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hin-Tak Leung <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: avoid crash on failed block map free

If the read fails we kmap an error code. This doesn't end well. Instead
print a critical error and pray. This mirrors the rest of the fs
behaviour with critical error cases.

Acked-by: Vyacheslav Dubeyko <[email protected]>
Signed-off-by: Alan Cox <[email protected]>
Signed-off-by: Vyacheslav Dubeyko <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jan Kara <[email protected]>
Acked-by: Hin-Tak Leung <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kcmp: include linux/ptrace.h

This makes it compile on s390. After all the ptrace_may_access
(which we use this file) is declared exactly in linux/ptrace.h.

This is preparatory work to wire this syscall up on all archs.

Signed-off-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Alexander Kartashov <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>

Add the missing header include for spinlocks, to avoid potential build
failures on specific architectures or configurations.

Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Sascha Hauer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: cma: WARN if freed memory is still in use

Memory returned to free_contig_range() must have no other references.
Let kernel to complain loudly if page reference count is not equal to 1.

[[email protected]: support sparsemem]
Signed-off-by: Marek Szyprowski <[email protected]>
Reviewed-by: Kyungmin Park <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

exec: do not leave bprm->interp on stack

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules.  Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted.  They leave bprm->interp
pointing to their local stack.  This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules.  As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp.  To avoid adding a new kmalloc to every exec, the default
value is left as-is.  Only when passing through binfmt_script or
binfmt_misc does an allocation take place.

For a proof of concept, see DoTest.sh from:

   http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook <[email protected]>
Cc: halfdog <[email protected]>
Cc: P J P <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists

The right dmi version is in SMBIOS if it's zero in DMI region

This issue was originally found from an oracle bug.
One customer noticed system UUID doesn't match between dmidecode & uek2.

- HP ProLiant BL460c G6 :
   # cat /sys/devices/virtual/dmi/id/product_uuid
   00000000-0000-4C48-3031-4D5030333531
   # dmidecode | grep -i uuid
   UUID: 00000000-0000-484C-3031-4D5030333531

From SMBIOS 2.6 on, spec use little-endian encoding for UUID other than
network byte order.

So we need to get dmi version to distinguish.  If version is 0.0, the
real version is taken from the SMBIOS version.  This is part of original
kernel comment in code.

[[email protected]: checkpatch fixes]
Signed-off-by: Zhenzhong Duan <[email protected]>
Cc: Feng Jin <[email protected]>
Cc: Jean Delvare <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/firmware/dmi_scan.c: check dmi version when get system uuid

As of version 2.6 of the SMBIOS specification, the first 3 fields of the
UUID are supposed to be little-endian encoded.

Also a minor fix to match variable meaning and mute checkpatch.pl

[[email protected]: tweak code comment]
Signed-off-by: Zhenzhong Duan <[email protected]>
Cc: Feng Jin <[email protected]>
Cc: Jean Delvare <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Documentation: kernel-parameters.txt remove capability.disable

Remove the documentation for capability.disable. The code supporting
this parameter was removed with commit 5915eb53861c ("security: remove
dummy module")

Signed-off-by: Josh Boyer <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Cc: Rob Landley <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix calculation of dirtyable memory

The system uses global_dirtyable_memory() to calculate number of
dirtyable pages/pages that can be allocated to the page cache.  A bug
causes an underflow thus making the page count look like a big unsigned
number.  This in turn confuses the dirty writeback throttling to
aggressively write back pages as they become dirty (usually 1 page at a
time).  This generally only affects systems with highmem because the
underflowed count gets subtracted from the global count of dirtyable
memory.

The problem was introduced with v3.2-4896-gab8fabd

Fix is to ensure we don't get an underflowed total of either highmem or
global dirtyable memory.

Signed-off-by: Sonny Rao <[email protected]>
Signed-off-by: Puneet Kumar <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Tested-by: Damien Wyart <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

compaction: fix build error in CMA && !COMPACTION

isolate_freepages_block() and isolate_migratepages_range() are used for
CMA as well as compaction so it breaks build for CONFIG_CMA &&
!CONFIG_COMPACTION.

This patch fixes it.

[[email protected]: add "do { } while (0)", per Mel]
Signed-off-by: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

vfs: make lremovexattr retry once on ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make removexattr retry once on ESTALE

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make llistxattr retry once on ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make listxattr retry once on ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make lgetxattr retry once on ESTALE

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make getxattr retry once on an ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: allow lsetxattr() to retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: allow setxattr to retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: allow utimensat() calls to retry once on an ESTALE error

Clearly, we can't handle the NULL filename case, but we can deal with
the case where there's a real pathname.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix user_statfs to retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make fchownat retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make fchmodat retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: have chroot retry once on ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: have chdir retry lookup and call once on ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: have faccessat retry once on an ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: have do_sys_truncate retry once on an ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix renameat to retry on ESTALE errors

...as always, rename is the messiest of the bunch. We have to track
whether to retry or not via a separate flag since the error handling
is already quite complex.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make do_unlinkat retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make do_rmdir retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: add a flags argument to user_path_parent

...so we can pass in LOOKUP_REVAL. For now, nothing does yet.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix linkat to retry once on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix symlinkat to retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix mkdirat to retry once on an ESTALE error

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix mknodat to retry on ESTALE errors

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: turn is_dir argument to kern_path_create into a lookup_flags arg

Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags
passed in here are currently ignored.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: fix readlinkat to retry on ESTALE

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: make fstatat retry on ESTALE errors from getattr call

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: add a retry_estale helper function to handle retries on ESTALE

This function is expected to be called from path-based syscalls to help
them decide whether to try the lookup and call again in the event that
they got an -ESTALE return back on an earier try.

Currently, we only retry the call once on an ESTALE error, but in the
event that we decide that that's not enough in the future, we should be
able to change the logic in this helper without too much effort.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-linus

vfs: d_obtain_alias() needs to use "/" as default name.

NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root.  This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted.  e.g.  if
"/mnt" is an NFS mount then

{ cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }

will cause a WARN message like
   WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
   ...
   Root dentry has weird name <>

to appear in kernel logs.

So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.

Signed-off-by: NeilBrown <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: Remove useless function prototypes

Commit 8e22cc88d68ca1a46d7d582938f979eb640ed30f removes the (un)lock_super
function definitions but forgets to remove their prototypes.

Signed-off-by: Alessio Igor Bogani <[email protected]>
Signed-off-by: Al Viro <[email protected]>

documentation: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

mm: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

vfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

ntfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Reviewed-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Al Viro <[email protected]>

nilfs2: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

ncpfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

minix: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

logfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

hfsplus: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

jfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

hpfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

FS-Cache: Clear remaining page count on retrieval cancellation

Provide fscache_cancel_op() with a pointer to a function it should invoke under
lock if it cancels an operation.

Use this to clear the remaining page count upon cancellation of a pending
retrieval operation so that fscache_release_retrieval_op() doesn't get an
assertion failure (see below).  This can happen when a signal occurs, say from
CTRL-C being pressed during data retrieval.

FS-Cache: Assertion failed
3 == 0 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:237!
invalid opcode: 0000 [#641] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 6075, comm: slurp-q Tainted: GF     D      3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007f328>]  [<ffffffffa007f328>] fscache_release_retrieval_op+0x75/0xff [fscache]
RSP: 0000:ffff88001c6d7988  EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff880014cdfe00 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88001c6d7998 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
R13: ffff88001c6d7ab4 R14: ffff88001a8638a0 R15: ffff88001552b190
FS:  00007f877aaf0700(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff11378fd2 CR3: 000000001c6c6000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 6075, threadinfo ffff88001c6d6000, task ffff88001c6c4080)
Stack:
ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
Call Trace:
[<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<ffffffffa007db4d>] fscache_put_operation+0x135/0x2ed [fscache]
[<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<ffffffffa008116d>] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
[<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afe3e>] ra_submit+0x1c/0x20
[<ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<ffffffff810dbc78>] sys_read+0x44/0x75
[<ffffffff81422892>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <[email protected]>

FS-Cache: Mark cancellation of in-progress operation

Mark as cancelled an operation that is in progress rather than pending at the
time it is cancelled, and call fscache_complete_op() to cancel an operation so
that blocked ops can be started.

Signed-off-by: David Howells <[email protected]>

FS-Cache: One of the write operation paths doesn't set the object state

In fscache_write_op(), if the object is determined to have become inactive or
to have lost its cookie, we don't move the operation state from in-progress,
and so an assertion in fscache_put_operation() fails with an assertion (see
below).

Instrumenting fscache_op_work_func() indicates that it called
fscache_write_op() before calling fscache_put_operation() - where the assertion
failed.  The assertion at line 433 indicates that the operation state is
IN_PROGRESS rather than being COMPLETE or CANCELLED.

Instrumenting fscache_write_op() showed that it was being called on an object
that had had its cookie removed and that this was due to relinquishment of the
cookie by the netfs.  At this point fscache no longer has access to the pages
of netfs data that were requested to be written, and so simply cancelling the
operation is the thing to do.

FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:433!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 1035, comm: kworker/u:3 Tainted: GF            3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007db22>]  [<ffffffffa007db22>] fscache_put_operation+0x11a/0x2ed [fscache]
RSP: 0018:ffff88003e32bcf8  EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff88001818eb78 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88003e32bd18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00811da
R13: 0000000000000001 R14: 0000000100625d26 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff7dd31c68 CR3: 000000003d730000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:3 (pid: 1035, threadinfo ffff88003e32a000, task ffff88003bb38080)
Stack:
ffffffff8102d1ad ffff88001818eb78 ffffffffa00811da 0000000000000001
ffff88003e32bd48 ffffffffa007f0ad ffff88001818eb78 ffffffff819583c0
ffff88003df24e00 ffff88003882c3e0 ffff88003e32bde8 ffffffff81042de0
Call Trace:
[<ffffffff8102d1ad>] ? vprintk_emit+0x3c6/0x41a
[<ffffffffa00811da>] ? __fscache_read_or_alloc_pages+0x4bc/0x4bc [fscache]
[<ffffffffa007f0ad>] fscache_op_work_func+0xec/0x123 [fscache]
[<ffffffff81042de0>] process_one_work+0x21c/0x3b0
[<ffffffff81042d82>] ? process_one_work+0x1be/0x3b0
[<ffffffffa007efc1>] ? fscache_operation_gc+0x23e/0x23e [fscache]
[<ffffffff8104332e>] worker_thread+0x202/0x2df
[<ffffffff8104312c>] ? rescuer_thread+0x18e/0x18e
[<ffffffff81047c1c>] kthread+0xd0/0xd8
[<ffffffff81421bfa>] ? _raw_spin_unlock_irq+0x29/0x3e
[<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
[<ffffffff814227ec>] ret_from_fork+0x7c/0xb0
[<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55

Signed-off-by: David Howells <[email protected]>

FS-Cache: Fix signal handling during waits

wait_on_bit() with TASK_INTERRUPTIBLE returns 1 rather than a negative error
code, so change what we check for.  This means that the signal handling in
fscache_wait_for_retrieval_activation()  should now work properly.

Without this, the following bug can be seen if CTRL-C is pressed during
fscache read operation:

FS-Cache: Assertion failed
2 == 3 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:347!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 1
Pid: 15006, comm: slurp-q Tainted: GF            3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007fcb4>]  [<ffffffffa007fcb4>] fscache_wait_for_retrieval_activation+0x167/0x177 [fscache]
RSP: 0018:ffff88002a4c39a8  EFLAGS: 00010292
RAX: 000000000000001a RBX: ffff88002d3dc158 RCX: 0000000000008685
RDX: ffffffff8102ccd6 RSI: 0000000000000001 RDI: ffffffff8102d1d6
RBP: ffff88002a4c39c8 R08: 0000000000000002 R09: 0000000000000000
R10: ffffffff8163afa0 R11: ffff88003bd11900 R12: ffffffffa00868c8
R13: ffff880028306458 R14: ffff88002d3dc1b0 R15: ffff88001372e538
FS:  00007f17426a0700(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f1742494a44 CR3: 0000000031bd7000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 15006, threadinfo ffff88002a4c2000, task ffff880023de3040)
Stack:
ffff88002d3dc158 ffff88001372e538 ffff88002a4c3ab4 ffff8800283064e0
ffff88002a4c3a38 ffffffffa0080f6d 0000000000000000 ffff880023de3040
ffff88002a4c3ac8 ffffffff810ac8ae ffff880028306458 ffff88002a4c3bc8
Call Trace:
[<ffffffffa0080f6d>] __fscache_read_or_alloc_pages+0x24f/0x4bc [fscache]
[<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afe3e>] ra_submit+0x1c/0x20
[<ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<ffffffff810dbc78>] sys_read+0x44/0x75
[<ffffffff81422892>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <[email protected]>

NFS4: Open files for fscaching

nfs4_file_open() should open files for fscaching.

Signed-off-by: David Howells <[email protected]>

FS-Cache: Add transition to handle invalidate immediately after lookup

Add a missing transition to the FS-Cache object state machine to handle an
invalidation event occuring between the back end completing the object lookup
by calling fscache_obtained_object() (which moves to state OBJECT_AVAILABLE)
and the backend returning to fscache_lookup_object() and thence to
fscache_object_state_machine() which then does a goto lookup_transit to handle
the transition - but lookup_transit doesn't handle EV_INVALIDATE.

Without this, the following BUG can be logged:

FS-Cache: Unsupported event 2 [5/f7] in state OBJECT_AVAILABLE
------------[ cut here ]------------
kernel BUG at fs/fscache/object.c:357!

Where event 2 is EV_INVALIDATE.

Signed-off-by: David Howells <[email protected]>

Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild changes from Michal Marek:
"The kbuild changes are minimal this time:

   - scripts/pnmlogo fix for some newer format

   - minor top-level Makefile cleanup

   - fix for a v3.5 regression with make clean M=<directory>"

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Do not remove vmlinux when cleaning external module
  scripts/pnmtologo: fix for plain PBM
  kbuild: Remove reference to uninitialised variable

ARM: OMAP2+: Trivial fix for IOMMU merge issue

Commit 787314c35fbb ("Merge tag 'iommu-updates-v3.8' of
git://git./linux/kernel/git/joro/iommu") did not account for the changed
header location.

The headers were made local to mach-omap2 as they are specific to omap2+
only, and we wanted to get most of the #include <plat/*.h> headers fixed
up anyways for the ARM multiplatform support.

We attempted to avoid this kind of merge conflict early on by setting up
a minimal git branch shared by the arm-soc tree and the iommu tree, but
looks like we still hit a merge issue there as the branches got merged
as various topic branches.

Cc: Joerg Roedel <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page

nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

BUG: Bad page state in process python-bin  pfn:17d39b
page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
Call Trace:
[<ffffffff8111bfe7>] bad_page+0x107/0x160
[<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
[<ffffffff8111ef19>] __pagevec_free+0x59/0xb0
[<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
[<ffffffff8112230c>] release_pages+0x21c/0x250
[<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
[<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
[<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
[<ffffffff811226f8>] __lru_cache_add+0x58/0x70
[<ffffffff81122731>] lru_cache_add_lru+0x21/0x40
[<ffffffff81123f49>] putback_lru_page+0x69/0x100
[<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
[<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
[<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
[<ffffffff8115255c>] compact_zone+0x4cc/0x600
[<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
[<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
[<ffffffff8115290e>] compact_zone_order+0x7e/0xb0
[<ffffffff81152a49>] try_to_compact_pages+0x109/0x170
[<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
[<ffffffff814c9136>] ? thread_return+0x4e/0x778
[<ffffffff81150d43>] alloc_pages_vma+0x93/0x150
[<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
[<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
[<ffffffff81136755>] handle_mm_fault+0x245/0x2b0
[<ffffffff814ce383>] do_page_fault+0x123/0x3a0
[<ffffffff814cbdf5>] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason that doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

(1) Make nfs_migrate_page() wait.

(2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

(3) Set a timeout around the wait.

(4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells <[email protected]>
Acked-by: Jeff Layton <[email protected]>

FS-Cache: Exclusive op submission can BUG if there's been an I/O error

The function to submit an exclusive op (fscache_submit_exclusive_op()) can BUG
if there's been an I/O error because it may see the parent cache object in an
unexpected state.  It should only BUG if there hasn't been an I/O error.

In this case the problem was produced by remounting the cache partition to be
R/O.  The EROFS state was detected and the cache was aborted, but not
everything handled the aborting correctly.

SysRq : Emergency Remount R/O
EXT4-fs (sda6): re-mounted. Opts: (null)
Emergency Remount complete
CacheFiles: I/O Error: Failed to update xattr with error -30
FS-Cache: Cache cachefiles stopped due to I/O error
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:128!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 6612, comm: kworker/u:2 Not tainted 3.1.0-rc8-fsdevel+ #1093                  /DG965RY
RIP: 0010:[<ffffffffa00739c0>]  [<ffffffffa00739c0>] fscache_submit_exclusive_op+0x2ad/0x2c2 [fscache]
RSP: 0018:ffff880000853d40  EFLAGS: 00010206
RAX: ffff880038ac72a8 RBX: ffff8800181f2260 RCX: ffffffff81f2b2b0
RDX: 0000000000000001 RSI: ffffffff8179a478 RDI: ffff8800181f2280
RBP: ffff880000853d60 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880038ac7268
R13: ffff8800181f2280 R14: ffff88003a359190 R15: 000000010122b162
FS:  0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000034cc4a77f0 CR3: 0000000010e96000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:2 (pid: 6612, threadinfo ffff880000852000, task ffff880014c3c040)
Stack:
ffff8800181f2260 ffff8800181f2310 ffff880038ac7268 ffff8800181f2260
ffff880000853dc0 ffffffffa0072375 ffff880037ecfe00 ffff88003a359198
ffff880000853dc0 0000000000000246 0000000000000000 ffff88000a91d308
Call Trace:
[<ffffffffa0072375>] fscache_object_work_func+0x792/0xe65 [fscache]
[<ffffffff81047e44>] process_one_work+0x1eb/0x37f
[<ffffffff81047de6>] ? process_one_work+0x18d/0x37f
[<ffffffffa0071be3>] ? fscache_enqueue_dependents+0xd8/0xd8 [fscache]
[<ffffffff810482e4>] worker_thread+0x15a/0x21a
[<ffffffff8104818a>] ? rescuer_thread+0x188/0x188
[<ffffffff8104bf96>] kthread+0x7f/0x87
[<ffffffff813ad6f4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abd1d>] ? retint_restore_args+0xe/0xe
[<ffffffff8104bf17>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6f0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <[email protected]>

FS-Cache: Limit the number of I/O error reports for a cache

Limit the number of I/O error reports for a cache to 1 to prevent massive
amounts of noise. After the first I/O error the cache is taken off line
automatically, so must be restarted to resume caching.

Signed-off-by: David Howells <[email protected]>

FS-Cache: Don't mask off the object event mask when printing it

Don't mask off the object event mask when printing it. That way it can be seen
if threre are bits set that shouldn't be.

Signed-off-by: David Howells <[email protected]>

FS-Cache: Initialise the object event mask with the calculated mask

Initialise the object event mask with the calculated mask rather than unmasking
undefined events also.

Signed-off-by: David Howells <[email protected]>

FS-Cache: Convert the object event ID #defines into an enum

Convert the fscache_object event IDs from #defines into an enum. Also add an
extra label to the enum to carry the event count and redefine the event mask
in terms of that.

Signed-off-by: David Howells <[email protected]>

CacheFiles: Add missing retrieval completions

CacheFiles is missing some calls to fscache_retrieval_complete() in the error
handling/collision paths of its reader functions.

This can be seen by the following assertion tripping in fscache_put_operation()
whereby the operation being destroyed is still in the in-progress state and has
not been cancelled or completed:

FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:408!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: xfs ioatdma dca loop joydev evdev
psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
pci_hotplug sg sr_mod]

Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
RIP: 0010:[<ffffffff81197b24>]  [<ffffffff81197b24>] fscache_put_operation+0x304/0x330
RSP: 0018:ffff880062f739d8  EFLAGS: 00010296
RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
FS:  00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
Stack:
ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
Call Trace:
[<ffffffff8119aa7e>] __fscache_read_or_alloc_pages+0x1fe/0x530
[<ffffffff81250780>] __nfs_readpages_from_fscache+0x70/0x1c0
[<ffffffff8123142a>] nfs_readpages+0xca/0x1e0
[<ffffffff815f3c06>] ? rpc_do_put_task+0x36/0x50
[<ffffffff8122755b>] ? alloc_nfs_open_context+0x4b/0x110
[<ffffffff815ecd1a>] ? rpc_call_sync+0x5a/0x70
[<ffffffff810e7e9a>] __do_page_cache_readahead+0x1ca/0x270
[<ffffffff810e7f61>] ra_submit+0x21/0x30
[<ffffffff810e818d>] ondemand_readahead+0x11d/0x250
[<ffffffff810e83b6>] page_cache_sync_readahead+0x36/0x60
[<ffffffff810dffa4>] generic_file_aio_read+0x454/0x770
[<ffffffff81224ce1>] nfs_file_read+0xe1/0x130
[<ffffffff81121bd9>] do_sync_read+0xd9/0x120
[<ffffffff8114088f>] ? mntput+0x1f/0x40
[<ffffffff811238cb>] ? fput+0x1cb/0x260
[<ffffffff81122938>] vfs_read+0xc8/0x180
[<ffffffff81122af5>] sys_read+0x55/0x90

Reported-by: Mark Moseley <[email protected]>
Signed-off-by: David Howells <[email protected]>

NFS: Use FS-Cache invalidation

Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.

The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete.  There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).

This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064                  /DG965RY
RIP: 0010:[<ffffffffa0075118>]  [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS:  00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
[<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
[<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
[<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
[<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
[<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

Reported-by: Mark Moseley <[email protected]>
Signed-off-by: David Howells <[email protected]>

CacheFiles: Implement invalidation

Implement invalidation for CacheFiles.  This is in two parts:

(1) Provide an invalidation method (which just truncates the backing file).

(2) Abort attempts to copy anything read from the backing file whilst
     invalidation is in progress.

Question: CacheFiles uses truncation in a couple of places.  It has been using
notify_change() rather than sys_truncate() or something similar.  This means
it bypasses a bunch of checks and suchlike that it possibly should be making
(security, file locking, lease breaking, vfsmount write).  Should it be using
vfs_truncate() as added by a preceding patch or should it use notify_write()
and assume that anyone poking around in the cache files on disk gets
everything they deserve?

Signed-off-by: David Howells <[email protected]>

VFS: Make more complete truncate operation available to CacheFiles

Make a more complete truncate operation available to CacheFiles (including
security checks and suchlike) so that it can use this to clear invalidated
cache files.

Signed-off-by: David Howells <[email protected]>
Acked-by: Al Viro <[email protected]>

Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux

Pull nfsd update from Bruce Fields:
"Included this time:

   - more nfsd containerization work from Stanislav Kinsbursky: we're
     not quite there yet, but should be by 3.9.

   - NFSv4.1 progress: implementation of basic backchannel security
     negotiation and the mandatory BACKCHANNEL_CTL operation.  See

       http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

     for remaining TODO's

   - Fixes for some bugs that could be triggered by unusual compounds.
     Our xdr code wasn't designed with v4 compounds in mind, and it
     shows.  A more thorough rewrite is still a todo.

   - If you've ever seen "RPC: multiple fragments per record not
     supported" logged while using some sort of odd userland NFS client,
     that should now be fixed.

   - Further work from Jeff Layton on our mechanism for storing
     information about NFSv4 clients across reboots.

   - Further work from Bryan Schumaker on his fault-injection mechanism
     (which allows us to discard selective NFSv4 state, to excercise
     rarely-taken recovery code paths in the client.)

   - The usual mix of miscellaneous bugs and cleanup.

  Thanks to everyone who tested or contributed this cycle."

* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
  nfsd4: don't leave freed stateid hashed
  nfsd4: free_stateid can use the current stateid
  nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
  nfsd: warn on odd reply state in nfsd_vfs_read
  nfsd4: fix oops on unusual readlike compound
  nfsd4: disable zero-copy on non-final read ops
  svcrpc: fix some printks
  NFSD: Correct the size calculation in fault_inject_write
  NFSD: Pass correct buffer size to rpc_ntop
  nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  nfsd: simplify service shutdown
  nfsd: replace boolean nfsd_up flag by users counter
  nfsd: simplify NFSv4 state init and shutdown
  nfsd: introduce helpers for generic resources init and shutdown
  nfsd: make NFSd service structure allocated per net
  nfsd: make NFSd service boot time per-net
  nfsd: per-net NFSd up flag introduced
  nfsd: move per-net startup code to separated function
  nfsd: pass net to __write_ports() and down
  nfsd: pass net to nfsd_set_nrthreads()
  ...

FS-Cache: Provide proper invalidation

Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one. The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that. Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.

Signed-off-by: David Howells <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull Ceph update from Sage Weil:
"There are a few different groups of commits here.  The largest is
  Alex's ongoing work to enable the coming RBD features (cloning,
  striping).  There is some cleanup in libceph that goes along with it.

  Cyril and David have fixed some problems with NFS reexport (leaking
  dentries and page locks), and there is a batch of patches from Yan
  fixing problems with the fs client when running against a clustered
  MDS.  There are a few bug fixes mixed in for good measure, many of
  which will be going to the stable trees once they're upstream.

  My apologies for the late pull.  There is still a gremlin in the rbd
  map/unmap code and I was hoping to include the fix for that as well,
  but we haven't been able to confirm the fix is correct yet; I'll send
  that in a separate pull once it's nailed down."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (68 commits)
  rbd: get rid of rbd_{get,put}_dev()
  libceph: register request before unregister linger
  libceph: don't use rb_init_node() in ceph_osdc_alloc_request()
  libceph: init event->node in ceph_osdc_create_event()
  libceph: init osd->o_node in create_osd()
  libceph: report connection fault with warning
  libceph: socket can close in any connection state
  rbd: don't use ENOTSUPP
  rbd: remove linger unconditionally
  rbd: get rid of RBD_MAX_SEG_NAME_LEN
  libceph: avoid using freed osd in __kick_osd_requests()
  ceph: don't reference req after put
  rbd: do not allow remove of mounted-on image
  libceph: Unlock unprocessed pages in start_read() error path
  ceph: call handle_cap_grant() for cap import message
  ceph: Fix __ceph_do_pending_vmtruncate
  ceph: Don't add dirty inode to dirty list if caps is in migration
  ceph: Fix infinite loop in __wake_requests
  ceph: Don't update i_max_size when handling non-auth cap
  bdi_register: add __printf verification, fix arg mismatch
  ...

FS-Cache: Fix operation state management and accounting

Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

(1) Give struct fscache_operation a enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

(2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     in the appropriate state (complete or cancelled).

(3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

(a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

(b) The ->n_in_progress is simply the number of objects that have been
    taken off of the object's pending queue for the purposes of being
    run.  This is decremented by fscache_op_complete() only.

(c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress.  It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() now no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

(4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.

FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
[<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
[<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
[<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
[<ffffffff810d8d47>] evict+0xa1/0x15c
[<ffffffff810d8e2e>] dispose_list+0x2c/0x38
[<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
[<ffffffff810c56b7>] prune_super+0xd5/0x140
[<ffffffff8109b615>] shrink_slab+0x102/0x1ab
[<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
[<ffffffff8103e009>] ? process_timeout+0xb/0xb
[<ffffffff8109dba3>] kswapd+0x270/0x289
[<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
[<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
[<ffffffff8104bf7a>] kthread+0x7f/0x87
[<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
[<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <[email protected]>

FS-Cache: Make cookie relinquishment wait for outstanding reads

Make fscache_relinquish_cookie() log a warning and wait if there are any
outstanding reads left on the cookie it was given.

Signed-off-by: David Howells <[email protected]>

CacheFiles: Make some debugging statements conditional

Downgrade some debugging statements to not unconditionally print stuff, but
rather be conditional on the appropriate module parameter setting.

Signed-off-by: David Howells <[email protected]>

FS-Cache: Check that there are no read ops when cookie relinquished

Check that the netfs isn't trying to relinquish a cookie that still has read
operations in progress upon it. If there are, then give log a warning and BUG.

Signed-off-by: David Howells <[email protected]>

CacheFiles: Downgrade the requirements passed to the allocator

Downgrade the requirements passed to the allocator in the gfp flags parameter.
FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to
store an object or a page in the cache.

Signed-off-by: David Howells <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull two btrfs reverts from Chris Mason:
"I had missed that for two of the patches in my last pull, we had
  included different fixes during 3.7."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Revert "Btrfs: reorder tree mod log operations in deleting a pointer"
  Revert "Btrfs: MOD_LOG_KEY_REMOVE_WHILE_MOVING never change node's nritems"

Merge tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull new F2FS filesystem from Jaegeuk Kim:
"Introduce a new file system, Flash-Friendly File System (F2FS), to
  Linux 3.8.

  Highlights:
   - Add initial f2fs source codes
   - Fix an endian conversion bug
   - Fix build failures on random configs
   - Fix the power-off-recovery routine
   - Minor cleanup, coding style, and typos patches"

From the Kconfig help text:

  F2FS is based on Log-structured File System (LFS), which supports
  versatile "flash-friendly" features. The design has been focused on
  addressing the fundamental issues in LFS, which are snowball effect
  of wandering tree and high cleaning overhead.

  Since flash-based storages show different characteristics according to
  the internal geometry or flash memory management schemes aka FTL, F2FS
  and tools support various parameters not only for configuring on-disk
  layout, but also for selecting allocation and cleaning algorithms.

and there's an article by Neil Brown about it on lwn.net:

  http://lwn.net/Articles/518988/

* tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
  f2fs: fix tracking parent inode number
  f2fs: cleanup the f2fs_bio_alloc routine
  f2fs: introduce accessor to retrieve number of dentry slots
  f2fs: remove redundant call to f2fs_put_page in delete entry
  f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
  f2fs: rewrite f2fs_bio_alloc to make it simpler
  f2fs: fix a typo in f2fs documentation
  f2fs: remove unused variable
  f2fs: move error condition for mkdir at proper place
  f2fs: remove unneeded initialization
  f2fs: check read only condition before beginning write out
  f2fs: remove unneeded memset from init_once
  f2fs: show error in case of invalid mount arguments
  f2fs: fix the compiler warning for uninitialized use of variable
  f2fs: resolve build failures
  f2fs: adjust kernel coding style
  f2fs: fix endian conversion bugs reported by sparse
  f2fs: remove unneeded version.h header file from f2fs.h
  f2fs: update the f2fs document
  f2fs: update Kconfig and Makefile
  ...

CacheFiles: Fix the marking of cached pages

Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.

There are, however, two problems with this:

(1) It can lead to the PG_fscache mark being applied _after_ the page is set
     PG_uptodate and unlocked (by the call to fscache_end_io()).

(2) CacheFiles's ref on the page is dropped immediately following
     fscache_end_io() - and so may not still be held when the mark is applied.
     This can lead to the page being passed back to the allocator before the
     mark is applied.

Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page.  This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.

The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:

BUG: Bad page state in process tar  pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping:          (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G        W   3.1.0-rc4-fsdevel+ #1064
Call Trace:
[<ffffffff8109583c>] ? dump_page+0xb9/0xbe
[<ffffffff81095916>] bad_page+0xd5/0xea
[<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a
[<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662
[<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was
set appropriately showed that sometimes it wasn't.  This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.

Reported-by: M. Stevens <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>

lib: atomic64: Initialize locks statically to fix early users

The atomic64 library uses a handful of static spin locks to implement
atomic 64-bit operations on architectures without support for atomic
64-bit instructions.

Unfortunately, the spinlocks are initialized in a pure initcall and that
is too late for the vfs namespace code which wants to use atomic64
operations before the initcall is run.

This became a problem as of commit 8823c079ba71: "vfs: Add setns support
for the mount namespace".

This leads to BUG messages such as:

  BUG: spinlock bad magic on CPU#0, swapper/0/0
   lock: atomic64_lock+0x240/0x400, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
    do_raw_spin_lock+0x158/0x198
    _raw_spin_lock_irqsave+0x4c/0x58
    atomic64_add_return+0x30/0x5c
    alloc_mnt_ns.clone.14+0x44/0xac
    create_mnt_ns+0xc/0x54
    mnt_init+0x120/0x1d4
    vfs_caches_init+0xe0/0x10c
    start_kernel+0x29c/0x300

coming out early on during boot when spinlock debugging is enabled.

Fix this by initializing the spinlocks statically at compile time.

Reported-and-tested-by: Vaibhav Bedia <[email protected]>
Tested-by: Tony Lindgren <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

bfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

affs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

adfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

ocfs2: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>

omfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Acked-by: Bob Copeland <[email protected]>
Signed-off-by: Al Viro <[email protected]>

procfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <[email protected]>
Signed-off-by: Al Viro <[email protected]>