Git Repo - linux.git/log

ALSA : au88x0 - Limit number of channels to fix Oops via OSS emu

Fix the playback/capture channels patch: change the supported playback
channels of au8830 to 1, 2, 4 and the capture channels to 1, 2.
This prevents an oops when OSS emulation uses SNDCTL_DSP_CHANNELS to
set 3 channels.
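
For illustration only (not part of the patch): a minimal userspace sketch of
restricting a requested channel count to a fixed list. The helper name
channels_allowed() and the array names are hypothetical; only the channel
counts come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Channel counts the commit message lists for au8830. */
static const unsigned int playback_channels[] = { 1, 2, 4 };
static const unsigned int capture_channels[]  = { 1, 2 };

/* Hypothetical helper: true if 'requested' appears in the allowed list. */
static bool channels_allowed(const unsigned int *list, size_t n,
			     unsigned int requested)
{
	assert(list != NULL);
	for (size_t i = 0; i < n; i++)
		if (list[i] == requested)
			return true;
	return false;
}
```

With such a list in place, a request for 3 channels is refused up front
instead of reaching the hardware setup code.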

Signed-off-by: Raymond Yau <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

compat: copy missing fields in compat_statfs64 to user

The f_flags and f_spare fields were not copied to userspace when
compat_sys_[f]statfs64 was called.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>

compat: update comment of compat statfs syscalls

Commit 7ed1ee6118ae ("Take statfs variants to fs/statfs.c")
separated the statfs syscalls out of fs/open.c, so the comment
should be updated as well.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Jiri Kosina <[email protected]>
Signed-off-by: Al Viro <[email protected]>

compat: remove unnecessary assignment in compat_rw_copy_check_uvector()

*@ret_pointer is initialized to @fast_pointer thus the assignment is
redundant.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Jeff Moyer <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fs: FS_POSIX_ACL does not depend on BLOCK

- Fix a kconfig unmet dependency warning.
- Remove the comment that identifies which filesystems use POSIX ACL
  utility routines.
- Move the FS_POSIX_ACL symbol outside of the BLOCK symbol if/endif block
  because its functions do not depend on BLOCK and some of the filesystems
  that use it do not depend on BLOCK.

warning: (GENERIC_ACL && JFFS2_FS_POSIX_ACL && NFSD_V4 && NFS_ACL_SUPPORT && 9P_FS_POSIX_ACL) selects FS_POSIX_ACL which has unmet direct dependencies (BLOCK)

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fs: Remove unlikely() from fget_light()

There's an unlikely() in fget_light() that assumes the file ref count
will be 1. Running the annotate branch profiler on a desktop that is
performing daily tasks (running firefox, evolution, xchat and is also part
of a distcc farm), it shows that the ref count is not 1 that often.

   correct  incorrect   %  Function    File          Line
---------- ---------- ---  --------    ----          ----
1035099358 6209599193  85  fget_light  file_table.c   315
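
For illustration only: a small sketch of how the "%" column can be derived
from the two counters. It assumes the percentage is the share of incorrect
predictions, truncated to an integer, which is consistent with the row above.
The helper name pct_incorrect() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: share of mispredicted branches, truncated to a
 * whole percent. */
static unsigned int pct_incorrect(uint64_t correct, uint64_t incorrect)
{
	uint64_t total = correct + incorrect;

	assert(total != 0);	/* need at least one recorded branch */
	return (unsigned int)(incorrect * 100 / total);
}
```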

Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fs: Remove unlikely() from fput_light()

In fput_light() there is an unlikely(fput_needed). Running the annotate
branch profiler on my normal desktop (firefox, xchat, evolution, and part
of my distcc farm) shows that the unlikely is not very unlikely.

  correct  incorrect   %  Function    File    Line
--------- ---------- ---  --------    ----    ----
        0         48 100  fput_light  file.h    26
115828710  897415279  88  fput_light  file.h    26
865271179 5286128445  85  fput_light  file.h    26
 19568539    8923664  31  fput_light  file.h    26
 12353677    3562279  22  fput_light  file.h    26
   267691      67062  20  fput_light  file.h    26
 15014853     348172   2  fput_light  file.h    26
   209258        205   0  fput_light  file.h    26
  1364164          0   0  fput_light  file.h    26

Which gives 1032903812 times it was correct and 6203351846 times it was
incorrect, or 85% incorrect.

Cc: Andrew Morton <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Al Viro <[email protected]>

fallocate should be a file operation

Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forces a commit. Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes. On the
other hand, always committing the transaction is a bad idea for fast-path
uses of fallocate, for example in recent Samba versions. Given
that block allocation is a data plane operation anyway, change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and removes the already unneeded S_ISDIR checks, given that we only wire
up fallocate for regular files.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>

make the feature checks in ->fallocate future proof

Instead of various home-grown checks that might need updates for new
flags, just check for any bit outside the mask of the features supported
by the filesystem. This makes the check future proof against any newly
added flag.
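
For illustration only: a minimal userspace sketch of the mask check described
above. The EX_* names and ex_fallocate_check() are hypothetical; the flag
values are defined locally and merely mirror FALLOC_FL_KEEP_SIZE and
FALLOC_FL_PUNCH_HOLE. The example filesystem is assumed to support
KEEP_SIZE only.

```c
#include <assert.h>
#include <errno.h>

/* Local, illustrative flag values (assumption: same bits as the
 * FALLOC_FL_* flags, but not taken from a kernel header here). */
#define EX_FL_KEEP_SIZE   0x01
#define EX_FL_PUNCH_HOLE  0x02

/* Hypothetical filesystem that supports KEEP_SIZE and nothing else. */
#define EX_SUPPORTED_MASK (EX_FL_KEEP_SIZE)

static int ex_fallocate_check(int mode)
{
	/* Any bit outside the supported mask is refused, which also
	 * covers flags that do not exist yet. */
	if (mode & ~EX_SUPPORTED_MASK)
		return -EOPNOTSUPP;
	return 0;
}
```

The point of the single mask test is that adding a new flag elsewhere needs
no change in filesystems that do not implement it.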

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>

staging: smbfs building fix

Build error for smbfs:

drivers/staging/smbfs/dir.c:286: error: static declaration of 'smbfs_dentry_operations' follows non-static declaration
drivers/staging/smbfs/proto.h:42: error: previous declaration of 'smbfs_dentry_operations' was here
drivers/staging/smbfs/dir.c:294: error: static declaration of 'smbfs_dentry_operations_case' follows non-static declaration
drivers/staging/smbfs/proto.h:41: error: previous declaration of 'smbfs_dentry_operations_case' was here
make[3]: *** [drivers/staging/smbfs/dir.o] Error 1
make[2]: *** [drivers/staging/smbfs] Error 2
make[1]: *** [drivers/staging] Error 2
make[1]: *** Waiting for unfinished jobs....

Fix it by removing the static keywords.

Signed-off-by: Yang Ruirui <[email protected]>
Signed-off-by: Al Viro <[email protected]>

tidy up around finish_automount()

do_add_mount() and mnt_clear_expiry() are not needed outside of
namespace.c anymore, now that namei has finish_automount() to
use.

Signed-off-by: Al Viro <[email protected]>

don't drop newmnt on error in do_add_mount()

That gets rid of the kludge in finish_automount() - we need
to keep refcount on the vfsmount as-is until we evict it from
expiry list.

Signed-off-by: Al Viro <[email protected]>

Take the completion of automount into new helper

... and shift it from namei.c to namespace.c

Signed-off-by: Al Viro <[email protected]>

Merge branches 'misc', 'mlx4', 'mthca', 'nes' and 'srp' into for-next

RDMA: Update workqueue usage

* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

alpha: fix WARN_ON in __local_bh_enable()

Interrupts ought to be disabled _before_ irq_enter().

Signed-off-by: Ivan Kokshaysky <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: fix breakage caused by df9ee29270

Commit df9ee29270 made arch_local_irq_save and arch_local_irq_restore
static inline, which with -Werror trips up on __set_hae() and set_hae(),
which are extern inline. The naive solution is to make __set_hae() and
set_hae() static inline, but for reasons described in commit d559d4a24a3fe
this breaks the generic kernel build. Instead, since this is architecture
specific code, this patch hard wires in the architecture specific method
of disabling and enabling interrupts.

Tested-by: Michael Cree <[email protected]>
Signed-off-by: Ivan Kokshaysky <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: add GENERIC_HARDIRQS_NO__DO_IRQ to Kconfig

Acked-by: Kyle McMartin <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha/osf_sys: remove unused MAX_SELECT_SECONDS

Remove the leftover from the commit 14e2acd86865 ("select:
fix alpha OSF wrapper").

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: change to new Makefile flag variables

Signed-off-by: matt mooney <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: kill off alpha_do_IRQ

Good riddance... Nuke a pile of redundant handlers that the
generic code takes care of as well.

Tested-by: Michael Cree <[email protected]>
Signed-off-by: Kyle McMartin <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: irq clean up

Stop touching irq_desc[irq] directly, instead use accessor
functions provided. Use irq_has_action instead of directly
testing the irq_desc.

Tested-by: Michael Cree <[email protected]>
Signed-off-by: Kyle McMartin <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

alpha: use set_irq_chip and push down __do_IRQ to the machine types

Also kill superfluous IRQ_DISABLED initialization, since that's the
default state of the irq_desc[i].status field.

Tested-by: Michael Cree <[email protected]>
Signed-off-by: Kyle McMartin <[email protected]>
Signed-off-by: Matt Turner <[email protected]>

drm/radeon/kms: balance asic_reset functions

First, we were calling mc_stop() at the top of the function
which turns off all MC (memory controller) clients,
then checking if the GPU is idle.  If it was idle we
returned without re-enabling the MC clients which would
lead to a blank screen, etc.  This patch checks if the
GPU is idle before calling mc_stop().

Second, if the reset failed, we were returning without
re-enabling the MC clients.  This patch re-enables
the MC clients before returning regardless of whether
the reset was successful or not.

Signed-off-by: Alex Deucher <[email protected]>
Cc: Jerome Glisse <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm/radeon/kms: remove duplicate card_posted() functions

Use the common one for all asics.

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm/radeon/kms: add module option for pcie gen2

Switching to pcie gen2 causes problems on some
boards. Add a module option to turn it on/off.

There are gen2 compatibility issues with some
motherboards it seems.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=33027

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm/radeon/kms: fix typo in evergreen safe reg

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

Merge remote branch 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next into drm-fixes

* 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next:
  drm/nouveau: fix gpu page faults triggered by plymouthd
  drm/nouveau: greatly simplify mm, killing some bugs in the process
  drm/nvc0: enable protection of system-use-only structures in vm
  drm/nv40: initialise 0x17xx on all chipsets that have it
  drm/nv40: make detection of 0x4097-ful chipsets available everywhere

drm/nouveau: fix gpu page faults triggered by plymouthd

The switch to separate BAR and channel address spaces made the fbcon memory
address calculation incorrect on NV50+ boards; this commit fixes that.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nouveau: greatly simplify mm, killing some bugs in the process

Reviewed-by: Francisco Jerez <[email protected]>
Signed-off-by: Ben Skeggs <[email protected]>

drm/nvc0: enable protection of system-use-only structures in vm

Somehow missed this in the original merge of the nvc0 code.

Signed-off-by: Ben Skeggs <[email protected]>

drm/nv40: initialise 0x17xx on all chipsets that have it

Signed-off-by: Ben Skeggs <[email protected]>

drm/nv40: make detection of 0x4097-ful chipsets available everywhere

Signed-off-by: Ben Skeggs <[email protected]>

drivers/nfc/pn544.c: fix min_t warnings

Fix these:

  drivers/nfc/pn544.c: In function 'pn544_read':
  drivers/nfc/pn544.c:356: warning: comparison of distinct pointer types lacks a cast
  drivers/nfc/pn544.c:377: warning: comparison of distinct pointer types lacks a cast
  drivers/nfc/pn544.c: In function 'pn544_write':
  drivers/nfc/pn544.c:463: warning: comparison of distinct pointer types lacks a cast
  drivers/nfc/pn544.c:485: warning: comparison of distinct pointer types lacks a cast

Cc: "Matti J. Aaltonen" <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ARM: PL08x: cleanup comments

Cleanup the formatting of comments, remove some which don't make sense
anymore.

Signed-off-by: Russell King <[email protected]>
[fix conflict with 96a608a4]
Signed-off-by: Dan Williams <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6:
  ocfs2: Make OCFS2_FS depend on CONFIGFS_FS
  dlm: Make DLM depend on CONFIGFS_FS
  net: Make NETCONSOLE_DYNAMIC depend on CONFIGFS_FS
  configfs: change depends -> select SYSFS
  [SCSI] sd,sr: kill compat SDEV_MEDIA_CHANGE event
  [SCSI] sd: implement sd_check_events()

parisc: fix compile breakage caused by inlining maybe_mkwrite

On PARISC, we have an include of linux/mm.h inside our asm/pgtable.h, so
this patch

  commit 14fd403f2146f740942d78af4e0ee59396ad8eab
  Author: Andrea Arcangeli <[email protected]>
  Date:   Thu Jan 13 15:46:37 2011 -0800

      thp: export maybe_mkwrite

causes us an unsatisfiable use of pte_mkwrite in linux/mm.h.

The fix is to avoid including linux/mm.h in our pgtable.h, which
unbreaks the build.

Signed-off-by: Linus Torvalds <[email protected]>

fix non-x86 build failure in pmdp_get_and_clear

pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
BUG() and they were defined only to diminish the risk of build issues on
not-x86 archs and to be consistent with the generic pte methods previously
defined in include/asm-generic/pgtable.h.

But they are causing more trouble than they were supposed to solve, so
it's simpler not to define them when THP is off.

This is also correcting the export of pmdp_splitting_flush which is
currently unused (x86 isn't using the generic implementation in
mm/pgtable-generic.c and no other arch needs that [yet]).

Signed-off-by: Andrea Arcangeli <[email protected]>
Sam Ravnborg <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

VFS: Fix UP compile error in fs/namespace.c

mnt_longterm is there only on SMP

Reported-and-tested-by: Joachim Eastwood <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

RDMA/nes: Fix incorrect SFP+ link status detection on driver init

During iw_nes initialization the link status for SFP+ PHY is always
detected as "up" regardless of real state (cable either connected or
disconnected). Add SFP+ PHY specific link status detection to the
iw_nes initialization procedure. Use link status recheck for
netdev_open to detect delayed state updates.

Signed-off-by: Maciej Sosnowski <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

RDMA/nes: Fix SFP+ link down detection issue with switch port disable

In case of SFP+ PHY, link status check at interrupt processing can
give false results. For proper link status change detection a delayed
recheck is needed to give nes registers time to settle. Add a
periodic link status recheck scheduled at interrupt to detect
potential delayed registers state changes.

Addresses: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2117
Signed-off-by: Maciej Sosnowski <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events

Depending on link state change, IB_EVENT_PORT_ERR or
IB_EVENT_PORT_ACTIVE should be generated when handling MAC interrupts.

Plugging in a cable can result in a series of interrupts changing the
driver's link state a number of times before finally staying at link
up (e.g. link up, link down, link up, link down, ..., link up). To
prevent sending a series of redundant IB_EVENT_PORT_ACTIVE and
IB_EVENT_PORT_ERR events, we use a timer to debounce them in
nes_port_ibevent().

Signed-off-by: Maciej Sosnowski <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

RDMA/nes: Fix bonding on iw_nes

Enable configuring bonds on nes devices by adding missing support for
master net_device to the driver.

Signed-off-by: Maciej Sosnowski <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

ocfs2: Make OCFS2_FS depend on CONFIGFS_FS

This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:

fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by OCFS2_FS
fs/ocfs2/Kconfig:1: symbol OCFS2_FS depends on SYSFS

Signed-off-by: Nicholas A. Bellinger <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: James Bottomley <[email protected]>

dlm: Make DLM depend on CONFIGFS_FS

This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:

fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by DLM
fs/dlm/Kconfig:1: symbol DLM depends on SYSFS

Signed-off-by: Nicholas A. Bellinger <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: James Bottomley <[email protected]>

net: Make NETCONSOLE_DYNAMIC depend on CONFIGFS_FS

This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:

fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by NETCONSOLE_DYNAMIC
drivers/net/Kconfig:3390: symbol NETCONSOLE_DYNAMIC depends on SYSFS

Signed-off-by: Nicholas A. Bellinger <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: James Bottomley <[email protected]>

configfs: change depends -> select SYSFS

This patch changes configfs to select SYSFS to fix the following:

warning: (TARGET_CORE && GFS2_FS) selects CONFIGFS_FS which has unmet direct dependencies (SYSFS)

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Nicholas A. Bellinger <[email protected]>
Acked-by: Joel Becker <[email protected]>

Merge branch 'master' of /pub/scm/linux/kernel/git/jejb/scsi-post-merge-2.6 into for-linus

fs/btrfs: Fix build of ctree

Fix the build failure in some configurations:

     CC [M]  fs/btrfs/ctree.o
  In file included from fs/btrfs/ctree.c:21:0:
  fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type
  fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type
  make[2]: *** [fs/btrfs/ctree.o] Error 1
  make[1]: *** [fs/btrfs] Error 2
  make: *** [fs] Error 2

caused by commit 57cc7215b708 ("headers: kobject.h redux")

We need to include kobject.h here.

Reported-by: Jeff Garzik <[email protected]>
Fix-suggested-by: Li Zefan <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: simplify CONFIG_SQUASHFS_LZO handling
  Squashfs: move squashfs_i() definition from squashfs.h
  Squashfs: get rid of default n in Kconfig
  Squashfs: add missing check in zlib_wrapper
  Squashfs: remove unnecessary variable in zlib_wrapper
  Squashfs: Add XZ compression configuration option
  Squashfs: add XZ compression support

ACPI: Fix boot problem related to APEI with acpi_disabled set

Commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each root
bridge (v3)") put the acpi_hest_init() call in acpi_pci_root_init() into
a wrong place, presumably because the author confused acpi_pci_disabled
with acpi_disabled. Bring the code ordering in acpi_pci_root_init()
back to sanity.

Additionally, make sure that hest_disable is set when acpi_disabled is
set, which is going to prevent acpi_hest_parse(), that still may be
executed for acpi_disabled=1 through aer_acpi_firmware_first(), from
crashing because of uninitialized hest_tab.

Reported-and-tested-by: Andres Salomon <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

PCI / ACPI: Fix build of the AER driver for CONFIG_ACPI unset

After commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each
root bridge (v3)") include/linux/pci-acpi.h is included by
drivers/pci/pcie/aer/aerdrv.c and if CONFIG_ACPI is unset, the bogus and
unnecessary alternative definition of acpi_find_root_bridge_handle()
causes a build error to occur.

Remove the offending piece of garbage.

Reported-and-tested-by: Stephen Rothwell <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
  sanitize vfsmount refcounting changes
  fix old umount_tree() breakage
  autofs4: Merge the remaining dentry ops tables
  Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
  Allow d_manage() to be used in RCU-walk mode
  Remove a further kludge from __do_follow_link()
  autofs4: Bump version
  autofs4: Add v4 pseudo direct mount support
  autofs4: Fix wait validation
  autofs4: Clean up autofs4_free_ino()
  autofs4: Clean up dentry operations
  autofs4: Clean up inode operations
  autofs4: Remove unused code
  autofs4: Add d_manage() dentry operation
  autofs4: Add d_automount() dentry operation
  Remove the automount through follow_link() kludge code from pathwalk
  CIFS: Use d_automount() rather than abusing follow_link()
  NFS: Use d_automount() rather than abusing follow_link()
  AFS: Use d_automount() rather than abusing follow_link()
  Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
  ...

sanitize vfsmount refcounting changes

Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.

Accounting rules for longterm count:
* 1 for each fs_struct.root.mnt
* 1 for each fs_struct.pwd.mnt
* 1 for having non-NULL ->mnt_ns
* decrement to 0 happens only under vfsmount lock exclusive

That allows nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count. Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.

For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.

Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc. And we get to keep
the optimization Nick's variant had bought us...

Signed-off-by: Al Viro <[email protected]>

fix old umount_tree() breakage

Expiry-related code calls umount_tree() several times with
the same list to collect vfsmounts to.  Which is fine, except
that umount_tree() implicitly assumed that the list would
be empty on each call - it moves the victims over there and
then iterates through the list kicking them out.  It's *almost*
idempotent, so everything nearly worked.  However, mnt->ghosts
handling (and thus expirability checks) had been broken - that
part was not idempotent...

The fix is trivial - use a local temporary list, splice it to
the collector list when we are through.

Signed-off-by: Al Viro <[email protected]>

btrfs: Require CAP_SYS_ADMIN for filesystem rebalance

Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire
filesystem and may run uninterruptibly for a long time. This does not
seem to be something that an unprivileged user should be able to do.

Reported-by: Aron Xu <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check

If we run low on space we could get a bunch of warnings out of
btrfs_block_rsv_check, but this is mostly just called via the transaction code
to see if we need to end the transaction. It expects to see failures, so let's
not WARN and freak everybody out for no reason. Thanks,

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: Fix memory leak in btrfs_read_fs_root_no_radix()

In btrfs_read_fs_root_no_radix(), 'root' is not freed if
btrfs_search_slot() returns an error.

Signed-off-by: Tsutomu Itoh <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: check NULL or not

We should check whether these functions return NULL.

Signed-off-by: Tsutomu Itoh <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: Don't pass NULL ptr to func that may deref it.

Hi,

In fs/btrfs/inode.c::fixup_tree_root_location() we have this code:

...
	if (!path) {
		err = -ENOMEM;
		goto out;
	}
	...
out:
	btrfs_free_path(path);
	return err;

btrfs_free_path() passes its argument on to other functions and some of
them end up dereferencing the pointer.
In the code above that pointer is clearly NULL, so btrfs_free_path() will
eventually cause a NULL dereference.

There are many ways to cut this cake (fix the bug). The one I chose was to
make btrfs_free_path() deal gracefully with NULL pointers. If you
disagree, feel free to come up with an alternative patch.
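
For illustration only: a userspace sketch of a free routine that tolerates
NULL, so a shared "out:" label stays safe when allocation failed. struct
ex_path, ex_free_path() and the ex_free_calls counter are hypothetical and
are not the btrfs definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the path structure. */
struct ex_path {
	int *slots;
};

static int ex_free_calls;	/* counts real frees, for the example only */

static void ex_free_path(struct ex_path *p)
{
	if (!p)
		return;		/* tolerate NULL, as free(3) does */
	free(p->slots);		/* dereferences p: unsafe without the check */
	free(p);
	ex_free_calls++;
}
```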

Signed-off-by: Jesper Juhl <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: mount failure return value fix

I happened to pass a swap partition as the root partition on the kernel
command line; the kernel then panicked and reported "Cannot open root device".
That is not correct: it is in fact a filesystem type mismatch, not 'no device'.

Eventually I found that the btrfs mount failed with -EIO when it should have
returned -EINVAL. The logic in init/do_mounts.c:
        for (p = fs_names; *p; p += strlen(p)+1) {
                int err = do_mount_root(name, p, flags, root_mount_data);
                switch (err) {
                        case 0:
                                goto out;
                        case -EACCES:
                                flags |= MS_RDONLY;
                                goto retry;
                        case -EINVAL:
                                continue;
                }
                print "Cannot open root device"
                panic
        }
So filesystem types tried after btrfs never get a chance to mount.

Fix this by returning -EINVAL.
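
For illustration only: a userspace sketch of why the error code matters to a
probing loop like the one above. It is a simplification (the -EACCES retry is
left out), and ex_mount_root() and both probe functions are hypothetical. The
simulated device is assumed to hold an "ext4" filesystem, with "btrfs" tried
first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*ex_mount_fn)(const char *fstype);	/* hypothetical probe */

/* Try each filesystem type in turn. -EINVAL means "not my filesystem,
 * try the next one"; any other error ends the loop. */
static int ex_mount_root(const char *const *fs_names, size_t n,
			 ex_mount_fn try_mount)
{
	assert(fs_names != NULL && try_mount != NULL);
	for (size_t i = 0; i < n; i++) {
		int err = try_mount(fs_names[i]);

		if (err == 0)
			return 0;
		if (err == -EINVAL)
			continue;
		return err;	/* e.g. -EIO: later types are never tried */
	}
	return -EINVAL;		/* no type recognised the device */
}

/* Before the fix: the btrfs probe reports a mismatch as -EIO. */
static int ex_probe_before_fix(const char *fstype)
{
	if (strcmp(fstype, "btrfs") == 0)
		return -EIO;
	return strcmp(fstype, "ext4") == 0 ? 0 : -EINVAL;
}

/* After the fix: a mismatch is reported as -EINVAL. */
static int ex_probe_after_fix(const char *fstype)
{
	return strcmp(fstype, "ext4") == 0 ? 0 : -EINVAL;
}
```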

Signed-off-by: Dave Young <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: Mem leak in btrfs_get_acl()

It seems to me that we leak the memory allocated to 'value' in
btrfs_get_acl() if the call to posix_acl_from_xattr() fails.
Here's a patch that attempts to correct that problem.

Signed-off-by: Jesper Juhl <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: fix wrong free space information of btrfs

When we store data with a raid profile in btrfs on two or more disks of
different sizes, the df command shows that there is some free space in the
filesystem, but the user cannot in fact write any data; df shows the wrong
free space information for btrfs.

# mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
# btrfs-show
Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
Total devices 2 FS bytes used 28.00KB
devid    1 size 5.01GB used 2.03GB path /dev/sda9
devid    2 size 10.00GB used 2.01GB path /dev/sda10
# btrfs device scan /dev/sda9 /dev/sda10
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
   (fill the filesystem)
# sync
# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt
# btrfs-show
Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
Total devices 2 FS bytes used 3.99GB
devid    1 size 5.01GB used 5.01GB path /dev/sda9
devid    2 size 10.00GB used 4.99GB path /dev/sda10

This is because btrfs cannot allocate chunks when one of the paired disks has
no space left; the free space on the other disks can never be used and should
be subtracted from the total space, but btrfs doesn't subtract it from
the total. This is confusing to the user.

This patch fixes it by calculating the free space that can be used to allocate
chunks.

Implementation:
1. Get the free space of all the devices, and align it by stripe length.
2. Sort the devices by free space.
3. Check the free space of each device:
   3.1. If it is not zero, check the number of devices that have
        more free space than this device.
        If that number is beyond the min stripe number, the free
        space can be used, and is added to the total free space.
        If that number is below the min stripe number, we cannot
        use the free space, and the check ends.
   3.2. If the free space is zero, check the next device and go to 3.1.

This implementation is much like a fake chunk allocation.
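
For illustration only: a simplified userspace sketch of such a "fake
allocation", not the patch's code. It assumes a greedy scheme: repeatedly take
the min_stripes devices with the most free space, allocate as much as the
smallest of them still has, and stop once fewer than min_stripes devices have
free space. The result is raw device space. All ex_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: larger values first. */
static int ex_cmp_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/* Raw device space that can still back new chunks. Modifies free_space[]. */
static uint64_t ex_usable_space(uint64_t *free_space, size_t ndev,
				size_t min_stripes, uint64_t stripe_len)
{
	uint64_t total = 0;

	assert(free_space != NULL && min_stripes > 0 && stripe_len > 0);
	if (ndev < min_stripes)
		return 0;

	/* Step 1: align each device's free space down to the stripe length. */
	for (size_t i = 0; i < ndev; i++)
		free_space[i] -= free_space[i] % stripe_len;

	for (;;) {
		uint64_t take;

		/* Step 2: sort so the emptiest devices come first. */
		qsort(free_space, ndev, sizeof(*free_space), ex_cmp_desc);

		/* Step 3: the min_stripes-th device limits the allocation. */
		take = free_space[min_stripes - 1];
		if (take == 0)
			break;	/* too few devices with space left */
		for (size_t i = 0; i < min_stripes; i++)
			free_space[i] -= take;
		total += take * min_stripes;
	}
	return total;
}
```

With two devices and min_stripes = 2, a full device next to one with space
left yields 0, which matches the corrected df output described in this
message.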

After applying this patch, df shows the correct space information:
# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda9 btrfs 17G 8.6G 0 100% /mnt

Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: make the chunk allocator utilize the devices better

With this patch, we change how we handle the case where we cannot get enough
free extents of the default size.

Implementation:
1. Look up a suitable free extent on each device and keep the search result.
   If no suitable free extent is found, keep the max free extent.
2. If we get enough suitable free extents of the default size, chunk allocation
   succeeds.
3. If we cannot get enough free extents, but the number of extents of the
   default size is >= min_stripes, we just change the mapping information
   (reduce the number of stripes in the extent map), and chunk allocation
   succeeds.
4. If the number of extents of the default size is < min_stripes, sort the
   devices by the size of their max free extent, in descending order.
5. Use the size of the max free extent on the (num_stripes - 1)th device as the
   stripe size to allocate the device space.

This way, the chunk allocator can allocate chunks as large as possible when
the devices do not have enough space, and makes full use of the devices.
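
For illustration only: a userspace sketch of steps 4 and 5, assuming that
"(num_stripes - 1)th device" means index num_stripes - 1 counted from zero
after the descending sort. ex_pick_stripe_size() and ex_cmp_desc() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: larger values first. */
static int ex_cmp_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/* max_extent[i] is the largest free extent found on device i.
 * Returns the largest stripe size that num_stripes devices can all
 * provide. Sorts max_extent[] in place. */
static uint64_t ex_pick_stripe_size(uint64_t *max_extent, size_t ndev,
				    size_t num_stripes)
{
	assert(max_extent != NULL);
	assert(num_stripes >= 1 && num_stripes <= ndev);

	qsort(max_extent, ndev, sizeof(*max_extent), ex_cmp_desc);
	return max_extent[num_stripes - 1];
}
```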

Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: restructure find_free_dev_extent()

- make it return the start position and length of the max free space when it can
not find a suitable free space.
- make it more readable

Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: fix wrong calculation of stripe size

There are two small problems:
- When we check whether the chunk size is greater than the max chunk size,
  we should take mirrors into account, but the original code didn't.
- btrfs shouldn't use the size of the residual free space as the length of
  a dup chunk when doing chunk allocation. The device space that a dup chunk
  needs is twice as large as the chunk size, so if we use the size of the
  residual free space as the length of a dup chunk, we cannot get enough
  free space. Fix it.
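
The arithmetic behind the second point can be written down as a one-line
helper. This is an illustrative sketch only; the function name and its
parameters are hypothetical and do not appear in the btrfs code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest chunk length that fits into free_bytes of residual space on one
 * device when each chunk byte is stored copies_per_device times on that
 * device (2 for a dup chunk, 1 for a single chunk).
 */
uint64_t toy_max_chunk_len(uint64_t free_bytes, unsigned int copies_per_device)
{
	assert(copies_per_device > 0);
	return free_bytes / copies_per_device;
}
```

With 1024 bytes of residual free space, a dup chunk can be at most 512 bytes
long; using 1024 as the chunk length would ask the device for 2048 bytes.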

Signed-off-by: Miao Xie <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: try to reclaim some space when chunk allocation fails

We cannot write data into files when there is only a tiny amount of free space
in the filesystem.

Reproduce steps:
# mkfs.btrfs /dev/sda1
# mount /dev/sda1 /mnt
# dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
# dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999
(fill the filesystem)
# umount /mnt
# mount /dev/sda1 /mnt
# rm -f /mnt/tmpfile0
# dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
(failed with ENOSPC)

But if we do the last step again, we can write data successfully. The cause of
the problem is that btrfs didn't try to commit the current transaction and
reclaim some space when chunk allocation failed.

This patch fixes it by committing the current transaction to reclaim some
space when chunk allocation fails.
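
The retry pattern can be sketched with a toy model. Everything below is
hypothetical (struct, field and function names), and the model assumes that
space released by the rm stays unavailable ("pinned") until the transaction
commits; it is not the btrfs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy filesystem state. */
struct toy_fs {
	uint64_t free_bytes;	/* space that can be allocated right now */
	uint64_t pinned_bytes;	/* freed space that only a commit releases */
	int commits;		/* number of commits performed so far */
};

static int toy_alloc_chunk(struct toy_fs *fs, uint64_t bytes)
{
	if (fs->free_bytes < bytes)
		return -ENOSPC;
	fs->free_bytes -= bytes;
	return 0;
}

static void toy_commit_transaction(struct toy_fs *fs)
{
	fs->free_bytes += fs->pinned_bytes;
	fs->pinned_bytes = 0;
	fs->commits++;
}

/* Try to allocate; on -ENOSPC commit once to reclaim space, then retry. */
int toy_alloc_with_reclaim(struct toy_fs *fs, uint64_t bytes)
{
	int ret;

	assert(fs != NULL);
	ret = toy_alloc_chunk(fs, bytes);
	if (ret == -ENOSPC) {
		toy_commit_transaction(fs);
		ret = toy_alloc_chunk(fs, bytes);
	}
	return ret;
}
```

The single retry keeps the failure path bounded: if the commit reclaims
nothing, the caller still sees -ENOSPC.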

Signed-off-by: Miao Xie <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

btrfs: fix wrong data space statistics

Josef has implemented mixed data/metadata chunks, so we must count those
chunks' space just like data chunks.

Signed-off-by: Miao Xie <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

fs/btrfs: Fix build of ctree

CC [M] fs/btrfs/ctree.o
In file included from fs/btrfs/ctree.c:21:0:
fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type
fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type
make[2]: *** [fs/btrfs/ctree.o] Error 1
make[1]: *** [fs/btrfs] Error 2
make: *** [fs] Error 2

We need to include kobject.h here.

Reported-by: Jeff Garzik <[email protected]>
Fix-suggested-by: Li Zefan <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Chris Mason <[email protected]>

Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38

Merge branch 'readonly-snapshots' of git://repo.or.cz/linux-btrfs-devel into btrfs-38

microblaze: Fix asm/pgtable.h

The function ptep_test_and_clear_young had the wrong first argument.
It is also necessary to add __HAVE macros for ptep_test_and_clear_young and
ptep_get_and_clear functions.

Error log:
In file included from linux/arch/microblaze/include/asm/pgtable.h:570,
from arch/microblaze/mm/pgtable.c:35:
include/asm-generic/pgtable.h:23: error: conflicting types for 'ptep_test_and_clear_young'
linux/arch/microblaze/include/asm/pgtable.h:449: error:
previous definition of 'ptep_test_and_clear_young' was here
include/asm-generic/pgtable.h:73: error: redefinition of 'ptep_get_and_clear'
linux/arch/microblaze/include/asm/pgtable.h:462: error:
previous definition of 'ptep_get_and_clear' was here
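
The guard-macro pattern behind these errors can be shown in a few lines of
plain C. The macro and function names below are invented stand-ins, not the
kernel's; the sketch only illustrates why an arch header has to announce its
own implementation so that the generic header skips its fallback.

```c
#include <assert.h>
#include <stddef.h>

/* "arch" header: provides its own version and announces it with a macro. */
#define HAVE_ARCH_TOY_TEST_AND_CLEAR_YOUNG
static inline int toy_test_and_clear_young(unsigned int *pte)
{
	int young;

	assert(pte != NULL);
	young = (*pte & 0x1u) != 0;	/* bit 0 plays the "young" bit */
	*pte &= ~0x1u;
	return young;
}

/* "generic" header: fallback, compiled only if the arch did not announce one. */
#ifndef HAVE_ARCH_TOY_TEST_AND_CLEAR_YOUNG
static inline int toy_test_and_clear_young(unsigned int *pte)
{
	(void)pte;
	return 0;
}
#endif
```

Removing the #define makes the compiler see two definitions of the same
function, which is the kind of redefinition error shown in the log above.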

Signed-off-by: Michal Simek <[email protected]>

microblaze: Fix missing pagemap.h

Add missing linux/pagemap.h to solve compilation error.

Error log:
In file included from linux/arch/microblaze/include/asm/tlb.h:17,
from mm/pgtable-generic.c:9:
include/asm-generic/tlb.h: In function 'tlb_flush_mmu':
include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages'
include/asm-generic/tlb.h: In function 'tlb_remove_page':
include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release'

Signed-off-by: Michal Simek <[email protected]>

dt/flattree: Return virtual address from early_init_dt_alloc_memory_arch()

The physical address is never used by the device tree code when
allocating memory for unflattening. Change the architecture's alloc
hook to return the virtual address instead.

Signed-off-by: Grant Likely <[email protected]>

autofs4: Merge the remaining dentry ops tables

Merge the remaining autofs4 dentry ops tables. It doesn't matter if
d_automount and d_manage are present on something that's not mountable or
holdable as these ops are only used if the appropriate flags are set in
dentry->d_flags.

[AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the
same dentry_operations.

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Unexport do_add_mount() and add in follow_automount(), not ->d_automount()

Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself.  follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer.  The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2.  One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used.  The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Allow d_manage() to be used in RCU-walk mode

Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call
d_manage() directly. d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).

autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.
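
A toy model of that policy, in userspace C, might look like the following.
The function names and boolean parameters are hypothetical; only the -ECHILD
convention and the daemon/non-daemon split come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy d_manage(): the daemon is always let through, so it can stay in
 * RCU-walk mode. Anyone else might have to sleep, which is not allowed in
 * RCU-walk mode, so the walker is asked to drop to ref-walk via -ECHILD.
 */
int toy_d_manage(bool caller_is_daemon, bool rcu_walk)
{
	if (caller_is_daemon)
		return 0;
	if (rcu_walk)
		return -ECHILD;
	/* Ref-walk mode: a real implementation could sleep here. */
	return 0;
}

/* Caller side: try RCU-walk first and fall back to ref-walk on -ECHILD. */
int toy_transit(bool caller_is_daemon, bool *used_ref_walk)
{
	int ret = toy_d_manage(caller_is_daemon, true);

	*used_ref_walk = false;
	if (ret == -ECHILD) {
		*used_ref_walk = true;
		ret = toy_d_manage(caller_is_daemon, false);
		assert(ret != -ECHILD);	/* ref-walk never asks for a retry */
	}
	return ret;
}
```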

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Remove a further kludge from __do_follow_link()

Remove a further kludge from __do_follow_link() as it's no longer required with
the automount code.

This reverts the non-helper-function parts of
051d381259eb57d6074d02a6ba6e90e744f1a29f, which breaks union mounts.

Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Bump version

Increase the autofs module sub-version so we can tell what kernel
implementation is being used from user space debug logging.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Add v4 pseudo direct mount support

Version 4 of autofs provides a pseudo direct mount implementation
that relies on directories at the leaves of a directory tree under
an indirect mount to trigger mounts.

This patch adds support for that functionality.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Fix wait validation

It is possible for the check in wait.c:validate_request() to return
an incorrect result if the dentry that was mounted upon has changed
during the callback.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Clean up autofs4_free_ino()

When this function is called the local reference count doesn't need to
be updated since the dentry is going away and dput definitely must
not be called here.

Also the autofs info struct field inode isn't used so remove it.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Clean up dentry operations

There are now two distinct uses of dentry operations: one for dentries
that trigger mounts and one for dentries that do not.

Rationalize the use of these dentry operations and rename them to
reflect their function.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Clean up inode operations

Since the use of ->follow_link() has been eliminated there is no
need to separate the indirect and direct inode operations.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Remove unused code

Remove code that is not used due to the use of ->d_automount()
and ->d_manage().

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Add d_manage() dentry operation

This patch requires the previous patch that adds the ->d_automount()
dentry operation.

Add a function to use the newly defined ->d_manage() dentry operation
for blocking during mount and expire.

Whether the VFS calls the dentry operations d_automount() and d_manage()
is controlled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs
uses the d_automount() operation to call back to user space to request
mount operations and the d_manage() operation to block walks into mounts
that are under construction or destruction.

In order to prevent these functions from being called unnecessarily the
DMANAGED_* flags are cleared for cases which would cause this. In the
common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both
set for dentrys waiting to be mounted. The DMANAGED_TRANSIT flag is
cleared upon successful mount request completion and set during expire
runs, both during the dentry expire check, and if selected for expire,
is left set until a subsequent successful mount request completes.

The exception to this is the so-called rootless multi-mount which has
no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag
is cleared upon successful mount request completion as well and set
again after a successful expire.
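
The flag transitions described above can be modelled as a small state machine.
This is a sketch under assumptions: the bit values, the struct and all
function names are hypothetical, and clearing the transit flag when a dentry
is checked but not selected for expire is inferred rather than stated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_AUTOMOUNT	0x1u	/* stands in for DMANAGED_AUTOMOUNT */
#define TOY_TRANSIT	0x2u	/* stands in for DMANAGED_TRANSIT */

struct toy_dentry {
	unsigned int flags;
	bool rootless;	/* rootless multi-mount: no actual mount at its base */
};

/* Common case: a dentry waiting to be mounted has both flags set. */
void toy_init_waiting(struct toy_dentry *d, bool rootless)
{
	assert(d != NULL);
	d->flags = TOY_AUTOMOUNT | TOY_TRANSIT;
	d->rootless = rootless;
}

/* Successful mount request completion. */
void toy_mount_succeeded(struct toy_dentry *d)
{
	d->flags &= ~TOY_TRANSIT;
	if (d->rootless)
		d->flags &= ~TOY_AUTOMOUNT;
}

/* Expire run: the flag is set while the dentry is being checked. */
void toy_expire_check_begin(struct toy_dentry *d)
{
	d->flags |= TOY_TRANSIT;
}

/* If selected, the flag stays set; otherwise it is cleared (assumption). */
void toy_expire_check_end(struct toy_dentry *d, bool selected)
{
	if (!selected)
		d->flags &= ~TOY_TRANSIT;
}

/* Successful expire: a rootless multi-mount must trigger mounts again. */
void toy_expire_succeeded(struct toy_dentry *d)
{
	if (d->rootless)
		d->flags |= TOY_AUTOMOUNT;
}
```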

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

autofs4: Add d_automount() dentry operation

Add a function to use the newly defined ->d_automount() dentry operation
for triggering mounts instead of doing the user space callback in ->lookup()
and ->d_revalidate().

Note, to be useful the subsequent patch to add the ->d_manage() dentry
operation is also needed so the discussion of functionality is deferred to
that patch.

Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Remove the automount through follow_link() kludge code from pathwalk

Remove the automount through follow_link() kludge code from pathwalk in favour
of using d_automount().

Signed-off-by: David Howells <[email protected]>
Acked-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>

CIFS: Use d_automount() rather than abusing follow_link()

Make CIFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

[NOTE: THIS IS UNTESTED!]

Signed-off-by: David Howells <[email protected]>
Cc: Steve French <[email protected]>
Signed-off-by: Al Viro <[email protected]>

NFS: Use d_automount() rather than abusing follow_link()

Make NFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <[email protected]>
Acked-by: Trond Myklebust <[email protected]>
Acked-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>

AFS: Use d_automount() rather than abusing follow_link()

Make AFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Add an AT_NO_AUTOMOUNT flag to suppress terminal automount

Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
point directories. This can be used by fstatat() users to permit the
gathering of attributes on an automount point and also prevent
mass-automounting of a directory of automount points by ls.
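
From userspace, the flag is passed in the last argument of fstatat(). The
wrapper below is a minimal sketch for Linux with glibc; the wrapper name is
made up for illustration, and the fallback #define assumes the value used by
the Linux UAPI headers in case the libc headers do not expose the flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes AT_NO_AUTOMOUNT under this macro */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>

#ifndef AT_NO_AUTOMOUNT
#define AT_NO_AUTOMOUNT 0x800	/* value from the Linux UAPI headers */
#endif

/*
 * Stat path without triggering an automount on its last component.
 * Returns 0 on success, -1 with errno set on failure, as fstatat() does.
 */
int stat_no_automount(const char *path, struct stat *st)
{
	assert(path != NULL && st != NULL);
	return fstatat(AT_FDCWD, path, st, AT_NO_AUTOMOUNT);
}
```

A directory lister could call this for each entry to read attributes of
automount points without mounting every one of them.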

Signed-off-by: David Howells <[email protected]>
Acked-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Add a dentry op to allow processes to be held during pathwalk transit

Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful, to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every transit from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

mkdir         S ffffffff8014e05a     0 32580  24956
Call Trace:
[<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
[<ffffffff80127f7d>] avc_has_perm+0x46/0x58
[<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
[<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
[<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
[<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
[<ffffffff80057a2f>] lookup_create+0x46/0x80
[<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

automount     D ffffffff8014e05a     0 32581      1              32561
Call Trace:
[<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
[<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
[<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
[<ffffffff800e6d55>] do_rmdir+0x77/0xde
[<ffffffff8005d229>] tracesys+0x71/0xe0
[<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automounter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <[email protected]>
Was-Acked-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Add a dentry op to handle automounting rather than abusing follow_link()

Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

The ->d_automount() dentry operation:

struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted.  If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar).  If there's a collision with
another automount attempt, NULL should be returned.  If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned.  In any other case, an error code should be
returned.

The ->d_automount() operation is called with no locks held and may sleep.  At
this point the pathwalk algorithm will be in ref-walk mode.

Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints.  It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful.  The path will be updated to point to the mounted
filesystem if a successful automount took place.
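
The return-code contract can be illustrated with a userspace toy. All names
here are hypothetical, the pointer-encoded errors are simplified stand-ins for
the kernel's ERR_PTR() helpers, and both the order of the checks and the
handling of a NULL return (treated as success with nothing newly mounted) are
assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-TOY_MAX_ERRNO);
}

struct toy_mount { int id; };
typedef struct toy_mount *(*toy_automount_fn)(void);

/* Sample ->d_automount() implementations covering three outcomes. */
static struct toy_mount toy_sample_mount = { 1 };
struct toy_mount *toy_op_ok(void) { return &toy_sample_mount; }
struct toy_mount *toy_op_isdir(void) { return toy_err_ptr(-EISDIR); }
struct toy_mount *toy_op_collision(void) { return NULL; }

/* Toy follow_automount(): maps the op's result onto the codes listed above. */
int toy_follow_automount(toy_automount_fn op, int depth, int max_depth,
			 struct toy_mount **mounted)
{
	struct toy_mount *mnt;

	assert(mounted != NULL);
	*mounted = NULL;
	if (!op)
		return -EREMOTE;	/* flag set but no op supplied */
	if (depth >= max_depth)
		return -ELOOP;		/* too many symlinks or mountpoints */
	mnt = op();
	if (toy_is_err(mnt))
		return (int)toy_ptr_err(mnt);	/* includes -EISDIR */
	if (mnt)
		*mounted = mnt;		/* path would now point here */
	return 0;
}
```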

__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds ->d_manage()).  This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).

__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.
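
That mount/don't-mount decision boils down to a check on what the caller is
trying to do. The sketch below uses made-up intent bits and a made-up function
name; it mirrors the cases listed above and is not the code in fs/namei.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical intent bits describing the lookup that hit the automount point. */
#define TOY_OPEN	0x01u	/* open() or creat() */
#define TOY_CONTINUE	0x02u	/* more path components follow (traversal) */
#define TOY_DIRECTORY	0x04u	/* chdir/chroot/etc. or a trailing '/' */
#define TOY_FOLLOW	0x08u	/* stat()-like call that did not ask for nofollow */
#define TOY_ALL		(TOY_OPEN | TOY_CONTINUE | TOY_DIRECTORY | TOY_FOLLOW)

/*
 * Returns true if the automount should go ahead. A stat()-like call made
 * with nofollow semantics carries none of the bits and does not mount.
 */
bool toy_should_automount(unsigned int intent)
{
	assert((intent & ~TOY_ALL) == 0);
	return (intent & TOY_ALL) != 0;
}
```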

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points.  This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary.  It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.

[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
that, rather than just returning with ungrabbed *path]

Signed-off-by: David Howells <[email protected]>
Was-Acked-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>

do_lookup() fix

do_lookup() has a path leading from LOOKUP_RCU case to non-RCU
crossing of mountpoints, which breaks things badly. If we
hit need_revalidate: and do nothing in there, we need to come
back into LOOKUP_RCU half of things, not to done: in non-RCU
one.

Signed-off-by: Al Viro <[email protected]>

Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6

* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  Update Pekka's email address in MAINTAINERS
  mm/slab.c: make local symbols static
  slub: Avoid use of slub_lock in show_slab_objects()
  memory hotplug: one more lock on memory hotplug

Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: avoid pointless blocked-task warnings
  rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
  rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: Add missing Kconfig dependencies
  x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
  x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
  x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timekeeping: Make local variables static
  time: Rename misnamed minsec argument of clocks_calc_mult_shift()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Remove syscall_exit_fields
  tracing: Only process module tracepoints once
  perf record: Add "nodelay" mode, disabled by default
  perf sched: Fix list of events, dropping unsupported ':r' modifier
  Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
  perf top: Fix annotate segv
  perf evsel: Fix order of event list deletion

Merge branch 'devel-stable' of master.kernel.org:/home/rmk/linux-2.6-arm

* 'devel-stable' of master.kernel.org:/home/rmk/linux-2.6-arm: (161 commits)
  ARM: pxa: fix building issue of missing physmap.h
  ARM: mmp: PXA910 drive strength FAST using wrong value
  ARM: mmp: MMP2 drive strength FAST using wrong value
  ARM: pxa: fix recursive calls in pxa_low_gpio_chip
  AT91: Support for gsia18s board
  AT91: Acme Systems FOX Board G20 board files
  AT91: board-sam9m10g45ek.c: Remove duplicate inclusion of mach/hardware.h
  ARM: pxa: fix suspend/resume array index miscalculation
  ARM: pxa: use cpu_has_ipr() consistently in irq.c
  ARM: pxa: remove unused variable in clock-pxa3xx.c
  ARM: pxa: fix warning in zeus.c
  ARM: sa1111: fix typo in sa1111_retrigger_lowirq()
  ARM mxs: clkdev related compile fixes
  ARM i.MX mx31_3ds: Fix MC13783 regulator names
  ARM: plat-stmp3xxx: irq_data conversion.
  ARM: plat-spear: irq_data conversion.
  ARM: plat-orion: irq_data conversion.
  ARM: plat-omap: irq_data conversion.
  ARM: plat-nomadik: irq_data conversion.
  ARM: plat-mxc: irq_data conversion.
  ...

Fix up trivial conflict in arch/arm/plat-omap/gpio.c (Lennert
Buytenhek's irq_data conversion clashing with some omap irq updates)

Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm

* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: fix missing branch in __error_a
  ARM: fix /proc/$PID/stack on SMP
  ARM: Fix build regression on SA11x0, PXA, and H720x targets
  ARM: 6625/1: use memblock memory regions for "System RAM" I/O resources
  ARM: fix wrongly patched constants
  ARM: 6624/1: fix dependency for CONFIG_SMP_ON_UP
  ARM: 6623/1: Thumb-2: Fix out-of-range offset for Thumb-2 in proc-v7.S
  ARM: 6622/1: fix dma_unmap_sg() documentation
  ARM: 6621/1: bitops: remove condition code clobber for CLZ
  ARM: 6620/1: Change misleading warning when CONFIG_CMDLINE_FORCE is used
  ARM: 6619/1: nommu: avoid mapping vectors page when !CONFIG_MMU
  ARM: sched_clock: make minsec argument to clocks_calc_mult_shift() zero
  ARM: sched_clock: allow init_sched_clock() to be called early
  ARM: integrator: fix compile warning in cpu.c
  ARM: 6616/1: Fix ep93xx-fb init/exit annotations
  ARM: twd: fix display of twd frequency
  ARM: udelay: prevent math rounding resulting in short udelays

parisc: fix compile breakage caused by inlining maybe_mkwrite

On parisc, we have an include of linux/mm.h inside our asm/pgtable.h, so
this patch

commit 14fd403f2146f740942d78af4e0ee59396ad8eab
Author: Andrea Arcangeli <[email protected]>
Date: Thu Jan 13 15:46:37 2011 -0800

thp: export maybe_mkwrite

Causes us an unsatisfiable use of pte_mkwrite in linux/mm.h

The fix is obviously not to include linux/mm.h in our pgtable.h, which
unbreaks the build.

Signed-off-by: James Bottomley <[email protected]>