Raymond Yau [Sun, 16 Jan 2011 02:55:54 +0000 (10:55 +0800)]
ALSA : au88x0 - Limit number of channels to fix Oops via OSS emu
Fix the playback/capture channels patch: change the supported playback
channels of au8830 to 1, 2, 4 and the capture channels to 1, 2.
This prevents an oops when OSS emulation uses SNDCTL_DSP_CHANNELS to
set 3 channels.
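A userspace sketch of what the channel lists mean (hypothetical array and helper names, not the driver's code): a requested channel count must match one of the listed values, so 3 is refused instead of reaching hardware that cannot handle it. Inside the kernel, ALSA drivers can express such a list with snd_pcm_hw_constraint_list(); whether this patch uses it is not stated here.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical model of the au8830 channel lists after the fix. */
static const unsigned int au8830_playback_channels[] = { 1, 2, 4 };
static const unsigned int au8830_capture_channels[] = { 1, 2 };

/* Return 1 if 'channels' appears in 'list' (of length n), 0 otherwise. */
static int channels_supported(const unsigned int *list, size_t n,
                              unsigned int channels)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == channels)
            return 1;
    return 0;
}
```

For example, `channels_supported(au8830_playback_channels, ARRAY_SIZE(au8830_playback_channels), 3)` returns 0.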
Randy Dunlap [Sun, 2 Jan 2011 22:44:00 +0000 (14:44 -0800)]
fs: FS_POSIX_ACL does not depend on BLOCK
- Fix a kconfig unmet dependency warning.
- Remove the comment that identifies which filesystems use POSIX ACL
utility routines.
- Move the FS_POSIX_ACL symbol outside of the BLOCK symbol if/endif block
because its functions do not depend on BLOCK and some of the filesystems
that use it do not depend on BLOCK.
warning: (GENERIC_ACL && JFFS2_FS_POSIX_ACL && NFSD_V4 && NFS_ACL_SUPPORT && 9P_FS_POSIX_ACL) selects FS_POSIX_ACL which has unmet direct dependencies (BLOCK)
Steven Rostedt [Tue, 14 Dec 2010 00:38:09 +0000 (19:38 -0500)]
fs: Remove unlikely() from fget_light()
There's an unlikely() in fget_light() that assumes the file ref count
will be 1. Running the annotate branch profiler on a desktop that is
performing daily tasks (running firefox, evolution, xchat and also acting as part
of a distcc farm) shows that the ref count is not 1 that often.
   correct   incorrect   %  Function    File          Line
   -------   ---------   -  --------    ----          ----
1035099358  6209599193  85  fget_light  file_table.c  315
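A userspace imitation of the correct/incorrect bookkeeping, as a sketch only: the struct and helpers are hypothetical, and the kernel's profiler hooks into the likely()/unlikely() annotations themselves. With 1035099358 correct and 6209599193 incorrect predictions, the integer percentage of mispredictions comes out at 85, matching the % column.

```c
#include <assert.h>
#include <stdint.h>

/* The kernel's branch hints expand to GCC's __builtin_expect()
 * (shown for reference). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical per-call-site counters: a hit is "correct" when the
 * annotation's predicted outcome matched what actually happened. */
struct branch_stat {
    uint64_t correct;
    uint64_t incorrect;
};

/* Record one evaluation of 'cond' whose hint predicted 'expected'. */
static int profile_branch(struct branch_stat *s, int cond, int expected)
{
    if (!!cond == !!expected)
        s->correct++;
    else
        s->incorrect++;
    return !!cond;
}

/* Integer percentage of mispredicted evaluations (truncated). */
static unsigned int percent_incorrect(const struct branch_stat *s)
{
    uint64_t total = s->correct + s->incorrect;
    return total ? (unsigned int)(s->incorrect * 100 / total) : 0;
}
```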
Steven Rostedt [Tue, 14 Dec 2010 00:38:08 +0000 (19:38 -0500)]
fs: Remove unlikely() from fput_light()
In fput_light(), there's an unlikely(fput_needed). Running the annotate
branch profiler on my normal desktop (firefox, xchat, evolution, and part
of my distcc farm) shows that the unlikely is not very unlikely.
Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes. On the
other hand always committing the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions. Given
that block allocation is a data plane operation anyway, change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.
This also includes moving the code around for a few of the filesystems,
and removes the already unneeded S_ISDIR checks given that we only wire
up fallocate for regular files.
make the feature checks in ->fallocate future proof
Instead of various home grown checks that might need updates for new
flags just check for any bit outside the mask of the features supported
by the filesystem. This makes the check future proof for any newly
added flag.
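A minimal sketch of the mask check, assuming hypothetical flag names (the real mode bits are the FALLOC_FL_* constants in <linux/falloc.h>). Any bit outside the filesystem's supported mask is rejected, so a flag introduced later is refused automatically rather than silently ignored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mode bits standing in for FALLOC_FL_*. */
#define EXAMPLE_FL_KEEP_SIZE  0x01
#define EXAMPLE_FL_PUNCH_HOLE 0x02

/* Flags this hypothetical filesystem implements. */
#define EXAMPLE_FS_SUPPORTED  (EXAMPLE_FL_KEEP_SIZE)

/* Reject any mode bit outside the supported mask. */
static long example_fallocate_check(int mode)
{
    if (mode & ~EXAMPLE_FS_SUPPORTED)
        return -EOPNOTSUPP;
    return 0;
}
```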
Tejun Heo [Tue, 19 Oct 2010 15:24:36 +0000 (15:24 +0000)]
RDMA: Update workqueue usage
* ib_wq is added, which is used as the common workqueue for infiniband
instead of the system workqueue. All system workqueue usages
including flush_scheduled_work() callers are converted to use and
flush ib_wq.
* cancel_delayed_work() + flush_scheduled_work() converted to
cancel_delayed_work_sync().
* qib_wq is removed and ib_wq is used instead.
This is to prepare for deprecation of flush_scheduled_work().
Commit df9ee29270 made arch_local_irq_save and arch_local_irq_restore
static inline, which with -Werror trips up on __set_hae() and set_hae()
which are extern inline. The naive solution is to make __set_hae() and
set_hae() static inline but for reasons described in commit d559d4a24a3fe
this breaks the generic kernel build. Instead, since this is architecture
specific code, this patch hard wires in the architecture specific method
of disabling and enabling interrupts.
Alex Deucher [Tue, 11 Jan 2011 18:36:55 +0000 (13:36 -0500)]
drm/radeon/kms: balance asic_reset functions
First, we were calling mc_stop() at the top of the function
which turns off all MC (memory controller) clients,
then checking if the GPU is idle. If it was idle we
returned without re-enabling the MC clients which would
lead to a blank screen, etc. This patch checks if the
GPU is idle before calling mc_stop().
Second, if the reset failed, we were returning without
re-enabling the MC clients. This patch re-enables
the MC clients before returning regardless of whether
the reset was successful or not.
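A userspace model of the balanced control flow, as a sketch with hypothetical names (the real radeon code has per-chip mc_stop/mc_resume helpers with different names and signatures). Idleness is checked before the MC is stopped, and once stopped the MC clients are resumed on every return path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GPU state touched by asic_reset(). */
struct gpu {
    bool idle;          /* GPU reports idle before reset */
    bool reset_ok;      /* outcome of the soft reset */
    int  mc_clients_on; /* 1 when MC clients are enabled */
};

static void mc_stop(struct gpu *g)   { g->mc_clients_on = 0; }
static void mc_resume(struct gpu *g) { g->mc_clients_on = 1; }

/* Returns 0 on success or when idle, -1 (hypothetical code) on failure. */
static int example_asic_reset(struct gpu *g)
{
    int ret = 0;

    if (g->idle)
        return 0;       /* checked first: MC was never stopped */

    mc_stop(g);
    if (!g->reset_ok)
        ret = -1;       /* reset failed, but still fall through */
    mc_resume(g);       /* re-enable MC clients on every path */
    return ret;
}
```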
Dave Airlie [Mon, 17 Jan 2011 02:20:31 +0000 (12:20 +1000)]
Merge remote branch 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next into drm-fixes
* 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next:
drm/nouveau: fix gpu page faults triggered by plymouthd
drm/nouveau: greatly simplify mm, killing some bugs in the process
drm/nvc0: enable protection of system-use-only structures in vm
drm/nv40: initialise 0x17xx on all chipsets that have it
drm/nv40: make detection of 0x4097-ful chipsets available everywhere
Andrew Morton [Mon, 17 Jan 2011 00:55:23 +0000 (16:55 -0800)]
drivers/nfc/pn544.c: fix min_t warnings
Fix these:
drivers/nfc/pn544.c: In function 'pn544_read':
drivers/nfc/pn544.c:356: warning: comparison of distinct pointer types lacks a cast
drivers/nfc/pn544.c:377: warning: comparison of distinct pointer types lacks a cast
drivers/nfc/pn544.c: In function 'pn544_write':
drivers/nfc/pn544.c:463: warning: comparison of distinct pointer types lacks a cast
drivers/nfc/pn544.c:485: warning: comparison of distinct pointer types lacks a cast
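Simplified userspace versions of the kernel helpers, to show where the warning comes from (a sketch in GNU C, not the kernel's exact definitions). min() compares the addresses of two temporaries of the arguments' own types, so mixing types such as size_t and int provokes "comparison of distinct pointer types lacks a cast"; min_t() converts both arguments to one named type instead. The read helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Type-checking min(): mixed argument types trigger the warning. */
#define min(x, y) ({                 \
    __typeof__(x) _m1 = (x);         \
    __typeof__(y) _m2 = (y);         \
    (void)(&_m1 == &_m2);            \
    _m1 < _m2 ? _m1 : _m2; })

/* min_t(): both arguments are converted to 'type' first. */
#define min_t(type, x, y) ({         \
    type _t1 = (x);                  \
    type _t2 = (y);                  \
    _t1 < _t2 ? _t1 : _t2; })

/* Hypothetical helper: bytes to copy from a 'len'-byte buffer into a
 * 'count'-byte user request, where the two lengths differ in type. */
static size_t bytes_to_copy(size_t count, int len)
{
    return min_t(size_t, count, len);
}
```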
Andrea Arcangeli [Sun, 16 Jan 2011 21:10:39 +0000 (13:10 -0800)]
fix non-x86 build failure in pmdp_get_and_clear
pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
BUG() and they were defined only to diminish the risk of build issues on
non-x86 archs and to be consistent with the generic pte methods previously
defined in include/asm-generic/pgtable.h.
But they are causing more trouble than they were supposed to solve, so
it's simpler not to define them when THP is off.
This is also correcting the export of pmdp_splitting_flush which is
currently unused (x86 isn't using the generic implementation in
mm/pgtable-generic.c and no other arch needs that [yet]).
Maciej Sosnowski [Wed, 24 Nov 2010 17:29:54 +0000 (17:29 +0000)]
RDMA/nes: Fix incorrect SFP+ link status detection on driver init
During iw_nes initialization the link status for SFP+ PHY is always
detected as "up" regardless of real state (cable either connected or
disconnected). Add SFP+ PHY specific link status detection to the
iw_nes initialization procedure. Use link status recheck for
netdev_open to detect delayed state updates.
Maciej Sosnowski [Wed, 24 Nov 2010 17:29:46 +0000 (17:29 +0000)]
RDMA/nes: Fix SFP+ link down detection issue with switch port disable
In case of SFP+ PHY, link status check at interrupt processing can
give false results. For proper link status change detection a delayed
recheck is needed to give nes registers time to settle. Add a
periodic link status recheck scheduled at interrupt to detect
potential delayed registers state changes.
Addresses: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2117
Signed-off-by: Maciej Sosnowski
Signed-off-by: Roland Dreier
Depending on link state change, IB_EVENT_PORT_ERR or
IB_EVENT_PORT_ACTIVE should be generated when handling MAC interrupts.
Plugging in a cable happens to result in a series of interrupts changing
the driver's link state a number of times before finally staying at link
up (e.g. link up, link down, link up, link down, ..., link up). To
prevent sending a series of redundant IB_EVENT_PORT_ACTIVE and
IB_EVENT_PORT_ERR events, we use a timer to debounce them in
nes_port_ibevent().
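A sketch of the debouncing idea as a tick-driven userspace model (hypothetical struct and function names; the driver uses a kernel timer in nes_port_ibevent()). Each interrupt records the new state and re-arms the countdown; an event is produced only when the countdown expires and the settled state differs from the one last reported.

```c
#include <assert.h>
#include <stdbool.h>

enum port_event { EV_NONE, EV_PORT_ACTIVE, EV_PORT_ERR };

/* Hypothetical debouncer state. */
struct link_debounce {
    bool link_up;      /* latest state seen by the interrupt handler */
    bool reported_up;  /* state last reported to the IB core */
    int  ticks_left;   /* 0 means the timer is not armed */
    int  delay;        /* ticks the state must stay unchanged */
};

/* Interrupt: record the state and restart the quiet period. */
static void on_link_interrupt(struct link_debounce *d, bool up)
{
    d->link_up = up;
    d->ticks_left = d->delay;
}

/* Advance the timer by one tick; returns the event to send, if any. */
static enum port_event on_tick(struct link_debounce *d)
{
    if (d->ticks_left == 0 || --d->ticks_left > 0)
        return EV_NONE;            /* not armed, or not yet expired */
    if (d->link_up == d->reported_up)
        return EV_NONE;            /* settled back to the reported state */
    d->reported_up = d->link_up;
    return d->link_up ? EV_PORT_ACTIVE : EV_PORT_ERR;
}
```

With a delay of 3 ticks, the sequence up, down, up, down, up produces a single EV_PORT_ACTIVE on the third tick.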
This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:
fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by OCFS2_FS
fs/ocfs2/Kconfig:1: symbol OCFS2_FS depends on SYSFS
This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:
fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by DLM
fs/dlm/Kconfig:1: symbol DLM depends on SYSFS
net: Make NETCONSOLE_DYNAMIC depend on CONFIGFS_FS
This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:
fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by NETCONSOLE_DYNAMIC
drivers/net/Kconfig:3390: symbol NETCONSOLE_DYNAMIC depends on SYSFS
Stefan Schmidt [Wed, 12 Jan 2011 09:30:42 +0000 (10:30 +0100)]
fs/btrfs: Fix build of ctree
Fix the build failure in some configurations:
CC [M] fs/btrfs/ctree.o
In file included from fs/btrfs/ctree.c:21:0:
fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type
fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type
make[2]: *** [fs/btrfs/ctree.o] Error 1
make[1]: *** [fs/btrfs] Error 2
make: *** [fs] Error 2
caused by commit 57cc7215b708 ("headers: kobject.h redux")
ACPI: Fix boot problem related to APEI with acpi_disabled set
Commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each root
bridge (v3)") put the acpi_hest_init() call in acpi_pci_root_init() into
a wrong place, presumably because the author confused acpi_pci_disabled
with acpi_disabled. Bring the code ordering in acpi_pci_root_init()
back to sanity.
Additionally, make sure that hest_disable is set when acpi_disabled is
set. This prevents acpi_hest_parse(), which may still be executed with
acpi_disabled=1 through aer_acpi_firmware_first(), from crashing because
of an uninitialized hest_tab.
PCI / ACPI: Fix build of the AER driver for CONFIG_ACPI unset
After commit 415e12b23792 ("PCI/ACPI: Request _OSC control once for each
root bridge (v3)") include/linux/pci-acpi.h is included by
drivers/pci/pcie/aer/aerdrv.c and if CONFIG_ACPI is unset, the bogus and
unnecessary alternative definition of acpi_find_root_bridge_handle()
causes a build error to occur.
Linus Torvalds [Sun, 16 Jan 2011 19:31:50 +0000 (11:31 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
sanitize vfsmount refcounting changes
fix old umount_tree() breakage
autofs4: Merge the remaining dentry ops tables
Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
Allow d_manage() to be used in RCU-walk mode
Remove a further kludge from __do_follow_link()
autofs4: Bump version
autofs4: Add v4 pseudo direct mount support
autofs4: Fix wait validation
autofs4: Clean up autofs4_free_ino()
autofs4: Clean up dentry operations
autofs4: Clean up inode operations
autofs4: Remove unused code
autofs4: Add d_manage() dentry operation
autofs4: Add d_automount() dentry operation
Remove the automount through follow_link() kludge code from pathwalk
CIFS: Use d_automount() rather than abusing follow_link()
NFS: Use d_automount() rather than abusing follow_link()
AFS: Use d_automount() rather than abusing follow_link()
Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
...
Al Viro [Sat, 15 Jan 2011 03:30:21 +0000 (22:30 -0500)]
sanitize vfsmount refcounting changes
Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.
Accounting rules for longterm count:
* 1 for each fs_struct.root.mnt
* 1 for each fs_struct.pwd.mnt
* 1 for having non-NULL ->mnt_ns
* decrement to 0 happens only under vfsmount lock exclusive
That allows a nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count. Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.
For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.
Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc. And we get to keep
the optimization Nick's variant had bought us...
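A single-threaded userspace model of the mntput() fast path described above (hypothetical struct; the counter stands in for the per-cpu mnt_count sum, and lock transitions appear only as comments). While mnt_longterm is non-zero this cannot be the final reference, so the count is decremented blindly; otherwise the exclusive path does the decrement-and-check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vfsmount's reference accounting. */
struct mount_model {
    long count;     /* all references, long-term ones included */
    long longterm;  /* how many of those are long-term */
    bool freed;
};

/* Returns true if this call dropped the final reference. */
static bool model_mntput(struct mount_model *m)
{
    /* vfsmount lock taken shared */
    if (m->longterm) {
        /* Long-term refs still exist: not the final put. */
        m->count--;
        return false;
    }
    /* drop shared, take vfsmount lock exclusive */
    if (--m->count == 0) {
        m->freed = true;
        return true;
    }
    return false;
}
```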
Al Viro [Sun, 16 Jan 2011 01:08:44 +0000 (20:08 -0500)]
fix old umount_tree() breakage
Expiry-related code calls umount_tree() several times with
the same list to collect vfsmounts to. Which is fine, except
that umount_tree() implicitly assumed that the list would
be empty on each call - it moves the victims over there and
then iterates through the list kicking them out. It's *almost*
idempotent, so everything nearly worked. However, mnt->ghosts
handling (and thus expirability checks) had been broken - that
part was not idempotent...
The fix is trivial - use local temporary list, splice it to
the collector list when we are through.
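A userspace sketch of the fix pattern, using a minimal list in the style of <linux/list.h> (hypothetical mount struct and helper names). Victims are gathered on a local list, only those are processed, and the local list is then spliced onto the caller's collector, so entries collected by earlier calls are never revisited.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Move all entries of 'list' to the front of 'head'; 'list' ends empty. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
    if (list->next == list)
        return;
    struct list_head *first = list->next, *last = list->prev;
    first->prev = head;
    last->next = head->next;
    head->next->prev = last;
    head->next = first;
    list_init(list);
}

/* Hypothetical mount: linkage is the first member, so a list pointer
 * can be converted back to the struct. */
struct mnt {
    struct list_head victim;
    int processed;   /* how many times it was kicked out */
};

static void umount_tree_model(struct mnt **victims, size_t n,
                              struct list_head *kill)
{
    struct list_head tmp;   /* local, so it starts empty on every call */

    list_init(&tmp);
    for (size_t i = 0; i < n; i++)
        list_add_tail(&victims[i]->victim, &tmp);
    for (struct list_head *p = tmp.next; p != &tmp; p = p->next)
        ((struct mnt *)p)->processed++;
    list_splice_init(&tmp, kill);
}
```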
Ben Hutchings [Wed, 29 Dec 2010 14:55:03 +0000 (14:55 +0000)]
btrfs: Require CAP_SYS_ADMIN for filesystem rebalance
Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire
filesystem and may run uninterruptibly for a long time. This does not
seem to be something that an unprivileged user should be able to do.
Josef Bacik [Wed, 12 Jan 2011 21:04:22 +0000 (21:04 +0000)]
Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check
If we run low on space we could get a bunch of warnings out of
btrfs_block_rsv_check(), but it is mostly called via the transaction code
to see if we need to end the transaction and it expects to see failures, so
let's not WARN and freak everybody out for no reason.
btrfs_free_path() passes its argument on to other functions and some of
them end up dereferencing the pointer.
In the code above that pointer is clearly NULL, so btrfs_free_path() will
eventually cause a NULL dereference.
There are many ways to cut this cake (fix the bug). The one I chose was to
make btrfs_free_path() deal gracefully with NULL pointers. If you
disagree, feel free to come up with an alternative patch.
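A userspace sketch of the chosen approach, with a hypothetical stand-in for struct btrfs_path: the free routine returns early on NULL, in the same spirit as kfree(NULL) and free(NULL), so callers on error paths can call it unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_path. */
struct example_path {
    int *nodes;   /* resources released before the path itself */
};

/* Release what the path holds; this dereferences 'p'. */
static void example_release_path(struct example_path *p)
{
    free(p->nodes);
    p->nodes = NULL;
}

/* Treat NULL as a no-op before anything dereferences the pointer. */
static void example_free_path(struct example_path *p)
{
    if (!p)
        return;
    example_release_path(p);
    free(p);
}
```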
Dave Young [Sat, 8 Jan 2011 10:09:13 +0000 (10:09 +0000)]
btrfs: mount failure return value fix
I happened to pass a swap partition as the root partition on the command line,
and the kernel panicked with "Cannot open root device".
That message is wrong: the real problem is a fs type mismatch, not a missing device.
Eventually I found that btrfs mounting failed with -EIO; it should be -EINVAL.
The logic in init/do_mounts.c:
    for (p = fs_names; *p; p += strlen(p)+1) {
        int err = do_mount_root(name, p, flags, root_mount_data);
        switch (err) {
        case 0:
            goto out;
        case -EACCES:
            flags |= MS_RDONLY;
            goto retry;
        case -EINVAL:
            continue;
        }
        /* print "Cannot open root device", then panic */
    }
So the fs types after btrfs in the list never get a chance to mount.
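A runnable userspace model of the loop above (hypothetical mounter and helper; the -EACCES read-only retry is left out). Only -EINVAL moves on to the next fs type, so a btrfs mount that fails with -EIO ends the search before a later type is tried.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical mounter: btrfs fails with 'btrfs_err' on a non-btrfs
 * device; every other type accepts it. */
static int fake_do_mount_root(const char *fs, int btrfs_err)
{
    if (strcmp(fs, "btrfs") == 0)
        return btrfs_err;
    return 0;
}

/* Returns the index of the type that mounted, or -1 where the kernel
 * would print "Cannot open root device" and panic. */
static int try_fs_types(const char *types[], int n, int btrfs_err)
{
    for (int i = 0; i < n; i++) {
        int err = fake_do_mount_root(types[i], btrfs_err);
        if (err == 0)
            return i;
        if (err != -EINVAL)
            return -1;   /* any other error stops the search */
    }
    return -1;
}
```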
Jesper Juhl [Thu, 6 Jan 2011 21:45:21 +0000 (21:45 +0000)]
btrfs: Mem leak in btrfs_get_acl()
It seems to me that we leak the memory allocated to 'value' in
btrfs_get_acl() if the call to posix_acl_from_xattr() fails.
Here's a patch that attempts to correct that problem.
Miao Xie [Wed, 5 Jan 2011 10:07:31 +0000 (10:07 +0000)]
btrfs: fix wrong free space information of btrfs
When we store data with a raid profile in btrfs on two or more disks of
different sizes, df shows some free space in the filesystem, but the user
cannot in fact write any data: df shows the wrong free space information
for btrfs.
# mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
# btrfs-show
Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
Total devices 2 FS bytes used 28.00KB
devid 1 size 5.01GB used 2.03GB path /dev/sda9
devid 2 size 10.00GB used 2.01GB path /dev/sda10
# btrfs device scan /dev/sda9 /dev/sda10
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
(fill the filesystem)
# sync
# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt
# btrfs-show
Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
Total devices 2 FS bytes used 3.99GB
devid 1 size 5.01GB used 5.01GB path /dev/sda9
devid 2 size 10.00GB used 4.99GB path /dev/sda10
This is because btrfs cannot allocate chunks once one of the paired disks has
no space; the free space on the other disks can then never be used and should
be subtracted from the total space, but btrfs doesn't subtract it from
the total. This is confusing for the user.
This patch fixes it by calculating the free space that can actually be used
to allocate chunks.
Implementation:
1. Get the free space of every device, and align it to the stripe length.
2. Sort the devices by free space.
3. Check the free space of each device:
3.1. if it is not zero, check the number of devices that have
more free space than this device;
if that number is not below the min stripe number, the free
space can be used and is added to the total free space;
if that number is below the min stripe number, the free space
cannot be used and the check ends.
3.2. if the free space is zero, check the next device (goto 3.1).
This implementation is essentially a simulated (fake) chunk allocation.
After applying this patch, df shows correct space information:
# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda9 btrfs 17G 8.6G 0 100% /mnt
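The steps above describe a simulated chunk allocation. The sketch below swaps in a simpler greedy simulation for mirrored profiles and is not the patch's exact algorithm (it ignores stripe alignment and any minimum stripe count other than the number of copies; the function name is hypothetical). Each round takes the devices with the most free space and carves one unit from each copy's device; once too few devices have a unit left, the remaining space is unusable and is not counted.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: descending order of free space. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;
    return (x < y) - (x > y);
}

/* Usable data space for a mirrored profile storing 'copies' identical
 * stripes on distinct devices. 'free_space' is modified in place. */
static unsigned long long usable_space(unsigned long long *free_space,
                                       size_t ndev, size_t copies,
                                       unsigned long long unit)
{
    unsigned long long avail = 0;

    if (copies == 0 || copies > ndev)
        return 0;
    for (;;) {
        qsort(free_space, ndev, sizeof(*free_space), cmp_desc);
        if (free_space[copies - 1] < unit)
            break;   /* fewer than 'copies' devices can take a stripe */
        for (size_t i = 0; i < copies; i++)
            free_space[i] -= unit;
        avail += unit;   /* one unit of data per round */
    }
    return avail;
}
```

With two raid1 devices where one is full and the other has space left, the result is 0, as in the df output after the patch.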
Miao Xie [Wed, 5 Jan 2011 10:07:28 +0000 (10:07 +0000)]
btrfs: make the chunk allocator utilize the devices better
With this patch, we change how allocation is handled when we cannot get enough
free extents of the default size.
Implementation:
1. Look up a suitable free extent on each device and keep the search result.
If no suitable free extent is found, keep the largest free extent.
2. If we get enough suitable free extents of the default size, chunk allocation
succeeds.
3. If we cannot get enough free extents, but the number of extents of the
default size is >= min_stripes, we just change the mapping information
(reduce the number of stripes in the extent map), and chunk allocation
succeeds.
4. If the number of extents of the default size is < min_stripes, sort the
devices by the size of their largest free extent, in descending order.
5. Use the size of the largest free extent on the (num_stripes - 1)th device as
the stripe size to allocate the device space.
This way, the chunk allocator can allocate chunks as large as possible when
the devices' space is not enough, and make full use of the devices.
Miao Xie [Wed, 5 Jan 2011 10:07:24 +0000 (10:07 +0000)]
btrfs: fix wrong calculation of stripe size
There are two small problems:
- One is that when we check whether the chunk size is greater than the max
chunk size, we should take mirrors into account, but the original code didn't.
- The other is that btrfs shouldn't use the size of the residual free space as
the length of a dup chunk when doing chunk allocation. The device space
that a dup chunk needs is twice the chunk size, so if we use the size of
the residual free space as the length of a dup chunk, we cannot get enough
free space. Fix it.
But if we do the last step again, we can write data successfully. The reason for
the problem is that btrfs didn't try to commit the current transaction and
reclaim some space when chunk allocation failed.
This patch fixes it by committing the current transaction to reclaim some
space when chunk allocation fails.
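A sketch of the retry policy under a stated assumption: space freed earlier stays "pinned" and becomes reusable only once the transaction commits. The struct and helpers are hypothetical, not btrfs code. On a failed chunk allocation the model commits once to reclaim pinned space and tries again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical space accounting. */
struct space {
    unsigned long free;    /* immediately allocatable */
    unsigned long pinned;  /* freed, but reusable only after commit */
    int commits;
};

static bool try_alloc_chunk(struct space *s, unsigned long size)
{
    if (s->free < size)
        return false;
    s->free -= size;
    return true;
}

static void commit_transaction(struct space *s)
{
    s->free += s->pinned;  /* committing releases pinned space */
    s->pinned = 0;
    s->commits++;
}

/* On failure, commit once to reclaim space, then retry. */
static bool alloc_chunk_with_commit(struct space *s, unsigned long size)
{
    if (try_alloc_chunk(s, size))
        return true;
    commit_transaction(s);
    return try_alloc_chunk(s, size);
}
```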
Stefan Schmidt [Wed, 12 Jan 2011 09:30:42 +0000 (09:30 +0000)]
fs/btrfs: Fix build of ctree
CC [M] fs/btrfs/ctree.o
In file included from fs/btrfs/ctree.c:21:0:
fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type
fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type
make[2]: *** [fs/btrfs/ctree.o] Error 1
make[1]: *** [fs/btrfs] Error 2
make: *** [fs] Error 2
Michal Simek [Sun, 16 Jan 2011 12:56:53 +0000 (13:56 +0100)]
microblaze: Fix asm/pgtable.h
The function ptep_test_and_clear_young() had the wrong first argument.
It is also necessary to add __HAVE macros for the ptep_test_and_clear_young and
ptep_get_and_clear functions.
Error log:
In file included from linux/arch/microblaze/include/asm/pgtable.h:570,
from arch/microblaze/mm/pgtable.c:35:
include/asm-generic/pgtable.h:23: error: conflicting types for 'ptep_test_and_clear_young'
linux/arch/microblaze/include/asm/pgtable.h:449: error:
previous definition of 'ptep_test_and_clear_young' was here
include/asm-generic/pgtable.h:73: error: redefinition of 'ptep_get_and_clear'
linux/arch/microblaze/include/asm/pgtable.h:462: error:
previous definition of 'ptep_get_and_clear' was here
Michal Simek [Sun, 16 Jan 2011 12:50:17 +0000 (13:50 +0100)]
microblaze: Fix missing pagemap.h
Add missing linux/pagemap.h to solve compilation error.
Error log:
In file included from linux/arch/microblaze/include/asm/tlb.h:17,
from mm/pgtable-generic.c:9:
include/asm-generic/tlb.h: In function 'tlb_flush_mmu':
include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages'
include/asm-generic/tlb.h: In function 'tlb_remove_page':
include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release'
Grant Likely [Thu, 13 Jan 2011 22:36:09 +0000 (15:36 -0700)]
dt/flattree: Return virtual address from early_init_dt_alloc_memory_arch()
The physical address is never used by the device tree code when
allocating memory for unflattening. Change the architecture's alloc
hook to return the virtual address instead.
David Howells [Sat, 15 Jan 2011 10:51:57 +0000 (10:51 +0000)]
autofs4: Merge the remaining dentry ops tables
Merge the remaining autofs4 dentry ops tables. It doesn't matter if
d_automount and d_manage are present on something that's not mountable or
holdable as these ops are only used if the appropriate flags are set in
dentry->d_flags.
[AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the
same dentry_operations.
David Howells [Fri, 14 Jan 2011 19:10:03 +0000 (19:10 +0000)]
Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself. follow_automount() will then
do the addition.
This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer. The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.
To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2. One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.
d_automount() should also add the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used. The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.
This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.
David Howells [Fri, 14 Jan 2011 18:46:51 +0000 (18:46 +0000)]
Allow d_manage() to be used in RCU-walk mode
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call
d_manage() directly. d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).
autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.
Ian Kent [Fri, 14 Jan 2011 18:46:35 +0000 (18:46 +0000)]
autofs4: Add v4 pseudo direct mount support
Version 4 of autofs provides a pseudo direct mount implementation
that relies on directories at the leaves of a directory tree under
an indirect mount to trigger mounts.
Ian Kent [Fri, 14 Jan 2011 18:46:30 +0000 (18:46 +0000)]
autofs4: Fix wait validation
It is possible for the check in wait.c:validate_request() to return
an incorrect result if the dentry that was mounted upon has changed
during the callback.
Ian Kent [Fri, 14 Jan 2011 18:46:24 +0000 (18:46 +0000)]
autofs4: Clean up autofs4_free_ino()
When this function is called the local reference count doesn't need to
be updated since the dentry is going away and dput definitely must
not be called here.
Also the autofs info struct field inode isn't used so remove it.
Ian Kent [Fri, 14 Jan 2011 18:46:03 +0000 (18:46 +0000)]
autofs4: Add d_manage() dentry operation
This patch requires a previous patch to add the ->d_automount()
dentry operation.
Add a function to use the newly defined ->d_manage() dentry operation
for blocking during mount and expire.
Whether the VFS calls the dentry operations d_automount() and d_manage()
is controlled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs
uses the d_automount() operation to callback to user space to request
mount operations and the d_manage() operation to block walks into mounts
that are under construction or destruction.
In order to prevent these functions from being called unnecessarily the
DMANAGED_* flags are cleared for cases which would cause this. In the
common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both
set for dentries waiting to be mounted. The DMANAGED_TRANSIT flag is
cleared upon successful mount request completion and set during expire
runs, both during the dentry expire check, and if selected for expire,
is left set until a subsequent successful mount request completes.
The exception to this is the so-called rootless multi-mount which has
no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag
is cleared upon successful mount request completion as well and set
again after a successful expire.
Ian Kent [Fri, 14 Jan 2011 18:45:58 +0000 (18:45 +0000)]
autofs4: Add d_automount() dentry operation
Add a function to use the newly defined ->d_automount() dentry operation
for triggering mounts instead of doing the user space callback in ->lookup()
and ->d_revalidate().
Note, to be useful the subsequent patch to add the ->d_manage() dentry
operation is also needed so the discussion of functionality is deferred to
that patch.
David Howells [Fri, 14 Jan 2011 18:45:31 +0000 (18:45 +0000)]
Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
point directories. This can be used by fstatat() users to permit the
gathering of attributes on an automount point and also prevent
mass-automounting of a directory of automount points by ls.
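From userspace the flag is passed to fstatat(). A sketch assuming Linux and glibc, which exposes AT_NO_AUTOMOUNT when _GNU_SOURCE is defined; the fallback value 0x800 matches the kernel's uapi definition. The wrapper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_NO_AUTOMOUNT */
#include <sys/stat.h>

#ifndef AT_NO_AUTOMOUNT
#define AT_NO_AUTOMOUNT 0x800   /* Linux value, for older headers */
#endif

/* Gather attributes of 'path' without triggering a terminal automount,
 * e.g. when listing a directory full of automount points.
 * Returns 0 on success, -1 with errno set on failure. */
static int stat_no_automount(const char *path, struct stat *st)
{
    return fstatat(AT_FDCWD, path, st, AT_NO_AUTOMOUNT);
}
```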
David Howells [Fri, 14 Jan 2011 18:45:26 +0000 (18:45 +0000)]
Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk. The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).
The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory. This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.
The ->d_manage() dentry operation:
int (*d_manage)(struct path *path, bool mounting_here);
takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.
->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep. However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.
Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.
follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).
A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()). The new follow_down() calls d_manage() as appropriate. It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage(). follow_down()
ignores automount points so that it can be used to mount on them.
__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself. That would allow the autofs
daemon to continue on in rcu-walk mode.
Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every transit from that directory will cause d_manage() to be
invoked. It can always be set again when necessary.
==========================
WHAT THIS MEANS FOR AUTOFS
==========================
Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.
autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it. This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.
The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up.
This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automount point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().
David Howells [Fri, 14 Jan 2011 18:45:21 +0000 (18:45 +0000)]
Add a dentry op to handle automounting rather than abusing follow_link()
Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation. The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).
This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.
The ->d_automount() operation takes a pointer to the directory to be mounted
upon, which is expected to
provide sufficient data to determine what should be mounted. If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar). If there's a collision with
another automount attempt, NULL should be returned. If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned. In any other case, an error code should be
returned.
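A minimal user-space sketch of how a caller might interpret the ->d_automount() return convention described above. It assumes kernel-style error pointers, re-created here as err_ptr()/is_err()/ptr_err() stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(). struct fake_vfsmount, enum automount_outcome and classify_automount() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline int is_err(const void *ptr)
{
    /* The top MAX_ERRNO addresses encode negative error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

/* Hypothetical stand-in for struct vfsmount. */
struct fake_vfsmount { int id; };

enum automount_outcome {
    AM_MOUNTED,      /* new mount created and added to the namespace */
    AM_COLLISION,    /* NULL: another automount attempt got there first */
    AM_USE_DIRECTLY, /* -EISDIR: walk into the directory itself */
    AM_ERROR         /* any other error code */
};

/* Classify the value returned by a ->d_automount()-style op. */
static enum automount_outcome
classify_automount(const struct fake_vfsmount *mnt, long *err)
{
    assert(err != NULL);
    *err = 0;
    if (mnt == NULL)
        return AM_COLLISION;
    if (is_err(mnt)) {
        *err = ptr_err(mnt);
        return *err == -EISDIR ? AM_USE_DIRECTLY : AM_ERROR;
    }
    return AM_MOUNTED;
}
```

Treating NULL and -EISDIR as distinct non-failure outcomes mirrors the text: a collision means "retry the walk", while -EISDIR means "use this directory as it is".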
The ->d_automount() operation is called with no locks held and may sleep. At
this point the pathwalk algorithm will be in ref-walk mode.
Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints. It returns -EREMOTE if the automount flag was set
but no d_automount() op was supplied; -ELOOP if we've encountered too many
symlinks or mountpoints; -EISDIR if the walk point should be used without
mounting; and 0 if successful. The path will be updated to point to the mounted
filesystem if a successful automount took place.
__follow_mount() is replaced by follow_managed(), which is more generic
(especially with the patch that adds ->d_manage()). This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and, with the next patch, holding processes).
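The return codes of follow_automount() and the looping done by follow_managed() can be modelled roughly as follows. This is a simplified, lock-free user-space sketch, not the fs/namei.c code: struct node, the NODE_* flags, MAX_TRANSITS and follow_managed_model() are all hypothetical, and refcounting of the mounts is omitted. As described, an -EISDIR from the automount step ends the walk successfully at the current directory rather than being reported as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bits standing in for "something is mounted here" and
 * DCACHE_NEED_AUTOMOUNT. */
#define NODE_MOUNTED        0x1u
#define NODE_NEED_AUTOMOUNT 0x2u
#define MAX_TRANSITS        40 /* hypothetical limit on mount crossings */

struct node {
    unsigned flags;
    struct node *mounted_root;       /* root of the fs mounted here, if any */
    int (*automount)(struct node *); /* models ->d_automount(); may be NULL */
};

/* Cross mounts and trigger automounts until we reach a directory that needs
 * no further handling. On success *pathp is updated to the final node. */
static int follow_managed_model(struct node **pathp)
{
    assert(pathp != NULL && *pathp != NULL);
    struct node *n = *pathp;
    int transits = 0;

    while (n->flags & (NODE_MOUNTED | NODE_NEED_AUTOMOUNT)) {
        if (++transits > MAX_TRANSITS)
            return -ELOOP;           /* too many mountpoints */
        if (n->flags & NODE_MOUNTED) {
            n = n->mounted_root;     /* skip over an existing mount */
            continue;
        }
        /* Automount point with nothing mounted on it yet. */
        if (n->automount == NULL)
            return -EREMOTE;         /* flag set but no op supplied */
        int ret = n->automount(n);
        if (ret == -EISDIR)
            break;                   /* use this directory without mounting */
        if (ret < 0)
            return ret;
        /* On success the op marked n as mounted; loop round to cross it. */
    }
    *pathp = n;
    return 0;
}
```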
__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.
follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".
I've also extracted the mount/don't-mount logic from autofs4 and included it
here. It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname. If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.
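The mount/don't-mount rule above boils down to a predicate over the reason for the lookup. This sketch assumes hypothetical INTENT_* bits. The real code works on the kernel's own lookup-intent flags, whose names and values differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits describing why the pathwalk reached this directory. */
#define INTENT_OPEN      0x01u /* open() */
#define INTENT_CREATE    0x02u /* creat() */
#define INTENT_CONTINUE  0x04u /* more components follow: traversal */
#define INTENT_DIRECTORY 0x08u /* chdir/chroot/etc. or a trailing '/' */
#define INTENT_FOLLOW    0x10u /* caller did not ask for no-follow */

static bool should_trigger_automount(unsigned intent)
{
    /* Always mount for open/create, traversal or directory use. */
    if (intent & (INTENT_OPEN | INTENT_CREATE |
                  INTENT_CONTINUE | INTENT_DIRECTORY))
        return true;
    /* A plain stat(): mount only if no-follow was not requested. */
    return (intent & INTENT_FOLLOW) != 0;
}
```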
I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points. This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary. It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.
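The S_AUTOMOUNT to DCACHE_NEED_AUTOMOUNT propagation amounts to copying one bit when an inode is attached to a dentry. The *_model types, the flag values and instantiate_model() are hypothetical stand-ins. The d_lock locking done by the real __d_instantiate() is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the kernel's own values differ. */
#define S_AUTOMOUNT_MODEL           0x1000u
#define DCACHE_NEED_AUTOMOUNT_MODEL 0x2000u

struct inode_model  { unsigned i_flags; };
struct dentry_model { unsigned d_flags; struct inode_model *d_inode; };

/* Attach inode (possibly NULL, i.e. a negative dentry) to d, copying the
 * automount marker from the inode to the dentry. */
static void instantiate_model(struct dentry_model *d, struct inode_model *inode)
{
    assert(d != NULL);
    if (inode != NULL && (inode->i_flags & S_AUTOMOUNT_MODEL))
        d->d_flags |= DCACHE_NEED_AUTOMOUNT_MODEL;
    d->d_inode = inode;
}
```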
[AV: fixed breakage in the case where __follow_mount_rcu() fails and
nameidata_drop_rcu() succeeds in the RCU case of do_lookup(); we need to fall
through to the non-RCU case after that, rather than just returning with an
ungrabbed *path]
Al Viro [Sat, 15 Jan 2011 18:12:53 +0000 (13:12 -0500)]
do_lookup() fix
do_lookup() has a path leading from the LOOKUP_RCU case to non-RCU
crossing of mountpoints, which breaks things badly. If we
hit need_revalidate: and do nothing in there, we need to come
back into the LOOKUP_RCU half of things, not go to done: in the
non-RCU one.
Linus Torvalds [Sat, 15 Jan 2011 21:01:14 +0000 (13:01 -0800)]
Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
Update Pekka's email address in MAINTAINERS
mm/slab.c: make local symbols static
slub: Avoid use of slub_lock in show_slab_objects()
memory hotplug: one more lock on memory hotplug
Linus Torvalds [Sat, 15 Jan 2011 20:45:00 +0000 (12:45 -0800)]
Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: avoid pointless blocked-task warnings
rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, olpc: Add missing Kconfig dependencies
x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timekeeping: Make local variables static
time: Rename misnamed minsec argument of clocks_calc_mult_shift()
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Remove syscall_exit_fields
tracing: Only process module tracepoints once
perf record: Add "nodelay" mode, disabled by default
perf sched: Fix list of events, dropping unsupported ':r' modifier
Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
perf top: Fix annotate segv
perf evsel: Fix order of event list deletion
Linus Torvalds [Sat, 15 Jan 2011 20:29:50 +0000 (12:29 -0800)]
Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: fix missing branch in __error_a
ARM: fix /proc/$PID/stack on SMP
ARM: Fix build regression on SA11x0, PXA, and H720x targets
ARM: 6625/1: use memblock memory regions for "System RAM" I/O resources
ARM: fix wrongly patched constants
ARM: 6624/1: fix dependency for CONFIG_SMP_ON_UP
ARM: 6623/1: Thumb-2: Fix out-of-range offset for Thumb-2 in proc-v7.S
ARM: 6622/1: fix dma_unmap_sg() documentation
ARM: 6621/1: bitops: remove condition code clobber for CLZ
ARM: 6620/1: Change misleading warning when CONFIG_CMDLINE_FORCE is used
ARM: 6619/1: nommu: avoid mapping vectors page when !CONFIG_MMU
ARM: sched_clock: make minsec argument to clocks_calc_mult_shift() zero
ARM: sched_clock: allow init_sched_clock() to be called early
ARM: integrator: fix compile warning in cpu.c
ARM: 6616/1: Fix ep93xx-fb init/exit annotations
ARM: twd: fix display of twd frequency
ARM: udelay: prevent math rounding resulting in short udelays