Dean Roe [Tue, 24 Jan 2006 22:49:43 +0000 (14:49 -0800)]
[IA64-SGI] add sn_feature_sets bit
SGI's prom has added a new feature which avoids an Altix-specific
MCA that can occur with excessive use of ia64_pal_cache_flush. This
patch adds the #define to the sn_feature_sets.h to reflect that bit
is taken.
Takashi Iwai [Tue, 24 Jan 2006 22:30:56 +0000 (14:30 -0800)]
[IA64-SGI] sn_dma_alloc_coherent should use gfp flags
Takashi helped us track down a bad page state bug we thought was coming
from alsa. It turns out we weren't paying attention to the gfp flags
that were passed in to sn_dma_alloc_coherent().
David L Stevens [Tue, 24 Jan 2006 21:06:39 +0000 (13:06 -0800)]
[IPV6] MLDv2: fix change records when transitioning to/from inactive
The following patch fixes these problems in MLDv2:
1) Add/remove "delete" records for sending change reports when
addition of a filter results in that filter transitioning to/from
inactive. [same as recent IPv4 IGMPv3 fix]
2) Remove 2 redundant "group_type" checks (can't be IPV6_ADDR_ANY
within that loop, so checks are always true)
3) change an is_in() "return 0" to "return type == MLD2_MODE_IS_INCLUDE".
It should always be "0" to get here, but it improves code locality
to not assume it, and if some race allowed otherwise, doing
the check would return the correct result.
Jerome Borsboom [Tue, 24 Jan 2006 20:57:19 +0000 (12:57 -0800)]
[AF_KEY]: no message type set
When returning a message to userspace in reply to a SADB_FLUSH or
SADB_X_SPDFLUSH message, the type was not set for the returned PFKEY
message. The patch below corrects this problem.
Keith Owens [Tue, 24 Jan 2006 01:31:26 +0000 (12:31 +1100)]
[IA64] Set the correct default OS status in the MCA handler
sos->os_status is set to a default value of IA64_MCA_COLD_BOOT for an
MCA, but then is incorrectly overwritten with IA64_MCA_SAME_CONTEXT (0).
This makes SAL think that all MCAs have been recovered.
Nate Diller [Tue, 24 Jan 2006 09:09:14 +0000 (10:09 +0100)]
[BLOCK] elevator: allow default scheduler to potentially be modular
Jens has decided that allowing the default scheduler to be a module is
a bug, and should not be allowed under kconfig. However, I find that
scenario useful for debugging, and wish for the kernel to be able to
handle this situation without OOPSing, if I enable such an option in
the .config directly. This patch dynamically checks for the presence
of the compiled-in default, and falls back to no-op, emitting a
suitable error message, when the default is not available
Tested for a range of boot options on 2.6.16-rc1-mm2.
Nate Diller [Tue, 24 Jan 2006 09:07:58 +0000 (10:07 +0100)]
[BLOCK] elevator: default choice selection
My previous default iosched patch did a poor job dealing with the
'elevator=' boot-time option. The old behavior falls back to the
compiled-in default if the requested one is not registered at boot
time. This patch dynamically evaluates which default
to use, and emits a suitable error message when the requested scheduler
is not available. It also does the 'as' -> 'anticipatory' conversion
before elevator registration, which along with a modified registration
function, allows it to correctly indicate which default scheduler is
in use.
Tested for a range of boot options on 2.6.16-rc1-mm2.
On my latest laptop, I've had occasional PHY dead on wakeup from
sleep... the PHY would be totally unresponsive even to toggling the hard
reset line until the machine is powered down... Looking closely at the
code, I found some possible issues in the way we setup the MDIO lines
during suspend along with slight divergences from what Darwin does when
resetting it that may explain the problem. That patch change these and
the problem appear to be gone for me at least... I also fixed an mdelay
-> msleep while I was at it to the pmac feature code that is called
when toggling the PHY reset line since sungem doesn't call it in an
atomic context anymore.
Russell King [Sat, 21 Jan 2006 23:03:28 +0000 (23:03 +0000)]
[SERIAL] Make uart_port flags a bitwise type
Same reasoning as commit 747c8a55946ed037bf7d62454c3c599c02af2262
but this time we're making uart_port flags a bitwise type - not
all of these flags correspond with the old ASYNC_ flags, so there
is the possibility for bugs if the wrong ASYNC_* constants are
used. Always use UPF_* constants for uart_port->flags.
Russell King [Sat, 21 Jan 2006 22:54:06 +0000 (22:54 +0000)]
[SERIAL] Fix UPF_ flag usage with uart_info->flags
The previous change found a bug in the serial SAK handling - because
we were looking for UPF_SAK set in uart_info->flags, we would never
raise a SAK condition. UPF_SAK is in uart_port->flags.
Russell King [Sat, 21 Jan 2006 22:50:36 +0000 (22:50 +0000)]
[SERIAL] Make uart_info flags a bitwise type
The potential for confusing the flags is fairly high. Make
uart_info's flags a bitwise type so sparse can check that the
right flag definitions are used with the right structure.
Alan Cox [Sat, 21 Jan 2006 14:59:12 +0000 (14:59 +0000)]
[SERIAL] 8250 serial console fixes
This patch resolves most of the problems with an SMP serial console race
with output via the tty path. At the end of the serial console print we
force enable the tx int in case we clobbered the tx interrupt status
racing between the console and tty output. That way the extra tx
interrupt causes the transmit path to restart not hang.
It also makes the serial console printk use the FIFO. This is neccessary
because some remote management devices fake serial console with FIFO and
are confused into sending one packet per character over ethernet when we
stall rather than filling the FIFO.
In order to preserve existing reliability semantics the function waits
for the serial queue to completely empty before returning.
Both of these problems were identified by a Red Hat partner.
are busted. This is most easily seen with an X application
that uses sub-second select/poll timeout such as emacs. You
hit a key and it takes a second or so before the app responds.
The two ROUND_UP() calls upon entry are using {tv,ts}_sec where it
should instead be using {tv_usec,ts_nsec}, which perfectly explains
the observed incorrect behavior.
Adrian Bunk [Fri, 20 Jan 2006 00:25:37 +0000 (01:25 +0100)]
[CPUFREQ] X86_GX_SUSPMOD must depend on PCI
This patch fixes the following compile error:
...
CC arch/i386/kernel/cpu/cpufreq/gx-suspmod.o
arch/i386/kernel/cpu/cpufreq/gx-suspmod.c: In function 'gx_detect_chipset':
arch/i386/kernel/cpu/cpufreq/gx-suspmod.c:193: error: implicit declaration of function 'pci_match_id'
arch/i386/kernel/cpu/cpufreq/gx-suspmod.c:193: warning: comparison between pointer and integer
make[3]: *** [arch/i386/kernel/cpu/cpufreq/gx-suspmod.o] Error 1
John Hawkes [Thu, 19 Jan 2006 07:46:53 +0000 (23:46 -0800)]
[IA64] eliminate softlockup warning
Fix an unnecessary softlockup watchdog warning in the ia64
uncached_build_memmap() that occurs occasionally at 256p and always at
512p. The problem occurs at boot time.
Al Viro [Thu, 19 Jan 2006 12:57:01 +0000 (12:57 +0000)]
[ARM] safer handling of syscall table padding
ARM entry-common.S needs to know syscall table size; in itself that would
not be a problem, but there's an additional constraint - some of the
instructions using it want a constant that would be a multiple of 4.
So we have to pad syscall table with sys_ni_syscall and that's where
the trouble begins. .rept pseudo-op wants a constant expression for
number of repetitions and subtraction of two labels (before and after
syscall table) doesn't always get simplified to constant early enough
for .rept. If labels end up in different frags, we lose. And while
the frag size is large enough (slightly below 4Kb), the syscall table
is about 1/3 of that. We used to get away with that, but the recent
changes had been enough to trigger the breakage.
Proper fix is simple: have a macro (CALL(x)) to populate the table
instead of using explicit .long x and the first time we include calls.S
have it defined to .equ NR_syscalls,NR_syscalls+1. Then we can find
the proper amount of padding on the first inclusion simply by looking
at NR_syscalls at that time. And that will be constant, no matter what.
Moreover, the same trick kills the need of having an estimate of padded
NR_syscalls - it will be calculated for free at the same time.
Russell King [Thu, 19 Jan 2006 12:32:54 +0000 (12:32 +0000)]
[ARM] Remove CONFIG_BROKEN=y from defconfigs
Remove CONFIG_BROKEN=y from the ARM defconfigs, and update with
the appropriate changes. This results in only some unselected
configuration symbols being removed - hence no material effect
on the configuration.
Alan Cox [Thu, 19 Jan 2006 01:44:13 +0000 (17:44 -0800)]
[PATCH] EDAC: core EDAC support code
This is a subset of the bluesmoke project core code, stripped of the NMI work
which isn't ready to merge and some of the "interesting" proc functionality
that needs reworking or just has no place in kernel. It requires no core
kernel changes except the added scrub functions already posted.
The goal is to merge further functionality only after the core code is
accepted and proven in the base kernel, and only at the point the upstream
extras are really ready to merge.
This converts EDAC to sysfs and is the final chunk neccessary before EDAC
has a stable user space API and can be considered for submission into the
base kernel.
Alan Cox [Thu, 19 Jan 2006 01:44:07 +0000 (17:44 -0800)]
[PATCH] EDAC: atomic scrub operations
EDAC requires a way to scrub memory if an ECC error is found and the chipset
does not do the work automatically. That means rewriting memory locations
atomically with respect to all CPUs _and_ bus masters. That means we can't
use atomic_add(foo, 0) as it gets optimised for non-SMP
This adds a function to include/asm-foo/atomic.h for the platforms currently
supported which implements a scrub of a mapped block.
It also adjusts a few other files include order where atomic.h is included
before types.h as this now causes an error as atomic_scrub uses u32.
David Woodhouse [Thu, 19 Jan 2006 01:44:05 +0000 (17:44 -0800)]
[PATCH] Add pselect/ppoll system call implementation
The following implementation of ppoll() and pselect() system calls
depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
thread_info.
These system calls have to change the signal mask during their
operation, and signal handlers must be invoked using the new, temporary
signal mask. The old signal mask must be restored either upon successful
exit from the system call, or upon returning from the invoked signal
handler if the system call is interrupted. We can't simply restore the
original signal mask and return to userspace, since the restored signal
mask may actually block the signal which interrupted the system call.
The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
path to trap into do_signal() just as TIF_SIGPENDING does, and by
causing do_signal() to use the saved signal mask instead of the current
signal mask when setting up the stack frame for the signal handler -- or
by causing do_signal() to simply restore the saved signal mask in the
case where there is no handler to be invoked.
The first patch implements the sys_pselect() and sys_ppoll() system
calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
#ifdef should go away in time when all architectures have implemented
it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
kernel (in the -mm tree), and the third patch then removes the
arch-specific implementations of sys_rt_sigsuspend() and replaces them
with generic versions using the same trick.
The fourth and fifth patches, provided by David Howells, implement
TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
adds the syscalls to the i386 syscall table.
This patch:
Add the pselect() and ppoll() system calls, providing core routines usable by
the original select() and poll() system calls and also the new calls (with
their semantics w.r.t timeouts).
Jeff Dike [Thu, 19 Jan 2006 01:44:02 +0000 (17:44 -0800)]
[PATCH] uml: add TIF_RESTORE_SIGMASK support
Add support for TIF_RESTORE_SIGMASK. I copy the i386 handling of the flag.
sys_sigsuspend is also changed to follow i386.
Also a bit of cleanup -
turn an if into a switch
get rid of a couple more emacs formatting comments
David Howells [Thu, 19 Jan 2006 01:44:00 +0000 (17:44 -0800)]
[PATCH] Handle TIF_RESTORE_SIGMASK for i386
Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled:
[PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc
[PATCH] 3/3 Generic sys_rt_sigsuspend
It does the following:
(1) Declares TIF_RESTORE_SIGMASK for i386.
(2) Invokes it over to do_signal() when TIF_RESTORE_SIGMASK is set.
(3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved
in current->saved_sigmask.
(4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead.
(5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK
rather than attempting to fudge the return registers.
(6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping
intrinsically.
(7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or
-EFAULT rather than true/false to be consistent with the rest of the
kernel.
Due to the fact do_signal() is then only called from one place:
(8) Makes do_signal() no longer have a return value is it was just being
ignored; force_sig() takes care of this.
(9) Discards the old sigmask argument to do_signal() as it's no longer
necessary.
(10) Makes do_signal() static.
(11) Marks the second argument to do_notify_resume() as unused. The unused
argument should remain in the middle as the arguments are passed in as
registers, and the ordering is specific in entry.S
Given the way do_signal() is now no longer called from sys_{,rt_}sigsuspend(),
they no longer need access to the exception frame, and so can just take
arguments normally.
David Woodhouse [Thu, 19 Jan 2006 01:43:57 +0000 (17:43 -0800)]
[PATCH] Generic sys_rt_sigsuspend()
The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
sys_rt_sigsuspend() instead of duplicating it for each architecture. This
provides such an implementation and makes arch/powerpc use it.
It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.
Ulrich Drepper [Thu, 19 Jan 2006 01:43:53 +0000 (17:43 -0800)]
[PATCH] vfs: *at functions: core
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name. These functions, openat etc, have been discussed on numerous
occasions. They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.
We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic. But this code is rather expensive. Here are some
results (similar to what Jim Meyering posted before).
The test creates a deep directory hierarchy on a tmpfs filesystem. Then
rm -fr is used to remove all directories. Without syscall support I get
this:
real 0m31.921s
user 0m0.688s
sys 0m31.234s
With syscall support the results are much better:
real 0m20.699s
user 0m0.536s
sys 0m20.149s
The interfaces are for obvious reasons currently not much used. But they'll
be used. coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them. I expect a patch to make follow soon. Every program which is walking
the filesystem tree will benefit.
find_exported_dentry contains two duplicate loops to find an alias that the
acceptable callback likes. Split this out to a new helper and switch from
list_for_each to list_for_each_entry to make it more readable.
David Shaw [Thu, 19 Jan 2006 01:43:51 +0000 (17:43 -0800)]
[PATCH] knfsd: Provide missing NFSv2 part of patch for checking vfs_getattr.
A recent patch which checked the return status of vfs_getattr in nfsd,
completely missed the nfsproc.c (NFSv2) part. Here is it.
This patch moved the call to vfs_getattr from the xdr encoding (at which point
it is too late to return an error) to the call handling. This means several
calls to vfs_getattr are needed in nfsproc.c. Many are encapsulated in
nfsd_return_attrs and nfsd_return_dirop.
Al Viro [Thu, 19 Jan 2006 01:43:48 +0000 (17:43 -0800)]
[PATCH] nfsd4_lock() returns bogus values to clients
missing nfserrno() in default case of a switch by return value of
posix_lock_file(); as the result we send negative host-endian to clients that
expect positive network-endian, preferably mentioned in RFC... BTW, that case
is not impossible - posix_lock_file() can return -ENOLCK and we do not handle
that one explicitly.
Al Viro [Thu, 19 Jan 2006 01:43:47 +0000 (17:43 -0800)]
[PATCH] NFSERR_SERVERFAULT returned host-endian
->rp_status is network-endian and nobody byteswaps it before sending to
client; putting NFSERR_SERVERFAULT instead of nfserr_serverfault in there is
not nice...
J. Bruce Fields [Thu, 19 Jan 2006 01:43:36 +0000 (17:43 -0800)]
[PATCH] nfsd4: don't create on open that fails due to ERR_GRACE
In an earlier patch (commit b648330a1d741d5df8a5076b2a0a2519c69c8f41) I noted
that a too-early grace-period check was preventing us from bumping the
sequence id on open. Unfortunately in that patch I stupidly moved the
grace-period check back too far, so now an open for create can succesfully
create the file while still returning ERR_GRACE.
The correct place for that check is after we've set the open_owner and handled
any replays, but before we actually start mucking with the filesystem.
J. Bruce Fields [Thu, 19 Jan 2006 01:43:32 +0000 (17:43 -0800)]
[PATCH] nfsd4: no replays on unconfirmed owners
We shouldn't check for replays until after checking whether the open owner is
confirmed. Clients are allowed to reuse openowners without bumping the seqid.
J. Bruce Fields [Thu, 19 Jan 2006 01:43:30 +0000 (17:43 -0800)]
[PATCH] nfsd4: handle replays of failed open reclaims
We need to make sure open reclaims are marked confirmed immediately so that we
can handle replays even if they fail (e.g. with a seqid-incrementing error).
(See 8.1.8.)
J. Bruce Fields [Thu, 19 Jan 2006 01:43:29 +0000 (17:43 -0800)]
[PATCH] nfsd4: recovery lookup dir check
Make sure we get a directory when we look up the recovery directory.
Thanks to Christoph Hellwig for the bug report.
Based on feedback from Christoph and others, we may remove the need for this
lookup and just pass in a file descriptor from userspace instead, and/or
completely move the directory handling to userspace. For now we're just
fixing the obvious bugs.
Kevin Coffman [Thu, 19 Jan 2006 01:43:25 +0000 (17:43 -0800)]
[PATCH] svcrpc: gss: server context init failure handling
We require the server's gssd to create a completed context before asking the
kernel to send a final context init reply. However, gssd could be buggy, or
under some bizarre circumstances we might purge the context from our cache
before we get the chance to use it here.
Handle this case by returning GSS_S_NO_CONTEXT to the client.
Also move the relevant code here to a separate function rather than nesting
excessively.
Andy Adamson [Thu, 19 Jan 2006 01:43:24 +0000 (17:43 -0800)]
[PATCH] svcrpc: gss: handle the GSS_S_CONTINUE
Kerberos context initiation is handled in a single round trip, but other
mechanisms (including spkm3) may require more, so we need to handle the
GSS_S_CONTINUE case in svcauth_gss_accept. Send a null verifier.
J. Bruce Fields [Thu, 19 Jan 2006 01:43:19 +0000 (17:43 -0800)]
[PATCH] nfsd4: rename lk_stateowner
One of the things that's confusing about nfsd4_lock is that the lk_stateowner
field could be set to either of two different lockowners: the open owner or
the lock owner. Rename to lk_replay_owner and add a comment to make it clear
that it's used for whichever stateowner has its sequence id bumped for replay
detection.
J. Bruce Fields [Thu, 19 Jan 2006 01:43:18 +0000 (17:43 -0800)]
[PATCH] nfsd4: fix nfsd4_lock cleanup on failure
release_state_owner also puts the lock owner on the close_lru. There's no
need for that, though; replays of the failed lock would be handled by the
openowner not the lockowner.
Also consolidate the cleanup a bit, fixing leaks that can happen if errors
occur between the time a new lock owner is allocated and the lock is done.
Remove a comment and dprintk that look a little redundant.
J. Bruce Fields [Thu, 19 Jan 2006 01:43:16 +0000 (17:43 -0800)]
[PATCH] svcrpc: save and restore the daddr field when request deferred
The server code currently keeps track of the destination address on every
request so that it can reply using the same address. However we forget to do
that in the case of a deferred request. Remedy this oversight. >From folks
at PolyServe.
NeilBrown [Thu, 19 Jan 2006 01:43:14 +0000 (17:43 -0800)]
[PATCH] nfsd: remove inline from a couple of large NFS functions
These are both called from two places close together. I could rearrange that
code so there is only one call site, but just removing the 'inline' is
probably best.
YAMAMOTO Takashi [Thu, 19 Jan 2006 01:43:13 +0000 (17:43 -0800)]
[PATCH] nfsd: check error status from nfsd_sync_dir
Change nfsd_sync_dir to return an error if ->sync fails, and pass that error
up through the stack. This involves a number of rearrangements of error
paths, and care to distinguish between Linux -errno numbers and NFSERR
numbers.
In the 'create' routines, we continue with the 'setattr' even if a previous
sync_dir failed.
This patch is quite different from Takashi's in a few ways, but there is still
a strong lineage.
Roman Zippel [Thu, 19 Jan 2006 01:43:10 +0000 (17:43 -0800)]
[PATCH] hfs: set correct create date for links
HFS+ also requires the correct creation date so recent version of OS X
recognize it as link.
Improve link handling:
- if something is wrong with the link, ignore the link attribute and treat
it as regular file (this also fixes a missing unlock during lookup).
- check for incorrect link counts during unlink.
Roman Zippel [Thu, 19 Jan 2006 01:43:09 +0000 (17:43 -0800)]
[PATCH] hfs: set correct ctime
Read the correct ctime from disk (it was written but never read for some
reason). Read also creation date, which is used in the next patch. (Problem
found by Olivier Castan <[email protected]>)