Keith Owens [Tue, 11 Apr 2006 05:54:07 +0000 (22:54 -0700)]
[PATCH] Reinstate const in next_thread()
Before commit 47e65328a7b1cdfc4e3102e50d60faf94ebba7d3, next_thread() took
a const task_t. Reinstate the const qualifier: getting the next thread
never changes the current thread.
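A minimal sketch of why the qualifier is safe, assuming a much-simplified task structure (the `struct task` layout and `count_threads()` helper are hypothetical, not the kernel's): the lookup only reads through its argument, so it can accept a pointer to const.

```c
#include <stddef.h>

/* Hypothetical, simplified task: threads of one group form a ring. */
struct task {
    int pid;
    struct task *thread_next;   /* next thread in the same thread group */
};

/*
 * Finding the successor only reads *p, so the parameter can be
 * const-qualified; callers holding a const pointer need no cast.
 */
static struct task *next_thread(const struct task *p)
{
    return p->thread_next;
}

/* Count the threads in a group by walking the ring exactly once. */
static int count_threads(const struct task *leader)
{
    int n = 1;
    const struct task *t;

    for (t = next_thread(leader); t != leader; t = next_thread(t))
        n++;
    return n;
}
```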
Ben Dooks [Tue, 11 Apr 2006 05:54:02 +0000 (22:54 -0700)]
[PATCH] leds: re-layout include/linux/leds.h
Lay out the structure definitions in include/linux/leds.h to be aligned as
much as possible. Also minor updates to the comments to make them more
concise.
Andrew Morton [Tue, 11 Apr 2006 05:53:58 +0000 (22:53 -0700)]
[PATCH] timer initialisation fix
We need the boot CPU's tvec_bases[] entry to be initialised super-early in
boot, for early_serial_setup(). That runs within setup_arch(), before even
per-cpu areas are initialised.
The patch changes tvec_bases to use compile-time initialisation, and adds a
separate array `tvec_base_done' to keep track of which CPU has had its
tvec_bases[] entry initialised (because we can no longer use the zeroness of
that tvec_bases[] entry to determine whether it has been initialised).
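The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `struct tvec_base` is reduced to one field, and `boot_tvec_base`, `other_bases`, `init_timers_cpu()` and the value of `NR_CPUS` are made up for the example. Only `tvec_bases` and `tvec_base_done` are names taken from the description above.

```c
#include <stdbool.h>

#define NR_CPUS 4   /* illustrative value */

/* Hypothetical, much-reduced stand-in for the per-CPU timer base. */
struct tvec_base {
    unsigned long timer_jiffies;
};

/* The boot CPU's base exists at compile time, so it is usable very early. */
static struct tvec_base boot_tvec_base = { .timer_jiffies = 0 };
static struct tvec_base other_bases[NR_CPUS];

/* Slot 0 is statically set, so "entry is zero" no longer means "not set up". */
static struct tvec_base *tvec_bases[NR_CPUS] = { &boot_tvec_base };

/* Separate bookkeeping: which CPUs have had their entry initialised. */
static bool tvec_base_done[NR_CPUS];

static void init_timers_cpu(int cpu, unsigned long now)
{
    if (!tvec_base_done[cpu]) {
        if (cpu != 0)
            tvec_bases[cpu] = &other_bases[cpu];
        tvec_base_done[cpu] = true;
    }
    tvec_bases[cpu]->timer_jiffies = now;
}
```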
Some string functions were safely overrideable in lib/string.c, but their
corresponding declarations in linux/string.h were not. Correct this, and
make strcspn overrideable.
Odds of someone wanting to do optimized assembly of these are small, but
for the sake of cleanliness, might as well bring them into line with the
rest of the file.
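As a sketch of what "overrideable" means here: both the declaration and the generic definition sit behind the same guard macro, so an architecture that defines the macro can supply its own version. The function is named `generic_strcspn` (a made-up name) so the example does not collide with the C library, and the body is an ordinary reference implementation, not a copy of lib/string.c.

```c
#include <stddef.h>

/*
 * An architecture with an optimised version would define
 * __HAVE_ARCH_STRCSPN in its headers; the declaration and the
 * generic definition below are then both skipped.
 */
#ifndef __HAVE_ARCH_STRCSPN
size_t generic_strcspn(const char *s, const char *reject);
#endif

#ifndef __HAVE_ARCH_STRCSPN
/* Length of the initial part of s containing no character from reject. */
size_t generic_strcspn(const char *s, const char *reject)
{
    const char *p, *r;
    size_t count = 0;

    for (p = s; *p != '\0'; ++p) {
        for (r = reject; *r != '\0'; ++r) {
            if (*p == *r)
                return count;
        }
        ++count;
    }
    return count;
}
#endif
```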
While cleaning up parisc_ksyms.c earlier, I noticed that strpbrk wasn't
being exported from lib/string.c. Investigating further, I noticed a
changeset that removed its export and added it to _ksyms.c on a few more
architectures. The justification was that "other arches do it."
I think this is wrong, since no architecture currently defines
__HAVE_ARCH_STRPBRK, there's no reason for any of them to be exporting it
themselves. Therefore, consolidate the export to lib/string.c.
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree. But it looks a bit messy.
The SGI SN2 (ia64) system requires 1024 nodes, and the number of nodes is
already changeable by config there. The suitable number of nodes may also
change in the future on other architectures, so I made the number of nodes
configurable.
This patch set defines just a default value for each arch that needs multiple
nodes, except ia64. It is easy to make it configurable if necessary.
On ia64 the number of nodes can already be configured in the generic ia64 and
SN2 configs, but NODES_SHIFT is defined for DIG64 and HP's machines too. So I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT,
which is simpler.
See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2
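A rough sketch of how one configurable shift can drive the node count. `CONFIG_NODES_SHIFT` is set by hand here purely for illustration (it would normally come from the kernel configuration), and `node_id_valid()` is a hypothetical helper. A shift of 10 gives the 1024 nodes mentioned for SN2.

```c
/* Set by hand for the example; normally generated from the kernel config. */
#define CONFIG_NODES_SHIFT 10

#ifdef CONFIG_NODES_SHIFT
#define NODES_SHIFT CONFIG_NODES_SHIFT
#else
#define NODES_SHIFT 0           /* single-node default */
#endif

/* The maximum number of nodes is always a power of two. */
#define MAX_NUMNODES (1 << NODES_SHIFT)

/* Hypothetical helper: is nid within the configured range? */
static int node_id_valid(int nid)
{
    return nid >= 0 && nid < MAX_NUMNODES;
}
```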
Dave Jones [Tue, 11 Apr 2006 05:53:51 +0000 (22:53 -0700)]
[PATCH] S390: fix implicit declaration of (un)likely.
include/asm/atomic.h:94: warning: implicit declaration of function 'unlikely'
include/asm/atomic.h:97: warning: implicit declaration of function 'likely'
The proc_mkdir calls in the dasd driver are not checked for NULL pointers. Add
code to check the pointers and bail out if one of the proc entries could not
be created.
[PATCH] s390: fail-fast requests on quiesced devices
Using the fail-fast flag in i/o requests on a dasd disk which has been
quiesced leads to kernel panics. Modify the request start function to only
work on requests in a valid state.
The dasd driver sometimes prints the misleading message "Can't offline dasd
device with open count = 0". The reason why it can't offline the device in
this case is that the device is still in the startup phase. Print a more
meaningful message.
Debugging events in cio_trace/hex_ascii are truncated for some trace entries.
Increase trace event size to 16 bytes to cover longer text events, make
CIO_HEX_EVENT an inline function that loops to cover bigger hex events.
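The looping approach might look roughly like this. The trace buffer, `trace_event()` and `hex_event()` are hypothetical user-space stand-ins for the s390 debug facility, not its real API; only the 16-byte entry size comes from the description above.

```c
#include <string.h>

#define TRACE_ENTRY_SIZE 16   /* per-entry size named in the text */
#define MAX_ENTRIES 8         /* illustrative capacity */

/* Hypothetical in-memory trace buffer. */
static unsigned char trace_log[MAX_ENTRIES][TRACE_ENTRY_SIZE];
static int trace_count;

/* Record one fixed-size entry; shorter data is zero-padded. */
static void trace_event(const void *data, size_t len)
{
    if (trace_count >= MAX_ENTRIES || len > TRACE_ENTRY_SIZE)
        return;
    memset(trace_log[trace_count], 0, TRACE_ENTRY_SIZE);
    memcpy(trace_log[trace_count], data, len);
    trace_count++;
}

/* Split a larger hex event into as many 16-byte entries as needed. */
static inline void hex_event(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length > 0) {
        size_t chunk = length > TRACE_ENTRY_SIZE ? TRACE_ENTRY_SIZE : length;

        trace_event(p, chunk);
        p += chunk;
        length -= chunk;
    }
}
```

An inline function, unlike a macro, evaluates its arguments once and lets the loop use ordinary local variables.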
[PATCH] s390: wrong return codes in cio_ignore_proc_init()
cio_ignore_proc_init() returns 1 in case of success and 0 in case of failure.
The caller tests for != 0, so better return 0 in case of success and -ENOENT
in case of failure.
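A small sketch of the convention, using hypothetical stand-ins (`create_entry()`, `ignore_proc_init()` and `subsystem_init()` are invented for the example and take a flag to simulate failure): the init function returns 0 on success and a negative errno otherwise, which is what a caller testing for `!= 0` expects.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for creating a proc entry; NULL means failure. */
static void *create_entry(int should_succeed)
{
    static int dummy;

    return should_succeed ? &dummy : NULL;
}

/* Returns 0 on success and a negative errno on failure. */
static int ignore_proc_init(int should_succeed)
{
    void *entry = create_entry(should_succeed);

    if (!entry)
        return -ENOENT;
    return 0;
}

/* The caller treats any non-zero result as an error. */
static int subsystem_init(int should_succeed)
{
    int ret = ignore_proc_init(should_succeed);

    if (ret != 0)
        return ret;     /* propagate the failure */
    return 0;
}
```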
[PATCH] uml: avoid warnings for different names for an unsigned quadword
Since on some 64-bit systems __u64 is rightfully defined to unsigned long and
GCC recognizes anyway unsigned long and unsigned long long as different, fix
some types back to being unsigned long long to avoid warnings and errors (for
prototype mismatch) on those systems.
Thanks to the report by Wesley Emeneker wesleyemeneker (at) google (dot) com
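To illustrate the point that the compiler keeps the two types apart even where they have the same width, here is a small sketch using C11 `_Generic` (the `TYPE_KIND` macro and helper functions are made up for the example):

```c
/* Tags for the types the example distinguishes. */
enum type_kind { KIND_OTHER = 0, KIND_ULONG = 1, KIND_ULONGLONG = 2 };

/* C11 _Generic selects on the static type of its operand. */
#define TYPE_KIND(x) _Generic((x),              \
        unsigned long:      KIND_ULONG,         \
        unsigned long long: KIND_ULONGLONG,     \
        default:            KIND_OTHER)

static int kind_of_unsigned_long(void)
{
    unsigned long v = 0;

    return TYPE_KIND(v);
}

static int kind_of_unsigned_long_long(void)
{
    unsigned long long v = 0;

    return TYPE_KIND(v);
}
```

The same distinction is why a prototype declared with one type and a definition written with the other are reported as mismatched.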
The call to local_save_flags seems bogus since it is followed by
local_irq_restore, and it's intended to lock the list from concurrent
mconsole_interrupt invocations.
Switch this proc from storing 4k of data (a whole path) on the stack to
keeping it on the heap.
Maybe it's not called in process context but only in early boot context (where
in UML you have a normal process stack on the host) but just to be safe, fix
it.
While at it some little readability simplifications.
- Some bugs come from the conversion to os-Linux (open() doesn't follow the
kernel -errno return convention, while the old code called os_open_file()
which followed it). This caused the wrong return code to be printed.
- Then be more precise about what happened and do some whitespace fixes.
[PATCH] uml: fix hang on run_helper() failure on uml_net
Fix a hang on a pipe when run_helper() fails when called by change_tramp()
(i.e. when calling uml_net) - reproduced the bug and verified this fixes it.
Make sparse checker work for userspace files - it normally gets -nostdinc
separately, so avoid having it for userspace files. Also, add -D$(SUBARCH)
for multiarch hosts (i.e. AMD64 with compatibility headers).
It works; the only problem is a few bogus warnings for system headers, but
there are not too many of them.
Noticed this for a compilation-time warning, so I'm fixing it even for TT mode
- this is not put_user, but copy_to_user, so we need a pointer to sp, not sp
itself (we're trying to write the word pointed to by the "sp" var.).
[PATCH] uml: fix 2 harmless cast warnings for 64-bit
Fix two harmless warnings in 64-bit compilation (the 2nd doesn't trigger for
now because of a missing __attribute((format)) for cow_printf, but next
patches fix that).
[PATCH] uml: safe migration path to the correct V3 COW format
- Correct the layout of all header versions - make them all well-specified
for any external event. As we don't have 1-byte or 2-byte wide fields, the
32-bit layout (historical one) has no extra padding, so we can safely add
__attribute__((packed)).
- Add detection and reading of the broken 64-bit COW format which has been
around for a while - to allow safe migration to the correct 32-bit format.
Safe detection is possible, thanks to some luck with the existing format,
and it works in practice.
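The effect of the packed attribute can be sketched with a deliberately tiny header. This is NOT the real COW layout: `struct hdr_natural` and `struct hdr_packed` are hypothetical and contain just a 32-bit field followed by a 64-bit one, which is enough to show where a 64-bit ABI may insert padding. `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: many 64-bit ABIs pad 4 bytes before the 64-bit field. */
struct hdr_natural {
    uint32_t version;
    uint64_t size;
};

/* Packed layout: no padding, so the offsets are the same on every ABI. */
struct hdr_packed {
    uint32_t version;
    uint64_t size;
} __attribute__((packed));

/* Bytes of padding the natural layout puts in front of 'size'. */
static size_t natural_padding(void)
{
    return offsetof(struct hdr_natural, size) - sizeof(uint32_t);
}
```

On an ABI that aligns 64-bit fields to 4 bytes, `natural_padding()` is 0 and both layouts agree, which is consistent with the text's remark that the historical 32-bit layout has no extra padding.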
[PATCH] uml: make 64-bit COW files compatible with 32-bit ones
This is the minimal fix to make 64-bit UML binaries create 32-bit compatible
COW files and read them. I've indeed tested that current code doesn't do this
- the code gets SIGFPE for a division by a value read at the wrong place,
where 0 is found.
Jeff Dike [Tue, 11 Apr 2006 05:53:28 +0000 (22:53 -0700)]
[PATCH] uml: memory hotplug cleanups
Change memory hotplug to use GFP_NOWAIT instead of GFP_ATOMIC, so that it
will grab memory without sleeping, but doesn't try to use the emergency
pools.
A small list initialization suggested by Daniel Phillips - don't initialize
lists which are just about to be list_add-ed.
Jeff Dike [Tue, 11 Apr 2006 05:53:26 +0000 (22:53 -0700)]
[PATCH] UML: TLS fixlets
Two small TLS fixes -
arch/um/os-Linux/sys-i386/tls.c uses errno and -E* so it should include
errno.h
__setup_host_supports_tls returns 1, but as an initcall, it should return 0
Remove multi-exported symbols from arch/m32r/kernel/m32r_ksyms.c.
WARNING: vmlinux: 'enable_irq' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'disable_irq' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'disable_irq_nosync' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'synchronize_irq' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'memchr' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strstr' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'memscan' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'memcmp' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'memmove' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strnlen' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strchr' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strncmp' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strcmp' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strncat' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strcat' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strncpy' exported twice. Previous export was in vmlinux
WARNING: vmlinux: 'strcpy' exported twice. Previous export was in vmlinux
[PATCH] m32r: security fix of {get,put}_user macros
Update {get,put}_user macros for m32r kernel.
- Modify get_user to use __get_user_asm macro, instead of __get_user_x macro.
- Remove arch/m32r/lib/{get,put}user.S.
- Some cosmetic updates.
I would like to thank NIIBE Yutaka for reporting the m32r kernel's
security problem in the {get,put}_user macros.
There was no address checking for user space access in the {get,put}_user macros.
;-)
[PATCH] m32r: Fix cpu_possible_map and cpu_present_map initialization for SMP kernel
This patch fixes a boot problem of the m32r SMP kernel 2.6.16-rc1-mm3 or
later.
In this patch, cpu_possible_map is statically initialized, and cpu_present_map
is also copied from cpu_possible_map in smp_prepare_cpus(), because the m32r
architecture has not supported CPU hotplug yet.
I've encountered two problems with 2.6.16 and newer kernels on my API CS20
(dual 833MHz Alpha 21264b processors). The first is the kernel OOPSing
because of a NULL pointer dereference while trying to populate SysFS with the
CPU information. The other is that only one processor was being brought up.
I've included a small Alpha-specific patch that fixes both problems.
The first problem was caused by the CPUs never being properly registered using
register_cpu(), the way it's done on other architectures.
The second problem has to do with the removal of hwrpb_cpu_present_mask in
arch/alpha/kernel/smp.c. In setup_smp() in the 2.6.15 kernel sources,
hwrpb_cpu_present_mask has a bit set for each processor that is probed, and
afterwards cpu_present_mask is set to the cpumask for the boot CPU. In the
same function of the same file in the 2.6.16 sources, instead of
hwrpb_cpu_present_mask being set, cpu_possible_map is updated for each probed
CPU. cpu_present_mask is still set to the cpumask of the boot CPU afterwards.
The problem lies in include/asm-alpha/smp.h, where cpu_possible_map is
#define'd to be cpu_present_mask.
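The interaction described above can be reproduced in a few lines. This is a simplified model, not the alpha code: `cpumask_t` is reduced to a single word and `setup_smp_sketch()` is a made-up stand-in for setup_smp().

```c
typedef unsigned long cpumask_t;    /* simplified: one bit per CPU */

static cpumask_t cpu_present_mask;

/* The alias described in the text: both names refer to one variable. */
#define cpu_possible_map cpu_present_mask

static void setup_smp_sketch(int boot_cpu, int nr_probed)
{
    int cpu;

    /* Record every probed CPU as possible. */
    for (cpu = 0; cpu < nr_probed; cpu++)
        cpu_possible_map |= 1UL << cpu;

    /*
     * Meant to set only the "present" mask to the boot CPU, but because
     * of the alias it also throws away the bits collected just above.
     */
    cpu_present_mask = 1UL << boot_cpu;
}
```

After probing two CPUs, only the boot CPU's bit survives, which matches the symptom that a single processor was brought up.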
- cpu_present_mask and cpu_possible_map are essentially the same thing
on alpha, as it doesn't support CPU hotplug;
- allocate "struct cpu" only for present CPUs, like sparc64 does.
Static array of "struct cpu" is just a waste of memory.
If the board has more than 32 PCI busses on it, the mptable bus array will
overwrite its bounds for the PCI busses, and stomp on anything that's after
it.
Prevent possible table overflow and unknown data corruption. Code is in an
__init section so it will be discarded after init.
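A bounds check of this kind could be sketched as below. The names `MAX_MP_BUSSES`, `mp_bus_id_to_type`, `enum bus_type` and `record_bus()` are assumptions made for the example; only the 32-bus limit comes from the description.

```c
#define MAX_MP_BUSSES 32    /* table capacity, matching the limit in the text */

/* Hypothetical bus type tags. */
enum bus_type { BUS_NONE = 0, BUS_ISA, BUS_PCI };

static int mp_bus_id_to_type[MAX_MP_BUSSES];

/* Returns 0 on success, -1 if the bus id does not fit in the table. */
static int record_bus(int busid, enum bus_type type)
{
    if (busid < 0 || busid >= MAX_MP_BUSSES)
        return -1;      /* refuse to write past the end of the array */

    mp_bus_id_to_type[busid] = type;
    return 0;
}
```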
Randy Dunlap [Tue, 11 Apr 2006 05:53:11 +0000 (22:53 -0700)]
[PATCH] i386: print EIP/ESP last
Print summary registers (EIP and SS:ESP only) as last death info. This
makes this important data visible in case it had scrolled off the top of
the display. Similar to what x86_64 does. Suggested by Andi Kleen.
Switching to automatic bigsmp causes a misleading error message saying that
more than 8 CPUs were detected and that the user needs to select either
X86_GENERICARCH or X86_BIGSMP to handle them.
The reason is that we switched to bigsmp to avoid IP race when a new CPU is coming
up. [bigsmp is nothing but using physical flat mode that can work for 1 ..
255 cpus] [default is X86_PC, that uses logical flat mode up to 8 CPUs
max] Current x86_64 code uses bigsmp as default when hotplug is enabled.
It would be preferable to make bigsmp as default, and work the dependencies
of other related code like SMP_SUSPEND, and some related to memory hotplug
code for i386.
Current logical flat mode doesn't use the shortcuts that cause the race: it
uses send_IPI_mask() instead of shortcuts when HOTPLUG_CPU is enabled.
In the meantime this patch is the path of least resistance.
We will switch to bigsmp as the default sometime soon, when we get to work on it again.
[PATCH] frv: define MMU mode specific syscalls as 'cond_syscall' and clean up unneeded macros
For some architectures, a few syscalls are not linked in noMMU mode. In
that case, the MMU-dependent syscalls need to be defined as
'cond_syscall'. For example, the ARM architecture selectively links sys_mlock
depending on the mode configuration.
In the case of FRV, this has been handled by #ifdef CONFIG_MMU in
arch/frv/kernel/entry.S. However, these conditional macros become redundant
once the syscalls are defined as cond_syscall. Compilation was tested
with FRV toolchains for both MMU and noMMU mode.
Andy Whitcroft [Tue, 11 Apr 2006 05:53:01 +0000 (22:53 -0700)]
[PATCH] page flags: add commentary regarding field reservation
Add some documentation regarding the utilisation of the flags field in
struct page. This field is overloaded for per page bits and to hold node,
zone and SPARSEMEM information. Make it clear which areas are used for
what and how many bits are in each area.
[PATCH] overcommit: use totalreserve_pages for nommu
This patch is an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory() in mm/nommu.c.
When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
the algorithm subtracts the number of reserved pages from the result
nr_free_pages().
This patch is an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory() in mm/mmap.c.
When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
the algorithm subtracts the number of reserved pages from the result
nr_free_pages().
These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory().
- why the kernel needed patching
When the kernel can't allocate anonymous pages in practice, the current
OVERCOMMIT_GUESS could still return success. This implementation might be
the cause of OOM kills under memory pressure.
If Linux runs with page reservation features like
/proc/sys/vm/lowmem_reserve_ratio and without a swap region, I think
OOM kills occur easily.
- the overall design approach in the patch
When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
the reserved free pages are regarded as non-free pages.
This change helps to avoid the pitfall that the number of free pages
becomes less than the number which the kernel tries to keep free.
- testing results
I tested the patches using my test kernel module.
If the patches aren't applied to the kernel, __vm_enough_memory()
returns success in the situation but the actual page allocation
fails.
On the other hand, if the patches are applied to the kernel, memory
allocation failure is avoided since __vm_enough_memory() returns
failure in the situation.
I checked that on an i386 SMP machine with 16GB of memory. I haven't tested
in a nommu environment yet.
This patch adds totalreserve_pages for __vm_enough_memory().
calculate_totalreserve_pages() checks the maximum lowmem_reserve pages and
pages_high in each zone. Finally, the function stores the sum over all
zones in totalreserve_pages.
totalreserve_pages is calculated when the VM is initialized,
and the variable is updated when /proc/sys/vm/lowmem_reserve_ratio
or /proc/sys/vm/min_free_kbytes is changed.
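The calculation described above can be sketched as follows. `struct zone_sketch` is a hypothetical reduction of a zone to the two fields the text names, and both function names are made up. The sketch follows only what the description states: per zone, take the largest lowmem_reserve entry plus pages_high, then sum over zones. The overcommit guess then treats that many free pages as unavailable.

```c
#define MAX_NR_ZONES 3  /* illustrative */

/* Hypothetical reduction of a memory zone to the fields named in the text. */
struct zone_sketch {
    unsigned long pages_high;
    unsigned long lowmem_reserve[MAX_NR_ZONES];
};

/* Sum, over all zones, of (largest lowmem_reserve entry + pages_high). */
static unsigned long calc_totalreserve(const struct zone_sketch *zones, int nr)
{
    unsigned long total = 0;
    int i, j;

    for (i = 0; i < nr; i++) {
        unsigned long max = 0;

        for (j = 0; j < MAX_NR_ZONES; j++) {
            if (zones[i].lowmem_reserve[j] > max)
                max = zones[i].lowmem_reserve[j];
        }
        total += max + zones[i].pages_high;
    }
    return total;
}

/* Free pages the overcommit guess may count on: reserved pages excluded. */
static unsigned long usable_free_pages(unsigned long nr_free,
                                       unsigned long totalreserve)
{
    return nr_free > totalreserve ? nr_free - totalreserve : 0;
}
```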
The code compares the page-aligned newbrk and oldbrk before checking the
memory limit set for the data segment. If the memory limit is not page
aligned, the limit test is bypassed whenever the allocation stays within
the same page.
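The ordering problem can be illustrated with two small functions. This is a sketch under assumptions, not the kernel's brk code: the function names, the 4096-byte page size and the idea of simply testing the limit before the same-page shortcut are all chosen for the example.

```c
#define PAGE_SIZE 4096UL
#define PAGE_ALIGN(a) (((a) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Problematic order: the same-page shortcut is taken before the limit test. */
static int brk_ok_shortcut_first(unsigned long brk, unsigned long oldbrk,
                                 unsigned long start_data, unsigned long limit)
{
    if (PAGE_ALIGN(brk) == PAGE_ALIGN(oldbrk))
        return 1;                       /* limit never consulted */
    return brk - start_data <= limit;
}

/* Assumed remedy: test the unaligned request against the limit first. */
static int brk_ok_limit_first(unsigned long brk, unsigned long oldbrk,
                              unsigned long start_data, unsigned long limit)
{
    (void)oldbrk;                       /* shortcut no longer affects the result */
    if (brk - start_data > limit)
        return 0;
    return 1;
}
```

With a data segment starting at 0x1000, a limit of 150 bytes and a break moving from offset 100 to offset 200, both addresses round up to the same page, so the first function accepts a request that the second one rejects.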
Luke Yang [Tue, 11 Apr 2006 05:52:56 +0000 (22:52 -0700)]
[PATCH] nommu: use compound page in slab allocator
The earlier patch to consolidate mmu and nommu page allocation and
refcounting by using compound pages for nommu allocations had a bug:
kmalloc slabs whose pages were initially allocated by a non-__GFP_COMP
allocator could be passed into mm/nommu.c kmalloc allocations which really
wanted __GFP_COMP underlying pages. Fix that by having nommu pass
__GFP_COMP to all higher order slab allocations.
[PATCH] slab: add statistics for alien cache overflows
Add a statistics counter which is incremented every time the alien cache
overflows. The alien_cache limit is hardcoded to 12 right now. We can use
this statistic to tune the alien cache if needed in the future.
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu for sparc64.
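The difference between the iterators can be sketched with plain bitmasks. The mask variables, `NR_CPUS` value and the `for_each_cpu_in_mask` helper are simplified stand-ins invented for the example, not the kernel's cpumask implementation.

```c
#define NR_CPUS 8   /* illustrative */

/* Simplified masks: one bit per CPU. */
static unsigned long cpu_possible_bits = 0x0f;  /* CPUs 0-3 could exist */
static unsigned long cpu_online_bits   = 0x03;  /* CPUs 0-1 are running */

/* Hypothetical helper: visit every CPU whose bit is set in mask. */
#define for_each_cpu_in_mask(cpu, mask)                 \
    for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)           \
        if ((mask) & (1UL << (cpu)))

#define for_each_possible_cpu(cpu) for_each_cpu_in_mask(cpu, cpu_possible_bits)
#define for_each_online_cpu(cpu)   for_each_cpu_in_mask(cpu, cpu_online_bits)

static int count_possible(void)
{
    int cpu, n = 0;

    for_each_possible_cpu(cpu)
        n++;
    return n;
}

static int count_online(void)
{
    int cpu, n = 0;

    for_each_online_cpu(cpu)
        n++;
    return n;
}
```

With the explicit name, a loop over possible CPUs that should have covered only online ones is easier to spot in review.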
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu under /net
[PATCH] md: make sure 64bit fields in version-1 metadata are 64-bit aligned
reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblocks, which are not in wide use
- These fields are only used (rather than simply reported) by user-space
tools in extremely rare circumstances: after a reshape crashes in the
first second of the reshape process.
So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message on use mdadm-2.4.1.
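The alignment issue can be checked mechanically with offsetof. The two structs below are hypothetical fragments, NOT the real version-1 superblock: a single 64-bit `preceding` field stands in for everything before the two fields named above, and the packed attribute (a GCC/Clang extension) is used only to model a fixed on-disk layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Before the swap: the 64-bit field lands on a 4-byte boundary. */
struct sb_before {
    uint64_t preceding;         /* stand-in for the earlier fields */
    uint32_t new_level;
    uint64_t reshape_position;  /* offset 12 */
} __attribute__((packed));

/* After the swap: the 64-bit field is 8-byte aligned. */
struct sb_after {
    uint64_t preceding;
    uint64_t reshape_position;  /* offset 8 */
    uint32_t new_level;
} __attribute__((packed));

/* Non-zero if the given field sits on an 8-byte boundary. */
#define IS_64BIT_ALIGNED(type, field) (offsetof(type, field) % 8 == 0)
```

A check like this could also be written as a compile-time assertion so that a misaligned field fails the build.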
Andrew Morton [Tue, 11 Apr 2006 05:52:46 +0000 (22:52 -0700)]
[PATCH] select() warning fixes
fs/select.c: In function `core_sys_select':
fs/select.c:339: warning: assignment from incompatible pointer type
fs/select.c:376: warning: comparison of distinct pointer types lacks a cast
By using a void* we can remove lots of casts rather than adding more.
Mike Galbraith [Tue, 11 Apr 2006 05:52:44 +0000 (22:52 -0700)]
[PATCH] sched: fix interactive task starvation
Fix a starvation problem that occurs when a stream of highly interactive tasks
delay an array switch for extended periods despite EXPIRED_STARVING(rq) being
true. AFAIKT, the only choice is to enqueue awakening tasks on the expired
array in this case.
Without this patch, it can be nearly impossible to remotely login to a busy
server, and interactive shell commands can starve for minutes.
Also, convert the EXPIRED_STARVING macro into an inline function which humans
can understand.
[PATCH] splice: add direct fd <-> fd splicing support
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.
Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Roman Zippel [Sun, 9 Apr 2006 15:27:14 +0000 (17:27 +0200)]
kconfig: recenter menuconfig
Move the menuconfig output more into the centre again. It now uses a
fixed position depending on the window width, using the fact that the
menu output has to work in an 80-character terminal.
kbuild: fix mode of checkstack.pl and other files.
Make it executable like it should be. Do the same for other files intended to be
executed by the user - the ones called by the build process needn't be
executable as they already work (as argument to their interpreter).
Sam Ravnborg [Tue, 11 Apr 2006 11:24:32 +0000 (13:24 +0200)]
kbuild: rebuild initramfs if content of initramfs changes
initramfs.cpio.gz, which is built in usr/ and included in the
kernel, was not rebuilt when the included files changed.
To fix this the following was done:
- let gen_initramfs.sh generate a list of files and directories included
in the initramfs
- gen_initramfs generates the gzipped cpio archive so we could simplify
the kbuild file (Makefile)
- utilising the kbuild infrastructure so when uid/gid root mapping changes
the initramfs will be rebuilt
With this change we have a much more robust initramfs generation.