Ingo Molnar [Fri, 16 Feb 2007 09:27:34 +0000 (01:27 -0800)]
[PATCH] x86: rewrite SMP TSC sync code
make the TSC synchronization code more robust, and unify it between x86_64 and
i386.
The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
i386, in some rare cases it was /causing/ time-warps on SMP systems.
The new code only checks for TSC asynchronity - and if it can prove a
time-warp (if it can observe the TSC going backwards when going from one CPU
to another within a critical section), then the TSC clock-source is turned
off.
The TSC synchronization-checking code also got moved into a separate file.
Ingo Molnar [Fri, 16 Feb 2007 09:27:29 +0000 (01:27 -0800)]
[PATCH] Fix timeout overflow with jiffies
Prevent timeout overflow if timer ticks are behind jiffies (due to high
softirq load or due to dyntick), by limiting the valid timeout range to
MAX_LONG/2.
Jeff Dike [Fri, 16 Feb 2007 09:27:21 +0000 (01:27 -0800)]
[PATCH] uml: fix 2.6.20 hang
A previous cleanup misused need_poll, which had a fairly broken interface.
It implemented a growable array, changing the used elements count itself,
but leaving it up to the caller to fill in the actual elements, including
the entire array if the array had to be reallocated. This worked because
the previous users were switching between two such structures, and the
elements were copied from the inactive array to the active array after
making sure the active array had enough room.
maybe_sigio_broken was made to use need_poll, but it was operating on a
single array, so when the buffer was reallocated, the previous contents
were lost.
This patch makes need_poll implement more sane semantics. It merely
assures that the array is of the proper size and that the contents are
preserved. It is up to the caller to adjust the used elements count and to
ensure that the proper elements are resent.
This manifested itself as a hang in 2.6.20 as the uninitialized buffer
convinced UML that one of its own file descriptors didn't support SIGIO and
needed to be watched by poll in a separate thread. The result was an
interrupt flood as control traffic over this descriptor sparked interrupts,
which resulted in more control traffic, ad nauseum.
Dmitriy Monakhov [Fri, 16 Feb 2007 09:27:18 +0000 (01:27 -0800)]
[PATCH] __page_symlink retry loop error code fix
If prepare_write or commit_write return AOP_TRUNCATED_PAGE we jump to
"retry" label and than if find_or_create_page() failed function return
incorrect error code.
The following patch allows me to boot. Only the if(mask..) continue;
part fixes the problem actually, the gotos where changed so that we
don't try to unmap something we couldn't map anyway.
Nate Dailey [Thu, 15 Feb 2007 23:13:46 +0000 (18:13 -0500)]
sata_vsc: use default cache line size if non-zero
This modifies drivers/ata/sata_vsc.c to only set the cache line size
to 0x80 if the default value is zero. Apparently zero isn't allowed
due to a bug in the chip, but I've found performance is much better
with the (non-zero) default instead of 0x80.
[note1: "default" means BIOS-programmed value, in this context -jgarzik]
[note2: superfluous braces were removed from the patch -jg]
Robert Hancock [Mon, 12 Feb 2007 00:36:56 +0000 (18:36 -0600)]
sata_nv: handle SError status indication
ADMA-capable controllers provide a bit in the status register that appears
to indicate that the controller detected an SError condition. Update sata_nv
to detect this and trigger error handling in order to handle the fault.
Zhang, Yanmin [Fri, 9 Feb 2007 03:29:51 +0000 (11:29 +0800)]
ATA convert GSI to irq on ia64
If an ATA drive uses legacy mode, ata driver will choose 14 and 15
as the fixed irq number. On ia64 platform, such numbers are GSI and
should be converted to irq vector.
Tejun Heo [Wed, 7 Feb 2007 20:37:41 +0000 (12:37 -0800)]
libata: clear TF before IDENTIFYing
Some devices chock if Feature is not clear when IDENTIFY is issued.
Set ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE for IDENTIFY such that whole
TF is cleared when reading ID data.
Kudos to Art Haas for testing various futile patches over several
months and Mark Lord for pointing out the fix.
Tejun Heo [Mon, 5 Feb 2007 14:21:19 +0000 (23:21 +0900)]
libata: fix drive side 80c cable check, take 3
The 80c wire bit is bit 13, not 14. Bit 14 is always 1 if word93 is
implemented. This increases the chance of incorrect wire detection
especially because host side cable detection is often unreliable and
we sometimes soley depend on drive side cable detection. Fix the test
and add word93 validity check.
sata_promise: new EH conversion for 20619 chips, take 2
This patch updates the sata_promise driver to use new-style
libata error handling for 20619 (TX4000) chips. sata_promise
already uses new EH for the other chips it supports, so the
patch is quite simple:
* remove ->phy_reset and ->eng_timeout ops from pdc_pata_ops,
and instead bind ->freeze, ->thaw, ->error_handler, and
->post_internal_cmd to existing new EH functions
* drop ATA_FLAG_SRST from board_20619's flags
* remove now unused pdc_pata_phy_reset() and pdc_eng_timeout()
Tested on a TX4000 with both modern working disks and old/quirky
disks. Also used a CD-RW drive to test reading and writing CDs.
[PATCH] sysctl: hide the sysctl proc inodes from selinux
Since the security checks are applied on each read and write of a sysctl file,
just like they are applied when calling sys_sysctl, they are redundant on the
standard VFS constructs. Since it is difficult to compute the security labels
on the standard VFS constructs we just mark the sysctl inodes in proc private
so selinux won't even bother with them.
Stephen Smalley [Wed, 14 Feb 2007 08:34:16 +0000 (00:34 -0800)]
[PATCH] selinux: enhance selinux to always ignore private inodes
Hmmm...turns out to not be quite enough, as the /proc/sys inodes aren't truly
private to the fs, so we can run into them in a variety of security hooks
beyond just the inode hooks, such as security_file_permission (when reading
and writing them via the vfs helpers), security_sb_mount (when mounting other
filesystems on directories in proc like binfmt_misc), and deeper within the
security module itself (as in flush_unauthorized_files upon inheritance across
execve). So I think we have to add an IS_PRIVATE() guard within SELinux, as
below. Note however that the use of the private flag here could be confusing,
as these inodes are _not_ private to the fs, are exposed to userspace, and
security modules must implement the sysctl hook to get any access control over
them.
I goofed and when reenabling the fine grained selinux labels for
sysctls and forgot to add the "/sys" prefix before consulting
the policy database. When computing the same path using
proc_dir_entries we got the "/sys" for free as it was part
of the tree, but it isn't true for clt_table trees.
[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables
It isn't needed anymore, all of the users are gone, and all of the ctl_table
initializers have been converted to use explicit names of the fields they are
initializing.
[PATCH] sysctl: reimplement the sysctl proc support
With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.
For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.
The speed feels about the same, even though we can now cache the sysctl
dentries :(
We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.
[PATCH] sysctl: factor out sysctl_head_next from do_sysctl
The current logic to walk through the list of sysctl table headers is slightly
painful and implement in a way it cannot be used by code outside sysctl.c
I am in the process of implementing a version of the sysctl proc support that
instead of using the proc generic non-caching monster, just uses the existing
sysctl data structure as backing store for building the dcache entries and for
doing directory reads. To use the existing data structures however I need a
way to get at them.
[PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
[PATCH] sysctl: remove support for directory strategy routines
parse_table has support for calling a strategy routine when descending into a
directory. To date no one has used this functionality and the /proc/sys
interface has no analog to it.
So no one is using this functionality kill it and make the binary sysctl code
easier to follow.
[PATCH] sysctl: create sys/fs/binfmt_misc as an ordinary sysctl entry
binfmt_misc has a mount point in the middle of the sysctl and that mount point
is created as a proc_generic directory.
Doing it that way gets in the way of cleaning up the sysctl proc support as it
continues the existence of a horrible hack. So instead simply create the
directory as an ordinary sysctl directory. At least that removes the magic
special case.
[PATCH] sysctl: move SYSV IPC sysctls to their own file
This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the ipc logic to together it makes
the code a little more readable.
[PATCH] sysctl: move utsname sysctls to their own file
This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
with special cases, and by keeping all of the utsname logic to together it
makes the code a little more readable.
ocfs2 was did not have the binary number it uses under CTL_FS registered in
sysctl.h. Register it to avoid future conflicts, and change the name of the
definition to be in line with the rest of the sysctl numbers.
[PATCH] sysctl: C99 convert coda ctl_tables and remove binary sysctls
Will converting the coda sysctl initializers I discovered that it is yet
another user of sysctl that was stomping CTL_KERN. So off with it's
sys_sysctl support since it wasn't done in a supportable way.
[PATCH] sysctl: remove sys_sysctl support from drivers/char/rtc.c
The real time clock driver was using the binary number reserved for cdroms in
the sysctl binary number interface, which is a no-no. So since the sysctl
binary interface is wrong remove it.
[PATCH] sysctl: C99 convert ctl_tables in arch/x86_64/kernel/vsyscall.c
Basically everything was done but I removed all element initializers from the
trailing entries to make it clear the entire last entry should be zero filled.
[PATCH] sysctl: C99 convert arch/sh64/kernel/traps.c and remove ABI breakage
While doing the C99 conversion I notices that the top level sh64 directory was
using the binary number for CTL_KERN. That is a no-no so I removed the
support for the sysctl binary interface only leaving sysctl /proc support.
At least the sysctl tables were placed at the end of the list so user space
did not see this mistake.
[PATCH] sysctl: C99 convert arch/mips/lasat/sysctl.c and remove ABI breakage
While C99 converting the ctl_table initializers I realized that the binary
sysctl numbers were in conflict with the binary values under CTL_KERN.
Including CTL_KERN KERN_VERSION as used by glibc. So I just removed the
sysctl binary interface for these values, as it was unsupportable.
Luckily these sysctl were inserted at the end of the sysctl list so this bug
was not visible to userspace.
[PATCH] sysctl: C99 convert arch/ia64/kernel/perfmon and remove ABI breakage
This convters the sysctl ctl_tables to use C99 initializers. While I was
looking at it I discovered it was using a portion of the sysctl binary
addresses space under CTL_KERN KERN_OSTYPE which was completely inappropriate.
So I completely removed all of the sysctl binary names, to remove and avoid
the ABI conflict.
By not using the enumeration in sysctl.h (or even understanding it) the SN
platform placed their arch specific xpc directory on top of CTL_KERN and only
because they didn't have 4 entries in their xpc directory got lucky and didn't
break glibc.
This is totally irresponsible. So this patch entirely removes sys_sysctl
support from their sysctl code. Hopefully they don't have ascii name
conflicts as well.
And now that they have no ABI numbers add them to the end instead of the
sysctl list instead of the head so nothing else will be overridden.
[PATCH] sysctl: frv: remove unnecessary insert_at_head flag
Since the binary sysctl numbers are unique putting the registered sysctls at
the head of the sysctl list where they can override existing sysctls serves no
useful purpose.
[PATCH] sysctl: frv: pm remove unnecessary insert_at_head flag
With unique binary numbers setting insert_at_head to insert yourself at the
head of sysctl list and thus override existing sysctl entries serves no point.
There is no need for open files in /proc/sys/XXX to hold a reference count on
the module that provides the file to prevent module unload races. While there
is code active in the module p->used in the sysctl_table_header is
incremented, preventing the sysctl from being unregisted. Once the sysctl is
unregistered it cannot be found. Open files are also not a problem as they
revalidate the sysctl information and bump p->used before accessing module
code.
So setting de->owner is unnecessary, makes for a bad example and gets in my
way of removing ctl_table->de.
[PATCH] sysctl: decnet: remove unnecessary insert_at_head flag
The sysctl numbers used are unique so setting the insert_at_head flag does not
succeed in overriding any sysctls, and is just confusing because it doesn't.
Clear the flag.
[PATCH] sysctl: x25: remove unnecessary insert_at_head from register_sysctl_table
There has not been much maintenance on sysctl in years, and as a result is
there is a lot to do to allow future interesting work to happen, and being
ambitious I'm trying to do it all at once :)
The patches in this series fall into several general categories.
- Removal of useless attempts to override the standard sysctls
- Registers of sysctl numbers in sysctl.h so someone else does not use
the magic number and conflict.
- C99 conversions so it becomes possible to change the layout of
struct ctl_table without breaking everything.
- Removal of useless claims of module ownership, in the proc dir entries
- Removal of sys_sysctl support where people had used conflicting sysctl
numbers. Trying to break glibc or other applications by changing the
ABI is not cool. 9 instances of this in the kernel seems a little
extreme.
- General enhancements when I got the junk I could see out.
This patch:
Since x25 uses unique binary numbers inserting yourself at the head of the
search list for sysctls so you can override already registered sysctls is
pointless.
Tim Schmielau [Wed, 14 Feb 2007 08:33:14 +0000 (00:33 -0800)]
[PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).