Oleg Drokin [Sat, 25 Mar 2006 11:06:54 +0000 (03:06 -0800)]
[PATCH] Missed error checking for intent's filp in open_namei().
It seems there is error check missing in open_namei for errors returned
through intent.open.file (from lookup_instantiate_filp).
If there is plain open performed, then such a check done inside
__path_lookup_intent_open called from path_lookup_open(), but when the open
is performed with O_CREAT flag set, then __path_lookup_intent_open is only
called with LOOKUP_PARENT set where no file opening can occur yet.
Later on lookup_hash is called where exact opening might take place and
intent.open.file may be filled. If it is filled with error value of some
sort, then we get kernel attempting to dereference this error value as
address (and corresponding oops) in nameidata_to_filp() called from
filp_open().
While this is relatively simple to workaround in ->lookup() method by just
checking lookup_instantiate_filp() return value and returning error as
needed, this is not so easy in ->d_revalidate(), where we can only return
"yes, dentry is valid" or "no, dentry is invalid, perform full lookup
again", and just returning 0 on error would cause extra lookup (with
potential extra costly RPCs).
So in short, I believe that there should be no difference in error handling
for opening a file and creating a file in open_namei() and propose this
simple patch as a solution.
Andrew Morton [Sat, 25 Mar 2006 11:06:53 +0000 (03:06 -0800)]
[PATCH] jbd: embed j_commit_timer in journal struct
The kjournald timer is currently on the kernel thread's stack and the journal
structure points at it. Save a pointer hop by moving the timer into the
journal structure.
David Howells [Sat, 25 Mar 2006 11:06:52 +0000 (03:06 -0800)]
[PATCH] Keys: Replace duplicate non-updateable keys rather than failing
Cause an attempt to add a duplicate non-updateable key (such as a keyring) to
a keyring to discard the extant copy in favour of the new one rather than
failing with EEXIST:
# do the test in an empty session
keyctl session
# create a new keyring called "a" and attach to session
keyctl newring a @s
# create another new keyring called "a" and attach to session,
# displacing the keyring added by the second command:
keyctl newring a @s
Without this patch, the third command will fail.
For updateable keys (such as those of "user" type), the update method will
still be called rather than a new key being created.
Chen, Kenneth W [Sat, 25 Mar 2006 11:06:50 +0000 (03:06 -0800)]
[PATCH] x86: HUGETLBFS and DEBUG_PAGEALLOC are incompatible
DEBUG_PAGEALLOC is not compatible with hugetlb page support. That debug
option turns off PSE. Once it is turned off in CR4, the cpu will ignore
pse bit in the pmd and causing infinite page-not- present faults.
So disable DEBUG_PAGEALLOC if the user selected hugetlbfs.
Ashok Raj [Sat, 25 Mar 2006 11:06:50 +0000 (03:06 -0800)]
[PATCH] x86: make CONFIG_HOTPLUG_CPU depend on !X86_PC
Make CONFIG_HOTPLUG_CPU depend on !X86_PC, so we need to turn on either
CONFIG_GENERICARCH, CONFIG_BIGSMP or any other subarch except X86_PC when
CONFIG_HOTPLUG_CPU=y
With 2.6.15+ kernels when CONFIG_HOTPLUG_CPU is turned on we switch to
bigsmp mode for sending IPI's and ioapic configurations that caused the
following error message.
>> More than 8 CPUs detected and CONFIG_X86_PC cannot handle it.
>> Use CONFIG_X86_GENERICARCH or CONFIG_X86_BIGSMP.
Originally bigsmp was added just to handle >8 cpus, but now with hotplug
cpu support we need to use bigsmp mode (why? see below), that cause the
above error message even if there were less than 8 cpus in the system.
The message is bogus, but we are cannot use logical flat mode due to issues
with broadcast IPI can confuse a CPU just comming up. We use flat physical
mode just like x86_64 case. More details on why bigsmp now uses flat
physical mode (vs. cluster mode) in following link.
We have had this memory leak for a while now. The situation is complicated
by the use of alloc_kmemlist() as a function to resize various caches by
do_tune_cpucache().
What we do here is first of all make sure that we deallocate properly in
the loop over all the nodes.
If we are just resizing caches then we can simply return with -ENOMEM if an
allocation fails.
If the cache is new then we need to rollback and remove all earlier
allocations.
We detect that a cache is new by checking if the link to the global cache
chain has been setup. This is a bit hackish ....
(also fix up too overlong lines that I added in the last patch...)
[PATCH] slab: Bypass free lists for __drain_alien_cache()
__drain_alien_cache() currently drains objects by freeing them to the
(remote) freelists of the original node. However, each node also has a
shared list containing objects to be used on any processor of that node.
We can avoid a number of remote node accesses by copying the pointers to
the free objects directly into the remote shared array.
And while we are at it: Skip alien draining if the alien cache spinlock is
already taken.
Kiran reported that this is a performance benefit.
slabr_objects() can be used to transfer objects between various object
caches of the slab allocator. It is currently only used during
__cache_alloc() to retrieve elements from the shared array. We will be
using it soon to transfer elements from the alien caches to the remote
shared array.
As suggested by Eric Dumazet, optimize kzalloc() calls that pass a
compile-time constant size. Please note that the patch increases kernel
text slightly (~200 bytes for defconfig on x86).
Introduce a memory-zeroing variant of kmem_cache_alloc. The allocator
already exits in XFS and there are potential users for it so this patch
makes the allocator available for the general public.
David Howells [Sat, 25 Mar 2006 11:06:36 +0000 (03:06 -0800)]
[PATCH] Optimise d_find_alias()
The attached patch optimises d_find_alias() to only take the spinlock if
there's anything in the the inode's alias list. If there isn't, it returns
NULL immediately.
With respect to the superblock sharing patch, this should reduce by one the
number of times the dcache_lock is taken by nfs_lookup() for ordinary
directory lookups.
Only in the case where there's already a dentry for particular directory inode
(such as might happen when another mountpoint is rooted at that dentry) will
the lock then be taken the extra time.
Thomas Gleixner [Sat, 25 Mar 2006 11:06:35 +0000 (03:06 -0800)]
[PATCH] Validate and sanitze itimer timeval from userspace
According to the specification the timevals must be validated and an
errorcode -EINVAL returned in case the timevals are not in canonical form.
This check was never done in Linux.
The pre 2.6.16 code converted invalid timevals silently. Negative timeouts
were converted by the timeval_to_jiffies conversion to the maximum timeout.
hrtimers and the ktime_t operations expect timevals in canonical form.
Otherwise random results might happen on 32 bits machines due to the
optimized ktime_add/sub operations. Negative timeouts are treated as
already expired. This might break applications which work on pre 2.6.16.
To prevent random behaviour and API breakage the timevals are checked and
invalid timevals sanitized in a simliar way as the pre 2.6.16 code did.
Invalid timevals are reported with a per boot limited number of kernel
messages so applications which use this misfeature can be corrected.
After a grace period of one year the sanitizing should be replaced by a
correct validation check. This is also documented in
Documentation/feature-removal-schedule.txt
The validation and sanitizing is done inside do_setitimer so all callers
(sys_setitimer, compat_sys_setitimer, osf_setitimer) are catched.
Thomas Gleixner [Sat, 25 Mar 2006 11:06:33 +0000 (03:06 -0800)]
[PATCH] sys_alarm() unsigned signed conversion fixup
alarm() calls the kernel with an unsigend int timeout in seconds. The
value is stored in the tv_sec field of a struct timeval to setup the
itimer. The tv_sec field of struct timeval is of type long, which causes
the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.
Before the hrtimer merge (pre 2.6.16) such a negative value was converted
to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
not clear whether this was intended or just happened to be done by the
timeval_to_jiffies code.
hrtimers expect a timeval in canonical form and treat a negative timeout as
already expired. This breaks the legitimate usage of alarm() with a
timeout value > INT_MAX seconds.
For 32 bit machines it is therefor necessary to limit the internal seconds
value to avoid API breakage. Instead of doing this in all implementations
of sys_alarm the duplicated sys_alarm code is moved into a common function
in itimer.c
Andrew Morton [Sat, 25 Mar 2006 11:06:32 +0000 (03:06 -0800)]
[PATCH] timer irq driven soft watchdog fix
I seem to have lost this hunk in yesterday's patch. It brings the
coming-online CPU's softlockup timer up to date so we don't get false-positive
tripups during CPU hot-add.
Andrew Morton [Sat, 25 Mar 2006 12:24:19 +0000 (09:24 -0300)]
V4L/DVB (3604): V4l printk fix
drivers/media/video/v4l2-common.c: In function `v4l_printk_ioctl_arg':
drivers/media/video/v4l2-common.c:477: warning: `0' flag used with `%p' printf format
Someone went and "improved" my patch ;)
V4L/DVB (3599a): Move drivers/usb/media to drivers/media/video
Because of historic reasons, there are two separate directories with
V4L stuff. Most drivers are located at driver/media/video. However, some
code for USB Webcams were inserted under drivers/usb/media.
This makes difficult for module authors to know were things should be.
Also, makes Kconfig menu confusing for normal users.
This patch moves all V4L content under drivers/usb/media to
drivers/media/video, and fixes Kconfig/Makefile entries.
Ilia Sotnikov [Sat, 25 Mar 2006 09:38:55 +0000 (01:38 -0800)]
[IPV4]: Aggregate route entries with different TOS values
When we get an ICMP need-to-frag message, the original TOS value in the
ICMP payload cannot be used as a key to look up the routes to update.
This is because the TOS field may have been modified by routers on the
way. Similarly, ip_rt_redirect should also ignore the TOS as the router
that gave us the message may have modified the TOS value.
The patch achieves this objective by aggregating entries with different
TOS values (but are otherwise identical) into the same bucket. This
makes it easy to update them at the same time when an ICMP message is
received.
In future we should use a twin-hashing scheme where teh aggregation
occurs at the entry level. That is, the TOS goes back into the hash
for normal lookups while ICMP lookups will end up with a node that
gives us a list that contains all other route entries that differ
only by TOS.
John Heffner [Sat, 25 Mar 2006 09:34:07 +0000 (01:34 -0800)]
[TCP]: Set default max buffers from memory pool size
This patch sets the maximum TCP buffer sizes (available to automatic
buffer tuning, not to setsockopt) based on the TCP memory pool size.
The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
than 1/128 of the memory pressure threshold.
Herbert Xu [Sat, 25 Mar 2006 09:25:29 +0000 (01:25 -0800)]
[SCTP]: Fix up sctp_rcv return value
I was working on the ipip/xfrm problem and as usual I get side-tracked by
other problems.
As part of an attempt to change the IPv4 protocol handler calling
convention I found that SCTP violated the existing convention.
It's returning non-zero values after freeing the skb. This is doubly bad
as 1) the skb gets resubmitted; 2) the return value is interpreted as a
protocol number.
This patch changes those return values to zero.
IPv6 doesn't suffer from this problem because it uses a positive return
value as an indication for resubmission. So the only effect of this patch
there is to increment the IPSTATS_MIB_INDELIVERS counter which IMHO is
the right thing to do.
Herbert Xu [Sat, 25 Mar 2006 09:24:25 +0000 (01:24 -0800)]
[NET]: Take RTNL when unregistering notifier
The netdev notifier call chain is currently unregistered without taking
any locks outside the notifier system. Because the notifier system itself
does not synchronise unregistration with respect to the calling of the
chain, we as its user need to do our own locking.
We are supposed to take the RTNL for all calls to netdev notifiers, so
taking the RTNL should be sufficient to protect it.
The registration path in dev.c already takes the RTNL so it's OK.
Leonid Arsh [Thu, 23 Mar 2006 17:52:51 +0000 (19:52 +0200)]
IPoIB: P_Key change event handling
This patch causes the network interface to respond to P_Key change
events correctly. As a result, you'll see a child interface in the
"RUNNING" state (netif_carrier_on()) only when the corresponding P_Key
is configured by the SM. When SM removes a P_Key, the "RUNNING" state
will be disabled for the corresponding network interface. To
implement this, I added IB_EVENT_PKEY_CHANGE event handling. To
prevent flushing the device before the device is open by the "delay
open" mechanism, I added an additional device flag called
IPOIB_FLAG_INITIALIZED.
This also prevents the child network interface from trying to join to
multicast groups until the PKEY is configured. We used to get error
messages like:
ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff
in this case. To fix this, I just check IPOIB_FLAG_OPER_UP flag in
ipoib_set_mcast_list().
Roland Dreier [Fri, 24 Mar 2006 23:47:30 +0000 (15:47 -0800)]
IB/mthca: Fix modify QP error path
If the call to mthca_MODIFY_QP() failed, then mthca_modify_qp() would
still do some things it shouldn't, such as store away attributes for
special QPs. Fix this, and simplify the code, by simply jumping to
the exit path if mthca_MODIFY_QP() fails.
Leonid Arsh [Wed, 22 Mar 2006 17:54:24 +0000 (19:54 +0200)]
IPoIB: Fix network interface "RUNNING" status
With the current IPoIB driver, the status of network interfaces stays
"RUNNING" even if the link goes down (for example because a cable is
unplugged). Fix this by flushing the IPoIB interface when the link
goes down.
Jack Morgenstein [Mon, 20 Mar 2006 15:32:43 +0000 (17:32 +0200)]
IB/mthca: Check SRQ limit in modify SRQ operation
When setting the shared receive queue (SRQ) watermark in a modify SRQ
operation, make sure that the supplied value is not larger than the
full size of the SRQ.
Jack Morgenstein [Mon, 20 Mar 2006 10:35:34 +0000 (12:35 +0200)]
IB/mthca: Check that SRQ WQE size does not exceed device's max value
Guarantee the calculated work queue entry size does not exceed the max
allowable WQE size when creating an SRQ. This is a problem with Arbel
in Tavor-compatibility mode because the current WQE size computation
method rounds up to next power of 2.
Roland Dreier [Fri, 24 Mar 2006 23:47:26 +0000 (15:47 -0800)]
IB/srp: Use a fake scatterlist for non-SG SCSI commands
Since the SCSI midlayer is moving towards entirely getting rid of
commands with use_sg == 0, we should treat this case as an exception.
Therefore, change the IB SRP initiator to create a fake scatterlist
for these commands with sg_init_one(). This simplifies the flow of
DMA mapping and unmapping, since SRP can just use dma_map_sg() and
dma_unmap_sg() unconditionally, rather than having to choose between
the dma_{map,unmap}_sg() and dma_{map,unmap}_single() variants.
Dave Jones [Fri, 24 Mar 2006 23:47:25 +0000 (15:47 -0800)]
[WIRELESS]: Fix config dependencies.
I accidentally ended up with a config that set NET_RADIO off,
and NET_WIRELESS_RTNETLINK on, which blew up thus..
net/built-in.o: In function `do_setlink':net/core/rtnetlink.c:479: undefined reference to `wireless_rtnetlink_set'
net/built-in.o: In function `do_getlink':net/core/rtnetlink.c:521: undefined reference to `wireless_rtnetlink_get'
Peter Chubb [Fri, 24 Mar 2006 05:39:47 +0000 (21:39 -0800)]
[BRIDGE]: Unaligned accesses in the ethernet bridge
I see lots of
kernel unaligned access to 0xa0000001009dbb6f, ip=0xa000000100811591
kernel unaligned access to 0xa0000001009dbb6b, ip=0xa0000001008115c1
kernel unaligned access to 0xa0000001009dbb6d, ip=0xa0000001008115f1
messages in my logs on IA64 when using the ethernet bridge with 2.6.16.
Kurt Hackel [Mon, 6 Mar 2006 22:08:49 +0000 (14:08 -0800)]
[PATCH] ocfs2: dlm recovery fixes
when starting lock mastery (excepting the recovery lock) wait on any nodes
needing recovery. fix one instance where lock resources were left attached
to the recovery list after recovery completed. ensure that the node_down
code is run uniformly regardless of which node found the dead node first.
Fenghua Yu [Tue, 28 Feb 2006 00:16:22 +0000 (16:16 -0800)]
[IA64] New IA64 core/thread detection patch
IPF SDM 2.2 changes definition of PAL_LOGICAL_TO_PHYSICAL to add
proc_number=-1 to get core/thread mapping info on the running processer.
Based on this change, we had better to update existing core/thread
detection in IA64 kernel correspondingly. The attached patch implements
this change. It simplifies detection code and eliminates potential race
condition. It also runs a bit faster and has better scalability especially
when cores and threads number grows up in one package.
Jack Steiner [Thu, 2 Mar 2006 22:02:25 +0000 (16:02 -0600)]
[IA64] Increase max node count on SN platforms
Add support in IA64 acpi for platforms that support more than
256 nodes. Currently, ACPI is limited to 256 nodes because the
proximity domain number is 8-bits.
Long term, we expect to use ACPI3.0 to support >256 nodes.
This patch is an interim solution that works with platforms
that pass the high order bits of the proximity domain in
"reserved" fields of the ACPI tables. This code is enabled
ONLY on SN platforms.
Prarit Bhargava [Wed, 8 Mar 2006 18:30:18 +0000 (13:30 -0500)]
[IA64] Tollhouse HP: IA64 arch changes
arch/ia64/sn and include/asm-ia64/sn changes required to support Tollhouse
system PCI hotplug, fixes the ia64_sn_sysctl_ioboard_get call, and introduces
the PRF_HOTPLUG_SUPPORT feature bit.
Hans Verkuil [Mon, 20 Mar 2006 19:32:52 +0000 (16:32 -0300)]
V4L/DVB (3588): Remove VIDIOC_G/S_AUDOUT from msp3400
VIDIOC_G/S_AUDOUT does not belong in msp3400 (it's a user level command,
not to be used in internal i2c drivers). Also fix a compile warning and
improve LOG_STATUS.
Add a new audio mode V4L2_TUNER_MODE_LANG1_LANG2 (used by VIDIOC_G/S_TUNER).
This mode allows the user to select both languages of a bilingual transmission,
one language on the left, one on the right audio channel. If there is no
bilingual transmission, or it is not supported, then this mode should act like
V4L2_TUNER_MODE_STEREO.
This mode is introduced for PVR-like drivers where it is useful to be able to
record both languages of a bilingual broadcast.
- implement VIDIOC_INT_S_AUDIO_ROUTING for msp3400 and tvaudio
- use the new command in bttv, pvrusb2 and em28xx.
- remove the now obsolete MSP_SET_MATRIX from msp3400 (yeah!)
- remove the obsolete VIDIOC_S_AUDIO from msp3400.
Hans Verkuil [Sun, 19 Mar 2006 11:21:56 +0000 (08:21 -0300)]
V4L/DVB (3580): Last round of msp3400 cleanups before adding routing commands
Lots of cleanups:
- remove duplicate actions
- add D/K3 Dual FM-Stereo and D/K NICAM FM (HDEV3) support
- put prescales in the proper place
- add missing D/K NICAM
- msp34xxg_reset now only resets instead of also starting the autodetect
(moved that to msp34xxg_thread)
- fix support for SAP.
Hans Verkuil [Sun, 19 Mar 2006 00:31:00 +0000 (21:31 -0300)]
V4L/DVB (3577): Cleanup audio input handling
Cleanup audio input handling in bttv and tvaudio:
- inputs were specified that were never used
- mute was handled as a special input which led to confusing code
- confusing naming made it difficult to see if the setting was for
i2c or gpio.
The old audiochip.h input names moved to tvaudio.h. Currently this
is used both by tvaudio and msp3400 until the msp3400 implements the
new msp3400-specific inputs.
Detect in bttv the tvaudio and msp3400 i2c clients and use these
client pointers to set the inputs directly instead of broadcasting the
command.
Removed AUDC_SET_INPUT. Now replaced by VIDIOC_S_AUDIO. This will be
replaced again later by the new ROUTING commands.
Removed VIDIOC_G_AUDIO implementations in i2c drivers: this command is
a user level command and not to be used internally. It wasn't called at
all anyway.
Michael Krufky [Thu, 23 Mar 2006 04:11:18 +0000 (01:11 -0300)]
V4L/DVB (3575): Cxusb: fix i2c debug messages for bluebird devices
Only the Medion boxes return 0x08 after an i2c read/write.
The bluebird devices do not return anything at all.
This patch conditionalizes the test for the 0x08 return code
to produce a warning message when using the Medion box, only.
Michael Krufky [Thu, 23 Mar 2006 03:15:55 +0000 (00:15 -0300)]
V4L/DVB (3573): Cxusb: remove FIXME: comment in bluebird_patch_dvico_firmware_download
Removed the FIXME comment from bluebird_patch_dvico_firmware_download:
/* FIXME: are we allowed to change the fw-data ? */
Yes, we are allowed. DViCO's Windows driver also does the same thing.
A single firmware image is used to support all of the bluebird boxes.
The firmware sets all devices to PID: d700. Instead of using that, the
driver replaces the d700 with the cold PID+1 before the download.
Michael Krufky [Thu, 23 Mar 2006 03:01:34 +0000 (00:01 -0300)]
V4L/DVB (3572): Cxusb: conditionalize gpio write for the medion box
This patch removes the (harmless) -ETIMEDOUT during device init
for the DViCO FusionHDTV Bluebird boxes, by conditionalizing the
gpio write inside cxusb_i2c_xfer to happen only for Medion boxes.
Andrew Morton [Wed, 22 Mar 2006 18:01:59 +0000 (15:01 -0300)]
V4L/DVB (3571): Printk warning fixes
On ppc64, u64 is `unsigned long'
drivers/media/video/v4l2-common.c: In function `v4l_printk_ioctl_arg':
drivers/media/video/v4l2-common.c:486: warning: cast from pointer to integer of different size
drivers/media/video/v4l2-common.c:580: warning: long long int format, v4l2_std_id arg (arg 8)
drivers/media/video/v4l2-common.c:625: warning: long long int format, v4l2_std_id arg (arg 8)
drivers/media/video/v4l2-common.c:693: warning: long long int format, v4l2_std_id arg (arg 4)
drivers/media/video/v4l2-common.c:910: warning: long long unsigned int format, long unsigned int a$
V4L/DVB (3549): Make hotplug automatically load the b2c2-flexcop-usb module
There was no MODULE_DEVICE_TABLE for the b2c2-flexcop-usb module. This makes it
impossible for hotplug to load the module automatically, when such a device is
connected.
V4L/DVB (3546): Fix Compilation after moving bttv code
- Missing a Makefile for bt8xx
- rds.h were at wrong directory, since it is a global header for an internal
interface
- tda7432 and tda9875 were dependent from bttv.h
- bttv.h were holding i2c addresses
V4L/DVB (3518): Creates a virtual video device driver
The Virtual Video Device Driver (aka vivi) is a device that
can be used to:
1) test core v4l functionalities;
2) be a prototype for newer development.
Vivi were developed using the best practices for v4l driver.
When loaded, it provides a video device that generates a
standard color bar, with a timestamp placed at top left corner.
Video_buf were concerned to allow PCI devices to be used as
video capture devices. This patch extends video_buf features
by virtualizing pci-dependent functions and allowing other
type of devices to use it.
It is still DMA centric, although it may be used also by
devices that emulates scatter/gather behavior or a DMA device
Memory errors encountered by user applications may surface
when the CPU is running in kernel context. The current code
will not attempt recovery if the MCA surfaces in kernel
context (privilage mode 0). This patch adds a check for cases
where the user initiated the load that surfaces in kernel
interrupt code.
An example is a user process lauching a load from memory
and the data in memory had bad ECC. Before the bad data
gets to the CPU register, and interrupt comes in. The
code jumps to the IVT interrupt entry point and begins
execution in kernel context. The process of saving the
user registers (SAVE_REST) causes the bad data to be loaded
into a CPU register, triggering the MCA. The MCA surfaces in
kernel context, even though the load was initiated from
user context.
As suggested by David and Tony, this patch uses an exception
table like approach, puting the tagged recovery addresses in
a searchable table. One difference from the exception table
is that MCAs do not surface in precise places (such as with
a TLB miss), so instead of tagging specific instructions,
address ranges are registers. A single macro is used to do
the tagging, with the input parameter being the label
of the starting address and the macro being the ending
address. This limits clutter in the code.
This patch only tags one spot, the interrupt ivt entry.
Testing showed that spot to be a "heavy hitter" with
MCAs surfacing while saving user registers. Other spots
can be added as needed by adding a single macro.