Gerrit Renker [Mon, 17 Nov 2008 06:56:55 +0000 (22:56 -0800)]
dccp: Tidy up setsockopt calls
This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.
Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.
The second switch-case statement now only has those statements which need
locking and which make use of `val'.
Gerrit Renker [Mon, 17 Nov 2008 06:55:08 +0000 (22:55 -0800)]
dccp: Deprecate Ack Ratio sysctl
This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
- Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
- if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
(since waiting for Acks which will never arrive in this window),
- cwnd is not a user-configurable value.
The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.
With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
dynamic update of Ack Ratio while the connection is in full flight.
Thanks to Tomasz Grobelny for discussion leading up to this patch.
Gerrit Renker [Mon, 17 Nov 2008 06:53:48 +0000 (22:53 -0800)]
dccp: Feature negotiation for minimum-checksum-coverage
This provides feature negotiation for server minimum checksum coverage
which so far has been missing.
Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.
Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.
Gerrit Renker [Mon, 17 Nov 2008 06:51:23 +0000 (22:51 -0800)]
dccp: Deprecate old setsockopt framework
The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.
This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation.
These are essentially wrappers around the internal __feat_register functions,
with checking added to avoid
* wrong usage (type);
* changing values while the connection is in progress.
Gerrit Renker [Mon, 17 Nov 2008 06:49:52 +0000 (22:49 -0800)]
dccp: Mechanism to resolve CCID dependencies
This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.
The concept is documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html#ccid_dependencies
If segmentation offload is enabled by the host, we currently allocate
maximum sized packet buffers and pass them to the host. This uses up
20 ring entries, allowing us to supply only 20 packet buffers to the
host with a 256 entry ring. This is a huge overhead when receiving
small packets, and is most keenly felt when receiving MTU sized
packets from off-host.
The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
using receive buffers which are smaller than the maximum packet size.
In order to transfer large packets to the guest, the host merges
together multiple receive buffers to form a larger logical buffer.
The number of merged buffers is returned to the guest via a field in
the virtio_net_hdr.
Make use of this support by supplying single page receive buffers to
the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
the payload to the skb's linear data buffer and adjust the fragment
offset to point to the remaining data. This ensures proper alignment
and allows us to not use any paged data for small packets. If the
payload occupies multiple pages, we simply append those pages as
fragments and free the associated skbs.
This scheme allows us to be efficient in our use of ring entries
while still supporting large packets. Benchmarking using netperf from
an external machine to a guest over a 10Gb/s network shows a 100%
improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
with GSO disabled on the host side, throughput was seen to increase
from 700Mb/s to 1.7Gb/s.
Mark McLoughlin [Mon, 17 Nov 2008 06:39:18 +0000 (22:39 -0800)]
virtio_net: Recycle some more rx buffer pages
Each time we re-fill the recv queue with buffers, we allocate
one too many skbs and free it again when adding fails. We should
recycle the pages allocated in this case.
A previous version of this patch made trim_pages() trim trailing
unused pages from skbs with some paged data, but this actually
caused a barely measurable slowdown.
Eric Dumazet [Mon, 17 Nov 2008 03:46:36 +0000 (19:46 -0800)]
net: make sure struct dst_entry refcount is aligned on 64 bytes
As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.
We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.
for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0
As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)
Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.
"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)
Eric Dumazet [Mon, 17 Nov 2008 03:40:17 +0000 (19:40 -0800)]
net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls
RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.
This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.
Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.
__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)
Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)
Eric Dumazet [Mon, 17 Nov 2008 03:37:55 +0000 (19:37 -0800)]
rcu: Introduce hlist_nulls variant of hlist
hlist uses NULL value to finish a chain.
hlist_nulls variant use the low order bit set to 1 to signal an end-of-list marker.
This allows to store many different end markers, so that some RCU lockless
algos (used in TCP/UDP stack for example) can save some memory barriers in
fast paths.
Two new files are added :
include/linux/list_nulls.h
- mimics hlist part of include/linux/list.h, derived to hlist_nulls variant
include/linux/rculist_nulls.h
- mimics hlist part of include/linux/rculist.h, derived to hlist_nulls variant
Only four helpers are declared for the moment :
hlist_nulls_del_init_rcu(), hlist_nulls_del_rcu(),
hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry_rcu()
prefetches() were removed, since an end of list is not anymore NULL value.
prefetches() could trigger useless (and possibly dangerous) memory transactions.
Example of use (extracted from __udp4_lib_lookup())
struct sock *sk, *result;
struct hlist_nulls_node *node;
unsigned short hnum = ntohs(dport);
unsigned int hash = udp_hashfn(net, hnum);
struct udp_hslot *hslot = &udptable->hash[hash];
int score, badness;
rcu_read_lock();
begin:
result = NULL;
badness = -1;
sk_nulls_for_each_rcu(sk, node, &hslot->head) {
score = compute_score(sk, net, saddr, hnum, sport,
daddr, dport, dif);
if (score > badness) {
result = sk;
badness = score;
}
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != hash)
goto begin;
if (result) {
if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
result = NULL;
else if (unlikely(compute_score(result, net, saddr, hnum, sport,
daddr, dport, dif) < badness)) {
sock_put(result);
goto begin;
}
}
rcu_read_unlock();
return result;
In case UDP traffic is redirected to a local UDP socket,
the originally addressed destination address/port
cannot be recovered with the in-kernel tproxy.
This patch adds an IP_RECVORIGDSTADDR sockopt that enables
a IP_ORIGDSTADDR ancillary message in recvmsg(). This
ancillary message contains the original destination address/port
of the packet being received.
Ben Greear [Mon, 17 Nov 2008 03:19:38 +0000 (19:19 -0800)]
ipv4: Fix ARP behavior with many mac-vlans
Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans. My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan. I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a replay will be sent or not.
> Maybe the neigh_event code should be below the checks for dont_send,
> and only create check neigh_event_ns if we are !dont_send?
The attached patch makes it work much better for me. The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request. The old code
*would* create a stale entry even if we are not going to respond.
Jeff Layton [Sat, 15 Nov 2008 16:12:47 +0000 (11:12 -0500)]
cifs: reinstate sharing of tree connections
Use a similar approach to the SMB session sharing. Add a list of tcons
attached to each SMB session. Move the refcount to non-atomic. Protect
all of the above with the cifs_tcp_ses_lock. Add functions to
properly find and put references to the tcons.
Jeff Kirsher [Fri, 14 Nov 2008 06:45:23 +0000 (06:45 +0000)]
e1000e: fix IPMI traffic
Some users reported that they have machines with BMCs enabled that cannot
receive IPMI traffic after e1000e is loaded.
http://marc.info/?l=e1000-devel&m=121909039127414&w=2
http://marc.info/?l=e1000-devel&m=121365543823387&w=2
This fixes the issue if they load with the new parameter = 0 by disabling
crc stripping, but leaves the performance feature on for most users.
Based on work done by Hong Zhang.
Jeff Kirsher [Fri, 14 Nov 2008 06:45:07 +0000 (06:45 +0000)]
e1000e: fix warn_on reload after phy_id error
If the driver fails to initialize the first time due to the failure in the
phy_id check the kernel triggers a warn_on on the second try to load the
driver because the driver did not free the msi/x resources in the first
load because of the previous failure in phy_id check.
Paulius Zaleckas [Fri, 14 Nov 2008 00:24:34 +0000 (00:24 +0000)]
phylib: make mdio-gpio work without OF (v4)
make mdio-gpio work with non OpenFirmware gpio implementation.
Aditional changes to mdio-gpio:
- use gpio_request() and gpio_free()
- place irq[] array in struct mdio_gpio_info
- add module description, author and license
- add note about compiling this driver as module
- rename mdc and mdio function (were ugly names)
- change MII to MDIO in bus name
- add __init __exit to module (un)loading functions
- probe fails if no phys added to the bus
- kzalloc bitbang with sizeof(*bitbang)
Changes since v3:
- keep bus naming "%x" to be compatible with existing drivers.
Changes since v2:
- more #ifdefs reduction
- platform driver will be registered on OF platforms also
- unified platform and OF bus_id to phy%i
Changes since v1:
- removed NO_IRQ
- reduced #idefs
Helge Deller [Sun, 16 Nov 2008 23:30:57 +0000 (00:30 +0100)]
unitialized return value in mm/mlock.c: __mlock_vma_pages_range()
Fix an unitialized return value when compiling on parisc (with CONFIG_UNEVICTABLE_LRU=y):
mm/mlock.c: In function `__mlock_vma_pages_range':
mm/mlock.c:165: warning: `ret' might be used uninitialized in this function
Signed-off-by: Helge Deller <[email protected]>
[ It isn't ever really used uninitialized, since no caller should ever
call this function with an empty range. But the compiler is correct
that from a local analysis standpoint that is impossible to see, and
fixing the warning is appropriate. ] Signed-off-by: Linus Torvalds <[email protected]>
Mark Brown [Wed, 12 Nov 2008 16:34:02 +0000 (17:34 +0100)]
mfd: Correct WM8350 I2C return code usage
The vendor BSP used for the WM8350 development provided an I2C driver
which incorrectly returned zero on succesful sends rather than the
number of transmitted bytes, an error which was then propagated into the
WM8350 I2C accessors.
Linus Torvalds [Sun, 16 Nov 2008 18:09:34 +0000 (10:09 -0800)]
acpi: fix oops in acpi_system_wakeup_device_seq_show
Commit 0794469da3f7b2093575cbdfc1108308dd3641ce: ("ACPI: struct device -
replace bus_id with dev_name(), dev_set_name()") introduced a bug by
testing 'dev_name(ldev)' instead of 'ldev->bus' for NULL when printing
out the bus information.
Jesse Brandeburg [Fri, 14 Nov 2008 13:51:54 +0000 (13:51 +0000)]
e100: fix dma error in direction for mapping
The e100 driver triggers BUG_ON(buf->direction != dir)
by doing pci_map_single(..., PCI_DMA_BIDIRECTIONAL)
and pci_dma_sync_single_for_device(..., PCI_DMA_TODEVICE).
Changing the DMA direction, especially with dmabounce will result
in unexpected behaviour.
Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
igb_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.
Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.
Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.
David Brownell [Sun, 16 Nov 2008 08:36:08 +0000 (00:36 -0800)]
pegasus: minor resource shrinkage
Make pegasus driver not allocate a workqueue until the driver
is bound to some device, which will need that workqueue if
the device is brought up. This conserves resources when the
driver is linked but there's no pegasus device connected.
Also shrink the runtime footprint a smidgeon by moving some
init-only code into its proper section, and move an obnoxious
(frequent and meaningless) message to be debug-only.
PJ Waskiewicz [Fri, 7 Nov 2008 12:16:08 +0000 (12:16 +0000)]
ixgbe: Fix usage of netif_*_all_queues() with netif_carrier_{off|on}()
netif_carrier_off() is sufficient to stop Tx into the driver. Stopping the Tx
queues is redundant and unnecessary. By the same token, netif_carrier_on()
will be sufficient to re-enable Tx, so waking the queues is unnecessary.
walimis [Sat, 15 Nov 2008 07:19:06 +0000 (15:19 +0800)]
function tracing: fix wrong pos computing when read buffer has been fulfilled
Impact: make output of available_filter_functions complete
phenomenon:
The first value of dyn_ftrace_total_info is not equal with
`cat available_filter_functions | wc -l`, but they should be equal.
root cause:
When printing functions with seq_printf in t_show, if the read buffer
is just overflowed by current function record, then this function
won't be printed to user space through read buffer, it will
just be dropped. So we can't see this function printing.
So, every time the last function to fill the read buffer, if overflowed,
will be dropped.
This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than read buffer.
fix:
Through checking return value of seq_printf, if less than 0, we know
this function doesn't be printed. Then we decrease position to force
this function to be printed next time, in next read buffer.
Another little fix is to show correct allocating pages count.
Ingo Molnar [Sun, 16 Nov 2008 07:07:15 +0000 (08:07 +0100)]
sched: fix kernel warning on /proc/sched_debug access
Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
triggers when using latencytop:
Linus Torvalds [Sun, 16 Nov 2008 03:02:48 +0000 (19:02 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: don't grab devices with no input
HID: fix radio-mr800 hidquirks
HID: fix kworld fm700 radio hidquirks
HID: fix start/stop cycle in usbhid driver
HID: use single threaded work queue for hid_compat
HID: map macbook keys for "Expose" and "Dashboard"
HID: support for new unibody macbooks
HID: fix locking in hidraw_open()
Al Viro [Sat, 15 Nov 2008 01:15:43 +0000 (01:15 +0000)]
Fix inotify watch removal/umount races
Inotify watch removals suck violently.
To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex. That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount. We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.
Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done. Cleanup is just deactivate_super().
However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords. That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.
We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable. OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e. that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().
So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong. We had
to drop ih->mutex before we could grab ->s_umount. So the watch
could've been gone already.
That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer. If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address. That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that. Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.
So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check. If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.
And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.
Linus Torvalds [Sat, 15 Nov 2008 20:10:32 +0000 (12:10 -0800)]
Merge branch 'sh/for-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
serial: sh-sci: Reorder the SCxTDR write after the TDxE clear.
sh: __copy_user function can corrupt the stack in case of exception
sh: Fixed the TMU0 reload value on resume
sh: Don't factor in PAGE_OFFSET for valid_phys_addr_range() check.
sh: early printk port type fix
i2c: fix i2c-sh_mobile rx underrun
sh: Provide a sane valid_phys_addr_range() to prevent TLB reset with PMB.
usb: r8a66597-hcd: fix wrong data access in SuperH on-chip USB
fix sci type for SH7723
serial: sh-sci: fix cannot work SH7723 SCIFA
sh: Handle fixmap TLB eviction more coherently.
A common reason for device drivers to implement their own printk macros
is the lack of a printk prefix with the standard pr_xyz macros.
Introduce a pr_fmt() macro that is applied for every pr_xyz macro to the
format string.
The most common use of the pr_fmt macro would be to add the name of the
device driver to all pr_xyz messages in a source file.
Linus Torvalds [Sat, 15 Nov 2008 19:38:41 +0000 (11:38 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
V4L/DVB (9624): CVE-2008-5033: fix OOPS on tvaudio when controlling bass/treble
V4L/DVB (9623): tvaudio: Improve debug msg by printing something more human
V4L/DVB (9622): tvaudio: Improve comments and remove a unneeded prototype
V4L/DVB (9621): Avoid writing outside shadow.bytes[] array
V4L/DVB (9620): tvaudio: use a direct reference for chip description
V4L/DVB (9619): tvaudio: update initial comments
V4L/DVB (9618): tvaudio: add additional logic to avoid OOPS
V4L/DVB (9617): tvtime: remove generic_checkmode callback
V4L/DVB (9616): tvaudio: cleanup - group all callbacks together
V4L/DVB (9615): tvaudio: instead of using a magic number, use ARRAY_SIZE
V4L/DVB (9613): tvaudio: fix a memory leak
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] dpt_i2o: fix transferred data length for scsi_set_resid()
[SCSI] scsi_error regression: Fix idempotent command handling
[SCSI] zfcp: Fix hexdump data in s390dbf traces
[SCSI] zfcp: fix erp timeout cleanup for port open requests
[SCSI] zfcp: Wait for port scan to complete when setting adapter online
[SCSI] zfcp: Fix cast warning
[SCSI] zfcp: Fix request list handling in error path
[SCSI] zfcp: fix mempool usage for status_read requests
[SCSI] zfcp: fix req_list_locking.
[SCSI] zfcp: Dont clear reference from SCSI device to unit
[SCSI] qla2xxx: Update version number to 8.02.01-k9.
[SCSI] qla2xxx: Return a FAILED status when abort mailbox-command fails.
[SCSI] qla2xxx: Do not honour max_vports from firmware for 2G ISPs and below.
[SCSI] qla2xxx: Use pci_disable_rom() to manipulate PCI config space.
[SCSI] qla2xxx: Correct Atmel flash-part handling.
[SCSI] megaraid: fix mega_internal_command oops
David Woodhouse [Fri, 14 Nov 2008 13:47:31 +0000 (13:47 +0000)]
Revert "x86: blacklist DMAR on Intel G31/G33 chipsets"
This reverts commit e51af6630848406fc97adbd71443818cdcda297b, which was
wrongly hoovered up and submitted about a month after a better fix had
already been merged.
The better fix is commit cbda1ba898647aeb4ee770b803c922f595e97731
("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
this blacklisting based on the DMI identification for the offending
motherboard, since sometimes this chipset (or at least a chipset with
the same PCI ID) apparently _does_ actually have an IOMMU.
| drivers/misc/c2port/core.c: In function 'c2port_reset':
| drivers/misc/c2port/core.c:73: error: dereferencing pointer to incomplete type
| drivers/misc/c2port/core.c: In function 'c2port_strobe_ck':
| drivers/misc/c2port/core.c:91: error: dereferencing pointer to incomplete type
Include <linux/sched.h> to fix it, as m68k's local_irq_enable() needs to know
about struct task_struct.
m68k: Fix off-by-one in m68k_setup_user_interrupt()
commit 69961c375288bdab7604e0bb1c8d22999bb8a347 ("[PATCH] m68k/Atari:
Interrupt updates") added a BUG_ON() with an incorrect upper bound
comparison, which causes an early crash on VME boards, where IRQ_USER is
8, cnt is 192 and NR_IRQS is 200.
Takashi Iwai [Sat, 15 Nov 2008 18:28:54 +0000 (19:28 +0100)]
ALSA: hda - Check model type instead of SSID in patch_92hd71bxx()
Check board preset model instead of codec->subsystem_id in
patch_92hd71bxx() so that other hardwares configured via the model
option work like the given model.
Linus Torvalds [Sat, 15 Nov 2008 18:20:36 +0000 (10:20 -0800)]
Move "exit_robust_list" into mm_release()
We don't want to get rid of the futexes just at exit() time, we want to
drop them when doing an execve() too, since that gets rid of the
previous VM image too.
Doing it at mm_release() time means that we automatically always do it
when we disassociate a VM map from the task.
Julia Lawall [Fri, 14 Nov 2008 18:08:18 +0000 (19:08 +0100)]
ALSA: sound/pci/pcxhr/pcxhr.c: introduce missing kfree and pci_disable_device
Error handling code following a kzalloc should free the allocated data.
The error handling code is adjusted to call pci_disable_device(pci); as
well, as done later in the function
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@
(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
when != if (...) { <+...x...+> }
x->f = E
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
Matthew Ranostay [Fri, 14 Nov 2008 22:46:22 +0000 (17:46 -0500)]
ALSA: hda: STAC_VREF_EVENT value change
Changed value for STAC_VREF_EVENT from 0x40 to 0x00 because the
unsol response value is only 6-bits width and the former value
was 1<<6 which is an overrun.
[SCSI] dpt_i2o: fix transferred data length for scsi_set_resid()
dpt_i2o.c::adpt_i2o_to_scsi() reads the value at (reply+5) which
should contain the length in bytes of the transferred data. This
would be correct if reply was a u32 *. However it is a void * here,
so we need to read the value at (reply+20) instead.
The value at (reply+5) is usually 0xff0000, which is apparently
'large enough' and didn't cause any trouble until 2.6.27 where
Jeff Layton [Fri, 14 Nov 2008 18:53:46 +0000 (13:53 -0500)]
cifs: reinstate sharing of SMB sessions sans races
We do this by abandoning the global list of SMB sessions and instead
moving to a per-server list. This entails adding a new list head to the
TCP_Server_Info struct. The refcounting for the cifsSesInfo is moved to
a non-atomic variable. We have to protect it by a lock anyway, so there's
no benefit to making it an atomic. The list and refcount are protected
by the global cifs_tcp_ses_lock.
The patch also adds a new routines to find and put SMB sessions and
that properly take and put references under the lock.
Tejun Heo [Thu, 13 Nov 2008 01:04:46 +0000 (10:04 +0900)]
libata: improve phantom device detection
Currently libata uses four methods to detect device presence.
1. PHY status if available.
2. TF register R/W test (only promotes presence, never demotes)
3. device signature after reset
4. IDENTIFY failure detection in SFF state machine
Combination of the above works well in most cases but recently there
have been a few reports where a phantom device causes unnecessary
delay during probe. In both cases, PHY status wasn't available. In
one case, it passed #2 and #3 and failed IDENTIFY with ATA_ERR which
didn't qualify as #4. The other failed #2 but as it passed #3 and #4,
it still caused failure.
In both cases, phantom device reported diagnostic failure, so these
cases can be safely worked around by considering any !ATA_DRQ IDENTIFY
failure as NODEV_HINT if diagnostic failure is set.
Jeff Layton [Fri, 14 Nov 2008 18:44:38 +0000 (13:44 -0500)]
cifs: disable sharing session and tcon and add new TCP sharing code
The code that allows these structs to be shared is extremely racy.
Disable the sharing of SMB and tcon structs for now until we can
come up with a way to do this that's race free.
We want to continue to share TCP sessions, however since they are
required for multiuser mounts. For that, implement a new (hopefully
race-free) scheme. Add a new global list of TCP sessions, and take
care to get a reference to it whenever we're dealing with one.
Commit 46abc02175b3c246dd5141d878f565a8725060c9 ("phylib: give mdio
buses a device tree presence") added a call to device_unregister() in
a situation where the caller did not intend for the device to be
freed yet, but apart from just unregistering the device from the
system, device_unregister() does an additional put_device() that is
intended to free it.
The right function to use in this situation is device_del(), which
unregisters the device from the system like device_unregister() does,
but without dropping the reference count an additional time.
J. K. Cliburn [Sun, 9 Nov 2008 21:05:30 +0000 (15:05 -0600)]
atl1: Do not enumerate options unsupported by chip
Of the various WOL options provided in include/linux/ethtool.h, the
L1 NIC supports only magic packet. Remove all options except magic
packet from the atl1 driver.
Anton Vorontsov [Thu, 13 Nov 2008 18:26:27 +0000 (21:26 +0300)]
net/ucc_geth: Fix oops in uec_get_ethtool_stats()
p_{tx,rx}_fw_statistics_pram are special: they're available only when
a device is open. If the device is closed, we should just fill the data
with zeroes.
Fixes the following oops:
root@b1:~# ifconfig eth1 down
root@b1:~# ethtool -S eth1
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc01e1dcc
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
NIP [c01e1dcc] uec_get_ethtool_stats+0x98/0x124
LR [c0287cc8] ethtool_get_stats+0xfc/0x23c
Call Trace:
[cfaadde0] [c0287ca8] ethtool_get_stats+0xdc/0x23c (unreliable)
[cfaade20] [c0288340] dev_ethtool+0x2fc/0x588
[cfaade50] [c0285648] dev_ioctl+0x290/0x33c
[cfaadea0] [c0272238] sock_ioctl+0x80/0x2ec
[cfaadec0] [c00b5ae4] vfs_ioctl+0x40/0xc0
[cfaadee0] [c00b5fa8] do_vfs_ioctl+0x78/0x20c
[cfaadf10] [c00b617c] sys_ioctl+0x40/0x74
[cfaadf40] [c00142d8] ret_from_syscall+0x0/0x38
[...]
---[ end trace b941007b2dfb9759 ]---
Segmentation fault
p.s. While at it, also remove u64 casts, they aren't needed.
Pavel Emelyanov [Fri, 14 Nov 2008 22:51:45 +0000 (14:51 -0800)]
scm: fix scm_fp_list->list initialization made in wrong place
This is the next page of the scm recursion story (the commit f8d570a4 net: Fix recursive descent in __scm_destroy()).
In function scm_fp_dup(), the INIT_LIST_HEAD(&fpl->list) of newly
created fpl is done *before* the subsequent memcpy from the old
structure and thus the freshly initialized list is overwritten.
But that's OK, since this initialization is not required at all,
since the fpl->list is list_add-ed at the destruction time in any
case (and is unused in other code), so I propose to drop both
initializations, rather than moving it after the memcpy.
Please, correct me if I miss something significant.
Randy Dunlap [Wed, 12 Nov 2008 21:05:17 +0000 (13:05 -0800)]
9p: restrict RDMA usage
linux-next:
Make 9p's RDMA option depend on INET since it uses Infiniband rdma_*
functions and that code depends on INET. Otherwise 9p can try to
use symbols which don't exist.
Randy Dunlap [Thu, 13 Nov 2008 21:33:24 +0000 (21:33 +0000)]
Create/use more directory structure in the Documentation/ tree.
Create Documentation/blockdev/ sub-directory and populate it.
Populate the Documentation/serial/ sub-directory.
Move MSI-HOWTO.txt to Documentation/PCI/.
Move ioctl-number.txt to Documentation/ioctl/.
Update all relevant 00-INDEX files.
Update all relevant Kconfig files and source files.
The uname system call for 64 bit compares current->personality without
masking the upper 16 bits. If e.g. READ_IMPLIES_EXEC is set the result
of a uname system call will always be s390x even if the process uses
the s390 personality.
Heiko Carstens [Fri, 14 Nov 2008 17:18:07 +0000 (18:18 +0100)]
[S390] cpu topology: fix locking
cpu_coregroup_map used to grab a mutex on s390 since it was only
called from process context.
Since c7c22e4d5c1fdebfac4dba76de7d0338c2b0d832 "block: add support
for IO CPU affinity" this is not true anymore.
It now also gets called from softirq context.
To prevent possible deadlocks change this in architecture code and
use a spinlock instead of a mutex.
Cornelia Huck [Fri, 14 Nov 2008 17:18:06 +0000 (18:18 +0100)]
[S390] cio: Fix refcount after moving devices.
In ccw_device_move_to_orphanage(), a replacing ccw_device
is searched via get_{disc,orphaned}_ccwdev_by_dev_id()
which obtain a reference on the returned ccw_device.
This reference must be given up again after the device
has been moved to its new parent.
Heiko Carstens [Fri, 14 Nov 2008 17:18:05 +0000 (18:18 +0100)]
[S390] ftrace: fix kernel stack backchain walking
With CONFIG_IRQSOFF_TRACER the trace_hardirqs_off() function includes
a call to __builtin_return_address(1). But we calltrace_hardirqs_off()
from early entry code. There we have just a single stack frame.
So this results in a kernel stack backchain walk that would walk beyond
the kernel stack. Following the NULL terminated backchain this results
in a lowcore read access.
To fix this we simply call trace_hardirqs_off_caller() and pass the
current instruction pointer.
[S390] kvm_s390: Fix oops in virtio device detection with "mem="
The current virtio model on s390 has the descriptor page above the main
memory. The guest virtio detection will oops if the mem= parameter is
used to reduce/change the memory size.
We have to use real_memory_size instead of max_pfn to detect the virtio
descriptor pages.
Gerald Schaefer [Fri, 14 Nov 2008 17:18:00 +0000 (18:18 +0100)]
[S390] Fix range for add_active_range() in setup_memory()
add_active_range() expects start_pfn + size as end_pfn value, i.e. not
the pfn of the last page frame but the one behind that.
We used the pfn of the last page frame so far, which can lead to a
BUG_ON in move_freepages(), when the kernelcore parameter is specified
(page_zone(start_page) != page_zone(end_page)).
Not all tvaudio chips allow controlling bass/treble. So, the driver
has a table with a flag to indicate if the chip does support it.
Unfortunately, the handling of this logic were broken for a very long
time (probably since the first module version). Due to that, an OOPS
were generated for devices that don't support bass/treble.
This were the resulting OOPS message before the patch, with debug messages
enabled:
V4L/DVB (9623): tvaudio: Improve debug msg by printing something more human
Before the patch, the used ioctl were printed as an hexadecimal code,
hard to be understand without consulting the way _IO macros work.
Instead, use the V4L default handler for printing such errors into a way
that would be easier to understand.