Git Repo - linux.git/log

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Do less agressive buddy clearing
sched: Disable SD_PREFER_LOCAL for MC/CPU domains

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Set DELIVERY_MODE=4 for vector=NMI_VECTOR in uv_hub_send_ipi()
  x86, UV: Fix and clean up bau code to use uv_gpa_to_pnode()
  x86: Don't print number of MCE banks for every CPU
  x86, UV: Fix information in __uv_hub_info structure
  x86: Document linker script ASSERT() quirk

frv: fix check on unsigned in do_signal()

syscallno is unsigned

Signed-off-by: Roel Kluin <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: don't call pte_unmap() against an improper pte

There are some places where we do like:

pte = pte_map();
do {
(do break in some conditions)
} while (pte++, ...);
pte_unmap(pte - 1);

But if the loop breaks at the first loop, pte_unmap() unmaps invalid pte.

This patch is a fix for this problem.

Signed-off-by: Daisuke Nishimura <[email protected]>
Reviewd-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lis3: add support for the HP HDX 18

I have an HP HDX 18 laptop, and noted that the configuration of the
accelerometer needs to be x_inverted.

Signed-off-by: Ian E. Morgan <[email protected]>
Signed-off-by: Éric Piel <[email protected]>
Cc: Pavel Machek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lis3: add support for the HP EliteBook 8530w

Correct orientation for HP EliteBook 8530w.

Reported-by: Jörgen Jonssson <[email protected]>
Signed-off-by: Éric Piel <[email protected]>
Cc: Pavel Machek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lis3: better support for hp 6730x

I have learned that the 6730b and 6730s have different accelerometer
orientation, and have modified the driver accordingly (diff attached),
while dropping the wild guess for AMD based 6735 having the same
orientation as Intel based 6730 (this is not true for any other related
series/family, thus is not probable for 673x).

Signed-off-by: Pavel Herrmann <[email protected]>
Signed-off-by: Éric Piel <[email protected]>
Cc: Pavel Machek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

cpuidle: always return with interrupts enabled

In the case where cpuidle_idle_call() returns before changing state due to
a need_resched(), it was returning with IRQs disabled.

The idle path assumes that the platform specific idle code returns with
interrupts enabled (although this too is undocumented AFAICT) and on ARM
we have a WARN_ON(!(irqs_disabled()) when returning from the idle loop, so
the user-visible effects were only a warning since interrupts were
eventually re-enabled later.

On x86, this same problem exists, but there is no WARN_ON() to detect it.
As on ARM, the interrupts are eventually re-enabled, so I'm not sure of
any actual bugs triggered by this. It's primarily a
correctness/consistency fix.

This patch ensures IRQs are (re)enabled before returning.

Reported-by: Hemanth V <[email protected]>
Signed-off-by: Kevin Hilman <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Tested-by: Martin Michlmayr <[email protected]>
Cc: <[email protected]> [2.6.31.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: version 0.30

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: fix false EXPORT_SYMBOL warning

Ingo reported that the following lines triggered a false warning,

static struct lock_class_key rcu_lock_key;
struct lockdep_map rcu_lock_map =
STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
EXPORT_SYMBOL_GPL(rcu_lock_map);

from kernel/rcutree.c , and the false warning looked like this,

WARNING: EXPORT_SYMBOL(foo); should immediately follow its
function/variable
+EXPORT_SYMBOL_GPL(rcu_lock_map);

We actually should be checking the statement before the EXPORT_* for a
mention of the exported object, and complain where it is not there.

[[email protected]: coding-style fixes]
Cc: Ingo Molnar <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Reported-by: Daniel Walker <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: fix __attribute__ matching

In the following code,

union thread_union init_thread_union
__attribute__((__section__(".data.init_task"))) =
{ INIT_THREAD_INFO(init_task) };

There is a non-conforming declaration. It should really be like the
following,

union thread_union init_thread_union
__attribute__((__section__(".data.init_task"))) = {
INIT_THREAD_INFO(init_task)
};

However, checkpatch doesn't catch this right now because it doesn't
correctly evaluate the "__attribute__".

It is not at all clear that we care what preceeds an assignment style
attribute when we find the open brace. Relax the test so we do not need
to check the __attribute__.

Reported-by: Daniel Walker <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: fix false errors due to macro concatenation

The macro concatenation (##) sequence can cause false errors when checking
macro's.  Checkpatch doesn't currently know about the operator.

For example this line,

+ entry = (struct ftrace_raw_##call *)raw_data;                   \

is correct but it produces the following error,

ERROR: need consistent spacing around '*' (ctx:WxB)
+       entry = (struct ftrace_raw_##call *)raw_data;\
                                          ^

The line above doesn't have any spacing problems, and if you remove the
macro concatenation sequence checkpatch doesn't give any errors.

Extend identifier handling to include ## concatenation within the
definition of an identifier.

Reported-by: Daniel Walker <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: update copyright dates

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: correctly stop scanning at the bottom of a hunk

We are allowing context scanning checks to apply against the first line of
context outside at the end of the hunk. This can lead to false matches to
patch names leading to various perl warnings. Correctly stop at the
bottom of the hunk.

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: possible types -- prevent illegal modifiers being added

Prevent known non types being detected as modifiers. Ensure we do not
look at any type which starts with a keyword.

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix sparsemem configuration

Currently, sparsemem is only available if EXPERIMENTAL is enabled.
However, it hasn't ever been marked experimental.

It's been about four years since sparsemem was merged, and we have
platforms which depend on it; allow architectures to decide whether
sparsemem should be the default memory model.

Signed-off-by: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

edac: i5100 fix initialization code

Allow csrows to properly initialize when the topology only has active
channels on 2 and 3.  This new check allows proper detection and
initialization in this topology.  Only checking the first mrt that
represented channels 0 and 1 is not sufficient.

I also fixed up the related debug information path.  I can submit as a 2nd
patch if needed.

Signed-off-by: Keith Mannthey <[email protected]>
Acked-by: Aristeu Rozanski <[email protected]>
Signed-off-by: Doug Thompson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

edac: i5400 fix missing CONFIG_PCI define

When building without CONFIG_PCI the edac_pci_idx variable is unused,
causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI,
just like the rest of the PCI support.

Signed-off-by: Ira W. Snyder <[email protected]>
Signed-off-by: Doug Thompson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

edac: i5400 fix csrow mapping

The i5400 EDAC driver has several bugs with chip-select row computation
which most likely lead to bugs in detailed error reporting.  Attempts to
contact the authors have gone mostly unanswered so I am presenting my diff
here.  I do not subscribe to lkml and would appreciate being kept in the
cc.

The most egregious problem was miscalculating the addresses of MTR
registers after register 0 by assuming they are 32bit rather than 16.
This caused the driver to miss half of the memories.  Most motherboards
tend to have only 8 dimm slots and not 16, so this may not have been
noticed before.

Further, the row calculations multiplied the number of dimms several
times, ultimately ending up with a maximum row of 32.  The chipset only
supports 4 dimms in each of 4 channels, so csrow could not be higher than
4 unless you use a row per-rank with dual-rank dimms.  I opted to
eliminate this behavior as it is confusing to the user and the error
reporting works by slot and not rank.  This gives a much clearer view of
memory by slot and channel in /sys.

Signed-off-by: Jeff Roberson <[email protected]>
Signed-off-by: Doug Thompson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

sysctl: fix false positives when PROC_SYSCTL=n

Having ->procname but not ->proc_handler is valid when PROC_SYSCTL=n,
people use such combination to reduce ifdefs with non-standard handlers.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14408

Signed-off-by: Alexey Dobriyan <[email protected]>
Reported-by: Peter Teoh <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hwmon: enhance the sysfs API for power meters

Augment the documentation of the hwmon sysfs API to accomodate ACPI power
meters and the current desired behavior of power capping hardware drivers.

Signed-off-by: Darrick J. Wong <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

twl4030-gpio: remove __devexit markings from remove func

The gpio_twl4030_probe() function calls gpio_twl4030_remove(), and the
former has __devinit, so the latter cannot use __devexit. Otherwise we
hit the section mismatch warning:

WARNING: drivers/gpio/built-in.o(.devinit.text+0x71a): Section mismatch
in reference from the function _gpio_twl4030_probe() to the function
.devexit.text:_gpio_twl4030_remove()
The function __devinit _gpio_twl4030_probe() references a function
__devexit _gpio_twl4030_remove().
This is often seen when error handling in the init function uses
functionality in the exit path.
The fix is often to remove the __devexit annotation of
_gpio_twl4030_remove() so it may be used outside an exit section.

Signed-off-by: Mike Frysinger <[email protected]>
Cc: David Brownell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

8250_pci: add IBM Saturn serial card

The IBM Saturn serial card has only one port. Without that fixup,
the kernel thinks it has two, which confuses userland setup and
admin tools as well.

[[email protected]: fix pci-ids.h layout]
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Acked-by: Alan Cox <[email protected]>
Cc: Michael Reed <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

serial: add ADDI-DATA GmbH PCI-Express communication cards in 8250_pci.c and pci_ids.h

Add support for ADDI-DATA GmbH PCI-Express communication cards:

APCIe-7300
APCIe-7420
APCIe-7500
APCIe-7800

Warning: 8250_pci.c depends on pci_ids.h. 8250_pci.c

Signed-off-by: Krauth Julien <[email protected]>
Acked-by: Alan Cox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

atmel_serial: fix bad BUILD_BUG_ON() usage

is_power_of_2() appears not to be constant enough for BUILD_BUG_ON()
after the latest rework, so replace it with an open-coded test.

Signed-off-by: Haavard Skinnemoen <[email protected]>
Cc: Uwe Kleine-König <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Nicolas Ferre <[email protected]>
Cc: Claudio Scordino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

vmscan: order evictable rescue in LRU putback

Isolators putting a page back to the LRU do not hold the page lock, and if
the page is mlocked, another thread might munlock it concurrently.

Expecting this, the putback code re-checks the evictability of a page when
it just moved it to the unevictable list in order to correct its decision.

The problem, however, is that ordering is not garuanteed between setting
PG_lru when moving the page to the list and checking PG_mlocked
afterwards:

#0: #1

spin_lock()
if (TestClearPageMlocked())
  if (PageLRU())
    move to evictable list
SetPageLRU()
spin_unlock()
if (!PageMlocked())
  move to evictable list

The PageMlocked() check may get reordered before SetPageLRU() in #0,
resulting in #0 not moving the still mlocked page, and in #1 failing to
isolate and move the page as well.  The page is now stranded on the
unevictable list.

The race condition is very unlikely.  The consequence currently is one
page falling off the reclaim grid and eventually getting freed with
PG_unevictable set, which triggers a warning in the page allocator.

TestClearPageMlocked() in #1 already provides full memory barrier
semantics.

This patch adds an explicit full barrier to force ordering between
SetPageLRU() and PageMlocked() so that either one of the competitors
rescues the page.

Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

do_mbind(): fix memory leak

If migrate_prep is failed, new variable is leaked. This patch fixes it.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mbind(): fix leak of never putback pages

If mbind() receives an invalid address, do_mbind leaks a page. The
following test program detects this leak.

This patch fixes it.

migrate_efault.c
=======================================
#include <numaif.h>
#include <numa.h>
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

static unsigned long pagesize;

static void* make_hole_mapping(void)
{

void* addr;

addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
MAP_ANON|MAP_PRIVATE, 0, 0);
if (addr == MAP_FAILED)
return NULL;

/* make page populate */
memset(addr, 0, pagesize*3);

/* make memory hole */
munmap(addr+pagesize, pagesize);

return addr;
}

int main(int argc, char** argv)
{
void* addr;
int ch;
int node;
struct bitmask *nmask = numa_allocate_nodemask();
int err;
int node_set = 0;

while ((ch = getopt(argc, argv, "n:")) != -1){
switch (ch){
case 'n':
node = strtol(optarg, NULL, 0);
numa_bitmask_setbit(nmask, node);
node_set = 1;
break;
default:
;
}
}
argc -= optind;
argv += optind;

if (!node_set)
numa_bitmask_setbit(nmask, 0);

pagesize = getpagesize();

addr = make_hole_mapping();

err = mbind(addr, pagesize*3, MPOL_BIND, nmask->maskp, nmask->size, MPOL_MF_MOVE_ALL);
if (err)
perror("mbind ");

return 0;
}
=======================================

Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfs: fix oops on mount with corrupted btree extent records

A particular fsfuzzer run caused an hfs file system to crash on mount.
This is due to a corrupted MDB extent record causing a miscalculation of
HFS_I(inode)->first_blocks for the extent tree. If the extent records are
zereod out, it won't trigger the first_blocks special case. Instead it
falls through to the extent code which we're still in the middle of
initializing.

This patch catches the 0 size extent records, reports the corruption, and
fails the mount.

Reported-by: Ramon de Carvalho Valle <[email protected]>
Signed-off-by: Jeff Mahoney <[email protected]>
Cc: Valdis Kletnieks <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

loop: fix NULL dereference if mount fails

Commit bb21488482bd36eae6b30b014d93619063773fd4 ("[PATCH] switch loop")
started to pass NULL bdev to ioctl hook.

Steps to reproduce:

[boot with loop.max_part=1]
[mount -o loop something so mount fails]

BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
IP: [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
PGD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:35/ACPI0003:00/power_supply/ACAD/online
CPU 0
Modules linked in: zfs nvidia(P) [last unloaded: zfs]
Pid: 15177, comm: mount Tainted: P           2.6.32-rc4-zfs #2 Satellite X200
RIP: 0010:[<ffffffff811486ee>]  [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
RSP: 0018:ffff88003b3d5bb8  EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000125f RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88003b3d5ce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffffffff000
R13: 0000000000000000 R14: ffff880071cef280 R15: 00000000000200da
FS:  00007fd77cfe7740(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000b8 CR3: 0000000001001000 CR4: 00000000000026f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount (pid: 15177, threadinfo ffff88003b3d4000, task ffff88007572f920)
Stack:
ffff88003b3d5c38 ffffffff812f95f5 ffff88007eeb6600 0000000000000000
<0> 0000000000000000 ffff88003b3d5c18 ffffffff811547d9 ffff88001bf11ef0
<0> 7fffffffffffffff ffff88001bf11ee8 ffff88001bf11ef0 0000000000000000
Call Trace:
[<ffffffff812f95f5>] ? schedule_timeout+0x1f5/0x250
[<ffffffff811547d9>] ? rb_insert_color+0x109/0x140
[<ffffffff812fb754>] ? _spin_unlock_irq+0x14/0x40
[<ffffffff812f84c6>] ? wait_for_common+0x66/0x170
[<ffffffff8105a280>] ? default_wake_function+0x0/0x10
[<ffffffff810f8258>] ioctl_by_bdev+0x38/0x50
[<ffffffff811d2481>] loop_clr_fd+0x1e1/0x210
[<ffffffff811d2522>] lo_release+0x72/0x80
[<ffffffff810f934c>] __blkdev_put+0x1ac/0x1d0
[<ffffffff810f937b>] blkdev_put+0xb/0x10
[<ffffffff810f93b9>] blkdev_close+0x39/0x60
[<ffffffff810ccef3>] __fput+0xd3/0x230
[<ffffffff810cd06d>] fput+0x1d/0x30
[<ffffffff810c9680>] filp_close+0x50/0x80
[<ffffffff81061f11>] put_files_struct+0x81/0x100
[<ffffffff81061fde>] exit_files+0x4e/0x60
[<ffffffff81063ec5>] do_exit+0x6b5/0x730
[<ffffffff8107b279>] ? up_read+0x9/0x10
[<ffffffff8104c86e>] ? do_page_fault+0x18e/0x2a0
[<ffffffff81063f81>] do_group_exit+0x41/0xc0
[<ffffffff81064012>] sys_exit_group+0x12/0x20
[<ffffffff81030deb>] system_call_fastpath+0x16/0x1b
Code: f8 48 89 e5 48 81 ec 30 01 00 00 48 89 5d d8 4c 89 6d e8 4c 89 65 e0 4c 89 75 f0 4c 89 7d f8 48 89 bd e8 fe ff ff 49 89 cd 89 f3 <49> 8b 88 b8 00 00 00 81 fa 68 12 00 00 0f 84 57 05 00 00 0f 86
RIP  [<ffffffff811486ee>] blkdev_ioctl+0x2e/0xa30
RSP <ffff88003b3d5bb8>
CR2: 00000000000000b8
---[ end trace c0b4d3c3118d1427 ]---
Fixing recursive fault but reboot is needed!

Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

vmscan: limit VM_EXEC protection to file pages

It is possible to have !Anon but SwapBacked pages, and some apps could
create huge number of such pages with MAP_SHARED|MAP_ANONYMOUS. These
pages go into the ANON lru list, and hence shall not be protected: we only
care mapped executable files. Failing to do so may trigger OOM.

Tested-by: Christian Borntraeger <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

revert "mm: oom analysis: add buffer cache information to show_free_areas()"

Revert

    commit 71de1ccbe1fb40203edd3beb473f8580d917d2ca
    Author:     KOSAKI Motohiro <[email protected]>
    AuthorDate: Mon Sep 21 17:01:31 2009 -0700
    Commit:     Linus Torvalds <[email protected]>
    CommitDate: Tue Sep 22 07:17:27 2009 -0700

        mm: oom analysis: add buffer cache information to show_free_areas()

show_free_areas() is called during page allocation failures, and page
allocation failures can occur in any calling context.

But nr_blockdev_pages() takes VFS locks which should not be taken from
hard IRQ context (at least).  The result is lockdep warnings (and
deadlockability) during page allocation failures.

Cc: KOSAKI Motohiro <[email protected]>
Cc: Wu Fengguang <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hfsplus: refuse to mount volumes larger than 2TB

As found in <http://bugs.debian.org/550010>, hfsplus is using type u32
rather than sector_t for some sector number calculations.

In particular, hfsplus_get_block() does:

        u32 ablock, dblock, mask;
...
        map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask));

I am not confident that I can find and fix all cases where a sector number
may be truncated.  For now, avoid data loss by refusing to mount HFS+
volumes with more than 2^32 sectors (2TB).

[[email protected]: fix 32 and 64-bit issues]
Signed-off-by: Ben Hutchings <[email protected]>
Cc: Eric Sesterhenn <[email protected]>
Cc: Roman Zippel <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: rt2x00 list is moderated

Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: add Open Firmware / Flattened Device Tree entry

Signed-off-by: Grant Likely <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: document new "K:" entry type

K: is for keyword. Syntax is perl extended regex.

Reorganized header documentation and indent the section entry descriptions
so that the first K: would not be considered a regex to match by
get_maintainer.pl

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/get_maintainer.pl: add patch/file search for keywords

Based on an idea from Wolfram Sang.

Add search for MAINTAINERS line "K:" regex pattern match in a patch or file
Matches are added after file pattern matches
Add --keywords command line switch (default 1, on)
Change version to 0.21

Signed-off-by: Joe Perches <[email protected]>
Cc: Wolfram Sang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update WOLFSON MICROELECTRONICS

Integrate P:/M: lines
Remove L: [email protected]

Signed-off-by: Joe Perches <[email protected]>
Cc: Mark Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: fix up PERIPHERAL spelling

Signed-off-by: Joe Perches <[email protected]>
Cc: Li Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: WINBOND CIR - Integrate P:/M: lines, fixup David Härdeman's name

Signed-off-by: Joe Perches <[email protected]>
Cc: David Härdeman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: SIMPLE FIRMWARE INTERFACE: update email style

Signed-off-by: Joe Perches <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update SCORE architecture name style and add file pattern

Signed-off-by: Joe Perches <[email protected]>
Cc: Chen Liqin <[email protected]>
Cc: Lennox Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update Kernel Janitors after mismerge

Fix the mismerge of the W: URL and the S: status fields.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: use tab not spaces after field types

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: change ATM mailing list to moderated

Signed-off-by: Joe Perches <[email protected]>
Cc: Chas Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update OMAP Tony Lindgren email name

Which had an embedded and duplicated email address

Signed-off-by: Joe Perches <[email protected]>
Cc: Tony Lindgren <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update TRACING section

Move to alphabetic position
Use single line F: entries

Signed-off-by: Joe Perches <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update GENERIC UIO FOR PCI DEVICES

Quote a name with a period
remove L: [email protected]

Signed-off-by: Joe Perches <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

omap_hsmmc: add missing probe handler hook

The missing probe handler hook will never probe the driver. Add it back.
Fixes broken MMC on OMAP.

We use platform_driver_probe() API since omap_hsmmc is not a hot-pluggable
device.

Signed-off-by: Roger Quadros <[email protected]>
Tested-by: Felipe Contreras <[email protected]>
Tested-by: Tony Lindgren <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Felipe Contreras <[email protected]>
Cc: Denis Karpov <[email protected]>
Cc: Madhusudhan Chikkature <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

strstrip(): mark as as must_check

strstrip() can return a modified value of its input argument, when
removing elading whitesapce. So it is surely bug for this function's
return value to be ignored. The caller is probably going to use the
incorrect original pointer.

So mark it __must_check to prevent this frm happening (as it has before).

Signed-off-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

cgroup: fix strstrip() misuse

cgroup_write_X64() and cgroup_write_string() ignore the return value of
strstrip(). it makes small inconsistent behavior.

example:
=========================
# cd /mnt/cgroup/hoge
# cat memory.swappiness
60
# echo "59 " > memory.swappiness
# cat memory.swappiness
59
# echo " 58" > memory.swappiness
bash: echo: write error: Invalid argument

This patch fixes it.

Cc: Li Zefan <[email protected]>
Acked-by: Paul Menage <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

congestion_wait(): don't use WRITE

commit 8aa7e847d (Fix congestion_wait() sync/async vs read/write
confusion) replace WRITE with BLK_RW_ASYNC. Unfortunately, concurrent mm
development made the unchanged place accidentally.

This patch fixes it too.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

connector: fix regression introduced by sid connector

Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([<000000013fe04100>] 0x13fe04100)
[<000000000048a946>] sk_filter+0x9a/0xd0
[<000000000049d938>] netlink_broadcast+0x2c0/0x53c
[<00000000003ba9ae>] cn_netlink_send+0x272/0x2b0
[<00000000003baef0>] proc_sid_connector+0xc4/0xd4
[<0000000000142604>] __set_special_pids+0x58/0x90
[<0000000000159938>] sys_setsid+0xb4/0xd8
[<00000000001187fe>] sysc_noemu+0x10/0x16
[<00000041616cb266>] 0x41616cb266

The warning is
---> WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err > 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger <[email protected]>
Cc: Scott James Remnant <[email protected]>
Cc: Matt Helsley <[email protected]>
Cc: David S. Miller <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Acked-by: Evgeniy Polyakov <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hwpoison: fix/proc/meminfo alignment

Given such a long name, the kB count in /proc/meminfo's HardwareCorrupted
line is being shown too far right (it does align with x86_64's VmallocChunk
above, but I hope nobody will ever have that much corrupted!). Align it.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hwpoison: fix oops on ksm pages

Memory failure on a KSM page currently oopses on its NULL anon_vma in
page_lock_anon_vma(): that may not be much worse than the consequence of
ignoring it, but it is better to be consistent with how ZERO_PAGE and
hugetlb pages and other awkward cases are treated. Just skip it.

We could fix it for 2.6.32 at the KSM end, by putting a dummy anon_vma
pointer in there; but that would get harder next time, when KSM will put a
pointer to something else there (and I'm not currently planning to do any
work to open that up to memory_failure). So I would prefer this simple
PageKsm test, until the other exceptions are handled.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

cpufreq: add cpufreq_get() stub for CONFIG_CPU_FREQ=n

When CONFIG_CPU_FREQ is disabled, cpufreq_get() needs a stub. Used by kvm
(although it looks like a bit of the kvm code could be omitted when
CONFIG_CPU_FREQ is disabled).

arch/x86/built-in.o: In function `kvm_arch_init':
(.text+0x10de7): undefined reference to `cpufreq_get'

(Needed in linux-next's KVM tree, but it's correct in 2.6.32).

Signed-off-by: Randy Dunlap <[email protected]>
Tested-by: Eric Paris <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Dave Jones <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

[S390] smp: fix sigp sense handling

sigp sense only returns the status of a cpu if it is non zero. If the
status of the sensed cpu is all zeros condition code 0 (accpeted) is
set and no status bits are returned.
The current code however assumes that a status was returned and tests
bits in it. This means uninitalized data is accessed with random
results.
Worst case is that the code that checks if cpu is offline on cpu
hotplug assumes that the target cpu is offline while it is still
running. This leads potentially to memory corruption since resources
that are still needed by the target cpu will be freed and could be
resused while still in use.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] smp: fix sigp stop handling

According to the architecture a cpu must not necessarily enter stopped
state after completion of a sigp instruction with "stop" order code.
So remove the BUG() statement after self sending sigp stop to avoid
that it ever gets reached.
Also add a sigp busy check to make sure that the order gets delivered.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] cputime: fix overflow on 31 bit systems

The cputime_to_msecs / cputime_to_clock_t and cputime64_to_clock_t
cause fixpoint divide exceptions if the cputime is too large.
On a machine that collected 49.7 days worth of idle time reading
from /proc/stat will generate oopses like this:

Kernel BUG at 001b0c92 [verbose debug info unavailable]
fixpoint divide exception: 0009 [#13] SMP
Modules linked in: ipv6
CPU: 1 Tainted: G      D   2.6.27.10 #5
Process cat (pid: 21352, task: 1fb34138, ksp: 1d2a3d98)
Krnl PSW : 070c2000 801b0c92 (show_stat+0x2ca/0x68c)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0
Krnl GPRS: 00000001 00001388 00000bb8 0015d2a1
           00000000 00000000 000003e8 0001fd91
           00000000 00000000 0000129d eecd2ff0
           1cc533b9 0036f780 801b0bce 1d2a3cc0
Krnl Code: 801b0c86: f18890abf198       mvo     171(9,%r9),408(9,%r15)
           801b0c8c: 98abf170           lm      %r10,%r11,368(%r15)
           801b0c90: 1da1               dr      %r10,%r1
          >801b0c92: 90abf170           stm     %r10,%r11,368(%r15)
           801b0c96: 98abf190           lm      %r10,%r11,400(%r15)
           801b0c9a: 1da1               dr      %r10,%r1
           801b0c9c: 90abf190           stm     %r10,%r11,400(%r15)
           801b0ca0: 18a3               lr      %r10,%r3
Call Trace:
([<00000000001b09f4>] show_stat+0x2c/0x68c)
[<000000000018dcee>] seq_read+0xb2/0x364
[<00000000001a9980>] proc_reg_read+0x68/0x98
[<00000000001705ee>] vfs_read+0x6e/0xe8
[<0000000000170732>] sys_read+0x36/0x78
[<000000000010f750>] sysc_do_restart+0x12/0x16
[<0000000077f3ad6a>] 0x77f3ad6a
<4>---[ end trace 1436ea9559d3de9e ]---

Reported-by: Mike Frysinger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] call home: fix string length handling

After copying uts->nodename to the static nodename array the static
version isn't necessarily zero termininated, since the size of the
array is one byte too short.
Afterwards doing strncat(data, nodename, strlen(nodename)); may copy
an arbitrary large amount of bytes.
Fix this by getting rid of the static array and using strncat with
proper length limit.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] call home: fix error handling in init function

Fix missing unregister_sysctl_table in case the SCLP doesn't provide
the requested feature. Also simplify the whole error handling while
at it.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] smp: fix prefix handling of offlined cpus

Offlined cpus still have valid prefix register contents. Dumpers
will store the register contents of a cpu to the location where its
prefix register points to.
For offlined cpus the area (lowcore) has been freed and the dumper
would write the uninteresting contents of the offline cpu to a memory
location which might be in use by some other component and destroy
valueable information.
To fix this set the prefix register of offline cpus to absolute
address zero again. This prevents the current dumpers to write to
random memory locations.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] s/r: cmm resume fix

If a suspended z/VM guest has been logged off before the resume the
'SET SMSG IUCV' CP command need to be repeated to reenable sending
message via SMSG. This fixes the following error:

HCPMFS057I H4214002 not receiving; SMSG off
Error: non-zero CP response for command 'SMSG H4214002 CMM SHRINK 5010': #57

Signed-off-by: Martin Schwidefsky <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

[S390] call home: fix local buffer usage in proc handler

Fix the size of the local buffer and use snprintf to prevent
further miscalculations. Also fix the usage of bitwise vs logic
operations.

Signed-off-by: Sebastian Ott <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

blkdev: flush disk cache on ->fsync

Currently there is no barrier support in the block device code.  That
means we cannot guarantee any sort of data integerity when using the
block device node with dis kwrite caches enabled.  Using the raw block
device node is a typical use case for virtualization (and I assume
databases, too).  This patch changes block_fsync to issue a cache flush
and thus make fsync on block device nodes actually useful.

Note that in mainline we would also need to add such code to the
->aio_write method for O_SYNC handling, but assuming that Jan's patch
series for the O_SYNC rewrite goes in it will also call into ->fsync
for 2.6.32.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: move bdi/address_space unplug functions to backing-dev.h

There's nothing block related about them, the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.

Signed-off-by: Jens Axboe <[email protected]>

backing-dev: ensure that a removed bdi no longer has super_block referencing it

When the bdi is being removed, we have to ensure that no super_blocks
currently have that cached in sb->s_bdi. Normally this is ensured by
the sb having a longer life span than the bdi, but if the device is
suddenly yanked, we have to kill this reference. sb->s_bdi is pointed
to freed memory at that point.

This fixes a problem with sync(1) hanging when a USB stick is pulled
without cleanly umounting it first.

Reported-by: Pavel Machek <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

net: Fix 'Re: PACKET_TX_RING: packet size is too long'

Currently PACKET_TX_RING forces certain amount of every frame to remain
unused. This probably originates from an early version of the
PACKET_TX_RING patch that in fact used the extra space when the (since
removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current
code does not make any use of this extra space.

This patch removes the extra space reservation and lets userspace make
use of the full frame size.

Signed-off-by: Gabor Gombas <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ide: Serialize CMD643 and CMD646 to fix a hardware bug with SSD

CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is
connected to one of the channels.

Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3.
/dev/hda is 8GB Seagate ST38410A in MWDMA2
/dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2

- When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda
  are corrupted, there are twiddled single bits 1->0 and some full 32-bit
  words corrupted, sometimes commands fail (which switches /dev/hda to
  PIO mode but the corruptions happen even in PIO).
- Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine).
- When I connected normal rotating harddisk to /dev/hdd, there was no
  corruption, so the corruption is something specific to SSD.
- I tried the same setup on a PCI card with CMD649 and saw no corruption.

This patch serializes the operation for CMD646 and 643 (I didn't test
CMD643 but it may have the same hw bug too because it's earlier design).
CMD649 is good. I don't know anything about CMD 648.

Signed-off-by: Mikulas Patocka <[email protected]>
Tested-by: Frans Pop <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netdev: usb: dm9601.c can drive a device not supported yet, add support for it

I found that the current version of drivers/net/usb/dm9601.c can be used to
successfully drive a low-power, low-cost network adapter with USB ID
0a46:9000, based on a DM9000E chipset. As no device with this ID is yet
present in the kernel, I have created a patch that adds support for the device
to the dm9601 driver.

Created and tested against linux-2.6.32-rc5.

Signed-off-by: Janusz Krzysztofik <[email protected]>
Acked-by: Peter Korsgaard <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qlge: Fix firmware mailbox command timeout.

The mailbox command process would only process a maximum of 5 unrelated
firmware events while waiting for it's command completion status.
It should process an unlimited number of events while waiting for a maximum of 5 seconds.

Signed-off-by: Ron Mercer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qlge: Fix EEH handling.

Clean up driver resources without touch the hardware. Add pci
save/restore state.

Signed-off-by: Ron Mercer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)

Augment raw_send_hdrinc to correct for incorrect ip header length values

A series of oopses was reported to me recently.  Apparently when using AF_RAW
sockets to send data to peers that were reachable via ipsec encapsulation,
people could panic or BUG halt their systems.

I've tracked the problem down to user space sending an invalid ip header over an
AF_RAW socket with IP_HDRINCL set to 1.

Basically what happens is that userspace sends down an ip frame that includes
only the header (no data), but sets the ip header ihl value to a large number,
one that is larger than the total amount of data passed to the sendmsg call.  In
raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr
that was passed in, but assume the data is all valid.  Later during ipsec
encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff
to provide headroom for the ipsec headers.  During this operation, the
skb->transport_header is repointed to a spot computed by
skb->network_header + the ip header length (ihl).  Since so little data was
passed in relative to the value of ihl provided by the raw socket, we point
transport header to an unknown location, resulting in various crashes.

This fix for this is pretty straightforward, simply validate the value of of
iph->ihl when sending over a raw socket.  If (iph->ihl*4U) > user data buffer
size, drop the frame and return -EINVAL.  I just confirmed this fixes the
reported crashes.

Signed-off-by: Neil Horman <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6

bonding: fix a race condition in calls to slave MII ioctls

In mii monitor mode, bond_check_dev_link() calls the the ioctl
handler of slave devices. It stores the ndo_do_ioctl function
pointer to a static (!) ioctl variable and later uses it to call the
handler with the IOCTL macro.

If another thread executes bond_check_dev_link() at the same time
(even with a different bond, which none of the locks prevent), a
race condition occurs. If the two racing slaves have different
drivers, this may result in one driver's ioctl handler being
called with a pointer to a net_device controlled with a different
driver, resulting in unpredictable breakage.

Unless I am overlooking something, the "static" must be a
copy'n'paste error (?).

Signed-off-by: Jiri Bohac <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

param: fix setting arrays of bool

We create a dummy struct kernel_param on the stack for parsing each
array element, but we didn't initialize the flags word. This matters
for arrays of type "bool", where the flag indicates if it really is
an array of bools or unsigned int (old-style).

Reported-by: Takashi Iwai <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Cc: [email protected]

param: fix NULL comparison on oom

kp->arg is always true: it's the contents of that pointer we care about.

Reported-by: Takashi Iwai <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Cc: [email protected]

param: fix lots of bugs with writing charp params from sysfs, by leaking mem.

e180a6b7759a "param: fix charp parameters set via sysfs" fixed the case
where charp parameters written via sysfs were freed, leaving drivers
accessing random memory.

Unfortunately, storing a flag in the kparam struct was a bad idea: it's
rodata so setting it causes an oops on some archs.  But that's not all:

1) module_param_array() on charp doesn't work reliably, since we use an
   uninitialized temporary struct kernel_param.
2) there's a fundamental race if a module uses this parameter and then
   it's changed: they will still access the old, freed, memory.

The simplest fix (ie. for 2.6.32) is to never free the memory.  This
prevents all these problems, at cost of a memory leak.  In practice, there
are only 18 places where a charp is writable via sysfs, and all are
root-only writable.

Reported-by: Takashi Iwai <[email protected]>
Cc: Sitsofe Wheeler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Christof Schmitt <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Cc: [email protected]

virtio: order used ring after used index read

On SMP guests, reads from the ring might bypass used index reads. This
causes guest crashes because host writes to used index to signal ring
data readiness. Fix this by inserting rmb before used ring reads.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Cc: [email protected]

virtio-pci: fix per-vq MSI-X request logic

Commit f68d24082e22ccee3077d11aeb6dc5354f0ca7f1
in 2.6.32-rc1 broke requesting IRQs for per-VQ MSI-X vectors:
- vector number was used instead of the vector itself
- we try to request an IRQ for VQ which does not
have a callback handler

This is a regression that causes warnings in kernel log,
potentially lower performance as we need to scan vq list,
and might cause system failure if the interrupt
requested is in fact needed by another system.

This was not noticed earlier because in most cases
we were falling back on shared interrupt for all vqs.

The warnings often look like this:

virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X
virtio-pci 0000:00:03.0: irq 28 for MSI/MSI-X
IRQ handler type mismatch for IRQ 1
current handler: i8042
Pid: 2400, comm: modprobe Tainted: G W
2.6.32-rc3-11952-gf3ed8d8-dirty #1
Call Trace:
[<ffffffff81072aed>] ? __setup_irq+0x299/0x304
[<ffffffff81072ff3>] ? request_threaded_irq+0x144/0x1c1
[<ffffffff813455af>] ? vring_interrupt+0x0/0x30
[<ffffffff81346598>] ? vp_try_to_find_vqs+0x583/0x5c7
[<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net]
[<ffffffff81346609>] ? vp_find_vqs+0x2d/0x83
[<ffffffff81345d00>] ? vp_get+0x3c/0x4e
[<ffffffffa0016373>] ? virtnet_probe+0x2f1/0x428 [virtio_net]
[<ffffffffa0015188>] ? skb_recv_done+0x0/0x34 [virtio_net]
[<ffffffffa00150d8>] ? skb_xmit_done+0x0/0x39 [virtio_net]
[<ffffffff8110ab92>] ? sysfs_do_create_link+0xcb/0x116
[<ffffffff81345cc2>] ? vp_get_status+0x14/0x16
[<ffffffff81345464>] ? virtio_dev_probe+0xa9/0xc8
[<ffffffff8122b11c>] ? driver_probe_device+0x8d/0x128
[<ffffffff8122b206>] ? __driver_attach+0x4f/0x6f
[<ffffffff8122b1b7>] ? __driver_attach+0x0/0x6f
[<ffffffff8122a9f9>] ? bus_for_each_dev+0x43/0x74
[<ffffffff8122a374>] ? bus_add_driver+0xea/0x22d
[<ffffffff8122b4a3>] ? driver_register+0xa7/0x111
[<ffffffffa001a000>] ? init+0x0/0xc [virtio_net]
[<ffffffff81009051>] ? do_one_initcall+0x50/0x148
[<ffffffff8106e117>] ? sys_init_module+0xc5/0x21a
[<ffffffff8100af02>] ? system_call_fastpath+0x16/0x1b
virtio-pci 0000:00:03.0: irq 26 for MSI/MSI-X
virtio-pci 0000:00:03.0: irq 27 for MSI/MSI-X

Reported-by: Marcelo Tosatti <[email protected]>
Reported-by: Shirley Ma <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

sched: move rq_weight data array out of .percpu

Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it
percpu.

The per-cpu handling doesn't impose significant performance penalty on
potentially contented path in tg_shares_up().

...
ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
ffffffff81043383:       00 00
ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff81043393:       00
ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff8104339b:       00
ffffffff8104339c:       48 01 d0                add    %rdx,%rax
ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
ffffffff810433a6:       00
ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
...

After:

...
ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
ffffffff81043383:       00
ffffffff81043384:       48 98                   cltq
ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
ffffffff8104338d:       00
ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
ffffffff8104339c:       81
ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff810433a4:       00
ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff810433b5:       00
ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
...

Signed-off-by: Jiri Kosina <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

percpu: allow pcpu_alloc() to be called with IRQs off

pcpu_alloc() and pcpu_extend_area_map() perform a series of
spin_lock_irq()/spin_unlock_irq() calls, which make them unsafe
with respect to being called from contexts which have IRQs off.

This patch converts the code to perform save/restore of flags instead,
making pcpu_alloc() (or __alloc_percpu() respectively) to be called
from early kernel startup stage, where IRQs are off.

This is needed for proper initialization of per-cpu rq_weight data from
sched_init().

tj: added comment explaining why irqsave/restore is used in alloc path.

Signed-off-by: Jiri Kosina <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>

virtio-net: fix data corruption with OOM

virtio net used to unlink skbs from send queues on error,
but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
we do not do this. This causes guest data corruption and crashes
with vhost since net core can requeue the skb or free it without
it being taken off the list.

This patch fixes this by queueing the skb after successful
transmit.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: Set ip_summed correctly for page buffers passed to GRO

Page buffers containing packets with an incorrect checksum or using a
protocol not handled by hardware checksum offload were previously not
passed to LRO. The conversion to GRO changed this, but did not set
the ip_summed value accordingly.

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cnic: Fix L2CTX_STATUSB_NUM offset in context memory.

The BNX2_L2CTX_STATUSB_NUM definition needs to be changed to match
the recent firmware update:

commit 078b0735881c7969aaf21469f3577831cddd9f8c
bnx2: Update firmware to 5.0.0.j3.

Without the fix, bnx2 can crash intermittently in bnx2_rx_int() when
iSCSI is enabled.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: Benjamin Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drbd: fix in_flight rw indexing

Signed-off-by: Jens Axboe <[email protected]>

aio: implement request batching

Hi,

Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb. Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).

Signed-off-by: Jeff Moyer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: get rid of the WRITE_ODIRECT flag

Hi,

The WRITE_ODIRECT flag is only used in one place, and that code path
happens to also call blk_run_address_space. The introduction of this
flag, then, could result in the device being unplugged twice for every
I/O.

Further, with the batching changes in the next patch, we don't want an
O_DIRECT write to imply a queue unplug.

Signed-off-by: Jeff Moyer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

cfq-iosched: fix style issue in cfq_get_avg_queues()

Line breaks and bad brace placement.

Signed-off-by: Jens Axboe <[email protected]>

cfq-iosched: fairness for sync no-idle queues

Currently no-idle queues in cfq are not serviced fairly:
even if they can only dispatch a small number of requests at a time,
they have to compete with idling queues to be serviced, experiencing
large latencies.

We should notice, instead, that no-idle queues are the ones that would
benefit most from having low latency, in fact they are any of:
* processes with large think times (e.g. interactive ones like file
  managers)
* seeky (e.g. programs faulting in their code at startup)
* or marked as no-idle from upper levels, to improve latencies of those
  requests.

This patch improves the fairness and latency for those queues, by:
* separating sync idle, sync no-idle and async queues in separate
  service_trees, for each priority
* service all no-idle queues together
* and idling when the last no-idle queue has been serviced, to
  anticipate for more no-idle work
* the timeslices allotted for idle and no-idle service_trees are
  computed proportionally to the number of processes in each set.

Servicing all no-idle queues together should have a performance boost
for NCQ-capable drives, without compromising fairness.

Signed-off-by: Corrado Zoccolo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

cfq-iosched: enable idling for last queue on priority class

cfq can disable idling for queues in various circumstances.
When workloads of different priorities are competing, if the higher
priority queue has idling disabled, lower priority queues may steal
its disk share. For example, in a scenario with an RT process
performing seeky reads vs a BE process performing sequential reads,
on an NCQ enabled hardware, with low_latency unset,
the RT process will dispatch only the few pending requests every full
slice of service for the BE process.

The patch solves this issue by always performing idle on the last
queue at a given priority class > idle. If the same process, or one
that can pre-empt it (so at the same priority or higher), submits a
new request within the idle window, the lower priority queue won't
dispatch, saving the disk bandwidth for higher priority ones.

Note: this doesn't touch the non_rotational + NCQ case (no hardware
to test if this is a benefit in that case).

Signed-off-by: Corrado Zoccolo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

cfq-iosched: reimplement priorities using different service trees

We use different service trees for different priority classes.
This allows a simplification in the service tree insertion code, that no
longer has to consider priority while walking the tree.

Signed-off-by: Corrado Zoccolo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

cfq-iosched: preparation to handle multiple service trees

We embed a pointer to the service tree in each queue, to handle multiple
service trees easily.
Service trees are enriched with a counter.
cfq_add_rq_rb is invoked after putting the rq in the fifo, to ensure
that all fields in rq are properly initialized.

Signed-off-by: Corrado Zoccolo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

cfq-iosched: adapt slice to number of processes doing I/O

When the number of processes performing I/O concurrently increases,
a fixed time slice per process will cause large latencies.

This patch, if low_latency mode is enabled, will scale the time slice
assigned to each process according to a 300ms target latency.

In order to keep fairness among processes:
* The number of active processes is computed using a special form of
running average, that quickly follows sudden increases (to keep latency low),
and decrease slowly (to have fairness in spite of rapid decreases of this
value).

To safeguard sequential bandwidth, we impose a minimum time slice
(computed using 2*cfq_slice_idle as base, adjusted according to priority
and async-ness).

Signed-off-by: Corrado Zoccolo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

drm/kms: fix kms/fbdev colormap support properly.

This sets the fbcon to use TRUECOLOR by default, it then
only modifies the pseudo palette for fbcon, and only touches
the real palette when in 8-bit pseudo color mode.

Signed-off-by: Dave Airlie <[email protected]>

drm: Add the basic check for the detailed timing in EDID

Sometimes we will get the incorrect display modeline when parsing the detailed
timing in EDID. For example:
>hsync/vsync width is zero
>sync is beyond the blank.

So add the basic check for the detailed timing in EDID to avoid the incorrect
display modeline.

Signed-off-by: Zhao Yakui <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

drm/radeon/kms: ignore vga arbiter return.

Since we register all radeon devices, and the arbiter only cares about
VGA class ones, we will fail to startup on display controller class devices.
We don't gain anything by using the return value here.

this helps kms on sparc64 get started.

Reported-by: David S. Miller <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

xen: set up mmu_ops before trying to set any ptes

xen_setup_stackprotector() ends up trying to set page protections,
so we need to have vm_mmu_ops set up before trying to do so.
Failing to do so causes an early boot crash.

[ Impact: Fix early crash under Xen. ]

Signed-off-by: Jeremy Fitzhardinge <[email protected]>

Merge commit 'gcl/merge' into merge

MAINTAINERS: rt2x00 list is moderated

Cc: [email protected]
Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: John W. Linville <[email protected]>