Linus Torvalds [Wed, 21 Dec 2011 02:39:37 +0000 (18:39 -0800)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
mtd: plat_ram: call mtd_device_register only if partition data exists
mtd: pxa2xx-flash.c: It used to fall back to provided table.
mtd: gpmi: add missing include 'module.h'
mtd: ndfc: fix typo in structure dereference
Linus Torvalds [Tue, 20 Dec 2011 19:42:38 +0000 (11:42 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/clocksource: Fix kernel-doc warnings
rtc: m41t80: Workaround broken alarm functionality
rtc: Expire alarms after the time is set.
Linus Torvalds [Tue, 20 Dec 2011 19:41:17 +0000 (11:41 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Linus Torvalds [Tue, 20 Dec 2011 19:40:48 +0000 (11:40 -0800)]
Merge branch 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
Linus Torvalds [Tue, 20 Dec 2011 19:31:56 +0000 (11:31 -0800)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a regression in nfs_file_llseek()
NFSv4: Do not accept delegated opens when a delegation recall is in effect
NFSv4: Ensure correct locking when accessing the 'lock_states' list
NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
SUNRPC: Ensure we always bump the backlog queue in xprt_free_slot
SUNRPC: Fix the execution time statistics in the face of RPC restarts
Linus Torvalds [Tue, 20 Dec 2011 19:31:44 +0000 (11:31 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vmwgfx: Clip cliprects against screen boundaries in present and dirty
vmwgfx: Resend the cursor after legacy modeset
vmwgfx: Do better culling of presents
vmwgfx: Refactor kms code to use vmw_user_lookup_handle helper
vmwgfx: Add helper function to get surface or dmabuf
vmwgfx: Refactor cursor update
vmwgfx: Remove dmabuf check in present ioctl
vmwgfx: Use the revised fifo hw version register when present
Before waiting (predefined value 120s), check that at least one device
was successfully brought up. Otherwise (e.g. buggy bootloader
which does not set the MAC address) there is no point in waiting
for carrier.
Einar Lueck [Mon, 19 Dec 2011 22:56:36 +0000 (22:56 +0000)]
qeth: recovery through asynchronous delivery
If recovery is triggered in presence of pending asynchronous
deliveries of storage blocks we do a forced cleanup after
the corresponding tasklets are completely stopped and trigger
appropriate notifications for the correspondingerror state.
Frank Blaschka [Mon, 19 Dec 2011 22:56:35 +0000 (22:56 +0000)]
qeth: improve recovery during resource shortage
In case there are no system resources to run a recovery we have to clear
recovery bitmasks so a further automatic or manual driven recovery can
fix up the device.
Ursula Braun [Mon, 19 Dec 2011 22:56:34 +0000 (22:56 +0000)]
netiucv: allow multiple interfaces to same peer
The NETIUCV device driver allows to connect a Linux guest on z/VM to
another z/VM guest based on the z/VM communication facility IUCV.
Multiple output paths to different guests are possible, as well as
multiple input paths from different guests.
With this feature, you can configure multiple point-to-point NETIUCV
interfaces between your Linux on System z instance and another z/VM
guest.
Ursula Braun [Mon, 19 Dec 2011 22:56:32 +0000 (22:56 +0000)]
qeth: suspicious rcu_dereference_check in recovery
qeth layer3 recovery invokes its set_multicast_list function, which
invokes function __vlan_find_dev_deep requiring rcu_read_lock or
rtnl lock. This causes kernel messages:
Ursula Braun [Mon, 19 Dec 2011 22:56:31 +0000 (22:56 +0000)]
af_iucv: get rid of state IUCV_SEVERED
af_iucv differs unnecessarily between state IUCV_SEVERED and
IUCV_DISCONN. This patch removes state IUCV_SEVERED.
While simplifying af_iucv, this patch removes the 2nd invocation of
cpcmd as well.
Ursula Braun [Mon, 19 Dec 2011 22:56:29 +0000 (22:56 +0000)]
af_iucv: release reference to HS device
For HiperSockets transport skbs sent are bound to one of the
available HiperSockets devices. Add missing release of reference to
a HiperSockets device before freeing an skb.
Ursula Braun [Mon, 19 Dec 2011 22:56:28 +0000 (22:56 +0000)]
af_iucv: accelerate close for HS transport
Closing an af_iucv socket may wait for confirmation of outstanding
send requests. This patch adds confirmation code for the new
HiperSockets transport.
Thomas Graf [Mon, 19 Dec 2011 04:11:40 +0000 (04:11 +0000)]
sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.
The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.
When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.
Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.
The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.
Yevgeny Petrilin [Mon, 19 Dec 2011 21:53:38 +0000 (21:53 +0000)]
mlx4_en: FIX: Setting default_qpn before using it
When UDP RSS is enabled, we use same QPN for TCP and UDP ranges
The bug is that the default_qpn was used base UDP qpn before it
was set.
Fixes bug introduced in commit: 1202d460b1df3a77fda66f4ba5f90ae3527dd42f
binary_sysctl() calls sysctl_getname() which allocates from names_cache
slab usin __getname()
The matching function to free the name is __putname(), and not putname()
which should be used only to match getname() allocations.
This is because when auditing is enabled, putname() calls audit_putname
*instead* (not in addition) to __putname(). Then, if a syscall is in
progress, audit_putname does not release the name - instead, it expects
the name to get released when the syscall completes, but that will happen
only if audit_getname() was called previously, i.e. if the name was
allocated with getname() rather than the naked __getname(). So,
__getname() followed by putname() ends up leaking memory.
where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000). So there could be an int overflow while computing
176 points *= 1000;
and points may have negative value. Meaning the oom score for a mem hog task
will be one.
196 if (points <= 0)
197 return 1;
For example:
[ 3366] 0 3366 3539048024303939 5 0 0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child
Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.
In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.
The points variable should be of type long instead of int to prevent the
int overflow.
Haogang Chen [Tue, 20 Dec 2011 01:11:56 +0000 (17:11 -0800)]
nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
There is a potential integer overflow in nilfs_ioctl_clean_segments().
When a large argv[n].v_nmembs is passed from the userspace, the subsequent
call to vmalloc() will allocate a buffer smaller than expected, which
leads to out-of-bound access in nilfs_ioctl_move_blocks() and
lfs_clean_segments().
The following check does not prevent the overflow because nsegs is also
controlled by the userspace and could be very large.
if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
goto out_free;
This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and
returns -EINVAL when overflow.
David Rientjes [Tue, 20 Dec 2011 01:11:52 +0000 (17:11 -0800)]
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.
c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread. This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.
This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes. This was addressed by 89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind. To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.
This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.
Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change. This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Axel Lin [Fri, 9 Dec 2011 03:27:55 +0000 (11:27 +0800)]
mfd: Include linux/io.h to jz4740-adc
Include linux/io.h to fix below build error:
CC drivers/mfd/jz4740-adc.o
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_irq_demux':
drivers/mfd/jz4740-adc.c:73: error: implicit declaration of function 'readb'
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_set_enabled':
drivers/mfd/jz4740-adc.c:110: error: implicit declaration of function 'writeb'
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_set_config':
drivers/mfd/jz4740-adc.c:146: error: implicit declaration of function 'readl'
drivers/mfd/jz4740-adc.c:151: error: implicit declaration of function 'writel'
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_probe':
drivers/mfd/jz4740-adc.c:249: error: implicit declaration of function 'ioremap_nocache'
drivers/mfd/jz4740-adc.c:249: warning: assignment makes pointer from integer without a cast
drivers/mfd/jz4740-adc.c:289: warning: passing argument 3 of 'mfd_add_devices' discards qualifiers from pointer target type
include/linux/mfd/core.h:93: note: expected 'struct mfd_cell *' but argument is of type 'const struct mfd_cell *'
drivers/mfd/jz4740-adc.c:299: error: implicit declaration of function 'iounmap'
make[2]: *** [drivers/mfd/jz4740-adc.o] Error 1
make[1]: *** [drivers/mfd] Error 2
make: *** [drivers] Error 2
NeilBrown [Sat, 26 Nov 2011 20:17:41 +0000 (07:17 +1100)]
mfd: Use request_threaded_irq for twl4030-irq instead of irq_set_chained_handler
irq_set_chained_handler sets 'desc->handle_irq'.
However this irq is called by handle_nested_irq from handle_twl4030_pih,
and that uses action->thread_fn.
So the handled set with irq_set_chained_handler is never called.
So change to use request_threaded_irq instead - that sets the correct field.
NeilBrown [Sat, 26 Nov 2011 20:17:41 +0000 (07:17 +1100)]
mfd: Base interrupt for twl4030-irq must be one-shot
As the interrupt source is only cleared by the threaded interrupt
service routine, we need to make the base interrupt IRQF_ONESHOT.
Without this, the first interrupt from the TWL4030 cause the CPU to
enter an infinite loop trying to handle to interrupt but never
clearing it.
Axel Lin [Mon, 7 Nov 2011 03:20:09 +0000 (11:20 +0800)]
mfd: include linux/module.h for ab5500-debugfs
Include linux/module.h to fix below build error:
CC drivers/mfd/ab5500-debugfs.o
drivers/mfd/ab5500-debugfs.c:571: error: 'THIS_MODULE' undeclared here (not in a function)
make[2]: *** [drivers/mfd/ab5500-debugfs.o] Error 1
Axel Lin [Mon, 31 Oct 2011 06:24:30 +0000 (14:24 +0800)]
mfd: Set tps6586x bits if new value is different from the old one
It does not make sense to write new value only when all the bit_mask
bits are zero.
We need to write new value if the bit mask fields of new value is
not equal to old value.
Axel Lin [Mon, 31 Oct 2011 06:23:03 +0000 (14:23 +0800)]
mfd: Set da903x bits if new value is different from the old one
It does not make sense to write new value only when all the bit_mask
bits are zero.
We need to write new value if the bit mask fields of new value is
not equal to old value.
Axel Lin [Mon, 31 Oct 2011 03:00:06 +0000 (11:00 +0800)]
mfd: Set adp5520 bits if new value is different from the old one
Current code checks if all the bit_mask bits are all zero is wrong.
We need to write new value if the bit mask fields of new value is
not equal to old value.
Dmitry Kasatkin [Mon, 5 Dec 2011 11:17:41 +0000 (13:17 +0200)]
evm: key must be set once during initialization
On multi-core systems, setting of the key before every caclculation,
causes invalid HMAC calculation for other tfm users, because internal
state (ipad, opad) can be invalid before set key call returns.
It needs to be set only once during initialization.
Ohad Ben-Cohen [Mon, 19 Dec 2011 23:51:38 +0000 (15:51 -0800)]
Revert "mmc: enable runtime PM by default"
When SDIO runtime PM was originally introduced, we immediately faced
two regressions with two different chipsets, and in response decided
not to enable it by default.
With the recent work on the 8686 we hoped we found all the gotchas,
so 08da834 did make sense (at least experimentally).
Unfortunately we now see that some setups out there still refuse to
work when SDIO runtime PM is enabled by default (see
http://www.spinics.net/lists/linux-mmc/msg11161.html), and obviously
we can't live with these kind of regressions.
Linus Torvalds [Mon, 19 Dec 2011 23:13:53 +0000 (15:13 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Correct sense on freectxts increment and decrement
RDMA/cma: Verify private data length
IB/mlx4: Fix shutdown crash accessing a non-existent bitmap
Linus Torvalds [Mon, 19 Dec 2011 23:11:12 +0000 (15:11 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics - fix touchpad not working after S2R on Vostro V13
Input: cma3000_d0x - fix signedness bug in cma3000_thread_irq()
Input: wacom - add product id used by Samsung Slate 7
Xi Wang [Fri, 16 Dec 2011 12:44:15 +0000 (12:44 +0000)]
sctp: fix incorrect overflow check on autoclose
Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
limiting the autoclose value. If userspace passes in -1 on 32-bit
platform, the overflow check didn't work and autoclose would be set
to 0xffffffff.
This patch defines a max_autoclose (in seconds) for limiting the value
and exposes it through sysctl, with the following intentions.
1) Avoid overflowing autoclose * HZ.
2) Keep the default autoclose bound consistent across 32- and 64-bit
platforms (INT_MAX / HZ in this patch).
3) Keep the autoclose value consistent between setsockopt() and
getsockopt() calls.
Clemens Ladisch [Mon, 19 Dec 2011 21:07:58 +0000 (22:07 +0100)]
x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
When printing the code bytes in show_registers(), the markers around the
byte at the fault address could make the printk() format string look
like a valid log level and facility code. This would prevent this byte
from being printed and result in a spurious newline:
Alex Juncu [Thu, 15 Dec 2011 23:01:25 +0000 (23:01 +0000)]
llc: llc_cmsg_rcv was getting called after sk_eat_skb.
Received non stream protocol packets were calling llc_cmsg_rcv that used a
skb after that skb was released by sk_eat_skb. This caused received STP
packets to generate kernel panics.
Matt Carlson [Fri, 16 Dec 2011 13:33:23 +0000 (13:33 +0000)]
tg3: Make the RSS indir tbl admin configurable
This patch adds the ethtool callbacks necessary to change the rss
indirection table from userspace. Should the number of interrupts
change (e.g. across a close / open call, or through a reset) and
any one of the indirection table values fall out-of-range, the driver
will reset the indirection table to a default layout.
[Integrated many suggestions made by Ben Hutchings.]
Changes since v3
* Removed TG3_FLAG_SUPPORT_MSIX checks at the start of
tg3_get_rxfh_indir() and tg3_set_rxfh_indir().
Michael Chan [Sun, 18 Dec 2011 18:15:09 +0000 (18:15 +0000)]
bnx2: Update driver to use new mips firmware.
bnx2-mips-06-6.2.3 and bnx2-mips-09-6.2.1.b
New firmware fixes iSCSI problems with some LeftHand targets that don't
set TTT=0xffffffff for Data-In according to spec. Firmware generates
exception warnings for this condition and becomes very slow. This is
fixed by suppressing these warnings when using default error mask.
Lancer does not have HW registers to indicate the EQ causing the INTx
interrupt. As a result EQE entries of one EQ may be consumed when interrupt
is caused by another EQ. Fix this by arming CQs at the end of NAPI poll
routine to regenerate the EQEs.
Yevgeny Petrilin [Mon, 19 Dec 2011 04:03:53 +0000 (04:03 +0000)]
mlx4: Fixing wrong error codes in communication channel
The communication channel is HW interface from PF point of view
So the command return status should be stored as HW error code
and only then translated to errno values.
Yevgeny Petrilin [Mon, 19 Dec 2011 04:00:34 +0000 (04:00 +0000)]
mlx4_core: Changing link sensing logic
New FW can give clues to driver regarding default port type
and whether or not we should default to link sensing on the port.
2 bits are added to QUERY_PORT command:
1. suggested_type: This bit gives a hint whether the default port type should be
IB or Ethernet.
The driver will use this hint in case the user didn't specify explicitly the link layer
type he wants to set.
2. default_sense: If this bit is set, we would sense the port type on start-up
and default the port to link sensing
Yevgeny Petrilin [Mon, 19 Dec 2011 04:00:26 +0000 (04:00 +0000)]
mlx4: capability for link sensing
For ConnectX3 devices, we allow link sensing only if FW explicitly
reported it supports the feature.
For older versions (ConnectX1 and 2), if the card supports both link layer types
(Ethenet and Infiniband), link sensing is supported.
Sean Hefty [Tue, 6 Dec 2011 21:17:11 +0000 (21:17 +0000)]
RDMA/cma: Verify private data length
private_data_len is defined as a u8. If the user specifies a large
private_data size (> 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0. This
can lead to overwriting random kernel memory. Avoid this by verifying
that the resulting size fits into a u8.
Reported-by: B. Thery <[email protected]>
Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335> Signed-off-by: Sean Hefty <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
cgroups: fix a css_set not found bug in cgroup_attach_proc
There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
is not called for the PF_EXITING case, find_existing_css_set() will return
NULL inside cgroup_task_migrate() causing a BUG.
This bug is easy to reproduce. Create a zombie and echo its pid to
cgroup.procs.
$ cat zombie.c
\#include <unistd.h>
int main()
{
if (fork())
pause();
return 0;
}
$
We are hitting this bug pretty regularly on ChromeOS.
This bug is already fixed by Tejun Heo's cgroup patchset which is
targetted for the next merge window:
https://lkml.org/lkml/2011/11/1/356
I've create a smaller patch here which just fixes this bug so that a
fix can be merged into the current release and stable.
Robert Richter [Mon, 19 Dec 2011 15:38:30 +0000 (16:38 +0100)]
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
If oprofilefs_ulong_from_user() is called with count equals
zero, *val remains unchanged. Depending on the implementation it
might be uninitialized.
Change oprofilefs_ulong_from_user()'s interface to return count
on success. Thus, we are able to return early if count equals
zero which avoids using *val uninitialized. Fixing all users of
oprofilefs_ulong_ from_user().
This follows write syscall implementation when count is zero:
"If count is zero ... [and if] no errors are detected, 0 will be
returned without causing any other effect." (man 2 write)
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
This reverts commit ddacf5ef684a655abe2bb50c4b2a5b72ae0d5e05.
As when booting the kernel under Amazon EC2 as an HVM guest it ends up
hanging during startup. Reverting this we loose the fix for kexec
booting to the crash kernels.
Felipe Balbi [Mon, 19 Dec 2011 11:45:01 +0000 (13:45 +0200)]
usb: gadget: epautoconf: do not change number of streams
We should not change gadget driver's descriptors just
because we think it's right to do so.
There are several of reasons which would support this
statement but it suffices to say that this was probably
never tested because it updates bmAttributes without
asking the driver if it's ok to do so.
This means that e.g. on UASP gadget it would enable
stream support even for the command endpoint which must
not have stream support enabled.
In fact, this change is fixing the bug introduced by
commit a59d6b9 (usb: gadget: add streams support to
the gadget framework) which was caught when testing
UASP gadget with dwc3 driver.