Glauber Costa [Thu, 26 Jan 2012 12:09:28 +0000 (12:09 +0000)]
net: explicitly add jump_label.h header to sock.h
Commit 36a1211970193ce215de50ed1e4e1272bc814df1 removed linux/module.h
include statement from one of the headers that end up in net/sock.h.
It was providing us with static_branch() definition implicitly, so
after its removal the build got broken.
To fix this, and avoid having this happening in the future,
let me do the right thing and include linux/jump_label.h
explicitly in sock.h.
Willem de Bruijn [Thu, 26 Jan 2012 10:34:35 +0000 (10:34 +0000)]
ipv6: Fix ip_gre lockless xmits.
Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and
ipip set this unconditionally in ops->setup, but gre enables it
conditionally after parameter passing in ops->newlink. This is
not called during tunnel setup as below, however, so GRE tunnels are
still taking the lock.
modprobe ip_gre
ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
ip link set test0 up
ip addr add 10.6.0.1 dev test0
# cat /sys/class/net/test0/features
# $DIR/test_tunnel_xmit 10 10.5.2.1
ip route add 10.5.2.0/24 dev test0
ip tunnel del test0
The newlink callback is only called in rtnl_netlink, and only if
the device is new, as it calls register_netdevice internally. Gre
tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
which calls ipgre_tunnel_locate, which calls register_netdev.
rtnl_newlink is called at 'ip link set', but skips ops->newlink
and the device is up with locking still enabled. The equivalent
ipip tunnel works fine, btw (just substitute 'method gre' for
'method ipip').
On kernels before /sys/class/net/*/features was removed [1],
the first commented out line returns 0x6000 with method gre,
which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
it reports 0x7000. This test cannot be used on recent kernels where
the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
work for tunnel devices, because they lack dev->ethtool_ops).
The second commented out line calls a simple transmission test [2]
that sends on 24 cores at maximum rate. Results of a single run:
ipip: 19,372,306
gre before patch: 4,839,753
gre after patch: 19,133,873
This patch replicates the condition check in ipgre_newlink to
ipgre_tunnel_locate. It works for me, both with oseq on and off.
This is the first time I looked at rtnetlink and iproute2 code,
though, so someone more knowledgeable should probably check the
patch. Thanks.
The tail of both functions is now identical, by the way. To avoid
code duplication, I'll be happy to rework this and merge the two.
Eric Dumazet [Thu, 26 Jan 2012 00:41:38 +0000 (00:41 +0000)]
netns: fix net_alloc_generic()
When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :
James Chapman [Wed, 25 Jan 2012 02:39:05 +0000 (02:39 +0000)]
l2tp: l2tp_ip - fix possible oops on packet receive
When a packet is received on an L2TP IP socket (L2TPv3 IP link
encapsulation), the l2tpip socket's backlog_rcv function calls
xfrm4_policy_check(). This is not necessary, since it was called
before the skb was added to the backlog. With CONFIG_NET_NS enabled,
xfrm4_policy_check() will oops if skb->dev is null, so this trivial
patch removes the call.
This bug has always been present, but only when CONFIG_NET_NS is
enabled does it cause problems. Most users are probably using UDP
encapsulation for L2TP, hence the problem has only recently
surfaced.
14) skge carrier assertion and DMA mapping fixes from Stephen Hemminger.
15) Congestion recovery undo performed at the wrong spot in BIC and CUBIC
congestion control modules, fix from Neal Cardwell.
16) Ethtool ETHTOOL_GSSET_INFO is unnecessarily restrictive, from Michał Mirosław.
17) Fix triggerable race in ipv6 sysctl handling, from Francesco Ruggeri.
18) Statistics bug fixes in mlx4 from Eugenia Emantayev.
19) rds locking bug fix during info dumps, from your's truly.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
rds: Make rds_sock_lock BH rather than IRQ safe.
netprio_cgroup.h: dont include module.h from other includes
net: flow_dissector.c missing include linux/export.h
team: send only changed options/ports via netlink
net/hyperv: fix possible memory leak in do_set_multicast()
drivers/net: dsa/mv88e6xxx.c files need linux/module.h
stmmac: added PCI identifiers
llc: Fix race condition in llc_ui_recvmsg
stmmac: fix phy naming inconsistency
dsa: Add reporting of silicon revision for Marvell 88E6123/88E6161/88E6165 switches.
tg3: fix ipv6 header length computation
skge: add byte queue limit support
mv643xx_eth: Add Rx Discard and Rx Overrun statistics
bnx2x: fix compilation error with SOE in fw_dump
bnx2x: handle CHIP_REVISION during init_one
bnx2x: allow user to change ring size in ISCSI SD mode
bnx2x: fix Big-Endianess in ethtool -t
bnx2x: fixed ethtool statistics for MF modes
bnx2x: credit-leakage fixup on vlan_mac_del_all
macvlan: fix a possible use after free
...
David S. Miller [Tue, 24 Jan 2012 22:03:44 +0000 (17:03 -0500)]
rds: Make rds_sock_lock BH rather than IRQ safe.
rds_sock_info() triggers locking warnings because we try to perform a
local_bh_enable() (via sock_i_ino()) while hardware interrupts are
disabled (via taking rds_sock_lock).
There is no reason for rds_sock_lock to be a hardware IRQ disabling
lock, none of these access paths run in hardware interrupt context.
Therefore making it a BH disabling lock is safe and sufficient to
fix this bug.
Paul Gortmaker [Tue, 24 Jan 2012 11:33:19 +0000 (11:33 +0000)]
netprio_cgroup.h: dont include module.h from other includes
A considerable effort was invested in wiping out module.h
from being present in all the other standard includes. This
one leaked back in, but once again isn't strictly necessary,
so remove it.
Jiri Pirko [Tue, 24 Jan 2012 05:16:00 +0000 (05:16 +0000)]
team: send only changed options/ports via netlink
This patch changes event message behaviour to send only updated records
instead of whole list. This fixes bug on which userspace receives non-actual
data in case multiple events occur in row.
Paul Gortmaker [Tue, 24 Jan 2012 10:41:40 +0000 (10:41 +0000)]
drivers/net: dsa/mv88e6xxx.c files need linux/module.h
An implicit instance of module.h leaked back into existence
and was masking the fact that these drivers weren't calling
out the include for itself. Fix the drivers before we remove
the implicit include path via net/netprio_cgroup.h file.
After commit "db8857b stmmac: use an unique MDIO bus name" my
device stopped being probed because two different names were being
used in different places. This fixes the inconsistency.
Linus Torvalds [Tue, 24 Jan 2012 20:12:40 +0000 (12:12 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Pass information that quota is stored in system file to userspace
ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls
jbd: Issue cache flush after checkpointing
Johannes Berg [Thu, 19 Jan 2012 16:20:57 +0000 (08:20 -0800)]
iwlwifi: fix PCI-E transport "inta" race
When an interrupt comes in, we read the reason
bits and collect them into "trans_pcie->inta".
This happens with the spinlock held. However,
there's a bug resetting this variable -- that
happens after the spinlock has been released.
This means that it is possible for interrupts
to be missed if the reset happens after some
other interrupt reasons were already added to
the variable.
I found this by code inspection, looking for a
reason that we sometimes see random commands
time out. It seems possible that this causes
such behaviour, but I can't say for sure right
now since it happens extremely infrequently on
my test systems.
Eliad Peller [Wed, 11 Jan 2012 11:11:50 +0000 (13:11 +0200)]
mac80211: set bss_conf.idle when vif is connected
__ieee80211_recalc_idle() iterates through the vifs,
sets bss_conf.idle = true if they are disconnected,
and increases "count" if they are not (which later
gets evaluated in order to determine whether the
device is idle).
However, the loop doesn't set bss_conf.idle = false
(along with increasing "count"), causing the device
idle state and the vif idle state to get out of sync
in some cases.
Eliad Peller [Tue, 10 Jan 2012 13:19:54 +0000 (15:19 +0200)]
mac80211: update oper_channel on ibss join
Commit 13c40c5 ("mac80211: Add HT operation modes for IBSS") broke
ibss operation by mistakenly removing the local->oper_channel
update (causing ibss to start on the wrong channel). fix it.
Mark Brown [Tue, 24 Jan 2012 11:17:26 +0000 (11:17 +0000)]
regulator: Fix documentation for of_node parameter of regulator_register()
Commit 5bc75a886353 ("kernel-doc: fix new warning in regulator core")
added documentation for of_node to address a warning but the
documentation didn't explain what the parameter is for so would be
likely to be unhelpful for users. Clarify that.
Linus Torvalds [Mon, 23 Jan 2012 23:11:27 +0000 (15:11 -0800)]
Merge tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Power management fixes for 3.3
Two fixes for regressions introduced during the merge window, one fix for
a long-standing obscure issue in the computation of hibernate image size
and two small PM documentation fixes.
* tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Sleep: Fix read_unlock_usermodehelper() call.
PM / Hibernate: Rewrite unlock_system_sleep() to fix s2disk regression
PM / Hibernate: Correct additional pages number calculation
PM / Documentation: Fix minor issue in freezing_of_tasks.txt
PM / Documentation: Fix spelling mistake in basic-pm-debugging.txt
Linus Torvalds [Mon, 23 Jan 2012 22:50:30 +0000 (14:50 -0800)]
Merge tag 'arm-soc-imx-move' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Consolidate i.MX 5 platforms to be under the new shared i.MX 3/5/6 tree.
* tag 'arm-soc-imx-move' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM i.MX: Update defconfig
ARM i.MX: Merge i.MX5 support into mach-imx
ARM i.MX5: remove unnecessary includes from board files
Fix up fairly trivial conflicts due to various changes nearby in
arch/arm/{mach,plat}-imx/{Kconfig,Makefile}
Pull request had been sent to the wrong email address, but happened
before the merge window closed. I'm merging the MX 5 consolidation,
since it apparently will help the next development window and will avoid
conflicts later as per Arnd.
Commit b298d289
"PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()"
added read_unlock_usermodehelper() but read_unlock_usermodehelper() is called
without read_lock_usermodehelper() when kmalloc() failed.
Paulius Zaleckas [Mon, 23 Jan 2012 01:16:35 +0000 (01:16 +0000)]
mv643xx_eth: Add Rx Discard and Rx Overrun statistics
These statistics helped me a lot while searching who is losing
packets in my setup.
I added these stats to MIB group since they are very similar,
but just in other registers.
I have tested this patch on 88F6281 SoC.
Ariel Elior [Mon, 23 Jan 2012 07:31:55 +0000 (07:31 +0000)]
bnx2x: handle CHIP_REVISION during init_one
The macro `CHIP_IS_E1x' requires `bp' to be initialized.
As `bp' is not yet initialized during this phase of `bnx2x_init_dev',
it accessed uninitialized fields in the struct.
Yuval Mintz [Mon, 23 Jan 2012 07:31:51 +0000 (07:31 +0000)]
bnx2x: credit-leakage fixup on vlan_mac_del_all
Upon insertion of elements into the execution queue, it is validated
that there are enough credits to support additional vlan-macs,
and the credits are consumed. However, when removing a pending
command in `bnx2x_vland_mac_del_all' the consumed credits are not
released, which might cause leakage and eventually the inability to
add new vlan-macs in certain scenarios.
Eric Dumazet [Mon, 23 Jan 2012 05:38:59 +0000 (05:38 +0000)]
macvlan: fix a possible use after free
Commit bc416d9768 (macvlan: handle fragmented multicast frames) added a
possible use after free in macvlan_handle_frame(), since
ip_check_defrag() uses pskb_may_pull() : skb header can be reallocated.
Linus Torvalds [Mon, 23 Jan 2012 18:08:08 +0000 (10:08 -0800)]
Merge branch 'kernel-doc' from Randy Dunlap
The usual kernel-doc fixups from Randy. Some of them David acked as
merged in his tree, this is the random left-overs.
* kernel-doc:
docbook: fix sched source file names in device-drivers book
docbook: change iomap source filename in deviceiobook
docbook: don't use serial_core.h in device-drivers book
kernel-doc: fix kernel-doc warnings in sched
kernel-doc: fix new warnings in cfg80211.h
kernel-doc: fix new warning in usb.h
kernel-doc: fix new warnings in device.h
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warning in regulator core
kernel-doc: fix new warnings in pci
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in auditsc.c
scripts/kernel-doc: fix fatal error caused by cfg80211.h
Linus Torvalds [Mon, 23 Jan 2012 17:27:54 +0000 (09:27 -0800)]
Merge branch 'akpm'
Quoth Andrew:
"Random fixes. And a simple new LED driver which I'm trying to sneak
in while you're not looking."
Sneaking successful.
* akpm:
score: fix off-by-one index into syscall table
mm: fix rss count leakage during migration
SHM_UNLOCK: fix Unevictable pages stranded after swap
SHM_UNLOCK: fix long unpreemptible section
kdump: define KEXEC_NOTE_BYTES arch specific for s390x
mm/hugetlb.c: undo change to page mapcount in fault handler
mm: memcg: update the correct soft limit tree during migration
proc: clear_refs: do not clear reserved pages
drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails
drivers/video/backlight/adp88x0_bl.c: fix bit testing logic
kprobes: initialize before using a hlist
ipc/mqueue: simplify reading msgqueue limit
leds: add led driver for Bachmann's ot200
mm: __count_immobile_pages(): make sure the node is online
mm: fix NULL ptr dereference in __count_immobile_pages
mm: fix warnings regarding enum migrate_mode
Takashi Iwai [Mon, 23 Jan 2012 17:23:36 +0000 (18:23 +0100)]
ALSA: hda - Fix silent outputs from docking-station jacks of Dell laptops
The recent change of the power-widget handling for IDT codecs caused
the silent output from the docking-station line-out jack. This was
partially fixed by the commit f2cbba7602383cd9cdd21f0a5d0b8bd1aad47b33
"ALSA: hda - Fix the lost power-setup of seconary pins after PM resume".
But the line-out on the docking-station is still silent when booted
with the jack plugged even by this fix.
The remainig bug is that the power-widget is set off in stac92xx_init()
because the pins in cfg->line_out_pins[] aren't checked there properly
but only hp_pins[] are checked in is_nid_hp_pin().
This patch fixes the problem by checking both HP and line-out pins
and leaving the power-map correctly.
Linus Torvalds [Mon, 23 Jan 2012 16:59:49 +0000 (08:59 -0800)]
Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
CIFS: Rename *UCS* functions to *UTF16*
[CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
[CIFS] Fix build break with multiuser patch when LANMAN disabled
cifs: warn about impending deprecation of legacy MultiuserMount code
cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
cifs: sanitize username handling
keys: add a "logon" key type
cifs: lower default wsize when unix extensions are not used
cifs: better instrumentation for coalesce_t2
cifs: integer overflow in parse_dacl()
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions
Randy Dunlap [Sat, 21 Jan 2012 19:03:13 +0000 (11:03 -0800)]
kernel-doc: fix kernel-doc warnings in sched
Fix new kernel-doc notation warnings:
Warning(include/linux/sched.h:2094): No description found for parameter 'p'
Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'
Randy Dunlap [Sat, 21 Jan 2012 19:03:00 +0000 (11:03 -0800)]
kernel-doc: fix new warnings in cfg80211.h
Fix new kernel-doc warnings:
Warning(include/net/cfg80211.h:1165): No description found for parameter 'channel_type'
Warning(include/net/cfg80211.h:2090): No description found for parameter 'probe_resp_offload'
Randy Dunlap [Sat, 21 Jan 2012 19:02:51 +0000 (11:02 -0800)]
kernel-doc: fix new warnings in device.h
Fix new kernel-doc warnings:
Warning(include/linux/device.h:299): No description found for parameter 'name'
Warning(include/linux/device.h:299): No description found for parameter 'subsys'
Warning(include/linux/device.h:299): No description found for parameter 'node'
Warning(include/linux/device.h:299): No description found for parameter 'add_dev'
Warning(include/linux/device.h:299): No description found for parameter 'remove_dev'
Warning(include/linux/device.h:685): No description found for parameter 'id'
Warning(include/linux/device.h:1009): No description found for parameter '__driver'
Warning(include/linux/device.h:1009): No description found for parameter '__register'
Warning(include/linux/device.h:1009): No description found for parameter '__unregister'
Randy Dunlap [Sat, 21 Jan 2012 19:02:42 +0000 (11:02 -0800)]
kernel-doc: fix new warnings in debugfs
Fix new kernel-doc warnings:
Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'
Randy Dunlap [Sat, 21 Jan 2012 19:02:35 +0000 (11:02 -0800)]
kernel-doc: fix new warnings in pci
Fix new kernel-doc warnings:
Warning(drivers/pci/pci.c:2811): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2811): Excess function parameter 'pdev' description in 'pci_intx_mask_supported'
Warning(drivers/pci/pci.c:2894): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2894): Excess function parameter 'pdev' description in 'pci_check_and_mask_intx'
Warning(drivers/pci/pci.c:2908): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2908): Excess function parameter 'pdev' description in 'pci_check_and_unmask_intx'
Randy Dunlap [Sat, 21 Jan 2012 19:02:28 +0000 (11:02 -0800)]
kernel-doc: fix new warnings in driver-core
Fix new kernel-doc warnings:
Warning(drivers/base/bus.c:925): No description found for parameter 'key'
Warning(drivers/base/bus.c:1241): No description found for parameter 'subsys'
Warning(drivers/base/bus.c:1241): No description found for parameter 'groups'
Randy Dunlap [Sat, 21 Jan 2012 19:02:24 +0000 (11:02 -0800)]
kernel-doc: fix new warnings in auditsc.c
Fix new kernel-doc warnings in auditsc.c:
Warning(kernel/auditsc.c:1875): No description found for parameter 'success'
Warning(kernel/auditsc.c:1875): No description found for parameter 'return_code'
Warning(kernel/auditsc.c:1875): Excess function parameter 'pt_regs' description in '__audit_syscall_exit'
Randy Dunlap [Sat, 21 Jan 2012 18:31:54 +0000 (10:31 -0800)]
scripts/kernel-doc: fix fatal error caused by cfg80211.h
include/net/cfg80211.h uses __must_check in functions that
have kernel-doc notation. This was confusing scripts/kernel-doc,
so have scripts/kernel-doc ignore "__must_check".
Dan Rosenberg [Fri, 20 Jan 2012 22:34:27 +0000 (14:34 -0800)]
score: fix off-by-one index into syscall table
If the provided system call number is equal to __NR_syscalls, the
current check will pass and a function pointer just after the system
call table may be called, since sys_call_table is an array with total
size __NR_syscalls.
Whether or not this is a security bug depends on what the compiler puts
immediately after the system call table. It's likely that this won't do
anything bad because there is an additional NULL check on the syscall
entry, but if there happens to be a non-NULL value immediately after the
system call table, this may result in local privilege escalation.
Memory migration fills a pte with a migration entry and it doesn't
update the rss counters. Then it replaces the migration entry with the
new page (or the old one if migration failed). But between these two
passes this pte can be unmaped, or a task can fork a child and it will
get a copy of this migration entry. Nobody accounts for this in the rss
counters.
This patch properly adjust rss counters for migration entries in
zap_pte_range() and copy_one_pte(). Thus we avoid extra atomic
operations on the migration fast-path.
Hugh Dickins [Fri, 20 Jan 2012 22:34:21 +0000 (14:34 -0800)]
SHM_UNLOCK: fix Unevictable pages stranded after swap
Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in
find_get_pages") correctly fixed an infinite loop; but left a problem
that find_get_pages() on shmem would return 0 (appearing to callers to
mean end of tree) when it meets a run of nr_pages swap entries.
The only uses of find_get_pages() on shmem are via pagevec_lookup(),
called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
scan_mapping_unevictable_pages(). The first is already commented, and
not worth worrying about; but the second can leave pages on the
Unevictable list after an unusual sequence of swapping and locking.
Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
swap) instead of pagevec_lookup().
But I don't want to contaminate vmscan.c with shmem internals, nor
shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into
shmem.c, renaming it shmem_unlock_mapping(); and rename
check_move_unevictable_page() to check_move_unevictable_pages(), looping
down an array of pages, oftentimes under the same lock.
Leave out the "rotate unevictable list" block: that's a leftover from
when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
handling involved looking at pages at tail of LRU.
Was there significance to the sequence first ClearPageUnevictable, then
test page_evictable, then SetPageUnevictable here? I think not, we're
under LRU lock, and have no barriers between those.
Hugh Dickins [Fri, 20 Jan 2012 22:34:19 +0000 (14:34 -0800)]
SHM_UNLOCK: fix long unpreemptible section
scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
evictable again once the shared memory is unlocked. It does this with
pagevec_lookup()s across the whole object (which might occupy most of
memory), and takes 300ms to unlock 7GB here. A cond_resched() every
PAGEVEC_SIZE pages would be good.
However, KOSAKI-san points out that this is called under shmem.c's
info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
There is no strong reason for that: we need to take these pages off the
unevictable list soonish, but those locks are not required for it.
So move the call to scan_mapping_unevictable_pages() from shmem.c's
unlock handling up to shm.c's unlock handling. Remove the recently
added barrier, not needed now we have spin_unlock() before the scan.
Use get_file(), with subsequent fput(), to make sure we have a reference
to mapping throughout scan_mapping_unevictable_pages(): that's something
that was previously guaranteed by the shm_lock().
Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
time, and we lazily discover them to be Unevictable later, so it serves
no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
pages still on pagevec are not marked Unevictable.
The original code avoided redundant rescans by checking VM_LOCKED flag
at its level: now avoid them by checking shp's SHM_LOCKED.
The original code called scan_mapping_unevictable_pages() on a locked
area at shm_destroy() time: perhaps we once had accounting cross-checks
which required that, but not now, so skip the overhead and just let
inode eviction deal with them.
Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
more as comment than to save space; comment them used for SHM_UNLOCK.
Michael Holzheu [Fri, 20 Jan 2012 22:34:16 +0000 (14:34 -0800)]
kdump: define KEXEC_NOTE_BYTES arch specific for s390x
kdump only allocates memory for the prstatus ELF note. For s390x,
besides of prstatus multiple ELF notes for various different register
types are stored. Therefore the currently allocated memory is not
sufficient. With this patch the KEXEC_NOTE_BYTES macro can be defined
by architecture code and for s390x it is set to the correct size now.
Hillf Danton [Fri, 20 Jan 2012 22:34:13 +0000 (14:34 -0800)]
mm/hugetlb.c: undo change to page mapcount in fault handler
Page mapcount should be updated only if we are sure that the page ends
up in the page table otherwise we would leak if we couldn't COW due to
reservations or if idx is out of bounds.
Johannes Weiner [Fri, 20 Jan 2012 22:34:12 +0000 (14:34 -0800)]
mm: memcg: update the correct soft limit tree during migration
end_migration() passes the old page instead of the new page to commit
the charge. This page descriptor is not used for committing itself,
though, since we also pass the (correct) page_cgroup descriptor. But
it's used to find the soft limit tree through the page's zone, so the
soft limit tree of the old page's zone is updated instead of that of the
new page's, which might get slightly out of date until the next charge
reaches the ratelimit point.
This glitch has been present since 5564e88 ("memcg: condense
page_cgroup-to-page lookup points").
This fixes a bug that I introduced in 2.6.38. It's benign enough (to my
knowledge) that we probably don't want this for stable.
Will Deacon [Fri, 20 Jan 2012 22:34:09 +0000 (14:34 -0800)]
proc: clear_refs: do not clear reserved pages
/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.
On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts. Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in. This results in a system deadlock
on the next exception.
To see this problem in action, just run:
$ echo 1 > /proc/self/clear_refs
on an ARM platform (as any user) and watch your system hang. I think this
has been the case since 2.6.37
This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM. Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.
Axel Lin [Fri, 20 Jan 2012 22:34:05 +0000 (14:34 -0800)]
drivers/video/backlight/adp88x0_bl.c: fix bit testing logic
We need to write new value if the bit mask fields of new value is not
equal to old value. It does not make sense to write new value only when
all the bit_mask bits are zero.
Commit ef53d9c5e ("kprobes: improve kretprobe scalability with hashed
locking") introduced a bug where we can potentially leak
kretprobe_instances since we initialize a hlist head after having used
it.
Add support for leds on Bachmann's ot200 visualisation device. The
device has three leds on the back panel (led_err, led_init and led_run)
and can handle up to seven leds on the front panel.
The driver was written by Linutronix on behalf of Bachmann electronic
GmbH. It incorporates feedback from Lars-Peter Clausen
Michal Hocko [Fri, 20 Jan 2012 22:33:58 +0000 (14:33 -0800)]
mm: __count_immobile_pages(): make sure the node is online
page_zone() requires an online node otherwise we are accessing NULL
NODE_DATA. This is not an issue at the moment because node_zones are
located at the structure beginning but this might change in the future
so better be careful about that.
Michal Hocko [Fri, 20 Jan 2012 22:33:55 +0000 (14:33 -0800)]
mm: fix NULL ptr dereference in __count_immobile_pages
Fix the following NULL ptr dereference caused by
cat /sys/devices/system/memory/memory0/removable
Pid: 13979, comm: sed Not tainted 3.0.13-0.5-default #1 IBM BladeCenter LS21 -[7971PAM]-/Server Blade
RIP: __count_immobile_pages+0x4/0x100
Process sed (pid: 13979, threadinfo ffff880221c36000, task ffff88022e788480)
Call Trace:
is_pageblock_removable_nolock+0x34/0x40
is_mem_section_removable+0x74/0xf0
show_mem_removable+0x41/0x70
sysfs_read_file+0xfe/0x1c0
vfs_read+0xc7/0x130
sys_read+0x53/0xa0
system_call_fastpath+0x16/0x1b
We are crashing because we are trying to dereference NULL zone which
came from pfn=0 (struct page ffffea0000000000). According to the boot
log this page is marked reserved:
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
and early_node_map confirms that:
early_node_map[3] active PFN ranges
1: 0x00000010 -> 0x0000009c
1: 0x00000100 -> 0x000bffa3
1: 0x00100000 -> 0x00240000
The problem is that memory_present works in PAGE_SECTION_MASK aligned
blocks so the reserved range sneaks into the the section as well. This
also means that free_area_init_node will not take care of those reserved
pages and they stay uninitialized.
When we try to read the removable status we walk through all available
sections and hope that the zone is valid for all pages in the section.
But this is not true in this case as the zone and nid are not initialized.
We have only one node in this particular case and it is marked as node=1
(rather than 0) and that made the problem visible because page_to_nid will
return 0 and there are no zones on the node.
Let's check that the zone is valid and that the given pfn falls into its
boundaries and mark the section not removable. This might cause some
false positives, probably, but we do not have any sane way to find out
whether the page is reserved by the platform or it is just not used for
whatever other reasons.
Andrew Morton [Fri, 20 Jan 2012 22:33:53 +0000 (14:33 -0800)]
mm: fix warnings regarding enum migrate_mode
sparc64 allmodconfig:
In file included from include/linux/compat.h:15,
from /usr/src/25/arch/sparc/include/asm/siginfo.h:19,
from include/linux/signal.h:5,
from include/linux/sched.h:73,
from arch/sparc/kernel/asm-offsets.c:13:
include/linux/fs.h:618: warning: parameter has incomplete type
It seems that my sparc64 compiler (gcc-3.4.5) doesn't like the forward
declaration of enums.
Fix this by moving the "enum migrate_mode" definition into its own header
file.
Takashi Iwai [Mon, 23 Jan 2012 16:10:24 +0000 (17:10 +0100)]
ALSA: hda - Fix buffer-alignment regression with Nvidia HDMI
The commit 2ae66c26550cd94b0e2606a9275eb0ab7070ad0e
ALSA: hda: option to enable arbitrary buffer/period sizes
introduced a regression on machines with Intel controller and Nvidia
HDMI. The reason is that the driver modifies the global variable
align_buffer_size when an Intel controller is found, and the Nvidia
HDMI controller is probed after Intel although Nvidia chips require
the aligned buffers.
This patch fixes the problem by moving the flag into the local struct
so that it's not affected by other controllers.
David S. Miller [Sun, 22 Jan 2012 19:45:14 +0000 (14:45 -0500)]
bluetooth: hci: Fix type of "enable_hs" to bool.
Fixes:
net/bluetooth/hci_core.c: In function ‘__check_enable_hs’:
net/bluetooth/hci_core.c:2587:1: warning: return from incompatible pointer type [enabled by default]
Glauber Costa [Fri, 20 Jan 2012 04:57:16 +0000 (04:57 +0000)]
net: introduce res_counter_charge_nofail() for socket allocations
There is a case in __sk_mem_schedule(), where an allocation
is beyond the maximum, but yet we are allowed to proceed.
It happens under the following condition:
sk->sk_wmem_queued + size >= sk->sk_sndbuf
The network code won't revert the allocation in this case,
meaning that at some point later it'll try to do it. Since
this is never communicated to the underlying res_counter
code, there is an inbalance in res_counter uncharge operation.
I see two ways of fixing this:
1) storing the information about those allocations somewhere
in memcg, and then deducting from that first, before
we start draining the res_counter,
2) providing a slightly different allocation function for
the res_counter, that matches the original behavior of
the network code more closely.
I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.
Sathya Perla [Thu, 19 Jan 2012 20:34:04 +0000 (20:34 +0000)]
be2net: create RSS rings even in multi-channel configs
Currently RSS rings are not created in a multi-channel config.
RSS rings can be created on one (out of four) interfaces per port in a
multi-channel config. Doing this insulates the driver from a FW bug wherin
multi-channel config is wrongly reported even when not enabled. This also
helps performance in a multi-channel config, as one interface per port gets
RSS rings.
Paul Gortmaker [Thu, 19 Jan 2012 15:40:06 +0000 (15:40 +0000)]
pktgen: Fix unsigned function that is returning negative vals
Every call to num_args() immediately checks the return value for
less than zero, as it will return -EFAULT for a failed get_user()
call. So it makes no sense for the function to be declared as an
unsigned long.
Yuchung Cheng [Thu, 19 Jan 2012 14:42:21 +0000 (14:42 +0000)]
tcp: detect loss above high_seq in recovery
Correctly implement a loss detection heuristic: New sequences (above
high_seq) sent during the fast recovery are deemed lost when higher
sequences are SACKed.
Current code does not catch these losses, because tcp_mark_head_lost()
does not check packets beyond high_seq. The fix is straight-forward by
checking packets until the highest sacked packet. In addition, all the
FLAG_DATA_LOST logic are in-effective and redundant and can be removed.
Update the loss heuristic comments. The algorithm above is documented
as heuristic B, but it is redundant too because heuristic A already
covers B.
Note that this change only marks some forward-retransmitted packets LOST.
It does NOT forbid TCP performing further CWR on new losses. A potential
follow-up patch under preparation is to perform another CWR on "new"
losses such as
1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
2) retransmission is lost.
With netem reordering, a gap of N is supposed to reorder every Nth packet with
given reorder probability. However, the code currently skips N packets and
reorders every (N+1)th packet.
Marcel Apfelbaum [Thu, 19 Jan 2012 09:45:31 +0000 (09:45 +0000)]
mlx4_core: Fix mtt profile issue
Num mtts from profile is really the number of mtt segments.
Thus, in make profile, to get the proper number of MTT entries,
must multiply num_mtts by mtts per segment.
In native mode display all available staticstics.
In SRIOV mode on VF display only SW counters statistics,
in SRIOV mode on hypervisor display SW counters and errors (got from FW)
statistics.
Neal Cardwell [Wed, 18 Jan 2012 17:47:59 +0000 (17:47 +0000)]
tcp: fix undo after RTO for CUBIC
This patch fixes CUBIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).
When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:
(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.
(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.
This patch makes the following changes:
(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"
(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events
(3) use ca->last_max_cwnd to check if we're in slow start
Neal Cardwell [Wed, 18 Jan 2012 17:47:58 +0000 (17:47 +0000)]
tcp: fix undo after RTO for BIC
This patch fixes BIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).
When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:
(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.
(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.
This patch makes the following changes:
(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"
(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events
(3) use ca->last_max_cwnd to check if we're in slow start