Git Repo - linux.git/log

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

net: explicitly add jump_label.h header to sock.h

Commit 36a1211970193ce215de50ed1e4e1272bc814df1 removed linux/module.h
include statement from one of the headers that end up in net/sock.h.
It was providing us with static_branch() definition implicitly, so
after its removal the build got broken.

To fix this, and avoid having this happening in the future,
let me do the right thing and include linux/jump_label.h
explicitly in sock.h.

Signed-off-by: Glauber Costa <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
CC: David S. Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: RTNETLINK adjusting values of min_ifinfo_dump_size

Setting link parameters on a netdevice changes the value
of if_nlmsg_size(), therefore it is necessary to recalculate
min_ifinfo_dump_size.

Signed-off-by: Stefan Gula <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Fix ip_gre lockless xmits.

Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and
ipip set this unconditionally in ops->setup, but gre enables it
conditionally after parameter passing in ops->newlink. This is
not called during tunnel setup as below, however, so GRE tunnels are
still taking the lock.

modprobe ip_gre
ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
ip link set test0 up
ip addr add 10.6.0.1 dev test0
# cat /sys/class/net/test0/features
# $DIR/test_tunnel_xmit 10 10.5.2.1
ip route add 10.5.2.0/24 dev test0
ip tunnel del test0

The newlink callback is only called in rtnl_netlink, and only if
the device is new, as it calls register_netdevice internally. Gre
tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
which calls ipgre_tunnel_locate, which calls register_netdev.
rtnl_newlink is called at 'ip link set', but skips ops->newlink
and the device is up with locking still enabled. The equivalent
ipip tunnel works fine, btw (just substitute 'method gre' for
'method ipip').

On kernels before /sys/class/net/*/features was removed [1],
the first commented out line returns 0x6000 with method gre,
which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
it reports 0x7000. This test cannot be used on recent kernels where
the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
work for tunnel devices, because they lack dev->ethtool_ops).

The second commented out line calls a simple transmission test [2]
that sends on 24 cores at maximum rate. Results of a single run:

ipip: 19,372,306
gre before patch: 4,839,753
gre after patch: 19,133,873

This patch replicates the condition check in ipgre_newlink to
ipgre_tunnel_locate. It works for me, both with oseq on and off.
This is the first time I looked at rtnetlink and iproute2 code,
though, so someone more knowledgeable should probably check the
patch. Thanks.

The tail of both functions is now identical, by the way. To avoid
code duplication, I'll be happy to rework this and merge the two.

[1] http://patchwork.ozlabs.org/patch/104610/
[2] http://kernel.googlecode.com/files/xmit_udp_parallel.c

Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

xen-netfront: correct MAX_TX_TARGET calculation.

Signed-off-by: Wei Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netns: fix net_alloc_generic()

When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :

[  200.752016] kernel BUG at include/net/netns/generic.h:40!
...
[  200.752016]  [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
[  200.752016]  [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
[  200.752016]  [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
[  200.752016]  [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
[  200.752016]  [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
[  200.752016]  [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
[  200.752016]  [<ffffffff821c2b26>] register_netdevice+0x196/0x300
[  200.752016]  [<ffffffff821c2ca9>] register_netdev+0x19/0x30
[  200.752016]  [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
[  200.752016]  [<ffffffff821b5e62>] ops_init+0x42/0x180
[  200.752016]  [<ffffffff821b600b>] setup_net+0x6b/0x100
[  200.752016]  [<ffffffff821b6466>] copy_net_ns+0x86/0x110
[  200.752016]  [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190

net_alloc_generic() should take into account the maximum index into the
ptr array, as a subsystem might use net_generic() anytime.

This also reduces number of reallocations in net_assign_generic()

Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Sjur Brændeland <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: bind() optimize port allocation

Port autoselection finds a port and then drop the lock,
then right after that, gets the hash bucket again and lock it.

Fix it to go direct.

Signed-off-by: Flavio Leitner <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: bind() fix autoselection to share ports

The current code checks for conflicts when the application
requests a specific port. If there is no conflict, then
the request is granted.

On the other hand, the port autoselection done by the kernel
fails when all ports are bound even when there is a port
with no conflict available.

The fix changes port autoselection to check if there is a
conflict and use it if not.

Signed-off-by: Flavio Leitner <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: l2tp_ip - fix possible oops on packet receive

When a packet is received on an L2TP IP socket (L2TPv3 IP link
encapsulation), the l2tpip socket's backlog_rcv function calls
xfrm4_policy_check(). This is not necessary, since it was called
before the skb was added to the backlog. With CONFIG_NET_NS enabled,
xfrm4_policy_check() will oops if skb->dev is null, so this trivial
patch removes the call.

This bug has always been present, but only when CONFIG_NET_NS is
enabled does it cause problems. Most users are probably using UDP
encapsulation for L2TP, hence the problem has only recently
surfaced.

EIP: 0060:[<c12bb62b>] EFLAGS: 00210246 CPU: 0
EIP is at l2tp_ip_recvmsg+0xd4/0x2a7
EAX: 00000001 EBX: d77b5180 ECX: 00000000 EDX: 00200246
ESI: 00000000 EDI: d63cbd30 EBP: d63cbd18 ESP: d63cbcf4
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Call Trace:
[<c1218568>] sock_common_recvmsg+0x31/0x46
[<c1215c92>] __sock_recvmsg_nosec+0x45/0x4d
[<c12163a1>] __sock_recvmsg+0x31/0x3b
[<c1216828>] sock_recvmsg+0x96/0xab
[<c10b2693>] ? might_fault+0x47/0x81
[<c10b2693>] ? might_fault+0x47/0x81
[<c1167fd0>] ? _copy_from_user+0x31/0x115
[<c121e8c8>] ? copy_from_user+0x8/0xa
[<c121ebd6>] ? verify_iovec+0x3e/0x78
[<c1216604>] __sys_recvmsg+0x10a/0x1aa
[<c1216792>] ? sock_recvmsg+0x0/0xab
[<c105a99b>] ? __lock_acquire+0xbdf/0xbee
[<c12d5a99>] ? do_page_fault+0x193/0x375
[<c10d1200>] ? fcheck_files+0x9b/0xca
[<c10d1259>] ? fget_light+0x2a/0x9c
[<c1216bbb>] sys_recvmsg+0x2b/0x43
[<c1218145>] sys_socketcall+0x16d/0x1a5
[<c11679f0>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c100305f>] sysenter_do_call+0x12/0x38
Code: c6 05 8c ea a8 c1 01 e8 0c d4 d9 ff 85 f6 74 07 3e ff 86 80 00 00 00 b9 17 b6 2b c1 ba 01 00 00 00 b8 78 ed 48 c1 e8 23 f6 d9 ff <ff> 76 0c 68 28 e3 30 c1 68 2d 44 41 c1 e8 89 57 01 00 83 c4 0c

Signed-off-by: James Chapman <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Davem says:

1) Fix JIT code generation on x86-64 for divide by zero, from Eric Dumazet.

2) tg3 header length computation correction from Eric Dumazet.

3) More build and reference counting fixes for socket memory cgroup
   code from Glauber Costa.

4) module.h snuck back into a core header after all the hard work we
   did to remove that, from Paul Gortmaker and Jesper Dangaard Brouer.

5) Fix PHY naming regression and add some new PCI IDs in stmmac, from
   Alessandro Rubini.

6) Netlink message generation fix in new team driver, should only advertise
   the entries that changed during events, from Jiri Pirko.

7) SRIOV VF registration and unregistration fixes, and also add a
   missing PCI ID, from Roopa Prabhu.

8) Fix infinite loop in tx queue flush code of brcmsmac, from Stanislaw Gruszka.

9) ftgmac100/ftmac100 build fix, missing interrupt.h include.

10) Memory leak fix in net/hyperv do_set_mutlicast() handling, from Wei Yongjun.

11) Off by one fix in netem packet scheduler, from Vijay Subramanian.

12) TCP loss detection fix from Yuchung Cheng.

13) TCP reset packet MD5 calculation uses wrong address, fix from Shawn Lu.

14) skge carrier assertion and DMA mapping fixes from Stephen Hemminger.

15) Congestion recovery undo performed at the wrong spot in BIC and CUBIC
    congestion control modules, fix from Neal Cardwell.

16) Ethtool ETHTOOL_GSSET_INFO is unnecessarily restrictive, from Michał Mirosław.

17) Fix triggerable race in ipv6 sysctl handling, from Francesco Ruggeri.

18) Statistics bug fixes in mlx4 from Eugenia Emantayev.

19) rds locking bug fix during info dumps, from your's truly.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  rds: Make rds_sock_lock BH rather than IRQ safe.
  netprio_cgroup.h: dont include module.h from other includes
  net: flow_dissector.c missing include linux/export.h
  team: send only changed options/ports via netlink
  net/hyperv: fix possible memory leak in do_set_multicast()
  drivers/net: dsa/mv88e6xxx.c files need linux/module.h
  stmmac: added PCI identifiers
  llc: Fix race condition in llc_ui_recvmsg
  stmmac: fix phy naming inconsistency
  dsa: Add reporting of silicon revision for Marvell 88E6123/88E6161/88E6165 switches.
  tg3: fix ipv6 header length computation
  skge: add byte queue limit support
  mv643xx_eth: Add Rx Discard and Rx Overrun statistics
  bnx2x: fix compilation error with SOE in fw_dump
  bnx2x: handle CHIP_REVISION during init_one
  bnx2x: allow user to change ring size in ISCSI SD mode
  bnx2x: fix Big-Endianess in ethtool -t
  bnx2x: fixed ethtool statistics for MF modes
  bnx2x: credit-leakage fixup on vlan_mac_del_all
  macvlan: fix a possible use after free
  ...

rds: Make rds_sock_lock BH rather than IRQ safe.

rds_sock_info() triggers locking warnings because we try to perform a
local_bh_enable() (via sock_i_ino()) while hardware interrupts are
disabled (via taking rds_sock_lock).

There is no reason for rds_sock_lock to be a hardware IRQ disabling
lock, none of these access paths run in hardware interrupt context.

Therefore making it a BH disabling lock is safe and sufficient to
fix this bug.

Reported-by: Kumar Sanghvi <[email protected]>
Reported-by: Josh Boyer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netprio_cgroup.h: dont include module.h from other includes

A considerable effort was invested in wiping out module.h
from being present in all the other standard includes. This
one leaked back in, but once again isn't strictly necessary,
so remove it.

Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: flow_dissector.c missing include linux/export.h

The file net/core/flow_dissector.c seems to be missing
including linux/export.h.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

team: send only changed options/ports via netlink

This patch changes event message behaviour to send only updated records
instead of whole list. This fixes bug on which userspace receives non-actual
data in case multiple events occur in row.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/hyperv: fix possible memory leak in do_set_multicast()

do_set_multicast() may not free the memory malloc in
netvsc_set_multicast_list().

Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: K. Y. Srinivasan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers/net: dsa/mv88e6xxx.c files need linux/module.h

An implicit instance of module.h leaked back into existence
and was masking the fact that these drivers weren't calling
out the include for itself. Fix the drivers before we remove
the implicit include path via net/netprio_cgroup.h file.

Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: added PCI identifiers

STM has a device ID within its own VENDOR space, and it is being
used in the STA2X11 I/O Hub.

Signed-off-by: Alessandro Rubini <[email protected]>
Acked-by: Giancarlo Asnaghi <[email protected]>
Acked-by: Giuseppe Cavallaro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

llc: Fix race condition in llc_ui_recvmsg

There is a race on sk_receive_queue between llc_ui_recvmsg and
sock_queue_rcv_skb.

Our current solution is to protect skb_eat in llc_ui_recvmsg
with the queue spinlock.

Signed-off-by: Radu Iliescu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: fix phy naming inconsistency

After commit "db8857b stmmac: use an unique MDIO bus name" my
device stopped being probed because two different names were being
used in different places. This fixes the inconsistency.

Signed-off-by: Alessandro Rubini <[email protected]>
Acked-by: Giancarlo Asnaghi <[email protected]>
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Florian Fainelli <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Pass information that quota is stored in system file to userspace
  ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls
  jbd: Issue cache flush after checkpointing

iwlwifi: fix PCI-E transport "inta" race

When an interrupt comes in, we read the reason
bits and collect them into "trans_pcie->inta".
This happens with the spinlock held. However,
there's a bug resetting this variable -- that
happens after the spinlock has been released.
This means that it is possible for interrupts
to be missed if the reset happens after some
other interrupt reasons were already added to
the variable.

I found this by code inspection, looking for a
reason that we sometimes see random commands
time out. It seems possible that this causes
such behaviour, but I can't say for sure right
now since it happens extremely infrequently on
my test systems.

Cc: [email protected] [3.2]
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Wey-Yi Guy <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

mac80211: set bss_conf.idle when vif is connected

__ieee80211_recalc_idle() iterates through the vifs,
sets bss_conf.idle = true if they are disconnected,
and increases "count" if they are not (which later
gets evaluated in order to determine whether the
device is idle).

However, the loop doesn't set bss_conf.idle = false
(along with increasing "count"), causing the device
idle state and the vif idle state to get out of sync
in some cases.

Signed-off-by: Eliad Peller <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

mac80211: update oper_channel on ibss join

Commit 13c40c5 ("mac80211: Add HT operation modes for IBSS") broke
ibss operation by mistakenly removing the local->oper_channel
update (causing ibss to start on the wrong channel). fix it.

Signed-off-by: Eliad Peller <[email protected]>
Acked-by: Simon Wunderlich <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

regulator: Fix documentation for of_node parameter of regulator_register()

Commit 5bc75a886353 ("kernel-doc: fix new warning in regulator core")
added documentation for of_node to address a warning but the
documentation didn't explain what the parameter is for so would be
likely to be unhelpful for users. Clarify that.

Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix silent outputs from docking-station jacks of Dell laptops
  ALSA: HDA: Use model=auto for Thinkpad T510
  ALSA: hda - Fix buffer-alignment regression with Nvidia HDMI
  ALSA: hda - Fix a unused variable warning
  snd-hda-intel: better Alienware M17x R3 quirk
  ALSA: hda/realtek - Remove use_jack_tbl field
  ALSA: hda/realtek - Avoid conflict of unsol-events with static quirks
  ALSA: hda/realtek - Avoid multi-ios conflicting with multi-speakers

gma500: Fix shmem mapping

GMA500 did it the old way and it's been on the TODO list to fix.
Current kernels now blow up if we use the old way so we'd better
do the work!

Signed-off-by: Alan Cox <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

migrate_mode.h is not exported to user mode

so move its include into fs.h inside the __KERNEL__ protection.

Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Power management fixes for 3.3

Two fixes for regressions introduced during the merge window, one fix for
a long-standing obscure issue in the computation of hibernate image size
and two small PM documentation fixes.

* tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Sleep: Fix read_unlock_usermodehelper() call.
  PM / Hibernate: Rewrite unlock_system_sleep() to fix s2disk regression
  PM / Hibernate: Correct additional pages number calculation
  PM / Documentation: Fix minor issue in freezing_of_tasks.txt
  PM / Documentation: Fix spelling mistake in basic-pm-debugging.txt

Merge tag 'arm-soc-imx-move' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Consolidate i.MX 5 platforms to be under the new shared i.MX 3/5/6 tree.

* tag 'arm-soc-imx-move' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM i.MX: Update defconfig
  ARM i.MX: Merge i.MX5 support into mach-imx
  ARM i.MX5: remove unnecessary includes from board files

Fix up fairly trivial conflicts due to various changes nearby in
arch/arm/{mach,plat}-imx/{Kconfig,Makefile}

Pull request had been sent to the wrong email address, but happened
before the merge window closed.  I'm merging the MX 5 consolidation,
since it apparently will help the next development window and will avoid
conflicts later as per Arnd.

PM / Sleep: Fix read_unlock_usermodehelper() call.

Commit b298d289
"PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()"
added read_unlock_usermodehelper() but read_unlock_usermodehelper() is called
without read_lock_usermodehelper() when kmalloc() failed.

Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

dsa: Add reporting of silicon revision for Marvell 88E6123/88E6161/88E6165 switches.

Add reporting of silicon revision during the probe function for Marvell 88E6123/88E6161/88E6165 switches.

Signed-off-by: Chris Healy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tg3: fix ipv6 header length computation

tg3_start_xmit() makes the wrong assumption for TSOV6 that skb->head
doesnt include any payload data.

if (skb_is_gso_v6(skb))
hdr_len = skb_headlen(skb) - ETH_HLEN;

This is not true anymore after commit f07d960df3 (tcp: avoid frag
allocation for small frames)

We should instead use : skb_transport_offset(skb) + tcp_hdrlen(skb)

Its also true for IPv4

Signed-off-by: Eric Dumazet <[email protected]>
CC: Matt Carlson <[email protected]>
CC: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

skge: add byte queue limit support

This also changes the cleanup logic slightly to aggregate
completed notifications for multiple packets.

Signed-off-by: Stephen Hemminger <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mv643xx_eth: Add Rx Discard and Rx Overrun statistics

These statistics helped me a lot while searching who is losing
packets in my setup.
I added these stats to MIB group since they are very similar,
but just in other registers.
I have tested this patch on 88F6281 SoC.

Signed-off-by: Paulius Zaleckas <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: fix compilation error with SOE in fw_dump

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: handle CHIP_REVISION during init_one

The macro `CHIP_IS_E1x' requires `bp' to be initialized.
As `bp' is not yet initialized during this phase of `bnx2x_init_dev',
it accessed uninitialized fields in the struct.

Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: allow user to change ring size in ISCSI SD mode

Signed-off-by: Dmitry Kravkov <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: fix Big-Endianess in ethtool -t

Signed-off-by: Dmitry Kravkov <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: fixed ethtool statistics for MF modes

Previosuly, in MF modes `ethtool -S' lacked some of the statistics
which appeared in non-MF modes. This has been fixed.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: credit-leakage fixup on vlan_mac_del_all

Upon insertion of elements into the execution queue, it is validated
that there are enough credits to support additional vlan-macs,
and the credits are consumed. However, when removing a pending
command in `bnx2x_vland_mac_del_all' the consumed credits are not
released, which might cause leakage and eventually the inability to
add new vlan-macs in certain scenarios.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

macvlan: fix a possible use after free

Commit bc416d9768 (macvlan: handle fragmented multicast frames) added a
possible use after free in macvlan_handle_frame(), since
ip_check_defrag() uses pskb_may_pull() : skb header can be reallocated.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Ben Greear <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'kernel-doc' from Randy Dunlap

The usual kernel-doc fixups from Randy.  Some of them David acked as
merged in his tree, this is the random left-overs.

* kernel-doc:
  docbook: fix sched source file names in device-drivers book
  docbook: change iomap source filename in deviceiobook
  docbook: don't use serial_core.h in device-drivers book
  kernel-doc: fix kernel-doc warnings in sched
  kernel-doc: fix new warnings in cfg80211.h
  kernel-doc: fix new warning in usb.h
  kernel-doc: fix new warnings in device.h
  kernel-doc: fix new warnings in debugfs
  kernel-doc: fix new warning in regulator core
  kernel-doc: fix new warnings in pci
  kernel-doc: fix new warnings in driver-core
  kernel-doc: fix new warnings in auditsc.c
  scripts/kernel-doc: fix fatal error caused by cfg80211.h

Merge branch 'akpm'

Quoth Andrew:
  "Random fixes.  And a simple new LED driver which I'm trying to sneak
   in while you're not looking."

Sneaking successful.

* akpm:
  score: fix off-by-one index into syscall table
  mm: fix rss count leakage during migration
  SHM_UNLOCK: fix Unevictable pages stranded after swap
  SHM_UNLOCK: fix long unpreemptible section
  kdump: define KEXEC_NOTE_BYTES arch specific for s390x
  mm/hugetlb.c: undo change to page mapcount in fault handler
  mm: memcg: update the correct soft limit tree during migration
  proc: clear_refs: do not clear reserved pages
  drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails
  drivers/video/backlight/adp88x0_bl.c: fix bit testing logic
  kprobes: initialize before using a hlist
  ipc/mqueue: simplify reading msgqueue limit
  leds: add led driver for Bachmann's ot200
  mm: __count_immobile_pages(): make sure the node is online
  mm: fix NULL ptr dereference in __count_immobile_pages
  mm: fix warnings regarding enum migrate_mode

Merge branch 'for-linus-3.3' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf

* 'for-linus-3.3' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf:
MAINTAINERS: Add dma-buf sharing framework maintainer

ALSA: hda - Fix silent outputs from docking-station jacks of Dell laptops

The recent change of the power-widget handling for IDT codecs caused
the silent output from the docking-station line-out jack. This was
partially fixed by the commit f2cbba7602383cd9cdd21f0a5d0b8bd1aad47b33
"ALSA: hda - Fix the lost power-setup of seconary pins after PM resume".
But the line-out on the docking-station is still silent when booted
with the jack plugged even by this fix.

The remainig bug is that the power-widget is set off in stac92xx_init()
because the pins in cfg->line_out_pins[] aren't checked there properly
but only hp_pins[] are checked in is_nid_hp_pin().

This patch fixes the problem by checking both HP and line-out pins
and leaving the power-map correctly.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42637

Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Merge git://git.samba.org/sfrench/cifs-2.6

* git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Rename *UCS* functions to *UTF16*
  [CIFS] ACL and FSCACHE support no longer EXPERIMENTAL
  [CIFS] Fix build break with multiuser patch when LANMAN disabled
  cifs: warn about impending deprecation of legacy MultiuserMount code
  cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts
  cifs: sanitize username handling
  keys: add a "logon" key type
  cifs: lower default wsize when unix extensions are not used
  cifs: better instrumentation for coalesce_t2
  cifs: integer overflow in parse_dacl()
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions

docbook: fix sched source file names in device-drivers book

Fix warning since kernel scheduler functions and kernel-doc
notation were moved to other files.

docproc: lnx-33-rc1/kernel/sched.c: No such file or directory

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

docbook: change iomap source filename in deviceiobook

Fix warning since kernel-doc comments were moved to another
source file.

Warning(lib/iomap.c): no structured comments found

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

docbook: don't use serial_core.h in device-drivers book

Fix new kernel-doc warning. This file no longer contains
kernel-doc comments.

Warning(include/linux/serial_core.h): no structured comments found

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix kernel-doc warnings in sched

Fix new kernel-doc notation warnings:

Warning(include/linux/sched.h:2094): No description found for parameter 'p'
Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warnings in cfg80211.h

Fix new kernel-doc warnings:

Warning(include/net/cfg80211.h:1165): No description found for parameter 'channel_type'
Warning(include/net/cfg80211.h:2090): No description found for parameter 'probe_resp_offload'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warning in usb.h

Fix new kernel-doc warning:

Warning(include/linux/usb.h:1251): No description found for parameter 'num_mapped_sgs'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warnings in device.h

Fix new kernel-doc warnings:

Warning(include/linux/device.h:299): No description found for parameter 'name'
Warning(include/linux/device.h:299): No description found for parameter 'subsys'
Warning(include/linux/device.h:299): No description found for parameter 'node'
Warning(include/linux/device.h:299): No description found for parameter 'add_dev'
Warning(include/linux/device.h:299): No description found for parameter 'remove_dev'
Warning(include/linux/device.h:685): No description found for parameter 'id'
Warning(include/linux/device.h:1009): No description found for parameter '__driver'
Warning(include/linux/device.h:1009): No description found for parameter '__register'
Warning(include/linux/device.h:1009): No description found for parameter '__unregister'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warnings in debugfs

Fix new kernel-doc warnings:

Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs'
Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warning in regulator core

Fix new kernel-doc warning:

Warning(drivers/regulator/core.c:2741): No description found for parameter 'of_node'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Liam Girdwood <[email protected]>
Cc: Mark Brown <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warnings in pci

Fix new kernel-doc warnings:

Warning(drivers/pci/pci.c:2811): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2811): Excess function parameter 'pdev' description in 'pci_intx_mask_supported'
Warning(drivers/pci/pci.c:2894): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2894): Excess function parameter 'pdev' description in 'pci_check_and_mask_intx'
Warning(drivers/pci/pci.c:2908): No description found for parameter 'dev'
Warning(drivers/pci/pci.c:2908): Excess function parameter 'pdev' description in 'pci_check_and_unmask_intx'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jesse Barnes <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warnings in driver-core

Fix new kernel-doc warnings:

Warning(drivers/base/bus.c:925): No description found for parameter 'key'
Warning(drivers/base/bus.c:1241): No description found for parameter 'subsys'
Warning(drivers/base/bus.c:1241): No description found for parameter 'groups'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel-doc: fix new warnings in auditsc.c

Fix new kernel-doc warnings in auditsc.c:

Warning(kernel/auditsc.c:1875): No description found for parameter 'success'
Warning(kernel/auditsc.c:1875): No description found for parameter 'return_code'
Warning(kernel/auditsc.c:1875): Excess function parameter 'pt_regs' description in '__audit_syscall_exit'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Eric Paris <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/kernel-doc: fix fatal error caused by cfg80211.h

include/net/cfg80211.h uses __must_check in functions that
have kernel-doc notation. This was confusing scripts/kernel-doc,
so have scripts/kernel-doc ignore "__must_check".

Error(include/net/cfg80211.h:2702): cannot understand prototype: 'struct cfg80211_bss * __must_check cfg80211_inform_bss(...)

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Johannes Berg <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

score: fix off-by-one index into syscall table

If the provided system call number is equal to __NR_syscalls, the
current check will pass and a function pointer just after the system
call table may be called, since sys_call_table is an array with total
size __NR_syscalls.

Whether or not this is a security bug depends on what the compiler puts
immediately after the system call table. It's likely that this won't do
anything bad because there is an additional NULL check on the syscall
entry, but if there happens to be a non-NULL value immediately after the
system call table, this may result in local privilege escalation.

Signed-off-by: Dan Rosenberg <[email protected]>
Cc: <[email protected]>
Cc: Chen Liqin <[email protected]>
Cc: Lennox Wu <[email protected]>
Cc: Eugene Teo <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix rss count leakage during migration

Memory migration fills a pte with a migration entry and it doesn't
update the rss counters.  Then it replaces the migration entry with the
new page (or the old one if migration failed).  But between these two
passes this pte can be unmaped, or a task can fork a child and it will
get a copy of this migration entry.  Nobody accounts for this in the rss
counters.

This patch properly adjust rss counters for migration entries in
zap_pte_range() and copy_one_pte().  Thus we avoid extra atomic
operations on the migration fast-path.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

SHM_UNLOCK: fix Unevictable pages stranded after swap

Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in
find_get_pages") correctly fixed an infinite loop; but left a problem
that find_get_pages() on shmem would return 0 (appearing to callers to
mean end of tree) when it meets a run of nr_pages swap entries.

The only uses of find_get_pages() on shmem are via pagevec_lookup(),
called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
scan_mapping_unevictable_pages(). The first is already commented, and
not worth worrying about; but the second can leave pages on the
Unevictable list after an unusual sequence of swapping and locking.

Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
swap) instead of pagevec_lookup().

But I don't want to contaminate vmscan.c with shmem internals, nor
shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into
shmem.c, renaming it shmem_unlock_mapping(); and rename
check_move_unevictable_page() to check_move_unevictable_pages(), looping
down an array of pages, oftentimes under the same lock.

Leave out the "rotate unevictable list" block: that's a leftover from
when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
handling involved looking at pages at tail of LRU.

Was there significance to the sequence first ClearPageUnevictable, then
test page_evictable, then SetPageUnevictable here? I think not, we're
under LRU lock, and have no barriers between those.

Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: <[email protected]> [back to 3.1 but will need respins]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

SHM_UNLOCK: fix long unpreemptible section

scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
evictable again once the shared memory is unlocked.  It does this with
pagevec_lookup()s across the whole object (which might occupy most of
memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
PAGEVEC_SIZE pages would be good.

However, KOSAKI-san points out that this is called under shmem.c's
info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
There is no strong reason for that: we need to take these pages off the
unevictable list soonish, but those locks are not required for it.

So move the call to scan_mapping_unevictable_pages() from shmem.c's
unlock handling up to shm.c's unlock handling.  Remove the recently
added barrier, not needed now we have spin_unlock() before the scan.

Use get_file(), with subsequent fput(), to make sure we have a reference
to mapping throughout scan_mapping_unevictable_pages(): that's something
that was previously guaranteed by the shm_lock().

Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
time, and we lazily discover them to be Unevictable later, so it serves
no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
pages still on pagevec are not marked Unevictable.

The original code avoided redundant rescans by checking VM_LOCKED flag
at its level: now avoid them by checking shp's SHM_LOCKED.

The original code called scan_mapping_unevictable_pages() on a locked
area at shm_destroy() time: perhaps we once had accounting cross-checks
which required that, but not now, so skip the overhead and just let
inode eviction deal with them.

Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
more as comment than to save space; comment them used for SHM_UNLOCK.

Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kdump: define KEXEC_NOTE_BYTES arch specific for s390x

kdump only allocates memory for the prstatus ELF note.  For s390x,
besides of prstatus multiple ELF notes for various different register
types are stored.  Therefore the currently allocated memory is not
sufficient.  With this patch the KEXEC_NOTE_BYTES macro can be defined
by architecture code and for s390x it is set to the correct size now.

Signed-off-by: Michael Holzheu <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hugetlb.c: undo change to page mapcount in fault handler

Page mapcount should be updated only if we are sure that the page ends
up in the page table otherwise we would leak if we couldn't COW due to
reservations or if idx is out of bounds.

Signed-off-by: Hillf Danton <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg: update the correct soft limit tree during migration

end_migration() passes the old page instead of the new page to commit
the charge.  This page descriptor is not used for committing itself,
though, since we also pass the (correct) page_cgroup descriptor.  But
it's used to find the soft limit tree through the page's zone, so the
soft limit tree of the old page's zone is updated instead of that of the
new page's, which might get slightly out of date until the next charge
reaches the ratelimit point.

This glitch has been present since 5564e88 ("memcg: condense
page_cgroup-to-page lookup points").

This fixes a bug that I introduced in 2.6.38.  It's benign enough (to my
knowledge) that we probably don't want this for stable.

Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: clear_refs: do not clear reserved pages

/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.

On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts.  Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in.  This results in a system deadlock
on the next exception.

To see this problem in action, just run:

$ echo 1 > /proc/self/clear_refs

on an ARM platform (as any user) and watch your system hang.  I think this
has been the case since 2.6.37

This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM.  Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.

Signed-off-by: Will Deacon <[email protected]>
Reported-by: Moussa Ba <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Russell King <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: <[email protected]> [2.6.37+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails

Signed-off-by: Axel Lin <[email protected]>
Acked-by: Alberto Panizzo <[email protected]>
Cc: Richard Purdie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/video/backlight/adp88x0_bl.c: fix bit testing logic

We need to write new value if the bit mask fields of new value is not
equal to old value. It does not make sense to write new value only when
all the bit_mask bits are zero.

Signed-off-by: Axel Lin <[email protected]>
Cc: Michael Hennerich <[email protected]>
Cc: Richard Purdie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kprobes: initialize before using a hlist

Commit ef53d9c5e ("kprobes: improve kretprobe scalability with hashed
locking") introduced a bug where we can potentially leak
kretprobe_instances since we initialize a hlist head after having used
it.

Initialize the hlist head before using it.

Reported by: Jim Keniston <[email protected]>
Acked-by: Jim Keniston <[email protected]>
Signed-off-by: Ananth N Mavinakayanahalli <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Srinivasa D S <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ipc/mqueue: simplify reading msgqueue limit

Because the current task is being used to get the limit, we can simply
use rlimit() instead of task_rlimit().

Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

leds: add led driver for Bachmann's ot200

Add support for leds on Bachmann's ot200 visualisation device. The
device has three leds on the back panel (led_err, led_init and led_run)
and can handle up to seven leds on the front panel.

The driver was written by Linutronix on behalf of Bachmann electronic
GmbH. It incorporates feedback from Lars-Peter Clausen

[[email protected]: add dependency on HAS_IOMEM]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Christian Gmeiner <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Richard Purdie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: __count_immobile_pages(): make sure the node is online

page_zone() requires an online node otherwise we are accessing NULL
NODE_DATA. This is not an issue at the moment because node_zones are
located at the structure beginning but this might change in the future
so better be careful about that.

Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix NULL ptr dereference in __count_immobile_pages

Fix the following NULL ptr dereference caused by

  cat /sys/devices/system/memory/memory0/removable

Pid: 13979, comm: sed Not tainted 3.0.13-0.5-default #1 IBM BladeCenter LS21 -[7971PAM]-/Server Blade
RIP: __count_immobile_pages+0x4/0x100
Process sed (pid: 13979, threadinfo ffff880221c36000, task ffff88022e788480)
Call Trace:
  is_pageblock_removable_nolock+0x34/0x40
  is_mem_section_removable+0x74/0xf0
  show_mem_removable+0x41/0x70
  sysfs_read_file+0xfe/0x1c0
  vfs_read+0xc7/0x130
  sys_read+0x53/0xa0
  system_call_fastpath+0x16/0x1b

We are crashing because we are trying to dereference NULL zone which
came from pfn=0 (struct page ffffea0000000000). According to the boot
log this page is marked reserved:
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)

and early_node_map confirms that:
early_node_map[3] active PFN ranges
    1: 0x00000010 -> 0x0000009c
    1: 0x00000100 -> 0x000bffa3
    1: 0x00100000 -> 0x00240000

The problem is that memory_present works in PAGE_SECTION_MASK aligned
blocks so the reserved range sneaks into the the section as well.  This
also means that free_area_init_node will not take care of those reserved
pages and they stay uninitialized.

When we try to read the removable status we walk through all available
sections and hope that the zone is valid for all pages in the section.
But this is not true in this case as the zone and nid are not initialized.

We have only one node in this particular case and it is marked as node=1
(rather than 0) and that made the problem visible because page_to_nid will
return 0 and there are no zones on the node.

Let's check that the zone is valid and that the given pfn falls into its
boundaries and mark the section not removable.  This might cause some
false positives, probably, but we do not have any sane way to find out
whether the page is reserved by the platform or it is just not used for
whatever other reasons.

Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix warnings regarding enum migrate_mode

sparc64 allmodconfig:

In file included from include/linux/compat.h:15,
                 from /usr/src/25/arch/sparc/include/asm/siginfo.h:19,
                 from include/linux/signal.h:5,
                 from include/linux/sched.h:73,
                 from arch/sparc/kernel/asm-offsets.c:13:
include/linux/fs.h:618: warning: parameter has incomplete type

It seems that my sparc64 compiler (gcc-3.4.5) doesn't like the forward
declaration of enums.

Fix this by moving the "enum migrate_mode" definition into its own header
file.

Acked-by: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andy Isaacson <[email protected]>
Cc: Nai Xia <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ALSA: HDA: Use model=auto for Thinkpad T510

The user reports that model=auto works fine for him. Using
model=auto bring in new features such as jack detection notification
to userspace.

Alsa info is available at http://paste.ubuntu.com/805351/

Signed-off-by: David Henningsson <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda - Fix buffer-alignment regression with Nvidia HDMI

The commit 2ae66c26550cd94b0e2606a9275eb0ab7070ad0e
ALSA: hda: option to enable arbitrary buffer/period sizes
introduced a regression on machines with Intel controller and Nvidia
HDMI. The reason is that the driver modifies the global variable
align_buffer_size when an Intel controller is found, and the Nvidia
HDMI controller is probed after Intel although Nvidia chips require
the aligned buffers.

This patch fixes the problem by moving the flag into the local struct
so that it's not affected by other controllers.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42567

Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ethtool: allow ETHTOOL_GSSET_INFO for users

Allow ETHTOOL_GSSET_INFO ethtool ioctl() for unprivileged users.
ETHTOOL_GSTRINGS is already allowed, but is unusable without this one.

Signed-off-by: Michał Mirosław <[email protected]>
Acked-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bluetooth: hci: Fix type of "enable_hs" to bool.

Fixes:

net/bluetooth/hci_core.c: In function ‘__check_enable_hs’:
net/bluetooth/hci_core.c:2587:1: warning: return from incompatible pointer type [enabled by default]

Signed-off-by: David S. Miller <[email protected]>

net: introduce res_counter_charge_nofail() for socket allocations

There is a case in __sk_mem_schedule(), where an allocation
is beyond the maximum, but yet we are allowed to proceed.
It happens under the following condition:

sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case,
meaning that at some point later it'll try to do it. Since
this is never communicated to the underlying res_counter
code, there is an inbalance in res_counter uncharge operation.

I see two ways of fixing this:

1) storing the information about those allocations somewhere
   in memcg, and then deducting from that first, before
   we start draining the res_counter,
2) providing a slightly different allocation function for
   the res_counter, that matches the original behavior of
   the network code more closely.

I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.

Signed-off-by: Glauber Costa <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
CC: Tejun Heo <[email protected]>
CC: Li Zefan <[email protected]>
CC: Laurent Chavey <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cgroup: make sure memcg margin is 0 when over limit

For the memcg sock code, we'll need to register allocations
that are temporarily over limit. Let's make sure that margin
is 0 in this case.

I am keeping this as a separate patch, so that if any weirdness
interaction appears in the future, we can now exactly what caused
it.

Suggested by Johannes Weiner

Signed-off-by: Glauber Costa <[email protected]>
CC: KAMEZAWA Hiroyuki <[email protected]>
CC: Johannes Weiner <[email protected]>
CC: Michal Hocko <[email protected]>
CC: Tejun Heo <[email protected]>
CC: Li Zefan <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fix socket memcg build with !CONFIG_NET

There is still a build bug with the sock memcg code, that triggers
with !CONFIG_NET, that survived my series of randconfig builds.

Signed-off-by: Glauber Costa <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
CC: Hiroyouki Kamezawa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

kernel-doc: fix new warning in net/sock.h

Fix new kernel-doc warning:

Warning(include/net/sock.h:372): No description found for parameter 'sk_cgrp_prioidx'

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

kernel-doc: fix new warning in net/phy/mdio_bus.c

Fix new kernel-doc warning:

Warning(drivers/net/phy/mdio_bus.c:49): No description found for parameter 'size'

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: md5: using remote adress for md5 lookup in rst packet

md5 key is added in socket through remote address.
remote address should be used in finding md5 key when
sending out reset packet.

Signed-off-by: shawnlu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

be2net: create RSS rings even in multi-channel configs

Currently RSS rings are not created in a multi-channel config.
RSS rings can be created on one (out of four) interfaces per port in a
multi-channel config. Doing this insulates the driver from a FW bug wherin
multi-channel config is wrongly reported even when not enabled. This also
helps performance in a multi-channel config, as one interface per port gets
RSS rings.

Signed-off-by: Sathya Perla <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

pktgen: Fix unsigned function that is returning negative vals

Every call to num_args() immediately checks the return value for
less than zero, as it will return -EFAULT for a failed get_user()
call. So it makes no sense for the function to be declared as an
unsigned long.

Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: detect loss above high_seq in recovery

Correctly implement a loss detection heuristic: New sequences (above
high_seq) sent during the fast recovery are deemed lost when higher
sequences are SACKed.

Current code does not catch these losses, because tcp_mark_head_lost()
does not check packets beyond high_seq. The fix is straight-forward by
checking packets until the highest sacked packet. In addition, all the
FLAG_DATA_LOST logic are in-effective and redundant and can be removed.

Update the loss heuristic comments. The algorithm above is documented
as heuristic B, but it is redundant too because heuristic A already
covers B.

Note that this change only marks some forward-retransmitted packets LOST.
It does NOT forbid TCP performing further CWR on new losses. A potential
follow-up patch under preparation is to perform another CWR on "new"
losses such as
1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
2) retransmission is lost.

Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

skge: check for PCI dma mapping errors

Driver should check for mapping errors.
Machines with limited DMA maps may return an error when a PCI map is
requested (not an issue on standard x86).

Also use upper/lower 32 bits macros for clarity.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

skge: don't assert carrier until link is up

Skge device would assert carrier (link up) as soon as network device open
was called, rather than waiting until PHY has detected link.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netem: Fix off-by-one bug in reordering

With netem reordering, a gap of N is supposed to reorder every Nth packet with
given reorder probability. However, the code currently skips N packets and
reorders every (N+1)th packet.

Signed-off-by: Vijay Subramanian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4_core: map async events to arbitrary slave eqs

Slave async events were mapped to single eq. This patch fixes this issue, so
the slaves can map the async events to any eq.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Jack Morgenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4_core: Fix mtt profile issue

Num mtts from profile is really the number of mtt segments.
Thus, in make profile, to get the proper number of MTT entries,
must multiply num_mtts by mtts per segment.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Jack Morgenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4_core: removed function index from vf.

The Virtual Functions should not be aware their function number.

Signed-off-by: Marcel Apfelbaum <[email protected]>
Reviewed-by: Jack Morgenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4_en: eth statistics modification

In native mode display all available staticstics.
In SRIOV mode on VF display only SW counters statistics,
in SRIOV mode on hypervisor display SW counters and errors (got from FW)
statistics.

Signed-off-by: Eugenia Emantayev <[email protected]>
Reviewed-by: Yevgeny Petrilin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4: VF is not allowed to perform dump stats

In multifunction mode - DUMP_STATS command is not executed
for VFs.

Signed-off-by: Eugenia Emantayev <[email protected]>
Reviewed-by: Yevgeny Petrilin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4_en: clear all eth statistics when port goes up

Bug fix: Not all stats fields were cleared.

Signed-off-by: Eugenia Emantayev <[email protected]>
Reviewed-by: Yevgeny Petriln <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix undo after RTO for CUBIC

This patch fixes CUBIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).

When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:

(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.

(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.

This patch makes the following changes:

(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"

(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events

(3) use ca->last_max_cwnd to check if we're in slow start

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Acked-by: Sangtae Ha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: fix undo after RTO for BIC

This patch fixes BIC so that cwnd reductions made during RTOs can be
undone (just as they already can be undone when using the default/Reno
behavior).

When undoing cwnd reductions, BIC-derived congestion control modules
were restoring the cwnd from last_max_cwnd. There were two problems
with using last_max_cwnd to restore a cwnd during undo:

(a) last_max_cwnd was set to 0 on state transitions into TCP_CA_Loss
(by calling the module's reset() functions), so cwnd reductions from
RTOs could not be undone.

(b) when fast_covergence is enabled (which it is by default)
last_max_cwnd does not actually hold the value of snd_cwnd before the
loss; instead, it holds a scaled-down version of snd_cwnd.

This patch makes the following changes:

(1) upon undo, revert snd_cwnd to ca->loss_cwnd, which is already, as
the existing comment notes, the "congestion window at last loss"

(2) stop forgetting ca->loss_cwnd on TCP_CA_Loss events

(3) use ca->last_max_cwnd to check if we're in slow start

Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

enic: fix compile when CONFIG_PCI_IOV is not enabled

reverting back change that access enic->num_vfs outside
CONFIG_PCI_IOV

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>