Git Repo - linux.git/log

sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers

In order to stop the RX path accessing the RX ring while it's being
stopped or resized, we clear the interrupt mask (EESIPR) and then call
free_irq() or synchronise_irq().  This is insufficient because the
interrupt handler or NAPI poller may set EESIPR again after we clear
it.  Also, in sh_eth_set_ringparam() we currently don't disable NAPI
polling at all.

I could easily trigger a crash by running the loop:

   while ethtool -G eth0 rx 128 && ethtool -G eth0 rx 64; do echo -n .; done

and 'ping -f' toward the sh_eth port from another machine.

To fix this:
- Add a software flag (irq_enabled) to signal whether interrupts
  should be enabled
- In the interrupt handler, if the flag is clear then clear EESIPR
  and return
- In the NAPI poller, if the flag is clear then don't set EESIPR
- Set the flag before enabling interrupts in sh_eth_dev_init() and
  sh_eth_set_ringparam()
- Clear the flag and serialise with the interrupt and NAPI
  handlers before clearing EESIPR in sh_eth_close() and
  sh_eth_set_ringparam()

After this, I could run the loop for 100,000 iterations successfully.

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: Fix crash or memory leak when resizing rings on device that is down

If the device is down then no packet buffers should be allocated.
We also must not touch its registers as it may be powered off.

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: Detach net device when stopping queue to resize DMA rings

We must only ever stop TX queues when they are full or the net device
is not 'ready' so far as the net core, and specifically the watchdog,
is concerned. Otherwise, the watchdog may fire *immediately* if no
packets have been added to the queue in the last 5 seconds.

What's more, sh_eth_tx_timeout() will likely crash if called while
we're resizing the TX ring.

I could easily trigger this by running the loop:

while ethtool -G eth0 rx 128 && ethtool -G eth0 rx 64; do echo -n .; done

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: Fix padding of short frames on TX

If an skb to be transmitted is shorter than the minimum Ethernet frame
length, we currently set the DMA descriptor length to the minimum but
do not add zero-padding. This could result in leaking sensitive
data. We also pass different lengths to dma_map_single() and
dma_unmap_single().

Use skb_padto() to pad properly, before calling dma_map_single().

Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
pull-request: wireless-drivers-next 2015-01-22

now a bigger pull request for net-next. Rafal found a UTF-8 bug in
patchwork[1] and because of that two commits (d0c102f70aec and
d0f66df5392a) have his name corrupted:

Acked-by: Rafa? Mi?ecki <[email protected]>
Somehow I failed to spot that when I commited the patches. As rebasing
public git trees is bad, I thought we can live with these and decided
not to rebase. But I'll pay close attention to this in the future to
make sure that it won't happen again. Also we requested an update to
patchwork.kernel.org, the latest patchwork doesn't seem to have this
bug.

Also please note this pull request also adds one DT binding doc, but
this was reviewed in the device tree list:

.../bindings/net/wireless/qcom,ath10k.txt | 30 +

Please let me know if you have any issues.

[1] https://lists.ozlabs.org/pipermail/patchwork/2015-January/001261.html
====================

Signed-off-by: David S. Miller <[email protected]>

net: act_bpf: fix size mismatch on filter preparation

Similarly as in cls_bpf, also this code needs to reject mismatches.

Reference: http://article.gmane.org/gmane.linux.network/347406
Fixes: d23b8ad8ab23 ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cls_basic: return from walking on match in basic_get

As soon as we've found a matching handle in basic_get(), we can
return it. There's no need to continue walking until the end of
a filter chain, since they are unique anyway.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Cc: Thomas Graf <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers: net: cpsw: discard dual emac default vlan configuration

In Dual EMAC, the default VLANs are used to segregate Rx packets between
the ports, so adding the same default VLAN to the switch will affect the
normal packet transfers. So returning error on addition of dual EMAC
default VLANs.

Even if EMAC 0 default port VLAN is added to EMAC 1, it will lead to
break dual EMAC port separations.

Fixes: d9ba8f9e6298 (driver: net: ethernet: cpsw: dual emac interface implementation)
Cc: <[email protected]> # v3.9+
Reported-by: Felipe Balbi <[email protected]>
Signed-off-by: Mugunthan V N <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'cls_bpf'

Daniel Borkmann says:

====================
Two cls_bpf fixes

Found them while doing a review on act_bpf and going over the
cls_bpf code again. Will also address the first issue in act_bpf
as it needs to be fixed there, too.
====================

Signed-off-by: David S. Miller <[email protected]>

net: cls_bpf: fix auto generation of per list handles

When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually
altering of specific filters is being addressed over major id, but
in case of collisions we result in a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for head->hgen handle
in cls_bpf_get() and in case we found a free handle, we're supposed
to use exactly head->hgen. In case of insufficient numbers of handles,
we bail out later as handle id 0 is not allowed.

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cls_bpf: fix size mismatch on filter preparation

In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block on that size, and copy over the
BPF instruction blob from user space, then pass everything through the
classic BPF checker prior to installation of the classifier.

We should reject mismatches here, there are 2 scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected from the verifier or a valid BPF program will be run;
in the other case, we'll end up copying more than we're supposed to,
and most likely the trailing garbage will be rejected by the verifier
as well (i.e. we need to fit instruction pattern, ret {A,K} needs to be
last instruction, load/stores must be correct, etc); in case not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len() as Dave noted to avoid this from the beginning.
Anyway, lets fix it by rejecting such load attempts.

Fixes: 7d1d65cb84e1 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: Add an optional device tree property "snps,burst_len"

This property define the AXI bug lenth.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set

Clear the TX COE bit when force_thresh_dma_mode is set even hardware
dma capability says support.

Tested on BF609.

Signed-off-by: Sonic Zhang <[email protected]>
Acked-by: Giuseppe Cavallaro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: if force_thresh_dma_mode is set, pass tc to both txmode and rxmode in tx_hard_error_bump_tc interrupt

Dont' pass SF_DMA_MODE to rxmode in this case.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ovs_flowids'

Joe Stringer says:

====================
openvswitch: Introduce 128-bit unique flow identifiers.

This series extends the openvswitch datapath interface for flow commands to use
128-bit unique identifiers as an alternative to the netlink-formatted flow key.
This significantly reduces the cost of assembling messages between the kernel
and userspace, in particular improving Open vSwitch revalidation performance by
40% or more.

v14:
- Perform lookup using unmasked key in legacy case.
- Fix minor checkpatch.pl style violations.

v13:
- Embed sw_flow_id in sw_flow to save memory allocation in UFID case.
- Malloc unmasked key for id in non-UFID case.
- Fix bug where non-UFID case could double-serialize keys.

v12:
- Userspace patches fully merged into Open vSwitch master
- New minor refactor patches (2,3,4)
- Merge unmasked_key, ufid representation of flow identifier in sw_flow
- Improve memory allocation sizes when serializing ufid
- Handle corner case where a flow_new is requested with a flow that has an
identical ufid as an existing flow, but a different flow key
- Limit UFID to between 1-16 octets inclusive.
- Add various helper functions to improve readibility

v11:
- Pushed most of the prerequisite patches for this series to OVS master.
- Split out openvswitch.h interface changes from datapath implementation
- Datapath implementation to be reviewed on net-next, separately

v10:
- New patch allowing datapath to serialize masked keys
- Simplify datapath interface by accepting UFID or flow_key, but not both
- Flows set up with UFID must be queried/deleted using UFID
- Reduce sw_flow memory usage for UFID
- Don't periodically rehash UFID table in linux datapath
- Remove kernel_only UFID in linux datapath

v9:
- No kernel changes

v8:
- Rename UID -> UFID
- Fix null dereference in datapath when paired with older userspace
- All patches are reviewed/acked except datapath changes.

v7:
- Remove OVS_DP_F_INDEX_BY_UID
- Rework datapath UID serialization for variable length UIDs

v6:
- Reduce netlink conversions for all datapaths
- Various bugfixes

v5:
- Various bugfixes
- Improve logging

v4:
- Datapath memory leak fixes
- Enable UID-based terse dumping and deleting by default
- Various fixes

RFCv3:
- Add datapath implementation
====================

Signed-off-by: David S. Miller <[email protected]>

openvswitch: Add support for unique flow IDs.

Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.

This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.

All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

genetlink: Add genlmsg_parse() helper function.

The first user will be the next patch.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Use sw_flow_key_range for key ranges.

These minor tidyups make a future patch a little tidier.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Refactor ovs_flow_tbl_insert().

Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
This tidies up a future patch.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Refactor ovs_nla_fill_match().

Refactor the ovs_nla_fill_match() function into separate netlink
serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
parameter - all callers need to nest the flow, and callers have better
knowledge about whether it is serializing a mask or not.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-next-for-3.20-20150121' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-21-01

this is a pull request of 4 patches for net-next/master.

Andri Yngvason contributes one patch to further consolidate the CAN
state change handling. The next patch is by kbuild test robot/Fengguang
Wu which fixes a coccinelle warning in the CAN infrastructure. The two
last patches are by me, they remove a unused variable from the flexcan
and at91_can driver.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'sh_eth'

Sergei Shtylyov says:

====================
sh_eth: massage PM code

Here's a set of 2 patches against DaveM's 'net-next.git' repo. We're adding
the support for suspend/hibernation as well as somewhat changing the existing
code. There are still MDIO-related issue with suspend (kernel exception), we've
been working on it and shall address it with a separate patch...
====================

Signed-off-by: David S. Miller <[email protected]>

sh_eth: add more PM methods

Add sh_eth_{suspend|resume}() implementing {suspend|resume|freeze|thaw|poweroff|
restore}() PM methods to make it possible to restore from hibernation not only
in Linux but also in e.g. U-Boot and to have more determined state on resume/
restore.

Signed-off-by: Mikhail Ulyanov <[email protected]>
[Sergei: moved sh_eth_{suspend|resume}() before sh_eth_runtime_nop(), enclosed
them with #ifdef CONFIG_PM_SLEEP, reordered the local variables, got rid of
*goto* and label, reordered macro invocations, renamed, modified the changelog.]
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: use SET_RUNTIME_PM_OPS()

Use SET_RUNTIME_PM_OPS() macro to initialize the runtime PM method pointers in
the 'struct dev_pm_ops'.

Signed-off-by: Mikhail Ulyanov <[email protected]>
[Sergei: renamed, added the changelog.]
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-fixes-for-3.19-20150121' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2015-01-21

this is a pull request for v3.19, net/master, which consists of a single patch.

Viktor Babrian fixes the issue in the c_can dirver, that the CAN interface
might continue to send frames after the interface has been shut down.
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: add BQL support

Add support for Byte Queue Limits to the STMicro MAC driver.

Tested on a Amlogic S802 quad Cortex-A9 board, where the use of BQL
decreases the latency of a high priority ping from ~12ms to ~1ms when
the 100Mbit link is saturated by 20 TCP streams.

Signed-off-by: Beniamino Galvani <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Giuseppe Cavallaro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"The lifetime rules of cgroup hierarchies always have been somewhat
  counter-intuitive and cgroup core tried to enforce that hierarchies
  w/o userland-visible usages must die in finite amount of time so that
  the controllers can be reused for other hierarchies; unfortunately,
  this can't be implemented reasonably for the memory controller - the
  kmemcg part doesn't have any way to forcefully drain the existing
  usages, leading to an interruptible hang if a following mount attempts
  to use the controller in any way.

  So, it seems like we're stuck with "hierarchies live on till they die
  whenever that may be" at least for now.  This pretty much confines
  attaching controllers to hierarchies to before the hierarchies are
  actively used by making dynamic configurations post active usages
  unreliable.  This has never been reliable and should be fine in
  practice given how cgroups are used.

  After the patch, hierarchies aren't killed if it isn't already
  drained.  A following mount attempt of the same mount options will
  reuse the existing hierarchy.  Mount attempts with differing options
  will fail w/ -EBUSY"

* 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: prevent mount hang due to memory controller lifetime

Merge tag 'regulator-v3.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"One correctness fix here for the s2mps11 driver which would have
  resulted in some of the regulators being completely broken together
  with a fix for locking in regualtor_put() (which is fortunately rarely
  called at all in practical systems)"

* tag 'regulator-v3.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: s2mps11: Fix wrong calculation of register offset
  regulator: core: fix race condition in regulator_put()

Merge tag 'spi-v3.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few driver specific fixes here, some fixes for issues introduced and
  discovered during recent work on the DesignWare driver (which has been
  getting a lot of attention recently) and a couple of other drivers.
  All serious things for people who run into them"

* tag 'spi-v3.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: dw: amend warning message
  spi: sh-msiof: fix MDR1_FLD_MASK value
  spi: dw-mid: fix FIFO size
  spi: dw: Fix detecting FIFO depth
  spi/pxa2xx: Clear cur_chip pointer before starting next message

cxgb4: Fixes cxgb4_inet6addr_notifier unregister call

commit b5a02f503caa0837 ("cxgb4 : Update ipv6 address handling api") introduced
a regression where unregister cxgb4_inet6addr_notifier wasn't getting called
during module_exit.

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

NFC: st21nfcb: Fix copy/paste error in comment

include/linux/platform_data/st21nfcb.h is based on
include/linux/platform_data/st21nfca.h.

The endif comment is inacurrate for st21nfcb.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfcb: Remove useless include

include/linux/platform_data/st21nfcb.h is phy generic.
There is no need to include linux/i2c.h

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfcb: Fix "NULL pointer dereference" possible error

When the platform with CONFIG_ST21NFCB_I2C=y without any st21nfcb component
physically connected a:
"Unable to handle kernel NULL pointer dereference at virtual address" may
show up at driver initialization phase.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfca: Fix some skb memory leaks

Fix some memory leaks after some nfc_hci_get_param calls.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfca: Remove checkpatch.pl warning Possible unnecessary 'out of memory' message

Remove unnecessary memory allocation message already shown by devm_kzalloc.
This remove a warning when running scripts/checkpatch.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfca: Remove skb_pipe_list and skb_pipe_info useless allocation

skb_pipe_list and skb_pipe_info are allocated in nfc_hci_send_cmd.
alloc_skb on those buffer are then useless.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: nfc_disable_se Remove useless blank line at beginning of function

Remove one useless blank line at beginning of nfc_disable_se function.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: nfc_enable_se Remove useless blank line at beginning of function

Remove one useless blank line at beginning of nfc_enable_se function.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfcb: Avoid use of skb after free

Do not insert in send queue the skb that contains unknown Packet Control
Byte

Acked-by: Christophe Ricard <[email protected]>
Signed-off-by: Anda-Maria Nicolae <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element

Array of platform_device_id elements should be terminated with empty
element.

Fixes: 5bccae6ec458 ("rtc: s5m-rtc: add real-time clock driver for s5m8767")
Signed-off-by: Andrey Ryabinin <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

printk: add dummy routine for when CONFIG_PRINTK=n

There are missing dummy routines for log_buf_addr_get() and
log_buf_len_get() for when CONFIG_PRINTK is not set causing build
failures.

This patch adds these dummy routines at the appropriate location.

Signed-off-by: Pranith Kumar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/vmscan: fix highidx argument type

for_each_zone_zonelist_nodemask wants an enum zone_type argument, but is
passed gfp_t:

  mm/vmscan.c:2658:9:    expected int enum zone_type [signed] highest_zoneidx
  mm/vmscan.c:2658:9:    got restricted gfp_t [usertype] gfp_mask
  mm/vmscan.c:2658:9: warning: incorrect type in argument 2 (different base types)
  mm/vmscan.c:2658:9:    expected int enum zone_type [signed] highest_zoneidx
  mm/vmscan.c:2658:9:    got restricted gfp_t [usertype] gfp_mask

convert argument to the correct type.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg: remove extra newlines from memcg oom kill log

Commit e61734c55c24 ("cgroup: remove cgroup->name") added two extra
newlines to memcg oom kill log messages.  This makes dmesg hard to read
and parse.  The issue affects 3.15+.

Example:

  Task in /t                          <<< extra #1
   killed as a result of limit of /t
                                      <<< extra #2
  memory: usage 102400kB, limit 102400kB, failcnt 274712

Remove the extra newlines from memcg oom kill messages, so the messages
look like:

  Task in /t killed as a result of limit of /t
  memory: usage 102400kB, limit 102400kB, failcnt 240649

Fixes: e61734c55c24 ("cgroup: remove cgroup->name")
Signed-off-by: Greg Thelen <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86, build: replace Perl script with Shell script

Commit e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd")
added Perl to the required build environment. This reimplements in
shell the Perl script used to find the size of the kernel with bss and
brk added.

Signed-off-by: Kees Cook <[email protected]>
Reported-by: Rob Landley <[email protected]>
Acked-by: Rob Landley <[email protected]>
Cc: Anca Emanuel <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Junjie Mao <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: page_alloc: embed OOM killing naturally into allocation slowpath

The OOM killing invocation does a lot of duplicative checks against the
task's allocation context.  Rework it to take advantage of the existing
checks in the allocator slowpath.

The OOM killer is invoked when the allocator is unable to reclaim any
pages but the allocation has to keep looping.  Instead of having a check
for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM
invocation to the true branch of should_alloc_retry().  The __GFP_FS
check from oom_gfp_allowed() can then be moved into the OOM avoidance
branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test.

__alloc_pages_may_oom() can then signal to the caller whether the OOM
killer was invoked, instead of requiring it to duplicate the order and
high_zoneidx checks to guess this when deciding whether to continue.

Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

rhashtable: rhashtable_remove() must unlink in both tbl and future_tbl

As removals can occur during resizes, entries may be referred to from
both tbl and future_tbl when the removal is requested. Therefore
rhashtable_remove() must unlink the entry in both tables if this is
the case. The existing code did search both tables but stopped when it
hit the first match.

Failing to unlink in both tables resulted in use after free.

Fixes: 97defe1ecf86 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Ying Xue <[email protected]>
Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge remote-tracking branches 'spi/fix/dw', 'spi/fix/msiof' and 'spi/fix/pxa2xx' into spi-linus

ipv6: tcp: fix race in IPV6_2292PKTOPTIONS

IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
to charge the skb to socket.

It means that destructor must be called while socket is locked.

Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
to protect ourselves : kfree_skb() might race with other users
manipulating sk->sk_forward_alloc

Fix this race by holding socket lock for the duration of
ip6_datagram_recv_ctl()

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: fix rht_for_each_entry_safe() endless loop

"next" is not updated, causing an endless loop for buckets with more than
one element.

Fixes: 88d6ed15acff ("rhashtable: Convert bucket iterators to take table and index")
Signed-off-by: Patrick McHardy <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 's390'

Ursula Braun says:

====================
s390/qeth patches for net

here are two s390/qeth patches built for net.
One patch is quite large, but we would like to fix the locking warning
seen in recent kernels as soon as possible. But if you want me to submit
these patches for net-next, I will do.
Or Gerlitz says:
====================

Signed-off-by: David S. Miller <[email protected]>

390/qeth: Fix locking warning during qeth device setup

Do not wait for channel command buffers in IPA commands.
The potential wait could be done while holding a spin lock and causes
in recent kernels such a bug if kernel lock debugging is enabled:

kernel: BUG: sleeping function called from invalid context at drivers/s390/net/qeth_core_main.c:
794
kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 2031, name: NetworkManager
kernel: 2 locks held by NetworkManager/2031:
kernel:  #0:  (rtnl_mutex){+.+.+.}, at: [<00000000006e0d7a>] rtnetlink_rcv+0x32/0x50
kernel:  #1:  (_xmit_ETHER){+.....}, at: [<00000000006cfe90>] dev_set_rx_mode+0x30/0x50
kernel: CPU: 0 PID: 2031 Comm: NetworkManager Not tainted 3.18.0-rc5-next-20141124 #1
kernel:        00000000275fb1f0 00000000275fb280 0000000000000002 0000000000000000
               00000000275fb320 00000000275fb298 00000000275fb298 00000000007e326a
               0000000000000000 000000000099ce2c 00000000009b4988 000000000000000b
               00000000275fb2e0 00000000275fb280 0000000000000000 0000000000000000
               0000000000000000 00000000001129c8 00000000275fb280 00000000275fb2e0
kernel: Call Trace:
kernel: ([<00000000001128b0>] show_trace+0xf8/0x158)
kernel:  [<000000000011297a>] show_stack+0x6a/0xe8
kernel:  [<00000000007e995a>] dump_stack+0x82/0xb0
kernel:  [<000000000017d668>] ___might_sleep+0x170/0x228
kernel:  [<000003ff80026f0e>] qeth_wait_for_buffer+0x36/0xd0 [qeth]
kernel:  [<000003ff80026fe2>] qeth_get_ipacmd_buffer+0x3a/0xc0 [qeth]
kernel:  [<000003ff80105078>] qeth_l3_send_setdelmc+0x58/0xf8 [qeth_l3]
kernel:  [<000003ff8010b1fe>] qeth_l3_set_ip_addr_list+0x2c6/0x848 [qeth_l3]
kernel:  [<000003ff8010bbb4>] qeth_l3_set_multicast_list+0x434/0xc48 [qeth_l3]
kernel:  [<00000000006cfe9a>] dev_set_rx_mode+0x3a/0x50
kernel:  [<00000000006cff90>] __dev_open+0xe0/0x140
kernel:  [<00000000006d02a0>] __dev_change_flags+0xa0/0x178
kernel:  [<00000000006d03a8>] dev_change_flags+0x30/0x70
kernel:  [<00000000006e14ee>] do_setlink+0x346/0x9a0
...

The device driver has plenty of command buffers available
per channel for channel command communication.
In the extremely rare case when there is no command buffer
available, return a NULL pointer and issue a warning
in the kernel log. The caller handles the case when
a NULL pointer is encountered and returns an error.

In the case the wait for command buffer is possible
(because no lock is held as in the OSN case), still wait
until a channel command buffer is available.

Signed-off-by: Thomas Richter <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
Reviewed-by: Eugene Crosser <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qeth: clean up error handling

In the functions that are registering and unregistering MAC
addresses in the qeth-handled hardware, remove callback functions
that are unnesessary, as only the return code is analyzed.
Translate hardware response codes to semi-standard 'errno'-like
codes for readability.

Add kernel-doc description to the internal API function
qeth_send_control_data().

Signed-off-by: Eugene Crosser <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
Reviewed-by: Thomas-Mich Richter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/fsl: Replace spin_event_timeout() with arch independent in xgmac_mdio

spin_event_timeout() is PPC dependent, use an arch independent
equivalent instead.

Signed-off-by: Shaohui Xie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/fsl: drop in_be32() & out_be32() in xgmac_mdio

Use ioread32be() & iowrite32be() instead.

Signed-off-by: Shaohui Xie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: handle more gso types

In commit 5a7baa78851b ("bonding: Advertize vxlan offload features when
supported"), Or Gerlitz added support conditional vxlan offload.

In this patch I also add support for all kind of tunnels,
but we allow a bonding device to not require segmentation,
as it is always better to make this segmentation at the very last stage,
if a particular slave device requires it.

Tested:

Setup a GRE tunnel,
on a physical NIC not having tx-gre-segmentation.
Results on bnx2x are even better, as we no longer have to segment
in software.

ethtool -K bond0 tx-gre-segmentation off

super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
7538.32

ethtool -K bond0 tx-gre-segmentation on

super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
10200.5

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bridge: simplify br_getlink() a bit

Static checkers complain that we should maybe set "ret" before we do the
"goto out;".  They interpret the NULL return from br_port_get_rtnl() as
a failure and forgetting to set the error code is a common bug in this
situation.

The code is confusing but it's actually correct.  We are returning zero
deliberately.  Let's re-write it a bit to be more clear.

Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Fix __ip6_route_redirect

In my last commit (a3c00e4: ipv6: Remove BACKTRACK macro), the changes in
__ip6_route_redirect is incorrect.  The following case is missed:
1. The for loop tries to find a valid gateway rt. If it fails to find
   one, rt will be NULL.
2. When rt is NULL, it is set to the ip6_null_entry.
3. The newly added 'else if', from a3c00e4, will stop the backtrack from
   happening.

Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Linux 3.19-rc6

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"Hopefully the last round of fixes for 3.19

   - regression fix for the LDT changes
   - regression fix for XEN interrupt handling caused by the APIC
     changes
   - regression fixes for the PAT changes
   - last minute fixes for new the MPX support
   - regression fix for 32bit UP
   - fix for a long standing relocation issue on 64bit tagged for stable
   - functional fix for the Hyper-V clocksource tagged for stable
   - downgrade of a pr_err which tends to confuse users

  Looks a bit on the large side, but almost half of it are valuable
  comments"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Change Fast TSC calibration failed from error to info
  x86/apic: Re-enable PCI_MSI support for non-SMP X86_32
  x86, mm: Change cachemode exports to non-gpl
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86, mpx: Strictly enforce empty prctl() args
  x86, mpx: Fix potential performance issue on unmaps
  x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels
  x86, hyperv: Mark the Hyper-V clocksource as being continuous
  x86: Don't rely on VMWare emulating PAT MSR correctly
  x86, irq: Properly tag virtualization entry in /proc/interrupts
  x86, boot: Skip relocs when load address unchanged
  x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi
  ACPI: pci: Do not clear pci_dev->irq in acpi_pci_irq_disable()
  x86/xen: Treat SCI interrupt as normal GSI interrupt

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"From the irqchip departement you get:

   - regression fix for omap-intc

   - regression fix for atmel-aic-common

   - functional correctness fix for hip04

   - type mismatch fix for gic-v3-its

   - proper error pointer check for mtd-sysirq

  Mostly one and two liners except for the omap regression fix which is
  slightly larger than desired"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: atmel-aic-common: Prevent clobbering of priority when changing IRQ type
  irqchip: omap-intc: Fix legacy DMA regression
  irqchip: gic-v3-its: Fix use of max with decimal constant
  irqchip: hip04: Initialize hip04_cpu_map to 0xffff
  irqchip: mtk-sysirq: Use IS_ERR() instead of NULL pointer check

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"A set of small fixes:

   - regression fix for exynos_mct clocksource

   - trivial build fix for kona clocksource

   - functional one liner fix for the sh_tmu clocksource

   - two validation fixes to prevent (root only) data corruption in the
     kernel via settimeofday and adjtimex.  Tagged for stable"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  clocksource: sh_tmu: Set cpu_possible_mask to fix SMP broadcast
  clocksource: kona: fix __iomem annotation
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write

Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A week's worth of fixes for various ARM platforms.  Diff wise, the
  largest fix is for OMAP to deal with how GIC now registers interrupts
  (irq_domain_add_legacy() -> irq_domain_add_linear() changes).

  Besides this, a few more renesas platforms needed the GIC instatiation
  done for legacy boards.  There's also a fix that disables coherency of
  mvebu due to issues, and a few other smaller fixes"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: add baud rate to Juno stdout-path
  ARM: dts: imx25: Fix PWM "per" clocks
  bus: mvebu-mbus: fix support of MBus window 13
  Merge tag 'mvebu-fixes-3.19-3' of git://git.infradead.org/linux-mvebu into fixes
  ARM: mvebu: completely disable hardware I/O coherency
  ARM: OMAP: Work around hardcoded interrupts
  ARM: shmobile: r8a7779: Instantiate GIC from C board code in legacy builds
  ARM: shmobile: r8a7778: Instantiate GIC from C board code in legacy builds
  arm: boot: dts: dra7: enable dwc3 suspend PHY quirk

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"A couple of fixes - deadlock in CIFS and build breakage in cris serial
  driver (resurfaced f_dentry in there)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Convert file->f_dentry->d_inode to file_inode()
  fix deadlock in cifs_ioctl_clone()

Merge tag 'dm-3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
"Two stable fixes for dm-cache and one 3.19 DM core fix:

   - fix potential for dm-cache metadata corruption via stale metadata
     buffers being used when switching an inactive cache table to
     active; this could occur due to each table having it's own bufio
     client rather than sharing the client between tables.

   - fix dm-cache target to properly account for discard IO while
     suspending otherwise IO quiescing could complete prematurely.

   - fix DM core's handling of multiple internal suspends by maintaining
     an 'internal_suspend_count' and only resuming the device when this
     count drops to zero"

* tag 'dm-3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix handling of multiple internal suspends
  dm cache: fix problematic dual use of a single migration count variable
  dm cache: share cache-metadata object across inactive and active DM tables

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull two block layer fixes from Jens Axboe:
"Two small patches that should make it into 3.19:

   - a fixup from me for NVMe, making the cq_vector a signed variable.
     Otherwise our -1 comparison fails, and commit 2b25d981790b doesn't
     do what it was supposed to.

   - a fixup for the hotplug handling for blk-mq from Ming Lei, using
     the proper kobject referencing to ensure we release resources at
     the right time"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: fix hctx/ctx kobject use-after-free
  NVMe: cq_vector should be signed

Merge branch 'phy_dsa'

Florian Fainelli says:

====================
net: phy and dsa random fixes/cleanups

These two patches were already present as part of my attempt to make
DSA modules work properly, these are the only two "valid" patches at
this point which should not need any further rework.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: factor interrupt disabling in a function

Factor the interrupt disabling in a function: bcm_sf2_intr_disable()
since we are doing the same thing in the setup and suspend paths.

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: fixed: allow setting no update_link callback

fixed_phy_set_link_update() contains an early check against a NULL
callback pointer, which basically prevents us from removing any
previous callback we may have set. The users of the fp->link_update
callback deal with a NULL callback just fine, so we really want to allow
"removing" a link_update callback to avoid dangling callback pointers
during e.g: module removal.

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: set slave MII bus PHY mask

When registering a mdio bus, Linux assumes than every port has a PHY and tries
to scan it. If a switch port has no PHY registered, DSA will fail to register
the slave MII bus. To fix this, set the slave MII bus PHY mask to the switch
PHYs mask.

As an example, if we use a Marvell MV88E6352 (which is a 7-port switch with no
registered PHYs for port 5 and port 6), with the following declared names:

static struct dsa_chip_data switch_cdata = {
[...]
.port_names[0] = "sw0",
.port_names[1] = "sw1",
.port_names[2] = "sw2",
.port_names[3] = "sw3",
.port_names[4] = "sw4",
.port_names[5] = "cpu",
};

DSA will fail to create the switch instance. With the PHY mask set for the
slave MII bus, only the PHY for ports 0-4 will be scanned and the instance will
be successfully created.

Signed-off-by: Vivien Didelot <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

NFC: st21nfca: Remove unreachable code

kfree_skb(skb) in st21nfca_hci_event_received is never reach.

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: st21nfcb: Fix "WARNING: invalid free of devm_ allocated data"

ndlc pointer got allocated with devm_kzalloc in ndlc_probe function.

This gives this error message:
drivers/nfc/st21nfcb/ndlc.c:296:1-6: WARNING: invalid free of devm_ allocated data.

Reported-by: Dan Carpenter <[email protected]>
Reported-by: Julia Lawall <[email protected]>
Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: dts: st21nfcb: Fix compatible string spelling to follow other drivers

Other drivers are following the following compatible string format for dts:
s/_/-/

Because some devices may still use the previous string, the new corrected
string is added to the of_device_id table.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

NFC: dts: st21nfca: Fix compatible string spelling to follow other drivers

Other drivers are following the following compatible string format for dts:
s/_/-/

Because some devices may still use the previous string, the new corrected
string is added to the of_device_id table.

Signed-off-by: Christophe Ricard <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>

net: ipv6: Add sysctl entry to disable MTU updates from RA

The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.

Signed-off-by: Harout Hedeshian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'fib_trie_next'

Alexander Duyck says:

====================
Fixes and improvements for recent fib_trie updates

While performing testing and prepping the next round of patches I found a
few minor issues and improvements that could be made.

These changes should help to reduce the overall code size and improve the
performance slighlty as I noticed a 20ns or so improvement in my worst-case
testing which will likely only result in a 1ns difference with a standard
sized trie.
====================

Signed-off-by: David S. Miller <[email protected]>

fib_trie: Various clean-ups for handling slen

While doing further work on the fib_trie I noted a few items.

First I was using calls that were far more complicated than they needed to
be for determining when to push/pull the suffix length. I have updated the
code to reflect the simplier logic.

The second issue is that I realised we weren't necessarily handling the
case of a leaf_info struct surviving a flush. I have updated the logic so
that now we will call pull_suffix in the event of having a leaf info value
left in the leaf after flushing it.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Move fib_find_alias to file where it is used

The function fib_find_alias is only accessed by functions in fib_trie.c as
such it makes sense to relocate it and cast it as static so that the
compiler can take advantage of optimizations it can do to it as a local
function.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Use empty_children instead of counting empty nodes in stats collection

It doesn't make much sense to count the pointers ourselves when
empty_children already has a count for the number of NULL pointers stored
in the tnode. As such save ourselves the cycles and just use
empty_children.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Add collapse() and should_collapse() to resize

This patch really does two things.

First it pulls the logic for determining if we should collapse one node out
of the tree and the actual code doing the collapse into a separate pair of
functions. This helps to make the changes to these areas more readable.

Second it encodes the upper 32b of the empty_children value onto the
full_children value in the case of bits == KEYLENGTH. By doing this we are
able to handle the case of a 32b node where empty_children would appear to
be 0 when it was actually 1ul << 32.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Fall back to slen update on inflate/halve failure

This change corrects an issue where if inflate or halve fails we were
exiting the resize function without at least updating the slen for the
node. To correct this I have moved the update of max_size into the while
loop so that it is only decremented on a successful call to either inflate
or halve.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Fix RCU bug and merge similar bits of inflate/halve

This patch addresses two issues.

The first issue is the fact that I believe I had the RCU freeing sequence
slightly out of order.  As a result we could get into an issue if a caller
went into a child of a child of the new node, then backtraced into the to be
freed parent, and then attempted to access a child of a child that may have
been consumed in a resize of one of the new nodes children.  To resolve this I
have moved the resize after we have freed the oldtnode.  The only side effect
of this is that we will now be calling resize on more nodes in the case of
inflate due to the fact that we don't have a good way to test to see if a
full_tnode on the new node was there before or after the allocation.  This
should have minimal impact however since the node should already be
correctly size so it is just the cost of calling should_inflate that we
will be taking on the node which is only a couple of cycles.

The second issue is the fact that inflate and halve were essentially doing
the same thing after the new node was added to the trie replacing the old
one.  As such it wasn't really necessary to keep the code in both functions
so I have split it out into two other functions, called replace and
update_children.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits

In doing performance testing and analysis of the changes I recently found
that by shifting the index I had created an unnecessary dependency.

I have updated the code so that we instead shift a mask by bits and then
just test against that as that should save us about 2 CPU cycles since we
can generate the mask while the key and pos are being processed.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlx4-next'

Or Gerlitz says:

====================
mlx4: Fix and enhance the device reset flow

This series from Yishai Hadas fixes the device reset flow and adds SRIOV support.

Reset flows are required whenever a device experiences errors, is unresponsive,
or is not in a deterministic state. In such cases, the driver is expected to
reset the HW and continue operation. When SRIOV is enabled, these requirements
apply both to PF and VF devices.

Currently, the mlx4 reset flow doesn't work properly: when a fatal error is
detected on the FW internal buffer the chip is not reset and stays in its
bad state. There are cases that assumed to be fatal such as non-responsive FW,
errors via closing commands but are not handled today.

The AER mechanism should also be fixed:
- It should use mlx4_load_one instead of __mlx4_init_one which is done
upon HCA probing.
- It must be aligned with concurrent catas flow, mark device to be in
an error state, reset chip, etc.
- Port types should be restored to their original values before error occurred.

In addition, there the SRIOV use-case isn't supported.

In above cases when the device state becomes fatal we must act as follows:
1) Reset the chip and mark the HW device state as in fatal error.
2) Wake up any pending commands, preventing new ones to come in.
3) Restart the software stack.

We also address the SRIOV mode as follows: In case the PF detects a fatal error,
it lets VFs know about that, then both itself and VFs are restarted asynchronously.
However, in case only the VF encountered a fatal case or forced to be reset, they
reset the VF stuff and then restart software.

changes from V0:

No need to call pci_disable_device upon permanent PCI error. This will
be done as part of mlx4_remove_one which is called later once we
return PCI_ERS_RESULT_DISCONNECT from the pci error handler.

Initial toggle value should use only the T bit and not the whole byte value.
Not doing so sometimes broke SRIOV as of junky value seen by the VF as a
non-ready comm channel
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Reset flow activation upon SRIOV fatal command cases

When SRIOV commands are executed over the comm-channel and get
a fatal error (e.g. timeout, closing command failure) the VF enters
into error state and reset flow is activated.

To be able to recognize whether the failure was on a closing command, the
operational code for the given VHCR command is used. Once the device entered
into an error state we prevent redundant error messages from being printed.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Enable device recovery flow with SRIOV

In SRIOV, both the PF and the VF may attempt device recovery whenever they
assume that the device is not functioning. When the PF driver resets the
device, the VF should detect this and attempt to reinitialize itself.

The VF must be able to reset itself under all circumstances, even
if the PF is not responsive.

The VF shall reset itself in the following cases:

1. Commands are not processed within reasonable time over the communication channel.
This is done considering device state and the correct return code based on
the command as was done in the native mode, done in the next patch.

2. The VF driver receives an internal error event reported by the PF on the
communication channel. This occurs when the PF driver resets the device or
when VF is out of sync with the PF.

Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
PF is not responsive.

As PF and VF may run their reset flow simulantanisly, there are several cases
that are handled:
- Prevent freeing VF resources upon FLR, when PF is in its unloading stage.
- Prevent PF getting VF commands before it has finished initializing its resources.
- Upon VF startup, check that comm-channel is online before sending
commands to the PF and getting timed-out.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Handle AER flow properly

Fix AER callbacks to work properly, it includes:
- Refractoring AER to be aligned with Reset flow support.
- Sync with concurrent catas flow.

In addition, fix the shutdown PCI callback to sync with
concurrent catas flow.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Manage interface state for Reset flow cases

We need to manage interface state to sync between reset flow and some other
relative cases such as remove_one. This has to be done to prevent certain
races. For example in case software stack is down as a result of unload call,
the remove_one should skip the unload phase.

Implement the remove_one case, handling AER and other cases comes next.

The interface can be up/down, upon remove_one, the state will include an extra
bit indicating that the device is cleaned-up, forcing other tasks to finish
before the final cleanup.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Activate reset flow upon fatal command cases

We activate reset flow upon command fatal errors, when the device enters an
erroneous state, and must be reset.

The cases below are assumed to be fatal: FW command timed-out, an error from FW
on closing commands, pci is offline when posting/pending a command.

In those cases we place the device into an error state: chip is reset, pending
commands are awakened and completed immediately. Subsequent commands will
return immediately.

The return code in the above cases will depend on the command. Commands which
free and close resources will return success (because the chip was reset, so
callers may safely free their kernel resources). Other commands will return -EIO.

Since the device's state was marked as error, the catas poller will
detect this and restart the device's software stack (as is done when a FW
internal error is directly detected). The device state is protected by a
persistent mutex lives on its mlx4_dev, as such no need any more for the
hcr_mutex which is removed.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Enhance the catas flow to support device reset

This includes:

- resetting the chip when a fatal error is detected (the current code
  does not do this).

- exposing the ability to enter error state from outside the catas code
  by calling its functionality. (E.g. FW Command timeout, AER error).

- managing a persistent device state. This is needed to sync between
  reset flow cases.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Refactor the catas flow to work per device

Using a WQ per device instead of a single global WQ, this allows
independent reset handling per device even when SRIOV is used.

This comes as a pre-patch for supporting chip reset
for both native and SRIOV.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Set device configuration data to be persistent across reset

When an HCA enters an internal error state, this is detected by the driver.
The driver then should reset the HCA and restart the software stack.

Keep ports information and some SRIOV configuration in a persistent area
to have it valid across reset.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Maintain a persistent memory for mlx4 device

Maintain a persistent memory that should survive reset flow/PCI error.
This comes as a preparation for coming series to support above flows.

Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipvlan: fix incorrect usage of IS_ERR() macro in IPv6 code path.

The ip6_route_output() always returns a valid dst pointer unlike in IPv4
case. So the validation has to be different from the IPv4 path. Correcting
that error in this patch.

This was picked up by a static checker with a following warning -

drivers/net/ipvlan/ipvlan_core.c:380 ipvlan_process_v6_outbound()
warn: 'dst' isn't an ERR_PTR

Signed-off-by: Mahesh Bandewar <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: llc: use correct size for sysctl timeout entries

The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netxen: fix netxen_nic_poll() logic

NAPI poll logic now enforces that a poller returns exactly the budget
when it wants to be called again.

If a driver limits TX completion, it has to return budget as well when
the limit is hit, not the number of received packets.

Reported-and-tested-by: Mike Galbraith <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Fixes: d75b1ade567f ("net: less interrupt masking in NAPI")
Cc: Manish Chopra <[email protected]>
Acked-by: Manish Chopra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb3: re-use native hex2bin()

Call hex2bin() library function instead of doing conversion here.

Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

usbnet: re-use native hex2bin()

Call hex2bin() library function, instead of doing conversion here.

Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-01-22

This series contains updates to e1000, e1000e, igb, fm10k and virtio_net.

Asaf Vertz provides a fix for e1000 to future-proof the time comparisons
by using time_after_eq() instead of plain math.

Mathias Koehrer provides a fix for e1000e to add a check to e1000_xmit_frame()
to ensure a work queue will not be scheduled that has not been initialized.

Jacob adds the use of software timestamping via the virtio_net driver.

Alex Duyck cleans up page reuse code in igb and fm10k.  Cleans up the
page reuse code from getting into a state where all the workarounds
needed are in place as well as cleaning up oversights, such as using
__free_pages instead of put_page to drop a locally allocated page.

Richard Cochran provides 4 patches for igb dealing with time sync.
First provides a helper function since the code that handles the time
sync interrupt is repeated in three different places.  Then serializes
the access to the time sync interrupt since the registers may be
manipulated from different contexts.  Enables the use of i210 device
interrupt to generate an internal PPS event for adjusting the kernel
system time.  The i210 device offers a number of special PTP hardware
clock features on the Software Defined Pins (SDPs), so added support for
two of the possible functions (time stamping external events and
periodic output signals).

Or Gerlitz fixes fm10k from double setting of NETIF_F_SG since the
networking core does it for the driver during registration time.

Joe Stringer adds support for up to 104 bytes of inner+outer headers in
fm10k and adds an initial check to fail encapsulation offload if these
are too large.

Matthew increases the timeout for the data path reset based on feedback
from the hardware team, since 100us is too short of a time to wait for
the data path reset to complete.

Alexander Graf provides a fix for igb to indicate failure on VF reset
for an empty MAC address, to mirror the behavior of ixgbe.

Florian Westphal updates e1000 and e1000e to support txtd update delay
via xmit_more, this way we won't update the Tx tail descriptor if the
queue has not been stopped and we know at least one more skb will be
sent right away.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'vxlan_tx'

Tom Herbert says:

====================
vxlan: Don't use UDP socket for transmit

UDP socket is not pertinent to transmit for UDP tunnels, checksum
enablement can be done without a socket. This patch set eliminates
reference to a socket in udp_tunnel_xmit functions and in VXLAN
transmit.

Also, make GBP, RCO, can CSUM6_RX flags visible to receive socket
and only match these for shareable socket.

v2: Fix geneve to call udp_tunnel_xmit with good arguments.
====================

Signed-off-by: David S. Miller <[email protected]>

vxlan: Eliminate dependency on UDP socket in transmit path

In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminate references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>