Git Repo - linux.git/log

net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined

Add checking whether the call to ndo_dflt_fdb_dump is needed.
It is not expected to call ndo_dflt_fdb_dump unconditionally
by some drivers (i.e. qlcnic or macvlan) that defines
own ndo_fdb_dump. Other drivers define own ndo_fdb_dump
and don't want ndo_dflt_fdb_dump to be called at all.
At the same time it is desirable to call the default dump
function on a bridge device.
Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
Add extra checking in br_fdb_dump to avoid duplicate entries
as now filter_dev can be NULL.

Following tests for filtering have been performed before
the change and after the patch was applied to make sure
they are the same and it doesn't break the filtering algorithm.

[root@localhost ~]# cd /root/iproute2-3.18.0/bridge
[root@localhost bridge]# modprobe dummy
[root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
[root@localhost bridge]# brctl addbr br0
[root@localhost bridge]# brctl addif br0 dummy0
[root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
[root@localhost bridge]# # show all
[root@localhost bridge]# ./bridge fdb show
33:33:00:00:00:01 dev p2p1 self permanent
01:00:5e:00:00:01 dev p2p1 self permanent
33:33:ff:ac:ce:32 dev p2p1 self permanent
33:33:00:00:02:02 dev p2p1 self permanent
01:00:5e:00:00:fb dev p2p1 self permanent
33:33:00:00:00:01 dev p7p1 self permanent
01:00:5e:00:00:01 dev p7p1 self permanent
33:33:ff:79:50:53 dev p7p1 self permanent
33:33:00:00:02:02 dev p7p1 self permanent
01:00:5e:00:00:fb dev p7p1 self permanent
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by bridge
[root@localhost bridge]# ./bridge fdb show br br0
f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
33:33:00:00:00:01 dev dummy0 self permanent
f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
33:33:00:00:00:01 dev br0 self permanent
02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
02:00:00:12:01:04 dev br0 master br0 permanent
[root@localhost bridge]# # filter by port
[root@localhost bridge]# ./bridge fdb show brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]# # filter by port + bridge
[root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
f2:46:50:85:6d:d9 master br0 permanent
f2:46:50:85:6d:d9 vlan 1 master br0 permanent
33:33:00:00:00:01 self permanent
f1:f2:f3:f4:f5:f6 self permanent
[root@localhost bridge]#

Signed-off-by: Hubert Sokolowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ip_cmsg_csum'

Tom Herbert says:

====================
ip: Support checksum returned in csmg

This patch set allows the packet checksum for a datagram socket
to be returned in csum data in recvmsg. This allows userspace
to implement its own checksum over the data, for instance if an
IP tunnel was be implemented in user space, the inner checksum
could be validated.

Changes in this patch set:
  - Move checksum conversion to inet_sock from udp_sock. This
    generalizes checksum conversion for use with other protocols.
  - Move IP cmsg constants to a header file and make processing
    of the flags more efficient in ip_cmsg_recv
  - Return checksum value in cmsg. This is specifically the unfolded
    32 bit checksum of the full packet starting from the first byte
    returned in recvmsg

Tested: Wrote a little server to get checksums in cmsg for UDP and
        verfied correct checksum is returned.
====================

Signed-off-by: David S. Miller <[email protected]>

ip: Add offset parameter to ip_cmsg_recv

Add ip_cmsg_recv_offset function which takes an offset argument
that indicates the starting offset in skb where data is being received
from. This will be useful in the case of UDP and provided checksum
to user space.

ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of
zero.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ip: IP cmsg cleanup

Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that
they can be referenced in other source files.

Restructure ip_cmsg_recv to not go through flags using shift, check
for flags by 'and'. This eliminates both the shift and a conditional
per flag check.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ip: Move checksum convert defines to inet

Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netlink: Warn on unordered or illegal nla_nest_cancel() or nlmsg_cancel()

Calling nla_nest_cancel() in a different order as the nesting was
built up can lead to negative offsets being calculated which
results in skb_trim() being called with an underflowed unsigned
int. Warn if mark < skb->data as it's definitely a bug.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Linux 3.19-rc3

Merge tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:

- Wire up sys_execveat(). Tested on 32 & 64 bit.

- Fix for kdump on LE systems with cpus hot unplugged.

- Revert Anton's fix for "kernel BUG at kernel/smpboot.c:134!", this
   broke other platforms, we'll do a proper fix for 3.20.

* tag 'powerpc-3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
  powerpc/kdump: Ignore failure in enabling big endian exception during crash
  powerpc: Wire up sys_execveat() syscall

Merge tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull ia64 fixlet from Tony Luck:
"Add execveat syscall"

* tag 'please-pull-syscall' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Enable execveat syscall for ia64

Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
RDMA/cxgb4/cxgb4vf/csiostor: Cleanup register defines

This series continues to cleanup all the macros/register defines related to
SGE, PCIE, MC, MA, TCAM, MAC, etc that are defined in t4_regs.h and the
affected files.

Will post another 1 or 2 series so that we can cover all the macros so that
they all follow the same style to be consistent.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4, cxgb4vf, iw_cxgb4 and csiostor driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <[email protected]>

cxgb4/cxgb4vf/csiostor: Cleanup PL, XGMAC, SF and MC related register defines

This patch cleanups all PL, XGMAC and SF related macros/register defines
that are defined in t4_regs.h and the affected files

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4/csiostor: Cleanup TP, MPS and TCAM related register defines

This patch cleanups all TP, MPS and TCAM related macros/register defines
that are defined in t4_regs.h and the affected files

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4/cxg4vf/csiostor: Cleanup MC, MA and CIM related register defines

This patch cleanups all MC, MA and CIM related macros/register defines that are
defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4/cxgb4vf/csiostor: Cleanup SGE and PCI related register defines

This patch cleansup remaining SGE related macros/register defines and all PCI
related ones that are defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDMA/cxgb4/cxgb4vf/csiostor: Cleanup SGE register defines

This patch cleanups all SGE related macros/register defines that are
defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

be2net: support TX batching using skb->xmit_more flag

This patch uses skb->xmit_more flag to batch TX requests.
TX is flushed either when xmit_more is false or there is
no more space in the TXQ.

Skyhawk-R and BEx chips require an even number of wrbs to be posted.
So, when a batch of TX requests is accumulated, the last header wrb
may need to be fixed with an extra dummy wrb.

This patch refactors be_xmit() routine as a sequence of be_xmit_enqueue()
and be_xmit_flush() calls. The Tx completion code is also
updated to be able to unmap/free a batch of skbs rather than a single
skb.

Signed-off-by: Sathya Perla <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

at86rf230: Constify struct regmap_config

The regmap_config struct may be const because it is not modified by the
driver and regmap_init() accepts pointer to const.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

[IA64] Enable execveat syscall for ia64

See commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
syscalls: implement execveat() system call

Signed-off-by: Tony Luck <[email protected]>

Revert "mac80211: Fix accounting of the tailroom-needed counter"

This reverts commit ca34e3b5c808385b175650605faa29e71e91991b.

It turns out that the p54 and cw2100 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.

Cc: [email protected]
Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter")
Reported-by: Christopher Chavez <[email protected]>
Bisected-by: Larry Finger <[email protected]>
Debugged-by: Christian Lamparter <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

list_nulls: fix missing header

Fixup below build error:

include/linux/list_nulls.h: In function ‘hlist_nulls_del’:
include/linux/list_nulls.h:84:13: error: ‘LIST_POISON2’ undeclared (first use in this function)

Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: fix missing header

Fixup below build error:

include/linux/rhashtable.h: At top level:
include/linux/rhashtable.h:118:34: error: field ‘mutex’ has incomplete type

Signed-off-by: Ying Xue <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qlcnic: Fix dump_skb output

Use normal facilities to avoid printing each byte
on a separate line.

Now emits at KERN_DEBUG instead of KERN_INFO.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

enic: reconfigure resources for kdump crash kernel

When running in kdump kernel, reduce number of resources used by the driver.
This will enable NIC to operate in low memory kdump kernel environment.

Also change the driver version to .83

Signed-off-by: Govindarajulu Varadarajan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'geneve-next'

Jesse Gross says:

====================
Geneve Cleanups

Much of the basis for the Geneve code comes from VXLAN. However,
Geneve is quite a bit simpler than VXLAN and so this cleans up a
lot of the infrastruction - particularly around locking - where the
extra complexity is not necessary.
====================

Signed-off-by: David S. Miller <[email protected]>

geneve: Check family when reusing sockets.

When searching for an existing socket to reuse, the address family
is not taken into account - only port number. This means that an
IPv4 socket could be used for IPv6 traffic and vice versa, which
is sure to cause problems when passing packets.

It is not possible to trigger this problem currently because the
only user of Geneve creates just IPv4 sockets. However, that is
likely to change in the near future.

Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Remove socket hash table.

The hash table for open Geneve ports is used only on creation and
deletion time. It is not performance critical and is not likely to
grow to a large number of items. Therefore, this can be changed
to use a simple linked list.

Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Simplify locking.

The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.

In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.

Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Remove workqueue.

The work queue is used only to free the UDP socket upon destruction.
This is not necessary with Geneve and generally makes the code more
difficult to reason about. It also introduces nondeterministic
behavior such as when a socket is rapidly deleted and recreated, which
could fail as the the deletion happens asynchronously.

Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: cpsw: fix hangs with interrupts

The CPSW IP implements pulse-signaled interrupts. Due to
that we must write a correct, pre-defined value to the
CPDMA_MACEOIVECTOR register so the controller generates
a pulse on the correct IRQ line to signal the End Of
Interrupt.

The way the driver is written today, all four IRQ lines
are requested using the same IRQ handler and, because of
that, we could fall into situations where a TX IRQ fires
but we tell the controller that we ended an RX IRQ (or
vice-versa). This situation triggers an IRQ storm on the
reserved IRQ 127 of INTC which will in turn call ack_bad_irq()
which will, then, print a ton of:

unexpected IRQ trap at vector 00

In order to fix the problem, we are moving all calls to
cpdma_ctlr_eoi() inside the IRQ handler and making sure
we *always* write the correct value to the CPDMA_MACEOIVECTOR
register. Note that the algorithm assumes that IRQ numbers and
value-to-be-written-to-EOI are proportional, meaning that a
write of value 0 would trigger an EOI pulse for the RX_THRESHOLD
Interrupt and that's the IRQ number sitting in the 0-th index
of our irqs_table array.

This, however, is safe at least for current implementations of
CPSW so we will refrain from making the check smarter (and, as
a side-effect, slower) until we actually have a platform where
IRQ lines are swapped.

This patch has been tested for several days with AM335x- and
AM437x-based platforms. AM57x was left out because there are
still pending patches to enable ethernet in mainline for that
platform. A read of the TRM confirms the statement on previous
paragraph.

Reported-by: Yegor Yefremov <[email protected]>
Fixes: 510a1e7 (drivers: net: davinci_cpdma: acknowledge interrupt properly)
Cc: <[email protected]> # v3.9+
Signed-off-by: Felipe Balbi <[email protected]>
Acked-by: Tony Lindgren <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"Two fixes for UML regressions. Nothing exciting"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
x86, um: actually mark system call tables readonly
um: Skip futex_atomic_cmpxchg_inatomic() test

Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo"

Commit 9fc2105aeaaf ("ARM: 7830/1: delay: don't bother reporting
bogomips in /proc/cpuinfo") breaks audio in python, and probably
elsewhere, with message

  FATAL: cannot locate cpu MHz in /proc/cpuinfo

I'm not the first one to hit it, see for example

  https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
  https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1

Reading original changelog, I have to say "Stop breaking working setups.
You know who you are!".

Signed-off-by: Pavel Machek <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86, um: actually mark system call tables readonly

Commit a074335a370e ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.

Before:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size

After:

$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size

Fixes: a074335a370e ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: Skip futex_atomic_cmpxchg_inatomic() test

futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.

Reported-by: Miklos Szeredi <[email protected]>
Suggested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Tested-by: Daniel Walter <[email protected]>

Merge branch 'rhashtable-next'

Thomas Graf says:

====================
rhashtable: Per bucket locks & deferred table resizing

Prepares for and introduces per bucket spinlocks and deferred table
resizing. This allows for parallel table mutations in different hash
buckets from atomic context. The resizing occurs in the background
in a separate worker thread while lookups, inserts, and removals can
continue.

Also modified the chain linked list to be terminated with a special
nulls marker to allow entries to move between multiple lists.

Last but not least, reintroduces lockless netlink_lookup() with
deferred Netlink socket destruction to avoid the side effect of
increased netlink_release() runtime.
====================

Signed-off-by: David S. Miller <[email protected]>

netlink: Lockless lookup with RCU grace period in socket release

Defers the release of the socket reference using call_rcu() to
allow using an RCU read-side protected call to rhashtable_lookup()

This restores behaviour and performance gains as previously
introduced by e341694 ("netlink: Convert netlink_lookup() to use
RCU protected hash table") without the side effect of severely
delayed socket destruction.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Supports for nulls marker

In order to allow for wider usage of rhashtable, use a special nulls
marker to terminate each chain. The reason for not using the existing
nulls_list is that the prev pointer usage would not be valid as entries
can be linked in two different buckets at the same time.

The 4 nulls base bits can be set through the rhashtable_params structure
like this:

struct rhashtable_params params = {
[...]
.nulls_base = (1U << RHT_BASE_SHIFT),
};

This reduces the hash length from 32 bits to 27 bits.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Per bucket locks & deferred expansion/shrinking

Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.

The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.

Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.

In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts. Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.

The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.

Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

spinlock: Add spin_lock_bh_nested()

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nft_hash: Remove rhashtable_remove_pprev()

The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().

With the introdution of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.

Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.

Signed-off-by: Thomas Graf <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Factor out bucket_tail() function

Subsequent patches will require access to the bucket tail. Access
to the tail is relatively cheap as the automatic resizing of the
table should keep the number of entries per bucket to no more
than 0.75 on average.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Convert bucket iterators to take table and index

This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.

It introduces a barrier() to the RCU iterators to the prevent
the compiler from caching the first element.

The lockdep verifier is introduced as stub which always succeeds
and properly implement in the next patch when the locks are
introduced.

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Use rht_obj() instead of manual offset calculation

Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rhashtable: Do hashing inside of rhashtable_lookup_compare()

Hash the key inside of rhashtable_lookup_compare() like
rhashtable_lookup() does. This allows to simplify the hashing
functions and keep them private.

Signed-off-by: Thomas Graf <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'timecounter-next'

Richard Cochran says:

====================
Fixing the "Time Counter fixes and improvements"

For this series I had only tested the build with ARCH=x86 and arm, but
others like sparc64, microblaze, powerpc, and s390 will fail because
they somehow don't indirectly include clocksource.h for the drivers in
question.

This series fixes the build issues reported by:
kbuild test robot <[email protected]>
====================

Signed-off-by: David S. Miller <[email protected]>

microblaze: include the new timecounter header.

The timecounter/cyclecounter code has moved, so users need the new include.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlx4: include clocksource.h again

This driver uses the function, clocksource_khz2mult, and so it really must
include clocksource.h.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ixgbe: convert to CYCLECOUNTER_MASK macro.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

igb: convert to CYCLECOUNTER_MASK macro.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

e1000e: convert to CYCLECOUNTER_MASK macro.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: convert to CYCLECOUNTER_MASK macro.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

timecounter: provide a macro to initialize the cyclecounter mask field.

There is no need for users of the timecounter/cyclecounter code to include
clocksource.h just for a single macro.

Signed-off-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MAINTAINERS: Update Open vSwitch entry.

OVS development is moved to netdev mailing list. Update tree and
list in MAINTAINERS file.

Signed-off-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

enic: free all rq buffs when allocation fails

When allocation of all RQs fail, we do not free previously allocated buffers,
before returning error. This causes memory leak.

This patch fixes this by calling vnic_rq_clean(), which frees all the rq
buffers.

Signed-off-by: Govindarajulu Varadarajan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qmi_wwan: Set random MAC on devices with buggy fw

Some buggy firmwares export an incorrect MAC address (00:a0:c6:00:00:00). This
makes for example checking devices for random MAC addresses tricky, and you
might end up with multiple network interfaces with the same address.

This patch tries to fix, or at least improve, the situation by setting the MAC
address of devices with this firmware bug to a random address. I tested the
patch with two devices that has this firmware bug (Huawei E398 and E392), and
network traffic worked fine after changing the address.

Signed-off-by: Kristian Evensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-next-for-davem-2015-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Changes:

* ath9k: enable Transmit Power Control (TPC) for ar9003 chips

* rtlwifi: cleanup and updates from the vendor driver

* rsi: fix memory leak related to firmware image

* ath: parameter fix for FCC DFS pattern

Signed-off-by: David S. Miller <[email protected]>

net: ethernet: cisco: enic: enic_dev: Remove some unused functions

Removes some functions that are not used anywhere:
enic_dev_enable2_done() enic_dev_enable2() enic_dev_deinit_done()
enic_dev_init_prov2() enic_vnic_dev_deinit()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

isdn: hisax: hfc4s8s_l1: Remove some unused functions

Removes some functions that are not used anywhere:
Read_hfc32() Write_hfc32() Write_hfc16()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fddi: skfp: smt.c: Remove unused function

Remove the function smt_ifconfig() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: chelsio: cxgb3: mc5.c: Remove some unused functions

Removes some functions that are not used anywhere:
dbgi_rd_rsp3() dbgi_wr_addr3()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of three fixes: one to correct an abort path thinko
  causing failures (and a panic) in USB on device misbehaviour, One to
  fix an out of order issue in the fnic driver and one to match discard
  expectations to qemu which otherwise cause Linux to behave badly as a
  guest"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  SCSI: fix regression in scsi_send_eh_cmnd()
  fnic: IOMMU Fault occurs when IO and abort IO is out of order
  sd: tweak discard heuristics to work around QEMU SCSI issue

openvswitch: Consistently include VLAN header in flow and port stats.

Until now, when VLAN acceleration was in use, the bytes of the VLAN header
were not included in port or flow byte counters. They were however
included when VLAN acceleration was not used. This commit corrects the
inconsistency, by always including the VLAN header in byte counters.

Previous discussion at
http://openvswitch.org/pipermail/dev/2014-December/049521.html

Reported-by: Motonori Shindo <[email protected]>
Signed-off-by: Ben Pfaff <[email protected]>
Reviewed-by: Flavio Leitner <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: Do not apply TSO segment limit to non-TSO packets

Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.

In fact the problem was completely unrelated to IPsec. The bug is
also reproducible if you just disable TSO/GSO.

The problem is that when the MSS goes down, existing queued packet
on the TX queue that have not been transmitted yet all look like
TSO packets and get treated as such.

This then triggers a bug where tcp_mss_split_point tells us to
generate a zero-sized packet on the TX queue. Once that happens
we're screwed because the zero-sized packet can never be removed
by ACKs.

Fixes: 1485348d242 ("tcp: Apply device TSO segment limit earlier")
Reported-by: Thomas Jarosch <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Cheers,
Signed-off-by: David S. Miller <[email protected]>

net: skbuff: don't zero tc members when freeing skb

Not needed, only four cases:
- kfree_skb (or one of its aliases).
   Don't need to zero, memory will be freed.
- kfree_skb_partial and head was stolen:  memory will be freed.
- skb_morph:  The skb header fields (including tc ones) will be
   copied over from the 'to-be-morphed' skb right after
   skb_release_head_state returns.
- skb_segment:  Same as before, all the skb header
   fields are copied over from the original skb right away.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg say:

====================
pull request: bluetooth-next 2014-12-31

Here's the first batch of bluetooth patches for 3.20.

- Cleanups & fixes to ieee802154 drivers
- Fix synchronization of mgmt commands with respective HCI commands
- Add self-tests for LE pairing crypto functionality
- Remove 'BlueFritz!' specific handling from core using a new quirk flag
- Public address configuration support for ath3012
- Refactor debugfs support into a dedicated file
- Initial support for LE Data Length Extension feature from Bluetooth 4.2

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Nothing too exciting as a new year's start here: most of fixes are for
  ASoC, a boot crash fix on OMAP for deferred probe, a few driver
  specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
  fixes in kerneldoc comments for PCM"

* tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Fix kerneldoc for params_*() functions
  ASoC: rockchip: i2s: fix maxburst of dma data to 4
  ASoC: rockchip: i2s: fix error defination of transmit data level
  ASoC: Intel: correct the fixed free block allocation
  ASoC: rt5677: fixed rt5677_dsp_vad_put rt5677_dsp_vad_get panic
  ASoC: Intel: Fix BYTCR machine driver MODULE_ALIAS
  ASoC: Intel: Fix BYTCR firmware name
  ASoC: dwc: Iterate over all channels
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: Intel: Add I2C dependency to two new machines
  ASoC: dapm: Remove snd_soc_of_parse_audio_routing() due to deferred probe

geneve: Add Geneve GRO support

This results in an approximately 30% increase in throughput
when handling encapsulated bulk traffic.

Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Add Transparent Ethernet Bridging GRO support.

Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.

Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Altera TSE: Add missing phydev

Altera network device doesn't come up after

ifconfig eth0 down
ifconfig eth0 up

The reason behind is clearing priv->phydev during tse_shutdown().
The phydev is not restored back at tse_open().

Resubmiting as to follow Tobias Klauser suggestion.
phy_start/phy_stop are called on each ifup/ifdown and
phy_disconnect is called once during the module removal.

Signed-off-by: Kostya Belezko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlx4-net'

Or Gerlitz says:

====================
mlx4 driver fixes for 3.19-rc2

Please push Maor's patch to -stable >= 3.17

Jack's fixes error-flow issues introduced in 3.19-rc1, no need for -stable.
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Fix error flow in mlx4_init_hca()

We shouldn't call UNMAP_FA here, this is done in mlx4_load_one.

If mlx4_query_func fails, we need to invoke CLOSE_HCA for both
native and master.

Fixes: a0eacca948d2 ('net/mlx4_core: Refactor mlx4_load_one')
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow

Previously, mlx4_mt_rereg_write filled the MPT's entity_size with the
old MTT's page shift, which could result in using an incorrect offset.
Fix the initialization to be after we calculate the new MTT offset.

In addition, assign mtt order to -1 after calling mlx4_mtt_cleanup. This
is necessary in order to mark the MTT as invalid and avoid freeing it later.

Fixes: e630664 ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Matan Barak <[email protected]>
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/fsl: remove hardcoded clock setting from xgmac_mdio

There is no need to set the clock speed in read/write which will be performed
unnecessarily for each mdio access. Init it during probe is enough.

Also, the hardcoded clock value is not a proper way for all SoCs.

Signed-off-by: Shaohui Xie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/fsl: remove irq assignment from xgmac_mdio

Which is wrong and not used, so no extra space needed by
mdiobus_alloc_size(), use mdiobus_alloc() instead.

Signed-off-by: Shaohui Xie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/fsl: remove reset from xgmac_mdio

Since the reset is just clock setting, individual mdio reset is
not available.

Signed-off-by: Shaohui Xie <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ext4: remove spurious KERN_INFO from ext4_warning call

Signed-off-by: Jakub Wilk <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

Revert "ext4: fix suboptimal seek_{data,hole} extents traversial"

This reverts commit 14516bb7bb6ffbd49f35389f9ece3b2045ba5815.

This was causing regression test failures with generic/285 with an ext3
filesystem using CONFIG_EXT4_USE_FOR_EXT23.

Signed-off-by: Theodore Ts'o <[email protected]>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost cleanup and virtio bugfix
"There's a single change here, fixing a vhost bug where vhost
  initialization fails due to used ring alignment check being too
  strict"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: relax used address alignment
  virtio_ring: document alignment requirements

qlcnic: Fix return value in qlcnic_probe()

If the check of adapter fails and goes into the 'else' branch, the
return value 'err' should not still be zero.

Signed-off-by: Yongjian Xu <[email protected]>
Acked-by: Shahed Shaikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: axienet: fix error return code

Return a negative error code on failure.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sun4i-emac: fix error return code

Return a negative error code on failure.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

myri10ge: fix error return code

Return a negative error code on failure.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

The patch also modifies the test of mgp->cmd to satisfy checkpatch.

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Xilinx: fix error return code

Return a negative error code on failure.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-12-31

This series contains updates to fixes for e100, igb and i40e.

John Linville fixes a typo in e100 that has been around for some time,
where an attempted revert actually inverted the test for eeprom_mdix_enabled.

Todd fixes up a code comment that should have been removed back in 2007.

Joe Perches fixes a possible memory leak in i40e which was reported by
Dan Carpenter using smatch.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'gmac-next'

Roger Chen says:

====================
support GMAC driver for RK3288

Roger Chen (6):
  patch1: add driver for Rockchip RK3288 SoCs integrated GMAC
  patch2: define clock ID used for GMAC
  patch3: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
  patch4: dts: rockchip: add gmac info for rk3288
  patch5: dts: rockchip: enable gmac on RK3288 evb board
  patch6: add document for Rockchip RK3288 GMAC

Tested on rk3288 evb board:
Execute the following command to enable ethernet,
set local IP and ping a remote host.

busybox ifconfig eth0 up
busybox ifconfig eth0 192.168.1.111
ping 192.168.1.1
====================

Signed-off-by: David S. Miller <[email protected]>

GMAC: add document for Rockchip RK3288 GMAC

The document descripts how to add properties for GMAC in device tree.

change since v2:

1. remove power-gpio, reset-gpio, phyirq-gpio, pmu_regulator setting
2. add "snps,reset-gpio", "snps,reset-active-low;" "snps,reset-delays-us"

Signed-off-by: Roger Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ARM: dts: rockchip: enable gmac on RK3288 evb board

enable gmac in rk3288-evb-rk808.dts

changes since v2:
1. add fixed regulator for PHY
2. remove power-gpio, reset-gpio, phyirq-gpio, pmu_regulator setting
3. add "snps,reset-gpio", "snps,reset-active-low;" "snps,reset-delays-us"

Signed-off-by: Roger Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ARM: dts: rockchip: add gmac info for rk3288

add gmac info in rk3288.dtsi for GMAC driver

changes since v2:
1. add drive-strength in the pinctrl settings

Signed-off-by: Roger Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

GMAC: modify CRU config for Rockchip RK3288 SoCs integrated GMAC

modify CRU config for GMAC driver

changes since v2:
1. remove SCLK_MAC_PLL

Signed-off-by: Roger Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

GMAC: define clock ID used for GMAC

changes since v2:
1. remove SCLK_MAC_PLL

Signed-off-by: Roger Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC

This driver is based on stmmac driver.

changes since v2:
- use tab instead of space for macros
- use HIWORD_UPDATE macro for GMAC_CLK_RX_DL_CFG and GMAC_CLK_TX_DL_CFG
- remove drive-strength setting in the driver and set it in the pinctrl settings
- use dev_err instead of pr_err
- remove clock names's macros, just use the real name of the clock
- use devm_clk_get() instead of clk_get()
- remove clk_set_parent(bsp_priv->clk_mac, bsp_priv->clk_mac_pll)
- remove gpio setting for LDO, just use regulator API
- remove phy reset using gpio in the glue layer, it has been handled in the stmmac driver
- remove handling phy interrupt (mii interrupt)

changes since v1:
- use BIT() to set register
- combine two remap_write() operations into one for the same register
- use macros for register value setting
- remove grf fail check in rk_gmac_setup() and save all the check in set_rgmii_speed()
- remove .tx_coe=1 in rk_gmac_data

Signed-off-by: Roger Chen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

i40e: Fix possible memory leak in i40e_dbg_dump_desc

I didn't notice that return in the code, fix it by
adding a goto out instead to free the memory.

Fixes:

> New smatch warnings:
> drivers/net/ethernet/intel/i40e/i40e_debugfs.c:832 i40e_dbg_dump_desc() warn: possible memory leak of 'ring'

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
Tested-by: Jim Young <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

igb_ptp: Include clocksource.h to get CLOCKSOURCE_MASK.

Signed-off-by: David S. Miller <[email protected]>

e1000e: Include clocksource.h to get CLOCKSOURCE_MASK.

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'fib_trie-next'

Alexander Duyck says:

====================
fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%

These patches are meant to address several performance issues I have seen
in the fib_trie implementation, and fib_table_lookup specifically.  With
these changes in place I have seen a reduction of up to 35 to 75% for the
total time spent in fib_table_lookup depending on the type of search being
performed.

On a VM running in my Corei7-4930K system with a trie of maximum depth of 7
this resulted in a reduction of over 370ns per packet in the total time to
process packets received from an ixgbe interface and route them to a dummy
interface.  This represents a failed lookup in the local trie followed by
a successful search in the main trie.

Baseline Refactor
  ixgbe->dummy routing 1.20Mpps 2.21Mpps
  ------------------------------------------------------------
  processing time per packet 835ns 453ns
  fib_table_lookup 50.1% 418ns 25.0% 113ns
  check_leaf.isra.9 7.9% 66ns    -- --
  ixgbe_clean_rx_irq 5.3% 44ns 9.8% 44ns
  ip_route_input_noref 2.9% 25ns 4.6% 21ns
  pvclock_clocksource_read 2.6% 21ns 4.6% 21ns
  ip_rcv 2.6% 22ns 4.0% 18ns

In the simple case of receiving a frame and dropping it before it can reach
the socket layer I saw a reduction of 40ns per packet.  This represents a
trip through the local trie with the correct leaf found with no need for
any backtracing.

Baseline Refactor
  ixgbe->local receive 2.65Mpps 2.96Mpps
  ------------------------------------------------------------
  processing time per packet 377ns 337ns
  fib_table_lookup 25.1% 95ns 25.8% 87ns
  ixgbe_clean_rx_irq 8.7% 33ns 9.0% 30ns
  check_leaf.isra.9 7.2% 27ns    -- --
  ip_rcv 5.7% 21ns 6.5% 22ns

These changes have resulted in several functions being inlined such as
check_leaf and fib_find_node, but due to the code simplification the
overall size of the code has been reduced.

   text    data     bss     dec     hex filename
  16932     376      16   17324    43ac net/ipv4/fib_trie.o - before
  15259     376       8   15643    3d1b net/ipv4/fib_trie.o - after

Changes since RFC:
  Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
  Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
====================

Signed-off-by: David S. Miller <[email protected]>

fib_trie: Add tracking value for suffix length

This change adds a tracking value for the maximum suffix length of all
prefixes stored in any given tnode. With this value we can determine if we
need to backtrace or not based on if the suffix is greater than the pos
value.

By doing this we can reduce the CPU overhead for lookups in the local table
as many of the prefixes there are 32b long and have a suffix length of 0
meaning we can immediately backtrace to the root node without needing to
test any of the nodes between it and where we ended up.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Remove checks for index >= tnode_child_length from tnode_get_child

For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change.  As such every call to tnode_get_child was rerunning
tnode_chile_length which ended up consuming quite a bit of space in the
resultant assembly code.

I have gone though and verified that in all cases where tnode_get_child
is used we are either winding though a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.

size net/ipv4/fib_trie.o
Before:
   text    data     bss     dec     hex filename
  15506     376       8   15890    3e12 net/ipv4/fib_trie.o

After:
   text    data     bss     dec     hex filename
  14827     376       8   15211    3b6b net/ipv4/fib_trie.o

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: inflate/halve nodes in a more RCU friendly way

This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well. By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.

I am suspecting this will likely fix some concurency issues though I don't
have a good test to show as such.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Push tnode flushing down to inflate/halve

This change pushes the tnode freeing down into the inflate and halve
functions.  It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.

I believe this may address a bug in the freeing logic as well.  For some
reason if the freelist got to a certain size we would call
synchronize_rcu().  I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu().  As such that is what I have updated the behavior to be.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fib_trie: Push assignment of child to parent down into inflate/halve

This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize. By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>