Ben Hutchings [Tue, 7 Dec 2010 19:24:45 +0000 (19:24 +0000)]
sfc: Fix crash in legacy onterrupt handler during ring reallocation
If we are using a legacy interrupt, our IRQ may be shared and our
interrupt handler may be called even though interrupts are disabled on
the NIC. When we change ring sizes, we reallocate the event queue and
the interrupt handler may use an invalid pointer when called for
another device's interrupt.
Maintain a legacy_irq_enabled flag and test that at the top of the
interrupt handler. Note that this problem results from the need to
work around broken INT_ISR0 reads, and does not affect the legacy
interrupt handler for Falcon A1.
Ben Hutchings [Tue, 7 Dec 2010 19:11:26 +0000 (19:11 +0000)]
sfc: Generalise filter spec initialisation
Move search_depth arrays into per-table state.
Define initialisation function efx_filter_init_rx() which sets
everything apart from the match fields.
Define efx_filter_set_{ipv4_local,ipv4_full,eth_local}() to set the
match fields. This allows some simplification of callers and later
support for additional protocols and more flexible matching using
multiple calls to these functions.
Ben Hutchings [Tue, 7 Dec 2010 19:02:27 +0000 (19:02 +0000)]
sfc: Remove filter table IDs from filter functions
The separation between filter tables is largely an internal detail
and it may be removed in future hardware. To prepare for that:
- Merge table ID with filter index to make an opaque filter ID
- Wrap efx_filter_table_clear() with a function that clears filters
from both RX tables, which is all that the current caller requires
Ben Hutchings [Tue, 7 Dec 2010 18:29:52 +0000 (18:29 +0000)]
sfc: Log start and end of ethtool self-test at INFO level
Add message at start of self-test and increase log level of message at
end of self-test, so that any other messages produced during the
test are clearly associated with it.
This patch adds a generic infrastructure for policy-based dequeueing of
TX packets and provides two policies:
* a simple FIFO policy (which is the default) and
* a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options).
The priority policy uses skb->priority internally to assign an u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3), the patch also provides the requisite parsing routines.
Ben Hutchings [Mon, 15 Nov 2010 23:53:11 +0000 (23:53 +0000)]
sfc: Use TX push whenever adding descriptors to an empty queue
Whenever we add DMA descriptors to a TX ring and update the ring
pointer, the TX DMA engine must first read the new DMA descriptors and
then start reading packet data. However, all released Solarflare 10G
controllers have a 'TX push' feature that allows us to reduce latency
by writing the first new DMA descriptor along with the pointer update.
This is only useful when the queue is empty. The hardware should
ignore the pushed descriptor if the queue is not empty, but this check
is buggy, so we must do it in software.
In order to tell whether a TX queue is empty, we need to compare the
previous transmission count (write_count) and completion count
(read_count). However, if we do that every time we update the ring
pointer then read_count may ping-pong between the caches of two CPUs
running the transmission and completion paths for the queue.
Therefore, we split the check for an empty queue between the
completion path and the transmission path:
- Add an empty_read_count field representing a point at which the
completion path saw the TX queue as empty.
- Add an old_write_count field for use on the completion path.
- On the completion path, whenever read_count reaches or passes
old_write_count the TX queue may be empty. We then read
write_count, set empty_read_count if read_count == write_count,
and update old_write_count.
- On the transmission path, we read empty_read_count. If it's set, we
compare it with the value of write_count before the current set of
descriptors was added. If they match, the queue really is empty and
we can use TX push.
Ben Hutchings [Mon, 6 Dec 2010 22:58:41 +0000 (22:58 +0000)]
sfc: Remove locking from implementation of efx_writeo_paged()
It is not necessary to serialise writes to the paged 128-bit
registers. However, if we don't then we must always write the last
dword separately, not as part of a qword write.
Ben Hutchings [Mon, 6 Dec 2010 22:53:15 +0000 (22:53 +0000)]
sfc: Reorder struct efx_nic to separate fields by volatility
Place the regularly updated fields (locks, MAC stats, etc.) on a
separate cache-line from fields which are mostly constant. This
should reduce cache misses for access to the latter on the data path.
Don Skidmore [Fri, 3 Dec 2010 13:23:30 +0000 (13:23 +0000)]
ixgbe: fix for link failure on SFP+ DA cables
This patch helps prevent FW/SW semaphore collision from leading
to link establishment failure. The collision might mess up the
PHY registers so we reset the PHY. However there are SFI/KR areas
in the PHY that are not reset with a Reset_AN so we need to change
LMS to reset it. Also wait until AN state machine is AN_GOOD
Eric Dumazet [Wed, 1 Dec 2010 20:46:24 +0000 (20:46 +0000)]
filter: add a security check at install time
We added some security checks in commit 57fe93b374a6
(filter: make sure filters dont read uninitialized memory) to close a
potential leak of kernel information to user.
This added a potential extra cost at run time, while we can perform a
check of the filter itself, to make sure a malicious user doesnt try to
abuse us.
This patch adds a check_loads() function, whole unique purpose is to
make this check, allocating a temporary array of mask. We scan the
filter and propagate a bitmask information, telling us if a load M(K) is
allowed because a previous store M(K) is guaranteed. (So that
sk_run_filter() can possibly not read unitialized memory)
Note: this can uncover application bug, denying a filter attach,
previously allowed.
Sathya Perla [Wed, 1 Dec 2010 01:04:17 +0000 (01:04 +0000)]
be2net: Handle out of buffer completions for lancer
If Lancer chip does not have posted RX buffers, it posts an RX completion entry
with the same frag_index as the last valid completion. The Error bit is also
set. In BE, a flush completion is indicated with a zero value for num_rcvd in
the completion.
Such completions don't carry any data and are not processed.
This patch refactors code to handle both cases with the same code.
Changli Gao [Wed, 1 Dec 2010 02:52:20 +0000 (02:52 +0000)]
af_packet: use vmalloc_to_page() instead for the addresss returned by vmalloc()
The following commit causes the pgv->buffer may point to the memory
returned by vmalloc(). And we can't use virt_to_page() for the vmalloc
address.
This patch introduces a new inline function pgv_to_page(), which calls
vmalloc_to_page() for the vmalloc address, and virt_to_page() for the
__get_free_pages address.
We used to increase page pointer to get the next page at the next page
address, after Neil's patch, it is wrong, as the physical address may
be not continuous. This patch also fixes this issue.
Eric Dumazet [Wed, 1 Dec 2010 06:03:06 +0000 (06:03 +0000)]
net: kill an RCU warning in inet_fill_link_af()
commits 9f0f7272 (ipv4: AF_INET link address family) and cf7afbfeb8c
(rtnl: make link af-specific updates atomic) used incorrect
__in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU
warnings.
Switch to __in_dev_get_rtnl(), wich is more appropriate, since we hold
RTNL.
Based on a report and initial patch from Amerigo Wang.
Eric Dumazet [Tue, 30 Nov 2010 21:45:56 +0000 (21:45 +0000)]
filter: add SKF_AD_RXHASH and SKF_AD_CPU
Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism,
to be able to build advanced filters.
This can help spreading packets on several sockets with a fast
selection, after RPS dispatch to N cpus for example, or to catch a
percentage of flows in one queue.
tcpdump -s 500 "cpu = 1" :
[0] ld CPU
[1] jeq #1 jt 2 jf 3
[2] ret #500
[3] ret #0
# take 12.5 % of flows (average)
tcpdump -s 1000 "rxhash & 7 = 2" :
[0] ld RXHASH
[1] and #7
[2] jeq #2 jt 3 jf 4
[3] ret #1000
[4] ret #0
Joe Perches [Tue, 30 Nov 2010 08:18:44 +0000 (08:18 +0000)]
ehea: Use the standard logging functions
Remove ehea_error, ehea_info and ehea_debug macros.
Use pr_fmt, pr_<level>, netdev_<level> and netif_<level> as appropriate.
Fix messages to use trailing "\n", some messages had an extra one
as the old ehea_<level> macros added a trailing "\n".
Coalesced long format strings.
Michał Mirosław [Tue, 30 Nov 2010 06:38:00 +0000 (06:38 +0000)]
net: Fix too optimistic NETIF_F_HW_CSUM features
NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM+NETIF_F_IPV6_CSUM, but
some drivers miss the difference. Fix this and also fix UFO dependency
on checksumming offload as it makes the same mistake in assumptions.
Alexey Orishko [Mon, 29 Nov 2010 23:23:28 +0000 (23:23 +0000)]
USB CDC NCM host driver
The patch provides USB CDC NCM host driver support in the Linux Kernel.
Changes:
drivers/net/usb/cdc_ncm.c:
- initial submission of the CDC NCM host driver;
- verified on Intel 32/64 bit, Intel Atom, ST-Ericsson U8500 (ARM)
- throughput measured over 100 Mbits duplex;
- driver supports 16-bit NTB format only, but it is more than enough for
transfers up to 64K;
- driver can handle up to 32 datagrams in received NTB;
- timer is used to collect several packets in Tx direction
drivers/net/usb/Kconfig:
- a new entry to compile CDC NCM host driver
drivers/net/usb/Makefile:
- a new entry to compile CDC NCM host driver
Alexey Orishko [Mon, 29 Nov 2010 23:23:27 +0000 (23:23 +0000)]
usbnet: changes for upcoming cdc_ncm driver
Changes:
include/linux/usb/usbnet.h:
- a new flag to indicate driver's capability to accumulate IP packets in Tx
direction and extract several packets from single skb in Rx direction.
drivers/net/usb/usbnet.c:
- the procedure of counting packets in usbnet was updated due to the
accumulating of IP packets in the driver
- no short packets are sent if indicated by the flag in driver_info
structure
Matt Carlson [Mon, 6 Dec 2010 08:28:52 +0000 (08:28 +0000)]
tg3: Minor EEE code tweaks
The first hunk of this patch makes sure that the driver checks for the
appropriate preconditions before checking if EEE negotiation succeeded.
More specifically the link needs to be full duplex for EEE to be
enabled.
The second and third hunks of this patch fix a bug where the eee
advertisement register would be programmed with extra bits set.
The fourth hunk of this patch makes sure the EEE capability flag is not
set for 5718 A0 devices and that the device is not a serdes device.
None of these modifications are strictly necessary. The driver /
hardware still does the right thing. They are submitted primarily for
correctness.
Matt Carlson [Mon, 6 Dec 2010 08:28:51 +0000 (08:28 +0000)]
tg3: Fix 57765 EEE support
EEE support in the 57765 internal phy will not enable after a phy reset
unless it sees that EEE is supported in the MAC. This patch moves the
code that programs the CPMU EEE registers to a place before the phy
reset.
Matt Carlson [Mon, 6 Dec 2010 08:28:50 +0000 (08:28 +0000)]
tg3: Move EEE definitions into mdio.h
In commit 52b02d04c801fff51ca49ad033210846d1713253 entitled "tg3: Add
EEE support", Ben Hutchings had commented that the EEE advertisement
register will be in a standard location. This patch moves that
definition into mdio.h and changes the code to use it.
Matt Carlson [Mon, 6 Dec 2010 08:28:49 +0000 (08:28 +0000)]
tg3: Raise the jumbo frame BD flag threshold
The current transmit routines set the jumbo frame BD flag too
aggressively. This can reduce performance for common cases. This patch
raises the jumbo flag threshold to 1518, up from 1500.
Eric Dumazet [Mon, 6 Dec 2010 17:29:43 +0000 (09:29 -0800)]
filter: fix sk_filter rcu handling
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac280c263397
Problem is we can have several clones sharing a common sk_filter, and
these clones might want to sk_filter_attach() their own filters at the
same time, and can overwrite old_filter->rcu, corrupting RCU queues.
We can not use filter->rcu without being sure no other thread could do
the same thing.
Switch code to a more conventional ref-counting technique : Do the
atomic decrement immediately and queue one rcu call back when last
reference is released.
Changli Gao [Mon, 29 Nov 2010 22:48:46 +0000 (22:48 +0000)]
net: don't reallocate skb->head unless the current one hasn't the needed extra size or is shared
skb head being allocated by kmalloc(), it might be larger than what
actually requested because of discrete kmem caches sizes. Before
reallocating a new skb head, check if the current one has the needed
extra size.
Don Skidmore [Fri, 3 Dec 2010 03:31:51 +0000 (03:31 +0000)]
ixgbe: fix link behavior for SFP+ when driver is brought down
We have had several requests to have ifconfig down command disable
the SFP+ laser and thus make link go down. Likewise on ifconfig up
the laser would be enabled and link would come up. This patch enables
that behavior.
Ben Hutchings [Thu, 2 Dec 2010 13:47:01 +0000 (13:47 +0000)]
sfc: Remove broken automatic fallback for invalid Falcon chip/board config
If the Falcon board config is invalid, we cannot proceed - we do not
have a valid board type to pass to falcon_probe_board(), and if we
kluge that to work with an unknown board then other initialisation
code will crash.
Steve Hodgson [Thu, 2 Dec 2010 13:46:55 +0000 (13:46 +0000)]
sfc: Fix event based MCDI completion and MC REBOOT/CMDDONE ordering issue
The mcfw *never* sends CMDDONE when rebooting. Changing this so that it always
sends CMDDONE *before* REBOOT is easy on Siena, but it's not obvious that we
could guarantee to be able to implement this on future hardware.
Given this, I'm less convinced that the protocol should be changed.
To reiterate the failure mode: The driver sees this:
issue command
receive REBOOT event
Was that reboot event sent before the command was issued, or in
response to the command? If the former then there will be a subsequent
CMDDONE event, if the latter, then there will be no CMDDONE event.
Options to resolve this are:
1. REBOOT always completes an outstanding mcdi request, and we set
the credits count to ignore a subsequent CMDDONE event with
mismatching seqno.
2. REBOOT never completes an outstanding mcdi request. If there is
no CMDDONE event then we rely on the mcdi timeout code to complete
the outstanding request, incurring a 10s delay.
I'd argue that (2) is tidier, but that incurring a 10s delay is a little
needless. Let's go with (1).
Ben Hutchings [Thu, 2 Dec 2010 13:46:37 +0000 (13:46 +0000)]
sfc: Clear RXIN_SEL when soft-resetting QT2025C
When we enable PMA/PMD loopback this automatically sets RXIN_SEL
(inverse polarity for RXIN). We need to clear that bit during the
soft-reset sequence, as it is not done automatically.
Ben Hutchings [Thu, 2 Dec 2010 13:46:24 +0000 (13:46 +0000)]
sfc: Distinguish critical and non-critical over-temperature conditions
Set both the 'maximum' and critical temperature limits for LM87
hardware monitors on Falcon boards. Do not shut down a port until the
critical temperature is reached, but warn as soon as the 'maximum'
temperature is reached.
Ben Hutchings [Thu, 2 Dec 2010 13:46:09 +0000 (13:46 +0000)]
sfc: Reduce log level for MCDI error response in efx_mcdi_rpc()
Some errors are expected, e.g. when sending new commands to an MC
running old firmware. Only the caller of efx_mcdi_rpc() can decide
what is a real error. Therefore log the error responses with
netif_dbg().
Allan Stephens [Tue, 30 Nov 2010 12:01:02 +0000 (12:01 +0000)]
tipc: Eliminate obsolete native API forwarding routines
Moves the content of each native API message forwarding routine
into the sole routine that calls it, since the forwarding routines
no longer be called in isolation. Also removes code in each routine
that altered the outgoing message's importance level since this is
now no longer possible.
The previous function mapping (parent function, and child API) was
as follows:
tipc_send2name
\--tipc_forward2name
tipc_send2port
\--tipc_forward2port
tipc_send_buf2port
\--tipc_forward_buf2port
After this commit, the children don't exist and their functionality
is completely in the respective parent.
Allan Stephens [Tue, 30 Nov 2010 12:00:58 +0000 (12:00 +0000)]
tipc: Remove support for TIPC mode change callback
Eliminates support for the callback routine invoked when TIPC
changes its mode of operation from inactive to standalone or from
standalone to networked. This callback was part of TIPC's obsolete
native API and is not used by TIPC internally.
Allan Stephens [Tue, 30 Nov 2010 12:00:57 +0000 (12:00 +0000)]
tipc: Delete useless function prototypes
Removes several function declarations that aren't used anywhere,
either because they reference routines that no longer exist or
because all users of the function reference it after it has already
been defined.
Allan Stephens [Tue, 30 Nov 2010 12:00:54 +0000 (12:00 +0000)]
tipc: Remove obsolete inclusions of header files
Gets rid of #include statements that are no longer required as a
result of the merging of obsolete native API header file content
into other TIPC include files.
Allan Stephens [Tue, 30 Nov 2010 12:00:53 +0000 (12:00 +0000)]
tipc: Remove obsolete native API files and exports
As part of the removal of TIPC's native API support it is no longer
necessary for TIPC to export symbols for routines that can be called
by kernel-based applications, nor for it to have header files that
kernel-based applications can include to access the declarations for
those routines. This commit eliminates the exporting of symbols by
TIPC and migrates the contents of each obsolete native API include
file into its corresponding non-native API equivalent.
The code which was migrated in this commit was migrated intact, in
that there are no technical changes combined with the relocation.
Tomoya [Mon, 29 Nov 2010 18:19:52 +0000 (18:19 +0000)]
can: EG20T PCH: Delete unnecessary spin_lock
Delete unnecessary spin_lock for accessing Message Object.
Since all message objects are divided into tx/rx area completely,
spin_lock processing is unnecessary.
Tomoya [Mon, 29 Nov 2010 18:11:52 +0000 (18:11 +0000)]
can: EG20T PCH: Separate Interface Register(IF1/IF2)
CAN register of Intel PCH EG20T has 2 sets of interface register.
To reduce whole of code size, separate interface register.
As a result, the number of function also can be reduced.
Felix Fietkau [Thu, 2 Dec 2010 09:27:16 +0000 (10:27 +0100)]
ath5k: Add AHB bus support.
AHB specific functions are now in ahb.c file. AHB bus is
compiled in when CONFIG_ATHEROS_AR231X is set in kernel.
All other platforms will use PCI bus.