David S. Miller [Fri, 3 Apr 2015 20:23:58 +0000 (16:23 -0400)]
netfilter: Create and use nf_hook_state.
Instead of passing a large number of arguments down into the nf_hook()
entry points, create a structure which carries this state down through
the hook processing layers.
This makes it so that if we want to change the types or signatures of
any of these pieces of state, there are fewer places that need to be
changed.
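As a rough illustration of the idea (not the kernel's actual C structure), the sketch below bundles per-packet hook state into one object that is passed down unchanged. The field names, the `HookState` class and the `drop_from_eth1` hook are hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

NF_DROP, NF_ACCEPT = 0, 1

@dataclass
class HookState:
    """Hypothetical stand-in for the state carried through hook processing."""
    hook: int                      # which hook point is being run
    pf: int                        # protocol family
    indev: Optional[str]           # input device name, if any
    outdev: Optional[str]          # output device name, if any
    okfn: Callable[[bytes], int]   # continuation called when the packet is accepted

HookFn = Callable[[bytes, HookState], int]

def nf_hook(hooks: List[HookFn], packet: bytes, state: HookState) -> int:
    """Run every hook with the same state object; stop at the first non-accept."""
    for fn in hooks:
        if fn(packet, state) != NF_ACCEPT:
            return -1
    return state.okfn(packet)

def drop_from_eth1(packet: bytes, state: HookState) -> int:
    # A hook only reads the fields it needs; adding a field to HookState
    # does not change this signature.
    return NF_DROP if state.indev == "eth1" else NF_ACCEPT
```

Adding a new piece of state in this model means touching `HookState` and the hooks that use it, not every function signature along the call chain.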
Bluetooth: Fix location of TX power field in LE advertising data
The TX power field in the LE advertising data should be placed last
since it needs to be possible to enable kernel controlled TX power,
but still allow for userspace provided flags field.
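A minimal sketch of the resulting field order, assuming the usual length/type/value layout of advertising data and the standard type codes for Flags (0x01) and TX Power Level (0x0A). The helper names are hypothetical and this is not the kernel's code.

```python
import struct
from typing import Optional

EIR_FLAGS = 0x01
EIR_TX_POWER = 0x0A
ADV_MAX_LEN = 31  # legacy LE advertising data limit

def ad_field(ad_type: int, payload: bytes) -> bytes:
    """Encode one advertising field: length (type + payload), type, payload."""
    return bytes([len(payload) + 1, ad_type]) + payload

def build_adv_data(user_data: bytes, tx_power_dbm: Optional[int]) -> bytes:
    """Keep the user space data (which may carry its own flags field) first
    and append the kernel-controlled TX power field last."""
    out = bytearray(user_data)
    if tx_power_dbm is not None:
        out += ad_field(EIR_TX_POWER, struct.pack("b", tx_power_dbm))
    if len(out) > ADV_MAX_LEN:
        raise ValueError("advertising data too long")
    return bytes(out)
```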
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"A fix for ALPS driver for issue introduced in the latest update and a
tweak for yet another Lenovo box in Synaptics.
There will be more ALPS tweaks coming.."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: define INPUT_PROP_ACCELEROMETER behavior
Input: synaptics - fix min-max quirk value for E440
Input: synaptics - add quirk for Thinkpad E440
Input: ALPS - fix max coordinates for v5 and v7 protocols
Input: add MT_TOOL_PALM
With this patch the kernel will be able to handle the setup request. This is
needed if we would like to handle control messages with extension
headers. User space will only be responsible for reading the setup data and
checking that the scenario conforms to the specification (dst and src device
bnep role). With a new user space, the setup data must be left on the queue
(peek msg). The new bnep session will be responsible for handling this
data.
Bluetooth: bnep: Add support to extended headers of control frames
Handling extended headers of control frames is required BNEP
functionality. This patch refactors the bnep rx frame handling function.
The extended header of control frames shouldn't be omitted as was
previously done. Every control frame should be checked for an
extended header, and then every extension should be parsed separately.
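The per-extension walk described above could look roughly like the following sketch. It assumes the BNEP layout in which the top bit (0x80) of a type byte signals that another extension follows, and each extension header is a type byte, a length byte and a payload. The function name is hypothetical and the caller is expected to invoke it only when the frame's own type byte has the extension bit set.

```python
BNEP_TYPE_MASK = 0x7F
BNEP_EXT_HEADER = 0x80

def parse_extensions(buf: bytes):
    """Parse consecutive extension headers; return (extensions, remaining bytes)."""
    exts = []
    off = 0
    more = True
    while more:
        if len(buf) - off < 2:
            raise ValueError("truncated extension header")
        ext_type = buf[off] & BNEP_TYPE_MASK
        more = bool(buf[off] & BNEP_EXT_HEADER)   # does another extension follow?
        length = buf[off + 1]
        start, end = off + 2, off + 2 + length
        if end > len(buf):
            raise ValueError("extension payload exceeds frame")
        exts.append((ext_type, buf[start:end]))   # each extension handled on its own
        off = end
    return exts, buf[off:]
```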
Bluetooth: bnep: Add support for get bnep features via ioctl
This is needed if user space wants to know which bnep features the kernel
supports, e.g. whether the kernel supports sending a response to a bnep setup
control message. Until now there has been no way to query the features
supported by the kernel in the case of bnep. The ioctls only allow adding a
connection, deleting a connection, getting the connection list and getting
connection info. Adding a connection, if possible (establishing a network
device connection), is equivalent to starting a bnep session. The bnep session
handles the queue of messages transmitted and received over the bnep channel.
This means that once we add a connection, the received/transmitted data will
be parsed immediately. With get bnep features we want to know, before the
session starts, whether we should leave the setup data on the socket queue and
let the kernel handle it or, if setup handling is not supported, whether we
should pull this message and handle the setup response within user space.
David S. Miller [Fri, 3 Apr 2015 19:08:20 +0000 (15:08 -0400)]
Merge branch 'mvneta-sgmii'
Stas Sergeev says:
====================
mvneta: SGMII-based in-band link state signaling
Currently the fixed-link DT binding is pre-configured and
cannot be changed in run-time. This means the cable unplug
events are not being detected, and the link parameters can't
be negotiated.
The following patches are needed when mvneta is used
in fixed-link mode (without MDIO).
They add an API to fixed_phy that allows updating the
status, and use that API in the mvneta driver when parsing
the SGMII in-band status.
There is also another implementation that doesn't add any API
and does everything in mvneta driver locally:
https://lkml.org/lkml/2015/3/31/327
I'll let people decide which approach is better.
No strong opinion on my side.
====================
mvneta: implement SGMII-based in-band link state signaling
When MDIO bus is unavailable (common setup for SGMII), the in-band
signaling must be used to correctly track link state.
This patch enables the in-band status delivery for link state changes, namely:
- link up/down
- link speed
- duplex full/half
fixed_phy_update_state() is used to update phy status.
add fixed_phy_update_state() - update state of fixed_phy
Currently fixed_phy uses a callback to periodically poll the link state.
This patch adds the fixed_phy_update_state() API.
It solves the following problems:
- On link state interrupt, MAC driver can't update status.
Instead it needs to provide the callback to periodically query
the HW about the link state. It is more efficient to update status
after interrupt.
- The callback needs to be unregistered before phy_disconnect(),
or otherwise it will be called with net_dev==NULL. phy_disconnect()
does not have enough info to unregister the callback automatically.
- The callback needs to be registered before of_phy_connect() to
avoid running with outdated state, but of_phy_connect() returns the
phy_device pointer, which is needed to register the callback. Registering
it before of_phy_connect() will therefore require a hack to get the
pointer earlier.
Overall, this addition makes the subsequent patch that implements
SGMII link status for mvneta, much cleaner.
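To illustrate the push-style update (as opposed to a polled callback), here is a small model. The class and field names are hypothetical and do not reproduce the signature of fixed_phy_update_state(); the point is only that the interrupt handler writes the changed fields directly.

```python
from dataclasses import dataclass, replace
from typing import Set

@dataclass(frozen=True)
class LinkStatus:
    link: bool = False
    speed: int = 0
    duplex: bool = False

class FixedPhyModel:
    """Toy model of a fixed PHY whose state is pushed by the MAC driver."""

    def __init__(self, status: LinkStatus):
        self.status = status

    def update_state(self, new: LinkStatus, changed: Set[str]) -> None:
        # Copy only the fields the caller marks as changed, e.g. from a
        # link-state interrupt handler; no periodic polling is involved.
        updates = {name: getattr(new, name) for name in changed}
        self.status = replace(self.status, **updates)
```

Because nothing is registered with the PHY in this model, there is no callback to unregister before disconnecting and no ordering problem around obtaining the phy_device pointer.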
Daniel Borkmann [Fri, 3 Apr 2015 18:52:24 +0000 (20:52 +0200)]
ebpf: add skb->priority to offset map for usage in {cls, act}_bpf
This adds the ability to read out the skb->priority from an eBPF
program, so that it can be taken into account from a tc filter
or action for the use-case where the priority is not being used
to directly override the filter classification in a qdisc, but
to tag traffic otherwise for the classifier; the priority can be
assigned from various places incl. user space, in future we may
also mangle it from an eBPF program.
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a SYSRET single-stepping fix, a dmi-scan robustization
fix, a reboot quirk and a kgdb fixlet"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kgdb/x86: Fix reporting of 'si' in kgdb on x86_64
x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
MAINTAINERS: Change the x86 microcode loader maintainer
firmware: dmi_scan: Prevent dmi_num integer overflow
Alexander Duyck [Tue, 31 Mar 2015 21:19:10 +0000 (14:19 -0700)]
jhash: Update jhash_[321]words functions to use correct initval
Looking over the implementation for jhash2 and comparing it to jhash_3words
I realized that the two hashes were in fact very different. Doing a bit of
digging led me to "The new jhash implementation" in which lookup2 was
supposed to have been replaced with lookup3.
In reviewing the patch I noticed that jhash2 had originally initialized a
and b to JHASH_GOLDENRATIO and c to initval, but after the patch a, b, and
c were initialized to initval + (length << 2) + JHASH_INITVAL. However the
changes in jhash_3words simply replaced the initialization of a and b with
JHASH_INITVAL.
This change corrects what I believe was an oversight so that a, b, and c in
jhash_3words all have the same value added consisting of initval + (length
<< 2) + JHASH_INITVAL so that jhash2 and jhash_3words will now produce the
same hash result given the same inputs.
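The following sketch models the corrected seeding in Python so the equivalence can be checked. The mix and final rounds are written from memory of the lookup3-based jhash and should be treated as an assumption. The equality of the two functions for three words follows from the shared seed regardless of those constants.

```python
MASK = 0xFFFFFFFF
JHASH_INITVAL = 0xDEADBEEF

def rol32(x, k):
    return ((x << k) | (x >> (32 - k))) & MASK

def jhash_mix(a, b, c):
    a = (a - c) & MASK; a ^= rol32(c, 4);  c = (c + b) & MASK
    b = (b - a) & MASK; b ^= rol32(a, 6);  a = (a + c) & MASK
    c = (c - b) & MASK; c ^= rol32(b, 8);  b = (b + a) & MASK
    a = (a - c) & MASK; a ^= rol32(c, 16); c = (c + b) & MASK
    b = (b - a) & MASK; b ^= rol32(a, 19); a = (a + c) & MASK
    c = (c - b) & MASK; c ^= rol32(b, 4);  b = (b + a) & MASK
    return a, b, c

def jhash_final(a, b, c):
    c ^= b; c = (c - rol32(b, 14)) & MASK
    a ^= c; a = (a - rol32(c, 11)) & MASK
    b ^= a; b = (b - rol32(a, 25)) & MASK
    c ^= b; c = (c - rol32(b, 16)) & MASK
    a ^= c; a = (a - rol32(c, 4)) & MASK
    b ^= a; b = (b - rol32(a, 14)) & MASK
    c ^= b; c = (c - rol32(b, 24)) & MASK
    return c

def jhash2(words, initval):
    length = len(words)
    a = b = c = (JHASH_INITVAL + (length << 2) + initval) & MASK
    i = 0
    while length > 3:
        a = (a + words[i]) & MASK
        b = (b + words[i + 1]) & MASK
        c = (c + words[i + 2]) & MASK
        a, b, c = jhash_mix(a, b, c)
        length -= 3
        i += 3
    if length == 0:
        return c
    if length >= 3:
        c = (c + words[i + 2]) & MASK
    if length >= 2:
        b = (b + words[i + 1]) & MASK
    a = (a + words[i]) & MASK
    return jhash_final(a, b, c)

def jhash_3words(a, b, c, initval):
    # Corrected seeding: the same value is added to a, b and c.
    seed = (initval + JHASH_INITVAL + (3 << 2)) & MASK
    return jhash_final((a + seed) & MASK, (b + seed) & MASK, (c + seed) & MASK)

def jhash_3words_old(a, b, c, initval):
    # Seeding as described for the code before this change.
    return jhash_final((a + JHASH_INITVAL) & MASK,
                       (b + JHASH_INITVAL) & MASK,
                       (c + initval) & MASK)
```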
Fixes: 60d509c823cca ("The new jhash implementation")
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Fri, 3 Apr 2015 16:40:50 +0000 (12:40 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-04-03
This series contains updates to i40e and i40evf only.
Anjali provides a fix for verifying outer UDP receive checksum. Also
adds helpful information to display when figuring out the cause of
HMC errors.
Mitch provides a fix to prevent a malicious or buggy VF driver from
sending an invalid index into the VSI array which could panic the host.
Cleans up the code where a function was moved, but the message did
not follow. Adds protection to the VLAN filter list, same as the
MAC filter list, to protect from corruption if the watchdog happens
to run at the same time as a VLAN filter is being added/deleted.
Jesse changes several memcpy() statements to struct assignments which
are type safe and preferable. Fixed a bug when skb allocation fails,
where we should not continue using the skb pointer. Also fixed a void
function in FCoE which should not be returning anything.
Greg fixes both i40e and i40evf to set the Ethernet protocol correctly
when transmit VLAN offloads are disabled.
Shannon fixes up VXLAN messages when ports are added or removed, which
were giving bogus index info. Also aligned the message text style
with other messages in the driver.
====================
Rusty Russell [Fri, 3 Apr 2015 11:47:17 +0000 (22:17 +1030)]
netdevice: document NETDEV_TX_BUSY deprecation.
This paraphrases DaveM (and steals some of his words) explaining why
a device shouldn't return NETDEV_TX_BUSY, even though it looks so inviting
to driver authors.
See http://www.spinics.net/lists/netdev/msg322350.html
Nicolas Dichtel [Fri, 3 Apr 2015 10:02:37 +0000 (12:02 +0200)]
netns: don't allocate an id for dead netns
First, let's explain the problem.
Suppose you have an ipip interface that stands in the netns foo and its link
part in the netns bar (so the netns bar has an nsid into the netns foo).
Now, you remove the netns bar:
- the bar nsid into the netns foo is removed
- the netns exit method of ipip is called, thus our ipip iface is removed:
=> a netlink message is built in the netns foo to advertise this deletion
=> this netlink message requests an nsid for bar, thus a new nsid is
allocated for bar and never removed.
This patch adds a check in peernet2id() so that an id cannot be allocated for
a netns which is currently being destroyed.
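The check can be modelled as follows. This is a toy sketch with hypothetical classes, assuming that a namespace being destroyed is recognisable by a reference count of zero and that -1 stands for "no id assigned".

```python
NETNSA_NSID_NOT_ASSIGNED = -1

class NetNs:
    """Toy network namespace: a refcount plus a table of ids for peer namespaces."""

    def __init__(self, name: str):
        self.name = name
        self.refcount = 1
        self.ids = {}        # peer NetNs -> id
        self.next_id = 0

def peernet2id(net: NetNs, peer: NetNs) -> int:
    if peer in net.ids:
        return net.ids[peer]
    # Do not allocate a fresh id for a namespace that is going away;
    # nothing would ever release it.
    if peer.refcount == 0:
        return NETNSA_NSID_NOT_ASSIGNED
    nsid = net.next_id
    net.next_id += 1
    net.ids[peer] = nsid
    return nsid
```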
David S. Miller [Fri, 3 Apr 2015 16:11:15 +0000 (12:11 -0400)]
Merge branch 'ipv4-null-cmp'
Ian Morris says:
====================
ipv4: coding style - comparisons with NULL
Per the suggestion of Joe Perches, attached is a patch which aligns the
coding style in ipv4 for comparisons with NULL.
The code uses multiple different styles when comparing with NULL (i.e.
x == NULL and !x as well as x != NULL and x). Generally the latter form
is preferred in netdev and so this change aligns the code to this style.
====================
Ian Morris [Fri, 3 Apr 2015 08:17:27 +0000 (09:17 +0100)]
ipv4: coding style: comparison for inequality with NULL
The ipv4 code uses a mixture of coding styles. In some instances check
for non-NULL pointer is done as x != NULL and sometimes as x. x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.
Ian Morris [Fri, 3 Apr 2015 08:17:26 +0000 (09:17 +0100)]
ipv4: coding style: comparison for equality with NULL
The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.
Mitch Williams [Tue, 31 Mar 2015 07:45:05 +0000 (00:45 -0700)]
i40evf: protect VLAN filter list
The MAC filter list is protected by a critical task bit, and the VLAN
list should be protected as well. This prevents list corruption if the
watchdog happens to run at the same time as a VLAN filter is being added
or deleted.
i40e: Communicate VSI id in place of VSI index to the VFs
This does not affect the Virtual channel API as such but it changes the
meaning of what is communicated to the VSI resource struct as vsi_id.
Earlier vsi_idx was being passed in, which was the index in the PF's VSI
array. Now we pass vsi_id as communicated by the FW to the driver.
This will help with future expansion of VF and FW communication.
With this in place now the VF and Virtual channel driver change to move over
to VSI id use is complete and is validated.
Mitch Williams [Tue, 31 Mar 2015 07:45:04 +0000 (00:45 -0700)]
i40e: stop flow director on shutdown
In some cases, the hardware would continue to try to access the FDIR
ring after entering D3Hot state, which would cause either PCIe errors or
NMIs, depending upon system configuration.
Explicitly stop FDIR in our shutdown routine to eliminate this
possibility.
Shannon Nelson [Tue, 31 Mar 2015 07:45:04 +0000 (00:45 -0700)]
i40e: fix up VXLAN messages
When the VXLAN ports are added and removed, the messaging was giving some
bogus index info, the port was always '0' for the delete, and the message
text style didn't match other messages in the driver. Also, there was an
over-use of the ternary operator which made reading a little harder
than necessary.
Greg Rose [Tue, 31 Mar 2015 07:45:03 +0000 (00:45 -0700)]
i40e/i40evf: Set Ethernet protocol correctly when Tx VLAN offloads are disabled
If transmit VLAN HW offloads are disabled then the network stack sends up
an skb with the protocol set to 8021q. In that case to get the correct
checksum offloads we have to reset the skb protocol to the encapsulated
ethertype.
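As an illustration of what "the encapsulated ethertype" refers to, the sketch below reads the protocol from a raw Ethernet frame and looks past a single 802.1Q tag. It works on bytes rather than an skb, and the function name is hypothetical.

```python
import struct

ETH_P_8021Q = 0x8100
ETH_P_IP = 0x0800

def effective_protocol(frame: bytes) -> int:
    """Return the ethertype that checksum offload decisions should be based on."""
    (proto,) = struct.unpack_from("!H", frame, 12)       # after dst + src MAC
    if proto == ETH_P_8021Q:
        # Skip the 4-byte VLAN tag (TPID + TCI) to reach the inner ethertype.
        (proto,) = struct.unpack_from("!H", frame, 16)
    return proto
```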
Mitch Williams [Tue, 31 Mar 2015 07:45:02 +0000 (00:45 -0700)]
i40e: warn at the right time
The call to pci_disable_sriov got moved, but the message about not
disabling VFs didn't move. So move it. While we're at it, reword the
message a bit to make it more consistent with other driver messages.
Jesse Brandeburg [Tue, 31 Mar 2015 07:45:01 +0000 (00:45 -0700)]
i40e/i40evf: fix bug when skb allocation fails
If the skb allocation fails we should not continue using the skb
pointer. Breaking out at the point of failure means that at the next
RX interrupt the driver will try the allocation again.
Mitch Williams [Tue, 31 Mar 2015 07:45:00 +0000 (00:45 -0700)]
i40e: validate VSI param from VFs
Validate that the VF has sent us a valid VSI index before actually using
that index. Without this code, a malicious or buggy VF driver could
panic the host by sending an invalid index into the VSI array.
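A minimal sketch of this kind of validation, using a hypothetical lookup helper over a list standing in for the VSI array:

```python
def get_vsi(vsi_table, vsi_idx):
    """Return the VSI at vsi_idx, or None if the index from the VF is not usable."""
    # Reject out-of-range values before indexing. In Python a negative index
    # would silently wrap around; in C it would read outside the array.
    if not 0 <= vsi_idx < len(vsi_table):
        return None
    return vsi_table[vsi_idx]   # may still be None for an unused slot
```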
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One drm core fix, one exynos regression fix, two sets of radeon fixes
(Alex was a bit behind last week), and two i915 fixes.
Nothing too serious we seem to have calmed down i915 since last week"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr
drm: Exynos: Respect framebuffer pitch for FIMD/Mixer
drm/i915: Reject the colorkey ioctls for primary and cursor planes
drm/i915: Skip allocating shadow batch for 0-length batches
drm/radeon: programm the VCE fw BAR as well
drm/radeon: always dump the ring content if it's available
radeon: Do not directly dereference pointers to BIOS area.
drm/radeon/dpm: fix 120hz handling harder
drm/edid: set ELD for firmware and debugfs override EDIDs
Merge tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux
Pull irqchip fixes from Jason Cooper:
"This is the second round of fixes for irqchip. It contains some fixes
found while the arm64 guys were writing the kvm gicv3 its emulation.
GICv3 ITS:
- Small batch of fixes discovered while writing the kvm ITS emulation"
* tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux:
irqchip: gicv3-its: Use non-cacheable accesses when no shareability
irqchip: gicv3-its: Fix PROP/PEND and BASE/CBASE confusion
irqchip: gicv3-its: Fix device ID encoding
irqchip: gicv3-its: Fix encoding of collection's target redistributor
Dave Airlie [Thu, 2 Apr 2015 23:28:55 +0000 (09:28 +1000)]
Merge branch 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just two small fixes for radeon, both destined for stable.
* 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr
Dave Airlie [Thu, 2 Apr 2015 23:27:48 +0000 (09:27 +1000)]
Merge tag 'drm-intel-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes
One oops fix and a 0-length allocation fix from next, backported.
* tag 'drm-intel-fixes-2015-04-02' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Reject the colorkey ioctls for primary and cursor planes
drm/i915: Skip allocating shadow batch for 0-length batches
Merge tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen regression fixes from David Vrabel:
"Fix two regressions in the balloon driver's use of memory hotplug when
used in a PV guest"
* tag 'stable/for-linus-4.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: before adding hotplugged memory, set frames to invalid
x86/xen: prepare p2m list for memory hotplug
tcp: fix FRTO undo on cumulative ACK of SACKed range
On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.
The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.
The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.
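The corrected condition can be sketched as below. The per-skb flag names and values are assumptions for this example, and the real code tracks more state than this simplified check.

```python
TCPCB_SACKED_ACKED = 0x01     # skb was already SACKed by the receiver
TCPCB_SACKED_RETRANS = 0x02   # skb has been retransmitted

def acks_new_original_data(sacked: int) -> bool:
    """True if a cumulative ACK of this skb newly acknowledges original data."""
    if sacked & TCPCB_SACKED_RETRANS:
        return False
    # The fix: an skb that was already SACKed is not newly acknowledged.
    return not (sacked & TCPCB_SACKED_ACKED)

def frto_may_undo(acked_skbs) -> bool:
    """An FRTO undo is only justified if some cumulatively ACKed skb qualifies."""
    return any(acks_new_original_data(s) for s in acked_skbs)
```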
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma fix from Roland Dreier:
"Fix for exploitable integer overflow in uverbs interface"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
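The log does not show the fix itself. As a general illustration of guarding address arithmetic against wrap-around, the sketch below models 64-bit unsigned arithmetic and a 4096-byte page size, both of which are assumptions, and rejects ranges whose end or page-aligned end overflows.

```python
U64 = (1 << 64) - 1
PAGE_SIZE = 4096

def page_align(x: int) -> int:
    """Round up to a page boundary with 64-bit wrap-around, as C would."""
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1) & U64

def range_is_valid(addr: int, size: int) -> bool:
    end = (addr + size) & U64
    if end < addr:                 # addr + size wrapped
        return False
    if page_align(end) < end:      # rounding up to a page wrapped
        return False
    return True
```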
David S. Miller [Thu, 2 Apr 2015 20:33:43 +0000 (16:33 -0400)]
Merge branch 'mlx5-next'
Eli Cohen says:
====================
mlx5 batch of patches for net-next
This series contains small fixes to the mlx5 core driver and also
preparation steps towards adding Ethernet support for ConnectX4
devices which will be part of mlx5 driver.
====================
Eli Cohen [Thu, 2 Apr 2015 14:07:26 +0000 (17:07 +0300)]
net/mlx5_core: Avoid copying outbox in async command completion
Avoid copying to the output buffer in cmd_exec since this is done after the
command is completed. Failure to do this may cause cases where the callback
handler is called before the copy done by cmd_exec which then overwrites it.
Eli Cohen [Thu, 2 Apr 2015 14:07:25 +0000 (17:07 +0300)]
net/mlx5_core: Use coherent memory for command interface page
Use coherent memory for the commands descriptor page. Take measures to make
sure the page is aligned to MLX5_ADAPTER_PAGE_SIZE as required by the hardware.
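One common way to guarantee such alignment is to over-allocate and round the start address up. The sketch below shows only the arithmetic, assuming a 4096-byte MLX5_ADAPTER_PAGE_SIZE, and is not taken from the driver.

```python
MLX5_ADAPTER_PAGE_SIZE = 4096  # assumed value for this example

def align_up(addr: int, alignment: int = MLX5_ADAPTER_PAGE_SIZE) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

def fits_after_overalloc(base: int, page: int = MLX5_ADAPTER_PAGE_SIZE) -> bool:
    """With 2 * page - 1 bytes allocated at base, an aligned page always fits."""
    aligned = align_up(base, page)
    return aligned + page <= base + 2 * page - 1
```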
David S. Miller [Thu, 2 Apr 2015 20:27:13 +0000 (16:27 -0400)]
Merge branch 'tipc-next'
Jon Maloy says:
====================
tipc: remove some unnecessary complexity
The TIPC code is unnecessarily complex in some places, often because
the conditions or assumptions that were the cause for the complexity
are not valid anymore.
In these three commits, we eliminate some cases of such redundant
complexity.
====================
Jon Paul Maloy [Thu, 2 Apr 2015 13:33:02 +0000 (09:33 -0400)]
tipc: simplify link mtu negotiation
When a link is being established, the two endpoints advertise their
respective interface MTU in the transmitted RESET and ACTIVATE messages.
If there is any difference, the lower of the two MTUs will be selected
for use by both endpoints.
However, as a remnant of earlier attempts to introduce TIPC level
routing, there also exists an MTU discovery mechanism. If an intermediate
node has a lower MTU than the two endpoints, they will discover this
through a bisectional approach, and finally adopt this MTU for common use.
Since there is no TIPC level routing, and probably never will be,
this mechanism doesn't make any sense, and only serves to make the
link level protocol unnecessarily complex.
In this commit, we eliminate the MTU discovery algorithm, and fall back
to the simple MTU advertising approach. This change is fully backwards
compatible.
Jon Paul Maloy [Thu, 2 Apr 2015 13:33:01 +0000 (09:33 -0400)]
tipc: eliminate delayed link deletion at link failover
When a bearer is disabled manually, all its links have to be reset
and deleted. However, if there is a remaining, parallel link ready
to take over a deleted link's traffic, we currently delay the delete
of the removed link until the failover procedure is finished. This
is because the remaining link needs to access state from the reset
link, such as the last received packet number, and any partially
reassembled buffer, in order to perform a successful failover.
In this commit, we do instead move the state data over to the new
link, so that it can fulfill the procedure autonomously, without
accessing any data on the old link. This means that we can now
proceed and delete all pertaining links immediately when a bearer
is disabled. This saves us from some unnecessary complexity in such
situations.
We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
FAILOVER_MSG and SYNCH_MSG respectively.
Jon Paul Maloy [Thu, 2 Apr 2015 13:33:00 +0000 (09:33 -0400)]
tipc: drop tunneled packet duplicates at reception
In commit 8b4ed8634f8b3f9aacfc42b4a872d30c36b9e255
("tipc: eliminate race condition at dual link establishment")
we introduced a parallel link synchronization mechanism that
guarantees sequential delivery even for users switching from
an old to a newly established link. The new mechanism makes it
unnecessary to deliver the tunneled duplicate packets back to
the old link, as we are currently doing. It is now sufficient
to use the last tunneled packet's inner sequence number as
synchronization point between the two parallel links, whereafter
it can be dropped.
In this commit, we drop the duplicate packets arriving on the new
link, after updating the synchronization point at each new arrival.
Although it would now have been sufficient for the other endpoint
to only tunnel the last packet in its send queue, and not the
entire queue, we must still do this to maintain compatibility
with older nodes.
This commit makes it possible to get rid of some complex
interaction between the two parallel links.
A new capability bit was introduced in the past to differentiate devices
using the QoS ETS feature. The old one has been deprecated since then.
If the driver sees a device which sets only the old capability, it will print
a warning to the user suggesting a FW upgrade.
Support granular QoS per VF, by implementing the ndo_set_vf_rate.
Enforce a rate limit per VF when called, and enabled only for VFs in
VST mode with user priority supported by the device.
We don't enforce VFs to be in VST mode at the moment of configuration,
but rather save the given rate limit and enforce it when the VF is
moved to VST with user priority which is supported (currently 0).
VST<->VGT or VST qos value state changes are disallowed when a rate
limit is configured. Minimum BW share is not supported yet.
net/mlx4: Added qos_vport QP configuration in VST mode
The granular QoS per VF feature introduces a new QP field, qos_vport.
The PF administrator can connect VF QPs to a certain QoS Vport, to
inherit its properties. Connecting QPs to the default QoS Vport
(defined as 0) is always allowed, even when there are no allocated VPPs.
At this point, only the default vport is connected to QPs.
Query the port's available VPPs and allocate them on all supported
priorities in equal shares. Allocation is done only in SRIOV mode,
when the feature is supported by the device and port type is Ethernet.
Allocation currently is done only on the default priority 0.
Add the SET_VPORT_QOS device command, which is intended for virtual
granular QoS configuration per VF in SRIOV mode. The SET_VPORT_QOS
command sets and queries QoS parameters of a VPort. Each priority
allowed for a VPort is assigned with a share of the BW, and a BW
limitation. QoS parameters can be modified at any time, but must be
initialized before any QP is associated with the VPort.
Implements device ALLOCATE_VPP command, to be used for granular QoS
configuration of VFs by the PF device. Defines and queries the amount
of VPPs assigned to each port, and the amount of VPPs assigned to each
priority of each port. Once the total VPPs are split between the priorities
of a port, they may be assigned with a share of the BW or a rate limit.
Split into two functions (get/set) which are supplied with
mlx4_alloc_vpp_context and physical port number.
net/mlx4: New file for QoS related firmware commands
Create two new files fw_qos.h and fw_qos.c in mlx4_core module.
It gathers all relevant QoS firmware related commands etc, thus improving
encapsulation of the mlx4_core module. For now it contains the QoS existing
commands: mlx4_SET_PORT_SCHEDULER and mlx4_SET_PORT_PRIO2TC.
net/mlx4: Aesthetic code changes in multi_func_init
Previous vf_oper and vf_admin code created very long lines, making it hard
to read the code. Added relevant in-struct pointers to reduce code
complexity and avoid code lines longer than 80 columns. Same logic is preserved.
net/mlx4_en: Change loopback only upon feature change
Currently any change of netdev features results in a call to
mlx4_en_update_loopback_state(). Those calls are unnecessary,
and should be called only upon loopback feature change.
Also moved some of the logic into mlx4_en_update_loopback_state().
net/mlx4: Add RSS support for fragmented IP datagrams
Enable RSS support for fragmented IP packets, when device supports it.
Until now, fragmented IP packets were directed only to the default_qpn.
Since IP fragments (datagram) have no upper protocols (L3 IP packets),
hash is performed on 3-tuple - dst MAC, source IP and dest IP. The HW
makes sure that this holds for the 1st fragment too, so all fragments
go to the same QP.
Jonathan Davies [Tue, 31 Mar 2015 10:05:15 +0000 (11:05 +0100)]
xen-netfront: transmit fully GSO-sized packets
xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.
Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09d) in determining when to split an skb into two is
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
So the maximum permitted size of an skb is calculated to be
(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.
Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.
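The arithmetic above can be reproduced as follows. MAX_TCP_HEADER depends on kernel configuration; the value 304 used here is an assumption chosen for illustration, and the segment size of 1448 bytes is the one quoted in the text.

```python
XEN_NETIF_MAX_TX_SIZE = 65535
DEFAULT_GSO_MAX_SIZE = 65536
MAX_TCP_HEADER = 304   # assumption: configuration dependent
MSS = 1448             # payload bytes per segment, as quoted above

def max_skb_payload(gso_max_size: int):
    """Apply the size goal sk_gso_max_size - 1 - MAX_TCP_HEADER and round
    down to whole segments; return (segments, bytes)."""
    goal = gso_max_size - 1 - MAX_TCP_HEADER
    segs = goal // MSS
    return segs, segs * MSS

# Before: gso_max_size already had MAX_TCP_HEADER subtracted once.
old = max_skb_payload(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER)
# After: the default gso_max_size is used unchanged.
new = max_skb_payload(DEFAULT_GSO_MAX_SIZE)
```

Under these assumptions `old` comes out as 44 segments (63712 bytes) and `new` as 45 segments (65160 bytes), matching the figures observed via tcpdump.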
Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The TCP conflicts were overlapping changes. In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.
With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.
Bluetooth: Disallow LE local out-of-band data when LE privacy is used
When the LE privacy feature is used, then pairing has to happen based
on resolvable random addresses (RPA), but currently there is no clean
way to retrieve the correct RPA. So instead of returning an outdated
RPA, just disallow this command when LE privacy is in use.