iwlwifi: pcie: implement the overflow queue for Gen2 devices
When we enable TSO, the operation mode can have a lot of packets that
will be pushed to the transport regardless of how full the queue is.
To cope with that, the transport can buffer those packets and add
them to the ring later when there is more room.
The Gen2 devices' code was missing this implementation.
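A minimal userspace sketch of the idea, assuming a fixed-size TX ring plus a bounded overflow FIFO. All names (txq_sketch, txq_enqueue, txq_reclaim) and sizes are hypothetical and do not reflect the actual iwlwifi Gen2 code. Packets are accepted even when the ring is full and are moved into the ring as completions free slots.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4     /* hypothetical ring capacity */
#define OVERFLOW_MAX 16 /* hypothetical bound on the overflow FIFO */

struct txq_sketch {
	int ring[RING_SIZE];
	size_t ring_head, ring_count;  /* oldest in-flight entry, entries used */
	int overflow[OVERFLOW_MAX];
	size_t ovf_head, ovf_count;
};

static void ring_push(struct txq_sketch *q, int pkt)
{
	q->ring[(q->ring_head + q->ring_count) % RING_SIZE] = pkt;
	q->ring_count++;
}

/* Accept a packet regardless of ring fullness. Once anything sits in the
 * overflow FIFO, new packets go there too so ordering is preserved. */
static bool txq_enqueue(struct txq_sketch *q, int pkt)
{
	if (q->ovf_count == 0 && q->ring_count < RING_SIZE) {
		ring_push(q, pkt);
		return true;
	}
	if (q->ovf_count == OVERFLOW_MAX)
		return false; /* a real driver would stop the queue before this */
	q->overflow[(q->ovf_head + q->ovf_count) % OVERFLOW_MAX] = pkt;
	q->ovf_count++;
	return true;
}

/* The device completed 'done' ring entries: free them, then move as many
 * overflow packets as now fit into the ring. */
static void txq_reclaim(struct txq_sketch *q, size_t done)
{
	if (done > q->ring_count)
		done = q->ring_count;
	q->ring_head = (q->ring_head + done) % RING_SIZE;
	q->ring_count -= done;

	while (q->ovf_count > 0 && q->ring_count < RING_SIZE) {
		ring_push(q, q->overflow[q->ovf_head]);
		q->ovf_head = (q->ovf_head + 1) % OVERFLOW_MAX;
		q->ovf_count--;
	}
}
```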
Johannes Berg [Thu, 8 Jun 2017 07:07:11 +0000 (09:07 +0200)]
iwlwifi: mvm: clean up scan capability checks
Introduce and use iwl_mvm_cdb_scan_api(), which checks the family.
Most of this will go away once the 22000 firmware supports adaptive
dwell, after which the V6 scan API won't be used, but the V3 scan
*config* API will still need to be distinguished.
In any case, this gets rid of the completely bogus has_new_tx_api()
checks.
Johannes Berg [Tue, 11 Nov 2014 11:57:03 +0000 (12:57 +0100)]
iwlwifi: mvm: detect U-APSD breaking aggregation
Try to detect that the AP is not using aggregation even when there's
enough traffic to make it worthwhile; if this is the case and U-APSD
is enabled then assume the AP is broken (like so many) and doesn't
enable aggregation when U-APSD is used. In this case, disconnect from
the AP and blacklist U-APSD for a potential new connection to it.
iwlwifi: mvm: BT Coex - make the primary / secondary pick traffic aware
The primary channel is the channel that will be untouched by BT. The
secondary channel might be touched by BT. Hence, we want the primary
to be the most active channel. To do so, use the TCM infrastructure.
Since BT keeps sending notifications, we can rely on them to
trigger the check. Every 10 seconds, we check which context is the
most active and choose the primary accordingly.
We need to wait 10 seconds before we modify the settings because
frequent changes in these settings can confuse BT.
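The rate-limited selection described above can be sketched as follows, assuming each context exposes a single load figure from TCM. The structure and function names are hypothetical; only the 10-second interval comes from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIMARY_CHECK_MS 10000 /* "every 10 seconds" */

struct coex_state_sketch {
	int primary;            /* index of the current primary context */
	uint64_t last_check_ms; /* time of the last evaluation */
};

/* Index of the context with the highest (hypothetical) TCM load; n >= 1. */
static int most_active(const uint32_t *load, size_t n)
{
	int best = 0;
	for (size_t i = 1; i < n; i++)
		if (load[i] > load[best])
			best = (int)i;
	return best;
}

/* Called on every BT notification; only re-evaluates once the interval has
 * elapsed so the setting does not flap. Returns 1 if the primary changed. */
static int coex_maybe_update_primary(struct coex_state_sketch *s,
				     const uint32_t *load, size_t n,
				     uint64_t now_ms)
{
	if (now_ms - s->last_check_ms < PRIMARY_CHECK_MS)
		return 0;
	s->last_check_ms = now_ms;
	int best = most_active(load, n);
	if (best == s->primary)
		return 0;
	s->primary = best;
	return 1;
}
```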
iwlwifi: mvm: use TCM data to decide scan priority
The code for changing the scan priority is already implemented, but
isn't yet in use. Now that TCM data is available, we can base the
scan priority decision on the traffic load.
The traffic condition monitor (TCM) gathers data about the traffic load
and other conditions, which can be used to make decisions regarding
latency, throughput, etc. This patch introduces the code and data structures to
collect this data for future use.
Al Viro [Fri, 20 Apr 2018 02:03:08 +0000 (22:03 -0400)]
Don't leak MNT_INTERNAL away from internal mounts
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies. As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.
Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket
crashes, because
commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
releases the internal clcsock in smc_close_active() and sets smc->clcsock
to NULL.
For SHUT_RD the smc_close_active() call is removed.
For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the
clcsock is already released.
Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
Signed-off-by: Ursula Braun <[email protected]>
Reported-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type
could be greater than the fixed buffer length of 4096 bytes allocated by
the driver. This was causing HWRM_NVM_READ to copy more data to the buffer
than the allocated size, causing a general protection fault.
Fix the issue by allocating the exact buffer length returned by
HWRM_NVM_FIND_DIR_ENTRY, instead of 4096. Move the kzalloc() call
into the bnxt_get_pkgver() function.
Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO")
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
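A userspace sketch of the fix pattern: size the buffer from the length the directory lookup reports instead of a fixed 4096 bytes. The two firmware calls are replaced by hypothetical stand-ins (find_dir_entry_len, nvm_read), and calloc() plays the role of kzalloc().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HWRM_NVM_FIND_DIR_ENTRY: reports the item length. */
static int find_dir_entry_len(size_t image_len, size_t *len_out)
{
	*len_out = image_len;
	return 0;
}

/* Hypothetical stand-in for HWRM_NVM_READ: copies 'len' bytes from NVRAM. */
static void nvm_read(const unsigned char *nvram, unsigned char *buf, size_t len)
{
	memcpy(buf, nvram, len);
}

/* Allocate exactly the reported length, so a large PKG_LOG item can never
 * overrun the buffer. The caller frees the result. */
static unsigned char *read_pkg_log(const unsigned char *nvram, size_t image_len,
				   size_t *len_out)
{
	size_t len;

	if (find_dir_entry_len(image_len, &len) != 0 || len == 0)
		return NULL;
	unsigned char *buf = calloc(len, 1); /* zeroed, like kzalloc() */
	if (!buf)
		return NULL;
	nvm_read(nvram, buf, len);
	*len_out = len;
	return buf;
}
```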
Programming vids (adding or removing them) still passes
guest-endian values in the DMA buffer. That's wrong
if the guest is big-endian and virtio 1 is enabled.
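To make the endianness rule concrete, here is a userspace sketch of how a 16-bit VLAN id should be laid out in the buffer: little-endian for virtio 1 devices, guest-native for legacy devices. This mirrors the idea behind the kernel's cpu_to_virtio16(), but put_virtio16() itself is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store 'val' in the byte order the device expects. */
static void put_virtio16(uint8_t out[2], uint16_t val, bool virtio_1,
			 bool guest_big_endian)
{
	bool little = virtio_1 || !guest_big_endian;

	if (little) {
		out[0] = val & 0xff;
		out[1] = val >> 8;
	} else {
		out[0] = val >> 8;
		out[1] = val & 0xff;
	}
}
```

Passing guest-endian values unconditionally is only correct on little-endian guests or legacy devices, which is exactly the case the patch addresses.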
Note: this is on top of a previous patch:
virtio_net: split out ctrl buffer
When sending control commands, virtio net sets up several buffers for
DMA. The buffers are all part of the net device, which is allocated by
kvmalloc, so in theory (under extreme memory pressure) it is possible to
get a vmalloc'ed buffer, which on some platforms means we can't DMA to it.
Fix up by moving the DMA buffers into a separate structure.
dann frazier [Thu, 19 Apr 2018 03:55:41 +0000 (21:55 -0600)]
net: hns: Avoid action name truncation
When longer interface names are used, the action names exposed in
/proc/interrupts and /proc/irq/* may be truncated. For example, when
using the predictable name algorithm in systemd on a HiSilicon D05,
I see:
David S. Miller [Thu, 19 Apr 2018 20:11:12 +0000 (16:11 -0400)]
Merge branch 'Amiga-xsurf100'
Michael Schmitz says:
====================
New network driver for Amiga X-Surf 100 (m68k)
[This is a resend of my v3 series which was based on the wrong version and
tree. The only substantial change is to the Asix AX88796B PHY driver.]
This patch series adds support for the Individual Computers X-Surf 100
network card for m68k Amiga, a network adapter based on the AX88796 chip set.
The driver was originally written for kernel version 3.19 by Michael Karcher
(see CC:), and adapted to 4.16+ for submission to netdev by me. Questions
regarding motivation for some of the changes are probably best directed at
Michael Karcher.
The driver has been tested by Adrian <[email protected]> who will
send his Tested-by tag separately.
A few changes to the ax88796 driver were required:
- to read the MAC address, some setup of the ax88796 chip must be done,
- attach to the MII bus only on device open to allow module unloading,
- allow superseding ax_block_input/ax_block_output with card-specific
optimized code,
- use an optional interrupt status callback to allow easier sharing of the
card interrupt,
- set IRQF_SHARED if platform IRQ resource is marked shareable
The Asix Electronics PHY used on the X-Surf 100 is buggy, and causes the
software reset to hang if the previous command sent to the PHY was also
a soft reset. This bug requires addition of a PHY driver for Asix PHYs
to provide a fixed .soft_reset function, included in this series.
Some additional cleanup:
- do not attempt to free IRQ in ax_remove (complements 82533ad9a1c),
- clear platform drvdata on probe fail and module remove.
Changes since v1:
Raised in review by Andrew Lunn:
- move MII code around to avoid need for forward declaration,
- combine patches 2 and 7 to add cleanup in error path
Changes since v2:
- corrected authorship attribution to Michael Karcher
Suggested by Geert Uytterhoeven:
- use ei_local->reset_8390() instead of duplicating ax_reset_8390(),
- use %pR to format struct resource pointers,
- assign pdev and xs100 pointers in declaration,
- don't split error messages,
- change Kconfig logic to only require XSURF100 set on Amiga
Suggested by Andrew Lunn:
- add COMPILE_TEST to ax88796 Kconfig options,
- use new Asix PHY driver for X-Surf 100
Suggested by Andrew Lunn/Finn Thain:
- declare struct sk_buff in ax88796.h,
- correct whitespace error in ax88796.h
Changes since v3:
- various checkpatch cleanup
Andrew Lunn:
- don't duplicate genphy_soft_reset in Asix PHY driver, just call
genphy_soft_reset after writing zero to control register
====================
Michael Karcher [Thu, 19 Apr 2018 02:05:26 +0000 (14:05 +1200)]
net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)
Add platform device driver to populate the ax88796 platform data from
information provided by the XSurf100 zorro device driver. The ax88796
module will be loaded through this module's probe function.
Michael Schmitz [Thu, 19 Apr 2018 02:05:25 +0000 (14:05 +1200)]
net-next: ax88796: release platform device drvdata on probe error and module remove
The net device struct pointer is stored as platform device drvdata on
module probe - clear the drvdata entry on probe fail there, as well as
when unloading the module.
Michael Karcher [Thu, 19 Apr 2018 02:05:23 +0000 (14:05 +1200)]
net-next: ax88796: add interrupt status callback to platform data
To be able to tell the ax88796 driver whether it is sensible to enter
the 8390 interrupt handler, an "is this interrupt caused by the 88796"
callback has been added to the ax_plat_data structure (with NULL being
compatible with the previous behaviour).
Michael Karcher [Thu, 19 Apr 2018 02:05:22 +0000 (14:05 +1200)]
net-next: ax88796: Add block_input/output hooks to ax_plat_data
Add platform specific hooks for block transfer reads/writes of packet
buffer data, superseding the default provided ax_block_input/output.
Currently used for m68k Amiga XSurf100.
Michael Karcher [Thu, 19 Apr 2018 02:05:21 +0000 (14:05 +1200)]
net-next: ax88796: Do not free IRQ in ax_remove() (already freed in ax_close()).
This complements the fix in 82533ad9a1c ("net: ethernet: ax88796:
don't call free_irq without request_irq first") that removed the
free_irq call in the error path of probe, to also not call free_irq
when remove is called to revert the effects of probe.
Fixes: 82533ad9a1c ("net: ethernet: ax88796: don't call free_irq without request_irq first")
Signed-off-by: Michael Karcher <[email protected]>
Signed-off-by: Michael Schmitz <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Michael Schmitz [Thu, 19 Apr 2018 02:05:18 +0000 (14:05 +1200)]
net-next: phy: new Asix Electronics PHY driver
The Asix Electronics PHY found on the X-Surf 100 Amiga Zorro network
card by Individual Computers is buggy, and needs the reset bit toggled
as workaround to make a PHY soft reset succeed.
Add workaround driver just for this special case.
Suggested in xsurf100 patch series review by Andrew Lunn <[email protected]>
David S. Miller [Thu, 19 Apr 2018 19:59:12 +0000 (15:59 -0400)]
Merge branch 'Modernize-mdio-gpio'
Andrew Lunn says:
====================
Modernize mdio-gpio
This patchset is inspired by a previous version by Linus Walleij.
It reworks the mdio-gpio code to make use of gpio descriptors instead
of gpio numbers. Unlike the previous version, however, it retains
support for platform devices. It does remove the platform_data
header file. The needed GPIOs are now passed by making use of a gpiod
lookup table, e.g.:
Andrew Lunn [Wed, 18 Apr 2018 23:02:58 +0000 (01:02 +0200)]
net: phy: mdio-gpio: Add #defines for the GPIO indices
The GPIOs are described in device tree using a list, without names.
Add defines to indicate what each index in the list means. These
defines should also be used by platform devices passing GPIOs via a
GPIO lookup table.
Andrew Lunn [Wed, 18 Apr 2018 23:02:57 +0000 (01:02 +0200)]
net: phy: mdio-gpio: Parse properties directly into bitbang structure
The same parsing code can be used for both OF and platform devices, if
the platform device uses a gpiod_lookup_table. Parse these properties
directly into the bitbang structure, rather than use an intermediate
platform data structure.
Andrew Lunn [Wed, 18 Apr 2018 23:02:54 +0000 (01:02 +0200)]
net: phy: mdio-gpio: Remove support for IRQs in platform data
No current devices use IRQs in platform data, so remove support for
it. The MDIO core will also initialise the new bus such that all
addresses are polled, so remove the unneeded re-initialisation.
Daniel Borkmann [Thu, 19 Apr 2018 19:48:19 +0000 (21:48 +0200)]
Merge branch 'bpf-type-format'
Martin KaFai Lau says:
====================
This patch introduces BPF Type Format (BTF).
BTF (BPF Type Format) is the metadata format which describes
the data types of a BPF program/map. It focuses on the C
programming language, which modern BPF primarily uses.
The first use case is to provide a generic pretty print
capability for a BPF map.
A modified pahole that can convert dwarf to BTF is here:
https://github.com/iamkafai/pahole/tree/btf
Please see individual patch for details.
v5:
- Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
currently used. They can be added in the future.
Some bpf_df_xxx() functions are removed along with them.
- Add comment in patch 7 to clarify that the new bpffs_map_fops
should not be extended further.
v4:
- Fix warning (remove unneeded semicolon)
- Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
patch 1. Caught by W=1.
v3:
- Rebase to bpf-next
- Fix sparse warning (by adding static)
- Add BTF header logging: btf_verifier_log_hdr()
- Fix the alignment test on btf->type_off
- Add tests for the BTF header
- Lower the max BTF size to 16MB. It should be enough
for some time. We could raise it later if needed.
v2:
- Use kvfree where needed in patch 1 and 2
- Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
in patch 1
- Fix an incorrect goto target in map_create() during
the btf-error-path in patch 7
- re-org some local vars to keep the rev xmas tree in btf.c
====================
Martin KaFai Lau [Wed, 18 Apr 2018 22:56:06 +0000 (15:56 -0700)]
bpf: btf: Add BTF tests
This patch tests the BTF loading, map_create with BTF
and the changes in libbpf.
-r: Raw tests that test raw crafted BTF data
-f: Test LLVM compiled bpf prog with BTF data
-g: Test BPF_OBJ_GET_INFO_BY_FD for btf_fd
-p: Test pretty print
The tools/testing/selftests/bpf/Makefile will probe
for BTF support in llc and pahole before generating
debug info (-g) and converting it to BTF. You can supply
the BTF-supported binaries through the following make variables:
LLC, BTF_PAHOLE and LLVM_OBJCOPY.
LLC: The latest llc with -mattr=dwarfris support for the bpf target.
It is only in the master of the llvm repo for now.
BTF_PAHOLE: The modified pahole with BTF support:
https://github.com/iamkafai/pahole/tree/btf
To add a BTF section: "pahole -J bpf_prog.o"
LLVM_OBJCOPY: Any llvm-objcopy should do
Martin KaFai Lau [Wed, 18 Apr 2018 22:56:05 +0000 (15:56 -0700)]
bpf: btf: Add BTF support to libbpf
If the ".BTF" elf section exists, libbpf will try to create
a btf_fd (through BPF_BTF_LOAD). If that fails, it will still
continue loading the bpf prog/map without the BTF.
If the bpf_object has a BTF loaded, it will create a map with the btf_fd.
libbpf will try to figure out the btf_key_id and btf_value_id of a map by
finding the BTF type with name "<map_name>_key" and "<map_name>_value".
If they cannot be found, it will continue without using the BTF.
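The name-based lookup can be sketched in userspace as follows. The flat name table (btf_names_sketch) is a hypothetical simplification of parsed BTF, not libbpf's internal representation; the "<map_name>_key"/"<map_name>_value" convention and the fall-back-without-BTF behavior come from the text.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened BTF view: type names indexed by id (0 unused). */
struct btf_names_sketch {
	const char *const *names;
	size_t nr;
};

/* Return the id of the type called 'name', or 0 if there is none. */
static unsigned int find_type_id(const struct btf_names_sketch *btf,
				 const char *name)
{
	for (size_t id = 1; id < btf->nr; id++)
		if (btf->names[id] && strcmp(btf->names[id], name) == 0)
			return (unsigned int)id;
	return 0;
}

/* Look up "<map_name>_key" and "<map_name>_value". Returns 0 on success,
 * -1 if either is missing (the map is then created without BTF). */
static int map_btf_ids(const struct btf_names_sketch *btf, const char *map_name,
		       unsigned int *key_id, unsigned int *value_id)
{
	char buf[256];

	if (snprintf(buf, sizeof(buf), "%s_key", map_name) >= (int)sizeof(buf))
		return -1;
	*key_id = find_type_id(btf, buf);
	if (snprintf(buf, sizeof(buf), "%s_value", map_name) >= (int)sizeof(buf))
		return -1;
	*value_id = find_type_id(btf, buf);
	return (*key_id && *value_id) ? 0 : -1;
}
```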
Martin KaFai Lau [Wed, 18 Apr 2018 22:56:03 +0000 (15:56 -0700)]
bpf: btf: Add pretty print support to the basic arraymap
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.
This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id. The
BPF_MAP_CREATE can then associate the btf to the map if
the map being created supports BTF.
A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf(). This patch has
implemented these new map ops for the basic arraymap.
It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.
bpffs_map_fops should not be extended further to support
other operations. Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.
Here is a sample output when reading a pinned arraymap
with the following map's value:
Martin KaFai Lau [Wed, 18 Apr 2018 22:56:02 +0000 (15:56 -0700)]
bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
The original BTF data, which was used to create the BTF fd during
the earlier BPF_BTF_LOAD call, will be returned.
Userspace is expected to allocate a buffer, point info.info
at it and set info.info_len to the buffer size before
calling BPF_OBJ_GET_INFO_BY_FD.
The original BTF data is copied to the userspace buffer (info.info).
Only up to the user's specified info.info_len will be copied.
The original BTF data size is set to info.info_len. The userspace
needs to check if it is bigger than its allocated buffer size.
If it is, the userspace should realloc with the kernel-returned
info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again.
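The grow-and-retry protocol above can be sketched in userspace. The kernel side is replaced by a hypothetical get_btf_info_sketch() that follows the described contract (copy at most info_len bytes, then report the full size in info_len); it is not the real bpf() syscall wrapper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BPF_OBJ_GET_INFO_BY_FD on a BTF fd. */
static int get_btf_info_sketch(const void *btf, uint32_t btf_size,
			       void *info, uint32_t *info_len)
{
	uint32_t n = *info_len < btf_size ? *info_len : btf_size;

	memcpy(info, btf, n);
	*info_len = btf_size; /* always report the full size */
	return 0;
}

/* Start with a small guess; if the reported size is larger, realloc to
 * that size and ask again. Returns a malloc'ed copy or NULL. */
static void *fetch_btf(const void *btf, uint32_t btf_size, uint32_t *out_len)
{
	uint32_t cap = 16; /* deliberately small initial guess */
	void *buf = malloc(cap);

	while (buf) {
		uint32_t len = cap;

		if (get_btf_info_sketch(btf, btf_size, buf, &len) != 0) {
			free(buf);
			return NULL;
		}
		if (len <= cap) { /* everything fit */
			*out_len = len;
			return buf;
		}
		void *bigger = realloc(buf, len);
		if (!bigger) {
			free(buf);
			return NULL;
		}
		buf = bigger;
		cap = len;
	}
	return NULL;
}
```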
Martin KaFai Lau [Wed, 18 Apr 2018 22:56:01 +0000 (15:56 -0700)]
bpf: btf: Add BPF_BTF_LOAD command
This patch adds a BPF_BTF_LOAD command which
1) loads and verifies the BTF (implemented in earlier patches)
2) returns a BTF fd to userspace. In the next patch, the
BTF fd can be specified during BPF_MAP_CREATE.
Martin KaFai Lau [Wed, 18 Apr 2018 22:55:59 +0000 (15:55 -0700)]
bpf: btf: Check members of struct/union
This patch checks a few properties of a struct's members:
1) It has a valid size (e.g. a "const void" is invalid)
2) A member's size (+ its member's offset) does not exceed
the containing struct's size.
3) The member's offset satisfies the alignment requirement
The above can only be done after the needs_resolve member's type
is resolved. Hence, the above is done together in
btf_struct_resolve().
Each possible member's type (e.g. int, enum, modifier...) implements
the check_member() ops which will be called from btf_struct_resolve().
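A simplified sketch of the three member checks, assuming each member's type has already been resolved to a byte size and a required alignment. The member_sketch layout is hypothetical, and bitfield members (which the kernel handles for int types) are simply rejected here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct member_sketch {
	uint32_t offset_bits; /* BTF stores member offsets in bits */
	uint32_t size;        /* resolved size of the member type, 0 = invalid */
	uint32_t align;       /* required alignment in bytes, >= 1 */
};

static bool check_struct_members(const struct member_sketch *m, size_t n,
				 uint32_t struct_size)
{
	for (size_t i = 0; i < n; i++) {
		if (m[i].size == 0)                /* 1) e.g. "const void" */
			return false;
		if (m[i].offset_bits % 8)          /* sketch: no bitfields */
			return false;
		uint64_t off = m[i].offset_bits / 8;
		if (off + m[i].size > struct_size) /* 2) fits in the struct */
			return false;
		if (off % m[i].align)              /* 3) alignment */
			return false;
	}
	return true;
}
```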
Martin KaFai Lau [Wed, 18 Apr 2018 22:55:58 +0000 (15:55 -0700)]
bpf: btf: Validate type reference
After collecting all btf_type in the first pass in an earlier patch,
the second pass (in this patch) can validate the reference types
(e.g. the referring type does exist and it does not refer to itself).
While checking the reference type, it also gathers other information (e.g.
the size of an array). This info will be useful in checking the
struct's members in a later patch. They will also be useful in doing
pretty print later.
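A sketch of the second-pass reference check, simplified to single-reference chains (modifiers such as const or typedef). The type table is hypothetical, and a hop bound stands in for the kernel's resolve stack: a chain longer than the table must revisit a type, which means a loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Either a sized base type (e.g. int) or a reference to another index. */
struct type_sketch {
	bool is_ref;
	uint32_t ref;  /* valid when is_ref */
	uint32_t size; /* valid when !is_ref */
};

/* Follow references from 'id'. Fails on a dangling reference or a loop
 * (including self-reference); otherwise reports the resolved size. */
static bool resolve_type(const struct type_sketch *t, size_t nr, uint32_t id,
			 uint32_t *size_out)
{
	for (size_t steps = 0; steps <= nr; steps++) {
		if (id >= nr)
			return false;   /* referred type does not exist */
		if (!t[id].is_ref) {
			*size_out = t[id].size;
			return true;
		}
		id = t[id].ref;
	}
	return false;                   /* more hops than types: a loop */
}
```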
Martin KaFai Lau [Wed, 18 Apr 2018 22:55:57 +0000 (15:55 -0700)]
bpf: btf: Introduce BPF Type Format (BTF)
This patch introduces BPF Type Format (BTF).
BTF (BPF Type Format) is the metadata format which describes
the data types of a BPF program/map. It focuses on the C
programming language, which modern BPF primarily uses.
The first use case is to provide a generic pretty print
capability for a BPF map.
BTF has its roots in CTF (Compact C Type Format). To simplify
the handling of BTF data, BTF removes the differences between
small and big type/struct-member. Hence, BTF consistently uses u32
instead of supporting both "one u16" and "two u32 (+padding)" in
describing type and struct-member.
It also raises the limit on the number of types (and functions)
from 0x7fff to 0x7fffffff.
Due to the above changes, the format is not compatible with CTF.
Hence, BTF starts with a new BTF_MAGIC and version number.
This patch does the first verification pass on the BTF. The first
pass checks:
1. meta-data size (e.g. it does not go beyond the total BTF size)
2. name_offset is valid
3. Each BTF_KIND (e.g. int, enum, struct....) does its
own check of its meta-data.
Some other checks, like checking a struct's member is referring
to a valid type, can only be done in the second pass. The second
verification pass will be implemented in the next patch.
David Ahern [Wed, 18 Apr 2018 22:39:04 +0000 (15:39 -0700)]
net/ipv6: Remove compare of fib6_idev from rt6_duplicate_nexthop
After 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") the comparison of idev does not add value since it
correlates to the nexthop device which is already compared. Remove
the idev comparison.
David Ahern [Wed, 18 Apr 2018 22:39:03 +0000 (15:39 -0700)]
net/ipv6: Change ip6_route_get_saddr to get dev from route
Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.
Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change ip6_route_get_saddr can just pass the nexthop
device to ipv6_dev_get_saddr.
David Ahern [Wed, 18 Apr 2018 22:39:02 +0000 (15:39 -0700)]
net/ipv6: Remove unnecessary checks on fib6_idev
Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.
Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change a couple of device checks during route lookups
are no longer needed. Remove them.
David Ahern [Wed, 18 Apr 2018 22:38:59 +0000 (15:38 -0700)]
net/ipv6: Rename fib6_info struct elements
Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
rt6i_pcpu and rt6i_exception_bucket are left as is given that they
point to rt6_info entries.
vmxnet3: fix incorrect dereference when rxvlan is disabled
vmxnet3_get_hdr_len() is used to calculate the header length which in
turn is used to calculate the gso_size for skb. When rxvlan offload is
disabled, the vlan tag is present in the header, but the function locates
the ip header at sizeof(ethhdr), which leads to an incorrect pointer.
This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
if a vlan tag is present, so that the ip hdr is referenced correctly.
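A userspace sketch of the corrected offset computation. The constants mirror the usual Ethernet values (14-byte ethhdr, 18-byte vlan_ethhdr, 0x8100 for 802.1Q) but carry a _SKETCH suffix because this is not the vmxnet3 code; only a single 802.1Q tag is handled.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SKETCH      14     /* sizeof(struct ethhdr) */
#define VLAN_ETH_HLEN_SKETCH 18     /* sizeof(struct vlan_ethhdr) */
#define ETH_P_8021Q_SKETCH   0x8100

/* Offset of the IP header, skipping one 802.1Q tag when the NIC left it
 * in the frame (rxvlan offload disabled). Returns 0 if the frame is short. */
static size_t ip_header_offset(const uint8_t *frame, size_t len)
{
	if (len < ETH_HLEN_SKETCH)
		return 0;
	uint16_t proto = (uint16_t)(frame[12] << 8 | frame[13]);
	if (proto == ETH_P_8021Q_SKETCH)
		return len >= VLAN_ETH_HLEN_SKETCH ? VLAN_ETH_HLEN_SKETCH : 0;
	return ETH_HLEN_SKETCH;
}
```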
llc->sap is refcount'ed and llc_sap_remove_socket() is paired
with llc_sap_add_socket(). This can be amended by holding its refcount
before llc_sap_remove_socket() and releasing it after release_sock().
Eric Dumazet [Wed, 18 Apr 2018 18:43:15 +0000 (11:43 -0700)]
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
After working on IP defragmentation lately, I found that some large
packets defeat the CHECKSUM_COMPLETE optimization because the NIC adds
zero padding to the last (small) fragment.
While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.
We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.
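The arithmetic behind this can be sketched with a plain 16-bit ones' complement sum (RFC 1071 style): the sum over the kept bytes equals the full sum minus the sum over the trimmed tail. This is a userspace illustration, not the kernel's csum helpers. Bytes are weighted by their position in the whole packet, so an odd trim point is handled too.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum of pkt[start, end); even offsets are the high byte
 * of a 16-bit word, odd offsets the low byte. */
static uint16_t csum_range(const uint8_t *pkt, size_t start, size_t end)
{
	uint32_t sum = 0;

	for (size_t i = start; i < end; i++) {
		sum += (i % 2 == 0) ? (uint32_t)pkt[i] << 8 : pkt[i];
		sum = (sum & 0xffff) + (sum >> 16); /* end-around carry */
	}
	return csum_fold32(sum);
}

/* Ones' complement subtraction: a - b == a + ~b with end-around carry. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
	return csum_fold32((uint32_t)a + (uint16_t)~b);
}

/* Keep the device-provided sum valid when trimming to new_len, instead
 * of falling back to a full software checksum. */
static uint16_t csum_after_trim(uint16_t full_csum, const uint8_t *pkt,
				size_t new_len, size_t old_len)
{
	return csum_sub16(full_csum, csum_range(pkt, new_len, old_len));
}
```

Only the trimmed tail is summed, which is the cheap part when the padding is small.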
Jose Abreu [Wed, 18 Apr 2018 09:57:55 +0000 (10:57 +0100)]
net: stmmac: Disable ACS Feature for GMAC >= 4
ACS Feature is currently enabled for GMAC >= 4 but the llc_snap status
is never checked in the descriptor rx_status callback. This will cause
stmmac to always strip packets even though the ACS feature is already
stripping them.
Let's be safe and disable the ACS feature for GMAC >= 4 and always strip
the packets for this GMAC version.
PPv2 TX/RX descriptors use 40-bit DMA addresses, but 41-bit masks were
used (GENMASK_ULL(40, 0)).
This commit fixes that by using the correct mask.
Fixes: e7c5359f2eed ("net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors")
Signed-off-by: Maxime Chevallier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
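The off-by-one is easy to see with a userspace restatement of GENMASK_ULL(h, l), which sets bits l through h inclusive. The macro and mask names below are hypothetical stand-ins, not the mvpp2 definitions.

```c
#include <stdint.h>

/* Bits l..h set, like the kernel's GENMASK_ULL(h, l). */
#define GENMASK_ULL_SKETCH(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define DESC_DMA_MASK_OLD GENMASK_ULL_SKETCH(40, 0) /* 41 bits: the bug */
#define DESC_DMA_MASK     GENMASK_ULL_SKETCH(39, 0) /* 40 bits: correct */

/* Extract a 40-bit DMA address from a raw descriptor field. */
static uint64_t desc_dma_addr(uint64_t raw)
{
	return raw & DESC_DMA_MASK;
}
```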
====================
tracking TCP data delivery and ECN stats
This patch series improves tracking of the data delivery status:
1. minor improvement on SYN data
2. accounting bytes delivered with CE marks
3. exporting the delivery stats to applications
so that users can get a better sense of TCP performance at the per-host,
per-connection, and even per-application-message level.
====================
Export data delivered and delivered with CE marks to
1) SNMP TCPDelivered and TCPDeliveredCE
2) getsockopt(TCP_INFO)
3) Timestamping API SOF_TIMESTAMPING_OPT_STATS
Note that for SCM_TSTAMP_ACK, the delivery info in
SOF_TIMESTAMPING_OPT_STATS is reported before the info
was fully updated on the ACK.
These stats help applications monitor TCP delivery and ECN status
at the per-host, per-connection, and even per-message level.
tcp: track total bytes delivered with ECN CE marks
Introduce a new delivered_ce stat in the tcp socket to estimate the
number of packets marked with CE bits. The estimation is
done via ACKs with the ECE bit. Depending on the actual receiver
behavior, the estimation could have biases.
Since the TCP sender can't really see the CE bit in the data path,
the sender is technically counting packets marked delivered with
the "ECE / ECN-Echo" flag set.
With RFC3168 ECN, because the ECE bit is sticky, this count can
drastically overestimate the number of CE-marked data packets.
With DCTCP-style ECN this should be reasonably precise unless there
is loss in the ACK path, in which case it's not precise.
With the AccECN proposal this can be made still more precise, even in
the case of some degree of ACK loss.
However, this is the sender's best estimate of CE information.
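A minimal sketch of the per-ACK accounting described above, assuming the sender knows how many packets each ACK newly delivers. The struct and function names are hypothetical, not the tcp_sock fields or kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

struct tcp_delivery_sketch {
	uint32_t delivered;    /* packets delivered (acked) */
	uint32_t delivered_ce; /* of those, acked by ACKs carrying ECE */
};

/* The sender cannot see CE on the data itself, so every packet newly
 * acked by an ECE-flagged ACK is attributed to CE. */
static void on_ack_sketch(struct tcp_delivery_sketch *s,
			  uint32_t newly_acked, bool ece)
{
	s->delivered += newly_acked;
	if (ece)
		s->delivered_ce += newly_acked;
}
```

With sticky RFC3168 ECE, consecutive ACKs keep the flag set, so every packet they cover is counted; this is where the overestimate comes from.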
tcp: better delivery accounting for SYN-ACK and SYN-data
tcp_sock:delivered has inconsistent accounting for SYN and FIN:
1. it counts pure FIN
2. it counts pure SYN
3. it counts SYN-data twice
4. it does not count SYN-ACK
From a congestion control perspective it does not matter much, as C.C. only
cares about the difference, not the absolute value. But the next patch
exports this field to user-space, so it's better to report the absolute
value w/o these caveats.
This patch counts SYN, SYN-ACK, or SYN-data delivery once always in
the "delivered" field.
Commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on
page reuse") tried to allow user/bpf_prog to (re)use the area used by
xdp_frame (stored in frame headroom), by memset-clearing that area when
bpf_xdp_adjust_head gives bpf_prog access to the headroom area.
The mentioned commit had two bugs: (1) it didn't take bpf_xdp_adjust_meta
into account; (2) a combination of bpf_xdp_adjust_head calls, where
xdp->data is moved into the xdp_frame section, can cause the xdp_frame
area to be cleared again for an area previously granted to bpf_prog.
After discussions with Daniel, we chose to implement a simpler
solution to the problem, which is to reserve the headroom used by
xdp_frame info.
This also avoids the situation where bpf_prog is allowed to adjust/add
headers, and then XDP_REDIRECT later drops the packet due to lack of
headroom for the xdp_frame. This would likely confuse the end-user.
Fixes: 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
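A userspace model of the reservation rule: when the program moves the packet start, the new start may not fall inside the headroom reserved for the xdp_frame, and at least an Ethernet header must remain. Offsets are relative to data_hard_start; XDP_FRAME_SIZE_SKETCH is a hypothetical placeholder for sizeof(struct xdp_frame), and data_meta is ignored for brevity.

```c
#include <stddef.h>

#define XDP_FRAME_SIZE_SKETCH 32 /* hypothetical sizeof(struct xdp_frame) */
#define ETH_HLEN_SKETCH       14

struct xdp_buff_sketch {
	size_t data;     /* start of packet data, from data_hard_start */
	size_t data_end; /* end of packet data, from data_hard_start */
};

/* Move the packet start by 'delta' bytes (negative grows the packet into
 * the headroom). Returns 0 on success, -1 if the request is refused. */
static int adjust_head_sketch(struct xdp_buff_sketch *x, long delta)
{
	long new_data = (long)x->data + delta;

	if (new_data < XDP_FRAME_SIZE_SKETCH ||
	    new_data > (long)x->data_end - ETH_HLEN_SKETCH)
		return -1;
	x->data = (size_t)new_data;
	return 0;
}
```

Because the reserved bytes are never handed to the program, nothing there needs clearing, and XDP_REDIRECT always finds room for the frame info.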
Wolfram Sang [Wed, 18 Apr 2018 18:20:57 +0000 (20:20 +0200)]
mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs
Early revisions of certain SoCs cannot do multiple DMA RX streams in
parallel. To avoid data corruption, only allow one DMA RX channel and
fall back to PIO, if needed.
HID: i2c-hid: fix inverted return value from i2c_hid_command()
i2c_hid_command() returns non-zero in error cases (the actual
errno). Error handling for the I2C_HID_QUIRK_RESEND_REPORT_DESCR
case in i2c_hid_resume() had the check inverted; fix that.
Michael Ellerman [Thu, 19 Apr 2018 06:22:20 +0000 (16:22 +1000)]
powerpc/kvm: Fix lockups when running KVM guests on Power8
When running KVM guests on Power8 we can see a lockup where one CPU
stops responding. This often leads to a message such as:
watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
Task dump for CPU 72:
qemu-system-ppc R running task 10560 20917 20908 0x00040004
And then backtraces on other CPUs, such as:
Task dump for CPU 48:
ksmd R running task 10032 1519 2 0x00000804
Call Trace:
...
--- interrupt: 901 at smp_call_function_many+0x3c8/0x460
LR = smp_call_function_many+0x37c/0x460
pmdp_invalidate+0x100/0x1b0
__split_huge_pmd+0x52c/0xdb0
try_to_unmap_one+0x764/0x8b0
rmap_walk_anon+0x15c/0x370
try_to_unmap+0xb4/0x170
split_huge_page_to_list+0x148/0xa30
try_to_merge_one_page+0xc8/0x990
try_to_merge_with_ksm_page+0x74/0xf0
ksm_scan_thread+0x10ec/0x1ac0
kthread+0x160/0x1a0
ret_from_kernel_thread+0x5c/0x78
This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync
for KVM state when waking from idle"), which added a check in
pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
test of kvm_hstate.hwthread_req.
The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
entering the guest, so it can then come out to cede with
KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
setting hwthread_req to 1, but because hwthread_state is still
KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
wake up from idle and won't go to kvm_start_guest. From there the
thread will return to a garbage address and crash.
Fix it by skipping the store of hwthread_state, but not the test of
hwthread_req, when coming out of idle. It's OK to skip the sync in
that case because hwthread_req will have been set on the same thread,
so there is no synchronisation required.
Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
Signed-off-by: Michael Ellerman <[email protected]>
2) ip6_input_finish()
ipv6_frag_rcv() (lock frag queue spinlock)
ip6_frag_queue()
icmpv6_param_prob() (lock txq->_xmit_lock at some point)
We could add lockdep annotations, but we can also make sure IPv6
calls icmpv6_param_prob() only after releasing the frag queue spinlock,
since this naturally makes the frag queue spinlock a leaf in the lock hierarchy.
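The reordering can be modeled in userspace: the queueing step only reports that a parameter problem is needed, and the caller sends it after dropping the lock. Here the spinlock is modeled by a plain flag so the ordering can be checked; all names are hypothetical and none of this is the IPv6 code.

```c
#include <stdbool.h>

struct frag_queue_sketch {
	bool locked; /* models the frag queue spinlock being held */
	int nfrags;
};

static int icmp_sent;            /* stand-in ICMPv6 errors sent */
static int icmp_sent_under_lock; /* ...of which with the queue lock held */

/* Stand-in for icmpv6_param_prob(), which may take other locks. */
static void send_param_prob_sketch(const struct frag_queue_sketch *q)
{
	icmp_sent++;
	if (q->locked)
		icmp_sent_under_lock++;
}

/* Called with the lock held: queue the fragment, or report failure
 * without sending anything. */
static bool queue_fragment_sketch(struct frag_queue_sketch *q, bool malformed)
{
	if (malformed)
		return false;
	q->nfrags++;
	return true;
}

static void frag_rcv_sketch(struct frag_queue_sketch *q, bool malformed)
{
	q->locked = true;  /* spin_lock() */
	bool ok = queue_fragment_sketch(q, malformed);
	q->locked = false; /* spin_unlock() */

	/* The error is sent only after the queue lock is released. */
	if (!ok)
		send_param_prob_sketch(q);
}
```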
Michael Neuling [Wed, 11 Apr 2018 03:37:58 +0000 (13:37 +1000)]
powerpc/eeh: Fix enabling bridge MMIO windows
On boot we save the configuration space of PCIe bridges. We do this so
that when we get an EEH event and everything gets reset, we can restore
them.
Unfortunately we save this state before we've enabled the MMIO space
on the bridges. Hence, if we have to reset the bridge, MMIO is not
enabled when we come back, and we end up taking a PE freeze when the
driver starts accessing it again.
This patch forces the memory/MMIO and bus mastering on when restoring
bridges on EEH. Ideally we'd do this correctly by saving the
configuration space writes later, but that will have to come later in
a larger EEH rewrite. For now we have this simple fix.
The original bug can be triggered on a Boston machine by doing:
echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
On Boston, this PHB has a PCIe switch on it. Without this patch,
you'll see two EEH events: one expected and one the failure we are fixing
here. The second EEH event causes anything under the PHB to
disappear (i.e. the i40e eth).
With this patch, only 1 EEH event occurs and devices properly recover.
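A sketch of the value restored to a bridge's command register, assuming the boot-time snapshot is an array of 32-bit config-space dwords. The bit values match the standard PCI command register (memory space enable 0x2, bus master enable 0x4, register at offset 0x04), but the names and helper are hypothetical, not the EEH code.

```c
#include <stdint.h>

#define PCI_COMMAND_SKETCH        0x04 /* config offset of the command register */
#define PCI_COMMAND_MEMORY_SKETCH 0x2  /* respond to memory space accesses */
#define PCI_COMMAND_MASTER_SKETCH 0x4  /* enable bus mastering */

/* Take the dword saved at boot and force memory decoding and bus mastering
 * on, since they were not yet enabled when the snapshot was taken. */
static uint32_t bridge_restore_command(const uint32_t *saved_config)
{
	return saved_config[PCI_COMMAND_SKETCH / 4] |
	       PCI_COMMAND_MEMORY_SKETCH | PCI_COMMAND_MASTER_SKETCH;
}
```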
net: qualcomm: rmnet: Fix warning seen with fill_info
When the last rmnet device attached to a real device is removed, the
real device is unregistered from rmnet. As a result, the real device
lookup fails resulting in a warning when the fill_info handler is
called as part of the rmnet device unregistration.
Fix this by returning the rmnet flags as 0 when no real device is
present.
hv_netvsc: propagate Hyper-V friendly name into interface alias
This patch implements the 'Device Naming' feature of the Hyper-V
network device API. In Hyper-V on the host through the GUI or PowerShell
it is possible to enable the device naming feature which causes
the host to make available to the guest the name of the device.
This shows up in the RNDIS protocol as the friendly name.
The name has no particular meaning and is limited to 256 characters.
The value can only be set via PowerShell on the host, but could
be scripted for mass deployments. The default value is the
string 'Network Adapter' and since that is the same for all devices
and useless, the driver ignores it.
In Windows, the value goes into a registry key for use in SNMP
ifAlias. For Linux, this patch puts the value in the network
device alias property; where it is visible in ip tools and SNMP.
The host provided ifAlias is just a suggestion, and can be
overridden by later ip commands.
Also requires exporting dev_set_alias in netdev core.
r8169: remove jumbo_tx_csum from chip config struct
According to the chip configuration entries only RTL8169 (ver <= 06)
supports tx checksumming for jumbo packets.
By the way: constant JUMBO_1K is a little misleading because it refers
to the standard packet size and not to a jumbo packet size.
By implementing this rule we can get rid of configuring tx checksumming
support per chip type.
The region to be used is always the first of type IORESOURCE_MEM.
We can implement this rule directly w/o having to specify which
region is the first one per configuration entry.
Certain entries in array mac_info[] are redundant, so remove them:
0x7cf, 0x2c200000 (VER 33): matched by entry 0x7c8, 0x2c000000
0x7cf, 0x28300000 (VER 26): matched by entry 0x7c8, 0x28000000
0x7cf, 0x3cb00000 (VER 24): matched by entry 0x7c8, 0x3c800000
0x7cf, 0x3c400000 (VER 22): matched by entry 0x7c8, 0x3c000000
0x7cf, 0x38500000 (VER 17): matched by entry 0x7c8, 0x38000000
0x7cf, 0x44900000 (VER 39): matched by entry 0x7c8, 0x44800000
0x7cf, 0x40b00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x40a00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x34a00000 (VER 09): matched by entry 0x7c8, 0x34800000
0x7cf, 0x24a00000 (VER 09): matched by entry 0x7c8, 0x24800000
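Why these entries are redundant can be sketched with a first-match lookup over (mask, value, version) triples. The table below is a hypothetical excerpt in the style of mac_info[], with the masks written out in full (0x7cf00000, 0x7c800000), which is an assumption about how the shorthand above maps to the code.

```c
#include <stdint.h>

struct mac_info_sketch {
	uint32_t mask;
	uint32_t val;
	int ver;
};

static const struct mac_info_sketch mac_info_sketch[] = {
	{ 0x7cf00000, 0x2c200000, 33 }, /* redundant: next entry gives 33 too */
	{ 0x7c800000, 0x2c000000, 33 },
	{ 0x7c800000, 0x28000000, 26 },
	{ 0x00000000, 0x00000000, -1 }, /* catch-all: unknown chip */
};

/* First entry with (xid & mask) == val wins. */
static int lookup_mac_version(const struct mac_info_sketch *tbl, uint32_t xid)
{
	const struct mac_info_sketch *p = tbl;

	while ((xid & p->mask) != p->val)
		p++;
	return p->ver;
}
```

Starting the lookup one entry later (i.e. without the specific entry) still yields version 33 for XID 0x2c200000, because 0x2c200000 & 0x7c800000 == 0x2c000000.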
In addition don't mask out bits 30 and 29 when printing the XID.
Most likely this is a relic from the times when the driver covered
RTL8169 chip version only.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p"),
%p no longer displays the full address, for security reasons.
We could switch to %px, but I think the pointer address doesn't
provide a real benefit, so remove printing the hashed address.