Revert ("r8169: remove 1000/Half from supported modes")
This reverts commit a6851c613fd7fccc5d1f28d5d8a0cbe9b0f4e8cc.
It was reported that RTL8111b successfully finishes 1000/Full autoneg
but no data flows. Reverting the original patch fixes the issue.
It seems to be a HW issue with the integrated RTL8211B PHY. This PHY
version used also e.g. on RTL8168d, so better revert the original patch.
René van Dorst [Sat, 27 Jul 2019 09:40:11 +0000 (11:40 +0200)]
net: phylink: Fix flow control for fixed-link
In phylink_parse_fixedlink() the pl->link_config.advertising bits are AND
with pl->supported, pl->supported is zeroed and only the speed/duplex
modes and MII bits are set.
So pl->link_config.advertising always loses the flow control/pause bits.
By setting Pause and Asym_Pause bits in pl->supported, the flow control
work again when devicetree "pause" is set in fixes-link node and the MAC
advertise that is supports pause.
Results with this patch.
Legend:
- DT = 'Pause' is set in the fixed-link in devicetree.
- validate() = ‘Yes’ means phylink_set(mask, Pause) is set in the
validate().
- flow = results reported my link is Up line.
+-----+------------+-------+
| DT | validate() | flow |
+-----+------------+-------+
| Yes | Yes | rx/tx |
| No | Yes | off |
| Yes | No | off |
+-----+------------+-------+
Paul Bolle [Fri, 26 Jul 2019 22:05:41 +0000 (00:05 +0200)]
gigaset: stop maintaining seperately
The Dutch consumer grade ISDN network will be shut down on September 1,
2019. This means I'll be converted to some sort of VOIP shortly. At that
point it would be unwise to try to maintain the gigaset driver, even for
odd fixes as I do. So I'll stop maintaining it as a seperate driver and
bump support to CAPI in staging. De facto this means the driver will be
unmaintained, since no-one seems to be working on CAPI.
I've lighty tested the hardware specific modules of this driver (bas-gigaset,
ser-gigaset, and usb-gigaset) for v5.3-rc1. The basic functionality appears to
be working. It's unclear whether anyone still cares. I'm aware of only one
person sort of using the driver a few years ago.
Thanks to Karsten Keil for the ISDN subsystems gigaset was using (I4L and
CAPI). And many thanks to Hansjoerg Lipp and Tilman Schmidt for writing and
upstreaming this driver.
Jia-Ju Bai [Fri, 26 Jul 2019 14:17:05 +0000 (22:17 +0800)]
net: rds: Fix possible null-pointer dereferences in rds_rdma_cm_event_handler_cmn()
In rds_rdma_cm_event_handler_cmn(), there are some if statements to
check whether conn is NULL, such as on lines 65, 96 and 112.
But conn is not checked before being used on line 108:
trans->cm_connect_complete(conn, event);
and on lines 140-143:
rdsdebug("DISCONNECT event - dropping connection "
"%pI6c->%pI6c\n", &conn->c_laddr,
&conn->c_faddr);
rds_conn_drop(conn);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, conn is checked before being used.
These bugs are found by a static analysis tool STCheck written by us.
Jia-Ju Bai [Fri, 26 Jul 2019 08:27:36 +0000 (16:27 +0800)]
isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
In start_isoc_chain(), usb_alloc_urb() on line 1392 may fail
and return NULL. At this time, fifo->iso[i].urb is assigned to NULL.
Then, fifo->iso[i].urb is used at some places, such as:
LINE 1405: fill_isoc_urb(fifo->iso[i].urb, ...)
urb->number_of_packets = num_packets;
urb->transfer_flags = URB_ISO_ASAP;
urb->actual_length = 0;
urb->interval = interval;
LINE 1416: fifo->iso[i].urb->...
LINE 1419: fifo->iso[i].urb->...
Thus, possible null-pointer dereferences may occur.
To fix these bugs, "continue" is added to avoid using fifo->iso[i].urb
when it is NULL.
These bugs are found by a static analysis tool STCheck written by us.
1) Ariel is addressing an issue with enacp flow counter race condition
2) Aya fixes ethtool speed handling
3) Edward fixes modify_cq hw bits alignment
4) Maor fixes RDMA_RX capabilities handling
5) Mark reverses unregister devices order to address an issue with LAG
6) From Tariq,
- wrong max num channels indication regression
- TLS counters naming and documentation as suggested by Jakub
- kTLS, Call WARN_ONCE on netdev mismatch
There is one patch in this series that touches nfp driver to align
TLS statistics names with latest documentation, Jakub is CC'ed.
Please pull and let me know if there is any problem.
For -stable v4.9:
('net/mlx5: Use reversed order when unregister devices')
For -stable v4.20
('net/mlx5e: Prevent encap flow counter update async to user query')
('net/mlx5: Fix modify_cq_in alignment')
For -stable v5.1
('net/mlx5e: Fix matching of speed to PRM link modes')
For -stable v5.2
('net/mlx5: Add missing RDMA_RX capabilities')
====================
net: qualcomm: rmnet: Fix incorrect UL checksum offload logic
The udp_ip4_ind bit is set only for IPv4 UDP non-fragmented packets
so that the hardware can flip the checksum to 0xFFFF if the computed
checksum is 0 per RFC768.
However, this bit had to be set for IPv6 UDP non fragmented packets
as well per hardware requirements. Otherwise, IPv6 UDP packets
with computed checksum as 0 were transmitted by hardware and were
dropped in the network.
In addition to setting this bit for IPv6 UDP, the field is also
appropriately renamed to udp_ind as part of this change.
Fixes: 5eb5f8608ef1 ("net: qualcomm: rmnet: Add support for TX checksum offload") Cc: Sean Tranchetti <[email protected]> Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Haishuang Yan [Thu, 25 Jul 2019 16:40:17 +0000 (00:40 +0800)]
ip6_tunnel: fix possible use-after-free on xmit
ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
can cause a possible use-after-free accessing iph/ipv6h pointer
since the packet will be 'uncloned' running pskb_expand_head if
it is a cloned gso skb.
Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets") Signed-off-by: Haishuang Yan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Haishuang Yan [Thu, 25 Jul 2019 03:07:56 +0000 (11:07 +0800)]
ipip: validate header length in ipip_tunnel_xmit
We need the same checks introduced by commit cb9f1b783850
("ip: validate header length on virtual device xmit") for
ipip tunnel.
Fixes: cb9f1b783850b ("ip: validate header length on virtual device xmit") Signed-off-by: Haishuang Yan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
ipv6_flowlabel and ipv6_flowlabel_mgr are missing from
gitignore. Quentin points out that the original
commit 3fb321fde22d ("selftests/net: ipv6 flowlabel")
did add ignore entries, they are just missing the "ipv6_"
prefix.
Commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") which enabled multi-cos
feature after prolonged time in driver added some regression causing
numerous issues (sudden reboots, tx timeout etc.) reported by customers.
We plan to backout this commit and submit proper fix once we have root
cause of issues reported with this feature enabled.
Fixes: 3968d38917eb ("bnx2x: Fix Multi-Cos.") Signed-off-by: Sudarsana Reddy Kalluru <[email protected]> Signed-off-by: Manish Chopra <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net/mlx5e: Prevent encap flow counter update async to user query
This patch prevents a race between user invoked cached counters
query and a neighbor last usage updater.
The cached flow counter stats can be queried by calling
"mlx5_fc_query_cached" which provides the number of bytes and
packets that passed via this flow since the last time this counter
was queried.
It does so by reducting the last saved stats from the current, cached
stats and then updating the last saved stats with the cached stats.
It also provide the lastuse value for that flow.
Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
last usage time of encapsulation flows, it calls the flow counter
query method periodically and async to user queries of the flow counter
using cls_flower.
This call is causing the driver to update the last reported bytes and
packets from the cache and therefore, future user queries of the flow
stats will return lower than expected number for bytes and packets
since the last saved stats in the driver was updated async to the last
saved stats in cls_flower.
This causes wrong stats presentation of encapsulation flows to user.
Since the neighbor usage updater only needs the lastuse stats from the
cached counter, the fix is to use a dedicated lastuse query call that
returns the lastuse value without synching between the cached stats and
the last saved stats.
Fixes: f6dfb4c3f216 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters") Signed-off-by: Ariel Levkovich <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Aya Levin [Sun, 16 Jun 2019 10:20:29 +0000 (13:20 +0300)]
net/mlx5e: Fix matching of speed to PRM link modes
Speed translation is performed based on legacy or extended PTYS
register. Translate speed with respect to:
1) Capability bit of extended PTYS table.
2) User request:
a) When auto-negotiation is turned on, inspect advertisement whether it
contains extended link modes.
b) When auto-negotiation is turned off, speed > 100Gbps (maximal
speed supported in legacy mode).
With both conditions fulfilled translation is done with extended PTYS
table otherwise use legacy PTYS table.
Without this patch 25/50/100 Gbps speed cannot be set, since try to
configure in extended mode but read from legacy mode.
No XSK support in the enhanced IPoIB driver and representors.
Add a profile property to specify this, and enhance the logic
that calculates the max number of channels to take it into
account.
Maor Gottlieb [Sun, 14 Jul 2019 08:33:07 +0000 (11:33 +0300)]
net/mlx5: Add missing RDMA_RX capabilities
New flow table type RDMA_RX was added but the MLX5_CAP_FLOW_TABLE_TYPE
didn't handle this new flow table type.
This means that MLX5_CAP_FLOW_TABLE_TYPE returns an empty capability to
this flow table type.
Update both the macro and the maximum supported flow table type to
RDMA_RX.
Fixes: d83eb50e29de ("net/mlx5: Add support in RDMA RX steering") Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Mark Zhang [Tue, 9 Jul 2019 02:37:12 +0000 (05:37 +0300)]
net/mlx5: Use reversed order when unregister devices
When lag is active, which is controlled by the bonded mlx5e netdev, mlx5
interface unregestering must happen in the reverse order where rdma is
unregistered (unloaded) first, to guarantee all references to the lag
context in hardware is removed, then remove mlx5e netdev interface which
will cleanup the lag context from hardware.
Without this fix during destroy of LAG interface, we observed following
errors:
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xe4ac33)
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xa5aee8).
Chris Packham [Tue, 23 Jul 2019 23:35:01 +0000 (11:35 +1200)]
fsl/fman: Remove comment referring to non-existent function
fm_set_max_frm() existed in the Freescale SDK as a callback for an
early_param. When this code was ported to the upstream kernel the
early_param was converted to a module_param making the reference to the
function incorrect. The rest of the comment already does a good job of
explaining the parameter so removing the reference to the non-existent
function seems like the best thing to do.
- v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
Small packets going out of a tap device go through an optimized code
path that uses build_skb() rather than sock_alloc_send_pskb(). The
latter calls skb_set_owner_w(), but the small packet code path does not.
The net effect is that small packets are not owned by the userland
application's socket (e.g. QEMU), while large packets are.
This can be seen with a TCP session, where packets are not owned when
the window size is small enough (around PAGE_SIZE), while they are once
the window grows (note that this requires the host to support virtio
tso for the guest to offload segmentation).
All this leads to inconsistent behaviour in the kernel, especially on
netfilter modules that uses sk->socket (e.g. xt_owner).
Leon Romanovsky [Tue, 23 Jul 2019 07:22:48 +0000 (10:22 +0300)]
lib/dim: Fix -Wunused-const-variable warnings
DIM causes to the following warnings during kernel compilation
which indicates that tx_profile and rx_profile are supposed to
be declared in *.c and not in *.h files.
In file included from ./include/rdma/ib_verbs.h:64,
from ./include/linux/mlx5/device.h:37,
from ./include/linux/mlx5/driver.h:51,
from ./include/linux/mlx5/vport.h:36,
from drivers/infiniband/hw/mlx5/ib_virt.c:34:
./include/linux/dim.h:326:1: warning: _tx_profile_ defined but not used [-Wunused-const-variable=]
326 | tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
| ^~~~~~~~~~
./include/linux/dim.h:320:1: warning: _rx_profile_ defined but not used [-Wunused-const-variable=]
320 | rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
| ^~~~~~~~~~
While using net_dim, a dim_sample was used without ever initializing the
comps value. Added use of DIV_ROUND_DOWN_ULL() to prevent potential
overflow, it should not be a problem to save the final result in an int
because after the division by epms the value should not be larger than a
few thousand.
[ 1040.127124] UBSAN: Undefined behaviour in lib/dim/dim.c:78:23
[ 1040.130118] signed integer overflow:
[ 1040.131643] 134718714 * 100 cannot be represented in type 'int'
libbpf: silence GCC8 warning about string truncation
Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
GCC8 still complains about *expected* string truncation:
xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
from a string of length 15 [-Werror=stringop-truncation]
strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
This patch gets rid of the issue altogether by using memcpy instead.
There is no performance regression, as strncpy will still copy and fill
all of the bytes anyway.
Cong Wang [Tue, 23 Jul 2019 03:41:22 +0000 (20:41 -0700)]
netrom: hold sock when setting skb->destructor
sock_efree() releases the sock refcnt, if we don't hold this refcnt
when setting skb->destructor to it, the refcnt would not be balanced.
This leads to several bug reports from syzbot.
I have checked other users of sock_efree(), all of them hold the
sock refcnt.
Some functions in the datapath code are factored out so that each
one has a stack frame smaller than 1024 bytes with gcc. However,
when compiling with clang, the functions are inlined more aggressively
and combined again so we get
net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
Marking both get_flow_actions() and ovs_nla_init_match_and_action()
as 'noinline_for_stack' gives us the same behavior that we see with
gcc, and no warning. Note that this does not mean we actually use
less stack, as the functions call each other, and we still get
three copies of the large 'struct sw_flow_key' type on the stack.
The comment tells us that this was previously considered safe,
presumably since the netlink parsing functions are called with
a known backchain that does not also use a lot of stack space.
Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski [Wed, 24 Jul 2019 18:02:48 +0000 (11:02 -0700)]
net/tls: add myself as a co-maintainer
I've been spending quite a bit of time fixing and
preventing bit rot in the core TLS code. TLS seems
to only be growing in importance, I'd like to help
ensuring the quality of our implementation.
Andreas Schwab [Wed, 24 Jul 2019 15:32:57 +0000 (17:32 +0200)]
net: phy: mscc: initialize stats array
The memory allocated for the stats array may contain arbitrary data.
Fixes: e4f9ba642f0b ("net: phy: mscc: add support for VSC8514 PHY.") Fixes: 00d70d8e0e78 ("net: phy: mscc: add support for VSC8574 PHY") Fixes: a5afc1678044 ("net: phy: mscc: add support for VSC8584 PHY") Fixes: f76178dc5218 ("net: phy: mscc: add ethtool statistics counters") Signed-off-by: Andreas Schwab <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: phylink: don't start and stop SGMII PHYs in SFP modules twice
SFP modules connected using the SGMII interface have their own PHYs which
are handled by the struct phylink's phydev field. On the other hand, for
the modules connected using 1000Base-X interface that field is not set.
Since commit ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network
devices and sfp cages") phylink_start() ends up setting the phydev field
using the sfp-bus infrastructure, which eventually calls phy_start() on it,
and then calling phy_start() again on the same phydev from phylink_start()
itself. Similar call sequence holds for phylink_stop(), only in the reverse
order. This results in WARNs during network interface bringup and shutdown
when a copper SFP module is connected, as phy_start() and phy_stop() are
called twice in a row for the same phy_device:
% ip link set up dev eth0
------------[ cut here ]------------
called from state UP
WARNING: CPU: 1 PID: 155 at drivers/net/phy/phy.c:895 phy_start+0x74/0xc0
Modules linked in:
CPU: 1 PID: 155 Comm: backend Not tainted 5.2.0+ #1
NIP: c0227bf0 LR: c0227bf0 CTR: c004d224
REGS: df547720 TRAP: 0700 Not tainted (5.2.0+)
MSR: 00029000 <CE,EE,ME> CR: 24002822 XER: 00000000
% ip link set down dev eth0
------------[ cut here ]------------
called from state HALTED
WARNING: CPU: 1 PID: 184 at drivers/net/phy/phy.c:858 phy_stop+0x3c/0x88
SFP modules with the 1000Base-X interface are not affected.
Place explicit calls to phy_start() and phy_stop() before enabling or after
disabling an attached SFP module, where phydev is not yet set (or is
already unset), so they will be made only from the inside of sfp-bus, if
needed.
Fixes: 217962615662 ("net: phy: warn if phy_start is called from invalid state") Signed-off-by: Arseny Solokha <[email protected]> Acked-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Wed, 24 Jul 2019 21:14:50 +0000 (14:14 -0700)]
Merge tag 'linux-can-fixes-for-5.3-20190724' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-07-24
this is a pull reqeust of 7 patches for net/master.
The first patch is by Rasmus Villemoes add a missing netif_carrier_off() to
register_candev() so that generic netdev trigger based LEDs are initially off.
Nikita Yushchenko's patch for the rcar_canfd driver fixes a possible IRQ storm
on high load.
The patch by Weitao Hou for the mcp251x driver add missing error checking to
the work queue allocation.
Both Wen Yang's and Joakim Zhang's patch for the flexcan driver fix a problem
with the stop-mode.
Stephane Grosjean contributes a patch for the peak_usb driver to fix a
potential double kfree_skb().
The last patch is by YueHaibing and fixes the error path in can-gw's
cgw_module_init() function.
====================
When closing the CAN device while tx skbs are inflight, echo skb could
be released twice. By calling close_candev() before unlinking all
pending tx urbs, then the internal echo_skb[] array is fully and
correctly cleared before the USB write callback and, therefore,
can_get_echo_skb() are called, for each aborted URB.
To enter stop mode, the CPU should manually assert a global Stop Mode
request and check the acknowledgment asserted by FlexCAN. The CPU must
only consider the FlexCAN in stop mode when both request and
acknowledgment conditions are satisfied.
Weitao Hou [Tue, 25 Jun 2019 12:50:48 +0000 (20:50 +0800)]
can: mcp251x: add error check when wq alloc failed
add error check when workqueue alloc failed, and remove redundant code
to make it clear.
Fixes: e0000163e30e ("can: Driver for the Microchip MCP251x SPI CAN controllers") Signed-off-by: Weitao Hou <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Tested-by: Sean Nyekjaer <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
can: rcar_canfd: fix possible IRQ storm on high load
We have observed rcar_canfd driver entering IRQ storm under high load,
with following scenario:
- rcar_canfd_global_interrupt() in entered due to Rx available,
- napi_schedule_prep() is called, and sets NAPIF_STATE_SCHED in state
- Rx fifo interrupts are masked,
- rcar_canfd_global_interrupt() is entered again, this time due to
error interrupt (e.g. due to overflow),
- since scheduled napi poller has not yet executed, condition for calling
napi_schedule_prep() from rcar_canfd_global_interrupt() remains true,
thus napi_schedule_prep() gets called and sets NAPIF_STATE_MISSED flag
in state,
- later, napi poller function rcar_canfd_rx_poll() gets executed, and
calls napi_complete_done(),
- due to NAPIF_STATE_MISSED flag in state, this call does not clear
NAPIF_STATE_SCHED flag from state,
- on return from napi_complete_done(), rcar_canfd_rx_poll() unmasks Rx
interrutps,
- Rx interrupt happens, rcar_canfd_global_interrupt() gets called
and calls napi_schedule_prep(),
- since NAPIF_STATE_SCHED is set in state at this time, this call
returns false,
- due to that false return, rcar_canfd_global_interrupt() returns
without masking Rx interrupt
- and this results into IRQ storm: unmasked Rx interrupt happens again
and again is misprocessed in the same way.
This patch fixes that scenario by unmasking Rx interrupts only when
napi_complete_done() returns true, which means it has cleared
NAPIF_STATE_SCHED in state.
Rasmus Villemoes [Mon, 24 Jun 2019 08:34:13 +0000 (08:34 +0000)]
can: dev: call netif_carrier_off() in register_candev()
CONFIG_CAN_LEDS is deprecated. When trying to use the generic netdev
trigger as suggested, there's a small inconsistency with the link
property: The LED is on initially, stays on when the device is brought
up, and then turns off (as expected) when the device is brought down.
Make sure the LED always reflects the state of the CAN device.
'channels.max_combined' initialized only on ioctl success and
errno is only valid on ioctl failure.
The code doesn't produce any runtime issues, but makes memory
sanitizers angry:
Conditional jump or move depends on uninitialised value(s)
at 0x55C056F: xsk_get_max_queues (xsk.c:336)
by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
by 0x55C0E57: xsk_socket__create (xsk.c:601)
Uninitialised value was created by a stack allocation
at 0x55C04CD: xsk_get_max_queues (xsk.c:318)
Additionally fixed warning on uninitialized bytes in ioctl arguments:
Syscall param ioctl(SIOCETHTOOL) points to uninitialised byte(s)
at 0x648D45B: ioctl (in /usr/lib64/libc-2.28.so)
by 0x55C0546: xsk_get_max_queues (xsk.c:330)
by 0x55C05B2: xsk_create_bpf_maps (xsk.c:354)
by 0x55C089F: xsk_setup_xdp_prog (xsk.c:447)
by 0x55C0E57: xsk_socket__create (xsk.c:601)
Address 0x1ffefff378 is on thread 1's stack
in frame #1, created by xsk_get_max_queues (xsk.c:318)
Uninitialised value was created by a stack allocation
at 0x55C04CD: xsk_get_max_queues (xsk.c:318)
The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.
The problem is that when verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift.
Adjust shift count calculation to account for endianness.
The onboard sky2 NIC on ASUS P6T WS PRO doesn't work after PM resume
due to the infamous IRQ problem. Disabling MSI works around it, so
let's add it to the blacklist.
Unfortunately the BIOS on the machine doesn't fill the standard
DMI_SYS_* entry, so we pick up DMI_BOARD_* entries instead.
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put before the return.
Issue found with Coccinelle.
net: dsa: mv88e6xxx: chip: Add of_node_put() before return
Each iteration of for_each_available_child_of_node puts the previous
node, but in the case of a return from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
return.
Issue found with Coccinelle.
====================
selftests: forwarding: GRE multipath fixes
Patch #1 ensures IPv4 forwarding is enabled during the test.
Patch #2 fixes the flower filters used to measure the distribution of
the traffic between the two nexthops, so that the test will pass
regardless if traffic is offloaded or not.
====================
The TC filters used in the test do not work with veth devices because the
outer Ethertype is 802.1Q and not IPv4. The test passes with mlxsw
netdevs since the hardware always looks at "The first Ethertype that
does not point to either: VLAN, CNTAG or configurable Ethertype".
Fix this by matching on the VLAN ID instead, but on the ingress side.
The reason why this is not performed at egress is explained in the
commit cited below.
Fixes: 541ad323db3a ("selftests: forwarding: gre_multipath: Update next-hop statistics match criteria") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Stephen Suryaputra <[email protected]> Tested-by: Stephen Suryaputra <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The test did not enable IPv4 forwarding during its setup phase, which
causes the test to fail on machines where IPv4 forwarding is disabled.
Fixes: 54818c4c4b93 ("selftests: forwarding: Test multipath tunneling") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Stephen Suryaputra <[email protected]> Tested-by: Stephen Suryaputra <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Mon, 22 Jul 2019 08:39:30 +0000 (10:39 +0200)]
net: stmmac: RX Descriptors need to be clean before setting buffers
RX Descriptors are being cleaned after setting the buffers which may
lead to buffer addresses being wiped out.
Fix this by clearing earlier the RX Descriptors.
Fixes: 2af6106ae949 ("net: stmmac: Introducing support for Page Pool") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Yonglong Liu [Mon, 22 Jul 2019 05:59:12 +0000 (13:59 +0800)]
net: hns: fix LED configuration for marvell phy
Since commit(net: phy: marvell: change default m88e1510 LED configuration),
the active LED of Hip07 devices is always off, because Hip07 just
use 2 LEDs.
This patch adds a phy_register_fixup_for_uid() for m88e1510 to
correct the LED configuration.
Fixes: 077772468ec1 ("net: phy: marvell: change default m88e1510 LED configuration") Signed-off-by: Yonglong Liu <[email protected]> Reviewed-by: linyunsheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: mvpp2: Don't check for 3 consecutive Idle frames for 10G links
PPv2's XLGMAC can wait for 3 idle frames before triggering a link up
event. This can cause the link to be stuck low when there's traffic on
the interface, so disable this feature.
Fixes: 4bb043262878 ("net: mvpp2: phylink support") Signed-off-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Thomas Falcon [Tue, 16 Jul 2019 22:25:10 +0000 (17:25 -0500)]
bonding: Force slave speed check after link state recovery for 802.3ad
The following scenario was encountered during testing of logical
partition mobility on pseries partitions with bonded ibmvnic
adapters in LACP mode.
1. Driver receives a signal that the device has been
swapped, and it needs to reset to initialize the new
device.
2. Driver reports loss of carrier and begins initialization.
3. Bonding driver receives NETDEV_CHANGE notifier and checks
the slave's current speed and duplex settings. Because these
are unknown at the time, the bond sets its link state to
BOND_LINK_FAIL and handles the speed update, clearing
AD_PORT_LACP_ENABLE.
4. Driver finishes recovery and reports that the carrier is on.
5. Bond receives a new notification and checks the speed again.
The speeds are valid but miimon has not altered the link
state yet. AD_PORT_LACP_ENABLE remains off.
Because the slave's link state is still BOND_LINK_FAIL,
no further port checks are made when it recovers. Though
the slave devices are operational and have valid speed
and duplex settings, the bond will not send LACPDU's. The
simplest fix I can see is to force another speed check
in bond_miimon_commit. This way the bond will update
AD_PORT_LACP_ENABLE if needed when transitioning from
BOND_LINK_FAIL to BOND_LINK_UP.
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull preemption Kconfig fix from Thomas Gleixner:
"The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
and PREEMPT_RT.
Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
obviously can't work anymore. oldconfig builds are affected as well,
but it's more obvious as the user gets asked. [old]defconfig silently
fixes it up and selects PREEMPT_NONE.
Unbreak it by undoing the rename and adding a intermediate config
symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
to chase down a few #ifdefs, but it's better than tweaking 114
defconfigs and annoying users"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
Merge tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fixes for leaks caused by recently merged patches
- one build fix
- a fix to prevent mixing of incompatible features
* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't leak extent_map in btrfs_get_io_geometry()
btrfs: free checksum hash on in close_ctree
btrfs: Fix build error while LIBCRC32C is module
btrfs: inode: Don't compress if NODATASUM or NODATACOW set
Thomas Gleixner [Mon, 22 Jul 2019 15:59:19 +0000 (17:59 +0200)]
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
the preemption mode choice wich defaults to NONE. This also affects
oldconfig builds.
So rather than changing 114 defconfig files and being an annoyance to
users, revert the rename and select a new config symbol PREEMPTION. That
keeps everything working smoothly and the revelant ifdef's are going to be
fixed up step by step.
Reported-by: Mark Rutland <[email protected]> Fixes: a50a3f4b6a31 ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT") Signed-off-by: Thomas Gleixner <[email protected]>
Merge tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For two regressions in media core:
- v4l2-subdev: fix regression in check_pad()
- videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
in use"
* tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
media: v4l2-subdev: fix regression in check_pad()
1) Several netfilter fixes including a nfnetlink deadlock fix from
Florian Westphal and fix for dropping VRF packets from Miaohe Lin.
2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
proper block sharing.
3) Fix r8169 PHY init from Thomas Voegtle.
4) Fix memory leak in mac80211, from Lorenzo Bianconi.
5) Missing NULL check on object allocation in cxgb4, from Navid
Emamdoost.
6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
7) Check that there is actually an ip header to access in skb->data in
VRF, from Peter Kosyh.
8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
9) One more tweak the the TCP fragmentation memory limit changes, to be
less harmful to applications setting small SO_SNDBUF values. From
Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
tcp: be more careful in tcp_fragment()
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
vrf: make sure skb->data contains ip header to make routing
connector: remove redundant input callback from cn_dev
qed: Prefer pcie_capability_read_word()
igc: Prefer pcie_capability_read_word()
cxgb4: Prefer pcie_capability_read_word()
be2net: Synchronize be_update_queues with dev_watchdog
bnx2x: Prevent load reordering in tx completion processing
net: phy: sfp: hwmon: Fix scaling of RX power
net: sched: verify that q!=NULL before setting q->flags
chelsio: Fix a typo in a function name
allocate_flower_entry: should check for null deref
net: hns3: typo in the name of a constant
kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
tipc: Fix a typo
mac80211: don't warn about CW params when not using them
mac80211: fix possible memory leak in ieee80211_assign_beacon
nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
nl80211: fix VENDOR_CMD_RAW_DATA
...
"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in
sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to
leading IPv6 address digits, which is not the case on big-endian
machines.
Since checking bitwise operations doesn't seem to be the point of the
test, replace two short comparisons with a single int comparison.
libbpf: Avoid designated initializers for unnamed union members
As it fails to build in some systems with:
libbpf.c: In function 'perf_buffer__new':
libbpf.c:4515: error: unknown field 'sample_period' specified in initializer
libbpf.c:4516: error: unknown field 'wakeup_events' specified in initializer
Doing as:
attr.sample_period = 1;
I.e. not as a designated initializer makes it build everywhere.
libbpf: Fix endianness macro usage for some compilers
Using endian.h and its endianness macros makes this code build in a
wider range of compilers, as some don't have those macros
(__BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__),
so use instead endian.h's macros (__BYTE_ORDER, __LITTLE_ENDIAN,
__BIG_ENDIAN) which makes this code even shorter :-)
Daniel Borkmann [Mon, 22 Jul 2019 14:04:17 +0000 (16:04 +0200)]
Merge branch 'bpf-sockmap-tls-fixes'
Jakub Kicinski says:
====================
John says:
Resolve a series of splats discovered by syzbot and an unhash
TLS issue noted by Eric Dumazet.
The main issues revolved around interaction between TLS and
sockmap tear down. TLS and sockmap could both reset sk->prot
ops creating a condition where a close or unhash op could be
called forever. A rare race condition resulting from a missing
rcu sync operation was causing a use after free. Then on the
TLS side dropping the sock lock and re-acquiring it during the
close op could hang. Finally, sockmap must be deployed before
tls for current stack assumptions to be met. This is enforced
now. A feature series can enable it.
To fix this first refactor TLS code so the lock is held for the
entire teardown operation. Then add an unhash callback to ensure
TLS can not transition from ESTABLISHED to LISTEN state. This
transition is a similar bug to the one found and fixed previously
in sockmap. Then apply three fixes to sockmap to fix up races
on tear down around map free and close. Finally, if sockmap
is destroyed before TLS we add a new ULP op update to inform
the TLS stack it should not call sockmap ops. This last one
appears to be the most commonly found issue from syzbot.
v4:
- fix some use after frees;
- disable disconnect work for offload (ctx lifetime is much
more complex);
- remove some of the dead code which made it hard to understand
(for me) that things work correctly (e.g. the checks TLS is
the top ULP);
- add selftets.
====================
Jakub Kicinski [Fri, 19 Jul 2019 17:29:26 +0000 (10:29 -0700)]
selftests/tls: close the socket with open record
Add test which sends some data with MSG_MORE and then
closes the socket (never calling send without MSG_MORE).
This should make sure we clean up open records correctly.
John Fastabend [Fri, 19 Jul 2019 17:29:22 +0000 (10:29 -0700)]
bpf: sockmap/tls, close can race with map free
When a map free is called and in parallel a socket is closed we
have two paths that can potentially reset the socket prot ops, the
bpf close() path and the map free path. This creates a problem
with which prot ops should be used from the socket closed side.
If the map_free side completes first then we want to call the
original lowest level ops. However, if the tls path runs first
we want to call the sockmap ops. Additionally there was no locking
around prot updates in TLS code paths so the prot ops could
be changed multiple times once from TLS path and again from sockmap
side potentially leaving ops pointed at either TLS or sockmap
when psock and/or tls context have already been destroyed.
To fix this race first only update ops inside callback lock
so that TLS, sockmap and lowest level all agree on prot state.
Second and a ULP callback update() so that lower layers can
inform the upper layer when they are being removed allowing the
upper layer to reset prot ops.
This gets us close to allowing sockmap and tls to be stacked
in arbitrary order but will save that patch for *next trees.
v4:
- make sure we don't free things for device;
- remove the checks which swap the callbacks back
only if TLS is at the top.
John Fastabend [Fri, 19 Jul 2019 17:29:21 +0000 (10:29 -0700)]
bpf: sockmap, only create entry if ulp is not already enabled
Sockmap does not currently support adding sockets after TLS has been
enabled. There never was a real use case for this so it was never
added. But, we lost the test for ULP at some point so add it here
and fail the socket insert if TLS is enabled. Future work could
make sockmap support this use case but fixup the bug here.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
John Fastabend [Fri, 19 Jul 2019 17:29:20 +0000 (10:29 -0700)]
bpf: sockmap, synchronize_rcu before free'ing map
We need to have a synchronize_rcu before free'ing the sockmap because
any outstanding psock references will have a pointer to the map and
when they use this could trigger a use after free.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
John Fastabend [Fri, 19 Jul 2019 17:29:18 +0000 (10:29 -0700)]
net/tls: fix transition through disconnect with close
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close which
would then call the tls close callback. Because of this a user could
disconnect a socket then put it in a LISTEN state which would break
our assumptions about sockets always being ESTABLISHED state.
More directly because close() can call unhash() and unhash is
implemented by sockmap if a sockmap socket has TLS enabled we can
incorrectly destroy the psock from unhash() and then call its close
handler again. But because the psock (sockmap socket representation)
is already destroyed we call close handler in sk->prot. However,
in some cases (TLS BASE/BASE case) this will still point at the
sockmap close handler resulting in a circular call and crash reported
by syzbot.
To fix both above issues implement the unhash() routine for TLS.
v4:
- add note about tls offload still needing the fix;
- move sk_proto to the cold cache line;
- split TX context free into "release" and "free",
otherwise the GC work itself is in already freed
memory;
- more TX before RX for consistency;
- reuse tls_ctx_free();
- schedule the GC work after we're done with context
to avoid UAF;
- don't set the unhash in all modes, all modes "inherit"
TLS_BASE's callbacks anyway;
- disable the unhash hook for TLS_HW.
John Fastabend [Fri, 19 Jul 2019 17:29:17 +0000 (10:29 -0700)]
net/tls: remove sock unlock/lock around strp_done()
The tls close() callback currently drops the sock lock to call
strp_done(). Split up the RX cleanup into stopping the strparser
and releasing most resources, syncing strparser and finally
freeing the context.
To avoid the need for a strp_done() call on the cleanup path
of device offload make sure we don't arm the strparser until
we are sure init will be successful.
Setting the CLOSING bit prevents the SCHEDULE bit from being
cleared by any workqueue items e.g. if one happens to be
scheduled and run between when we set SCHEDULE bit and cancel
work. Then because SCHEDULE bit is set now no new work will
be scheduled.
Jakub Kicinski [Fri, 19 Jul 2019 17:29:15 +0000 (10:29 -0700)]
net/tls: don't call tls_sk_proto_close for hw record offload
The deprecated TOE offload doesn't actually do anything in
tls_sk_proto_close() - all TLS code is skipped and context
not freed. Remove the callback to make it easier to refactor
tls_sk_proto_close().
Jakub Kicinski [Fri, 19 Jul 2019 17:29:14 +0000 (10:29 -0700)]
net/tls: don't arm strparser immediately in tls_set_sw_offload()
In tls_set_device_offload_rx() we prepare the software context
for RX fallback and proceed to add the connection to the device.
Unfortunately, software context prep includes arming strparser
so in case of a later error we have to release the socket lock
to call strp_done().
In preparation for not releasing the socket lock half way through
callbacks move arming strparser into a separate function.
Following patches will make use of that.
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
tsk->exit_state = EXIT_DEAD
pidfd_poll
if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
pidfd_poll
if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
Eric Dumazet [Fri, 19 Jul 2019 18:52:33 +0000 (11:52 -0700)]
tcp: be more careful in tcp_fragment()
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
There is an extra rcu_read_unlock left in netvsc_recv_callback(),
after a previous patch that removes RCU from this function.
This patch removes the extra RCU unlock.
Fixes: 345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback") Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Peter Kosyh [Fri, 19 Jul 2019 08:11:47 +0000 (11:11 +0300)]
vrf: make sure skb->data contains ip header to make routing
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).
Case:
1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.
So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.
Frederick Lawler [Thu, 18 Jul 2019 02:07:42 +0000 (21:07 -0500)]
qed: Prefer pcie_capability_read_word()
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Frederick Lawler [Thu, 18 Jul 2019 02:07:39 +0000 (21:07 -0500)]
igc: Prefer pcie_capability_read_word()
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().