Linus Torvalds [Fri, 23 Nov 2018 18:40:19 +0000 (10:40 -0800)]
Merge tag 'gpio-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Minor stuff except the IDA leak which was kind of important to fix.
Also new maintainers, yay.
- Do not lose an IDA on the gpiochip register errorpath.
- Fix the PXA non-pincontrol GPIO-using platforms.
- Fix the direction on the mockup GPIO driver.
- Add some MAINTAINERS stuff: Bartosz stepped up as GPIO
co-maintainer, and Andy established an Intel git tree"
* tag 'gpio-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
MAINTAINERS: Do maintain Intel GPIO drivers via separate tree
gpio: mockup: fix indicated direction
gpio: pxa: fix legacy non pinctrl aware builds again
gpio: don't free unallocated ida on gpiochip_add_data_with_key() error path
MAINTAINERS: add myself as co-maintainer of gpiolib
Linus Torvalds [Fri, 23 Nov 2018 18:36:02 +0000 (10:36 -0800)]
Merge tag 'mmc-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci-pci: Fixup card detect lookup
- sdhci-pci: Workaround GLK firmware bug for tuning"
* tag 'mmc-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-pci: Workaround GLK firmware failing to restore the tuning value
mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL
* tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drm:
drm/ast: fixed cursor may disappear sometimes
drm/ast: change resolution may cause screen blurred
drm/i915: Add rotation readout for plane initial config
drm/i915: Force a LUT update in intel_initial_commit()
drm/fb-helper: Blacklist writeback when adding connectors to fbdev
drm/i915: Write GPU relocs harder with gen3
drm/amdgpu: Enable HDP memory light sleep
drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
drm/amd/pp: handle negative values when reading OD
drm/amdgpu: Add missing firmware entry for HAINAN
drm/amd/powerplay: disable Vega20 DS related features
drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset
drm/i915: Disable LP3 watermarks on all SNB machines
drm/ast: Remove existing framebuffers before loading driver
udmabuf: set read/write flag when exporting
drm/amd/display: Support amdgpu "max bpc" connector property (v2)
drm/amdgpu: Add amdgpu "max bpc" connector property (v2)
drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates
drm/vc4: Fix NULL pointer dereference in the async update path
Specify correct type for the constants to avoid
the following sparse complaints:
./arch/arm64/include/asm/sysreg.h:471:42: warning: constant 0xffffffffffffffff is so big it is unsigned long
./arch/arm64/include/asm/sysreg.h:512:42: warning: constant 0xffffffffffffffff is so big it is unsigned long
Dave Airlie [Fri, 23 Nov 2018 01:03:20 +0000 (11:03 +1000)]
Merge tag 'drm-intel-fixes-2018-11-22' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix for fastboot DSI panel boot time flicker regression, also fixes Bugzilla #108225
- Fix Bugzilla #101269 to avoid GPU hangs on Sandybridge machines
- Avoid GPU hang on error capture on Broxton with Vt-d enabled
- Avoid missing GPU relocations on Pineview and Bearlake (Gen3)
====================
ibmvnic: Fix queue and buffer accounting errors
This series includes two small fixes. The first resolves a typo bug
in the code to clean up unused RX buffers during device queue removal.
The second ensures that device queue memory is updated to reflect new
supported queue ring sizes after migration to other backing hardware.
====================
Thomas Falcon [Wed, 21 Nov 2018 17:17:59 +0000 (11:17 -0600)]
ibmvnic: Update driver queues after change in ring size support
During device reset, queue memory is not being updated to accommodate
changes in ring buffer sizes supported by backing hardware. Track
any differences in ring buffer sizes following the reset and update
queue memory when possible.
Lorenzo Bianconi [Wed, 21 Nov 2018 15:32:10 +0000 (16:32 +0100)]
net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
Set xdp_prog pointer to NULL if bpf_prog_add fails since that routine
reports the error code instead of NULL in case of failure and xdp_prog
pointer value is used in the driver to verify if XDP is currently
enabled.
Moreover report the error code to userspace if nicvf_xdp_setup fails
Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Daniel Jurgens [Wed, 21 Nov 2018 15:12:05 +0000 (17:12 +0200)]
{net, IB}/mlx4: Initialize CQ buffers in the driver when possible
Perform CQ initialization in the driver when the capability is supported
by the FW. When passing the CQ to HW indicate that the CQ buffer has
been pre-initialized.
Doing so decreases CQ creation time. Testing on P8 showed a single 2048
entry CQ creation time was reduced from ~395us to ~170us, which is
2.3x faster.
Tal Gilboa [Wed, 21 Nov 2018 14:28:23 +0000 (16:28 +0200)]
net/dim: Update DIM start sample after each DIM iteration
On every iteration of net_dim, the algorithm may choose to
check for the system state by comparing current data sample
with previous data sample. After each of these comparisons,
regardless of the action taken, the sample used as baseline
needs to be updated.
This patch fixes a bug that causes DIM to take wrong decisions,
due to never updating the baseline sample for comparison between
iterations. This way, DIM always compares current sample with
zeros.
Although this is a functional fix, it also improves and stabilizes
performance as the algorithm works properly now.
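The effect of a stale baseline can be illustrated with a toy model. This is a minimal sketch, not the net_dim code: the compare() thresholds, the sample values and all function names are assumptions chosen for illustration, and the real algorithm compares several statistics per sample.

```python
def compare(curr, base):
    """Classify curr against base using a toy 10% threshold (assumed value)."""
    if curr > base * 1.1:
        return "better"
    if curr < base * 0.9:
        return "worse"
    return "same"


def run(samples, update_baseline):
    """Feed samples through the comparison, optionally refreshing the baseline."""
    base = 0  # the baseline starts out as zeros
    decisions = []
    for sample in samples:
        decisions.append(compare(sample, base))
        if update_baseline:
            base = sample  # the fix: refresh the baseline after every comparison
    return decisions


samples = [100, 100, 50, 50]                   # hypothetical packet-rate samples
buggy = run(samples, update_baseline=False)    # always compares against zero
fixed = run(samples, update_baseline=True)     # compares against the previous sample
print(buggy)
print(fixed)
```

Without the baseline update every sample looks like an improvement over zero, so the drop from 100 to 50 is never noticed. With the update, the third sample is classified as worse than its predecessor.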
Performance:
Tested single UDP TX stream with pktgen:
samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
-m 24:8a:07:88:26:8b -f 3 -b 128
ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
Also, toggling between profiles is less frequent with the fix.
Fixes: 8115b750dbcb ("net/dim: use struct net_dim_sample as arg to net_dim")
Signed-off-by: Tal Gilboa <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Paolo Abeni [Wed, 21 Nov 2018 13:31:15 +0000 (14:31 +0100)]
selftests: explicitly require kernel features needed by udpgro tests
commit 3327a9c46352f1 ("selftests: add functionals test for UDP GRO")
makes use of ipv6 NAT, but such a feature is not currently implied by
selftests. Since the 'ip[6]tables' commands may actually create nft rules,
depending on the specific user-space version, let's pull both NF and
NFT nat modules plus the needed deps.
Reported-by: Naresh Kamboju <[email protected]>
Fixes: 3327a9c46352f1 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Thu, 22 Nov 2018 16:45:44 +0000 (08:45 -0800)]
Merge tag 'sound-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only significant change is for OSS PCM emulation to convert with
kvzalloc() to address both performance and security issues. It's a
pretty straightforward change, which should be safe.
The rest are, as usual, device-specific small fixes for HD-audio"
* tag 'sound-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/ca0132 - fix AE-5 pincfg
ALSA: hda/ca0132 - Add new ZxR quirk
ALSA: hda/ca0132 - Call pci_iounmap() instead of iounmap()
ALSA: hda/realtek - Add quirk entry for HP Pavilion 15
ALSA: oss: Use kvzalloc() for local buffer allocations
Linus Torvalds [Thu, 22 Nov 2018 16:39:29 +0000 (08:39 -0800)]
Merge tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB fixes for 4.20-rc4.
There's the usual xhci and dwc2/3 fixes as well as a few minor other
issues resolved for problems that have been reported. Full details are
in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdc-acm: add entry for Hiro (Conexant) modem
usb: xhci: Prevent bus suspend if a port connect change or polling state is detected
usb: core: Fix hub port connection events lost
usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers
Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers"
usb: dwc2: pci: Fix an error code in probe
usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove()
xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
usb: xhci: fix timeout for transition from RExit to U0
usb: xhci: fix uninitialized completion when USB3 port got wrong status
xhci: Add check for invalid byte size error when UAS devices are connected.
xhci: handle port status events for removed USB3 hcd
xhci: Fix leaking USB3 shared_hcd at xhci removal
USB: misc: appledisplay: add 20" Apple Cinema Display
USB: quirks: Add no-lpm quirk for Raydium touchscreens
usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB
USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub
usb: dwc3: gadget: Properly check last unaligned/zero chain TRB
usb: dwc3: core: Clean up ULPI device
Linus Torvalds [Thu, 22 Nov 2018 16:31:46 +0000 (08:31 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes.
The qla2xxx is a regression from 4.18 and the ufs one is a device
enablement fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: Fix hynix ufs bug with quirk on hi36xx SoC
scsi: qla2xxx: Timeouts occur on surprise removal of QLogic adapter
Dave Airlie [Thu, 22 Nov 2018 01:19:20 +0000 (11:19 +1000)]
Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
- OD fixes for powerplay
- Vega20 fixes
- KFD fix for Kaveri
- add missing firmware declaration for hainan (SI chip)
- Fix DC user experience regressions related to panels that support >8 bpc
David S. Miller [Thu, 22 Nov 2018 01:10:32 +0000 (17:10 -0800)]
Merge branch 'mlxsw-Add-VxLAN-learning-support'
Ido Schimmel says:
====================
mlxsw: Add VxLAN learning support
This patchset adds VxLAN learning support in the mlxsw driver.
The first five patches from Petr add the required switchdev APIs which
allow device drivers to notify the VxLAN driver about learned / aged-out
FDB entries.
First in patch #1, an unnecessary argument is dropped from
__vxlan_fdb_delete().
In patches #2-#4, the VxLAN FDB handling code is extended to make
sending the switchdev events configurable; to mark user-added entries as
such; and to make sure HW-learned FDB entries do not take over
user-added ones.
Finally in patch #5, the necessary switchdev notifications are added and
handled by VxLAN, similarly to how this is handled in the bridge driver.
Patch #6 allows changing of the VxLAN's device ageing time since it is
useful for the selftest in the last patch.
Patch #7 adds support for querying bridge port flags of a given
netdevice, as a new entry should not be learned and notified to the
bridge driver in case learning is disabled on the bridge port.
Next patches gradually add learning support in mlxsw.
The last patch adds a new test case for VxLAN learning.
====================
Ido Schimmel [Wed, 21 Nov 2018 08:02:47 +0000 (08:02 +0000)]
mlxsw: spectrum_switchdev: Process learned VxLAN FDB entries
Start processing two new entry types in addition to current ones:
* Learned unicast tunnel entry
* Aged-out unicast tunnel entry
In both cases the device reports on a new {MAC, FID, IP address} tuple
that was learned / aged-out. Based on this notification, the driver
instructs the device to add / delete the entry to / from its database.
The driver also makes sure to notify the bridge and VxLAN drivers about
the new entry.
Ido Schimmel [Wed, 21 Nov 2018 08:02:46 +0000 (08:02 +0000)]
mlxsw: spectrum_nve: Add API to resolve learned IP addresses
FDB notifications for entries learned from an NVE tunnel contain the IP
address of the remote VTEP. In the case of IPv4 underlay, the IP address
is specified as-is. IPv6 addresses on the other hand, are specified as
handles which then need to be used to query the actual address from the
device.
Only IPv4 underlay is currently supported, so we cannot receive
notifications for IPv6 addresses and therefore an error is returned when
one tries to resolve such an address.
Ido Schimmel [Wed, 21 Nov 2018 08:02:44 +0000 (08:02 +0000)]
mlxsw: spectrum_fid: Store ifindex of NVE device in FID
The driver periodically polls for new FDB entries learned by the device.
In the case of an FDB entry learned from a VxLAN tunnel, the
notification includes the IP of the remote VTEP, the filtering
identifier (FID) and the source MAC address of the overlay packet.
Assuming learning is enabled in the VxLAN and bridge drivers, the driver
needs to generate a notification and update them about the new FDB
entry.
Store the ifindex of the NVE device in the FID so that the driver will
be able to update the VxLAN and bridge drivers using it.
Ido Schimmel [Wed, 21 Nov 2018 08:02:41 +0000 (08:02 +0000)]
bridge: Allow querying bridge port flags
Allow querying bridge port flags so that drivers capable of performing
VxLAN learning will update the bridge driver only if learning is enabled
on its bridge port corresponding to the VxLAN device.
Ido Schimmel [Wed, 21 Nov 2018 08:02:40 +0000 (08:02 +0000)]
vxlan: Allow changing ageing time
In a similar fashion to the bridge device, allow changing the ageing
time of the VxLAN device by scheduling its timer to fire if the ageing
time changed.
One use case is selftests where learning / ageing of VxLAN FDB entries
is tested. The default ageing time is 5 minutes, which is too long for a
simple selftest.
Petr Machata [Wed, 21 Nov 2018 08:02:39 +0000 (08:02 +0000)]
vxlan: Add hardware FDB learning
In order to allow devices to signal learning events to VXLAN, introduce
two new switchdev messages: SWITCHDEV_VXLAN_FDB_ADD_TO_BRIDGE and
SWITCHDEV_VXLAN_FDB_DEL_TO_BRIDGE.
Listen to these notifications in the vxlan driver. The FDB entries
learned this way have an NTF_EXT_LEARNED flag, and only entries marked
as such can be unlearned by the _DEL_ event. They are also immediately
marked as offloaded. This is the same behavior that the bridge driver
observes.
Petr Machata [Wed, 21 Nov 2018 08:02:37 +0000 (08:02 +0000)]
vxlan: Don't override user-added entries with ext-learned ones
When an external learning event collides with a user-added entry, the
user-added entry shouldn't be taken over. Otherwise on an unlearn event
the entry would be completely lost, even though the user added it by
hand.
Therefore skip update of FDB flags and state for these cases. This is in
accordance with the bridge behavior.
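The rule described above can be sketched with a small in-memory table. This is an illustrative model only: the flag names and values, the dictionary layout and the function names are hypothetical stand-ins, not the vxlan driver's data structures.

```python
FLAG_EXT_LEARNED = 0x1     # hypothetical stand-in for the ext-learned marking
FLAG_ADDED_BY_USER = 0x2   # hypothetical stand-in for the private user-added flag

fdb = {}  # MAC address -> {"vtep": remote IP, "flags": bitmask}


def user_add(mac, vtep):
    """Entry configured by hand."""
    fdb[mac] = {"vtep": vtep, "flags": FLAG_ADDED_BY_USER}


def ext_learn(mac, vtep):
    """Entry reported by hardware; must not take over a user-added one."""
    entry = fdb.get(mac)
    if entry is not None and entry["flags"] & FLAG_ADDED_BY_USER:
        return  # collision: skip the update of flags and state
    fdb[mac] = {"vtep": vtep, "flags": FLAG_EXT_LEARNED}


def ext_unlearn(mac):
    """Only entries marked as ext-learned can be removed by an unlearn event."""
    entry = fdb.get(mac)
    if entry is not None and entry["flags"] & FLAG_EXT_LEARNED:
        del fdb[mac]


user_add("00:11:22:33:44:55", "10.0.0.1")
ext_learn("00:11:22:33:44:55", "10.0.0.2")   # ignored, the user entry wins
ext_unlearn("00:11:22:33:44:55")             # ignored, the entry is not ext-learned
ext_learn("66:77:88:99:aa:bb", "10.0.0.3")
ext_unlearn("66:77:88:99:aa:bb")             # removed again
```

After this sequence the user-added entry is still present with its original remote address, which is the outcome the commit message asks for.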
Petr Machata [Wed, 21 Nov 2018 08:02:36 +0000 (08:02 +0000)]
vxlan: Mark user-added FDB entries
The VXLAN driver needs to differentiate between FDB entries learned by
the VXLAN driver, and those added by the user. The latter ones shouldn't
be taken over by external learning events. This is in accordance with
bridge behavior.
Therefore, extend the flags bitfield to 16 bits and add a new private
NTF flag to mark the user-added entries.
This seems preferable to adding a dedicated boolean, because passing the
flag, unlike passing e.g. a true, makes it clear what the meaning of the
bit is.
Petr Machata [Wed, 21 Nov 2018 08:02:35 +0000 (08:02 +0000)]
vxlan: vxlan_fdb_notify(): Make switchdev notification configurable
In a following patch, vxlan is extended to allow hardware FDB learning.
For FDB entries learned this way, switchdev notifications should not be
sent again, because the driver already knows about these entries.
To that end, add an argument to vxlan_fdb_notify() to determine whether
the switchdev notifications should be sent. Propagate the argument to
all call sites transitively, eventually passing true in all root calls.
Petr Machata [Wed, 21 Nov 2018 08:02:34 +0000 (08:02 +0000)]
vxlan: __vxlan_fdb_delete(): Drop unused argument vid
This argument is necessary for vxlan_fdb_delete(), the API of which is
prescribed by ndo_fdb_del, but __vxlan_fdb_delete() doesn't need it.
Therefore drop it.
Vincent Chen [Wed, 21 Nov 2018 01:38:11 +0000 (09:38 +0800)]
net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
In the original ftmac100_interrupt(), the interrupts are only disabled when
the condition "netif_running(netdev)" is true. However, this condition
causes a kernel hang in the following case. When the user requests to
disable the network device, kernel will clear the bit __LINK_STATE_START
from the dev->state and then call the driver's ndo_stop function. Network
device interrupts are not blocked during this process. If an interrupt
occurs between clearing __LINK_STATE_START and stopping network device,
kernel cannot disable the interrupts due to the condition
"netif_running(netdev)" in the ISR. Hence, kernel will hang due to the
continuous interruption of the network device.
In order to solve the above problem, the interrupts of the network device
should always be disabled in the ISR without being restricted by the
condition "netif_running(netdev)".
Y.C. Chen [Wed, 3 Oct 2018 06:57:47 +0000 (14:57 +0800)]
drm/ast: change resolution may cause screen blurred
The value of pitches is not correct while calling mode_set.
So far the issue has been found on the following systems:
- Debian8 with XFCE Desktop
- Ubuntu with KDE Desktop
- SUSE15 with KDE Desktop
David S. Miller [Thu, 22 Nov 2018 00:14:56 +0000 (16:14 -0800)]
Merge branch 'smc-fixes'
Ursula Braun says:
====================
net/smc: fixes 2018-11-12
here is V4 of some net/smc fixes in different areas for the net tree.
v1->v2:
do not define 8-byte alignment for union smcd_cdc_cursor in
patch 4/5 "net/smc: atomic SMCD cursor handling"
v2->v3:
stay with 8-byte alignment for union smcd_cdc_cursor in
patch 4/5 "net/smc: atomic SMCD cursor handling", but get rid of
__packed for struct smcd_cdc_msg
v3->v4:
get rid of another __packed for struct smc_cdc_msg in
patch 4/5 "net/smc: atomic SMCD cursor handling"
====================
Ursula Braun [Tue, 20 Nov 2018 15:46:43 +0000 (16:46 +0100)]
net/smc: use after free fix in smc_wr_tx_put_slot()
In smc_wr_tx_put_slot() field pend->idx is used after being
cleared. That means idx 0 is always cleared in the wr_tx_mask.
This results in a broken administration of available WR send
payload buffers.
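The bug pattern can be modelled in a few lines. This is a sketch under assumptions: the Pend class, the integer bitmask and both function names are hypothetical, and saving the index before clearing is one plausible way to fix it rather than a quote of the actual patch.

```python
class Pend:
    """Toy stand-in for a pending send slot descriptor."""

    def __init__(self, idx):
        self.idx = idx


def put_slot_buggy(mask, pend):
    pend.idx = 0                      # models clearing the descriptor first
    return mask & ~(1 << pend.idx)    # idx is now 0, so bit 0 is always cleared


def put_slot_fixed(mask, pend):
    idx = pend.idx                    # remember the slot before clearing
    pend.idx = 0
    return mask & ~(1 << idx)         # clears the slot that was actually in use


in_use = 0b1001                       # slots 0 and 3 are busy
buggy = put_slot_buggy(in_use, Pend(3))
fixed = put_slot_fixed(in_use, Pend(3))
print(bin(buggy), bin(fixed))
```

In the buggy variant slot 3 stays marked busy forever while slot 0 is released although it is still in use, which matches the "broken administration" of send buffers described above.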
Hans Wippel [Tue, 20 Nov 2018 15:46:41 +0000 (16:46 +0100)]
net/smc: add SMC-D shutdown signal
When a SMC-D link group is freed, a shutdown signal should be sent to
the peer to indicate that the link group is invalid. This patch adds the
shutdown signal to the SMC code.
Karsten Graul [Tue, 20 Nov 2018 15:46:40 +0000 (16:46 +0100)]
net/smc: use queue pair number when matching link group
When searching for an existing link group the queue pair number is also
to be taken into consideration. When the SMC server sends a new number
in a CLC packet (keeping all other values equal) then a new link group
is to be created on the SMC client side.
Hans Wippel [Tue, 20 Nov 2018 15:46:39 +0000 (16:46 +0100)]
net/smc: abort CLC connection in smc_release
In case of a non-blocking SMC socket, the initial CLC handshake is
performed over a blocking TCP connection in a worker. If the SMC socket
is released, smc_release has to wait for the blocking CLC socket
operations (e.g., kernel_connect) inside the worker.
This patch aborts a CLC connection when the respective non-blocking SMC
socket is released to avoid waiting on socket operations or timeouts.
Eric Dumazet [Tue, 20 Nov 2018 13:53:59 +0000 (05:53 -0800)]
tcp: defer SACK compression after DupThresh
Jean-Louis reported a TCP regression and bisected to recent SACK
compression.
After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).
Sender waits a full RTO timer before recovering losses.
While RFC 6675 says in section 5, "Algorithm Details",
(2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
indicating at least three segments have arrived above the current
cumulative acknowledgment point, which is taken to indicate loss
-- go to step (4).
...
(4) Invoke fast retransmit and enter loss recovery as follows:
there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.
While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.
This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.
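A minimal sketch of that rule, assuming a simple per-connection counter. The class, method and field names are hypothetical and the real TCP stack keeps this state differently; only the threshold of 3 comes from the text above.

```python
DUP_THRESH = 3  # dupacks that must go out before compression may start


class Receiver:
    """Toy receiver that decides whether a SACK may be compressed."""

    def __init__(self):
        self.dupacks_since_rcv_nxt = 0

    def on_in_order_data(self):
        # rcv_nxt advanced, so the dupack count starts over
        self.dupacks_since_rcv_nxt = 0

    def on_out_of_order_data(self):
        # Send the first DUP_THRESH dupacks immediately so that senders
        # counting dupacks can still trigger fast retransmit.
        if self.dupacks_since_rcv_nxt < DUP_THRESH:
            self.dupacks_since_rcv_nxt += 1
            return "send_now"
        return "compress"


rx = Receiver()
burst = [rx.on_out_of_order_data() for _ in range(5)]
rx.on_in_order_data()
after_progress = rx.on_out_of_order_data()
print(burst, after_progress)
```

The first three out-of-order arrivals each produce an immediate SACK. Compression is only considered from the fourth onward, and the count resets once rcv_nxt moves.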
Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.
Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.
Petr Machata [Tue, 20 Nov 2018 11:39:56 +0000 (11:39 +0000)]
net: skb_scrub_packet(): Scrub offload_fwd_mark
When a packet is trapped and the corresponding SKB marked as
already-forwarded, it retains this marking even after it is forwarded
across veth links into another bridge. There, since it ingresses the
bridge over veth, which doesn't have offload_fwd_mark, it triggers a
warning in nbp_switchdev_frame_mark().
Then nbp_switchdev_allowed_egress() decides not to allow egress from
this bridge through another veth, because the SKB is already marked, and
the mark (of 0) of course matches. Thus the packet is incorrectly
blocked.
Solve by resetting offload_fwd_mark in skb_scrub_packet(). That
function is called from tunnels and also from veth, and thus catches the
cases where traffic is forwarded between bridges and transformed in a
way that invalidates the marking.
Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
Signed-off-by: Petr Machata <[email protected]>
Suggested-by: Ido Schimmel <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Daniel Borkmann [Wed, 21 Nov 2018 22:33:22 +0000 (23:33 +0100)]
Merge branch 'bpf-libbpf-mapinmap'
Nikita V. Shirokov says:
====================
In this patch series I'm adding a helper for libbpf which would allow
it to load map-in-map (BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS).
The first patch contains the new helper and explains the proposed workflow; the
second patch contains tests which could also be used as example usage.
v4->v5:
- naming: renamed everything to map_in_map instead of mapinmap
- start to return nonzero val if set_inner_map_fd failed
v3->v4:
- renamed helper to set_inner_map_fd
- now we set this value only if it hasn't
been set before and only for (array|hash) of maps
v2->v3:
- fixing typo in patch description
- initializing inner_map_fd to -1 by default
v1->v2:
- addressing nits
- removing const identifier from fd in new helper
- starting to check return val for bpf_map_update_elem
====================
The idea is pretty simple: for a specified map (pointed to by struct bpf_map)
we provide the descriptor of an already loaded map, which is going to be
used as a prototype for the inner map. Proposed workflow:
1) open bpf's object (bpf_object__open)
2) create bpf's map which is going to be used as a prototype
3) find (by name) map-in-map which you want to load and update w/
descriptor of inner map w/ a new helper from this patch
4) load bpf program w/ bpf_object__load
bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities
It currently only checks whether kernel supports map/prog names.
This capability check will be used in the next two commits to
skip setting prog/map names.
Yonghong Song [Wed, 21 Nov 2018 19:22:42 +0000 (11:22 -0800)]
bpf: fix a libbpf loader issue
Commit 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
added support to read .BTF.ext sections from an object file, create
and pass prog_btf_fd and func_info to the kernel.
The program btf_fd (prog->btf_fd) is initialized to be -1 to please
zclose so we do not need special handling during prog close.
Passing -1 to the kernel, however, will cause loading error.
Passing btf_fd 0 to the kernel if prog->btf_fd is invalid
fixed the problem.
Linus Torvalds [Wed, 21 Nov 2018 19:28:20 +0000 (11:28 -0800)]
Merge tag 'riscv-for-linus-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"This week is a bit bigger than I expected. That's my fault, as I
missed a few patches while I was at Plumbers last week. We have:
- A fix to a quite embarrassing issue where raw_copy_to_user() was
implemented with asm_copy_from_user() (and vice versa).
- Improvements to our makefile to allow flat binaries to be
generated.
- A build fix that predeclares "struct module" at the top of
<asm/module.h>, which triggers warnings later in that header.
- The addition of our own <uapi/asm/unistd.h> header, which is
necessary to align our stat ABI on 32-bit systems.
- A fix to avoid printing a warning when the S or U bits are set in
print_isa().
I already have one patch in the queue for next week"
* tag 'riscv-for-linus-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: recognize S/U mode bits in print_isa
riscv: add asm/unistd.h UAPI header
riscv: fix warning in arch/riscv/include/asm/module.h
RISC-V: Build flat and compressed kernel images
RISC-V: Fix raw_copy_{to,from}_user()
Paul E. McKenney [Sun, 11 Nov 2018 19:43:42 +0000 (11:43 -0800)]
ixgbe: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu(). This commit therefore makes this change.
While reviewing code, I noticed that Eric Dumazet recommends that
drivers check the return code of napi_complete_done, and use that
to decide to enable interrupts or not when exiting poll. One of
the Intel drivers was already fixed (ixgbe).
Upon looking at the Intel drivers as a whole, we are handling our
polling and NAPI exit in a few different ways based on whether we
have multiqueue and whether we have Tx cleanup included. Several
drivers had the bug of exiting NAPI with return 0, which appears
to mess up the accounting in the stack.
Consolidate all the NAPI routines to do best known way of exiting
and to just mostly look like each other.
1) check return code of napi_complete_done to control interrupt enable
2) return the actual amount of work done.
3) return budget immediately if need NAPI poll again
Tested the changes on e1000e with a high interrupt rate set, and
it shows about an 8% reduction in the CPU utilization when busy
polling because we aren't re-enabling interrupts when we're about
to be polled.
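The three exit rules can be sketched as a toy poll routine. This is a model under assumptions, not driver code: poll(), its parameters and the callbacks are hypothetical, with complete_done standing in for the boolean result of napi_complete_done.

```python
def poll(budget, pending, complete_done, enable_irq):
    """Toy NAPI poll: returns the value a driver would hand back to the stack."""
    work_done = min(pending, budget)   # clean at most `budget` packets
    if work_done == budget:
        return budget                  # (3) budget used up: poll again, irqs stay off
    if complete_done(work_done):       # (1) re-arm only if polling really finished
        enable_irq()
    return work_done                   # (2) report the actual amount of work


irq_events = []


def enable():
    irq_events.append("enabled")


idle = poll(64, 10, lambda done: True, enable)    # finished, interrupts re-enabled
busy = poll(64, 10, lambda done: False, enable)   # busy polling still owns the queue
full = poll(64, 100, lambda done: True, enable)   # more work than budget
print(idle, busy, full, irq_events)
```

Only the first call re-enables interrupts. In the busy-polling case the interrupt stays masked, which is where the CPU saving mentioned above would come from.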
Dave Chinner [Wed, 21 Nov 2018 16:06:37 +0000 (08:06 -0800)]
iomap: readpages doesn't zero page tail beyond EOF
When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.
However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.
Firstly, when calculating the end block of the EOF byte, we have
to subtract one from the size to prevent a block-aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.
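The off-by-one is easy to check with a small calculation. In this sketch blkbits is the log2 of the block size (10 for 1k blocks); the function names are illustrative and a non-empty file (isize > 0) is assumed.

```python
def end_block_buggy(isize, blkbits):
    """Index computed from the size itself: one too large when EOF is block aligned."""
    return isize >> blkbits


def end_block(isize, blkbits):
    """Index of the block holding the last byte of the file (assumes isize > 0)."""
    return (isize - 1) >> blkbits


BLKBITS = 10  # 1024-byte blocks

for isize in (1, 1023, 1024, 1025, 2048):
    print(isize, end_block_buggy(isize, BLKBITS), end_block(isize, BLKBITS))
```

For sizes that are not block aligned both variants agree. They differ only at 1024 and 2048, where the buggy form points one block past the data.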
The second bug is determining if the current page spans EOF, and so
whether we need to split it into two halves, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.
Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.
This patch results in correct EOF detection through readpages:
As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.
The EOF crossing detection really needs to use the original length,
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:
Here we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.
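The numbers in that description can be reproduced with a short calculation. This is an illustrative helper only: split_eof_page() and its parameters are hypothetical names, and a 4096-byte page is assumed.

```python
PAGE_SIZE = 4096  # assumed page size


def split_eof_page(poff, eof_off, page_size=PAGE_SIZE):
    """Split the EOF page into a range to read and a range to zero.

    poff is the offset of the first block that is not yet up to date and
    eof_off is the offset of EOF within the page. Both ranges are returned
    as inclusive (first_byte, last_byte) pairs.
    """
    read_range = (poff, eof_off - 1)
    zero_range = (eof_off, page_size - 1)
    return read_range, zero_range


read_range, zero_range = split_eof_page(poff=1024, eof_off=3072)
print(read_range, zero_range)
```

With poff = 1024 and EOF at offset 3072 this yields the read range 1024 - 3071 and the zero range 3072 - 4095 quoted above.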
This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.
It returns EINVAL when the operation is not supported by the
filesystem. Fix it to return EOPNOTSUPP to be consistent with
the man page and clone_file_range().
Clean up the inconsistent error return handling while I'm there.
(I know, lipstick on a pig, but every little bit helps...)
Dave Chinner [Mon, 19 Nov 2018 21:31:11 +0000 (13:31 -0800)]
iomap: dio data corruption and spurious errors when pipes fill
When doing direct IO to a pipe for do_splice_direct(), the pipe is
trivial to fill up and overflow as it can only hold 16 pages. At
this point bio_iov_iter_get_pages() then returns -EFAULT, and we
abort the IO submission process. Unfortunately, iomap_dio_rw()
propagates the error back up the stack.
The error is converted from the EFAULT to EAGAIN in
generic_file_splice_read() to tell the splice layers that the pipe
is full. do_splice_direct() completely fails to handle EAGAIN errors
(it aborts on error) and returns EAGAIN to the caller.
copy_file_write() then completely fails to handle EAGAIN as well,
and so returns EAGAIN to userspace, having failed to copy the data
it was asked to.
Avoid this whole steaming pile of fail by having iomap_dio_rw()
silently swallow EFAULT errors and so do short reads.
To make matters worse, iomap_dio_actor() has a stale data exposure
bug when bio_iov_iter_get_pages() fails - it does not zero the tail block
that may have been left uncovered by partial IO. Fix the error
handling case to drop to the sub-block zeroing rather than
immediately returning the -EFAULT error.
Dave Chinner [Mon, 19 Nov 2018 21:31:10 +0000 (13:31 -0800)]
iomap: sub-block dio needs to zeroout beyond EOF
If we are doing sub-block dio that extends EOF, we need to zero
the unused tail of the block to initialise the data in it. If we
do not zero the tail of the block, then an immediate mmap read of
the EOF block will expose stale data beyond EOF to userspace. Found
with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.
Fix this by detecting if the end of the DIO write is beyond EOF
and zeroing the tail if necessary.
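How much of the block has to be zeroed can be sketched as follows. This is a simplified model: tail_to_zero() and its parameters are hypothetical, and the exact boundary test (end > isize here) is an assumption rather than the kernel's condition.

```python
def tail_to_zero(pos, length, isize, block_size):
    """Bytes to zero after a write of `length` bytes at `pos`.

    Returns 0 when the write does not end beyond EOF or already ends on a
    block boundary; otherwise the unused remainder of the last block.
    """
    end = pos + length
    if end <= isize:
        return 0                      # write does not extend the file
    partial = end % block_size        # bytes used in the last block
    return 0 if partial == 0 else block_size - partial


print(tail_to_zero(pos=0, length=512, isize=0, block_size=4096))
print(tail_to_zero(pos=0, length=4096, isize=0, block_size=4096))
print(tail_to_zero(pos=0, length=512, isize=8192, block_size=4096))
```

A 512-byte write into an empty file with 4k blocks leaves 3584 bytes of the block that would otherwise hold stale data. The other two calls need no zeroing.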
Dave Chinner [Mon, 19 Nov 2018 21:31:10 +0000 (13:31 -0800)]
iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
When we write into an unwritten extent via direct IO, we dirty
metadata on IO completion to convert the unwritten extent to
written. However, when we do the FUA optimisation checks, the inode
may be clean and so we issue a FUA write into the unwritten extent.
This means we then bypass the generic_write_sync() call after
unwritten extent conversion has been done and we don't force the
modified metadata to stable storage.
This violates O_DSYNC semantics. The window of exposure is a single
IO, as the next DIO write will see the inode has dirty metadata and
hence will not use the FUA optimisation. Calling
generic_write_sync() after completion of the second IO will also
sync the first write and its metadata.
Fix this by avoiding the FUA optimisation when writing to unwritten
extents.
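As a rough illustration of the decision described above (a sketch with
hypothetical names, not the iomap code), the FUA shortcut is only safe
when IO completion will not dirty any metadata:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the state consulted for one DIO write. */
struct dio_write_ctx {
	bool o_dsync;          /* file opened with O_DSYNC */
	bool inode_meta_dirty; /* inode already has dirty metadata */
	bool extent_unwritten; /* target extent is unwritten */
	bool device_has_fua;   /* block device supports FUA writes */
};

static bool dio_can_use_fua(const struct dio_write_ctx *c)
{
	if (!c->o_dsync || !c->device_has_fua)
		return false;
	/* Dirty metadata needs a full sync anyway. */
	if (c->inode_meta_dirty)
		return false;
	/*
	 * Completion will convert unwritten -> written and dirty metadata,
	 * so a FUA data write alone does not satisfy O_DSYNC.
	 */
	if (c->extent_unwritten)
		return false;
	return true;
}
```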
Dave Chinner [Tue, 20 Nov 2018 06:50:08 +0000 (22:50 -0800)]
xfs: delalloc -> unwritten COW fork allocation can go wrong
Long saga. There have been days spent following this through dead end
after dead end in multi-GB event traces. This morning, after writing
a trace-cmd wrapper that enabled me to be more selective about XFS
trace points, I discovered that I could get just enough essential
tracepoints enabled that there was a 50:50 chance the fsx config
would fail at ~115k ops. If it didn't fail at op 115547, I stopped
fsx at op 115548 anyway.
That gave me two traces - one where the problem manifested, and one
where it didn't. After refining the traces to have the necessary
information, I found that in the failing case there was a real
extent in the COW fork compared to an unwritten extent in the
working case.
Walking back through the two traces to the point where the COW fork
extents actually diverged, I found that the bad case had an extra
unwritten extent in it. This is likely because the bug it led me to
had triggered multiple times in those 115k ops, leaving stray
COW extents around. What I saw was a COW delalloc conversion to an
unwritten extent (as they should always be through
xfs_iomap_write_allocate()) resulted in a /written extent/:
xfs_writepage: dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0
xfs_iext_remove: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real
xfs_bmap_pre_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real
xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex
And the result according to the xfs_bmap_post_update trace was:
0 1 32 52
+H+wwwwwwwwwwwwwwwwwwwwwwww+
PREV
Which is clearly wrong - it should be a merged unwritten extent,
not a written extent.
That led me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG
case in xfs_bmap_add_extent_delay_real(), and sure enough, there's
the bug.
It takes the old delalloc extent (PREV) and adds the length of the
RIGHT extent to it, takes the start block from NEW, removes the
RIGHT extent and then updates PREV with the new extent.
What it fails to do is update PREV.br_state. For delalloc, this is
always XFS_EXT_NORM, while in this case we are converting the
delayed allocation to unwritten, so it needs to be updated to
XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so
the resultant extent is always written.
And that's the bug I've been chasing for a week - a bmap btree bug,
not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced
with the recent in core extent tree scalability enhancements.
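The missing step can be shown with a small user-space model. The types
and the function name below are simplified stand-ins invented for this
sketch, not the XFS structures; the values in the usage note are taken
from the trace above:

```c
#include <assert.h>
#include <stdint.h>

enum ext_state { EXT_NORM, EXT_UNWRITTEN }; /* stand-ins for XFS_EXT_* */

struct extent {              /* simplified stand-in for an extent record */
	uint64_t startoff;   /* file offset, in blocks */
	uint64_t startblock; /* disk block */
	uint64_t blockcount; /* length, in blocks */
	enum ext_state state;
};

/* LF|RF|RC: the new allocation replaces all of PREV and merges with RIGHT. */
static struct extent merge_filling_right_contig(struct extent prev,
						const struct extent *newext,
						const struct extent *right)
{
	prev.startblock = newext->startblock;
	prev.blockcount += right->blockcount;
	prev.state = newext->state; /* the update the buggy path skipped */
	return prev;
}
```

Feeding in PREV (offset 1, count 31, delalloc), NEW (block 121,
unwritten) and RIGHT (offset 32, block 152, count 20, unwritten) gives
offset 1, block 121, count 51, and with the state update the merged
extent stays unwritten.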
Dave Chinner [Mon, 19 Nov 2018 21:31:10 +0000 (13:31 -0800)]
xfs: flush removing page cache in xfs_reflink_remap_prep
On a sub-page block size filesystem, fsx is failing with a data
corruption after a series of operations involving copying a file
with the destination offset beyond EOF of the destination file:
8093(157 mod 256): TRUNCATE DOWN from 0x7a120 to 0x50000 ******WWWW
8094(158 mod 256): INSERT 0x25000 thru 0x25fff (0x1000 bytes)
8095(159 mod 256): COPY 0x18000 thru 0x1afff (0x3000 bytes) to 0x2f400
8096(160 mod 256): WRITE 0x5da00 thru 0x651ff (0x7800 bytes) HOLE
8097(161 mod 256): COPY 0x2000 thru 0x5fff (0x4000 bytes) to 0x6fc00
The second copy here is beyond EOF, and it is to a sub-page (4k) but
block aligned (1k) offset. The clone runs the EOF zeroing, landing
in a pre-existing post-eof delalloc extent. This zeroes the post-eof
extents in the page cache just fine, dirtying the pages correctly.
The problem is that xfs_reflink_remap_prep() now truncates the page
cache over the range that it is copying to, and rounds that down
to cover the entire start page. This removes the dirty page over the
delalloc extent from the page cache without having written it back.
Hence later, when the page cache is flushed, the page at offset
0x6f000 has not been written back and hence exposes stale data,
which fsx trips over less than 10 operations later.
Fix this by changing xfs_reflink_remap_prep() to use
xfs_flush_unmap_range().
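The rounding that pulls the dirty page into the truncated range can be
reproduced with plain arithmetic. The helper below is hypothetical and
only models rounding a byte range out to a power-of-two granule (the 4k
page size from the example above); it is not the XFS helper itself:

```c
#include <assert.h>
#include <stdint.h>

struct byte_range {
	uint64_t start;
	uint64_t end; /* inclusive */
};

/* Hypothetical helper; granule must be a power of two. */
static struct byte_range round_out(uint64_t offset, uint64_t len,
				   uint64_t granule)
{
	struct byte_range r;

	r.start = offset & ~(granule - 1);
	r.end = ((offset + len + granule - 1) & ~(granule - 1)) - 1;
	return r;
}
```

For the second COPY (0x4000 bytes to 0x6fc00) this yields 0x6f000 to
0x73fff, so the dirty page at 0x6f000 is inside the range and has to be
written back before it is dropped from the page cache.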
Merge tag 'linux-cpupower-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower utility updates for 4.20-rc4 from Shuah Khan:
"This cpupower update for Linux 4.20-rc4 consists of compile fixes to allow
use of outside build flags and override of CFLAGS from Jiri Olsa, and fix
to compilation with STATIC=true from Konstantin Khlebnikov."
* tag 'linux-cpupower-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
tools cpupower: Override CFLAGS assignments
tools cpupower debug: Allow to use outside build flags
tools/power/cpupower: fix compilation with STATIC=true
Ville Syrjälä [Tue, 20 Nov 2018 13:54:50 +0000 (15:54 +0200)]
drm/i915: Add rotation readout for plane initial config
If we need to force a full plane update before userspace/fbdev
have given us a proper plane state we should try to maintain the
current plane state as much as possible (apart from the parts
of the state we're trying to fix up with the plane update).
To that end add basic readout for the plane rotation and
maintain it during the initial fb takeover.
Ville Syrjälä [Tue, 20 Nov 2018 13:54:49 +0000 (15:54 +0200)]
drm/i915: Force a LUT update in intel_initial_commit()
If we force a plane update to fix up our half populated plane state
we'll also force on the pipe gamma for the plane (since we always
enable pipe gamma currently). If the BIOS hasn't programmed a sensible
LUT into the hardware this will cause the image to become corrupted.
Typical symptoms are a purple/yellow/etc. flash when the driver loads.
To avoid this let's program something sensible into the LUT when
we do the plane update. In the future I plan to add proper plane
gamma enable readout so this is just a temporary measure.
Hans de Goede [Mon, 19 Nov 2018 18:06:01 +0000 (19:06 +0100)]
ACPI / platform: Add SMB0001 HID to forbidden_id_list
Many HP AMD based laptops contain an SMB0001 device like this:
Device (SMBD)
{
Name (_HID, "SMB0001") // _HID: Hardware ID
Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings
{
IO (Decode16,
0x0B20, // Range Minimum
0x0B20, // Range Maximum
0x20, // Alignment
0x20, // Length
)
IRQ (Level, ActiveLow, Shared, )
{7}
})
}
The legacy style IRQ resource here causes acpi_dev_get_irqresource() to
be called with legacy=true and this message to show in dmesg:
ACPI: IRQ 7 override to edge, high
This causes issues when later on the AMDI0030 GPIO device gets enumerated:
Device (GPIO)
{
Name (_HID, "AMDI0030") // _HID: Hardware ID
Name (_CID, "AMDI0030") // _CID: Compatible ID
Name (_UID, Zero) // _UID: Unique ID
Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings
{
Name (RBUF, ResourceTemplate ()
{
Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, )
{
0x00000007,
}
Memory32Fixed (ReadWrite,
0xFED81500, // Address Base
0x00000400, // Address Length
)
})
Return (RBUF) /* \_SB_.GPIO._CRS.RBUF */
}
}
Now acpi_dev_get_irqresource() gets called with legacy=false, but because
of the earlier override of the trigger-type acpi_register_gsi() returns
-EBUSY (because we try to register the same interrupt with a different
trigger-type) and we end up setting IORESOURCE_DISABLED in the flags.
The setting of IORESOURCE_DISABLED causes platform_get_irq() to call
acpi_irq_get() which is not implemented on x86 and returns -EINVAL,
resulting in the following in dmesg:
amd_gpio AMDI0030:00: Failed to get gpio IRQ: -22
amd_gpio: probe of AMDI0030:00 failed with error -22
The SMB0001 is a "virtual" device in the sense that the only way the OS
interacts with it is through calling a couple of methods to do SMBus
transfers. As such it is weird that it has IO and IRQ resources at all,
because the driver for it is not expected to ever access the hardware
directly.
The Linux driver for the SMB0001 device directly binds to the acpi_device
through the acpi_bus, so we do not need to instantiate a platform_device
for this ACPI device. This commit adds the SMB0001 HID to the
forbidden_id_list, avoiding the instantiating of a platform_device for it.
Not instantiating a platform_device means we will no longer call
acpi_dev_get_irqresource() for the legacy IRQ resource, fixing the probe of
the AMDI0030 device failing.
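The effect of the list can be sketched as a simple table lookup. This
is a user-space illustration only: the array contents and the function
name are assumptions for the example, and the real forbidden_id_list
holds other entries as well:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only, terminated by NULL. */
static const char *const forbidden_ids[] = { "SMB0001", NULL };

/* Hypothetical helper: should a platform_device be created for this HID? */
static bool should_create_platform_device(const char *hid)
{
	for (size_t i = 0; forbidden_ids[i]; i++)
		if (strcmp(hid, forbidden_ids[i]) == 0)
			return false;
	return true;
}
```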
drm/fb-helper: Blacklist writeback when adding connectors to fbdev
Writeback connectors do not produce any on-screen output and require
special care for use. Such connectors are hidden from enumeration in
DRM resources by default, but they are still picked-up by fbdev.
This makes rather little sense since fbdev is not really adapted for
dealing with writeback.
Moreover, this is also a source of issues when userspace disables the
CRTC (and associated plane) without detaching the CRTC from the
connector (which is hidden by default). In this case, the connector is
still using the CRTC, leading to an "enabled/connectors mismatch" and
eventually the failure of the associated atomic commit. This situation
happens with VC4 testing under IGT GPU Tools.
Filter out writeback connectors in the fbdev helper to solve this.
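A minimal sketch of such a filter, using stand-in types rather than the
DRM structures (the enum values, struct and function name are all
hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the DRM connector types. */
enum connector_type { CONNECTOR_HDMI, CONNECTOR_WRITEBACK };

struct connector {
	int id;
	enum connector_type type;
};

/* Copy pointers to every non-writeback connector into out[]. */
static size_t collect_fbdev_connectors(const struct connector *all, size_t n,
				       const struct connector **out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		/* Writeback connectors produce no on-screen output. */
		if (all[i].type == CONNECTOR_WRITEBACK)
			continue;
		out[kept++] = &all[i];
	}
	return kept;
}
```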
Chris Wilson [Mon, 19 Nov 2018 15:41:53 +0000 (15:41 +0000)]
drm/i915: Write GPU relocs harder with gen3
Under moderate amounts of GPU stress, we can observe on Bearlake and
Pineview (later gen3 models) that we execute the following batch buffer
before the write into the batch is coherent. Adding extra (tested with
up to 32x) MI_FLUSH to either the invalidation, flush or both phases does
not solve the incoherency issue with the relocations, but emitting the
MI_STORE_DWORD_IMM twice does. So be it.
David S. Miller [Wed, 21 Nov 2018 04:59:27 +0000 (20:59 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2018-11-20
This series contains updates to the ice driver only.
Akeem updates the driver to determine whether or not to do
auto-negotiation based on the VSI state.
Bruce cleans up the control queue code to remove duplicate code, takes
advantage of some compiler optimizations by making some structures
constant (which also notes that they cannot be modified), cleans up
formatting issues and a code comment that needed clarification, and
fixes a potential NULL pointer dereference by adding a check.
Jaroslaw adds a check to verify if memory was allocated or not.
Yashaswini Raghuram fixes the driver to ensure we are not enabling the
LAN_EN flag if the MAC in the MAC-VLAN is a unicast MAC, so that the
unicast packets are not forwarded to the wire.
Dave fixes the return value of ice_napi_poll() to be more useful in
returning the work that was done and should only return 0 when no work
was done.
Anirudh does code comment cleanup, to make the comments more consistent.
====================
====================
net: dsa: microchip: Modify KSZ9477 DSA driver in preparation to add other KSZ switch drivers
This series of patches is to modify the original KSZ9477 DSA driver so
that other KSZ switch drivers can be added and use the common code.
There are several steps to accomplish this. First is to
rename some function names with a prefix to indicate chip specific
function. Second is to move common code into header that can be shared.
Last is to modify tag_ksz.c so that it can handle many tail tag formats
used by different KSZ switch drivers.
ksz_common.c will contain the common code used by all KSZ switch drivers.
ksz9477.c will contain KSZ9477 code from the original ksz_common.c.
ksz9477_spi.c is renamed from ksz_spi.c.
ksz9477_reg.h is renamed from ksz_9477_reg.h.
ksz_common.h is added to provide common code access to KSZ switch
drivers.
ksz_spi.h is added to provide common SPI access functions to KSZ SPI
drivers.
v4
- Patches were removed to concentrate on changing driver structure without
adding new code.
v3
- The phy_device structure is used to hold port link information
- A structure is passed in ksz_xmit and ksz_rcv instead of function pointer
- Switch offload forwarding is supported
v2
- Initialize reg_mutex before use
- The alu_mutex is only used inside chip specific functions
v1
- Each patch in the set is self-contained
- Use ksz9477 prefix to indicate KSZ9477 specific code
====================
Tristram Ha [Tue, 20 Nov 2018 23:55:09 +0000 (15:55 -0800)]
net: dsa: microchip: break KSZ9477 DSA driver into two files
Break KSZ9477 DSA driver into two files in preparation to add more KSZ
switch drivers.
Add common functions in ksz_common.h so that other KSZ switch drivers
can access code in ksz_common.c.
Add ksz_spi.h for common functions used by KSZ switch SPI drivers.
Yonghong Song [Tue, 20 Nov 2018 22:08:20 +0000 (14:08 -0800)]
bpf: fix a compilation error when CONFIG_BPF_SYSCALL is not defined
Kernel test robot ([email protected]) reports a compilation error at
https://www.spinics.net/lists/netdev/msg534913.html
introduced by commit 838e96904ff3 ("bpf: Introduce bpf_func_info").
If CONFIG_BPF is defined and CONFIG_BPF_SYSCALL is not defined,
the following error will appear:
kernel/bpf/core.c:414: undefined reference to `btf_type_by_id'
kernel/bpf/core.c:415: undefined reference to `btf_name_by_offset'
When CONFIG_BPF_SYSCALL is not defined,
let us define stub inline functions for btf_type_by_id()
and btf_name_by_offset() in include/linux/btf.h.
This way, the compilation failure can be avoided.
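The stub pattern looks roughly like the sketch below, written so it
also compiles in user space. The struct declarations are opaque
placeholders, uint32_t stands in for the kernel's u32, and having the
stubs return NULL is an assumption made for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct btf;      /* opaque placeholder */
struct btf_type; /* opaque placeholder */

#ifdef CONFIG_BPF_SYSCALL
/* Real implementations are provided elsewhere when the option is set. */
const struct btf_type *btf_type_by_id(const struct btf *btf, uint32_t type_id);
const char *btf_name_by_offset(const struct btf *btf, uint32_t offset);
#else
/* Stubs keep callers linking when the option is not set. */
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
						    uint32_t type_id)
{
	(void)btf;
	(void)type_id;
	return NULL;
}

static inline const char *btf_name_by_offset(const struct btf *btf,
					     uint32_t offset)
{
	(void)btf;
	(void)offset;
	return NULL;
}
#endif
```

Callers must be prepared for a NULL result in the stubbed
configuration.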
Davide Caratti [Tue, 20 Nov 2018 21:18:44 +0000 (22:18 +0100)]
net/sched: act_police: fix race condition on state variables
After 'police' configuration parameters were converted to use RCU instead
of spinlock, the state variables used to compute the traffic rate (namely
'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in
the traffic path without any protection.
Use a dedicated spinlock to avoid race conditions on these variables, and
ensure proper cache-line alignment. In this way, 'police' is still faster
than what we observed when 'tcf_lock' was used in the traffic path, i.e.
reverting commit 2d550dbad83c ("net/sched: act_police: don't use spinlock
in the data path"). Moreover, we preserve the throughput improvement that
was obtained after 'police' started using per-cpu counters, when 'avrate'
is used instead of 'rate'.
Changes since v1 (thanks to Eric Dumazet):
- call ktime_get_ns() before acquiring the lock in the traffic path
- use a dedicated spinlock instead of tcf_lock
- improve cache-line usage
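The locking scheme can be modelled in user space as below. This is a
sketch under assumptions, not the act_police code: the struct, the
function name and the C11 atomic_flag spinlock are stand-ins, and the
token arithmetic is a simplified single-rate bucket counted in
nanoseconds of transmit time. The caller reads the clock first and
passes it in as now_ns, so the timestamp is taken before the lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct police_state {
	atomic_flag lock; /* dedicated to the two fields below */
	int64_t toks;     /* tokens left, in ns of transmit time */
	int64_t t_c;      /* time of the last conforming packet, ns */
};

/* Returns true if a packet costing cost_ns conforms to the rate. */
static bool police_conform(struct police_state *s, int64_t now_ns,
			   int64_t cost_ns, int64_t burst_ns)
{
	int64_t toks;
	bool ok;

	while (atomic_flag_test_and_set_explicit(&s->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */

	toks = now_ns - s->t_c;      /* tokens earned since last update */
	if (toks > burst_ns)
		toks = burst_ns;
	toks += s->toks;
	if (toks > burst_ns)
		toks = burst_ns;
	toks -= cost_ns;
	ok = toks >= 0;
	if (ok) {                    /* only commit state on success */
		s->t_c = now_ns;
		s->toks = toks;
	}

	atomic_flag_clear_explicit(&s->lock, memory_order_release);
	return ok;
}
```

Keeping the lock private to these fields means configuration readers
that rely on RCU never contend with the traffic path.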
Fixes: 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path")
Reported-and-suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Linus Torvalds [Tue, 20 Nov 2018 22:31:00 +0000 (14:31 -0800)]
Merge tag 'mips_fixes_4.20_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few MIPS fixes for 4.20:
- Re-enable the Cavium Octeon USB driver in its defconfig after it
was accidentally removed back in 4.14.
- Have early memblock allocations be performed bottom-up to more
closely match the behaviour we used to have with bootmem, which
seems a safer choice since we've seen fallout from the change made
in the 4.20 merge window.
- Simplify max_low_pfn calculation in the NUMA code for the Loongson3
and SGI IP27 platforms to both clean up the code & ensure
max_low_pfn has been set appropriately before it is used"
* tag 'mips_fixes_4.20_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Loongson3,SGI-IP27: Simplify max_low_pfn calculation
MIPS: Let early memblock_alloc*() allocate memories bottom-up
MIPS: OCTEON: cavium_octeon_defconfig: re-enable OCTEON USB driver