Julian Wiedmann [Wed, 18 Dec 2019 15:32:28 +0000 (16:32 +0100)]
s390/qeth: don't return -ENOTSUPP to userspace
ENOTSUPP is not uapi, use EOPNOTSUPP instead.
Fixes: d66cb37e9664 ("qeth: Add new priority queueing options") Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Julian Wiedmann [Wed, 18 Dec 2019 15:32:27 +0000 (16:32 +0100)]
s390/qeth: fix promiscuous mode after reset
When managing the promiscuous mode during an RX modeset, qeth caches the
current HW state to avoid repeated programming of the same state on each
modeset.
But while tearing down a device, we forget to clear the cached state. So
when the device is later set online again, the initial RX modeset
doesn't program the promiscuous mode since we believe it is already
enabled.
Fix this by clearing the cached state in the tear-down path.
Note that for the SBP variant of promiscuous mode, this accidentally
works right now because we unconditionally restore the SBP role while
re-initializing.
Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <[email protected]> Reviewed-by: Alexandra Winter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Julian Wiedmann [Wed, 18 Dec 2019 15:32:26 +0000 (16:32 +0100)]
s390/qeth: handle error due to unsupported transport mode
Along with z/VM NICs, there's additional device types that only support
a specific transport mode (eg. external-bridged IQD).
Identify the corresponding error code, and raise a fitting error message
so that the user knows to adjust their device configuration.
On top of that also fix the subsequent error path, so that the rejected
cmd doesn't need to wait for a timeout but gets cancelled straight away.
Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Jeffery [Tue, 17 Dec 2019 16:00:24 +0000 (11:00 -0500)]
sbitmap: only queue kyber's wait callback if not already active
Under heavy loads where the kyber I/O scheduler hits the token limits for
its scheduling domains, kyber can become stuck. When active requests
complete, kyber may not be woken up leaving the I/O requests in kyber
stuck.
This stuck state is due to a race condition with kyber and the sbitmap
functions it uses to run a callback when enough requests have completed.
The running of a sbt_wait callback can race with the attempt to insert the
sbt_wait. Since sbitmap_del_wait_queue removes the sbt_wait from the list
first then sets the sbq field to NULL, kyber can see the item as not on a
list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
results in the sbt_wait being inserted onto the wait list but ws_active
doesn't get incremented. So the sbitmap queue does not know there is a
waiter on a wait list.
Since sbitmap doesn't think there is a waiter, kyber may never be
informed that there are domain tokens available and the I/O never advances.
With the sbt_wait on a wait list, kyber believes it has an active waiter
so cannot insert a new waiter when reaching the domain's full state.
This race can be fixed by only adding the sbt_wait to the queue if the
sbq field is NULL. If sbq is not NULL, there is already an action active
which will trigger the re-running of kyber. Let it run and add the
sbt_wait to the wait list if still needing to wait.
Rahul Lakkireddy [Wed, 18 Dec 2019 03:49:29 +0000 (09:19 +0530)]
cxgb4: fix refcount init for TC-MQPRIO offload
Properly initialize refcount to 1 when hardware queue arrays for
TC-MQPRIO offload have been freshly allocated. Otherwise, following
warning is observed. Also fix up error path to only free hardware
queue arrays when refcount reaches 0.
v2:
- Move the refcount_set() closer to where the hardware queue arrays
are being allocated.
- Fix up error path to only free hardware queue arrays when refcount
reaches 0.
Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support") Signed-off-by: Rahul Lakkireddy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Fri, 20 Dec 2019 21:36:49 +0000 (13:36 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Leftover put_cpu() in the perf/smmuv3 error path.
- Add Hisilicon TSV110 to spectre-v2 safe list
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list
perf/smmuv3: Remove the leftover put_cpu() in error path
Linus Torvalds [Fri, 20 Dec 2019 21:33:50 +0000 (13:33 -0800)]
Merge tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Probably the last one before Christmas, I'll see if there is much
demand over next few weeks for more fixes, I expect it'll be quiet
enough.
This has one exynos fix, and a bunch of i915 core and i915 GVT fixes.
Summary:
exynos:
- component delete fix
i915:
- Fix to drop an unused and harmful display W/A
- Fix to define EHL power wells independent of ICL
- Fix for priority inversion on bonded requests
- Fix in mmio offset calculation of DSB instance
- Fix memory leak from get_task_pid when banning clients
- Fixes to avoid dereference of uninitialized ops in dma_fence
tracing and keep reference to execbuf object until submitted.
- vGPU state setting locking fix (Zhenyu)
- Fix vGPU display dmabuf as read-only (Zhenyu)
- Properly handle vGPU display dmabuf page pin when rendering (Tina)
- Fix one guest boot warning to handle guc reset state (Fred)"
* tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drm:
drm/exynos: gsc: add missed component_del
drm/i915: Fix pid leak with banned clients
drm/i915/gem: Keep request alive while attaching fences
drm/i915: Fix WARN_ON condition for cursor plane ddb allocation
drm/i915/gvt: Fix guest boot warning
drm/i915/tgl: Drop Wa#1178
drm/i915/ehl: Define EHL powerwells independently of ICL
drm/i915: Set fence_work.ops before dma_fence_init
drm/i915: Copy across scheduler behaviour flags across submit fences
drm/i915/dsb: Fix in mmio offset calculation of DSB instance
drm/i915/gvt: Pin vgpu dma address before using
drm/i915/gvt: set guest display buffer as readonly
drm/i915/gvt: use vgpu lock for active state setting
Linus Torvalds [Fri, 20 Dec 2019 21:30:49 +0000 (13:30 -0800)]
Merge tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Here's a set of fixes that should go into 5.5-rc3 for io_uring.
This is bigger than I'd like it to be, mainly because we're fixing the
case where an application reuses sqe data right after issue. This
really must work, or it's confusing. With 5.5 we're flagging us as
submit stable for the actual data, this must also be the case for
SQEs.
Honestly, I'd really like to add another series on top of this, since
it cleans it up considerable and prevents any SQE reuse by design. I
posted that here:
and may still send it your way early next week once it's been looked
at and had some more soak time (does pass all regression tests). With
that series, we've unified the prep+issue handling, and only the prep
phase even has access to the SQE.
Anyway, outside of that, fixes in here for a few other issues that
have been hit in testing or production"
* tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block:
io_uring: io_wq_submit_work() should not touch req->rw
io_uring: don't wait when under-submitting
io_uring: warn about unhandled opcode
io_uring: read opcode and user_data from SQE exactly once
io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable
io_uring: make IORING_OP_CANCEL_ASYNC deferrable
io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable
io_uring: make HARDLINK imply LINK
io_uring: any deferred command must have stable sqe data
io_uring: remove 'sqe' parameter to the OP helpers that take it
io_uring: fix pre-prepped issue with force_nonblock == true
io-wq: re-add io_wq_current_is_worker()
io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG
io_uring: fix stale comment and a few typos
Dave Airlie [Fri, 20 Dec 2019 20:08:19 +0000 (06:08 +1000)]
Merge tag 'drm-intel-fixes-2019-12-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix to drop an unused and harmful display W/A
- Fix to define EHL power wells independent of ICL
- Fix for priority inversion on bonded requests
- Fix in mmio offset calculation of DSB instance
- Fix memory leak from get_task_pid when banning clients
- Fixes to avoid dereference of uninitialized ops in dma_fence tracing
and keep reference to execbuf object until submitted.
Helge Deller [Fri, 20 Dec 2019 20:00:19 +0000 (21:00 +0100)]
parisc: Fix compiler warnings in debug_core.c
Fix this compiler warning:
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
arch/parisc/include/asm/cmpxchg.h:48:3: warning: value computed is not used [-Wunused-value]
48 | ((__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))
arch/parisc/include/asm/atomic.h:78:30: note: in expansion of macro ‘xchg’
78 | #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
| ^~~~
kernel/debug/debug_core.c:596:4: note: in expansion of macro ‘atomic_xchg’
596 | atomic_xchg(&kgdb_active, cpu);
| ^~~~~~~~~~~
If __blk_rq_map_user_iov() is failed in blk_rq_map_user_iov(),
the bio(s) which is allocated before this failing will leak. The
refcount of the bio(s) is init to 1 and increased to 2 by calling
bio_get(), but __blk_rq_unmap_user() only decrease it to 1, so
the bio cannot be freed. Fix it by calling blk_rq_unmap_user().
Stefan Haberland [Thu, 19 Dec 2019 08:43:51 +0000 (09:43 +0100)]
s390/dasd: fix memleak in path handling error case
If for whatever reason the dasd_eckd_check_characteristics() function
exits after at least some paths have their configuration data
allocated those data is never freed again. In the error case the
device->private pointer is set to NULL and dasd_eckd_uncheck_device()
will exit without freeing the path data because of this NULL pointer.
Fix by calling dasd_eckd_clear_conf_data() for error cases.
Also use dasd_eckd_clear_conf_data() in dasd_eckd_uncheck_device()
to avoid code duplication.
Jan Höppner [Thu, 19 Dec 2019 08:43:50 +0000 (09:43 +0100)]
s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
The max data count (mdc) is an unsigned 16-bit integer value as per AR
documentation and is received via ccw_device_get_mdc() for a specific
path mask from the CIO layer. The function itself also always returns a
positive mdc value or 0 in case mdc isn't supported or couldn't be
determined.
Though, the comment for this function describes a negative return value
to indicate failures.
As a result, the DASD device driver interprets the return value of
ccw_device_get_mdc() incorrectly. The error case is essentially a dead
code path.
To fix this behaviour, check explicitly for a return value of 0 and
change the comment for ccw_device_get_mdc() accordingly.
This fix merely enables the error code path in the DASD functions
get_fcx_max_data() and verify_fcx_max_data(). The actual functionality
stays the same and is still correct.
Bart Van Assche [Wed, 18 Dec 2019 00:23:29 +0000 (16:23 -0800)]
block: Fix the type of 'sts' in bsg_queue_rq()
This patch fixes the following sparse warnings:
block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19: expected int sts
block/bsg-lib.c:269:19: got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16: expected restricted blk_status_t
block/bsg-lib.c:286:16: got int [assigned] sts
Cc: Martin Wilck <[email protected]> Fixes: d46fe2cb2dce ("block: drop device references in bsg_queue_rq()") Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Linus Torvalds [Fri, 20 Dec 2019 18:42:25 +0000 (10:42 -0800)]
Merge tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix kmemleak warning in IOVA code
- Fix compile warnings on ARM32/64 in dma-iommu code due to dma_mask
type mismatches
- Make ISA reserved regions relaxable, so that VFIO can assign devices
which have such regions defined
- Fix mapping errors resulting in IO page-faults in the VT-d driver
- Make sure direct mappings for a domain are created after the default
domain is updated
- Map ISA reserved regions in the VT-d driver with correct permissions
- Remove unneeded check for PSI capability in the IOTLB flush code of
the VT-d driver
- Lockdep fix iommu_dma_prepare_msi()
* tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dma: Relax locking in iommu_dma_prepare_msi()
iommu/vt-d: Remove incorrect PSI capability check
iommu/vt-d: Allocate reserved region for ISA with correct permission
iommu: set group default domain before creating direct mappings
iommu/vt-d: Fix dmar pte read access not set error
iommu/vt-d: Set ISA bridge reserved region as relaxable
iommu/dma: Rationalise types for DMA masks
iommu/iova: Init the struct iova to fix the possible memleak
Linus Torvalds [Fri, 20 Dec 2019 18:38:21 +0000 (10:38 -0800)]
Merge tag 'platform-drivers-x86-v5.5-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
"Bucket of fixes for PDx86. Note, that there is no ABI breakage in
Mellanox driver because it has been introduced in v5.5-rc1, so we can
change it.
Summary:
- Add support of APUv4 and fix an assignment of simswap GPIO
- Add Siemens CONNECT X300 to DMI table to avoid stuck during boot
- Correct arguments of WMI call on HP Envy x360 15-cp0xxx model
- Fix the mlx-bootctl sysfs attributes to be device related"
* tag 'platform-drivers-x86-v5.5-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: pcengines-apuv2: Spelling fixes in the driver
platform/x86: pcengines-apuv2: detect apuv4 board
platform/x86: pcengines-apuv2: fix simswap GPIO assignment
platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
platform/mellanox: fix the mlx-bootctl sysfs
Linus Torvalds [Fri, 20 Dec 2019 18:11:30 +0000 (10:11 -0800)]
Merge tag 'char-misc-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and other driver fixes for 5.5-rc3.
The most noticable one is a much-reported fix for a random driver
issue that came up from 5.5-rc1 compat_ioctl cleanups. The others are
a chunk of habanalab driver fixes and intel_th driver fixes and new
device ids.
All have been in linux-next with no reported issues"
* tag 'char-misc-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
random: don't forget compat_ioctl on urandom
intel_th: msu: Fix window switching without windows
intel_th: Fix freeing IRQs
intel_th: pci: Add Elkhart Lake SOC support
intel_th: pci: Add Comet Lake PCH-V support
habanalabs: remove variable 'val' set but not used
habanalabs: rate limit error msg on waiting for CS
Linus Torvalds [Fri, 20 Dec 2019 18:09:21 +0000 (10:09 -0800)]
Merge tag 'staging-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for a number of reported
issues.
The majority here are some fixes for the wfx driver, but also in here
is a comedi driver fix found during some code review, and an axis-fifo
build dependancy issue to resolve some reported testing problems.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wfx: fix wrong error message
staging: wfx: fix hif_set_mfp() with big endian hosts
staging: wfx: detect race condition in WEP authentication
staging: wfx: ensure that retry policy always fallbacks to MCS0 / 1Mbps
staging: wfx: fix rate control handling
staging: wfx: firmware does not support more than 32 total retries
staging: wfx: use boolean appropriately
staging: wfx: fix counter overflow
staging: wfx: fix case of lack of tx_retry_policies
staging: wfx: fix the cache of rate policies on interface reset
staging: axis-fifo: add unspecified HAS_IOMEM dependency
staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value
Wei Li [Fri, 20 Dec 2019 09:17:10 +0000 (17:17 +0800)]
arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list
HiSilicon Taishan v110 CPUs didn't implement CSV2 field of the
ID_AA64PFR0_EL1, but spectre-v2 is mitigated by hardware, so
whitelist the MIDR in the safe list.
Linus Torvalds [Fri, 20 Dec 2019 17:55:28 +0000 (09:55 -0800)]
Merge tag 'tty-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.5-rc3.
Only four small patches here:
- atmel serial driver fix
- msm_serial driver fix
- sprd serial driver fix
- tty core port fix
The last tty core fix should resolve a long-standing bug with a race
at port creation time that some people would see, and Sudip finally
tracked down.
All of these have been in linux-next with no reported issues"
* tag 'tty-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty/serial: atmel: fix out of range clock divider handling
tty: link tty and port before configuring it as console
serial: sprd: Add clearing break interrupt operation
tty: serial: msm_serial: Fix lockup for sysrq and oops
Linus Torvalds [Fri, 20 Dec 2019 17:53:24 +0000 (09:53 -0800)]
Merge tag 'usb-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for some reported issues.
Included in here are:
- xhci build warning fix
- ehci disconnect warning fix
- usbip lockup fix and error cleanup fix
- typec build fix
All of these have been in linux-next with no reported issues"
* tag 'usb-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci: Fix build warning seen with CONFIG_PM=n
usbip: Fix error path of vhci_recv_ret_submit()
usbip: Fix receive error in vhci-hcd when using scatter-gather
USB: EHCI: Do not return -EPIPE when hub is disconnected
usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state'
Linus Torvalds [Fri, 20 Dec 2019 17:49:05 +0000 (09:49 -0800)]
Merge tag 'pinctrl-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Sorry that this fixes pull request took a while. Too much christmas
business going on.
This contains a few really important Intel fixes and some odd fixes:
- A host of fixes for the Intel baytrail and cherryview: properly
serialize all register accesses and add the irqchip with the
gpiochip as we need to, fix some pin lists and initialize the
hardware in the right order.
- Fix the Aspeed G6 LPC configuration.
- Handle a possible NULL pointer exception in the core.
- Fix the Kconfig dependencies for the Equilibrium driver"
* tag 'pinctrl-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: ingenic: Fixup PIN_CONFIG_OUTPUT config
pinctrl: Modify Kconfig to fix linker error
pinctrl: pinmux: fix a possible null pointer in pinmux_can_be_used_for_gpio
pinctrl: aspeed-g6: Fix LPC/eSPI mux configuration
pinctrl: cherryview: Pass irqchip when adding gpiochip
pinctrl: cherryview: Add GPIO <-> pin mapping ranges via callback
pinctrl: cherryview: Split out irq hw-init into a separate helper function
pinctrl: baytrail: Pass irqchip when adding gpiochip
pinctrl: baytrail: Add GPIO <-> pin mapping ranges via callback
pinctrl: baytrail: Update North Community pin list
pinctrl: baytrail: Really serialize all register accesses
Jens Axboe [Fri, 20 Dec 2019 01:24:38 +0000 (18:24 -0700)]
io_uring: pass in 'sqe' to the prep handlers
This moves the prep handlers outside of the opcode handlers, and allows
us to pass in the sqe directly. If the sqe is non-NULL, it means that
the request should be prepared for the first time.
With the opcode handlers not having access to the sqe at all, we are
guaranteed that the prep handler has setup the request fully by the
time we get there. As before, for opcodes that need to copy in more
data then the io_kiocb allows for, the io_async_ctx holds that info. If
a prep handler is invoked with req->io set, it must use that to retain
information for later.
Jens Axboe [Thu, 19 Dec 2019 21:44:26 +0000 (14:44 -0700)]
io_uring: standardize the prep methods
We currently have a mix of use cases. Most of the newer ones are pretty
uniform, but we have some older ones that use different calling
calling conventions. This is confusing.
For the opcodes that currently rely on the req->io->sqe copy saving
them from reuse, add a request type struct in the io_kiocb command
union to store the data they need.
Prepare for all opcodes having a standard prep method, so we can call
it in a uniform fashion and outside of the opcode handler. This is in
preparation for passing in the 'sqe' pointer, rather than storing it
in the io_kiocb. Once we have uniform prep handlers, we can leave all
the prep work to that part, and not even pass in the sqe to the opcode
handler. This ensures that we don't reuse sqe data inadvertently.
Michael Haener [Fri, 29 Nov 2019 09:16:49 +0000 (10:16 +0100)]
platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
The CONNECT X300 uses the PMC clock for on-board components and gets
stuck during boot if the clock is disabled. Therefore, add this
device to the critical systems list.
Tested on CONNECT X300.
Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Signed-off-by: Michael Haener <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
Hans de Goede [Tue, 17 Dec 2019 19:06:04 +0000 (20:06 +0100)]
platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
At least on the HP Envy x360 15-cp0xxx model the WMI interface
for HPWMI_FEATURE2_QUERY requires an outsize of at least 128 bytes,
otherwise it fails with an error code 5 (HPWMI_RET_INVALID_PARAMETERS):
Dec 06 00:59:38 kernel: hp_wmi: query 0xd returned error 0x5
We do not care about the contents of the buffer, we just want to know
if the HPWMI_FEATURE2_QUERY command is supported.
This commits bumps the buffer size, fixing the error.
Liming Sun [Wed, 18 Dec 2019 18:35:27 +0000 (13:35 -0500)]
platform/mellanox: fix the mlx-bootctl sysfs
This is a follow-up commit for the sysfs attributes to change
from DRIVER_ATTR to DEVICE_ATTR according to some initial comments.
In such case, it's better to point the sysfs path to the device
itself instead of the driver. The ABI document is also updated.
Fixes: 79e29cb8fbc5 ("platform/mellanox: Add bootctl driver for Mellanox BlueField Soc") Signed-off-by: Liming Sun <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
Jens Axboe [Fri, 20 Dec 2019 16:02:01 +0000 (09:02 -0700)]
io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler
Add the count field to struct io_timeout, and ensure the prep handler
has read it. Timeout also needs an async context always, set it up
in the prep handler if we don't have one.
Jens Axboe [Fri, 20 Dec 2019 15:58:21 +0000 (08:58 -0700)]
io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler
Add struct io_sr_msg in our io_kiocb per-command union, and ensure that
the send/recvmsg prep handlers have grabbed what they need from the SQE
by the time prep is done.
Jens Axboe [Fri, 20 Dec 2019 15:45:55 +0000 (08:45 -0700)]
io_uring: add and use struct io_rw for read/writes
Put the kiocb in struct io_rw, and add the addr/len for the request as
well. Use the kiocb->private field for the buffer index for fixed reads
and writes.
Any use of kiocb->ki_filp is flipped to req->file. It's the same thing,
and less confusing.
Chen Wandun [Fri, 20 Dec 2019 16:07:31 +0000 (08:07 -0800)]
xfs: Make the symbol 'xfs_rtalloc_log_count' static
Fix the following sparse warning:
fs/xfs/libxfs/xfs_trans_resv.c:206:1: warning: symbol 'xfs_rtalloc_log_count' was not declared. Should it be static?
Fixes: b1de6fc7520f ("xfs: fix log reservation overflows when allocating large rt extents") Signed-off-by: Chen Wandun <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
Aditya Pakki [Tue, 17 Dec 2019 20:53:56 +0000 (14:53 -0600)]
xen/grant-table: remove multiple BUG_ON on gnttab_interface
gnttab_request_version() always sets the gnttab_interface variable
and the assertions to check for empty gnttab_interface is unnecessary.
The patch eliminates multiple such assertions.
Paul Durrant [Wed, 11 Dec 2019 15:29:56 +0000 (15:29 +0000)]
xen-blkback: support dynamic unbind/bind
By simply re-attaching to shared rings during connect_ring() rather than
assuming they are freshly allocated (i.e assuming the counters are zero)
it is possible for vbd instances to be unbound and re-bound from and to
(respectively) a running guest.
This has been tested by running:
while true;
do fio --name=randwrite --ioengine=libaio --iodepth=16 \
--rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32;
done
in a PV guest whilst running:
while true;
do echo vbd-$DOMID-$VBD >unbind;
echo unbound;
sleep 5;
echo vbd-$DOMID-$VBD >bind;
echo bound;
sleep 3;
done
in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and
re-bind its system disk image.
This is a highly useful feature for a backend module as it allows it to be
unloaded and re-loaded (i.e. updated) without requiring domUs to be halted.
This was also tested by running:
while true;
do echo vbd-$DOMID-$VBD >unbind;
echo unbound;
sleep 5;
rmmod xen-blkback;
echo unloaded;
sleep 1;
modprobe xen-blkback;
echo bound;
cd $(pwd);
sleep 3;
done
in dom0 whilst running the same loop as above in the (single) PV guest.
Some (less stressful) testing has also been done using a Windows HVM guest
with the latest 9.0 PV drivers installed.
Paul Durrant [Wed, 11 Dec 2019 15:29:55 +0000 (15:29 +0000)]
xen/interface: re-define FRONT/BACK_RING_ATTACH()
Currently these macros are defined to re-initialize a front/back ring
(respectively) to values read from the shared ring in such a way that any
requests/responses that are added to the shared ring whilst the front/back
is detached will be skipped over. This, in general, is not a desirable
semantic since most frontend implementations will eventually block waiting
for a response which would either never appear or never be processed.
Since the macros are currently unused, take this opportunity to re-define
them to re-initialize a front/back ring using specified values. This also
allows FRONT/BACK_RING_INIT() to be re-defined in terms of
FRONT/BACK_RING_ATTACH() using a specified value of 0.
NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch.
Paul Durrant [Wed, 11 Dec 2019 15:29:54 +0000 (15:29 +0000)]
xenbus: limit when state is forced to closed
If a driver probe() fails then leave the xenstore state alone. There is no
reason to modify it as the failure may be due to transient resource
allocation issues and hence a subsequent probe() may succeed.
If the driver supports re-binding then only force state to closed during
remove() only in the case when the toolstack may need to clean up. This can
be detected by checking whether the state in xenstore has been set to
closing prior to device removal.
NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver,
which defaults to false. Subsequent patches will add support to
some backend drivers.
Paul Durrant [Wed, 11 Dec 2019 15:29:53 +0000 (15:29 +0000)]
xenbus: move xenbus_dev_shutdown() into frontend code...
...and make it static
xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of PV
frontends when a guest is rebooted. Indeed the function waits for a
conpletion which is only set by a call to xenbus_frontend_closed().
This patch removes the shutdown() method from backends and moves
xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c,
renaming it appropriately and making it static.
NOTE: In the case where the backend is running in a driver domain, the
toolstack should have already terminated any frontends that may be
using it (since Xen does not support re-startable PV driver domains)
so xenbus_dev_shutdown() should never be called.
xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
Clang warns:
../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
nr_parts = PARTS_PER_DISK;
^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
if (err)
^
This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.
While we are here, the previous line has some trailing whitespace; clean
that up as well.
The sifive_l2_cache.c is in no way related to RISC-V architecture
memory management. It is a little stub driver working around the fact
that the EDAC maintainers prefer their drivers to be structured in a
certain way that doesn't fit the SiFive SOCs.
Move the file to drivers/soc and add a Kconfig option for it, as well
as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE.
Fixes: a967a289f169 ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Borislav Petkov <[email protected]>
[[email protected]: keep the MAINTAINERS change specific to the L2$ controller code] Signed-off-by: Paul Walmsley <[email protected]>
pfn_to_page & page_to_pfn depend on vmemmap being available before the calls
if kernel is configured with CONFIG_SPARSEMEM_VMEMMAP=y. This was caused
by NOMMU changes which moved vmemmap definition bellow functions definitions
calling pfn_to_page & page_to_pfn.
Noticed while compiled 5.5-rc2 kernel for Fedora/RISCV.
Greentime Hu [Thu, 19 Dec 2019 06:44:59 +0000 (14:44 +0800)]
riscv: fix scratch register clearing in M-mode.
This patch fixes that the sscratch register clearing in M-mode. It cleared
sscratch register in M-mode, but it should clear mscratch register. That will
cause kernel trap if the CPU core doesn't support S-mode when trying to access
sscratch.
Fixes: 9e80635619b5 ("riscv: clear the instruction cache and all registers when booting") Signed-off-by: Greentime Hu <[email protected]> Reviewed-by: Anup Patel <[email protected]> Signed-off-by: Paul Walmsley <[email protected]>
a refcount leak in the error path of u32_change() has been recently
introduced. It can be observed with the following commands:
[root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 97 \
> u32 match ip src 127.0.0.1/32 indev notexist20 flowid 1:1 action drop
RTNETLINK answers: Invalid argument
We have an error talking to the kernel
[root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 98 \
> handle 42:42 u32 divisor 256
Error: cls_u32: Divisor can only be used on a hash table.
We have an error talking to the kernel
[root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 99 \
> u32 ht 47:47
Error: cls_u32: Specified hash table not found.
We have an error talking to the kernel
they all legitimately return -EINVAL; however, they leave semi-configured
filters at eth0 tc ingress:
[root@f31 ~]# tc filter show dev eth0 ingress
filter protocol ip pref 97 u32 chain 0
filter protocol ip pref 97 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 98 u32 chain 0
filter protocol ip pref 98 u32 chain 0 fh 801: ht divisor 1
filter protocol ip pref 99 u32 chain 0
filter protocol ip pref 99 u32 chain 0 fh 802: ht divisor 1
With older kernels, filters were unconditionally considered empty (and
thus de-refcounted) on the error path of ->change().
After commit 8b64678e0af8 ("net: sched: refactor tp insert/delete for
concurrent execution"), filters were considered empty when the walk()
function didn't set 'walker.stop' to 1.
Finally, with commit 6676d5e416ee ("net: sched: set dedicated tcf_walker
flag when tp is empty"), tc filters are considered empty unless the walker
function is called with a non-NULL handle. This last change doesn't fit
cls_u32 design, because at least the "root hnode" is (almost) always
non-NULL, as it's allocated in u32_init().
- patch 1/2 is a proposal to restore the original kernel behavior, where
no filter was installed in the error path of u32_change().
- patch 2/2 adds tdc selftests that can be ued to verify the correct
behavior of u32 in the error path of ->change().
====================
Davide Caratti [Tue, 17 Dec 2019 23:00:05 +0000 (00:00 +0100)]
tc-testing: initial tdc selftests for cls_u32
- move test "e9a3 - Add u32 with source match" to u32.json, and change the
match pattern to catch all hnodes
- add testcases for relevant error paths of cls_u32 module
Davide Caratti [Tue, 17 Dec 2019 23:00:04 +0000 (00:00 +0100)]
net/sched: cls_u32: fix refcount leak in the error path of u32_change()
when users replace cls_u32 filters with new ones having wrong parameters,
so that u32_change() fails to validate them, the kernel doesn't roll-back
correctly, and leaves semi-configured rules.
Fix this in u32_walk(), avoiding a call to the walker function on filters
that don't have a match rule connected. The side effect is, these "empty"
filters are not even dumped when present; but that shouldn't be a problem
as long as we are restoring the original behaviour, where semi-configured
filters were not even added in the error path of u32_change().
Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Aditya Pakki [Tue, 17 Dec 2019 20:43:00 +0000 (14:43 -0600)]
nfc: s3fwrn5: replace the assertion with a WARN_ON
In s3fwrn5_fw_recv_frame, if fw_info->rsp is not empty, the
current code causes a crash via BUG_ON. However, s3fwrn5_fw_send_msg
does not crash in such a scenario. The patch replaces the BUG_ON
by returning the error to the callers and frees up skb.
====================
net: macb: fix probing of PHY not described in the dt
The macb Ethernet driver supports various ways of referencing its
network PHY. When a device tree is used the PHY can be referenced with
a phy-handle or, if connected to its internal MDIO bus, described in
a child node. Some platforms omitted the PHY description while
connecting the PHY to the internal MDIO bus and in such cases the MDIO
bus has to be scanned "manually" by the macb driver.
Prior to the phylink conversion the driver registered the MDIO bus with
of_mdiobus_register and then in case the PHY couldn't be retrieved
using dt or using phy_find_first (because registering an MDIO bus with
of_mdiobus_register masks all PHYs) the macb driver was "manually"
scanning the MDIO bus (like mdiobus_register does). The phylink
conversion did break this particular case but reimplementing the manual
scan of the bus in the macb driver wouldn't be very clean. The solution
seems to be registering the MDIO bus based on if the PHYs are described
in the device tree or not.
There are multiple ways to do this, none is perfect. I chose to check if
any of the child nodes of the macb node was a network PHY and based on
this to register the MDIO bus with the of_ helper or not. The drawback
is boards referencing the PHY through phy-handle, would scan the entire
MDIO bus of the macb at boot time (as the MDIO bus would be registered
with mdiobus_register). For this solution to work properly
of_mdiobus_child_is_phy has to be exported, which means the patch doing
so has to be backported to -stable as well.
Another possible solution could have been to simply check if the macb
node has a child node by counting its sub-nodes. This isn't techically
perfect, as there could be other sub-nodes (in practice this should be
fine, fixed-link being taken care of in the driver). We could also
simply s/of_mdiobus_register/mdiobus_register/ but that could break
boards using the PHY description in child node as a selector (which
really would be not a proper way to do this...).
The real issue here being having PHYs not described in the dt but we
have dt backward compatibility, so we have to live with that.
====================
Antoine Tenart [Tue, 17 Dec 2019 17:07:42 +0000 (18:07 +0100)]
net: macb: fix probing of PHY not described in the dt
This patch fixes the case where the PHY isn't described in the device
tree. This is due to the way the MDIO bus is registered in the driver:
whether the PHY is described in the device tree or not, the bus is
registered through of_mdiobus_register. The function masks all the PHYs
and only allow probing the ones described in the device tree. Prior to
the Phylink conversion this was also done but later on in the driver
the MDIO bus was manually scanned to circumvent the fact that the PHY
wasn't described.
This patch fixes it in a proper way, by registering the MDIO bus based
on if the PHY attached to a given interface is described in the device
tree or not.
Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Israel Rukshin [Wed, 11 Dec 2019 15:36:02 +0000 (17:36 +0200)]
scsi: target/iblock: Fix protection error with blocks greater than 512B
The sector size of the block layer is 512 bytes, but integrity interval
size might be different (in case of 4K block size of the media). At the
initiator side the virtual start sector is the one that was originally
submitted by the block layer (512 bytes) for the Reftag usage. The
initiator converts the Reftag to integrity interval units and sends it to
the target. So the target virtual start sector should be calculated at
integrity interval units. prepare_fn() and complete_fn() don't remap
correctly the Reftag when using incorrect units of the virtual start
sector, which leads to the following protection error at the device:
"blk_update_request: protection error, dev sdb, sector 2048 op 0x0:(READ)
flags 0x10000 phys_seg 1 prio class 0"
To fix that, set the seed in integrity interval units.
tracing: Have the histogram compare functions convert to u64 first
The compare functions of the histogram code would be specific for the size
of the value being compared (byte, short, int, long long). It would
reference the value from the array via the type of the compare, but the
value was stored in a 64 bit number. This is fine for little endian
machines, but for big endian machines, it would end up comparing zeros or
all ones (depending on the sign) for anything but 64 bit numbers.
To fix this, first derference the value as a u64 then convert it to the type
being compared.
Keita Suzuki [Wed, 11 Dec 2019 09:12:58 +0000 (09:12 +0000)]
tracing: Avoid memory leak in process_system_preds()
When failing in the allocation of filter_item, process_system_preds()
goes to fail_mem, where the allocated filter is freed.
However, this leads to memory leak of filter->filter_string and
filter->prog, which is allocated before and in process_preds().
This bug has been detected by kmemleak as well.
Daniel Borkmann [Thu, 19 Dec 2019 21:19:51 +0000 (22:19 +0100)]
bpf: Add further test_verifier cases for record_func_key
Expand dummy prog generation such that we can easily check on return
codes and add few more test cases to make sure we keep on tracking
pruning behavior.
# ./test_verifier
[...]
#1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
#1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED
Also verified that JIT dump of added test cases looks good.
Daniel Borkmann [Thu, 19 Dec 2019 21:19:50 +0000 (22:19 +0100)]
bpf: Fix record_func_key to perform backtracking on r3
While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort
implementation, I noticed a strange BPF SNAT engine behavior from time to
time. In some cases it would do the correct SNAT/DNAT service translation,
but at a random point in time it would just stop and perform an unexpected
translation after SYN, SYN/ACK and stack would send a RST back. While initially
assuming that there is some sort of a race condition in BPF code, adding
trace_printk()s for debugging purposes at some point seemed to have resolved
the issue auto-magically.
Digging deeper on this Heisenbug and reducing the trace_printk() calls to
an absolute minimum, it turns out that a single call would suffice to
trigger / not trigger the seen RST issue, even though the logic of the
program itself remains unchanged. Turns out the single call changed verifier
pruning behavior to get everything to work. Reconstructing a minimal test
case, the incorrect JIT dump looked as follows:
Hence, unexpectedly, JIT emitted a direct jump even though retpoline based
one would have been needed since in line 2c and 3a we have different slot
keys in BPF reg r3. Verifier log of the test case reveals what happened:
Second branch is pruned by verifier since considered safe, but issue is that
record_func_key() couldn't have seen the index in line 3a and therefore
decided that emitting a direct jump at this location was okay.
Fix this by reusing our backtracking logic for precise scalar verification
in order to prevent pruning on the slot key. This means verifier will track
content of r3 all the way backwards and only prune if both scalars were
unknown in state equivalence check and therefore poisoned in the first place
in record_func_key(). The range is [x,x] in record_func_key() case since
the slot always would have to be constant immediate. Correct verification
after fix:
Also explicitly adding explicit env->allow_ptr_leaks to fixup_bpf_calls() since
backtracking is enabled under former (direct jumps as well, but use different
test). In case of only tracking different map pointers as in c93552c443eb ("bpf:
properly enforce index mask to prevent out-of-bounds speculation"), pruning
cannot make such short-cuts, neither if there are paths with scalar and non-scalar
types as r3. mark_chain_precision() is only needed after we know that
register_is_const(). If it was not the case, we already poison the key on first
path and non-const key in later paths are not matching the scalar range in regsafe()
either. Cilium NodePort testing passes fine as well now. Note, released kernels
not affected.
net, sysctl: Fix compiler warning when only cBPF is present
proc_dointvec_minmax_bpf_restricted() has been firstly introduced
in commit 2e4a30983b0f ("bpf: restrict access to core bpf sysctls")
under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv
allocations"), because a new sysctl, bpf_jit_limit, made use of it.
Finally, this parameter has become long instead of integer with fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
and thus, a new proc_dolongvec_minmax_bpf_restricted() has been
added.
With this last change, we got back to that
proc_dointvec_minmax_bpf_restricted() is used only under
CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been
brought back.
So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
since v4.20 we have:
CC net/core/sysctl_net_core.o
net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.
Linus Torvalds [Thu, 19 Dec 2019 16:13:04 +0000 (08:13 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <[email protected]>:
lib/Kconfig.debug: fix some messed up configurations
mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG
kasan: don't assume percpu shadow allocations will succeed
kasan: use apply_to_existing_page_range() for releasing vmalloc shadow
mm/memory.c: add apply_to_existing_page_range() helper
kasan: fix crashes on access to memory mapped by vm_map_ram()
Linus Torvalds [Thu, 19 Dec 2019 16:09:43 +0000 (08:09 -0800)]
Merge tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a problem related to CPU offline/online and cpufreq governors that
in some system configurations may lead to a system-wide deadlock
during CPU online"
* tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid leaving stale IRQ work items during CPU offline
Darrick J. Wong [Wed, 11 Dec 2019 21:19:06 +0000 (13:19 -0800)]
xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly. The problem
here is that repair calculates the where mkfs should have allocated the
root inode, based on the superblock geometry. The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.
Port from xfs_repair some code that computes the location of the root
inode and teach mount to skip the ondisk update if it would cause
problems for repair. Along the way we'll update the documentation,
provide a function for computing the minimum AGFL size instead of
open-coding it, and cut down some indenting in the mount code.
Note that we allow the mount to proceed (and new allocations will
reflect this new geometry) because we've never screened this kind of
thing before. We'll have to wait for a new future incompat feature to
enforce correct behavior, alas.
Note that the geometry reporting always uses the superblock values, not
the incore ones, so that is what xfs_info and xfs_growfs will report.
Darrick J. Wong [Wed, 18 Dec 2019 19:13:16 +0000 (11:13 -0800)]
xfs: split the sunit parameter update into two parts
If the administrator provided a sunit= mount option, we need to validate
the raw parameter, convert the mount option units (512b blocks) into the
internal unit (fs blocks), and then validate that the (now cooked)
parameter doesn't screw anything up on disk. The incore inode geometry
computation can depend on the new sunit option, but a subsequent patch
will make validating the cooked value depends on the computed inode
geometry, so break the sunit update into two steps.
Darrick J. Wong [Wed, 18 Dec 2019 19:09:55 +0000 (11:09 -0800)]
xfs: refactor agfl length computation function
Refactor xfs_alloc_min_freelist to accept a NULL @pag argument, in which
case it returns the largest possible minimum length. This will be used
in an upcoming patch to compute the length of the AGFL at mkfs time.
Darrick J. Wong [Mon, 16 Dec 2019 19:14:09 +0000 (11:14 -0800)]
libxfs: resync with the userspace libxfs
Prepare to resync the userspace libxfs with the kernel libxfs. There
were a few things I missed -- a couple of static inline directory
functions that have to be exported for xfs_repair; a couple of directory
naming functions that make porting much easier if they're /not/ static
inline; and a u16 usage that should have been uint16_t.
None of these things are bugs in their own right; this just makes
porting xfsprogs easier.
Brian Foster [Tue, 17 Dec 2019 21:50:26 +0000 (13:50 -0800)]
xfs: use bitops interface for buf log item AIL flag check
The xfs_log_item flags were converted to atomic bitops as of commit 22525c17ed ("xfs: log item flags are racy"). The assert check for
AIL presence in xfs_buf_item_relse() still uses the old value based
check. This likely went unnoticed as XFS_LI_IN_AIL evaluates to 0
and causes the assert to unconditionally pass. Fix up the check.
Daniel Borkmann [Thu, 19 Dec 2019 15:20:49 +0000 (16:20 +0100)]
Merge branch 'bpf-fix-xsk-wakeup'
Maxim Mikityanskiy says:
====================
This series addresses the issue described in the commit message of the
first patch: lack of synchronization between XSK wakeup and destroying
the resources used by XSK wakeup. The idea is similar to napi_synchronize.
The series contains fixes for the drivers that implement XSK.
v2 incorporates changes suggested by Björn:
1. Call synchronize_rcu in Intel drivers only if the XDP program is
being unloaded.
2. Don't forget rcu_read_lock when wakeup is called from xsk_poll.
3. Use xs->zc as the condition to call ndo_xsk_wakeup.
====================
net/ixgbe: Fix concurrency issues between config flow and XSK
Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:
1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN.
2. After switching the XDP program, call synchronize_rcu to let
ixgbe_xsk_wakeup exit before the XDP program is freed.
3. Changing the number of channels brings the interface down.
4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware
resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to
avoid using the XDP ring when it's already destroyed. synchronize_rcu is
called from ixgbe_txrx_ring_disable.
net/i40e: Fix concurrency issues between config flow and XSK
Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:
1. i40e_down already calls synchronize_rcu. On i40e_down either
__I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
i40e_xsk_wakeup (the former is already checked there).
2. After switching the XDP program, call synchronize_rcu to let
i40e_xsk_wakeup exit before the XDP program is freed.
3. Changing the number of channels brings the interface down (see
i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
net/mlx5e: Fix concurrency issues between config flow and XSK
After disabling resources necessary for XSK (the XDP program, channels,
XSK queues), use synchronize_rcu to wait until the XSK wakeup function
finishes, before freeing the resources.
Suspend XSK wakeups during switching channels. If the XDP program is
being removed, synchronize_rcu before closing the old channels to allow
XSK wakeup to complete.
The XSK wakeup callback in drivers makes some sanity checks before
triggering NAPI. However, some configuration changes may occur during
this function that affect the result of those checks. For example, the
interface can go down, and all the resources will be destroyed after the
checks in the wakeup function, but before it attempts to use these
resources. Wrap this callback in rcu_read_lock to allow driver to
synchronize_rcu before actually destroying the resources.
xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
wrapped into the RCU lock. After this commit, xsk_poll starts using
xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
need_wakeup feature: a non-zero-copy socket may be used with a driver
supporting zero-copy, and in this case ndo_xsk_wakeup should not be
called, so the xs->zc check is the correct one.
clk: qcom: gcc-sc7180: Fix setting flag for votable GDSCs
Commit 17269568f7267 ("clk: qcom: Add Global Clock controller (GCC)
driver for SC7180") sets the VOTABLE flag in .pwrsts, but it needs
to be set in .flags, fix this.
Linus Torvalds [Thu, 19 Dec 2019 01:17:36 +0000 (17:17 -0800)]
Merge tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Bunch of fixes for rc3"
* tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm/tpm_ftpm_tee: add shutdown call back
tpm: selftest: cleanup after unseal with wrong auth/policy test
tpm: selftest: add test covering async mode
tpm: fix invalid locking in NONBLOCKING mode
security: keys: trusted: fix lost handle flush
tpm_tis: reserve chip for duration of tpm_tis_core_init
KEYS: asymmetric: return ENOMEM if akcipher_request_alloc() fails
KEYS: remove CONFIG_KEYS_COMPAT
Vasily Gorbik [Tue, 10 Dec 2019 12:50:23 +0000 (13:50 +0100)]
s390/ftrace: save traced function caller
A typical backtrace acquired from ftraced function currently looks like
the following (e.g. for "path_openat"):
arch_stack_walk+0x15c/0x2d8
stack_trace_save+0x50/0x68
stack_trace_call+0x15a/0x3b8
ftrace_graph_caller+0x0/0x1c
0x3e0007e3c98 <- ftraced function caller (should be do_filp_open+0x7c/0xe8)
do_open_execat+0x70/0x1b8
__do_execve_file.isra.0+0x7d8/0x860
__s390x_sys_execve+0x56/0x68
system_call+0xdc/0x2d8
Note random "0x3e0007e3c98" stack value as ftraced function caller. This
value causes either imprecise unwinder result or unwinding failure.
That "0x3e0007e3c98" comes from r14 of ftraced function stack frame, which
it haven't had a chance to initialize since the very first instruction
calls ftrace code ("ftrace_caller"). (ftraced function might never
save r14 as well). Nevertheless according to s390 ABI any function
is called with stack frame allocated for it and r14 contains return
address. "ftrace_caller" itself is called with "brasl %r0,ftrace_caller".
So, to fix this issue simply always save traced function caller onto
ftraced function stack frame.
Vasily Gorbik [Wed, 11 Dec 2019 16:27:31 +0000 (17:27 +0100)]
s390/unwind: stop gracefully at user mode pt_regs in irq stack
Consider reaching user mode pt_regs at the bottom of irq stack graceful
unwinder termination. This is the case when irq/mcck/ext interrupt arrives
while in user mode.
s390/purgatory: do not build purgatory with kcov, kasan and friends
the purgatory must not rely on functions from the "old" kernel,
so we must disable kasan and friends. We also need to have a
separate copy of string.c as the default does not build memcmp
with KASAN.
The stack overflow happens because get_tod_clock_monotonic() gets called
by ftrace but itself calls preempt_{disable,enable}(), which leads to a
endless recursion. Fix this by using preempt_{disable,enable}_notrace().
Jose Abreu [Wed, 18 Dec 2019 10:17:43 +0000 (11:17 +0100)]
net: stmmac: Always arm TX Timer at end of transmission start
If TX Coalesce timer is enabled we should always arm it, otherwise we
may hit the case where an interrupt is missed and the TX Queue will
timeout.
Arming the timer does not necessarly mean it will run the tx_clean()
because this function is wrapped around NAPI launcher.
Fixes: 9125cdd1be11 ("stmmac: add the initial tx coalesce schema") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:42 +0000 (11:17 +0100)]
net: stmmac: Enable 16KB buffer size
XGMAC supports maximum MTU that can go to 16KB. Lets add this check in
the calculation of RX buffer size.
Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:41 +0000 (11:17 +0100)]
net: stmmac: 16KB buffer must be 16 byte aligned
The 16KB RX Buffer must also be 16 byte aligned. Fix it.
Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:40 +0000 (11:17 +0100)]
net: stmmac: RX buffer size must be 16 byte aligned
We need to align the RX buffer size to at least 16 byte so that IP
doesn't mis-behave. This is required by HW.
Changes from v2:
- Align UP and not DOWN (David)
Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:39 +0000 (11:17 +0100)]
net: stmmac: xgmac: Clear previous RX buffer size
When switching between buffer sizes we need to clear the previous value.
Fixes: d6ddfacd95c7 ("net: stmmac: Add DMA related callbacks for XGMAC2") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:38 +0000 (11:17 +0100)]
net: stmmac: Only the last buffer has the FCS field
Only the last received buffer contains the FCS field. Check for end of
packet before trying to strip the FCS field.
Fixes: 88ebe2cf7f3f ("net: stmmac: Rework stmmac_rx()") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jose Abreu [Wed, 18 Dec 2019 10:17:37 +0000 (11:17 +0100)]
net: stmmac: Do not accept invalid MTU values
The maximum MTU value is determined by the maximum size of TX FIFO so
that a full packet can fit in the FIFO. Add a check for this in the MTU
change callback.
Also check if provided and rounded MTU does not passes the maximum limit
of 16K.
Changes from v2:
- Align MTU before checking if its valid
Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver") Signed-off-by: Jose Abreu <[email protected]> Signed-off-by: David S. Miller <[email protected]>