As defined at ETSI TS 101 162, original network IDs up to 0xfebf
are reserved for registration at dvb.org.
Let's use, instead, an original network ID at the range
0xff00-0xffff, as this is for private temporary usage.
As the same value is also used for the network ID,
the range 0xff01-0xffff also fits better, as values
lower than that depend if the network is used for
satellite, terrestrial, cable of CI.
While here, move the TS ID to the bridge code, where it
is used, and change its value, as it was identical to
the value previously used by network ID. While we could
keep the same value, let's change it, just to make easier
to check for the new code while reading it with DVB tools
like dvbinspector.
Do some cleanups at the coding style of the driver:
- remove "inline" declarations;
- use reverse xmas-tree for local var declarations;
- Adjust some indent to avoid breaking 80-cols;
- Cleanup some comments.
Matti Hamalainen [Fri, 20 Nov 2020 15:23:38 +0000 (17:23 +0200)]
drm/nouveau: fix relocations applying logic and a double-free
Commit 03e0d26fcf79 ("drm/nouveau: slowpath for pushbuf ioctl") included
a logic-bug which results in the relocations not actually getting
applied at all as the call to nouveau_gem_pushbuf_reloc_apply() is
never reached. This causes a regression with graphical corruption,
triggered when relocations need to be done (for example after a
suspend/resume cycle.)
Fix by setting *apply_relocs value only if there were more than 0
relocations.
Additionally, the never reached code had a leftover u_free() call,
which, after fixing the logic, now got called and resulted in a
double-free. Fix by removing one u_free(), moving the other
and adding check for errors.
Minimize the number of data copies and initialization at
the code, passing them as pointers instead of duplicating
the data.
The only case where we're keeping the data copy is at
vidtv_pes_write_h(), as it needs a copy of the passed
arguments. On such case, we're being more explicit.
The tone generator logic were repeating the song after the
first silent. There's also a wrong logic at the note
offset calculus, which may create some noise.
Currently, there are not checks if something gets bad during
memory allocation: it will simply use NULL pointers and
crash.
Add error path at the logic which allocates memory for the
MPEG-TS generator code, propagating the errors up to the
vidtv_bridge. Now, if something wents bad, start_streaming
will return an error that userspace can detect:
media: vidtv: Move s302m specific fields into encoder context
A few fields used only by the tone generator in the s302m encoder
are stored in struct vidtv_encoder. Move them into
struct vidtv_s302m_ctx instead. While we are at it: fix a
checkpatch warning for long lines.
media: vidtv: psi: extract descriptor chaining code into a helper
The code to append a descriptor to the end of a chain is repeated
throughout the psi generator code. Extract it into its own helper
function to avoid cluttering.
media: vidtv: psi: add a Network Information Table (NIT)
Add a Network Information Table (NIT) as specified in ETSI EN 300 468.
This table conveys information relating to the physical organization of
the multiplexes carried via a given network and the characteristics of
the network itself.
It is conveyed in the output of vidtv as packets with TS PID of 0x0010
net/tls: Protect from calling tls_dev_del for TLS RX twice
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
calling tls_dev_del if TLX TX offload is also enabled. Clearing
tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
time frame when tls_device_down may get called and call tls_dev_del for
RX one extra time, confusing the driver, which may lead to a crash.
This patch corrects this racy behavior by adding a flag to prevent
tls_device_down from calling tls_dev_del the second time.
Jakub Kicinski [Thu, 26 Nov 2020 01:26:37 +0000 (17:26 -0800)]
Merge branch 'devlink-port-attribute-fixes'
Parav Pandit says:
====================
devlink port attribute fixes
This patchset contains 2 small fixes for devlink port attributes.
Patch summary:
Patch-1 synchronize the devlink port attribute reader
with net namespace change operation
Patch-2 Ensure to return devlink port's netdevice attributes
when netdev and devlink instance belong to same net namespace
====================
Parav Pandit [Wed, 25 Nov 2020 09:16:20 +0000 (11:16 +0200)]
devlink: Make sure devlink instance and port are in same net namespace
When devlink reload operation is not used, netdev of an Ethernet port may
be present in different net namespace than the net namespace of the
devlink instance.
Ensure that both the devlink instance and devlink port netdev are located
in same net namespace.
Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload") Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Parav Pandit [Wed, 25 Nov 2020 09:16:19 +0000 (11:16 +0200)]
devlink: Hold rtnl lock while reading netdev attributes
A netdevice of a devlink port can be moved to different net namespace
than its parent devlink instance.
This scenario occurs when devlink reload is not used.
When netdevice is undergoing migration to net namespace, its ifindex
and name may change.
In such use case, devlink port query may read stale netdev attributes.
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Vladimir Oltean [Tue, 24 Nov 2020 22:02:59 +0000 (00:02 +0200)]
enetc: Let the hardware auto-advance the taprio base-time of 0
The tc-taprio base time indicates the beginning of the tc-taprio
schedule, which is cyclic by definition (where the length of the cycle
in nanoseconds is called the cycle time). The base time is a 64-bit PTP
time in the TAI domain.
Logically, the base-time should be a future time. But that imposes some
restrictions to user space, which has to retrieve the current PTP time
from the NIC first, then calculate a base time that will still be larger
than the base time by the time the kernel driver programs this value
into the hardware. Actually ensuring that the programmed base time is in
the future is still a problem even if the kernel alone deals with this.
Luckily, the enetc hardware already advances a base-time that is in the
past into a congruent time in the immediate future, according to the
same formula that can be found in the software implementation of taprio
(in taprio_get_start_time):
/* Schedule the start time for the beginning of the next
* cycle.
*/
n = div64_s64(ktime_sub_ns(now, base), cycle);
*start = ktime_add_ns(base, (n + 1) * cycle);
There's only one problem: the driver doesn't let the hardware do that.
It interferes with the base-time passed from user space, by special-casing
the situation when the base-time is zero, and replaces that with the
current PTP time. This changes the intended effective base-time of the
schedule, which will in the end have a different phase offset than if
the base-time of 0.000000000 was to be advanced by an integer multiple
of the cycle-time.
Antonio Borneo [Tue, 24 Nov 2020 22:37:29 +0000 (23:37 +0100)]
net: stmmac: fix incorrect merge of patch upstream
Commit 757926247836 ("net: stmmac: add flexible PPS to dwmac
4.10a") was intended to modify the struct dwmac410_ops, but it got
somehow badly merged and modified the struct dwmac4_ops.
Revert the modification in struct dwmac4_ops and re-apply it
properly in struct dwmac410_ops.
Anand K Mistry [Tue, 10 Nov 2020 01:33:53 +0000 (12:33 +1100)]
x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
line, IBPB is force-enabled and STIPB is conditionally-enabled (or not
available).
However, since
21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
Because the issuing of IBPB relies on the switch_mm_*_ibpb static
branches, the mitigations behave as expected.
Since
1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
this discrepency caused the misreporting of IB speculation via prctl().
On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.
Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.
Linus Torvalds [Wed, 25 Nov 2020 18:35:44 +0000 (10:35 -0800)]
Merge tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a rand Kconfig fixup for mtk-vcodec
- a fix at h264 handling at cedrus codec driver
- some warning fixes when config PM is not enabled at marvell-ccic
- two fixes at venus codec driver: one related to codec profile and the
other one related to a bad error path which causes an OOPS on module
re-bind
* tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: venus: pm_helpers: Fix kernel module reload
media: venus: venc: Fix setting of profile and level
media: cedrus: h264: Fix check for presence of scaling matrix
media: media/platform/marvell-ccic: fix warnings when CONFIG_PM is not enabled
media: mtk-vcodec: fix build breakage when one of VPU or SCP is enabled
media: mtk-vcodec: move firmware implementations into their own files
Randy Dunlap [Tue, 17 Nov 2020 01:39:51 +0000 (17:39 -0800)]
RISC-V: fix barrier() use in <vdso/processor.h>
riscv's <vdso/processor.h> uses barrier() so it should include
<asm/barrier.h>
Fixes this build error:
CC [M] drivers/net/ethernet/emulex/benet/be_main.o
In file included from ./include/vdso/processor.h:10,
from ./arch/riscv/include/asm/processor.h:11,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
14 | barrier();
This happens with a total of 5 networking drivers -- they all use
<linux/prefetch.h>.
rv64 allmodconfig now builds cleanly after this patch.
Fixes fallout from: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
riscv: Explicitly specify the build id style in vDSO Makefile again
Commit a96843372331 ("kbuild: explicitly specify the build id style")
explicitly set the build ID style to SHA1. Commit c2c81bb2f691 ("RISC-V:
Fix the VDSO symbol generaton for binutils-2.35+") undid this change,
likely unintentionally.
Restore it so that the build ID style stays consistent across the tree
regardless of linker.
Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+") Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Bill Wendling <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
CONFIG_EFI_EARLYCON defaults to yes, and thus is enabled on systems that
do not support EFI, or do not have EFI support enabled, but do satisfy
the symbol's other dependencies.
While drivers/firmware/efi/ won't be entered during the build phase if
CONFIG_EFI=n, and drivers/firmware/efi/earlycon.c itself thus won't be
built, enabling EFI_EARLYCON does force-enable CONFIG_FONT_SUPPORT and
CONFIG_ARCH_USE_MEMREMAP_PROT, and CONFIG_FONT_8x16, which is
undesirable.
Fix this by making CONFIG_EFI_EARLYCON depend on CONFIG_EFI.
This reduces kernel size on headless systems by more than 4 KiB.
Ard Biesheuvel [Wed, 25 Nov 2020 07:45:55 +0000 (08:45 +0100)]
efivarfs: revert "fix memory leak in efivarfs_create()"
The memory leak addressed by commit fe5186cf12e3 is a false positive:
all allocations are recorded in a linked list, and freed when the
filesystem is unmounted. This leads to double frees, and as reported
by David, leads to crashes if SLUB is configured to self destruct when
double frees occur.
So drop the redundant kfree() again, and instead, mark the offending
pointer variable so the allocation is ignored by kmemleak.
Cc: Vamshi K Sthambamkadi <[email protected]> Fixes: fe5186cf12e3 ("efivarfs: fix memory leak in efivarfs_create()") Reported-by: David Laight <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
Efivars allows for overriding of SSDT tables, however starting with
commit
bf67fad19e493b ("efi: Use more granular check for availability for variable services")
this use case is broken. When loading SSDT generic ops should be set
first, however mentioned commit reversed order of operations. Fix this
by restoring original order of operations.
Shiraz Saleem [Wed, 25 Nov 2020 00:56:16 +0000 (18:56 -0600)]
RDMA/i40iw: Address an mmap handler exploit in i40iw
i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap
vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range
without any validation. This is vulnerable to an mmap exploit as described
in: https://lore.kernel.org/r/20201119093523[email protected]
The push feature is disabled in the driver currently and therefore no push
mmaps are issued from user-space. The feature does not work as expected in
the x722 product.
Remove the push module parameter and all VMA attribute manipulations for
this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user
mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound
to a single page.
Jon Hunter [Wed, 18 Nov 2020 16:04:58 +0000 (16:04 +0000)]
arm64: tegra: Fix Tegra234 VDK node names
When the device-tree board file was added for the Tegra234 VDK simulator
it incorrectly used the names 'cbb' and 'sdhci' instead of 'bus' and
'mmc', respectively. The names 'bus' and 'mmc' are required by the
device-tree json-schema validation tools. Therefore, fix this by
renaming these nodes accordingly.
JC Kuo [Thu, 19 Nov 2020 07:23:45 +0000 (15:23 +0800)]
arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1
USB host mode is broken on the OTG port of Jetson TX1 platform because
the USB_VBUS_EN0 regulator (regulator@11) is being overwritten by the
vdd-cam-1v2 regulator. This commit rearranges USB_VBUS_EN0 to be
regulator@14.
Jon Hunter [Wed, 11 Nov 2020 10:41:17 +0000 (10:41 +0000)]
arm64: tegra: Correct the UART for Jetson Xavier NX
The Jetson Xavier NX board routes UARTA to the 40-pin header and UARTC
to a 12-pin debug header. The UARTs can be used by either the Tegra
Combined UART (TCU) driver or the Tegra 8250 driver. By default, the
TCU will use UARTC on Jetson Xavier NX. Currently, device-tree for
Xavier NX enables the TCU and the Tegra 8250 node for UARTC. Fix this
by disabling the Tegra 8250 node for UARTC and enabling the Tegra 8250
node for UARTA.
Jon Hunter [Mon, 16 Nov 2020 16:20:26 +0000 (16:20 +0000)]
arm64: tegra: Disable the ACONNECT for Jetson TX2
Commit ff4c371d2bc0 ("arm64: defconfig: Build ADMA and ACONNECT driver")
enable the Tegra ADMA and ACONNECT drivers and this is causing resume
from system suspend to fail on Jetson TX2. Resume is failing because the
ACONNECT driver is being resumed before the BPMP driver, and the ACONNECT
driver is attempting to power on a power-domain that is provided by the
BPMP. While a proper fix for the resume sequencing problem is identified,
disable the ACONNECT for Jetson TX2 temporarily to avoid breaking system
suspend.
Please note that ACONNECT driver is used by the Audio Processing Engine
(APE) on Tegra, but because there is no mainline support for APE on
Jetson TX2 currently, disabling the ACONNECT does not disable any useful
feature at the moment.
Lars Povlsen [Fri, 20 Nov 2020 21:34:14 +0000 (22:34 +0100)]
spi: dw: Fix spi registration for controllers overriding CS
When SPI DW memory ops support was introduced, there was a check for
excluding controllers which supplied their own CS function. Even so,
the mem_ops pointer is *always* presented to the SPI core.
This causes the SPI core sanity check in spi_controller_check_ops() to
refuse registration, since a mem_ops pointer is being supplied without
an exec_op member function.
The end result is failure of the SPI DW driver on sparx5 and similar
platforms.
The fix in the core SPI DW driver is to avoid presenting the mem_ops
pointer if the exec_op function is not set.
Lu Baolu [Wed, 25 Nov 2020 01:41:24 +0000 (09:41 +0800)]
x86/tboot: Don't disable swiotlb when iommu is forced on
After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA
domain"), swiotlb could also be used for direct memory access if IOMMU
is enabled but a device is configured to pass through the DMA translation.
Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work
if "iommu=pt" kernel parameter is used.
Rui Miguel Silva [Fri, 13 Nov 2020 15:06:04 +0000 (15:06 +0000)]
optee: add writeback to valid memory type
Only in smp systems the cache policy is setup as write alloc, in
single cpu systems the cache policy is set as writeback and it is
normal memory, so, it should pass the is_normal_memory check in the
share memory registration.
Add the right condition to make it work in no smp systems.
Fixes: cdbcf83d29c1 ("tee: optee: check type of registered shared memory") Signed-off-by: Rui Miguel Silva <[email protected]> Signed-off-by: Jens Wiklander <[email protected]>
drm/ast: Reload gamma LUT after changing primary plane's color format
The gamma LUT has to be reloaded after changing the primary plane's
color format. This used to be done implicitly by the CRTC atomic_enable()
helper after updating the primary plane. With the recent reordering of
the steps, the primary plane's setup was moved last and invalidated
the gamma LUT. Fix this by setting the LUT from within atomic_flush().
Lijun Pan [Mon, 23 Nov 2020 19:35:47 +0000 (13:35 -0600)]
ibmvnic: enhance resetting status check during module exit
Based on the discussion with Sukadev Bhattiprolu and Dany Madden,
we believe that checking adapter->resetting bit is preferred
since RESETTING state flag is not as strict as resetting bit.
RESETTING state flag is removed since it is verbose now.
Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset") Signed-off-by: Lijun Pan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Lijun Pan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Lijun Pan [Mon, 23 Nov 2020 19:35:45 +0000 (13:35 -0600)]
ibmvnic: fix NULL pointer dereference in reset_sub_crq_queues
adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset
did not complete after freeing sub crqs. Check for NULL before
dereferencing them.
Jakub Kicinski [Wed, 25 Nov 2020 00:07:16 +0000 (16:07 -0800)]
Merge branch 'fixes-for-ena-driver'
Shay Agroskin says:
====================
Fixes for ENA driver
- fix wrong data offset on machines that support rx offset
- work-around Intel iommu issue
- fix out of bound access when request id is wrong
====================
Shay Agroskin [Mon, 23 Nov 2020 19:08:59 +0000 (21:08 +0200)]
net: ena: fix packet's addresses for rx_offset feature
This patch fixes two lines in which the rx_offset received by the device
wasn't taken into account:
- prefetch function:
In our driver the copied data would reside in
rx_info->page + rx_headroom + rx_offset
so the prefetch function is changed accordingly.
- setting page_offset to zero for descriptors > 1:
for every descriptor but the first, the rx_offset is zero. Hence
the page_offset value should be set to rx_headroom.
The previous implementation changed the value of rx_info after
the descriptor was added to the SKB (essentially providing wrong
page offset).
Fixes: 68f236df93a9 ("net: ena: add support for the rx offset feature") Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Shay Agroskin [Mon, 23 Nov 2020 19:08:58 +0000 (21:08 +0200)]
net: ena: set initial DMA width to avoid intel iommu issue
The ENA driver uses the readless mechanism, which uses DMA, to find
out what the DMA mask is supposed to be.
If DMA is used without setting the dma_mask first, it causes the
Intel IOMMU driver to think that ENA is a 32-bit device and therefore
disables IOMMU passthrough permanently.
This patch sets the dma_mask to be ENA_MAX_PHYS_ADDR_SIZE_BITS=48
before readless initialization in
ena_device_init()->ena_com_mmio_reg_read_request_init(),
which is large enough to workaround the intel_iommu issue.
DMA mask is set again to the correct value after it's received from the
device after readless is initialized.
The patch also changes the driver to use dma_set_mask_and_coherent()
function instead of the two pci_set_dma_mask() and
pci_set_consistent_dma_mask() ones. Both methods achieve the same
effect.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Mike Cui <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
Shay Agroskin [Mon, 23 Nov 2020 19:08:57 +0000 (21:08 +0200)]
net: ena: handle bad request id in ena_netdev
After request id is checked in validate_rx_req_id() its value is still
used in the line
rx_ring->free_ids[next_to_clean] =
rx_ring->ena_bufs[i].req_id;
even if it was found to be out-of-bound for the array free_ids.
The patch moves the request id to an earlier stage in the napi routine and
makes sure its value isn't used if it's found out-of-bounds.
Linus Torvalds [Tue, 24 Nov 2020 23:33:18 +0000 (15:33 -0800)]
Merge tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four smb3 fixes for stable: one fixes a memleak, the other three
address a problem found with decryption offload that can cause a use
after free"
* tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Handle error case during offload read path
smb3: Avoid Mid pending list corruption
smb3: Call cifs reconnect from demultiplex thread
cifs: fix a memleak with modefromsid
Hugh Dickins [Tue, 24 Nov 2020 16:46:43 +0000 (08:46 -0800)]
mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.
The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.
https://lore.kernel.org/linux-mm/20200827122019[email protected]/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
Then on crashing a second time, realized there's a stronger reason against
that approach. If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()? And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon? What
would that look like?
It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).
Matthew Wilcox explaining this to himself:
"page is allocated, added to page cache, dirtied, writeback starts,
--- thread A ---
filesystem calls end_page_writeback()
test_clear_page_writeback()
--- context switch to thread B ---
truncate_inode_pages_range() finds the page, it doesn't have writeback set,
we delete it from the page cache. Page gets reallocated, dirtied, writeback
starts again. Then we call write_cache_pages(), see
PageWriteback() set, call wait_on_page_writeback()
--- context switch back to thread A ---
wake_up_page(page, PG_writeback);
... thread B is woken, but because the wakeup was for the old use of
the page, PageWriteback is still set.
Devious"
And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.
I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().
Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared? I think not,
because each waiter does hold a reference on the page. This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it). The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
nfc: s3fwrn5: use signed integer for parsing GPIO numbers
GPIOs - as returned by of_get_named_gpio() and used by the gpiolib - are
signed integers, where negative number indicates error. The return
value of of_get_named_gpio() should not be assigned to an unsigned int
because in case of !CONFIG_GPIOLIB such number would be a valid GPIO.
Ezequiel Garcia [Mon, 23 Nov 2020 16:35:53 +0000 (18:35 +0200)]
dpaa2-eth: Fix compile error due to missing devlink support
The dpaa2 driver depends on devlink, so it should select
NET_DEVLINK in order to fix compile errors, such as:
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err':
dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report'
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get':
dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put'
Alexander Duyck [Sat, 21 Nov 2020 03:47:44 +0000 (19:47 -0800)]
tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN
When a BPF program is used to select between a type of TCP congestion
control algorithm that uses either ECN or not there is a case where the
synack for the frame was coming up without the ECT0 bit set. A bit of
research found that this was due to the final socket being configured to
dctcp while the listener socket was staying in cubic.
To reproduce it all that is needed is to monitor TCP traffic while running
the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
assuming tcp_dctcp module is loaded or compiled in and the traffic matches
the rules in the sample file, is that for all frames with the exception of
the synack the ECT0 bit is set.
To address that it is necessary to make one additional call to
tcp_bpf_ca_needs_ecn using the request socket and then use the output of
that to set the ECT0 bit for the tos/tclass of the packet.
Moshe Shemesh [Mon, 23 Nov 2020 05:36:25 +0000 (07:36 +0200)]
devlink: Fix reload stats structure
Fix reload stats structure exposed to the user. Change stats structure
hierarchy to have the reload action as a parent of the stat entry and
then stat entry includes value per limit. This will also help to avoid
string concatenation on iproute2 output.
Reload stats structure before this fix:
"stats": {
"reload": {
"driver_reinit": 2,
"fw_activate": 1,
"fw_activate_no_reset": 0
}
}
After this fix:
"stats": {
"reload": {
"driver_reinit": {
"unspecified": 2
},
"fw_activate": {
"unspecified": 1,
"no_reset": 0
}
}
Linus Torvalds [Tue, 24 Nov 2020 20:12:55 +0000 (12:12 -0800)]
Merge tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"A couple more stack unwinder related fixes:
- More stack unwinding updates
- Misc minor fixes"
* tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: stack unwinding: reorganize how initial register state setup
ARC: stack unwinding: don't assume non-current task is sleeping
ARC: mm: fix spelling mistakes
ARC: bitops: Remove unecessary operation and value
Lincoln Ramsay [Mon, 23 Nov 2020 21:40:43 +0000 (21:40 +0000)]
aquantia: Remove the build_skb path
When performing IPv6 forwarding, there is an expectation that SKBs
will have some headroom. When forwarding a packet from the aquantia
driver, this does not always happen, triggering a kernel warning.
aq_ring.c has this code (edited slightly for brevity):
There is a significant difference between the SKB produced by these
2 code paths. When napi_alloc_skb creates an SKB, there is a certain
amount of headroom reserved. However, this is not done in the
build_skb codepath.
As the hardware buffer that build_skb is built around does not
handle the presence of the SKB header, this code path is being
removed and the napi_alloc_skb path will always be used. This code
path does have to copy the packet header into the SKB, but it adds
the packet data as a frag.
Rodrigo Siqueira [Tue, 17 Nov 2020 20:25:48 +0000 (15:25 -0500)]
drm/amd/display: Avoid HDCP initialization in devices without output
The HDCP feature requires at least one connector attached to the device;
however, some GPUs do not have a physical output, making the HDCP
initialization irrelevant. This patch disables HDCP initialization when
the graphic card does not have output.
Chris Wilson [Mon, 23 Nov 2020 11:37:17 +0000 (11:37 +0000)]
drm/i915/gt: Free stale request on destroying the virtual engine
Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.
v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.
Chris Wilson [Mon, 23 Nov 2020 11:37:16 +0000 (11:37 +0000)]
drm/i915/gt: Don't cancel the interrupt shadow too early
We currently want to keep the interrupt enabled until the interrupt after
which we have no more work to do. This heuristic was broken by us
kicking the irq-work on adding a completed request without attaching a
signaler -- hence it appearing to the irq-worker that an interrupt had
fired when we were idle.
Sonny Jiang [Fri, 20 Nov 2020 07:38:09 +0000 (02:38 -0500)]
drm/amdgpu: fix a page fault
The UVD firmware is copied to cpu addr in uvd_resume, so it
should be used after that. This is to fix a bug introduced by
patch drm/amdgpu: fix SI UVD firmware validate resume fail.
Sonny Jiang [Fri, 6 Nov 2020 21:42:47 +0000 (16:42 -0500)]
drm/amdgpu: fix SI UVD firmware validate resume fail
The SI UVD firmware validate key is stored at the end of firmware,
which is changed during resume while playing video. So get the key
at sw_init and store it for fw validate using.
Chris Wilson [Mon, 23 Nov 2020 11:37:14 +0000 (11:37 +0000)]
drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
Move the register slow register write and readback from out of the
critical path for execlists submission and delay it until the following
worker, shaving off around 200us. Note that the same signal_irq_work() is
allowed to run concurrently on each CPU (but it will only be queued once,
once running though it can be requeued and reexecuted) so we have to
remember to lock the global interactions as we cannot rely on the
signal_irq_work() itself providing the serialisation (in constrast to a
tasklet).
By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we loose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.
v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().
drm/i915/perf: workaround register corruption in OATAILPTR
After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.
When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.
The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.
This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.
Peter Zijlstra [Fri, 20 Nov 2020 10:28:35 +0000 (11:28 +0100)]
intel_idle: Fix intel_idle() vs tracing
cpuidle->enter() callbacks should not call into tracing because RCU
has already been disabled. Instead of doing the broadcast thing
itself, simply advertise to the cpuidle core that those states stop
the timer.
Joseph Qi [Tue, 24 Nov 2020 07:03:03 +0000 (15:03 +0800)]
io_uring: fix shift-out-of-bounds when round up cq size
Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():
[ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
[ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 59.603673] Call Trace:
[ 59.604286] dump_stack+0x107/0x163
[ 59.605237] ubsan_epilogue+0xb/0x5a
[ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
[ 59.607335] ? lock_downgrade+0x6c0/0x6c0
[ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.609166] io_uring_create.cold+0x99/0x149
[ 59.610114] io_uring_setup+0xd6/0x140
[ 59.610975] ? io_uring_create+0x2510/0x2510
[ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
[ 59.614038] ? trace_hardirqs_on+0x5b/0x180
[ 59.615056] do_syscall_64+0x2d/0x40
[ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.617007] RIP: 0033:0x7f2bb8a0b239
This is caused by roundup_pow_of_two() if the input entries larger
enough, e.g. 2^32-1. For sq_entries, it will check first and we allow
at most IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we do
round up first, that may overflow and truncate it to 0, which is not
the expected behavior. So check the cq size first and then do round up.
Clark Wang [Tue, 24 Nov 2020 08:52:47 +0000 (16:52 +0800)]
spi: imx: fix the unbalanced spi runtime pm management
If set active without increase the usage count of pm, the dont use
autosuspend function will call the suspend callback to close the two
clocks of spi because the usage count is reduced to -1.
This will cause the warning dump below when the defer-probe occurs.
firmware: xilinx: Use hash-table for api feature check
Currently array of fix length PM_API_MAX is used to cache
the pm_api version (valid or invalid). However ATF based
PM APIs values are much higher then PM_API_MAX.
So to include ATF based PM APIs also, use hash-table to
store the pm_api version status.
Xiaochen Shen [Fri, 30 Oct 2020 19:11:28 +0000 (03:11 +0800)]
x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak
On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.
In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().
Before calling kfree() to free the rdtgroup structures, the kernfs node
of the control group itself as well as the kernfs nodes of all child
monitoring groups still take the extra references which will never be
dropped to 0 and the kernfs nodes will never be freed. It leads to
reference count leak and kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
umount /sys/fs/resctrl
The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
Fix this issue by following changes to make sure the extra kernfs_node
reference on rdtgroup is dropped before freeing the rdtgroup structure.
(1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up
kernfs_put() and kfree().
(2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
exit paths of mkdir where an extra reference is taken by kernfs_get().
Xiaochen Shen [Fri, 30 Oct 2020 19:10:53 +0000 (03:10 +0800)]
x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak
Willem reported growing of kernfs_node_cache entries in slabtop when
repeatedly creating and removing resctrl subdirectories as well as when
repeatedly mounting and unmounting the resctrl filesystem.
On resource group (control as well as monitoring) creation via a mkdir
an extra kernfs_node reference is obtained to ensure that the rdtgroup
structure remains accessible for the rdtgroup_kn_unlock() calls where it
is removed on deletion. The kernfs_node reference count is dropped by
kernfs_put() in rdtgroup_kn_unlock().
With the above explaining the need for one kernfs_get()/kernfs_put()
pair in resctrl there are more places where a kernfs_node reference is
obtained without a corresponding release. The excessive amount of
reference count on kernfs nodes will never be dropped to 0 and the
kernfs nodes will never be freed in the call paths of rmdir and umount.
It leads to reference count leak and kernfs_node_cache memory leak.
Remove the superfluous kernfs_get() calls and expand the existing
comments surrounding the remaining kernfs_get()/kernfs_put() pair that
remains in use.
Superfluous kernfs_get() calls are removed from two areas:
(1) In call paths of mount and mkdir, when kernfs nodes for "info",
"mon_groups" and "mon_data" directories and sub-directories are
created, the reference count of newly created kernfs node is set to 1.
But after kernfs_create_dir() returns, superfluous kernfs_get() are
called to take an additional reference.
(2) kernfs_get() calls in rmdir call paths.
Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two") Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support") Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Reported-by: Willem de Bruijn <[email protected]> Signed-off-by: Xiaochen Shen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Willem de Bruijn <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
Eyal Birger [Sat, 21 Nov 2020 06:28:17 +0000 (08:28 +0200)]
net/packet: fix packet receive on L3 devices without visible hard header
In the patchset merged by commit b9fcf0a0d826
("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
did not have header_ops were given one for the purpose of protocol parsing
on af_packet transmit path.
That change made af_packet receive path regard these devices as having a
visible L3 header and therefore aligned incoming skb->data to point to the
skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
reset their mac_header prior to ingress and therefore their incoming
packets became malformed.
Ideally these devices would reset their mac headers, or af_packet would be
able to rely on dev->hard_header_len being 0 for such cases, but it seems
this is not the case.
Fix by changing af_packet RX ll visibility criteria to include the
existence of a '.create()' header operation, which is used when creating
a device hard header - via dev_hard_header() - by upper layers, and does
not exist in these L3 devices.
As this predicate may be useful in other situations, add it as a common
dev_has_header() helper in netdevice.h.
Hao Si [Tue, 20 Oct 2020 02:18:32 +0000 (10:18 +0800)]
soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.
During LTP testing, the following error was generated:
i40e: Fix removing driver while bare-metal VFs pass traffic
Prevent VFs from resetting when PF driver is being unloaded:
- introduce new pf state: __I40E_VF_RESETS_DISABLED;
- check if pf state has __I40E_VF_RESETS_DISABLED state set,
if so, disable any further VFLR event notifications;
- when i40e_remove (rmmod i40e) is called, disable any resets on
the VFs;
Previously if there were bare-metal VFs passing traffic and PF
driver was removed, there was a possibility of VFs triggering a Tx
timeout right before iavf_remove. This was causing iavf_close to
not be called because there is a check in the beginning of iavf_remove
that bails out early if adapter->state < IAVF_DOWN_PENDING. This
makes it so some resources do not get cleaned up.
vsock/virtio: discard packets only when socket is really closed
Starting from commit 8692cefc433f ("virtio_vsock: Fix race condition
in virtio_transport_recv_pkt"), we discard packets in
virtio_transport_recv_pkt() if the socket has been released.
When the socket is connected, we schedule a delayed work to wait the
RST packet from the other peer, also if SHUTDOWN_MASK is set in
sk->sk_shutdown.
This is done to complete the virtio-vsock shutdown algorithm, releasing
the port assigned to the socket definitively only when the other peer
has consumed all the packets.
If we discard the RST packet received, the socket will be closed only
when the VSOCK_CLOSE_TIMEOUT is reached.
Sergio discovered the issue while running ab(1) HTTP benchmark using
libkrun [1] and observing a latency increase with that commit.
To avoid this issue, we discard packet only if the socket is really
closed (SOCK_DONE flag is set).
We also set SOCK_DONE in virtio_transport_release() when we don't need
to wait any packets from the other peer (we didn't schedule the delayed
work). In this case we remove the socket from the vsock lists, releasing
the port assigned.
Ricardo Dias [Fri, 20 Nov 2020 11:11:33 +0000 (11:11 +0000)]
tcp: fix race condition when creating child sockets from syncookies
When the TCP stack is in SYN flood mode, the server child socket is
created from the SYN cookie received in a TCP packet with the ACK flag
set.
The child socket is created when the server receives the first TCP
packet with a valid SYN cookie from the client. Usually, this packet
corresponds to the final step of the TCP 3-way handshake, the ACK
packet. But is also possible to receive a valid SYN cookie from the
first TCP data packet sent by the client, and thus create a child socket
from that SYN cookie.
Since a client socket is ready to send data as soon as it receives the
SYN+ACK packet from the server, the client can send the ACK packet (sent
by the TCP stack code), and the first data packet (sent by the userspace
program) almost at the same time, and thus the server will equally
receive the two TCP packets with valid SYN cookies almost at the same
instant.
When such event happens, the TCP stack code has a race condition that
occurs between the momement a lookup is done to the established
connections hashtable to check for the existence of a connection for the
same client, and the moment that the child socket is added to the
established connections hashtable. As a consequence, this race condition
can lead to a situation where we add two child sockets to the
established connections hashtable and deliver two sockets to the
userspace program to the same client.
This patch fixes the race condition by checking if an existing child
socket exists for the same client when we are adding the second child
socket to the established connections socket. If an existing child
socket exists, we drop the packet and discard the second child socket
to the same client.
Jakub Kicinski [Mon, 23 Nov 2020 22:59:40 +0000 (14:59 -0800)]
Merge tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.10
First set of fixes for v5.10. One fix for iwlwifi kernel panic, others
less notable.
rtw88
* fix a bogus test found by clang
iwlwifi
* fix long memory reads causing soft lockup warnings
* fix kernel panic during Channel Switch Announcement (CSA)
* other smaller fixes
MAINTAINERS
* email address updates
* tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
iwlwifi: mvm: fix kernel panic in case of assert during CSA
iwlwifi: pcie: set LTR to avoid completion timeout
iwlwifi: mvm: write queue_sync_state only for sync
iwlwifi: mvm: properly cancel a session protection for P2P
iwlwifi: mvm: use the HOT_SPOT_CMD to cancel an AUX ROC
iwlwifi: sta: set max HE max A-MPDU according to HE capa
MAINTAINERS: update maintainers list for Cypress
MAINTAINERS: update Yan-Hsuan's email address
iwlwifi: pcie: limit memory read spin time
rtw88: fix fw_fifo_addr check
====================