Git Repo - linux.git/log

media: vidtv: fix the network ID range

As defined at ETSI TS 101 162, original network IDs up to 0xfebf
are reserved for registration at dvb.org.

Let's use, instead, an original network ID at the range
0xff00-0xffff, as this is for private temporary usage.

As the same value is also used for the network ID,
the range 0xff01-0xffff also fits better, as values
lower than that depend if the network is used for
satellite, terrestrial, cable of CI.

While here, move the TS ID to the bridge code, where it
is used, and change its value, as it was identical to
the value previously used by network ID. While we could
keep the same value, let's change it, just to make easier
to check for the new code while reading it with DVB tools
like dvbinspector.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: improve EIT data

Place some text at EIT data, and use ISO 8859-15 encoding for
the German letter "ü" (u mit umlat) letter.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: cleanup null packet initialization logic

Initialize the destination buffer/size and the initial
offset when creating the local var.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: pre-initialize mux arrays

Instead of first zeroing all fields at the mux structs and
then filling, do some initialization for the const data
when they're created.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: remove some unused functions

Right now, there's no need to access the length of some
tables. So, drop the unused functions.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: do some cleanups at the driver

Do some cleanups at the coding style of the driver:
- remove "inline" declarations;
- use reverse xmas-tree for local var declarations;
- Adjust some indent to avoid breaking 80-cols;
- Cleanup some comments.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

drm/nouveau: fix relocations applying logic and a double-free

Commit 03e0d26fcf79 ("drm/nouveau: slowpath for pushbuf ioctl") included
a logic-bug which results in the relocations not actually getting
applied at all as the call to nouveau_gem_pushbuf_reloc_apply() is
never reached. This causes a regression with graphical corruption,
triggered when relocations need to be done (for example after a
suspend/resume cycle.)

Fix by setting *apply_relocs value only if there were more than 0
relocations.

Additionally, the never reached code had a leftover u_free() call,
which, after fixing the logic, now got called and resulted in a
double-free. Fix by removing one u_free(), moving the other
and adding check for errors.

Cc: Daniel Vetter <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Matti Hamalainen <[email protected]>
Fixes: 03e0d26fcf79 ("drm/nouveau: slowpath for pushbuf ioctl")
References: https://gitlab.freedesktop.org/drm/nouveau/-/issues/11
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

media: vidtv: avoid copying data for PES structs

Minimize the number of data copies and initialization at
the code, passing them as pointers instead of duplicating
the data.

The only case where we're keeping the data copy is at
vidtv_pes_write_h(), as it needs a copy of the passed
arguments. On such case, we're being more explicit.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: avoid data copy when initializing the multiplexer

Initialize the fields of the arguments directly when
declaring it, and pass the args as a pointer, instead of
copying them.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: fix some notes at the tone generator

The sheet music used to generate the tones had a few
polyphonic notes. Due to that, its conversion to a
tones sequence had a few errors.

Also, due to a bug at the tone generator, it was missing
the pause at the initial compass.

Fix them.

While here, reduce the compass to 100bpm.

The music was converted from a Music XML file using
this small script:

<snip>
my $count = 0;
my $silent = 0;
my $note;
my $octave;
print "\t";
while (<>) {
$note = $1 if (m,\<step\>(.*)\</step\>,);
$octave = "_$1" if (m,\<octave\>(.*)\</octave\>,);
if (m,\<alter\>1\</alter\>,) {
$note .= "S";
$sharp = 1;
}
if (m,\<rest/\>,) {
$note = "SILENT";
$silent = 1;
}
if (m,\<duration\>(.*)\</duration\>,) {
printf "{ NOTE_${note}${octave}, %d},", $1 * 128 / 480;
$count++;
if ($silent || $count >= 3) {
print "\n\t";
$count = 0;
$silent = 0;
} else {
print " ";
print " " if (!$sharp);
}
$sharp = 0;
$note = "";
$octave = "";
};
};
print "\n";
</snip>

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: fix the tone generator logic

The tone generator logic were repeating the song after the
first silent. There's also a wrong logic at the note
offset calculus, which may create some noise.

Fix it.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: fix the name of the program

While the original plan was to use the first movement of
the 5th Symphony, it was opted to use the Für Elise song,
instead.

Fix it.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: don't use recursive functions

The Linux stack is too short. So, using recursive functions
is a very bad idea. Convert those into non-recursive ones.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: add error checks

Currently, there are not checks if something gets bad during
memory allocation: it will simply use NULL pointers and
crash.

Add error path at the logic which allocates memory for the
MPEG-TS generator code, propagating the errors up to the
vidtv_bridge. Now, if something wents bad, start_streaming
will return an error that userspace can detect:

ERROR DMX_SET_PES_FILTER failed (PID = 0x2000): 12 Cannot allocate memory

and the driver doesn't crash.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: reorganize includes

- Place the includes on alphabetical order;
- get rid of asm/byteorder.h;
- add bug.h at vidtv_s302m.c, as it is needed by
inux/fixp-arith.h

Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: psi: fix missing assignments in while loops

Some variables were only assigned once but were used in while
loops as if they had been updated at every iteration. Fix this.

Signed-off-by: Daniel W. S. Almeida <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: Move s302m specific fields into encoder context

A few fields used only by the tone generator in the s302m encoder
are stored in struct vidtv_encoder. Move them into
struct vidtv_s302m_ctx instead. While we are at it: fix a
checkpatch warning for long lines.

Signed-off-by: Daniel W. S. Almeida <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: psi: extract descriptor chaining code into a helper

The code to append a descriptor to the end of a chain is repeated
throughout the psi generator code. Extract it into its own helper
function to avoid cluttering.

Signed-off-by: Daniel W. S. Almeida <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: psi: Implement an Event Information Table (EIT)

Implement an Event Information Table (EIT) as per EN 300 468
5.2.4.

The EIT provides information in chronological order regarding
the events contained within each service.

For now only present event information is supported.

[[email protected]: removed an extra blank line]
Signed-off-by: Daniel W. S. Almeida <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: psi: add a Network Information Table (NIT)

Add a Network Information Table (NIT) as specified in ETSI EN 300 468.

This table conveys information relating to the physical organization of
the multiplexes carried via a given network and the characteristics of
the network itself.

It is conveyed in the output of vidtv as packets with TS PID of 0x0010

[[email protected]: removed an extra blank line]
Signed-off-by: Daniel W. S. Almeida <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

media: vidtv: extract the initial CRC value to into a #define

The same constant (0xffffffff) is used in three different functions.

Extract it into a #define to avoid repetition.

Signed-off-by: Daniel W. S. Almeida <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

ch_ktls: lock is not freed

Currently lock gets freed only if timeout expires, but missed a
case when HW returns failure and goes for cleanup.

Fixes: efca3878a5fb ("ch_ktls: Issue if connection offload fails")
Signed-off-by: Rohit Maheshwari <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/tls: Protect from calling tls_dev_del for TLS RX twice

tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
calling tls_dev_del if TLX TX offload is also enabled. Clearing
tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
time frame when tls_device_down may get called and call tls_dev_del for
RX one extra time, confusing the driver, which may lead to a crash.

This patch corrects this racy behavior by adding a flag to prevent
tls_device_down from calling tls_dev_del the second time.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'devlink-port-attribute-fixes'

Parav Pandit says:

====================
devlink port attribute fixes

This patchset contains 2 small fixes for devlink port attributes.

Patch summary:
Patch-1 synchronize the devlink port attribute reader
with net namespace change operation
Patch-2 Ensure to return devlink port's netdevice attributes
when netdev and devlink instance belong to same net namespace
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

devlink: Make sure devlink instance and port are in same net namespace

When devlink reload operation is not used, netdev of an Ethernet port may
be present in different net namespace than the net namespace of the
devlink instance.

Ensure that both the devlink instance and devlink port netdev are located
in same net namespace.

Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload")
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

devlink: Hold rtnl lock while reading netdev attributes

A netdevice of a devlink port can be moved to different net namespace
than its parent devlink instance.
This scenario occurs when devlink reload is not used.

When netdevice is undergoing migration to net namespace, its ifindex
and name may change.

In such use case, devlink port query may read stale netdev attributes.

Fix it by reading them under rtnl lock.

Fixes: bfcd3a466172 ("Introduce devlink infrastructure")
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ptp: clockmatrix: bug fix for idtcm_strverscmp

Feed kstrtou8 with NULL terminated string.

Changes since v1:
-Use sscanf to get rid of adhoc string parse.
Changes since v2:
-Check if sscanf returns 3.

Fixes: 7ea5fda2b132 ("ptp: ptp_clockmatrix: update to support 4.8.7 firmware")
Signed-off-by: Min Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

IB/hfi1: Ensure correct mm is used at all times

Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.

To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.

Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.

Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.

Cc: <[email protected]>
Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Jann Horn <[email protected]>
Reported-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

enetc: Let the hardware auto-advance the taprio base-time of 0

The tc-taprio base time indicates the beginning of the tc-taprio
schedule, which is cyclic by definition (where the length of the cycle
in nanoseconds is called the cycle time). The base time is a 64-bit PTP
time in the TAI domain.

Logically, the base-time should be a future time. But that imposes some
restrictions to user space, which has to retrieve the current PTP time
from the NIC first, then calculate a base time that will still be larger
than the base time by the time the kernel driver programs this value
into the hardware. Actually ensuring that the programmed base time is in
the future is still a problem even if the kernel alone deals with this.

Luckily, the enetc hardware already advances a base-time that is in the
past into a congruent time in the immediate future, according to the
same formula that can be found in the software implementation of taprio
(in taprio_get_start_time):

/* Schedule the start time for the beginning of the next
* cycle.
*/
n = div64_s64(ktime_sub_ns(now, base), cycle);
*start = ktime_add_ns(base, (n + 1) * cycle);

There's only one problem: the driver doesn't let the hardware do that.
It interferes with the base-time passed from user space, by special-casing
the situation when the base-time is zero, and replaces that with the
current PTP time. This changes the intended effective base-time of the
schedule, which will in the end have a different phase offset than if
the base-time of 0.000000000 was to be advanced by an integer multiple
of the cycle-time.

Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gro_cells: reduce number of synchronize_net() calls

After cited commit, gro_cells_destroy() became damn slow
on hosts with a lot of cores.

This is because we have one additional synchronize_net() per cpu as
stated in the changelog.

gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough
to not have one synchronize_net() call per netif_napi_del()

We can factorize all the synchronize_net() to a single one,
right before freeing per-cpu memory.

Fixes: 5198d545dba8 ("net: remove napi_hash_del() from driver-facing API")
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: stmmac: fix incorrect merge of patch upstream

Commit 757926247836 ("net: stmmac: add flexible PPS to dwmac
4.10a") was intended to modify the struct dwmac410_ops, but it got
somehow badly merged and modified the struct dwmac4_ops.

Revert the modification in struct dwmac4_ops and re-apply it
properly in struct dwmac410_ops.

Fixes: 757926247836 ("net: stmmac: add flexible PPS to dwmac 4.10a")
Signed-off-by: Antonio Borneo <[email protected]>
Reported-by: Ahmad Fatoum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init

kmemleak report a memory leak as follows:

unreferenced object 0xffff8880059c6a00 (size 64):
  comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s)
  hex dump (first 32 bytes):
    20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00   ...............
    1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00  ................
  backtrace:
    [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0
    [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170
    [<000000006a9ca9d4>] ops_init+0xa8/0x3c0
    [<000000002da57bf2>] setup_net+0x2de/0x7e0
    [<000000004e52d573>] copy_net_ns+0x27d/0x530
    [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30
    [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0
    [<0000000030653721>] ksys_unshare+0x3a4/0x780
    [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40
    [<0000000031a10c08>] do_syscall_64+0x33/0x40
    [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

We should free all rules when we catch an error in ip6addrlbl_net_init().
otherwise a memory leak will occur.

Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb

When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
line, IBPB is force-enabled and STIPB is conditionally-enabled (or not
available).

However, since

21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")

the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
Because the issuing of IBPB relies on the switch_mm_*_ibpb static
branches, the mitigations behave as expected.

Since

1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")

this discrepency caused the misreporting of IB speculation via prctl().

On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.

Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.

[ bp: Massage commit message. ]

Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Fixes: 1978b3a53a74 ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
Signed-off-by: Anand K Mistry <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid

Merge tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

- a rand Kconfig fixup for mtk-vcodec

- a fix at h264 handling at cedrus codec driver

- some warning fixes when config PM is not enabled at marvell-ccic

- two fixes at venus codec driver: one related to codec profile and the
   other one related to a bad error path which causes an OOPS on module
   re-bind

* tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: venus: pm_helpers: Fix kernel module reload
  media: venus: venc: Fix setting of profile and level
  media: cedrus: h264: Fix check for presence of scaling matrix
  media: media/platform/marvell-ccic: fix warnings when CONFIG_PM is not enabled
  media: mtk-vcodec: fix build breakage when one of VPU or SCP is enabled
  media: mtk-vcodec: move firmware implementations into their own files

RISC-V: fix barrier() use in <vdso/processor.h>

riscv's <vdso/processor.h> uses barrier() so it should include
<asm/barrier.h>

Fixes this build error:
  CC [M]  drivers/net/ethernet/emulex/benet/be_main.o
In file included from ./include/vdso/processor.h:10,
                 from ./arch/riscv/include/asm/processor.h:11,
                 from ./include/linux/prefetch.h:15,
                 from drivers/net/ethernet/emulex/benet/be_main.c:14:
./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
   14 |  barrier();

This happens with a total of 5 networking drivers -- they all use
<linux/prefetch.h>.

rv64 allmodconfig now builds cleanly after this patch.

Fixes fallout from:
815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")

Fixes: ad5d1122b82f ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Reported-by: Andreas Schwab <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Arvind Sankar <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Add missing jump label initialization

The jump_label_init() should be called from setup_arch() very
early for proper functioning of jump label support.

Fixes: ebc00dde8a97 ("riscv: Add jump-label implementation")
Signed-off-by: Anup Patel <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Explicitly specify the build id style in vDSO Makefile again

Commit a96843372331 ("kbuild: explicitly specify the build id style")
explicitly set the build ID style to SHA1. Commit c2c81bb2f691 ("RISC-V:
Fix the VDSO symbol generaton for binutils-2.35+") undid this change,
likely unintentionally.

Restore it so that the build ID style stays consistent across the tree
regardless of linker.

Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+")
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Bill Wendling <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

efi: EFI_EARLYCON should depend on EFI

CONFIG_EFI_EARLYCON defaults to yes, and thus is enabled on systems that
do not support EFI, or do not have EFI support enabled, but do satisfy
the symbol's other dependencies.

While drivers/firmware/efi/ won't be entered during the build phase if
CONFIG_EFI=n, and drivers/firmware/efi/earlycon.c itself thus won't be
built, enabling EFI_EARLYCON does force-enable CONFIG_FONT_SUPPORT and
CONFIG_ARCH_USE_MEMREMAP_PROT, and CONFIG_FONT_8x16, which is
undesirable.

Fix this by making CONFIG_EFI_EARLYCON depend on CONFIG_EFI.

This reduces kernel size on headless systems by more than 4 KiB.

Fixes: 69c1f396f25b805a ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>

efivarfs: revert "fix memory leak in efivarfs_create()"

The memory leak addressed by commit fe5186cf12e3 is a false positive:
all allocations are recorded in a linked list, and freed when the
filesystem is unmounted. This leads to double frees, and as reported
by David, leads to crashes if SLUB is configured to self destruct when
double frees occur.

So drop the redundant kfree() again, and instead, mark the offending
pointer variable so the allocation is ignored by kmemleak.

Cc: Vamshi K Sthambamkadi <[email protected]>
Fixes: fe5186cf12e3 ("efivarfs: fix memory leak in efivarfs_create()")
Reported-by: David Laight <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>

efi/efivars: Set generic ops before loading SSDT

Efivars allows for overriding of SSDT tables, however starting with
commit

bf67fad19e493b ("efi: Use more granular check for availability for variable services")

this use case is broken. When loading SSDT generic ops should be set
first, however mentioned commit reversed order of operations. Fix this
by restoring original order of operations.

Fixes: bf67fad19e493b ("efi: Use more granular check for availability for variable services")
Signed-off-by: Amadeusz Sławiński <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Cezary Rojewski <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>

RDMA/i40iw: Address an mmap handler exploit in i40iw

i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap
vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range
without any validation. This is vulnerable to an mmap exploit as described
in: https://lore.kernel.org/r/20201119093523 [email protected]

The push feature is disabled in the driver currently and therefore no push
mmaps are issued from user-space. The feature does not work as expected in
the x722 product.

Remove the push module parameter and all VMA attribute manipulations for
this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user
mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound
to a single page.

Cc: <[email protected]>
Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Di Zhu <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

arm64: tegra: Fix Tegra234 VDK node names

When the device-tree board file was added for the Tegra234 VDK simulator
it incorrectly used the names 'cbb' and 'sdhci' instead of 'bus' and
'mmc', respectively. The names 'bus' and 'mmc' are required by the
device-tree json-schema validation tools. Therefore, fix this by
renaming these nodes accordingly.

Fixes: 639448912ba1 ("arm64: tegra: Initial Tegra234 VDK support")
Reported-by: Ashish Singhal <[email protected]>
Signed-off-by: Jon Hunter <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>

arm64: tegra: Wrong AON HSP reg property size

The AON HSP node's "reg" property size 0xa0000 will overlap with other
resources. This patch fixes that wrong value with correct size 0x90000.

Reviewed-by: Mikko Perttunen <[email protected]>
Signed-off-by: Dipen Patel <[email protected]>
Fixes: a38570c22e9d ("arm64: tegra: Add nodes for TCU on Tegra194")
Signed-off-by: Thierry Reding <[email protected]>

arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1

USB host mode is broken on the OTG port of Jetson TX1 platform because
the USB_VBUS_EN0 regulator (regulator@11) is being overwritten by the
vdd-cam-1v2 regulator. This commit rearranges USB_VBUS_EN0 to be
regulator@14.

Fixes: 257c8047be44 ("arm64: tegra: jetson-tx1: Add camera supplies")
Cc: [email protected]
Signed-off-by: JC Kuo <[email protected]>
Reviewed-by: Jon Hunter <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>

arm64: tegra: Correct the UART for Jetson Xavier NX

The Jetson Xavier NX board routes UARTA to the 40-pin header and UARTC
to a 12-pin debug header. The UARTs can be used by either the Tegra
Combined UART (TCU) driver or the Tegra 8250 driver. By default, the
TCU will use UARTC on Jetson Xavier NX. Currently, device-tree for
Xavier NX enables the TCU and the Tegra 8250 node for UARTC. Fix this
by disabling the Tegra 8250 node for UARTC and enabling the Tegra 8250
node for UARTA.

Fixes: 3f9efbbe57bc ("arm64: tegra: Add support for Jetson Xavier NX")
Cc: [email protected]
Signed-off-by: Jon Hunter <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>

arm64: tegra: Disable the ACONNECT for Jetson TX2

Commit ff4c371d2bc0 ("arm64: defconfig: Build ADMA and ACONNECT driver")
enable the Tegra ADMA and ACONNECT drivers and this is causing resume
from system suspend to fail on Jetson TX2. Resume is failing because the
ACONNECT driver is being resumed before the BPMP driver, and the ACONNECT
driver is attempting to power on a power-domain that is provided by the
BPMP. While a proper fix for the resume sequencing problem is identified,
disable the ACONNECT for Jetson TX2 temporarily to avoid breaking system
suspend.

Please note that ACONNECT driver is used by the Audio Processing Engine
(APE) on Tegra, but because there is no mainline support for APE on
Jetson TX2 currently, disabling the ACONNECT does not disable any useful
feature at the moment.

Signed-off-by: Jon Hunter <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>

spi: dw: Fix spi registration for controllers overriding CS

When SPI DW memory ops support was introduced, there was a check for
excluding controllers which supplied their own CS function. Even so,
the mem_ops pointer is *always* presented to the SPI core.

This causes the SPI core sanity check in spi_controller_check_ops() to
refuse registration, since a mem_ops pointer is being supplied without
an exec_op member function.

The end result is failure of the SPI DW driver on sparx5 and similar
platforms.

The fix in the core SPI DW driver is to avoid presenting the mem_ops
pointer if the exec_op function is not set.

Fixes: 6423207e57ea (spi: dw: Add memory operations support)
Signed-off-by: Lars Povlsen <[email protected]>
Acked-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

x86/tboot: Don't disable swiotlb when iommu is forced on

After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA
domain"), swiotlb could also be used for direct memory access if IOMMU
is enabled but a device is configured to pass through the DMA translation.
Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work
if "iommu=pt" kernel parameter is used.

Fixes: 327d5b2fee91 ("iommu/vt-d: Allow 32bit devices to uses DMA domain")
Reported-and-tested-by: Adrian Huang <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210237
Signed-off-by: Will Deacon <[email protected]>

trace: fix potenial dangerous pointer

The bdi_dev_name() returns a char [64], and
the __entry->name is a char [32].

It maybe dangerous to TP_printk("%s", __entry->name)
after the strncpy().

CC: [email protected]
Link: https://lore.kernel.org/r/20201124165205.GA23937@rlk
Acked-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Hui Su <[email protected]>
Signed-off-by: Jan Kara <[email protected]>

optee: add writeback to valid memory type

Only in smp systems the cache policy is setup as write alloc, in
single cpu systems the cache policy is set as writeback and it is
normal memory, so, it should pass the is_normal_memory check in the
share memory registration.

Add the right condition to make it work in no smp systems.

Fixes: cdbcf83d29c1 ("tee: optee: check type of registered shared memory")
Signed-off-by: Rui Miguel Silva <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>

drm/ast: Reload gamma LUT after changing primary plane's color format

The gamma LUT has to be reloaded after changing the primary plane's
color format. This used to be done implicitly by the CRTC atomic_enable()
helper after updating the primary plane. With the recent reordering of
the steps, the primary plane's setup was moved last and invalidated
the gamma LUT. Fix this by setting the LUT from within atomic_flush().

Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Fixes: 2f0ddd89fe32 ("drm/ast: Enable CRTC before planes")
Cc: Thomas Zimmermann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry-picked from 8e3784dfef8a03143b13e7e4011f276a954f1bc6)

drm/amdgpu: Fix size calculation when init onchip memory

Size is page count here.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1372
Reviewed-by: Christian König <[email protected]>
Signed-off-by: xinhui pan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit d836917da7e5ca9b33ef4d499972f1feeb519e00)
[airlied: from drm-next]
Signed-off-by: Dave Airlie <[email protected]>

Documentation: netdev-FAQ: suggest how to post co-dependent series

Make an explicit suggestion how to post user space side of kernel
patches to avoid reposts when patchwork groups the wrong patches.

v2: mention the cases unlike iproute2 explicitly

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'batadv-net-pullrequest-20201124' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

- set module owner to THIS_MODULE, by Taehee Yoo

* tag 'batadv-net-pullrequest-20201124' of git://git.open-mesh.org/linux-merge:
batman-adv: set .owner to THIS_MODULE
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'ibmvnic-null-pointer-dereference'

Lijun Pan says:

====================
ibmvnic: null pointer dereference

Fix two NULL pointer dereference crash issues.
Improve module removal procedure.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ibmvnic: enhance resetting status check during module exit

Based on the discussion with Sukadev Bhattiprolu and Dany Madden,
we believe that checking adapter->resetting bit is preferred
since RESETTING state flag is not as strict as resetting bit.
RESETTING state flag is removed since it is verbose now.

Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ibmvnic: fix NULL pointer dereference in ibmvic_reset_crq

crq->msgs could be NULL if the previous reset did not complete after
freeing crq->msgs. Check for NULL before dereferencing them.

Snippet of call trace:
...
ibmvnic 30000003 env3 (unregistering): Releasing sub-CRQ
ibmvnic 30000003 env3 (unregistering): Releasing CRQ
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0000000000c1a30
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ibmvnic(E-) rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag tun raw_diag inet_diag unix_diag bridge af_packet_diag netlink_diag stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibmvnic]
CPU: 20 PID: 8426 Comm: kworker/20:0 Tainted: G            E     5.10.0-rc1+ #12
Workqueue: events __ibmvnic_reset [ibmvnic]
NIP:  c0000000000c1a30 LR: c008000001b00c18 CTR: 0000000000000400
REGS: c00000000d05b7a0 TRAP: 0380   Tainted: G            E      (5.10.0-rc1+)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44002480  XER: 20040000
CFAR: c0000000000c19ec IRQMASK: 0
GPR00: 0000000000000400 c00000000d05ba30 c008000001b17c00 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000001e2
GPR08: 000000000001f400 ffffffffffffd950 0000000000000000 c008000001b0b280
GPR12: c0000000000c19c8 c00000001ec72e00 c00000000019a778 c00000002647b440
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000006 0000000000000001 0000000000000003 0000000000000002
GPR24: 0000000000001000 c008000001b0d570 0000000000000005 c00000007ab5d550
GPR28: c00000007ab5c000 c000000032fcf848 c00000007ab5cc00 c000000032fcf800
NIP [c0000000000c1a30] memset+0x68/0x104
LR [c008000001b00c18] ibmvnic_reset_crq+0x70/0x110 [ibmvnic]
Call Trace:
[c00000000d05ba30] [0000000000000800] 0x800 (unreliable)
[c00000000d05bab0] [c008000001b0a930] do_reset.isra.40+0x224/0x634 [ibmvnic]
[c00000000d05bb80] [c008000001b08574] __ibmvnic_reset+0x17c/0x3c0 [ibmvnic]
[c00000000d05bc50] [c00000000018d9ac] process_one_work+0x2cc/0x800
[c00000000d05bd20] [c00000000018df58] worker_thread+0x78/0x520
[c00000000d05bdb0] [c00000000019a934] kthread+0x1c4/0x1d0
[c00000000d05be20] [c00000000000d5d0] ret_from_kernel_thread+0x5c/0x6c

Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ibmvnic: fix NULL pointer dereference in reset_sub_crq_queues

adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset
did not complete after freeing sub crqs. Check for NULL before
dereferencing them.

Snippet of call trace:
ibmvnic 30000006 env6: Releasing sub-CRQ
ibmvnic 30000006 env6: Releasing CRQ
...
ibmvnic 30000006 env6: Got Control IP offload Response
ibmvnic 30000006 env6: Re-setting tx_scrq[0]
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc008000003dea7cc
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tun bridge stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmvnic ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
CPU: 80 PID: 1856 Comm: kworker/80:2 Tainted: G        W         5.8.0+ #4
Workqueue: events __ibmvnic_reset [ibmvnic]
NIP:  c008000003dea7cc LR: c008000003dea7bc CTR: 0000000000000000
REGS: c0000007ef7db860 TRAP: 0380   Tainted: G        W          (5.8.0+)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28002422  XER: 0000000d
CFAR: c000000000bd9520 IRQMASK: 0
GPR00: c008000003dea7bc c0000007ef7dbaf0 c008000003df7400 c0000007fa26ec00
GPR04: c0000007fcd0d008 c0000007fcd96350 0000000000000027 c0000007fcd0d010
GPR08: 0000000000000023 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000002000 c00000001ec18e00 c0000000001982f8 c0000007bad6e840
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 fffffffffffffef7
GPR24: 0000000000000402 c0000007fa26f3a8 0000000000000003 c00000016f8ec048
GPR28: 0000000000000000 0000000000000000 0000000000000000 c0000007fa26ec00
NIP [c008000003dea7cc] ibmvnic_reset_init+0x15c/0x258 [ibmvnic]
LR [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic]
Call Trace:
[c0000007ef7dbaf0] [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] (unreliable)
[c0000007ef7dbb80] [c008000003de8860] __ibmvnic_reset+0x408/0x970 [ibmvnic]
[c0000007ef7dbc50] [c00000000018b7cc] process_one_work+0x2cc/0x800
[c0000007ef7dbd20] [c00000000018bd78] worker_thread+0x78/0x520
[c0000007ef7dbdb0] [c0000000001984c4] kthread+0x1d4/0x1e0
[c0000007ef7dbe20] [c00000000000cea8] ret_from_kernel_thread+0x5c/0x74

Fixes: 57a49436f4e8 ("ibmvnic: Reset sub-crqs during driver reset")
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'fixes-for-ena-driver'

Shay Agroskin says:

====================
Fixes for ENA driver

- fix wrong data offset on machines that support rx offset
- work-around Intel iommu issue
- fix out of bound access when request id is wrong
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ena: fix packet's addresses for rx_offset feature

This patch fixes two lines in which the rx_offset received by the device
wasn't taken into account:

- prefetch function:
In our driver the copied data would reside in
rx_info->page + rx_headroom + rx_offset

so the prefetch function is changed accordingly.

- setting page_offset to zero for descriptors > 1:
for every descriptor but the first, the rx_offset is zero. Hence
the page_offset value should be set to rx_headroom.

The previous implementation changed the value of rx_info after
the descriptor was added to the SKB (essentially providing wrong
page offset).

Fixes: 68f236df93a9 ("net: ena: add support for the rx offset feature")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ena: set initial DMA width to avoid intel iommu issue

The ENA driver uses the readless mechanism, which uses DMA, to find
out what the DMA mask is supposed to be.

If DMA is used without setting the dma_mask first, it causes the
Intel IOMMU driver to think that ENA is a 32-bit device and therefore
disables IOMMU passthrough permanently.

This patch sets the dma_mask to be ENA_MAX_PHYS_ADDR_SIZE_BITS=48
before readless initialization in
ena_device_init()->ena_com_mmio_reg_read_request_init(),
which is large enough to workaround the intel_iommu issue.

DMA mask is set again to the correct value after it's received from the
device after readless is initialized.

The patch also changes the driver to use dma_set_mask_and_coherent()
function instead of the two pci_set_dma_mask() and
pci_set_consistent_dma_mask() ones. Both methods achieve the same
effect.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Mike Cui <[email protected]>
Signed-off-by: Arthur Kiyanovski <[email protected]>
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: ena: handle bad request id in ena_netdev

After request id is checked in validate_rx_req_id() its value is still
used in the line
rx_ring->free_ids[next_to_clean] =
rx_ring->ena_bufs[i].req_id;
even if it was found to be out-of-bound for the array free_ids.

The patch moves the request id to an earlier stage in the napi routine and
makes sure its value isn't used if it's found out-of-bounds.

Fixes: 30623e1ed116 ("net: ena: avoid memory access violation by validating req_id properly")
Signed-off-by: Ido Segev <[email protected]>
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'irqchip-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

- Fix Exiu driver trigger type when using ACPI

- Fix GICv3 ITS suspend/resume to use the in-kernel path
at all times, sidestepping braindead firmware support

Link: https://lore.kernel.org/r/[email protected]

Merge tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Four smb3 fixes for stable: one fixes a memleak, the other three
  address a problem found with decryption offload that can cause a use
  after free"

* tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: Handle error case during offload read path
  smb3: Avoid Mid pending list corruption
  smb3: Call cifs reconnect from demultiplex thread
  cifs: fix a memleak with modefromsid

mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)

Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.

The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.

https://lore.kernel.org/linux-mm/20200827122019 [email protected]/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).

Then on crashing a second time, realized there's a stronger reason against
that approach.  If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()?  And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon?  What
would that look like?

It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).

Matthew Wilcox explaining this to himself:
"page is allocated, added to page cache, dirtied, writeback starts,

  --- thread A ---
  filesystem calls end_page_writeback()
        test_clear_page_writeback()
  --- context switch to thread B ---
  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
  starts again.  Then we call write_cache_pages(), see
  PageWriteback() set, call wait_on_page_writeback()
  --- context switch back to thread A ---
  wake_up_page(page, PG_writeback);
  ... thread B is woken, but because the wakeup was for the old use of
  the page, PageWriteback is still set.

  Devious"

And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.

I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().

Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared?  I think not,
because each waiter does hold a reference on the page.  This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it).  The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).

Reported-by: [email protected]
Reported-by: Qian Cai <[email protected]>
Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: [email protected] # v5.8+
Signed-off-by: Linus Torvalds <[email protected]>

nfc: s3fwrn5: use signed integer for parsing GPIO numbers

GPIOs - as returned by of_get_named_gpio() and used by the gpiolib - are
signed integers, where negative number indicates error. The return
value of of_get_named_gpio() should not be assigned to an unsigned int
because in case of !CONFIG_GPIOLIB such number would be a valid GPIO.

Fixes: c04c674fadeb ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip")
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dpaa2-eth: Fix compile error due to missing devlink support

The dpaa2 driver depends on devlink, so it should select
NET_DEVLINK in order to fix compile errors, such as:

drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err':
dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report'
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get':
dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put'

Fixes: ceeb03ad8e22 ("dpaa2-eth: add basic devlink support")
Signed-off-by: Ezequiel Garcia <[email protected]>
Signed-off-by: Ioana Ciornei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: Update page pool entry

Add some file F: matches that is related to page_pool.

Acked-by: Ilias Apalodimas <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Link: https://lore.kernel.org/r/160613894639.2826716.14635284017814375894.stgit@firesoul
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN

When a BPF program is used to select between a type of TCP congestion
control algorithm that uses either ECN or not there is a case where the
synack for the frame was coming up without the ECT0 bit set. A bit of
research found that this was due to the final socket being configured to
dctcp while the listener socket was staying in cubic.

To reproduce it all that is needed is to monitor TCP traffic while running
the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
assuming tcp_dctcp module is loaded or compiled in and the traffic matches
the rules in the sample file, is that for all frames with the exception of
the synack the ECT0 bit is set.

To address that it is necessary to make one additional call to
tcp_bpf_ca_needs_ecn using the request socket and then use the output of
that to set the ECT0 bit for the tos/tclass of the packet.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <[email protected]>
Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <[email protected]>

devlink: Fix reload stats structure

Fix reload stats structure exposed to the user. Change stats structure
hierarchy to have the reload action as a parent of the stat entry and
then stat entry includes value per limit. This will also help to avoid
string concatenation on iproute2 output.

Reload stats structure before this fix:
"stats": {
    "reload": {
        "driver_reinit": 2,
        "fw_activate": 1,
        "fw_activate_no_reset": 0
     }
}

After this fix:
"stats": {
    "reload": {
        "driver_reinit": {
            "unspecified": 2
        },
        "fw_activate": {
            "unspecified": 1,
            "no_reset": 0
        }
}

Fixes: a254c264267e ("devlink: Add reload stats")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Heiko Carstens:
"Disable interrupts when restoring fpu and vector registers, otherwise
KVM guests might see corrupted register contents"

* tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix fpu restore in entry.S

Merge tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
"A couple more stack unwinder related fixes:

   - More stack unwinding updates

   - Misc minor fixes"

* tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: stack unwinding: reorganize how initial register state setup
  ARC: stack unwinding: don't assume non-current task is sleeping
  ARC: mm: fix spelling mistakes
  ARC: bitops: Remove unecessary operation and value

aquantia: Remove the build_skb path

When performing IPv6 forwarding, there is an expectation that SKBs
will have some headroom. When forwarding a packet from the aquantia
driver, this does not always happen, triggering a kernel warning.

aq_ring.c has this code (edited slightly for brevity):

if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
} else {
skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);

There is a significant difference between the SKB produced by these
2 code paths. When napi_alloc_skb creates an SKB, there is a certain
amount of headroom reserved. However, this is not done in the
build_skb codepath.

As the hardware buffer that build_skb is built around does not
handle the presence of the SKB header, this code path is being
removed and the napi_alloc_skb path will always be used. This code
path does have to copy the packet header into the SKB, but it adds
the packet data as a frag.

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Lincoln Ramsay <[email protected]>
Link: https://lore.kernel.org/r/MWHPR1001MB23184F3EAFA413E0D1910EC9E8FC0@MWHPR1001MB2318.namprd10.prod.outlook.com
Signed-off-by: Jakub Kicinski <[email protected]>

drm/amdgpu: update golden setting for sienna_cichlid

Update golden setting for sienna_cichlid.

Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.9.x

drm/amd/display: Avoid HDCP initialization in devices without output

The HDCP feature requires at least one connector attached to the device;
however, some GPUs do not have a physical output, making the HDCP
initialization irrelevant. This patch disables HDCP initialization when
the graphic card does not have output.

Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/i915/gt: Free stale request on destroying the virtual engine

Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.

v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 46eecfccb4c2b0f258adbafb2e53ca3b822cd663)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gt: Don't cancel the interrupt shadow too early

We currently want to keep the interrupt enabled until the interrupt after
which we have no more work to do. This heuristic was broken by us
kicking the irq-work on adding a completed request without attaching a
signaler -- hence it appearing to the irq-worker that an interrupt had
fired when we were idle.

Fixes: 2854d866327a ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 3aef910d26ef48b8a79d48b006dc04383b86dd31)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock

Make b->signaled_requests a lockless-list so that we can manipulate it
outside of the b->irq_lock.

Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 6cfe66eb71b638968350b5f0fff051fd25eb75fb)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/amdgpu: fix a page fault

The UVD firmware is copied to cpu addr in uvd_resume, so it
should be used after that. This is to fix a bug introduced by
patch drm/amdgpu: fix SI UVD firmware validate resume fail.

Signed-off-by: Sonny Jiang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
CC: [email protected]

drm/amdgpu: fix SI UVD firmware validate resume fail

The SI UVD firmware validate key is stored at the end of firmware,
which is changed during resume while playing video. So get the key
at sw_init and store it for fw validate using.

Signed-off-by: Sonny Jiang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/amdgpu: fix null pointer in runtime pm

fix the null pointer issue when runtime pm is triggered.

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission

Move the register slow register write and readback from out of the
critical path for execlists submission and delay it until the following
worker, shaving off around 200us. Note that the same signal_irq_work() is
allowed to run concurrently on each CPU (but it will only be queued once,
once running though it can be requeued and reexecuted) so we have to
remember to lock the global interactions as we cannot rely on the
signal_irq_work() itself providing the serialisation (in constrast to a
tasklet).

By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we loose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.

v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271
Fixes: e23005604b2f ("drm/i915/gt: Hold context/request reference while breadcrumbs are active")
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 9d5612ca165a58aacc160465532e7998b9aab270)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/gvt: correct a false comment of flag F_UNALIGN

Correct falsely removed comment of flag F_UNALIGN.

Fixes: a6c5817a38cf ("drm/i915/gvt: remove flag F_CMD_ACCESSED")
Reviewed-by: Zhenyu Wang <[email protected]>
Signed-off-by: Yan Zhao <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 6594094f819e0020e926e137e47e2edb97ba500b)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/perf: workaround register corruption in OATAILPTR

After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.

When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.

The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.

This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.

Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 059a0beb486344a577ff476acce75e69eab704be)
Signed-off-by: Rodrigo Vivi <[email protected]>

intel_idle: Fix intel_idle() vs tracing

cpuidle->enter() callbacks should not call into tracing because RCU
has already been disabled. Instead of doing the broadcast thing
itself, simply advertise to the cpuidle core that those states stop
the timer.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/idle: Fix arch_cpu_idle() vs tracing

We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

io_uring: fix ITER_BVEC check

iov_iter::type is a bitmask that also keeps direction etc., so it
shouldn't be directly compared against ITER_*. Use proper helper.

Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls")
Reported-by: David Howells <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Cc: <[email protected]> # 5.9
Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix shift-out-of-bounds when round up cq size

Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():

[ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
[ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 59.603673] Call Trace:
[ 59.604286] dump_stack+0x107/0x163
[ 59.605237] ubsan_epilogue+0xb/0x5a
[ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
[ 59.607335] ? lock_downgrade+0x6c0/0x6c0
[ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.609166] io_uring_create.cold+0x99/0x149
[ 59.610114] io_uring_setup+0xd6/0x140
[ 59.610975] ? io_uring_create+0x2510/0x2510
[ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
[ 59.614038] ? trace_hardirqs_on+0x5b/0x180
[ 59.615056] do_syscall_64+0x2d/0x40
[ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.617007] RIP: 0033:0x7f2bb8a0b239

This is caused by roundup_pow_of_two() if the input entries larger
enough, e.g. 2^32-1. For sq_entries, it will check first and we allow
at most IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we do
round up first, that may overflow and truncate it to 0, which is not
the expected behavior. So check the cq size first and then do round up.

Fixes: 88ec3211e463 ("io_uring: round-up cq size before comparing with rounded sq size")
Reported-by: Abaci Fuzz <[email protected]>
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

spi: imx: fix the unbalanced spi runtime pm management

If set active without increase the usage count of pm, the dont use
autosuspend function will call the suspend callback to close the two
clocks of spi because the usage count is reduced to -1.
This will cause the warning dump below when the defer-probe occurs.

[ 129.379701] ecspi2_root_clk already disabled
[ 129.384005] WARNING: CPU: 1 PID: 33 at drivers/clk/clk.c:952 clk_core_disable+0xa4/0xb0

So add the get noresume function before set active.

Fixes: 43b6bf406cd0 spi: imx: fix runtime pm support for !CONFIG_PM
Signed-off-by: Clark Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

firmware: xilinx: Use hash-table for api feature check

Currently array of fix length PM_API_MAX is used to cache
the pm_api version (valid or invalid). However ATF based
PM APIs values are much higher then PM_API_MAX.
So to include ATF based PM APIs also, use hash-table to
store the pm_api version status.

Signed-off-by: Amit Sunil Dhamne <[email protected]>
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Ravi Patel <[email protected]>
Signed-off-by: Rajan Vaja <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Tested-by: Michal Simek <[email protected]>
Fixes: f3217d6f2f7a ("firmware: xilinx: fix out-of-bounds access")
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michal Simek <[email protected]>

firmware: xilinx: Fix SD DLL node reset issue

Fix the SD DLL node reset issue where incorrect node is being referenced
instead of SD DLL node.

Fixes: 426c8d85df7a ("firmware: xilinx: Use APIs instead of IOCTLs")
Signed-off-by: Manish Narani <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michal Simek <[email protected]>

x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak

On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.

In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().

Before calling kfree() to free the rdtgroup structures, the kernfs node
of the control group itself as well as the kernfs nodes of all child
monitoring groups still take the extra references which will never be
dropped to 0 and the kernfs nodes will never be freed. It leads to
reference count leak and kernfs_node_cache memory leak.

For example, reference count leak is observed in these two cases:
  (1) mount -t resctrl resctrl /sys/fs/resctrl
      mkdir /sys/fs/resctrl/c1
      mkdir /sys/fs/resctrl/c1/mon_groups/m1
      umount /sys/fs/resctrl

  (2) mkdir /sys/fs/resctrl/c1
      mkdir /sys/fs/resctrl/c1/mon_groups/m1
      rmdir /sys/fs/resctrl/c1

The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().

Fix this issue by following changes to make sure the extra kernfs_node
reference on rdtgroup is dropped before freeing the rdtgroup structure.
  (1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up
  kernfs_put() and kfree().

  (2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
  structure is about to be freed by kfree().

  (3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
  exit paths of mkdir where an extra reference is taken by kernfs_get().

Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak

Willem reported growing of kernfs_node_cache entries in slabtop when
repeatedly creating and removing resctrl subdirectories as well as when
repeatedly mounting and unmounting the resctrl filesystem.

On resource group (control as well as monitoring) creation via a mkdir
an extra kernfs_node reference is obtained to ensure that the rdtgroup
structure remains accessible for the rdtgroup_kn_unlock() calls where it
is removed on deletion. The kernfs_node reference count is dropped by
kernfs_put() in rdtgroup_kn_unlock().

With the above explaining the need for one kernfs_get()/kernfs_put()
pair in resctrl there are more places where a kernfs_node reference is
obtained without a corresponding release. The excessive amount of
reference count on kernfs nodes will never be dropped to 0 and the
kernfs nodes will never be freed in the call paths of rmdir and umount.
It leads to reference count leak and kernfs_node_cache memory leak.

Remove the superfluous kernfs_get() calls and expand the existing
comments surrounding the remaining kernfs_get()/kernfs_put() pair that
remains in use.

Superfluous kernfs_get() calls are removed from two areas:

  (1) In call paths of mount and mkdir, when kernfs nodes for "info",
  "mon_groups" and "mon_data" directories and sub-directories are
  created, the reference count of newly created kernfs node is set to 1.
  But after kernfs_create_dir() returns, superfluous kernfs_get() are
  called to take an additional reference.

  (2) kernfs_get() calls in rmdir call paths.

Fixes: 17eafd076291 ("x86/intel_rdt: Split resource group removal in two")
Fixes: 4af4a88e0c92 ("x86/intel_rdt/cqm: Add mount,umount support")
Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data")
Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Fixes: 5dc1d5c6bac2 ("x86/intel_rdt: Simplify info and base file lists")
Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system")
Reported-by: Willem de Bruijn <[email protected]>
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Willem de Bruijn <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

net/packet: fix packet receive on L3 devices without visible hard header

In the patchset merged by commit b9fcf0a0d826
("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
did not have header_ops were given one for the purpose of protocol parsing
on af_packet transmit path.

That change made af_packet receive path regard these devices as having a
visible L3 header and therefore aligned incoming skb->data to point to the
skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
reset their mac_header prior to ingress and therefore their incoming
packets became malformed.

Ideally these devices would reset their mac headers, or af_packet would be
able to rely on dev->hard_header_len being 0 for such cases, but it seems
this is not the case.

Fix by changing af_packet RX ll visibility criteria to include the
existence of a '.create()' header operation, which is used when creating
a device hard header - via dev_hard_header() - by upper layers, and does
not exist in these L3 devices.

As this predicate may be useful in other situations, add it as a common
dev_has_header() helper in netdevice.h.

Fixes: b9fcf0a0d826 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'")
Signed-off-by: Eyal Birger <[email protected]>
Acked-by: Jason A. Donenfeld <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)

The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.

During LTP testing, the following error was generated:

Unable to handle kernel paging request at virtual address ffff000012e9b790
Mem abort info:
  ESR = 0x96000007
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
[ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
pmd=00000027b6d61003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in: xt_conntrack
Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
CPU: 14 PID: 20171 Comm: read_all Tainted: G    B   W
Hardware name: NXP Layerscape LX2160ARDB (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO)
pc : irq_affinity_hint_proc_show+0x54/0xb0
lr : irq_affinity_hint_proc_show+0x4c/0xb0
sp : ffff00001138bc10
x29: ffff00001138bc10 x28: 0000ffffd131d1e0
x27: 00000000007000c0 x26: ffff8025b9480dc0
x25: ffff8025b9480da8 x24: 00000000000003ff
x23: ffff8027334f8300 x22: ffff80272e97d000
x21: ffff80272e97d0b0 x20: ffff8025b9480d80
x19: ffff000009a49000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000040
x11: 0000000000000000 x10: ffff802735b79b88
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffff000009a49848 x6 : 0000000000000003
x5 : 0000000000000000 x4 : ffff000008157d6c
x3 : ffff00001138bc10 x2 : ffff000012e9b790
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
irq_affinity_hint_proc_show+0x54/0xb0
seq_read+0x1b0/0x440
proc_reg_read+0x80/0xd8
__vfs_read+0x60/0x178
vfs_read+0x94/0x150
ksys_read+0x74/0xf0
__arm64_sys_read+0x24/0x30
el0_svc_common.constprop.0+0xd8/0x1a0
el0_svc_handler+0x34/0x88
el0_svc+0x10/0x14
Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
---[ end trace b495bdcb0b3b732b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
Kernel Offset: disabled
CPU features: 0x0,21006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---

Fix it by using 'cpumask_of(cpu)' to get the cpumask.

Signed-off-by: Hao Si <[email protected]>
Signed-off-by: Lin Chen <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
Signed-off-by: Li Yang <[email protected]>

i40e: Fix removing driver while bare-metal VFs pass traffic

Prevent VFs from resetting when PF driver is being unloaded:
- introduce new pf state: __I40E_VF_RESETS_DISABLED;
- check if pf state has __I40E_VF_RESETS_DISABLED state set,
  if so, disable any further VFLR event notifications;
- when i40e_remove (rmmod i40e) is called, disable any resets on
  the VFs;

Previously if there were bare-metal VFs passing traffic and PF
driver was removed, there was a possibility of VFs triggering a Tx
timeout right before iavf_remove. This was causing iavf_close to
not be called because there is a check in the beginning of  iavf_remove
that bails out early if adapter->state < IAVF_DOWN_PENDING. This
makes it so some resources do not get cleaned up.

Fixes: 6a9ddb36eeb8 ("i40e: disable IOV before freeing resources")
Signed-off-by: Slawomir Laba <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vsock/virtio: discard packets only when socket is really closed

Starting from commit 8692cefc433f ("virtio_vsock: Fix race condition
in virtio_transport_recv_pkt"), we discard packets in
virtio_transport_recv_pkt() if the socket has been released.

When the socket is connected, we schedule a delayed work to wait the
RST packet from the other peer, also if SHUTDOWN_MASK is set in
sk->sk_shutdown.
This is done to complete the virtio-vsock shutdown algorithm, releasing
the port assigned to the socket definitively only when the other peer
has consumed all the packets.

If we discard the RST packet received, the socket will be closed only
when the VSOCK_CLOSE_TIMEOUT is reached.

Sergio discovered the issue while running ab(1) HTTP benchmark using
libkrun [1] and observing a latency increase with that commit.

To avoid this issue, we discard packet only if the socket is really
closed (SOCK_DONE flag is set).
We also set SOCK_DONE in virtio_transport_release() when we don't need
to wait any packets from the other peer (we didn't schedule the delayed
work). In this case we remove the socket from the vsock lists, releasing
the port assigned.

[1] https://github.com/containers/libkrun

Fixes: 8692cefc433f ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt")
Cc: [email protected]
Reported-by: Sergio Lopez <[email protected]>
Tested-by: Sergio Lopez <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Acked-by: Jia He <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: fix race condition when creating child sockets from syncookies

When the TCP stack is in SYN flood mode, the server child socket is
created from the SYN cookie received in a TCP packet with the ACK flag
set.

The child socket is created when the server receives the first TCP
packet with a valid SYN cookie from the client. Usually, this packet
corresponds to the final step of the TCP 3-way handshake, the ACK
packet. But is also possible to receive a valid SYN cookie from the
first TCP data packet sent by the client, and thus create a child socket
from that SYN cookie.

Since a client socket is ready to send data as soon as it receives the
SYN+ACK packet from the server, the client can send the ACK packet (sent
by the TCP stack code), and the first data packet (sent by the userspace
program) almost at the same time, and thus the server will equally
receive the two TCP packets with valid SYN cookies almost at the same
instant.

When such event happens, the TCP stack code has a race condition that
occurs between the momement a lookup is done to the established
connections hashtable to check for the existence of a connection for the
same client, and the moment that the child socket is added to the
established connections hashtable. As a consequence, this race condition
can lead to a situation where we add two child sockets to the
established connections hashtable and deliver two sockets to the
userspace program to the same client.

This patch fixes the race condition by checking if an existing child
socket exists for the same client when we are adding the second child
socket to the established connections socket. If an existing child
socket exists, we drop the packet and discard the second child socket
to the same client.

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V fix from Wei Liu:
"One patch from Dexuan to fix VRAM cache type in Hyper-V framebuffer
driver"

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
video: hyperv_fb: Fix the cache type when mapping the VRAM

Merge tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.10

First set of fixes for v5.10. One fix for iwlwifi kernel panic, others
less notable.

rtw88

* fix a bogus test found by clang

iwlwifi

* fix long memory reads causing soft lockup warnings

* fix kernel panic during Channel Switch Announcement (CSA)

* other smaller fixes

MAINTAINERS

* email address updates

* tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
  iwlwifi: mvm: fix kernel panic in case of assert during CSA
  iwlwifi: pcie: set LTR to avoid completion timeout
  iwlwifi: mvm: write queue_sync_state only for sync
  iwlwifi: mvm: properly cancel a session protection for P2P
  iwlwifi: mvm: use the HOT_SPOT_CMD to cancel an AUX ROC
  iwlwifi: sta: set max HE max A-MPDU according to HE capa
  MAINTAINERS: update maintainers list for Cypress
  MAINTAINERS: update Yan-Hsuan's email address
  iwlwifi: pcie: limit memory read spin time
  rtw88: fix fw_fifo_addr check
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>