Git Repo - linux.git/log

Merge tag 'rtc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
"This includes new ioctls to get and set parameters and in particular
  the backup switch mode that is needed for some RTCs to actually enable
  the backup voltage (and have a useful RTC).

  The same interface can also be used to get the actual features
  supported by the RTC so userspace has a better way than trying and
  failing.

  Summary:

  Subsystem:
   - Add new ioctl to get and set extra RTC parameters, this includes
     backup switch mode
   - Expose available features to userspace, in particular, when alarmas
     have a resolution of one minute instead of a second.
   - Let the core handle those alarms with a minute resolution

  New driver:
   - MSTAR MSC313 RTC

  Drivers:
   - Add SPI ID table where necessary
   - Add BSM support for rv3028, rv3032 and pcf8523
   - s3c: set RTC range
   - rx8025: set range, implement .set_offset and .read_offset"

* tag 'rtc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
  rtc: rx8025: use .set_offset/.read_offset
  rtc: rx8025: use rtc_add_group
  rtc: rx8025: clear RTC_FEATURE_ALARM when alarm are not supported
  rtc: rx8025: set range
  rtc: rx8025: let the core handle the alarm resolution
  rtc: rx8025: switch to devm_rtc_allocate_device
  rtc: ab8500: let the core handle the alarm resolution
  rtc: ab-eoz9: support UIE when available
  rtc: ab-eoz9: use RTC_FEATURE_UPDATE_INTERRUPT
  rtc: rv3032: let the core handle the alarm resolution
  rtc: s35390a: let the core handle the alarm resolution
  rtc: handle alarms with a minute resolution
  rtc: pcf85063: silence cppcheck warning
  rtc: rv8803: fix writing back ctrl in flag register
  rtc: s3c: Add time range
  rtc: s3c: Extract read/write IO into separate functions
  rtc: s3c: Remove usage of devm_rtc_device_register()
  rtc: tps80031: Remove driver
  rtc: sun6i: Allow probing without an early clock provider
  rtc: pcf8523: add BSM support
  ...

x86/mce: Add errata workaround for Skylake SKX37

Errata SKX37 is word-for-word identical to the other errata listed in
this workaround. I happened to notice this after investigating a CMCI
storm on a Skylake host. While I can't confirm this was the root cause,
spurious corrected errors does sound like a likely suspect.

Fixes: 2976908e4198 ("x86/mce: Do not log spurious corrected mce errors")
Signed-off-by: Dave Jones <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge tag 'libata-5.16-rc1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull more libata updates from Damien Le Moal:
"Second round of updates for libata for 5.16:

   - Fix READ LOG EXT and READ LOG DMA EXT command timeouts during disk
     revalidation after a resume or a modprobe of the LLDD (me)

   - Remove unnecessary error message in sata_highbank driver (Xu)

   - Better handling of accesses to the IDENTIFY DEVICE data log for
     drives that do not support this log page (me)

   - Fix ahci_shost_attr_group declaration in ahci driver (me)"

* tag 'libata-5.16-rc1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  libata: libahci: declare ahci_shost_attr_group as static
  libata: add horkage for missing Identify Device log
  ata: sata_highbank: Remove unnecessary print function dev_err()
  libata: fix read log timeout value

smb3: do not setup the fscache_super_cookie until fsinfo initialized

We were calling cifs_fscache_get_super_cookie after tcon but before
we queried the info (QFS_Info) we need to initialize the cookie
properly. Also includes an additional check suggested by Paulo
to make sure we don't initialize super cookie twice.

Suggested-by: David Howells <[email protected]>
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

tools/lib/lockdep: drop liblockdep

TL;DR: While a tool like liblockdep is useful, it probably doesn't
belong within the kernel tree.

liblockdep attempts to reuse kernel code both directly (by directly
building the kernel's lockdep code) as well as indirectly (by using
sanitized headers). This makes liblockdep an integral part of the
kernel.

It also makes liblockdep quite unique: while other userspace code might
use sanitized headers, it generally doesn't attempt to use kernel code
directly which means that changes on the kernel side of things don't
affect (and break) it directly.

All our workflows and tooling around liblockdep don't support this
uniqueness. Changes that go into the kernel code aren't validated to not
break in-tree userspace code.

liblockdep ended up being very fragile, breaking over and over, to the
point that living in the same tree as the lockdep code lost most of it's
value.

liblockdep should continue living in an external tree, syncing with
the kernel often, in a controllable way.

Signed-off-by: Sasha Levin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

cifs: fix potential use-after-free bugs

Ensure that share and prefix variables are set to NULL after kfree()
when looping through DFS targets in __tree_connect_dfs_target().

Also, get rid of @ref in __tree_connect_dfs_target() and just pass a
boolean to indicate whether we're handling link targets or not.

Fixes: c88f7dcd6d64 ("cifs: support nested dfs links over reconnect")
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: fix memory leak of smb3_fs_context_dup::server_hostname

Fix memory leak of smb3_fs_context_dup::server_hostname when parsing
and duplicating fs contexts during mount(2) as reported by kmemleak:

  unreferenced object 0xffff888125715c90 (size 16):
    comm "mount.cifs", pid 3832, jiffies 4304535868 (age 190.094s)
    hex dump (first 16 bytes):
      7a 65 6c 64 61 2e 74 65 73 74 00 6b 6b 6b 6b a5  zelda.test.kkkk.
    backtrace:
      [<ffffffff8168106e>] kstrdup+0x2e/0x60
      [<ffffffffa027a362>] smb3_fs_context_dup+0x392/0x8d0 [cifs]
      [<ffffffffa0136353>] cifs_smb3_do_mount+0x143/0x1700 [cifs]
      [<ffffffffa02795e8>] smb3_get_tree+0x2e8/0x520 [cifs]
      [<ffffffff817a19aa>] vfs_get_tree+0x8a/0x2d0
      [<ffffffff8181e3e3>] path_mount+0x423/0x1a10
      [<ffffffff8181fbca>] __x64_sys_mount+0x1fa/0x270
      [<ffffffff83ae364b>] do_syscall_64+0x3b/0x90
      [<ffffffff83c0007c>] entry_SYSCALL_64_after_hwframe+0x44/0xae
  unreferenced object 0xffff888111deed20 (size 32):
    comm "mount.cifs", pid 3832, jiffies 4304536044 (age 189.918s)
    hex dump (first 32 bytes):
      44 46 53 52 4f 4f 54 31 2e 5a 45 4c 44 41 2e 54  DFSROOT1.ZELDA.T
      45 53 54 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  EST.kkkkkkkkkkk.
    backtrace:
      [<ffffffff8168118d>] kstrndup+0x2d/0x90
      [<ffffffffa027ab2e>] smb3_parse_devname+0x9e/0x360 [cifs]
      [<ffffffffa01870c8>] cifs_setup_volume_info+0xa8/0x470 [cifs]
      [<ffffffffa018c469>] connect_dfs_target+0x309/0xc80 [cifs]
      [<ffffffffa018d6cb>] cifs_mount+0x8eb/0x17f0 [cifs]
      [<ffffffffa0136475>] cifs_smb3_do_mount+0x265/0x1700 [cifs]
      [<ffffffffa02795e8>] smb3_get_tree+0x2e8/0x520 [cifs]
      [<ffffffff817a19aa>] vfs_get_tree+0x8a/0x2d0
      [<ffffffff8181e3e3>] path_mount+0x423/0x1a10
      [<ffffffff8181fbca>] __x64_sys_mount+0x1fa/0x270
      [<ffffffff83ae364b>] do_syscall_64+0x3b/0x90
      [<ffffffff83c0007c>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches")
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb3: add additional null check in SMB311_posix_mkdir

Although unlikely for it to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.

Addresses-Coverity: 1437501 ("Explicit Null dereference")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: release lock earlier in dequeue_mid error case

In dequeue_mid we can log an error while holding a spinlock,
GlobalMid_Lock. Coverity notes that the error logging
also grabs a lock so it is cleaner (and a bit safer) to
release the GlobalMid_Lock before logging the warning.

Addresses-Coverity: 1507573 ("Thread deadlock")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

thermal: int340x: fix build on 32-bit targets

Commit aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot
64 bit RFIM responses") started using 'readq()' to read 64-bit status
responses from the int340x hardware.

That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is
ambiguous, since it's no longer an atomic access. Some hardware might
require 64-bit accesses, and other hardware might want low word first or
high word first.

It's quite likely that the driver isn't relevant in a 32-bit environment
any more, and there's a patch floating around to just make it depend on
X86_64, but let's make it buildable on x86-32 anyway.

The driver previously just read the low 32 bits, so the hardware
certainly is ok with 32-bit reads, and in a little-endian environment
the low word first model is the natural one.

So just add the include for the 'io-64-nonatomic-lo-hi.h' version.

Fixes: aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Reported-by: Jakub Kicinski <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

net,lsm,selinux: revert the security_sctp_assoc_established() hook

This patch reverts two prior patches, e7310c94024c
("security: implement sctp_assoc_established hook in selinux") and
7c2ef0240e6a ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation. Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.

Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel. In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.

Signed-off-by: Paul Moore <[email protected]>

blk-mq: fix filesystem I/O request allocation

submit_bio_checks() may update bio->bi_opf, so we have to initialize
blk_mq_alloc_data.cmd_flags with bio->bi_opf after submit_bio_checks()
returns when allocating new request.

In case of using cached request, fallback to allocate new request if
cached rq isn't compatible with the incoming bio, otherwise change
rq->cmd_flags with incoming bio->bi_opf.

Fixes: 900e080752025f00 ("block: move queue enter logic into blk_mq_submit_bio()")
Reported-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

smb3: add additional null check in SMB2_tcon

Although unlikely to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.

Addresses-Coverity: 1420428 ("Explicit null dereferenced")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb3: add additional null check in SMB2_open

Although unlikely to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.

Addresses-Coverity: 1418458 ("Explicit null dereferenced")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

of/irq: Don't ignore interrupt-controller when interrupt-map failed

Since 041284181226 ("of/irq: Allow matching of an interrupt-map local
to an interrupt controller"), the irq code favors using an interrupt-map
over a interrupt-controller property if both are available, while the
earlier behaviour was to ignore the interrupt-map altogether.

However, we now end-up with the opposite behaviour, which is to
ignore the interrupt-controller property even if the interrupt-map
fails to match its input. This new behaviour breaks the AmigaOne
X1000 machine, which ships with an extremely "creative" (read:
broken) device tree.

Fix this by allowing the interrupt-controller property to be selected
when interrupt-map fails to match anything.

Fixes: 041284181226 ("of/irq: Allow matching of an interrupt-map local to an interrupt controller")
Reported-by: Christian Zigotzky <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Cc: Bjorn Helgaas <[email protected]>

irqchip/sifive-plic: Fixup EOI failed when masked

When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver,
only the first interrupt is handled, and following interrupts are never
delivered (initially reported in [1]).

That's because the RISC-V PLIC cannot EOI masked interrupts, as explained
in the description of Interrupt Completion in the PLIC spec [2]:

<quote>
The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored.
</quote>

Re-enable the interrupt before completion if it has been masked during
the handling, and remask it afterwards.

[1] http://lists.infradead.org/pipermail/linux-riscv/2021-July/007441.html
[2] https://github.com/riscv/riscv-plic-spec/blob/8bc15a35d07c9edf7b5d23fec9728302595ffc4d/riscv-plic.adoc

Fixes: bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow")
Reported-by: Vincent Pelletier <[email protected]>
Tested-by: Nikita Shubin <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Atish Patra <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
[maz: amended commit message]
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/csky-mpintc: Fixup mask/unmask implementation

The mask/unmask must be implemented, and enable/disable supplement
them if the HW requires something different at startup time. When
irq source is disabled by mask, mpintc could complete irq normally.

So drop enable/disable if favour of mask/unmask.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

bpf: Fix inner map state pruning regression.

Introduction of map_uid made two lookups from outer map to be distinct.
That distinction is only necessary when inner map has an embedded timer.
Otherwise it will make the verifier state pruning to be conservative
which will cause complex programs to hit 1M insn_processed limit.
Tighten map_uid logic to apply to inner maps with timers only.

Fixes: 3e8ce29850f1 ("bpf: Prevent pointer mismatch in bpf_timer_init.")
Reported-by: Lorenz Bauer <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Lorenz Bauer <[email protected]>
Link: https://lore.kernel.org/bpf/CACAyw99hVEJFoiBH_ZGyy=+oO-jyydoz6v1DeKPKs2HVsUH28w@mail.gmail.com
Link: https://lore.kernel.org/bpf/[email protected]

xsk: Fix crash on double free in buffer pool

Fix a crash in the buffer pool allocator when a buffer is double
freed. It is possible to trigger this behavior not only from a faulty
driver, but also from user space like this: Create a zero-copy AF_XDP
socket. Load an XDP program that will issue XDP_DROP for all
packets. Put the same umem buffer into the fill ring multiple times,
then bind the socket and send some traffic. This will crash the kernel
as the XDP_DROP action triggers one call to xsk_buff_free()/xp_free()
for every packet dropped. Each call will add the corresponding buffer
entry to the free_list and increase the free_list_cnt. Some entries
will have been added multiple times due to the same buffer being
freed. The buffer allocation code will then traverse this broken list
and since the same buffer is in the list multiple times, it will try
to delete the same buffer twice from the list leading to a crash.

The fix for this is just to test that the buffer has not been added
before in xp_free(). If it has been, just return from the function and
do not put it in the free_list a second time.

Note that this bug was not present in the code before the commit
referenced in the Fixes tag. That code used one list entry per
allocated buffer, so multiple frees did not have any side effects. But
the commit below optimized the usage of the pool and only uses a
single entry per buffer in the umem, meaning that multiple
allocations/frees of the same buffer will also only use one entry,
thus leading to the problem.

Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool")
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

tracing/osnoise: Make osnoise_instances static

Make the struct list_head osnoise_instances definition static.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/d001f0eeac66e2b2eeec7d2a15e9e7abede0453a.1636667971.git.bristot@kernel.org
Cc: Ingo Molnar <[email protected]>
Fixes: dae181349f1e ("tracing/osnoise: Support a list of trace_array *tr")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

perf test: Use macro for "suite" definitions

Add a macro to simplify later refactoring. No functional change.

Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Sohaib Mohamed <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Daniel Latypov <[email protected]>
Cc: David Gow <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf test: Use macro for "suite" declarations

Currently tests are setup in builtin-test with function pointers. Kunit
exposes tests as a kunit_suite with a null terminated array of test
cases. Use a macro to aid transition from one to the other in later
changes.

Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Sohaib Mohamed <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Daniel Latypov <[email protected]>
Cc: David Gow <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty: Add socket level scnprintf that handles ARCH specific SOL_SOCKET

SOL_SOCKET has a different value according to the architecture, some
have it as 0xffff while all the others have it as 1, so a simple string
array isn't usable, add a scnprintf routine that treats it as a special
case, using the array for other values.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf trace: Beautify the 'level' argument of setsockopt

  # perf trace -e setsockopt
     0.000 ( 0.019 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 50, optval: 0x7ffee2c0c134, optlen: 4) = 0
     0.022 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 11, optval: 0x7ffee2c0c114, optlen: 4) = 0
     0.027 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 8, optval: 0x7ffee2c0c134, optlen: 4) = 0
     0.032 ( 0.002 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 10, optval: 0x7ffee2c0c134, optlen: 4) = 0
     0.036 ( 0.002 ms): systemd-resolv/1121 setsockopt(fd: 22, level: IP, optname: 25, optval: 0x7ffee2c0c114, optlen: 4) = 0
     0.043 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: 1, optname: 62, optval: 0x7ffee2c0c0fc, optlen: 4) = 0
     0.055 ( 0.003 ms): systemd-resolv/1121 setsockopt(fd: 22, level: 1, optname: 25)
  ^C#

So the simple straight STRARRAY method is not enough as SOL_SOCKET is
'1' in most architectures but some use 0xffff (alpha, mips, parisc and
sparc), so a followup patch will create a specialized scnprintf to cover
that.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf trace: Beautify the 'level' argument of getsockopt

  # perf trace -e getsockopt
       0.000 ( 0.006 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
       0.301 ( 0.003 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
       2.215 ( 0.005 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
       2.422 ( 0.005 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
    1001.308 ( 0.006 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
    1001.586 ( 0.003 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
    1001.647 ( 0.002 ms): systemd-resolv/1121 getsockopt(fd: 23, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
    1003.868 ( 0.010 ms): systemd-resolv/1121 getsockopt(fd: 21, level: 1, optname: 17, optval: 0x7ffee2c0c6cc, optlen: 0x7ffee2c0c6c8) = 0
    1004.036 ( 0.006 ms): systemd-resolv/1121 getsockopt(fd: 22, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
    1004.087 ( 0.002 ms): systemd-resolv/1121 getsockopt(fd: 23, level: IP, optname: 14, optval: 0x7ffee2c0c1a0, optlen: 0x7ffee2c0c1a4) = -1 ENOTCONN (Transport endpoint is not connected)
  ^C#

So the simple straight STRARRAY method is not enough as SOL_SOCKET is
'1' in most architectures but some use 0xffff (alpha, mips, parisc and
sparc), so a followup patch will create a specialized scnprintf to cover
that.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty socket: Add generator for socket level (SOL_*) string table

  $ tools/perf/trace/beauty/socket.sh
  static const char *socket_ipproto[] = {
   [0] = "IP",
   [1] = "ICMP",
  <SNIP>
   [255] = "RAW",
   [262] = "MPTCP",
  };

  static const char *socket_level[] = {
   [0] = "IP",
   [6] = "TCP",
   [17] = "UDP",
   [41] = "IPV6",
   [58] = "ICMPV6",
   [132] = "SCTP",
   [136] = "UDPLITE",
   [255] = "RAW",
   [256] = "IPX",
   [257] = "AX25",
   [258] = "ATALK",
   [259] = "NETROM",
   [260] = "ROSE",
   [261] = "DECNET",
   [262] = "X25",
   [263] = "PACKET",
   [264] = "ATM",
   [265] = "AAL",
   [266] = "IRDA",
   [267] = "NETBEUI",
   [268] = "LLC",
   [269] = "DCCP",
   [270] = "NETLINK",
   [271] = "TIPC",
   [272] = "RXRPC",
   [273] = "PPPOL2TP",
   [274] = "BLUETOOTH",
   [275] = "PNPIPE",
   [276] = "RDS",
   [277] = "IUCV",
   [278] = "CAIF",
   [279] = "ALG",
   [280] = "NFC",
   [281] = "KCM",
   [282] = "TLS",
   [283] = "XDP",
   [284] = "MPTCP",
   [285] = "MCTP",
  };
  $

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty socket: Sort the ipproto array entries

Just tidying up the output for human consumption.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty socket: Rename 'regex' to 'ipproto_regex'

Paving the way for more regexps to be used here.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty socket: Prep to receive more input header files

Move from ternary like expression to an if block, this way we'll
have just the extra lines for new files in the following patches.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty socket: Rename header_dir to uapi_header_dir

Paving the way to pass more headers to be consumed, like
tools/perf/trace/beauty/include/linux/socket.h in addition to the
current tools/include/uapi/linux/in.h, to get the SOL_* defines.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty: Rename socket_ipproto.sh to socket.sh to hold more socket table generators

To avoid having to add new entries to tools/perf/Makefile.perf prep
socket.sh so that it can generate other socket table generators, such as
the upcoming SOL_ socket level one.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty: Make all sockaddr files use a common naming scheme

The script that generates the tables was named 'socket.sh', which is
confusing, rename it to sockaddr.sh and make sure the related
Makefile.perf targets also use the 'sockaddr' namespace.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

ARM: 9156/1: drop cc-option fallbacks for architecture selection

Naresh and Antonio ran into a build failure with latest Debian
armhf compilers, with lots of output like

tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode

As it turns out, $(cc-option) fails early here when the FPU is not
selected before CPU architecture is selected, as the compiler
option check runs before enabling -msoft-float, which causes
a problem when testing a target architecture level without an FPU:

cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU

Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this
issue, but the fallback logic is already broken because all supported
compilers (gcc-5 and higher) are much more recent than these options,
and building with -march=armv5t as a fallback no longer works.

The best way forward that I see is to just remove all the checks, which
also has the nice side-effect of slightly improving the startup time for
'make'.

The -mtune=marvell-f option was apparently never supported by any mainline
compiler, and the custom Codesourcery gcc build that did support is
now too old to build kernels, so just use -mtune=xscale unconditionally
for those.

This should be safe to apply on all stable kernels, and will be required
in order to keep building them with gcc-11 and higher.

Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996419
Reported-by: Antonio Terceiro <[email protected]>
Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Sebastian Reichel <[email protected]>
Tested-by: Klaus Kudielka <[email protected]>
Cc: Matthias Klose <[email protected]>
Cc: [email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>

ARM: 9155/1: fix early early_iounmap()

Currently __set_fixmap() bails out with a warning when called in early boot
from early_iounmap(). Fix it, and while at it, make the comment a bit easier
to understand.

Cc: <[email protected]>
Fixes: b089c31c519c ("ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap")
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Michał Mirosław <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>

blkcg: Remove extra blkcg_bio_issue_init

KASAN reports a use-after-free report when doing block test:

==================================================================
[10050.967049] BUG: KASAN: use-after-free in
submit_bio_checks+0x1539/0x1550

[10050.977638] Call Trace:
[10050.978190]  dump_stack+0x9b/0xce
[10050.979674]  print_address_description.constprop.6+0x3e/0x60
[10050.983510]  kasan_report.cold.9+0x22/0x3a
[10050.986089]  submit_bio_checks+0x1539/0x1550
[10050.989576]  submit_bio_noacct+0x83/0xc80
[10050.993714]  submit_bio+0xa7/0x330
[10050.994435]  mpage_readahead+0x380/0x500
[10050.998009]  read_pages+0x1c1/0xbf0
[10051.002057]  page_cache_ra_unbounded+0x4c2/0x6f0
[10051.007413]  do_page_cache_ra+0xda/0x110
[10051.008207]  force_page_cache_ra+0x23d/0x3d0
[10051.009087]  page_cache_sync_ra+0xca/0x300
[10051.009970]  generic_file_buffered_read+0xbea/0x2130
[10051.012685]  generic_file_read_iter+0x315/0x490
[10051.014472]  blkdev_read_iter+0x113/0x1b0
[10051.015300]  aio_read+0x2ad/0x450
[10051.023786]  io_submit_one+0xc8e/0x1d60
[10051.029855]  __se_sys_io_submit+0x125/0x350
[10051.033442]  do_syscall_64+0x2d/0x40
[10051.034156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[10051.048733] Allocated by task 18598:
[10051.049482]  kasan_save_stack+0x19/0x40
[10051.050263]  __kasan_kmalloc.constprop.1+0xc1/0xd0
[10051.051230]  kmem_cache_alloc+0x146/0x440
[10051.052060]  mempool_alloc+0x125/0x2f0
[10051.052818]  bio_alloc_bioset+0x353/0x590
[10051.053658]  mpage_alloc+0x3b/0x240
[10051.054382]  do_mpage_readpage+0xddf/0x1ef0
[10051.055250]  mpage_readahead+0x264/0x500
[10051.056060]  read_pages+0x1c1/0xbf0
[10051.056758]  page_cache_ra_unbounded+0x4c2/0x6f0
[10051.057702]  do_page_cache_ra+0xda/0x110
[10051.058511]  force_page_cache_ra+0x23d/0x3d0
[10051.059373]  page_cache_sync_ra+0xca/0x300
[10051.060198]  generic_file_buffered_read+0xbea/0x2130
[10051.061195]  generic_file_read_iter+0x315/0x490
[10051.062189]  blkdev_read_iter+0x113/0x1b0
[10051.063015]  aio_read+0x2ad/0x450
[10051.063686]  io_submit_one+0xc8e/0x1d60
[10051.064467]  __se_sys_io_submit+0x125/0x350
[10051.065318]  do_syscall_64+0x2d/0x40
[10051.066082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[10051.067455] Freed by task 13307:
[10051.068136]  kasan_save_stack+0x19/0x40
[10051.068931]  kasan_set_track+0x1c/0x30
[10051.069726]  kasan_set_free_info+0x1b/0x30
[10051.070621]  __kasan_slab_free+0x111/0x160
[10051.071480]  kmem_cache_free+0x94/0x460
[10051.072256]  mempool_free+0xd6/0x320
[10051.072985]  bio_free+0xe0/0x130
[10051.073630]  bio_put+0xab/0xe0
[10051.074252]  bio_endio+0x3a6/0x5d0
[10051.074984]  blk_update_request+0x590/0x1370
[10051.075870]  scsi_end_request+0x7d/0x400
[10051.076667]  scsi_io_completion+0x1aa/0xe50
[10051.077503]  scsi_softirq_done+0x11b/0x240
[10051.078344]  blk_mq_complete_request+0xd4/0x120
[10051.079275]  scsi_mq_done+0xf0/0x200
[10051.080036]  virtscsi_vq_done+0xbc/0x150
[10051.080850]  vring_interrupt+0x179/0x390
[10051.081650]  __handle_irq_event_percpu+0xf7/0x490
[10051.082626]  handle_irq_event_percpu+0x7b/0x160
[10051.083527]  handle_irq_event+0xcc/0x170
[10051.084297]  handle_edge_irq+0x215/0xb20
[10051.085122]  asm_call_irq_on_stack+0xf/0x20
[10051.085986]  common_interrupt+0xae/0x120
[10051.086830]  asm_common_interrupt+0x1e/0x40

==================================================================

Bio will be checked at beginning of submit_bio_noacct(). If bio needs
to be throttled, it will start the timer and stop submit bio directly.
Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires.
But in the current process, if bio is throttled, it will still set bio
issue->value by blkcg_bio_issue_init(). This is redundant and may cause
the above use-after-free.

CPU0                                   CPU1
submit_bio
submit_bio_noacct
  submit_bio_checks
    blk_throtl_bio()
      <=mod_timer(&sq->pending_timer
                                      blk_throtl_dispatch_work_fn
                                        submit_bio_noacct() <= bio have
                                        throttle tag, will throw directly
                                        and bio issue->value will be set
                                        here

                                      bio_endio()
                                      bio_put()
                                      bio_free() <= free this bio

    blkcg_bio_issue_init(bio)
      <= bio has been freed and
      will lead to UAF
  return BLK_QC_T_NONE

Fix this by remove extra blkcg_bio_issue_init.

Fixes: e439bedf6b24 (blkcg: consolidate bio_issue_init() to be a part of core)
Signed-off-by: Laibin Qiu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from

Use the same cleanup code independent of whether the cgroup to be
uncharged and unref'd is the source or the destination cgroup. Use a
bool to track whether the destination cgroup has been charged, which also
fixes a bug in the error case: the destination cgroup must be uncharged
only if it does not match the source.

Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: move guest_pv_has out of user_access section

When UBSAN is enabled, the code emitted for the call to guest_pv_has
includes a call to __ubsan_handle_load_invalid_value. objtool
complains that this call happens with UACCESS enabled; to avoid
the warning, pull the calls to user_access_begin into both arms
of the "if" statement, after the check for guest_pv_has.

Reported-by: Stephen Rothwell <[email protected]>
Cc: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'next' into for-linus

Prepare input updates for 5.16 merge window.

dt-bindings: watchdog: sunxi: fix error in schema

"maxItems" is not needed with an "items" list

Fixes:
$ DT_SCHEMA_FILES=Documentation/devicetree/bindings/watchdog/allwinner,sun4i-a10-wdt.yaml make dtbs_check
Documentation/devicetree/bindings/watchdog/allwinner,sun4i-a10-wdt.yaml: properties:clocks: {'required': ['maxItems']} is not allowed for {'minItems': 1, 'maxItems': 2, 'items': [{'description': 'High-frequency oscillator input, divided internally'}, {'description': 'Low-frequency oscillator input, only found on some variants'}]}
hint: "maxItems" is not needed with an "items" list
from schema $id: http://devicetree.org/meta-schemas/items.yaml#
...

Signed-off-by: David Heidelberg <[email protected]>
Acked-by: Rob Herring <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

bindings: media: venus: Drop redundant maxItems for power-domain-names

make dt_binding_check:

    Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml: ignoring, error in schema: properties: power-domain-names
    warning: no schema found in file: Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml
    Documentation/devicetree/bindings/media/qcom,sc7280-venus.yaml: properties:power-domain-names: {'required': ['maxItems']} is not allowed for {'minItems': 2, 'maxItems': 3, 'items': [{'const': 'venus'}, {'const': 'vcodec0'}, {'const': 'cx'}]}
   hint: "maxItems" is not needed with an "items" list
   from schema $id: http://devicetree.org/meta-schemas/items.yaml#

Fixes: e48b839b6699c226 ("media: dt-bindings: media: venus: Add sc7280 dt schema")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/d94924e1bd00f396f2106f04d4a2bb839cf5f071.1636453406.git.geert+renesas@glider.be

dt-bindings: Remove Netlogic bindings

Support for Netlogic was removed in commit 95b8a5e0111a ("MIPS: Remove
NETLOGIC support"). Remove the now unused bindings.

The GPIO binding also includes "brcm,vulcan-gpio", but it appears to be
unused as well as Broadcom Vulkan became Cavium ThunderX2 which is ACPI
based.

Cc: Linus Walleij <[email protected]>
Cc: Bartosz Golaszewski <[email protected]>
Cc: George Cherian <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Rob Herring <[email protected]>
Acked-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

clk: versatile: clk-icst: Ensure clock names are unique

Commit 2d3de197a818 ("ARM: dts: arm: Update ICST clock nodes 'reg' and
node names") moved to using generic node names. That results in trying
to register multiple clocks with the same name. Fix this by including
the unit-address in the clock name.

Fixes: 2d3de197a818 ("ARM: dts: arm: Update ICST clock nodes 'reg' and node names")
Cc: [email protected]
Cc: Linus Walleij <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Stephen Boyd <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

of: Support using 'mask' in making device bus id

Commit 25b892b583cc ("ARM: dts: arm: Update register-bit-led nodes
'reg' and node names") added a 'reg' property to nodes. This change has
the side effect of changing how the kernel generates the device name.
The assumption was a translatable 'reg' address is unique. However, in
the case of the register-bit-led binding (and a few others) that is not
the case. The 'mask' property must also be used in this case to make a
unique device name.

Fixes: 25b892b583cc ("ARM: dts: arm: Update register-bit-led nodes 'reg' and node names")
Reported-by: Guenter Roeck <[email protected]>
Cc: [email protected]
Cc: Frank Rowand <[email protected]>
Cc: Linus Walleij <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: treewide: Update @st.com email address to @foss.st.com

Not all @st.com email address are concerned, only people who have
a specific @foss.st.com email will see their entry updated.
For some people, who left the company, remove their email.

Cc: Alexandre Torgue <[email protected]>
Cc: Arnaud Pouliquen <[email protected]>
Cc: Fabien Dessenne <[email protected]>
Cc: Christophe Roullier <[email protected]>
Cc: Gabriel Fernandez <[email protected]>
Cc: Lionel Debieve <[email protected]>
Cc: Amelie Delaunay <[email protected]>
Cc: Pierre-Yves MORDRET <[email protected]>
Cc: Ludovic Barre <[email protected]>
Cc: Christophe Kerello <[email protected]>
Cc: pascal Paillet <[email protected]>
Cc: Erwan Le Ray <[email protected]>
Cc: Philippe CORNU <[email protected]>
Cc: Yannick Fertre <[email protected]>
Cc: Fabrice Gasnier <[email protected]>
Cc: Olivier Moysan <[email protected]>
Cc: Hugues Fruchet <[email protected]>
Signed-off-by: Patrice Chotard <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Mark Brown <[email protected]>
Acked-by: Lee Jones <[email protected]>
Acked-By: Vinod Koul <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml

Benjamin has left the company, remove his name from maintainers.

Signed-off-by: Patrice Chotard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: media: Update maintainers for st,stm32-cec.yaml

Benjamin has left the company, remove his name from maintainers.

Signed-off-by: Patrice Chotard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: mfd: timers: Update maintainers for st,stm32-timers

Benjamin has left the company, remove his name from maintainers.

Signed-off-by: Patrice Chotard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: timer: Update maintainers for st,stm32-timer

Benjamin has left the company, add Fabrice and myself as maintainers.

Signed-off-by: Patrice Chotard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz

clock-frequency is only restricted by the upper limit of 400 kHz.

Found with:
$ DT_SCHEMA_FILES=Documentation/devicetree/bindings/i2c/i2c-imx.yaml make dtbs_check
...
arch/arm64/boot/dts/freescale/imx8mq-librem5-r2.dt.yaml: i2c@30a20000: clock-frequency:0:0: 387000 is not one of [100000, 400000]
From schema: linux/Documentation/devicetree/bindings/i2c/i2c-imx.yaml
...

Fixes: 4bdc44347299 ("dt-bindings: i2c: Convert imx i2c to json-schema")
Signed-off-by: David Heidelberg <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml

Convert toshiba,tc358767.txt binding to yaml format

Signed-off-by: Rahul T R <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: Rename Ingenic CGU headers to ingenic,*.h

Tidy up a bit the tree, by prefixing all include/dt-bindings/clock/ files
related to Ingenic SoCs with 'ingenic,'.

Signed-off-by: Paul Cercueil <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Stephen Boyd <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'drm-misc-fixes-2021-11-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

* dma-buf: name_lock fixes
* prime: Keep object ref during mmap
* nouveau: Fix a refcount issue; Fix device removal; Protect client
list with dedicated mutex; Fix address CE0 address calculation
* ttm: Fix race condition during BO eviction

Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

ksmbd: Use the SMB3_Create definitions from the shared

Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: Move more definitions into the shared area

Move SMB2_SessionSetup, SMB2_Close, SMB2_Read, SMB2_Write and
SMB2_ChangeNotify commands into smbfs_common/smb2pdu.h

Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: use the common definitions for NEGOTIATE_PROTOCOL

Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: switch to use shared definitions where available

Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: change LeaseKey data type to u8 array

cifs define LeaseKey as u8 array in structure. To move lease structure
to smbfs_common, ksmbd change LeaseKey data type to u8 array.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: remove smb2_buf_length in smb2_transform_hdr

To move smb2_transform_hdr to smbfs_common, This patch remove
smb2_buf_length variable in smb2_transform_hdr.

Cc: Ronnie Sahlberg <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: remove smb2_buf_length in smb2_hdr

To move smb2_hdr to smbfs_common, This patch remove smb2_buf_length
variable in smb2_hdr. Also, declare smb2_get_msg function to get smb2
request/response from ->request/response_buf.

Cc: Ronnie Sahlberg <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: remove md4 leftovers

As NTLM authentication is removed, md4 is no longer used.
ksmbd remove md4 leftovers, i.e. select CRYPTO_MD4, MODULE_SOFTDEP md4.

Acked-by: Hyunchul Lee <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

ksmbd: set unique value to volume serial field in FS_VOLUME_INFORMATION

Steve French reported ksmbd set fixed value to volume serial field in
FS_VOLUME_INFORMATION. Volume serial value needs to be set to a unique
value for client fscache. This patch set crc value that is generated
with share name, path name and netbios name to volume serial.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: [email protected] # v5.15
Reported-by: Steve French <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>

io-wq: serialize hash clear with wakeup

We need to ensure that we serialize the stalled and hash bits with the
wait_queue wait handler, or we could be racing with someone modifying
the hashed state after we find it busy, but before we then give up and
wait for it to be cleared. This can cause random delays or stalls when
handling buffered writes for many files, where some of these files cause
hash collisions between the worker threads.

Cc: [email protected]
Reported-by: Daniel Black <[email protected]>
Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <[email protected]>

BackMerge tag 'v5.15' into drm-next

I got a drm-fixes which had some 5.15 stuff in it, so to avoid
the mess just backmerge here.

Linux 5.15

Signed-off-by: Dave Airlie <[email protected]>

Merge tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"Revert conversion to struct device.driver instead of struct
  pci_dev.driver.

  The device.driver is set earlier, and using it caused the PCI core to
  call driver PM entry points before .probe() and after .remove(), when
  the driver isn't prepared.

  This caused NULL pointer dereferences in i2c_designware_pci and
  probably other driver issues"

* tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"
  Revert "PCI: Remove struct pci_dev->driver"

libata: libahci: declare ahci_shost_attr_group as static

ahci_shost_attr_group is referenced only in drivers/ata/libahci.c.
Declare it as static.

Fixes: c3f69c7f629f ("scsi: ata: Switch to attribute groups")
Cc: Bart Van Assche <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

libata: add horkage for missing Identify Device log

ACS-3 introduced the ATA Identify Device Data log as mandatory. A
warning message currently signals to the user if a device does not
report supporting this log page in the log directory page, regardless
of the ATA version of the device. Furthermore, this warning will appear
for all attempts at accessing this missing log page during device
revalidation.

Since it is useless to constantly access the log directory and warn
about this lack of support once we have discovered that the device
does not support this log page, introduce the horkage flag
ATA_HORKAGE_NO_ID_DEV_LOG to mark a device as lacking support for
the Identify Device Data log page. Set this flag when
ata_log_supported() returns false in ata_identify_page_supported().
The warning is printed only if the device ATA level is 10 or above
(ACS-3 or above), and only once on device scan. With this flag set, the
log directory page is not accessed again to test for Identify Device
Data log page support.

Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>

Merge tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull KCSAN updates from Paul McKenney:
"This contains initialization fixups, testing improvements, addition of
  instruction pointer to data-race reports, and scoped data-race checks"

* tag 'kcsan.2021.11.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  kcsan: selftest: Cleanup and add missing __init
  kcsan: Move ctx to start of argument list
  kcsan: Support reporting scoped read-write access type
  kcsan: Start stack trace with explicit location if provided
  kcsan: Save instruction pointer for scoped accesses
  kcsan: Add ability to pass instruction pointer of access to reporting
  kcsan: test: Fix flaky test case
  kcsan: test: Use kunit_skip() to skip tests
  kcsan: test: Defer kcsan_test_init() after kunit initialization

Merge tag 'apparmor-pr-2021-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
"Features
   - use per file locks for transactional queries
   - update policy management capability checks to work with LSM stacking

  Bug Fixes:
   - check/put label on apparmor_sk_clone_security()
   - fix error check on update of label hname
   - fix introspection of of task mode for unconfined tasks

  Cleanups:
   - avoid -Wempty-body warning
   - remove duplicated 'Returns:' comments
   - fix doc warning
   - remove unneeded one-line hook wrappers
   - use struct_size() helper in kzalloc()
   - fix zero-length compiler warning in AA_BUG()
   - file.h: delete duplicated word
   - delete repeated words in comments
   - remove repeated declaration"

* tag 'apparmor-pr-2021-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: remove duplicated 'Returns:' comments
  apparmor: remove unneeded one-line hook wrappers
  apparmor: Use struct_size() helper in kzalloc()
  apparmor: fix zero-length compiler warning in AA_BUG()
  apparmor: use per file locks for transactional queries
  apparmor: fix doc warning
  apparmor: Remove the repeated declaration
  apparmor: avoid -Wempty-body warning
  apparmor: Fix internal policy capable check for policy management
  apparmor: fix error check
  security: apparmor: delete repeated words in comments
  security: apparmor: file.h: delete duplicated word
  apparmor: switch to apparmor to internal capable check for policy management
  apparmor: update policy capable checks to use a label
  apparmor: fix introspection of of task mode for unconfined tasks
  apparmor: check/put label on apparmor_sk_clone_security()

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
"The post-linux-next material.

  7 patches.

  Subsystems affected by this patch series (all mm): debug,
  slab-generic, migration, memcg, and kasan"

* emailed patches from Andrew Morton <[email protected]>:
  kasan: add kasan mode messages when kasan init
  mm: unexport {,un}lock_page_memcg
  mm: unexport folio_memcg_{,un}lock
  mm/migrate.c: remove MIGRATE_PFN_LOCKED
  mm: migrate: simplify the file-backed pages validation when migrating its mapping
  mm: allow only SLUB on PREEMPT_RT
  mm/page_owner.c: modify the type of argument "order" in some functions

Merge tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu updates from Greg Ungerer:
"Only two changes.

  One removes the now unused CONFIG_MCPU32 symbol. The other sets a
  default for the CONFIG_MEMORY_RESERVE config symbol (this aids
  scripting and other automation) so you don't interactively get asked
  for a value at configure time.

  Summary:

   - remove unused CONFIG_MCPU32 symbol

   - default CONFIG_MEMORY_RESERVE value (for scripting)"

* tag 'm68knommu-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: Remove MCPU32 config symbol
  m68k: set a default value for MEMORY_RESERVE

smb3: add additional null check in SMB2_ioctl

Although unlikely for it to be possible for rsp to be null here,
the check is safer to add, and quiets a Coverity warning.

Addresses-Coverity: 1443909 ("Explicit Null dereference")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"

This reverts commit 2a4d9408c9e8b6f6fc150c66f3fef755c9e20d4a.

Robert reported a NULL pointer dereference caused by the PCI core
(local_pci_probe()) calling the i2c_designware_pci driver's
.runtime_resume() method before the .probe() method. i2c_dw_pci_resume()
depends on initialization done by i2c_dw_pci_probe().

Prior to 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
pci_dev->driver"), pci_pm_runtime_resume() avoided calling the
.runtime_resume() method because pci_dev->driver had not been set yet.

2a4d9408c9e8 and b5f9c644eb1b ("PCI: Remove struct pci_dev->driver"),
removed pci_dev->driver, replacing it by device->driver, which *has* been
set by this time, so pci_pm_runtime_resume() called the .runtime_resume()
method when it previously had not.

Fixes: 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
Link: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/
Reported-by: Robert Święcki <[email protected]>
Tested-by: Robert Święcki <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

Revert "PCI: Remove struct pci_dev->driver"

This reverts commit b5f9c644eb1baafcd349ad134e2110773f8d0a38.

Revert b5f9c644eb1b ("PCI: Remove struct pci_dev->driver"), which is needed
to revert 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
pci_dev->driver").

2a4d9408c9e8 caused a NULL pointer dereference reported by Robert Święcki.
Details in the revert of that commit.

Fixes: 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
Link: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/
Reported-by: Robert Święcki <[email protected]>
Tested-by: Robert Święcki <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

block: Hold invalidate_lock in BLKRESETZONE ioctl

When BLKRESETZONE ioctl and data read race, the data read leaves stale
page cache. The commit e5113505904e ("block: Discard page cache of zone
reset target range") added page cache truncation to avoid stale page
cache after the ioctl. However, the stale page cache still can be read
during the reset zone operation for the ioctl. To avoid the stale page
cache completely, hold invalidate_lock of the block device file mapping.

Fixes: e5113505904e ("block: Discard page cache of zone reset target range")
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Cc: [email protected] # v5.15
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: rename blk_attempt_bio_merge

It is very annoying to have two block layer functions which share same
name, so rename blk_attempt_bio_merge in blk-mq.c as
blk_mq_attempt_bio_merge.

Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_merge

blk_mq_sched_bio_merge is only called from blk-mq.c:blk_attempt_bio_merge(),
which is called when queue usage counter is grabbed already:

1) blk_mq_get_new_requests()

2) blk_mq_get_request()
- cached request in current plug owns one queue usage counter

So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(), and more
importantly this nest way causes hang in blk_mq_freeze_queue_wait().

Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: fix kerneldoc for disk_register_independent_access__ranges()

The naming got changed as part of a revision of the patchset, but the
kerneldoc apparently never got updated. Fix it.

Reported-by: kernel test robot <[email protected]>
Fixes: a2247f19ee1c ("block: Add independent access ranges support")
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Two locking fixes:

   - Add mutex protection to ring_buffer_reset()

   - Fix deadlock in modify_ftrace_direct_multi()"

* tag 'trace-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/direct: Fix lockup in modify_ftrace_direct_multi
  ring-buffer: Protect ring_buffer_reset() from reentrancy

Merge tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - bpf: do not reject when the stack read size is different from the
     tracked scalar size

   - net: fix premature exit from NAPI state polling in napi_disable()

   - riscv, bpf: fix RV32 broken build, and silence RV64 warning

  Current release - new code bugs:

   - net: fix possible NULL deref in sock_reserve_memory

   - amt: fix error return code in amt_init(); fix stopping the
     workqueue

   - ax88796c: use the correct ioctl callback

  Previous releases - always broken:

   - bpf: stop caching subprog index in the bpf_pseudo_func insn

   - security: fixups for the security hooks in sctp

   - nfc: add necessary privilege flags in netlink layer, limit
     operations to admin only

   - vsock: prevent unnecessary refcnt inc for non-blocking connect

   - net/smc: fix sk_refcnt underflow on link down and fallback

   - nfnetlink_queue: fix OOB when mac header was cleared

   - can: j1939: ignore invalid messages per standard

   - bpf, sockmap:
      - fix race in ingress receive verdict with redirect to self
      - fix incorrect sk_skb data_end access when src_reg = dst_reg
      - strparser, and tls are reusing qdisc_skb_cb and colliding

   - ethtool: fix ethtool msg len calculation for pause stats

   - vlan: fix a UAF in vlan_dev_real_dev() when ref-holder tries to
     access an unregistering real_dev

   - udp6: make encap_rcv() bump the v6 not v4 stats

   - drv: prestera: add explicit padding to fix m68k build

   - drv: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge

   - drv: mvpp2: fix wrong SerDes reconfiguration order

  Misc & small latecomers:

   - ipvs: auto-load ipvs on genl access

   - mctp: sanity check the struct sockaddr_mctp padding fields

   - libfs: support RENAME_EXCHANGE in simple_rename()

   - avoid double accounting for pure zerocopy skbs"

* tag 'net-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (123 commits)
  selftests/net: udpgso_bench_rx: fix port argument
  net: wwan: iosm: fix compilation warning
  cxgb4: fix eeprom len when diagnostics not implemented
  net: fix premature exit from NAPI state polling in napi_disable()
  net/smc: fix sk_refcnt underflow on linkdown and fallback
  net/mlx5: Lag, fix a potential Oops with mlx5_lag_create_definer()
  gve: fix unmatched u64_stats_update_end()
  net: ethernet: lantiq_etop: Fix compilation error
  selftests: forwarding: Fix packet matching in mirroring selftests
  vsock: prevent unnecessary refcnt inc for nonblocking connect
  net: marvell: mvpp2: Fix wrong SerDes reconfiguration order
  net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory
  net: stmmac: allow a tc-taprio base-time of zero
  selftests: net: test_vxlan_under_vrf: fix HV connectivity test
  net: hns3: allow configure ETS bandwidth of all TCs
  net: hns3: remove check VF uc mac exist when set by PF
  net: hns3: fix some mac statistics is always 0 in device version V2
  net: hns3: fix kernel crash when unload VF while it is being reset
  net: hns3: sync rx ring head in echo common pull
  net: hns3: fix pfc packet number incorrect after querying pfc parameters
  ...

Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
"Here is a single fix for 5.16-rc1 to resolve a build problem that came
  in through the coresight tree (and as such came in through the
  char/misc tree merge in the 5.16-rc1 merge window).

  It resolves a build problem with 'allmodconfig' on arm64 and is acked
  by the proper subsystem maintainers. It has been in linux-next all
  week with no reported problems"

* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  arm64: cpufeature: Export this_cpu_has_cap helper

Merge tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small reverts and fixes for USB drivers for issues that
  came up during the 5.16-rc1 merge window.

  These include:

   - two reverts of xhci and USB core patches that are causing problems
     in many systems.

   - xhci 3.1 enumeration delay fix for systems that were having
     problems.

  All three of these have been in linux-next all week with no reported
  issues"

* tag 'usb-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
  Revert "usb: core: hcd: Add support for deferring roothub registration"
  Revert "xhci: Set HCD flag to defer primary roothub registration"

kasan: add kasan mode messages when kasan init

There are multiple kasan modes. It makes sense that we add some
messages to know which kasan mode is active when booting up [1].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212195
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Yee Lee <[email protected]>
Cc: Nicholas Tang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: unexport {,un}lock_page_memcg

These are only used in built-in core mm code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: unexport folio_memcg_{,un}lock

Patch series "unexport memcg locking helpers".

Neither the old page-based nor the new folio-based memcg locking helpers
are used in modular code at all, so drop the exports.

This patch (of 2):

folio_memcg_{,un}lock are only used in built-in core mm code.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/migrate.c: remove MIGRATE_PFN_LOCKED

MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect().  If it
wasn't then the a second attempt is made to lock the page.  However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity.  So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alistair Popple <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ben Skeggs <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: migrate: simplify the file-backed pages validation when migrating its mapping

There is no need to validate the file-backed page's refcount before
trying to freeze the page's expected refcount, instead we can rely on
the folio_ref_freeze() to validate if the page has the expected refcount
before migrating its mapping.

Moreover we are always under the page lock when migrating the page
mapping, which means nowhere else can remove it from the page cache, so
we can remove the xas_load() validation under the i_pages lock.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/df4c129fd8e86a95dbc55f4663d77441cc0d3bd1.1629447552.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Alistair Popple <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: allow only SLUB on PREEMPT_RT

Memory allocators may disable interrupts or preemption as part of the
allocation and freeing process.  For PREEMPT_RT it is important that
these sections remain deterministic and short and therefore don't depend
on the size of the memory to allocate/ free or the inner state of the
algorithm.

Until v3.12-RT the SLAB allocator was an option but involved several
changes to meet all the requirements.  The SLUB design fits better with
PREEMPT_RT model and so the SLAB patches were dropped in the 3.12-RT
patchset.  Comparing the two allocator, SLUB outperformed SLAB in both
throughput (time needed to allocate and free memory) and the maximal
latency of the system measured with cyclictest during hackbench.

SLOB was never evaluated since it was unlikely that it preforms better
than SLAB.  During a quick test, the kernel crashed with SLOB enabled
during boot.

Disable SLAB and SLOB on PREEMPT_RT.

[[email protected]: commit description]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/page_owner.c: modify the type of argument "order" in some functions

The type of "order" in struct page_owner is unsigned short.
However, it is unsigned int in the following 3 functions:

  __reset_page_owner
  __set_page_owner_handle
  __set_page_owner_handle

The type of "order" in argument list is unsigned int, which is
inconsistent.

[[email protected]: update include/linux/page_owner.h]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yixuan Cao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

xfs: sync xfs_btree_split macros with userspace libxfs

Sync this one last bit of discrepancy between kernel and userspace
libxfs.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Eric Sandeen <[email protected]>

Merge branch 'kvm-5.16-fixes' into kvm-master

* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status

* Fix selftests on APICv machines

* Fix sparse warnings

* Fix detection of KVM features in CPUID

* Cleanups for bogus writes to MSR_KVM_PV_EOI_EN

* Fixes and cleanups for MSR bitmap handling

* Cleanups for INVPCID

* Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures

Merge branch 'kvm-sev-move-context' into kvm-master

Add support for AMD SEV and SEV-ES intra-host migration support.  Intra
host migration provides a low-cost mechanism for userspace VMM upgrades.

In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant.  As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next.  In-kernel
hand off makes it possible to move any data that would be
unsafe/impossible for the kernel to hand directly to userspace, and
cannot be reproduced using data that can be handed to userspace.

Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS

KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
either num_online_cpus() or num_present_cpus(), s390 has multiple
constants depending on hardware features. On x86, KVM reports an
arbitrary value of '710' which is supposed to be the maximum tested
value but it's possible to test all KVM_MAX_VCPUS even when there are
less physical CPUs available.

Drop the arbitrary '710' value and return num_online_cpus() on x86 as
well. The recommendation will match other architectures and will mean
'no CPU overcommit'.

For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
when the requested vCPU number exceeds it. The static limit of '710'
is quite weird as smaller systems with just a few physical CPUs should
certainly "recommend" less.

Suggested-by: Eduardo Habkost <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20211111134733 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()

Handle #GP on INVPCID due to an invalid type in the common switch
statement instead of relying on the callers (VMX and SVM) to manually
validate the type.

Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check
the type before reading the operand from memory, so deferring the
type validity check until after that point is architecturally allowed.

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <20211109174426.2350547 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT

handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2
field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that
holds the invalidation type. Add a helper to retrieve reg2 from VMX
instruction info to consolidate and document the shift+mask magic.

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <20211109174426.2350547 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: Clean up x2APIC MSR handling for L2

Clean up the x2APIC MSR bitmap intereption code for L2, which is the last
holdout of open coded bitmap manipulations. Freshen up the SDM/PRM
comment, rename the function to make it abundantly clear the funky
behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
(the previous comment was flat out wrong for x2APIC behavior).

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20211109013047.2041518 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: Macrofy the MSR bitmap getters and setters

Add builder macros to generate the MSR bitmap helpers to reduce the
amount of copy-paste code, especially with respect to all the magic
numbers needed to calc the correct bit location.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20211109013047.2041518 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: Handle dynamic MSR intercept toggling

Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02. This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.

The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.

The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.

Fix both issues simultaneous as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.

Move the various accessor/mutators that are currently buried in vmx.c
into vmx.h so that they can be shared by the nested code.

Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Fixes: d69129b4e46a ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: [email protected]
Cc: Alexander Graf <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20211109013047.2041518 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use

Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.

Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.

Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.

Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20211109013047.2041518 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN

When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails,
MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to
expect that the value we keep internally in KVM wasn't updated.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20211108152819 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Rename kvm_lapic_enable_pv_eoi()

kvm_lapic_enable_pv_eoi() is a misnomer as the function is also
used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi().

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20211108152819 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>