Git Repo - linux.git/log

Input: aiptek - properly check endpoint type

Syzbot reported warning in usb_submit_urb() which is caused by wrong
endpoint type. There was a check for the number of endpoints, but not
for the type of endpoint.

Fix it by replacing old desc.bNumEndpoints check with
usb_find_common_endpoints() helper for finding endpoints

Fail log:

usb 5-1: BOGUS urb xfer, pipe 1 != type 3
WARNING: CPU: 2 PID: 48 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
Modules linked in:
CPU: 2 PID: 48 Comm: kworker/2:2 Not tainted 5.17.0-rc6-syzkaller-00226-g07ebd38a0da2 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: usb_hub_wq hub_event
...
Call Trace:
<TASK>
aiptek_open+0xd5/0x130 drivers/input/tablet/aiptek.c:830
input_open_device+0x1bb/0x320 drivers/input/input.c:629
kbd_connect+0xfe/0x160 drivers/tty/vt/keyboard.c:1593

Fixes: 8e20cf2bce12 ("Input: aiptek - fix crash on detecting device without endpoints")
Reported-and-tested-by: [email protected]
Signed-off-by: Pavel Skripkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

block: don't merge across cgroup boundaries if blkcg is enabled

blk-iocost and iolatency are cgroup aware rq-qos policies but they didn't
disable merges across different cgroups. This obviously can lead to
accounting and control errors but more importantly to priority inversions -
e.g. an IO which belongs to a higher priority cgroup or IO class may end up
getting throttled incorrectly because it gets merged to an IO issued from a
low priority cgroup.

Fix it by adding blk_cgroup_mergeable() which is called from merge paths and
rejects cross-cgroup and cross-issue_as_root merges.

Signed-off-by: Tejun Heo <[email protected]>
Fixes: d70675121546 ("block: introduce blk-iolatency io controller")
Cc: [email protected] # v4.19+
Cc: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/Yi/eE/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net coming late
in the 5.17-rc process:

1) Revert port remap to mitigate shadowing service ports, this is causing
   problems in existing setups and this mitigation can be achieved with
   explicit ruleset, eg.

... tcp sport < 16386 tcp dport >= 32768 masquerade random

  This patches provided a built-in policy similar to the one described above.

2) Disable register tracking infrastructure in nf_tables. Florian reported
   two issues:

   - Existing expressions with no implemented .reduce interface
     that causes data-store on register should cancel the tracking.
   - Register clobbering might be possible storing data on registers that
     are larger than 32-bits.

   This might lead to generating incorrect ruleset bytecode. These two
   issues are scheduled to be addressed in the next release cycle.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: disable register tracking
  Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
  Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: marvell: Fix invalid comparison in the resume and suspend functions

This bug resulted in only the current mode being resumed and suspended when
the PHY supported both fiber and copper modes and when the PHY only supported
copper mode the fiber mode would incorrectly be attempted to be resumed and
suspended.

Fixes: 3758be3dc162 ("Marvell phy: add functions to suspend and resume both interfaces: fiber and copper links.")
Signed-off-by: Kurt Cancemi <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

block: fix rq-qos breakage from skipping rq_qos_done_bio()

a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't
tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set.
While this fixed a potential oops, it also broke blk-iocost by skipping the
done_bio callback for merged bios.

Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(),
rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED
distinguishing the former from the latter. rq_qos_done_bio() is not called
for bios which wenth through rq_qos_merge(). This royally confuses
blk-iocost as the merged bios never finish and are considered perpetually
in-flight.

One reliably reproducible failure mode is an intermediate cgroup geting
stuck active preventing its children from being activated due to the
leaf-only rule, leading to loss of control. The following is from
resctl-bench protection scenario which emulates isolating a web server like
workload from a memory bomb run on an iocost configuration which should
yield a reasonable level of protection.

  # cat /sys/block/nvme2n1/device/model
  Samsung SSD 970 PRO 512GB
  # cat /sys/fs/cgroup/io.cost.model
  259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025
  # cat /sys/fs/cgroup/io.cost.qos
  259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00
  # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
  ...
  Memory Hog Summary
  ==================

  IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m
              W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m

  Isolation and Request Latency Impact Distributions:

                min   p01   p05   p10   p25   p50   p75   p90   p95   p99   max  mean stdev
  isol%       15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82
  lat-imp%        0     0     0     0     0  4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6

  Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96%

The isolation result of 58.12% is close to what this device would show
without any IO control.

Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and
calling rq_qos_done_bio() on them too. For consistency and clarity, rename
BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into
rq_qos_done_bio() so that it's next to the code paths that set the flags.

With the patch applied, the above same benchmark shows:

  # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
  ...
  Memory Hog Summary
  ==================

  IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m
              W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m

  Isolation and Request Latency Impact Distributions:

                min   p01   p05   p10   p25   p50   p75   p90   p95   p99   max  mean stdev
  isol%       84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42  2.81
  lat-imp%        0     0     0     0     0  2.81  5.73 11.11 13.92 17.53 22.61  4.10  4.68

  Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0%

Signed-off-by: Tejun Heo <[email protected]>
Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked")
Cc: [email protected] # v5.15+
Cc: Ming Lei <[email protected]>
Cc: Yu Kuai <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: release rq qos structures for queue without disk

blkcg_init_queue() may add rq qos structures to request queue, previously
blk_cleanup_queue() calls rq_qos_exit() to release them, but commit
8e141f9eb803 ("block: drain file system I/O on del_gendisk")
moves rq_qos_exit() into del_gendisk(), so memory leak is caused
because queues may not have disk, such as un-present scsi luns, nvme
admin queue, ...

Fixes the issue by adding rq_qos_exit() to blk_cleanup_queue() back.

BTW, v5.18 won't need this patch any more since we move
blkcg_init_queue()/blkcg_exit_queue() into disk allocation/release
handler, and patches have been in for-5.18/block.

Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk")
Reported-by: [email protected]
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'for-next/spectre-bhb' into for-next/core

Merge in the latest Spectre mess to fix up conflicts with what was
already queued for 5.18 when the embargo finally lifted.

* for-next/spectre-bhb: (21 commits)
  arm64: Do not include __READ_ONCE() block in assembly files
  arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  arm64: Use the clearbhb instruction in mitigations
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
  arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
  ...

Merge branch 'for-next/fpsimd' into for-next/core

* for-next/fpsimd:
  arm64: cpufeature: Warn if we attempt to read a zero width field
  arm64: cpufeature: Add missing .field_width for GIC system registers
  arm64: signal: nofpsimd: Do not allocate fp/simd context when not available
  arm64: cpufeature: Always specify and use a field width for capabilities
  arm64: Always use individual bits in CPACR floating point enables
  arm64: Define CPACR_EL1_FPEN similarly to other floating point controls

Merge branch 'for-next/strings' into for-next/core

* for-next/strings:
  Revert "arm64: Mitigate MTE issues with str{n}cmp()"
  arm64: lib: Import latest version of Arm Optimized Routines' strncmp
  arm64: lib: Import latest version of Arm Optimized Routines' strcmp

Merge branch 'for-next/rng' into for-next/core

* for-next/rng:
arm64: random: implement arch_get_random_int/_long based on RNDR

Merge branch 'for-next/perf' into for-next/core

* for-next/perf: (25 commits)
  perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
  drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
  drivers/perf: arm_pmu: Handle 47 bit counters
  arm64: perf: Consistently make all event numbers as 16-bits
  arm64: perf: Expose some Armv9 common events under sysfs
  perf/marvell: cn10k DDR perf event core ownership
  perf/marvell: cn10k DDR perfmon event overflow handling
  perf/marvell: CN10k DDR performance monitor support
  dt-bindings: perf: marvell: cn10k ddr performance monitor
  perf/arm-cmn: Update watchpoint format
  perf/arm-cmn: Hide XP PUB events for CMN-600
  perf: replace bitmap_weight with bitmap_empty where appropriate
  perf: Replace acpi_bus_get_device()
  perf/marvell_cn10k: Fix unused variable warning when W=1 and CONFIG_OF=n
  perf/arm-cmn: Make arm_cmn_debugfs static
  perf: MARVELL_CN10K_TAD_PMU should depend on ARCH_THUNDER
  perf/arm-ccn: Use platform_get_irq() to get the interrupt
  irqchip/apple-aic: Move PMU-specific registers to their own include file
  arm64: dts: apple: Add t8303 PMU nodes
  arm64: dts: apple: Add t8103 PMU interrupt affinities
  ...

Merge branch 'for-next/pauth' into for-next/core

* for-next/pauth:
  arm64: Add support of PAuth QARMA3 architected algorithm
  arm64: cpufeature: Mark existing PAuth architected algorithm as QARMA5
  arm64: cpufeature: Account min_field_value when cheking secondaries for PAuth

Merge branch 'for-next/mte' into for-next/core

* for-next/mte:
  docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
  arm64/mte: Remove asymmetric mode from the prctl() interface
  kasan: fix a missing header include of static_keys.h
  arm64/mte: Add userspace interface for enabling asymmetric mode
  arm64/mte: Add hwcap for asymmetric mode
  arm64/mte: Add a little bit of documentation for mte_update_sctlr_user()
  arm64/mte: Document ABI for asymmetric mode
  arm64: mte: avoid clearing PSTATE.TCO on entry unless necessary
  kasan: split kasan_*enabled() functions into a separate header

Merge branch 'for-next/mm' into for-next/core

* for-next/mm:
  Documentation: vmcoreinfo: Fix htmldocs warning
  arm64/mm: Drop use_1G_block()
  arm64: avoid flushing icache multiple times on contiguous HugeTLB
  arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges
  arm64/hugetlb: Define __hugetlb_valid_size()
  arm64/mm: avoid fixmap race condition when create pud mapping
  arm64/mm: Consolidate TCR_EL1 fields

Merge branch 'for-next/misc' into for-next/core

* for-next/misc:
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  arm64: clean up tools Makefile
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  arm64: prevent instrumentation of bp hardening callbacks
  arm64: cpufeature: Remove cpu_has_fwb() check
  arm64: atomics: remove redundant static branch
  arm64: entry: Save some nops when CONFIG_ARM64_PSEUDO_NMI is not set

Merge branch 'for-next/linkage' into for-next/core

* for-next/linkage:
  arm64: module: remove (NOLOAD) from linker script
  linkage: remove SYM_FUNC_{START,END}_ALIAS()
  x86: clean up symbol aliasing
  arm64: clean up symbol aliasing
  linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()

Merge branch 'for-next/kselftest' into for-next/core

* for-next/kselftest:
  kselftest/arm64: Log the PIDs of the parent and child in sve-ptrace
  kselftest/arm64: signal: Allow tests to be incompatible with features
  kselftest/arm64: mte: user_mem: test a wider range of values
  kselftest/arm64: mte: user_mem: add more test types
  kselftest/arm64: mte: user_mem: add test type enum
  kselftest/arm64: mte: user_mem: check different offsets and sizes
  kselftest/arm64: mte: user_mem: rework error handling
  kselftest/arm64: mte: user_mem: introduce tag_offset and tag_len
  kselftest/arm64: Remove local definitions of MTE prctls
  kselftest/arm64: Remove local ARRAY_SIZE() definitions

Merge branch 'for-next/insn' into for-next/core

* for-next/insn:
  arm64: insn: add encoders for atomic operations
  arm64: move AARCH64_BREAK_FAULT into insn-def.h
  arm64: insn: Generate 64 bit mask immediates correctly

Merge branch 'for-next/errata' into for-next/core

* for-next/errata:
arm64: Add cavium_erratum_23154_cpus missing sentinel
irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

Merge branch 'for-next/docs' into for-next/core

* for-next/docs:
arm64/mte: Clarify mode reported by PR_GET_TAGGED_ADDR_CTRL
arm64: booting.rst: Clarify on requiring non-secure EL2

Merge branch 'for-next/coredump' into for-next/core

* for-next/coredump:
  arm64: Change elfcore for_each_mte_vma() to use VMA iterator
  arm64: mte: Document the core dump file format
  arm64: mte: Dump the MTE tags in the core file
  arm64: mte: Define the number of bytes for storing the tags in a page
  elf: Introduce the ARM MTE ELF segment type
  elfcore: Replace CONFIG_{IA64, UML} checks with a new option

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
"A last minute regression fix.

  I thought we did a lot of testing, but a regression still managed to
  sneak in. The fix seems trivial"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: allow batching hint without size

Merge tag 'v5.17-rc8' into irq/core, to fix conflicts

Conflicts:
drivers/pinctrl/pinctrl-starfive.c

Signed-off-by: Ingo Molnar <[email protected]>

esp6: fix check on ipv6_skip_exthdr's return value

Commit 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr")
introduced an incorrect check, which leads to all ESP packets over
either TCPv6 or UDPv6 encapsulation being dropped. In this particular
case, offset is negative, since skb->data points to the ESP header in
the following chain of headers, while skb->network_header points to
the IPv6 header:

IPv6 | ext | ... | ext | UDP | ESP | ...

That doesn't seem to be a problem, especially considering that if we
reach esp6_input_done2, we're guaranteed to have a full set of headers
available (otherwise the packet would have been dropped earlier in the
stack). However, it means that the return value will (intentionally)
be negative. We can make the test more specific, as the expected
return value of ipv6_skip_exthdr will be the (negated) size of either
a UDP header, or a TCP header with possible options.

In the future, we should probably either make ipv6_skip_exthdr
explicitly accept negative offsets (and adjust its return value for
error cases), or make ipv6_skip_exthdr only take non-negative
offsets (and audit all callers).

Fixes: 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr")
Reported-by: Xiumei Mu <[email protected]>
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>

net: dsa: microchip: add spi_device_id tables

Add spi_device_id tables to avoid logs like "SPI driver ksz9477-switch
has no spi_device_id".

Signed-off-by: Claudiu Beznea <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'irqchip-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

  - Add support for the STM32MP13 variant

  - Move parent device away from struct irq_chip

  - Remove all instances of non-const strings assigned to
    struct irq_chip::name, enabling a nice cleanup for VIC and GIC)

  - Simplify the Qualcomm PDC driver

  - A bunch of SiFive PLIC cleanups

  - Add support for a new variant of the Meson GPIO block

  - Add support for the irqchip side of the Apple M1 PMU

  - Add support for the Apple M1 Pro/Max AICv2 irqchip

  - Add support for the Qualcomm MPM wakeup gadget

  - Move the Xilinx driver over to the generic irqdomain handling

  - Tiny speedup for IPIs on GICv3 systems

  - The usual odd cleanups

Link: https://lore.kernel.org/all/[email protected]

Merge tag 'timers-v5.18-rc1' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core

Pull clocksource/events updates from Daniel Lezcano:

  - Fix return error code check for the timer-of layer when getting
    the base address (Guillaume Ranquet)

  - Remove MMIO dependency, add notrace annotation for sched_clock
    and increase the timer resolution for the Microchip
    PIT64b (Claudiu Beznea)

  - Convert DT bindings to yaml for the Tegra timer (David Heidelberg)

  - Fix compilation error on architecture other than ARM for the
    i.MX TPM (Nathan Chancellor)

  - Add support for the event stream scaling for 1GHz counter on
    the arch ARM timer (Marc Zyngier)

  - Support a higher number of interrupts by the Exynos MCT timer
    driver (Alim Akhtar)

  - Detect and prevent memory corruption when the specified number
    of interrupts in the DTS is greater than the array size in the
    code for the Exynos MCT timer (Krzysztof Kozlowski)

  - Fix regression from a previous errata fix on the TI DM
    timer (Drew Fustini)

  - Several fixes and code improvements for the i.MX TPM
    driver (Peng Fan)

Link: https://lore.kernel.org/all/[email protected]

Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Pull tick/NOHZ updates from Frederic Weisbecker:

- A fix for rare jiffies update stalls that were reported by Paul McKenney

- Tick side cleanups after RCU_FAST_NO_HZ removal

- Handle softirqs on idle more gracefully

Link: https://lore.kernel.org/all/[email protected]

nvmet: use snprintf() with PAGE_SIZE in configfs

Instead of using sprintf, use snprintf with buffer size limited to
PAGE_SIZE just like what we have for the rest of the file.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvmet: don't fold lines

Don't fold line that can fit into 80 char limit. No functional change
in this patch.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal

This fixes following kernel-doc warning:-

drivers/nvme/target/rdma.c:1722: warning: expecting prototype for nvme_rdma_device_removal(). Prototype was for nvmet_rdma_device_removal() instead

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport

This fixes following kernel-doc warning:-

drivers/nvme/target/fc.c:1619: warning: expecting prototype for nvme_fc_unregister_targetport(). Prototype was for nvmet_fc_unregister_targetport() instead

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport

This fixes following kernel-doc warning :-

drivers/nvme/target/fc.c:1365: warning: expecting prototype for nvme_fc_register_targetport(). Prototype was for nvmet_fc_register_targetport() instead

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-tcp: lockdep: annotate in-kernel sockets

Put NVMe/TCP sockets in their own class to avoid some lockdep warnings.
Sockets created by nvme-tcp are not exposed to user-space, and will not
trigger certain code paths that the general socket API exposes.

Lockdep complains about a circular dependency between the socket and
filesystem locks, because setsockopt can trigger a page fault with a
socket lock held, but nvme-tcp sends requests on the socket while file
system locks are held.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.15.0-rc3 #1 Not tainted
  ------------------------------------------------------
  fio/1496 is trying to acquire lock:
  (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x80

  but task is already holding lock:
  (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]

  which lock already depends on the new lock.

  other info that might help us debug this:

  chain exists of:
   sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xfs_dir_ilock_class/5);
                                lock(sb_internal);
                                lock(&xfs_dir_ilock_class/5);
   lock(sk_lock-AF_INET);

  *** DEADLOCK ***

  6 locks held by fio/1496:
   #0: (sb_writers#13){.+.+}-{0:0}, at: path_openat+0x9fc/0xa20
   #1: (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x296/0xa20
   #2: (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc_icreate+0x41/0xd0 [xfs]
   #3: (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]
   #4: (hctx->srcu){....}-{0:0}, at: hctx_lock+0x51/0xd0
   #5: (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x33e/0x380 [nvme_tcp]

This annotation lets lockdep analyze nvme-tcp controlled sockets
independently of what the user-space sockets API does.

Link: https://lore.kernel.org/linux-nvme/CAHj4cs9MDYLJ+q+2_GXUK9HxFizv2pxUryUR0toX974M040z7g@mail.gmail.com/
Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-tcp: don't fold the line

The call to nvme_tcp_alloc_queue() fits perfectly in one line without
exceeding 80 char limit for the line.

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-tcp: don't initialize ret variable

No point in initializing ret variable to 0 in nvme_tcp_start_io_queue()
since it gets overwritten by a call to nvme_tcp_start_queue().

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio

Use bio_io_error() here since bio_io_error does the same thing.

Signed-off-by: Guoqing Jiang <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-multipath: use vmalloc for ANA log buffer

The ANA log buffer can get really large, as it depends on the
controller configuration. So to avoid an out-of-memory issue
during scanning use kvmalloc() instead of the kmalloc().

Signed-off-by: Hannes Reinecke <[email protected]>
Tested-by: Daniel Wagner <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST

This patch turns the new SHA driver into a tristate and also allows
compile testing.

Signed-off-by: Herbert Xu <[email protected]>

MAINTAINERS: update HPRE/SEC2/TRNG driver maintainers list

Zaibo moved projects and is not looking into crypto stuff.
I am responsible for checking the patches of these modules.
so the maintainers list needs to be updated.

I take care of HPRE, Qian Weili take care of TRNG,
Ye Kai and me take care of SEC2.

Signed-off-by: Longfang Liu <[email protected]>
Signed-off-by: Kai Ye <[email protected]>
Signed-off-by: Weili Qian <[email protected]>
Signed-off-by: Zaibo Xu <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

crypto: dh - Remove the unused function dh_safe_prime_dh_alg()

Fix the following W=1 kernel warnings:

crypto/dh.c:311:31: warning: unused function 'dh_safe_prime_dh_alg'
[-Wunused-function]

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

hwrng: nomadik - Change clk_disable to clk_disable_unprepare

The corresponding API for clk_prepare_enable is clk_disable_unprepare,
other than clk_disable_unprepare.

Fix this by changing clk_disable to clk_disable_unprepare.

Fixes: beca35d05cc2 ("hwrng: nomadik - use clk_prepare_enable()")
Signed-off-by: Miaoqian Lin <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

crypto: qcom-rng - ensure buffer for generate is completely filled

The generate function in struct rng_alg expects that the destination
buffer is completely filled if the function returns 0. qcom_rng_read()
can run into a situation where the buffer is partially filled with
randomness and the remaining part of the buffer is zeroed since
qcom_rng_generate() doesn't check the return value. This issue can
be reproduced by running the following from libkcapi:

    kcapi-rng -b 9000000 > OUTFILE

The generated OUTFILE will have three huge sections that contain all
zeros, and this is caused by the code where the test
'val & PRNG_STATUS_DATA_AVAIL' fails.

Let's fix this issue by ensuring that qcom_rng_read() always returns
with a full buffer if the function returns success. Let's also have
qcom_rng_generate() return the correct value.

Here's some statistics from the ent project
(https://www.fourmilab.ch/random/) that shows information about the
quality of the generated numbers:

    $ ent -c qcom-random-before
    Value Char Occurrences Fraction
      0           606748   0.067416
      1            33104   0.003678
      2            33001   0.003667
    ...
    253   �        32883   0.003654
    254   �        33035   0.003671
    255   �        33239   0.003693

    Total:       9000000   1.000000

    Entropy = 7.811590 bits per byte.

    Optimum compression would reduce the size
    of this 9000000 byte file by 2 percent.

    Chi square distribution for 9000000 samples is 9329962.81, and
    randomly would exceed this value less than 0.01 percent of the
    times.

    Arithmetic mean value of data bytes is 119.3731 (127.5 = random).
    Monte Carlo value for Pi is 3.197293333 (error 1.77 percent).
    Serial correlation coefficient is 0.159130 (totally uncorrelated =
    0.0).

Without this patch, the results of the chi-square test is 0.01%, and
the numbers are certainly not random according to ent's project page.
The results improve with this patch:

    $ ent -c qcom-random-after
    Value Char Occurrences Fraction
      0            35432   0.003937
      1            35127   0.003903
      2            35424   0.003936
    ...
    253   �        35201   0.003911
    254   �        34835   0.003871
    255   �        35368   0.003930

    Total:       9000000   1.000000

    Entropy = 7.999979 bits per byte.

    Optimum compression would reduce the size
    of this 9000000 byte file by 0 percent.

    Chi square distribution for 9000000 samples is 258.77, and randomly
    would exceed this value 42.24 percent of the times.

    Arithmetic mean value of data bytes is 127.5006 (127.5 = random).
    Monte Carlo value for Pi is 3.141277333 (error 0.01 percent).
    Serial correlation coefficient is 0.000468 (totally uncorrelated =
    0.0).

This change was tested on a Nexus 5 phone (msm8974 SoC).

Signed-off-by: Brian Masney <[email protected]>
Fixes: ceec5f5b5988 ("crypto: qcom-rng - Add Qcom prng driver")
Cc: [email protected] # 4.19+
Reviewed-by: Bjorn Andersson <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

Linux 5.17-rc8

drm/mgag200: Fix PLL setup for g200wb and g200ew

commit f86c3ed55920 ("drm/mgag200: Split PLL setup into compute and
update functions") introduced a regression for g200wb and g200ew.
The PLLs are not set up properly, and VGA screen stays
black, or displays "out of range" message.

MGA1064_WB_PIX_PLLC_N/M/P was mistakenly replaced with
MGA1064_PIX_PLLC_N/M/P which have different addresses.

Patch tested on a Dell T310 with g200wb

Fixes: f86c3ed55920 ("drm/mgag200: Split PLL setup into compute and update functions")
Cc: [email protected]
Signed-off-by: Jocelyn Falempe <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Free shmem backing storage for SGX enclave pages when those are
   swapped back into EPC memory

- Prevent do_int3() from being kprobed, to avoid recursion

- Remap setup_data and setup_indirect structures properly when
   accessing their members

- Correct the alternatives patching order for modules too

* tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Free backing memory after faulting the enclave page
  x86/traps: Mark do_int3() NOKPROBE_SYMBOL
  x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
  x86/boot: Fix memremap of setup_indirect structures
  x86/module: Fix the paravirt vs alternative order

random: check for signal and try earlier when generating entropy

Rather than waiting a full second in an interruptable waiter before
trying to generate entropy, try to generate entropy first and wait
second. While waiting one second might give an extra second for getting
entropy from elsewhere, we're already pretty late in the init process
here, and whatever else is generating entropy will still continue to
contribute. This has implications on signal handling: we call
try_to_generate_entropy() from wait_for_random_bytes(), and
wait_for_random_bytes() always uses wait_event_interruptible_timeout()
when waiting, since it's called by userspace code in restartable
contexts, where signals can pend. Since try_to_generate_entropy() now
runs first, if a signal is pending, it's necessary for
try_to_generate_entropy() to check for signals, since it won't hit the
wait until after try_to_generate_entropy() has returned. And even before
this change, when entering a busy loop in try_to_generate_entropy(), we
should have been checking to see if any signals are pending, so that a
process doesn't get stuck in that loop longer than expected.

Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: reseed more often immediately after booting

In order to chip away at the "premature first" problem, we augment our
existing entropy accounting with more frequent reseedings at boot.

The idea is that at boot, we're getting entropy from various places, and
we're not very sure which of early boot entropy is good and which isn't.
Even when we're crediting the entropy, we're still not totally certain
that it's any good. Since boot is the one time (aside from a compromise)
that we have zero entropy, it's important that we shepherd entropy into
the crng fairly often.

At the same time, we don't want a "premature next" problem, whereby an
attacker can brute force individual bits of added entropy. In lieu of
going full-on Fortuna (for now), we can pick a simpler strategy of just
reseeding more often during the first 5 minutes after boot. This is
still bounded by the 256-bit entropy credit requirement, so we'll skip a
reseeding if we haven't reached that, but in case entropy /is/ coming
in, this ensures that it makes its way into the crng rather rapidly
during these early stages.

Ordinarily we reseed if the previous reseeding is 300 seconds old. This
commit changes things so that for the first 600 seconds of boot time, we
reseed if the previous reseeding is uptime / 2 seconds old. That means
that we'll reseed at the very least double the uptime of the previous
reseeding.

Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: make consistent usage of crng_ready()

Rather than sometimes checking `crng_init < 2`, we should always use the
crng_ready() macro, so that should we change anything later, it's
consistent. Additionally, that macro already has a likely() around it,
which means we don't need to open code our own likely() and unlikely()
annotations.

Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: use SipHash as interrupt entropy accumulator

The current fast_mix() function is a piece of classic mailing list
crypto, where it just sort of sprung up by an anonymous author without a
lot of real analysis of what precisely it was accomplishing. As an ARX
permutation alone, there are some easily searchable differential trails
in it, and as a means of preventing malicious interrupts, it completely
fails, since it xors new data into the entire state every time. It can't
really be analyzed as a random permutation, because it clearly isn't,
and it can't be analyzed as an interesting linear algebraic structure
either, because it's also not that. There really is very little one can
say about it in terms of entropy accumulation. It might diffuse bits,
some of the time, maybe, we hope, I guess. But for the most part, it
fails to accomplish anything concrete.

As a reminder, the simple goal of add_interrupt_randomness() is to
simply accumulate entropy until ~64 interrupts have elapsed, and then
dump it into the main input pool, which uses a cryptographic hash.

It would be nice to have something cryptographically strong in the
interrupt handler itself, in case a malicious interrupt compromises a
per-cpu fast pool within the 64 interrupts / 1 second window, and then
inside of that same window somehow can control its return address and
cycle counter, even if that's a bit far fetched. However, with a very
CPU-limited budget, actually doing that remains an active research
project (and perhaps there'll be something useful for Linux to come out
of it). And while the abundance of caution would be nice, this isn't
*currently* the security model, and we don't yet have a fast enough
solution to make it our security model. Plus there's not exactly a
pressing need to do that. (And for the avoidance of doubt, the actual
cluster of 64 accumulated interrupts still gets dumped into our
cryptographically secure input pool.)

So, for now we are going to stick with the existing interrupt security
model, which assumes that each cluster of 64 interrupt data samples is
mostly non-malicious and not colluding with an infoleaker. With this as
our goal, we have a few more choices, simply aiming to accumulate
entropy, while discarding the least amount of it.

We know from <https://eprint.iacr.org/2019/198> that random oracles,
instantiated as computational hash functions, make good entropy
accumulators and extractors, which is the justification for using
BLAKE2s in the main input pool. As mentioned, we don't have that luxury
here, but we also don't have the same security model requirements,
because we're assuming that there aren't malicious inputs. A
pseudorandom function instance can approximately behave like a random
oracle, provided that the key is uniformly random. But since we're not
concerned with malicious inputs, we can pick a fixed key, which is not
secret, knowing that "nature" won't interact with a sufficiently chosen
fixed key by accident. So we pick a PRF with a fixed initial key, and
accumulate into it continuously, dumping the result every 64 interrupts
into our cryptographically secure input pool.

For this, we make use of SipHash-1-x on 64-bit and HalfSipHash-1-x on
32-bit, which are already in use in the kernel's hsiphash family of
functions and achieve the same performance as the function they replace.
It would be nice to do two rounds, but we don't exactly have the CPU
budget handy for that, and one round alone is already sufficient.

As mentioned, we start with a fixed initial key (zeros is fine), and
allow SipHash's symmetry breaking constants to turn that into a useful
starting point. Also, since we're dumping the result (or half of it on
64-bit so as to tax our hash function the same amount on all platforms)
into the cryptographically secure input pool, there's no point in
finalizing SipHash's output, since it'll wind up being finalized by
something much stronger. This means that all we need to do is use the
ordinary round function word-by-word, as normal SipHash does.
Simplified, the flow is as follows:

Initialize:

    siphash_state_t state;
    siphash_init(&state, key={0, 0, 0, 0});

Update (accumulate) on interrupt:

    siphash_update(&state, interrupt_data_and_timing);

Dump into input pool after 64 interrupts:

    blake2s_update(&input_pool, &state, sizeof(state) / 2);

The result of all of this is that the security model is unchanged from
before -- we assume non-malicious inputs -- yet we now implement that
model with a stronger argument. I would like to emphasize, again, that
the purpose of this commit is to improve the existing design, by making
it analyzable, without changing any fundamental assumptions. There may
well be value down the road in changing up the existing design, using
something cryptographically strong, or simply using a ring buffer of
samples rather than having a fast_mix() at all, or changing which and
how much data we collect each interrupt so that we can use something
linear, or a variety of other ideas. This commit does not invalidate the
potential for those in the future.

For example, in the future, if we're able to characterize the data we're
collecting on each interrupt, we may be able to inch toward information
theoretic accumulators. <https://eprint.iacr.org/2021/523> shows that `s
= ror32(s, 7) ^ x` and `s = ror64(s, 19) ^ x` make very good
accumulators for 2-monotone distributions, which would apply to
timestamp counters, like random_get_entropy() or jiffies, but would not
apply to our current combination of the two values, or to the various
function addresses and register values we mix in. Alternatively,
<https://eprint.iacr.org/2021/1002> shows that max-period linear
functions with no non-trivial invariant subspace make good extractors,
used in the form `s = f(s) ^ x`. However, this only works if the input
data is both identical and independent, and obviously a collection of
address values and counters fails; so it goes with theoretical papers.
Future directions here may involve trying to characterize more precisely
what we actually need to collect in the interrupt handler, and building
something specific around that.

However, as mentioned, the morass of data we're gathering at the
interrupt handler presently defies characterization, and so we use
SipHash for now, which works well and performs well.

Cc: Theodore Ts'o <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Jean-Philippe Aumasson <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

wireguard: device: clear keys on VM fork

When a virtual machine forks, it's important that WireGuard clear
existing sessions so that different plaintexts are not transmitted using
the same key+nonce, which can result in catastrophic cryptographic
failure. To accomplish this, we simply hook into the newly added vmfork
notifier.

As a bonus, it turns out that, like the vmfork registration function,
the PM registration function is stubbed out when CONFIG_PM_SLEEP is not
set, so we can actually just remove the maze of ifdefs, which makes it
really quite clean to support both notifiers at once.

Cc: Dominik Brodowski <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: provide notifier for VM fork

Drivers such as WireGuard need to learn when VMs fork in order to clear
sessions. This commit provides a simple notifier_block for that, with a
register and unregister function. When no VM fork detection is compiled
in, this turns into a no-op, similar to how the power notifier works.

Cc: Dominik Brodowski <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: replace custom notifier chain with standard one

We previously rolled our own randomness readiness notifier, which only
has two users in the whole kernel. Replace this with a more standard
atomic notifier block that serves the same purpose with less code. Also
unexport the symbols, because no modules use it, only unconditional
builtins. The only drawback is that it's possible for a notification
handler returning the "stop" code to prevent further processing, but
given that there are only two users, and that we're unexporting this
anyway, that doesn't seem like a significant drawback for the
simplification we receive here.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: do not export add_vmfork_randomness() unless needed

Since add_vmfork_randomness() is only called from vmgenid.o, we can
guard it in CONFIG_VMGENID, similarly to how we do with
add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things
calling into add_vmfork_randomness(), we can add another shared Kconfig
symbol for that, but for now, this is good enough. Even though
add_vmfork_randomess() is a pretty small function, removing it means
that there are only calls to crng_reseed(false) and none to
crng_reseed(true), which means the compiler can constant propagate the
false, removing branches from crng_reseed() and its descendants.

Additionally, we don't even need the symbol to be exported if
CONFIG_VMGENID is not a module, so conditionalize that too.

Cc: Dominik Brodowski <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

virt: vmgenid: notify RNG of VM fork and supply generation ID

VM Generation ID is a feature from Microsoft, described at
<https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
<https://aka.ms/win10rng>, as:

    If the OS is running in a VM, there is a problem that most
    hypervisors can snapshot the state of the machine and later rewind
    the VM state to the saved state. This results in the machine running
    a second time with the exact same RNG state, which leads to serious
    security problems.  To reduce the window of vulnerability, Windows
    10 on a Hyper-V VM will detect when the VM state is reset, retrieve
    a unique (not random) value from the hypervisor, and reseed the root
    RNG with that unique value.  This does not eliminate the
    vulnerability, but it greatly reduces the time during which the RNG
    system will produce the same outputs as it did during a previous
    instantiation of the same VM state.

Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.

It can be used from qemu via the `-device vmgenid,guid=auto` parameter.
After setting that, use `savevm` in the monitor to save the VM state,
then quit QEMU, start it again, and use `loadvm`. That will trigger this
driver's notify function, which hands the new UUID to the RNG. This is
described in <https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.
And there are hooks for this in libvirt as well, described in
<https://libvirt.org/formatdomain.html#general-metadata>.

Note, however, that the treatment of this as a UUID is considered to be
an accidental QEMU nuance, per
<https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt>,
so this driver simply treats these bytes as an opaque 128-bit binary
blob, as per the spec. This doesn't really make a difference anyway,
considering that's how it ends up when handed to the RNG in the end.

Cc: Alexander Graf <[email protected]>
Cc: Adrian Catangiu <[email protected]>
Cc: Daniel P. Berrangé <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Wei Yongjun <[email protected]>
Tested-by: Souradeep Chakrabarti <[email protected]> # With Hyper-V's virtual hardware
Reviewed-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Laszlo Ersek <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

ACPI: allow longer device IDs

We create a list of ACPI "PNP" IDs which contains _HID, _CID, and CLS
entries of the respective devices. However, when making structs for
matching, we squeeze those IDs into acpi_device_id, which only has 9
bytes space to store the identifier. The subsystem actually captures the
full length of the IDs, and the modalias has the full length, but this
struct we use for matching is limited. It originally had 16 bytes, but
was changed to only have 9 in 6543becf26ff ("mod/file2alias: make
modalias generation safe for cross compiling"), presumably on the theory
that it would match the ACPI spec so it didn't matter.

Unfortunately, while most people adhere to the ACPI specs, Microsoft
decided that its VM Generation Counter device [1] should only be
identifiable by _CID with a value of "VM_Gen_Counter", which is longer
than 9 characters.

To allow device drivers to match identifiers that exceed the 9 byte
limit, this simply ups the length to 16, just like it was before the
aforementioned commit. Empirical testing indicates that this
doesn't actually increase vmlinux size on 64-bit, because the ulong in
the same struct caused there to be 7 bytes of padding anyway, and when
doing a s/M/Y/g i386_defconfig build, the bzImage only increased by
0.0055%, so negligible.

This patch is a prerequisite to add support for VMGenID in Linux, the
subsequent patch in this series. It has been confirmed to also work on
the udev/modalias side in userspace.

[1] https://download.microsoft.com/download/3/1/C/31CFC307-98CA-4CA5-914C-D9772691E214/VirtualMachineGenerationID.docx

Signed-off-by: Alexander Graf <[email protected]>
Co-developed-by: Jason A. Donenfeld <[email protected]>
[Jason: reworked commit message a bit, went with len=16 approach.]
Cc: Mika Westerberg <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Acked-by: Hans de Goede <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: add mechanism for VM forks to reinitialize crng

When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API, add_vmfork_
randomness(), for that, by force reseeding the crng.

This has the added benefit of also draining the entropy pool and setting
its timer back, so that any old entropy that was there prior -- which
could have already been used by a different fork, or generally gone
stale -- does not contribute to the accounting of the next 256 bits.

Cc: Dominik Brodowski <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Eric Biggers <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: don't let 644 read-only sysctls be written to

We leave around these old sysctls for compatibility, and we keep them
"writable" for compatibility, but even after writing, we should keep
reporting the same value. This is consistent with how userspaces tend to
use sysctl_random_write_wakeup_bits, writing to it, and then later
reading from it and using the value.

Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: give sysctl_random_min_urandom_seed a more sensible value

This isn't used by anything or anywhere, but we can't delete it due to
compatibility. So at least give it the correct value of what it's
supposed to be instead of a garbage one.

Cc: Theodore Ts'o <[email protected]>
Reviewed-by: Dominik Brodowski <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

random: block in /dev/urandom

This topic has come up countless times, and usually doesn't go anywhere.
This time I thought I'd bring it up with a slightly narrower focus,
updated for some developments over the last three years: we finally can
make /dev/urandom always secure, in light of the fact that our RNG is
now always seeded.

Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
rather than passively wait for it"), the RNG does a haveged-style jitter
dance around the scheduler, in order to produce entropy (and credit it)
for the case when we're stuck in wait_for_random_bytes(). How ever you
feel about the Linus Jitter Dance is beside the point: it's been there
for three years and usually gets the RNG initialized in a second or so.

As a matter of fact, this is what happens currently when people use
getrandom(). It's already there and working, and most people have been
using it for years without realizing.

So, given that the kernel has grown this mechanism for seeding itself
from nothing, and that this procedure happens pretty fast, maybe there's
no point any longer in having /dev/urandom give insecure bytes. In the
past we didn't want the boot process to deadlock, which was
understandable. But now, in the worst case, a second goes by, and the
problem is resolved. It seems like maybe we're finally at a point when
we can get rid of the infamous "urandom read hole".

The one slight drawback is that the Linus Jitter Dance relies on random_
get_entropy() being implemented. The first lines of try_to_generate_
entropy() are:

stack.now = random_get_entropy();
if (stack.now == random_get_entropy())
return;

On most platforms, random_get_entropy() is simply aliased to get_cycles().
The number of machines without a cycle counter or some other
implementation of random_get_entropy() in 2022, which can also run a
mainline kernel, and at the same time have a both broken and out of date
userspace that relies on /dev/urandom never blocking at boot is thought
to be exceedingly low. And to be clear: those museum pieces without
cycle counters will continue to run Linux just fine, and even
/dev/urandom will be operable just like before; the RNG just needs to be
seeded first through the usual means, which should already be the case
now.

On systems that really do want unseeded randomness, we already offer
getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
their hash tables at boot. Nothing in this commit would affect
GRND_INSECURE, and it remains the means of getting those types of random
numbers.

This patch goes a long way toward eliminating a long overdue userspace
crypto footgun. After several decades of endless user confusion, we will
finally be able to say, "use any single one of our random interfaces and
you'll be fine. They're all the same. It doesn't matter." And that, I
think, is really something. Finally all of those blog posts and
disagreeing forums and contradictory articles will all become correct
about whatever they happened to recommend, and along with it, a whole
class of vulnerabilities eliminated.

With very minimal downside, we're finally in a position where we can
make this change.

Cc: Dinh Nguyen <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Joshua Kinard <[email protected]>
Cc: David Laight <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Lennart Poettering <[email protected]>
Cc: Konstantin Ryabitsev <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

Merge tag 'perf-tools-fixes-for-v5.17-2022-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix event parser error for hybrid systems

- Fix NULL check against wrong variable in 'perf bench' and in the
   parsing code

- Update arm64 KVM headers from the kernel sources

- Sync cpufeatures header with the kernel sources

* tag 'perf-tools-fixes-for-v5.17-2022-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf parse: Fix event parser error for hybrid systems
  perf bench: Fix NULL check against wrong variable
  perf parse-events: Fix NULL check against wrong variable
  tools headers cpufeatures: Sync with the kernel sources
  tools kvm headers arm64: Update KVM headers from the kernel sources

Merge tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drm

Pull drm kconfig fix from Dave Airlie:
"Thorsten pointed out this had fallen down the cracks and was in -next
  only, I've picked it out, fixed up it's Fixes: line.

   - fix regression in Kconfig"

* tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drm:
  drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP

netfilter: nf_tables: disable register tracking

The register tracking infrastructure is incomplete, it might lead to
generating incorrect ruleset bytecode, disable it by now given we are
late in the release process.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

perf parse: Fix event parser error for hybrid systems

This bug happened on hybrid systems when both cpu_core and cpu_atom
have the same event name such as "UOPS_RETIRED.MS" while their event
terms are different, then during perf stat, the event for cpu_atom
will parse fail and then no output for cpu_atom.

UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/

It is because event terms in the "head" of parse_events_multi_pmu_add
will be changed to event terms for cpu_core after parsing UOPS_RETIRED.MS
for cpu_core, then when parsing the same event for cpu_atom, it still
uses the event terms for cpu_core, but event terms for cpu_atom are
different with cpu_core, the event parses for cpu_atom will fail. This
patch fixes it, the event terms should be parsed from the original
event.

This patch can work for the hybrid systems that have the same event
in more than 2 PMUs. It also can work in non-hybrid systems.

Before:

  # perf stat -v  -e  UOPS_RETIRED.MS  -a sleep 1

  Using CPUID GenuineIntel-6-97-1
  UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
  Control descriptor is not initialized
  UOPS_RETIRED.MS: 2737845 16068518485 16068518485

Performance counter stats for 'system wide':

         2,737,845      cpu_core/UOPS_RETIRED.MS/

       1.002553850 seconds time elapsed

After:

  # perf stat -v  -e  UOPS_RETIRED.MS  -a sleep 1

  Using CPUID GenuineIntel-6-97-1
  UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
  UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/
  Control descriptor is not initialized
  UOPS_RETIRED.MS: 1977555 16076950711 16076950711
  UOPS_RETIRED.MS: 568684 8038694234 8038694234

Performance counter stats for 'system wide':

         1,977,555      cpu_core/UOPS_RETIRED.MS/
           568,684      cpu_atom/UOPS_RETIRED.MS/

       1.004758259 seconds time elapsed

Fixes: fb0811535e92c6c1 ("perf parse-events: Allow config on kernel PMU events")
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Zhengjun Xing <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

io_uring: remove duplicated member check for io_msg_ring_prep()

Julia and the kernel test robot report that the prep handling for this
command inadvertently checks one field twice:

fs/io_uring.c:4338:42-56: duplicated argument to && or ||

Get rid of it.

Reported-by: kernel test robot <[email protected]>
Reported-by: Julia Lawall <[email protected]>
Fixes: 4f57f06ce218 ("io_uring: add support for IORING_OP_MSG_RING command")
Signed-off-by: Jens Axboe <[email protected]>

perf bench: Fix NULL check against wrong variable

We did a NULL check after "epollfdp = calloc(...)", but we checked
"epollfd" instead of "epollfdp".

Signed-off-by: Weiguo Li <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf parse-events: Fix NULL check against wrong variable

We did a null check after "tmp->symbol = strdup(...)", but we checked
"list->symbol" other than "tmp->symbol".

Reviewed-by: John Garry <[email protected]>
Signed-off-by: Weiguo Li <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers cpufeatures: Sync with the kernel sources

To pick the changes from:

  d45476d983240937 ("x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE")

Its just a comment fixup.

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/lkml/YiyiHatGaJQM7l/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools kvm headers arm64: Update KVM headers from the kernel sources

To pick the changes from:

  a5905d6af492ee6a ("KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated")

That don't causes any changes in tooling (when built on x86), only
addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
  diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: James Morse <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP

As reported in [1], DRM_PANEL_EDP depends on DRM_DP_HELPER. Select
the option to fix the build failure. The error message is shown
below.

  arm-linux-gnueabihf-ld: drivers/gpu/drm/panel/panel-edp.o: in function
    `panel_edp_probe': panel-edp.c:(.text+0xb74): undefined reference to
    `drm_panel_dp_aux_backlight'
  make[1]: *** [/builds/linux/Makefile:1222: vmlinux] Error 1

The issue has been reported before, when DisplayPort helpers were
hidden behind the option CONFIG_DRM_KMS_HELPER. [2]

v2:
* fix and expand commit description (Arnd)

Signed-off-by: Thomas Zimmermann <[email protected]>
Fixes: 9d6366e743f3 ("drm: fb_helper: improve CONFIG_FB dependency")
Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Linux Kernel Functional Testing <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Acked-by: Sam Ravnborg <[email protected]>
Link: https://lore.kernel.org/dri-devel/CA+G9fYvN0NyaVkRQmA1O6rX7H8PPaZrUAD7=RDy33QY9rUU-9g@mail.gmail.com/
Link: https://lore.kernel.org/all/[email protected]/
Cc: Thomas Zimmermann <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Airlie <[email protected]>

vsock: each transport cycles only on its own sockets

When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.

There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.

Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.

[1] https://android.googlesource.com/device/google/cuttlefish/

Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jiyong Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

alx: acquire mutex for alx_reinit in alx_change_mtu

alx_reinit has a lockdep assertion that the alx->mtx mutex must be held.
alx_reinit is called from two places: alx_reset and alx_change_mtu.
alx_reset does acquire alx->mtx before calling alx_reinit.
alx_change_mtu does not acquire this mutex, nor do its callers or any
path towards alx_change_mtu.
Acquire the mutex in alx_change_mtu.

The issue was introduced when the fine-grained locking was introduced
to the code to replace the RTNL. The same commit also introduced the
lockdep assertion.

Fixes: 4a5fe57e7751 ("alx: use fine-grained locking instead of RTNL")
Signed-off-by: Niels Dossche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipv6: fix skb_over_panic in __ip6_append_data

Syzbot found a kernel bug in the ipv6 stack:
LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580
The reproducer triggers it by sending a crafted message via sendmmsg()
call, which triggers skb_over_panic, and crashes the kernel:

skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575
head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0
dev:<NULL>

Update the check that prevents an invalid packet with MTU equal
to the fregment header size to eat up all the space for payload.

The reproducer can be found here:
LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000
Reported-by: [email protected]
Signed-off-by: Tadeusz Struk <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Docs: ktap: add code-block type

Fix multiple "code-block::" warnings by adding "none" as the type of
code-block. Mends these warnings:

Documentation/dev-tools/ktap.rst:71: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:120: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:126: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:132: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:139: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:145: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:195: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:208: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.
Documentation/dev-tools/ktap.rst:238: WARNING: Error in "code-block" directive:
1 argument(s) required, 0 supplied.

Fixes: a32fa6b2e8b4 ("Documentation: dev-tools: Add KTAP specification")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Rae Moar <[email protected]>
Cc: David Gow <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Reviewed-by: Frank Rowand <[email protected]>
Reviewed-by: David Gow <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

docs: serial: fix a reference file name in driver.rst

Fix the following 'make refcheckdocs' warning:
Warning: Documentation/driver-api/serial/driver.rst references a file
that doesn't exist: Documentation/driver-api/serial/tty.rst

Signed-off-by: Wan Jiabing <[email protected]>
Reviewed-by: Jiri Slaby <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

docs: UML: Mention telnetd for port channel

It is not obvious from the documentation that using the "port" channel
for the console requires telnetd to be installed (see port_connection()
in arch/um/drivers/port_user.c). Mention this, and the fact that UML
will not boot until a client connects.

Signed-off-by: Vincent Whitchurch <[email protected]>
Acked-by: Anton Ivanov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

ARM: Spectre-BHB: provide empty stub for non-config

When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references
to spectre_v2_update_state() cause a build error, so provide an
empty stub for that function when the Kconfig option is not set.

Fixes this build error:

  arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init':
  proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state'
  arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state'

Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

docs/zh_CN: add damon reclaim translation

Translate .../admin-guide/mm/damon/reclaim.rst into Chinese.

Signed-off-by: Yanteng Si <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/0a436fa78814bb0a7b9c2f3049e544b1e1802560.1646899089.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

docs/zh_CN: add damon usage translation

Translate .../admin-guide/mm/damon/usage.rst into Chinese.

Signed-off-by: Yanteng Si <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/431f1c2a158c61a6556f58048cb54961ab7a8790.1646899089.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

docs/zh_CN: add admin-guide damon start translation

Translate Documentation/admin-guide/mm/damon/start.rst into Chinese.

Signed-off-by: Yanteng Si <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/e6e328be018cbf5f9105adfdad56c951acbb8c8f.1646899089.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

docs/zh_CN: add admin-guide damon index translation

Translate .../admin-guide/mm/damon/index.rst into Chinese.

Signed-off-by: Yanteng Si <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/0251f09dc926972068329b87b0563dd432849497.1646899089.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

docs/zh_CN: Refactoring the admin-guide directory index

The Todolist in the html document looks a mess, now give it a nice looking format.

Signed-off-by: Yanteng Si <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/d410408ec13d6e9cff97da50a13d793a428e05cf.1646899089.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

zh_CN: Add translation for admin-guide/mm/index.rst

Translate Documentation/admin-guide/mm/index.rst into Chinese.
Update Documentation/admin-guide/index.rst.

Reviewed-by: Yang Yang <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Reviewed-by: Yanteng Si <[email protected]>
Signed-off-by: xu xin <[email protected]>
Signed-off-by: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/2d695dac05efc012b99fbc7525be65a421c7de03.1646899056.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

zh_CN: Add translations for admin-guide/mm/ksm.rst

Translate Documentation/admin-guide/mm/ksm.rst into Chinese.

Reviewed-by: Yang Yang <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Reviewed-by: Yanteng Si <[email protected]>
Signed-off-by: xu xin <[email protected]>
Signed-off-by: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/f987a3a2cbffaad64f6e2377a5e393d9afbb099c.1646899056.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

Add Chinese translation for vm/ksm.rst

Translate Documentation/vm/ksm.rst into Chinese.
Update Documentation/translations/zh_CN/vm/index.rst.

Reviewed-by: Yang Yang <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Reviewed-by: Yanteng Si <[email protected]>
Signed-off-by: xu xin <[email protected]>
Signed-off-by: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/ceb82d6458cd79bc3b7060199db0c3518adc3b8b.1646899056.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <[email protected]>

Merge tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

- prevent users from enabling the alternatives framework (and thus
   errata handling) on XIP kernels, where runtime code patching does not
   function correctly.

- properly detect offset overflow for AUIPC-based relocations in
   modules. This may manifest as modules calling arbitrary invalid
   addresses, depending on the address allocated when a module is
   loaded.

* tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix auipc+jalr relocation range checks
  riscv: alternative only works on !XIP_KERNEL

Merge tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
"Fix STACKTRACE=n build, in particular for skiroot_defconfig"

* tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix STACKTRACE=n build

ARM: fix Thumb2 regression with Spectre BHB

When building for Thumb2, the vectors make use of a local label. Sadly,
the Spectre BHB code also uses a local label with the same number which
results in the Thumb2 reference pointing at the wrong place. Fix this
by changing the number used for the Spectre BHB local label.

Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Tested-by: Nathan Chancellor <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC core:
   - Restore (mostly) the busy polling for MMC_SEND_OP_COND

  MMC host:
   - meson-gx: Fix DMA usage of meson_mmc_post_req()"

* tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND
  mmc: meson: Fix usage of meson_mmc_post_req()

Merge branch irq/qcom-mpm into irq/irqchip-next

* irq/qcom-mpm:
  : .
  : Add support for Qualcomm's MPM wakeup controller, courtesy
  : of Shawn Guo.
  : .
  irqchip: Add Qualcomm MPM controller driver
  dt-bindings: interrupt-controller: Add Qualcomm MPM support

Signed-off-by: Marc Zyngier <[email protected]>

irqchip: Add Qualcomm MPM controller driver

Qualcomm SoCs based on the RPM architecture have a MSM Power Manager (MPM)
in always-on domain. In addition to managing resources during sleep, the
hardware also has an interrupt controller that monitors the interrupts
when the system is asleep, wakes up the APSS when one of these interrupts
occur and replays it to GIC after it becomes operational.

It adds an irqchip driver for this interrupt controller, and here are
some notes about it.

- For given SoC, a fixed number of MPM pins are supported, e.g. 96 pins
  on QCM2290.  Each of these MPM pins can be either a MPM_GIC pin or
  a MPM_GPIO pin. The mapping between MPM_GIC pin and GIC interrupt
  is defined by SoC, as well as the mapping between MPM_GPIO pin and
  GPIO number.  The former mapping is retrieved from device tree, while
  the latter is defined in TLMM pinctrl driver.

- The power domain (PD) .power_off hook is used to notify RPM that APSS
  is about to power collapse.  This requires MPM PD be the parent PD of
  CPU cluster.

- When SoC gets awake from sleep mode, the driver will receive an
  interrupt from RPM, so that it can replay interrupt for particular
  polarity.

Signed-off-by: Shawn Guo <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: interrupt-controller: Add Qualcomm MPM support

It adds DT binding support for Qualcomm MPM interrupt controller.

Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

parisc: Increase parisc_cache_flush_threshold setting

In testing the "Fix non-access data TLB cache flush faults" change,
I noticed a significant improvement in glibc build and check times.
This led me to investigate the parisc_cache_flush_threshold setting.
It determines when we switch from line flushing to whole cache flushing.

It turned out that the parisc_cache_flush_threshold setting on
mako and mako2 machines (PA8800 and PA8900 processors) was way too
small. Adjusting this setting provided almost a factor two improvement
in the glibc build and check time.

Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

parisc/unaligned: Enhance user-space visible output

Userspace is up to now limited to 32-bit, so it's sufficient to print
only 32-bit values when showing pointer addresses.

Signed-off-by: Helge Deller <[email protected]>

parisc/unaligned: Rewrite 32-bit inline assembly of emulate_sth()

Convert to use real temp variables instead of clobbering processor
registers.

Signed-off-by: Helge Deller <[email protected]>

parisc/unaligned: Rewrite 32-bit inline assembly of emulate_ldd()

Convert to use real temp variables instead of clobbering processor
registers.

Signed-off-by: Helge Deller <[email protected]>

parisc/unaligned: Rewrite inline assembly of emulate_ldw()

Convert to use real temp variables instead of clobbering processor
registers.

Signed-off-by: Helge Deller <[email protected]>

parisc/unaligned: Rewrite inline assembly of emulate_ldh()

Convert to use real temp variables instead of clobbering processor
registers.

Signed-off-by: Helge Deller <[email protected]>

parisc/unaligned: Use EFAULT fixup handler in unaligned handlers

Convert the inline assembly code to use the automatic EFAULT exception
handler. With that the fixup code can be dropped.

The other change is to allow double-word only when a 64-bit kernel is
used instead of depending on CONFIG_PA20.

Signed-off-by: Helge Deller <[email protected]>

parisc: Reduce code size by optimizing get_current() function calls

The get_current() code uses the mfctl() macro to get the pointer to the
current task struct from %cr30. The problem with the mfctl() macro is,
that it is marked volatile which is basically correct, because mfctl()
is used to get e.g. the current internal timer or interrupt flags as
well.

But specifically the task struct pointer (%cr30) doesn't change over
time when the kernel executes code for a task.

So, by dropping the volatile when retrieving %cr30 the compiler is now
able to get this value only once and optimize the generated code a lot.

A bloat-o-meter comparism shows that this patch saves ~5kB kernel code
on a 32-bit kernel and ~6kB kernel code on a 64-bit kernel.

Signed-off-by: Helge Deller <[email protected]>