Git Repo - linux.git/log

Merge tag 'sound-fix-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Here are a few remaining patches for 6.1-rc1.

  The major changes are the hibernation fixes for HD-audio CS35L41 codec
  and the USB-audio small fixes against the last change. In addition, a
  couple of HD-audio regression fixes and a couple of potential
  mutex-deadlock fixes with OSS emulation in ALSA core side are seen"

* tag 'sound-fix-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: cs35l41: Support System Suspend
  ALSA: hda: cs35l41: Remove suspend/resume hda hooks
  ALSA: hda/cs_dsp_ctl: Fix mutex inversion when creating controls
  ALSA: hda: hda_cs_dsp_ctl: Ensure pwr_lock is held before reading/writing controls
  ALSA: hda: hda_cs_dsp_ctl: Minor clean and redundant code removal
  ALSA: oss: Fix potential deadlock at unregistration
  ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
  ALSA: hda/realtek: Add Intel Reference SSID to support headset keys
  ALSA: hda/realtek: Add quirk for ASUS GV601R laptop
  ALSA: hda/realtek: Correct pin configs for ASUS G533Z
  ALSA: usb-audio: Avoid superfluous endpoint setup
  ALSA: usb-audio: Correct the return code from snd_usb_endpoint_set_params()
  ALSA: usb-audio: Apply mutex around snd_usb_endpoint_set_params()
  ALSA: usb-audio: Avoid unnecessary interface change at EP close
  ALSA: hda: Update register polling macros
  ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530

Merge tag 'leds-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds

Pull LED updates from Pavel Machek:
"This is very quiet release for LEDs, pca963 got blinking support and
  that's pretty much it"

* tag 'leds-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds:
  leds: pca963: fix misleading indentation
  dt-bindings: leds: Document mmc trigger
  leds: pca963x: fix blink with hw acceleration

Merge tag 'sched-psi-2022-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull PSI updates from Ingo Molnar:

- Various performance optimizations, resulting in a 4%-9% speedup in
   the mmtests/config-scheduler-perfpipe micro-benchmark.

- New interface to turn PSI on/off on a per cgroup level.

* tag 'sched-psi-2022-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/psi: Per-cgroup PSI accounting disable/re-enable interface
  sched/psi: Cache parent psi_group to speed up group iteration
  sched/psi: Consolidate cgroup_psi()
  sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure
  sched/psi: Remove NR_ONCPU task accounting
  sched/psi: Optimize task switch inside shared cgroups again
  sched/psi: Move private helpers to sched/stats.h
  sched/psi: Save percpu memory when !psi_cgroups_enabled
  sched/psi: Don't create cgroup PSI files when psi_disabled
  sched/psi: Fix periodic aggregation shut off

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Cortex-A55 errata workaround (repeat TLBI)

- AMPERE1 added to the Spectre-BHB affected list

- MTE fix to avoid setting PG_mte_tagged if no tags have been touched
   on a page

- Fixed typo in the SCTLR_EL1.SPINTMASK bit naming (the commit log has
   other typos)

- perf: return value check in ali_drw_pmu_probe(),
   ALIBABA_UNCORE_DRW_PMU dependency on ACPI

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add AMPERE1 to the Spectre-BHB affected list
  arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored
  MAINTAINERS: rectify file entry in ALIBABA PMU DRIVER
  drivers/perf: ALIBABA_UNCORE_DRW_PMU should depend on ACPI
  drivers/perf: fix return value check in ali_drw_pmu_probe()
  arm64: errata: Add Cortex-A55 to the repeat tlbi list
  arm64/sysreg: Fix typo in SCTR_EL1.SPINTMASK

Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

- fix a race which causes page refcounting errors in ZONE_DEVICE pages
   (Alistair Popple)

- fix userfaultfd test harness instability (Peter Xu)

- various other patches in MM, mainly fixes

* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
  highmem: fix kmap_to_page() for kmap_local_page() addresses
  mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
  mm/selftest: uffd: explain the write missing fault check
  mm/hugetlb: use hugetlb_pte_stable in migration race check
  mm/hugetlb: fix race condition of uffd missing/minor handling
  zram: always expose rw_page
  LoongArch: update local TLB if PTE entry exists
  mm: use update_mmu_tlb() on the second thread
  kasan: fix array-bounds warnings in tests
  hmm-tests: add test for migrate_device_range()
  nouveau/dmem: evict device private memory during release
  nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
  mm/migrate_device.c: add migrate_device_range()
  mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
  mm/memremap.c: take a pgmap reference on page allocation
  mm: free device private pages have zero refcount
  mm/memory.c: fix race when faulting a device private page
  mm/damon: use damon_sz_region() in appropriate place
  mm/damon: move sz_damon_region to damon_sz_region
  lib/test_meminit: add checks for the allocation functions
  ...

Revert "PCI: Distribute available resources for root buses, too"

This reverts commit e96e27fc6f7971380283768e9a734af16b1716ee.

Jonathan reported that this commit broke this topology, where all the space
available on bus 02 was assigned to the 02:00.0 bridge window, leaving none
for the e1000 device at 02:00.1:

  pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
  pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
  pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
  e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]

Link: https://lore.kernel.org/r/[email protected]
Reported-by: Jonathan Cameron <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

Merge tag 'parisc-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
"Fixes:

   - When we added basic vDSO support in kernel 5.18 we introduced a bug
     which prevented a mmap() of graphic card memory. This is because we
     used the DMB (data memory break trap bit) page flag as special-bit,
     but missed to clear that bit when loading the TLB.

   - Graphics card memory size was not correctly aligned

   - Spelling fixes (from Colin Ian King)

  Enhancements:

   - PDC console (which uses firmware calls) now rewritten as early
     console

   - Reduced size of alternative tables"

* tag 'parisc-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix spelling mistake "mis-match" -> "mismatch" in eisa driver
  parisc: Fix userspace graphics card breakage due to pgtable special bit
  parisc: fbdev/stifb: Align graphics memory size to 4MB
  parisc: Convert PDC console to an early console
  parisc: Reduce kernel size by packing alternative tables

Merge tag 's390-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

- Generate a change uevent on unsolicited device end I/O interrupt for
   z/VM unit record devices supported by the vmur driver. This event can
   be used to automatically trigger processing of files as they arrive
   in the z/VM reader.

* tag 's390-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vmur: generate uevent on unsolicited device end
  s390/vmur: remove unnecessary BUG statement

Merge tag 'riscv-for-linus-6.1-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

- DT updates for the PolarFire SOC

- a fix to correct the handling of write-only mappings

- m{vetndor,arcd,imp}id is now in /proc/cpuinfo

- the SiFive L2 cache controller support has been refactored to also
   support L3 caches

- misc fixes, cleanups and improvements throughout the tree

* tag 'riscv-for-linus-6.1-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
  MAINTAINERS: add RISC-V's patchwork
  RISC-V: Make port I/O string accessors actually work
  riscv: enable software resend of irqs
  RISC-V: Re-enable counter access from userspace
  riscv: vdso: fix NULL deference in vdso_join_timens() when vfork
  riscv: Add cache information in AUX vector
  soc: sifive: ccache: define the macro for the register shifts
  soc: sifive: ccache: use pr_fmt() to remove CCACHE: prefixes
  soc: sifive: ccache: reduce printing on init
  soc: sifive: ccache: determine the cache level from dts
  soc: sifive: ccache: Rename SiFive L2 cache to Composable cache.
  dt-bindings: sifive-ccache: change Sifive L2 cache to Composable cache
  riscv: check for kernel config option in t-head memory types errata
  riscv: use BIT() marco for cpufeature probing
  riscv: use BIT() macros in t-head errata init
  riscv: drop some idefs from CMO initialization
  riscv: cleanup svpbmt cpufeature probing
  riscv: Pass -mno-relax only on lld < 15.0.0
  RISC-V: Avoid dereferening NULL regs in die()
  dt-bindings: riscv: add new riscv,isa strings for emulators
  ...

Merge tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
   register-pairs. Notably this broke ftruncate64 & pread/write64, which
   can lead to file corruption.

- Fix lost interrupts when returning to soft-masked context on 64-bit.

- Fix build failure when CONFIG_DTL=n.

Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.

* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_DTL=n build
  powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs

drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n

After commit 8799c0be89eb ("drm/amd/display: Fix vblank refcount in vrr
transition"), a build with CONFIG_DEBUG_FS=n is broken due to a
misplaced brace, along the lines of:

  In file included from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_trace.h:39,
                   from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:41:
  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: At top level:
  ./include/drm/drm_atomic.h:864:9: error: expected identifier or ‘(’ before ‘for’
    864 |         for ((__i) = 0;                                                 \
        |         ^~~
  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8317:9: note: in expansion of macro ‘for_each_new_crtc_in_state’
   8317 |         for_each_new_crtc_in_state(state, crtc, new_crtc_state, j)
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~

Move the brace within the #ifdef so that the file can be built with or
without CONFIG_DEBUG_FS.

Fixes: 8799c0be89eb ("drm/amd/display: Fix vblank refcount in vrr transition")
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

parisc: Fix spelling mistake "mis-match" -> "mismatch" in eisa driver

There are several spelling mistakes in kernel error messages. Fix them.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

cifs: fix static checker warning

Remove unnecessary NULL check of oparam->cifs_sb when parsing symlink
error response as it's already set by all smb2_open_file() callers and
deferenced earlier.

This fixes below report:

fs/cifs/smb2file.c:126 smb2_open_file()
warn: variable dereferenced before check 'oparms->cifs_sb' (see line 112)

Link: https://lore.kernel.org/r/Y0kt42j2tdpYakRu@kili
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>

perf: Skip and warn on unknown format 'configN' attrs

If the kernel exposes a new perf_event_attr field in a format attr, perf
will return an error stating the specified PMU can't be found. For
example, a format attr with 'config3:0-63' causes an error as config3 is
unknown to perf. This causes a compatibility issue between a newer
kernel with older perf tool.

Before this change with a kernel adding 'config3' I get:

  $ perf record -e arm_spe// -- true
  event syntax error: 'arm_spe//'
                       \___ Cannot find PMU `arm_spe'. Missing kernel support?
  Run 'perf list' for a list of valid events

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list
  available events

After this change, I get:

  $ perf record -e arm_spe// -- true
  WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.091 MB perf.data ]

To support unknown configN formats, rework the YACC implementation to
pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
warning.

Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf list: Fix metricgroups title message

  $ perf list metricgroups

gives

  List of pre-defined events (to be used in -e):

  Metric Groups:

  Backend
  Bad
  BadSpec

But that's incorrect of course because metric groups or metrics can only
be specified with -M. So fix the message to say -e or -M

Signed-off-by: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf mem: Fix -C option behavior for perf mem record

The -C/--cpu option was maily for report but it also affected record as
it ate the option.  So users needed to use "--" after perf mem record to
pass the info to the perf record properly.

Check if this option is set for record, and pass it to the actual perf
record.

Before)
  $ sudo perf --debug perf-event-open mem record -C 0 2>&1 | grep -a sys_perf_event_open
  ...
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 11
  ...

After)
  $ sudo perf --debug perf-event-open mem record -C 0 2>&1 | grep -a sys_perf_event_open
  ...
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 7

Reported-by: Ravi Bangoria <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf annotate: Add missing condition flags for arm64

According to the document [1], it can also have 'hs', 'lo', 'vc', 'vs' as a
condition code. Let's add them too.

[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/condition-codes-1-condition-flags-and-codes

Reported-by: Kevin Nomura <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

libperf: Do not include non-UAPI linux/compiler.h header

Its just for that __packed define, so use it expanded as __attribute__((packed)),
like the other files in /usr/include do.

This was problem was preventing building the libperf examples on ALT
Linux and Fedora 35, fix it.

Reported-by: Vitaly Chikunov <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Dmitry Levin <[email protected]
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

scripts/clang-tools: Convert clang-tidy args to list

Convert list of clang-tidy arguments to a list for ease of adding to
them and extending them as required.

Signed-off-by: Guru Das Srinagesh <[email protected]>
Suggested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

modpost: put modpost options before argument

The musl implementation of getopt stops looking for options after the
first non-option argument. Put the options before the non-option
argument so environments using musl can still build the kernel and
modules.

Fixes: f73edc8951b2 ("kbuild: unify two modpost invocations")
Link: https://git.musl-libc.org/cgit/musl/tree/src/misc/getopt.c?h=dc9285ad1dc19349c407072cc48ba70dab86de45#n44
Signed-off-by: Richard Acayan <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

perf test: Fix test_arm_coresight.sh failures on Juno

This test commonly fails on Arm Juno because the instruction interval
is large enough to miss generating any samples for Perf in system-wide
mode.

Fix this by lowering the interval until a comfortable number of Perf
instructions are generated. The test is still quick to run because only
a small amount of trace is gathered.

Before:

  sudo ./perf test coresight -vvv
  ...
  Recording trace with system wide mode
  Looking at perf.data file for dumping branch samples:
  Looking at perf.data file for reporting branch samples:
  Looking at perf.data file for instruction samples:
  CoreSight system wide testing: FAIL
  ...

After:

  sudo ./perf test coresight -vvv
  ...
  Recording trace with system wide mode
  Looking at perf.data file for dumping branch samples:
  Looking at perf.data file for reporting branch samples:
  Looking at perf.data file for instruction samples:
  CoreSight system wide testing: PASS
  ...

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf stat: Support old kernels for bperf cgroup counting

The recent change in the cgroup will break the backward compatiblity in
the BPF program. It should support both old and new kernels using BPF
CO-RE technique.

Like the task_struct->__state handling in the offcpu analysis, we can
check the field name in the cgroup struct.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: zefan li <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

parisc: Fix userspace graphics card breakage due to pgtable special bit

Commit df24e1783e6e ("parisc: Add vDSO support") introduced the vDSO
support, for which a _PAGE_SPECIAL page table flag was needed. Since we
wanted to keep every page table entry in 32-bits, this patch re-used the
existing - but yet unused - _PAGE_DMB flag (which triggers a hardware break
if a page is accessed) to store the special bit.

But when graphics card memory is mmapped into userspace, the kernel uses
vm_iomap_memory() which sets the the special flag. So, with the DMB bit
set, every access to the graphics memory now triggered a hardware
exception and segfaulted the userspace program.

Fix this breakage by dropping the DMB bit when writing the page
protection bits to the CPU TLB.

In addition this patch adds a small optimization: if huge pages aren't
configured (which is at least the case for 32-bit kernels), then the
special bit is stored in the hpage (HUGE PAGE) bit instead. That way we
can skip to reset the DMB bit.

Fixes: df24e1783e6e ("parisc: Add vDSO support")
Cc: <[email protected]> # 5.18+
Signed-off-by: Helge Deller <[email protected]>

parisc: fbdev/stifb: Align graphics memory size to 4MB

Independend of the current graphics resolution, adjust the reported
graphics card memory size to the next 4MB boundary.
This fixes the fbtest program which expects a naturally aligned size.

Signed-off-by: Helge Deller <[email protected]>
Cc: <[email protected]>

Merge tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm

Pull more drm updates from Dave Airlie:
"Round of fixes for the merge window stuff, bunch of amdgpu and i915
  changes, this should have the gcc11 warning fix, amongst other
  changes.

  amdgpu:
   - DC mutex fix
   - DC SubVP fixes
   - DCN 3.2.x fixes
   - DCN 3.1.x fixes
   - SDMA 6.x fixes
   - Enable DPIA for 3.1.4
   - VRR fixes
   - VRAM BO swapping fix
   - Revert dirty fb helper change
   - SR-IOV suspend/resume fixes
   - Work around GCC array bounds check fail warning
   - UMC 8.10 fixes
   - Misc fixes and cleanups

  i915:
   - Round to closest in g4x+ HDMI clock readout
   - Update MOCS table for EHL
   - Fix PSR_IMR/IIR field handling
   - Fix watermark calculations for gen12+/DG2 modifiers
   - Reject excessive dotclocks early
   - Fix revocation of non-persistent contexts
   - Handle migration for dpt
   - Fix display problems after resume
   - Allow control over the flags when migrating
   - Consider DG2_RC_CCS_CC when migrating buffers"

* tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm: (110 commits)
  drm/amd/display: Add HUBP surface flip interrupt handler
  drm/i915/display: consider DG2_RC_CCS_CC when migrating buffers
  drm/i915: allow control over the flags when migrating
  drm/amd/display: Simplify bool conversion
  drm/amd/display: fix transfer function passed to build_coefficients()
  drm/amd/display: add a license to cursor_reg_cache.h
  drm/amd/display: make virtual_disable_link_output static
  drm/amd/display: fix indentation in dc.c
  drm/amd/display: make dcn32_split_stream_for_mpc_or_odm static
  drm/amd/display: fix build error on arm64
  drm/amd/display: 3.2.207
  drm/amd/display: Clean some DCN32 macros
  drm/amdgpu: Add poison mode query for umc v8_10_0
  drm/amdgpu: Update umc v8_10_0 headers
  drm/amdgpu: fix coding style issue for mca notifier
  drm/amdgpu: define convert_error_address for umc v8.7
  drm/amdgpu: define RAS convert_error_address API
  drm/amdgpu: remove check for CE in RAS error address query
  drm/i915: Fix display problems after resume
  drm/amd/display: fix array-bounds error in dc_stream_remove_writeback() [take 2]
  ...

Merge tag 'block-6.1-2022-10-13' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:
"Fixes that ended up landing later than the initial block pull request.
  Nothing really major in here:

   - NVMe pull request via Christoph:
        - add NVME_QUIRK_BOGUS_NID for Lexar NM760 (Abhijit)
        - add NVME_QUIRK_NO_DEEPEST_PS to avoid the deepest sleep state
          on ZHITAI TiPro5000 SSDs (Xi Ruoyao)
        - fix possible hang caused during ctrl deletion (Sagi Grimberg)
        - fix possible hang in live ns resize with ANA access (Sagi
          Grimberg)

   - Proactively avoid a sign extension issue with the queue flags
     (Brian)

   - Regression fix for hidden disks (Christoph)

   - Update OPAL maintainers entry (Jonathan)

   - blk-wbt regression initialization fix (Yu)"

* tag 'block-6.1-2022-10-13' of git://git.kernel.dk/linux:
  nvme-multipath: fix possible hang in live ns resize with ANA access
  nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
  nvme-tcp: fix possible hang caused during ctrl deletion
  nvme-rdma: fix possible hang caused during ctrl deletion
  block: fix leaking minors of hidden disks
  block: avoid sign extend problem with default queue flags mask
  blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
  block: Remove the repeat word 'can'
  MAINTAINERS: Update SED-Opal Maintainers

Merge tag 'io_uring-6.1-2022-10-13' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:
"A collection of fixes that ended up either being later than the
  initial pull, or dependent on multiple branches (6.0-late being one of
  them) and hence deferred purposely. This contains:

   - Cleanup fixes for the single submitter late 6.0 change, which we
     pushed to 6.1 to keep the 6.0 changes small (Dylan, Pavel)

   - Fix for IORING_OP_CONNECT not handling -EINPROGRESS correctly (me)

   - Ensure that the zc sendmsg variant gets audited correctly (me)

   - Regression fix from this merge window where kiocb_end_write()
     doesn't always gets called, which can cause issues with fs freezing
     (me)

   - Registered files SCM handling fix (Pavel)

   - Regression fix for big sqe dumping in fdinfo (Pavel)

   - Registered buffers accounting fix (Pavel)

   - Remove leftover notification structures, we killed them off late in
     6.0 (Pavel)

   - Minor optimizations (Pavel)

   - Cosmetic variable shadowing fix (Stefan)"

* tag 'io_uring-6.1-2022-10-13' of git://git.kernel.dk/linux:
  io_uring/rw: ensure kiocb_end_write() is always called
  io_uring: fix fdinfo sqe offsets calculation
  io_uring: local variable rw shadows outer variable in io_write
  io_uring/opdef: remove 'audit_skip' from SENDMSG_ZC
  io_uring: optimise locking for local tw with submit_wait
  io_uring: remove redundant memory barrier in io_req_local_work_add
  io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT
  io_uring: remove notif leftovers
  io_uring: correct pinned_vm accounting
  io_uring/af_unix: defer registered files gc to io_uring release
  io_uring: limit registration w/ SINGLE_ISSUER
  io_uring: remove io_register_submitter
  io_uring: simplify __io_uring_add_tctx_node

MAINTAINERS: add RISC-V's patchwork

The RISC-V patchwork instance on kernel.org has had some necromancy
performed on it & will be used going forward. The statuses that are
intended to be used are:
- New: No action has been taken yet
- Under Review: The maintainer is waiting for review comments from others
- Changes Requested: Either the maintainer or a reviewer requested
  changes in the patch. The patch author is expected to submit a new
  version
- Superseded: There's a new version of the patch available
- Not Applicable: The patch is not intended for the RISC-V tree
- Accepted: The patch has been applied
- Rejected: The patch has been rejected, with reasons stated in an
  email

Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'amd-drm-fixes-6.1-2022-10-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.1-2022-10-12:

amdgpu:
- DC mutex fix
- DC SubVP fixes
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- SDMA 6.x fixes
- Enable DPIA for 3.1.4
- VRR fixes
- VRAM BO swapping fix
- Revert dirty fb helper change
- SR-IOV suspend/resume fixes
- Work around GCC array bounds check fail warning
- UMC 8.10 fixes
- Misc fixes and cleanups

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-next-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

- Fix revocation of non-persistent contexts (Tvrtko Ursulin)
- Handle migration for dpt (Matthew Auld)
- Fix display problems after resume (Thomas Hellström)
- Allow control over the flags when migrating (Matthew Auld)
- Consider DG2_RC_CCS_CC when migrating buffers (Matthew Auld)

Signed-off-by: Dave Airlie <[email protected]>
From: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/Y0gK9QmCmktLLzqp@tursulin-desk

rtc: rv3028: Fix codestyle errors

Compiler warnings：

drivers/rtc/rtc-rv3028.c: In function 'rv3028_param_set':
drivers/rtc/rtc-rv3028.c:559:20: warning: statement will never be executed [-Wswitch-unreachable]
  559 |                 u8 mode;
      |                    ^~~~
drivers/rtc/rtc-rv3028.c: In function 'rv3028_param_get':
drivers/rtc/rtc-rv3028.c:526:21: warning: statement will never be executed [-Wswitch-unreachable]
  526 |                 u32 value;
      |                     ^~~~~

Fix it by moving the variable declaration to the beginning of the function.

Cc: Alessandro Zummo <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reported-by: k2ci <[email protected]>
Signed-off-by: Ke Sun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>

rtc: cmos: Fix event handler registration ordering issue

Because acpi_install_fixed_event_handler() enables the event
automatically on success, it is incorrect to call it before the
handler routine passed to it is ready to handle events.

Unfortunately, the rtc-cmos driver does exactly the incorrect thing
by calling cmos_wake_setup(), which passes rtc_handler() to
acpi_install_fixed_event_handler(), before cmos_do_probe(), because
rtc_handler() uses dev_get_drvdata() to get to the cmos object
pointer and the driver data pointer is only populated in
cmos_do_probe().

This leads to a NULL pointer dereference in rtc_handler() on boot
if the RTC fixed event happens to be active at the init time.

To address this issue, change the initialization ordering of the
driver so that cmos_wake_setup() is always called after a successful
cmos_do_probe() call.

While at it, change cmos_pnp_probe() to call cmos_do_probe() after
the initial if () statement used for computing the IRQ argument to
be passed to cmos_do_probe() which is cleaner than calling it in
each branch of that if () (local variable "irq" can be of type int,
because it is passed to that function as an argument of type int).

Note that commit 6492fed7d8c9 ("rtc: rtc-cmos: Do not check
ACPI_FADT_LOW_POWER_S0") caused this issue to affect a larger number
of systems, because previously it only affected systems with
ACPI_FADT_LOW_POWER_S0 set, but it is present regardless of that
commit.

Fixes: 6492fed7d8c9 ("rtc: rtc-cmos: Do not check ACPI_FADT_LOW_POWER_S0")
Fixes: a474aaedac99 ("rtc-cmos: move wake setup from ACPI glue into RTC driver")
Link: https://lore.kernel.org/linux-acpi/[email protected]/
Reported-by: Mel Gorman <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Bjorn Helgaas <[email protected]>
Tested-by: Mel Gorman <[email protected]>
Link: https://lore.kernel.org/r/5629262.DvuYhMxLoT@kreacher
Signed-off-by: Alexandre Belloni <[email protected]>

RISC-V: Make port I/O string accessors actually work

Fix port I/O string accessors such as `insb', `outsb', etc. which use
the physical PCI port I/O address rather than the corresponding memory
mapping to get at the requested location, which in turn breaks at least
accesses made by our parport driver to a PCIe parallel port such as:

PCI parallel port detected: 1415:c118, I/O at 0x1000(0x1008), IRQ 20
parport0: PC-style at 0x1000 (0x1008), irq 20, using FIFO [PCSPP,TRISTATE,COMPAT,EPP,ECP]

causing a memory access fault:

Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000001008
Oops [#1]
Modules linked in:
CPU: 1 PID: 350 Comm: cat Not tainted 6.0.0-rc2-00283-g10d4879f9ef0-dirty #23
Hardware name: SiFive HiFive Unmatched A00 (DT)
epc : parport_pc_fifo_write_block_pio+0x266/0x416
ra : parport_pc_fifo_write_block_pio+0xb4/0x416
epc : ffffffff80542c3e ra : ffffffff80542a8c sp : ffffffd88899fc60
gp : ffffffff80fa2700 tp : ffffffd882b1e900 t0 : ffffffd883d0b000
t1 : ffffffffff000002 t2 : 4646393043330a38 s0 : ffffffd88899fcf0
s1 : 0000000000001000 a0 : 0000000000000010 a1 : 0000000000000000
a2 : ffffffd883d0a010 a3 : 0000000000000023 a4 : 00000000ffff8fbb
a5 : ffffffd883d0a001 a6 : 0000000100000000 a7 : ffffffc800000000
s2 : ffffffffff000002 s3 : ffffffff80d28880 s4 : ffffffff80fa1f50
s5 : 0000000000001008 s6 : 0000000000000008 s7 : ffffffd883d0a000
s8 : 0004000000000000 s9 : ffffffff80dc1d80 s10: ffffffd8807e4000
s11: 0000000000000000 t3 : 00000000000000ff t4 : 393044410a303930
t5 : 0000000000001000 t6 : 0000000000040000
status: 0000000200000120 badaddr: 0000000000001008 cause: 000000000000000f
[<ffffffff80543212>] parport_pc_compat_write_block_pio+0xfe/0x200
[<ffffffff8053bbc0>] parport_write+0x46/0xf8
[<ffffffff8050530e>] lp_write+0x158/0x2d2
[<ffffffff80185716>] vfs_write+0x8e/0x2c2
[<ffffffff80185a74>] ksys_write+0x52/0xc2
[<ffffffff80185af2>] sys_write+0xe/0x16
[<ffffffff80003770>] ret_from_syscall+0x0/0x2
---[ end trace 0000000000000000 ]---

For simplicity address the problem by adding PCI_IOBASE to the physical
address requested in the respective wrapper macros only, observing that
the raw accessors such as `__insb', `__outsb', etc. are not supposed to
be used other than by said macros. Remove the cast to `long' that is no
longer needed on `addr' now that it is used as an offset from PCI_IOBASE
and add parentheses around `addr' needed for predictable evaluation in
macro expansion. No need to make said adjustments in separate changes
given that current code is gravely broken and does not ever work.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Fixes: fab957c11efe2 ("RISC-V: Atomic and Locking Code")
Cc: [email protected] # v4.15+
Reviewed-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output

I'm merging this in as a single commit as it's a dependency for some
other work.

* commit '3baca1a4d490484fcd555413f1fec85b2e071912':
RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output

RISC-V: Make mmap() with PROT_WRITE imply PROT_READ

Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() reject mappings with only PROT_WRITE set in an
attempt to fix an observed inconsistency in behavior when attempting
to read from a PROT_WRITE-only mapping. The root cause of this behavior
was actually that while RISC-V's protection_map maps VM_WRITE to
readable PTE permissions (since write-only PTEs are considered reserved
by the privileged spec), the page fault handler considered loads from
VM_WRITE-only VMAs illegal accesses. Fix the underlying cause by
handling faults in VM_WRITE-only VMAs (patch 1) and then re-enable
use of mmap(PROT_WRITE) (patch 2), making RISC-V's behavior consistent
with all other architectures that don't support write-only PTEs.

* remotes/palmer/riscv-wonly:
riscv: Allow PROT_WRITE-only mmap()
riscv: Make VM_WRITE imply VM_READ

Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'devicetree-fixes-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Fixes for Mediatek MT6370 binding

- Merge the DT overlay maintainer entry to the main entry as Pantelis
   is not active and Frank is taking a step back

* tag 'devicetree-fixes-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  MAINTAINERS: of: collapse overlay entry into main device tree entry
  dt-bindings: mfd: mt6370: fix the interrupt order of the charger in the example
  dt-bindings: leds: mt6370: Fix MT6370 LED indicator DT warning

Merge tag 'mmc-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC core:
   - Add SD card quirk for broken discard

  MMC host:
   - renesas_sdhi: Fix clock rounding errors
   - sdhci-sprd: Fix minimum clock limit to detect cards
   - sdhci-tegra: Use actual clock rate for SW tuning correction"

* tag 'mmc-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-sprd: Fix minimum clock limit
  mmc: sdhci-tegra: Use actual clock rate for SW tuning correction
  mmc: renesas_sdhi: Fix rounding errors
  mmc: core: Add SD card quirk for broken discard

riscv: enable software resend of irqs

The PLIC specification does not describe the interrupt pendings bits as
read-write, only that they "can be read". To allow for retriggering of
interrupts (and the use of the irq debugfs interface) enable
HARDIRQS_SW_RESEND for RISC-V.

Link: https://github.com/riscv/riscv-plic-spec/blob/master/riscv-plic.adoc#interrupt-pending-bits
Signed-off-by: Conor Dooley <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Tested-by: Palmer Dabbelt <[email protected]> # on QEMU
Reviewed-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Re-enable counter access from userspace

These counters were part of the ISA when we froze the uABI, removing
them breaks userspace.

Link: https://lore.kernel.org/all/YxEhC%[email protected]/
Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension")
Tested-by: Conor Dooley <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: vdso: fix NULL deference in vdso_join_timens() when vfork

Testing tools/testing/selftests/timens/vfork_exec.c got below
kernel log:

[    6.838454] Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000020
[    6.842255] Oops [#1]
[    6.842871] Modules linked in:
[    6.844249] CPU: 1 PID: 64 Comm: vfork_exec Not tainted 6.0.0-rc3-rt15+ #8
[    6.845861] Hardware name: riscv-virtio,qemu (DT)
[    6.848009] epc : vdso_join_timens+0xd2/0x110
[    6.850097]  ra : vdso_join_timens+0xd2/0x110
[    6.851164] epc : ffffffff8000635c ra : ffffffff8000635c sp : ff6000000181fbf0
[    6.852562]  gp : ffffffff80cff648 tp : ff60000000fdb700 t0 : 3030303030303030
[    6.853852]  t1 : 0000000000000030 t2 : 3030303030303030 s0 : ff6000000181fc40
[    6.854984]  s1 : ff60000001e6c000 a0 : 0000000000000010 a1 : ffffffff8005654c
[    6.856221]  a2 : 00000000ffffefff a3 : 0000000000000000 a4 : 0000000000000000
[    6.858114]  a5 : 0000000000000000 a6 : 0000000000000008 a7 : 0000000000000038
[    6.859484]  s2 : ff60000001e6c068 s3 : ff6000000108abb0 s4 : 0000000000000000
[    6.860751]  s5 : 0000000000001000 s6 : ffffffff8089dc40 s7 : ffffffff8089dc38
[    6.862029]  s8 : ffffffff8089dc30 s9 : ff60000000fdbe38 s10: 000000000000005e
[    6.863304]  s11: ffffffff80cc3510 t3 : ffffffff80d1112f t4 : ffffffff80d1112f
[    6.864565]  t5 : ffffffff80d11130 t6 : ff6000000181fa00
[    6.865561] status: 0000000000000120 badaddr: 0000000000000020 cause: 000000000000000d
[    6.868046] [<ffffffff8008dc94>] timens_commit+0x38/0x11a
[    6.869089] [<ffffffff8008dde8>] timens_on_fork+0x72/0xb4
[    6.870055] [<ffffffff80190096>] begin_new_exec+0x3c6/0x9f0
[    6.871231] [<ffffffff801d826c>] load_elf_binary+0x628/0x1214
[    6.872304] [<ffffffff8018ee7a>] bprm_execve+0x1f2/0x4e4
[    6.873243] [<ffffffff8018f90c>] do_execveat_common+0x16e/0x1ee
[    6.874258] [<ffffffff8018f9c8>] sys_execve+0x3c/0x48
[    6.875162] [<ffffffff80003556>] ret_from_syscall+0x0/0x2
[    6.877484] ---[ end trace 0000000000000000 ]---

This is because the mm->context.vdso_info is NULL in vfork case. From
another side, mm->context.vdso_info either points to vdso info
for RV64 or vdso info for compat, there's no need to bloat riscv's
mm_context_t, we can handle the difference when setup the additional
page for vdso.

Signed-off-by: Jisheng Zhang <[email protected]>
Suggested-by: Palmer Dabbelt <[email protected]>
Fixes: 3092eb456375 ("riscv: compat: vdso: Add setup additional pages implementation")
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge patch series "Use composable cache instead of L2 cache"

Zong Li <[email protected]> says:

Since composable cache may be L3 cache if private L2 cache exists, we
should use its original name "composable cache" to prevent confusion.

This patchset contains the modification which is related to ccache, such
as DT binding and EDAC driver.

* b4-shazam-merge:
  riscv: Add cache information in AUX vector
  soc: sifive: ccache: define the macro for the register shifts
  soc: sifive: ccache: use pr_fmt() to remove CCACHE: prefixes
  soc: sifive: ccache: reduce printing on init
  soc: sifive: ccache: determine the cache level from dts
  soc: sifive: ccache: Rename SiFive L2 cache to Composable cache.
  dt-bindings: sifive-ccache: change Sifive L2 cache to Composable cache

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Add cache information in AUX vector

There are no standard CSR registers to provide cache information, the
way for RISC-V is to get this information from DT. sysconf syscall
could use them to get information of cache through AUX vector.

The result of 'getconf -a|grep -i cache' as follows:
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC                2
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 32768
LEVEL1_DCACHE_ASSOC                4
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_SIZE                  524288
LEVEL2_CACHE_ASSOC                 8
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_SIZE                  4194304
LEVEL3_CACHE_ASSOC                 16
LEVEL3_CACHE_LINESIZE              64
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0

Signed-off-by: Greentime Hu <[email protected]>
Signed-off-by: Zong Li <[email protected]>
Suggested-by: Zong Li <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

soc: sifive: ccache: define the macro for the register shifts

Define the macro for the register shifts, it could make the code be
more readable

Signed-off-by: Zong Li <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

soc: sifive: ccache: use pr_fmt() to remove CCACHE: prefixes

Use the pr_fmt() macro to prefix all the output with "CCACHE:"
to avoid having to write it out each time, or make a large diff
when the next change comes along.

Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Zong Li <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

soc: sifive: ccache: reduce printing on init

The driver prints out 6 lines on startup, which can easily be redcued
to two lines without losing any information.

Note, to make the types work better, uint64_t has been replaced with
ULL to make the unsigned long long match the format in the print
statement.

Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Zong Li <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

soc: sifive: ccache: determine the cache level from dts

Composable cache could be L2 or L3 cache, use 'cache-level' property of
device node to determine the level.

Signed-off-by: Zong Li <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

soc: sifive: ccache: Rename SiFive L2 cache to Composable cache.

Since composable cache may be L3 cache if there is a L2 cache, we should
use its original name composable cache to prevent confusion.

There are some new lines were generated due to adding the compatible
"sifive,ccache0" into ID table and indent requirement.

The sifive L2 has been renamed to sifive CCACHE, EDAC driver needs to
apply the change as well.

Signed-off-by: Greentime Hu <[email protected]>
Signed-off-by: Zong Li <[email protected]>
Co-developed-by: Zong Li <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

dt-bindings: sifive-ccache: change Sifive L2 cache to Composable cache

Since composable cache may be L3 cache if private L2 cache exists, we
should use its original name Composable cache to prevent confusion.

Signed-off-by: Zong Li <[email protected]>
Suggested-by: Conor Dooley <[email protected]>
Suggested-by: Ben Dooks <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'docs-6.1-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
"A handful of relatively simple documentation fixes, plus a set of
  patches catching the Chinese translation up with the front-page
  rework"

* tag 'docs-6.1-2' of git://git.lwn.net/linux:
  Documentation: rtla: Correct command line example
  docs/zh_CN: add a man-pages link to zh_CN/index.rst
  docs/zh_CN: Rewrite the Chinese translation front page
  docs/zh_CN: add zh_CN/arch.rst
  docs/zh_CN: promote the title of zh_CN/process/index.rst
  docs/zh_CN: Update the translation of page_owner to 6.0-rc7
  docs/zh_CN: Update the translation of ksm to 6.0-rc7
  docs/howto: Replace abundoned URL of gmane.org
  Documentation: ubifs: Fix compression idiom
  Documentation/mm/page_owner.rst: delete frequently changing experimental data
  docs/zh_CN: Fix build warning
  docs: ftrace: Correct access mode

Merge tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, and wifi.

Current release - regressions:

   - Revert "net/sched: taprio: make qdisc_leaf() see the
     per-netdev-queue pfifo child qdiscs", it may cause crashes when the
     qdisc is reconfigured

   - inet: ping: fix splat due to packet allocation refactoring in inet

   - tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF
     due to races when per-netns hash table is used

  Current release - new code bugs:

   - eth: adin1110: check in netdev_event that netdev belongs to driver

   - fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co.

  Previous releases - regressions:

   - ipv4: handle attempt to delete multipath route when fib_info
     contains an nh reference, avoid oob access

   - wifi: fix handful of bugs in the new Multi-BSSID code

   - wifi: mt76: fix rate reporting / throughput regression on mt7915
     and newer, fix checksum offload

   - wifi: iwlwifi: mvm: fix double list_add at
     iwl_mvm_mac_wake_tx_queue (other cases)

   - wifi: mac80211: do not drop packets smaller than the LLC-SNAP
     header on fast-rx

  Previous releases - always broken:

   - ieee802154: don't warn zero-sized raw_sendmsg()

   - ipv6: ping: fix wrong checksum for large frames

   - mctp: prevent double key removal and unref

   - tcp/udp: fix memory leaks and races around IPV6_ADDRFORM

   - hv_netvsc: fix race between VF offering and VF association message

  Misc:

   - remove -Warray-bounds silencing in the drivers, compilers fixed"

* tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
  sunhme: fix an IS_ERR() vs NULL check in probe
  net: marvell: prestera: fix a couple NULL vs IS_ERR() checks
  kcm: avoid potential race in kcm_tx_work
  tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
  net: phy: micrel: Fixes FIELD_GET assertion
  openvswitch: add nf_ct_is_confirmed check before assigning the helper
  tcp: Fix data races around icsk->icsk_af_ops.
  ipv6: Fix data races around sk->sk_prot.
  tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
  udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
  tcp/udp: Fix memory leak in ipv6_renew_options().
  mctp: prevent double key removal and unref
  selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
  netfilter: rpfilter/fib: Populate flowic_l3mdev field
  selftests: netfilter: Test reverse path filtering
  net/mlx5: Make ASO poll CQ usable in atomic context
  tcp: cdg: allow tcp_cdg_release() to be called multiple times
  inet: ping: fix recent breakage
  ipv6: ping: fix wrong checksum for large frames
  net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
  ...

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:

- Fix a regression in virtio pci on power

- Add a reviewer for ifcvf

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/ifcvf: add reviewer
virtio_pci: use irq to detect interrupt support

Merge tag 'trace-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Found that the synthetic events were using strlen/strscpy() on values
   that could have come from userspace, and that is bad.

   Consolidate the string logic of kprobe and eprobe and extend it to
   the synthetic events to safely process string addresses.

- Clean up content of text dump in ftrace_bug() where the output does
   not make char reads into signed and sign extending the byte output.

- Fix some kernel docs in the ring buffer code.

* tag 'trace-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix reading strings from synthetic events
  tracing: Add "(fault)" name injection to kernel probes
  tracing: Move duplicate code of trace_kprobe/eprobe.c into header
  ring-buffer: Fix kernel-doc
  ftrace: Fix char print issue in print_ip_ins()

Merge tag 'linux-watchdog-6.1-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

- new driver for Exar/MaxLinear XR28V38x

- support for exynosautov9 SoC

- support for Renesas R-Car V5H (R8A779G0) and RZ/V2M (r9a09g011) SoC

- support for imx93

- several other fixes and improvements

* tag 'linux-watchdog-6.1-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits)
  watchdog: twl4030_wdt: add missing mod_devicetable.h include
  dt-bindings: watchdog: migrate mt7621 text bindings to YAML
  watchdog: sp5100_tco: Add "action" module parameter
  watchdog: imx93: add watchdog timer on imx93
  watchdog: imx7ulp_wdt: init wdog when it was active
  watchdog: imx7ulp_wdt: Handle wdog reconfigure failure
  watchdog: imx7ulp_wdt: Fix RCS timeout issue
  watchdog: imx7ulp_wdt: Check CMD32EN in wdog init
  watchdog: imx7ulp: Add explict memory barrier for unlock sequence
  watchdog: imx7ulp: Move suspend/resume to noirq phase
  watchdog: rti-wdt:using the pm_runtime_resume_and_get to simplify the code
  dt-bindings: watchdog: rockchip: add rockchip,rk3128-wdt
  watchdog: s3c2410_wdt: support exynosautov9 watchdog
  dt-bindings: watchdog: add exynosautov9 compatible
  watchdog: npcm: Enable clock if provided
  watchdog: meson: keep running if already active
  watchdog: dt-bindings: atmel,at91sam9-wdt: convert to json-schema
  watchdog: armada_37xx_wdt: Fix .set_timeout callback
  watchdog: sa1100: make variable sa1100dog_driver static
  watchdog: w83977f_wdt: Fix comment typo
  ...

Merge tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"A quiet round this time: several assorted filesystem fixes, the most
  noteworthy one being some additional wakeups in cap handling code, and
  a messenger cleanup"

* tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client:
  ceph: remove Sage's git tree from documentation
  ceph: fix incorrectly showing the .snap size for stat
  ceph: fail the open_by_handle_at() if the dentry is being unlinked
  ceph: increment i_version when doing a setattr with caps
  ceph: Use kcalloc for allocating multiple elements
  ceph: no need to wait for transition RDCACHE|RD -> RD
  ceph: fail the request if the peer MDS doesn't support getvxattr op
  ceph: wake up the waiters if any new caps comes
  libceph: drop last_piece flag from ceph_msg_data_cursor

Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"New Features:
   - Add NFSv4.2 xattr tracepoints
   - Replace xprtiod WQ in rpcrdma
   - Flexfiles cancels I/O on layout recall or revoke

  Bugfixes and Cleanups:
   - Directly use ida_alloc() / ida_free()
   - Don't open-code max_t()
   - Prefer using strscpy over strlcpy
   - Remove unused forward declarations
   - Always return layout states on flexfiles layout return
   - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of
     error
   - Allow more xprtrdma memory allocations to fail without triggering a
     reclaim
   - Various other xprtrdma clean ups
   - Fix rpc_killall_tasks() races"

* tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
  NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked
  SUNRPC: Add API to force the client to disconnect
  SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls
  SUNRPC: Fix races with rpc_killall_tasks()
  xprtrdma: Fix uninitialized variable
  xprtrdma: Prevent memory allocations from driving a reclaim
  xprtrdma: Memory allocation should be allowed to fail during connect
  xprtrdma: MR-related memory allocation should be allowed to fail
  xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc()
  xprtrdma: Clean up synopsis of rpcrdma_req_create()
  svcrdma: Clean up RPCRDMA_DEF_GFP
  SUNRPC: Replace the use of the xprtiod WQ in rpcrdma
  NFSv4.2: Add a tracepoint for listxattr
  NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr
  NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2
  NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR
  nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration
  NFSv4: remove nfs4_renewd_prepare_shutdown() declaration
  fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment
  NFSv4/pNFS: Always return layout stats on layout return for flexfiles
  ...

Merge tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs update from Mike Marshall:
"Change iterate to iterate_shared"

* tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
Orangefs: change iterate to iterate_shared

Documentation: rtla: Correct command line example

The '-t/-T' parameters seem to have been swapped:
-t/--trace[=file]: save the stopped trace
to [file|timerlat_trace.txt]
-T/--thread us: stop trace if the thread latency
is higher than the argument in us

Swap them back.

Signed-off-by: Pierre Gondois <[email protected]>
Acked-by: Daniel Bristot de Oliveira <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

sunhme: fix an IS_ERR() vs NULL check in probe

The devm_request_region() function does not return error pointers, it
returns NULL on error.

Fixes: 914d9b2711dd ("sunhme: switch to devres")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Sean Anderson <[email protected]>
Reviewed-by: Rolf Eike Beer <[email protected]>
Link: https://lore.kernel.org/r/Y0bWzJL8JknX8MUf@kili
Signed-off-by: Jakub Kicinski <[email protected]>

net: marvell: prestera: fix a couple NULL vs IS_ERR() checks

The __prestera_nexthop_group_create() function returns NULL on error
and the prestera_nexthop_group_get() returns error pointers. Fix these
two checks.

Fixes: 0a23ae237171 ("net: marvell: prestera: Add router nexthops ABI")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/Y0bWq+7DoKK465z8@kili
Signed-off-by: Jakub Kicinski <[email protected]>

kcm: avoid potential race in kcm_tx_work

syzbot found that kcm_tx_work() could crash [1] in:

/* Primarily for SOCK_SEQPACKET sockets */
if (likely(sk->sk_socket) &&
test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
<<*>> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_space(sk);
}

I think the reason is that another thread might concurrently
run in kcm_release() and call sock_orphan(sk) while sk is not
locked. kcm_tx_work() find sk->sk_socket being NULL.

[1]
BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53

CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: kkcmd kcm_tx_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
kasan_report+0xbe/0x1f0 mm/kasan/report.c:495
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
instrument_atomic_write include/linux/instrumented.h:86 [inline]
clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Tom Herbert <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: Clean up kernel listener's reqsk in inet_twsk_purge()

Eric Dumazet reported a use-after-free related to the per-netns ehash
series. [0]

When we create a TCP socket from userspace, the socket always holds a
refcnt of the netns.  This guarantees that a reqsk timer is always fired
before netns dismantle.  Each reqsk has a refcnt of its listener, so the
listener is not freed before the reqsk, and the net is not freed before
the listener as well.

OTOH, when in-kernel users create a TCP socket, it might not hold a refcnt
of its netns.  Thus, a reqsk timer can be fired after the netns dismantle
and access freed per-netns ehash.

To avoid the use-after-free, we need to clean up TCP_NEW_SYN_RECV sockets
in inet_twsk_purge() if the netns uses a per-netns ehash.

[0]: https://lore.kernel.org/netdev/CANn89iLXMup0dRD_Ov79Xt8N9FM0XdhCHEN05sf3eLwxKweM6w@mail.gmail.com/

BUG: KASAN: use-after-free in tcp_or_dccp_get_hashinfo
include/net/inet_hashtables.h:181 [inline]
BUG: KASAN: use-after-free in reqsk_queue_unlink+0x320/0x350
net/ipv4/inet_connection_sock.c:913
Read of size 8 at addr ffff88807545bd80 by task syz-executor.2/8301

CPU: 1 PID: 8301 Comm: syz-executor.2 Not tainted
6.0.0-syzkaller-02757-gaf7d23f9d96a #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 09/22/2022
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline]
reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913
inet_csk_reqsk_queue_drop net/ipv4/inet_connection_sock.c:927 [inline]
inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:939 [inline]
reqsk_timer_handler+0x724/0x1160 net/ipv4/inet_connection_sock.c:1053
call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790
__run_timers kernel/time/timer.c:1768 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
invoke_softirq kernel/softirq.c:445 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1107
</IRQ>

Fixes: d1e5e6408b30 ("tcp: Introduce optional per-netns ehash.")
Reported-by: syzbot <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: of: collapse overlay entry into main device tree entry

Pantelis has not been active in recent years so no need to maintain
a separate entry for device tree overlays.

Signed-off-by: Frank Rowand <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>

Merge patch series "Some style cleanups for recent extension additions"

Heiko Stuebner <[email protected]> says:

As noted by some people, some parts of the recently added extensions
(svpbmt, zicbom) + t-head errata could use some styling upgrades.

So this series provides these.

changes in v2:
- add patch also converting cpufeature probe to BIT()
- update commit message in patch1 (Conor)

Heiko Stuebner (5):
  riscv: cleanup svpbmt cpufeature probing
  riscv: drop some idefs from CMO initialization
  riscv: use BIT() macros in t-head errata init
  riscv: use BIT() marco for cpufeature probing
  riscv: check for kernel config option in t-head memory types errata

arch/riscv/errata/thead/errata.c    | 14 ++++++-----
arch/riscv/include/asm/cacheflush.h |  2 ++
arch/riscv/kernel/cpufeature.c      | 39 ++++++++++++-----------------
3 files changed, 26 insertions(+), 29 deletions(-)

Link: https://lore.kernel.org/r/[email protected]
* b4-shazam-merge:
  riscv: check for kernel config option in t-head memory types errata
  riscv: use BIT() marco for cpufeature probing
  riscv: use BIT() macros in t-head errata init
  riscv: drop some idefs from CMO initialization
  riscv: cleanup svpbmt cpufeature probing

Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: check for kernel config option in t-head memory types errata

The t-head variant of page-based memory types should also check first
for the enabled kernel config option.

Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head")
Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: use BIT() marco for cpufeature probing

Using the appropriate BIT macro makes the code better readable.

Suggested-by: Conor Dooley <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: use BIT() macros in t-head errata init

Using the appropriate BIT macro makes the code better readable.

Suggested-by: Conor Dooley <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: drop some idefs from CMO initialization

Wrapping things in #ifdefs makes the code harder to read
while we also have IS_ENABLED() macros to do this in regular code
and the extension detection is not _that_ runtime critical.

So define a stub for riscv_noncoherent_supported() in the
non-CONFIG_RISCV_DMA_NONCOHERENT case and move the code to
us IS_ENABLED.

Suggested-by: Conor Dooley <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: cleanup svpbmt cpufeature probing

For better readability (and compile time coverage) use IS_ENABLED
instead of ifdef and drop the new unneeded switch statement.

Signed-off-by: Heiko Stuebner <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Pass -mno-relax only on lld < 15.0.0

lld since llvm:6611d58f5bbc ("[ELF] Relax R_RISCV_ALIGN"), which will be
included in the 15.0.0 release, has implemented some RISC-V linker
relaxation.  -mno-relax is no longer needed in
KBUILD_CFLAGS/KBUILD_AFLAGS to suppress R_RISCV_ALIGN which older lld
can not handle:

    ld.lld: error: capability.c:(.fixup+0x0): relocation R_RISCV_ALIGN
    requires unimplemented linker relaxation; recompile with -mno-relax
    but the .o is already compiled with -mno-relax

Signed-off-by: Fangrui Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Conor Dooley <[email protected]>
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

cifs: use ALIGN() and round_up() macros

Improve code readability by using existing macros:

Replace hardcoded alignment computations (e.g. (len + 7) & ~0x7) by
ALIGN()/IS_ALIGNED() macros.

Also replace (DIV_ROUND_UP(len, 8) * 8) with ALIGN(len, 8), which, if
not optimized by the compiler, has the overhead of a multiplication
and a division. Do the same for roundup() by replacing it by round_up()
(division-less version, but requires the multiple to be a power of 2,
which is always the case for us).

And remove some unnecessary checks where !IS_ALIGNED() would fit, but
calling round_up() directly is fine as it's a no-op if the value is
already aligned.

Signed-off-by: Enzo Matsumiya <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: find and use the dentry for cached non-root directories also

This allows us to use cached attributes for the entries in a cached
directory for as long as a lease is held on the directory itself.
Previously we have always allowed "used cached attributes for 1 second"
but this extends this to the lifetime of the lease as well as making the
caching safer.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: enable caching of directories for which a lease is held

This expands the directory caching to now cache an open handle for all
directories (up to a maximum) and not just the root directory.

In this patch, locking and refcounting is intended to work as so:

The main function to get a reference to a cached handle is
find_or_create_cached_dir() called from open_cached_dir()
These functions are protected under the cfid_list_lock spin-lock
to make sure we do not race creating new references for cached dirs
with deletion of expired ones.

An successful open_cached_dir() will take out 2 references to the cfid if
this was the very first and successful call to open the directory and
it acquired a lease from the server.
One reference is for the lease and the other is for the cfid that we
return. The is lease reference is tracked by cfid->has_lease.
If the directory already has a handle with an active lease, then we just
take out one new reference for the cfid and return it.
It can happen that we have a thread that tries to open a cached directory
where we have a cfid already but we do not, yet, have a working lease. In
this case we will just return NULL, and this the caller will fall back to
the case when no handle was available.

In this model the total number of references we have on a cfid is
1 for while the handle is open and we have a lease, and one additional
reference for each open instance of a cfid.

Once we get a lease break (cached_dir_lease_break()) we remove the
cfid from the list under the spinlock. This prevents any new threads to
use it, and we also call smb2_cached_lease_break() via the work_queue
in order to drop the reference we got for the lease (we drop it outside
of the spin-lock.)
Anytime a thread calls close_cached_dir() we also drop a reference to the
cfid.
When the last reference to the cfid is released smb2_close_cached_fid()
will be invoked which will drop the reference ot the dentry we held for
this cfid and it will also, if we the handle is open/has a lease
also call SMB2_close() to close the handle on the server.

Two events require special handling:
invalidate_all_cached_dirs() this function is called from SMB2_tdis()
and cifs_mark_open_files_invalid().
In both cases the tcon is either gone already or will be shortly so
we do not need to actually close the handles. They will be dropped
server side as part of the tcon dropping.
But we have to be careful about a potential race with a concurrent
lease break so we need to take out additional refences to avoid the
cfid from being freed while we are still referencing it.

free_cached_dirs() which is called from tconInfoFree().
This is called quite late in the umount process so there should no longer
be any open handles or files and we can just free all the remaining data.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: prevent copying past input buffer boundaries

Prevent copying past @data buffer in smb2_validate_and_copy_iov() as
the output buffer in @iov might be potentially bigger and thus copying
more bytes than requested in @minbufsize.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: fix uninitialised var in smb2_compound_op()

Fix uninitialised variable @idata when calling smb2_compound_op() with
SMB2_OP_POSIX_QUERY_INFO.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: improve symlink handling for smb2+

When creating inode for symlink, the client used to send below
requests to fill it in:

    * create+query_info+close (STATUS_STOPPED_ON_SYMLINK)
    * create(+reparse_flag)+query_info+close (set file attrs)
    * create+ioctl(get_reparse)+close (query reparse tag)

and then for every access to the symlink dentry, the ->link() method
would send another:

    * create+ioctl(get_reparse)+close (parse symlink)

So, in order to improve:

    (i) Get rid of unnecessary roundtrips and then resolve symlinks as
follows:

        * create+query_info+close (STATUS_STOPPED_ON_SYMLINK +
                           parse symlink + get reparse tag)
        * create(+reparse_flag)+query_info+close (set file attrs)

    (ii) Set the resolved symlink target directly in inode->i_link and
         use simple_get_link() for ->link() to simply return it.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

smb3: clarify multichannel warning

When server does not return network interfaces, clarify the
message to indicate that "multichannel not available" not just
that "empty network interface returned by server ..."

Suggested-by: Tom Talpey <[email protected]>
Reviewed-by: Bharath SM <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: fix regression in very old smb1 mounts

BZ: 215375

Fixes: 76a3c92ec9e0 ("cifs: remove support for NTLM and weaker authentication algorithms")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

vdpa/ifcvf: add reviewer

Zhu Lingshan has been writing and reviewing ifcvf patches for
a while now, add as reviewer.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Zhu Lingshan <[email protected]>
Acked-by: Jason Wang <[email protected]>

virtio_pci: use irq to detect interrupt support

commit 71491c54eafa ("virtio_pci: don't try to use intxif pin is zero")
breaks virtio_pci on powerpc, when running as a qemu guest.

vp_find_vqs() bails out because pci_dev->pin == 0.

But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would
succeed if we called it - which is what the code used to do.

This seems to happen because pci_dev->pin is not populated in
pci_assign_irq(). A PCI core bug? Maybe.

However Linus said:
I really think that that is basically the only time you should use
that 'pci_dev->pin' thing: it basically exists not for "does this
device have an IRQ", but for "what is the routing of this irq on this
device".

and
The correct way to check for "no irq" doesn't use NO_IRQ at all, it just does
if (dev->irq) ...

so let's just check irq and be done with it.

Suggested-by: Linus Torvalds <[email protected]>
Reported-by: Michael Ellerman <[email protected]>
Fixes: 71491c54eafa ("virtio_pci: don't try to use intxif pin is zero")
Cc: "Angus Chen" <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Tested-by: Michael Ellerman <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20221012220312 [email protected]>

powerpc/pseries: Fix CONFIG_DTL=n build

The recently moved dtl code must be compiled-in if
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y even if CONFIG_DTL=n.

Fixes: 6ba5aa541aaa0 ("powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context

It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).

This also adds comment to the code that actually does need to clear the
flag.

Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Reported-by: Sachin Sant <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'wireless-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
More wireless fixes for 6.1

This has only the fixes for the scan parsing issues.

* tag 'wireless-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: update hidden BSSes to avoid WARN_ON
  wifi: mac80211: fix crash in beacon protection for P2P-device
  wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
  wifi: cfg80211: avoid nontransmitted BSS list corruption
  wifi: cfg80211: fix BSS refcounting bugs
  wifi: cfg80211: ensure length byte is present before access
  wifi: mac80211: fix MBSSID parsing use-after-free
  wifi: cfg80211/mac80211: reject bad MBSSID elements
  wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans()
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'cve-fixes-2022-10-13'

Pull in the fixes for various scan parsing bugs found by
Sönke Huster by fuzzing.

RISC-V: Avoid dereferening NULL regs in die()

I don't think we can actually die() without a regs pointer, but the
compiler was warning about a NULL check after a dereference. It seems
prudent to just avoid the possibly-NULL dereference, given that when
die()ing the system is already toast so who knows how we got there.

Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

highmem: fix kmap_to_page() for kmap_local_page() addresses

kmap_to_page() is used to get the page for a virtual address which may
be kmap'ed.  Unfortunately, kmap_local_page() stores mappings in a
thread local array separate from kmap().  These mappings were not
checked by the call.

Check the kmap_local_page() mappings and return the page if found.

Because it is intended to remove kmap_to_page() add a warn on once to
the kmap checks to flag potential issues early.

NOTE Due to 32bit x86 use of kmap local in iomap atmoic, KMAP_LOCAL does
not require HIGHMEM to be set.  Therefore the support calls required a
new KMAP_LOCAL section to fix 0day build errors.

[[email protected]: fix warning]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ira Weiny <[email protected]>
Reported-by: Al Viro <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: "Fabio M. De Francesco" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page

PGFREE and PGALLOC represent the number of freed and allocated pages. So
the page order must be considered.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Yafang Shao <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/selftest: uffd: explain the write missing fault check

It's not obvious why we had a write check for each of the missing
messages, especially when it should be a locking op. Add a rich comment
for that, and also try to explain its good side and limitations, so that
if someone hit it again for either a bug or a different glibc impl
there'll be some clue to start with.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: use hugetlb_pte_stable in migration race check

After hugetlb_pte_stable() introduced, we can also rewrite the migration
race condition against page allocation to use the new helper too.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: fix race condition of uffd missing/minor handling

Patch series "mm/hugetlb: Fix selftest failures with write check", v3.

Currently akpm mm-unstable fails with uffd hugetlb private mapping test
randomly on a write check.

The initial bisection of that points to the recent pmd unshare series, but
it turns out there's no direction relationship with the series but only
some timing change caused the race to start trigger.

The race should be fixed in patch 1.  Patch 2 is a trivial cleanup on the
similar race with hugetlb migrations, patch 3 comment on the write check
so when anyone read it again it'll be clear why it's there.

This patch (of 3):

After the recent rework patchset of hugetlb locking on pmd sharing,
kselftest for userfaultfd sometimes fails on hugetlb private tests with
unexpected write fault checks.

It turns out there's nothing wrong within the locking series regarding
this matter, but it could have changed the timing of threads so it can
trigger an old bug.

The real bug is when we call hugetlb_no_page() we're not with the pgtable
lock.  It means we're reading the pte values lockless.  It's perfectly
fine in most cases because before we do normal page allocations we'll take
the lock and check pte_same() again.  However before that, there are
actually two paths on userfaultfd missing/minor handling that may directly
move on with the fault process without checking the pte values.

It means for these two paths we may be generating an uffd message based on
an unstable pte, while an unstable pte can legally be anything as long as
the modifier holds the pgtable lock.

One example, which is also what happened in the failing kselftest and
caused the test failure, is that for private mappings wr-protection
changes can happen on one page.  While hugetlb_change_protection()
generally requires pte being cleared before being changed, then there can
be a race condition like:

        thread 1                              thread 2
        --------                              --------

      UFFDIO_WRITEPROTECT                     hugetlb_fault
        hugetlb_change_protection
          pgtable_lock()
          huge_ptep_modify_prot_start
                                              pte==NULL
                                              hugetlb_no_page
                                                generate uffd missing event
                                                even if page existed!!
          huge_ptep_modify_prot_commit
          pgtable_unlock()

Fix this by rechecking the pte after pgtable lock for both userfaultfd
missing & minor fault paths.

This bug should have been around starting from uffd hugetlb introduced, so
attaching a Fixes to the commit.  Also attach another Fixes to the minor
support commit for easier tracking.

Note that userfaultfd is actually fine with false positives (e.g.  caused
by pte changed), but not wrong logical events (e.g.  caused by reading a
pte during changing).  The latter can confuse the userspace, so the
strictness is very much preferred.  E.g., MISSING event should never
happen on the page after UFFDIO_COPY has correctly installed the page and
returned.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Peter Xu <[email protected]>
Co-developed-by: Mike Kravetz <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

zram: always expose rw_page

Currently zram will adjust its fops to a version which does not contain
rw_page when a backing device has been assigned. This is done to prevent
upper layers from assuming a synchronous operation when a page may have
been written back. This forces every operation through bio which has
overhead associated with bio_alloc/frees.

The code can be simplified to always expose an rw_page method and only in
the rare event that a page is written back we instead will return
-EOPNOTSUPP forcing the upper layer to fallback to bio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Brian Geffon <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Rom Lemarchand <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

LoongArch: update local TLB if PTE entry exists

Currently, the implementation of update_mmu_tlb() is empty if
__HAVE_ARCH_UPDATE_MMU_TLB is not defined. Then if two threads
concurrently fault at the same page, the second thread that did not win
the race will give up and do nothing. In the LoongArch architecture, this
second thread will trigger another fault, and only updates its local TLB.

Instead of triggering another fault, it's better to implement
update_mmu_tlb() to directly update the local TLB of the second thread.
Just do it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Suggested-by: Bibo Mao <[email protected]>
Acked-by: Huacai Chen <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: use update_mmu_tlb() on the second thread

As message in commit 7df676974359 ("mm/memory.c: Update local TLB if PTE
entry exists") said, we should update local TLB only on the second thread.
So in the do_anonymous_page() here, we should use update_mmu_tlb()
instead of update_mmu_cache() on the second thread.

As David pointed out, this is a performance improvement, not a
correctness fix.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Bibo Mao <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Max Filippov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kasan: fix array-bounds warnings in tests

GCC's -Warray-bounds option detects out-of-bounds accesses to
statically-sized allocations in krealloc out-of-bounds tests.

Use OPTIMIZER_HIDE_VAR to suppress the warning.

Also change kmalloc_memmove_invalid_size to use OPTIMIZER_HIDE_VAR
instead of a volatile variable.

Link: https://lkml.kernel.org/r/e94399242d32e00bba6fd0d9ec4c897f188128e8.1664215688.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

hmm-tests: add test for migrate_device_range()

Link: https://lkml.kernel.org/r/a73cf109de0224cfd118d22be58ddebac3ae2897.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nouveau/dmem: evict device private memory during release

When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to still be mapped in currently running
processes. This can lead to a hangs and RCU stall warnings when unbinding
the device as memunmap_pages() will wait in an uninterruptible state until
all device pages have been freed which may never happen.

Fix this by migrating device mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.

Link: https://lkml.kernel.org/r/66277601fb8fda9af408b33da9887192bf895bda.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nouveau/dmem: refactor nouveau_dmem_fault_copy_one()

nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
the migrate_to_ram() callback and is used to copy data from GPU to CPU
memory. It is currently specific to fault handling, however a future
patch implementing eviction of data during teardown needs similar
functionality.

Refactor out the core functionality so that it is not specific to fault
handling.

Link: https://lkml.kernel.org/r/20573d7b4e641a78fde9935f948e64e71c9e709e.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/migrate_device.c: add migrate_device_range()

Device drivers can use the migrate_vma family of functions to migrate
existing private anonymous mappings to device private pages.  These pages
are backed by memory on the device with drivers being responsible for
copying data to and from device memory.

Device private pages are freed via the pgmap->page_free() callback when
they are unmapped and their refcount drops to zero.  Alternatively they
may be freed indirectly via migration back to CPU memory in response to a
pgmap->migrate_to_ram() callback called whenever the CPU accesses an
address mapped to a device private page.

In other words drivers cannot control the lifetime of data allocated on
the devices and must wait until these pages are freed from userspace.
This causes issues when memory needs to reclaimed on the device, either
because the device is going away due to a ->release() callback or because
another user needs to use the memory.

Drivers could use the existing migrate_vma functions to migrate data off
the device.  However this would require them to track the mappings of each
page which is both complicated and not always possible.  Instead drivers
need to be able to migrate device pages directly so they can free up
device memory.

To allow that this patch introduces the migrate_device family of functions
which are functionally similar to migrate_vma but which skips the initial
lookup based on mapping.

Link: https://lkml.kernel.org/r/868116aab70b0c8ee467d62498bb2cf0ef907295.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()

migrate_device_coherent_page() reuses the existing migrate_vma family of
functions to migrate a specific page without providing a valid mapping or
vma.  This looks a bit odd because it means we are calling migrate_vma_*()
without setting a valid vma, however it was considered acceptable at the
time because the details were internal to migrate_device.c and there was
only a single user.

One of the reasons the details could be kept internal was that this was
strictly for migrating device coherent memory.  Such memory can be copied
directly by the CPU without intervention from a driver.  However this
isn't true for device private memory, and a future change requires similar
functionality for device private memory.  So refactor the code into
something more sensible for migrating device memory without a vma.

Link: https://lkml.kernel.org/r/c7b2ff84e9b33d022cf4a40f87d051f281a16d8f.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/memremap.c: take a pgmap reference on page allocation

ZONE_DEVICE pages have a struct dev_pagemap which is allocated by a
driver.  When the struct page is first allocated by the kernel in
memremap_pages() a reference is taken on the associated pagemap to ensure
it is not freed prior to the pages being freed.

Prior to 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") pages were considered free and returned to the driver when the
reference count dropped to one.  However the pagemap reference was not
dropped until the page reference count hit zero.  This would occur as part
of the final put_page() in memunmap_pages() which would wait for all pages
to be freed prior to returning.

When the extra refcount was removed the pagemap reference was no longer
being dropped in put_page().  Instead memunmap_pages() was changed to
explicitly drop the pagemap references.  This means that memunmap_pages()
can complete even though pages are still mapped by the kernel which can
lead to kernel crashes, particularly if a driver frees the pagemap.

To fix this drivers should take a pagemap reference when allocating the
page.  This reference can then be returned when the page is freed.

Link: https://lkml.kernel.org/r/12d155ec727935ebfbb4d639a03ab374917ea51b.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Fixes: 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page refcount")
Cc: Jason Gunthorpe <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: free device private pages have zero refcount

Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use.  However before handing them back to the
owning device driver we add an extra reference count such that free pages
have a reference count of one.

This makes it difficult to tell if a page is free or not because both free
and in use pages will have a non-zero refcount.  Instead we should return
pages to the drivers page allocator with a zero reference count.  Kernel
code can then safely use kernel functions such as get_page_unless_zero().

Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Alex Sierra <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>