cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
syzbot is reporting circular locking dependency between cpu_hotplug_lock
and freezer_mutex, for commit f5d39b020809 ("freezer,sched: Rewrite core
freezer logic") replaced atomic_inc() in freezer_apply_state() with
static_branch_inc() which holds cpu_hotplug_lock.
It was found that commit 7a2127e66a00 ("cpuset: Call
set_cpus_allowed_ptr() with appropriate mask for task") introduced a bug
that corrupted "cpuset.cpus" of a partition root when it was updated.
It is because the tmp->new_cpus field of the passed tmp parameter
of update_parent_subparts_cpumask() should not be used at all as
it contains important cpumask data that should not be overwritten.
Fix it by using tmp->addmask instead.
Also update update_cpumask() to make sure that trialcs->cpu_allowed
will not be corrupted until it is no longer needed.
Fixes: 7a2127e66a00 ("cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task") Signed-off-by: Waiman Long <[email protected]> Cc: [email protected] # v6.2+ Signed-off-by: Tejun Heo <[email protected]>
Linus Torvalds [Fri, 17 Mar 2023 18:20:27 +0000 (11:20 -0700)]
Merge tag 'block-6.3-2023-03-16' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"A bit bigger than usual, as the NVMe pull request missed last weeks
submission. In detail:
- NVMe pull request via Christoph:
- Avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- More quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- Fix a memory leak in the nvme-pci probe teardown path
(Irvin Cote)
- Repair the MAINTAINERS entry (Lukas Bulwahn)
- Fix handling single range discard request (Ming Lei)
- Show more opcode names in trace events (Minwoo Im)
- Fix nvme-tcp timeout reporting (Sagi Grimberg)
- MD pull request via Song:
- Two fixes for old issues (Neil)
- Resource leak in device stopping (Xiao)
- Fix for reversal of request ordering upon issue for certain cases
(Jan)
- null_blk timeout fixes (Damien)
- Loop use-after-free fix (Bart)
- blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Chris)"
* tag 'block-6.3-2023-03-16' of git://git.kernel.dk/linux:
block: remove obsolete config BLOCK_COMPAT
md: select BLOCK_LEGACY_AUTOLOAD
block: count 'ios' and 'sectors' when io is done for bio-based device
block: sunvdc: add check for mdesc_grab() returning NULL
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
block: null_blk: cleanup null_queue_rq()
block: null_blk: Fix handling of fake timeout request
blk-mq: fix "bad unlock balance detected" on q->srcu in __blk_mq_run_dispatch_ops
loop: Fix use-after-free issues
block: do not reverse request order when flushing plug list
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
Linus Torvalds [Fri, 17 Mar 2023 18:12:07 +0000 (11:12 -0700)]
Merge tag 'io_uring-6.3-2023-03-16' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- When PF_NO_SETAFFINITY was removed for io-wq threads, we kind of
forgot about the SQPOLL thread. Remove it there as well, there's even
less of a reason to set it there (Michal)
- Fixup a confusing 'ret' setting (Li)
- When MSG_RING is used to send a direct descriptor to another ring,
it's possible to have it allocate it on the target ring rather than
provide a specific index for it. If this is done, return the chosen
value in the CQE, like we would've done locally (Pavel)
- Fix a regression in this series on huge page bvec collapsing (Pavel)
* tag 'io_uring-6.3-2023-03-16' of git://git.kernel.dk/linux:
io_uring/rsrc: fix folio accounting
io_uring/msg_ring: let target know allocated index
io_uring: rsrc: Optimize return value variable 'ret'
io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads
Linus Torvalds [Fri, 17 Mar 2023 18:02:26 +0000 (11:02 -0700)]
Merge tag 'pm-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an error code path issue in a cpuidle driver and make the
sleepgraph utility more robust against unexpected input.
Specifics:
- Fix the psci_pd_init_topology() failure path in the PSCI cpuidle
driver (Shawn Guo)
- Modify the sleepgraph utility so it does not crash on binary data
in device names (Todd Brandt)"
* tag 'pm-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
pm-graph: sleepgraph: Avoid crashing on binary data in device names
cpuidle: psci: Iterate backwards over list in psci_pd_remove()
Linus Torvalds [Fri, 17 Mar 2023 17:57:09 +0000 (10:57 -0700)]
Merge tag 'acpi-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These add some new quirks, fix PPTT handling, fix an ACPI utility and
correct a mistake in the ACPI documentation.
Specifics:
- Fix ACPI PPTT handling to avoid sleep in the atomic context when it
is not present (Sudeep Holla)
- Add 'backlight=native' DMI quirk for Dell Vostro 15 3535 to the
ACPI video driver (Chia-Lin Kao)
- Add ACPI quirks for I2C device enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede)
- Fix handling of invalid command line option values in the ACPI
pfrut utility (Chen Yu)
- Fix references to I2C device data type in the ACPI documentation
for device enumeration (Andy Shevchenko)"
* tag 'acpi-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
ACPI: PPTT: Fix to avoid sleep in the atomic context when PPTT is absent
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
ACPI: docs: enumeration: Correct reference to the I²C device data type
Linus Torvalds [Fri, 17 Mar 2023 17:33:33 +0000 (10:33 -0700)]
Merge tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- fixes to the ASID allocator to avoid leaking stale mappings between
tasks
- fix the vmalloc fault handler to tolerate huge pages
* tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: mm: Support huge page in vmalloc_fault()
riscv: asid: Fixup stale TLB entry cause application crash
Revert "riscv: mm: notify remote harts about mmu cache updates"
Linus Torvalds [Fri, 17 Mar 2023 17:15:53 +0000 (10:15 -0700)]
Merge tag 's390-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Update defconfigs
- Fix early boot code by adding missing intersection check to prevent
potential overwriting of the ipl report
- Fix a use-after-free issue in s390-specific code related to PCI
resources being retained after hot-unplugging individual functions,
by removing the resources from the PCI bus's resource list and using
the zpci_bar_struct's resource pointer directly
* tag 's390-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
PCI: s390: Fix use-after-free of PCI resources with per-function hotplug
s390/ipl: add missing intersection check to ipl_report handling
Linus Torvalds [Fri, 17 Mar 2023 17:01:07 +0000 (10:01 -0700)]
Merge tag 'powerpc-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix false detection of read faults, introduced by execute-only
support
- Fix a build failure when GENERIC_ALLOCATOR is not selected
Thanks to Russell Currey, Randy Dunlap, Michal Suchánek, Nathan Lynch,
and Benjamin Gray.
* tag 'powerpc-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix false detection of read faults
powerpc/pseries: RTAS work area requires GENERIC_ALLOCATOR
Linus Torvalds [Fri, 17 Mar 2023 16:43:10 +0000 (09:43 -0700)]
Merge tag 'sound-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Nothing surprising, a collection of small device-specific fixes.
The majority of changes are for ASoC Intel stuff, while a few other
ASoC and HD-audio fixes are found"
* tag 'sound-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
ALSA: asihpi: check pao in control_message()
ASoC: hdmi-codec: only startup/shutdown on supported streams
ASoC: da7219: Initialize jack_det_mutex
ALSA: hda: Match only Intel devices with CONTROLLER_IN_GPU()
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro
ALSA: hda/realtek: fix speaker, mute/micmute LEDs not work on a HP platform
ALSA: hda: intel-dsp-config: add MTL PCI id
ASoC: SOF: IPC4: update gain ipc msg definition to align with fw
ASoC: SOF: sof-audio: don't squelch errors in WIDGET_SETUP phase
ASoC: SOF: Intel: hda-ctrl: re-add sleep after entering and exiting reset
ASoC: SOF: Intel: hda-dsp: harden D0i3 programming sequence
ASoC: SOF: ipc4-topology: set dmic dai index from copier
ASoC: SOF: sof-audio: Fix broken early bclk feature for SSP
ASoC: SOF: Intel: pci-tng: revert invalid bar size setting
ASoC: SOF: topology: Fix error handling in sof_widget_ready()
ASoC: Intel: soc-acpi: fix copy-paste issue in topology names
ASoC: SOF: ipc4-topology: Fix incorrect sample rate print unit
ASoC: SOF: ipc3: Check for upper size limit for the received message
ASOC: SOF: Intel: pci-tgl: Fix device description
...
Linus Torvalds [Fri, 17 Mar 2023 16:35:40 +0000 (09:35 -0700)]
Merge tag 'drm-fixes-2023-03-17' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Seems like a pretty regular rc3, i915 and amdgpu with the usual
selection of fixes, then a scattering of fixes across misc drivers and
other areas:
i915:
- Fix hwmon PL1 power limit enabling
- Fix audio ELD handling for DP MST
- Fix PSR io and wake line calculations
- Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks
- Fix SSEU subslice out-of-bounds access
- Fix misuse of non-idle barriers as fence trackers
amdgpu:
- SMU 13 update
- RDNA2 suspend/resume fix when overclocking is enabled
- SRIOV VCN fixes
- HDCP suspend/resume fix
- Fix drm polling splat regression
- Fix dirty rectangle tracking for PSR
- Fix vangogh regression on certain BIOSes
- Misc display fixes
- Suspend/resume IOMMU regression fix
amdkfd:
- Fix BO offset for multi-VMA page migration
- Fix a possible double free
- Fix potential use after free
- Fix process cleanup on module exit
bridge:
- fix returned array size name documentation
fbdev:
- ref-counting fix for fbdev deferred I/O
virtio:
- dma sync fix
shmem-helper:
- error path fix
msm:
- shrinker blocking fix
panfrost:
- shrinker rpm fix
chipsfb:
- fix error code
meson:
- fix 1px pink line
- fix regulator interaction
sun4i:
- fix missing component unbind"
* tag 'drm-fixes-2023-03-17' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs
drm/amdgpu: Don't resume IOMMU after incomplete init
drm/amdkfd: Fixed kfd_process cleanup on module exit.
drm/amd/display: disconnect MPCC only on OTG change
drm/amd/display: Fix DP MST sinks removal issue
drm/amd/display: Do not set DRR on pipe Commit
drm/amd/display: Remove OTG DIV register write for Virtual signals.
drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion again
drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc
drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
drm/amdgpu/nv: fix codec array for SR_IOV
drm/amd/display: Write to correct dirty_rect
drm/amdgpu: move poll enabled/disable into non DC path
drm/amd/display: Fix HDCP failing to enable after suspend
drm/amdkfd: fix potential kgd_mem UAFs
drm/amdgpu/vcn: custom video info caps for sriov
drm/amd/pm: Fix sienna cichlid incorrect OD volage after resume
drm/amd/pm: bump SMU 13.0.4 driver_if header version
drm/amdkfd: fix a potential double free in pqm_create_queue
drm/amdkfd: Get prange->offset after svm_range_vram_node_new
...
Linus Torvalds [Fri, 17 Mar 2023 16:30:57 +0000 (09:30 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Ten patches, eight in drivers and two in the core, which correct a
regression from directory removal and add a no VPD size quirk also to
fix a regression. All pretty small"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Use active_reqs to check busy in clock scaling
scsi: core: Fix a procfs host directory removal regression
scsi: core: Add BLIST_NO_VPD_SIZE for some VDASD
scsi: mpi3mr: Fix expander node leak in mpi3mr_remove()
scsi: mpi3mr: Fix memory leaks in mpi3mr_init_ioc()
scsi: mpi3mr: Fix sas_hba.phy memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix mpi3mr_hba_port memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix config page DMA memory leak
scsi: mpi3mr: Fix throttle_groups memory leak
scsi: mpt3sas: Fix NULL pointer access in mpt3sas_transport_port_add()
Merge branches 'acpi-video', 'acpi-x86', 'acpi-tools' and 'acpi-docs'
Merge a new ACPI backlight quirk, new ACPI quirks for I2C device
enumeration on some platforms, a pfrut utility fix and an ACPI
documentation fix for 6.3-rc3:
- Add backlight=native DMI quirk for Dell Vostro 15 3535 to the ACPI
video driver (Chia-Lin Kao).
- Add ACPI quirks for I2C devices enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede).
- Fix handling of invalid command line option values in the ACPI pfrut
utility (Chen Yu).
- Fix references to I2C device data type in the ACPI documentation for
device enumeration (Andy Shevchenko).
Dave Airlie [Fri, 17 Mar 2023 02:03:57 +0000 (12:03 +1000)]
Merge tag 'drm-intel-fixes-2023-03-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v6.3-rc3:
- Fix hwmon PL1 power limit enabling
- Fix audio ELD handling for DP MST
- Fix PSR io and wake line calculations
- Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks
- Fix SSEU subslice out-of-bounds access
- Fix misuse of non-idle barriers as fence trackers
Linus Torvalds [Thu, 16 Mar 2023 22:06:16 +0000 (15:06 -0700)]
Merge tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Seven cifs/smb3 client fixes, all also for stable:
- four DFS fixes
- multichannel reconnect fix
- fix smb1 stats for cancel command
- fix for set file size error path"
* tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: use DFS root session instead of tcon ses
cifs: return DFS root session id in DebugData
cifs: fix use-after-free bug in refresh_cache_worker()
cifs: set DFS root session in cifs_get_smb_ses()
cifs: generate signkey for the channel that's reconnecting
cifs: Fix smb2_set_path_size()
cifs: Move the in_send statistic to __smb_send_rqst()
Linus Torvalds [Thu, 16 Mar 2023 18:32:12 +0000 (11:32 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Address a rather annoying bug w.r.t. guest timer offsetting. The
synchronization of timer offsets between vCPUs was broken, leading
to inconsistent timer reads within the VM.
x86:
- New tests for the slow path of the EVTCHNOP_send Xen hypercall
- Add missing nVMX consistency checks for CR0 and CR4
- Fix bug that broke AMD GATag on 512 vCPU machines
Selftests:
- Skip hugetlb tests if huge pages are not available
- Sync KVM exit reasons"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Sync KVM exit reasons in selftests
KVM: selftests: Add macro to generate KVM exit reason strings
KVM: selftests: Print expected and actual exit reason in KVM exit reason assert
KVM: selftests: Make vCPU exit reason test assertion common
KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
KVM: selftests: Use enum for test numbers in xen_shinfo_test
KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls
KVM: selftests: Move the guts of kvm_hypercall() to a separate macro
KVM: SVM: WARN if GATag generation drops VM or vCPU ID information
KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs
KVM: SVM: Fix a benign off-by-one bug in AVIC physical table mask
selftests: KVM: skip hugetlb tests if huge pages are not available
KVM: VMX: Use tabs instead of spaces for indentation
KVM: VMX: Fix indentation coding style issue
KVM: nVMX: remove unnecessary #ifdef
KVM: nVMX: add missing consistency checks for CR0 and CR4
KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
Lukas Bulwahn [Thu, 16 Mar 2023 11:16:30 +0000 (12:16 +0100)]
block: remove obsolete config BLOCK_COMPAT
Before commit bdc1ddad3e5f ("compat_ioctl: block: move
blkdev_compat_ioctl() into ioctl.c"), the config BLOCK_COMPAT was used to
include compat_ioctl.c into the kernel build. With this commit, the code
is moved into ioctl.c and included with the config COMPAT. So, since then,
the config BLOCK_COMPAT has no effect and any further purpose.
Mark reports an issue with the recent patches coalescing compound pages
while registering them in io_uring. The reason is that we try to drop
excessive references with folio_put_refs(), but pages were acquired
with pin_user_pages(), which has extra accounting and so should be put
down with matching unpin_user_pages() or at least gup_put_folio().
As a fix unpin_user_pages() all but first page instead, and let's figure
out a better API after.
Pavel Begunkov [Thu, 16 Mar 2023 12:11:42 +0000 (12:11 +0000)]
io_uring/msg_ring: let target know allocated index
msg_ring requests transferring files support auto index selection via
IORING_FILE_INDEX_ALLOC, however they don't return the selected index
to the target ring and there is no other good way for the userspace to
know where is the receieved file.
Return the index for allocated slots and 0 otherwise, which is
consistent with other fixed file installing requests.
Jens Axboe [Thu, 16 Mar 2023 13:01:48 +0000 (07:01 -0600)]
Merge tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme into block-6.3
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
- repair the MAINTAINERS entry (Lukas Bulwahn)
- fix handling single range discard request (Ming Lei)
- show more opcode names in trace events (Minwoo Im)
- fix nvme-tcp timeout reporting (Sagi Grimberg)"
* tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme:
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
Felix Kuehling [Tue, 14 Mar 2023 00:03:08 +0000 (20:03 -0400)]
drm/amdgpu: Don't resume IOMMU after incomplete init
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other
kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
KFD IOMMU initialization failed.
Cruise Hung [Thu, 2 Mar 2023 02:33:51 +0000 (10:33 +0800)]
drm/amd/display: Fix DP MST sinks removal issue
[Why]
In USB4 DP tunneling, it's possible to have this scenario that
the path becomes unavailable and CM tears down the path a little bit late.
So, in this case, the HPD is high but fails to read any DPCD register.
That causes the link connection type to be set to sst.
And not all sinks are removed behind the MST branch.
[How]
Restore the link connection type if it fails to read DPCD register.
Saaem Rizvi [Mon, 27 Feb 2023 23:55:07 +0000 (18:55 -0500)]
drm/amd/display: Remove OTG DIV register write for Virtual signals.
[WHY]
Hot plugging and then hot unplugging leads to k1 and k2 values to
change, as signal is detected as a virtual signal on hot unplug. Writing
these values to OTG_PIXEL_RATE_DIV register might cause primary display
to blank (known hw bug).
[HOW]
No longer write k1 and k2 values to register if signal is virtual, we
have safe guards in place in the case that k1 and k2 is unassigned so
that an unknown value is not written to the register either.
Linus Torvalds [Wed, 15 Mar 2023 19:20:37 +0000 (12:20 -0700)]
Merge tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in
kselftest common lib.mk"
* tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix LLVM build for i386 and x86_64
selftests: amd-pstate: fix TEST_FILES
Jens Axboe [Wed, 15 Mar 2023 18:18:07 +0000 (12:18 -0600)]
Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.3
Pull MD fixes from Song:
"This set contains two fixes for old issues (by Neil) and one fix
for 6.3 (by Xiao)."
* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: select BLOCK_LEGACY_AUTOLOAD
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
NeilBrown [Mon, 13 Mar 2023 20:29:17 +0000 (13:29 -0700)]
md: select BLOCK_LEGACY_AUTOLOAD
When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to
activate new arrays unless "CREATE names=yes" appears in
mdadm.conf
As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD
for when MD is selected - at least until mdadm is updated and the
updates widely available.
Yu Kuai [Thu, 23 Feb 2023 09:12:26 +0000 (17:12 +0800)]
block: count 'ios' and 'sectors' when io is done for bio-based device
While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.
Damien Le Moal [Mon, 6 Mar 2023 01:13:13 +0000 (10:13 +0900)]
nvmet: avoid potential UAF in nvmet_req_complete()
An nvme target ->queue_response() operation implementation may free the
request passed as argument. Such implementation potentially could result
in a use after free of the request pointer when percpu_ref_put() is
called in nvmet_req_complete().
Avoid such problem by using a local variable to save the sq pointer
before calling __nvmet_req_complete(), thus avoiding dereferencing the
req pointer after that function call.
Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
Sagi Grimberg [Mon, 13 Mar 2023 08:56:22 +0000 (10:56 +0200)]
nvme-tcp: fix opcode reporting in the timeout handler
For non in-capsule writes we reuse the request pdu space for a h2cdata
pdu in order to avoid over allocating space (either preallocate or
dynamically upon receving an r2t pdu). However if the request times out
the core expects to find the opcode in the start of the request, which
we override.
In order to prevent that, without sacrificing additional 24 bytes per
request, we just use the tail of the command pdu space instead (last
24 bytes from the 72 bytes command pdu). That should make the command
opcode always available, and we get away from allocating more space.
If in the future we would need the last 24 bytes of the nvme command
available we would need to allocate a dedicated space for it in the
request, but until then we can avoid doing so.
Ming Lei [Fri, 3 Mar 2023 23:13:45 +0000 (07:13 +0800)]
nvme: fix handling single range discard request
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.
Fix the issue by building range from request sector/nr_sectors directly.
Fixes: b35ba01ea697 ("nvme: support ranged discard requests") Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
Michal Koutný [Tue, 14 Mar 2023 18:33:32 +0000 (19:33 +0100)]
io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads
Users may specify a CPU where the sqpoll thread would run. This may
conflict with cpuset operations because of strict PF_NO_SETAFFINITY
requirement. That flag is unnecessary for polling "kernel" threads, see
the reasoning in commit 01e68ce08a30 ("io_uring/io-wq: stop setting
PF_NO_SETAFFINITY on io-wq workers"). Drop the flag on poll threads too.
Damien Le Moal [Tue, 14 Mar 2023 04:11:06 +0000 (13:11 +0900)]
block: null_blk: cleanup null_queue_rq()
Use a local struct request pointer variable to avoid having to
dereference struct blk_mq_queue_data multiple times. While at it, also
fix the function argument indentation and remove a useless "else" after
a return.
Damien Le Moal [Tue, 14 Mar 2023 04:11:05 +0000 (13:11 +0900)]
block: null_blk: Fix handling of fake timeout request
When injecting a fake timeout into the null_blk driver using
fail_io_timeout, the request timeout handler does not execute
blk_mq_complete_request(), so the complete callback is never executed
for a timedout request.
The null_blk driver also has a driver-specific fake timeout mechanism
which does not have this problem. Fix the problem with fail_io_timeout
by using the same meachanism as null_blk internal timeout feature, using
the fake_timeout field of null_blk commands.
Russell Currey [Fri, 10 Mar 2023 05:08:34 +0000 (16:08 +1100)]
powerpc/mm: Fix false detection of read faults
To support detection of read faults with Radix execute-only memory, the
vma_is_accessible() check in access_error() (which checks for PROT_NONE)
was replaced with a check to see if VM_READ was missing, and if so,
returns true to assert the fault was caused by a bad read.
This is incorrect, as it ignores that both VM_WRITE and VM_EXEC imply
read on powerpc, as defined in protection_map[]. This causes mappings
containing VM_WRITE or VM_EXEC without VM_READ to misreport the cause of
page faults, since the MMU is still allowing reads.
Correct this by restoring the original vma_is_accessible() check for
PROT_NONE mappings, and adding a separate check for Radix PROT_EXEC-only
mappings.
drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion again
devm_regulator_get_enable_optional() returns -ENODEV if requested
optional regulator is not present. Adjust code for that, because in the 67d0a30128c9 I've incorrectly assumed that it also returns 0 when
regulator is not present.
Liu Ying [Tue, 14 Mar 2023 05:50:35 +0000 (13:50 +0800)]
drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc
The returned array size for input formats is set through
atomic_get_input_bus_fmts()'s 'num_input_fmts' argument, so use
'num_input_fmts' to represent the array size in the function's kdoc,
not 'num_output_fmts'.
Paulo Alcantara [Tue, 14 Mar 2023 23:32:56 +0000 (20:32 -0300)]
cifs: use DFS root session instead of tcon ses
Use DFS root session whenever possible to get new DFS referrals
otherwise we might end up with an IPC tcon (tcon->ses->tcon_ipc) that
doesn't respond to them. It should be safe accessing
@ses->dfs_root_ses directly in cifs_inval_name_dfs_link_error() as it
has same lifetime as of @tcon.
Paulo Alcantara [Tue, 14 Mar 2023 23:32:55 +0000 (20:32 -0300)]
cifs: return DFS root session id in DebugData
Return the DFS root session id in /proc/fs/cifs/DebugData to make it
easier to track which IPC tcon was used to get new DFS referrals for a
specific connection, and aids in debugging.
Linus Torvalds [Wed, 15 Mar 2023 02:32:38 +0000 (19:32 -0700)]
sched_getaffinity: don't assume 'cpumask_size()' is fully initialized
The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good. It is indeed the allocation size of a
cpumask.
But the code also assumes that the whole allocation is initialized
without actually doing so itself. That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.
Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.
See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.
Fix this by just zeroing the allocation before using it. Do the same
for the compat version of sched_getaffinity(), which had the same logic.
Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits. For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'. The compat case already did that.
Dylan Jhong [Fri, 10 Mar 2023 07:50:21 +0000 (15:50 +0800)]
RISC-V: mm: Support huge page in vmalloc_fault()
Since RISC-V supports ioremap() with huge page (pud/pmd) mapping,
However, vmalloc_fault() assumes that the vmalloc range is limited
to pte mappings. To complete the vmalloc_fault() function by adding
huge page support.
Paulo Alcantara [Tue, 14 Mar 2023 23:32:54 +0000 (20:32 -0300)]
cifs: fix use-after-free bug in refresh_cache_worker()
The UAF bug occurred because we were putting DFS root sessions in
cifs_umount() while DFS cache refresher was being executed.
Make DFS root sessions have same lifetime as DFS tcons so we can avoid
the use-after-free bug is DFS cache refresher and other places that
require IPCs to get new DFS referrals on. Also, get rid of mount
group handling in DFS cache as we no longer need it.
This fixes below use-after-free bug catched by KASAN
Paulo Alcantara [Tue, 14 Mar 2023 23:32:53 +0000 (20:32 -0300)]
cifs: set DFS root session in cifs_get_smb_ses()
Set the DFS root session pointer earlier when creating a new SMB
session to prevent racing with smb2_reconnect(), cifs_reconnect_tcon()
and DFS cache refresher.
Chris Leech [Fri, 10 Mar 2023 01:09:13 +0000 (09:09 +0800)]
blk-mq: fix "bad unlock balance detected" on q->srcu in __blk_mq_run_dispatch_ops
The 'q' parameter of the macro __blk_mq_run_dispatch_ops may not be one
local variable, such as, it is rq->q, then request queue pointed by
this variable could be changed to another queue in case of
BLK_MQ_F_TAG_QUEUE_SHARED after 'dispatch_ops' returns, then
'bad unlock balance' is triggered.
Fixes the issue by adding one local variable for doing srcu lock/unlock.
Bart Van Assche [Tue, 14 Mar 2023 18:21:54 +0000 (11:21 -0700)]
loop: Fix use-after-free issues
do_req_filebacked() calls blk_mq_complete_request() synchronously or
asynchronously when using asynchronous I/O unless memory allocation fails.
Hence, modify loop_handle_cmd() such that it does not dereference 'cmd' nor
'rq' after do_req_filebacked() finished unless we are sure that the request
has not yet been completed. This patch fixes the following kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000054
Call trace:
css_put.42938+0x1c/0x1ac
loop_process_work+0xc8c/0xfd4
loop_rootcg_workfn+0x24/0x34
process_one_work+0x244/0x558
worker_thread+0x400/0x8fc
kthread+0x16c/0x1e0
ret_from_fork+0x10/0x20
Linus Torvalds [Wed, 15 Mar 2023 00:13:58 +0000 (17:13 -0700)]
Merge tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Eleven hotfixes.
Four of these are cc:stable and the remainder address post-6.2 issues
or aren't considered suitable for backporting.
Seven of these fixes are for MM"
* tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate()
mm/damon/paddr: fix folio_size() call after folio_put() in damon_pa_young()
ocfs2: fix data corruption after failed write
migrate_pages: try migrate in batch asynchronously firstly
migrate_pages: move split folios processing out of migrate_pages_batch()
migrate_pages: fix deadlock in batched migration
.mailmap: add Alexandre Ghiti personal email address
mailmap: correct Dikshita Agarwal's Qualcomm email address
mailmap: updates for Jarkko Sakkinen
mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage
mm: teach mincore_hugetlb about pte markers
Linus Torvalds [Wed, 15 Mar 2023 00:07:54 +0000 (17:07 -0700)]
Merge tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Do not allow histogram values to have modifies. They can cause a NULL
pointer dereference if they do.
- Warn if hist_field_name() is passed a NULL. Prevent the NULL pointer
dereference mentioned above.
- Fix invalid address look up race in lookup_rec()
- Define ftrace_stub_graph conditionally to prevent linker errors
- Always check if RCU is watching at all tracepoint locations
* tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make tracepoint lockdep check actually test something
ftrace,kcfi: Define ftrace_stub_graph conditionally
ftrace: Fix invalid address access in lookup_rec() when index is 0
tracing: Check field value in hist_field_name()
tracing: Do not let histogram values have some modifiers
Linus Torvalds [Wed, 15 Mar 2023 00:03:25 +0000 (17:03 -0700)]
Merge tag 'zstd-linus-v6.3-rc3' of https://github.com/terrelln/linux
Pull zstd fixes from Nick Terrell:
"A small number of fixes for zstd-v1.5.2.
I'm not pulling in zstd-v1.5.4 from upstream this release because it
didn't have any time to bake in linux-next, but I'm aiming for the
next update in v6.4"
* tag 'zstd-linus-v6.3-rc3' of https://github.com/terrelln/linux:
zstd: Fix definition of assert()
lib: zstd: Backport fix for in-place decompression
lib: zstd: Fix -Wstringop-overflow warning
Linus Torvalds [Tue, 14 Mar 2023 23:58:33 +0000 (16:58 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A collection of clk driver fixes, and a couple OF clk patches to fix
regressions seen in the last few weeks. The fwnode patch broke the
build for one driver that isn't always compiled, so I waited over the
weekend to be certain no more build issues came up.
- Mark the firmware node (fwnode) that matches the compatible in
CLK_OF_DECLARE() as initialized to fix a regression on u8500 SoCs
after fw_devlink stopped checking parent nodes in
of_link_to_phandle()
- Remove a couple MODULE_LICENSE macros in non-modules
- Update the maintainers file for Microchip clk drivers
- Use 'select' instead of 'depend on' for the REGMAP config to fix
Kconfig issues
- Use div_u64() for portable 64-bit division in K210 clk driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Avoid invalid function names in CLK_OF_DECLARE()
clk: k210: remove an implicit 64-bit division
MAINTAINERS: add missing clock driver coverage for Microchip FPGAs
clk: HI655X: select REGMAP instead of depending on it
kbuild, clk: remove MODULE_LICENSE in non-modules
kbuild, clk: bcm2835: remove MODULE_LICENSE in non-modules
clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro
Shyam Prasad N [Fri, 10 Mar 2023 15:32:01 +0000 (15:32 +0000)]
cifs: generate signkey for the channel that's reconnecting
Before my changes to how multichannel reconnects work, the
primary channel was always used to do a non-binding session
setup. With my changes, that is not the case anymore.
Missed this place where channel at index 0 was forcibly
updated with the signing key.
If the tracepoint is enabled, it could trigger RCU issues if called in
the wrong place. And this warning was only triggered if lockdep was
enabled. If the tracepoint was never enabled with lockdep, the bug would
not be caught. To handle this, the above sequence was done when lockdep
was enabled regardless if the tracepoint was enabled or not (although the
always enabled code really didn't do anything, it would still trigger a
warning).
But a lot has changed since that lockdep code was added. One is, that
sequence no longer triggers any warning. Another is, the tracepoint when
enabled doesn't even do that sequence anymore.
The main check we care about today is whether RCU is "watching" or not.
So if lockdep is enabled, always check if rcu_is_watching() which will
trigger a warning if it is not (tracepoints require RCU to be watching).
Note, that old sequence did add a bit of overhead when lockdep was enabled,
and with the latest kernel updates, would cause the system to slow down
enough to trigger kernel "stalled" warnings.
Chen Yu [Wed, 8 Mar 2023 13:23:09 +0000 (21:23 +0800)]
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
The user provides arbitrary non-numeic value to level and type,
which could bring unexpected behavior. In this case the expected
behavior would be to throw an error.
Sudeep Holla [Wed, 8 Mar 2023 11:26:32 +0000 (11:26 +0000)]
ACPI: PPTT: Fix to avoid sleep in the atomic context when PPTT is absent
Commit 0c80f9e165f8 ("ACPI: PPTT: Leave the table mapped for the runtime usage")
enabled to map PPTT once on the first invocation of acpi_get_pptt() and
never unmapped the same allowing it to be used at runtime with out the
hassle of mapping and unmapping the table. This was needed to fetch LLC
information from the PPTT in the cpuhotplug path which is executed in
the atomic context as the acpi_get_table() might sleep waiting for a
mutex.
However it missed to handle the case when there is no PPTT on the system
which results in acpi_get_pptt() being called from all the secondary
CPUs attempting to fetch the LLC information in the atomic context
without knowing the absence of PPTT resulting in the splat like below:
| BUG: sleeping function called from invalid context at kernel/locking/semaphore.c:164
| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| no locks held by swapper/1/0.
| irq event stamp: 0
| hardirqs last enabled at (0): 0x0
| hardirqs last disabled at (0): copy_process+0x61c/0x1b40
| softirqs last enabled at (0): copy_process+0x61c/0x1b40
| softirqs last disabled at (0): 0x0
| CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.3.0-rc1 #1
| Call trace:
| dump_backtrace+0xac/0x138
| show_stack+0x30/0x48
| dump_stack_lvl+0x60/0xb0
| dump_stack+0x18/0x28
| __might_resched+0x160/0x270
| __might_sleep+0x58/0xb0
| down_timeout+0x34/0x98
| acpi_os_wait_semaphore+0x7c/0xc0
| acpi_ut_acquire_mutex+0x58/0x108
| acpi_get_table+0x40/0xe8
| acpi_get_pptt+0x48/0xa0
| acpi_get_cache_info+0x38/0x140
| init_cache_level+0xf4/0x118
| detect_cache_attributes+0x2e4/0x640
| update_siblings_masks+0x3c/0x330
| store_cpu_topology+0x88/0xf0
| secondary_start_kernel+0xd0/0x168
| __secondary_switched+0xb8/0xc0
Update acpi_get_pptt() to consider the fact that PPTT is once checked and
is not available on the system and return NULL avoiding any attempts to
fetch PPTT and thereby avoiding any possible sleep waiting for a mutex
in the atomic context.
Linus Torvalds [Tue, 14 Mar 2023 18:08:28 +0000 (11:08 -0700)]
Merge tag 'docs-6.3-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A handful of fixes and minor documentation updates"
* tag 'docs-6.3-fixes' of git://git.lwn.net/linux:
docs: vfio: fix header path
docs: process: typo fix
docs/mm: hugetlbfs_reserv: fix a reference to a file that doesn't exist
docs/mm: Physical Memory: fix a reference to a file that doesn't exist
docs: rebasing-and-merging: Drop wrong statement about git
docs: programming-language: add Rust programming language section
docs: programming-language: remove mention of the Intel compiler
docs: Correct missing "d_" prefix for dentry_operations member d_weak_revalidate
sched/doc: supplement CPU capacity with RISC-V
Todd Brandt [Mon, 13 Mar 2023 22:26:52 +0000 (15:26 -0700)]
pm-graph: sleepgraph: Avoid crashing on binary data in device names
A regression has occurred in the hid-sensor code where a device
name string has not been initialized to 0, and ends up without
a NULL char and is printed with %s. This includes random binary
data in the device name, which makes its way into the ftrace output
and ends up crashing sleepgraph because it expects the ftrace output
to be ASCII only.
For example: "HID-SENSOR-INT-020b?.39.auto" ends up in ftrace instead
of "HID-SENSOR-INT-020b.39.auto". It causes this crash in sleepgraph:
File "/usr/bin/sleepgraph", line 5579, in executeSuspend
for line in fp:
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position
1568: invalid start byte
The issue is present in 6.3-rc1 and is described in full here:
https://bugzilla.kernel.org/show_bug.cgi?id=217169
A separate fix has been submitted to have this issue repaired, but
it has also exposed a larger bug in sleepgraph, since nothing should
make sleepgraph crash. Sleepgraph needs to be able to handle binary
data showing up in ftrace gracefully.
Modify the ftrace processing code to treat it as potentially binary
and to filter out binary data and leave just the ASCII.
sound/pci/hda/patch_ca0132.c:4229:2: note: After for loop, i has value 12
for (i = 0; i < TUNING_CTLS_COUNT; i++)
^
sound/pci/hda/patch_ca0132.c:4234:43: note: Array index out of bounds
dspio_set_param(codec, ca0132_tuning_ctls[i].mid, 0x20,
^
This patch cares non match case.
Therefore, We will get too many warning via cppcheck, like below
sound/pci/asihpi/hpi6205.c:238:27: warning: Possible null pointer dereference: pao [nullPointer]
struct hpi_hw_obj *phw = pao->priv;
^
sound/pci/asihpi/hpi6205.c:433:13: note: Calling function '_HPI_6205', 1st argument 'NULL' value is 0
_HPI_6205(NULL, phm, phr);
^
sound/pci/asihpi/hpi6205.c:401:20: note: Calling function 'control_message', 1st argument 'pao' value is 0
control_message(pao, phm, phr);
^
Set phr->error like many functions doing, and don't call _HPI_6205()
with NULL.
Jan Kara [Mon, 13 Mar 2023 09:30:02 +0000 (10:30 +0100)]
block: do not reverse request order when flushing plug list
Commit 26fed4ac4eab ("block: flush plug based on hardware and software
queue order") changed flushing of plug list to submit requests one
device at a time. However while doing that it also started using
list_add_tail() instead of list_add() used previously thus effectively
submitting requests in reverse order. Also when forming a rq_list with
remaining requests (in case two or more devices are used), we
effectively reverse the ordering of the plug list for each device we
process. Submitting requests in reverse order has negative impact on
performance for rotational disks (when BFQ is not in use). We observe
10-25% regression in random 4k write throughput, as well as ~20%
regression in MariaDB OLTP benchmark on rotational storage on btrfs
filesystem.
Fix the problem by preserving ordering of the plug list when inserting
requests into the queuelist as well as by appending to requeue_list
instead of prepending to it.
drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Benjamin Cheng [Mon, 13 Mar 2023 00:47:39 +0000 (20:47 -0400)]
drm/amd/display: Write to correct dirty_rect
When FB_DAMAGE_CLIPS are provided in a non-MPO scenario, the loop does
not use the counter i. This causes the fill_dc_dity_rect() to always
fill dirty_rects[0], causing graphical artifacts when a damage clip
aware DRM client sends more than 1 damage clip.
Instead, use the flip_addrs->dirty_rect_count which is incremented by
fill_dc_dirty_rect() on a successful fill.
Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2453 Signed-off-by: Benjamin Cheng <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.1.x
Guchun Chen [Thu, 9 Mar 2023 02:02:45 +0000 (10:02 +0800)]
drm/amdgpu: move poll enabled/disable into non DC path
Some amd asics having reliable hotplug support don't call
drm_kms_helper_poll_init in driver init sequence. However,
due to the unified suspend/resume path for all asics, because
the output_poll_work->func is not set for these asics, a warning
arrives when suspending.
drm_kms_helper_poll_enable/disable is valid when poll_init is called in
amdgpu code, which is only used in non DC path. So move such codes into
non-DC path code to get rid of such warnings.
v1: introduce use_kms_poll flag in amdgpu as the poll stuff check
v2: use dc_enabled as the flag to simply code
v3: move code into non DC path instead of relying on any flag
drm/amd/display: Fix HDCP failing to enable after suspend
[Why]
On resume some displays are not ready for HDCP, so they will fail if we
start the hdcp authentintication too soon.
Add a delay so that the displays can be ready before we start.
NOTE: Previoulsy this delay was set to 3 seconds but it was causing
issues with compliance, 2 seconds should enough for compliance and the
s3 resume case.
Chia-I Wu [Wed, 8 Mar 2023 21:37:24 +0000 (13:37 -0800)]
drm/amdkfd: fix potential kgd_mem UAFs
kgd_mem pointers returned by kfd_process_device_translate_handle are
only guaranteed to be valid while p->mutex is held. As soon as the mutex
is unlocked, another thread can free the BO.
drm/amd/pm: Fix sienna cichlid incorrect OD volage after resume
Always setup overdrive tables after resume. Preserve only some
user-defined settings in user_overdrive_table if they're set.
Copy restored user_overdrive_table into od_table to get correct
values.
On cold boot, BTC was triggered and GfxVfCurve was calibrated. We
got VfCurve settings (a). On resuming back, BTC will be triggered
again and GfxVfCurve will be recalibrated. VfCurve settings (b)
got may be different from those of cold boot. So if we reuse
those VfCurve settings (a) got on cold boot on suspend, we can
run into discrepencies.
Xiaogang Chen [Thu, 9 Mar 2023 23:44:55 +0000 (17:44 -0600)]
drm/amdkfd: Get prange->offset after svm_range_vram_node_new
During miration to vram prange->offset is valid after vram buffer is located,
either use old one or allocate a new one. Move svm_range_vram_node_new before
migrate for each vma to get valid prange->offset.
v2: squash in warning fix
Fixes: b4ee9606378b ("drm/amdkfd: Fix BO offset for multi-VMA page migration") Signed-off-by: Xiaogang Chen <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Xiaogang Chen [Wed, 1 Mar 2023 16:21:06 +0000 (10:21 -0600)]
drm/amdkfd: Fix BO offset for multi-VMA page migration
svm_migrate_ram_to_vram migrates a prange from sys ram to vram. The prange may
cross multiple vma. Need remember current dst vram offset in the TTM resource for
each migration.
Jan Beulich [Mon, 13 Mar 2023 14:45:48 +0000 (15:45 +0100)]
x86/PVH: obtain VGA console info in Dom0
A new platform-op was added to Xen to allow obtaining the same VGA
console information PV Dom0 is handed. Invoke the new function and have
the output data processed by xen_init_vga().