vdso/datapage: Quick fix - use asm/page-def.h for ARM64
The vdso rework for the generic union vdso_data_store broke compat VDSO on
arm64:
In file included from arch/arm64/include/asm/lse.h:5,
from arch/arm64/include/asm/cmpxchg.h:14,
from arch/arm64/include/asm/atomic.h:16,
from include/linux/atomic.h:7,
from include/asm-generic/bitops/atomic.h:5,
from arch/arm64/include/asm/bitops.h:25,
from include/linux/bitops.h:68,
from arch/arm64/include/asm/memory.h:209,
from arch/arm64/include/asm/page.h:46,
from include/vdso/datapage.h:22,
from lib/vdso/gettimeofday.c:5,
from <command-line>:
arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128'
298 | u128 full;
| ^~~~
arch/arm64/include/asm/atomic_ll_sc.h:305:24: error: unknown type name 'u128'
305 | static __always_inline u128
\
|
The reason is the include of asm/page.h which in turn includes headers
which are outside the scope of compat VDSO. The only reason for the
asm/page.h include is the required definition of PAGE_SIZE. But as arm64
defines PAGE_SIZE in asm/page-def.h without extra header includes, this
could be used instead.
Caution: this is a quick fix only! The final fix is an upcoming cleanup of
Arnd which consolidates PAGE_SIZE definition. After the cleanup, the
include of asm/page.h to access PAGE_SIZE is no longer required.
timers: Assert no next dyntick timer look-up while CPU is offline
The next timer (re-)evaluation, with the purpose of entering/updating
the dyntick mode, can happen from 3 sites and none of them are relevant
while the CPU is offline:
1) The idle loop:
a) From the quick check helping the cpuidle governor to heuristically
predict the best C-state.
b) While stopping the tick.
But if the CPU is offline, the tick has been cancelled and there is
consequently no need to further stop the tick.
2) Remote expiry: when a CPU remotely expires global timers on behalf of
another CPU, the latter target's next timer is re-evaluated
afterwards. However remote expîry doesn't happen on offline CPUs.
3) IRQ exit: on nohz_full mode, the tick is (re-)evaluated on IRQ exit.
But full dynticks is disabled on offline CPUs.
Therefore it is safe to assume that no next dyntick timer lookup can
be performed on offline CPUs.
tick: Assume timekeeping is correctly handed over upon last offline idle call
The timekeeping duty is handed over from the outgoing CPU on stop
machine, then the oneshot tick is stopped right after. Therefore it's
guaranteed that the current CPU isn't the timekeeper upon its last call
to idle.
Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes
into idle suggests that the tick is going to be stopped while it is
actually stopped already from the appropriate CPU hotplug state.
Remove the confusing call and the obsolete case handling and convert it
to a sanity check that verifies the above assumption.
The timekeeping duty is handed over from the outgoing CPU within stop
machine. This works well if CONFIG_NO_HZ_COMMON=n or the tick is in
high-res mode. However in low-res dynticks mode, the tick isn't
cancelled until the clockevent is shut down, which can happen later. The
tick may therefore fire again once IRQs are re-enabled on stop machine
and until IRQs are disabled for good upon the last call to idle.
That's so many opportunities for a timekeeper to go idle and the
outgoing CPU to take over that duty. This is why
tick_nohz_idle_stop_tick() is called one last time on idle if the CPU
is seen offline: so that the timekeeping duty is handed over again in
case the CPU has re-taken the duty.
This means there are two timekeeping handovers on CPU down hotplug with
different undocumented constraints and purposes:
1) A handover on stop machine for !dynticks || highres. All online CPUs
are guaranteed to be non-idle and the timekeeping duty can be safely
handed-over. The hrtimer tick is cancelled so it is guaranteed that in
dynticks mode the outgoing CPU won't take again the duty.
2) A handover on last idle call for dynticks && lowres. Setting the
duty to TICK_DO_TIMER_NONE makes sure that a CPU will take over the
timekeeping.
Prepare for consolidating the handover to a single place (the first one)
with shutting down the low-res tick as well from
tick_cancel_sched_timer() as well. This will simplify the handover and
unify the tick cancellation between high-res and low-res.
tick: Split nohz and highres features from nohz_mode
The nohz mode field tells about low resolution nohz mode or high
resolution nohz mode but it doesn't tell about high resolution non-nohz
mode.
In order to retrieve the latter state, tick_cancel_sched_timer() must
fiddle with struct hrtimer's internals to guess if the tick has been
initialized in high resolution.
Move instead the nohz mode field information into the tick flags and
provide two new bits: one to know if the tick is in nohz mode and
another one to know if the tick is in high resolution. The combination
of those two flags provides all the needed informations to determine
which of the three tick modes is running.
tick: Move individual bit features to debuggable mask accesses
The individual bitfields of struct tick_sched must be modified from
IRQs disabled places, otherwise local modifications can race due to them
sharing the same memory storage.
The recent move of the "got_idle_tick" bitfield to its own storage shows
that the use of these bitfields, as pretty as they look, can be as much
error prone.
In order to avoid future issues of the like and make sure that those
bitfields are safely accessed, move those flags to an explicit mask
along with a mutator function performing the basic IRQs disabled sanity
check.
tick_nohz_idle_got_tick() is called by cpuidle_reflect() within the idle
loop with interrupts enabled. This function modifies the struct
tick_sched's bitfield "got_idle_tick". However this bitfield is stored
within the same mask as other bitfields that can be modified from
interrupts.
Fortunately so far it looks like the only race that can happen is while
writing ->got_idle_tick to 0, an interrupt fires and writes the
->idle_active field to 0. It's then possible that the interrupted write
to ->got_idle_tick writes back the old value of ->idle_active back to 1.
However if that happens, the worst possible outcome is that the time
spent between that interrupt and the upcoming call to
tick_nohz_idle_exit() is accounted as idle, which is negligible quantity.
Still all the bitfield writes within this struct tick_sched's shadow
mask should be IRQ-safe. Therefore move this bitfield out to its own
storage to avoid further suprises.
tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
The full-nohz update function checks if the nohz mode is active before
proceeding. It considers one exception though: if the tick is already
stopped even though the nohz mode is inactive, it still moves on in
order to update/restart the tick if needed.
However in order for the tick to be stopped, the nohz_mode has to be
either NOHZ_MODE_LOWRES or NOHZ_MODE_HIGHRES. Therefore it doesn't make
sense to test if the tick is stopped before verifying NOHZ_MODE_INACTIVE
mode.
tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
The broadcast shutdown code is executed through a random explicit call
within stop machine from the outgoing CPU.
However the tick broadcast is a midware between the tick callback and
the clocksource, therefore it makes more sense to shut it down after the
tick callback and before the clocksource drivers.
Move it instead to the common tick shutdown CPU hotplug state where
related operations can be ordered from highest to lowest level.
tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
The tick hrtimer is cancelled right before hrtimers are migrated. This
is done from the hrtimer subsystem even though it shouldn't know about
its actual users.
Move instead the tick hrtimer cancellation to the relevant CPU hotplug
state that aims at centralizing high level tick shutdown operations so
that the related flow is easy to follow.
tick: Start centralizing tick related CPU hotplug operations
During the CPU offlining process, the various timer tick features are
shut down from scattered places, sometimes from teardown callbacks on
stop machine, sometimes through explicit calls, sometimes from the
control CPU after the CPU died. The reason why these shutdown operations
are spread around is not always clear and it makes the tick lifecycle
hard to follow.
The tick should be shut down in order from highest to lowest level:
On stop machine from the dying CPU (high-level):
1) Hand-over the timekeeping duty (tick_handover_do_timer())
2) Cancel the tick implementation called by the clockevent callback
(tick_cancel_sched_timer())
3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())
From the control CPU after the CPU died (low-level):
5) Shutdown/unregister/cleanup clockevents for the dead CPU
(tick_cleanup_dead_cpu())
Instead the current order is 2, 4 (both from CPU hotplug states), then
1 and 3 through direct calls. This layout and order don't make much
sense. The operations 1, 2, 3 should be gathered together and in order.
Sort this situation with creating a new TICK shut-down CPU hotplug state
and start with introducing the timekeeping duty hand-over there. The
state must precede hrtimers migration because the tick hrtimer will be
stopped from it in a further patch.
tick-sched.c is only built when CONFIG_TICK_ONESHOT=y, which is selected
only if CONFIG_NO_HZ_COMMON=y or CONFIG_HIGH_RES_TIMERS=y. Therefore
the related ifdeferry in this file is needless and can be removed.
Peng Liu [Sun, 25 Feb 2024 22:54:54 +0000 (23:54 +0100)]
tick/nohz: Remove duplicate between lowres and highres handlers
tick_nohz_lowres_handler() does the same work as
tick_nohz_highres_handler() plus the clockevent device reprogramming, so
make the former reuse the latter and rename it accordingly.
Peng Liu [Sun, 25 Feb 2024 22:54:53 +0000 (23:54 +0100)]
tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
The ts->sched_timer initialization work of tick_nohz_switch_to_nohz()
is almost the same as that of tick_setup_sched_timer(), so adjust the
latter to get it reused by tick_nohz_switch_to_nohz().
This also makes the low resolution mode sched_timer benefit from the tick
skew boot option.
Placing timers at enqueue time on a target CPU based on dubious heuristics
does not make any sense:
1) Most timer wheel timers are canceled or rearmed before they expire.
2) The heuristics to predict which CPU will be busy when the timer expires
are wrong by definition.
So placing the timers at enqueue wastes precious cycles.
The proper solution to this problem is to always queue the timers on the
local CPU and allow the non pinned timers to be pulled onto a busy CPU at
expiry time.
Therefore split the timer storage into local pinned and global timers:
Local pinned timers are always expired on the CPU on which they have been
queued. Global timers can be expired on any CPU.
As long as a CPU is busy it expires both local and global timers. When a
CPU goes idle it arms for the first expiring local timer. If the first
expiring pinned (local) timer is before the first expiring movable timer,
then no action is required because the CPU will wake up before the first
movable timer expires. If the first expiring movable timer is before the
first expiring pinned (local) timer, then this timer is queued into an idle
timerqueue and eventually expired by another active CPU.
To avoid global locking the timerqueues are implemented as a hierarchy. The
lowest level of the hierarchy holds the CPUs. The CPUs are associated to
groups of 8, which are separated per node. If more than one CPU group
exist, then a second level in the hierarchy collects the groups. Depending
on the size of the system more than 2 levels are required. Each group has a
"migrator" which checks the timerqueue during the tick for remote expirable
timers.
If the last CPU in a group goes idle it reports the first expiring event in
the group up to the next group(s) in the hierarchy. If the last CPU goes
idle it arms its timer for the first system wide expiring timer to ensure
that no timer event is missed.
timers: Introduce function to check timer base is_idle flag
To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model it's required to have a function that returns the value
of the is_idle flag of the timer base to keep the hierarchy states during
online in sync with timer base state.
Due to the conversion of the NOHZ timer placement to a pull at expiry
time model, the per CPU timer bases with non pinned timers are no
longer handled only by the local CPU. In case a remote CPU already
expires the non pinned timers base of the local CPU, nothing more
needs to be done by the local CPU. A check at the begin of the expire
timers routine is required, because timer base lock is dropped before
executing the timer callback function.
This is a preparatory work, but has no functional impact right now.
Move the locking out from __run_timers() to the call sites, so the
protected section can be extended at the call site. Preparatory work for
changing the NOHZ timer placement to a pull at expiry time model.
timers: Add get next timer interrupt functionality for remote CPUs
To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model it's required to have functionality available getting the
next timer interrupt on a remote CPU.
Locking of the timer bases and getting the information for the next timer
interrupt functionality is split into separate functions. This is required
to be compliant with lock ordering when the new model is in place.
timers: Split out "get next timer interrupt" functionality
The functionality for getting the next timer interrupt in
get_next_timer_interrupt() is split into a separate function
fetch_next_timer_interrupt() to be usable by other call sites.
This is preparatory work for the conversion of the NOHZ timer
placement to a pull at expiry time model. No functional change.
timers: Retrieve next expiry of pinned/non-pinned timers separately
For the conversion of the NOHZ timer placement to a pull at expiry time
model it's required to have separate expiry times for the pinned and the
non-pinned (movable) timers. Therefore struct timer_events is introduced.
Split the logic for getting next timer interrupt (no matter of recalculated
or already stored in base->next_expiry) into a separate function named
next_timer_interrupt(). Make it available to local call sites only.
The logic for raising a softirq the way it is implemented right now, is
readable for two timer bases. When increasing the number of timer bases,
code gets harder to read. With the introduction of the timer migration
hierarchy, there will be three timer bases.
Therefore restructure the code to use a loop. No functional change.
timers: Make sure TIMER_PINNED flag is set in add_timer_on()
When adding a timer to the timer wheel using add_timer_on(), it is an
implicitly pinned timer. With the timer pull at expiry time model in place,
the TIMER_PINNED flag is required to make sure timers end up in proper
base.
Set the TIMER_PINNED flag unconditionally when add_timer_on() is executed.
The implementation of the NOHZ pull at expiry model will change the timer
bases per CPU. Timers, that have to expire on a specific CPU, require the
TIMER_PINNED flag. If the CPU doesn't matter, the TIMER_PINNED flag must be
dropped. This is required for call sites which use the timer alternately as
pinned and not pinned timer like workqueues do.
Therefore use add_timer_global() in __queue_delayed_work() for non-bound
delayed work to make sure the TIMER_PINNED flag is dropped.
timers: Introduce add_timer() variants which modify timer flags
A timer might be used as a pinned timer (using add_timer_on()) and later on
as non-pinned timer using add_timer(). When the "NOHZ timer pull at expiry
model" is in place, the TIMER_PINNED flag is required to be used whenever a
timer needs to expire on a dedicated CPU. Otherwise the flag must not be
set if expiration on a dedicated CPU is not required.
add_timer_on()'s behavior will be changed during the preparation patches
for the "NOHZ timer pull at expiry model" to unconditionally set the
TIMER_PINNED flag. To be able to clear/ set the flag when queueing a
timer, two variants of add_timer() are introduced.
This is a preparatory step and has no functional change.
timers: Optimization for timer_base_try_to_set_idle()
When tick is stopped also the timer base is_idle flag is set. When
reentering timer_base_try_to_set_idle() with the tick stopped, there is no
need to check whether the timer base needs to be set idle again. When a
timer was enqueued in the meantime, this is already handled by the
tick_nohz_next_event() call which was executed before
tick_nohz_stop_tick().
timers: Move marking timer bases idle into tick_nohz_stop_tick()
The timer base is marked idle when get_next_timer_interrupt() is
executed. But the decision whether the tick will be stopped and whether the
system is able to go idle is done later. When the timer bases is marked
idle and a new first timer is enqueued remote an IPI is raised. Even if it
is not required because the tick is not stopped and the timer base is
evaluated again at the next tick.
To prevent this, the timer base is marked idle in tick_nohz_stop_tick() and
get_next_timer_interrupt() is streamlined by only looking for the next timer
interrupt. All other work is postponed to timer_base_try_to_set_idle() which is
called by tick_nohz_stop_tick(). timer_base_try_to_set_idle() never resets
timer_base::is_idle state. This is done when the tick is restarted via
tick_nohz_restart_sched_tick().
With this, tick_sched::tick_stopped and timer_base::is_idle are always in
sync. So there is no longer the need to execute timer_clear_idle() in
tick_nohz_idle_retain_tick(). This was required before, as
tick_nohz_next_event() set timer_base::is_idle even if the tick would not be
stopped. So timer_clear_idle() is only executed, when timer base is idle. So the
check whether timer base is idle, is now no longer required as well.
While at it fix some nearby whitespace damage as well.
get_next_timer_interrupt() contains two parts for the next timer interrupt
calculation. Those two parts are separated by forwarding the base
clock. But the second part does not depend on the forwarded base
clock.
Therefore restructure get_next_timer_interrupt() to keep things together
which belong together.
Feng Tang [Wed, 21 Feb 2024 06:08:59 +0000 (14:08 +0800)]
clocksource: Scale the watchdog read retries automatically
On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled
during boot time on about one out of 120 boot attempts:
clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns,
wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152)
clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896.
clocksource: Switched to clocksource hpet
The reason is that for platform with a large number of CPUs, there are
sporadic big or huge read latencies while reading the watchog/clocksource
during boot or when system is under stress work load, and the frequency and
maximum value of the latency goes up with the number of online CPUs.
The cCurrent code already has logic to detect and filter such high latency
case by reading the watchdog twice and checking the two deltas. Due to the
randomness of the latency, there is a low probabilty that the first delta
(latency) is big, but the second delta is small and looks valid. The
watchdog code retries the readouts by default twice, which is not
necessarily sufficient for systems with a large number of CPUs.
There is a command line parameter 'max_cswd_read_retries' which allows to
increase the number of retries, but that's not user friendly as it needs to
be tweaked per system. As the number of required retries is proportional to
the number of online CPUs, this parameter can be calculated at runtime.
Scale and enlarge the number of retries according to the number of online
CPUs and remove the command line parameter completely.
vdso/datapage.h includes the architecture specific vdso/data.h header
file. So there is no need to do it also the other way round and including
the generic vdso/datapage.h header file inside the architecture specific
data.h header file.
Peter Hilber [Mon, 18 Dec 2023 07:38:41 +0000 (08:38 +0100)]
timekeeping: Fix cross-timestamp interpolation for non-x86
So far, get_device_system_crosststamp() unconditionally passes
system_counterval.cycles to timekeeping_cycles_to_ns(). But when
interpolating system time (do_interp == true), system_counterval.cycles is
before tkr_mono.cycle_last, contrary to the timekeeping_cycles_to_ns()
expectations.
On x86, CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE will mitigate on
interpolating, setting delta to 0. With delta == 0, xtstamp->sys_monoraw
and xtstamp->sys_realtime are then set to the last update time, as
implicitly expected by adjust_historical_crosststamp(). On other
architectures, the resulting nonsense xtstamp->sys_monoraw and
xtstamp->sys_realtime corrupt the xtstamp (ts) adjustment in
adjust_historical_crosststamp().
Fix this by deriving xtstamp->sys_monoraw and xtstamp->sys_realtime from
the last update time when interpolating, by using the local variable
"cycles". The local variable already has the right value when
interpolating, unlike system_counterval.cycles.
Peter Hilber [Mon, 18 Dec 2023 07:38:40 +0000 (08:38 +0100)]
timekeeping: Fix cross-timestamp interpolation corner case decision
The cycle_between() helper checks if parameter test is in the open interval
(before, after). Colloquially speaking, this also applies to the counter
wrap-around special case before > after. get_device_system_crosststamp()
currently uses cycle_between() at the first call site to decide whether to
interpolate for older counter readings.
get_device_system_crosststamp() has the following problem with
cycle_between() testing against an open interval: Assume that, by chance,
cycles == tk->tkr_mono.cycle_last (in the following, "cycle_last" for
brevity). Then, cycle_between() at the first call site, with effective
argument values cycle_between(cycle_last, cycles, now), returns false,
enabling interpolation. During interpolation,
get_device_system_crosststamp() will then call cycle_between() at the
second call site (if a history_begin was supplied). The effective argument
values are cycle_between(history_begin->cycles, cycles, cycles), since
system_counterval.cycles == interval_start == cycles, per the assumption.
Due to the test against the open interval, cycle_between() returns false
again. This causes get_device_system_crosststamp() to return -EINVAL.
This failure should be avoided, since get_device_system_crosststamp() works
both when cycles follows cycle_last (no interpolation), and when cycles
precedes cycle_last (interpolation). For the case cycles == cycle_last,
interpolation is actually unneeded.
Fix this by changing cycle_between() into timestamp_in_interval(), which
now checks against the closed interval, rather than the open interval.
This changes the get_device_system_crosststamp() behavior for three corner
cases:
1. Bypass interpolation in the case cycles == tk->tkr_mono.cycle_last,
fixing the problem described above.
2. At the first timestamp_in_interval() call site, cycles == now no longer
causes failure.
3. At the second timestamp_in_interval() call site, history_begin->cycles
== system_counterval.cycles no longer causes failure.
adjust_historical_crosststamp() also works for this corner case,
where partial_history_cycles == total_history_cycles.
These behavioral changes should not cause any problems.
jiffies: Transform comment about time_* functions into DOC block
This general note about time_* functions is also useful to be available in
kernel documentation. Therefore transform it into a kernel-doc DOC block
with proper formatting.
tick/sched: Add function description for tick_nohz_next_event()
The return value of tick_nohz_next_event() is not obvious at the first
glance. Add a kernel-doc compatible function description which also covers
return values.
hrtimers: Move hrtimer base related definitions into hrtimer_defs.h
hrtimer base related struct definitions are part of hrtimers.h as it is
required there. With this, also the struct documentation which is for core
code internal use, is exposed into the general api.
To prevent this, move all core internal definitions and the related
includes into hrtimer_defs.h.
Linus Torvalds [Sun, 18 Feb 2024 18:09:25 +0000 (10:09 -0800)]
Merge tag 'kbuild-fixes-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Reformat nested if-conditionals in Makefiles with 4 spaces
- Fix CONFIG_DEBUG_INFO_BTF builds for big endian
- Fix modpost for module srcversion
- Fix an escape sequence warning in gen_compile_commands.py
- Fix kallsyms to ignore ARMv4 thunk symbols
* tag 'kbuild-fixes-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kallsyms: ignore ARMv4 thunks along with others
modpost: trim leading spaces when processing source files list
gen_compile_commands: fix invalid escape sequence warning
kbuild: Fix changing ELF file type for output of gen_btf for big endian
docs: kconfig: Fix grammar and formatting
kbuild: use 4-space indentation when followed by conditionals
Linus Torvalds [Sun, 18 Feb 2024 17:22:48 +0000 (09:22 -0800)]
Merge tag 'x86_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
- Use a GB page for identity mapping only when memory of this size is
requested so that mapping of reserved regions is prevented which
would otherwise lead to system crashes on UV machines
* tag 'x86_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
- Prevent spurious interrupts on Broadcom devices using GIC v3
architecture
- Other minor fixes
* tag 'irq_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
irqchip/qcom-mpm: Fix IS_ERR() vs NULL check in qcom_mpm_init()
irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc()
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
Linus Torvalds [Sun, 18 Feb 2024 17:08:57 +0000 (09:08 -0800)]
Merge tag 'i2c-for-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two fixes for i801 and qcom-geni devices. Meanwhile, a fix from Arnd
addresses a compilation error encountered during compile test on
powerpc"
* tag 'i2c-for-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Fix block process call transactions
i2c: pasemi: split driver into two separate modules
i2c: qcom-geni: Correct I2C TRE sequence
Linus Torvalds [Sun, 18 Feb 2024 00:59:31 +0000 (16:59 -0800)]
Merge tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"This is a bit of a big batch for rc4, but just due to holiday hangover
and because I didn't send any fixes last week due to a late revert
request. I think next week should be back to normal.
- Fix ftrace bug on boot caused by exit text sections with
'-fpatchable-function-entry'
- Fix accuracy of stolen time on pseries since the switch to
VIRT_CPU_ACCOUNTING_GEN
- Fix a crash in the IOMMU code when doing DLPAR remove
- Set pt_regs->link on scv entry to fix BPF stack unwinding
- Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke
gdb
- Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled
- Fix build failures with KASAN enabled and 32KB stack size
- Some other minor fixes
Thanks to Arnd Bergmann, Benjamin Gray, Christophe Leroy, David
Engraf, Gaurav Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias
Schiffer, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A,
R Nageswara Sastry, Shivaprasad G Bhat, Shrikanth Hegde, Spoorthy,
Srikar Dronamraju, and Venkat Rao Bagalkote"
* tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
powerpc/pseries: fix accuracy of stolen time
powerpc/ftrace: Ignore ftrace locations in exit text sections
powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
powerpc/kasan: Limit KASAN thread size increase to 32KB
Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
powerpc: 85xx: mark local functions static
powerpc: udbg_memcons: mark functions static
powerpc/kasan: Fix addr error caused by page alignment
powerpc/6xx: set High BAT Enable flag on G2_LE cores
selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()
powerpc/64: Set task pt_regs->link to the LR value on scv entry
powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
powerpc/pseries/papr-sysparm: use u8 arrays for payloads
Linus Torvalds [Sat, 17 Feb 2024 16:56:41 +0000 (08:56 -0800)]
Merge tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core fixes, a kobject fix, and a documentation
update for 6.8-rc5. In detail these changes are:
- devlink fixes for reported issues with 6.8-rc1
- topology scheduling regression fix that has been reported by many
- kobject loosening of checks change in -rc1 is now reverted as some
codepaths seemed to need the checks
- documentation update for the CVE process. Has been reviewed by
many, the last minute change to the document was to bring the .rst
format back into the the new style rules, the contents did not
change.
All of these, except for the documentation update, have been in
linux-next for over a week. The documentation update has been reviewed
for weeks by a group of developers, and in public for a week and the
wording has stabilized for now. If future changes are needed, we can
do so before 6.8-final is out (or anytime after that)"
* tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: Document the Linux Kernel CVE process
Revert "kobject: Remove redundant checks for whether ktype is NULL"
driver core: fw_devlink: Improve logs for cycle detection
driver core: fw_devlink: Improve detection of overlapping cycles
driver core: Fix device_link_flag_is_sync_state_only()
topology: Set capacity_freq_ref in all cases
Linus Torvalds [Sat, 17 Feb 2024 16:52:38 +0000 (08:52 -0800)]
Merge tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / miscdriver fixes from Greg KH:
"Here is a small set of char/misc and IIO driver fixes for 6.8-rc5.
Included in here are:
- lots of iio driver fixes for reported issues
- nvmem device naming fixup for reported problem
- interconnect driver fixes for reported issues
All of these have been in linux-next for a while with no reported the
issues (the nvmem patch was included in a different branch in
linux-next before sent to me for inclusion here)"
* tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
nvmem: include bit index in cell sysfs file name
iio: adc: ad4130: only set GPIO_CTRL if pin is unused
iio: adc: ad4130: zero-initialize clock init data
interconnect: qcom: x1e80100: Add missing ACV enable_mask
interconnect: qcom: sm8650: Use correct ACV enable_mask
iio: accel: bma400: Fix a compilation problem
iio: commom: st_sensors: ensure proper DMA alignment
iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
iio: move LIGHT_UVA and LIGHT_UVB to the end of iio_modifier
staging: iio: ad5933: fix type mismatch regression
iio: humidity: hdc3020: fix temperature offset
iio: adc: ad7091r8: Fix error code in ad7091r8_gpio_setup()
iio: adc: ad_sigma_delta: ensure proper DMA alignment
iio: imu: adis: ensure proper DMA alignment
iio: humidity: hdc3020: Add Makefile, Kconfig and MAINTAINERS entry
iio: imu: bno055: serdev requires REGMAP
iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
iio: pressure: bmp280: Add missing bmp085 to SPI id table
iio: core: fix memleak in iio_device_register_sysfs
interconnect: qcom: sm8550: Enable sync_state
...
Linus Torvalds [Sat, 17 Feb 2024 16:46:57 +0000 (08:46 -0800)]
Merge tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial fixes from Greg KH:
"Here are three small tty and serial driver fixes for 6.8-rc5:
- revert a 8250_pci1xxxx off-by-one change that was incorrect
- two changes to fix the transmit path of the mxs-auart driver,
fixing a regression in the 6.2 release
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: mxs-auart: fix tx
serial: core: introduce uart_port_tx_flags()
serial: 8250_pci1xxxx: partially revert off by one patch
Linus Torvalds [Sat, 17 Feb 2024 16:44:55 +0000 (08:44 -0800)]
Merge tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are two small fixes for 6.8-rc5:
- thunderbolt to fix a reported issue on many platforms
- dwc3 driver revert of a commit that caused problems in -rc1
Both of these changes have been in linux-next for over a week with no
reported issues"
* tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: dwc3: Support EBC feature of DWC_usb31"
thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
Linus Torvalds [Sat, 17 Feb 2024 16:13:32 +0000 (08:13 -0800)]
Merge tag 'media/v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- regression fix for rkisp1 shared IRQ logic
- fix atomisp breakage due to a kAPI change
- permission fix for remote controller BPF support
- memleak fix in ir_toy driver
- Kconfig dependency fix for pwm-ir-rx
* tag 'media/v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: pwm-ir-tx: Depend on CONFIG_HIGH_RES_TIMERS
media: ir_toy: fix a memleak in irtoy_tx
media: rc: bpf attach/detach requires write permission
media: atomisp: Adjust for v4l2_subdev_state handling changes in 6.8
media: rkisp1: Fix IRQ handling due to shared interrupts
media: Revert "media: rkisp1: Drop IRQF_SHARED"
Linus Torvalds [Sat, 17 Feb 2024 16:06:20 +0000 (08:06 -0800)]
Merge tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Keep bridges in D0 if we need to poll downstream devices for PME to
resolve a v6.6 regression where we failed to enumerate devices below
bridges put in D3hot by runtime PM, e.g., NVMe drives connected via
Thunderbolt or USB4 docks (Alex Williamson)
- Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
* tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
PCI: Fix active state requirement in PME polling
Linus Torvalds [Sat, 17 Feb 2024 15:59:47 +0000 (07:59 -0800)]
Merge tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- tracing/probes: Fix BTF structure member finder to find the members
which are placed after any anonymous union member correctly.
* tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix to search structure fields correctly
Linus Torvalds [Sat, 17 Feb 2024 15:56:10 +0000 (07:56 -0800)]
Merge tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Five smb3 client fixes, most also for stable:
- Two multichannel fixes (one to fix potential handle leak on retry)
- Work around possible serious data corruption (due to change in
folios in 6.3, for cases when non standard maximum write size
negotiated)
- Symlink creation fix
- Multiuser automount fix"
* tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: Fix regression in writes when non-standard maximum write size negotiated
smb: client: handle path separator of created SMB symlinks
smb: client: set correct id, uid and cruid for multiuser automounts
cifs: update the same create_guid on replay
cifs: fix underflow in parse_server_interfaces()
Documentation: Document the Linux Kernel CVE process
The Linux kernel project now has the ability to assign CVEs to fixed
issues, so document the process and how individual developers can get a
CVE if one is not automatically assigned for their fixes.
tracing/probes: Fix to search structure fields correctly
Fix to search a field from the structure which has anonymous union
correctly.
Since the reference `type` pointer was updated in the loop, the search
loop suddenly aborted where it hits an anonymous union. Thus it can not
find the field after the anonymous union. This avoids updating the
cursor `type` pointer in the loop.
Wolfram Sang [Sat, 17 Feb 2024 12:13:33 +0000 (13:13 +0100)]
Merge tag 'i2c-host-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Three fixes are included here. Two are strictly hardware-related
for the i801 and qcom-geni devices. Meanwhile, a fix from Arnd
addresses a compilation error encountered during compile test on
powerpc.
Linus Torvalds [Fri, 16 Feb 2024 22:05:02 +0000 (14:05 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes: the two fnic ones are a revert and a refix, which is why
the diffstat is a bit big. The target one also extracts a function to
add a check for configuration and so looks bigger than it is"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Move fnic_fnic_flush_tx() to a work queue
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
scsi: target: Fix unmap setup during configuration
Linus Torvalds [Fri, 16 Feb 2024 22:00:19 +0000 (14:00 -0800)]
Merge tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Just one patch to revert commit ca10d851b9ad ("workqueue: Override
implicit ordered attribute in workqueue_apply_unbound_cpumask()").
This commit could break ordering guarantees for ordered workqueues.
The problem that the commit tried to resolve partially - making
ordered workqueues follow unbound cpumask - is fully solved in
wq/for-6.9 branch"
* tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"
Linus Torvalds [Fri, 16 Feb 2024 21:37:15 +0000 (13:37 -0800)]
Merge tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Additional cap handling fixes from Xiubo to avoid "client isn't
responding to mclientcaps(revoke)" stalls on the MDS side"
* tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client:
ceph: add ceph_cap_unlink_work to fire check_caps() immediately
ceph: always queue a writeback when revoking the Fb caps
Linus Torvalds [Fri, 16 Feb 2024 18:48:14 +0000 (10:48 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Avoid dropping the page refcount twice when freeing an unlinked
page-table subtree.
- Don't source the VFIO Kconfig twice
- Fix protected-mode locking order between kvm and vcpus
RISC-V:
- Fix steal-time related sparse warnings
x86:
- Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int"
- Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if
and only if the incoming events->nmi.pending is non-zero. If the
target vCPU is in the UNITIALIZED state, the spurious request will
result in KVM exiting to userspace, which in turn causes QEMU to
constantly acquire and release QEMU's global mutex, to the point
where the BSP is unable to make forward progress.
- Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl
being incorrectly truncated, and ultimately causes KVM to think a
fixed counter has already been disabled (KVM thinks the old value
is '0').
- Fix a stack leak in KVM_GET_MSRS where a failed MSR read from
userspace that is ultimately ignored due to ignore_msrs=true
doesn't zero the output as intended.
Selftests cleanups and fixes:
- Remove redundant newlines from error messages.
- Delete an unused variable in the AMX test (which causes build
failures when compiling with -Werror).
- Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails
with an error code other than ENOENT (a Hyper-V selftest bug
resulted in an EMFILE, and the test eventually got skipped).
- Fix TSC related bugs in several Hyper-V selftests.
- Fix a bug in the dirty ring logging test where a sem_post() could
be left pending across multiple runs, resulting in incorrect
synchronization between the main thread and the vCPU worker thread.
- Relax the dirty log split test's assertions on 4KiB mappings to fix
false positives due to the number of mappings for memslot 0 (used
for code and data that is NOT being dirty logged) changing, e.g.
due to NUMA balancing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked()
RISC-V: KVM: Use correct restricted types
RISC-V: paravirt: Use correct restricted types
RISC-V: paravirt: steal_time should be static
KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test
KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
KVM: x86: Fix KVM_GET_MSRS stack info leak
KVM: arm64: Do not source virt/lib/Kconfig twice
KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
KVM: x86: Make gtod_is_based_on_tsc() return 'bool'
KVM: selftests: Make hyperv_clock require TSC based system clocksource
KVM: selftests: Run clocksource dependent tests with hyperv_clocksource_tsc_page too
KVM: selftests: Use generic sys_clocksource_is_tsc() in vmx_nested_tsc_scaling_test
KVM: selftests: Generalize check_clocksource() from kvm_clock_test
KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
KVM: arm64: Fix circular locking dependency
KVM: selftests: Fail tests when open() fails with !ENOENT
KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing
KVM: selftests: Delete superfluous, unused "stage" variable in AMX test
KVM: selftests: x86_64: Remove redundant newlines
...
Linus Torvalds [Fri, 16 Feb 2024 18:33:51 +0000 (10:33 -0800)]
Merge tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix the #ifndef that didn't have the 'CONFIG_' prefix on
HAVE_DYNAMIC_FTRACE_WITH_REGS
The fix to have dynamic trampolines work with x86 broke arm64 as the
config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the
previous fix was to fix.
- Fix tracing_on state
The code to test if "tracing_on" is set incorrectly used
ring_buffer_record_is_on() which returns false if the ring buffer
isn't able to be written to.
But the ring buffer disable has several bits that disable it. One is
internal disabling which is used for resizing and other modifications
of the ring buffer. But the "tracing_on" user space visible flag
should only report if tracing is actually on and not internally
disabled, as this can cause confusion as writing "1" when it is
disabled will not enable it.
Instead use ring_buffer_record_is_set_on() which shows the user space
visible settings.
- Fix a false positive kmemleak on saved cmdlines
Now that the saved_cmdlines structure is allocated via alloc_page()
and not via kmalloc() it has become invisible to kmemleak. The
allocation done to one of its pointers was flagged as a dangling
allocation leak. Make kmemleak aware of this allocation and free.
- Fix synthetic event dynamic strings
An update that cleaned up the synthetic event code removed the return
value of trace_string(), and had it return zero instead of the
length, causing dynamic strings in the synthetic event to always have
zero size.
- Clean up documentation and header files for seq_buf
* tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
seq_buf: Fix kernel documentation
seq_buf: Don't use "proxy" headers
tracing/synthetic: Fix trace_string() return value
tracing: Inform kmemleak of saved_cmdlines allocation
tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on()
tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
Linus Torvalds [Fri, 16 Feb 2024 18:28:29 +0000 (10:28 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"It's a little busier than normal, but it's still not a lot of code and
things seem fairly quiet in general:
- Fix allocation failure during SVE coredumps
- Fix handling of SVE context on signal delivery
- Enable Neoverse N2 CPU errata workarounds for Microsoft's "Azure
Cobalt 100" clone
- Work around CMN PMU erratum in AmpereOneX implementation
- Fix typo in CXL PMU event definition
- Fix jump label asm constraints"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/sve: Lower the maximum allocation for the SVE ptrace regset
arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata
perf/arm-cmn: Workaround AmpereOneX errata AC04_MESH_1 (incorrect child count)
arm64: jump_label: use constraints "Si" instead of "i"
arm64: fix typo in comments
perf: CXL: fix mismatched cpmu event opcode
arm64/signal: Don't assume that TIF_SVE means we saved SVE state
Linus Torvalds [Fri, 16 Feb 2024 17:29:26 +0000 (09:29 -0800)]
Merge tag 'zonefs-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
- Fix direct write error handling to avoid a race between failed IO
completion and the submission path itself which can result in an
invalid file size exposed to the user after the failed IO.
* tag 'zonefs-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Improve error handling
Linus Torvalds [Fri, 16 Feb 2024 17:02:19 +0000 (09:02 -0800)]
Merge tag 'sound-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of device-specific fixes. It became a bit bigger than
wished, but all look reasonably small and safe to apply.
- A few Cirrus Logic CS35L56 and CS42L43 driver fixes
- ASoC SOF fixes and workarounds
- Various ASoC Intel fixes
- Lots of HD-, USB-audio and AMD ACP quirks"
* tag 'sound-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
ALSA: usb-audio: More relaxed check of MIDI jack names
ALSA: hda/realtek: fix mute/micmute LED For HP mt645
ALSA: hda/realtek: cs35l41: Fix order and duplicates in quirks table
ALSA: hda/realtek: cs35l41: Fix device ID / model name
ALSA: hda/realtek: cs35l41: Add internal speaker support for ASUS UM3402 with missing DSD
ASoC: cs35l56: Workaround for ACPI with broken spk-id-gpios property
ALSA: hda: Add Lenovo Legion 7i gen7 sound quirk
ASoC: SOF: IPC3: fix message bounds on ipc ops
ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend
ASoC: q6dsp: fix event handler prototype
ASoC: SOF: Intel: pci-lnl: Change the topology path to intel/sof-ipc4-tplg
ASoC: SOF: Intel: pci-tgl: Change the default paths and firmware names
ASoC: amd: yc: Fix non-functional mic on Lenovo 82UU
ASoC: rt5645: Add DMI quirk for inverted jack-detect on MeeGoPad T8
ASoC: rt5645: Make LattePanda board DMI match more precise
ASoC: SOF: amd: Fix locking in ACP IRQ handler
ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
ASoC: Intel: cht_bsw_rt5645: Cleanup codec_name handling
ASoC: Intel: Boards: Fix NULL pointer deref in BYT/CHT boards
ASoC: cs35l56: Remove default from IRQ1_CFG register
...
Linus Torvalds [Fri, 16 Feb 2024 16:55:46 +0000 (08:55 -0800)]
Merge tag 'gpio-fixes-for-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- add missing stubs for functions that are not built with GPIOLIB
disabled
* tag 'gpio-fixes-for-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: add gpio_device_get_label() stub for !GPIOLIB
gpiolib: add gpio_device_get_base() stub for !GPIOLIB
gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
Linus Torvalds [Fri, 16 Feb 2024 16:05:30 +0000 (08:05 -0800)]
Merge tag 'drm-fixes-2024-02-16' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular weekly fixes, nothing too major, mostly amdgpu, then i915, xe,
msm and nouveau with some scattered bits elsewhere.
crtc:
- fix uninit variable
prime:
- support > 4GB page arrays
buddy:
- fix error handling in allocations
i915:
- fix blankscreen on JSL chromebooks
- stable fix to limit DP sst link rates
xe:
- Fix an out-of-bounds shift.
- Fix the display code thinking xe uses shmem
- Fix a warning about index out-of-bound
- Fix a clang-16 compilation warning
amdgpu:
- PSR fixes
- Suspend/resume fixes
- Link training fix
- Aspect ratio fix
- DCN 3.5 fixes
- VCN 4.x fix
- GFX 11 fix
- Misc display fixes
- Misc small fixes
amdkfd:
- Cache size reporting fix
- SIMD distribution fix
msm:
- GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable
- tlb invalidation fix
* tag 'drm-fixes-2024-02-16' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/amdgpu: Fix implicit assumtion in gfx11 debug flags
drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards
drm/amd/display: Increase ips2_eval delay for DCN35
drm/amdgpu/display: Initialize gamma correction mode variable in dcn30_get_gamcor_current()
drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
drm/amd/display: fixed integer types and null check locations
drm/amd/display: Fix array-index-out-of-bounds in dcn35_clkmgr
drm/amd/display: Preserve original aspect ratio in create stream
drm/amd/display: Fix possible NULL dereference on device remove/driver unload
Revert "drm/amd/display: increased min_dcfclk_mhz and min_fclk_mhz"
drm/amd/display: Add align done check
Revert "drm/amd: flush any delayed gfxoff on suspend entry"
drm/amd: Stop evicting resources on APUs in suspend
drm/amd/display: Fix possible buffer overflow in 'find_dcfclk_for_voltage()'
drm/amd/display: Fix possible use of uninitialized 'max_chunks_fbc_mode' in 'calculate_bandwidth()'
drm/amd/display: Initialize 'wait_time_microsec' variable in link_dp_training_dpia.c
drm/amd/display: Fix && vs || typos
drm/amdkfd: Fix L2 cache size reporting in GFX9.4.3
drm/amdgpu: make damage clips support configurable
drm/msm: Wire up tlb ops
...
Dave Airlie [Fri, 16 Feb 2024 05:45:03 +0000 (15:45 +1000)]
Merge tag 'drm-xe-fixes-2024-02-15' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix an out-of-bounds shift.
- Fix the display code thinking xe uses shmem
- Fix a warning about index out-of-bound
- Fix a clang-16 compilation warning
Dave Airlie [Fri, 16 Feb 2024 04:33:09 +0000 (14:33 +1000)]
Merge tag 'drm-misc-fixes-2024-02-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A suspend/resume error fix for ivpu, a couple of scheduler fixes for
nouveau, a patch to support large page arrays in prime, a uninitialized
variable fix in crtc, a locking fix in rockchip/vop2 and a buddy
allocator error reporting fix.
Steve French [Tue, 6 Feb 2024 22:34:22 +0000 (16:34 -0600)]
smb: Fix regression in writes when non-standard maximum write size negotiated
The conversion to netfs in the 6.3 kernel caused a regression when
maximum write size is set by the server to an unexpected value which is
not a multiple of 4096 (similarly if the user overrides the maximum
write size by setting mount parm "wsize", but sets it to a value that
is not a multiple of 4096). When negotiated write size is not a
multiple of 4096 the netfs code can skip the end of the final
page when doing large sequential writes, causing data corruption.
This section of code is being rewritten/removed due to a large
netfs change, but until that point (ie for the 6.3 kernel until now)
we can not support non-standard maximum write sizes.
Add a warning if a user specifies a wsize on mount that is not
a multiple of 4096 (and round down), also add a change where we
round down the maximum write size if the server negotiates a value
that is not a multiple of 4096 (we also have to check to make sure that
we do not round it down to zero).
Damien Le Moal [Thu, 8 Feb 2024 08:26:59 +0000 (17:26 +0900)]
zonefs: Improve error handling
Write error handling is racy and can sometime lead to the error recovery
path wrongly changing the inode size of a sequential zone file to an
incorrect value which results in garbage data being readable at the end
of a file. There are 2 problems:
1) zonefs_file_dio_write() updates a zone file write pointer offset
after issuing a direct IO with iomap_dio_rw(). This update is done
only if the IO succeed for synchronous direct writes. However, for
asynchronous direct writes, the update is done without waiting for
the IO completion so that the next asynchronous IO can be
immediately issued. However, if an asynchronous IO completes with a
failure right before the i_truncate_mutex lock protecting the update,
the update may change the value of the inode write pointer offset
that was corrected by the error path (zonefs_io_error() function).
2) zonefs_io_error() is called when a read or write error occurs. This
function executes a report zone operation using the callback function
zonefs_io_error_cb(), which does all the error recovery handling
based on the current zone condition, write pointer position and
according to the mount options being used. However, depending on the
zoned device being used, a report zone callback may be executed in a
context that is different from the context of __zonefs_io_error(). As
a result, zonefs_io_error_cb() may be executed without the inode
truncate mutex lock held, which can lead to invalid error processing.
Fix both problems as follows:
- Problem 1: Perform the inode write pointer offset update before a
direct write is issued with iomap_dio_rw(). This is safe to do as
partial direct writes are not supported (IOMAP_DIO_PARTIAL is not
set) and any failed IO will trigger the execution of zonefs_io_error()
which will correct the inode write pointer offset to reflect the
current state of the one on the device.
- Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error()
and call this function directly from __zonefs_io_error() after
obtaining the zone information using blkdev_report_zones() with a
simple callback function that copies to a local stack variable the
struct blk_zone obtained from the device. This ensures that error
handling is performed holding the inode truncate mutex.
This change also simplifies error handling for conventional zone files
by bypassing the execution of report zones entirely. This is safe to
do because the condition of conventional zones cannot be read-only or
offline and conventional zone files are always fully mapped with a
constant file size.