Git Repo - linux.git/log

Merge branch 'axienet-broken-link'

Andy Chiu says:

====================
Fix broken link on Xilinx's AXI Ethernet in SGMII mode

The Ethernet driver use phy-handle to reference the PCS/PMA PHY. This
could be a problem if one wants to configure an external PHY via phylink,
since it use the same phandle to get the PHY. To fix this, introduce a
dedicated pcs-handle to point to the PCS/PMA PHY and deprecate the use
of pointing it with phy-handle. A similar use case of pcs-handle can be
seen on dpaa2 as well.

--- patch v5 ---
- Re-apply the v4 patch on the net tree.
- Describe the pcs-handle DT binding at ethernet-controller level.
--- patch v6 ---
- Remove "preferrably" to clearify usage of pcs_handle.
--- patch v7 ---
- Rebase the patch on latest net/master
--- patch v8 ---
- Rebase the patch on net-next/master
- Add "reviewed-by" tag in PATCH 3/4: dt-bindings: net: add pcs-handle
attribute
- Remove "fix" tag in last commit message since this is not a critical
bug and will not be back ported to stable.
====================

Signed-off-by: David S. Miller <[email protected]>

net: axiemac: use a phandle to reference pcs_phy

In some SGMII use cases where both a fixed link external PHY and the
internal PCS/PMA PHY need to be configured, we should explicitly use a
phandle "pcs-phy" to get the reference to the PCS/PMA PHY. Otherwise, the
driver would use "phy-handle" in the DT as the reference to both the
external and the internal PCS/PMA PHY.

In other cases where the core is connected to a SFP cage, we could still
point phy-handle to the intenal PCS/PMA PHY, and let the driver connect
to the SFP module, if exist, via phylink.

Signed-off-by: Andy Chiu <[email protected]>
Reviewed-by: Greentime Hu <[email protected]>
Reviewed-by: Robert Hancock <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Radhey Shyam Pandey <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: add pcs-handle attribute

Document the new pcs-handle attribute to support connecting to an
external PHY. For Xilinx's AXI Ethernet, this is used when the core
operates in SGMII or 1000Base-X modes and links through the internal
PCS/PMA PHY.

Signed-off-by: Andy Chiu <[email protected]>
Reviewed-by: Greentime Hu <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: axienet: factor out phy_node in struct axienet_local

the struct member `phy_node` of struct axienet_local is not used by the
driver anymore after initialization. It might be a remnent of old code
and could be removed.

Signed-off-by: Andy Chiu <[email protected]>
Reviewed-by: Greentime Hu <[email protected]>
Reviewed-by: Robert Hancock <[email protected]>
Reviewed-by: Radhey Shyam Pandey <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: axienet: setup mdio unconditionally

The call to axienet_mdio_setup should not depend on whether "phy-node"
pressents on the DT. Besides, since `lp->phy_node` is used if PHY is in
SGMII or 100Base-X modes, move it into the if statement. And the next patch
will remove `lp->phy_node` from driver's private structure and do an
of_node_put on it right away after use since it is not used elsewhere.

Signed-off-by: Andy Chiu <[email protected]>
Reviewed-by: Greentime Hu <[email protected]>
Reviewed-by: Robert Hancock <[email protected]>
Reviewed-by: Radhey Shyam Pandey <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sfc: fix using uninitialized xdp tx_queue

In some cases, xdp tx_queue can get used before initialization.
1. interface up/down
2. ring buffer size change

When CPU cores are lower than maximum number of channels of sfc driver,
it creates new channels only for XDP.

When an interface is up or ring buffer size is changed, all channels
are initialized.
But xdp channels are always initialized later.
So, the below scenario is possible.
Packets are received to rx queue of normal channels and it is acted
XDP_TX and tx_queue of xdp channels get used.
But these tx_queues are not initialized yet.
If so, TX DMA or queue error occurs.

In order to avoid this problem.
1. initializes xdp tx_queues earlier than other rx_queue in
efx_start_channels().
2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers().

Splat looks like:
   sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250
   sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL)
   sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22
   (raw=22) arg=789
   sfc 0000:08:00.1 enp8s0f1np1: has been disabled

Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)")
Acked-by: Martin Habets <[email protected]>
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rxrpc: fix a race in rxrpc_exit_net()

Current code can lead to the following race:

CPU0                                                 CPU1

rxrpc_exit_net()
                                                     rxrpc_peer_keepalive_worker()
                                                       if (rxnet->live)

  rxnet->live = false;
  del_timer_sync(&rxnet->peer_keepalive_timer);

                                                             timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);

  cancel_work_sync(&rxnet->peer_keepalive_work);

rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
leading to use-after-free.

syzbot report was:

ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
kfree+0xd6/0x310 mm/slab.c:3809
ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
ops_free_list net/core/net_namespace.c:174 [inline]
cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>

Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: David Howells <[email protected]>
Cc: Marc Dionne <[email protected]>
Cc: [email protected]
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: openvswitch: fix leak of nested actions

While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.

Currently there are only two such actions: ct() and set().  However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.

For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():

  actions:clone(ct(commit),0)

Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.

Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case).  The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.

Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.

New build time assertion should protect us from this problem if new
actions will be added in the future.

Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked.  sample() and clone() actions
are mixing extra attributes into the user-provided action list.  That
prevents some code generalization too.

Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst")
Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html
Reported-by: Stéphane Graber <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry

tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when
updating the mmu_gather structure.

Unfortunately on arm64 there are two additional huge page sizes that
need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user
attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced
due to the fact that the tlb structure hasn't been correctly updated by
the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry.

This patch adds inequality logic to the generic implementation of
tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are
effectively covered on arm64. Also, as well as ptes, pmds and puds;
p4ds are now considered too.

Reported-by: David Hildenbrand <[email protected]>
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Signed-off-by: Steve Capper <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

drm/vc4: hdmi: Remove clock rate initialization

Now that the clock driver makes sure we never end up with a rate of 0,
the HDMI driver doesn't need to care anymore.

Signed-off-by: Maxime Ripard <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

drm/vc4: Add logging and comments

The HVS core clock isn't really obvious, so let's add a bunch more
comments and some logging for easier debugging.

Signed-off-by: Maxime Ripard <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

arm64: alternatives: mark patch_alternative() as `noinstr`

The alternatives code must be `noinstr` such that it does not patch itself,
as the cache invalidation is only performed after all the alternatives have
been applied.

Mark patch_alternative() as `noinstr`. Mark branch_insn_requires_update()
and get_alt_insn() with `__always_inline` since they are both only called
through patch_alternative().

Booting a kernel in QEMU TCG with KCSAN=y and ARM64_USE_LSE_ATOMICS=y caused
a boot hang:
[    0.241121] CPU: All CPU(s) started at EL2

The alternatives code was patching the atomics in __tsan_read4() from LL/SC
atomics to LSE atomics.

The following fragment is using LL/SC atomics in the .text section:
  | <__tsan_unaligned_read4+304>:     ldxr    x6, [x2]
  | <__tsan_unaligned_read4+308>:     add     x6, x6, x5
  | <__tsan_unaligned_read4+312>:     stxr    w7, x6, [x2]
  | <__tsan_unaligned_read4+316>:     cbnz    w7, <__tsan_unaligned_read4+304>

This LL/SC atomic sequence was to be replaced with LSE atomics. However since
the alternatives code was instrumentable, __tsan_read4() was being called after
only the first instruction was replaced, which led to the following code in memory:
  | <__tsan_unaligned_read4+304>:     ldadd   x5, x6, [x2]
  | <__tsan_unaligned_read4+308>:     add     x6, x6, x5
  | <__tsan_unaligned_read4+312>:     stxr    w7, x6, [x2]
  | <__tsan_unaligned_read4+316>:     cbnz    w7, <__tsan_unaligned_read4+304>

This caused an infinite loop as the `stxr` instruction never completed successfully,
so `w7` was always 0.

Signed-off-by: Joey Gouly <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

drm/i915/pmu: Drop redundant IS_VALLEYVIEW check in __get_rc6()

Because VLV_GT_RENDER_RC6 == GEN6_GT_GFX_RC6, the IS_VALLEYVIEW() check is
not needed. Neither is the check present in other code paths which call
intel_rc6_residency_ns() (in functions gen6_drpc(), rc6_residency() and
rc6_residency_ms_show()).

v2: Elimintate VLV_GT_RENDER_RC6 #define (Jani)

Cc: Jani Nikula <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Signed-off-by: Anshuman Gupta <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm: bridge: icn6211: Drop I2C module owner assignment

The module owner = THIS_MODULE is set by I2C core, drop duplicate assignment.

Fixes: 8dde6f7452a1 ("drm: bridge: icn6211: Add I2C configuration support")
Signed-off-by: Marek Vasut <[email protected]>
Cc: Jagan Teki <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Robert Foss <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
To: [email protected]
Reviewed-by: Jagan Teki <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/gma500: fix a missing break in psb_intel_crtc_mode_set

Instead of exiting the loop as expected when an entry is found, the
list_for_each_entry() continues until the traversal is complete.
when found the entry, add a break after the switch statement.

Signed-off-by: Xiaomeng Tong <[email protected]>
Signed-off-by: Patrik Jakobsson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm: bridge: icn6211: Mark module exit callback with __exit

Fix copy-paste error, module exit function should be marked with __exit
instead of __init.

Fixes: 8dde6f7452a1 ("drm: bridge: icn6211: Add I2C configuration support")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Marek Vasut <[email protected]>
Cc: Jagan Teki <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Robert Foss <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
To: [email protected]
Signed-off-by: Robert Foss <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Robert Foss <[email protected]>

ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back

CONFIG_SATA_LPM_MOBILE_POLICY was renamed to CONFIG_SATA_LPM_POLICY in
commit 4dd4d3deb502 ("ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY
configuration item").

This can potentially cause problems as users would invisibly lose
configuration policy defaults when they built the new kernel. To
avoid such problems, switch back to the old name (even if it's wrong).

Suggested-by: Christoph Hellwig <[email protected]>
Suggested-by: Damien Le Moal <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>

net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()

There is often not a MAC address available in an EEPROM accessible by
Linux with Marvell devices. Instead the bootload has the MAC address
and directly programs it into the hardware. So don't consider an error
from of_get_mac_address() has fatal. However, the check was added for
the case where there is a MAC address in an the EEPROM, but the EEPROM
has not probed yet, and -EPROBE_DEFER is returned. In that case the
error should be returned. So make the check specific to this error
code.

Cc: Mauri Sandberg <[email protected]>
Reported-by: Thomas Walther <[email protected]>
Fixes: 42404d8f1c01 ("net: mv643xx_eth: process retval from of_get_mac_address")
Signed-off-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: openvswitch: don't send internal clone attribute to the userspace.

'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel.  It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.

The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump.  However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:

  # ovs-dpctl add-dp system@ovs-system
  # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)"
  # ovs-dpctl dump-flows
  <flow match>, packets:0, bytes:0, used:never,
    actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
                  ct(commit),0)

With the fix:

  # ovs-dpctl dump-flows
  <flow match>, packets:0, bytes:0, used:never,
    actions:clone(ct(commit),0)

Additionally fixed an incorrect attribute name in the comment.

Fixes: b233504033db ("openvswitch: kernel datapath clone action")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: micrel: Fix KS8851 Kconfig

KS8851 selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL, so
make KS8851 also depend on PTP_1588_CLOCK_OPTIONAL.

Fixes kconfig warning and build errors:

WARNING: unmet direct dependencies detected for MICREL_PHY
  Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
    Selected by [y]:
      - KS8851 [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && SPI [=y]

ld.lld: error: undefined symbol: ptp_clock_register referenced by micrel.c
net/phy/micrel.o:(lan8814_probe) in archive drivers/built-in.a
ld.lld: error: undefined symbol: ptp_clock_index referenced by micrel.c
net/phy/micrel.o:(lan8814_ts_info) in archive drivers/built-in.a

Reported-by: kernel test robot <[email protected]>
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Horatiu Vultur <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

drm: bridge: icn6211: Fix DSI-to-DPI PLL configuration

The datasheet for this bridge is not available, the PLL behavior has been
inferred from [1] and [2] and by analyzing the DPI pixel clock with scope.
After further testing with other displays and different DSI data lane count,
it turns out the P-factor is not 1/2^N divider, but rather only 1/N divider.
It also turns out the input into the PLL seem to be ByteClock instead of DSI
HS clock.

Rework the P-factor calculation such that the PLL calculation code handles
P-factor from 1..32 with P-factors above 16 must be even. In case P-factor
is even, enable built-in 1:2 divider and program P-factor/2 to PLL_REF_DIV,
otherwise configure only the P-factor into PLL_REF_DIV register.

Switch the PLL factor calculation from kHz to Hz to maintain precision.

[1] https://github.com/rockchip-linux/kernel/blob/develop-4.19/drivers/gpu/drm/bridge/icn6211.c
[2] https://github.com/tdjastrzebski/ICN6211-Configurator

Fixes: f30cf0ece691 ("drm: bridge: icn6211: Add generic DSI-to-DPI PLL configuration")
Signed-off-by: Marek Vasut <[email protected]>
Cc: Jagan Teki <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Robert Foss <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
To: [email protected]
Acked-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/panel: innolux-ej030na and abt-y030xx067a: add .enable and .disable

Following the introduction of bridge_atomic_enable in the ingenic
drm driver, the crtc is enabled between .prepare and .enable, if
it exists. Add it so the backlight is only enabled after the crtc is, to
avoid graphical issues.

As we're moving the "sleep out" command out of the init sequence
into .enable for the ABT, we need to switch the regmap cache
to REGCACHE_FLAT to be able to use regmap_set_bits, given this
panel registers are write-ony and read as 0.

Signed-off-by: Christophe Branchereau <[email protected]>
Signed-off-by: Paul Cercueil <[email protected]>
[pcercuei: Remove empty line after opening brace]
Acked-by: Sam Ravnborg <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/panel: Add panel driver for NewVision NV3052C based LCDs

This driver supports the NewVision NV3052C based LCDs. Right now, it
only supports the LeadTek LTK035C5444T 2.4" 640x480 TFT LCD panel, which
can be found in the Anbernic RG-350M handheld console.

Signed-off-by: Christophe Branchereau <[email protected]>
Signed-off-by: Paul Cercueil <[email protected]>
[pcercuei: Change msleep(5) to usleep_range(5000, 20000)]
Reviewed-by: Sam Ravnborg <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/ingenic: Add ingenic_drm_bridge_atomic_enable and disable

ingenic_drm_bridge_atomic_enable allows the CRTC to be enabled after
panels have slept out, and before their display is turned on, solving
a graphical bug on the newvision nv3502c.

Also add ingenic_drm_bridge_atomic_disable to balance it out.

Signed-off-by: Christophe Branchereau <[email protected]>
Signed-off-by: Paul Cercueil <[email protected]>
Acked-by: Artur Rojek <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Incorrect comparison in bitmask .reduce, from Jeremy Sowden.

2) Missing GFP_KERNEL_ACCOUNT for dynamically allocated objects,
   from Vasily Averin.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: memcg accounting for dynamically allocated objects
  netfilter: bitwise: fix reduce comparisons
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ACPI: bus: Eliminate acpi_bus_get_device()

Replace the last instance of acpi_bus_get_device(), added recently
by commit 87e59b36e5e2 ("spi: Support selection of the index of the
ACPI Spi Resource before alloc"), with acpi_fetch_acpi_dev() and
finally drop acpi_bus_get_device() that has no more users.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Mark Brown <[email protected]>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Fixes and cleanups:

   - A couple of mlx5 fixes related to cvq

   - A couple of reverts dropping useless code (code that used it got
     reverted earlier)"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa: mlx5: synchronize driver status with CVQ
  vdpa: mlx5: prevent cvq work from hogging CPU
  Revert "virtio_config: introduce a new .enable_cbs method"
  Revert "virtio: use virtio_device_ready() in virtio_device_restore()"

x86/speculation: Restore speculation related MSRs during S3 resume

After resuming from suspend-to-RAM, the MSRs that control CPU's
speculative execution behavior are not being restored on the boot CPU.

These MSRs are used to mitigate speculative execution vulnerabilities.
Not restoring them correctly may leave the CPU vulnerable. Secondary
CPU's MSRs are correctly being restored at S3 resume by
identify_secondary_cpu().

During S3 resume, restore these MSRs for boot CPU when restoring its
processor state.

Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS")
Reported-by: Neelima Krishnan <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Tested-by: Neelima Krishnan <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

x86/pm: Save the MSR validity status at context setup

The mechanism to save/restore MSRs during S3 suspend/resume checks for
the MSR validity during suspend, and only restores the MSR if its a
valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
throw an exception for every suspend cycle.  The more invalid MSRs,
higher the impact will be.

Check and save the MSR validity at setup.  This ensures that only valid
MSRs that are guaranteed to not throw an exception will be attempted
during suspend.

Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

ice: clear cmd_type_offset_bsz for TX rings

Currently when XDP rings are created, each descriptor gets its DD bit
set, which turns out to be the wrong approach as it can lead to a
situation where more descriptors get cleaned than it was supposed to,
e.g. when AF_XDP busy poll is run with a large batch size. In this
situation, the driver would request for more buffers than it is able to
handle.

Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They
should be initialized to zero instead.

Fixes: 9610bd988df9 ("ice: optimize XDP_TX workloads")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Shwetha Nagaraju <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: xsk: fix VSI state check in ice_xsk_wakeup()

ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on
vsi->state in ice_xsk_wakeup().

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Shwetha Nagaraju <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: synchronize_rcu() when terminating rings

Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.

This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5be757 ("net/i40e: Fix concurrency issues
between config flow and XSK")

Fixes: efc2214b6047 ("ice: Add support for XDP")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: Shwetha Nagaraju <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

Merge tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- prevent deleting subvolume with active swapfile

- fix qgroup reserve limit calculation overflow

- remove device count in superblock and its item in one transaction so
   they cant't get out of sync

- skip defragmenting an isolated sector, this could cause some extra IO

- unify handling of mtime/permissions in hole punch with fallocate

- zoned mode fixes:
     - remove assert checking for only single mode, we have the
       DUP mode implemented
     - fix potential lockdep warning while traversing devices
       when checking for zone activation

* tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: prevent subvol with swapfile from being deleted
  btrfs: do not warn for free space inode in cow_file_range
  btrfs: avoid defragging extents whose next extents are not targets
  btrfs: fix fallocate to use file_modified to update permissions consistently
  btrfs: remove device item and update super block in the same transaction
  btrfs: fix qgroup reserve overflow the qgroup limit
  btrfs: zoned: remove left over ASSERT checking for single profile
  btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone

irqchip/gic, gic-v3: Prevent GSI to SGI translations

At the moment the GIC IRQ domain translation routine happily converts
ACPI table GSI numbers below 16 to GIC SGIs (Software Generated
Interrupts aka IPIs). On the Devicetree side we explicitly forbid this
translation, actually the function will never return HWIRQs below 16 when
using a DT based domain translation.

We expect SGIs to be handled in the first part of the function, and any
further occurrence should be treated as a firmware bug, so add a check
and print to report this explicitly and avoid lengthy debug sessions.

Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts")
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/gic-v3: Fix GICR_CTLR.RWP polling

It turns out that our polling of RWP is totally wrong when checking
for it in the redistributors, as we test the *distributor* bit index,
whereas it is a different bit number in the RDs... Oopsie boo.

This is embarassing. Not only because it is wrong, but also because
it took *8 years* to notice the blunder...

Just fix the damn thing.

Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3")
Signed-off-by: Marc Zyngier <[email protected]>
Cc: [email protected]
Reviewed-by: Andre Przywara <[email protected]>
Reviewed-by: Lorenzo Pieralisi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling

The way KVM drives GICv4.{0,1} is as follows:
- vcpu_load() makes the VPE resident, instructing the RD to start
  scanning for interrupts
- just before entering the guest, we check that the RD has finished
  scanning and that we can start running the vcpu
- on preemption, we deschedule the VPE by making it invalid on
  the RD

However, we are preemptible between the first two steps. If it so
happens *and* that the RD was still scanning, we nonetheless write
to the GICR_VPENDBASER register while Dirty is set, and bad things
happen (we're in UNPRED land).

This affects both the 4.0 and 4.1 implementations.

Make sure Dirty is cleared before performing the deschedule,
meaning that its_clear_vpend_valid() becomes a sort of full VPE
residency barrier.

Reported-by: Jingyi Wang <[email protected]>
Tested-by: Nianyao Tang <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit")
Link: https://lore.kernel.org/r/[email protected]

irqchip/irq-qcom-mpm: fix return value check in qcom_mpm_init()

If devm_platform_ioremap_resource() fails, it never returns
NULL, replace NULL check with IS_ERR().

Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Acked-by: Shawn Guo <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irq/qcom-mpm: Fix build error without MAILBOX

If MAILBOX is n, building fails:

drivers/irqchip/irq-qcom-mpm.o: In function `mpm_pd_power_off':
irq-qcom-mpm.c:(.text+0x174): undefined reference to `mbox_send_message'
irq-qcom-mpm.c:(.text+0x174): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `mbox_send_message'

Make QCOM_MPM depends on MAILBOX to fix this.

Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver")
Signed-off-by: YueHaibing <[email protected]>
Acked-by: Shawn Guo <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

random: opportunistically initialize on /dev/urandom reads

In 6f98a4bfee72 ("random: block in /dev/urandom"), we tried to make a
successful try_to_generate_entropy() call *required* if the RNG was not
already initialized. Unfortunately, weird architectures and old
userspaces combined in TCG test harnesses, making that change still not
realistic, so it was reverted in 0313bc278dac ("Revert "random: block in
/dev/urandom"").

However, rather than making a successful try_to_generate_entropy() call
*required*, we can instead make it *best-effort*.

If try_to_generate_entropy() fails, it fails, and nothing changes from
the current behavior. If it succeeds, then /dev/urandom becomes safe to
use for free. This way, we don't risk the regression potential that led
to us reverting the required-try_to_generate_entropy() call before.

Practically speaking, this means that at least on x86, /dev/urandom
becomes safe. Probably other architectures with working cycle counters
will also become safe. And architectures with slow or broken cycle
counters at least won't be affected at all by this change.

So it may not be the glorious "all things are unified!" change we were
hoping for initially, but practically speaking, it makes a positive
impact.

Cc: Theodore Ts'o <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>

kobject: kobj_type: remove default_attrs

Now that all in-kernel users of default_attrs for the kobj_type are gone
and converted to properly use the default_groups pointer instead, it can
be safely removed.

There is one standard way to create sysfs files in a kobj_type, and not
two like before, causing confusion as to which should be used.

Cc: "Rafael J. Wysocki" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

powerpc/pseries/vas: use default_groups in kobj_type

There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the pseries vas sysfs code to use default_groups field
which has been the preferred way since aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Haren Myneni <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/i915/dsb: modified to drm_info in dsb_prepare()

The request to aqquire gem resources is failing for DSB in rare
scenario where it is busy and the register programming will be done
through mmio fallback path.

DSB has extra advantage of faster register programming which may
go away through mmio path. Adding wait for gem resource also may
not be right as anyways losing time.

To make the CI execution happy replaced drm_err() to drm_info()
for printing debug info during dsb buffer preparation.

v1: Initial version.
v2: Added print for mmio fallback at out label. [Nirmoy]
v3: Improved debug message. [Nirmoy]

Cc: Nirmoy Das <[email protected]>
Signed-off-by: Animesh Manna <[email protected]>
Signed-off-by: Uma Shankar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Nirmoy Das <[email protected]>

ipv6: Fix stats accounting in ip6_pkt_drop

VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.

Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <[email protected]>
Reported-by: Xiao, Jiguang <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'ice-bug-fixes'

Tony Nguyen says:

====================
ice bug fixes

Alice Michael says:

There were a couple of bugs that have been found and
fixed by Anatolii in the ice driver.  First he fixed
a bug on ring creation by setting the default value
for the teid.  Anatolli also fixed a bug with deleting
queues in ice_vc_dis_qs_msg based on their enablement.
---
v2: Remove empty lines between tags

The following are changes since commit 458f5d92df4807e2a7c803ed928369129996bf96:
  sfc: Do not free an empty page_ring
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ice: Do not skip not enabled queues in ice_vc_dis_qs_msg

Disable check for queue being enabled in ice_vc_dis_qs_msg, because
there could be a case when queues were created, but were not enabled.
We still need to delete those queues.

Normal workflow for VF looks like:
Enable path:
VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)

Disable path:
VIRTCHNL_OP_DISABLE_QUEUES (opcode 9)
VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)

The issue appears only in stress conditions when VF is enabled and
disabled very fast.
Eventually there will be a case, when queues are created by
VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by
VIRTCHNL_OP_ENABLE_QUEUES.
In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES,
because there is a check whether queues are enabled in
ice_vc_dis_qs_msg.

When we bring up the VF again, we will see the "Failed to set LAN Tx queue
context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This
happens because old 16 queues were not deleted and VF requests to create
16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to
find a parent node for first newly requested queue (because all nodes
are allocated to 16 old queues).

Testing Hints:

Just enable and disable VF fast enough, so it would be disabled before
reaching VIRTCHNL_OP_ENABLE_QUEUES.

while true; do
        ip link set dev ens785f0v0 up
        sleep 0.065 # adjust delay value for you machine
        ip link set dev ens785f0v0 down
done

Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Signed-off-by: Anatolii Gerasymenko <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Alice Michael <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

ice: Set txq_teid to ICE_INVAL_TEID on ring creation

When VF is freshly created, but not brought up, ring->txq_teid
value is by default set to 0.
But 0 is a valid TEID. On some platforms the Root Node of
Tx scheduler has a TEID = 0. This can cause issues as shown below.

The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).

Testing Hints:
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
ip link set dev ens785f0v0 up
ip link set dev ens785f0v0 down

If we have freshly created VF and quickly turn it on and off, so there
would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error:
[  639.531454] disable queue 89 failed 14
[  639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
[  639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5

The reason for the fail is that we are trying to send AQ command to
delete queue 89, which has never been created and receive an "invalid
argument" error from firmware.

As this queue has never been created, it's teid and ring->txq_teid
have default value 0.
ice_dis_vsi_txq has a check against non-existent queues:

node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
if (!node)
continue;

But on some platforms the Root Node of Tx scheduler has a teid = 0.
Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
pi->root), and we go further to submit an erroneous request to firmware.

Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7")
Signed-off-by: Anatolii Gerasymenko <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Alice Michael <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe

This node pointer is returned by of_find_compatible_node() with
refcount incremented. Calling of_node_put() to aovid the refcount leak.

Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver")
Signed-off-by: Miaoqian Lin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

netfilter: nf_tables: memcg accounting for dynamically allocated objects

nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to
use __GFP_ACCOUNT flag for objects that are dynamically
allocated from the packet path.

Such objects are allocated inside nft_expr_ops->init() callbacks
executed in task context while processing netlink messages.

In addition, this patch adds accounting to nft_set_elem_expr_clone()
used for the same purposes.

Signed-off-by: Vasily Averin <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

drm/nouveau: support more than one write fence in fenv50_wndw_prepare_fb

Use dma_resv_get_singleton() here to eventually get more than one write
fence as single fence.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge drm-misc/drm-misc-next-fixes into drm-misc-fixes

There were a few patches left in drm-misc-next-fixes, let's bring them
into drm-misc-fixes.

Signed-off-by: Maxime Ripard <[email protected]>

Merge drm/drm-fixes into drm-misc-fixes

Let's start the 5.18 fixes cycle.

Signed-off-by: Maxime Ripard <[email protected]>

Merge drm/drm-next into drm-misc-next

Let's start the 5.19 development cycle.

Signed-off-by: Maxime Ripard <[email protected]>

objtool: Fix SLS validation for kcov tail-call replacement

Since not all compilers have a function attribute to disable KCOV
instrumentation, objtool can rewrite KCOV instrumentation in noinstr
functions as per commit:

f56dae88a81f ("objtool: Handle __sanitize_cov*() tail calls")

However, this has subtle interaction with the SLS validation from
commit:

1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")

In that when a tail-call instrucion is replaced with a RET an
additional INT3 instruction is also written, but is not represented in
the decoded instruction stream.

This then leads to false positive missing INT3 objtool warnings in
noinstr code.

Instead of adding additional struct instruction objects, mark the RET
instruction with retpoline_safe to suppress the warning (since we know
there really is an INT3).

Fixes: 1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

objtool: Fix IBT tail-call detection

Objtool reports:

  arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx() falls through to next function poly1305_blocks_x86_64()
  arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_emit_avx() falls through to next function poly1305_emit_x86_64()
  arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2() falls through to next function poly1305_blocks_x86_64()

Which reads like:

0000000000000040 <poly1305_blocks_x86_64>:
40:       f3 0f 1e fa             endbr64
...

0000000000000400 <poly1305_blocks_avx>:
400:       f3 0f 1e fa             endbr64
404:       44 8b 47 14             mov    0x14(%rdi),%r8d
408:       48 81 fa 80 00 00 00    cmp    $0x80,%rdx
40f:       73 09                   jae    41a <poly1305_blocks_avx+0x1a>
411:       45 85 c0                test   %r8d,%r8d
414:       0f 84 2a fc ff ff       je     44 <poly1305_blocks_x86_64+0x4>
...

These are simple conditional tail-calls and *should* be recognised as
such by objtool, however due to a mistake in commit 08f87a93c8ec
("objtool: Validate IBT assumptions") this is failing.

Specifically, the jump_dest is +4, this means the instruction pointed
at will not be ENDBR and as such it will fail the second clause of
is_first_func_insn() that was supposed to capture this exact case.

Instead, have is_first_func_insn() look at the previous instruction.

Fixes: 08f87a93c8ec ("objtool: Validate IBT assumptions")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

x86/bug: Prevent shadowing in __WARN_FLAGS

The macro __WARN_FLAGS() uses a local variable named "f". This being a
common name, there is a risk of shadowing other variables.

For example, GCC would yield:

| In file included from ./include/linux/bug.h:5,
|                  from ./include/linux/cpumask.h:14,
|                  from ./arch/x86/include/asm/cpumask.h:5,
|                  from ./arch/x86/include/asm/msr.h:11,
|                  from ./arch/x86/include/asm/processor.h:22,
|                  from ./arch/x86/include/asm/timex.h:5,
|                  from ./include/linux/timex.h:65,
|                  from ./include/linux/time32.h:13,
|                  from ./include/linux/time.h:60,
|                  from ./include/linux/stat.h:19,
|                  from ./include/linux/module.h:13,
|                  from virt/lib/irqbypass.mod.c:1:
| ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu':
| ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow]
|    80 |         __auto_type f = BUGFLAG_WARNING|(flags);                \
|       |                     ^
| ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS'
|   106 |                 __WARN_FLAGS(BUGFLAG_ONCE |                     \
|       |                 ^~~~~~~~~~~~
| ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE'
|  1007 |         WARN_ON_ONCE(func != (rcu_callback_t)~0L);
|       |         ^~~~~~~~~~~~
| In file included from ./include/linux/rbtree.h:24,
|                  from ./include/linux/mm_types.h:11,
|                  from ./include/linux/buildid.h:5,
|                  from ./include/linux/module.h:14,
|                  from virt/lib/irqbypass.mod.c:1:
| ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here
|  1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
|       |                                               ~~~~~~~~~~~~~~~^

For reference, sparse also warns about it, c.f. [1].

This patch renames the variable from f to __flags (with two underscore
prefixes as suggested in the Linux kernel coding style [2]) in order
to prevent collisions.

[1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+1sw@mail.gmail.com/

[2] Linux kernel coding style, section 12) Macros, Enums and RTL,
paragraph 5) namespace collisions when defining local variables in
macros resembling functions
https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl

Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm")
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

drm/i915/dp: Fix DFP rgb->ycbcr conversion matrix

Our YCbCr output is always supposed to be limited range BT.709.
That's what we send with native HDMI. The conn_state->colorspace
stuff is entirely independent of that and is not supposed to alter
the generated output in any way. If we want a way to do that then
we need a new proprty for it.

Make it so that the RGB->YCbCr conversion when performed by the
DPF will match the BT.709 we would transmit with native HDMI.

Cc: Ankit Nautiyal <[email protected]>
Cc: Uma Shankar <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Duplicate native HDMI TMDS clock limit handling for DP HDMI DFPs

With native HDMI we allow the user to override the mode with
something that may not respect the downstream (sink,dual-mode adapter)
TMDS clock limits. Let's reuse the same logic for DP HDMI DFPs
so that behaviour is more or less uniform.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Add support for "4:2:0 also" modes for DP

Currently we only support "4:2:0 also" modes on native HDMI.
Extend that support for DP as well.

With all the HDMI DFP TMDS clock handling sorted out this
is now going to work for both native DP and DP->HDMI
converters. As with native HDMI we first check if RGB
output is possible, and if not we try YCbCr 4:2:0 instead.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Rework HDMI DFP TMDS clock handling

Rework the HDMI DFP TMDS clock checks to also check at 8bpc.
Previously we only checked the deep color cases. But I suppose
a sink could potentially declare "4:2:0 also" modes that only
actually fit within its own limits when using 4:2:0. Even if
that is too nuts to be real there is no real harm in running
through the full checks for everything.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Make intel_dp_output_format() usable for "4:2:0 also" modes

Hoist the drm_mode_is_420_only() from intel_dp_output_format()
into the caller. This will allow intel_dp_output_format() to be
reused for "4:2:0 also" modes.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Pass around intel_connector rather than drm_connector

Prefer to use intel_connector over drm_connector. Also clean
up the related variable names a bit.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Reorder intel_dp_compute_config() a bit

Consolidate the double pfit call, and reorder things so that
intel_dp_output_format() and intel_dp_compute_link_config() are
back-to-back. They are intimately related, and will need to be
called twice to properly handle the "4:2:0 also" modes.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: s/intel_dp_hdmi_ycbcr420/intel_dp_is_ycbcr420/

intel_dp_hdmi_ycbcr420() does account for native DP 4:2:0
output as well, so lets rename it a bit.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Extract intel_dp_has_audio()

Declutter intel_dp_compute_config() a bit by moving the
has_audio computation into a helper. HDMI already does the same thing.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Respect the sink's max TMDS clock when dealing with DP->HDMI DFPs

Currently we only look at the DFPs max TMDS clock limit when
considering whether the mode is valid, or whether we can do
deep color. The sink's max TMDS clock limit may be lower than
the DFPs, so we need to account for it as well.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4095
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2844
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

drm/i915/dp: Extract intel_dp_tmds_clock_valid()

We're currently duplicating the DFP min/max TMDS clock checks
in .mode_valid() and .compute_config(). Extract a helper suitable
for both use cases.

Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>

perf/core: Always set cpuctx cgrp when enable cgroup event

When enable a cgroup event, cpuctx->cgrp setting is conditional
on the current task cgrp matching the event's cgroup, so have to
do it for every new event. It brings complexity but no advantage.

To keep it simple, this patch would always set cpuctx->cgrp
when enable the first cgroup event, and reset to NULL when disable
the last cgroup event.

Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/core: Fix perf_cgroup_switch()

There is a race problem that can trigger WARN_ON_ONCE(cpuctx->cgrp)
in perf_cgroup_switch().

CPU1 CPU2
perf_cgroup_sched_out(prev, next)
  cgrp1 = perf_cgroup_from_task(prev)
  cgrp2 = perf_cgroup_from_task(next)
  if (cgrp1 != cgrp2)
    perf_cgroup_switch(prev, PERF_CGROUP_SWOUT)
cgroup_migrate_execute()
  task->cgroups = ?
  perf_cgroup_attach()
    task_function_call(task, __perf_cgroup_move)
perf_cgroup_sched_in(prev, next)
  cgrp1 = perf_cgroup_from_task(prev)
  cgrp2 = perf_cgroup_from_task(next)
  if (cgrp1 != cgrp2)
    perf_cgroup_switch(next, PERF_CGROUP_SWIN)
__perf_cgroup_move()
  perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN)

The commit a8d757ef076f ("perf events: Fix slow and broken cgroup
context switch code") want to skip perf_cgroup_switch() when the
perf_cgroup of "prev" and "next" are the same.

But task->cgroups can change in concurrent with context_switch()
in cgroup_migrate_execute(). If cgrp1 == cgrp2 in sched_out(),
cpuctx won't do sched_out. Then task->cgroups changed cause
cgrp1 != cgrp2 in sched_in(), cpuctx will do sched_in. So trigger
WARN_ON_ONCE(cpuctx->cgrp).

Even though __perf_cgroup_move() will be synchronized as the context
switch disables the interrupt, context_switch() still can see the
task->cgroups is changing in the middle, since task->cgroups changed
before sending IPI.

So we have to combine perf_cgroup_sched_in() into perf_cgroup_sched_out(),
unified into perf_cgroup_switch(), to fix the incosistency between
perf_cgroup_sched_out() and perf_cgroup_sched_in().

But we can't just compare prev->cgroups with next->cgroups to decide
whether to skip cpuctx sched_out/in since the prev->cgroups is changing
too. For example:

CPU1 CPU2
cgroup_migrate_execute()
  prev->cgroups = ?
  perf_cgroup_attach()
    task_function_call(task, __perf_cgroup_move)
perf_cgroup_switch(task)
  cgrp1 = perf_cgroup_from_task(prev)
  cgrp2 = perf_cgroup_from_task(next)
  if (cgrp1 != cgrp2)
    cpuctx sched_out/in ...
task_function_call() will return -ESRCH

In the above example, prev->cgroups changing cause (cgrp1 == cgrp2)
to be true, so skip cpuctx sched_out/in. And later task_function_call()
would return -ESRCH since the prev task isn't running on cpu anymore.
So we would leave perf_events of the old prev->cgroups still sched on
the CPU, which is wrong.

The solution is that we should use cpuctx->cgrp to compare with
the next task's perf_cgroup. Since cpuctx->cgrp can only be changed
on local CPU, and we have irq disabled, we can read cpuctx->cgrp to
compare without holding ctx lock.

Fixes: a8d757ef076f ("perf events: Fix slow and broken cgroup context switch code")
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/core: Use perf_cgroup_info->active to check if cgroup is active

Since we use perf_cgroup_set_timestamp() to start cgroup time and
set active to 1, then use update_cgrp_time_from_cpuctx() to stop
cgroup time and set active to 0.

We can use info->active directly to check if cgroup is active.

Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/core: Don't pass task around when ctx sched in

The current code pass task around for ctx_sched_in(), only
to get perf_cgroup of the task, then update the timestamp
of it and its ancestors and set them to active.

But we can use cpuctx->cgrp to get active perf_cgroup and
its ancestors since cpuctx->cgrp has been set before
ctx_sched_in().

This patch remove the task argument in ctx_sched_in()
and cleanup related code.

Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids

On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the
FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't
support it.

Update intel_spr_extra_regs[] to support it.

Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

perf/x86/intel: Don't extend the pseudo-encoding to GP counters

The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR.
perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0

Performance counter stats for 'CPU(s) 0':

           607,246      cpu/event=0xc0,umask=0x0/
                 0      cpu/event=0x0,umask=0x1/

The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which
doesn't work on the generic counters. However, current perf extends its
mask to the generic counters.

The pseudo event-code for a fixed counter must be 0x00. Check and avoid
extending the mask for the fixed counter event which using the
pseudo-encoding, e.g., ref-cycles and PREC_DIST event.

With the patch,
perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0

Performance counter stats for 'CPU(s) 0':

           583,184      cpu/event=0xc0,umask=0x0/
           583,048      cpu/event=0x0,umask=0x1/

Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

perf/core: Inherit event_caps

It was reported that some perf event setup can make fork failed on
ARM64.  It was the case of a group of mixed hw and sw events and it
failed in perf_event_init_task() due to armpmu_event_init().

The ARM PMU code checks if all the events in a group belong to the
same PMU except for software events.  But it didn't set the event_caps
of inherited events and no longer identify them as software events.
Therefore the test failed in a child process.

A simple reproducer is:

  $ perf stat -e '{cycles,cs,instructions}' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  perf: fork(): Invalid argument

The perf stat was fine but the perf bench failed in fork().  Let's
inherit the event caps from the parent.

Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

perf/x86/uncore: Add Raptor Lake uncore support

The uncore PMU of the Raptor Lake is the same as Alder Lake.
Add new PCIIDs of IMC for Raptor Lake.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/x86/msr: Add Raptor Lake CPU support

Raptor Lake is Intel's successor to Alder lake. PPERF and SMI_COUNT MSRs
are also supported.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/x86/cstate: Add Raptor Lake support

Raptor Lake is Intel's successor to Alder lake. From the perspective of
Intel cstate residency counters, there is nothing changed compared with
Alder lake.

Share adl_cstates with Alder lake.
Update the comments for Raptor Lake.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/x86: Add Intel Raptor Lake support

From PMU's perspective, Raptor Lake is the same as the Alder Lake. The
only difference is the event list, which will be supported in the perf
tool later.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Revert "mm/page_alloc: mark pagesets as __maybe_unused"

The local_lock() is now using a proper static inline function which is
enough for llvm to accept that the variable is used.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Revert "locking/local_lock: Make the empty local_lock_*() function a macro."

With volatile removed from arch_raw_cpu_ptr() the compiler no longer
creates the per-CPU reference. The usage of the macro can be reverted
now.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/percpu: Remove volatile from arch_raw_cpu_ptr().

The volatile attribute in the inline assembly of arch_raw_cpu_ptr()
forces the compiler to always generate the code, even if the compiler
can decide upfront that its result is not needed.

For instance invoking __intel_pmu_disable_all(false) (like
intel_pmu_snapshot_arch_branch_stack() does) leads to loading the
address of &cpu_hw_events into the register while compiler knows that it
has no need for it. This ends up with code like:

| movq $cpu_hw_events, %rax #, tcp_ptr__
| add %gs:this_cpu_off(%rip), %rax # this_cpu_off, tcp_ptr__
| xorl %eax, %eax # tmp93

It also creates additional code within local_lock() with !RT &&
!LOCKDEP which is not desired.

By removing the volatile attribute the compiler can place the
function freely and avoid it if it is not needed in the end.
By using the function twice the compiler properly caches only the
variable offset and always loads the CPU-offset.

this_cpu_ptr() also remains properly placed within a preempt_disable()
sections because
- arch_raw_cpu_ptr() assembly has a memory input ("m" (this_cpu_off))
- prempt_{dis,en}able() fundamentally has a 'barrier()' in it

Therefore this_cpu_ptr() is already properly serialized and does not
rely on the 'volatile' attribute.

Remove volatile from arch_raw_cpu_ptr().

[ bigeasy: Added Linus' explanation why this_cpu_ptr() is not moved out
of a preempt_disable() section without the 'volatile' attribute. ]

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

static_call: Remove __DEFINE_STATIC_CALL macro

Only DEFINE_STATIC_CALL use __DEFINE_STATIC_CALL macro now when
CONFIG_HAVE_STATIC_CALL is selected.

Only keep __DEFINE_STATIC_CALL() for the generic fallback, and
also use it to implement DEFINE_STATIC_CALL_NULL() in that case.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/329074f92d96e3220ebe15da7bbe2779beee31eb.1647253456.git.christophe.leroy@csgroup.eu

static_call: Properly initialise DEFINE_STATIC_CALL_RET0()

When a static call is updated with __static_call_return0() as target,
arch_static_call_transform() set it to use an optimised set of
instructions which are meant to lay in the same cacheline.

But when initialising a static call with DEFINE_STATIC_CALL_RET0(),
we get a branch to the real __static_call_return0() function instead
of getting the optimised setup:

c00d8120 <__SCT__perf_snapshot_branch_stack>:
c00d8120: 4b ff ff f4 b       c00d8114 <__static_call_return0>
c00d8124: 3d 80 c0 0e lis     r12,-16370
c00d8128: 81 8c 81 3c lwz     r12,-32452(r12)
c00d812c: 7d 89 03 a6 mtctr   r12
c00d8130: 4e 80 04 20 bctr
c00d8134: 38 60 00 00 li      r3,0
c00d8138: 4e 80 00 20 blr
c00d813c: 00 00 00 00 .long 0x0

Add ARCH_DEFINE_STATIC_CALL_RET0_TRAMP() defined by each architecture
to setup the optimised configuration, and rework
DEFINE_STATIC_CALL_RET0() to call it:

c00d8120 <__SCT__perf_snapshot_branch_stack>:
c00d8120: 48 00 00 14 b       c00d8134 <__SCT__perf_snapshot_branch_stack+0x14>
c00d8124: 3d 80 c0 0e lis     r12,-16370
c00d8128: 81 8c 81 3c lwz     r12,-32452(r12)
c00d812c: 7d 89 03 a6 mtctr   r12
c00d8130: 4e 80 04 20 bctr
c00d8134: 38 60 00 00 li      r3,0
c00d8138: 4e 80 00 20 blr
c00d813c: 00 00 00 00 .long 0x0

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lore.kernel.org/r/1e0a61a88f52a460f62a58ffc2a5f847d1f7d9d8.1647253456.git.christophe.leroy@csgroup.eu

static_call: Don't make __static_call_return0 static

System.map shows that vmlinux contains several instances of
__static_call_return0():

c0004fc0 t __static_call_return0
c0011518 t __static_call_return0
c00d8160 t __static_call_return0

arch_static_call_transform() uses the middle one to check whether we are
setting a call to __static_call_return0 or not:

c0011520 <arch_static_call_transform>:
c0011520:       3d 20 c0 01     lis     r9,-16383 <== r9 =  0xc001 << 16
c0011524:       39 29 15 18     addi    r9,r9,5400 <== r9 += 0x1518
c0011528:       7c 05 48 00     cmpw    r5,r9 <== r9 has value 0xc0011518 here

So if static_call_update() is called with one of the other instances of
__static_call_return0(), arch_static_call_transform() won't recognise it.

In order to work properly, global single instance of __static_call_return0() is required.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu

x86,static_call: Fix __static_call_return0 for i386

Paolo reported that the instruction sequence that is used to replace:

    call __static_call_return0

namely:

    66 66 48 31 c0 data16 data16 xor %rax,%rax

decodes to something else on i386, namely:

    66 66 48 data16 dec %ax
    31 c0 xor    %eax,%eax

Which is a nonsensical sequence that happens to have the same outcome.
*However* an important distinction is that it consists of 2
instructions which is a problem when the thing needs to be overwriten
with a regular call instruction again.

As such, replace the instruction with something that decodes the same
on both i386 and x86_64.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

entry: Fix compile error in dynamic_irqentry_exit_cond_resched()

kernel/entry/common.c: In function ‘dynamic_irqentry_exit_cond_resched’:
kernel/entry/common.c:409:14: error: implicit declaration of function ‘static_key_unlikely’; did you mean ‘static_key_enable’? [-Werror=implicit-function-declaration]
  409 |         if (!static_key_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
      |              ^~~~~~~~~~~~~~~~~~~
      |              static_key_enable

static_key_unlikely() should be static_branch_unlikely().

Fixes: 99cf983cc8bca ("sched/preempt: Add PREEMPT_DYNAMIC using static keys")
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

sched: Teach the forced-newidle balancer about CPU affinity limitation.

try_steal_cookie() looks at task_struct::cpus_mask to decide if the
task could be moved to `this' CPU. It ignores that the task might be in
a migration disabled section while not on the CPU. In this case the task
must not be moved otherwise per-CPU assumption are broken.

Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the a
task can be moved.

Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/core: Fix forceidle balancing

Steve reported that ChromeOS encounters the forceidle balancer being
ran from rt_mutex_setprio()'s balance_callback() invocation and
explodes.

Now, the forceidle balancer gets queued every time the idle task gets
selected, set_next_task(), which is strictly too often.
rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:

queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
running = task_current(rq, p); /* rq->curr == p */

if (queued)
dequeue_task(...);
if (running)
put_prev_task(...);

/* change task properties */

if (queued)
enqueue_task(...);
if (running)
set_next_task(...);

However, rt_mutex_setprio() will explicitly not run this pattern on
the idle task (since priority boosting the idle task is quite insane).
Most other 'change' pattern users are pidhash based and would also not
apply to idle.

Also, the change pattern doesn't contain a __balance_callback()
invocation and hence we could have an out-of-band balance-callback,
which *should* trigger the WARN in rq_pin_lock() (which guards against
this exact anti-pattern).

So while none of that explains how this happens, it does indicate that
having it in set_next_task() might not be the most robust option.

Instead, explicitly queue the forceidle balancer from pick_next_task()
when it does indeed result in forceidle selection. Having it here,
ensures it can only be triggered under the __schedule() rq->lock
instance, and hence must be ran from that context.

This also happens to clean up the code a little, so win-win.

Fixes: d2dfa17bc7de ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: T.J. Alumbaugh <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

dma-buf: finally make dma_resv_excl_fence private v2

Drivers should never touch this directly.

v2: fix rebase clash

Signed-off-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Daniel Vetter <[email protected]>

dt-bindings: display: bridge: Drop requirement on input port for DSI devices

MIPI-DSI devices, if they are controlled through the bus itself, have to
be described as a child node of the controller they are attached to.

Thus, there's no requirement on the controller having an OF-Graph output
port to model the data stream: it's assumed that it would go from the
parent to the child.

However, some bridges controlled through the DSI bus still require an
input OF-Graph port, thus requiring a controller with an OF-Graph output
port. This prevents those bridges from being used with the controllers
that do not have one without any particular reason to.

Let's drop that requirement.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

sctp: count singleton chunks in assoc user stats

Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN-
COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks"
counter available to the assoc owner.

These are all control chunks so they should be counted as such.

Add counting of singleton chunks so they are properly accounted for.

Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call")
Signed-off-by: Jamie Bainbridge <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com
Signed-off-by: Paolo Abeni <[email protected]>

drm/nouveau: stop using dma_resv_excl_fence

Instead use the new dma_resv_get_singleton function.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Ben Skeggs <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Lyude Paul <[email protected]>
Cc: [email protected]
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

cifs: update internal module number

To 2.36

Signed-off-by: Steve French <[email protected]>

cifs: force new session setup and tcon for dfs

Do not reuse existing sessions and tcons in DFS failover as it might
connect to different servers and shares.

Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Cc: [email protected]
Reviewed-by: Enzo Matsumiya <[email protected]>
Signed-off-by: Steve French <[email protected]>

io_uring: move read/write file prep state into actual opcode handler

In preparation for not necessarily having a file assigned at prep time,
defer any initialization associated with the file to when the opcode
handler is run.

Cc: [email protected] # v5.15+
Signed-off-by: Jens Axboe <[email protected]>

io_uring: defer splice/tee file validity check until command issue

In preparation for not using the file at prep time, defer checking if this
file refers to a valid io_uring instance until issue time.

This also means we can get rid of the cleanup flag for splice and tee.

Cc: [email protected] # v5.15+
Signed-off-by: Jens Axboe <[email protected]>

drm/nouveau/pmu: Add missing callbacks for Tegra devices

Fixes a crash booting on those platforms with nouveau.

Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence")
Cc: Ben Skeggs <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # v5.17+
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/nouveau/clk: Fix an incorrect NULL check on list iterator

The bug is here:
if (nvkm_cstate_valid(clk, cstate, max_volt, clk->temp))
return cstate;

The list iterator value 'cstate' will *always* be set and non-NULL
by list_for_each_entry_from_reverse(), so it is incorrect to assume
that the iterator value will be unchanged if the list is empty or no
element is found (In fact, it will be a bogus pointer to an invalid
structure object containing the HEAD). Also it missed a NULL check
at callsite and may lead to invalid memory access after that.

To fix this bug, just return 'encoder' when found, otherwise return
NULL. And add the NULL check.

Cc: [email protected]
Fixes: 1f7f3d91ad38a ("drm/nouveau/clk: Respect voltage limits in nvkm_cstate_prog")
Signed-off-by: Xiaomeng Tong <[email protected]>
Reviewed-by: Lyude Paul <[email protected]>
Signed-off-by: Lyude Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

selftests/harness: Pass variant to teardown

FIXTURE_VARIANT data is passed to FIXTURE_SETUP and TEST_F as "variant".

In some cases, the variant will change the setup, such that expectations
also change on teardown. Also pass variant to FIXTURE_TEARDOWN.

The new FIXTURE_TEARDOWN logic is identical to that in FIXTURE_SETUP,
right above.

Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Shuah Khan <[email protected]>

selftests/harness: Run TEARDOWN for ASSERT failures

The kselftest test harness has traditionally not run the registered
TEARDOWN handler when a test encountered an ASSERT. This creates
unexpected situations and tests need to be very careful about using
ASSERT, which seems a needless hurdle for test writers.

Because of the harness's design for optional failure handlers, the
original implementation of ASSERT used an abort() to immediately
stop execution, but that meant the context for running teardown was
lost. Instead, use setjmp/longjmp so that teardown can be done.

Failed SETUP routines continue to not be followed by TEARDOWN, though.

Cc: Andy Lutomirski <[email protected]>
Cc: Will Drewry <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>

selftests: fix an unused variable warning in pidfd selftest

I fixed a few warnings like this in commit e2aa5e650b07
("selftests: fixup build warnings in pidfd / clone3 tests"), but I
missed this one by mistake. Since this variable is unused, remove it.

Signed-off-by: Axel Rasmussen <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>