]> Git Repo - linux.git/log
linux.git
7 months agodrm/xe: Fix use after free when client stats are captured
Umesh Nerlige Ramappa [Thu, 18 Jul 2024 21:05:48 +0000 (14:05 -0700)]
drm/xe: Fix use after free when client stats are captured

xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.

While at it, revert an earlier change that contained a partial work
around for this issue.

v2:
- Take a ref to xef even for the VM bind queue (Matt)
- Squash patches relevant to that fix and work around (Lucas)

v3: Fix typo (Lucas)

Fixes: ce62827bc294 ("drm/xe: Do not access xe file when updating exec queue run_ticks")
Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit 2149ded63079449b8dddf9da38392632f155e6b5)
Signed-off-by: Rodrigo Vivi <[email protected]>
7 months agodrm/xe: Take a ref to xe file when user creates a VM
Umesh Nerlige Ramappa [Thu, 18 Jul 2024 21:05:47 +0000 (14:05 -0700)]
drm/xe: Take a ref to xe file when user creates a VM

Take a reference to xef when user creates the VM and put the reference
when user destroys the VM.

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit a2387e69493df3de706f14e4573ee123d23d5d34)
Signed-off-by: Rodrigo Vivi <[email protected]>
7 months agodrm/xe: Add ref counting for xe_file
Umesh Nerlige Ramappa [Thu, 18 Jul 2024 21:05:46 +0000 (14:05 -0700)]
drm/xe: Add ref counting for xe_file

Add ref counting for xe_file.

v2:
- Add kernel doc for exported functions (Matt)
- Instead of xe_file_destroy, export the get/put helpers (Lucas)

v3: Fixup the kernel-doc format and description (Matt, Lucas)

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit ce8c161cbad43f4056451e541f7ae3471d0cca12)
Signed-off-by: Rodrigo Vivi <[email protected]>
7 months agodrm/xe: Move part of xe_file cleanup to a helper
Umesh Nerlige Ramappa [Thu, 18 Jul 2024 21:05:45 +0000 (14:05 -0700)]
drm/xe: Move part of xe_file cleanup to a helper

In order to make xe_file ref counted, move destruction of xe_file
members to a helper.

v2: Move xe_vm_close_and_put back into xe_file_close (Matt)

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
(cherry picked from commit 3d0c4a62cc553c6ffde4cb11620eba991e770665)
Signed-off-by: Rodrigo Vivi <[email protected]>
7 months agodrm/xe: Validate user fence during creation
Matthew Brost [Wed, 17 Jul 2024 14:04:28 +0000 (07:04 -0700)]
drm/xe: Validate user fence during creation

Fail invalid addresses during user fence creation.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 0fde907da2d5fd4da68845e96c6842497159c858)
Signed-off-by: Rodrigo Vivi <[email protected]>
7 months agoMerge tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 15 Aug 2024 11:25:06 +0000 (13:25 +0200)]
Merge tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Ignores ifindex for types other than mcast/linklocal in ipv6 frag
   reasm, from Tom Hughes.

2) Initialize extack for begin/end netlink message marker in batch,
   from Donald Hunter.

3) Initialize extack for flowtable offload support, also from Donald.

4) Dropped packets with cloned unconfirmed conntracks in nfqueue,
   later it should be possible to explore lookup after reinject but
   Florian prefers this approach at this stage. From Florian Westphal.

5) Add selftest for cloned unconfirmed conntracks in nfqueue for
   previous update.

6) Audit after filling netlink header successfully in object dump,
   from Phil Sutter.

7-8) Fix concurrent dump and reset which could result in underflow
     counter / quota objects.

netfilter pull request 24-08-15

* tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
  netfilter: nf_tables: Introduce nf_tables_getobj_single
  netfilter: nf_tables: Audit log dump reset after the fact
  selftests: netfilter: add test for br_netfilter+conntrack+queue combination
  netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
  netfilter: flowtable: initialise extack before use
  netfilter: nfnetlink: Initialise extack before use in ACKs
  netfilter: allow ipv6 fragments to arrive on different devices
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
7 months agoMerge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Paolo Abeni [Thu, 15 Aug 2024 11:07:10 +0000 (13:07 +0200)]
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
7 months agonet: hns3: use correct release function during uninitialization
Peiyang Wang [Tue, 13 Aug 2024 14:10:24 +0000 (22:10 +0800)]
net: hns3: use correct release function during uninitialization

pci_request_regions is called to apply for PCI I/O and memory resources
when the driver is initialized, Therefore, when the driver is uninstalled,
pci_release_regions should be used to release PCI I/O and memory resources
instead of pci_release_mem_regions is used to release memory reasouces
only.

Signed-off-by: Peiyang Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
7 months agonet: hns3: void array out of bound when loop tnl_num
Peiyang Wang [Tue, 13 Aug 2024 14:10:23 +0000 (22:10 +0800)]
net: hns3: void array out of bound when loop tnl_num

When query reg inf of SSU, it loops tnl_num times. However, tnl_num comes
from hardware and the length of array is a fixed value. To void array out
of bound, make sure the loop time is not greater than the length of array

Signed-off-by: Peiyang Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
7 months agonet: hns3: fix a deadlock problem when config TC during resetting
Jie Wang [Tue, 13 Aug 2024 14:10:22 +0000 (22:10 +0800)]
net: hns3: fix a deadlock problem when config TC during resetting

When config TC during the reset process, may cause a deadlock, the flow is
as below:
                             pf reset start
                                 │
                                 ▼
                              ......
setup tc                         │
    │                            ▼
    ▼                      DOWN: napi_disable()
napi_disable()(skip)             │
    │                            │
    ▼                            ▼
  ......                      ......
    │                            │
    ▼                            │
napi_enable()                    │
                                 ▼
                           UINIT: netif_napi_del()
                                 │
                                 ▼
                              ......
                                 │
                                 ▼
                           INIT: netif_napi_add()
                                 │
                                 ▼
                              ......                 global reset start
                                 │                      │
                                 ▼                      ▼
                           UP: napi_enable()(skip)    ......
                                 │                      │
                                 ▼                      ▼
                              ......                 napi_disable()

In reset process, the driver will DOWN the port and then UINIT, in this
case, the setup tc process will UP the port before UINIT, so cause the
problem. Adds a DOWN process in UINIT to fix it.

Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
7 months agonet: hns3: use the user's cfg after reset
Peiyang Wang [Tue, 13 Aug 2024 14:10:21 +0000 (22:10 +0800)]
net: hns3: use the user's cfg after reset

Consider the followed case that the user change speed and reset the net
interface. Before the hw change speed successfully, the driver get old
old speed from hw by timer task. After reset, the previous speed is config
to hw. As a result, the new speed is configed successfully but lost after
PF reset. The followed pictured shows more dirrectly.

+------+              +----+                 +----+
| USER |              | PF |                 | HW |
+---+--+              +-+--+                 +-+--+
    |  ethtool -s 100G  |                      |
    +------------------>|   set speed 100G     |
    |                   +--------------------->|
    |                   |  set successfully    |
    |                   |<---------------------+---+
    |                   |query cfg (timer task)|   |
    |                   +--------------------->|   | handle speed
    |                   |     return 200G      |   | changing event
    |  ethtool --reset  |<---------------------+   | (100G)
    +------------------>|  cfg previous speed  |<--+
    |                   |  after reset (200G)  |
    |                   +--------------------->|
    |                   |                      +---+
    |                   |query cfg (timer task)|   |
    |                   +--------------------->|   | handle speed
    |                   |     return 100G      |   | changing event
    |                   |<---------------------+   | (200G)
    |                   |                      |<--+
    |                   |query cfg (timer task)|
    |                   +--------------------->|
    |                   |     return 200G      |
    |                   |<---------------------+
    |                   |                      |
    v                   v                      v

This patch save new speed if hw change speed successfully, which will be
used after reset successfully.

Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary")
Signed-off-by: Peiyang Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
7 months agonet: hns3: fix wrong use of semaphore up
Jie Wang [Tue, 13 Aug 2024 14:10:20 +0000 (22:10 +0800)]
net: hns3: fix wrong use of semaphore up

Currently, if hns3 PF or VF FLR reset failed after five times retry,
the reset done process will directly release the semaphore
which has already released in hclge_reset_prepare_general.
This will cause down operation fail.

So this patch fixes it by adding reset state judgement. The up operation is
only called after successful PF FLR reset.

Fixes: 8627bdedc435 ("net: hns3: refactor the precedure of PF FLR")
Fixes: f28368bb4542 ("net: hns3: refactor the procedure of VF FLR")
Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
7 months agoselftests: net: lib: kill PIDs before del netns
Matthieu Baerts (NGI0) [Tue, 13 Aug 2024 13:39:34 +0000 (15:39 +0200)]
selftests: net: lib: kill PIDs before del netns

When deleting netns, it is possible to still have some tasks running,
e.g. background tasks like tcpdump running in the background, not
stopped because the test has been interrupted.

Before deleting the netns, it is then safer to kill all attached PIDs,
if any. That should reduce some noises after the end of some tests, and
help with the debugging of some issues. That's why this modification is
seen as a "fix".

Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
Acked-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Reviewed-by: Hangbin Liu <[email protected]>
Link: https://patch.msgid.link/20240813-upstream-net-20240813-selftests-net-lib-kill-v1-1-27b689b248b8@kernel.org
Signed-off-by: Paolo Abeni <[email protected]>
7 months agopse-core: Conditionally set current limit during PI regulator registration
Oleksij Rempel [Tue, 13 Aug 2024 07:37:19 +0000 (09:37 +0200)]
pse-core: Conditionally set current limit during PI regulator registration

Fix an issue where `devm_regulator_register()` would fail for PSE
controllers that do not support current limit control, such as simple
GPIO-based controllers like the podl-pse-regulator. The
`REGULATOR_CHANGE_CURRENT` flag and `max_uA` constraint are now
conditionally set only if the `pi_set_current_limit` operation is
supported. This change prevents the regulator registration routine from
attempting to call `pse_pi_set_current_limit()`, which would return
`-EOPNOTSUPP` and cause the registration to fail.

Fixes: 4a83abcef5f4f ("net: pse-pd: Add new power limit get and set c33 features")
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Kory Maincent <[email protected]>
Tested-by: Kyle Swenson <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
7 months agodrm/rockchip: inno-hdmi: Fix infoframe upload
Alex Bee [Mon, 5 Aug 2024 11:08:56 +0000 (13:08 +0200)]
drm/rockchip: inno-hdmi: Fix infoframe upload

HDMI analyser shows that the AVI infoframe is no being longer send.

The switch to the HDMI connector api should have used the frame content
which is now given in the buffer parameter, but instead still uses the
(now) empty and superfluous packed_frame variable.

Fix it.

Fixes: 65548c8ff0ab ("drm/rockchip: inno_hdmi: Switch to HDMI connector")
Signed-off-by: Alex Bee <[email protected]>
Acked-by: Maxime Ripard <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months agonet: thunder_bgx: Fix netdev structure allocation
Marc Zyngier [Mon, 12 Aug 2024 14:13:22 +0000 (15:13 +0100)]
net: thunder_bgx: Fix netdev structure allocation

Commit 94833addfaba ("net: thunderx: Unembed netdev structure") had
a go at dynamically allocating the netdev structures for the thunderx_bgx
driver.  This change results in my ThunderX box catching fire (to be fair,
it is what it does best).

The issues with this change are that:

- bgx_lmac_enable() is called *after* bgx_acpi_register_phy() and
  bgx_init_of_phy(), both expecting netdev to be a valid pointer.

- bgx_init_of_phy() populates the MAC addresses for *all* LMACs
  attached to a given BGX instance, and thus needs netdev for each of
  them to have been allocated.

There is a few things to be said about how the driver mixes LMAC and
BGX states which leads to this sorry state, but that's beside the point.

To address this, go back to a situation where all netdev structures
are allocated before the driver starts relying on them, and move the
freeing of these structures to driver removal. Someone brave enough
can always go and restructure the driver if they want.

Fixes: 94833addfaba ("net: thunderx: Unembed netdev structure")
Signed-off-by: Marc Zyngier <[email protected]>
Cc: Breno Leitao <[email protected]>
Cc: Sunil Goutham <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
7 months agonet: ethtool: Allow write mechanism of LPL and both LPL and EPL
Danielle Ratson [Mon, 12 Aug 2024 14:08:24 +0000 (17:08 +0300)]
net: ethtool: Allow write mechanism of LPL and both LPL and EPL

CMIS 5.2 standard section 9.4.2 defines four types of firmware update
supported mechanism: None, only LPL, only EPL, both LPL and EPL.

Currently, only LPL (Local Payload) type of write firmware block is
supported. However, if the module supports both LPL and EPL the flashing
process wrongly fails for no supporting LPL.

Fix that, by allowing the write mechanism to be LPL or both LPL and
EPL.

Fixes: c4f78134d45c ("ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB")
Reported-by: Vladyslav Mykhaliuk <[email protected]>
Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
7 months agovsock: fix recursive ->recvmsg calls
Cong Wang [Mon, 12 Aug 2024 02:21:53 +0000 (19:21 -0700)]
vsock: fix recursive ->recvmsg calls

After a vsock socket has been added to a BPF sockmap, its prot->recvmsg
has been replaced with vsock_bpf_recvmsg(). Thus the following
recursiion could happen:

vsock_bpf_recvmsg()
 -> __vsock_recvmsg()
  -> vsock_connectible_recvmsg()
   -> prot->recvmsg()
    -> vsock_bpf_recvmsg() again

We need to fix it by calling the original ->recvmsg() without any BPF
sockmap logic in __vsock_recvmsg().

Fixes: 634f1a7110b4 ("vsock: support sockmap")
Reported-by: [email protected]
Tested-by: [email protected]
Cc: Bobby Eshleman <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Stefano Garzarella <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
7 months agoarm64: Fix KASAN random tag seed initialization
Samuel Holland [Wed, 14 Aug 2024 09:09:53 +0000 (02:09 -0700)]
arm64: Fix KASAN random tag seed initialization

Currently, kasan_init_sw_tags() is called before setup_per_cpu_areas(),
so per_cpu(prng_state, cpu) accesses the same address regardless of the
value of "cpu", and the same seed value gets copied to the percpu area
for every CPU. Fix this by moving the call to smp_prepare_boot_cpu(),
which is the first architecture hook after setup_per_cpu_areas().

Fixes: 3c9e3aa11094 ("kasan: add tag related helper functions")
Fixes: 3f41b6093823 ("kasan: fix random seed generation for tag-based mode")
Signed-off-by: Samuel Holland <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
7 months agoRevert "serial: 8250_omap: Set the console genpd always on if no console suspend"
Griffin Kroah-Hartman [Wed, 14 Aug 2024 11:17:47 +0000 (13:17 +0200)]
Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"

This reverts commit 68e6939ea9ec3d6579eadeab16060339cdeaf940.

Kevin reported that this causes a crash during suspend on platforms that
dont use PM domains.

Link: https://lore.kernel.org/r/[email protected]
Cc: Thomas Richard <[email protected]>
Fixes: 68e6939ea9ec ("serial: 8250_omap: Set the console genpd always on if no console suspend")
Cc: stable <[email protected]>
Reported-by: Kevin Hilman <[email protected]>
Signed-off-by: Griffin Kroah-Hartman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
7 months agoMerge tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 15 Aug 2024 03:40:43 +0000 (20:40 -0700)]
Merge tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.11

We have few fixes to drivers. The most important here is a fix for
iwlwifi which caused major slowdowns for several users.

* tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: correctly lookup DMA address in SG table
  wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change
  wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion
  wifi: rtlwifi: rtl8192du: Initialise value32 in _rtl92du_init_queue_reserved_page
  wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
7 months agoselftest: af_unix: Fix kselftest compilation warnings
Abhinav Jain [Wed, 14 Aug 2024 08:07:43 +0000 (13:37 +0530)]
selftest: af_unix: Fix kselftest compilation warnings

Change expected_buf from (const void *) to (const char *)
in function __recvpair().
This change fixes the below warnings during test compilation:

```
In file included from msg_oob.c:14:
msg_oob.c: In function ‘__recvpair’:

../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]

../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’

../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]

../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’
```

Fixes: d098d77232c3 ("selftest: af_unix: Add msg_oob.c.")
Signed-off-by: Abhinav Jain <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
7 months agoMerge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 15 Aug 2024 00:56:15 +0000 (17:56 -0700)]
Merge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - extend tree-checker verification of directory item type

 - fix regression in page/folio and extent state tracking in xarray, the
   dirty status can get out of sync and can cause problems e.g. a hang

 - in send, detect last extent and allow to clone it instead of sending
   it as write, reduces amount of data transferred in the stream

 - fix checking extent references when cleaning deleted subvolumes

 - fix one more case in the extent map shrinker, let it run only in the
   kswapd context so it does not cause latency spikes during other
   operations

* tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix invalid mapping of extent xarray state
  btrfs: send: allow cloning non-aligned extent if it ends at i_size
  btrfs: only run the extent map shrinker from kswapd tasks
  btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
  btrfs: check delayed refs when we're checking if a ref exists

7 months agoi2c: tegra: Do not mark ACPI devices as irq safe
Breno Leitao [Tue, 13 Aug 2024 16:12:53 +0000 (09:12 -0700)]
i2c: tegra: Do not mark ACPI devices as irq safe

On ACPI machines, the tegra i2c module encounters an issue due to a
mutex being called inside a spinlock. This leads to the following bug:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
...

Call trace:
__might_sleep
__mutex_lock_common
mutex_lock_nested
acpi_subsys_runtime_resume
rpm_resume
tegra_i2c_xfer

The problem arises because during __pm_runtime_resume(), the spinlock
&dev->power.lock is acquired before rpm_resume() is called. Later,
rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
mutexes, triggering the error.

To address this issue, devices on ACPI are now marked as not IRQ-safe,
considering the dependency of acpi_subsys_runtime_resume() on mutexes.

Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support")
Cc: <[email protected]> # v5.17+
Co-developed-by: Michael van der Westhuizen <[email protected]>
Signed-off-by: Michael van der Westhuizen <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Dmitry Osipenko <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
7 months agonetfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
Phil Sutter [Fri, 9 Aug 2024 13:07:32 +0000 (15:07 +0200)]
netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests

Objects' dump callbacks are not concurrency-safe per-se with reset bit
set. If two CPUs perform a reset at the same time, at least counter and
quota objects suffer from value underrun.

Prevent this by introducing dedicated locking callbacks for nfnetlink
and the asynchronous dump handling to serialize access.

Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
Signed-off-by: Phil Sutter <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agonetfilter: nf_tables: Introduce nf_tables_getobj_single
Phil Sutter [Fri, 9 Aug 2024 13:07:31 +0000 (15:07 +0200)]
netfilter: nf_tables: Introduce nf_tables_getobj_single

Outsource the reply skb preparation for non-dump getrule requests into a
distinct function. Prep work for object reset locking.

Signed-off-by: Phil Sutter <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agonetfilter: nf_tables: Audit log dump reset after the fact
Phil Sutter [Fri, 9 Aug 2024 13:07:30 +0000 (15:07 +0200)]
netfilter: nf_tables: Audit log dump reset after the fact

In theory, dumpreset may fail and invalidate the preceeding log message.
Fix this and use the occasion to prepare for object reset locking, which
benefits from a few unrelated changes:

* Add an early call to nfnetlink_unicast if not resetting which
  effectively skips the audit logging but also unindents it.
* Extract the table's name from the netlink attribute (which is verified
  via earlier table lookup) to not rely upon validity of the looked up
  table pointer.
* Do not use local variable family, it will vanish.

Fixes: 8e6cf365e1d5 ("audit: log nftables configuration change events")
Signed-off-by: Phil Sutter <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agoselftests: netfilter: add test for br_netfilter+conntrack+queue combination
Florian Westphal [Thu, 8 Aug 2024 21:14:43 +0000 (23:14 +0200)]
selftests: netfilter: add test for br_netfilter+conntrack+queue combination

Trigger cloned skbs leaving softirq protection.
This triggers splat without the preceeding change
("netfilter: nf_queue: drop packets with cloned unconfirmed
 conntracks"):

WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm..

because local delivery and forwarding will race for confirmation.

Based on a reproducer script from Yi Chen.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agonetfilter: nf_queue: drop packets with cloned unconfirmed conntracks
Florian Westphal [Wed, 7 Aug 2024 19:28:41 +0000 (21:28 +0200)]
netfilter: nf_queue: drop packets with cloned unconfirmed conntracks

Conntrack assumes an unconfirmed entry (not yet committed to global hash
table) has a refcount of 1 and is not visible to other cores.

With multicast forwarding this assumption breaks down because such
skbs get cloned after being picked up, i.e.  ct->use refcount is > 1.

Likewise, bridge netfilter will clone broad/mutlicast frames and
all frames in case they need to be flood-forwarded during learning
phase.

For ip multicast forwarding or plain bridge flood-forward this will
"work" because packets don't leave softirq and are implicitly
serialized.

With nfqueue this no longer holds true, the packets get queued
and can be reinjected in arbitrary ways.

Disable this feature, I see no other solution.

After this patch, nfqueue cannot queue packets except the last
multicast/broadcast packet.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agonetfilter: flowtable: initialise extack before use
Donald Hunter [Tue, 6 Aug 2024 16:16:37 +0000 (17:16 +0100)]
netfilter: flowtable: initialise extack before use

Fix missing initialisation of extack in flow offload.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Donald Hunter <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agonetfilter: nfnetlink: Initialise extack before use in ACKs
Donald Hunter [Tue, 6 Aug 2024 15:43:24 +0000 (16:43 +0100)]
netfilter: nfnetlink: Initialise extack before use in ACKs

Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END.

Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages")
Signed-off-by: Donald Hunter <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 14 Aug 2024 20:46:24 +0000 (13:46 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "s390:

   - Fix failure to start guests with kvm.use_gisa=0

   - Panic if (un)share fails to maintain security.

  ARM:

   - Use kvfree() for the kvmalloc'd nested MMUs array

   - Set of fixes to address warnings in W=1 builds

   - Make KVM depend on assembler support for ARMv8.4

   - Fix for vgic-debug interface for VMs without LPIs

   - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest

   - Minor code / comment cleanups for configuring PAuth traps

   - Take kvm->arch.config_lock to prevent destruction / initialization
     race for a vCPU's CPUIF which may lead to a UAF

  x86:

   - Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)

   - Fix smatch issues

   - Small cleanups

   - Make x2APIC ID 100% readonly

   - Fix typo in uapi constant

  Generic:

   - Use synchronize_srcu_expedited() on irqfd shutdown"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
  KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
  KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
  KVM: selftests: Add a testcase to verify x2APIC is fully readonly
  KVM: x86: Make x2APIC ID 100% readonly
  KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
  KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
  KVM: SVM: Fix an error code in sev_gmem_post_populate()
  KVM: SVM: Fix uninitialized variable bug
  KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
  KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
  KVM: arm64: Tidying up PAuth code in KVM
  KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
  KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
  s390/uv: Panic for set and remove shared access UVC errors
  KVM: s390: fix validity interception issue when gisa is switched off
  docs: KVM: Fix register ID of SPSR_FIQ
  KVM: arm64: vgic: fix unexpected unlock sparse warnings
  KVM: arm64: fix kdoc warnings in W=1 builds
  KVM: arm64: fix override-init warnings in W=1 builds
  ...

7 months agoRISC-V: hwprobe: Add SCALAR to misaligned perf defines
Evan Green [Fri, 9 Aug 2024 21:44:44 +0000 (14:44 -0700)]
RISC-V: hwprobe: Add SCALAR to misaligned perf defines

In preparation for misaligned vector performance hwprobe keys, rename
the hwprobe key values associated with misaligned scalar accesses to
include the term SCALAR. Leave the old defines in place to maintain
source compatibility.

This change is intended to be a functional no-op.

Signed-off-by: Evan Green <[email protected]>
Reviewed-by: Charlie Jenkins <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
7 months agoRISC-V: hwprobe: Add MISALIGNED_PERF key
Evan Green [Fri, 9 Aug 2024 21:44:43 +0000 (14:44 -0700)]
RISC-V: hwprobe: Add MISALIGNED_PERF key

RISCV_HWPROBE_KEY_CPUPERF_0 was mistakenly flagged as a bitmask in
hwprobe_key_is_bitmask(), when in reality it was an enum value. This
causes problems when used in conjunction with RISCV_HWPROBE_WHICH_CPUS,
since SLOW, FAST, and EMULATED have values whose bits overlap with
each other. If the caller asked for the set of CPUs that was SLOW or
EMULATED, the returned set would also include CPUs that were FAST.

Introduce a new hwprobe key, RISCV_HWPROBE_KEY_MISALIGNED_PERF, which
returns the same values in response to a direct query (with no flags),
but is properly handled as an enumerated value. As a result, SLOW,
FAST, and EMULATED are all correctly treated as distinct values under
the new key when queried with the WHICH_CPUS flag.

Leave the old key in place to avoid disturbing applications which may
have already come to rely on the key, with or without its broken
behavior with respect to the WHICH_CPUS flag.

Fixes: e178bf146e4b ("RISC-V: hwprobe: Introduce which-cpus flag")
Signed-off-by: Evan Green <[email protected]>
Reviewed-by: Charlie Jenkins <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
7 months agoRISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
Haibo Xu [Mon, 5 Aug 2024 03:30:23 +0000 (11:30 +0800)]
RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE

Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE.
To ensure all the values were properly initialized, switch to initialize
all of them to NUMA_NO_NODE.

Fixes: eabd9db64ea8 ("ACPI: RISCV: Add NUMA support based on SRAT and SLIT")
Reported-by: Andrew Jones <[email protected]>
Suggested-by: Andrew Jones <[email protected]>
Signed-off-by: Haibo Xu <[email protected]>
Reviewed-by: Sunil V L <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Hanjun Guo <[email protected]>
Link: https://lore.kernel.org/r/0d362a8ae50558b95685da4c821b2ae9e8cf78be.1722828421.git.haibo1.xu@intel.com
Signed-off-by: Palmer Dabbelt <[email protected]>
7 months agoriscv: change XIP's kernel_map.size to be size of the entire kernel
Nam Cao [Wed, 8 May 2024 19:19:17 +0000 (21:19 +0200)]
riscv: change XIP's kernel_map.size to be size of the entire kernel

With XIP kernel, kernel_map.size is set to be only the size of data part of
the kernel. This is inconsistent with "normal" kernel, who sets it to be
the size of the entire kernel.

More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is
enabled, because there are checks on virtual addresses with the assumption
that kernel_map.size is the size of the entire kernel (these checks are in
arch/riscv/mm/physaddr.c).

Change XIP's kernel_map.size to be the size of the entire kernel.

Signed-off-by: Nam Cao <[email protected]>
Cc: <[email protected]> # v6.1+
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
7 months agoriscv: entry: always initialize regs->a0 to -ENOSYS
Celeste Liu [Thu, 27 Jun 2024 14:23:39 +0000 (22:23 +0800)]
riscv: entry: always initialize regs->a0 to -ENOSYS

Otherwise when the tracer changes syscall number to -1, the kernel fails
to initialize a0 with -ENOSYS and subsequently fails to return the error
code of the failed syscall to userspace. For example, it will break
strace syscall tampering.

Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")
Reported-by: "Dmitry V. Levin" <[email protected]>
Reviewed-by: Björn Töpel <[email protected]>
Cc: [email protected]
Signed-off-by: Celeste Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
7 months agonetfilter: allow ipv6 fragments to arrive on different devices
Tom Hughes [Tue, 6 Aug 2024 11:40:52 +0000 (12:40 +0100)]
netfilter: allow ipv6 fragments to arrive on different devices

Commit 264640fc2c5f4 ("ipv6: distinguish frag queues by device
for multicast and link-local packets") modified the ipv6 fragment
reassembly logic to distinguish frag queues by device for multicast
and link-local packets but in fact only the main reassembly code
limits the use of the device to those address types and the netfilter
reassembly code uses the device for all packets.

This means that if fragments of a packet arrive on different interfaces
then netfilter will fail to reassemble them and the fragments will be
expired without going any further through the filters.

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Tom Hughes <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
7 months agoi2c: Use IS_REACHABLE() for substituting empty ACPI functions
Richard Fitzgerald [Wed, 14 Aug 2024 12:16:49 +0000 (13:16 +0100)]
i2c: Use IS_REACHABLE() for substituting empty ACPI functions

Replace IS_ENABLED() with IS_REACHABLE() to substitute empty stubs for:
    i2c_acpi_get_i2c_resource()
    i2c_acpi_client_count()
    i2c_acpi_find_bus_speed()
    i2c_acpi_new_device_by_fwnode()
    i2c_adapter *i2c_acpi_find_adapter_by_handle()
    i2c_acpi_waive_d0_probe()

commit f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI
functions") partially fixed this conditional to depend on CONFIG_I2C,
but used IS_ENABLED(), which is wrong since CONFIG_I2C is tristate.

CONFIG_ACPI is boolean but let's also change it to use IS_REACHABLE()
to future-proof it against becoming tristate.

Somehow despite testing various combinations of CONFIG_I2C and CONFIG_ACPI
we missed the combination CONFIG_I2C=m, CONFIG_ACPI=y.

Signed-off-by: Richard Fitzgerald <[email protected]>
Fixes: f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reviewed-by: Takashi Iwai <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
7 months agoKVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
Amit Shah [Wed, 14 Aug 2024 08:31:13 +0000 (10:31 +0200)]
KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG

"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of
the UAPI, keep the current definition and add a new one with the fix.

Fix-suggested-by: Marc Zyngier <[email protected]>
Signed-off-by: Amit Shah <[email protected]>
Message-ID: <20240814083113[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
7 months agoarm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
Haibo Xu [Mon, 5 Aug 2024 03:30:24 +0000 (11:30 +0800)]
arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE

Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE.
To ensure all the values were properly initialized, switch to initialize
all of them to NUMA_NO_NODE.

Fixes: e18962491696 ("arm64: numa: rework ACPI NUMA initialization")
Cc: <[email protected]> # 4.19.x
Reported-by: Andrew Jones <[email protected]>
Suggested-by: Andrew Jones <[email protected]>
Signed-off-by: Haibo Xu <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Sunil V L <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Hanjun Guo <[email protected]>
Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.1722828421.git.haibo1.xu@intel.com
Signed-off-by: Catalin Marinas <[email protected]>
7 months agoarm64: uaccess: correct thinko in __get_mem_asm()
Mark Rutland [Wed, 7 Aug 2024 10:37:31 +0000 (11:37 +0100)]
arm64: uaccess: correct thinko in __get_mem_asm()

In the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y version of __get_mem_asm(), we
incorrectly use _ASM_EXTABLE_##type##ACCESS_ERR() such that upon a fault
the extable fixup handler writes -EFAULT into "%w0", which is the
register containing 'x' (the result of the load).

This was a thinko in commit:

  86a6a68febfcf57b ("arm64: start using 'asm goto' for get_user() when available")

Prior to that commit _ASM_EXTABLE_##type##ACCESS_ERR_ZERO() was used
such that the extable fixup handler wrote -EFAULT into "%w0" (the
register containing 'err'), and zero into "%w1" (the register containing
'x'). When the 'err' variable was removed, the extable entry was updated
incorrectly.

Writing -EFAULT to the value register is unnecessary but benign:

* We never want -EFAULT in the value register, and previously this would
  have been zeroed in the extable fixup handler.

* In __get_user_error() the value is overwritten with zero explicitly in
  the error path.

* The asm goto outputs cannot be used when the goto label is taken, as
  older compilers (e.g. clang < 16.0.0) do not guarantee that asm goto
  outputs are usable in this path and may use a stale value rather than
  the value in an output register. Consequently, zeroing in the extable
  fixup handler is insufficient to ensure callers see zero in the error
  path.

* The expected usage of unsafe_get_user() and get_kernel_nofault()
  requires that the value is not consumed in the error path.

Some versions of GCC would mis-compile asm goto with outputs, and
erroneously omit subsequent assignments, breaking the error path
handling in __get_user_error(). This was discussed at:

  https://lore.kernel.org/lkml/[email protected]/

... and was fixed by removing support for asm goto with outputs on those
broken compilers in commit:

  f2f6a8e887172503 ("init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND")

With that out of the way, we can safely replace the usage of
_ASM_EXTABLE_##type##ACCESS_ERR() with _ASM_EXTABLE_##type##ACCESS(),
leaving the value register unchanged in the case a fault is taken, as
was originally intended. This matches other architectures and matches
our __put_mem_asm().

Signed-off-by: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
7 months agoKVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
Sean Christopherson [Fri, 9 Aug 2024 19:02:58 +0000 (12:02 -0700)]
KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)

Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
directly emulate instructions for ES/SNP, and instead the guest must
explicitly request emulation.  Unless the guest explicitly requests
emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
SPTE, with the subsequent #NPF being reflected into the guest as a #VC.

But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
because except for ES/SNP, doing so requires setting reserved bits in the
SPTE, i.e. the SPTE can't be readable while also generating a #VC on
writes.  Because KVM never creates MMIO SPTEs and jumps directly to
emulation, the guest never gets a #VC.  And since KVM simply resumes the
guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
into an infinite #NPF loop if the vCPU attempts to write read-only memory.

Disallow read-only memory for all VMs with protected state, i.e. for
upcoming TDX VMs as well as ES/SNP VMs.  For TDX, it's actually possible
to support read-only memory, as TDX uses EPT Violation #VE to reflect the
fault into the guest, e.g. KVM could configure read-only SPTEs with RX
protections and SUPPRESS_VE=0.  But there is no strong use case for
supporting read-only memslots on TDX, e.g. the main historical usage is
to emulate option ROMs, but TDX disallows executing from shared memory.
And if someone comes along with a legitimate, strong use case, the
restriction can always be lifted for TDX.

Don't bother trying to retroactively apply the restriction to SEV-ES
VMs that are created as type KVM_X86_DEFAULT_VM.  Read-only memslots can't
possibly work for SEV-ES, i.e. disallowing such memslots is really just
means reporting an error to userspace instead of silently hanging vCPUs.
Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
isn't worth the marginal benefit it would provide userspace.

Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES")
Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support")
Cc: Peter Gonda <[email protected]>
Cc: Michael Roth <[email protected]>
Cc: Vishal Annapurve <[email protected]>
Cc: Ackerly Tng <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <20240809190319.1710470[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
7 months agoMerge tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 14 Aug 2024 16:23:20 +0000 (09:23 -0700)]
Merge tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fixes from Paul Moore:

 - Fix a xperms counting problem where we adding to the xperms count
   even if we failed to add the xperm.

 - Propogate errors from avc_add_xperms_decision() back to the caller so
   that we can trigger the proper cleanup and error handling.

 - Revert our use of vma_is_initial_heap() in favor of our older logic
   as vma_is_initial_heap() doesn't correctly handle the no-heap case
   and it is causing issues with the SELinux process/execheap access
   control. While the older SELinux logic may not be perfect, it
   restores the expected user visible behavior.

   Hopefully we will be able to resolve the problem with the
   vma_is_initial_heap() macro with the mm folks, but we need to fix
   this in the meantime.

* tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: revert our use of vma_is_initial_heap()
  selinux: add the processing of the failure of avc_add_xperms_decision()
  selinux: fix potential counting error in avc_add_xperms_decision()

7 months agoMerge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Linus Torvalds [Wed, 14 Aug 2024 16:06:28 +0000 (09:06 -0700)]
Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "VFS:

   - Fix the name of file lease slab cache. When file leases were split
     out of file locks the name of the file lock slab cache was used for
     the file leases slab cache as well.

   - Fix a type in take_fd() helper.

   - Fix infinite directory iteration for stable offsets in tmpfs.

   - When the icache is pruned all reclaimable inodes are marked with
     I_FREEING and other processes that try to lookup such inodes will
     block.

     But some filesystems like ext4 can trigger lookups in their inode
     evict callback causing deadlocks. Ext4 does such lookups if the
     ea_inode feature is used whereby a separate inode may be used to
     store xattrs.

     Introduce I_LRU_ISOLATING which pins the inode while its pages are
     reclaimed. This avoids inode deletion during inode_lru_isolate()
     avoiding the deadlock and evict is made to wait until
     I_LRU_ISOLATING is done.

  netfs:

   - Fault in smaller chunks for non-large folio mappings for
     filesystems that haven't been converted to large folios yet.

   - Fix the CONFIG_NETFS_DEBUG config option. The config option was
     renamed a short while ago and that introduced two minor issues.
     First, it depended on CONFIG_NETFS whereas it wants to depend on
     CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
     does. Second, the documentation for the config option wasn't fixed
     up.

   - Revert the removal of the PG_private_2 writeback flag as ceph is
     using it and fix how that flag is handled in netfs.

   - Fix DIO reads on 9p. A program watching a file on a 9p mount
     wouldn't see any changes in the size of the file being exported by
     the server if the file was changed directly in the source
     filesystem. Fix this by attempting to read the full size specified
     when a DIO read is requested.

   - Fix a NULL pointer dereference bug due to a data race where a
     cachefiles cookies was retired even though it was still in use.
     Check the cookie's n_accesses counter before discarding it.

  nsfs:

   - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
     the kernel is writing to userspace.

  pidfs:

   - Prevent the creation of pidfds for kthreads until we have a
     use-case for it and we know the semantics we want. It also confuses
     userspace why they can get pidfds for kthreads.

  squashfs:

   - Fix an unitialized value bug reported by KMSAN caused by a
     corrupted symbolic link size read from disk. Check that the
     symbolic link size is not larger than expected"

* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Squashfs: sanity check symbolic link size
  9p: Fix DIO read through netfs
  vfs: Don't evict inode under the inode lru traversing context
  netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
  netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
  file: fix typo in take_fd() comment
  pidfd: prevent creation of pidfds for kthreads
  netfs: clean up after renaming FSCACHE_DEBUG config
  libfs: fix infinite directory reads for offset dir
  nsfs: fix ioctl declaration
  fs/netfs/fscache_cookie: add missing "n_accesses" check
  filelock: fix name of file_lease slab cache
  netfs: Fault in smaller chunks for non-large folio mappings

7 months agoMerge tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Wed, 14 Aug 2024 15:57:24 +0000 (08:57 -0700)]
Merge tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix bpftrace regression from Kyle Huey.

   Tracing bpf prog was called with perf_event input arguments causing
   bpftrace produce garbage output.

 - Fix verifier crash in stacksafe() from Yonghong Song.

   Daniel Hodges reported verifier crash when playing with sched-ext.
   The stack depth in the known verifier state was larger than stack
   depth in being explored state causing out-of-bounds access.

 - Fix update of freplace prog in prog_array from Leon Hwang.

   freplace prog type wasn't recognized correctly.

* tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  perf/bpf: Don't call bpf_overflow_handler() for tracing events
  selftests/bpf: Add a test to verify previous stacksafe() fix
  bpf: Fix a kernel verifier crash in stacksafe()
  bpf: Fix updating attached freplace prog in prog_array map

7 months agoxfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
Darrick J. Wong [Sun, 4 Aug 2024 21:39:57 +0000 (14:39 -0700)]
xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set

If a file has the S_DAX flag (aka fsdax access mode) set, we cannot
allow users to change the realtime flag unless the datadev and rtdev
both support fsdax access modes.  Even if there are no extents allocated
to the file, the setattr thread could be racing with another thread
that has already started down the write code paths.

Fixes: ba23cba9b3bdc ("fs: allow per-device dax status checking for filesystems")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
7 months agoxfs: revert AIL TASK_KILLABLE threshold
Darrick J. Wong [Sun, 4 Aug 2024 21:39:34 +0000 (14:39 -0700)]
xfs: revert AIL TASK_KILLABLE threshold

In commit 9adf40249e6c, we changed the behavior of the AIL thread to
set its own task state to KILLABLE whenever the timeout value is
nonzero.  Unfortunately, this missed the fact that xfsaild_push will
return 50ms (aka a longish sleep) when we reach the push target or the
AIL becomes empty, so xfsaild goes to sleep for a long period of time in
uninterruptible D state.

This results in artificially high load averages because KILLABLE
processes are UNINTERRUPTIBLE, which contributes to load average even
though the AIL is asleep waiting for someone to interrupt it.  It's not
blocked on IOs or anything, but people scrap ps for processes that look
like they're stuck in D state, so restore the previous threshold.

Fixes: 9adf40249e6c ("xfs: AIL doesn't need manual pushing")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
7 months agoxfs: attr forks require attr, not attr2
Darrick J. Wong [Mon, 29 Jul 2024 20:44:33 +0000 (13:44 -0700)]
xfs: attr forks require attr, not attr2

It turns out that I misunderstood the difference between the attr and
attr2 feature bits.  "attr" means that at some point an attr fork was
created somewhere in the filesystem.  "attr2" means that inodes have
variable-sized forks, but says nothing about whether or not there
actually /are/ attr forks in the system.

If we have an attr fork, we only need to check that attr is set.

Fixes: 99d9d8d05da26 ("xfs: scrub inode block mappings")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
7 months agoALSA: hda/tas2781: Use correct endian conversion
Takashi Iwai [Wed, 14 Aug 2024 10:04:59 +0000 (12:04 +0200)]
ALSA: hda/tas2781: Use correct endian conversion

The data conversion is done rather by a wrong function.  We convert to
BE32, not from BE32.  Although the end result must be same, this was
complained by the compiler.

Fix the code again and align with another similar function
tas2563_apply_calib() that does already right.

Fixes: 3beddef84d90 ("ALSA: hda/tas2781: fix wrong calibrated data order")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
7 months agoRevert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Niklas Cassel [Tue, 13 Aug 2024 13:19:01 +0000 (15:19 +0200)]
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"

This reverts commit 28ab9769117ca944cb6eb537af5599aa436287a4.

Sense data can be in either fixed format or descriptor format.

SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
"The SATL shall support this bit as defined in SPC-5 with the following
exception: if the D_ SENSE bit is set to zero (i.e., fixed format sense
data), then the SATL should return fixed format sense data for ATA
PASS-THROUGH commands."

The libata SATL has always kept D_SENSE set to zero by default. (It is
however possible to change the value using a MODE SELECT SG_IO command.)

Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit,
however, successful ATA PASS-THROUGH commands incorrectly returned the
sense data in descriptor format (regardless of the D_SENSE bit).

Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for
CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
commands.

However, after commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE
bit for CK_COND=1 and no error"), there were bug reports that hdparm,
hddtemp, and udisks were no longer working as expected.

These applications incorrectly assume the returned sense data is in
descriptor format, without even looking at the RESPONSE CODE field in the
returned sense data (to see which format the returned sense data is in).

Considering that there will be broken versions of these applications around
roughly forever, we are stuck with being bug compatible with older kernels.

Cc: [email protected] # 4.19+
Reported-by: Stephan Eisvogel <[email protected]>
Reported-by: Christian Heusel <[email protected]>
Closes: https://lore.kernel.org/linux-ide/[email protected]/
Fixes: 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Niklas Cassel <[email protected]>
7 months agoALSA: usb-audio: Support Yamaha P-125 quirk entry
Juan José Arboleda [Tue, 13 Aug 2024 16:10:53 +0000 (11:10 -0500)]
ALSA: usb-audio: Support Yamaha P-125 quirk entry

This patch adds a USB quirk for the Yamaha P-125 digital piano.

Signed-off-by: Juan José Arboleda <[email protected]>
Cc: <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
7 months agotcp: Update window clamping condition
Subash Abhinov Kasiviswanathan [Thu, 8 Aug 2024 23:06:40 +0000 (16:06 -0700)]
tcp: Update window clamping condition

This patch is based on the discussions between Neal Cardwell and
Eric Dumazet in the link
https://lore.kernel.org/netdev/20240726204105.1466841[email protected]/

It was correctly pointed out that tp->window_clamp would not be
updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
(copied <= tp->rcvq_space.space). While it is expected for most
setups to leave the sysctl enabled, the latter condition may
not end up hitting depending on the TCP receive queue size and
the pattern of arriving data.

The updated check should be hit only on initial MSS update from
TCP_MIN_MSS to measured MSS value and subsequently if there was
an update to a larger value.

Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
Signed-off-by: Sean Tranchetti <[email protected]>
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
7 months agomedia: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
Hans de Goede [Sun, 21 Jul 2024 15:38:40 +0000 (17:38 +0200)]
media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices

Commit a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG)
support") broke BYT support because it removed a seemingly unused field
from struct sh_css_sp_config and a seemingly unused value from enum
ia_css_input_mode.

But these are part of the ABI between the kernel and firmware on ISP2400
and this part of the TPG support removal changes broke ISP2400 support.

ISP2401 support was not affected because on ISP2401 only a part of
struct sh_css_sp_config is used.

Restore the removed field and enum value to fix this.

Fixes: a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG) support")
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Hans Verkuil <[email protected]>
7 months agobcachefs: bcachefs_metadata_version_disk_accounting_inum
Kent Overstreet [Mon, 12 Aug 2024 06:27:36 +0000 (02:27 -0400)]
bcachefs: bcachefs_metadata_version_disk_accounting_inum

This adds another disk accounting counter to track usage per inode
number (any snapshot ID).

This will be used for a couple things:

- It'll give us a way to tell the user how much space a given file ista
  consuming in all snapshots; i.e. how much extra space it's consuming
  due to snapshot versioning.

- It counts number of extents and total size of extents (both in btree
  keyspace sectors and actual disk usage), meaning it gives us average
  extent size: that is, it'll let us cheaply find fragmented files that
  should be defragmented.

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Kill __bch2_accounting_mem_mod()
Kent Overstreet [Mon, 12 Aug 2024 06:35:10 +0000 (02:35 -0400)]
bcachefs: Kill __bch2_accounting_mem_mod()

The next patch will be adding a disk accounting counter type which is
not kept in the in-memory eytzinger tree.

As prep, fold __bch2_accounting_mem_mod() into
bch2_accounting_mem_mod_locked() so that we can check for that counter
type and bail out without calling bpos_to_disk_accounting_pos() twice.

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
Kent Overstreet [Tue, 13 Aug 2024 01:31:25 +0000 (21:31 -0400)]
bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()

bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.

This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.

Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Fix warning in __bch2_fsck_err() for trans not passed in
Kent Overstreet [Tue, 13 Aug 2024 03:29:46 +0000 (23:29 -0400)]
bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Add a time_stat for blocked on key cache flush
Kent Overstreet [Sat, 10 Aug 2024 19:48:18 +0000 (15:48 -0400)]
bcachefs: Add a time_stat for blocked on key cache flush

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Improve trans_blocked_journal_reclaim tracepoint
Kent Overstreet [Sat, 10 Aug 2024 18:31:17 +0000 (14:31 -0400)]
bcachefs: Improve trans_blocked_journal_reclaim tracepoint

include information about the state of the btree key cache

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Add hysteresis to waiting on btree key cache flush
Kent Overstreet [Sat, 10 Aug 2024 18:40:09 +0000 (14:40 -0400)]
bcachefs: Add hysteresis to waiting on btree key cache flush

This helps ensure key cache reclaim isn't contending with threads
waiting for the key cache to be helped, and fixes a severe performance
bug.

Signed-off-by: Kent Overstreet <[email protected]>
7 months agolib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()
Kent Overstreet [Sun, 11 Aug 2024 01:04:35 +0000 (21:04 -0400)]
lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()

If we need to increase the tree depth, allocate a new node, and then
race with another thread that increased the tree depth before us, we'll
still have a preallocated node that might be used later.

If we then use that node for a new non-root node, it'll still have a
pointer to the old root instead of being zeroed - fix this by zeroing it
in the cmpxchg failure path.

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Convert for_each_btree_node() to lockrestart_do()
Kent Overstreet [Wed, 7 Aug 2024 20:34:28 +0000 (16:34 -0400)]
bcachefs: Convert for_each_btree_node() to lockrestart_do()

for_each_btree_node() now works similarly to for_each_btree_key(), where
the loop body is passed as an argument to be passed to lockrestart_do().

This now calls trans_begin() on every loop iteration - which fixes an
SRCU warning in backpointers fsck.

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Add missing downgrade table entry
Kent Overstreet [Tue, 13 Aug 2024 05:01:35 +0000 (01:01 -0400)]
bcachefs: Add missing downgrade table entry

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: disk accounting: ignore unknown types
Kent Overstreet [Wed, 14 Aug 2024 02:47:55 +0000 (22:47 -0400)]
bcachefs: disk accounting: ignore unknown types

forward compat fix

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: bch2_accounting_invalid() fixup
Kent Overstreet [Tue, 13 Aug 2024 08:53:12 +0000 (04:53 -0400)]
bcachefs: bch2_accounting_invalid() fixup

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: Fix bch2_trigger_alloc when upgrading from old versions
Kent Overstreet [Tue, 13 Aug 2024 03:24:03 +0000 (23:24 -0400)]
bcachefs: Fix bch2_trigger_alloc when upgrading from old versions

bch2_trigger_alloc was assuming that the new key would always be newly
created and thus always an alloc_v4 key, but - not when called from
btree_gc.

Signed-off-by: Kent Overstreet <[email protected]>
7 months agobcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
Kent Overstreet [Wed, 14 Aug 2024 02:40:39 +0000 (22:40 -0400)]
bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()

bch2_btree_path_traverse_cached() was previously checking if it could
just relock the path, which is a common idiom in path traversal.

However, it was using btree_node_relock(), not btree_path_relock();
btree_path_relock() only succeeds if the path was in state
BTREE_ITER_NEED_RELOCK.

If the path was in state BTREE_ITER_NEED_TRAVERSE a full traversal is
needed; this led to a null ptr deref in
bch2_btree_path_traverse_cached().

And the short circuit check here isn't needed, since it was already done
in the main bch2_btree_path_traverse_one().

Signed-off-by: Kent Overstreet <[email protected]>
7 months agomptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size
Eugene Syromiatnikov [Mon, 12 Aug 2024 06:51:23 +0000 (08:51 +0200)]
mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size

ssn_offset field is u32 and is placed into the netlink response with
nla_put_u32(), but only 2 bytes are reserved for the attribute payload
in subflow_get_info_size() (even though it makes no difference
in the end, as it is aligned up to 4 bytes).  Supply the correct
argument to the relevant nla_total_size() call to make it less
confusing.

Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace")
Signed-off-by: Eugene Syromiatnikov <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
7 months agoMerge tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Tue, 13 Aug 2024 23:10:32 +0000 (16:10 -0700)]
Merge tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fixes from Kees Cook:

 - binfmt_flat: Fix corruption when not offsetting data start

 - exec: Fix ToCToU between perm check and set-uid/gid usage

* tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Fix ToCToU between perm check and set-uid/gid usage
  binfmt_flat: Fix corruption when not offsetting data start

7 months agoi2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
Andi Shyti [Mon, 12 Aug 2024 19:40:28 +0000 (21:40 +0200)]
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume

Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.

Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case.

Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <[email protected]>
Cc: [email protected] # v5.9+
Signed-off-by: Andi Shyti <[email protected]>
7 months agoof/irq: Prevent device address out-of-bounds read in interrupt map walk
Stefan Wiehler [Mon, 12 Aug 2024 10:06:51 +0000 (12:06 +0200)]
of/irq: Prevent device address out-of-bounds read in interrupt map walk

When of_irq_parse_raw() is invoked with a device address smaller than
the interrupt parent node (from #address-cells property), KASAN detects
the following out-of-bounds read when populating the initial match table
(dyndbg="func of_irq_parse_* +p"):

  OF: of_irq_parse_one: dev=/soc@0/picasso/watchdog, index=0
  OF:  parent=/soc@0/pci@878000000000/gpio0@17,0, intsize=2
  OF:  intspec=4
  OF: of_irq_parse_raw: ipar=/soc@0/pci@878000000000/gpio0@17,0, size=2
  OF:  -> addrsize=3
  ==================================================================
  BUG: KASAN: slab-out-of-bounds in of_irq_parse_raw+0x2b8/0x8d0
  Read of size 4 at addr ffffff81beca5608 by task bash/764

  CPU: 1 PID: 764 Comm: bash Tainted: G           O       6.1.67-484c613561-nokia_sm_arm64 #1
  Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2023.01-12.24.03-dirty 01/01/2023
  Call trace:
   dump_backtrace+0xdc/0x130
   show_stack+0x1c/0x30
   dump_stack_lvl+0x6c/0x84
   print_report+0x150/0x448
   kasan_report+0x98/0x140
   __asan_load4+0x78/0xa0
   of_irq_parse_raw+0x2b8/0x8d0
   of_irq_parse_one+0x24c/0x270
   parse_interrupts+0xc0/0x120
   of_fwnode_add_links+0x100/0x2d0
   fw_devlink_parse_fwtree+0x64/0xc0
   device_add+0xb38/0xc30
   of_device_add+0x64/0x90
   of_platform_device_create_pdata+0xd0/0x170
   of_platform_bus_create+0x244/0x600
   of_platform_notify+0x1b0/0x254
   blocking_notifier_call_chain+0x9c/0xd0
   __of_changeset_entry_notify+0x1b8/0x230
   __of_changeset_apply_notify+0x54/0xe4
   of_overlay_fdt_apply+0xc04/0xd94
   ...

  The buggy address belongs to the object at ffffff81beca5600
   which belongs to the cache kmalloc-128 of size 128
  The buggy address is located 8 bytes inside of
   128-byte region [ffffff81beca5600ffffff81beca5680)

  The buggy address belongs to the physical page:
  page:00000000230d3d03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1beca4
  head:00000000230d3d03 order:1 compound_mapcount:0 compound_pincount:0
  flags: 0x8000000000010200(slab|head|zone=2)
  raw: 8000000000010200 0000000000000000 dead000000000122 ffffff810000c300
  raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffffff81beca5500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffffff81beca5580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  >ffffff81beca5600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                        ^
   ffffff81beca5680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffffff81beca5700: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
  ==================================================================
  OF:  -> got it !

Prevent the out-of-bounds read by copying the device address into a
buffer of sufficient size.

Signed-off-by: Stefan Wiehler <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring (Arm) <[email protected]>
7 months agoexec: Fix ToCToU between perm check and set-uid/gid usage
Kees Cook [Thu, 8 Aug 2024 18:39:08 +0000 (11:39 -0700)]
exec: Fix ToCToU between perm check and set-uid/gid usage

When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.

For example, if a file could change permissions from executable and not
set-id:

---------x 1 root root 16048 Aug  7 13:16 target

to set-id and non-executable:

---S------ 1 root root 16048 Aug  7 13:16 target

it is possible to gain root privileges when execution should have been
disallowed.

While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start with being
world-executable but then are adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:

-rwxr-xr-x 1 root cdrom 16048 Aug  7 13:16 target

becomes:

-rwsr-xr-- 1 root cdrom 16048 Aug  7 13:16 target

But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches brpm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".

Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against dead-locks, but hardly optimal.

Reported-by: Marco Vanotti <[email protected]>
Tested-by: Marco Vanotti <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: Eric Biederman <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
7 months agodm persistent data: fix memory allocation failure
Mikulas Patocka [Tue, 13 Aug 2024 14:35:14 +0000 (16:35 +0200)]
dm persistent data: fix memory allocation failure

kmalloc is unreliable when allocating more than 8 pages of memory. It may
fail when there is plenty of free memory but the memory is fragmented.
Zdenek Kabelac observed such failure in his tests.

This commit changes kmalloc to kvmalloc - kvmalloc will fall back to
vmalloc if the large allocation fails.

Signed-off-by: Mikulas Patocka <[email protected]>
Reported-by: Zdenek Kabelac <[email protected]>
Reviewed-by: Mike Snitzer <[email protected]>
Cc: [email protected]
7 months agoperf/bpf: Don't call bpf_overflow_handler() for tracing events
Kyle Huey [Tue, 13 Aug 2024 15:17:27 +0000 (15:17 +0000)]
perf/bpf: Don't call bpf_overflow_handler() for tracing events

The regressing commit is new in 6.10. It assumed that anytime event->prog
is set bpf_overflow_handler() should be invoked to execute the attached bpf
program. This assumption is false for tracing events, and as a result the
regressing commit broke bpftrace by invoking the bpf handler with garbage
inputs on overflow.

Prior to the regression the overflow handlers formed a chain (of length 0,
1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
(the tracing case) did not. Both set event->prog. The chain of overflow
handlers was replaced by a single overflow handler slot and a fixed call to
bpf_overflow_handler() when appropriate. This modifies the condition there
to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the
previous behavior and fixing bpftrace.

Signed-off-by: Kyle Huey <[email protected]>
Suggested-by: Andrii Nakryiko <[email protected]>
Reported-by: Joe Damato <[email protected]>
Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/
Fixes: f11f10bfa1ca ("perf/bpf: Call BPF handler directly, not through overflow machinery")
Cc: [email protected]
Tested-by: Joe Damato <[email protected]> # bpftrace
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
7 months agodrm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
Kenneth Feng [Thu, 8 Aug 2024 04:19:22 +0000 (12:19 +0800)]
drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1

add HDP_SD support on gc 12.0.0/1

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 61cffacb3a1c590b15c0e9ff987de02d293e0dd8)

7 months agodrm/amdgpu: Update kmd_fw_shared for VCN5
Yinjie Yao [Fri, 9 Aug 2024 21:20:26 +0000 (17:20 -0400)]
drm/amdgpu: Update kmd_fw_shared for VCN5

kmd_fw_shared changed in VCN5

Signed-off-by: Yinjie Yao <[email protected]>
Reviewed-by: Ruijing Dong <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit aa02486fb18cecbaca0c4fd393d1a03f1d4c3f9a)

7 months agodrm/amd/amdgpu: command submission parser for JPEG
David (Ming Qiang) Wu [Thu, 8 Aug 2024 16:19:50 +0000 (12:19 -0400)]
drm/amd/amdgpu: command submission parser for JPEG

Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: David (Ming Qiang) Wu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit a7f670d5d8e77b092404ca8a35bb0f8f89ed3117)
Cc: [email protected]
7 months agodrm/amdgpu/mes12: fix suspend issue
Jack Xiao [Wed, 7 Aug 2024 04:03:11 +0000 (12:03 +0800)]
drm/amdgpu/mes12: fix suspend issue

Use mes pipe to unmap kcq and kgq.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit f7fb9d677faf0460131bc2af15afd766d48a1f47)

7 months agodrm/amdgpu/mes12: sw/hw fini for unified mes
Jack Xiao [Wed, 7 Aug 2024 07:23:16 +0000 (15:23 +0800)]
drm/amdgpu/mes12: sw/hw fini for unified mes

Free memory for two pipes and unmap pipe0 via pipe1.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 98cae695a8ae0e4291b1fa7feef9b54fabefe885)

7 months agodrm/amdgpu/mes12: configure two pipes hardware resources
Jack Xiao [Wed, 7 Aug 2024 06:49:30 +0000 (14:49 +0800)]
drm/amdgpu/mes12: configure two pipes hardware resources

Configure two pipes with different hardware resources.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit ea5d6db17a8e3635ad91e8c53faa1fdc9570fbbb)

7 months agodrm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
Jack Xiao [Wed, 7 Aug 2024 06:44:07 +0000 (14:44 +0800)]
drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes

Adjust mes12 sw/hw initiailization for both pipe0 and pipe1
enablement. The two pipes are almost identical pipe. Pipe0
behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit aa539da8aff07ab08def6490e8c9b441439e70ba)

7 months agodrm/amdgpu/mes12: add mes pipe switch support
Jack Xiao [Wed, 7 Aug 2024 06:15:48 +0000 (14:15 +0800)]
drm/amdgpu/mes12: add mes pipe switch support

Add mes pipe switch to let caller choose pipe
to submit packet.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit b2dee0837a4be63e8d3e00550a9f057644f962c4)

7 months agodrm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1
Jack Xiao [Wed, 7 Aug 2024 05:19:59 +0000 (13:19 +0800)]
drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1

Enable unified mes firmware to load on pipe0 and pipe1.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit e69c2dd7534f3fcabf7bb801db2a7ac71e7e5da6)

7 months agodrm/amdgpu/mes: add multiple mes ring instances support
Jack Xiao [Wed, 7 Aug 2024 03:53:35 +0000 (11:53 +0800)]
drm/amdgpu/mes: add multiple mes ring instances support

Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit c7d4355648ffa02a1551495b05c71ea6c884d29c)

7 months agodrm/amdgpu/mes12: update mes_v12_api_def.h
Jack Xiao [Wed, 7 Aug 2024 03:43:45 +0000 (11:43 +0800)]
drm/amdgpu/mes12: update mes_v12_api_def.h

Update mes12 api definition.

Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 2ab5dc59177419d8a49e89585e82ff41524270fc)

7 months agodrm/amdgpu: Actually check flags for all context ops.
Bas Nieuwenhuizen [Tue, 6 Aug 2024 20:27:32 +0000 (22:27 +0200)]
drm/amdgpu: Actually check flags for all context ops.

Missing validation ...

Checked libdrm and it clears all the structs, so we should be
safe to just check everything.

Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit c6b86421f1f9ddf9d706f2453159813ee39d0cf9)
Cc: [email protected]
7 months agodrm/amdgpu/jpeg4: properly set atomics vmid field
Alex Deucher [Fri, 12 Jul 2024 14:06:05 +0000 (10:06 -0400)]
drm/amdgpu/jpeg4: properly set atomics vmid field

This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit c6c2e8b6a427d4fecc7c36cffccb908185afcab2)
Cc: [email protected]
7 months agodrm/amdgpu/jpeg2: properly set atomics vmid field
Alex Deucher [Fri, 12 Jul 2024 14:00:33 +0000 (10:00 -0400)]
drm/amdgpu/jpeg2: properly set atomics vmid field

This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 35c628774e50b3784c59e8ca7973f03bcb067132)
Cc: [email protected]
7 months agodrm/amd/display: Adjust cursor position
Rodrigo Siqueira [Thu, 1 Aug 2024 22:16:35 +0000 (16:16 -0600)]
drm/amd/display: Adjust cursor position

[why & how]
When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.

Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f)
Cc: [email protected]
7 months agodrm/amd/display: fix cursor offset on rotation 180
Melissa Wen [Tue, 31 Jan 2023 16:05:46 +0000 (15:05 -0100)]
drm/amd/display: fix cursor offset on rotation 180

[why & how]
Cursor gets clipped off in the middle of the screen with hw
rotation 180. Fix a miscalculation of cursor offset when it's
placed near the edges in the pipe split case.

Cursor bugs with hw rotation were reported on AMD issue
tracker:
https://gitlab.freedesktop.org/drm/amd/-/issues/2247

The issues on rotation 270 was fixed by:
https://lore.kernel.org/amd-gfx/20221118125935.4013669[email protected]/
that partially addressed the rotation 180 too. So, this patch is the
final bits for rotation 180.

Reported-by: Xaver Hugl <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
Reviewed-by: Harry Wentland <[email protected]>
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d)
Cc: [email protected]
7 months agodrm/amd/display: Fix MST BW calculation Regression
Fangzhi Zuo [Mon, 29 Jul 2024 14:23:03 +0000 (10:23 -0400)]
drm/amd/display: Fix MST BW calculation Regression

[Why & How]
Revert commit 8b2cb32cf0c6
("drm/amd/display: FEC overhead should be checked once for mst slot nums")
Because causes bw calculation regression

Cc: [email protected]
Cc: [email protected]
Reported-by: [email protected]
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3495
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1228093
Reviewed-by: Wayne Lin <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 12dbb3ed212fc7655fce421542a5add637f8af7a)
Cc: [email protected]
7 months agodrm/amd/display: Enable otg synchronization logic for DCN321
Loan Chen [Fri, 2 Aug 2024 05:57:40 +0000 (13:57 +0800)]
drm/amd/display: Enable otg synchronization logic for DCN321

[Why]
Tiled display cannot synchronize properly after S3.
The fix for commit 5f0c74915815 ("drm/amd/display: Fix for otg
synchronization logic") is not enable in DCN321, which causes
the otg is excluded from synchronization.

[How]
Enable otg synchronization logic in dcn321.

Fixes: 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic")
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Loan Chen <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit d6ed53712f583423db61fbb802606759e023bf7b)
Cc: [email protected]
7 months agodrm/amd/display: fix s2idle entry for DCN3.5+
Hamza Mahfooz [Tue, 6 Aug 2024 13:55:55 +0000 (09:55 -0400)]
drm/amd/display: fix s2idle entry for DCN3.5+

To be able to get to the lowest power state when suspending systems with
DCN3.5+, we must be in IPS before the display hardware is put into
D3cold. So, to ensure that the system always reaches the lowest power
state while suspending, force systems that support IPS to enter idle
optimizations before entering D3cold.

Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 237193e21b29d4aa0617ffeea3d6f49e72999708)
Cc: [email protected] # 6.10+
7 months agodrm/amdgpu/mes: fix mes ring buffer overflow
Jack Xiao [Thu, 18 Jul 2024 08:38:50 +0000 (16:38 +0800)]
drm/amdgpu/mes: fix mes ring buffer overflow

wait memory room until enough before writing mes packets
to avoid ring buffer overflow.

v2: squash in sched_hw_submission fix

Fixes: de3246254156 ("drm/amdgpu: cleanup MES11 command submission")
Fixes: fffe347e1478 ("drm/amdgpu: cleanup MES12 command submission")
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 34e087e8920e635c62e2ed6a758b0cd27f836d13)
Cc: [email protected]
7 months agoKVM: eventfd: Use synchronize_srcu_expedited() on shutdown
Li RongQing [Thu, 11 Jul 2024 12:11:30 +0000 (20:11 +0800)]
KVM: eventfd: Use synchronize_srcu_expedited() on shutdown

When hot-unplug a device which has many queues, and guest CPU will has
huge jitter, and unplugging is very slow.

It turns out synchronize_srcu() in irqfd_shutdown() caused the guest
jitter and unplugging latency, so replace synchronize_srcu() with
synchronize_srcu_expedited(), to accelerate the unplugging, and reduce
the guest OS jitter, this accelerates the VM reboot too.

Signed-off-by: Li RongQing <[email protected]>
Message-ID: <20240711121130[email protected]>
[Call it just once in irqfd_resampler_shutdown. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
7 months agoMerge tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Tue, 13 Aug 2024 16:03:23 +0000 (09:03 -0700)]
Merge tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two smb3 server fixes for access denied problem on share path checks"

* tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: override fsids for smb2_query_info()
  ksmbd: override fsids for share path check

7 months agoKVM: selftests: Add a testcase to verify x2APIC is fully readonly
Michal Luczaj [Fri, 2 Aug 2024 20:29:41 +0000 (13:29 -0700)]
KVM: selftests: Add a testcase to verify x2APIC is fully readonly

Add a test to verify that userspace can't change a vCPU's x2APIC ID by
abusing KVM_SET_LAPIC.  KVM models the x2APIC ID (and x2APIC LDR) as
readonly, and silently ignores userspace attempts to change the x2APIC ID
for backwards compatibility.

Signed-off-by: Michal Luczaj <[email protected]>
[sean: write changelog, add to existing test]
Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <20240802202941[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
7 months agoKVM: x86: Make x2APIC ID 100% readonly
Sean Christopherson [Fri, 2 Aug 2024 20:29:40 +0000 (13:29 -0700)]
KVM: x86: Make x2APIC ID 100% readonly

Ignore the userspace provided x2APIC ID when fixing up APIC state for
KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM.  Commit
a92e2543d6a8 ("KVM: x86: use hardware-compatible format for APIC ID
register"), which added the fixup, didn't intend to allow userspace to
modify the x2APIC ID.  In fact, that commit is when KVM first started
treating the x2APIC ID as readonly, apparently to fix some race:

 static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
-       return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+       /* To avoid a race between apic_base and following APIC_ID update when
+        * switching to x2apic_mode, the x2apic mode returns initial x2apic id.
+        */
+       if (apic_x2apic_mode(apic))
+               return apic->vcpu->vcpu_id;
+
+       return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
 }

Furthermore, KVM doesn't support delivering interrupts to vCPUs with a
modified x2APIC ID, but KVM *does* return the modified value on a guest
RDMSR and for KVM_GET_LAPIC.  I.e. no remotely sane setup can actually
work with a modified x2APIC ID.

Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map
calculation, which expects the LDR to align with the x2APIC ID.

  WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm]
  CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
  RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm]
  Call Trace:
   <TASK>
   kvm_apic_set_state+0x1cf/0x5b0 [kvm]
   kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm]
   kvm_vcpu_ioctl+0x663/0x8a0 [kvm]
   __x64_sys_ioctl+0xb8/0xf0
   do_syscall_64+0x56/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0
  RIP: 0033:0x7fade8b9dd6f

Unfortunately, the WARN can still trigger for other CPUs than the current
one by racing against KVM_SET_LAPIC, so remove it completely.

Reported-by: Michal Luczaj <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Reported-by: Haoyu Wu <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <20240802202941[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
7 months agoDocumentation: dm-crypt.rst warning + error fix
Daniel Yang [Wed, 7 Aug 2024 09:01:21 +0000 (02:01 -0700)]
Documentation: dm-crypt.rst warning + error fix

While building kernel documention using make htmldocs command, I was
getting unexpected indentation error. Single description was given for
two module parameters with wrong indentation. So, I corrected the
indentation of both parameters and the description.

Signed-off-by: Shibu kumar <[email protected]>
Signed-off-by: Daniel Yang <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Fixes: 0d815e3400e6 ("dm-crypt: limit the size of encryption requests")
This page took 0.138644 seconds and 4 git commands to generate.