Git Repo - linux.git/log

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm().
   Reported by static analysis tooling, patch from Florian Westphal.

2) Missing set type checks in nf_tables: Validate that set declaration
   matches the an existing set type, otherwise bail out with EEXIST.
   Currently, nf_tables silently accepts the re-declaration with a
   different type but it bails out later with EINVAL when the user adds
   entries to the set. This fix is relatively large because it requires
   two preparation patches that are included in this batch.

3) Do not ignore updates of timeout and gc_interval parameters in
   existing sets.

4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of
   ipset. Except hash:net,port,net and hash:net,iface, the set types don't
   support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs
   a special handling in hash:net,port,net which was missing (hash:net,iface
   was not affected by this bug), from Jozsef Kadlecsik.

5) When adding/deleting large number of elements in one step in ipset,
   it can take a reasonable amount of time and can result in soft lockup
   errors. This patch is a complete rework of the previous version in order
   to use a smaller internal batch limit and at the same time removing
   the external hard limit to add arbitrary number of elements in one step.
   Also from Jozsef Kadlecsik.

Except for patch #1, which fixes a bug introduced in the previous net-next
development cycle, anything else has been broken for several releases.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"First batch of regression and regular fixes:

   - regressions:
       - fix error handling after conversion to qstr for paths
       - fix raid56/scrub recovery caused by uninitialized variable
         after conversion to error bitmaps
       - restore qgroup backref lookup behaviour after recent
         refactoring
       - fix leak of device lists at module exit time

   - fix resolving backrefs for inline extent followed by prealloc

   - reset defrag ioctl buffer on memory allocation error"

* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix fscrypt name leak after failure to join log transaction
  btrfs: scrub: fix uninitialized return value in recover_scrub_rbio
  btrfs: fix resolving backrefs for inline extent followed by prealloc
  btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
  btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup
  btrfs: fix leak of fs devices after removing btrfs module
  btrfs: fix an error handling path in btrfs_defrag_leaves()
  btrfs: fix an error handling path in btrfs_rename()

fs/ntfs3: don't hold ni_lock when calling truncate_setsize()

syzbot is reporting hung task at do_user_addr_fault() [1], for there is
a silent deadlock between PG_locked bit and ni_lock lock.

Since filemap_update_page() calls filemap_read_folio() after calling
folio_trylock() which will set PG_locked bit, ntfs_truncate() must not
call truncate_setsize() which will wait for PG_locked bit to be cleared
when holding ni_lock lock.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae
Reported-by: syzbot <[email protected]>
Debugged-by: Linus Torvalds <[email protected]>
Co-developed-by: Hillf Danton <[email protected]>
Signed-off-by: Hillf Danton <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Linus Torvalds <[email protected]>

x86/kexec: Fix double-free of elf header buffer

After

b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer"),

freeing image->elf_headers in the error path of crash_load_segments()
is not needed because kimage_file_post_load_cleanup() will take
care of that later. And not clearing it could result in a double-free.

Drop the superfluous vfree() call at the error path of
crash_load_segments().

Fixes: b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer")
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

nfsd: fix handling of readdir in v4root vs. mount upcall timeout

If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.

That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.

If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.

Cc: Steve Dickson <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by: JianHong Yin <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Cc: <[email protected]>

fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB

Commit 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to
the same as vbG200 to avoid black screen") accidently decreases the
maximum memory size for the Matrox G200eW (102b:0532) from 8 MB to 1 MB
by missing one zero. This caused the driver initialization to fail with
the messages below, as the minimum required VRAM size is 2 MB:

     [    9.436420] matroxfb: Matrox MGA-G200eW (PCI) detected
     [    9.444502] matroxfb: cannot determine memory size
     [    9.449316] matroxfb: probe of 0000:0a:03.0 failed with error -1

So, add the missing 0 to make it the intended 16 MB. Successfully tested on
the Dell PowerEdge R910/0KYD3D, BIOS 2.10.0 08/29/2013, that the warning is
gone.

While at it, add a leading 0 to the maxdisplayable entry, so it’s aligned
properly. The value could probably also be increased from 8 MB to 16 MB, as
the G200 uses the same values, but I have not checked any datasheet.

Note, matroxfb is obsolete and superseded by the maintained DRM driver
mga200, which is used by default on most systems where both drivers are
available. Therefore, on most systems it was only a cosmetic issue.

Fixes: 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen")
Link: https://lore.kernel.org/linux-fbdev/[email protected]/T/#mb6953a9995ebd18acc8552f99d6db39787aec775
Cc: [email protected]
Cc: Z. Liu <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: [email protected]
Signed-off-by: Paul Menzel <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

netfilter: ipset: Rework long task execution when adding/deleting entries

When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.

The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.

Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: [email protected]
Signed-off-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: ipset: fix hash:net,port,net hang with /0 subnet

The hash:net,port,net set type supports /0 subnets. However, the patch
commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range
of consecutive elements to add/delete" did not take into account it and
resulted in an endless loop. The bug is actually older but the patch
5f7b51bf09baca8e brings it out earlier.

Handle /0 subnets properly in hash:net,port,net set types.

Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: Марк Коренберг <[email protected]>
Signed-off-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

net: sparx5: Fix reading of the MAC address

There is an issue with the checking of the return value of
'of_get_mac_address', which returns 0 on success and negative value on
failure. The driver interpretated the result the opposite way. Therefore
if there was a MAC address defined in the DT, then the driver was
generating a random MAC address otherwise it would use address 0.
Fix this by checking correctly the return value of 'of_get_mac_address'

Fixes: b74ef9f9cb91 ("net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe()")
Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vxlan: Fix memory leaks in error path

The memory allocated by vxlan_vnigroup_init() is not freed in the error
path, leading to memory leaks [1]. Fix by calling
vxlan_vnigroup_uninit() in the error path.

The leaks can be reproduced by annotating gro_cells_init() with
ALLOW_ERROR_INJECTION() and then running:

# echo "100" > /sys/kernel/debug/fail_function/probability
# echo "1" > /sys/kernel/debug/fail_function/times
# echo "gro_cells_init" > /sys/kernel/debug/fail_function/inject
# printf %#x -12 > /sys/kernel/debug/fail_function/gro_cells_init/retval
# ip link add name vxlan0 type vxlan dstport 4789 external vnifilter
RTNETLINK answers: Cannot allocate memory

[1]
unreferenced object 0xffff88810db84a00 (size 512):
  comm "ip", pid 330, jiffies 4295010045 (age 66.016s)
  hex dump (first 32 bytes):
    f8 d5 76 0e 81 88 ff ff 01 00 00 00 00 00 00 02  ..v.............
    03 00 04 00 48 00 00 00 00 00 00 01 04 00 01 00  ....H...........
  backtrace:
    [<ffffffff81a3097a>] kmalloc_trace+0x2a/0x60
    [<ffffffff82f049fc>] vxlan_vnigroup_init+0x4c/0x160
    [<ffffffff82ecd69e>] vxlan_init+0x1ae/0x280
    [<ffffffff836858ca>] register_netdevice+0x57a/0x16d0
    [<ffffffff82ef67b7>] __vxlan_dev_create+0x7c7/0xa50
    [<ffffffff82ef6ce6>] vxlan_newlink+0xd6/0x130
    [<ffffffff836d02ab>] __rtnl_newlink+0x112b/0x18a0
    [<ffffffff836d0a8c>] rtnl_newlink+0x6c/0xa0
    [<ffffffff836c0ddf>] rtnetlink_rcv_msg+0x43f/0xd40
    [<ffffffff83908ce0>] netlink_rcv_skb+0x170/0x440
    [<ffffffff839066af>] netlink_unicast+0x53f/0x810
    [<ffffffff839072d8>] netlink_sendmsg+0x958/0xe70
    [<ffffffff835c319f>] ____sys_sendmsg+0x78f/0xa90
    [<ffffffff835cd6da>] ___sys_sendmsg+0x13a/0x1e0
    [<ffffffff835cd94c>] __sys_sendmsg+0x11c/0x1f0
    [<ffffffff8424da78>] do_syscall_64+0x38/0x80
unreferenced object 0xffff88810e76d5f8 (size 192):
  comm "ip", pid 330, jiffies 4295010045 (age 66.016s)
  hex dump (first 32 bytes):
    04 00 00 00 00 00 00 00 db e1 4f e7 00 00 00 00  ..........O.....
    08 d6 76 0e 81 88 ff ff 08 d6 76 0e 81 88 ff ff  ..v.......v.....
  backtrace:
    [<ffffffff81a3162e>] __kmalloc_node+0x4e/0x90
    [<ffffffff81a0e166>] kvmalloc_node+0xa6/0x1f0
    [<ffffffff8276e1a3>] bucket_table_alloc.isra.0+0x83/0x460
    [<ffffffff8276f18b>] rhashtable_init+0x43b/0x7c0
    [<ffffffff82f04a1c>] vxlan_vnigroup_init+0x6c/0x160
    [<ffffffff82ecd69e>] vxlan_init+0x1ae/0x280
    [<ffffffff836858ca>] register_netdevice+0x57a/0x16d0
    [<ffffffff82ef67b7>] __vxlan_dev_create+0x7c7/0xa50
    [<ffffffff82ef6ce6>] vxlan_newlink+0xd6/0x130
    [<ffffffff836d02ab>] __rtnl_newlink+0x112b/0x18a0
    [<ffffffff836d0a8c>] rtnl_newlink+0x6c/0xa0
    [<ffffffff836c0ddf>] rtnetlink_rcv_msg+0x43f/0xd40
    [<ffffffff83908ce0>] netlink_rcv_skb+0x170/0x440
    [<ffffffff839066af>] netlink_unicast+0x53f/0x810
    [<ffffffff839072d8>] netlink_sendmsg+0x958/0xe70
    [<ffffffff835c319f>] ____sys_sendmsg+0x78f/0xa90

Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: htb: fix htb_classify() kernel-doc

Fix W=1 kernel-doc warning:

net/sched/sch_htb.c:214: warning: expecting prototype for htb_classify(). Prototype was for HTB_DIRECT() instead

by moving the HTB_DIRECT() macro above the function.
Add kernel-doc notation for function parameters as well.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'cls_drop-fix'

Jamal Hadi Salim says:

====================
net: dont intepret cls results when asked to drop

It is possible that an error in processing may occur in tcf_classify() which
will result in res.classid being some garbage value. Example of such a code path
is when the classifier goes into a loop due to bad policy. See patch 1/2
for a sample splat.
While the core code reacts correctly and asks the caller to drop the packet
(by returning TC_ACT_SHOT) some callers first intepret the res.class as
a pointer to memory and end up dropping the packet only after some activity with
the pointer. There is likelihood of this resulting in an exploit. So lets fix
all the known qdiscs that behave this way.
====================

Signed-off-by: David S. Miller <[email protected]>

net: sched: cbq: dont intepret cls results when asked to drop

If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that
res.class contains a valid pointer

Sample splat reported by Kyle Zeng

[    5.405624] 0: reclassify loop, rule prio 0, protocol 800
[    5.406326] ==================================================================
[    5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0
[    5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299
[    5.408731]
[    5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15
[    5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.15.0-1 04/01/2014
[    5.410439] Call Trace:
[    5.410764]  dump_stack+0x87/0xcd
[    5.411153]  print_address_description+0x7a/0x6b0
[    5.411687]  ? vprintk_func+0xb9/0xc0
[    5.411905]  ? printk+0x76/0x96
[    5.412110]  ? cbq_enqueue+0x54b/0xea0
[    5.412323]  kasan_report+0x17d/0x220
[    5.412591]  ? cbq_enqueue+0x54b/0xea0
[    5.412803]  __asan_report_load1_noabort+0x10/0x20
[    5.413119]  cbq_enqueue+0x54b/0xea0
[    5.413400]  ? __kasan_check_write+0x10/0x20
[    5.413679]  __dev_queue_xmit+0x9c0/0x1db0
[    5.413922]  dev_queue_xmit+0xc/0x10
[    5.414136]  ip_finish_output2+0x8bc/0xcd0
[    5.414436]  __ip_finish_output+0x472/0x7a0
[    5.414692]  ip_finish_output+0x5c/0x190
[    5.414940]  ip_output+0x2d8/0x3c0
[    5.415150]  ? ip_mc_finish_output+0x320/0x320
[    5.415429]  __ip_queue_xmit+0x753/0x1760
[    5.415664]  ip_queue_xmit+0x47/0x60
[    5.415874]  __tcp_transmit_skb+0x1ef9/0x34c0
[    5.416129]  tcp_connect+0x1f5e/0x4cb0
[    5.416347]  tcp_v4_connect+0xc8d/0x18c0
[    5.416577]  __inet_stream_connect+0x1ae/0xb40
[    5.416836]  ? local_bh_enable+0x11/0x20
[    5.417066]  ? lock_sock_nested+0x175/0x1d0
[    5.417309]  inet_stream_connect+0x5d/0x90
[    5.417548]  ? __inet_stream_connect+0xb40/0xb40
[    5.417817]  __sys_connect+0x260/0x2b0
[    5.418037]  __x64_sys_connect+0x76/0x80
[    5.418267]  do_syscall_64+0x31/0x50
[    5.418477]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[    5.418770] RIP: 0033:0x473bb7
[    5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00
00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34
24 89
[    5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX:
000000000000002a
[    5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7
[    5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007
[    5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004
[    5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[    5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002
[    5.422471]
[    5.422562] Allocated by task 299:
[    5.422782]  __kasan_kmalloc+0x12d/0x160
[    5.423007]  kasan_kmalloc+0x5/0x10
[    5.423208]  kmem_cache_alloc_trace+0x201/0x2e0
[    5.423492]  tcf_proto_create+0x65/0x290
[    5.423721]  tc_new_tfilter+0x137e/0x1830
[    5.423957]  rtnetlink_rcv_msg+0x730/0x9f0
[    5.424197]  netlink_rcv_skb+0x166/0x300
[    5.424428]  rtnetlink_rcv+0x11/0x20
[    5.424639]  netlink_unicast+0x673/0x860
[    5.424870]  netlink_sendmsg+0x6af/0x9f0
[    5.425100]  __sys_sendto+0x58d/0x5a0
[    5.425315]  __x64_sys_sendto+0xda/0xf0
[    5.425539]  do_syscall_64+0x31/0x50
[    5.425764]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[    5.426065]
[    5.426157] The buggy address belongs to the object at ffff88800e312200
[    5.426157]  which belongs to the cache kmalloc-128 of size 128
[    5.426955] The buggy address is located 42 bytes to the right of
[    5.426955]  128-byte region [ffff88800e312200, ffff88800e312280)
[    5.427688] The buggy address belongs to the page:
[    5.427992] page:000000009875fabc refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0xe312
[    5.428562] flags: 0x100000000000200(slab)
[    5.428812] raw: 0100000000000200 dead000000000100 dead000000000122
ffff888007843680
[    5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff
ffff88800e312401
[    5.429875] page dumped because: kasan: bad access detected
[    5.430214] page->mem_cgroup:ffff88800e312401
[    5.430471]
[    5.430564] Memory state around the buggy address:
[    5.430846]  ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[    5.431267]  ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 fc
[    5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[    5.432123]                                   ^
[    5.432391]  ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 fc
[    5.432810]  ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[    5.433229] ==================================================================
[    5.433648] Disabling lock debugging due to kernel taint

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Kyle Zeng <[email protected]>
Signed-off-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: atm: dont intepret cls results when asked to drop

If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume
res.class contains a valid pointer
Fixes: b0188d4dbe5f ("[NET_SCHED]: sch_atm: Lindent")
Signed-off-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

gpio: sifive: Fix refcount leak in sifive_gpio_probe

of_irq_find_parent() returns a node pointer with refcount incremented,
We should use of_node_put() on it when not needed anymore.
Add missing of_node_put() to avoid refcount leak.

Fixes: 96868dce644d ("gpio/sifive: Add GPIO driver for SiFive SoCs")
Signed-off-by: Miaoqian Lin <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

Linux 6.2-rc2

Merge tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

- Pass only an initialized perf event attribute to the LSM hook

- Fix a use-after-free on the perf syscall's error path

- A potential integer overflow fix in amd_core_pmu_init()

- Fix the cgroup events tracking after the context handling rewrite

- Return the proper value from the inherit_event() function on error

* tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Call LSM hook after copying perf_event_attr
  perf: Fix use-after-free in error path
  perf/x86/amd: fix potential integer overflow on shift of a int
  perf/core: Fix cgroup events tracking
  perf core: Return error pointer if inherit_event() fails to find pmu_ctx

Merge tag 'x86_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Two fixes to correct how kprobes handles INT3 now that they're added
   by other functionality like the rethunks and not only kgdb

- Remove __init section markings of two functions which are referenced
   by a function in the .text section

* tag 'x86_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
  x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK
  x86/calldepth: Fix incorrect init section references

Merge tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

- Prevent the leaking of a debug timer in futex_waitv()

- A preempt-RT mutex locking fix, adding the proper acquire semantics

* tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error
rtmutex: Add acquire semantics for rtmutex lock acquisition slow path

Merge tag 'drm-fixes-2023-01-01' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
"I'm just back from the mountains, and Dave is out at the beach and
  should be back in a week again. Just i915 fixes and since Rodrigo
  bothered to make the pull last week I figured I should warm up gpg and
  forward this in a nice signed tag as a new years present!

   - i915 fixes for newer platforms

   - i915 locking rework to not give up in vm eviction fallback path too
     early"

* tag 'drm-fixes-2023-01-01' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/dsi: fix MIPI_BKLT_EN_1 native GPIO index
  drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence
  drm/i915/uc: Fix two issues with over-size firmware files
  drm/i915: improve the catch-all evict to handle lock contention
  drm/i915: Remove __maybe_unused from mtl_info
  drm/i915: fix TLB invalidation for Gen12.50 video and compute engines

dt-bindings: net: marvell,orion-mdio: Fix examples

As stated in marvell-orion-mdio.txt deleted in commit 0781434af811f
("dt-bindings: net: orion-mdio: Convert to JSON schema") if
'interrupts' property is present, width of 'reg' should be 0x84.
Otherwise, width of 'reg' should be 0x4. Fix 'examples:' and add
constraints checking whether 'interrupts' property is present
and validate it against fixed values in reg.

Signed-off-by: Michał Grzelak <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: sun8i-emac: Add phy-supply property

This property has always been supported by the Linux driver; see
commit 9f93ac8d4085 ("net-next: stmmac: Add dwmac-sun8i"). In fact, the
original driver submission includes the phy-supply code but no mention
of it in the binding, so the omission appears to be accidental. In
addition, the property is documented in the binding for the previous
hardware generation, allwinner,sun7i-a20-gmac.

Document phy-supply in the binding to fix devicetree validation for the
25+ boards that already use this property.

Fixes: 0441bde003be ("dt-bindings: net-next: Add DT bindings documentation for Allwinner dwmac-sun8i")
Acked-by: Rob Herring <[email protected]>
Reviewed-by: Andre Przywara <[email protected]>
Signed-off-by: Samuel Holland <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipa: use proper endpoint mask for suspend

It is now possible for a system to have more than 32 endpoints. As
a result, registers related to endpoint suspend are parameterized,
with 32 endpoints represented in one more registers.

In ipa_interrupt_suspend_control(), the IPA_SUSPEND_EN register
offset is determined properly, but the bit mask used still assumes
the number of enpoints won't exceed 32. This is a bug. Fix it.

Fixes: f298ba785e2d ("net: ipa: add a parameter to suspend registers")
Signed-off-by: Alex Elder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'selftests-fix'

Po-Hsu Lin says:

====================
selftests: net: fix for arp_ndisc_evict_nocarrier test

This patchset will fix a false-positive issue caused by the command in
cleanup_v6() of the arp_ndisc_evict_nocarrier test.

Also, it will make the test to return a non-zero value for any failure
reported in the test for us to avoid false-negative results.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier

Return non-zero return value if there is any failure reported in this
script during the test. Otherwise it can only reflect the status of
the last command.

Fixes: f86ca07eb531 ("selftests: net: add arp_ndisc_evict_nocarrier")
Signed-off-by: Po-Hsu Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: net: fix cleanup_v6() for arp_ndisc_evict_nocarrier

The cleanup_v6() will cause the arp_ndisc_evict_nocarrier script exit
with 255 (No such file or directory), even the tests are good:

# selftests: net: arp_ndisc_evict_nocarrier.sh
# run arp_evict_nocarrier=1 test
# RTNETLINK answers: File exists
# ok
# run arp_evict_nocarrier=0 test
# RTNETLINK answers: File exists
# ok
# run all.arp_evict_nocarrier=0 test
# RTNETLINK answers: File exists
# ok
# run ndisc_evict_nocarrier=1 test
# ok
# run ndisc_evict_nocarrier=0 test
# ok
# run all.ndisc_evict_nocarrier=0 test
# ok
not ok 1 selftests: net: arp_ndisc_evict_nocarrier.sh # exit=255

This is because it's trying to modify the parameter for ipv4 instead.

Also, tests for ipv6 (run_ndisc_evict_nocarrier_enabled() and
run_ndisc_evict_nocarrier_disabled() are working on veth1, reflect
this fact in cleanup_v6().

Fixes: f86ca07eb531 ("selftests: net: add arp_ndisc_evict_nocarrier")
Signed-off-by: Po-Hsu Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: Update documentation for get_rate_matching

Now that phylink no longer calls phy_get_rate_matching with
PHY_INTERFACE_MODE_NA, phys no longer need to support it. Remove the
documentation mandating support.

Fixes: 7642cc28fd37 ("net: phylink: fix PHY validation with rate adaption")
Signed-off-by: Sean Anderson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'drm-intel-fixes-2022-12-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- fix TLB invalidation for DG2 and newer platforms. (Andrzej)
- Remove __maybe_unused from mtl_info (Lucas)
- improve the catch-all evict to handle lock contention (Matt Auld)
- Fix two issues with over-size (GuC/HuC) firmware files (John)
- Fix DSI resume issues on ICL+ (Jani)

Signed-off-by: Daniel Vetter <[email protected]>
From: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge branch 'dsa-qca8k-fixes'

Christian Marangi says:

====================
net: dsa: qca8k: multiple fix on mdio read/write

Due to some problems in reading the Documentation and elaborating it
some wrong assumption were done. The error was reported and notice only
now due to how things are setup in the code flow.

First 2 patch fix mgmt eth where the lenght calculation is very
confusing and in step of word size. (the related commit description have
an extensive description about how this mess works)

Last 3 patch revert the broken mdio cache and apply a correct version
that should still save some extra mdio in phy poll secnario.

These 5 patch fix each related problem and apply what the Documentation
actually say.

Changes v2:
- Add cover letter
- Fix typo in revert patch
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: improve mdio master read/write by using single lo/hi

Improve mdio master read/write by using singe mii read/write lo/hi.

In a read and write we need to poll the mdio master regs in a busy loop
to check for a specific bit present in the upper half of the reg. We can
ignore the other half since it won't contain useful data. This will save
an additional useless read for each read and write operation.

In a read operation the returned data is present in the mdio master reg
lower half. We can ignore the other half since it won't contain useful
data. This will save an additional useless read for each read operation.

In a read operation it's needed to just set the hi half of the mdio
master reg as the lo half will be replaced by the result. This will save
an additional useless write for each read operation.

Tested-by: Ronald Wahl <[email protected]>
Signed-off-by: Christian Marangi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: introduce single mii read/write lo/hi

It may be useful to read/write just the lo or hi half of a reg.

This is especially useful for phy poll with the use of mdio master.
The mdio master reg is composed by the first 16 bit related to setup and
the other half with the returned data or data to write.

Refactor the mii function to permit single mii read/write of lo or hi
half of the reg.

Tested-by: Ronald Wahl <[email protected]>
Signed-off-by: Christian Marangi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net: dsa: qca8k: cache lo and hi for mdio write"

This reverts commit 2481d206fae7884cd07014fd1318e63af35e99eb.

The Documentation is very confusing about the topic.
The cache logic for hi and lo is wrong and actually miss some regs to be
actually written.

What the Documentation actually intended was that it's possible to skip
writing hi OR lo if half of the reg is not needed to be written or read.

Revert the change in favor of a better and correct implementation.

Reported-by: Ronald Wahl <[email protected]>
Signed-off-by: Christian Marangi <[email protected]>
Cc: [email protected] # v5.18+
Signed-off-by: David S. Miller <[email protected]>

net: dsa: tag_qca: fix wrong MGMT_DATA2 size

It was discovered that MGMT_DATA2 can contain up to 28 bytes of data
instead of the 12 bytes written in the Documentation by accounting the
limit of 16 bytes declared in Documentation subtracting the first 4 byte
in the packet header.

Update the define with the real world value.

Tested-by: Ronald Wahl <[email protected]>
Fixes: c2ee8181fddb ("net: dsa: tag_qca: add define for handling mgmt Ethernet packet")
Signed-off-by: Christian Marangi <[email protected]>
Cc: [email protected] # v5.18+
Signed-off-by: David S. Miller <[email protected]>

net: dsa: qca8k: fix wrong length value for mgmt eth packet

The assumption that Documentation was right about how this value work was
wrong. It was discovered that the length value of the mgmt header is in
step of word size.

As an example to process 4 byte of data the correct length to set is 2.
To process 8 byte 4, 12 byte 6, 16 byte 8...

Odd values will always return the next size on the ack packet.
(length of 3 (6 byte) will always return 8 bytes of data)

This means that a value of 15 (0xf) actually means reading/writing 32 bytes
of data instead of 16 bytes. This behaviour is totally absent and not
documented in the switch Documentation.

In fact from Documentation the max value that mgmt eth can process is
16 byte of data while in reality it can process 32 bytes at once.

To handle this we always round up the length after deviding it for word
size. We check if the result is odd and we round another time to align
to what the switch will provide in the ack packet.
The workaround for the length limit of 15 is still needed as the length
reg max value is 0xf(15)

Reported-by: Ronald Wahl <[email protected]>
Tested-by: Ronald Wahl <[email protected]>
Fixes: 90386223f44e ("net: dsa: qca8k: add support for larger read/write size with mgmt Ethernet")
Signed-off-by: Christian Marangi <[email protected]>
Cc: [email protected] # v5.18+
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'kbuild-fixes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix broken BuildID

- Add srcrpm-pkg to the help message

- Fix the option order for modpost built with musl libc

- Fix the build dependency of rpm-pkg for openSUSE

* tag 'kbuild-fixes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  fixdep: remove unneeded <stdarg.h> inclusion
  kbuild: sort single-targets alphabetically again
  kbuild: rpm-pkg: add libelf-devel as alternative for BuildRequires
  kbuild: Fix running modpost with musl libc
  kbuild: add a missing line for help message
  .gitignore: ignore *.rpm
  arch: fix broken BuildID for arm64 and riscv
  kconfig: Add static text for search information in help menu

Merge tag 'ata-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fix from Damien Le Moal:
"A single fix to address an issue with wake from suspend with PCS
adapters, from Adam"

* tag 'ata-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci: Fix PCS quirk application for suspend

Merge tag 'acpi-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These are new ACPI IRQ override quirks, low-power S0 idle (S0ix)
  support adjustments and ACPI backlight handling fixes, mostly for
  platforms using AMD chips.

  Specifics:

   - Add ACPI IRQ override quirks for Asus ExpertBook B2502, Lenovo
     14ALC7, and XMG Core 15 (Hans de Goede, Adrian Freund, Erik
     Schumacher).

   - Adjust ACPI video detection fallback path to prevent
     non-operational ACPI backlight devices from being created on
     systems where the native driver does not detect a suitable panel
     (Mario Limonciello).

   - Fix Apple GMUX backlight detection (Hans de Goede).

   - Add a low-power S0 idle (S0ix) handling quirk for HP Elitebook 865
     and stop using AMD-specific low-power S0 idle code path for systems
     with Rembrandt chips and newer (Mario Limonciello)"

* tag 'acpi-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: x86: s2idle: Stop using AMD specific codepath for Rembrandt+
  ACPI: x86: s2idle: Force AMD GUID/_REV 2 on HP Elitebook 865
  ACPI: video: Fix Apple GMUX backlight detection
  ACPI: resource: Add Asus ExpertBook B2502 to Asus quirks
  ACPI: resource: do IRQ override on Lenovo 14ALC7
  ACPI: resource: do IRQ override on XMG Core 15
  ACPI: video: Don't enable fallback path for creating ACPI backlight by default
  drm/amd/display: Report to ACPI video if no panels were found
  ACPI: video: Allow GPU drivers to report no panels

Merge tag 'sound-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a few small fixes:

   - A regression fix for HDMI audio on HD-audio AMD codecs

   - Fixes for LINE6 MIDI handling

   - HD-audio quirk for Dell laptops"

* tag 'sound-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/hdmi: Static PCM mapping again with AMD HDMI codecs
  ALSA: hda/realtek: Apply dual codec fixup for Dell Latitude laptops
  ALSA: line6: fix stack overflow in line6_midi_transmit
  ALSA: line6: correct midi status byte when receiving data from podxt

Merge branches 'acpi-resource' and 'acpi-video'

Merge ACPI resource handling quirks and ACPI backlight handling fixes
for 6.2-rc2:

- Add ACPI IRQ override quirks for Asus ExpertBook B2502, Lenovo
   14ALC7, and XMG Core 15 (Hans de Goede, Adrian Freund,  Erik
   Schumacher).

- Adjust ACPI video detection fallback path to prevent non-operational
   ACPI backlight devices from being created on systems where the native
   driver does not detect a suitable panel (Mario Limonciello).

- Fix Apple GMUX backlight detection (Hans de Goede).

* acpi-resource:
  ACPI: resource: Add Asus ExpertBook B2502 to Asus quirks
  ACPI: resource: do IRQ override on Lenovo 14ALC7
  ACPI: resource: do IRQ override on XMG Core 15

* acpi-video:
  ACPI: video: Fix Apple GMUX backlight detection
  ACPI: video: Don't enable fallback path for creating ACPI backlight by default
  drm/amd/display: Report to ACPI video if no panels were found
  ACPI: video: Allow GPU drivers to report no panels

gpio: sprd: Make the irqchip immutable

Make the struct irq_chip const, flag it as IRQCHIP_IMMUTABLE, add the
new helper functions, and call the appropriate gpiolib functions.

Signed-off-by: Cixi Geng <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reported-by: Julia Lawall <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpio: pmic-eic-sprd: Make the irqchip immutable

Remove the irq_chip from pmic_eic structure,
use the various calls by defining the statically
irq_chip structure.

Signed-off-by: Cixi Geng <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpio: eic-sprd: Make the irqchip immutable

Remove the irq_chip from pmic_eic structure,
use the various calls by defining the statically
irq_chip structure.

Signed-off-by: Cixi Geng <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpio: pca953x: avoid to use uninitialized value pinctrl

There is a variable pinctrl declared without initializer. And then
has the case (switch operation chose the default case) to directly
use this uninitialized value, this is not a safe behavior. So here
initialize the pinctrl as 0 to avoid this issue.
This is reported by Coverity.

Fixes: 13c5d4ce8060 ("gpio: pca953x: Add support for PCAL6534")
Signed-off-by: Haibo Chen <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpiolib: Fix using uninitialized lookup-flags on ACPI platforms

Commit 8eb1f71e7acc ("gpiolib: consolidate GPIO lookups") refactors
fwnode_get_named_gpiod() and gpiod_get_index() into a unified
gpiod_find_and_request() helper.

The old functions both initialized their local lookupflags variable to
GPIO_LOOKUP_FLAGS_DEFAULT, but the new code leaves it uninitialized.

This is a problem for at least ACPI platforms, where acpi_find_gpio()
only does a bunch of *lookupflags |= GPIO_* statements and thus relies
on the variable being initialized.

The variable not being initialized leads to:

1. Potentially the wrong flags getting used
2. The check for conflicting lookup flags in gpiod_configure_flags():
"multiple pull-up, pull-down or pull-disable enabled, invalid config"
sometimes triggering, making the GPIO unavailable

Restore the initialization of lookupflags to GPIO_LOOKUP_FLAGS_DEFAULT
to fix this.

Fixes: 8eb1f71e7acc ("gpiolib: consolidate GPIO lookups")
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

drm/i915/dsi: fix MIPI_BKLT_EN_1 native GPIO index

Due to copy-paste fail, MIPI_BKLT_EN_1 would always use PPS index 1,
never 0. Fix the sloppiest commit in recent memory.

Fixes: 963bbdb32b47 ("drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence")
Reported-by: Ville Syrjälä <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a561933c571798868b5fa42198427a7e6df56c09)
Cc: [email protected] # 6.1
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915/dsi: add support for ICL+ native MIPI GPIO sequence

Starting from ICL, the default for MIPI GPIO sequences seems to be using
native GPIOs i.e. GPIOs available in the GPU. These native GPIOs reuse
many pins that quite frankly seem scary to poke based on the VBT
sequences. We pretty much have to trust that the board is configured
such that the relevant HPD, PP_CONTROL and GPIO bits aren't used for
anything else.

MIPI sequence v4 also adds a flag to fall back to non-native sequences.

v5:
- Wrap SHOTPLUG_CTL_DDI modification in spin_lock() in icp_irq_handler()
too (Ville)
- References instead of Closes issue 6131 because this does not fix everything

v4:
- Wrap SHOTPLUG_CTL_DDI modification in spin_lock_irq() (Ville)

v3:
- Fix -Wbitwise-conditional-parentheses (kernel test robot <[email protected]>)

v2:
- Fix HPD pin output set (impacts GPIOs 0 and 5)
- Fix GPIO data output direction set (impacts GPIOs 4 and 9)
- Reduce register accesses to single intel_de_rwm()

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6131
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit f087cfe6fcff58044f7aa3b284965af47f472fb0)
Cc: [email protected] # 6.1
Signed-off-by: Rodrigo Vivi <[email protected]>

fixdep: remove unneeded <stdarg.h> inclusion

This is unneeded since commit 69304379ff03 ("fixdep: use fflush() and
ferror() to ensure successful write to files").

Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: sort single-targets alphabetically again

This was previously alphabetically sorted. Sort it again.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Miguel Ojeda <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>

kbuild: rpm-pkg: add libelf-devel as alternative for BuildRequires

Guoqing Jiang reports that openSUSE cannot compile the kernel rpm due
to "BuildRequires: elfutils-libelf-devel" added by commit 8818039f959b
("kbuild: add ability to make source rpm buildable using koji").
The relevant package name in openSUSE is libelf-devel.

Add it as an alternative package.

BTW, if it is impossible to solve the build requirement, the final
resort would be:

    $ make RPMOPTS=--nodeps rpm-pkg

This passes --nodeps to the rpmbuild command so it will not verify
build dependencies. This is useful to test rpm builds on non-rpm
system. On Debian/Ubuntu, for example, you can install rpmbuild by
'apt-get install rpm'.

NOTE1:
  Likewise, it is possible to bypass the build dependency check for
  debian package builds:

    $ make DPKG_FLAGS=-d deb-pkg

NOTE2:
  The 'or' operator is supported since RPM 4.13. So, old distros such
  as CentOS 7 will break. I suggest installing newer rpmbuild in such
  cases.

Link: https://lore.kernel.org/linux-kbuild/[email protected]/T/#u
Fixes: 8818039f959b ("kbuild: add ability to make source rpm buildable using koji")
Reported-by: Guoqing Jiang <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Guoqing Jiang <[email protected]>
Acked-by: Jonathan Toppins <[email protected]>

kbuild: Fix running modpost with musl libc

commit 3d57e1b7b1d4 ("kbuild: refactor the prerequisites of the modpost
rule") moved 'vmlinux.o' inside modpost-args, possibly before some of
the other options. However, getopt() in musl libc follows POSIX and
stops looking for options upon reaching the first non-option argument.
As a result, the '-T' option is misinterpreted as a positional argument,
and the build fails:

  make -f ./scripts/Makefile.modpost
     scripts/mod/modpost   -E   -o Module.symvers vmlinux.o -T modules.order
  -T: No such file or directory
  make[1]: *** [scripts/Makefile.modpost:137: Module.symvers] Error 1
  make: *** [Makefile:1960: modpost] Error 2

The fix is to move all options before 'vmlinux.o' in modpost-args.

Fixes: 3d57e1b7b1d4 ("kbuild: refactor the prerequisites of the modpost rule")
Signed-off-by: Samuel Holland <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: add a missing line for help message

The help message line for building the source RPM package was missing.
Added it.

Signed-off-by: Jun ASAKA <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

.gitignore: ignore *.rpm

Previously, *.rpm files were created under $HOME/rpmbuild/, but since
commit 8818039f959b ("kbuild: add ability to make source rpm buildable
using koji"), srcrpm-pkg creates the source rpm in the kernel tree
because it sets '_srcrpmdir'.

Signed-off-by: Masahiro Yamada <[email protected]>

arch: fix broken BuildID for arm64 and riscv

Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux
since commit 994b7ac1697b ("arm64: remove special treatment for the
link order of head.o").

The issue is that the type of .notes section, which contains the BuildID,
changed from NOTES to PROGBITS.

Ard Biesheuvel figured out that whichever object gets linked first gets
to decide the type of a section. The PROGBITS type is the result of the
compiler emitting .note.GNU-stack as PROGBITS rather than NOTE.

While Ard provided a fix for arm64, I want to fix this globally because
the same issue is happening on riscv since commit 2348e6bf4421 ("riscv:
remove special treatment for the link order of head.o"). This problem
will happen in general for other architectures if they start to drop
unneeded entries from scripts/head-object-list.txt.

Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h.

Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/
Fixes: 994b7ac1697b ("arm64: remove special treatment for the link order of head.o")
Fixes: 2348e6bf4421 ("riscv: remove special treatment for the link order of head.o")
Reported-by: Dennis Gilmore <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>

drm/i915/uc: Fix two issues with over-size firmware files

In the case where a firmware file is too large (e.g. someone
downloaded a web page ASCII dump from github...), the firmware object
is released but the pointer is not zerod. If no other firmware file
was found then release would be called again leading to a double kfree.

Also, the size check was only being applied to the initial firmware
load not any of the subsequent attempts. So move the check into a
wrapper that is used for all loads.

Fixes: 016241168dc5 ("drm/i915/uc: use different ggtt pin offsets for uc loads")
Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Daniele Ceraolo Spurio <[email protected]>
Cc: Alan Previn <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: "Thomas Hellström" <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 4071d98b296a5bc5fd4b15ec651bd05800ec9510)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: improve the catch-all evict to handle lock contention

The catch-all evict can fail due to object lock contention, since it
only goes as far as trylocking the object, due to us already holding the
vm->mutex. Doing a full object lock here can deadlock, since the
vm->mutex is always our inner lock. Add another execbuf pass which drops
the vm->mutex and then tries to grab the object will the full lock,
before then retrying the eviction. This should be good enough for now to
fix the immediate regression with userspace seeing -ENOSPC from execbuf
due to contended object locks during GTT eviction.

v2 (Mani)
- Also revamp the docs for the different passes.

Testcase: igt@gem_ppgtt@shrink-vs-evict-*
Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570
References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558
Signed-off-by: Matthew Auld <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Mani Milani <[email protected]>
Cc: <[email protected]> # v5.18+
Reviewed-by: Mani Milani <[email protected]>
Tested-by: Mani Milani <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 801fa7a81f6da533cc5442fc40e32c72b76cd42a)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: Remove __maybe_unused from mtl_info

The attribute __maybe_unused should remain only until the respective
info is not in the pciidlist. The info can't be added together
with its definition because that would cause the driver to automatically
probe for the device, while it's still not ready for that. However once
pciidlist contains it, the attribute can be removed.

Fixes: 7835303982d1 ("drm/i915/mtl: Add MeteorLake PCI IDs")
Signed-off-by: Lucas De Marchi <[email protected]>
Reviewed-by: Radhakrishna Sripada <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 50490ce05b7a50b0bd4108fa7d6db3ca2972fa83)
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/i915: fix TLB invalidation for Gen12.50 video and compute engines

In case of Gen12.50 video and compute engines, TLB_INV registers are
masked - to modify one bit, corresponding bit in upper half of the register
must be enabled, otherwise nothing happens.

Fixes: 77fa9efc16a9 ("drm/i915/xehp: Create separate reg definitions for new MCR registers")
Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 4d5cf7b1680a1e6db327e3c935ef58325cbedb2c)
Signed-off-by: Rodrigo Vivi <[email protected]>

net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe

of_phy_find_device() return device node with refcount incremented.
Call put_device() to relese it when not needed anymore.

Fixes: ab4e6ee578e8 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing")
Signed-off-by: Miaoqian Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ena-fixes'

David Arinzon says:

====================
ENA driver bug fixes
====================

Signed-off-by: David S. Miller <[email protected]>

net: ena: Update NUMA TPH hint register upon NUMA node update

The device supports a PCIe optimization hint, which indicates on
which NUMA the queue is currently processed. This hint is utilized
by PCIe in order to reduce its access time by accessing the
correct NUMA resources and maintaining cache coherence.

The driver calls the register update for the hint (called TPH -
TLP Processing Hint) during the NAPI loop.

Though the update is expected upon a NUMA change (when a queue
is moved from one NUMA to the other), the current logic performs
a register update when the queue is moved to a different CPU,
but the CPU is not necessarily in a different NUMA.

The changes include:
1. Performing the TPH update only when the queue has switched
a NUMA node.
2. Moving the TPH update call to be triggered only when NAPI was
scheduled from interrupt context, as opposed to a busy-polling loop.
This is due to the fact that during busy-polling, the frequency
of CPU switches for a particular queue is significantly higher,
thus, the likelihood to switch NUMA is much higher. Therefore,
providing the frequent updates to the device upon a NUMA update
are unlikely to be beneficial.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Set default value for RX interrupt moderation

RX ring can be NULL in XDP use cases where only TX queues
are configured. In this scenario, the RX interrupt moderation
value sent to the device remains in its default value of 0.

In this change, setting the default value of the RX interrupt
moderation to be the same as of the TX.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Fix rx_copybreak value update

Make the upper bound on rx_copybreak tighter, by
making sure it is smaller than the minimum of mtu and
ENA_PAGE_SIZE. With the current upper bound of mtu,
rx_copybreak can be larger than a page. Such large
rx_copybreak will not bring any performance benefit to
the user and therefore makes no sense.

In addition, the value update was only reflected in
the adapter structure, but not applied for each ring,
causing it to not take effect.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Osama Abboud <[email protected]>
Signed-off-by: Arthur Kiyanovski <[email protected]>
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Use bitmask to indicate packet redirection

Redirecting packets with XDP Redirect is done in two phases:
1. A packet is passed by the driver to the kernel using
   xdp_do_redirect().
2. After finishing polling for new packets the driver lets the kernel
   know that it can now process the redirected packet using
   xdp_do_flush_map().
   The packets' redirection is handled in the napi context of the
   queue that called xdp_do_redirect()

To avoid calling xdp_do_flush_map() each time the driver first checks
whether any packets were redirected, using
xdp_flags |= xdp_verdict;
and
if (xdp_flags & XDP_REDIRECT)
    xdp_do_flush_map()

essentially treating XDP instructions as a bitmask, which isn't the case:
    enum xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT,
    };

Given the current possible values of xdp_action, the current design
doesn't have a bug (since XDP_REDIRECT = 100b), but it is still
flawed.

This patch makes the driver use a bitmask instead, to avoid future
issues.

Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Account for the number of processed bytes in XDP

The size of packets that were forwarded or dropped by XDP wasn't added
to the total processed bytes statistic.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Don't register memory info on XDP exchange

Since the queues aren't destroyed when we only exchange XDP programs,
there's no need to re-register them again.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Fix toeplitz initial hash value

On driver initialization, RSS hash initial value is set to zero,
instead of the default value. This happens because we pass NULL as
the RSS key parameter, which caused us to never initialize
the RSS hash value.

This patch fixes it by making sure the initial value is set, no matter
what the value of the RSS key is.

Fixes: 91a65b7d3ed8 ("net: ena: fix potential crash when rxfh key is NULL")
Signed-off-by: Nati Koler <[email protected]>
Signed-off-by: David Arinzon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: net: fix cmsg_so_mark.sh test hang

This cmsg_so_mark.sh test will hang on non-amd64 systems because of the
infinity loop for argument parsing in cmsg_sender.

Variable "o" in cs_parse_args() for taking getopt() should be an int,
otherwise it will be 255 when getopt() returns -1 on non-amd64 system
and thus causing infinity loop.

Link: https://lore.kernel.org/lkml/CA+G9fYsM2k7mrF7W4V_TrZ-qDauWM394=8yEJ=-t1oUg8_40YA@mail.gmail.com/t/
Signed-off-by: Po-Hsu Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mlx5-fixes-2022-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-fixes-2022-12-28

net: amd-xgbe: add missed tasklet_kill

The driver does not call tasklet_kill in several places.
Add the calls to fix it.

Fixes: 85b85c853401 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Jiguang Xiao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refine the handling for VF heartbeat

Currently, the PF check the VF alive by the KEEP_ALVE
mailbox from VF. VF keep sending the mailbox per 2
seconds. Once PF lost the mailbox for more than 8
seconds, it will regards the VF is abnormal, and stop
notifying the state change to VF, include link state,
vf mac, reset, even though it receives the KEEP_ALIVE
mailbox again. It's inreasonable.

This patch fixes it. PF will record the state change which
need to notify VF when lost the VF's KEEP_ALIVE mailbox.
And notify VF when receive the mailbox again. Introduce a
new flag HCLGE_VPORT_STATE_INITED, used to distinguish the
case whether VF driver loaded or not. For VF will query
these states when initializing, so it's unnecessary to
notify it in this case.

Fixes: aa5c4f175be6 ("net: hns3: add reset handling for VF when doing PF reset")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Hao Lan <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: freescale: enetc: Drop empty platform remove function

A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.

Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: broadcom: bcm63xx_enet: Drop empty platform remove function

A remove callback just returning 0 is equivalent to no remove callback
at all. So drop the useless function.

Signed-off-by: Uwe Kleine-König <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tcp-bhash2-fixes'

Kuniyuki Iwashima says:

===================
tcp: Fix bhash2 and TIME_WAIT regression.

We forgot to add twsk to bhash2.  Therefore TIME_WAIT sockets cannot
prevent bind() to the same local address and port.

Changes:
  v1:
    * Patch 1:
      * Add tw_bind2_node in inet_timewait_sock instead of
        moving sk_bind2_node from struct sock to struct
sock_common.
====================

Signed-off-by: David S. Miller <[email protected]>

tcp: Add selftest for bind() and TIME_WAIT.

bhash2 split the bind() validation logic into wildcard and non-wildcard
cases.  Let's add a test to catch future regression.

Before the previous patch:

  # ./bind_timewait
  TAP version 13
  1..2
  # Starting 2 tests from 3 test cases.
  #  RUN           bind_timewait.localhost.1 ...
  # bind_timewait.c:87:1:Expected ret (0) == -1 (-1)
  # 1: Test terminated by assertion
  #          FAIL  bind_timewait.localhost.1
  not ok 1 bind_timewait.localhost.1
  #  RUN           bind_timewait.addrany.1 ...
  #            OK  bind_timewait.addrany.1
  ok 2 bind_timewait.addrany.1
  # FAILED: 1 / 2 tests passed.
  # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0

After:

  # ./bind_timewait
  TAP version 13
  1..2
  # Starting 2 tests from 3 test cases.
  #  RUN           bind_timewait.localhost.1 ...
  #            OK  bind_timewait.localhost.1
  ok 1 bind_timewait.localhost.1
  #  RUN           bind_timewait.addrany.1 ...
  #            OK  bind_timewait.addrany.1
  ok 2 bind_timewait.addrany.1
  # PASSED: 2 / 2 tests passed.
  # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Joanne Koong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: Add TIME_WAIT sockets in bhash2.

Jiri Slaby reported regression of bind() with a simple repro. [0]

The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port. Before commit 28044fc1d495 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.

The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.

The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.

To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().

[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258 [email protected]/

Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby <[email protected]>
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Joanne Koong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

libbpf: Restore errno after pr_warn.

pr_warn calls into user-provided callback, which can clobber errno, so
`errno = saved_errno` should happen after pr_warn.

Fixes: 07453245620c ("libbpf: fix errno is overwritten after being closed.")
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"Mostly just NVMe, but also a single fixup for BFQ for a regression
  that happened during the merge window. In detail:

   - NVMe pull requests via Christoph:
      - Fix doorbell buffer value endianness (Klaus Jensen)
      - Fix Linux vs NVMe page size mismatch (Keith Busch)
      - Fix a potential use memory access beyong the allocation limit
        (Keith Busch)
      - Fix a multipath vs blktrace NULL pointer dereference (Yanjun
        Zhang)
      - Fix various problems in handling the Command Supported and
        Effects log (Christoph Hellwig)
      - Don't allow unprivileged passthrough of commands that don't
        transfer data but modify logical block content (Christoph
        Hellwig)
      - Add a features and quirks policy document (Christoph Hellwig)
      - Fix some really nasty code that was correct but made smatch
        complain (Sagi Grimberg)

   - Use-after-free regression in BFQ from this merge window (Yu)"

* tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux:
  nvme-auth: fix smatch warning complaints
  nvme: consult the CSE log page for unprivileged passthrough
  nvme: also return I/O command effects from nvme_command_effects
  nvmet: don't defer passthrough commands with trivial effects to the workqueue
  nvmet: set the LBCC bit for commands that modify data
  nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
  nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
  docs, nvme: add a feature and quirk policy document
  nvme-pci: update sqsize when adjusting the queue depth
  nvme: fix setting the queue depth in nvme_alloc_io_tag_set
  block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
  nvme: fix multipath crash caused by flush request when blktrace is enabled
  nvme-pci: fix page size checks
  nvme-pci: fix mempool alloc size
  nvme-pci: fix doorbell buffer value endianness

Merge tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- Two fixes for mutex grabbing when the task state is != TASK_RUNNING
   (me)

- Check for invalid opcode in io_uring_register() a bit earlier, to
   avoid going through the quiesce machinery just to return -EINVAL
   later in the process (me)

- Fix for the uapi io_uring header, skipping including time_types.h
   when necessary (Stefan)

* tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linux:
  uapi:io_uring.h: allow linux/time_types.h to be skipped
  io_uring: check for valid register opcode earlier
  io_uring/cancel: re-grab ctx mutex after finishing wait
  io_uring: finish waiting before flushing overflow entries

Merge tag 'linux-kselftest-kunit-fixes-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull KUnit fix from Shuah Khan:

- alloc_string_stream_fragment() error path fix to free before
returning a failure.

* tag 'linux-kselftest-kunit-fixes-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: alloc_string_stream_fragment error handling bug fix

libbpf: Added the description of some API functions

Currently, many API functions are not described in the document.
Add add API description of the following four API functions:
  - libbpf_set_print;
  - bpf_object__open;
  - bpf_object__load;
  - bpf_object__close.

Signed-off-by: Xin Liu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'samples/bpf: enhance syscall tracing program'

"Daniel T. Lee" says:

====================

Syscall tracing using kprobe is quite unstable. Since it uses the exact
name of the kernel function, the program might broke due to the rename
of a function. The problem can also be caused by a changes in the
arguments of the function to which the kprobe connects. This commit
enhances syscall tracing program with the following instruments.

In this patchset, ksyscall is used instead of kprobe. By using
ksyscall, libbpf will detect the appropriate kernel function name.
(e.g. sys_write -> __s390_sys_write). This eliminates the need to worry
about which wrapper function to attach in order to parse arguments.
Also ksyscall provides more fine method with attaching system call, the
coarse SYSCALL helper at trace_common.h can be removed.

Next, BPF_SYSCALL is used to reduce the inconvenience of parsing
arguments. Since the nature of SYSCALL_WRAPPER function wraps the
argument once, additional process of argument extraction is required
to properly parse the argument. The BPF_SYSCALL macro will reduces the
hassle of parsing arguments from pt_regs.

Lastly, vmlinux.h is applied to syscall tracing program. This change
allows the bpf program to refer to the internal structure as a single
"vmlinux.h" instead of including each header referenced by the bpf
program.

Additionally, this patchset changes the suffix of _kern to .bpf to make
use of the new compile rule (CLANG-BPF) which is more simple and neat.
By just changing the _kern suffix to .bpf will inherit the benefit of
the new CLANG-BPF compile target.

Also, this commit adds dummy gnu/stub.h to the samples/bpf directory.
This will fix the compiling problem with 'clang -target bpf'.

To fix the build error with the s390x, this patchset also includes the
fix of libbpf invalid return address register mapping in s390.
---
Changes in V2:
- add gnu/stub.h hack to fix compile error with 'clang -target bpf'
Changes in V3:
- fix libbpf invalid return address register mapping in s390
====================

Signed-off-by: Andrii Nakryiko <[email protected]>

libbpf: Fix invalid return address register in s390

There is currently an invalid register mapping in the s390 return
address register. As the manual[1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs(general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.

[1]: https://uclibc.org/docs/psABI-s390x.pdf#page=14

Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Daniel T. Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs

This commit enhances the syscall tracing programs by using the
BPF_SYSCALL macro to reduce the inconvenience of parsing arguments from
pt_regs. By simplifying argument extraction, bpf program will become
clear to understand.

Signed-off-by: Daniel T. Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro

Currently, there is a problem with tracex2, as it doesn't print the
histogram properly and the results are misleading. (all results report
as 0)

The problem is caused by a change in arguments of the function to which
the kprobe connects. This tracex2 bpf program uses kprobe (attached
to __x64_sys_write) to figure out the size of the write system call. In
order to achieve this, the third argument 'count' must be intact.

The following is a prototype of the sys_write variant. (checked with
pfunct)

    ~/git/linux$ pfunct -P fs/read_write.o | grep sys_write
    ssize_t ksys_write(unsigned int fd, const char  * buf, size_t count);
    long int __x64_sys_write(const struct pt_regs  * regs);
    ... cross compile with s390x ...
    long int __s390_sys_write(struct pt_regs * regs);

Since the nature of SYSCALL_WRAPPER function wraps the argument once,
additional process of argument extraction is required to properly parse
the argument.

    #define BPF_KSYSCALL(name, args...)
    ... snip ...
    struct pt_regs *regs = LINUX_HAS_SYSCALL_WRAPPER                    \
   ? (struct pt_regs *)PT_REGS_PARM1(ctx)       \
   : ctx;                                       \

In order to fix this problem, the BPF_SYSCALL macro has been used. This
reduces the hassle of parsing arguments from pt_regs. Since the macro
uses the CORE version of argument extraction, additional portability
comes too.

Signed-off-by: Daniel T. Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples/bpf: Change _kern suffix to .bpf with syscall tracing program

Currently old compile rule (CLANG-bpf) doesn't contains VMLINUX_H define
flag which is essential for the bpf program that includes "vmlinux.h".
Also old compile rule doesn't directly specify the compile target as bpf,
instead it uses bunch of extra options with clang followed by long chain
of commands. (e.g. clang | opt | llvm-dis | llc)

In Makefile, there is already new compile rule which is more simple and
neat. And it also has -D__VMLINUX_H__ option. By just changing the _kern
suffix to .bpf will inherit the benefit of the new CLANG-BPF compile
target.

Also, this commit adds dummy gnu/stub.h to the samples/bpf directory.
As commit 1c2dd16add7e ("selftests/bpf: get rid of -D__x86_64__") noted,
compiling with 'clang -target bpf' will raise an error with stubs.h
unless workaround (-D__x86_64) is used. This commit solves this problem
by adding dummy stub.h to make /usr/include/features.h to follow the
expected path as the same way selftests/bpf dealt with.

Signed-off-by: Daniel T. Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program

This commit applies vmlinux.h to syscall tracing program. This change
allows the bpf program to refer to the internal structure as a single
"vmlinux.h" instead of including each header referenced by the bpf
program.

Signed-off-by: Daniel T. Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples/bpf: Use kyscall instead of kprobe in syscall tracing program

Syscall tracing using kprobe is quite unstable. Since it uses the exact
name of the kernel function, the program might broke due to the rename
of a function. The problem can also be caused by a changes in the
arguments of the function to which the kprobe connects.

In this commit, ksyscall is used instead of kprobe. By using ksyscall,
libbpf will detect the appropriate kernel function name.
(e.g. sys_write -> __s390_sys_write). This eliminates the need to worry
about which wrapper function to attach in order to parse arguments.

In addition, ksyscall provides more fine method with attaching system
call, the coarse SYSCALL helper at trace_common.h can be removed.

Signed-off-by: Daniel T. Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Changes that were posted too late for 6.1, or after the release.

  x86:

   - several fixes to nested VMX execution controls

   - fixes and clarification to the documentation for Xen emulation

   - do not unnecessarily release a pmu event with zero period

   - MMU fixes

   - fix Coverity warning in kvm_hv_flush_tlb()

  selftests:

   - fixes for the ucall mechanism in selftests

   - other fixes mostly related to compilation with clang"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (41 commits)
  KVM: selftests: restore special vmmcall code layout needed by the harness
  Documentation: kvm: clarify SRCU locking order
  KVM: x86: fix deadlock for KVM_XEN_EVTCHN_RESET
  KVM: x86/xen: Documentation updates and clarifications
  KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi
  KVM: x86/xen: Simplify eventfd IOCTLs
  KVM: x86/xen: Fix SRCU/RCU usage in readers of evtchn_ports
  KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly
  KVM: x86/xen: Fix memory leak in kvm_xen_write_hypercall_page()
  KVM: Delete extra block of "};" in the KVM API documentation
  kvm: x86/mmu: Remove duplicated "be split" in spte.h
  kvm: Remove the unused macro KVM_MMU_READ_{,UN}LOCK()
  MAINTAINERS: adjust entry after renaming the vmx hyperv files
  KVM: selftests: Mark correct page as mapped in virt_map()
  KVM: arm64: selftests: Don't identity map the ucall MMIO hole
  KVM: selftests: document the default implementation of vm_vaddr_populate_bitmap
  KVM: selftests: Use magic value to signal ucall_alloc() failure
  KVM: selftests: Disable "gnu-variable-sized-type-not-at-end" warning
  KVM: selftests: Include lib.mk before consuming $(CC)
  KVM: selftests: Explicitly disable builtins for mem*() overrides
  ...

Merge tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme into block-6.2

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.2

- fix various problems in handling the Command Supported and Effects log
   (Christoph Hellwig)
- don't allow unprivileged passthrough of commands that don't transfer
   data but modify logical block content (Christoph Hellwig)
- add a features and quirks policy document (Christoph Hellwig)
- fix some really nasty code that was correct but made smatch complain
   (Sagi Grimberg)"

* tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme:
  nvme-auth: fix smatch warning complaints
  nvme: consult the CSE log page for unprivileged passthrough
  nvme: also return I/O command effects from nvme_command_effects
  nvmet: don't defer passthrough commands with trivial effects to the workqueue
  nvmet: set the LBCC bit for commands that modify data
  nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
  nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
  docs, nvme: add a feature and quirk policy document

bpf: rename list_head -> graph_root in field info types

Many of the structs recently added to track field info for linked-list
head are useful as-is for rbtree root. So let's do a mechanical renaming
of list_head-related types and fields:

include/linux/bpf.h:
  struct btf_field_list_head -> struct btf_field_graph_root
  list_head -> graph_root in struct btf_field union
kernel/bpf/btf.c:
  list_head -> graph_root in struct btf_field_info

This is a nonfunctional change, functionality to actually use these
fields for rbtree will be added in further patches.

Signed-off-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

kconfig: Add static text for search information in help menu

Add few static text to explain how one can bring up the search dialog
box by pressing the forward slash key anywhere on this interface.

Signed-off-by: Bhaskar Chowdhury <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

bpf: Always use maximal size for copy_array()

Instead of counting on prior allocations to have sized allocations to
the next kmalloc bucket size, always perform a krealloc that is at least
ksize(dst) in size (which is a no-op), so the size can be correctly
tracked by all the various allocation size trackers (KASAN,
__alloc_size, etc).

Reported-by: Hyunwoo Kim <[email protected]>
Link: https://lore.kernel.org/bpf/20221223094551.GA1439509@ubuntu
Fixes: ceb35b666d42 ("bpf/verifier: Use kmalloc_size_roundup() to match ksize() usage")
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge branch 'bpf: fix the crash caused by task iterators over vma'

Kui-Feng Lee says:

====================

This issue is related to task iterators over vma. A system crash can
occur when a task iterator travels through vma of tasks as the death
of a task will clear the pointer to its mm, even though the
task_struct is still held. As a result, an unexpected crash happens
due to a null pointer. To address this problem, a reference to mm is
kept on the iterator to make sure that the pointer is always
valid. This patch set provides a solution for this crash by properly
referencing mm on task iterators over vma.

The major changes from v1 are:

- Fix commit logs of the test case.

- Use reverse Christmas tree coding style.

- Remove unnecessary error handling for time().

v1: https://lore.kernel.org/bpf/20221216015912 [email protected]/
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: add a test for iter/task_vma for short-lived processes

When a task iterator traverses vma(s), it is possible task->mm might
become invalid in the middle of traversal and this may cause kernel
misbehave (e.g., crash)

This test case creates iterators repeatedly and forks short-lived
processes in the background to detect this bug. The test will last
for 3 seconds to get the chance to trigger the issue.

Signed-off-by: Kui-Feng Lee <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: keep a reference to the mm, in case the task is dead.

Fix the system crash that happens when a task iterator travel through
vma of tasks.

In task iterators, we used to access mm by following the pointer on
the task_struct; however, the death of a task will clear the pointer,
even though we still hold the task_struct. That can cause an
unexpected crash for a null pointer when an iterator is visiting a
task that dies during the visit. Keeping a reference of mm on the
iterator ensures we always have a valid pointer to mm.

Co-developed-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Reported-by: Nathan Slingerland <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

libbpf: fix errno is overwritten after being closed.

In the ensure_good_fd function, if the fcntl function succeeds but
the close function fails, ensure_good_fd returns a normal fd and
sets errno, which may cause users to misunderstand. The close
failure is not a serious problem, and the correct FD has been
handed over to the upper-layer application. Let's restore errno here.

Signed-off-by: Xin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: Temporarily disable part of btf_dump:var_data test.

Commit 7443b296e699 ("x86/percpu: Move cpu_number next to current_task")
moved global per_cpu variable 'cpu_number' into pcpu_hot structure.
Therefore this part of var_data test is no longer valid.
Disable it until better solution is found.

Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: Fix panic due to wrong pageattr of im->image

In the scenario where livepatch and kretfunc coexist, the pageattr of
im->image is rox after arch_prepare_bpf_trampoline in
bpf_trampoline_update, and then modify_fentry or register_fentry returns
-EAGAIN from bpf_tramp_ftrace_ops_func, the BPF_TRAMP_F_ORIG_STACK flag
will be configured, and arch_prepare_bpf_trampoline will be re-executed.

At this time, because the pageattr of im->image is rox,
arch_prepare_bpf_trampoline will read and write im->image, which causes
a fault. as follows:

  insmod livepatch-sample.ko    # samples/livepatch/livepatch-sample.c
  bpftrace -e 'kretfunc:cmdline_proc_show {}'

BUG: unable to handle page fault for address: ffffffffa0206000
PGD 322d067 P4D 322d067 PUD 322e063 PMD 1297e067 PTE d428061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 2 PID: 270 Comm: bpftrace Tainted: G            E K    6.1.0 #5
RIP: 0010:arch_prepare_bpf_trampoline+0xed/0x8c0
RSP: 0018:ffffc90001083ad8 EFLAGS: 00010202
RAX: ffffffffa0206000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: ffffffffa0206001 RSI: ffffffffa0206000 RDI: 0000000000000030
RBP: ffffc90001083b70 R08: 0000000000000066 R09: ffff88800f51b400
R10: 000000002e72c6e5 R11: 00000000d0a15080 R12: ffff8880110a68c8
R13: 0000000000000000 R14: ffff88800f51b400 R15: ffffffff814fec10
FS:  00007f87bc0dc780(0000) GS:ffff88803e600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0206000 CR3: 0000000010b70000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bpf_trampoline_update+0x25a/0x6b0
__bpf_trampoline_link_prog+0x101/0x240
bpf_trampoline_link_prog+0x2d/0x50
bpf_tracing_prog_attach+0x24c/0x530
bpf_raw_tp_link_attach+0x73/0x1d0
__sys_bpf+0x100e/0x2570
__x64_sys_bpf+0x1c/0x30
do_syscall_64+0x5b/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

With this patch, when modify_fentry or register_fentry returns -EAGAIN
from bpf_tramp_ftrace_ops_func, the pageattr of im->image will be reset
to nx+rw.

Cc: [email protected]
Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Chuang Wang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

net/mlx5: Lag, fix failure to cancel delayed bond work

Commit 0d4e8ed139d8 ("net/mlx5: Lag, avoid lockdep warnings")
accidentally removed a call to cancel delayed bond work thus it may
cause queued delay to expire and fall on an already destroyed work
queue.

Fix by restoring the call cancel_delayed_work_sync() before
destroying the workqueue.

This prevents call trace such as this:

[  329.230417] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  329.231444] #PF: supervisor write access in kernel mode
[  329.232233] #PF: error_code(0x0002) - not-present page
[  329.233007] PGD 0 P4D 0
[  329.233476] Oops: 0002 [#1] SMP
[  329.234012] CPU: 5 PID: 145 Comm: kworker/u20:4 Tainted: G OE      6.0.0-rc5_mlnx #1
[  329.235282] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  329.236868] Workqueue: mlx5_cmd_0000:08:00.1 cmd_work_handler [mlx5_core]
[  329.237886] RIP: 0010:_raw_spin_lock+0xc/0x20
[  329.238585] Code: f0 0f b1 17 75 02 f3 c3 89 c6 e9 6f 3c 5f ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 f3 c3 89 c6 e9 45 3c 5f ff 0f 1f 44 00 00 0f 1f
[  329.241156] RSP: 0018:ffffc900001b0e98 EFLAGS: 00010046
[  329.241940] RAX: 0000000000000000 RBX: ffffffff82374ae0 RCX: 0000000000000000
[  329.242954] RDX: 0000000000000001 RSI: 0000000000000014 RDI: 0000000000000000
[  329.243974] RBP: ffff888106ccf000 R08: ffff8881004000c8 R09: ffff888100400000
[  329.244990] R10: 0000000000000000 R11: ffffffff826669f8 R12: 0000000000002000
[  329.246009] R13: 0000000000000005 R14: ffff888100aa7ce0 R15: ffff88852ca80000
[  329.247030] FS:  0000000000000000(0000) GS:ffff88852ca80000(0000) knlGS:0000000000000000
[  329.248260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  329.249111] CR2: 0000000000000000 CR3: 000000016d675001 CR4: 0000000000770ee0
[  329.250133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  329.251152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  329.252176] PKRU: 55555554

Fixes: 0d4e8ed139d8 ("net/mlx5: Lag, avoid lockdep warnings")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option

The cited patch added support of matching on geneve option by setting
geneve_tlv_option_0_data mask and key but didn't set geneve_tlv_option_0_exist
bit which is required on some HWs when matching geneve_tlv_option_0_data parameter,
this may cause in some cases for packets to wrongly match on rules with different
geneve option.

Example of such case is packet with geneve_tlv_object class=789 and data=456
will wrongly match on rule with match geneve_tlv_object class=123 and data=456.

Fix it by setting geneve_tlv_option_0_exist bit when supported by the HW when matching
on geneve_tlv_option_0_data parameter.

Fixes: 9272e3df3023 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>