Git Repo - linux.git/log

Fonts: Replace discarded const qualifier

Commit 6735b4632def ("Fonts: Support FONT_EXTRA_WORDS macros for built-in
fonts") introduced the following error when building rpc_defconfig (only
this build appears to be affected):

`acorndata_8x8' referenced in section `.text' of arch/arm/boot/compressed/ll_char_wr.o:
    defined in discarded section `.data' of arch/arm/boot/compressed/font.o
`acorndata_8x8' referenced in section `.data.rel.ro' of arch/arm/boot/compressed/font.o:
    defined in discarded section `.data' of arch/arm/boot/compressed/font.o
make[3]: *** [/scratch/linux/arch/arm/boot/compressed/Makefile:191: arch/arm/boot/compressed/vmlinux] Error 1
make[2]: *** [/scratch/linux/arch/arm/boot/Makefile:61: arch/arm/boot/compressed/vmlinux] Error 2
make[1]: *** [/scratch/linux/arch/arm/Makefile:317: zImage] Error 2

The .data section is discarded at link time.  Reinstating acorndata_8x8 as
const ensures it is still available after linking.  Do the same for the
other 12 built-in fonts as well, for consistency purposes.

Cc: <[email protected]>
Cc: Russell King <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Fixes: 6735b4632def ("Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts")
Signed-off-by: Lee Jones <[email protected]>
Co-developed-by: Peilin Ye <[email protected]>
Signed-off-by: Peilin Ye <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4

The current arm64 default config limits max NUMA nodes available on
system to 4 (NODES_SHIFT = 2). Today's arm64 systems can reach or
exceed 16 NUMA nodes. To accomodate current hardware and to fit
NODES_SHIFT within page flags on arm64, increase NODES_SHIFT to 4.

Signed-off-by: Vanshidhar Konda <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

nvme-tcp: avoid repeated request completion

The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.

Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Chao Leng <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-rdma: avoid repeated request completion

The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.

Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Chao Leng <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-tcp: avoid race between time out and tear down

Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.

Signed-off-by: Chao Leng <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-rdma: avoid race between time out and tear down

Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.

Signed-off-by: Chao Leng <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme: introduce nvme_sync_io_queues

Introduce sync io queues for some scenarios which just only need sync
io queues not sync all queues.

Signed-off-by: Chao Leng <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

USB: Add NO_LPM quirk for Kingston flash drive

In Bugzilla #208257, Julien Humbert reports that a 32-GB Kingston
flash drive spontaneously disconnects and reconnects, over and over.
Testing revealed that disabling Link Power Management for the drive
fixed the problem.

This patch adds a quirk entry for that drive to turn off LPM permanently.

CC: Hans de Goede <[email protected]>
CC: <[email protected]>
Reported-and-tested-by: Julien Humbert <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

mei: protect mei_cl_mtu from null dereference

A receive callback is queued while the client is still connected
but can still be called after the client was disconnected. Upon
disconnect cl->me_cl is set to NULL, hence we need to check
that ME client is not-NULL in mei_cl_mtu to avoid
null dereference.

Cc: <[email protected]>
Signed-off-by: Alexander Usyskin <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/vc4: drv: Remove unused variable

The commit dcda7c28bff2 ("drm/vc4: kms: Add functions to create the state
objects") removed the last users of the vc4 variable, but didn't remove
that variable resulting in a warning.

Fixes: dcda7c28bff2 ("drm/vc4: kms: Add functions to create the state objects")
Reported-by: Daniel Vetter <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/panfrost: Fix module unload

When unloading the call to pm_runtime_put_sync_suspend() will attempt to
turn the GPU cores off, however panfrost_device_fini() will have turned
the clocks off. This leads to the hardware locking up.

Instead don't call pm_runtime_put_sync_suspend() and instead simply mark
the device as suspended using pm_runtime_set_suspended(). And also
include this on the error path in panfrost_probe().

Fixes: aebe8c22a912 ("drm/panfrost: Fix possible suspend in panfrost_remove")
Signed-off-by: Steven Price <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/panfrost: Fix a deadlock between the shrinker and madvise path

panfrost_ioctl_madvise() and panfrost_gem_purge() acquire the mappings
and shmem locks in different orders, thus leading to a potential
the mappings lock first.

Fixes: bdefca2d8dc0 ("drm/panfrost: Add the panfrost_gem_mapping concept")
Cc: <[email protected]>
Cc: Christian Hewitt <[email protected]>
Reported-by: Christian Hewitt <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

sfp: Fix error handing in sfp_probe()

gpiod_to_irq() never return 0, but returns negative in
case of error, check it and set gpio_irq to 0.

Fixes: 73970055450e ("sfp: add SFP module support")
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

powerpc/vnic: Extend "failover pending" window

Commit 5a18e1e0c193b introduced the 'failover_pending' state to track
the "failover pending window" - where we wait for the partner to become
ready (after a transport event) before actually attempting to failover.
i.e window is between following two events:

        a. we get a transport event due to a FAILOVER

        b. later, we get CRQ_INITIALIZED indicating the partner is
           ready  at which point we schedule a FAILOVER reset.

and ->failover_pending is true during this window.

If during this window, we attempt to open (or close) a device, we pretend
that the operation succeded and let the FAILOVER reset path complete the
operation.

This is fine, except if the transport event ("a" above) occurs during the
open and after open has already checked whether a failover is pending. If
that happens, we fail the open, which can cause the boot scripts to leave
the interface down requiring administrator to manually bring up the device.

This fix "extends" the failover pending window till we are _actually_
ready to perform the failover reset (i.e until after we get the RTNL
lock). Since open() holds the RTNL lock, we can be sure that we either
finish the open or if the open() fails due to the failover pending window,
we can again pretend that open is done and let the failover complete it.

We could try and block the open until failover is completed but a) that
could still timeout the application and b) Existing code "pretends" that
failover occurred "just after" open succeeded, so marks the open successful
and lets the failover complete the open. So, mark the open successful even
if the transport event occurs before we actually start the open.

Fixes: 5a18e1e0c193 ("ibmvnic: Fix failover case for non-redundant configuration")
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Acked-by: Dany Madden <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

RDMA/vmw_pvrdma: Fix the active_speed and phys_state value

The pvrdma_port_attr structure is ABI toward the hypervisor, changing it
breaks the ability to report the speed properly. Revert the change to u16.

Fixes: 376ceb31ff87 ("RDMA: Fix link active_speed size")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Vishnu Dasa <[email protected]>
Signed-off-by: Adit Ranadive <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

net: dsa: qca8k: Fix port MTU setting

The qca8k only supports a switch-wide MTU setting, and the code to take
the max of all ports was only looking at the port currently being set.
Fix to examine all ports.

Reported-by: DENG Qingfang <[email protected]>
Fixes: f58d2598cf70 ("net: dsa: qca8k: implement the port MTU callbacks")
Signed-off-by: Jonathan McDowell <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

scsi: mpt3sas: Fix timeouts observed while reenabling IRQ

While reenabling the IRQ after irq poll there may be small time window
where HBA firmware has posted some replies and raise the interrupts but
driver has not received the interrupts. So we may observe I/O timeouts as
the driver has not processed the replies as interrupts got missed while
reenabling the IRQ.

To fix this issue the driver has to go for one more round of processing the
reply descriptors from reply descriptor post queue after enabling the IRQ.

Link: https://lore.kernel.org/r/[email protected]
Reported-by: Tomas Henzl <[email protected]>
Reviewed-by: Tomas Henzl <[email protected]>
Signed-off-by: Sreekanth Reddy <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()

alua_bus_detach() might be running concurrently with alua_rtpg_work(), so
we might trip over h->sdev == NULL and call BUG_ON(). The correct way of
handling it is to not set h->sdev to NULL in alua_bus_detach(), and call
rcu_synchronize() before the final delete to ensure that all concurrent
threads have left the critical section. Then we can get rid of the
BUG_ON() and replace it with a simple if condition.

Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Cc: Brian Bunker <[email protected]>
Acked-by: Brian Bunker <[email protected]>
Tested-by: Jitendra Khasdev <[email protected]>
Reviewed-by: Jitendra Khasdev <[email protected]>
Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms

Commit 978aa0474115 ("sctp: fix some type cast warnings introduced since
very beginning")' broke err reading from sctp_arg, because it reads the
value as 32-bit integer, although the value is stored as 16-bit integer.
Later this value is passed to the userspace in 16-bit variable, thus the
user always gets 0 on big-endian platforms. Fix it by reading the __u16
field of sctp_arg union, as reading err field would produce a sparse
warning.

Fixes: 978aa0474115 ("sctp: fix some type cast warnings introduced since very beginning")
Signed-off-by: Petr Malat <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tty: make FONTX ioctl use the tty pointer they were actually passed

Some of the font tty ioctl's always used the current foreground VC for
their operations. Don't do that then.

This fixes a data race on fg_console.

Side note: both Michael Ellerman and Jiri Slaby point out that all these
ioctls are deprecated, and should probably have been removed long ago,
and everything seems to be using the KDFONTOP ioctl instead.

In fact, Michael points out that it looks like busybox's loadfont
program seems to have switched over to using KDFONTOP exactly _because_
of this bug (ahem.. 12 years ago ;-).

Reported-by: Minh Yuan <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Acked-by: Jiri Slaby <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"Subsystems affected by this patch series: mm (memremap, memcg,
  slab-generic, kasan, mempolicy, pagecache, oom-kill, pagemap),
  kthread, signals, lib, epoll, and core-kernel"

* emailed patches from Andrew Morton <[email protected]>:
  kernel/hung_task.c: make type annotations consistent
  epoll: add a selftest for epoll timeout race
  mm: always have io_remap_pfn_range() set pgprot_decrypted()
  mm, oom: keep oom_adj under or at upper limit when printing
  kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
  mm/truncate.c: make __invalidate_mapping_pages() static
  lib/crc32test: remove extra local_irq_disable/enable
  ptrace: fix task_join_group_stop() for the case when current is traced
  mm: mempolicy: fix potential pte_unmap_unlock pte error
  kasan: adopt KUNIT tests to SW_TAGS mode
  mm: memcg: link page counters to root if use_hierarchy is false
  mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
  hugetlb_cgroup: fix reservation accounting
  mm/mremap_pages: fix static key devmap_managed_key updates

libbpf, hashmap: Fix undefined behavior in hash_bits

If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.

Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.

Fixes: e3b924224028 ("libbpf: add resizable non-thread safe internal hashmap")
Suggested-by: Andrii Nakryiko <[email protected]>,
Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net: ethernet: ti: cpsw: disable PTPv1 hw timestamping advertisement

The TI CPTS does not natively support PTPv1, only PTPv2. But, as it
happens, the CPTS can provide HW timestamp for PTPv1 Sync messages, because
CPTS HW parser looks for PTP messageType id in PTP message octet 0 which
value is 0 for PTPv1. As result, CPTS HW can detect Sync messages for PTPv1
and PTPv2 (Sync messageType = 0 for both), but it fails for any other PTPv1
messages (Delay_req/resp) and will return PTP messageType id 0 for them.

The commit e9523a5a32a1 ("net: ethernet: ti: cpsw: enable
HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") added PTPv1 hw timestamping
advertisement by mistake, only to make Linux Kernel "timestamping" utility
work, and this causes issues with only PTPv1 compatible HW/SW - Sync HW
timestamped, but Delay_req/resp are not.

Hence, fix it disabling PTPv1 hw timestamping advertisement, so only PTPv1
compatible HW/SW can properly roll back to SW timestamping.

Fixes: e9523a5a32a1 ("net: ethernet: ti: cpsw: enable HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter")
Signed-off-by: Grygorii Strashko <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vfio/fsl-mc: return -EFAULT if copy_to_user() fails

The copy_to_user() function returns the number of bytes remaining to be
copied, but this code should return -EFAULT.

Fixes: df747bcd5b21 ("vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Diana Craciun <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

vfio/type1: Use the new helper to find vfio_group

When attaching a new group to the container, let's use the new helper
vfio_iommu_find_iommu_group() to check if it's already attached. There
is no functional change.

Also take this chance to add a missing blank line.

Signed-off-by: Zenghui Yu <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>

tracing: Make -ENOMEM the default error for parse_synth_field()

parse_synth_field() returns a pointer and requires that errors get
surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
never be used as zero, and if it is, it could cause a false return code and
produce a NULL pointer dereference. It makes no sense to set ret to zero.

Set ret to -ENOMEM (the most common error case), and have any other errors
set it to something else. This removes the need to initialize ret on *every*
error branch.

Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

ring-buffer: Fix recursion protection transitions between interrupt context

The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.

Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.

Cc: [email protected]
Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

gfs2: Don't call cancel_delayed_work_sync from within delete work function

Right now, we can end up calling cancel_delayed_work_sync from within
delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup ->
gfs2_cancel_delete_work. When that happens, it will result in a
deadlock. Instead, gfs2_inode_lookup should skip the call to
gfs2_cancel_delete_work when called from delete_work_func (blktype ==
GFS2_BLKST_UNLINKED).

Reported-by: Alexander Ahring Oder Aring <[email protected]>
Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
Cc: [email protected] # v5.8+
Signed-off-by: Andreas Gruenbacher <[email protected]>

kernel/hung_task.c: make type annotations consistent

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
removed various __user annotations from function signatures as part of
its refactoring.

It also removed the __user annotation for proc_dohung_task_timeout_secs()
at its declaration in sched/sysctl.h, but not at its definition in
kernel/hung_task.c.

Hence, sparse complains:

  kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))

Adjust the annotation at the definition fitting to that refactoring to make
sparse happy again, which also resolves this warning from sparse:

  kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
  kernel/hung_task.c:277:52:    expected void *
  kernel/hung_task.c:277:52:    got void [noderef] __user *buffer

No functional change. No change in object code.

Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andrey Ignatov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

epoll: add a selftest for epoll timeout race

Add a test case to ensure an event is observed by at least one poller
when an epoll timeout is used.

Signed-off-by: Guantao Liu <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Khazhismel Kumykov <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: always have io_remap_pfn_range() set pgprot_decrypted()

The purpose of io_remap_pfn_range() is to map IO memory, such as a
memory mapped IO exposed through a PCI BAR. IO devices do not
understand encryption, so this memory must always be decrypted.
Automatically call pgprot_decrypted() as part of the generic
implementation.

This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
using io_remap_pfn_range() to expose BAR pages to user space to fail.
The CPU will encrypt access to those BAR pages instead of passing
unencrypted IO directly to the device.

Places not mapping IO should use remap_pfn_range().

Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brijesh Singh <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Dave Young" <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Toshimitsu Kani <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm, oom: keep oom_adj under or at upper limit when printing

For oom_score_adj values in the range [942,999], the current
calculations will print 16 for oom_adj. This patch simply limits the
output so output is inline with docs.

Signed-off-by: Charles Haithcock <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled

There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:

CPU0 CPU1
kthread_cancel_delayed_work_sync()
   __kthread_cancel_work_sync()
     __kthread_cancel_work()
        work->canceling++;
      kthread_delayed_work_timer_fn()
   kthread_insert_work();

BUG: kthread_insert_work() should not get called when work->canceling is
set.

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/truncate.c: make __invalidate_mapping_pages() static

Fix the following sparse warning:

mm/truncate.c:531:15: warning: symbol '__invalidate_mapping_pages' was not declared. Should it be static?

Fixes: eb1d7a65f08a ("mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED")
Signed-off-by: Jason Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Yafang Shao <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

lib/crc32test: remove extra local_irq_disable/enable

Commit 4d004099a668 ("lockdep: Fix lockdep recursion") uncovered the
following issue in lib/crc32test reported on s390:

  BUG: using __this_cpu_read() in preemptible [00000000] code: swapper/0/1
  caller is lockdep_hardirqs_on_prepare+0x48/0x270
  CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.9.0-next-20201015-15164-g03d992bd2de6 #19
  Hardware name: IBM 3906 M04 704 (LPAR)
  Call Trace:
    lockdep_hardirqs_on_prepare+0x48/0x270
    trace_hardirqs_on+0x9c/0x1b8
    crc32_test.isra.0+0x170/0x1c0
    crc32test_init+0x1c/0x40
    do_one_initcall+0x40/0x130
    do_initcalls+0x126/0x150
    kernel_init_freeable+0x1f6/0x230
    kernel_init+0x22/0x150
    ret_from_fork+0x24/0x2c
  no locks held by swapper/0/1.

Remove extra local_irq_disable/local_irq_enable helpers calls.

Fixes: 5fb7f87408f1 ("lib: add module support to crc32 tests")
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: https://lkml.kernel.org/r/patch.git-4369da00c06e.your-ad-here.call-01602859837-ext-1679@work.hours
Signed-off-by: Linus Torvalds <[email protected]>

ptrace: fix task_join_group_stop() for the case when current is traced

This testcase

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>
#include <assert.h>

void *tf(void *arg)
{
return NULL;
}

int main(void)
{
int pid = fork();
if (!pid) {
kill(getpid(), SIGSTOP);

pthread_t th;
pthread_create(&th, NULL, tf, NULL);

return 0;
}

waitpid(pid, NULL, WSTOPPED);

ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
waitpid(pid, NULL, 0);

ptrace(PTRACE_CONT, pid, 0,0);
waitpid(pid, NULL, 0);

int status;
int thread = waitpid(-1, &status, 0);
assert(thread > 0 && thread != pid);
assert(status == 0x80137f);

return 0;
}

fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().

This is because task_join_group_stop() has 2 problems when current is traced:

1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
   can be woken up by debugger and it can clone another thread which
   should join the group-stop.

   We need to check group_stop_count || SIGNAL_STOP_STOPPED.

2. If SIGNAL_STOP_STOPPED is already set, we should not increment
   sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
   should stop without another do_notify_parent_cldstop() report.

To clarify, the problem is very old and we should blame
ptrace_init_task().  But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.

Reported-by: [email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Zhiqiang Liu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: mempolicy: fix potential pte_unmap_unlock pte error

When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.

queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..

Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <[email protected]>
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Feilong Lin <[email protected]>
Cc: Shijie Luo <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

kasan: adopt KUNIT tests to SW_TAGS mode

Now that we have KASAN-KUNIT tests integration, it's easy to see that
some KASAN tests are not adopted to the SW_TAGS mode and are failing.

Adjust the allocation size for kasan_memchr() and kasan_memcmp() by
roung it up to OOB_TAG_OFF so the bad access ends up in a separate
memory granule.

Add a new kmalloc_uaf_16() tests that relies on UAF, and a new
kasan_bitops_tags() test that is tailored to tag-based mode, as it's
hard to adopt the existing kmalloc_oob_16() and kasan_bitops_generic()
(renamed from kasan_bitops()) without losing the precision.

Add new kmalloc_uaf_16() and kasan_bitops_uaf() tests that rely on UAFs,
as it's hard to adopt the existing kmalloc_oob_16() and
kasan_bitops_oob() (rename from kasan_bitops()) without losing the
precision.

Disable kasan_global_oob() and kasan_alloca_oob_left/right() as SW_TAGS
mode doesn't instrument globals nor dynamic allocas.

Signed-off-by: Andrey Konovalov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: David Gow <[email protected]>
Link: https://lkml.kernel.org/r/76eee17b6531ca8b3ca92b240cb2fd23204aaff7.1603129942.git.andreyknvl@google.com
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg: link page counters to root if use_hierarchy is false

Richard reported a warning which can be reproduced by running the LTP
madvise6 test (cgroup v1 in the non-hierarchical mode should be used):

  WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
  Modules linked in:
  CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
  Workqueue: events drain_local_stock
  RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
  Call Trace:
    __memcg_kmem_uncharge (mm/memcontrol.c:3022)
    drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
    drain_local_stock (mm/memcontrol.c:2255)
    process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
    worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:292)
    ret_from_fork (arch/x86/entry/entry_64.S:300)

The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.

After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining a objcg
stock, for example.  It leads to an eventual underflow of the charge and
triggers a warning.

Fix it by linking all page counters to corresponding root page counters
in the non-hierarchical mode.

Please note, that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more
than 1 level.  This patch doesn't change it.

The patch also doesn't affect how the hierarchical mode is working,
which is the only sane and truly supported mode now.

Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!

Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Reported-by: <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Debugged-by: Richard Palethorpe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg

memcg_page_state will get the specified number in hierarchical memcg, It
should multiply by HPAGE_PMD_NR rather than an page if the item is
NR_ANON_THPS.

[[email protected]: fix printk warning]
[[email protected]: use u64 cast, per Michal]

Fixes: 468c398233da ("mm: memcontrol: switch to native NR_ANON_THPS counter")
Signed-off-by: zhongjiang-ali <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Link: https://lkml.kernel.org/r/1603722395-72443-1-git-send-email-zhongjiang-ali@linux.alibaba.com
Signed-off-by: Linus Torvalds <[email protected]>

hugetlb_cgroup: fix reservation accounting

Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and hit the warning below.  QEMU with free page hinting
uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
as free by a VM.  The reporting granularity is in pageblock granularity.
So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
one huge page in QEMU.

  WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
  Modules linked in: ...
  CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
  Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
  RIP: 0010:page_counter_uncharge+0x4b/0x50
  ...
  Call Trace:
    hugetlb_cgroup_uncharge_file_region+0x4b/0x80
    region_del+0x1d3/0x300
    hugetlb_unreserve_pages+0x39/0xb0
    remove_inode_hugepages+0x1a8/0x3d0
    hugetlbfs_fallocate+0x3c4/0x5c0
    vfs_fallocate+0x146/0x290
    __x64_sys_fallocate+0x3e/0x70
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Investigation of the issue uncovered bugs in hugetlb cgroup reservation
accounting.  This patch addresses the found issues.

Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: Michal Privoznik <[email protected]>
Co-developed-by: David Hildenbrand <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Michal Privoznik <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Cc: <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Tejun Heo <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/mremap_pages: fix static key devmap_managed_key updates

commit 6f42193fd86e ("memremap: don't use a separate devm action for
devmap_managed_enable_get") changed the static key updates such that we
now call devmap_managed_enable_put() without doing the equivalent
devmap_managed_enable_get().

devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
types too.  This results in the below warning when switching between
system-ram and devdax mode for devdax namespace.

   jump label: negative count!
   WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
   Modules linked in:
   ....

   NIP static_key_slow_try_dec+0x88/0xa0
   LR static_key_slow_try_dec+0x84/0xa0
   Call Trace:
     static_key_slow_try_dec+0x84/0xa0
     __static_key_slow_dec_cpuslocked+0x34/0xd0
     static_key_slow_dec+0x54/0xf0
     memunmap_pages+0x36c/0x500
     devm_action_release+0x30/0x50
     release_nodes+0x2f4/0x3e0
     device_release_driver_internal+0x17c/0x280
     bus_remove_device+0x124/0x210
     device_del+0x1d4/0x530
     unregister_dev_dax+0x48/0xe0
     devm_action_release+0x30/0x50
     release_nodes+0x2f4/0x3e0
     device_release_driver_internal+0x17c/0x280
     unbind_store+0x130/0x170
     drv_attr_store+0x40/0x60
     sysfs_kf_write+0x6c/0xb0
     kernfs_fop_write+0x118/0x280
     vfs_write+0xe8/0x2a0
     ksys_write+0x84/0x140
     system_call_exception+0x120/0x270
     system_call_common+0xf0/0x27c

Reported-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

ARC: [plat-hsdk] Remap CCMs super early in asm boot trampoline

ARC HSDK platform stopped booting on released v5.10-rc1, getting stuck
in startup of non master SMP cores.

This was bisected to upstream commit 7fef431be9c9ac25
"(mm/page_alloc: place pages to tail in __free_pages_core())"
That commit itself is harmless, it just exposed a subtle assumption in
our platform code (hence CC'ing linux-mm just as FYI in case some other
arches / platforms trip on it).

The upstream commit is semantically disruptive as it reverses the order
of page allocations (actually it can be good test for hardware
verification to exercise different memory patterns altogether).
For ARC HSDK platform that meant a remapped memory region (pertaining to
unused Closely Coupled Memory) started getting used early for dynamice
allocations, while not effectively remapped on all the cores, triggering
memory error exception on those cores.

The fix is to move the CCM remapping from early platform code to to early core
boot code. And while it is undesirable to riddle common boot code with
platform quirks, there is no other way to do this since the faltering code
involves setting up stack itself so even function calls are not allowed at
that point.

If anyone is interested, all the gory details can be found at Link below.

Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/32
Cc: David Hildenbrand <[email protected]>
Cc: [email protected]
Signed-off-by: Vineet Gupta <[email protected]>

ARC: stack unwinding: avoid indefinite looping

Currently stack unwinder is a while(1) loop which relies on the dwarf
unwinder to signal termination, which in turn relies on dwarf info to do
so. This in theory could cause an infinite loop if the dwarf info was
somehow messed up or the register contents were etc.

This fix thus detects the excessive looping and breaks the loop.

| Mem: 26184K used, 1009136K free, 0K shrd, 0K buff, 14416K cached
| CPU:  0.0% usr 72.8% sys  0.0% nic 27.1% idle  0.0% io  0.0% irq  0.0% sirq
| Load average: 4.33 2.60 1.11 2/74 139
|   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
|   133     2 root     SWN      0  0.0   3 22.9 [rcu_torture_rea]
|   132     2 root     SWN      0  0.0   0 22.0 [rcu_torture_rea]
|   131     2 root     SWN      0  0.0   3 21.5 [rcu_torture_rea]
|   126     2 root     RW       0  0.0   2  5.4 [rcu_torture_wri]
|   129     2 root     SWN      0  0.0   0  0.2 [rcu_torture_fak]
|   137     2 root     SW       0  0.0   0  0.2 [rcu_torture_cbf]
|   127     2 root     SWN      0  0.0   0  0.1 [rcu_torture_fak]
|   138   115 root     R     1464  0.1   2  0.1 top
|   130     2 root     SWN      0  0.0   0  0.1 [rcu_torture_fak]
|   128     2 root     SWN      0  0.0   0  0.1 [rcu_torture_fak]
|   115     1 root     S     1472  0.1   1  0.0 -/bin/sh
|   104     1 root     S     1464  0.1   0  0.0 inetd
|     1     0 root     S     1456  0.1   2  0.0 init
|    78     1 root     S     1456  0.1   0  0.0 syslogd -O /var/log/messages
|   134     2 root     SW       0  0.0   2  0.0 [rcu_torture_sta]
|    10     2 root     IW       0  0.0   1  0.0 [rcu_preempt]
|    88     2 root     IW       0  0.0   1  0.0 [kworker/1:1-eve]
|    66     2 root     IW       0  0.0   2  0.0 [kworker/2:2-eve]
|    39     2 root     IW       0  0.0   2  0.0 [kworker/2:1-eve]
| unwinder looping too long, aborting !

Cc: <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

IB/srpt: Fix memory leak in srpt_add_one

Failure in srpt_refresh_port() for the second port will leave MAD
registered for the first one, however, the srpt_add_one() will be marked
as "failed" and SRPT will leak resources for that registered but not used
and released first port.

Unregister the MAD agent for all ports in case of failure.

Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge branch 'dpaa_eth-buffer-layout-fixes'

Camelia Groza says:

====================
dpaa_eth: buffer layout fixes

The patches are related to the software workaround for the A050385 erratum.
The first patch ensures optimal buffer usage for non-erratum scenarios. The
second patch fixes a currently inconsequential discrepancy between the
FMan and Ethernet drivers.

Changes in v3:
- refactor defines for clarity in 1/2
- add more details on the user impact in 1/2
- remove unnecessary inline identifier in 2/2

Changes in v2:
- make the returned value for TX ports explicit in 2/2
- simplify the buf_layout reference in 2/2
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dpaa_eth: fix the RX headroom size alignment

The headroom reserved for received frames needs to be aligned to an
RX specific value. There is currently a discrepancy between the values
used in the Ethernet driver and the values passed to the FMan.
Coincidentally, the resulting aligned values are identical.

Fixes: 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround")
Acked-by: Willem de Bruijn <[email protected]>
Signed-off-by: Camelia Groza <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

dpaa_eth: update the buffer layout for non-A050385 erratum scenarios

Impose a larger RX private data area only when the A050385 erratum is
present on the hardware. A smaller buffer size is sufficient in all
other scenarios. This enables a wider range of linear Jumbo frame
sizes in non-erratum scenarios, instead of turning to multi
buffer Scatter/Gather frames. The maximum linear frame size is
increased by 128 bytes for non-erratum arm64 platforms.

Cleanup the hardware annotations header defines in the process.

Fixes: 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385 workaround")
Signed-off-by: Camelia Groza <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

RDMA: Fix software RDMA drivers for dma mapping error

The commit f959dcd6ddfd ("dma-direct: Fix potential NULL pointer
dereference") made dma_mask as mandetory field to be setup even for
dma_virt_ops based dma devices. The commit in the fixes tag omitted
setting up the dma_mask on virtual devices triggering the below trace when
they were combined during the merge window.

Fix it by setting empty DMA MASK for software based RDMA devices.

  WARNING: CPU: 1 PID: 8488 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x493/0x700
  CPU: 1 PID: 8488 Comm: syz-executor144 Not tainted 5.9.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
  Trace:
   dma_map_single_attrs include/linux/dma-mapping.h:279 [inline]
   ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline]
   ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715
   ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline]
   ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline]
   ib_mad_init_device+0x72b/0x1400 drivers/infiniband/core/mad.c:3092
   add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680
   enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301
   ib_register_device drivers/infiniband/core/device.c:1376 [inline]
   ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335
   rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182
   rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247
   rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507
   rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline]
   rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250
   nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555
   rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195
   rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
   rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259
   netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
   netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
   netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
   sock_sendmsg_nosec net/socket.c:651 [inline]
   sock_sendmsg+0xcf/0x120 net/socket.c:671
   ____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
   ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
   __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x443699

Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
Fixes: e0477b34d9d1 ("RDMA: Explicitly pass in the dma_device to ib_register_device")
Signed-off-by: Parav Pandit <[email protected]>
Tested-by: Guoqing Jiang <[email protected]>
Tested-by: Dennis Dalessandro <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Zhu Yanjun <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Revert "nvme-pci: remove last_sq_tail"

Multiple CPUs may be mapped to the same hctx, allowing mulitple
submission contexts to attempt commit_rqs(). We need to verify we're
not writing the same doorbell value multiple times since that's a spec
violation.

Revert commit 54b2fcee1db041a83b52b51752dade6090cf952f.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1878596
Reported-by: "B.L. Jones" <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

Merge tag 'mac80211-for-net-2020-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of fixes, for
* HE on 2.4 GHz
* a few issues syzbot found, but we have many more reports :-(
* a regression in nl80211-transported EAPOL frames which had
affected a number of users, from Mathy
* kernel-doc markings in mac80211, from Mauro
* a format argument in reg.c, from Ye Bin
====================

Signed-off-by: Jakub Kicinski <[email protected]>

of: Drop superfluous ULL suffix for ~0

There is no need to specify a "ULL" suffix for "all bits set": "~0" is
sufficient, and works regardless of type. In fact adding the suffix
makes the code more fragile.

Fixes: 48ab6d5d1f09 ("dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n")
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

PM: runtime: Resume the device earlier in __device_release_driver()

Since the device is resumed from runtime-suspend in
__device_release_driver() anyway, it is better to do that before
looking for busy managed device links from it to consumers, because
if there are any, device_links_unbind_consumers() will be called
and it will cause the consumer devices' drivers to unbind, so the
consumer devices will be runtime-resumed. In turn, resuming each
consumer device will cause the supplier to be resumed and when the
runtime PM references from the given consumer to it are dropped, it
may be suspended. Then, the runtime-resume of the next consumer
will cause the supplier to resume again and so on.

Update the code accordingly.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Fixes: 9ed9895370ae ("driver core: Functional dependencies tracking support")
Cc: All applicable <[email protected]> # All applicable
Tested-by: Xiang Chen <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>

PM: runtime: Drop pm_runtime_clean_up_links()

After commit d12544fb2aa9 ("PM: runtime: Remove link state checks in
rpm_get/put_supplier()") nothing prevents the consumer device's
runtime PM from acquiring additional references to the supplier
device after pm_runtime_clean_up_links() has run (or even while it
is running), so calling this function from __device_release_driver()
may be pointless (or even harmful).

Moreover, it ignores stateless device links, so the runtime PM
handling of managed and stateless device links is inconsistent
because of it, so better get rid of it entirely.

Fixes: d12544fb2aa9 ("PM: runtime: Remove link state checks in rpm_get/put_supplier()")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Cc: 5.1+ <[email protected]> # 5.1+
Tested-by: Xiang Chen <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>

PM: runtime: Drop runtime PM references to supplier on link removal

While removing a device link, drop the supplier device's runtime PM
usage counter as many times as needed to drop all of the runtime PM
references to it from the consumer in addition to dropping the
consumer's link count.

Fixes: baa8809f6097 ("PM / runtime: Optimize the use of device links")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Cc: 5.1+ <[email protected]> # 5.1+
Tested-by: Xiang Chen <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>

powercap/intel_rapl: remove unneeded semicolon

A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Documentation: PM: cpuidle: correct path name

cpu/ is needed before cpu<N>/

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Documentation: PM: cpuidle: correct typo

cerainly -> certainly

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: schedutil: Don't skip freq update if need_freq_update is set

The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.

For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.

Now lets say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to limits change event) and lets say
we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.

Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.

This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.

As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.

We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.

Reported-by: zhuguangqing <[email protected]>
Suggested-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
[ rjw: Rearrange code to avoid a branch ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

tracing: Fix the checking of stackidx in __ftrace_trace_stack

The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING
is illegal too. And fix two typos by the way.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qiujun Huang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

nfsroot: Default mount option should ask for built-in NFS version

Change the nfsroot default mount option to ask for NFSv2 only *if* the
kernel was built with NFSv2 support.
If not, default to NFSv3 or as last choice to NFSv4, depending on actual
kernel config.

Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single

The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t. swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets. Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.

Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations")
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"

kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling

memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);

If the allocation must fail, no_iotlb_memory is set.

Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.

When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.

Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.

Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.

Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb")
Reported-by: Elliott Mitchell <[email protected]>
Tested-by: Elliott Mitchell <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

ftrace: Handle tracing when switching between context

When an interrupt or NMI comes in and switches the context, there's a delay
from when the preempt_count() shows the update. As the preempt_count() is
used to detect recursion having each context have its own bit get set when
tracing starts, and if that bit is already set, it is considered a recursion
and the function exits. But if this happens in that section where context
has changed but preempt_count() has not been updated, this will be
incorrectly flagged as a recursion.

To handle this case, create another bit call TRANSITION and test it if the
current context bit is already set. Flag the call as a recursion if the
TRANSITION bit is already set, and if not, set it and continue. The
TRANSITION bit will be cleared normally on the return of the function that
set it, or if the current context bit is clear, set it and clear the
TRANSITION bit to allow for another transition between the current context
and an even higher one.

Cc: [email protected]
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

ftrace: Fix recursion check for NMI test

The code that checks recursion will work to only do the recursion check once
if there's nested checks. The top one will do the check, the other nested
checks will see recursion was already checked and return zero for its "bit".
On the return side, nothing will be done if the "bit" is zero.

The problem is that zero is returned for the "good" bit when in NMI context.
This will set the bit for NMIs making it look like *all* NMI tracing is
recursing, and prevent tracing of anything in NMI context!

The simple fix is to return "bit + 1" and subtract that bit on the end to
get the real bit.

Cc: [email protected]
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Fix out of bounds write in get_trace_buf

The nesting count of trace_printk allows for 4 levels of nesting. The
nesting counter starts at zero and is incremented before being used to
retrieve the current context's buffer. But the index to the buffer uses the
nesting counter after it was incremented, and not its original number,
which in needs to do.

Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 3d9622c12c887 ("tracing: Add barrier to trace_printk() buffer nesting modification")
Signed-off-by: Qiujun Huang <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

scripts: get_api.pl: Add sub-titles to ABI output

Instead of adding titles just for the files, add titles
for each part of the ABI output, in order to make easier
to search for a symbol there.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/64752a5de06ab8263c296e3ed01414b25861e1eb.1604312590.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <[email protected]>

scripts: get_abi.pl: Don't let ABI files to create subtitles

The ReST output should only contain documentation titles
automatically created by the script.

There are two reasons for that:

1) Consistency.

   just a handful ABI docs define titles

2) To avoid critical errors.

   Docutils (which is the basis for Sphinx) allows a free
   assign of documentation title markups. So, one document
   could be doing things like:

Level 1
=======

Level 2
-------

   While another one could do the reverse:

Level 1
-------

Level 2
=======

   But the same document can't mix.

   As the output of get_abi.pl will join contents from multiple
   files, if they don't define the levels on a consistent errors,
   errors like this can happen:

Sphinx parallel build error:
docutils.utils.SystemMessage: /home/rdunlap/lnx/lnx-510-rc2/Documentation/ABI/testing/sysfs-bus-rapidio:2: (SEVERE/4) Title level inconsistent:

Attributes Common for All RapidIO Devices
-----------------------------------------

   Which cause some versions of Sphinx to go into an endless
   loop.

   It should be noticed that an alternative to that would
   be to replace all title occurrences by a single markup,
   but that will make the parser more complex, and, due to
   (1) it would generate an inconsistent output.

   So, better to just remove the titles defined at the ABI
   files from the output.

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/6c62ef5c01d39dee8d891f8390c816d2a889670a.1604312590.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <[email protected]>

docs: leds: index.rst: add a missing file

Changeset 26a07553041e ("docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication")
didn't include the needed changes at index.rst.

Fixes: 26a07553041e ("docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication")
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/36a6e3aef6e57ea349f1b47c7731d4cd1e03ca77.1604312590.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <[email protected]>

docs: ABI: sysfs-class-net: fix a typo

clas->class

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/4bba2a1592df5a9435c8d4757a9abf20246e2a99.1604312590.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <[email protected]>

docs: ABI: sysfs-driver-dma-ioatdma: what starts with /sys

This is the only file where the /sys doesn't start with
a /.

So, rename them to:

sys -> /sys

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Link: https://lore.kernel.org/r/f4c53fff9696a61ff0e144fee237a9527982626d.1604312590.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'fixes-for-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

USB: fixes for v5.10-rc2

Nothing major as of yet, we're adding support for Intel Alder Lake-S
in dwc3, together with a few fixes that are quite important (memory
leak in raw-gadget, probe crashes in goku_udc, and so on).

Signed-off-by: Felipe Balbi <[email protected]>
* tag 'fixes-for-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
  usb: raw-gadget: fix memory leak in gadget_setup
  usb: dwc2: Avoid leaving the error_debugfs label unused
  usb: dwc3: ep0: Fix delay status handling
  usb: gadget: fsl: fix null pointer checking
  usb: gadget: goku_udc: fix potential crashes in probe
  usb: dwc3: pci: add support for the Intel Alder Lake-S

drm/vc4: kms: Add functions to create the state objects

In order to make the vc4_kms_load code and error path a bit easier to
read and extend, add functions to create state objects, and use managed
actions to cleanup if needed.

Signed-off-by: Maxime Ripard <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/vc4: Use devm_drm_dev_alloc

We can simplify a bit the bind code, its error path and unbind by using
the managed devm_drm_dev_alloc function.

Acked-by: Daniel Vetter <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/vc4: Use the helper to retrieve vc4_dev when needed

We have a helper to retrieve the vc4_dev structure from the drm_device one
when needed, so let's use it consistently.

Acked-by: Daniel Vetter <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/vc4: gem: Add a managed action to cleanup the job queue

The job queue needs to be cleaned up using vc4_gem_destroy, but it's
not used consistently (vc4_drv's bind calls it in its error path, but
doesn't in unbind), and we can make that automatic through a managed
action. Let's remove the requirement to call vc4_gem_destroy.

Fixes: d5b1a78a772f ("drm/vc4: Add support for drawing 3D frames.")
Acked-by: Daniel Vetter <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/vc4: drv: Use managed drm_mode_config_init

Using drmm_mode_config_init instead of drm_mode_config_init allows us to
cleanup a bit the error path.

Signed-off-by: Maxime Ripard <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/vc4: bo: Add a managed action to cleanup the cache

The BO cache needs to be cleaned up using vc4_bo_cache_destroy, but it's
not used consistently (vc4_drv's bind calls it in its error path, but
doesn't in unbind), and we can make that automatic through a managed
action. Let's remove the requirement to call vc4_bo_cache_destroy.

Fixes: c826a6e10644 ("drm/vc4: Add a BO cache.")
Acked-by: Daniel Vetter <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

powerpc/smp: Call rcu_cpu_starting() earlier

The call to rcu_cpu_starting() in start_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows (with CONFIG_PROVE_RCU_LIST=y):

  WARNING: suspicious RCU usage
  -----------------------------
  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  RCU used illegally from offline CPU!
  rcu_scheduler_active = 1, debug_locks = 1
  no locks held by swapper/1/0.

  Call Trace:
  dump_stack+0xec/0x144 (unreliable)
  lockdep_rcu_suspicious+0x128/0x14c
  __lock_acquire+0x1060/0x1c60
  lock_acquire+0x140/0x5f0
  _raw_spin_lock_irqsave+0x64/0xb0
  clockevents_register_device+0x74/0x270
  register_decrementer_clockevent+0x94/0x110
  start_secondary+0x134/0x800
  start_secondary_prolog+0x10/0x14

This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.

It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:

  It uses a per-CPU variable so that RCU pays attention only to the
  first call to rcu_cpu_starting() if there is more than one of them.
  This is even intentional, due to there being a generic
  arch-independent call to rcu_cpu_starting() in
  notify_cpu_starting().

  So multiple calls to rcu_cpu_starting() are fine by design.

Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
[mpe: Add Fixes tag, reword slightly & expand change log]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/eeh_cache: Fix a possible debugfs deadlock

Lockdep complains that a possible deadlock below in
eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
disabled. Let's just make eeh_addr_cache_show() acquire the lock with
IRQ disabled as well.

        CPU0                    CPU1
        ----                    ----
   lock(&pci_io_addr_cache_root.piar_lock);
                                local_irq_disable();
                                lock(&tp->lock);
                                lock(&pci_io_addr_cache_root.piar_lock);
   <Interrupt>
     lock(&tp->lock);

  *** DEADLOCK ***

  lock_acquire+0x140/0x5f0
  _raw_spin_lock_irqsave+0x64/0xb0
  eeh_addr_cache_insert_dev+0x48/0x390
  eeh_probe_device+0xb8/0x1a0
  pnv_pcibios_bus_add_device+0x3c/0x80
  pcibios_bus_add_device+0x118/0x290
  pci_bus_add_device+0x28/0xe0
  pci_bus_add_devices+0x54/0xb0
  pcibios_init+0xc4/0x124
  do_one_initcall+0xac/0x528
  kernel_init_freeable+0x35c/0x3fc
  kernel_init+0x24/0x148
  ret_from_kernel_thread+0x5c/0x80

  lock_acquire+0x140/0x5f0
  _raw_spin_lock+0x4c/0x70
  eeh_addr_cache_show+0x38/0x110
  seq_read+0x1a0/0x660
  vfs_read+0xc8/0x1f0
  ksys_read+0x74/0x130
  system_call_exception+0xf8/0x1d0
  system_call_common+0xe8/0x218

Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
Signed-off-by: Qian Cai <[email protected]>
Reviewed-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Linux 5.10-rc2

Merge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"Three fixes all related to #DB:

   - Handle the BTF bit correctly so it doesn't get lost due to a kernel
     #DB

   - Only clear and set the virtual DR6 value used by ptrace on user
     space triggered #DB. A kernel #DB must leave it alone to ensure
     data consistency for ptrace.

   - Make the bitmasking of the virtual DR6 storage correct so it does
     not lose DR_STEP"

* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
  x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
  x86/debug: Fix BTF handling

Merge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"A few fixes for timers/timekeeping:

   - Prevent undefined behaviour in the timespec64_to_ns() conversion
     which is used for converting user supplied time input to
     nanoseconds. It lacked overflow protection.

   - Mark sched_clock_read_begin/retry() to prevent recursion in the
     tracer

   - Remove unused debug functions in the hrtimer and timerlist code"

* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Prevent undefined behaviour in timespec64_to_ns()
  timers: Remove unused inline funtion debug_timer_free()
  hrtimer: Remove unused inline function debug_hrtimer_free()
  time/sched_clock: Mark sched_clock_read_begin/retry() as notrace

Merge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smp fix from Thomas Gleixner:
"A single fix for stop machine.

  Mark functions no trace to prevent a crash caused by recursion when
  enabling or disabling a tracer on RISC-V (probably all architectures
  which patch through stop machine)"

* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stop_machine, rcu: Mark functions as notrace

Merge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
"A couple of locking fixes:

   - Fix incorrect failure injection handling in the fuxtex code

   - Prevent a preemption warning in lockdep when tracking
     local_irq_enable() and interrupts are already enabled

   - Remove more raw_cpu_read() usage from lockdep which causes state
     corruption on !X86 architectures.

   - Make the nr_unused_locks accounting in lockdep correct again"

* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix nr_unused_locks accounting
  locking/lockdep: Remove more raw_cpu_read() usage
  futex: Fix incorrect should_fail_futex() handling
  lockdep: Fix preemption WARN for spurious IRQ-enable

Merge tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes/removals from Greg KH:
"Here's some small fixes for 5.10-rc2 and a big driver removal.

  The fixes are for some reported issues in the interconnect and
  coresight drivers, nothing major.

  The "big" driver removal is the MIC drivers have been asked to be
  removed as the hardware never shipped and Intel no longer wants to
  maintain something that no one can use. This is welcomed by many as
  the DMA usage of these drivers was "interesting" and the security
  people were starting to question some issues that were starting to be
  found in the codebase.

  Note, one of the subsystems for this driver, the "VOP" code, will
  probably come back in future kernel versions as it was looking to
  potentially solve some PCIe virtualization issues that a number of
  other vendors were wanting to solve. But as-is, this codebase didn't
  work for anyone else so no actual functionality is being removed.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  coresight: cti: Initialize dynamic sysfs attributes
  coresight: Fix uninitialised pointer bug in etm_setup_aux()
  coresight: add module license
  misc: mic: remove the MIC drivers
  interconnect: qcom: use icc_sync state for sm8[12]50
  interconnect: qcom: Ensure that the floor bandwidth value is enforced
  interconnect: qcom: sc7180: Init BCMs before creating the nodes
  interconnect: qcom: sdm845: Init BCMs before creating the nodes
  interconnect: Aggregate before setting initial bandwidth
  interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM

Merge tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and documentation fixes from Greg KH:
"Here is one tiny debugfs change to fix up an API where the last user
  was successfully fixed up in 5.10-rc1 (so it couldn't be merged
  earlier), and a much larger Documentation/ABI/ update to the files so
  they can be automatically parsed by our tools.

  The Documentation/ABI/ updates are just formatting issues, small ones
  to bring the files into parsable format, and have been acked by
  numerous subsystem maintainers and the documentation maintainer. I
  figured it was good to get this into 5.10-rc2 to help wih the merge
  issues that would arise if these were to stick in linux-next until
  5.11-rc1.

  The debugfs change has been in linux-next for a long time, and the
  Documentation updates only for the last linux-next release"

* tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (40 commits)
  scripts: get_abi.pl: assume ReST format by default
  docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication
  docs: ABI: sysfs-class-backlight: unify ABI documentation
  docs: ABI: sysfs-c2port: remove a duplicated entry
  docs: ABI: sysfs-class-power: unify duplicated properties
  docs: ABI: unify /sys/class/leds/<led>/brightness documentation
  docs: ABI: stable: remove a duplicated documentation
  docs: ABI: change read/write attributes
  docs: ABI: cleanup several ABI documents
  docs: ABI: sysfs-bus-nvdimm: use the right format for ABI
  docs: ABI: vdso: use the right format for ABI
  docs: ABI: fix syntax to be parsed using ReST notation
  docs: ABI: convert testing/configfs-acpi to ReST
  docs: Kconfig/Makefile: add a check for broken ABI files
  docs: abi-testing.rst: enable --rst-sources when building docs
  docs: ABI: don't escape ReST-incompatible chars from obsolete and removed
  docs: ABI: create a 2-depth index for ABI
  docs: ABI: make it parse ABI/stable as ReST-compatible files
  docs: ABI: sysfs-uevent: make it compatible with ReST output
  docs: ABI: testing: make the files compatible with ReST output
  ...

Merge tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for issues that have been
  reported in 5.10-rc1:

   - octeon driver fixes

   - wfx driver fixes

   - memory leak fix in vchiq driver

   - fieldbus driver bugfix

   - comedi driver bugfix

  All of these have been in linux-next with no reported issues"

* tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fieldbus: anybuss: jump to correct label in an error path
  staging: wfx: fix test on return value of gpiod_get_value()
  staging: wfx: fix use of uninitialized pointer
  staging: mmal-vchiq: Fix memory leak for vchiq_instance
  staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
  staging: octeon: Drop on uncorrectable alignment or FCS error
  staging: octeon: repair "fixed-link" support

Merge tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are some small TTY and Serial driver fixes for reported issues
  for 5.10-rc2. They include:

   - vt ioctl bugfix for reported problems

   - fsl_lpuart serial driver fix

   - 21285 serial driver bugfix

  All have been in linux-next with no reported issues"

* tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt_ioctl: fix GIO_UNIMAP regression
  vt: keyboard, extend func_buf_lock to readers
  vt: keyboard, simplify vt_kdgkbsent
  tty: serial: fsl_lpuart: LS1021A has a FIFO size of 16 words, like LS1028A
  tty: serial: 21285: fix lockup on open

Merge tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
"Here are a number of small bugfixes for reported issues in some USB
  drivers. They include:

   - typec bugfixes

   - xhci bugfixes and lockdep warning fixes

   - cdc-acm driver regression fix

   - kernel doc fixes

   - cdns3 driver bugfixes for a bunch of reported issues

   - other tiny USB driver fixes

  All have been in linux-next with no reported issues"

* tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdns3: gadget: own the lock wrongly at the suspend routine
  usb: cdns3: Fix on-chip memory overflow issue
  usb: cdns3: gadget: suspicious implicit sign extension
  xhci: Don't create stream debugfs files with spinlock held.
  usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
  xhci: Fix sizeof() mismatch
  usb: typec: stusb160x: fix signedness comparison issue with enum variables
  usb: typec: add missing MODULE_DEVICE_TABLE() to stusb160x
  USB: apple-mfi-fastcharge: don't probe unhandled devices
  usbcore: Check both id_table and match() when both available
  usb: host: ehci-tegra: Fix error handling in tegra_ehci_probe()
  usb: typec: stusb160x: fix an IS_ERR() vs NULL check in probe
  usb: typec: tcpm: reset hard_reset_count for any disconnect
  usb: cdc-acm: fix cooldown mechanism
  usb: host: fsl-mph-dr-of: check return of dma_set_mask()
  usb: fix kernel-doc markups
  usb: typec: stusb160x: fix some signedness bugs
  usb: cdns3: Variable 'length' set but not used

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:
   - selftest fix
   - force PTE mapping on device pages provided via VFIO
   - fix detection of cacheable mapping at S2
   - fallback to PMD/PTE mappings for composite huge pages
   - fix accounting of Stage-2 PGD allocation
   - fix AArch32 handling of some of the debug registers
   - simplify host HYP entry
   - fix stray pointer conversion on nVHE TLB invalidation
   - fix initialization of the nVHE code
   - simplify handling of capabilities exposed to HYP
   - nuke VCPUs caught using a forbidden AArch32 EL0

  x86:
   - new nested virtualization selftest
   - miscellaneous fixes
   - make W=1 fixes
   - reserve new CPUID bit in the KVM leaves"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: vmx: remove unused variable
  KVM: selftests: Don't require THP to run tests
  KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
  KVM: selftests: test behavior of unmapped L2 APIC-access address
  KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
  KVM: x86: replace static const variables with macros
  KVM: arm64: Handle Asymmetric AArch32 systems
  arm64: cpufeature: upgrade hyp caps to final
  arm64: cpufeature: reorder cpus_have_{const, final}_cap()
  KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
  KVM: arm64: Force PTE mapping on fault resulting in a device mapping
  KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
  KVM: arm64: Fix masks in stage2_pte_cacheable()
  KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
  KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
  KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
  KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
  KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
  x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID

Merge tag 'irqchip-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

  - A couple of fixes after the IPI as IRQ patches (Kconfig, bcm2836)
  - Two SiFive PLIC fixes (irq_set_affinity, hierarchy handling)
  - "unmapped events" handling for the ti-sci-inta controller
  - Tidying up for the irq-mst driver (static functions, Kconfig)
  - Small cleanup in the Renesas irqpin driver
  - STM32 exti can now handle LP timer events

irqchip/ti-sci-inta: Add support for unmapped event handling

The DMA (BCDMA/PKTDMA and their rings/flows) events are under the INTA's
supervision as unmapped events in AM64.

In order to keep the current SW stack working, the INTA driver must replace
the dev_id with it's own when a request comes for BCDMA or PKTDMA
resources.

Implement parsing of the optional "ti,unmapped-event-sources" phandle array
to get the sci-dev-ids of the devices where the unmapped events originate.

Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling

The new DMA architecture introduced with AM64 introduced new event types:
unampped events.

These events are mapped within INTA in contrast to other K3 devices where
the events with similar function was originating from the UDMAP or ringacc.

The ti,unmapped-event-sources should contain phandle array to the devices
in the system (typically DMA controllers) from where the unmapped events
originate.

Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm

Get rid of the separate flag to indicate if the IRLM bit is present in
the INTC/Interrupt Control Register 0, by considering -1 an invalid
irlm_bit value.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/sifive-plic: Fix chip_data access within a hierarchy

The plic driver crashes in plic_irq_unmask() when the interrupt is within a
hierarchy, as it picks the top-level chip_data instead of its local one.

Using irq_data_get_irq_chip_data() instead of irq_get_chip_data() solves
the issue for good.

Fixes: f1ad1133b18f ("irqchip/sifive-plic: Add support for multiple PLICs")
Signed-off-by: Greentime Hu <[email protected]>
[maz: rewrote commit message]
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Incorrect netlink report logic in flowtable and genID.

2) Add a selftest to check that wireguard passes the right sk
to ip_route_me_harder, from Jason A. Donenfeld.

3) Pass the actual sk to ip_route_me_harder(), also from Jason.

4) Missing expression validation of updates via nft --check.

5) Update byte and packet counters regardless of whether they
match, from Stefano Brivio.
====================

Signed-off-by: Jakub Kicinski <[email protected]>

ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags

The tunnel device such as vxlan, bareudp and geneve in the lwt mode set
the outer df only based TUNNEL_DONT_FRAGMENT.
And this was also the behavior for gre device before switching to use
ip_md_tunnel_xmit in commit 962924fa2b7a ("ip_gre: Refactor collect
metatdata mode tunnel xmit to ip_md_tunnel_xmit")

When the ip_gre in lwt mode xmit with ip_md_tunnel_xmi changed the rule and
make the discrepancy between handling of DF by different tunnels. So in the
ip_md_tunnel_xmit should follow the same rule like other tunnels.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

cadence: force nonlinear buffers to be cloned

In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
second device with usb ethernet (r8152) sending ICMP packets.  If the packet
was larger than about 220 bytes, the SAMA5 device would "oops" with the
following trace:

kernel BUG at net/core/skbuff.c:1863!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii o
ption usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel o
hci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
CPU: 0 PID: 0 Comm: swapper Tainted: G           O      5.9.1-prox2+ #1
Hardware name: Atmel SAMA5
PC is at skb_put+0x3c/0x50
LR is at macb_start_xmit+0x134/0xad0 [macb]
pc : [<c05258cc>]    lr : [<bf0ea5b8>]    psr: 20070113
sp : c0d01a60  ip : c07232c0  fp : c4250000
r10: c0d03cc8  r9 : 00000000  r8 : c0d038c0
r7 : 00000000  r6 : 00000008  r5 : c59b66c0  r4 : 0000002a
r3 : 8f659eff  r2 : c59e9eea  r1 : 00000001  r0 : c59b66c0
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 2640c059  DAC: 00000051
Process swapper (pid: 0, stack limit = 0x75002d81)

<snipped stack>

[<c05258cc>] (skb_put) from [<bf0ea5b8>] (macb_start_xmit+0x134/0xad0 [macb])
[<bf0ea5b8>] (macb_start_xmit [macb]) from [<c053e504>] (dev_hard_start_xmit+0x90/0x11c)
[<c053e504>] (dev_hard_start_xmit) from [<c0571180>] (sch_direct_xmit+0x124/0x260)
[<c0571180>] (sch_direct_xmit) from [<c053eae4>] (__dev_queue_xmit+0x4b0/0x6d0)
[<c053eae4>] (__dev_queue_xmit) from [<c05a5650>] (ip_finish_output2+0x350/0x580)
[<c05a5650>] (ip_finish_output2) from [<c05a7e24>] (ip_output+0xb4/0x13c)
[<c05a7e24>] (ip_output) from [<c05a39d0>] (ip_forward+0x474/0x500)
[<c05a39d0>] (ip_forward) from [<c05a13d8>] (ip_sublist_rcv_finish+0x3c/0x50)
[<c05a13d8>] (ip_sublist_rcv_finish) from [<c05a19b8>] (ip_sublist_rcv+0x11c/0x188)
[<c05a19b8>] (ip_sublist_rcv) from [<c05a2494>] (ip_list_rcv+0xf8/0x124)
[<c05a2494>] (ip_list_rcv) from [<c05403c4>] (__netif_receive_skb_list_core+0x1a0/0x20c)
[<c05403c4>] (__netif_receive_skb_list_core) from [<c05405c4>] (netif_receive_skb_list_internal+0x194/0x230)
[<c05405c4>] (netif_receive_skb_list_internal) from [<c0540684>] (gro_normal_list.part.0+0x14/0x28)
[<c0540684>] (gro_normal_list.part.0) from [<c0541280>] (napi_complete_done+0x16c/0x210)
[<c0541280>] (napi_complete_done) from [<bf14c1c0>] (r8152_poll+0x684/0x708 [r8152])
[<bf14c1c0>] (r8152_poll [r8152]) from [<c0541424>] (net_rx_action+0x100/0x328)
[<c0541424>] (net_rx_action) from [<c01012ec>] (__do_softirq+0xec/0x274)
[<c01012ec>] (__do_softirq) from [<c012d6d4>] (irq_exit+0xcc/0xd0)
[<c012d6d4>] (irq_exit) from [<c0160960>] (__handle_domain_irq+0x58/0xa4)
[<c0160960>] (__handle_domain_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
Exception stack(0xc0d01ef0 to 0xc0d01f38)
1ee0:                                     00000000 0000003d 0c31f383 c0d0fa00
1f00: c0d2eb80 00000000 c0d2e630 4dad8c49 4da967b0 0000003d 0000003d 00000000
1f20: fffffff5 c0d01f40 c04e0f88 c04e0f8c 30070013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c04e0f8c>] (cpuidle_enter_state+0x7c/0x378)
[<c04e0f8c>] (cpuidle_enter_state) from [<c04e12c4>] (cpuidle_enter+0x28/0x38)
[<c04e12c4>] (cpuidle_enter) from [<c014f710>] (do_idle+0x194/0x214)
[<c014f710>] (do_idle) from [<c014fa50>] (cpu_startup_entry+0xc/0x14)
[<c014fa50>] (cpu_startup_entry) from [<c0a00dc8>] (start_kernel+0x46c/0x4a0)
Code: e580c054 8a000002 e1a00002 e8bd8070 (e7f001f2)
---[ end trace 146c8a334115490c ]---

The solution was to force nonlinear buffers to be cloned.  This was previously
reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
but never formally submitted as a patch.

This is the third revision, hopefully the formatting is correct this time!

Suggested-by: Klaus Doth <[email protected]>
Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation")
Signed-off-by: Mark Deneen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost fixes from Michael Tsirkin:
"Fixes all over the place.

  A new UAPI is borderline: can also be considered a new feature but
  also seems to be the only way we could come up with to fix addressing
  for userspace - and it seems important to switch to it now before
  userspace making assumptions about addressing ability of devices is
  set in stone"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpasim: allow to assign a MAC address
  vdpasim: fix MAC address configuration
  vdpa: handle irq bypass register failure case
  vdpa_sim: Fix DMA mask
  Revert "vhost-vdpa: fix page pinning leakage in error path"
  vdpa/mlx5: Fix error return in map_direct_mr()
  vhost_vdpa: Return -EFAULT if copy_from_user() fails
  vdpa_sim: implement get_iova_range()
  vhost: vdpa: report iova range
  vdpa: introduce config op to get valid iova range