Git Repo - linux.git/commitdiff
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel...
author Linus Torvalds <[email protected]>
Fri, 24 Feb 2023 01:09:35 +0000 (17:09 -0800)
committer Linus Torvalds <[email protected]>
Fri, 24 Feb 2023 01:09:35 +0000 (17:09 -0800)
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.
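
   As a rough illustration of what this enables from userspace, the sketch
   below creates a non-executable, exec-sealed memfd. It is not taken from
   the series itself; the MFD_NOEXEC_SEAL fallback value is an assumption
   for build hosts whose uapi headers predate the series:

      /* mfd_noexec.c - create a memfd whose execute bit is cleared and sealed */
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <sys/mman.h>
      #include <sys/stat.h>

      #ifndef MFD_NOEXEC_SEAL
      #define MFD_NOEXEC_SEAL 0x0008U   /* assumed value, see uapi memfd.h */
      #endif

      int main(void)
      {
              /* Created with the X bit clear and F_SEAL_EXEC already applied;
               * older kernels reject the flag with EINVAL. */
              int fd = memfd_create("demo", MFD_CLOEXEC | MFD_NOEXEC_SEAL);

              if (fd < 0) {
                      perror("memfd_create");
                      return 1;
              }

              /* With the seal in place, turning the execute bit back on
               * is expected to fail with EPERM. */
              if (fchmod(fd, 0755) < 0)
                      perror("fchmod");

              return 0;
      }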

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folio-ification patch series from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes.

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which performs some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".
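
   A minimal sketch of opting in from userspace follows; it is not part of
   this log, and the fallback constant values are assumptions taken from
   the new uapi prctl definitions in case the build host's headers predate
   them:

      /* mdwe.c - refuse future writable+executable mappings in this process */
      #define _GNU_SOURCE
      #include <errno.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <sys/prctl.h>

      #ifndef PR_SET_MDWE
      #define PR_SET_MDWE 65                  /* assumed value */
      #define PR_MDWE_REFUSE_EXEC_GAIN 1      /* assumed value */
      #endif

      int main(void)
      {
              if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0) < 0) {
                      perror("prctl(PR_SET_MDWE)");   /* EINVAL without MDWE */
                      return 1;
              }

              /* A W+X anonymous mapping should now be refused with EACCES. */
              void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (p == MAP_FAILED)
                      printf("W+X mmap denied: %s\n", strerror(errno));

              return 0;
      }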

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   this series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...

108 files changed:
Documentation/admin-guide/cgroup-v1/memory.rst
Documentation/admin-guide/mm/damon/reclaim.rst
Documentation/admin-guide/mm/hugetlbpage.rst
Documentation/admin-guide/mm/idle_page_tracking.rst
Documentation/admin-guide/mm/numaperf.rst
Documentation/admin-guide/mm/pagemap.rst
Documentation/mm/balance.rst
Documentation/mm/highmem.rst
Documentation/mm/hugetlbfs_reserv.rst
Documentation/mm/page_owner.rst
Documentation/mm/slub.rst
Documentation/mm/transhuge.rst
Documentation/mm/unevictable-lru.rst
Documentation/mm/zsmalloc.rst
Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst
Documentation/translations/zh_CN/mm/page_owner.rst
MAINTAINERS
arch/arm/kernel/process.c
arch/arm64/include/asm/pgtable.h
arch/riscv/include/asm/pgtable.h
arch/s390/include/asm/pgtable.h
arch/x86/entry/vdso/vma.c
arch/x86/mm/pat/memtype.c
drivers/accel/habanalabs/common/memory.c
drivers/accel/habanalabs/gaudi/gaudi.c
drivers/accel/habanalabs/gaudi2/gaudi2.c
drivers/accel/habanalabs/goya/goya.c
drivers/accel/ivpu/ivpu_gem.c
drivers/block/brd.c
drivers/block/zram/zram_drv.c
drivers/crypto/hisilicon/qm.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_process.c
drivers/gpu/drm/drm_gem.c
drivers/gpu/drm/drm_gem_dma_helper.c
drivers/gpu/drm/drm_gem_shmem_helper.c
drivers/gpu/drm/gma500/framebuffer.c
drivers/gpu/drm/i915/gem/i915_gem_mman.c
drivers/gpu/drm/mediatek/mtk_drm_gem.c
drivers/gpu/drm/omapdrm/omap_gem.c
drivers/gpu/drm/ttm/ttm_bo_vm.c
drivers/infiniband/hw/hfi1/file_ops.c
drivers/infiniband/hw/mlx5/main.c
drivers/video/fbdev/core/fb_defio.c
fs/afs/write.c
fs/btrfs/extent_io.c
fs/buffer.c
fs/ceph/addr.c
fs/cifs/file.c
fs/coredump.c
fs/erofs/data.c
fs/exec.c
fs/ext4/inode.c
fs/ext4/super.c
fs/f2fs/data.c
fs/fuse/file.c
fs/gfs2/aops.c
fs/gfs2/glops.c
fs/gfs2/log.c
fs/hugetlbfs/inode.c
fs/iomap/buffered-io.c
fs/mpage.c
fs/nfs/write.c
fs/ntfs3/inode.c
fs/orangefs/file.c
fs/orangefs/inode.c
fs/ramfs/file-nommu.c
fs/udf/inode.c
fs/xfs/xfs_file.c
include/linux/blkdev.h
include/linux/fs.h
include/linux/hugetlb.h
include/linux/memcontrol.h
include/linux/mm.h
include/linux/mm_types.h
include/linux/pagemap.h
init/main.c
io_uring/io_uring.c
kernel/bpf/syscall.c
kernel/events/core.c
kernel/fork.c
kernel/pid_namespace.c
kernel/sched/fair.c
kernel/sys.c
lib/Kconfig.debug
mm/compaction.c
mm/filemap.c
mm/huge_memory.c
mm/internal.h
mm/kasan/kasan.h
mm/khugepaged.c
mm/madvise.c
mm/memcontrol.c
mm/migrate.c
mm/page_alloc.c
mm/page_io.c
mm/secretmem.c
mm/shmem.c
mm/slab.c
mm/slub.c
mm/swap.c
mm/swapfile.c
mm/vmalloc.c
net/ipv4/tcp.c
tools/objtool/check.c
tools/testing/selftests/Makefile
tools/testing/selftests/mm/Makefile

diff --cc Documentation/admin-guide/cgroup-v1/memory.rst
index 27d89495ac880a5acc43b97ab5f76c483373a81d,258e45cc3b2db1866e005fbe2e7da09f5ed67195..47d1d7d932a82be09b072854ee8de609823e4fe0
@@@ -2,18 -2,18 +2,18 @@@
  Memory Resource Controller
  ==========================
  
 -NOTE:
 +.. caution::
        This document is hopelessly outdated and it asks for a complete
        rewrite. It still contains a useful information so we are keeping it
        here but make sure to check the current code if you need a deeper
        understanding.
  
 -NOTE:
 +.. note::
        The Memory Resource Controller has generically been referred to as the
        memory controller in this document. Do not confuse memory controller
        used here with the memory controller that is used in hardware.
  
 -(For editors) In this document:
 +.. hint::
        When we mention a cgroup (cgroupfs's directory) with memory controller,
        we call it "memory cgroup". When you see git-log and source code, you'll
        see patch's title and function names tend to use "memcg".
@@@ -23,7 -23,7 +23,7 @@@ Benefits and Purpose of the memory cont
  =============================================
  
  The memory controller isolates the memory behaviour of a group of tasks
 -from the rest of the system. The article on LWN [12] mentions some probable
 +from the rest of the system. The article on LWN [12]_ mentions some probable
  uses of the memory controller. The memory controller can be used to
  
  a. Isolate an application or a group of applications
@@@ -55,8 -55,7 +55,8 @@@ Features
   - Root cgroup has no limit controls.
  
   Kernel memory support is a work in progress, and the current version provides
 - basically functionality. (See Section 2.7)
 + basically functionality. (See :ref:`section 2.7
 + <cgroup-v1-memory-kernel-extension>`)
  
  Brief summary of control files.
  
@@@ -87,6 -86,8 +87,8 @@@
   memory.swappiness                 set/show swappiness parameter of vmscan
                                     (See sysctl's vm.swappiness)
   memory.move_charge_at_immigrate     set/show controls of moving charges
+                                      This knob is deprecated and shouldn't be
+                                      used.
   memory.oom_control                set/show oom controls.
   memory.numa_stat                  show the number of memory usage per numa
                                     node
  ==========
  
  The memory controller has a long history. A request for comments for the memory
 -controller was posted by Balbir Singh [1]. At the time the RFC was posted
 +controller was posted by Balbir Singh [1]_. At the time the RFC was posted
  there were several implementations for memory control. The goal of the
  RFC was to build consensus and agreement for the minimal features required
 -for memory control. The first RSS controller was posted by Balbir Singh[2]
 -in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
 -RSS controller. At OLS, at the resource management BoF, everyone suggested
 -that we handle both page cache and RSS together. Another request was raised
 -to allow user space handling of OOM. The current memory controller is
 +for memory control. The first RSS controller was posted by Balbir Singh [2]_
 +in Feb 2007. Pavel Emelianov [3]_ [4]_ [5]_ has since posted three versions
 +of the RSS controller. At OLS, at the resource management BoF, everyone
 +suggested that we handle both page cache and RSS together. Another request was
 +raised to allow user space handling of OOM. The current memory controller is
  at version 6; it combines both mapped (RSS) and unmapped Page
 -Cache Control [11].
 +Cache Control [11]_.
  
  2. Memory Control
  =================
@@@ -148,8 -149,7 +150,8 @@@ specific data structure (mem_cgroup) as
  2.2. Accounting
  ---------------
  
 -::
 +.. code-block::
 +   :caption: Figure 1: Hierarchy of Accounting
  
                +--------------------+
                |  mem_cgroup        |
             |               |           |               |
             +---------------+           +---------------+
  
 -             (Figure 1: Hierarchy of Accounting)
  
  
  Figure 1 shows the important aspects of the controller
@@@ -222,9 -223,8 +224,9 @@@ behind this approach is that a cgroup t
  page will eventually get charged for it (once it is uncharged from
  the cgroup that brought it in -- this will happen on memory pressure).
  
 -But see section 8.2: when moving a task to another cgroup, its pages may
 -be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
 +But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
 +task to another cgroup, its pages may be recharged to the new cgroup, if
 +move_charge_at_immigrate has been chosen.
  
  2.4 Swap Extension
  --------------------------------------
@@@ -246,8 -246,7 +248,8 @@@ In this case, setting memsw.limit_in_by
  By using the memsw limit, you can avoid system OOM which can be caused by swap
  shortage.
  
 -**why 'memory+swap' rather than swap**
 +2.4.1 why 'memory+swap' rather than swap
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
  The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
  to move account from memory to swap...there is no change in usage of
@@@ -255,8 -254,7 +257,8 @@@ memory+swap. In other words, when we wa
  affecting global LRU, memory+swap limit is better than just limiting swap from
  an OS point of view.
  
 -**What happens when a cgroup hits memory.memsw.limit_in_bytes**
 +2.4.2. What happens when a cgroup hits memory.memsw.limit_in_bytes
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
  When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
  in this cgroup. Then, swap-out will not be done by cgroup routine and file
@@@ -272,26 -270,26 +274,26 @@@ global VM. When a cgroup goes over its 
  to reclaim memory from the cgroup so as to make space for the new
  pages that the cgroup has touched. If the reclaim is unsuccessful,
  an OOM routine is invoked to select and kill the bulkiest task in the
 -cgroup. (See 10. OOM Control below.)
 +cgroup. (See :ref:`10. OOM Control <cgroup-v1-memory-oom-control>` below.)
  
  The reclaim algorithm has not been modified for cgroups, except that
  pages that are selected for reclaiming come from the per-cgroup LRU
  list.
  
 -NOTE:
 -  Reclaim does not work for the root cgroup, since we cannot set any
 -  limits on the root cgroup.
 +.. note::
 +   Reclaim does not work for the root cgroup, since we cannot set any
 +   limits on the root cgroup.
  
 -Note2:
 -  When panic_on_oom is set to "2", the whole system will panic.
 +.. note::
 +   When panic_on_oom is set to "2", the whole system will panic.
  
  When oom event notifier is registered, event will be delivered.
 -(See oom_control section)
 +(See :ref:`oom_control <cgroup-v1-memory-oom-control>` section)
  
  2.6 Locking
  -----------
  
 -Lock order is as follows:
 +Lock order is as follows::
  
    Page lock (PG_locked bit of page->flags)
      mm->page_table_lock or split pte_lock
@@@ -303,8 -301,6 +305,8 @@@ Per-node-per-memcgroup LRU (cgroup's pr
  lruvec->lru_lock; PG_lru bit of page->flags is cleared before
  isolating a page from its LRU under lruvec->lru_lock.
  
 +.. _cgroup-v1-memory-kernel-extension:
 +
  2.7 Kernel Memory Extension
  -----------------------------------------------
  
@@@ -373,10 -369,10 +375,10 @@@ U != 0, K < U
      never greater than the total memory, and freely set U at the cost of his
      QoS.
  
 -WARNING:
 -    In the current implementation, memory reclaim will NOT be
 -    triggered for a cgroup when it hits K while staying below U, which makes
 -    this setup impractical.
 +    .. warning::
 +       In the current implementation, memory reclaim will NOT be triggered for
 +       a cgroup when it hits K while staying below U, which makes this setup
 +       impractical.
  
  U != 0, K >= U:
      Since kmem charges will also be fed to the user counter and reclaim will be
  3. User Interface
  =================
  
 -3.0. Configuration
 -------------------
 -
 -a. Enable CONFIG_CGROUPS
 -b. Enable CONFIG_MEMCG
 -
 -3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 --------------------------------------------------------------------
 +To use the user interface:
  
 -::
 +1. Enable CONFIG_CGROUPS and CONFIG_MEMCG options
 +2. Prepare the cgroups (see :ref:`Why are cgroups needed?
 +   <cgroups-why-needed>` for the background information)::
  
        # mount -t tmpfs none /sys/fs/cgroup
        # mkdir /sys/fs/cgroup/memory
        # mount -t cgroup none /sys/fs/cgroup/memory -o memory
  
 -3.2. Make the new group and move bash into it::
 +3. Make the new group and move bash into it::
  
        # mkdir /sys/fs/cgroup/memory/0
        # echo $$ > /sys/fs/cgroup/memory/0/tasks
  
 -Since now we're in the 0 cgroup, we can alter the memory limit::
 +4. Since now we're in the 0 cgroup, we can alter the memory limit::
  
        # echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
  
 -NOTE:
 -  We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
 -  mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
 -  Gibibytes.)
 +   The limit can now be queried::
  
 -NOTE:
 -  We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
 +      # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
 +      4194304
  
 -NOTE:
 -  We cannot set limits on the root cgroup any more.
 +.. note::
 +   We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
 +   mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
 +   Gibibytes.)
  
 -::
 +.. note::
 +   We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
 +
 +.. note::
 +   We cannot set limits on the root cgroup any more.
  
 -  # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
 -  4194304
  
  We can check the usage::
  
@@@ -460,8 -460,6 +462,8 @@@ test because it has noise of shared obj
  But the above two are testing extreme situations.
  Trying usual test under memory controller is always helpful.
  
 +.. _cgroup-v1-memory-test-troubleshoot:
 +
  4.1 Troubleshooting
  -------------------
  
@@@ -474,11 -472,8 +476,11 @@@ terminated by the OOM killer. There ar
  A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
  some of the pages cached in the cgroup (page cache pages).
  
 -To know what happens, disabling OOM_Kill as per "10. OOM Control" (below) and
 -seeing what happens will be helpful.
 +To know what happens, disabling OOM_Kill as per :ref:`"10. OOM Control"
 +<cgroup-v1-memory-oom-control>` (below) and seeing what happens will be
 +helpful.
 +
 +.. _cgroup-v1-memory-test-task-migration:
  
  4.2 Task migration
  ------------------
@@@ -489,16 -484,15 +491,16 @@@ remain charged to it, the charge is dro
  reclaimed.
  
  You can move charges of a task along with task migration.
 -See 8. "Move charges at task migration"
 +See :ref:`8. "Move charges at task migration" <cgroup-v1-memory-move-charges>`
  
  4.3 Removing a cgroup
  ---------------------
  
 -A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
 -cgroup might have some charge associated with it, even though all
 -tasks have migrated away from it. (because we charge against pages, not
 -against tasks.)
 +A cgroup can be removed by rmdir, but as discussed in :ref:`sections 4.1
 +<cgroup-v1-memory-test-troubleshoot>` and :ref:`4.2
 +<cgroup-v1-memory-test-task-migration>`, a cgroup might have some charge
 +associated with it, even though all tasks have migrated away from it. (because
 +we charge against pages, not against tasks.)
  
  We move the stats to parent, and no change on the charge except uncharging
  from the child.
@@@ -527,66 -521,67 +529,66 @@@ will be charged as a new owner of it
  5.2 stat file
  -------------
  
 -memory.stat file includes following statistics
 -
 -per-memory cgroup local status
 -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 -
 -=============== ===============================================================
 -cache         # of bytes of page cache memory.
 -rss           # of bytes of anonymous and swap cache memory (includes
 -              transparent hugepages).
 -rss_huge      # of bytes of anonymous transparent hugepages.
 -mapped_file   # of bytes of mapped file (includes tmpfs/shmem)
 -pgpgin                # of charging events to the memory cgroup. The charging
 -              event happens each time a page is accounted as either mapped
 -              anon page(RSS) or cache page(Page Cache) to the cgroup.
 -pgpgout               # of uncharging events to the memory cgroup. The uncharging
 -              event happens each time a page is unaccounted from the cgroup.
 -swap          # of bytes of swap usage
 -dirty         # of bytes that are waiting to get written back to the disk.
 -writeback     # of bytes of file/anon cache that are queued for syncing to
 -              disk.
 -inactive_anon # of bytes of anonymous and swap cache memory on inactive
 -              LRU list.
 -active_anon   # of bytes of anonymous and swap cache memory on active
 -              LRU list.
 -inactive_file # of bytes of file-backed memory and MADV_FREE anonymous memory(
 -                LazyFree pages) on inactive LRU list.
 -active_file   # of bytes of file-backed memory on active LRU list.
 -unevictable   # of bytes of memory that cannot be reclaimed (mlocked etc).
 -=============== ===============================================================
 -
 -status considering hierarchy (see memory.use_hierarchy settings)
 -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 -
 -========================= ===================================================
 -hierarchical_memory_limit # of bytes of memory limit with regard to hierarchy
 -                        under which the memory cgroup is
 -hierarchical_memsw_limit  # of bytes of memory+swap limit with regard to
 -                        hierarchy under which memory cgroup is.
 -
 -total_<counter>                 # hierarchical version of <counter>, which in
 -                        addition to the cgroup's own value includes the
 -                        sum of all hierarchical children's values of
 -                        <counter>, i.e. total_cache
 -========================= ===================================================
 -
 -The following additional stats are dependent on CONFIG_DEBUG_VM
 -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 -
 -========================= ========================================
 -recent_rotated_anon     VM internal parameter. (see mm/vmscan.c)
 -recent_rotated_file     VM internal parameter. (see mm/vmscan.c)
 -recent_scanned_anon     VM internal parameter. (see mm/vmscan.c)
 -recent_scanned_file     VM internal parameter. (see mm/vmscan.c)
 -========================= ========================================
 -
 -Memo:
 +memory.stat file includes following statistics:
 +
 +  * per-memory cgroup local status
 +
 +    =============== ===============================================================
 +    cache           # of bytes of page cache memory.
 +    rss             # of bytes of anonymous and swap cache memory (includes
 +                    transparent hugepages).
 +    rss_huge        # of bytes of anonymous transparent hugepages.
 +    mapped_file     # of bytes of mapped file (includes tmpfs/shmem)
 +    pgpgin          # of charging events to the memory cgroup. The charging
 +                    event happens each time a page is accounted as either mapped
 +                    anon page(RSS) or cache page(Page Cache) to the cgroup.
 +    pgpgout         # of uncharging events to the memory cgroup. The uncharging
 +                    event happens each time a page is unaccounted from the
 +                    cgroup.
 +    swap            # of bytes of swap usage
 +    dirty           # of bytes that are waiting to get written back to the disk.
 +    writeback       # of bytes of file/anon cache that are queued for syncing to
 +                    disk.
 +    inactive_anon   # of bytes of anonymous and swap cache memory on inactive
 +                    LRU list.
 +    active_anon     # of bytes of anonymous and swap cache memory on active
 +                    LRU list.
 +    inactive_file   # of bytes of file-backed memory and MADV_FREE anonymous
 +                    memory (LazyFree pages) on inactive LRU list.
 +    active_file     # of bytes of file-backed memory on active LRU list.
 +    unevictable     # of bytes of memory that cannot be reclaimed (mlocked etc).
 +    =============== ===============================================================
 +
 +  * status considering hierarchy (see memory.use_hierarchy settings):
 +
 +    ========================= ===================================================
 +    hierarchical_memory_limit # of bytes of memory limit with regard to
 +                              hierarchy
 +                              under which the memory cgroup is
 +    hierarchical_memsw_limit  # of bytes of memory+swap limit with regard to
 +                              hierarchy under which memory cgroup is.
 +
 +    total_<counter>           # hierarchical version of <counter>, which in
 +                              addition to the cgroup's own value includes the
 +                              sum of all hierarchical children's values of
 +                              <counter>, i.e. total_cache
 +    ========================= ===================================================
 +
 +  * additional vm parameters (depends on CONFIG_DEBUG_VM):
 +
 +    ========================= ========================================
 +    recent_rotated_anon       VM internal parameter. (see mm/vmscan.c)
 +    recent_rotated_file       VM internal parameter. (see mm/vmscan.c)
 +    recent_scanned_anon       VM internal parameter. (see mm/vmscan.c)
 +    recent_scanned_file       VM internal parameter. (see mm/vmscan.c)
 +    ========================= ========================================
 +
 +.. hint::
        recent_rotated means recent frequency of LRU rotation.
        recent_scanned means recent # of scans to LRU.
        showing for better debug please see the code for meanings.
  
 -Note:
 +.. note::
        Only anonymous and swap cache memory is listed as part of 'rss' stat.
        This should not be confused with the true 'resident set size' or the
        amount of physical memory used by the cgroup.
@@@ -717,18 -712,22 +719,25 @@@ If we want to change this to 1G, we ca
  
        # echo 1G > memory.soft_limit_in_bytes
  
 -NOTE1:
 +.. note::
         Soft limits take effect over a long period of time, since they involve
         reclaiming memory for balancing between memory cgroups
 -NOTE2:
 +
 +.. note::
         It is recommended to set the soft limit always below the hard limit,
         otherwise the hard limit will take precedence.
  
- 8. Move charges at task migration
- =================================
 +.. _cgroup-v1-memory-move-charges:
 +
+ 8. Move charges at task migration (DEPRECATED!)
+ ===============================================
+ THIS IS DEPRECATED!
+ It's expensive and unreliable! It's better practice to launch workload
+ tasks directly from inside their target cgroup. Use dedicated workload
+ cgroups to allow fine-grained policy adjustments without having to
+ move physical pages between control domains.
  
  Users can move charges associated with a task along with task migration, that
  is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
@@@ -745,29 -744,23 +754,29 @@@ If you want to enable it:
  
        # echo (some positive value) > memory.move_charge_at_immigrate
  
 -Note:
 +.. note::
        Each bits of move_charge_at_immigrate has its own meaning about what type
 -      of charges should be moved. See 8.2 for details.
 -Note:
 +      of charges should be moved. See :ref:`section 8.2
 +      <cgroup-v1-memory-movable-charges>` for details.
 +
 +.. note::
        Charges are moved only when you move mm->owner, in other words,
        a leader of a thread group.
 -Note:
 +
 +.. note::
        If we cannot find enough space for the task in the destination cgroup, we
        try to make space by reclaiming memory. Task migration may fail if we
        cannot make enough space.
 -Note:
 +
 +.. note::
        It can take several seconds if you move charges much.
  
  And if you want disable it again::
  
        # echo 0 > memory.move_charge_at_immigrate
  
 +.. _cgroup-v1-memory-movable-charges:
 +
  8.2 Type of charges which can be moved
  --------------------------------------
  
@@@ -817,8 -810,6 +826,8 @@@ threshold in any direction
  
  It's applicable for root and non-root cgroup.
  
 +.. _cgroup-v1-memory-oom-control:
 +
  10. OOM Control
  ===============
  
@@@ -974,16 -965,15 +983,16 @@@ commented and discussed quite extensive
  References
  ==========
  
 -1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
 -2. Singh, Balbir. Memory Controller (RSS Control),
 +.. [1] Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
 +.. [2] Singh, Balbir. Memory Controller (RSS Control),
     http://lwn.net/Articles/222762/
 -3. Emelianov, Pavel. Resource controllers based on process cgroups
 +.. [3] Emelianov, Pavel. Resource controllers based on process cgroups
     https://lore.kernel.org/r/[email protected]
 -4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
 +.. [4] Emelianov, Pavel. RSS controller based on process cgroups (v2)
     https://lore.kernel.org/r/[email protected]
 -5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
 +.. [5] Emelianov, Pavel. RSS controller based on process cgroups (v3)
     https://lore.kernel.org/r/[email protected]
 +
  6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
  7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
     subsystem (v3), http://lwn.net/Articles/235534/
     https://lore.kernel.org/r/[email protected]
  10. Singh, Balbir. Memory controller v6 test results,
      https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop
 -11. Singh, Balbir. Memory controller introduction (v6),
 -    https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop
 -12. Corbet, Jonathan, Controlling memory use in cgroups,
 -    http://lwn.net/Articles/243795/
 +
 +.. [11] Singh, Balbir. Memory controller introduction (v6),
 +   https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop
 +.. [12] Corbet, Jonathan, Controlling memory use in cgroups,
 +   http://lwn.net/Articles/243795/
diff --cc Documentation/admin-guide/mm/damon/reclaim.rst
index d2ccd9c21b9a444123f5a371897e96af5ea0b12d,ff335e96e0d84ab337867e71e348ad6e28ea17cb..343e25b252f430f687c6549437b3f15fec86d261
@@@ -46,7 -46,7 +46,7 @@@ that is built with ``CONFIG_DAMON_RECLA
  To let sysadmins enable or disable it and tune for the given system,
  DAMON_RECLAIM utilizes module parameters.  That is, you can put
  ``damon_reclaim.<parameter>=<value>`` on the kernel boot command line or write
 -proper values to ``/sys/modules/damon_reclaim/parameters/<parameter>`` files.
 +proper values to ``/sys/module/damon_reclaim/parameters/<parameter>`` files.
  
  Below are the description of each parameter.
  
@@@ -205,6 -205,15 +205,15 @@@ The end physical address of memory regi
  against.  That is, DAMON_RECLAIM will find cold memory regions in this region
  and reclaims.  By default, biggest System RAM is used as the region.
  
+ skip_anon
+ ---------
+ Skip anonymous pages reclamation.
+ If this parameter is set as ``Y``, DAMON_RECLAIM does not reclaim anonymous
+ pages.  By default, ``N``.
  kdamond_pid
  -----------
  
@@@ -251,7 -260,7 +260,7 @@@ therefore the free memory rate becomes 
  do nothing again, so that we can fall back to the LRU-list based page
  granularity reclamation. ::
  
 -    # cd /sys/modules/damon_reclaim/parameters
 +    # cd /sys/module/damon_reclaim/parameters
      # echo 30000000 > min_age
      # echo $((1 * 1024 * 1024 * 1024)) > quota_sz
      # echo 1000 > quota_reset_interval_ms
diff --cc Documentation/admin-guide/mm/hugetlbpage.rst
index bca00cb6f43aae1f2862a202395a98a9ee231242,a969a2c742b212dd5657f024e9776cc792fc0e00..e4d4b4a8dc97361719053b467c24cd475eba1a14
@@@ -1,3 -1,5 +1,3 @@@
 -.. _hugetlbpage:
 -
  =============
  HugeTLB Pages
  =============
@@@ -84,7 -86,7 +84,7 @@@ by increasing or decreasing the value o
  
  Note: When the feature of freeing unused vmemmap pages associated with each
  hugetlb page is enabled, we can fail to free the huge pages triggered by
 -the user when ths system is under memory pressure.  Please try again later.
 +the user when the system is under memory pressure.  Please try again later.
  
  Pages that are used as huge pages are reserved inside the kernel and cannot
  be used for other purposes.  Huge pages cannot be swapped out under
@@@ -311,7 -313,7 +311,7 @@@ memory policy mode--bind, preferred, lo
  resulting effect on persistent huge page allocation is as follows:
  
  #. Regardless of mempolicy mode [see
 -   :ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`],
 +   Documentation/admin-guide/mm/numa_memory_policy.rst],
     persistent huge pages will be distributed across the node or nodes
     specified in the mempolicy as if "interleave" had been specified.
     However, if a node in the policy does not contain sufficient contiguous
@@@ -459,13 -461,13 +459,13 @@@ Example
  .. _map_hugetlb:
  
  ``map_hugetlb``
-       see tools/testing/selftests/vm/map_hugetlb.c
+       see tools/testing/selftests/mm/map_hugetlb.c
  
  ``hugepage-shm``
-       see tools/testing/selftests/vm/hugepage-shm.c
+       see tools/testing/selftests/mm/hugepage-shm.c
  
  ``hugepage-mmap``
-       see tools/testing/selftests/vm/hugepage-mmap.c
+       see tools/testing/selftests/mm/hugepage-mmap.c
  
  The `libhugetlbfs`_  library provides a wide range of userspace tools
  to help with huge page usability, environment setup, and control.
diff --cc Documentation/admin-guide/mm/idle_page_tracking.rst
index b5a285bd73fd376eefc6dd53cc776a989eaa40d4,19492064278ce6a22bb7ff3f14835ad6b307145c..16fcf38dac56d520bb75c9ea0521df4dfef54c33
@@@ -1,3 -1,5 +1,3 @@@
 -.. _idle_page_tracking:
 -
  ==================
  Idle Page Tracking
  ==================
@@@ -63,13 -65,14 +63,13 @@@ workload one should
      are not reclaimable, he or she can filter them out using
      ``/proc/kpageflags``.
  
- The page-types tool in the tools/vm directory can be used to assist in this.
+ The page-types tool in the tools/mm directory can be used to assist in this.
  If the tool is run initially with the appropriate option, it will mark all the
  queried pages as idle.  Subsequent runs of the tool can then show which pages have
  their idle flag cleared in the interim.
  
 -See :ref:`Documentation/admin-guide/mm/pagemap.rst <pagemap>` for more
 -information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and
 -``/proc/kpagecgroup``.
 +See Documentation/admin-guide/mm/pagemap.rst for more information about
 +``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``.
  
  .. _impl_details:
  
diff --cc Documentation/admin-guide/mm/numaperf.rst
index 24e63e740420c34c9832dcd7bd53b0de4c0dcd63,544a6d16c80152c4bd80496c021cde58d31153b0..90a12b6a8bfc0880bf526e1a39cbc62747379ea2
@@@ -1,4 -1,9 +1,7 @@@
- =============
 -.. _numaperf:
 -
+ =======================
+ NUMA Memory Performance
+ =======================
  NUMA Locality
  =============
  
@@@ -59,7 -64,6 +62,6 @@@ that are CPUs and hence suitable for ge
  IO initiators such as GPUs and NICs.  Unlike access class 0, only
  nodes containing CPUs are considered.
  
- ================
  NUMA Performance
  ================
  
@@@ -94,7 -98,6 +96,6 @@@ for the platform
  Access class 1 takes the same form but only includes values for CPU to
  memory activity.
  
- ==========
  NUMA Cache
  ==========
  
@@@ -168,7 -171,6 +169,6 @@@ The "size" is the number of bytes provi
  The "write_policy" will be 0 for write-back, and non-zero for
  write-through caching.
  
- ========
  See Also
  ========
  
diff --cc Documentation/admin-guide/mm/pagemap.rst
index 1a22674ab18ea50dd1099507d51689e0b2308227,ceb5da3172ba902451ea09f854878663fdf806ba..b5f970dc91e7045bd903238b8045732846c8d47f
@@@ -1,3 -1,5 +1,3 @@@
 -.. _pagemap:
 -
  =============================
  Examining Process Page Tables
  =============================
@@@ -17,10 -19,10 +17,10 @@@ There are four components to pagemap
      * Bits 0-4   swap type if swapped
      * Bits 5-54  swap offset if swapped
      * Bit  55    pte is soft-dirty (see
 -      :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
 +      Documentation/admin-guide/mm/soft-dirty.rst)
      * Bit  56    page exclusively mapped (since 4.2)
      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
 -      :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
 +      Documentation/admin-guide/mm/userfaultfd.rst)
      * Bits 58-60 zero
      * Bit  61    page is file-page or shared-anon (since 3.5)
      * Bit  62    page swapped
@@@ -44,7 -46,7 +44,7 @@@
   * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
     times each page is mapped, indexed by PFN.
  
- The page-types tool in the tools/vm directory can be used to query the
+ The page-types tool in the tools/mm directory can be used to query the
  number of times a page is mapped.
  
   * ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
@@@ -103,7 -105,8 +103,7 @@@ Short descriptions to the page flag
      A compound page with order N consists of 2^N physically contiguous pages.
      A compound page with order 2 takes the form of "HTTT", where H donates its
      head page and T donates its tail page(s).  The major consumers of compound
 -    pages are hugeTLB pages
 -    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
 +    pages are hugeTLB pages (Documentation/admin-guide/mm/hugetlbpage.rst),
      the SLUB etc.  memory allocators and various device drivers.
      However in this interface, only huge/giga pages are made visible
      to end users.
      Zero page for pfn_zero or huge_zero page.
  25 - IDLE
      The page has not been accessed since it was marked idle (see
 -    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
 +    Documentation/admin-guide/mm/idle_page_tracking.rst).
      Note that this flag may be stale in case the page was accessed via
      a PTE. To make sure the flag is up-to-date one has to read
      ``/sys/kernel/mm/page_idle/bitmap`` first.
@@@ -170,7 -173,7 +170,7 @@@ LRU related page flag
  14 - SWAPBACKED
     The page is backed by swap/RAM.
  
- The page-types tool in the tools/vm directory can be used to query the
+ The page-types tool in the tools/mm directory can be used to query the
  above flags.
  
  Using pagemap to do something useful
diff --cc Documentation/mm/balance.rst
index 6cd0127154ac467b665d4ce6fb33c98c0fc92282,e38e9d83c1c72be024ac2f05925a8db0fd2c262f..abaa78561c3133c99bed3f6feb52d576c2a3ac4f
@@@ -1,10 -1,12 +1,10 @@@
 -.. _balance:
 -
  ================
  Memory Balancing
  ================
  
  Started Jan 2000 by Kanoj Sarcar <[email protected]>
  
- Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+ Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
  well as for non __GFP_IO allocations.
  
  The first reason why a caller may avoid reclaim is that the caller can not
diff --cc Documentation/mm/highmem.rst
index bb3f90e195fa580ec329a6bf2a6aef402fb80b96,4503868b0865f744201464d56d6db1a84022af2f..c964e084870282994b5b1375062857e43fa7c391
@@@ -1,3 -1,5 +1,3 @@@
 -.. _highmem:
 -
  ====================
  High Memory Handling
  ====================
@@@ -55,7 -57,8 +55,8 @@@ list shows them in order of preference 
    It can be invoked from any context (including interrupts) but the mappings
    can only be used in the context which acquired them.
  
-   This function should be preferred, where feasible, over all the others.
+   This function should always be used, whereas kmap_atomic() and kmap() have
+   been deprecated.
  
    These mappings are thread-local and CPU-local, meaning that the mapping
    can only be accessed from within this thread and the thread is bound to the
@@@ -80,7 -83,7 +81,7 @@@
    for pages which are known to not come from ZONE_HIGHMEM. However, it is
    always safe to use kmap_local_page() / kunmap_local().
  
-   While it is significantly faster than kmap(), for the higmem case it
+   While it is significantly faster than kmap(), for the highmem case it
    comes with restrictions about the pointers validity. Contrary to kmap()
    mappings, the local mappings are only valid in the context of the caller
    and cannot be handed to other contexts. This implies that users must
    (included in the "Functions" section) for details on how to manage nested
    mappings.
  
- * kmap_atomic().  This permits a very short duration mapping of a single
-   page.  Since the mapping is restricted to the CPU that issued it, it
-   performs well, but the issuing task is therefore required to stay on that
-   CPU until it has finished, lest some other task displace its mappings.
+ * kmap_atomic(). This function has been deprecated; use kmap_local_page().
+   NOTE: Conversions to kmap_local_page() must take care to follow the mapping
+   restrictions imposed on kmap_local_page(). Furthermore, the code between
+   calls to kmap_atomic() and kunmap_atomic() may implicitly depend on the side
+   effects of atomic mappings, i.e. disabling page faults or preemption, or both.
+   In that case, explicit calls to pagefault_disable() or preempt_disable() or
+   both must be made in conjunction with the use of kmap_local_page().
+   [Legacy documentation]
+   This permits a very short duration mapping of a single page.  Since the
+   mapping is restricted to the CPU that issued it, it performs well, but
+   the issuing task is therefore required to stay on that CPU until it has
+   finished, lest some other task displace its mappings.
  
    kmap_atomic() may also be used by interrupt contexts, since it does not
    sleep and the callers too may not sleep until after kunmap_atomic() is
  
    It is assumed that k[un]map_atomic() won't fail.
  
- * kmap().  This should be used to make short duration mapping of a single
-   page with no restrictions on preemption or migration. It comes with an
-   overhead as mapping space is restricted and protected by a global lock
-   for synchronization. When mapping is no longer needed, the address that
-   the page was mapped to must be released with kunmap().
+ * kmap(). This function has been deprecated; use kmap_local_page().
+   NOTE: Conversions to kmap_local_page() must take care to follow the mapping
+   restrictions imposed on kmap_local_page(). In particular, it is necessary to
+   make sure that the kernel virtual memory pointer is only valid in the thread
+   that obtained it.
+   [Legacy documentation]
+   This should be used to make short duration mapping of a single page with no
+   restrictions on preemption or migration. It comes with an overhead as mapping
+   space is restricted and protected by a global lock for synchronization. When
+   mapping is no longer needed, the address that the page was mapped to must be
+   released with kunmap().
  
    Mapping changes must be propagated across all the CPUs. kmap() also
    requires global TLB invalidation when the kmap's pool wraps and it might
diff --cc Documentation/mm/hugetlbfs_reserv.rst
index 05a44760da323b376447e9747546969c15b4d05b,611728c49bff172360aca25501f0f80a98caefed..3d05d64de9b463ac1caa68c5d1d46487ab9f75bc
@@@ -1,3 -1,5 +1,3 @@@
 -.. _hugetlbfs_reserve:
 -
  =====================
  Hugetlbfs Reservation
  =====================
@@@ -5,7 -7,7 +5,7 @@@
  Overview
  ========
  
 -Huge pages as described at :ref:`hugetlbpage` are typically
 +Huge pages as described at Documentation/mm/hugetlbpage.rst are typically
  preallocated for application use.  These huge pages are instantiated in a
  task's address space at page fault time if the VMA indicates huge pages are
  to be used.  If no huge page exists at page fault time, the task is sent
@@@ -179,14 -181,14 +179,14 @@@ Consuming Reservations/Allocating a Hug
  
  Reservations are consumed when huge pages associated with the reservations
  are allocated and instantiated in the corresponding mapping.  The allocation
- is performed within the routine alloc_huge_page()::
+ is performed within the routine alloc_hugetlb_folio()::
  
-       struct page *alloc_huge_page(struct vm_area_struct *vma,
+       struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
                                     unsigned long addr, int avoid_reserve)
  
- alloc_huge_page is passed a VMA pointer and a virtual address, so it can
+ alloc_hugetlb_folio is passed a VMA pointer and a virtual address, so it can
  consult the reservation map to determine if a reservation exists.  In addition,
- alloc_huge_page takes the argument avoid_reserve which indicates reserves
+ alloc_hugetlb_folio takes the argument avoid_reserve which indicates reserves
  should not be used even if it appears they have been set aside for the
  specified address.  The avoid_reserve argument is most often used in the case
  of Copy on Write and Page Migration where additional copies of an existing
@@@ -206,7 -208,8 +206,8 @@@ a reservation for the allocation.  Afte
  exists and can be used for the allocation, the routine dequeue_huge_page_vma()
  is called.  This routine takes two arguments related to reservations:
  
- - avoid_reserve, this is the same value/argument passed to alloc_huge_page()
+ - avoid_reserve, this is the same value/argument passed to
+   alloc_hugetlb_folio().
  - chg, even though this argument is of type long only the values 0 or 1 are
    passed to dequeue_huge_page_vma.  If the value is 0, it indicates a
    reservation exists (see the section "Memory Policy and Reservations" for
@@@ -231,9 -234,9 +232,9 @@@ the scope reservations.  Even if a surp
  reservation based adjustments as above will be made: SetPagePrivate(page) and
  resv_huge_pages--.
  
- After obtaining a new huge page, (page)->private is set to the value of
- the subpool associated with the page if it exists.  This will be used for
- subpool accounting when the page is freed.
+ After obtaining a new hugetlb folio, (folio)->_hugetlb_subpool is set to the
+ value of the subpool associated with the page if it exists.  This will be used
+ for subpool accounting when the folio is freed.
  
  The routine vma_commit_reservation() is then called to adjust the reserve
  map based on the consumption of the reservation.  In general, this involves
@@@ -244,8 -247,8 +245,8 @@@ was no reservation in a shared mapping 
  entry must be created.
  
  It is possible that the reserve map could have been changed between the call
- to vma_needs_reservation() at the beginning of alloc_huge_page() and the
- call to vma_commit_reservation() after the page was allocated.  This would
+ to vma_needs_reservation() at the beginning of alloc_hugetlb_folio() and the
+ call to vma_commit_reservation() after the folio was allocated.  This would
  be possible if hugetlb_reserve_pages was called for the same page in a shared
  mapping.  In such cases, the reservation count and subpool free page count
  will be off by one.  This rare condition can be identified by comparing the
diff --cc Documentation/mm/page_owner.rst
index e8d5090a9e6b9a8a9c8287e6019928065fa3efa5,5df26c0a0c1fc9c4bd9fe427780bd2750a7062ef..62e3f7ab23cc18b131276cc778ed4a5e0ffdb093
@@@ -1,3 -1,5 +1,3 @@@
 -.. _page_owner:
 -
  ==================================================
  page owner: Tracking about who allocated each page
  ==================================================
@@@ -50,7 -52,7 +50,7 @@@ pages are investigated and marked as al
  Although it doesn't mean that they have the right owner information,
  at least, we can tell whether the page is allocated or not,
  more accurately. On 2GB memory x86-64 VM box, 13343 early allocated pages
 -are catched and marked, although they are mostly allocated from struct
 +are caught and marked, although they are mostly allocated from struct
  page extension feature. Anyway, after that, no page is left in
  un-tracking state.
  
@@@ -59,7 -61,7 +59,7 @@@ Usag
  
  1) Build user-space helper::
  
-       cd tools/vm
+       cd tools/mm
        make page_owner_sort
  
  2) Enable page owner: add "page_owner=on" to boot cmdline.
@@@ -176,7 -178,7 +176,7 @@@ STANDARD FORMAT SPECIFIER
        at              alloc_ts        timestamp of the page when it was allocated
        ator            allocator       memory allocator for pages
  
 -  For --curl option:
 +  For --cull option:
  
        KEY             LONG            DESCRIPTION
        p               pid             process ID
diff --cc Documentation/mm/slub.rst
index fa01cdfd7d3a4219a2e9fa5109bba96d5349607b,3ffa7eded25116f0494ee77b9072744678753882..be75971532f57d3495c0090f208be744ca8fd073
@@@ -1,3 -1,5 +1,3 @@@
 -.. _slub:
 -
  ==========================
  Short users guide for SLUB
  ==========================
@@@ -19,7 -21,7 +19,7 @@@ slabs that have data in them. See "slab
  running the command. ``slabinfo`` can be compiled with
  ::
  
-       gcc -o slabinfo tools/vm/slabinfo.c
+       gcc -o slabinfo tools/mm/slabinfo.c
  
  Some of the modes of operation of ``slabinfo`` require that slub debugging
  be enabled on the command line. F.e. no tracking information will be
diff --cc Documentation/mm/transhuge.rst
index 9d924b651c61d1d3ea0cacbd26eaa1d9690a2795,a9608fe516499083aa9398d7e5fbab0393b287a7..9a607059ea11cfc3765a11c98d46f11021690215
@@@ -1,3 -1,5 +1,3 @@@
 -.. _transhuge:
 -
  ============================
  Transparent Hugepage Support
  ============================
@@@ -110,20 -112,20 +110,20 @@@ Refcounts and transparent huge page
  Refcounting on THP is mostly consistent with refcounting on other compound
  pages:
  
-   - get_page()/put_page() and GUP operate on head page's ->_refcount.
+   - get_page()/put_page() and GUP operate on the folio->_refcount.
  
    - ->_refcount in tail pages is always zero: get_page_unless_zero() never
      succeeds on tail pages.
  
-   - map/unmap of PMD entry for the whole compound page increment/decrement
-     ->compound_mapcount, stored in the first tail page of the compound page;
-     and also increment/decrement ->subpages_mapcount (also in the first tail)
-     by COMPOUND_MAPPED when compound_mapcount goes from -1 to 0 or 0 to -1.
+   - map/unmap of a PMD entry for the whole THP increment/decrement
+     folio->_entire_mapcount and also increment/decrement
+     folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
+     goes from -1 to 0 or 0 to -1.
  
-   - map/unmap of sub-pages with PTE entry increment/decrement ->_mapcount
-     on relevant sub-page of the compound page, and also increment/decrement
-     ->subpages_mapcount, stored in first tail page of the compound page, when
-     _mapcount goes from -1 to 0 or 0 to -1: counting sub-pages mapped by PTE.
+   - map/unmap of individual pages with PTE entry increment/decrement
+     page->_mapcount and also increment/decrement folio->_nr_pages_mapped
+     when page->_mapcount goes from -1 to 0 or 0 to -1 as this counts
+     the number of pages mapped by PTE.
  
  split_huge_page internally has to distribute the refcounts in the head
  page to the tail pages before clearing all PG_head/tail bits from the page
@@@ -151,8 -153,8 +151,8 @@@ clear where references should go after 
  Note that split_huge_pmd() doesn't have any limitations on refcounting:
  pmd can be split at any point and never fails.
  
- Partial unmap and deferred_split_huge_page()
- ============================================
+ Partial unmap and deferred_split_folio()
+ ========================================
  
  Unmapping part of THP (with munmap() or other way) is not going to free
  memory immediately. Instead, we detect that a subpage of THP is not in use
@@@ -164,6 -166,6 +164,6 @@@ the place where we can detect partial u
  counterproductive since in many cases partial unmap happens during exit(2) if
  a THP crosses a VMA boundary.
  
- The function deferred_split_huge_page() is used to queue a page for splitting.
+ The function deferred_split_folio() is used to queue a folio for splitting.
  The splitting itself will happen when we get memory pressure via shrinker
  interface.
diff --cc Documentation/mm/unevictable-lru.rst
index b5dc98cd1ba822206ad0b94e55ce3efd17226675,53e59433497a8d2986afd22e7ee304e4613ffe6e..92ac5dca420c5bbfeae106a4b7521de973ad749d
@@@ -1,3 -1,5 +1,3 @@@
 -.. _unevictable_lru:
 -
  ==============================
  Unevictable LRU Infrastructure
  ==============================
@@@ -10,7 -12,7 +10,7 @@@ Introductio
  
  This document describes the Linux memory manager's "Unevictable LRU"
  infrastructure and the use of this to manage several types of "unevictable"
- pages.
+ folios.
  
  The document attempts to provide the overall rationale behind this mechanism
  and the rationale for some of the design decisions that drove the
@@@ -25,8 -27,8 +25,8 @@@ The Unevictable LR
  ===================
  
  The Unevictable LRU facility adds an additional LRU list to track unevictable
- pages and to hide these pages from vmscan.  This mechanism is based on a patch
- by Larry Woodman of Red Hat to address several scalability problems with page
+ folios and to hide these folios from vmscan.  This mechanism is based on a patch
+ by Larry Woodman of Red Hat to address several scalability problems with folio
  reclaim in Linux.  The problems have been observed at customer sites on large
  memory x86_64 systems.
  
@@@ -50,40 -52,41 +50,41 @@@ The infrastructure may also be able to 
  unevictable, either by definition or by circumstance, in the future.
  
  
- The Unevictable LRU Page List
- -----------------------------
+ The Unevictable LRU Folio List
+ ------------------------------
  
- The Unevictable LRU page list is a lie.  It was never an LRU-ordered list, but a
- companion to the LRU-ordered anonymous and file, active and inactive page lists;
- and now it is not even a page list.  But following familiar convention, here in
- this document and in the source, we often imagine it as a fifth LRU page list.
+ The Unevictable LRU folio list is a lie.  It was never an LRU-ordered
+ list, but a companion to the LRU-ordered anonymous and file, active and
+ inactive folio lists; and now it is not even a folio list.  But following
+ familiar convention, here in this document and in the source, we often
+ imagine it as a fifth LRU folio list.
  
  The Unevictable LRU infrastructure consists of an additional, per-node, LRU list
- called the "unevictable" list and an associated page flag, PG_unevictable, to
- indicate that the page is being managed on the unevictable list.
+ called the "unevictable" list and an associated folio flag, PG_unevictable, to
+ indicate that the folio is being managed on the unevictable list.
  
  The PG_unevictable flag is analogous to, and mutually exclusive with, the
- PG_active flag in that it indicates on which LRU list a page resides when
+ PG_active flag in that it indicates on which LRU list a folio resides when
  PG_lru is set.
  
- The Unevictable LRU infrastructure maintains unevictable pages as if they were
+ The Unevictable LRU infrastructure maintains unevictable folios as if they were
  on an additional LRU list for a few reasons:
  
-  (1) We get to "treat unevictable pages just like we treat other pages in the
+  (1) We get to "treat unevictable folios just like we treat other folios in the
       system - which means we get to use the same code to manipulate them, the
       same code to isolate them (for migrate, etc.), the same code to keep track
       of the statistics, etc..." [Rik van Riel]
  
-  (2) We want to be able to migrate unevictable pages between nodes for memory
+  (2) We want to be able to migrate unevictable folios between nodes for memory
       defragmentation, workload management and memory hotplug.  The Linux kernel
-      can only migrate pages that it can successfully isolate from the LRU
+      can only migrate folios that it can successfully isolate from the LRU
       lists (or "Movable" pages: outside of consideration here).  If we were to
-      maintain pages elsewhere than on an LRU-like list, where they can be
-      detected by isolate_lru_page(), we would prevent their migration.
+      maintain folios elsewhere than on an LRU-like list, where they can be
+      detected by folio_isolate_lru(), we would prevent their migration.
  
- The unevictable list does not differentiate between file-backed and anonymous,
- swap-backed pages.  This differentiation is only important while the pages are,
- in fact, evictable.
+ The unevictable list does not differentiate between file-backed and
+ anonymous, swap-backed folios.  This differentiation is only important
+ while the folios are, in fact, evictable.
  
  The unevictable list benefits from the "arrayification" of the per-node LRU
  lists and statistics originally proposed and posted by Christoph Lameter.
@@@ -156,7 -159,7 +157,7 @@@ These are currently used in three place
  Detecting Unevictable Pages
  ---------------------------
  
- The function page_evictable() in mm/internal.h determines whether a page is
+ The function folio_evictable() in mm/internal.h determines whether a folio is
  evictable or not using the query function outlined above [see section
  :ref:`Marking address spaces unevictable <mark_addr_space_unevict>`]
  to check the AS_UNEVICTABLE flag.
@@@ -165,7 -168,7 +166,7 @@@ For address spaces that are so marked a
  might be), the lock action (e.g. SHM_LOCK) can be lazy, and need not populate
  the page tables for the region as does, for example, mlock(), nor need it make
  any special effort to push any pages in the SHM_LOCK'd area to the unevictable
- list.  Instead, vmscan will do this if and when it encounters the pages during
+ list.  Instead, vmscan will do this if and when it encounters the folios during
  a reclamation scan.
  
  On an unlock action (such as SHM_UNLOCK), the unlocker (e.g. shmctl()) must scan
@@@ -174,41 -177,43 +175,43 @@@ condition is keeping them unevictable
  the pages are also "rescued" from the unevictable list in the process of
  freeing them.
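
For reference, the user-visible side of this is the SysV shmctl() lock/unlock
pair. A minimal userspace illustration (locking may require CAP_IPC_LOCK or a
sufficient RLIMIT_MEMLOCK)::

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
            int id = shmget(IPC_PRIVATE, 1 << 20, IPC_CREAT | 0600);

            if (id < 0) {
                    perror("shmget");
                    return 1;
            }
            if (shmctl(id, SHM_LOCK, NULL))     /* segment becomes unevictable (lazily) */
                    perror("shmctl(SHM_LOCK)");
            if (shmctl(id, SHM_UNLOCK, NULL))   /* unlocker rescues any culled pages */
                    perror("shmctl(SHM_UNLOCK)");

            shmctl(id, IPC_RMID, NULL);         /* clean up the segment */
            return 0;
    }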
  
- page_evictable() also checks for mlocked pages by testing an additional page
- flag, PG_mlocked (as wrapped by PageMlocked()), which is set when a page is
- faulted into a VM_LOCKED VMA, or found in a VMA being VM_LOCKED.
+ folio_evictable() also checks for mlocked folios by calling
+ folio_test_mlocked(), which is set when a folio is faulted into a
+ VM_LOCKED VMA, or found in a VMA being VM_LOCKED.
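
Putting those two checks together, a simplified standalone sketch (just an
illustration of the conditions named above, not the actual mm/internal.h code)::

    #include <stdbool.h>

    struct folio_state {
            bool mapping_unevictable;   /* AS_UNEVICTABLE on the address_space */
            bool mlocked;               /* PG_mlocked on the folio itself */
    };

    /* Evictable only if neither the mapping nor the folio says otherwise. */
    static bool evictable(const struct folio_state *f)
    {
            return !f->mapping_unevictable && !f->mlocked;
    }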
  
  
- Vmscan's Handling of Unevictable Pages
- --------------------------------------
+ Vmscan's Handling of Unevictable Folios
+ ---------------------------------------
  
- If unevictable pages are culled in the fault path, or moved to the unevictable
- list at mlock() or mmap() time, vmscan will not encounter the pages until they
+ If unevictable folios are culled in the fault path, or moved to the unevictable
+ list at mlock() or mmap() time, vmscan will not encounter the folios until they
  have become evictable again (via munlock() for example) and have been "rescued"
  from the unevictable list.  However, there may be situations where we decide,
- for the sake of expediency, to leave an unevictable page on one of the regular
+ for the sake of expediency, to leave an unevictable folio on one of the regular
  active/inactive LRU lists for vmscan to deal with.  vmscan checks for such
- pages in all of the shrink_{active|inactive|page}_list() functions and will
- "cull" such pages that it encounters: that is, it diverts those pages to the
+ folios in all of the shrink_{active|inactive|page}_list() functions and will
+ "cull" such folios that it encounters: that is, it diverts those folios to the
  unevictable list for the memory cgroup and node being scanned.
  
- There may be situations where a page is mapped into a VM_LOCKED VMA, but the
- page is not marked as PG_mlocked.  Such pages will make it all the way to
- shrink_active_list() or shrink_page_list() where they will be detected when
- vmscan walks the reverse map in folio_referenced() or try_to_unmap().  The page
- is culled to the unevictable list when it is released by the shrinker.
+ There may be situations where a folio is mapped into a VM_LOCKED VMA,
+ but the folio does not have the mlocked flag set.  Such folios will make
+ it all the way to shrink_active_list() or shrink_page_list() where they
+ will be detected when vmscan walks the reverse map in folio_referenced()
+ or try_to_unmap().  The folio is culled to the unevictable list when it
+ is released by the shrinker.
  
- To "cull" an unevictable page, vmscan simply puts the page back on the LRU list
- using putback_lru_page() - the inverse operation to isolate_lru_page() - after
- dropping the page lock.  Because the condition which makes the page unevictable
- may change once the page is unlocked, __pagevec_lru_add_fn() will recheck the
- unevictable state of a page before placing it on the unevictable list.
+ To "cull" an unevictable folio, vmscan simply puts the folio back on
+ the LRU list using folio_putback_lru() - the inverse operation to
+ folio_isolate_lru() - after dropping the folio lock.  Because the
+ condition which makes the folio unevictable may change once the folio
+ is unlocked, __pagevec_lru_add_fn() will recheck the unevictable state
+ of a folio before placing it on the unevictable list.
  
  
  MLOCKED Pages
  =============
  
- The unevictable page list is also useful for mlock(), in addition to ramfs and
+ The unevictable folio list is also useful for mlock(), in addition to ramfs and
  SYSV SHM.  Note that mlock() is only available in CONFIG_MMU=y situations; in
  NOMMU situations, all mappings are effectively mlocked.
  
@@@ -293,7 -298,7 +296,7 @@@ treated as a no-op and mlock_fixup() si
  If the VMA passes some filtering as described in "Filtering Special VMAs"
  below, mlock_fixup() will attempt to merge the VMA with its neighbors or split
  off a subset of the VMA if the range does not cover the entire VMA.  Any pages
- already present in the VMA are then marked as mlocked by mlock_page() via
+ already present in the VMA are then marked as mlocked by mlock_folio() via
  mlock_pte_range() via walk_page_range() via mlock_vma_pages_range().
  
  Before returning from the system call, do_mlock() or mlockall() will call
@@@ -306,22 -311,22 +309,22 @@@ do end up getting faulted into this VM_
  fault path - which is also how mlock2()'s MLOCK_ONFAULT areas are handled.
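
To make MLOCK_ONFAULT concrete, a minimal userspace sketch (assuming a glibc new
enough to expose mlock2(), i.e. glibc 2.27 or later)::

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 16 * 4096;
            char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (buf == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            /* Nothing is resident yet; each page is mlocked as it is faulted in. */
            if (mlock2(buf, len, MLOCK_ONFAULT)) {
                    perror("mlock2");
                    return 1;
            }
            memset(buf, 0, len);        /* faults the pages in, one by one */

            munlock(buf, len);
            munmap(buf, len);
            return 0;
    }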
  
  For each PTE (or PMD) being faulted into a VMA, the page add rmap function
- calls mlock_vma_page(), which calls mlock_page() when the VMA is VM_LOCKED
+ calls mlock_vma_folio(), which calls mlock_folio() when the VMA is VM_LOCKED
  (unless it is a PTE mapping of a part of a transparent huge page).  Or when
- it is a newly allocated anonymous page, lru_cache_add_inactive_or_unevictable()
- calls mlock_new_page() instead: similar to mlock_page(), but can make better
+ it is a newly allocated anonymous page, folio_add_lru_vma() calls
+ mlock_new_folio() instead: similar to mlock_folio(), but can make better
  judgments, since this page is held exclusively and known not to be on LRU yet.
  
- mlock_page() sets PageMlocked immediately, then places the page on the CPU's
- mlock pagevec, to batch up the rest of the work to be done under lru_lock by
- __mlock_page().  __mlock_page() sets PageUnevictable, initializes mlock_count
+ mlock_folio() sets PG_mlocked immediately, then places the page on the CPU's
+ mlock folio batch, to batch up the rest of the work to be done under lru_lock by
+ __mlock_folio().  __mlock_folio() sets PG_unevictable, initializes mlock_count
  and moves the page to unevictable state ("the unevictable LRU", but with
- mlock_count in place of LRU threading).  Or if the page was already PageLRU
- and PageUnevictable and PageMlocked, it simply increments the mlock_count.
+ mlock_count in place of LRU threading).  Or if the page was already PG_lru
+ and PG_unevictable and PG_mlocked, it simply increments the mlock_count.
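
Reduced to a standalone model (only an illustration of the rules just described,
not the kernel's actual data structures)::

    #include <stdbool.h>

    struct folio_model {
            bool lru, unevictable, mlocked;
            int  mlock_count;           /* meaningful only while unevictable */
    };

    static void mlock_folio_model(struct folio_model *f)
    {
            f->mlocked = true;                  /* set immediately */

            /* the rest is what __mlock_folio() does later, under lru_lock: */
            if (f->lru && f->unevictable) {
                    f->mlock_count++;           /* already unevictable: just count */
            } else if (f->lru) {
                    f->unevictable = true;      /* move to unevictable state */
                    f->mlock_count = 1;
            }
            /* not on an LRU at all: mlock_count is left alone here and is set
             * to 0 when the folio is eventually returned to the LRU, see below */
    }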
  
  But in practice that may not work ideally: the page may not yet be on an LRU, or
  it may have been temporarily isolated from LRU.  In such cases the mlock_count
- field cannot be touched, but will be set to 0 later when __pagevec_lru_add_fn()
+ field cannot be touched, but will be set to 0 later when __munlock_folio()
  returns the page to "LRU".  Races prohibit mlock_count from being set to 1 then:
  rather than risk stranding a page indefinitely as unevictable, always err with
  mlock_count on the low side, so that when munlocked the page will be rescued to
@@@ -368,20 -373,21 +371,21 @@@ Because of the VMA filtering discussed 
  any "special" VMAs.  So, those VMAs will be ignored for munlock.
  
  If the VMA is VM_LOCKED, mlock_fixup() again attempts to merge or split off the
- specified range.  All pages in the VMA are then munlocked by munlock_page() via
+ specified range.  All pages in the VMA are then munlocked by munlock_folio() via
  mlock_pte_range() via walk_page_range() via mlock_vma_pages_range() - the same
  function used when mlocking a VMA range, with new flags for the VMA indicating
  that it is munlock() being performed.
  
- munlock_page() uses the mlock pagevec to batch up work to be done under
- lru_lock by  __munlock_page().  __munlock_page() decrements the page's
- mlock_count, and when that reaches 0 it clears PageMlocked and clears
- PageUnevictable, moving the page from unevictable state to inactive LRU.
+ munlock_folio() uses the mlock pagevec to batch up work to be done
+ under lru_lock by  __munlock_folio().  __munlock_folio() decrements the
+ folio's mlock_count, and when that reaches 0 it clears the mlocked flag
+ and clears the unevictable flag, moving the folio from unevictable state
+ to the inactive LRU.
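
The unlock side, in the same standalone model (reusing struct folio_model from
the earlier sketch; again just an illustration of the counting rule)::

    static void munlock_folio_model(struct folio_model *f)
    {
            if (f->mlock_count > 0)
                    f->mlock_count--;
            if (f->mlock_count == 0) {
                    f->mlocked = false;
                    f->unevictable = false;     /* back to the inactive LRU */
            }
    }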
  
- But in practice that may not work ideally: the page may not yet have reached
+ But in practice that may not work ideally: the folio may not yet have reached
  "the unevictable LRU", or it may have been temporarily isolated from it.  In
  those cases its mlock_count field is unusable and must be assumed to be 0: so
- that the page will be rescued to an evictable LRU, then perhaps be mlocked
+ that the folio will be rescued to an evictable LRU, then perhaps be mlocked
  again later if vmscan finds it in a VM_LOCKED VMA.
  
  
@@@ -408,7 -414,7 +412,7 @@@ However, since mlock_vma_pages_range() 
  before mlocking any pages already present, if one of those pages were migrated
  before mlock_pte_range() reached it, it would get counted twice in mlock_count.
  To prevent that, mlock_vma_pages_range() temporarily marks the VMA as VM_IO,
- so that mlock_vma_page() will skip it.
+ so that mlock_vma_folio() will skip it.
  
  To complete page migration, we place the old and new pages back onto the LRU
  afterwards.  The "unneeded" page - old page on success, new page on failure -
@@@ -481,18 -487,19 +485,19 @@@ Before the unevictable/mlock changes, m
  way, so unmapping them required no processing.
  
  For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
- munlock_vma_page(), which calls munlock_page() when the VMA is VM_LOCKED
+ munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
  (unless it was a PTE mapping of a part of a transparent huge page).
  
- munlock_page() uses the mlock pagevec to batch up work to be done under
- lru_lock by  __munlock_page().  __munlock_page() decrements the page's
- mlock_count, and when that reaches 0 it clears PageMlocked and clears
- PageUnevictable, moving the page from unevictable state to inactive LRU.
+ munlock_folio() uses the mlock pagevec to batch up work to be done
+ under lru_lock by  __munlock_folio().  __munlock_folio() decrements the
+ folio's mlock_count, and when that reaches 0 it clears the mlocked flag
+ and clears the unevictable flag, moving the folio from unevictable state
+ to the inactive LRU.
  
- But in practice that may not work ideally: the page may not yet have reached
+ But in practice that may not work ideally: the folio may not yet have reached
  "the unevictable LRU", or it may have been temporarily isolated from it.  In
  those cases its mlock_count field is unusable and must be assumed to be 0: so
- that the page will be rescued to an evictable LRU, then perhaps be mlocked
+ that the folio will be rescued to an evictable LRU, then perhaps be mlocked
  again later if vmscan finds it in a VM_LOCKED VMA.
  
  
@@@ -505,7 -512,7 +510,7 @@@ which had been Copied-On-Write from th
  
  Mlocked pages can be munlocked and deleted in this way: like with munmap(),
  for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
- munlock_vma_page(), which calls munlock_page() when the VMA is VM_LOCKED
+ munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
  (unless it was a PTE mapping of a part of a transparent huge page).
  
  However, if there is a racing munlock(), since mlock_vma_pages_range() starts
@@@ -513,7 -520,7 +518,7 @@@ munlocking by clearing VM_LOCKED from 
  present, if one of those pages were unmapped by truncation or hole punch before
  mlock_pte_range() reached it, it would not be recognized as mlocked by this VMA,
  and would not be counted out of mlock_count.  In this rare case, a page may
- still appear as PageMlocked after it has been fully unmapped: and it is left to
+ still appear as PG_mlocked after it has been fully unmapped: and it is left to
  release_pages() (or __page_cache_release()) to clear it and update statistics
  before freeing (this event is counted in /proc/vmstat unevictable_pgs_cleared,
  which is usually 0).
@@@ -525,7 -532,7 +530,7 @@@ Page Reclaim in shrink_*_list(
  vmscan's shrink_active_list() culls any obviously unevictable pages -
  i.e. !page_evictable(page) pages - diverting those to the unevictable list.
  However, shrink_active_list() only sees unevictable pages that made it onto the
- active/inactive LRU lists.  Note that these pages do not have PageUnevictable
+ active/inactive LRU lists.  Note that these pages do not have PG_unevictable
  set - otherwise they would be on the unevictable list and shrink_active_list()
  would never see them.
  
@@@ -547,6 -554,6 +552,6 @@@ and node unevictable list
  
  rmap's folio_referenced_one(), called via vmscan's shrink_active_list() or
  shrink_page_list(), and rmap's try_to_unmap_one() called via shrink_page_list(),
- check for (3) pages still mapped into VM_LOCKED VMAs, and call mlock_vma_page()
+ check for (3) pages still mapped into VM_LOCKED VMAs, and call mlock_vma_folio()
  to correct them.  Such pages are culled to the unevictable list when released
  by the shrinker.
index 24616a7c115aaa20b92ea04990afb40f35453e7d,40323c9b39d886b057177f46e0c80b9818514058..64d127bfc221fe243bbfb6e6758a547c6877585a
@@@ -1,3 -1,5 +1,3 @@@
 -.. _zsmalloc:
 -
  ========
  zsmalloc
  ========
@@@ -78,3 -80,171 +78,171 @@@ Similarly, we assign zspage to
  * ZS_ALMOST_FULL  when n > N / f
  * ZS_EMPTY        when n == 0
  * ZS_FULL         when n == N
+
+ Internals
+ =========
+
+ zsmalloc has 255 size classes, each of which can hold a number of zspages.
+ Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
+ The optimal zspage chain size for each size class is calculated during the
+ creation of the zsmalloc pool (see calculate_zspage_chain_size()).
+
+ As an optimization, zsmalloc merges size classes that have similar
+ characteristics in terms of the number of pages per zspage and the number
+ of objects that each zspage can store.
+
+ For instance, consider the following size classes:::
+
+   class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+   ...
+      94  1536           0            0             0          0          0                3        0
+     100  1632           0            0             0          0          0                2        0
+   ...
+ Size classes #95-99 are merged with size class #100. This means that when we
+ need to store an object of size, say, 1568 bytes, we end up using size class
+ #100 instead of size class #96. Size class #100 is meant for objects of size
+ 1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
+ Size class #100 consists of zspages with 2 physical pages each, which can
+ hold a total of 5 objects. If we need to store 13 objects of size 1568, we
+ end up allocating three zspages, or 6 physical pages.
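
The arithmetic behind those numbers can be reproduced with a few lines of plain
C (a back-of-the-envelope sketch that only assumes 4 KiB pages and the class
sizes quoted above)::

    #include <stdio.h>

    #define PAGE_SIZE 4096

    int main(void)
    {
            int obj_size = 1568, class_size = 1632;     /* stored in class #100 */
            int pages_per_zspage = 2, nr_objects = 13;

            int objs_per_zspage = pages_per_zspage * PAGE_SIZE / class_size;    /* 5 */
            int zspages = (nr_objects + objs_per_zspage - 1) / objs_per_zspage; /* 3 */

            printf("waste per object: %d bytes\n", class_size - obj_size);      /* 64 */
            printf("physical pages:   %d\n", zspages * pages_per_zspage);       /* 6 */
            return 0;
    }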
+ However, if we take a closer look at size class #96 (which is meant for
+ objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
+ find that the optimal zspage configuration for this class is a chain
+ of 5 physical pages:::
+     pages per zspage      wasted bytes     used%
+            1                  960           76
+            2                  352           95
+            3                 1312           89
+            4                  704           95
+            5                   96           99
+ This means that a class #96 configuration with 5 physical pages can store 13
+ objects of size 1568 in a single zspage, using a total of 5 physical pages.
+ This is more efficient than the class #100 configuration, which would use 6
+ physical pages to store the same number of objects.
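
The "wasted bytes" column of that table can be recomputed the same way (again
assuming 4 KiB pages; this only illustrates what calculate_zspage_chain_size()
optimizes for and is not its actual code)::

    #include <stdio.h>

    #define PAGE_SIZE 4096

    int main(void)
    {
            int size = 1568;                    /* object size of class #96 */

            for (int pages = 1; pages <= 5; pages++) {
                    int total  = pages * PAGE_SIZE;
                    int wasted = total % size;  /* bytes left over per zspage */

                    printf("%d page(s): %4d wasted, %2d%% used\n",
                           pages, wasted, 100 * (total - wasted) / total);
            }
            return 0;
    }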
+ As the zspage chain size for class #96 increases, its key characteristics
+ such as pages per-zspage and objects per-zspage also change. This leads to
+ fewer class mergers, resulting in a more compact grouping of classes, which
+ reduces memory wastage.
+ Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::
+   class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+   ...
+     202  3264           0            0             0          0          0                4        0
+     254  4096           0            0             0          0          0                1        0
+   ...
+ Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
+ per zspage. Any object larger than 3264 bytes is considered huge and belongs
+ to size class #254, which stores each object in its own physical page (objects
+ in huge classes do not share pages).
+ Increasing the size of the chain of zspages also results in a higher watermark
+ for the huge size class and fewer huge classes overall. This allows for more
+ efficient storage of large objects.
+ For zspage chain size of 8, huge class watermark becomes 3632 bytes:::
+   class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+   ...
+     202  3264           0            0             0          0          0                4        0
+     211  3408           0            0             0          0          0                5        0
+     217  3504           0            0             0          0          0                6        0
+     222  3584           0            0             0          0          0                7        0
+     225  3632           0            0             0          0          0                8        0
+     254  4096           0            0             0          0          0                1        0
+   ...
+ For zspage chain size of 16, huge class watermark becomes 3840 bytes:::
+   class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+   ...
+     202  3264           0            0             0          0          0                4        0
+     206  3328           0            0             0          0          0               13        0
+     207  3344           0            0             0          0          0                9        0
+     208  3360           0            0             0          0          0               14        0
+     211  3408           0            0             0          0          0                5        0
+     212  3424           0            0             0          0          0               16        0
+     214  3456           0            0             0          0          0               11        0
+     217  3504           0            0             0          0          0                6        0
+     219  3536           0            0             0          0          0               13        0
+     222  3584           0            0             0          0          0                7        0
+     223  3600           0            0             0          0          0               15        0
+     225  3632           0            0             0          0          0                8        0
+     228  3680           0            0             0          0          0                9        0
+     230  3712           0            0             0          0          0               10        0
+     232  3744           0            0             0          0          0               11        0
+     234  3776           0            0             0          0          0               12        0
+     235  3792           0            0             0          0          0               13        0
+     236  3808           0            0             0          0          0               14        0
+     238  3840           0            0             0          0          0               15        0
+     254  4096           0            0             0          0          0                1        0
+   ...
+ Overall the combined zspage chain size effect on zsmalloc pool configuration:::
+   pages per zspage   number of size classes (clusters)   huge size class watermark
+          4                        69                               3264
+          5                        86                               3408
+          6                        93                               3504
+          7                       112                               3584
+          8                       123                               3632
+          9                       140                               3680
+         10                       143                               3712
+         11                       159                               3744
+         12                       164                               3776
+         13                       180                               3792
+         14                       183                               3808
+         15                       188                               3840
+         16                       191                               3840
+
+ A synthetic test
+ ----------------
+
+ zram as a build artifacts storage (Linux kernel compilation).
+ * `CONFIG_ZSMALLOC_CHAIN_SIZE=4`
+   zsmalloc classes stats:::
+     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+     ...
+     Total                13           51        413836     412973     159955                         3
+   zram mm_stat:::
+    1691783168 628083717 655175680        0 655175680       60        0    34048    34049
+ * `CONFIG_ZSMALLOC_CHAIN_SIZE=8`
+   zsmalloc classes stats:::
+     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+     ...
+     Total                18           87        414852     412978     156666                         0
+   zram mm_stat:::
+     1691803648 627793930 641703936        0 641703936       60        0    33591    33591
+ Using larger zspage chains may result in using fewer physical pages, as seen
+ in the example where the number of physical pages used decreased from 159955
+ to 156666, while the maximum zsmalloc pool memory usage went down from
+ 655175680 to 641703936 bytes.
+ However, this advantage may be offset by the potential for increased system
+ memory pressure (as some zspages have larger chain sizes) in cases where there
+ is heavy internal fragmentation and zspool compaction is unable to relocate
+ objects and release zspages. In these cases, it is recommended to decrease
+ the limit on the size of the zspage chains (as specified by the
+ CONFIG_ZSMALLOC_CHAIN_SIZE option).
index 80787af29222381ef8b2e88c4e6748335428f866,826a50c47389e4e99133d823c784508ecee7d5a3..c1fa35315d8b2238892d1ff7661873a532088388
@@@ -15,7 -15,7 +15,7 @@@ Hugetlbfs 预留
  概述
  ====
  
 -:ref:`hugetlbpage` 中描述的巨页通常是预先分配给应用程序使用的。如果VMA指
 +Documentation/mm/hugetlbpage.rst 中描述的巨页通常是预先分配给应用程序使用的。如果VMA指
  示要使用巨页,这些巨页会在缺页异常时被实例化到任务的地址空间。如果在缺页异常
  时没有巨页存在,任务就会被发送一个SIGBUS,并经常不高兴地死去。在加入巨页支
  持后不久,人们决定,在mmap()时检测巨页的短缺情况会更好。这个想法是,如果
@@@ -142,14 -142,14 +142,14 @@@ HPAGE_RESV_OWNER标志被设置，以表
  消耗预留/分配一个巨页
  ===========================
  
- 当与预留相关的巨页在相应的映射中被分配和实例化时,预留就被消耗了。该分配是在函数alloc_huge_page()
+ 当与预留相关的巨页在相应的映射中被分配和实例化时,预留就被消耗了。该分配是在函数alloc_hugetlb_folio()
  中进行的::
  
-       struct page *alloc_huge_page(struct vm_area_struct *vma,
+       struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
                                     unsigned long addr, int avoid_reserve)
  
- alloc_huge_page被传递给一个VMA指针和一个虚拟地址,因此它可以查阅预留映射以确定是否存在预留。
- 此外,alloc_huge_page需要一个参数avoid_reserve,该参数表示即使看起来已经为指定的地址预留了
+ alloc_hugetlb_folio被传递给一个VMA指针和一个虚拟地址,因此它可以查阅预留映射以确定是否存在预留。
+ 此外,alloc_hugetlb_folio需要一个参数avoid_reserve,该参数表示即使看起来已经为指定的地址预留了
  预留,也不应该使用预留。avoid_reserve参数最常被用于写时拷贝和页面迁移的情况下,即现有页面的额
  外拷贝被分配。
  
@@@ -162,7 -162,7 +162,7 @@@ vma_needs_reservation()返回的值通常
  确定预留是否存在并可用于分配后,调用dequeue_huge_page_vma()函数。这个函数需要两个与预留有关
  的参数:
  
- - avoid_reserve,这是传递给alloc_huge_page()的同一个值/参数。
+ - avoid_reserve,这是传递给alloc_hugetlb_folio()的同一个值/参数。
  - chg,尽管这个参数的类型是long,但只有0或1的值被传递给dequeue_huge_page_vma。如果该值为0,
    则表明存在预留(关于可能的问题,请参见 “预留和内存策略” 一节)。如果值
    为1,则表示不存在预留,如果可能的话,必须从全局空闲池中取出该页。
@@@ -179,7 -179,7 +179,7 @@@ free_huge_pages的值被递减。如果
  的剩余巨页和超额分配的问题。即使分配了一个多余的页面,也会进行与上面一样的基于预留的调整:
  SetPagePrivate(page) 和 resv_huge_pages--.
  
- 在获得一个新的巨页后,(page)->private被设置为与该页面相关的子池的值,如果它存在的话。当页
+ 在获得一个新的巨页后,(folio)->_hugetlb_subpool被设置为与该页面相关的子池的值,如果它存在的话。当页
  面被释放时,这将被用于子池的计数。
  
  然后调用函数vma_commit_reservation(),根据预留的消耗情况调整预留映射。一般来说,这涉及
@@@ -199,7 -199,7 +199,7 @@@ SetPagePrivate(page)和resv_huge_pages-
  已经存在,所以不做任何改变。然而,如果共享映射中没有预留,或者这是一个私有映射,则必须创建
  一个新的条目。
  
- 在alloc_huge_page()开始调用vma_needs_reservation()和页面分配后调用
+ 在alloc_hugetlb_folio()开始调用vma_needs_reservation()和页面分配后调用
  vma_commit_reservation()之间,预留映射有可能被改变。如果hugetlb_reserve_pages在共
  享映射中为同一页面被调用,这将是可能的。在这种情况下,预留计数和子池空闲页计数会有一个偏差。
  这种罕见的情况可以通过比较vma_needs_reservation和vma_commit_reservation的返回值来
index dba511fafef22fb757d60d9be39699ce622dfe07,4d3b2c33e4ef8e1730cdd053f038438cf7b9c7a4..b72a972271d92b725dcc861cf500b363373f4883
@@@ -34,9 -34,20 +34,9 @@@ page owner在默认情况下是禁用的
  一样进行。这两个不可能的分支应该不会影响到分配的性能,特别是在静态键跳转标签修补
  功能可用的情况下。以下是由于这个功能而导致的内核代码大小的变化。
  
 -- 没有page owner::
 -
 -   text    data     bss     dec     hex filename
 -   48392   2333     644   51369    c8a9 mm/page_alloc.o
 -
 -- 有page owner::
 -
 -   text    data     bss     dec     hex filename
 -   48800   2445     644   51889    cab1 mm/page_alloc.o
 -   6662     108      29    6799    1a8f mm/page_owner.o
 -   1025       8       8    1041     411 mm/page_ext.o
 -
 -虽然总共增加了8KB的代码,但page_alloc.o增加了520字节,其中不到一半是在hotpath
 -中。构建带有page owner的内核,并在需要时打开它,将是调试内核内存问题的最佳选择。
 +尽管启用page owner会使内核的大小增加几千字节,但这些代码大部分都在页面分配器和
 +热路径之外。构建带有page owner的内核,并在需要时打开它,将是调试内核内存问题的
 +最佳选择。
  
  有一个问题是由实现细节引起的。页所有者将信息存储到struct page扩展的内存中。这
  个内存的初始化时间比稀疏内存系统中的页面分配器启动的时间要晚一些,所以,在初始化
@@@ -51,7 -62,7 +51,7 @@@
  
  1) 构建用户空间的帮助::
  
-       cd tools/vm
+       cd tools/mm
        make page_owner_sort
  
  2) 启用page owner: 添加 "page_owner=on" 到 boot cmdline.
diff --combined MAINTAINERS
index 8afdd3dcc920ad22f9bcdf1a30db81aef5cda207,b92a2a0cb36b4239d20f80dd18ec803d4e18d857..eb6f650c6c0b64b31b513cf5d5c753be5c5e699c
@@@ -361,8 -361,6 +361,8 @@@ T: git git://git.kernel.org/pub/scm/lin
  F:    Documentation/ABI/testing/configfs-acpi
  F:    Documentation/ABI/testing/sysfs-bus-acpi
  F:    Documentation/firmware-guide/acpi/
 +F:    arch/x86/kernel/acpi/
 +F:    arch/x86/pci/acpi.c
  F:    drivers/acpi/
  F:    drivers/pci/*/*acpi*
  F:    drivers/pci/*acpi*
@@@ -385,7 -383,7 +385,7 @@@ ACPI COMPONENT ARCHITECTURE (ACPICA
  M:    Robert Moore <[email protected]>
  M:    "Rafael J. Wysocki" <[email protected]>
  L:    [email protected]
 -L:    devel@acpica.org
 +L:    [email protected].org
  S:    Supported
  W:    https://acpica.org/
  W:    https://github.com/acpica/acpica/
@@@ -1099,12 -1097,14 +1099,12 @@@ S:   Maintaine
  F:    drivers/dma/ptdma/
  
  AMD SEATTLE DEVICE TREE SUPPORT
 -M:    Brijesh Singh <[email protected]>
  M:    Suravee Suthikulpanit <[email protected]>
  M:    Tom Lendacky <[email protected]>
  S:    Supported
  F:    arch/arm64/boot/dts/amd/
  
  AMD XGBE DRIVER
 -M:    Tom Lendacky <[email protected]>
  M:    "Shyam Sundar S K" <[email protected]>
  L:    [email protected]
  S:    Supported
@@@ -1853,6 -1853,21 +1853,6 @@@ F:     include/dt-bindings/reset/actions,
  F:    include/linux/soc/actions/
  N:    owl
  
 -ARM/ADS SPHERE MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
 -ARM/AFEB9260 MACHINE SUPPORT
 -M:    Sergey Lapin <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
 -ARM/AJECO 1ARM MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
  ARM/Allwinner SoC Clock Support
  M:    Emilio López <[email protected]>
  S:    Maintained
@@@ -1873,7 -1888,7 +1873,7 @@@ F:      drivers/pinctrl/sunxi
  F:    drivers/soc/sunxi/
  N:    allwinner
  N:    sun[x456789]i
 -N:    sun50i
 +N:    sun[25]0i
  
  ARM/Amlogic Meson SoC CLOCK FRAMEWORK
  M:    Neil Armstrong <[email protected]>
@@@ -2043,6 -2058,11 +2043,6 @@@ F:     arch/arm/boot/dts/ecx-*.dts
  F:    arch/arm/boot/dts/highbank.dts
  F:    arch/arm/mach-highbank/
  
 -ARM/CAVIUM NETWORKS CNS3XXX MACHINE SUPPORT
 -M:    Krzysztof Halasa <[email protected]>
 -S:    Maintained
 -F:    arch/arm/mach-cns3xxx/
 -
  ARM/CAVIUM THUNDER NETWORK DRIVER
  M:    Sunil Goutham <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2071,8 -2091,8 +2071,8 @@@ M:      Hartley Sweeten <hsweeten@visionengr
  M:    Alexander Sverdlin <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
 +F:    arch/arm/boot/compressed/misc-ep93xx.h
  F:    arch/arm/mach-ep93xx/
 -F:    arch/arm/mach-ep93xx/include/mach/
  
  ARM/CLKDEV SUPPORT
  M:    Russell King <[email protected]>
@@@ -2088,6 -2108,11 +2088,6 @@@ S:     Maintaine
  F:    arch/arm/boot/dts/cx92755*
  N:    digicolor
  
 -ARM/CONTEC MICRO9 MACHINE SUPPORT
 -M:    Hubert Feurstein <[email protected]>
 -S:    Maintained
 -F:    arch/arm/mach-ep93xx/micro9.c
 -
  ARM/CORESIGHT FRAMEWORK AND DRIVERS
  M:    Mathieu Poirier <[email protected]>
  M:    Suzuki K Poulose <[email protected]>
@@@ -2114,6 -2139,10 +2114,6 @@@ F:     tools/perf/arch/arm/util/pmu.
  F:    tools/perf/util/cs-etm-decoder/*
  F:    tools/perf/util/cs-etm.*
  
 -ARM/CORGI MACHINE SUPPORT
 -M:    Richard Purdie <[email protected]>
 -S:    Maintained
 -
  ARM/CORTINA SYSTEMS GEMINI ARM ARCHITECTURE
  M:    Hans Ulli Kroll <[email protected]>
  M:    Linus Walleij <[email protected]>
@@@ -2153,6 -2182,12 +2153,6 @@@ F:     include/dt-bindings/bus/moxtet.
  F:    include/linux/armada-37xx-rwtm-mailbox.h
  F:    include/linux/moxtet.h
  
 -ARM/EZX SMARTPHONES (A780, A910, A1200, E680, ROKR E2 and ROKR E6)
 -M:    Robert Jarzmik <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -F:    arch/arm/mach-pxa/ezx.c
 -
  ARM/FARADAY FA526 PORT
  M:    Hans Ulli Kroll <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2178,9 -2213,6 +2178,9 @@@ L:      [email protected]
  S:    Maintained
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git
  X:    drivers/media/i2c/
 +F:    arch/arm64/boot/dts/freescale/
 +X:    arch/arm64/boot/dts/freescale/fsl-*
 +X:    arch/arm64/boot/dts/freescale/qoriq-*
  N:    imx
  N:    mxs
  
@@@ -2205,11 -2237,25 +2205,11 @@@ T:   git git://git.kernel.org/pub/scm/lin
  F:    arch/arm/boot/dts/vf*
  F:    arch/arm/mach-imx/*vf610*
  
 -ARM/GLOMATION GESBC9312SX MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
  ARM/GUMSTIX MACHINE SUPPORT
  M:    Steve Sakoman <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
  
 -ARM/H4700 (HP IPAQ HX4700) MACHINE SUPPORT
 -M:    Philipp Zabel <[email protected]>
 -M:    Paul Parsons <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -F:    arch/arm/mach-pxa/hx4700.c
 -F:    arch/arm/mach-pxa/include/mach/hx4700.h
 -F:    sound/soc/pxa/hx4700.c
 -
  ARM/HISILICON SOC SUPPORT
  M:    Wei Xu <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2234,16 -2280,13 +2234,16 @@@ ARM/HPE GXP ARCHITECTUR
  M:    Jean-Marie Verdun <[email protected]>
  M:    Nick Hawkins <[email protected]>
  S:    Maintained
 +F:    Documentation/hwmon/gxp-fan-ctrl.rst
  F:    Documentation/devicetree/bindings/arm/hpe,gxp.yaml
 +F:    Documentation/devicetree/bindings/hwmon/hpe,gxp-fan-ctrl.yaml
  F:    Documentation/devicetree/bindings/spi/hpe,gxp-spifi.yaml
  F:    Documentation/devicetree/bindings/timer/hpe,gxp-timer.yaml
  F:    arch/arm/boot/dts/hpe-bmc*
  F:    arch/arm/boot/dts/hpe-gxp*
  F:    arch/arm/mach-hpe/
  F:    drivers/clocksource/timer-gxp.c
 +F:    drivers/hwmon/gxp-fan-ctrl.c
  F:    drivers/spi/spi-gxp.c
  F:    drivers/watchdog/gxp-wdt.c
  
  S:    Maintained
  F:    arch/arm/boot/dts/omap3-igep*
  
 -ARM/INCOME PXA270 SUPPORT
 -M:    Marek Vasut <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -F:    arch/arm/mach-pxa/colibri-pxa270-income.c
 -
 -ARM/INTEL IOP32X ARM ARCHITECTURE
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
 -ARM/INTEL IQ81342EX MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
 -ARM/INTEL IXDP2850 MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
  ARM/INTEL IXP4XX ARM ARCHITECTURE
  M:    Linus Walleij <[email protected]>
  M:    Imre Kaloz <[email protected]>
@@@ -2287,12 -2351,22 +2287,12 @@@ M:   Lennert Buytenhek <kernel@wantstofly
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
  
 -ARM/IP FABRICS DOUBLE ESPRESSO MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
  ARM/LG1K ARCHITECTURE
  M:    Chanho Min <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
  F:    arch/arm64/boot/dts/lg/
  
 -ARM/LOGICPD PXA270 MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
  ARM/LPC18XX ARCHITECTURE
  M:    Vladimir Zapolskiy <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2319,6 -2393,10 +2319,6 @@@ F:     drivers/usb/host/ohci-nxp.
  F:    drivers/watchdog/pnx4008_wdt.c
  N:    lpc32xx
  
 -ARM/MAGICIAN MACHINE SUPPORT
 -M:    Philipp Zabel <[email protected]>
 -S:    Maintained
 -
  ARM/Marvell Dove/MV78xx0/Orion SOC support
  M:    Andrew Lunn <[email protected]>
  M:    Sebastian Hesselbarth <[email protected]>
@@@ -2373,14 -2451,11 +2373,14 @@@ F:   drivers/rtc/rtc-mt7622.
  
  ARM/Mediatek SoC support
  M:    Matthias Brugger <[email protected]>
 +R:    AngeloGioacchino Del Regno <[email protected]>
 +L:    [email protected]
  L:    [email protected] (moderated for non-subscribers)
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
  W:    https://mtk.wiki.kernel.org/
 -C:    irc://chat.freenode.net/linux-mediatek
 +C:    irc://irc.libera.chat/linux-mediatek
 +F:    arch/arm/boot/dts/mt2*
  F:    arch/arm/boot/dts/mt6*
  F:    arch/arm/boot/dts/mt7*
  F:    arch/arm/boot/dts/mt8*
@@@ -2388,7 -2463,7 +2388,7 @@@ F:      arch/arm/mach-mediatek
  F:    arch/arm64/boot/dts/mediatek/
  F:    drivers/soc/mediatek/
  N:    mtk
 -N:    mt[678]
 +N:    mt[2678]
  K:    mediatek
  
  ARM/Mediatek USB3 PHY DRIVER
@@@ -2450,6 -2525,12 +2450,6 @@@ F:     arch/arm/boot/dts/milbeaut
  F:    arch/arm/mach-milbeaut/
  N:    milbeaut
  
 -ARM/MIOA701 MACHINE SUPPORT
 -M:    Robert Jarzmik <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -F:    arch/arm/mach-pxa/mioa701.c
 -
  ARM/MStar/Sigmastar Armv7 SoC support
  M:    Daniel Palmer <[email protected]>
  M:    Romain Perier <[email protected]>
@@@ -2470,6 -2551,10 +2470,6 @@@ F:     drivers/watchdog/msc313e_wdt.
  F:    include/dt-bindings/clock/mstar-*
  F:    include/dt-bindings/gpio/msc313-gpio.h
  
 -ARM/NEC MOBILEPRO 900/c MACHINE SUPPORT
 -M:    Michael Petchkovsky <[email protected]>
 -S:    Maintained
 -
  ARM/NOMADIK/Ux500 ARCHITECTURES
  M:    Linus Walleij <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2525,7 -2610,6 +2525,7 @@@ S:      Maintaine
  W:    https://github.com/neuschaefer/wpcm450/wiki
  F:    Documentation/devicetree/bindings/*/*wpcm*
  F:    arch/arm/boot/dts/nuvoton-wpcm450*
 +F:    arch/arm/configs/wpcm450_defconfig
  F:    arch/arm/mach-npcm/wpcm450.c
  F:    drivers/*/*/*wpcm*
  F:    drivers/*/*wpcm*
  S:    Maintained
  F:    arch/arm64/boot/dts/freescale/s32g*.dts*
  
 -ARM/OPENMOKO NEO FREERUNNER (GTA02) MACHINE SUPPORT
 -L:    [email protected] (subscribers-only)
 -S:    Orphan
 -W:    http://wiki.openmoko.org/wiki/Neo_FreeRunner
 -F:    arch/arm/mach-s3c/gta02.h
 -F:    arch/arm/mach-s3c/mach-gta02.c
 -
  ARM/Orion SoC/Technologic Systems TS-78xx platform support
  M:    Alexander Clouter <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2556,6 -2647,43 +2556,6 @@@ F:     arch/arm/mach-oxnas
  F:    drivers/power/reset/oxnas-restart.c
  N:    oxnas
  
 -ARM/PALM TREO SUPPORT
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Orphan
 -F:    arch/arm/mach-pxa/palmtreo.*
 -
 -ARM/PALMTX,PALMT5,PALMLD,PALMTE2,PALMTC SUPPORT
 -M:    Marek Vasut <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -W:    http://hackndev.com
 -F:    arch/arm/mach-pxa/include/mach/palmld.h
 -F:    arch/arm/mach-pxa/include/mach/palmtc.h
 -F:    arch/arm/mach-pxa/include/mach/palmtx.h
 -F:    arch/arm/mach-pxa/palmld.c
 -F:    arch/arm/mach-pxa/palmt5.*
 -F:    arch/arm/mach-pxa/palmtc.c
 -F:    arch/arm/mach-pxa/palmte2.*
 -F:    arch/arm/mach-pxa/palmtx.c
 -
 -ARM/PALMZ72 SUPPORT
 -M:    Sergey Lapin <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -W:    http://hackndev.com
 -F:    arch/arm/mach-pxa/palmz72.*
 -
 -ARM/PLEB SUPPORT
 -M:    Peter Chubb <[email protected]>
 -S:    Maintained
 -W:    http://www.disy.cse.unsw.edu.au/Hardware/PLEB
 -
 -ARM/PT DIGITAL BOARD PORT
 -M:    Stefan Eletzhofer <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -W:    http://www.armlinux.org.uk/
 -
  ARM/QUALCOMM SUPPORT
  M:    Andy Gross <[email protected]>
  M:    Bjorn Andersson <[email protected]>
@@@ -2599,6 -2727,11 +2599,6 @@@ F:     include/dt-bindings/*/qcom
  F:    include/linux/*/qcom*
  F:    include/linux/soc/qcom/
  
 -ARM/RADISYS ENP2611 MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
  ARM/RDA MICRO ARCHITECTURE
  M:    Manivannan Sadhasivam <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2674,7 -2807,7 +2674,7 @@@ F:      Documentation/devicetree/bindings/i2
  F:    Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.yaml
  F:    Documentation/devicetree/bindings/spi/spi-rockchip.yaml
  F:    arch/arm/boot/dts/rk3*
 -F:    arch/arm/boot/dts/rv1108*
 +F:    arch/arm/boot/dts/rv11*
  F:    arch/arm/mach-rockchip/
  F:    drivers/*/*/*rockchip*
  F:    drivers/*/*rockchip*
@@@ -2719,6 -2852,7 +2719,6 @@@ F:      include/linux/platform_data/*s3c
  F:    include/linux/serial_s3c.h
  F:    include/linux/soc/samsung/
  N:    exynos
 -N:    s3c2410
  N:    s3c64xx
  N:    s5pv210
  
@@@ -2791,7 -2925,6 +2791,7 @@@ M:      Patrice Chotard <patrice.chotard@fos
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
  W:    http://www.stlinux.com
 +F:    Documentation/devicetree/bindings/spi/st,ssc-spi.yaml
  F:    Documentation/devicetree/bindings/i2c/i2c-st.txt
  F:    arch/arm/boot/dts/sti*
  F:    arch/arm/mach-sti/
@@@ -2877,7 -3010,7 +2877,7 @@@ M:      [email protected]
  L:    [email protected] (moderated for non-subscribers)
  L:    [email protected]
  S:    Maintained
 -F:    arch/arm64/boot/dts/tesla*
 +F:    arch/arm64/boot/dts/tesla/
  
  ARM/TETON BGA MACHINE SUPPORT
  M:    "Mark F. Brown" <[email protected]>
@@@ -2930,6 -3063,16 +2930,6 @@@ F:     arch/arm64/boot/dts/ti/Makefil
  F:    arch/arm64/boot/dts/ti/k3-*
  F:    include/dt-bindings/pinctrl/k3.h
  
 -ARM/THECUS N2100 MACHINE SUPPORT
 -M:    Lennert Buytenhek <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -
 -ARM/TOSA MACHINE SUPPORT
 -M:    Dmitry Eremin-Solenikov <[email protected]>
 -M:    Dirk Opfer <[email protected]>
 -S:    Maintained
 -
  ARM/TOSHIBA VISCONTI ARCHITECTURE
  M:    Nobuhiro Iwamatsu <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -2986,7 -3129,7 +2986,7 @@@ S:      Maintaine
  F:    */*/*/vexpress*
  F:    */*/vexpress*
  F:    arch/arm/boot/dts/vexpress*
 -F:    arch/arm/mach-vexpress/
 +F:    arch/arm/mach-versatile/
  F:    arch/arm64/boot/dts/arm/
  F:    drivers/clk/versatile/clk-vexpress-osc.c
  F:    drivers/clocksource/timer-versatile.c
@@@ -2999,6 -3142,13 +2999,6 @@@ S:     Maintaine
  W:    http://www.armlinux.org.uk/
  F:    arch/arm/vfp/
  
 -ARM/VOIPAC PXA270 SUPPORT
 -M:    Marek Vasut <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -F:    arch/arm/mach-pxa/include/mach/vpac270.h
 -F:    arch/arm/mach-pxa/vpac270.c
 -
  ARM/VT8500 ARM ARCHITECTURE
  L:    [email protected] (moderated for non-subscribers)
  S:    Orphan
@@@ -3016,6 -3166,13 +3016,6 @@@ F:     drivers/video/fbdev/vt8500lcdfb.
  F:    drivers/video/fbdev/wm8505fb*
  F:    drivers/video/fbdev/wmt_ge_rops.*
  
 -ARM/ZIPIT Z2 SUPPORT
 -M:    Marek Vasut <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Maintained
 -F:    arch/arm/mach-pxa/include/mach/z2.h
 -F:    arch/arm/mach-pxa/z2.c
 -
  ARM/ZYNQ ARCHITECTURE
  M:    Michal Simek <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
@@@ -3153,7 -3310,7 +3153,7 @@@ ASPEED CRYPTO DRIVE
  M:    Neal Liu <[email protected]>
  L:    [email protected] (moderated for non-subscribers)
  S:    Maintained
 -F:    Documentation/devicetree/bindings/crypto/aspeed,ast2500-hace.yaml
 +F:    Documentation/devicetree/bindings/crypto/aspeed,*
  F:    drivers/crypto/aspeed/
  
  ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
@@@ -3354,7 -3511,7 +3354,7 @@@ F:      drivers/net/ieee802154/atusb.
  AUDIT SUBSYSTEM
  M:    Paul Moore <[email protected]>
  M:    Eric Paris <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 +L:    [email protected]
  S:    Supported
  W:    https://github.com/linux-audit
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
@@@ -3610,6 -3767,7 +3610,6 @@@ F:      net/bluetooth
  
  BONDING DRIVER
  M:    Jay Vosburgh <[email protected]>
 -M:    Veaceslav Falico <[email protected]>
  M:    Andy Gospodarek <[email protected]>
  L:    [email protected]
  S:    Supported
  S:    Maintained
  F:    tools/testing/selftests/bpf/
  
 +BPF [DOCUMENTATION] (Related to Standardization)
 +R:    David Vernet <[email protected]>
 +L:    [email protected]
 +L:    [email protected]
 +S:    Maintained
 +F:    Documentation/bpf/instruction-set.rst
 +
  BPF [MISC]
  L:    [email protected]
  S:    Odd Fixes
  S:    Maintained
  F:    drivers/phy/broadcom/phy-brcm-usb*
  
 +BROADCOM Broadband SoC High Speed SPI Controller DRIVER
 +M:    William Zhang <[email protected]>
 +M:    Kursad Oney <[email protected]>
 +M:    Jonas Gorski <[email protected]>
 +R:    Broadcom internal kernel review list <[email protected]>
 +L:    [email protected]
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/spi/brcm,bcm63xx-hsspi.yaml
 +F:    drivers/spi/spi-bcm63xx-hsspi.c
 +F:    drivers/spi/spi-bcmbca-hsspi.c
 +
  BROADCOM ETHERNET PHY DRIVERS
  M:    Florian Fainelli <[email protected]>
  R:    Broadcom internal kernel review list <[email protected]>
@@@ -4664,11 -4804,12 +4664,11 @@@ F:   net/sched/sch_etf.
  F:    net/sched/sch_taprio.c
  
  CC2520 IEEE-802.15.4 RADIO DRIVER
 -M:    Varka Bhadram <[email protected]>
 +M:    Stefan Schmidt <[email protected]>
  L:    [email protected]
 -S:    Maintained
 +S:    Odd Fixes
  F:    Documentation/devicetree/bindings/net/ieee802154/cc2520.txt
  F:    drivers/net/ieee802154/cc2520.c
 -F:    include/linux/spi/cc2520.h
  
  CCREE ARM TRUSTZONE CRYPTOCELL REE DRIVER
  M:    Gilad Ben-Yossef <[email protected]>
@@@ -4844,13 -4985,6 +4844,13 @@@ S:    Maintaine
  F:    Documentation/devicetree/bindings/sound/google,cros-ec-codec.yaml
  F:    sound/soc/codecs/cros_ec_codec.*
  
 +CHROMEOS EC UART DRIVER
 +M:    Bhanu Prakash Maiya <[email protected]>
 +R:    Benson Leung <[email protected]>
 +R:    Tzung-Bi Shih <[email protected]>
 +S:    Maintained
 +F:    drivers/platform/chrome/cros_ec_uart.c
 +
  CHROMEOS EC SUBDRIVERS
  M:    Benson Leung <[email protected]>
  R:    Guenter Roeck <[email protected]>
@@@ -4864,9 -4998,8 +4864,9 @@@ CHROMEOS EC USB TYPE-C DRIVE
  M:    Prashant Malani <[email protected]>
  L:    [email protected]
  S:    Maintained
 -F:    drivers/platform/chrome/cros_ec_typec.c
 +F:    drivers/platform/chrome/cros_ec_typec.*
  F:    drivers/platform/chrome/cros_typec_switch.c
 +F:    drivers/platform/chrome/cros_typec_vdm.*
  
  CHROMEOS EC USB PD NOTIFY DRIVER
  M:    Prashant Malani <[email protected]>
@@@ -5657,6 -5790,11 +5657,11 @@@ M:    SeongJae Park <[email protected]
  L:    [email protected]
  L:    [email protected]
  S:    Maintained
+ W:    https://damonitor.github.io
+ P:    Documentation/mm/damon/maintainer-profile.rst
+ T:    git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+ T:    quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
+ T:    git git://git.kernel.org/pub/scm/linux/kernel/git/sj/linux.git damon/next
  F:    Documentation/ABI/testing/sysfs-kernel-mm-damon
  F:    Documentation/admin-guide/mm/damon/
  F:    Documentation/mm/damon/
@@@ -5794,6 -5932,7 +5799,6 @@@ F:      Documentation/ABI/testing/sysfs-plat
  F:    drivers/platform/x86/dell/dell-wmi-ddv.c
  
  DELL WMI SYSMAN DRIVER
 -M:    Divya Bharathi <[email protected]>
  M:    Prasanth Ksr <[email protected]>
  L:    [email protected]
  L:    [email protected]
@@@ -5965,7 -6104,7 +5970,7 @@@ S:      Supporte
  F:    Documentation/networking/devlink
  F:    include/net/devlink.h
  F:    include/uapi/linux/devlink.h
 -F:    net/core/devlink.c
 +F:    net/devlink/
  
  DH ELECTRONICS IMX6 DHCOM/DHCOR BOARD SUPPORT
  M:    Christoph Niedermaier <[email protected]>
@@@ -6288,7 -6427,6 +6293,7 @@@ T:      git git://git.linbit.com/linux-drbd.
  T:    git git://git.linbit.com/drbd-8.4.git
  F:    Documentation/admin-guide/blockdev/
  F:    drivers/block/drbd/
 +F:    include/linux/drbd*
  F:    lib/lru_cache.c
  
  DRIVER COMPONENT FRAMEWORK
@@@ -6417,14 -6555,6 +6422,14 @@@ S:    Maintaine
  T:    git git://anongit.freedesktop.org/drm/drm-misc
  F:    drivers/gpu/drm/tiny/gm12u320.c
  
 +DRM DRIVER FOR HIMAX HX8394 MIPI-DSI LCD panels
 +M:    Ondrej Jirman <[email protected]>
 +M:    Javier Martinez Canillas <[email protected]>
 +S:    Maintained
 +T:    git git://anongit.freedesktop.org/drm/drm-misc
 +F:    Documentation/devicetree/bindings/display/panel/himax,hx8394.yaml
 +F:    drivers/gpu/drm/panel/panel-himax-hx8394.c
 +
  DRM DRIVER FOR HX8357D PANELS
  M:    Emma Anholt <[email protected]>
  S:    Maintained
@@@ -6446,6 -6576,11 +6451,6 @@@ T:     git git://anongit.freedesktop.org/dr
  F:    Documentation/devicetree/bindings/display/ilitek,ili9486.yaml
  F:    drivers/gpu/drm/tiny/ili9486.c
  
 -DRM DRIVER FOR INTEL I810 VIDEO CARDS
 -S:    Orphan / Obsolete
 -F:    drivers/gpu/drm/i810/
 -F:    include/uapi/drm/i810_drm.h
 -
  DRM DRIVER FOR JADARD JD9365DA-H3 MIPI-DSI LCD PANELS
  M:    Jagan Teki <[email protected]>
  S:    Maintained
@@@ -6474,6 -6609,11 +6479,6 @@@ S:     Maintaine
  F:    Documentation/devicetree/bindings/display/panel/mantix,mlaf057we51-x.yaml
  F:    drivers/gpu/drm/panel/panel-mantix-mlaf057we51.c
  
 -DRM DRIVER FOR MATROX G200/G400 GRAPHICS CARDS
 -S:    Orphan / Obsolete
 -F:    drivers/gpu/drm/mga/
 -F:    include/uapi/drm/mga_drm.h
 -
  DRM DRIVER FOR MGA G200 GRAPHICS CHIPS
  M:    Dave Airlie <[email protected]>
  R:    Thomas Zimmermann <[email protected]>
@@@ -6592,6 -6732,11 +6597,6 @@@ T:     git git://anongit.freedesktop.org/dr
  F:    drivers/gpu/drm/qxl/
  F:    include/uapi/drm/qxl_drm.h
  
 -DRM DRIVER FOR RAGE 128 VIDEO CARDS
 -S:    Orphan / Obsolete
 -F:    drivers/gpu/drm/r128/
 -F:    include/uapi/drm/r128_drm.h
 -
  DRM DRIVER FOR RAYDIUM RM67191 PANELS
  M:    Robert Chiras <[email protected]>
  S:    Maintained
@@@ -6619,6 -6764,11 +6624,6 @@@ S:     Maintaine
  F:    Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml
  F:    drivers/gpu/drm/panel/panel-sitronix-st7703.c
  
 -DRM DRIVER FOR SAVAGE VIDEO CARDS
 -S:    Orphan / Obsolete
 -F:    drivers/gpu/drm/savage/
 -F:    include/uapi/drm/savage_drm.h
 -
  DRM DRIVER FOR FIRMWARE FRAMEBUFFERS
  M:    Thomas Zimmermann <[email protected]>
  M:    Javier Martinez Canillas <[email protected]>
@@@ -6634,6 -6784,11 +6639,6 @@@ F:     include/drm/drm_aperture.
  F:    include/linux/aperture.h
  F:    include/video/nomodeset.h
  
 -DRM DRIVER FOR SIS VIDEO CARDS
 -S:    Orphan / Obsolete
 -F:    drivers/gpu/drm/sis/
 -F:    include/uapi/drm/sis_drm.h
 -
  DRM DRIVER FOR SITRONIX ST7586 PANELS
  M:    David Lechner <[email protected]>
  S:    Maintained
@@@ -6661,6 -6816,10 +6666,6 @@@ T:     git git://anongit.freedesktop.org/dr
  F:    Documentation/devicetree/bindings/display/ste,mcde.yaml
  F:    drivers/gpu/drm/mcde/
  
 -DRM DRIVER FOR TDFX VIDEO CARDS
 -S:    Orphan / Obsolete
 -F:    drivers/gpu/drm/tdfx/
 -
  DRM DRIVER FOR TI DLPC3433 MIPI DSI TO DMD BRIDGE
  M:    Jagan Teki <[email protected]>
  S:    Maintained
@@@ -6760,16 -6919,6 +6765,16 @@@ C:    irc://irc.oftc.net/dri-deve
  T:    git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
  F:    Documentation/accel/
  F:    drivers/accel/
 +F:    include/drm/drm_accel.h
 +
 +DRM ACCEL DRIVERS FOR INTEL VPU
 +M:    Jacek Lawrynowicz <[email protected]>
 +M:    Stanislaw Gruszka <[email protected]>
 +L:    [email protected]
 +S:    Supported
 +T:    git git://anongit.freedesktop.org/drm/drm-misc
 +F:    drivers/accel/ivpu/
 +F:    include/uapi/drm/ivpu_accel.h
  
  DRM DRIVERS FOR ALLWINNER A10
  M:    Maxime Ripard <[email protected]>
@@@ -6840,7 -6989,7 +6845,7 @@@ M:      Philipp Zabel <[email protected]
  L:    [email protected]
  S:    Maintained
  F:    Documentation/devicetree/bindings/display/imx/
 -F:    drivers/gpu/drm/imx/
 +F:    drivers/gpu/drm/imx/ipuv3/
  F:    drivers/gpu/ipu-v3/
  
  DRM DRIVERS FOR FREESCALE IMX BRIDGE
@@@ -6863,10 -7012,9 +6868,10 @@@ F:    drivers/gpu/drm/gma500
  DRM DRIVERS FOR HISILICON
  M:    Xinliang Liu <[email protected]>
  M:    Tian Tao  <[email protected]>
 -R:    John Stultz <[email protected]>
  R:    Xinwei Kong <[email protected]>
 -R:    Chen Feng <[email protected]>
 +R:    Sumit Semwal <[email protected]>
 +R:    Yongqin Liu <[email protected]>
 +R:    John Stultz <[email protected]>
  L:    [email protected]
  S:    Maintained
  T:    git git://anongit.freedesktop.org/drm/drm-misc
@@@ -6907,7 -7055,7 +6912,7 @@@ M:      Thierry Reding <thierry.reding@gmail
  L:    [email protected]
  L:    [email protected]
  S:    Supported
 -T:    git git://anongit.freedesktop.org/tegra/linux.git
 +T:    git https://gitlab.freedesktop.org/drm/tegra.git
  F:    Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
  F:    Documentation/devicetree/bindings/gpu/host1x/
  F:    drivers/gpu/drm/tegra/
@@@ -7473,6 -7621,7 +7478,6 @@@ S:      Maintaine
  F:    drivers/firmware/efi/test/
  
  EFI VARIABLE FILESYSTEM
 -M:    Matthew Garrett <[email protected]>
  M:    Jeremy Kerr <[email protected]>
  M:    Ard Biesheuvel <[email protected]>
  L:    [email protected]
@@@ -7601,7 -7750,6 +7606,7 @@@ R:      Jeffle Xu <[email protected]
  L:    [email protected]
  S:    Maintained
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
 +F:    Documentation/ABI/testing/sysfs-fs-erofs
  F:    Documentation/filesystems/erofs.rst
  F:    fs/erofs/
  F:    include/trace/events/erofs.h
@@@ -7752,11 -7900,7 +7757,11 @@@ F:    include/linux/extcon
  
  EXTRA BOOT CONFIG
  M:    Masami Hiramatsu <[email protected]>
 +L:    [email protected]
 +L:    [email protected]
 +Q:    https://patchwork.kernel.org/project/linux-trace-kernel/list/
  S:    Maintained
 +T:    git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
  F:    Documentation/admin-guide/bootconfig.rst
  F:    fs/proc/bootconfig.c
  F:    include/linux/bootconfig.h
@@@ -8057,7 -8201,7 +8062,7 @@@ F:      drivers/fpga/microchip-spi.
  FPU EMULATOR
  M:    Bill Metzenthen <[email protected]>
  S:    Maintained
 -W:    http://floatingpoint.sourceforge.net/emulator/index.html
 +W:    https://floatingpoint.billm.au/
  F:    arch/x86/math-emu/
  
  FRAMEBUFFER CORE
@@@ -8329,16 -8473,16 +8334,16 @@@ F:   fs/fscache
  F:    include/linux/fscache*.h
  
  FSCRYPT: FILE SYSTEM LEVEL ENCRYPTION SUPPORT
 +M:    Eric Biggers <[email protected]>
  M:    Theodore Y. Ts'o <[email protected]>
  M:    Jaegeuk Kim <[email protected]>
 -M:    Eric Biggers <[email protected]>
  L:    [email protected]
  S:    Supported
  Q:    https://patchwork.kernel.org/project/linux-fscrypt/list/
 -T:    git git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git
 +T:    git https://git.kernel.org/pub/scm/fs/fscrypt/linux.git
  F:    Documentation/filesystems/fscrypt.rst
  F:    fs/crypto/
 -F:    include/linux/fscrypt*.h
 +F:    include/linux/fscrypt.h
  F:    include/uapi/linux/fscrypt.h
  
  FSI SUBSYSTEM
@@@ -8381,10 -8525,10 +8386,10 @@@ F:   include/linux/fsnotify*.
  FSVERITY: READ-ONLY FILE-BASED AUTHENTICITY PROTECTION
  M:    Eric Biggers <[email protected]>
  M:    Theodore Y. Ts'o <[email protected]>
 -L:    [email protected]
 +L:    [email protected]
  S:    Supported
 -Q:    https://patchwork.kernel.org/project/linux-fscrypt/list/
 -T:    git git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git fsverity
 +Q:    https://patchwork.kernel.org/project/fsverity/list/
 +T:    git https://git.kernel.org/pub/scm/fs/fsverity/linux.git
  F:    Documentation/filesystems/fsverity.rst
  F:    fs/verity/
  F:    include/linux/fsverity.h
@@@ -8432,7 -8576,6 +8437,7 @@@ F:      kernel/trace/fgraph.
  F:    arch/*/*/*/*ftrace*
  F:    arch/*/*/*ftrace*
  F:    include/*/ftrace.h
 +F:    samples/ftrace
  
  FUNGIBLE ETHERNET DRIVERS
  M:    Dimitris Michailidis <[email protected]>
@@@ -8905,15 -9048,13 +8910,15 @@@ F:   block/partitions/efi.
  
  HABANALABS PCI DRIVER
  M:    Oded Gabbay <[email protected]>
 +L:    [email protected]
  S:    Supported
 +C:    irc://irc.oftc.net/dri-devel
  T:    git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git
  F:    Documentation/ABI/testing/debugfs-driver-habanalabs
  F:    Documentation/ABI/testing/sysfs-driver-habanalabs
 -F:    drivers/misc/habanalabs/
 +F:    drivers/accel/habanalabs/
  F:    include/trace/events/habanalabs.h
 -F:    include/uapi/misc/habanalabs.h
 +F:    include/uapi/drm/habanalabs_accel.h
  
  HACKRF MEDIA DRIVER
  M:    Antti Palosaari <[email protected]>
@@@ -9064,12 -9205,9 +9069,12 @@@ M:    Benjamin Tissoires <benjamin.tissoir
  L:    [email protected]
  S:    Maintained
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git
 +F:    Documentation/hid/
  F:    drivers/hid/
  F:    include/linux/hid*
  F:    include/uapi/linux/hid*
 +F:    samples/hid/
 +F:    tools/testing/selftests/hid/
  
  HID LOGITECH DRIVERS
  R:    Filipe Laíns <[email protected]>
  S:    Maintained
  F:    drivers/hid/hid-logitech-*
  
 +HID++ LOGITECH DRIVERS
 +R:    Filipe Laíns <[email protected]>
 +R:    Bastien Nocera <[email protected]>
 +L:    [email protected]
 +S:    Maintained
 +F:    drivers/hid/hid-logitech-hidpp.c
 +
  HID PLAYSTATION DRIVER
  M:    Roderick Colenbrander <[email protected]>
  L:    [email protected]
@@@ -9173,7 -9304,7 +9178,7 @@@ F:      net/dsa/tag_hellcreek.
  
  HISILICON DMA DRIVER
  M:    Zhou Wang <[email protected]>
 -M:    Jie Hai <haijie1@hisilicon.com>
 +M:    Jie Hai <haijie1@huawei.com>
  L:    [email protected]
  S:    Maintained
  F:    drivers/dma/hisi_dma.c
@@@ -9340,7 -9471,7 +9345,7 @@@ F:      Documentation/mm/hmm.rs
  F:    include/linux/hmm*
  F:    lib/test_hmm*
  F:    mm/hmm*
- F:    tools/testing/selftests/vm/*hmm*
+ F:    tools/testing/selftests/mm/*hmm*
  
  HOST AP DRIVER
  M:    Jouni Malinen <[email protected]>
@@@ -9863,7 -9994,7 +9868,7 @@@ S:      Maintaine
  T:    git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git
  F:    Documentation/filesystems/idmappings.rst
  F:    tools/testing/selftests/mount_setattr/
 -F:    include/linux/mnt_idmapping.h
 +F:    include/linux/mnt_idmapping.*
  
  IDT VersaClock 5 CLOCK DRIVER
  M:    Luca Ceresoli <[email protected]>
@@@ -9874,7 -10005,6 +9879,7 @@@ F:     drivers/clk/clk-versaclock5.
  IEEE 802.15.4 SUBSYSTEM
  M:    Alexander Aring <[email protected]>
  M:    Stefan Schmidt <[email protected]>
 +M:    Miquel Raynal <[email protected]>
  L:    [email protected]
  S:    Maintained
  W:    https://linux-wpan.org/
  S:    Maintained
  F:    drivers/iio/pressure/dps310.c
  
 +INFINEON PEB2466 ASoC CODEC
 +M:    Herve Codina <[email protected]>
 +L:    [email protected] (moderated for non-subscribers)
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/sound/infineon,peb2466.yaml
 +F:    sound/soc/codecs/peb2466.c
 +
  INFINIBAND SUBSYSTEM
  M:    Jason Gunthorpe <[email protected]>
  M:    Leon Romanovsky <[email protected]>
@@@ -10458,7 -10581,7 +10463,7 @@@ S:   Maintaine
  F:    Documentation/ABI/testing/sysfs-driver-intel-m10-bmc
  F:    Documentation/hwmon/intel-m10-bmc-hwmon.rst
  F:    drivers/hwmon/intel-m10-bmc-hwmon.c
 -F:    drivers/mfd/intel-m10-bmc.c
 +F:    drivers/mfd/intel-m10-bmc*
  F:    include/linux/mfd/intel-m10-bmc.h
  
  INTEL MENLOW THERMAL DRIVER
@@@ -10565,13 -10688,6 +10570,13 @@@ S: Maintaine
  F:    arch/x86/include/asm/intel_telemetry.h
  F:    drivers/platform/x86/intel/telemetry/
  
 +INTEL TPMI DRIVER
 +M:    Srinivas Pandruvada <[email protected]>
 +L:    [email protected]
 +S:    Maintained
 +F:    drivers/platform/x86/intel/tpmi.c
 +F:    include/linux/intel_tpmi.h
 +
  INTEL UNCORE FREQUENCY CONTROL
  M:    Srinivas Pandruvada <[email protected]>
  L:    [email protected]
@@@ -10809,13 -10925,6 +10814,13 @@@ M: David Sterba <[email protected]
  S:    Odd Fixes
  F:    drivers/tty/ipwireless/
  
 +IRON DEVICE AUDIO CODEC DRIVERS
 +M:    Kiseok Jo <[email protected]>
 +L:    [email protected] (moderated for non-subscribers)
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/sound/irondevice,*
 +F:    sound/soc/codecs/sma*
 +
  IRQ DOMAINS (IRQ NUMBER MAPPING LIBRARY)
  M:    Marc Zyngier <[email protected]>
  S:    Maintained
  S:    Maintained
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
  F:    kernel/irq/
 +F:    include/linux/group_cpus.h
 +F:    lib/group_cpus.c
  
  IRQCHIP DRIVERS
  M:    Thomas Gleixner <[email protected]>
@@@ -11572,12 -11679,6 +11577,12 @@@ M: John Hawley <[email protected]
  S:    Maintained
  F:    tools/testing/ktest
  
 +KTZ8866 BACKLIGHT DRIVER
 +M:    Jianhua Lu <[email protected]>
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
 +F:    drivers/video/backlight/ktz8866.c
 +
  L3MDEV
  M:    David Ahern <[email protected]>
  L:    [email protected]
@@@ -11590,7 -11691,7 +11595,7 @@@ M:   Mickaël Salaün <[email protected]
  L:    [email protected]
  S:    Supported
  W:    https://landlock.io
 -T:    git https://github.com/landlock-lsm/linux.git
 +T:    git https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
  F:    Documentation/security/landlock.rst
  F:    Documentation/userspace-api/landlock.rst
  F:    include/uapi/linux/landlock.h
@@@ -12719,9 -12820,9 +12724,9 @@@ F:   drivers/iio/potentiometer/mcp4018.
  F:    drivers/iio/potentiometer/mcp4531.c
  
  MCR20A IEEE-802.15.4 RADIO DRIVER
 -M:    Xue Liu <[email protected]>
 +M:    Stefan Schmidt <[email protected]>
  L:    [email protected]
 -S:    Maintained
 +S:    Odd Fixes
  W:    https://github.com/xueliu/mcr20a-linux
  F:    Documentation/devicetree/bindings/net/ieee802154/mcr20a.txt
  F:    drivers/net/ieee802154/mcr20a.c
@@@ -13378,7 -13479,7 +13383,7 @@@ M:   Andrew Morton <akpm@linux-foundation
  L:    [email protected]
  S:    Maintained
  W:    http://www.linux-mm.org
- T:    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+ T:    git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
  T:    quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
  F:    include/linux/gfp.h
  F:    include/linux/gfp_types.h
@@@ -13387,7 -13488,8 +13392,8 @@@ F:   include/linux/mm.
  F:    include/linux/mmzone.h
  F:    include/linux/pagewalk.h
  F:    mm/
- F:    tools/testing/selftests/vm/
+ F:    tools/mm/
+ F:    tools/testing/selftests/mm/
  
  VMALLOC
  M:    Andrew Morton <[email protected]>
@@@ -13396,7 -13498,7 +13402,7 @@@ R:   Christoph Hellwig <[email protected]
  L:    [email protected]
  S:    Maintained
  W:    http://www.linux-mm.org
- T:    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+ T:    git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
  F:    include/linux/vmalloc.h
  F:    mm/vmalloc.c
  
@@@ -13614,7 -13716,6 +13620,7 @@@ S:   Maintaine
  F:    Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
  F:    Documentation/devicetree/bindings/net/dsa/microchip,lan937x.yaml
  F:    drivers/net/dsa/microchip/*
 +F:    include/linux/dsa/ksz_common.h
  F:    include/linux/platform_data/microchip-ksz.h
  F:    net/dsa/tag_ksz.c
  
@@@ -14047,7 -14148,6 +14053,7 @@@ M:   Saravanan Sekar <[email protected]
  S:    Maintained
  F:    Documentation/devicetree/bindings/mfd/mps,mp2629.yaml
  F:    Documentation/devicetree/bindings/regulator/mps,mp*.yaml
 +F:    drivers/hwmon/pmbus/mpq7932.c
  F:    drivers/iio/adc/mp2629_adc.c
  F:    drivers/mfd/mp2629.c
  F:    drivers/power/supply/mp2629_charger.c
@@@ -14068,7 -14168,6 +14074,7 @@@ M:   Peter Geis <[email protected]
  M:    Frank <[email protected]>
  L:    [email protected]
  S:    Maintained
 +F:    Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml
  F:    drivers/net/phy/motorcomm.c
  
  MOXA SMARTIO/INDUSTIO/INTELLIO SERIAL CARD
@@@ -14085,9 -14184,9 +14091,9 @@@ T:   git git://linuxtv.org/media_tree.gi
  F:    drivers/media/radio/radio-mr800.c
  
  MRF24J40 IEEE 802.15.4 RADIO DRIVER
 -M:    Alan Ott <[email protected]>
 +M:    Stefan Schmidt <[email protected]>
  L:    [email protected]
 -S:    Maintained
 +S:    Odd Fixes
  F:    Documentation/devicetree/bindings/net/ieee802154/mrf24j40.txt
  F:    drivers/net/ieee802154/mrf24j40.c
  
@@@ -14187,7 -14286,7 +14193,7 @@@ F:   drivers/media/i2c/mt9v111.
  
  MULTIFUNCTION DEVICES (MFD)
  M:    Lee Jones <[email protected]>
 -S:    Supported
 +S:    Maintained
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
  F:    Documentation/devicetree/bindings/mfd/
  F:    drivers/mfd/
@@@ -14449,8 -14548,6 +14455,8 @@@ M:   Florian Fainelli <[email protected]
  M:    Vladimir Oltean <[email protected]>
  S:    Maintained
  F:    Documentation/devicetree/bindings/net/dsa/
 +F:    Documentation/devicetree/bindings/net/ethernet-switch-port.yaml
 +F:    Documentation/devicetree/bindings/net/ethernet-switch.yaml
  F:    drivers/net/dsa/
  F:    include/linux/dsa/
  F:    include/linux/platform_data/dsa.h
@@@ -14469,10 -14566,8 +14475,10 @@@ Q: https://patchwork.kernel.org/project
  B:    mailto:[email protected]
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
 +F:    Documentation/core-api/netlink.rst
  F:    Documentation/networking/
  F:    Documentation/process/maintainer-netdev.rst
 +F:    Documentation/userspace-api/netlink/
  F:    include/linux/in.h
  F:    include/linux/net.h
  F:    include/linux/netdevice.h
@@@ -14484,7 -14579,6 +14490,7 @@@ F:   include/uapi/linux/netdevice.
  F:    lib/net_utils.c
  F:    lib/random32.c
  F:    net/
 +F:    tools/net/
  F:    tools/testing/selftests/net/
  
  NETWORKING [IPSEC]
@@@ -14513,6 -14607,7 +14519,6 @@@ F:   tools/testing/selftests/net/ipsec.
  
  NETWORKING [IPv4/IPv6]
  M:    "David S. Miller" <[email protected]>
 -M:    Hideaki YOSHIFUJI <[email protected]>
  M:    David Ahern <[email protected]>
  L:    [email protected]
  S:    Maintained
@@@ -14545,6 -14640,7 +14551,6 @@@ F:   net/netfilter/xt_SECMARK.
  F:    net/netlabel/
  
  NETWORKING [MPTCP]
 -M:    Mat Martineau <[email protected]>
  M:    Matthieu Baerts <[email protected]>
  L:    [email protected]
  L:    [email protected]
@@@ -15062,7 -15158,6 +15068,7 @@@ M:   Colin Foster <colin.foster@in-advant
  S:    Supported
  F:    Documentation/devicetree/bindings/mfd/mscc,ocelot.yaml
  F:    drivers/mfd/ocelot*
 +F:    drivers/net/dsa/ocelot/ocelot_ext.c
  F:    include/linux/mfd/ocelot.h
  
  OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
@@@ -15226,6 -15321,7 +15232,6 @@@ Q:   http://patchwork.kernel.org/project/
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
  F:    arch/arm/configs/omap1_defconfig
  F:    arch/arm/mach-omap1/
 -F:    arch/arm/plat-omap/
  F:    drivers/i2c/busses/i2c-omap.c
  F:    include/linux/platform_data/ams-delta-fiq.h
  F:    include/linux/platform_data/i2c-omap.h
@@@ -15240,6 -15336,7 +15246,6 @@@ Q:   http://patchwork.kernel.org/project/
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
  F:    arch/arm/configs/omap2plus_defconfig
  F:    arch/arm/mach-omap2/
 -F:    arch/arm/plat-omap/
  F:    drivers/bus/ti-sysc.c
  F:    drivers/i2c/busses/i2c-omap.c
  F:    drivers/irqchip/irq-omap-intc.c
@@@ -15483,7 -15580,6 +15489,7 @@@ F:   drivers/mtd/nand/onenand
  F:    include/linux/mtd/onenand*.h
  
  ONEXPLAYER FAN DRIVER
 +M:    Derek John Clark <[email protected]>
  M:    Joaquín Ignacio Aramendía <[email protected]>
  L:    [email protected]
  S:    Maintained
  S:    Maintained
  F:    arch/mips/boot/dts/ralink/omega2p.dts
  
 +ONSEMI ETHERNET PHY DRIVERS
 +M:    Piergiorgio Beruto <[email protected]>
 +L:    [email protected]
 +S:    Supported
 +W:    http://www.onsemi.com
 +F:    drivers/net/phy/ncn*
 +
  OP-TEE DRIVER
  M:    Jens Wiklander <[email protected]>
  L:    [email protected]
@@@ -15576,7 -15665,7 +15582,7 @@@ OPENRISC ARCHITECTUR
  M:    Jonas Bonn <[email protected]>
  M:    Stefan Kristiansson <[email protected]>
  M:    Stafford Horne <[email protected]>
 -L:    [email protected].org
 +L:    [email protected].org
  S:    Maintained
  W:    http://openrisc.io
  T:    git https://github.com/openrisc/linux.git
@@@ -15667,12 -15756,6 +15673,12 @@@ S: Maintaine
  W:    https://wireless.wiki.kernel.org/en/users/Drivers/p54
  F:    drivers/net/wireless/intersil/p54/
  
 +PACKET SOCKETS
 +M:    Willem de Bruijn <[email protected]>
 +S:    Maintained
 +F:    include/uapi/linux/if_packet.h
 +F:    net/packet/af_packet.c
 +
  PACKING
  M:    Vladimir Oltean <[email protected]>
  L:    [email protected]
@@@ -15768,6 -15851,13 +15774,6 @@@ F:  arch/*/include/asm/paravirt*.
  F:    arch/*/kernel/paravirt*
  F:    include/linux/hypervisor.h
  
 -PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
 -M:    Tim Waugh <[email protected]>
 -L:    [email protected] (subscribers-only)
 -S:    Maintained
 -F:    Documentation/admin-guide/blockdev/paride.rst
 -F:    drivers/block/paride/
 -
  PARISC ARCHITECTURE
  M:    "James E.J. Bottomley" <[email protected]>
  M:    Helge Deller <[email protected]>
@@@ -16025,7 -16115,7 +16031,7 @@@ F:   drivers/pci/controller/pci-v3-semi.
  
  PCI ENDPOINT SUBSYSTEM
  M:    Lorenzo Pieralisi <[email protected]>
 -R:    Krzysztof Wilczyński <[email protected]>
 +M:    Krzysztof Wilczyński <[email protected]>
  R:    Manivannan Sadhasivam <[email protected]>
  R:    Kishon Vijay Abraham I <[email protected]>
  L:    [email protected]
@@@ -16033,7 -16123,7 +16039,7 @@@ S:   Supporte
  Q:    https://patchwork.kernel.org/project/linux-pci/list/
  B:    https://bugzilla.kernel.org
  C:    irc://irc.oftc.net/linux-pci
 -T:    git git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git
 +T:    git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  F:    Documentation/PCI/endpoint/*
  F:    Documentation/misc-devices/pci-endpoint-test.rst
  F:    drivers/misc/pci_endpoint_test.c
@@@ -16068,7 -16158,7 +16074,7 @@@ S:   Supporte
  Q:    https://patchwork.kernel.org/project/linux-pci/list/
  B:    https://bugzilla.kernel.org
  C:    irc://irc.oftc.net/linux-pci
 -T:    git git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
 +T:    git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  F:    Documentation/driver-api/pci/p2pdma.rst
  F:    drivers/pci/p2pdma.c
  F:    include/linux/pci-p2pdma.h
@@@ -16090,14 -16180,14 +16096,14 @@@ F:        drivers/pci/controller/pci-xgene-msi
  
  PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS
  M:    Lorenzo Pieralisi <[email protected]>
 +M:    Krzysztof Wilczyński <[email protected]>
  R:    Rob Herring <[email protected]>
 -R:    Krzysztof Wilczyński <[email protected]>
  L:    [email protected]
  S:    Supported
  Q:    https://patchwork.kernel.org/project/linux-pci/list/
  B:    https://bugzilla.kernel.org
  C:    irc://irc.oftc.net/linux-pci
 -T:    git git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git
 +T:    git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  F:    Documentation/devicetree/bindings/pci/
  F:    drivers/pci/controller/
  F:    drivers/pci/pci-bridge-emul.c
@@@ -16110,7 -16200,7 +16116,7 @@@ S:   Supporte
  Q:    https://patchwork.kernel.org/project/linux-pci/list/
  B:    https://bugzilla.kernel.org
  C:    irc://irc.oftc.net/linux-pci
 -T:    git git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
 +T:    git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  F:    Documentation/PCI/
  F:    Documentation/devicetree/bindings/pci/
  F:    arch/x86/kernel/early-quirks.c
@@@ -16329,7 -16419,6 +16335,7 @@@ R:   Mark Rutland <[email protected]
  R:    Alexander Shishkin <[email protected]>
  R:    Jiri Olsa <[email protected]>
  R:    Namhyung Kim <[email protected]>
 +R:    Ian Rogers <[email protected]>
  L:    [email protected]
  L:    [email protected]
  S:    Supported
@@@ -16541,13 -16630,6 +16547,13 @@@ S: Maintaine
  F:    Documentation/devicetree/bindings/iio/chemical/plantower,pms7003.yaml
  F:    drivers/iio/chemical/pms7003.c
  
 +PLCA RECONCILIATION SUBLAYER (IEEE802.3 Clause 148)
 +M:    Piergiorgio Beruto <[email protected]>
 +L:    [email protected]
 +S:    Maintained
 +F:    drivers/net/phy/mdio-open-alliance.h
 +F:    net/ethtool/plca.c
 +
  PLDMFW LIBRARY
  M:    Jacob Keller <[email protected]>
  S:    Maintained
@@@ -17145,13 -17227,6 +17151,13 @@@ T: git git://git.kernel.org/pub/scm/lin
  F:    Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
  F:    drivers/net/wireless/ath/ath11k/
  
 +QUALCOMM ATH12K WIRELESS DRIVER
 +M:    Kalle Valo <[email protected]>
 +L:    [email protected]
 +S:    Supported
 +T:    git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
 +F:    drivers/net/wireless/ath/ath12k/
 +
  QUALCOMM ATHEROS ATH9K WIRELESS DRIVER
  M:    Toke Høiland-Jørgensen <[email protected]>
  L:    [email protected]
@@@ -17723,13 -17798,6 +17729,13 @@@ F: Documentation/devicetree/bindings/ne
  F:    drivers/net/ethernet/renesas/
  F:    include/linux/sh_eth.h
  
 +RENESAS IDT821034 ASoC CODEC
 +M:    Herve Codina <[email protected]>
 +L:    [email protected] (moderated for non-subscribers)
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/sound/renesas,idt821034.yaml
 +F:    sound/soc/codecs/idt821034.c
 +
  RENESAS R-CAR GYROADC DRIVER
  M:    Marek Vasut <[email protected]>
  L:    [email protected]
@@@ -17895,7 -17963,6 +17901,7 @@@ M:   Albert Ou <[email protected]
  L:    [email protected]
  S:    Supported
  Q:    https://patchwork.kernel.org/project/linux-riscv/list/
 +C:    irc://irc.libera.chat/riscv
  P:    Documentation/riscv/patch-acceptance.rst
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
  F:    arch/riscv/
  S:    Supported
  W:    https://github.com/Rust-for-Linux/linux
  B:    https://github.com/Rust-for-Linux/linux/issues
 +C:    zulip://rust-for-linux.zulipchat.com
  T:    git https://github.com/Rust-for-Linux/linux.git rust-next
  F:    Documentation/rust/
  F:    rust/
@@@ -18201,7 -18267,6 +18207,7 @@@ F:   Documentation/driver-api/s390-driver
  F:    Documentation/s390/
  F:    arch/s390/
  F:    drivers/s390/
 +F:    drivers/watchdog/diag288_wdt.c
  
  S390 COMMON I/O LAYER
  M:    Vineeth Vijayan <[email protected]>
@@@ -18262,13 -18327,6 +18268,13 @@@ F: arch/s390/pci
  F:    drivers/pci/hotplug/s390_pci_hpc.c
  F:    Documentation/s390/pci.rst
  
 +S390 SCM DRIVER
 +M:    Vineeth Vijayan <[email protected]>
 +L:    [email protected]
 +S:    Supported
 +F:    drivers/s390/block/scm*
 +F:    drivers/s390/cio/scm.c
 +
  S390 VFIO AP DRIVER
  M:    Tony Krowiak <[email protected]>
  M:    Halil Pasic <[email protected]>
  S:    Supported
  F:    drivers/s390/scsi/zfcp_*
  
 -S3C ADC BATTERY DRIVER
 -M:    Krzysztof Kozlowski <[email protected]>
 -L:    [email protected]
 -S:    Odd Fixes
 -F:    drivers/power/supply/s3c_adc_battery.c
 -F:    include/linux/s3c_adc_battery.h
 -
 -S3C24XX SD/MMC Driver
 -M:    Ben Dooks <[email protected]>
 -L:    [email protected] (moderated for non-subscribers)
 -S:    Supported
 -F:    drivers/mmc/host/s3cmci.*
 -
  SAA6588 RDS RECEIVER DRIVER
  M:    Hans Verkuil <[email protected]>
  L:    [email protected]
@@@ -18464,9 -18535,11 +18470,9 @@@ F:  Documentation/devicetree/bindings/cl
  F:    Documentation/devicetree/bindings/clock/samsung,s3c*
  F:    drivers/clk/samsung/
  F:    include/dt-bindings/clock/exynos*.h
 -F:    include/dt-bindings/clock/s3c*.h
  F:    include/dt-bindings/clock/s5p*.h
  F:    include/dt-bindings/clock/samsung,*.h
  F:    include/linux/clk/samsung.h
 -F:    include/linux/platform_data/clk-s3c2410.h
  
  SAMSUNG SPI DRIVERS
  M:    Krzysztof Kozlowski <[email protected]>
@@@ -18477,6 -18550,7 +18483,6 @@@ S:   Maintaine
  F:    Documentation/devicetree/bindings/spi/samsung,spi*.yaml
  F:    drivers/spi/spi-s3c*
  F:    include/linux/platform_data/spi-s3c64xx.h
 -F:    include/linux/spi/s3c24xx-fiq.h
  
  SAMSUNG SXGBE DRIVERS
  M:    Byungho An <[email protected]>
@@@ -18610,9 -18684,9 +18616,9 @@@ F:   drivers/target
  F:    include/target/
  
  SCTP PROTOCOL
 -M:    Vlad Yasevich <[email protected]>
  M:    Neil Horman <[email protected]>
  M:    Marcelo Ricardo Leitner <[email protected]>
 +M:    Xin Long <[email protected]>
  L:    [email protected]
  S:    Maintained
  W:    http://lksctp.sourceforge.net
@@@ -18835,7 -18909,6 +18841,7 @@@ M:   Edward Cree <[email protected]
  M:    Martin Habets <[email protected]>
  L:    [email protected]
  S:    Supported
 +F:    Documentation/networking/devlink/sfc.rst
  F:    drivers/net/ethernet/sfc/
  
  SFF/SFP/SFP+ MODULE SUPPORT
@@@ -19047,6 -19120,14 +19053,6 @@@ M:  Simtec Linux Team <[email protected]
  S:    Supported
  W:    http://www.simtec.co.uk/products/EB110ATX/
  
 -SIMTEC EB2410ITX (BAST)
 -M:    Simtec Linux Team <[email protected]>
 -S:    Supported
 -W:    http://www.simtec.co.uk/products/EB2410ITX/
 -F:    arch/arm/mach-s3c/bast-ide.c
 -F:    arch/arm/mach-s3c/bast-irq.c
 -F:    arch/arm/mach-s3c/mach-bast.c
 -
  SIOX
  M:    Thorsten Scherer <[email protected]>
  M:    Uwe Kleine-König <[email protected]>
  S:    Orphan
  F:    sound/soc/uniphier/
  
 +SOCKET TIMESTAMPING
 +M:    Willem de Bruijn <[email protected]>
 +S:    Maintained
 +F:    Documentation/networking/timestamping.rst
 +F:    include/uapi/linux/net_tstamp.h
 +F:    tools/testing/selftests/net/so_txtime.c
 +
  SOEKRIS NET48XX LED SUPPORT
  M:    Chris Boot <[email protected]>
  S:    Maintained
@@@ -19831,15 -19905,13 +19837,15 @@@ F:        Documentation/devicetree/bindings/cl
  F:    drivers/clk/starfive/clk-starfive-jh7100*
  F:    include/dt-bindings/clock/starfive-jh7100*.h
  
 -STARFIVE JH7100 PINCTRL DRIVER
 +STARFIVE JH71X0 PINCTRL DRIVERS
  M:    Emil Renner Berthing <[email protected]>
 +M:    Jianlong Huang <[email protected]>
  L:    [email protected]
  S:    Maintained
 -F:    Documentation/devicetree/bindings/pinctrl/starfive,jh7100-pinctrl.yaml
 -F:    drivers/pinctrl/starfive/
 +F:    Documentation/devicetree/bindings/pinctrl/starfive,jh71*.yaml
 +F:    drivers/pinctrl/starfive/pinctrl-starfive-jh71*
  F:    include/dt-bindings/pinctrl/pinctrl-starfive-jh7100.h
 +F:    include/dt-bindings/pinctrl/starfive,jh7110-pinctrl.h
  
  STARFIVE JH7100 RESET CONTROLLER DRIVER
  M:    Emil Renner Berthing <[email protected]>
@@@ -19848,12 -19920,6 +19854,12 @@@ F: Documentation/devicetree/bindings/re
  F:    drivers/reset/reset-starfive-jh7100.c
  F:    include/dt-bindings/reset/starfive-jh7100.h
  
 +STARFIVE TRNG DRIVER
 +M:    Jia Jie Ho <[email protected]>
 +S:    Supported
 +F:    Documentation/devicetree/bindings/rng/starfive*
 +F:    drivers/char/hw_random/jh7110-trng.c
 +
  STATIC BRANCH/CALL
  M:    Peter Zijlstra <[email protected]>
  M:    Josh Poimboeuf <[email protected]>
@@@ -20011,7 -20077,6 +20017,7 @@@ F:   drivers/watchdog/sunplus_wdt.
  SUPERH
  M:    Yoshinori Sato <[email protected]>
  M:    Rich Felker <[email protected]>
 +M:    John Paul Adrian Glaubitz <[email protected]>
  L:    [email protected]
  S:    Maintained
  Q:    http://patchwork.kernel.org/project/linux-sh/list/
  S:    Supported
  B:    https://bugzilla.kernel.org
  F:    Documentation/power/
 -F:    arch/x86/kernel/acpi/
 +F:    arch/x86/kernel/acpi/sleep*
 +F:    arch/x86/kernel/acpi/wakeup*
  F:    drivers/base/power/
  F:    include/linux/freezer.h
  F:    include/linux/pm.h
@@@ -20245,7 -20309,8 +20251,7 @@@ S:   Maintaine
  F:    drivers/platform/x86/system76_acpi.c
  
  SYSV FILESYSTEM
 -M:    Christoph Hellwig <[email protected]>
 -S:    Maintained
 +S:    Orphan
  F:    Documentation/filesystems/sysv-fs.rst
  F:    fs/sysv/
  F:    include/linux/sysv_fs.h
@@@ -20636,7 -20701,6 +20642,7 @@@ S:   Supporte
  Q:    https://patchwork.kernel.org/project/linux-pm/list/
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git thermal
  F:    Documentation/ABI/testing/sysfs-class-thermal
 +F:    Documentation/admin-guide/thermal/
  F:    Documentation/devicetree/bindings/thermal/
  F:    Documentation/driver-api/thermal/
  F:    drivers/thermal/
@@@ -20718,7 -20782,7 +20724,7 @@@ M:   Mika Westerberg <mika.westerberg@lin
  M:    Yehezkel Bernat <[email protected]>
  L:    [email protected]
  S:    Maintained
 -F:    drivers/net/thunderbolt.c
 +F:    drivers/net/thunderbolt/
  
  THUNDERX GPIO DRIVER
  M:    Robert Richter <[email protected]>
@@@ -20793,6 -20857,7 +20799,6 @@@ W:   https://linuxtv.or
  Q:    http://patchwork.linuxtv.org/project/linux-media/list/
  T:    git git://linuxtv.org/mhadli/v4l-dvb-davinci_devices.git
  F:    drivers/media/platform/ti/davinci/
 -F:    drivers/staging/media/deprecated/vpfe_capture/
  F:    include/media/davinci/
  
  TI ENHANCED CAPTURE (eCAP) DRIVER
  S:    Supported
  F:    drivers/ufs/host/*dwc*
  
 +UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER EXYNOS HOOKS
 +M:    Alim Akhtar <[email protected]>
 +L:    [email protected]
 +S:    Maintained
 +F:    drivers/ufs/host/ufs-exynos*
 +
  UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER MEDIATEK HOOKS
  M:    Stanley Chu <[email protected]>
  L:    [email protected]
  S:    Maintained
  F:    drivers/ufs/host/ufs-mediatek*
  
 +UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER QUALCOMM HOOKS
 +M:    Manivannan Sadhasivam <[email protected]>
 +L:    [email protected]
 +L:    [email protected]
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/ufs/qcom,ufs.yaml
 +F:    drivers/ufs/host/ufs-qcom*
 +
  UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER RENESAS HOOKS
  M:    Yoshihiro Shimoda <[email protected]>
  L:    [email protected]
@@@ -21666,7 -21717,6 +21672,7 @@@ F:   include/uapi/linux/uvcvideo.
  
  USB WEBCAM GADGET
  M:    Laurent Pinchart <[email protected]>
 +M:    Daniel Scally <[email protected]>
  L:    [email protected]
  S:    Maintained
  F:    drivers/usb/gadget/function/*uvc*
@@@ -21702,13 -21752,6 +21708,13 @@@ T: git git://linuxtv.org/media_tree.gi
  F:    Documentation/admin-guide/media/zr364xx*
  F:    drivers/staging/media/deprecated/zr364xx/
  
 +USER DATAGRAM PROTOCOL (UDP)
 +M:    Willem de Bruijn <[email protected]>
 +S:    Maintained
 +F:    include/linux/udp.h
 +F:    net/ipv4/udp.c
 +F:    net/ipv6/udp.c
 +
  USER-MODE LINUX (UML)
  M:    Richard Weinberger <[email protected]>
  M:    Anton Ivanov <[email protected]>
@@@ -21754,9 -21797,11 +21760,9 @@@ W:  http://en.wikipedia.org/wiki/Util-li
  T:    git git://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
  
  UUID HELPERS
 -M:    Christoph Hellwig <[email protected]>
  R:    Andy Shevchenko <[email protected]>
  L:    [email protected]
  S:    Maintained
 -T:    git git://git.infradead.org/users/hch/uuid.git
  F:    include/linux/uuid.h
  F:    include/uapi/linux/uuid.h
  F:    lib/test_uuid.c
@@@ -22585,7 -22630,6 +22591,7 @@@ S:   Maintaine
  T:    git git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git
  F:    drivers/platform/olpc/
  F:    drivers/platform/x86/
 +F:    include/linux/platform_data/x86/
  
  X86 PLATFORM DRIVERS - ARCH
  R:    Darren Hart <[email protected]>
@@@ -22871,13 -22915,6 +22877,13 @@@ F: Documentation/devicetree/bindings/dm
  F:    drivers/dma/xilinx/xilinx_dpdma.c
  F:    include/dt-bindings/dma/xlnx-zynqmp-dpdma.h
  
 +XILINX ZYNQMP OCM EDAC DRIVER
 +M:    Shubhrajyoti Datta <[email protected]>
 +M:    Sai Krishna Potthuri <[email protected]>
 +S:    Maintained
 +F:    Documentation/devicetree/bindings/memory-controllers/xlnx,zynqmp-ocmc-1.0.yaml
 +F:    drivers/edac/zynqmp_edac.c
 +
  XILINX ZYNQMP PSGTR PHY DRIVER
  M:    Anurag Kumar Vulisha <[email protected]>
  M:    Laurent Pinchart <[email protected]>
index c81e7be2b4eab32e29b89014d479e0370bcb9225,61c30b9a24eac7cd1c6075d7e986758ca52ffd28..0e8ff85890adec077412107fb891793701ca5fe9
@@@ -78,6 -78,7 +78,6 @@@ void arch_cpu_idle(void
                arm_pm_idle();
        else
                cpu_do_idle();
 -      raw_local_irq_enable();
  }
  
  void arch_cpu_idle_prepare(void)
@@@ -315,7 -316,7 +315,7 @@@ static int __init gate_vma_init(void
        gate_vma.vm_page_prot = PAGE_READONLY_EXEC;
        gate_vma.vm_start = 0xffff0000;
        gate_vma.vm_end = 0xffff0000 + PAGE_SIZE;
-       gate_vma.vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC;
+       vm_flags_init(&gate_vma, VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC);
        return 0;
  }
  arch_initcall(gate_vma_init);
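The gate_vma hunk above is one instance of a conversion repeated in several hunks of this diff (see also the x86 PAT changes further down): direct writes to vma->vm_flags are replaced by wrapper helpers. A minimal sketch of the pattern, assuming the vm_flags_init()/vm_flags_set()/vm_flags_clear() helpers declared in include/linux/mm.h in this release; the function names are made up for illustration:

    #include <linux/mm.h>

    /* Illustrative only: adjusting flags on an already-visible VMA. */
    static void example_mark_special(struct vm_area_struct *vma)
    {
            /* was: vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; */
            vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
    }

    /* Illustrative only: a VMA still being set up and not yet visible to others. */
    static void example_init_flags(struct vm_area_struct *vma)
    {
            /* was: vma->vm_flags = VM_READ | VM_MAYREAD; */
            vm_flags_init(vma, VM_READ | VM_MAYREAD);
    }

The split mirrors the hunks themselves: gate_vma_init() uses vm_flags_init() because the VMA is private to the init path, while untrack_pfn() below uses vm_flags_clear()/__vm_flags_mod() on live mappings.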
index 27455bfd64bc3b72d7b8462764cfdfc483158cb0,575c63de894f4b69c0328bd78f67795c0983ad42..b6ba466e2e8a3fc758dcbe236c22653eb6464ac0
@@@ -275,7 -275,6 +275,7 @@@ static inline void set_pte(pte_t *ptep
  }
  
  extern void __sync_icache_dcache(pte_t pteval);
 +bool pgattr_change_is_safe(u64 old, u64 new);
  
  /*
   * PTE bits configuration in the presence of hardware Dirty Bit Management
   *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
   */
  
 -static inline void __check_racy_pte_update(struct mm_struct *mm, pte_t *ptep,
 +static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
                                           pte_t pte)
  {
        pte_t old_pte;
        VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte),
                     "%s: racy dirty state clearing: 0x%016llx -> 0x%016llx",
                     __func__, pte_val(old_pte), pte_val(pte));
 +      VM_WARN_ONCE(!pgattr_change_is_safe(pte_val(old_pte), pte_val(pte)),
 +                   "%s: unsafe attribute change: 0x%016llx -> 0x%016llx",
 +                   __func__, pte_val(old_pte), pte_val(pte));
  }
  
  static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
                        mte_sync_tags(old_pte, pte);
        }
  
 -      __check_racy_pte_update(mm, ptep, pte);
 +      __check_safe_pte_update(mm, ptep, pte);
  
        set_pte(ptep, pte);
  }
@@@ -421,7 -417,6 +421,6 @@@ static inline pgprot_t mk_pmd_sect_prot
        return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
  }
  
- #define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  static inline pte_t pte_swp_mkexclusive(pte_t pte)
  {
        return set_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
index 3e01f4f3ab08aa9ec797390032d109d5e33264c3,5b9f409a940d87f6d2deff2f61fb645794772c9f..d8d8de0ded992d3862c2d058c06f356ce7fa50c3
@@@ -721,23 -721,21 +721,25 @@@ static inline pmd_t pmdp_establish(stru
        page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
        return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
  }
 +
 +#define pmdp_collapse_flush pmdp_collapse_flush
 +extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 +                               unsigned long address, pmd_t *pmdp);
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
  /*
-  * Encode and decode a swap entry
+  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+  * are !pte_none() && !pte_present().
   *
   * Format of swap PTE:
   *    bit            0:       _PAGE_PRESENT (zero)
   *    bit       1 to 3:       _PAGE_LEAF (zero)
   *    bit            5:       _PAGE_PROT_NONE (zero)
-  *    bits      6 to 10:      swap type
-  *    bits 10 to XLEN-1:      swap offset
+  *    bit            6:       exclusive marker
+  *    bits      7 to 11:      swap type
+  *    bits 11 to XLEN-1:      swap offset
   */
- #define __SWP_TYPE_SHIFT      6
+ #define __SWP_TYPE_SHIFT      7
  #define __SWP_TYPE_BITS               5
  #define __SWP_TYPE_MASK               ((1UL << __SWP_TYPE_BITS) - 1)
  #define __SWP_OFFSET_SHIFT    (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
  #define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
  #define __swp_offset(x)       ((x).val >> __SWP_OFFSET_SHIFT)
  #define __swp_entry(type, offset) ((swp_entry_t) \
-       { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
+       { (((type) & __SWP_TYPE_MASK) << __SWP_TYPE_SHIFT) | \
+         ((offset) << __SWP_OFFSET_SHIFT) })
  
  #define __pte_to_swp_entry(pte)       ((swp_entry_t) { pte_val(pte) })
  #define __swp_entry_to_pte(x) ((pte_t) { (x).val })
  
+ static inline int pte_swp_exclusive(pte_t pte)
+ {
+       return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+ }
+ static inline pte_t pte_swp_mkexclusive(pte_t pte)
+ {
+       return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
+ }
+ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+ {
+       return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
+ }
  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
  #define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
  #define __swp_entry_to_pmd(swp) __pmd((swp).val)
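With the layout above, the swap type now starts at bit 7 and bit 6 carries the exclusive marker, so type and offset each move up by one bit relative to the old format. A small worked example using only the macros from this hunk (the values are arbitrary):

    /* Illustrative only: encode swap type 3, offset 0x1234 under the new layout. */
    static void example_swap_encoding(void)
    {
            swp_entry_t e = __swp_entry(3, 0x1234);
            /* e.val == (3 << 7) | (0x1234 << 12) == 0x1234180 */
            pte_t p = __swp_entry_to_pte(e);

            p = pte_swp_mkexclusive(p);     /* sets the bit-6 exclusive marker */
            WARN_ON(__swp_type(e) != 3 || __swp_offset(e) != 0x1234);
            (void)p;
    }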
index b87ca864d27d7d4d59246645b049ec8fbd3ef5fe,2b5db99e31dd77b17c920007b7e1c97f49d2688a..2c70b4d1263d2057ab9932945fa6d1b4c5269bcf
@@@ -23,7 -23,6 +23,7 @@@
  #include <asm/uv.h>
  
  extern pgd_t swapper_pg_dir[];
 +extern pgd_t invalid_pg_dir[];
  extern void paging_init(void);
  extern unsigned long s390_invalid_asce;
  
@@@ -182,20 -181,12 +182,20 @@@ static inline int is_module_addr(void *
  #define _PAGE_SOFT_DIRTY 0x000
  #endif
  
 +#define _PAGE_SW_BITS 0xffUL          /* All SW bits */
 +
  #define _PAGE_SWP_EXCLUSIVE _PAGE_LARGE       /* SW pte exclusive swap bit */
  
  /* Set of bits not changed in pte_modify */
  #define _PAGE_CHG_MASK                (PAGE_MASK | _PAGE_SPECIAL | _PAGE_DIRTY | \
                                 _PAGE_YOUNG | _PAGE_SOFT_DIRTY)
  
 +/*
 + * Mask of bits that must not be changed with RDP. Allow only _PAGE_PROTECT
 + * HW bit and all SW bits.
 + */
 +#define _PAGE_RDP_MASK                ~(_PAGE_PROTECT | _PAGE_SW_BITS)
 +
  /*
   * handle_pte_fault uses pte_present and pte_none to find out the pte type
   * WITHOUT holding the page table lock. The _PAGE_PRESENT bit is used to
                                   _REGION3_ENTRY_YOUNG |  \
                                   _REGION_ENTRY_PROTECT | \
                                   _REGION_ENTRY_NOEXEC)
 +#define REGION3_KERNEL_EXEC __pgprot(_REGION_ENTRY_TYPE_R3 | \
 +                               _REGION3_ENTRY_LARGE |  \
 +                               _REGION3_ENTRY_READ |   \
 +                               _REGION3_ENTRY_WRITE |  \
 +                               _REGION3_ENTRY_YOUNG |  \
 +                               _REGION3_ENTRY_DIRTY)
  
  static inline bool mm_p4d_folded(struct mm_struct *mm)
  {
@@@ -827,7 -812,6 +827,6 @@@ static inline int pmd_protnone(pmd_t pm
  }
  #endif
  
- #define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
  static inline int pte_swp_exclusive(pte_t pte)
  {
        return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
@@@ -1060,19 -1044,6 +1059,19 @@@ static inline pte_t pte_mkhuge(pte_t pt
  #define IPTE_NODAT    0x400
  #define IPTE_GUEST_ASCE       0x800
  
 +static __always_inline void __ptep_rdp(unsigned long addr, pte_t *ptep,
 +                                     unsigned long opt, unsigned long asce,
 +                                     int local)
 +{
 +      unsigned long pto;
 +
 +      pto = __pa(ptep) & ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
 +      asm volatile(".insn rrf,0xb98b0000,%[r1],%[r2],%[asce],%[m4]"
 +                   : "+m" (*ptep)
 +                   : [r1] "a" (pto), [r2] "a" ((addr & PAGE_MASK) | opt),
 +                     [asce] "a" (asce), [m4] "i" (local));
 +}
 +
  static __always_inline void __ptep_ipte(unsigned long address, pte_t *ptep,
                                        unsigned long opt, unsigned long asce,
                                        int local)
@@@ -1223,42 -1194,6 +1222,42 @@@ static inline void ptep_set_wrprotect(s
                ptep_xchg_lazy(mm, addr, ptep, pte_wrprotect(pte));
  }
  
 +/*
 + * Check whether two PTEs differ only in the _PAGE_PROTECT HW bit; differences
 + * in SW PTE bits are also tolerated, since those might change e.g. because of
 + * dirty and young tracking.
 + */
 +static inline int pte_allow_rdp(pte_t old, pte_t new)
 +{
 +      /*
 +       * Only allow changes from RO to RW
 +       */
 +      if (!(pte_val(old) & _PAGE_PROTECT) || pte_val(new) & _PAGE_PROTECT)
 +              return 0;
 +
 +      return (pte_val(old) & _PAGE_RDP_MASK) == (pte_val(new) & _PAGE_RDP_MASK);
 +}
 +
 +static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
 +                                              unsigned long address)
 +{
 +      /*
 +       * RDP might not have propagated the PTE protection reset to all CPUs,
 +       * so there could be spurious TLB protection faults.
 +       * NOTE: This will also be called when a racing pagetable update on
 +       * another thread already installed the correct PTE. Both cases cannot
 +       * really be distinguished.
 +       * Therefore, only do the local TLB flush when RDP can be used, to avoid
 +       * unnecessary overhead.
 +       */
 +      if (MACHINE_HAS_RDP)
 +              asm volatile("ptlb" : : : "memory");
 +}
 +#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
 +
 +void ptep_reset_dat_prot(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 +                       pte_t new);
 +
  #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
  static inline int ptep_set_access_flags(struct vm_area_struct *vma,
                                        unsigned long addr, pte_t *ptep,
  {
        if (pte_same(*ptep, entry))
                return 0;
 -      ptep_xchg_direct(vma->vm_mm, addr, ptep, entry);
 +      if (MACHINE_HAS_RDP && !mm_has_pgste(vma->vm_mm) && pte_allow_rdp(*ptep, entry))
 +              ptep_reset_dat_prot(vma->vm_mm, addr, ptep, entry);
 +      else
 +              ptep_xchg_direct(vma->vm_mm, addr, ptep, entry);
        return 1;
  }
  
index 2738eb28cb2ece7e51ed4ae35d633febb83ae590,ec5e4d2048cb5077206bc10893c5020fdbe4d932..11a5c68d1218545da2dba1135a81ccc8a4b42d16
@@@ -44,16 -44,13 +44,16 @@@ unsigned int vclocks_used __read_mostly
  unsigned int __read_mostly vdso64_enabled = 1;
  #endif
  
 -void __init init_vdso_image(const struct vdso_image *image)
 +int __init init_vdso_image(const struct vdso_image *image)
  {
 +      BUILD_BUG_ON(VDSO_CLOCKMODE_MAX >= 32);
        BUG_ON(image->size % PAGE_SIZE != 0);
  
        apply_alternatives((struct alt_instr *)(image->data + image->alt),
                           (struct alt_instr *)(image->data + image->alt +
                                                image->alt_len));
 +
 +      return 0;
  }
  
  static const struct vm_special_mapping vvar_mapping;
@@@ -116,10 -113,8 +116,8 @@@ int vdso_join_timens(struct task_struc
  
        mmap_read_lock(mm);
        for_each_vma(vmi, vma) {
-               unsigned long size = vma->vm_end - vma->vm_start;
                if (vma_is_special_mapping(vma, &vvar_mapping))
-                       zap_page_range(vma, vma->vm_start, size);
+                       zap_vma_pages(vma);
        }
        mmap_read_unlock(mm);
  
@@@ -421,4 -416,18 +419,4 @@@ static __init int vdso_setup(char *s
        return 1;
  }
  __setup("vdso=", vdso_setup);
 -
 -static int __init init_vdso(void)
 -{
 -      BUILD_BUG_ON(VDSO_CLOCKMODE_MAX >= 32);
 -
 -      init_vdso_image(&vdso_image_64);
 -
 -#ifdef CONFIG_X86_X32_ABI
 -      init_vdso_image(&vdso_image_x32);
 -#endif
 -
 -      return 0;
 -}
 -subsys_initcall(init_vdso);
  #endif /* CONFIG_X86_64 */
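The vdso_join_timens() hunk above shows another conversion that recurs in this release: callers that zapped a VMA's whole range with zap_page_range() now call zap_vma_pages(), which takes the range from the VMA itself. A minimal sketch of the before/after, assuming the helper behaves as it is used above:

    /* Illustrative only: drop every page of a special mapping. */
    static void example_zap_special_mapping(struct vm_area_struct *vma)
    {
            /* was: zap_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start); */
            zap_vma_pages(vma);
    }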
index 004b37f026d1521eb1c1cf69dee733a138a5ef20,691bf8934b6fe570292c55b4dec4e1c9e8b1d6c4..46a00aa858b6fe27712f7d91cad8f4cacdc32b32
@@@ -159,10 -159,10 +159,10 @@@ static inline void set_page_memtype(str
                break;
        }
  
 +      old_flags = READ_ONCE(pg->flags);
        do {
 -              old_flags = pg->flags;
                new_flags = (old_flags & _PGMT_CLEAR_MASK) | memtype_flags;
 -      } while (cmpxchg(&pg->flags, old_flags, new_flags) != old_flags);
 +      } while (!try_cmpxchg(&pg->flags, &old_flags, new_flags));
  }
  #else
  static inline enum page_cache_mode get_page_memtype(struct page *pg)
@@@ -387,7 -387,8 +387,7 @@@ static unsigned long pat_x_mtrr_type(u6
                u8 mtrr_type, uniform;
  
                mtrr_type = mtrr_type_lookup(start, end, &uniform);
 -              if (mtrr_type != MTRR_TYPE_WRBACK &&
 -                  mtrr_type != MTRR_TYPE_INVALID)
 +              if (mtrr_type != MTRR_TYPE_WRBACK)
                        return _PAGE_CACHE_MODE_UC_MINUS;
  
                return _PAGE_CACHE_MODE_WB;
@@@ -999,7 -1000,7 +999,7 @@@ int track_pfn_remap(struct vm_area_stru
  
                ret = reserve_pfn_range(paddr, size, prot, 0);
                if (ret == 0 && vma)
-                       vma->vm_flags |= VM_PAT;
+                       vm_flags_set(vma, VM_PAT);
                return ret;
        }
  
@@@ -1045,7 -1046,7 +1045,7 @@@ void track_pfn_insert(struct vm_area_st
   * can be for the entire vma (in which case pfn, size are zero).
   */
  void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
-                unsigned long size)
+                unsigned long size, bool mm_wr_locked)
  {
        resource_size_t paddr;
        unsigned long prot;
                size = vma->vm_end - vma->vm_start;
        }
        free_pfn_range(paddr, size);
-       if (vma)
-               vma->vm_flags &= ~VM_PAT;
+       if (vma) {
+               if (mm_wr_locked)
+                       vm_flags_clear(vma, VM_PAT);
+               else
+                       __vm_flags_mod(vma, 0, VM_PAT);
+       }
  }
  
  /*
   */
  void untrack_pfn_moved(struct vm_area_struct *vma)
  {
-       vma->vm_flags &= ~VM_PAT;
+       vm_flags_clear(vma, VM_PAT);
  }
  
  pgprot_t pgprot_writecombine(pgprot_t prot)
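The set_page_memtype() change earlier in this file follows the usual cmpxchg()-to-try_cmpxchg() conversion: the old value is read once before the loop, and try_cmpxchg() refreshes it on failure, so the loop body no longer re-reads pg->flags by hand. A generic sketch of the pattern (the field and flag names are placeholders):

    /* Illustrative only: retry loop that ORs bits into a flags word. */
    static void example_update_flags(unsigned long *flags, unsigned long set_bits)
    {
            unsigned long old = READ_ONCE(*flags);
            unsigned long new;

            do {
                    /* on failure, try_cmpxchg() writes the current value back into 'old' */
                    new = old | set_bits;
            } while (!try_cmpxchg(flags, &old, new));
    }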
index e6474d38afc49187f92e041f8fdff6c592e9e558,0000000000000000000000000000000000000000..761a47e89b005a80ed92a0889f5ce0a62d1850bf
mode 100644,000000..100644
--- /dev/null
@@@ -1,3003 -1,0 +1,3003 @@@
-       vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2016-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include <uapi/drm/habanalabs_accel.h>
 +#include "habanalabs.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +
 +#include <linux/uaccess.h>
 +#include <linux/slab.h>
 +#include <linux/vmalloc.h>
 +#include <linux/pci-p2pdma.h>
 +
 +MODULE_IMPORT_NS(DMA_BUF);
 +
 +#define HL_MMU_DEBUG  0
 +
 +/* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */
 +#define DRAM_POOL_PAGE_SIZE   SZ_8M
 +
 +#define MEM_HANDLE_INVALID    ULONG_MAX
 +
 +static int allocate_timestamps_buffers(struct hl_fpriv *hpriv,
 +                      struct hl_mem_in *args, u64 *handle);
 +
 +static int set_alloc_page_size(struct hl_device *hdev, struct hl_mem_in *args, u32 *page_size)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 psize;
 +
 +      /*
 +       * For ASICs that support a user-selected allocation page size, honor the
 +       * user's choice only if it is not 0 (0 means use the default page size).
 +       */
 +      if (prop->supports_user_set_page_size && args->alloc.page_size) {
 +              psize = args->alloc.page_size;
 +
 +              if (!is_power_of_2(psize)) {
 +                      dev_err(hdev->dev, "user page size (%#llx) is not power of 2\n", psize);
 +                      return -EINVAL;
 +              }
 +      } else {
 +              psize = prop->device_mem_alloc_default_page_size;
 +      }
 +
 +      *page_size = psize;
 +
 +      return 0;
 +}
 +
 +/*
 + * The va ranges in context object contain a list with the available chunks of
 + * device virtual memory.
 + * There is one range for host allocations and one for DRAM allocations.
 + *
 + * On initialization each range contains one chunk of all of its available
 + * virtual range which is a half of the total device virtual range.
 + *
 + * On each mapping of physical pages, a suitable virtual range chunk (with a
 + * minimum size) is selected from the list. If the chunk size equals the
 + * requested size, the chunk is returned. Otherwise, the chunk is split into
 + * two chunks - one to return as result and a remainder to stay in the list.
 + *
 + * On each unmapping of a virtual address, the relevant virtual chunk is
 + * returned to the list. The chunk is added to the list and, if its edges match
 + * the edges of the adjacent chunks (meaning a contiguous chunk can be created),
 + * the chunks are merged.
 + *
 + * On teardown, the list is checked to contain only one chunk covering the whole
 + * relevant virtual range (which is half of the device's total virtual range).
 + * If not (meaning not all mappings were unmapped), a warning is printed.
 + */
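/*
 * Editorial sketch, not part of the commit: how the chunk list described above
 * evolves over one allocation/free cycle (addresses and sizes are arbitrary).
 *
 *   initial list:             [0x1000 .. 0x9fff]            one free chunk
 *   map 0x2000 bytes:         caller gets [0x1000 .. 0x2fff],
 *                             list becomes [0x3000 .. 0x9fff]
 *   unmap [0x1000 .. 0x2fff]: the block is re-added and, since its end + 1
 *                             equals the start of [0x3000 .. 0x9fff], the two
 *                             are merged back into [0x1000 .. 0x9fff]
 */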
 +
 +/*
 + * alloc_device_memory() - allocate device memory.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters containing the requested size.
 + * @ret_handle: result handle.
 + *
 + * This function does the following:
 + * - Allocate the requested size rounded up to 'dram_page_size' pages.
 + * - Return unique handle for later map/unmap/free.
 + */
 +static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args,
 +                              u32 *ret_handle)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      u64 paddr = 0, total_size, num_pgs, i;
 +      u32 num_curr_pgs, page_size;
 +      bool contiguous;
 +      int handle, rc;
 +
 +      num_curr_pgs = 0;
 +
 +      rc = set_alloc_page_size(hdev, args, &page_size);
 +      if (rc)
 +              return rc;
 +
 +      num_pgs = DIV_ROUND_UP_ULL(args->alloc.mem_size, page_size);
 +      total_size = num_pgs * page_size;
 +
 +      if (!total_size) {
 +              dev_err(hdev->dev, "Cannot allocate 0 bytes\n");
 +              return -EINVAL;
 +      }
 +
 +      contiguous = args->flags & HL_MEM_CONTIGUOUS;
 +
 +      if (contiguous) {
 +              if (is_power_of_2(page_size))
 +                      paddr = (uintptr_t) gen_pool_dma_alloc_align(vm->dram_pg_pool,
 +                                                                   total_size, NULL, page_size);
 +              else
 +                      paddr = gen_pool_alloc(vm->dram_pg_pool, total_size);
 +              if (!paddr) {
 +                      dev_err(hdev->dev,
 +                              "Cannot allocate %llu contiguous pages with total size of %llu\n",
 +                              num_pgs, total_size);
 +                      return -ENOMEM;
 +              }
 +      }
 +
 +      phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
 +      if (!phys_pg_pack) {
 +              rc = -ENOMEM;
 +              goto pages_pack_err;
 +      }
 +
 +      phys_pg_pack->vm_type = VM_TYPE_PHYS_PACK;
 +      phys_pg_pack->asid = ctx->asid;
 +      phys_pg_pack->npages = num_pgs;
 +      phys_pg_pack->page_size = page_size;
 +      phys_pg_pack->total_size = total_size;
 +      phys_pg_pack->flags = args->flags;
 +      phys_pg_pack->contiguous = contiguous;
 +
 +      phys_pg_pack->pages = kvmalloc_array(num_pgs, sizeof(u64), GFP_KERNEL);
 +      if (ZERO_OR_NULL_PTR(phys_pg_pack->pages)) {
 +              rc = -ENOMEM;
 +              goto pages_arr_err;
 +      }
 +
 +      if (phys_pg_pack->contiguous) {
 +              for (i = 0 ; i < num_pgs ; i++)
 +                      phys_pg_pack->pages[i] = paddr + i * page_size;
 +      } else {
 +              for (i = 0 ; i < num_pgs ; i++) {
 +                      if (is_power_of_2(page_size))
 +                              phys_pg_pack->pages[i] =
 +                                      (uintptr_t)gen_pool_dma_alloc_align(vm->dram_pg_pool,
 +                                                                          page_size, NULL,
 +                                                                          page_size);
 +                      else
 +                              phys_pg_pack->pages[i] = gen_pool_alloc(vm->dram_pg_pool,
 +                                                                      page_size);
 +
 +                      if (!phys_pg_pack->pages[i]) {
 +                              dev_err(hdev->dev,
 +                                      "Cannot allocate device memory (out of memory)\n");
 +                              rc = -ENOMEM;
 +                              goto page_err;
 +                      }
 +
 +                      num_curr_pgs++;
 +              }
 +      }
 +
 +      spin_lock(&vm->idr_lock);
 +      handle = idr_alloc(&vm->phys_pg_pack_handles, phys_pg_pack, 1, 0,
 +                              GFP_ATOMIC);
 +      spin_unlock(&vm->idr_lock);
 +
 +      if (handle < 0) {
 +              dev_err(hdev->dev, "Failed to get handle for page\n");
 +              rc = -EFAULT;
 +              goto idr_err;
 +      }
 +
 +      for (i = 0 ; i < num_pgs ; i++)
 +              kref_get(&vm->dram_pg_pool_refcount);
 +
 +      phys_pg_pack->handle = handle;
 +
 +      atomic64_add(phys_pg_pack->total_size, &ctx->dram_phys_mem);
 +      atomic64_add(phys_pg_pack->total_size, &hdev->dram_used_mem);
 +
 +      *ret_handle = handle;
 +
 +      return 0;
 +
 +idr_err:
 +page_err:
 +      if (!phys_pg_pack->contiguous)
 +              for (i = 0 ; i < num_curr_pgs ; i++)
 +                      gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[i],
 +                                      page_size);
 +
 +      kvfree(phys_pg_pack->pages);
 +pages_arr_err:
 +      kfree(phys_pg_pack);
 +pages_pack_err:
 +      if (contiguous)
 +              gen_pool_free(vm->dram_pg_pool, paddr, total_size);
 +
 +      return rc;
 +}
 +
 +/**
 + * dma_map_host_va() - DMA mapping of the given host virtual address.
 + * @hdev: habanalabs device structure.
 + * @addr: the host virtual address of the memory area.
 + * @size: the size of the memory area.
 + * @p_userptr: pointer to result userptr structure.
 + *
 + * This function does the following:
 + * - Allocate userptr structure.
 + * - Pin the given host memory using the userptr structure.
 + * - Perform DMA mapping to have the DMA addresses of the pages.
 + */
 +static int dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
 +                              struct hl_userptr **p_userptr)
 +{
 +      struct hl_userptr *userptr;
 +      int rc;
 +
 +      userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
 +      if (!userptr) {
 +              rc = -ENOMEM;
 +              goto userptr_err;
 +      }
 +
 +      rc = hl_pin_host_memory(hdev, addr, size, userptr);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to pin host memory\n");
 +              goto pin_err;
 +      }
 +
 +      userptr->dma_mapped = true;
 +      userptr->dir = DMA_BIDIRECTIONAL;
 +      userptr->vm_type = VM_TYPE_USERPTR;
 +
 +      *p_userptr = userptr;
 +
 +      rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, DMA_BIDIRECTIONAL);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to map sgt with DMA region\n");
 +              goto dma_map_err;
 +      }
 +
 +      return 0;
 +
 +dma_map_err:
 +      hl_unpin_host_memory(hdev, userptr);
 +pin_err:
 +      kfree(userptr);
 +userptr_err:
 +
 +      return rc;
 +}
 +
 +/**
 + * dma_unmap_host_va() - DMA unmapping of the given host virtual address.
 + * @hdev: habanalabs device structure.
 + * @userptr: userptr to free.
 + *
 + * This function does the following:
 + * - Unpins the physical pages.
 + * - Frees the userptr structure.
 + */
 +static void dma_unmap_host_va(struct hl_device *hdev,
 +                              struct hl_userptr *userptr)
 +{
 +      hl_unpin_host_memory(hdev, userptr);
 +      kfree(userptr);
 +}
 +
 +/**
 + * dram_pg_pool_do_release() - free DRAM pages pool
 + * @ref: pointer to reference object.
 + *
 + * This function does the following:
 + * - Frees the idr structure of physical pages handles.
 + * - Frees the generic pool of DRAM physical pages.
 + */
 +static void dram_pg_pool_do_release(struct kref *ref)
 +{
 +      struct hl_vm *vm = container_of(ref, struct hl_vm,
 +                      dram_pg_pool_refcount);
 +
 +      /*
 +       * free the idr here as only here we know for sure that there are no
 +       * allocated physical pages and hence there are no handles in use
 +       */
 +      idr_destroy(&vm->phys_pg_pack_handles);
 +      gen_pool_destroy(vm->dram_pg_pool);
 +}
 +
 +/**
 + * free_phys_pg_pack() - free physical page pack.
 + * @hdev: habanalabs device structure.
 + * @phys_pg_pack: physical page pack to free.
 + *
 + * This function does the following:
 + * - For DRAM memory only
 + *   - iterate over the pack, free each physical block structure by
 + *     returning it to the general pool.
 + * - Free the hl_vm_phys_pg_pack structure.
 + */
 +static void free_phys_pg_pack(struct hl_device *hdev,
 +                              struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct hl_vm *vm = &hdev->vm;
 +      u64 i;
 +
 +      if (phys_pg_pack->created_from_userptr)
 +              goto end;
 +
 +      if (phys_pg_pack->contiguous) {
 +              gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[0],
 +                      phys_pg_pack->total_size);
 +
 +              for (i = 0; i < phys_pg_pack->npages ; i++)
 +                      kref_put(&vm->dram_pg_pool_refcount,
 +                              dram_pg_pool_do_release);
 +      } else {
 +              for (i = 0 ; i < phys_pg_pack->npages ; i++) {
 +                      gen_pool_free(vm->dram_pg_pool,
 +                              phys_pg_pack->pages[i],
 +                              phys_pg_pack->page_size);
 +                      kref_put(&vm->dram_pg_pool_refcount,
 +                              dram_pg_pool_do_release);
 +              }
 +      }
 +
 +end:
 +      kvfree(phys_pg_pack->pages);
 +      kfree(phys_pg_pack);
 +
 +      return;
 +}
 +
 +/**
 + * free_device_memory() - free device memory.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters containing the requested size.
 + *
 + * This function does the following:
 + * - Free the device memory related to the given handle.
 + */
 +static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      u32 handle = args->free.handle;
 +
 +      spin_lock(&vm->idr_lock);
 +      phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
 +      if (!phys_pg_pack) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_err(hdev->dev, "free device memory failed, no match for handle %u\n", handle);
 +              return -EINVAL;
 +      }
 +
 +      if (atomic_read(&phys_pg_pack->mapping_cnt) > 0) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_err(hdev->dev, "handle %u is mapped, cannot free\n", handle);
 +              return -EINVAL;
 +      }
 +
 +      /* The handle must be removed from the idr before the physical pages are
 +       * freed, as dropping the pool refcount is also what triggers the idr destroy.
 +       */
 +      idr_remove(&vm->phys_pg_pack_handles, handle);
 +      spin_unlock(&vm->idr_lock);
 +
 +      atomic64_sub(phys_pg_pack->total_size, &ctx->dram_phys_mem);
 +      atomic64_sub(phys_pg_pack->total_size, &hdev->dram_used_mem);
 +
 +      free_phys_pg_pack(hdev, phys_pg_pack);
 +
 +      return 0;
 +}
 +
 +/**
 + * clear_va_list_locked() - free virtual addresses list.
 + * @hdev: habanalabs device structure.
 + * @va_list: list of virtual addresses to free.
 + *
 + * This function does the following:
 + * - Iterate over the list and free each virtual addresses block.
 + *
 + * This function should be called only when the va_list lock is taken.
 + */
 +static void clear_va_list_locked(struct hl_device *hdev,
 +              struct list_head *va_list)
 +{
 +      struct hl_vm_va_block *va_block, *tmp;
 +
 +      list_for_each_entry_safe(va_block, tmp, va_list, node) {
 +              list_del(&va_block->node);
 +              kfree(va_block);
 +      }
 +}
 +
 +/**
 + * print_va_list_locked() - print virtual addresses list.
 + * @hdev: habanalabs device structure.
 + * @va_list: list of virtual addresses to print.
 + *
 + * This function does the following:
 + * - Iterate over the list and print each virtual addresses block.
 + *
 + * This function should be called only when the va_list lock is taken.
 + */
 +static void print_va_list_locked(struct hl_device *hdev,
 +              struct list_head *va_list)
 +{
 +#if HL_MMU_DEBUG
 +      struct hl_vm_va_block *va_block;
 +
 +      dev_dbg(hdev->dev, "print va list:\n");
 +
 +      list_for_each_entry(va_block, va_list, node)
 +              dev_dbg(hdev->dev,
 +                      "va block, start: 0x%llx, end: 0x%llx, size: %llu\n",
 +                      va_block->start, va_block->end, va_block->size);
 +#endif
 +}
 +
 +/**
 + * merge_va_blocks_locked() - merge a virtual block if possible.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_list: pointer to the virtual addresses block list.
 + * @va_block: virtual block to merge with adjacent blocks.
 + *
 + * This function does the following:
 + * - Merge the given block with its adjacent blocks if their virtual ranges
 + *   create a contiguous virtual range.
 + *
 + * This function should be called only when the va_list lock is taken.
 + */
 +static void merge_va_blocks_locked(struct hl_device *hdev,
 +              struct list_head *va_list, struct hl_vm_va_block *va_block)
 +{
 +      struct hl_vm_va_block *prev, *next;
 +
 +      prev = list_prev_entry(va_block, node);
 +      if (&prev->node != va_list && prev->end + 1 == va_block->start) {
 +              prev->end = va_block->end;
 +              prev->size = prev->end - prev->start + 1;
 +              list_del(&va_block->node);
 +              kfree(va_block);
 +              va_block = prev;
 +      }
 +
 +      next = list_next_entry(va_block, node);
 +      if (&next->node != va_list && va_block->end + 1 == next->start) {
 +              next->start = va_block->start;
 +              next->size = next->end - next->start + 1;
 +              list_del(&va_block->node);
 +              kfree(va_block);
 +      }
 +}
 +
 +/**
 + * add_va_block_locked() - add a virtual block to the virtual addresses list.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_list: pointer to the virtual addresses block list.
 + * @start: start virtual address.
 + * @end: end virtual address.
 + *
 + * This function does the following:
 + * - Add the given block to the virtual blocks list and merge with other blocks
 + *   if a contiguous virtual block can be created.
 + *
+ * This function should be called only when va_list lock is taken.
 + */
 +static int add_va_block_locked(struct hl_device *hdev,
 +              struct list_head *va_list, u64 start, u64 end)
 +{
 +      struct hl_vm_va_block *va_block, *res = NULL;
 +      u64 size = end - start + 1;
 +
 +      print_va_list_locked(hdev, va_list);
 +
 +      list_for_each_entry(va_block, va_list, node) {
 +              /* TODO: remove upon matureness */
 +              if (hl_mem_area_crosses_range(start, size, va_block->start,
 +                              va_block->end)) {
 +                      dev_err(hdev->dev,
 +                              "block crossing ranges at start 0x%llx, end 0x%llx\n",
 +                              va_block->start, va_block->end);
 +                      return -EINVAL;
 +              }
 +
 +              if (va_block->end < start)
 +                      res = va_block;
 +      }
 +
 +      va_block = kmalloc(sizeof(*va_block), GFP_KERNEL);
 +      if (!va_block)
 +              return -ENOMEM;
 +
 +      va_block->start = start;
 +      va_block->end = end;
 +      va_block->size = size;
 +
 +      if (!res)
 +              list_add(&va_block->node, va_list);
 +      else
 +              list_add(&va_block->node, &res->node);
 +
 +      merge_va_blocks_locked(hdev, va_list, va_block);
 +
 +      print_va_list_locked(hdev, va_list);
 +
 +      return 0;
 +}
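[Illustrative aside, not part of the patch: add_va_block_locked() keeps the free list sorted by start address, and merge_va_blocks_locked() coalesces a returned block with any neighbour that touches it, using the inclusive-range condition prev->end + 1 == start. A minimal user-space sketch of that merge arithmetic, with made-up addresses:]

#include <stdio.h>
#include <stdint.h>

struct range { uint64_t start, end; };

int main(void)
{
	struct range prev = { 0x1000, 0x1fff };
	struct range cur  = { 0x2000, 0x2fff };  /* block being returned */
	struct range next = { 0x3000, 0x3fff };

	/* merge backwards: prev absorbs cur when the ranges touch */
	if (prev.end + 1 == cur.start) {
		prev.end = cur.end;
		cur = prev;
	}
	/* merge forwards: cur absorbs next when the ranges touch */
	if (cur.end + 1 == next.start)
		cur.end = next.end;

	printf("merged free block: 0x%llx-0x%llx (size %llu)\n",
	       (unsigned long long)cur.start,
	       (unsigned long long)cur.end,
	       (unsigned long long)(cur.end - cur.start + 1));
	return 0;
}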
 +
 +/**
 + * add_va_block() - wrapper for add_va_block_locked.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_range: pointer to the virtual addresses range object.
 + * @start: start virtual address.
 + * @end: end virtual address.
 + *
 + * This function does the following:
 + * - Takes the list lock and calls add_va_block_locked.
 + */
 +static inline int add_va_block(struct hl_device *hdev,
 +              struct hl_va_range *va_range, u64 start, u64 end)
 +{
 +      int rc;
 +
 +      mutex_lock(&va_range->lock);
 +      rc = add_va_block_locked(hdev, &va_range->list, start, end);
 +      mutex_unlock(&va_range->lock);
 +
 +      return rc;
 +}
 +
 +/**
+ * is_hint_crossing_range() - check if a hint address crosses the specified reserved range.
 + * @range_type: virtual space range type.
 + * @start_addr: start virtual address.
 + * @size: block size.
 + * @prop: asic properties structure to retrieve reserved ranges from.
 + */
 +static inline bool is_hint_crossing_range(enum hl_va_range_type range_type,
 +              u64 start_addr, u32 size, struct asic_fixed_properties *prop) {
 +      bool range_cross;
 +
 +      if (range_type == HL_VA_RANGE_TYPE_DRAM)
 +              range_cross =
 +                      hl_mem_area_crosses_range(start_addr, size,
 +                      prop->hints_dram_reserved_va_range.start_addr,
 +                      prop->hints_dram_reserved_va_range.end_addr);
 +      else if (range_type == HL_VA_RANGE_TYPE_HOST)
 +              range_cross =
 +                      hl_mem_area_crosses_range(start_addr,   size,
 +                      prop->hints_host_reserved_va_range.start_addr,
 +                      prop->hints_host_reserved_va_range.end_addr);
 +      else
 +              range_cross =
 +                      hl_mem_area_crosses_range(start_addr, size,
 +                      prop->hints_host_hpage_reserved_va_range.start_addr,
 +                      prop->hints_host_hpage_reserved_va_range.end_addr);
 +
 +      return range_cross;
 +}
 +
 +/**
 + * get_va_block() - get a virtual block for the given size and alignment.
 + *
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_range: pointer to the virtual addresses range.
 + * @size: requested block size.
 + * @hint_addr: hint for requested address by the user.
 + * @va_block_align: required alignment of the virtual block start address.
 + * @range_type: va range type (host, dram)
 + * @flags: additional memory flags, currently only uses HL_MEM_FORCE_HINT
 + *
 + * This function does the following:
 + * - Iterate on the virtual block list to find a suitable virtual block for the
 + *   given size, hint address and alignment.
 + * - Reserve the requested block and update the list.
 + * - Return the start address of the virtual block.
 + */
 +static u64 get_va_block(struct hl_device *hdev,
 +                              struct hl_va_range *va_range,
 +                              u64 size, u64 hint_addr, u32 va_block_align,
 +                              enum hl_va_range_type range_type,
 +                              u32 flags)
 +{
 +      struct hl_vm_va_block *va_block, *new_va_block = NULL;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 tmp_hint_addr, valid_start, valid_size, prev_start, prev_end,
 +              align_mask, reserved_valid_start = 0, reserved_valid_size = 0,
 +              dram_hint_mask = prop->dram_hints_align_mask;
 +      bool add_prev = false;
 +      bool is_align_pow_2  = is_power_of_2(va_range->page_size);
 +      bool is_hint_dram_addr = hl_is_dram_va(hdev, hint_addr);
 +      bool force_hint = flags & HL_MEM_FORCE_HINT;
 +
 +      if (is_align_pow_2)
 +              align_mask = ~((u64)va_block_align - 1);
 +      else
 +              /*
 +               * with non-power-of-2 range we work only with page granularity
 +               * and the start address is page aligned,
 +               * so no need for alignment checking.
 +               */
 +              size = DIV_ROUND_UP_ULL(size, va_range->page_size) *
 +                                                      va_range->page_size;
 +
 +      tmp_hint_addr = hint_addr & ~dram_hint_mask;
 +
 +      /* Check if we need to ignore hint address */
 +      if ((is_align_pow_2 && (hint_addr & (va_block_align - 1))) ||
 +                      (!is_align_pow_2 && is_hint_dram_addr &&
 +                      do_div(tmp_hint_addr, va_range->page_size))) {
 +
 +              if (force_hint) {
 +                      /* Hint must be respected, so here we just fail */
 +                      dev_err(hdev->dev,
 +                              "Hint address 0x%llx is not page aligned - cannot be respected\n",
 +                              hint_addr);
 +                      return 0;
 +              }
 +
 +              dev_dbg(hdev->dev,
 +                      "Hint address 0x%llx will be ignored because it is not aligned\n",
 +                      hint_addr);
 +              hint_addr = 0;
 +      }
 +
 +      mutex_lock(&va_range->lock);
 +
 +      print_va_list_locked(hdev, &va_range->list);
 +
 +      list_for_each_entry(va_block, &va_range->list, node) {
 +              /* Calc the first possible aligned addr */
 +              valid_start = va_block->start;
 +
 +              if (is_align_pow_2 && (valid_start & (va_block_align - 1))) {
 +                      valid_start &= align_mask;
 +                      valid_start += va_block_align;
 +                      if (valid_start > va_block->end)
 +                              continue;
 +              }
 +
 +              valid_size = va_block->end - valid_start + 1;
 +              if (valid_size < size)
 +                      continue;
 +
 +              /*
 +               * In case hint address is 0, and hints_range_reservation
 +               * property enabled, then avoid allocating va blocks from the
 +               * range reserved for hint addresses
 +               */
 +              if (prop->hints_range_reservation && !hint_addr)
 +                      if (is_hint_crossing_range(range_type, valid_start,
 +                                      size, prop))
 +                              continue;
 +
 +              /* Pick the minimal length block which has the required size */
 +              if (!new_va_block || (valid_size < reserved_valid_size)) {
 +                      new_va_block = va_block;
 +                      reserved_valid_start = valid_start;
 +                      reserved_valid_size = valid_size;
 +              }
 +
 +              if (hint_addr && hint_addr >= valid_start &&
 +                                      (hint_addr + size) <= va_block->end) {
 +                      new_va_block = va_block;
 +                      reserved_valid_start = hint_addr;
 +                      reserved_valid_size = valid_size;
 +                      break;
 +              }
 +      }
 +
 +      if (!new_va_block) {
 +              dev_err(hdev->dev, "no available va block for size %llu\n",
 +                                                              size);
 +              goto out;
 +      }
 +
 +      if (force_hint && reserved_valid_start != hint_addr) {
 +              /* Hint address must be respected. If we are here - this means
 +               * we could not respect it.
 +               */
 +              dev_err(hdev->dev,
 +                      "Hint address 0x%llx could not be respected\n",
 +                      hint_addr);
 +              reserved_valid_start = 0;
 +              goto out;
 +      }
 +
 +      /*
 +       * Check if there is some leftover range due to reserving the new
 +       * va block, then return it to the main virtual addresses list.
 +       */
 +      if (reserved_valid_start > new_va_block->start) {
 +              prev_start = new_va_block->start;
 +              prev_end = reserved_valid_start - 1;
 +
 +              new_va_block->start = reserved_valid_start;
 +              new_va_block->size = reserved_valid_size;
 +
 +              add_prev = true;
 +      }
 +
 +      if (new_va_block->size > size) {
 +              new_va_block->start += size;
 +              new_va_block->size = new_va_block->end - new_va_block->start + 1;
 +      } else {
 +              list_del(&new_va_block->node);
 +              kfree(new_va_block);
 +      }
 +
 +      if (add_prev)
 +              add_va_block_locked(hdev, &va_range->list, prev_start,
 +                              prev_end);
 +
 +      print_va_list_locked(hdev, &va_range->list);
 +out:
 +      mutex_unlock(&va_range->lock);
 +
 +      return reserved_valid_start;
 +}
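[Illustrative aside, not part of the patch: in the power-of-two path of get_va_block(), a misaligned candidate start is rounded down with align_mask and then bumped by va_block_align, which amounts to aligning up to the next boundary. A standalone sketch of that arithmetic with hypothetical values:]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t va_block_align = 0x200000;             /* e.g. 2MB alignment */
	uint64_t align_mask = ~(va_block_align - 1);
	uint64_t valid_start = 0x10345000;              /* misaligned block start */

	if (valid_start & (va_block_align - 1)) {
		valid_start &= align_mask;      /* round down ... */
		valid_start += va_block_align;  /* ... then up to the next boundary */
	}

	printf("aligned start: 0x%llx\n", (unsigned long long)valid_start);
	return 0;
}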
 +
 +/*
 + * hl_reserve_va_block() - reserve a virtual block of a given size.
 + * @hdev: pointer to the habanalabs device structure.
 + * @ctx: current context
 + * @type: virtual addresses range type.
 + * @size: requested block size.
 + * @alignment: required alignment in bytes of the virtual block start address,
 + *             0 means no alignment.
 + *
 + * This function does the following:
 + * - Iterate on the virtual block list to find a suitable virtual block for the
 + *   given size and alignment.
 + * - Reserve the requested block and update the list.
 + * - Return the start address of the virtual block.
 + */
 +u64 hl_reserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
 +              enum hl_va_range_type type, u64 size, u32 alignment)
 +{
 +      return get_va_block(hdev, ctx->va_range[type], size, 0,
 +                      max(alignment, ctx->va_range[type]->page_size),
 +                      type, 0);
 +}
 +
 +/**
 + * hl_get_va_range_type() - get va_range type for the given address and size.
 + * @ctx: context to fetch va_range from.
 + * @address: the start address of the area we want to validate.
 + * @size: the size in bytes of the area we want to validate.
 + * @type: returned va_range type.
 + *
+ * Return: 0 if the area is inside a valid range, -EINVAL otherwise.
 + */
 +static int hl_get_va_range_type(struct hl_ctx *ctx, u64 address, u64 size,
 +                      enum hl_va_range_type *type)
 +{
 +      int i;
 +
 +      for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX; i++) {
 +              if (hl_mem_area_inside_range(address, size,
 +                              ctx->va_range[i]->start_addr,
 +                              ctx->va_range[i]->end_addr)) {
 +                      *type = i;
 +                      return 0;
 +              }
 +      }
 +
 +      return -EINVAL;
 +}
 +
 +/**
 + * hl_unreserve_va_block() - wrapper for add_va_block to unreserve a va block.
 + * @hdev: pointer to the habanalabs device structure
 + * @ctx: pointer to the context structure.
 + * @start_addr: start virtual address.
 + * @size: number of bytes to unreserve.
 + *
 + * This function does the following:
 + * - Takes the list lock and calls add_va_block_locked.
 + */
 +int hl_unreserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
 +              u64 start_addr, u64 size)
 +{
 +      enum hl_va_range_type type;
 +      int rc;
 +
 +      rc = hl_get_va_range_type(ctx, start_addr, size, &type);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "cannot find va_range for va %#llx size %llu",
 +                      start_addr, size);
 +              return rc;
 +      }
 +
 +      rc = add_va_block(hdev, ctx->va_range[type], start_addr,
 +                                              start_addr + size - 1);
 +      if (rc)
 +              dev_warn(hdev->dev,
 +                      "add va block failed for vaddr: 0x%llx\n", start_addr);
 +
 +      return rc;
 +}
 +
 +/**
 + * init_phys_pg_pack_from_userptr() - initialize physical page pack from host
 + *                                    memory
 + * @ctx: pointer to the context structure.
 + * @userptr: userptr to initialize from.
 + * @pphys_pg_pack: result pointer.
 + * @force_regular_page: tell the function to ignore huge page optimization,
 + *                      even if possible. Needed for cases where the device VA
 + *                      is allocated before we know the composition of the
 + *                      physical pages
 + *
 + * This function does the following:
 + * - Pin the physical pages related to the given virtual block.
 + * - Create a physical page pack from the physical pages related to the given
 + *   virtual block.
 + */
 +static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
 +                              struct hl_userptr *userptr,
 +                              struct hl_vm_phys_pg_pack **pphys_pg_pack,
 +                              bool force_regular_page)
 +{
 +      u32 npages, page_size = PAGE_SIZE,
 +              huge_page_size = ctx->hdev->asic_prop.pmmu_huge.page_size;
 +      u32 pgs_in_huge_page = huge_page_size >> __ffs(page_size);
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      bool first = true, is_huge_page_opt;
 +      u64 page_mask, total_npages;
 +      struct scatterlist *sg;
 +      dma_addr_t dma_addr;
 +      int rc, i, j;
 +
 +      phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
 +      if (!phys_pg_pack)
 +              return -ENOMEM;
 +
 +      phys_pg_pack->vm_type = userptr->vm_type;
 +      phys_pg_pack->created_from_userptr = true;
 +      phys_pg_pack->asid = ctx->asid;
 +      atomic_set(&phys_pg_pack->mapping_cnt, 1);
 +
 +      is_huge_page_opt = (force_regular_page ? false : true);
 +
+      /* Only if all dma_addrs are aligned to 2MB and their
+       * sizes are at least 2MB, we can use huge page mapping.
 +       * We limit the 2MB optimization to this condition,
 +       * since later on we acquire the related VA range as one
 +       * consecutive block.
 +       */
 +      total_npages = 0;
 +      for_each_sgtable_dma_sg(userptr->sgt, sg, i) {
 +              npages = hl_get_sg_info(sg, &dma_addr);
 +
 +              total_npages += npages;
 +
 +              if ((npages % pgs_in_huge_page) ||
 +                                      (dma_addr & (huge_page_size - 1)))
 +                      is_huge_page_opt = false;
 +      }
 +
 +      if (is_huge_page_opt) {
 +              page_size = huge_page_size;
 +              do_div(total_npages, pgs_in_huge_page);
 +      }
 +
 +      page_mask = ~(((u64) page_size) - 1);
 +
 +      phys_pg_pack->pages = kvmalloc_array(total_npages, sizeof(u64),
 +                                              GFP_KERNEL);
 +      if (ZERO_OR_NULL_PTR(phys_pg_pack->pages)) {
 +              rc = -ENOMEM;
 +              goto page_pack_arr_mem_err;
 +      }
 +
 +      phys_pg_pack->npages = total_npages;
 +      phys_pg_pack->page_size = page_size;
 +      phys_pg_pack->total_size = total_npages * page_size;
 +
 +      j = 0;
 +      for_each_sgtable_dma_sg(userptr->sgt, sg, i) {
 +              npages = hl_get_sg_info(sg, &dma_addr);
 +
 +              /* align down to physical page size and save the offset */
 +              if (first) {
 +                      first = false;
 +                      phys_pg_pack->offset = dma_addr & (page_size - 1);
 +                      dma_addr &= page_mask;
 +              }
 +
 +              while (npages) {
 +                      phys_pg_pack->pages[j++] = dma_addr;
 +                      dma_addr += page_size;
 +
 +                      if (is_huge_page_opt)
 +                              npages -= pgs_in_huge_page;
 +                      else
 +                              npages--;
 +              }
 +      }
 +
 +      *pphys_pg_pack = phys_pg_pack;
 +
 +      return 0;
 +
 +page_pack_arr_mem_err:
 +      kfree(phys_pg_pack);
 +
 +      return rc;
 +}
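[Illustrative aside, not part of the patch: the huge-page optimization above requires every DMA chunk to start on a huge-page boundary and to span a whole number of huge pages, since the related VA range is later reserved as one consecutive block. A self-contained sketch of that eligibility test with made-up chunk values:]

#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t page_size = 4096, huge_page_size = 2 * 1024 * 1024;
	uint64_t pgs_in_huge_page = huge_page_size / page_size;   /* 512 */

	/* hypothetical DMA chunks: {dma_addr, npages} */
	struct { uint64_t dma_addr, npages; } chunks[] = {
		{ 0x80000000, 512 },   /* 2MB aligned, exactly one huge page */
		{ 0x80200000, 1024 },  /* 2MB aligned, two huge pages        */
	};
	bool is_huge_page_opt = true;

	for (unsigned i = 0; i < sizeof(chunks) / sizeof(chunks[0]); i++)
		if ((chunks[i].npages % pgs_in_huge_page) ||
		    (chunks[i].dma_addr & (huge_page_size - 1)))
			is_huge_page_opt = false;

	printf("huge page mapping possible: %s\n",
	       is_huge_page_opt ? "yes" : "no");
	return 0;
}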
 +
 +/**
+ * map_phys_pg_pack() - maps the physical page pack.
 + * @ctx: pointer to the context structure.
 + * @vaddr: start address of the virtual area to map from.
 + * @phys_pg_pack: the pack of physical pages to map to.
 + *
 + * This function does the following:
 + * - Maps each chunk of virtual memory to matching physical chunk.
+ * - Unmaps any pages it already mapped in case one of the mappings fails.
 + * - Returns 0 on success, error code otherwise.
 + */
 +static int map_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
 +                              struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      u64 next_vaddr = vaddr, paddr, mapped_pg_cnt = 0, i;
 +      u32 page_size = phys_pg_pack->page_size;
 +      int rc = 0;
 +      bool is_host_addr;
 +
 +      for (i = 0 ; i < phys_pg_pack->npages ; i++) {
 +              paddr = phys_pg_pack->pages[i];
 +
 +              rc = hl_mmu_map_page(ctx, next_vaddr, paddr, page_size,
 +                              (i + 1) == phys_pg_pack->npages);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "map failed for handle %u, npages: %llu, mapped: %llu",
 +                              phys_pg_pack->handle, phys_pg_pack->npages,
 +                              mapped_pg_cnt);
 +                      goto err;
 +              }
 +
 +              mapped_pg_cnt++;
 +              next_vaddr += page_size;
 +      }
 +
 +      return 0;
 +
 +err:
 +      is_host_addr = !hl_is_dram_va(hdev, vaddr);
 +
 +      next_vaddr = vaddr;
 +      for (i = 0 ; i < mapped_pg_cnt ; i++) {
 +              if (hl_mmu_unmap_page(ctx, next_vaddr, page_size,
 +                                      (i + 1) == mapped_pg_cnt))
 +                      dev_warn_ratelimited(hdev->dev,
 +                              "failed to unmap handle %u, va: 0x%llx, pa: 0x%llx, page size: %u\n",
 +                                      phys_pg_pack->handle, next_vaddr,
 +                                      phys_pg_pack->pages[i], page_size);
 +
 +              next_vaddr += page_size;
 +
 +              /*
 +               * unmapping on Palladium can be really long, so avoid a CPU
 +               * soft lockup bug by sleeping a little between unmapping pages
 +               *
+               * In addition, the number of host pages can be huge because
+               * the page size can be as small as 4KB, so when unmapping
+               * host pages sleep every 32K pages to avoid a soft lockup
 +               */
 +              if (hdev->pldm || (is_host_addr && (i & 0x7FFF) == 0))
 +                      usleep_range(50, 200);
 +      }
 +
 +      return rc;
 +}
 +
 +/**
 + * unmap_phys_pg_pack() - unmaps the physical page pack.
 + * @ctx: pointer to the context structure.
 + * @vaddr: start address of the virtual area to unmap.
 + * @phys_pg_pack: the pack of physical pages to unmap.
 + */
 +static void unmap_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
 +                              struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      u64 next_vaddr, i;
 +      bool is_host_addr;
 +      u32 page_size;
 +
 +      is_host_addr = !hl_is_dram_va(hdev, vaddr);
 +      page_size = phys_pg_pack->page_size;
 +      next_vaddr = vaddr;
 +
 +      for (i = 0 ; i < phys_pg_pack->npages ; i++, next_vaddr += page_size) {
 +              if (hl_mmu_unmap_page(ctx, next_vaddr, page_size,
 +                                     (i + 1) == phys_pg_pack->npages))
 +                      dev_warn_ratelimited(hdev->dev,
 +                      "unmap failed for vaddr: 0x%llx\n", next_vaddr);
 +
 +              /*
 +               * unmapping on Palladium can be really long, so avoid a CPU
 +               * soft lockup bug by sleeping a little between unmapping pages
 +               *
+               * In addition, the number of host pages can be huge because
+               * the page size can be as small as 4KB, so when unmapping
+               * host pages sleep every 32K pages to avoid a soft lockup
 +               */
 +              if (hdev->pldm || (is_host_addr && (i & 0x7FFF) == 0))
 +                      usleep_range(50, 200);
 +      }
 +}
 +
 +static int get_paddr_from_handle(struct hl_ctx *ctx, struct hl_mem_in *args,
 +                                      u64 *paddr)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      u32 handle;
 +
 +      handle = lower_32_bits(args->map_device.handle);
 +      spin_lock(&vm->idr_lock);
 +      phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
 +      if (!phys_pg_pack) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_err(hdev->dev, "no match for handle %u\n", handle);
 +              return -EINVAL;
 +      }
 +
 +      *paddr = phys_pg_pack->pages[0];
 +
 +      spin_unlock(&vm->idr_lock);
 +
 +      return 0;
 +}
 +
 +/**
 + * map_device_va() - map the given memory.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters with handle/host virtual address.
 + * @device_addr: pointer to result device virtual address.
 + *
 + * This function does the following:
 + * - If given a physical device memory handle, map to a device virtual block
 + *   and return the start address of this block.
 + * - If given a host virtual address and size, find the related physical pages,
+ *   map a device virtual block to these pages and return the start address of
 + *   this block.
 + */
 +static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device_addr)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      enum hl_va_range_type va_range_type = 0;
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_userptr *userptr = NULL;
 +      u32 handle = 0, va_block_align;
 +      struct hl_vm_hash_node *hnode;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hl_va_range *va_range;
 +      bool is_userptr, do_prefetch;
 +      u64 ret_vaddr, hint_addr;
 +      enum vm_type *vm_type;
 +      int rc;
 +
 +      /* set map flags */
 +      is_userptr = args->flags & HL_MEM_USERPTR;
 +      do_prefetch = hdev->supports_mmu_prefetch && (args->flags & HL_MEM_PREFETCH);
 +
 +      /* Assume failure */
 +      *device_addr = 0;
 +
 +      if (is_userptr) {
 +              u64 addr = args->map_host.host_virt_addr,
 +                      size = args->map_host.mem_size;
 +              u32 page_size = hdev->asic_prop.pmmu.page_size,
 +                      huge_page_size = hdev->asic_prop.pmmu_huge.page_size;
 +
 +              rc = dma_map_host_va(hdev, addr, size, &userptr);
 +              if (rc) {
 +                      dev_err(hdev->dev, "failed to get userptr from va\n");
 +                      return rc;
 +              }
 +
 +              rc = init_phys_pg_pack_from_userptr(ctx, userptr,
 +                              &phys_pg_pack, false);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "unable to init page pack for vaddr 0x%llx\n",
 +                              addr);
 +                      goto init_page_pack_err;
 +              }
 +
 +              vm_type = (enum vm_type *) userptr;
 +              hint_addr = args->map_host.hint_addr;
 +              handle = phys_pg_pack->handle;
 +
 +              /* get required alignment */
 +              if (phys_pg_pack->page_size == page_size) {
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
 +                      va_range_type = HL_VA_RANGE_TYPE_HOST;
 +                      /*
 +                       * huge page alignment may be needed in case of regular
 +                       * page mapping, depending on the host VA alignment
 +                       */
 +                      if (addr & (huge_page_size - 1))
 +                              va_block_align = page_size;
 +                      else
 +                              va_block_align = huge_page_size;
 +              } else {
 +                      /*
 +                       * huge page alignment is needed in case of huge page
 +                       * mapping
 +                       */
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
 +                      va_range_type = HL_VA_RANGE_TYPE_HOST_HUGE;
 +                      va_block_align = huge_page_size;
 +              }
 +      } else {
 +              handle = lower_32_bits(args->map_device.handle);
 +
 +              spin_lock(&vm->idr_lock);
 +              phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
 +              if (!phys_pg_pack) {
 +                      spin_unlock(&vm->idr_lock);
 +                      dev_err(hdev->dev,
 +                              "no match for handle %u\n", handle);
 +                      return -EINVAL;
 +              }
 +
 +              /* increment now to avoid freeing device memory while mapping */
 +              atomic_inc(&phys_pg_pack->mapping_cnt);
 +
 +              spin_unlock(&vm->idr_lock);
 +
 +              vm_type = (enum vm_type *) phys_pg_pack;
 +
 +              hint_addr = args->map_device.hint_addr;
 +
 +              /* DRAM VA alignment is the same as the MMU page size */
 +              va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
 +              va_range_type = HL_VA_RANGE_TYPE_DRAM;
 +              va_block_align = hdev->asic_prop.dmmu.page_size;
 +      }
 +
 +      /*
 +       * relevant for mapping device physical memory only, as host memory is
 +       * implicitly shared
 +       */
 +      if (!is_userptr && !(phys_pg_pack->flags & HL_MEM_SHARED) &&
 +                      phys_pg_pack->asid != ctx->asid) {
 +              dev_err(hdev->dev,
 +                      "Failed to map memory, handle %u is not shared\n",
 +                      handle);
 +              rc = -EPERM;
 +              goto shared_err;
 +      }
 +
 +      hnode = kzalloc(sizeof(*hnode), GFP_KERNEL);
 +      if (!hnode) {
 +              rc = -ENOMEM;
 +              goto hnode_err;
 +      }
 +
 +      if (hint_addr && phys_pg_pack->offset) {
 +              if (args->flags & HL_MEM_FORCE_HINT) {
 +                      /* Fail if hint must be respected but it can't be */
 +                      dev_err(hdev->dev,
 +                              "Hint address 0x%llx cannot be respected because source memory is not aligned 0x%x\n",
 +                              hint_addr, phys_pg_pack->offset);
 +                      rc = -EINVAL;
 +                      goto va_block_err;
 +              }
 +              dev_dbg(hdev->dev,
 +                      "Hint address 0x%llx will be ignored because source memory is not aligned 0x%x\n",
 +                      hint_addr, phys_pg_pack->offset);
 +      }
 +
 +      ret_vaddr = get_va_block(hdev, va_range, phys_pg_pack->total_size,
 +                                      hint_addr, va_block_align,
 +                                      va_range_type, args->flags);
 +      if (!ret_vaddr) {
 +              dev_err(hdev->dev, "no available va block for handle %u\n",
 +                              handle);
 +              rc = -ENOMEM;
 +              goto va_block_err;
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +
 +      rc = map_phys_pg_pack(ctx, ret_vaddr, phys_pg_pack);
 +      if (rc) {
 +              dev_err(hdev->dev, "mapping page pack failed for handle %u\n", handle);
 +              mutex_unlock(&hdev->mmu_lock);
 +              goto map_err;
 +      }
 +
 +      rc = hl_mmu_invalidate_cache_range(hdev, false, *vm_type | MMU_OP_SKIP_LOW_CACHE_INV,
 +                              ctx->asid, ret_vaddr, phys_pg_pack->total_size);
 +      mutex_unlock(&hdev->mmu_lock);
 +      if (rc)
 +              goto map_err;
 +
 +      /*
+       * Prefetch is done upon the user's request. It is performed in a WQ and so can
+       * be done outside the MMU lock. The operation itself is already protected by the MMU lock.
 +       */
 +      if (do_prefetch) {
 +              rc = hl_mmu_prefetch_cache_range(ctx, *vm_type, ctx->asid, ret_vaddr,
 +                                                      phys_pg_pack->total_size);
 +              if (rc)
 +                      goto map_err;
 +      }
 +
 +      ret_vaddr += phys_pg_pack->offset;
 +
 +      hnode->ptr = vm_type;
 +      hnode->vaddr = ret_vaddr;
 +      hnode->handle = is_userptr ? MEM_HANDLE_INVALID : handle;
 +
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_add(ctx->mem_hash, &hnode->node, ret_vaddr);
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      *device_addr = ret_vaddr;
 +
 +      if (is_userptr)
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +
 +      return rc;
 +
 +map_err:
 +      if (add_va_block(hdev, va_range, ret_vaddr,
 +                              ret_vaddr + phys_pg_pack->total_size - 1))
 +              dev_warn(hdev->dev,
 +                      "release va block failed for handle 0x%x, vaddr: 0x%llx\n",
 +                              handle, ret_vaddr);
 +
 +va_block_err:
 +      kfree(hnode);
 +hnode_err:
 +shared_err:
 +      atomic_dec(&phys_pg_pack->mapping_cnt);
 +      if (is_userptr)
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +init_page_pack_err:
 +      if (is_userptr)
 +              dma_unmap_host_va(hdev, userptr);
 +
 +      return rc;
 +}
 +
 +/**
 + * unmap_device_va() - unmap the given device virtual address.
 + * @ctx: pointer to the context structure.
 + * @args: host parameters with device virtual address to unmap.
 + * @ctx_free: true if in context free flow, false otherwise.
 + *
 + * This function does the following:
 + * - unmap the physical pages related to the given virtual address.
 + * - return the device virtual block to the virtual block list.
 + */
 +static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
 +                              bool ctx_free)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 +      u64 vaddr = args->unmap.device_virt_addr;
 +      struct hl_vm_hash_node *hnode = NULL;
 +      struct asic_fixed_properties *prop;
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_userptr *userptr = NULL;
 +      struct hl_va_range *va_range;
 +      enum vm_type *vm_type;
 +      bool is_userptr;
 +      int rc = 0;
 +
 +      prop = &hdev->asic_prop;
 +
 +      /* protect from double entrance */
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
 +              if (vaddr == hnode->vaddr)
 +                      break;
 +
 +      if (!hnode) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_err(hdev->dev,
 +                      "unmap failed, no mem hnode for vaddr 0x%llx\n",
 +                      vaddr);
 +              return -EINVAL;
 +      }
 +
 +      if (hnode->export_cnt) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_err(hdev->dev, "failed to unmap %#llx, memory is exported\n", vaddr);
 +              return -EINVAL;
 +      }
 +
 +      hash_del(&hnode->node);
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      vm_type = hnode->ptr;
 +
 +      if (*vm_type == VM_TYPE_USERPTR) {
 +              is_userptr = true;
 +              userptr = hnode->ptr;
 +
 +              rc = init_phys_pg_pack_from_userptr(ctx, userptr, &phys_pg_pack,
 +                                                      false);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "unable to init page pack for vaddr 0x%llx\n",
 +                              vaddr);
 +                      goto vm_type_err;
 +              }
 +
 +              if (phys_pg_pack->page_size ==
 +                                      hdev->asic_prop.pmmu.page_size)
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
 +              else
 +                      va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
 +      } else if (*vm_type == VM_TYPE_PHYS_PACK) {
 +              is_userptr = false;
 +              va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
 +              phys_pg_pack = hnode->ptr;
 +      } else {
 +              dev_warn(hdev->dev,
 +                      "unmap failed, unknown vm desc for vaddr 0x%llx\n",
 +                              vaddr);
 +              rc = -EFAULT;
 +              goto vm_type_err;
 +      }
 +
 +      if (atomic_read(&phys_pg_pack->mapping_cnt) == 0) {
 +              dev_err(hdev->dev, "vaddr 0x%llx is not mapped\n", vaddr);
 +              rc = -EINVAL;
 +              goto mapping_cnt_err;
 +      }
 +
 +      if (!is_userptr && !is_power_of_2(phys_pg_pack->page_size))
 +              vaddr = prop->dram_base_address +
 +                      DIV_ROUND_DOWN_ULL(vaddr - prop->dram_base_address,
 +                                              phys_pg_pack->page_size) *
 +                                                      phys_pg_pack->page_size;
 +      else
 +              vaddr &= ~(((u64) phys_pg_pack->page_size) - 1);
 +
 +      mutex_lock(&hdev->mmu_lock);
 +
 +      unmap_phys_pg_pack(ctx, vaddr, phys_pg_pack);
 +
 +      /*
 +       * During context free this function is called in a loop to clean all
 +       * the context mappings. Hence the cache invalidation can be called once
 +       * at the loop end rather than for each iteration
 +       */
 +      if (!ctx_free)
 +              rc = hl_mmu_invalidate_cache_range(hdev, true, *vm_type, ctx->asid, vaddr,
 +                                                      phys_pg_pack->total_size);
 +
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      /*
 +       * If the context is closing we don't need to check for the MMU cache
 +       * invalidation return code and update the VA free list as in this flow
 +       * we invalidate the MMU cache outside of this unmap function and the VA
 +       * free list will be freed anyway.
 +       */
 +      if (!ctx_free) {
 +              int tmp_rc;
 +
 +              tmp_rc = add_va_block(hdev, va_range, vaddr,
 +                                      vaddr + phys_pg_pack->total_size - 1);
 +              if (tmp_rc) {
 +                      dev_warn(hdev->dev,
 +                                      "add va block failed for vaddr: 0x%llx\n",
 +                                      vaddr);
 +                      if (!rc)
 +                              rc = tmp_rc;
 +              }
 +      }
 +
 +      atomic_dec(&phys_pg_pack->mapping_cnt);
 +      kfree(hnode);
 +
 +      if (is_userptr) {
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +              dma_unmap_host_va(hdev, userptr);
 +      }
 +
 +      return rc;
 +
 +mapping_cnt_err:
 +      if (is_userptr)
 +              free_phys_pg_pack(hdev, phys_pg_pack);
 +vm_type_err:
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_add(ctx->mem_hash, &hnode->node, vaddr);
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      return rc;
 +}
 +
 +static int map_block(struct hl_device *hdev, u64 address, u64 *handle, u32 *size)
 +{
 +      u32 block_id;
 +      int rc;
 +
 +      *handle = 0;
 +      if (size)
 +              *size = 0;
 +
 +      rc = hdev->asic_funcs->get_hw_block_id(hdev, address, size, &block_id);
 +      if (rc)
 +              return rc;
 +
 +      *handle = block_id | HL_MMAP_TYPE_BLOCK;
 +      *handle <<= PAGE_SHIFT;
 +
 +      return 0;
 +}
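[Illustrative aside, not part of the patch: map_block() tags the HW block id with HL_MMAP_TYPE_BLOCK and shifts it by PAGE_SHIFT, so that when the handle is later used as an mmap offset the page offset (vm_pgoff) carries the tagged id back to the driver, as hl_hw_block_mmap() below expects. A standalone sketch of that encode/decode round trip; the flag value here is a placeholder, not the driver's actual HL_MMAP_TYPE_BLOCK:]

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT            12            /* assume 4KB pages */
#define FAKE_MMAP_TYPE_BLOCK  (1ULL << 26)  /* placeholder flag bit, not the driver's value */

int main(void)
{
	uint32_t block_id = 0x42;
	uint64_t handle;

	/* encode, as map_block() does */
	handle = block_id | FAKE_MMAP_TYPE_BLOCK;
	handle <<= PAGE_SHIFT;

	/* decode, as the mmap path effectively does: the mmap offset is
	 * divided by the page size, so vm_pgoff carries the tagged id,
	 * and masking off the type bit recovers the block id
	 */
	uint64_t vm_pgoff = handle >> PAGE_SHIFT;
	uint32_t decoded_id = (uint32_t)(vm_pgoff & ~FAKE_MMAP_TYPE_BLOCK);

	printf("handle=0x%llx pgoff=0x%llx block_id=0x%x\n",
	       (unsigned long long)handle, (unsigned long long)vm_pgoff, decoded_id);
	return 0;
}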
 +
 +static void hw_block_vm_close(struct vm_area_struct *vma)
 +{
 +      struct hl_vm_hw_block_list_node *lnode =
 +              (struct hl_vm_hw_block_list_node *) vma->vm_private_data;
 +      struct hl_ctx *ctx = lnode->ctx;
 +      long new_mmap_size;
 +
 +      new_mmap_size = lnode->mapped_size - (vma->vm_end - vma->vm_start);
 +      if (new_mmap_size > 0) {
 +              lnode->mapped_size = new_mmap_size;
 +              return;
 +      }
 +
 +      mutex_lock(&ctx->hw_block_list_lock);
 +      list_del(&lnode->node);
 +      mutex_unlock(&ctx->hw_block_list_lock);
 +      hl_ctx_put(ctx);
 +      kfree(lnode);
 +      vma->vm_private_data = NULL;
 +}
 +
 +static const struct vm_operations_struct hw_block_vm_ops = {
 +      .close = hw_block_vm_close
 +};
 +
 +/**
 + * hl_hw_block_mmap() - mmap a hw block to user.
 + * @hpriv: pointer to the private data of the fd
 + * @vma: pointer to vm_area_struct of the process
 + *
 + * Driver increments context reference for every HW block mapped in order
 + * to prevent user from closing FD without unmapping first
 + */
 +int hl_hw_block_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)
 +{
 +      struct hl_vm_hw_block_list_node *lnode;
 +      struct hl_device *hdev = hpriv->hdev;
 +      struct hl_ctx *ctx = hpriv->ctx;
 +      u32 block_id, block_size;
 +      int rc;
 +
 +      /* We use the page offset to hold the block id and thus we need to clear
 +       * it before doing the mmap itself
 +       */
 +      block_id = vma->vm_pgoff;
 +      vma->vm_pgoff = 0;
 +
 +      /* Driver only allows mapping of a complete HW block */
 +      block_size = vma->vm_end - vma->vm_start;
 +
 +      if (!access_ok((void __user *) (uintptr_t) vma->vm_start, block_size)) {
 +              dev_err(hdev->dev,
 +                      "user pointer is invalid - 0x%lx\n",
 +                      vma->vm_start);
 +
 +              return -EINVAL;
 +      }
 +
 +      lnode = kzalloc(sizeof(*lnode), GFP_KERNEL);
 +      if (!lnode)
 +              return -ENOMEM;
 +
 +      rc = hdev->asic_funcs->hw_block_mmap(hdev, vma, block_id, block_size);
 +      if (rc) {
 +              kfree(lnode);
 +              return rc;
 +      }
 +
 +      hl_ctx_get(ctx);
 +
 +      lnode->ctx = ctx;
 +      lnode->vaddr = vma->vm_start;
 +      lnode->block_size = block_size;
 +      lnode->mapped_size = lnode->block_size;
 +      lnode->id = block_id;
 +
 +      vma->vm_private_data = lnode;
 +      vma->vm_ops = &hw_block_vm_ops;
 +
 +      mutex_lock(&ctx->hw_block_list_lock);
 +      list_add_tail(&lnode->node, &ctx->hw_block_mem_list);
 +      mutex_unlock(&ctx->hw_block_list_lock);
 +
 +      vma->vm_pgoff = block_id;
 +
 +      return 0;
 +}
 +
 +static int set_dma_sg(struct scatterlist *sg, u64 bar_address, u64 chunk_size,
 +                      struct device *dev, enum dma_data_direction dir)
 +{
 +      dma_addr_t addr;
 +      int rc;
 +
 +      addr = dma_map_resource(dev, bar_address, chunk_size, dir,
 +                              DMA_ATTR_SKIP_CPU_SYNC);
 +      rc = dma_mapping_error(dev, addr);
 +      if (rc)
 +              return rc;
 +
 +      sg_set_page(sg, NULL, chunk_size, 0);
 +      sg_dma_address(sg) = addr;
 +      sg_dma_len(sg) = chunk_size;
 +
 +      return 0;
 +}
 +
 +static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 *pages, u64 npages,
 +                                              u64 page_size, u64 exported_size,
 +                                              struct device *dev, enum dma_data_direction dir)
 +{
 +      u64 chunk_size, bar_address, dma_max_seg_size, cur_size_to_export, cur_npages;
 +      struct asic_fixed_properties *prop;
 +      int rc, i, j, nents, cur_page;
 +      struct scatterlist *sg;
 +      struct sg_table *sgt;
 +
 +      prop = &hdev->asic_prop;
 +
 +      dma_max_seg_size = dma_get_max_seg_size(dev);
 +
 +      /* We would like to align the max segment size to PAGE_SIZE, so the
 +       * SGL will contain aligned addresses that can be easily mapped to
 +       * an MMU
 +       */
 +      dma_max_seg_size = ALIGN_DOWN(dma_max_seg_size, PAGE_SIZE);
 +      if (dma_max_seg_size < PAGE_SIZE) {
 +              dev_err_ratelimited(hdev->dev,
 +                              "dma_max_seg_size %llu can't be smaller than PAGE_SIZE\n",
 +                              dma_max_seg_size);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 +      if (!sgt)
 +              return ERR_PTR(-ENOMEM);
 +
 +      /* remove export size restrictions in case not explicitly defined */
 +      cur_size_to_export = exported_size ? exported_size : (npages * page_size);
 +
 +      /* If the size of each page is larger than the dma max segment size,
 +       * then we can't combine pages and the number of entries in the SGL
 +       * will just be the
 +       * <number of pages> * <chunks of max segment size in each page>
 +       */
 +      if (page_size > dma_max_seg_size) {
 +              /* we should limit number of pages according to the exported size */
 +              cur_npages = DIV_ROUND_UP_SECTOR_T(cur_size_to_export, page_size);
 +              nents = cur_npages * DIV_ROUND_UP_SECTOR_T(page_size, dma_max_seg_size);
 +      } else {
 +              cur_npages = npages;
 +
 +              /* Get number of non-contiguous chunks */
 +              for (i = 1, nents = 1, chunk_size = page_size ; i < cur_npages ; i++) {
 +                      if (pages[i - 1] + page_size != pages[i] ||
 +                                      chunk_size + page_size > dma_max_seg_size) {
 +                              nents++;
 +                              chunk_size = page_size;
 +                              continue;
 +                      }
 +
 +                      chunk_size += page_size;
 +              }
 +      }
 +
 +      rc = sg_alloc_table(sgt, nents, GFP_KERNEL | __GFP_ZERO);
 +      if (rc)
 +              goto error_free;
 +
 +      cur_page = 0;
 +
 +      if (page_size > dma_max_seg_size) {
 +              u64 size_left, cur_device_address = 0;
 +
 +              size_left = page_size;
 +
 +              /* Need to split each page into the number of chunks of
 +               * dma_max_seg_size
 +               */
 +              for_each_sgtable_dma_sg(sgt, sg, i) {
 +                      if (size_left == page_size)
 +                              cur_device_address =
 +                                      pages[cur_page] - prop->dram_base_address;
 +                      else
 +                              cur_device_address += dma_max_seg_size;
 +
 +                      /* make sure not to export over exported size */
 +                      chunk_size = min3(size_left, dma_max_seg_size, cur_size_to_export);
 +
 +                      bar_address = hdev->dram_pci_bar_start + cur_device_address;
 +
 +                      rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
 +                      if (rc)
 +                              goto error_unmap;
 +
 +                      cur_size_to_export -= chunk_size;
 +
 +                      if (size_left > dma_max_seg_size) {
 +                              size_left -= dma_max_seg_size;
 +                      } else {
 +                              cur_page++;
 +                              size_left = page_size;
 +                      }
 +              }
 +      } else {
 +              /* Merge pages and put them into the scatterlist */
 +              for_each_sgtable_dma_sg(sgt, sg, i) {
 +                      chunk_size = page_size;
 +                      for (j = cur_page + 1 ; j < cur_npages ; j++) {
 +                              if (pages[j - 1] + page_size != pages[j] ||
 +                                              chunk_size + page_size > dma_max_seg_size)
 +                                      break;
 +
 +                              chunk_size += page_size;
 +                      }
 +
 +                      bar_address = hdev->dram_pci_bar_start +
 +                                      (pages[cur_page] - prop->dram_base_address);
 +
 +                      /* make sure not to export over exported size */
 +                      chunk_size = min(chunk_size, cur_size_to_export);
 +                      rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
 +                      if (rc)
 +                              goto error_unmap;
 +
 +                      cur_size_to_export -= chunk_size;
 +                      cur_page = j;
 +              }
 +      }
 +
+      /* Because we are not going to include a CPU list, set orig_nents
+       * to 0 so that other users have some chance to detect this and use
+       * only nents (the length of the DMA list) when going over the sgl
+       */
 +      sgt->orig_nents = 0;
 +
 +      return sgt;
 +
 +error_unmap:
 +      for_each_sgtable_dma_sg(sgt, sg, i) {
 +              if (!sg_dma_len(sg))
 +                      continue;
 +
 +              dma_unmap_resource(dev, sg_dma_address(sg),
 +                                      sg_dma_len(sg), dir,
 +                                      DMA_ATTR_SKIP_CPU_SYNC);
 +      }
 +
 +      sg_free_table(sgt);
 +
 +error_free:
 +      kfree(sgt);
 +      return ERR_PTR(rc);
 +}
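[Illustrative aside, not part of the patch: for the page_size <= dma_max_seg_size branch, alloc_sgt_from_device_pages() pre-counts scatterlist entries by walking the page array and starting a new entry whenever physical contiguity breaks or the running chunk would exceed dma_max_seg_size. A self-contained sketch of that count with made-up page addresses:]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t page_size = 0x200000;              /* 2MB device pages */
	uint64_t dma_max_seg_size = 0x1000000;      /* 16MB max DMA segment */
	/* three contiguous pages, then a gap, then two contiguous pages */
	uint64_t pages[] = { 0x0, 0x200000, 0x400000, 0x1000000, 0x1200000 };
	uint64_t npages = sizeof(pages) / sizeof(pages[0]);
	uint64_t chunk_size = page_size;
	int nents = 1;

	for (uint64_t i = 1; i < npages; i++) {
		if (pages[i - 1] + page_size != pages[i] ||
		    chunk_size + page_size > dma_max_seg_size) {
			nents++;                 /* start a new scatterlist entry */
			chunk_size = page_size;
			continue;
		}
		chunk_size += page_size;
	}

	printf("scatterlist entries needed: %d\n", nents);  /* 2 in this example */
	return 0;
}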
 +
 +static int hl_dmabuf_attach(struct dma_buf *dmabuf,
 +                              struct dma_buf_attachment *attachment)
 +{
 +      struct hl_dmabuf_priv *hl_dmabuf;
 +      struct hl_device *hdev;
 +      int rc;
 +
 +      hl_dmabuf = dmabuf->priv;
 +      hdev = hl_dmabuf->ctx->hdev;
 +
 +      rc = pci_p2pdma_distance(hdev->pdev, attachment->dev, true);
 +
 +      if (rc < 0)
 +              attachment->peer2peer = false;
 +      return 0;
 +}
 +
 +static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
 +                                      enum dma_data_direction dir)
 +{
 +      struct dma_buf *dma_buf = attachment->dmabuf;
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      struct hl_dmabuf_priv *hl_dmabuf;
 +      struct hl_device *hdev;
 +      struct sg_table *sgt;
 +
 +      hl_dmabuf = dma_buf->priv;
 +      hdev = hl_dmabuf->ctx->hdev;
 +      phys_pg_pack = hl_dmabuf->phys_pg_pack;
 +
 +      if (!attachment->peer2peer) {
 +              dev_dbg(hdev->dev, "Failed to map dmabuf because p2p is disabled\n");
 +              return ERR_PTR(-EPERM);
 +      }
 +
 +      if (phys_pg_pack)
 +              sgt = alloc_sgt_from_device_pages(hdev,
 +                                              phys_pg_pack->pages,
 +                                              phys_pg_pack->npages,
 +                                              phys_pg_pack->page_size,
 +                                              phys_pg_pack->exported_size,
 +                                              attachment->dev,
 +                                              dir);
 +      else
 +              sgt = alloc_sgt_from_device_pages(hdev,
 +                                              &hl_dmabuf->device_address,
 +                                              1,
 +                                              hl_dmabuf->dmabuf->size,
 +                                              0,
 +                                              attachment->dev,
 +                                              dir);
 +
 +      if (IS_ERR(sgt))
 +              dev_err(hdev->dev, "failed (%ld) to initialize sgt for dmabuf\n", PTR_ERR(sgt));
 +
 +      return sgt;
 +}
 +
 +static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
 +                                struct sg_table *sgt,
 +                                enum dma_data_direction dir)
 +{
 +      struct scatterlist *sg;
 +      int i;
 +
 +      /* The memory behind the dma-buf has *always* resided on the device itself, i.e. it lives
 +       * only in the 'device' domain (after all, it maps a PCI bar address which points to the
 +       * device memory).
 +       *
 +       * Therefore, it was never in the 'CPU' domain and hence, there is no need to perform
 +       * a sync of the memory to the CPU's cache, as it never resided inside that cache.
 +       */
 +      for_each_sgtable_dma_sg(sgt, sg, i)
 +              dma_unmap_resource(attachment->dev, sg_dma_address(sg),
 +                                      sg_dma_len(sg), dir,
 +                                      DMA_ATTR_SKIP_CPU_SYNC);
 +
+      /* Need to restore orig_nents because sg_free_table uses that field */
 +      sgt->orig_nents = sgt->nents;
 +      sg_free_table(sgt);
 +      kfree(sgt);
 +}
 +
 +static void hl_release_dmabuf(struct dma_buf *dmabuf)
 +{
 +      struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv;
 +      struct hl_ctx *ctx;
 +
 +      if (!hl_dmabuf)
 +              return;
 +
 +      ctx = hl_dmabuf->ctx;
 +
 +      if (hl_dmabuf->memhash_hnode) {
 +              mutex_lock(&ctx->mem_hash_lock);
 +              hl_dmabuf->memhash_hnode->export_cnt--;
 +              mutex_unlock(&ctx->mem_hash_lock);
 +      }
 +
 +      hl_ctx_put(ctx);
 +      kfree(hl_dmabuf);
 +}
 +
 +static const struct dma_buf_ops habanalabs_dmabuf_ops = {
 +      .attach = hl_dmabuf_attach,
 +      .map_dma_buf = hl_map_dmabuf,
 +      .unmap_dma_buf = hl_unmap_dmabuf,
 +      .release = hl_release_dmabuf,
 +};
 +
 +static int export_dmabuf(struct hl_ctx *ctx,
 +                              struct hl_dmabuf_priv *hl_dmabuf,
 +                              u64 total_size, int flags, int *dmabuf_fd)
 +{
 +      DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 +      struct hl_device *hdev = ctx->hdev;
 +      int rc, fd;
 +
 +      exp_info.ops = &habanalabs_dmabuf_ops;
 +      exp_info.size = total_size;
 +      exp_info.flags = flags;
 +      exp_info.priv = hl_dmabuf;
 +
 +      hl_dmabuf->dmabuf = dma_buf_export(&exp_info);
 +      if (IS_ERR(hl_dmabuf->dmabuf)) {
 +              dev_err(hdev->dev, "failed to export dma-buf\n");
 +              return PTR_ERR(hl_dmabuf->dmabuf);
 +      }
 +
 +      fd = dma_buf_fd(hl_dmabuf->dmabuf, flags);
 +      if (fd < 0) {
 +              dev_err(hdev->dev, "failed to get a file descriptor for a dma-buf, %d\n", fd);
 +              rc = fd;
 +              goto err_dma_buf_put;
 +      }
 +
 +      hl_dmabuf->ctx = ctx;
 +      hl_ctx_get(hl_dmabuf->ctx);
 +
 +      *dmabuf_fd = fd;
 +
 +      return 0;
 +
 +err_dma_buf_put:
 +      hl_dmabuf->dmabuf->priv = NULL;
 +      dma_buf_put(hl_dmabuf->dmabuf);
 +      return rc;
 +}
 +
 +static int validate_export_params_common(struct hl_device *hdev, u64 device_addr, u64 size)
 +{
 +      if (!IS_ALIGNED(device_addr, PAGE_SIZE)) {
 +              dev_dbg(hdev->dev,
 +                      "exported device memory address 0x%llx should be aligned to 0x%lx\n",
 +                      device_addr, PAGE_SIZE);
 +              return -EINVAL;
 +      }
 +
 +      if (size < PAGE_SIZE) {
 +              dev_dbg(hdev->dev,
 +                      "exported device memory size %llu should be equal to or greater than %lu\n",
 +                      size, PAGE_SIZE);
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static int validate_export_params_no_mmu(struct hl_device *hdev, u64 device_addr, u64 size)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 bar_address;
 +      int rc;
 +
 +      rc = validate_export_params_common(hdev, device_addr, size);
 +      if (rc)
 +              return rc;
 +
 +      if (device_addr < prop->dram_user_base_address ||
 +                              (device_addr + size) > prop->dram_end_address ||
 +                              (device_addr + size) < device_addr) {
 +              dev_dbg(hdev->dev,
 +                      "DRAM memory range 0x%llx (+0x%llx) is outside of DRAM boundaries\n",
 +                      device_addr, size);
 +              return -EINVAL;
 +      }
 +
 +      bar_address = hdev->dram_pci_bar_start + (device_addr - prop->dram_base_address);
 +
 +      if ((bar_address + size) > (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
 +                      (bar_address + size) < bar_address) {
 +              dev_dbg(hdev->dev,
 +                      "DRAM memory range 0x%llx (+0x%llx) is outside of PCI BAR boundaries\n",
 +                      device_addr, size);
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size, u64 offset,
 +                                      struct hl_vm_phys_pg_pack *phys_pg_pack)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 bar_address;
 +      int i, rc;
 +
 +      rc = validate_export_params_common(hdev, device_addr, size);
 +      if (rc)
 +              return rc;
 +
 +      if ((offset + size) > phys_pg_pack->total_size) {
 +              dev_dbg(hdev->dev, "offset %#llx and size %#llx exceed total map size %#llx\n",
 +                              offset, size, phys_pg_pack->total_size);
 +              return -EINVAL;
 +      }
 +
 +      for (i = 0 ; i < phys_pg_pack->npages ; i++) {
 +
 +              bar_address = hdev->dram_pci_bar_start +
 +                                      (phys_pg_pack->pages[i] - prop->dram_base_address);
 +
 +              if ((bar_address + phys_pg_pack->page_size) >
 +                              (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
 +                              (bar_address + phys_pg_pack->page_size) < bar_address) {
 +                      dev_dbg(hdev->dev,
 +                              "DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n",
 +                                      phys_pg_pack->pages[i],
 +                                      phys_pg_pack->page_size);
 +
 +                      return -EINVAL;
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm_hash_node *hnode;
 +
 +      /* get the memory handle */
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
 +              if (addr == hnode->vaddr)
 +                      break;
 +
 +      if (!hnode) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      if (upper_32_bits(hnode->handle)) {
 +              mutex_unlock(&ctx->mem_hash_lock);
 +              dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
 +                              hnode->handle, addr);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      /*
 +       * node found, increase export count so this memory cannot be unmapped
 +       * and the hash node cannot be deleted.
 +       */
 +      hnode->export_cnt++;
 +      mutex_unlock(&ctx->mem_hash_lock);
 +
 +      return hnode;
 +}
 +
 +static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
 +{
 +      mutex_lock(&ctx->mem_hash_lock);
 +      hnode->export_cnt--;
 +      mutex_unlock(&ctx->mem_hash_lock);
 +}
 +
 +static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev,
 +                                                      struct hl_vm_hash_node *hnode)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack;
 +      struct hl_vm *vm = &hdev->vm;
 +
 +      spin_lock(&vm->idr_lock);
 +      phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) hnode->handle);
 +      if (!phys_pg_pack) {
 +              spin_unlock(&vm->idr_lock);
 +              dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) hnode->handle);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      spin_unlock(&vm->idr_lock);
 +
 +      if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) {
 +              dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", hnode->handle);
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      return phys_pg_pack;
 +}
 +
 +/**
 + * export_dmabuf_from_addr() - export a dma-buf object for the given memory
 + *                             address and size.
 + * @ctx: pointer to the context structure.
 + * @addr: device address.
 + * @size: size of device memory to export.
 + * @offset: the offset into the buffer from which to start exporting
 + * @flags: DMA-BUF file/FD flags.
 + * @dmabuf_fd: pointer to result FD that represents the dma-buf object.
 + *
 + * Create and export a dma-buf object for an existing memory allocation inside
 + * the device memory, and return a FD which is associated with the dma-buf
 + * object.
 + *
 + * Return: 0 on success, non-zero for failure.
 + */
 +static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 addr, u64 size, u64 offset,
 +                                      int flags, int *dmabuf_fd)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
 +      struct hl_vm_hash_node *hnode = NULL;
 +      struct asic_fixed_properties *prop;
 +      struct hl_dmabuf_priv *hl_dmabuf;
 +      struct hl_device *hdev;
 +      u64 export_addr;
 +      int rc;
 +
 +      hdev = ctx->hdev;
 +      prop = &hdev->asic_prop;
 +
 +      /* offset must be 0 in devices without virtual memory support */
 +      if (!prop->dram_supports_virtual_memory && offset) {
 +              dev_dbg(hdev->dev, "offset is not allowed in device without virtual memory\n");
 +              return -EINVAL;
 +      }
 +
 +      export_addr = addr + offset;
 +
 +      hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
 +      if (!hl_dmabuf)
 +              return -ENOMEM;
 +
 +      if (prop->dram_supports_virtual_memory) {
 +              hnode = memhash_node_export_get(ctx, addr);
 +              if (IS_ERR(hnode)) {
 +                      rc = PTR_ERR(hnode);
 +                      goto err_free_dmabuf_wrapper;
 +              }
 +              phys_pg_pack = get_phys_pg_pack_from_hash_node(hdev, hnode);
 +              if (IS_ERR(phys_pg_pack)) {
 +                      rc = PTR_ERR(phys_pg_pack);
 +                      goto dec_memhash_export_cnt;
 +              }
 +              rc = validate_export_params(hdev, export_addr, size, offset, phys_pg_pack);
 +              if (rc)
 +                      goto dec_memhash_export_cnt;
 +
 +              phys_pg_pack->exported_size = size;
 +              hl_dmabuf->phys_pg_pack = phys_pg_pack;
 +              hl_dmabuf->memhash_hnode = hnode;
 +      } else {
 +              rc = validate_export_params_no_mmu(hdev, export_addr, size);
 +              if (rc)
 +                      goto err_free_dmabuf_wrapper;
 +      }
 +
 +      hl_dmabuf->device_address = export_addr;
 +
 +      rc = export_dmabuf(ctx, hl_dmabuf, size, flags, dmabuf_fd);
 +      if (rc)
 +              goto dec_memhash_export_cnt;
 +
 +      return 0;
 +
 +dec_memhash_export_cnt:
 +      if (prop->dram_supports_virtual_memory)
 +              memhash_node_export_put(ctx, hnode);
 +err_free_dmabuf_wrapper:
 +      kfree(hl_dmabuf);
 +      return rc;
 +}
 +
 +static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args)
 +{
 +      struct hl_device *hdev = hpriv->hdev;
 +      u64 block_handle, device_addr = 0;
 +      struct hl_ctx *ctx = hpriv->ctx;
 +      u32 handle = 0, block_size;
 +      int rc;
 +
 +      switch (args->in.op) {
 +      case HL_MEM_OP_ALLOC:
 +              if (args->in.alloc.mem_size == 0) {
 +                      dev_err(hdev->dev, "alloc size must be larger than 0\n");
 +                      rc = -EINVAL;
 +                      goto out;
 +              }
 +
 +              /* Force contiguous as there are no real MMU
 +               * translations to overcome physical memory gaps
 +               */
 +              args->in.flags |= HL_MEM_CONTIGUOUS;
 +              rc = alloc_device_memory(ctx, &args->in, &handle);
 +
 +              memset(args, 0, sizeof(*args));
 +              args->out.handle = (__u64) handle;
 +              break;
 +
 +      case HL_MEM_OP_FREE:
 +              rc = free_device_memory(ctx, &args->in);
 +              break;
 +
 +      case HL_MEM_OP_MAP:
 +              if (args->in.flags & HL_MEM_USERPTR) {
 +                      dev_err(hdev->dev, "Failed to map host memory when MMU is disabled\n");
 +                      rc = -EPERM;
 +              } else {
 +                      rc = get_paddr_from_handle(ctx, &args->in, &device_addr);
 +                      memset(args, 0, sizeof(*args));
 +                      args->out.device_virt_addr = device_addr;
 +              }
 +
 +              break;
 +
 +      case HL_MEM_OP_UNMAP:
 +              rc = 0;
 +              break;
 +
 +      case HL_MEM_OP_MAP_BLOCK:
 +              rc = map_block(hdev, args->in.map_block.block_addr, &block_handle, &block_size);
 +              args->out.block_handle = block_handle;
 +              args->out.block_size = block_size;
 +              break;
 +
 +      case HL_MEM_OP_EXPORT_DMABUF_FD:
 +              dev_err(hdev->dev, "Failed to export dma-buf object when MMU is disabled\n");
 +              rc = -EPERM;
 +              break;
 +
 +      case HL_MEM_OP_TS_ALLOC:
 +              rc = allocate_timestamps_buffers(hpriv, &args->in, &args->out.handle);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
 +              rc = -EINVAL;
 +              break;
 +      }
 +
 +out:
 +      return rc;
 +}
 +
 +static void ts_buff_release(struct hl_mmap_mem_buf *buf)
 +{
 +      struct hl_ts_buff *ts_buff = buf->private;
 +
 +      vfree(ts_buff->kernel_buff_address);
 +      vfree(ts_buff->user_buff_address);
 +      kfree(ts_buff);
 +}
 +
 +static int hl_ts_mmap(struct hl_mmap_mem_buf *buf, struct vm_area_struct *vma, void *args)
 +{
 +      struct hl_ts_buff *ts_buff = buf->private;
 +
++      vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE);
 +      return remap_vmalloc_range(vma, ts_buff->user_buff_address, 0);
 +}
 +
 +static int hl_ts_alloc_buf(struct hl_mmap_mem_buf *buf, gfp_t gfp, void *args)
 +{
 +      struct hl_ts_buff *ts_buff = NULL;
 +      u32 num_elements;
 +      size_t size;
 +      void *p;
 +
 +      num_elements = *(u32 *)args;
 +
 +      ts_buff = kzalloc(sizeof(*ts_buff), gfp);
 +      if (!ts_buff)
 +              return -ENOMEM;
 +
 +      /* Allocate the user buffer */
 +      size = num_elements * sizeof(u64);
 +      p = vmalloc_user(size);
 +      if (!p)
 +              goto free_mem;
 +
 +      ts_buff->user_buff_address = p;
 +      buf->mappable_size = size;
 +
 +      /* Allocate the internal kernel buffer */
 +      size = num_elements * sizeof(struct hl_user_pending_interrupt);
 +      p = vzalloc(size);
 +      if (!p)
 +              goto free_user_buff;
 +
 +      ts_buff->kernel_buff_address = p;
 +      ts_buff->kernel_buff_size = size;
 +
 +      buf->private = ts_buff;
 +
 +      return 0;
 +
 +free_user_buff:
 +      vfree(ts_buff->user_buff_address);
 +free_mem:
 +      kfree(ts_buff);
 +      return -ENOMEM;
 +}
 +
 +static struct hl_mmap_mem_buf_behavior hl_ts_behavior = {
 +      .topic = "TS",
 +      .mem_id = HL_MMAP_TYPE_TS_BUFF,
 +      .mmap = hl_ts_mmap,
 +      .alloc = hl_ts_alloc_buf,
 +      .release = ts_buff_release,
 +};
 +
 +/**
 + * allocate_timestamps_buffers() - allocate timestamps buffers
 + * @hpriv: pointer to the private data of the fd
 + * @args: ioctl input
 + * @handle: user timestamp buffer handle as an output
 + *
 + * This function allocates a timestamps buffer that will later be mapped to the
 + * user so the timestamps can be read.
 + * In addition, it allocates an extra buffer for registration management.
 + * Because registration must not fail on an out-of-memory condition, a pool of
 + * user interrupt nodes is prepared up front; during registration a node is
 + * taken from this pool instead of being allocated dynamically.
 + * It also adds a node to the mapping hash, which is used to map the user
 + * timestamps buffer to the internal kernel timestamps buffer.
 + */
 +static int allocate_timestamps_buffers(struct hl_fpriv *hpriv, struct hl_mem_in *args, u64 *handle)
 +{
 +      struct hl_mem_mgr *mmg = &hpriv->mem_mgr;
 +      struct hl_mmap_mem_buf *buf;
 +
 +      if (args->num_of_elements > TS_MAX_ELEMENTS_NUM) {
 +              dev_err(mmg->dev, "Num of elements exceeds Max allowed number (0x%x > 0x%x)\n",
 +                              args->num_of_elements, TS_MAX_ELEMENTS_NUM);
 +              return -EINVAL;
 +      }
 +
 +      buf = hl_mmap_mem_buf_alloc(mmg, &hl_ts_behavior, GFP_KERNEL, &args->num_of_elements);
 +      if (!buf)
 +              return -ENOMEM;
 +
 +      *handle = buf->handle;
 +
 +      return 0;
 +}
 +
 +int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
 +{
 +      enum hl_device_status status;
 +      union hl_mem_args *args = data;
 +      struct hl_device *hdev = hpriv->hdev;
 +      struct hl_ctx *ctx = hpriv->ctx;
 +      u64 block_handle, device_addr = 0;
 +      u32 handle = 0, block_size;
 +      int rc, dmabuf_fd = -EBADF;
 +
 +      if (!hl_device_operational(hdev, &status)) {
 +              dev_dbg_ratelimited(hdev->dev,
 +                      "Device is %s. Can't execute MEMORY IOCTL\n",
 +                      hdev->status[status]);
 +              return -EBUSY;
 +      }
 +
 +      if (!hdev->mmu_enable)
 +              return mem_ioctl_no_mmu(hpriv, args);
 +
 +      switch (args->in.op) {
 +      case HL_MEM_OP_ALLOC:
 +              if (args->in.alloc.mem_size == 0) {
 +                      dev_err(hdev->dev,
 +                              "alloc size must be larger than 0\n");
 +                      rc = -EINVAL;
 +                      goto out;
 +              }
 +
 +              /* If DRAM does not support virtual memory the driver won't
 +               * handle the allocation/freeing of that memory. However, for
 +               * system administration/monitoring purposes, the driver will
 +               * keep track of the amount of DRAM memory that is allocated
 +               * and freed by the user. Because this code totally relies on
 +               * the user's input, the driver can't ensure the validity
 +               * of this accounting.
 +               */
 +              if (!hdev->asic_prop.dram_supports_virtual_memory) {
 +                      atomic64_add(args->in.alloc.mem_size,
 +                                      &ctx->dram_phys_mem);
 +                      atomic64_add(args->in.alloc.mem_size,
 +                                      &hdev->dram_used_mem);
 +
 +                      dev_dbg(hdev->dev, "DRAM alloc is not supported\n");
 +                      rc = 0;
 +
 +                      memset(args, 0, sizeof(*args));
 +                      args->out.handle = 0;
 +                      goto out;
 +              }
 +
 +              rc = alloc_device_memory(ctx, &args->in, &handle);
 +
 +              memset(args, 0, sizeof(*args));
 +              args->out.handle = (__u64) handle;
 +              break;
 +
 +      case HL_MEM_OP_FREE:
 +              /* If DRAM does not support virtual memory the driver won't
 +               * handle the allocation/freeing of that memory. However, for
 +               * system administration/monitoring purposes, the driver will
 +               * keep track of the amount of DRAM memory that is allocated
 +               * and freed by the user. Because this code totally relies on
 +               * the user's input, the driver can't ensure the validity
 +               * of this accounting.
 +               */
 +              if (!hdev->asic_prop.dram_supports_virtual_memory) {
 +                      atomic64_sub(args->in.alloc.mem_size,
 +                                      &ctx->dram_phys_mem);
 +                      atomic64_sub(args->in.alloc.mem_size,
 +                                      &hdev->dram_used_mem);
 +
 +                      dev_dbg(hdev->dev, "DRAM alloc is not supported\n");
 +                      rc = 0;
 +
 +                      goto out;
 +              }
 +
 +              rc = free_device_memory(ctx, &args->in);
 +              break;
 +
 +      case HL_MEM_OP_MAP:
 +              rc = map_device_va(ctx, &args->in, &device_addr);
 +
 +              memset(args, 0, sizeof(*args));
 +              args->out.device_virt_addr = device_addr;
 +              break;
 +
 +      case HL_MEM_OP_UNMAP:
 +              rc = unmap_device_va(ctx, &args->in, false);
 +              break;
 +
 +      case HL_MEM_OP_MAP_BLOCK:
 +              rc = map_block(hdev, args->in.map_block.block_addr,
 +                              &block_handle, &block_size);
 +              args->out.block_handle = block_handle;
 +              args->out.block_size = block_size;
 +              break;
 +
 +      case HL_MEM_OP_EXPORT_DMABUF_FD:
 +              rc = export_dmabuf_from_addr(ctx,
 +                              args->in.export_dmabuf_fd.addr,
 +                              args->in.export_dmabuf_fd.mem_size,
 +                              args->in.export_dmabuf_fd.offset,
 +                              args->in.flags,
 +                              &dmabuf_fd);
 +              memset(args, 0, sizeof(*args));
 +              args->out.fd = dmabuf_fd;
 +              break;
 +
 +      case HL_MEM_OP_TS_ALLOC:
 +              rc = allocate_timestamps_buffers(hpriv, &args->in, &args->out.handle);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
 +              rc = -EINVAL;
 +              break;
 +      }
 +
 +out:
 +      return rc;
 +}
 +
 +static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 +                              u32 npages, u64 start, u32 offset,
 +                              struct hl_userptr *userptr)
 +{
 +      int rc;
 +
 +      if (!access_ok((void __user *) (uintptr_t) addr, size)) {
 +              dev_err(hdev->dev, "user pointer is invalid - 0x%llx\n", addr);
 +              return -EFAULT;
 +      }
 +
 +      userptr->pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
 +      if (!userptr->pages)
 +              return -ENOMEM;
 +
 +      rc = pin_user_pages_fast(start, npages, FOLL_WRITE | FOLL_LONGTERM,
 +                               userptr->pages);
 +
 +      if (rc != npages) {
 +              dev_err(hdev->dev,
 +                      "Failed (%d) to pin host memory with user ptr 0x%llx, size 0x%llx, npages %d\n",
 +                      rc, addr, size, npages);
 +              if (rc < 0)
 +                      goto destroy_pages;
 +              npages = rc;
 +              rc = -EFAULT;
 +              goto put_pages;
 +      }
 +      userptr->npages = npages;
 +
 +      rc = sg_alloc_table_from_pages(userptr->sgt,
 +                                     userptr->pages,
 +                                     npages, offset, size, GFP_KERNEL);
 +      if (rc < 0) {
 +              dev_err(hdev->dev, "failed to create SG table from pages\n");
 +              goto put_pages;
 +      }
 +
 +      return 0;
 +
 +put_pages:
 +      unpin_user_pages(userptr->pages, npages);
 +destroy_pages:
 +      kvfree(userptr->pages);
 +      return rc;
 +}
 +
 +/**
 + * hl_pin_host_memory() - pins a chunk of host memory.
 + * @hdev: pointer to the habanalabs device structure.
 + * @addr: the host virtual address of the memory area.
 + * @size: the size of the memory area.
 + * @userptr: pointer to hl_userptr structure.
 + *
 + * This function does the following:
 + * - Pins the physical pages.
 + * - Creates an SG list from those pages.
 + */
 +int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
 +                                      struct hl_userptr *userptr)
 +{
 +      u64 start, end;
 +      u32 npages, offset;
 +      int rc;
 +
 +      if (!size) {
 +              dev_err(hdev->dev, "size to pin is invalid - %llu\n", size);
 +              return -EINVAL;
 +      }
 +
 +      /*
 +       * If the combination of the address and size requested for this memory
 +       * region causes an integer overflow, return error.
 +       */
 +      if (((addr + size) < addr) ||
 +                      PAGE_ALIGN(addr + size) < (addr + size)) {
 +              dev_err(hdev->dev,
 +                      "user pointer 0x%llx + %llu causes integer overflow\n",
 +                      addr, size);
 +              return -EINVAL;
 +      }
 +
 +      userptr->pid = current->pid;
 +      userptr->sgt = kzalloc(sizeof(*userptr->sgt), GFP_KERNEL);
 +      if (!userptr->sgt)
 +              return -ENOMEM;
 +
 +      start = addr & PAGE_MASK;
 +      offset = addr & ~PAGE_MASK;
 +      end = PAGE_ALIGN(addr + size);
 +      npages = (end - start) >> PAGE_SHIFT;
 +
 +      userptr->size = size;
 +      userptr->addr = addr;
 +      userptr->dma_mapped = false;
 +      INIT_LIST_HEAD(&userptr->job_node);
 +
 +      rc = get_user_memory(hdev, addr, size, npages, start, offset,
 +                              userptr);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "failed to get user memory for address 0x%llx\n",
 +                      addr);
 +              goto free_sgt;
 +      }
 +
 +      hl_debugfs_add_userptr(hdev, userptr);
 +
 +      return 0;
 +
 +free_sgt:
 +      kfree(userptr->sgt);
 +      return rc;
 +}
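A worked example of the page-span computation in hl_pin_host_memory(), using illustrative values and assuming 4 KiB pages:

/* addr = 0x1234, size = 0x3000, PAGE_SIZE = 0x1000:
 *   start  = addr & PAGE_MASK            = 0x1000
 *   offset = addr & ~PAGE_MASK           = 0x234
 *   end    = PAGE_ALIGN(addr + size)     = PAGE_ALIGN(0x4234) = 0x5000
 *   npages = (end - start) >> PAGE_SHIFT = 4
 */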
 +
 +/*
 + * hl_unpin_host_memory - unpins a chunk of host memory.
 + * @hdev: pointer to the habanalabs device structure
 + * @userptr: pointer to hl_userptr structure
 + *
 + * This function does the following:
 + * - Unpins the physical pages related to the host memory
 + * - Frees the SG list
 + */
 +void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 +{
 +      hl_debugfs_remove_userptr(hdev, userptr);
 +
 +      if (userptr->dma_mapped)
 +              hdev->asic_funcs->hl_dma_unmap_sgtable(hdev, userptr->sgt, userptr->dir);
 +
 +      unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
 +      kvfree(userptr->pages);
 +
 +      list_del(&userptr->job_node);
 +
 +      sg_free_table(userptr->sgt);
 +      kfree(userptr->sgt);
 +}
 +
 +/**
 + * hl_userptr_delete_list() - clear userptr list.
 + * @hdev: pointer to the habanalabs device structure.
 + * @userptr_list: pointer to the list to clear.
 + *
 + * This function does the following:
 + * - Iterates over the list, unpins the host memory and frees each userptr
 + *   structure.
 + */
 +void hl_userptr_delete_list(struct hl_device *hdev,
 +                              struct list_head *userptr_list)
 +{
 +      struct hl_userptr *userptr, *tmp;
 +
 +      list_for_each_entry_safe(userptr, tmp, userptr_list, job_node) {
 +              hl_unpin_host_memory(hdev, userptr);
 +              kfree(userptr);
 +      }
 +
 +      INIT_LIST_HEAD(userptr_list);
 +}
 +
 +/**
 + * hl_userptr_is_pinned() - returns whether the given userptr is pinned.
 + * @hdev: pointer to the habanalabs device structure.
 + * @addr: user address to check.
 + * @size: user block size to check.
 + * @userptr_list: pointer to the list to search in.
 + * @userptr: pointer to userptr to check.
 + *
 + * This function does the following:
 + * - Iterates over the list and checks whether an entry with the given address
 + *   and size exists, i.e. the memory is pinned. If so, *userptr is set to the
 + *   matching entry and true is returned; otherwise false is returned.
 + */
 +bool hl_userptr_is_pinned(struct hl_device *hdev, u64 addr,
 +                              u32 size, struct list_head *userptr_list,
 +                              struct hl_userptr **userptr)
 +{
 +      list_for_each_entry((*userptr), userptr_list, job_node) {
 +              if ((addr == (*userptr)->addr) && (size == (*userptr)->size))
 +                      return true;
 +      }
 +
 +      return false;
 +}
 +
 +/**
 + * va_range_init() - initialize virtual addresses range.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_ranges: pointer to va_ranges array.
 + * @range_type: virtual address range type.
 + * @start: range start address, inclusive.
 + * @end: range end address, inclusive.
 + * @page_size: page size for this va_range.
 + *
 + * This function does the following:
 + * - Initializes the virtual addresses list of the given range with the given
 + *   addresses.
 + */
 +static int va_range_init(struct hl_device *hdev, struct hl_va_range **va_ranges,
 +                              enum hl_va_range_type range_type, u64 start,
 +                              u64 end, u32 page_size)
 +{
 +      struct hl_va_range *va_range = va_ranges[range_type];
 +      int rc;
 +
 +      INIT_LIST_HEAD(&va_range->list);
 +
 +      /*
 +       * PAGE_SIZE alignment
 +       * it is the caller's responsibility to align the addresses if the
 +       * page size is not a power of 2
 +       */
 +
 +      if (is_power_of_2(page_size)) {
 +              start = round_up(start, page_size);
 +
 +              /*
 +               * The end of the range is inclusive, hence we need to align it
 +               * to the end of the last full page in the range. For example if
 +               * end = 0x3ff5 with page size 0x1000, we need to align it to
 +               * 0x2fff. The remaining 0xff5 bytes do not form a full page.
 +               */
 +              end = round_down(end + 1, page_size) - 1;
 +      }
 +
 +      if (start >= end) {
 +              dev_err(hdev->dev, "too small vm range for va list\n");
 +              return -EFAULT;
 +      }
 +
 +      rc = add_va_block(hdev, va_range, start, end);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to init host va list\n");
 +              return rc;
 +      }
 +
 +      va_range->start_addr = start;
 +      va_range->end_addr = end;
 +      va_range->page_size = page_size;
 +
 +      return 0;
 +}
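A worked example of the alignment performed by va_range_init() on an inclusive end address, assuming an illustrative page_size of 0x1000:

/* start = 0x1800 -> round_up(0x1800, 0x1000)           = 0x2000
 * end   = 0x3ff5 -> round_down(0x3ff5 + 1, 0x1000) - 1 = 0x2fff
 * The usable range becomes [0x2000, 0x2fff], i.e. one full page; the
 * trailing partial page at 0x3000..0x3ff5 is dropped.
 */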
 +
 +/**
 + * va_range_fini() - clear a virtual addresses range.
 + * @hdev: pointer to the habanalabs device structure.
 + * @va_range: pointer to virtual addresses range.
 + *
 + * This function does the following:
 + * - Frees the virtual addresses block list and its lock.
 + */
 +static void va_range_fini(struct hl_device *hdev, struct hl_va_range *va_range)
 +{
 +      mutex_lock(&va_range->lock);
 +      clear_va_list_locked(hdev, &va_range->list);
 +      mutex_unlock(&va_range->lock);
 +
 +      mutex_destroy(&va_range->lock);
 +      kfree(va_range);
 +}
 +
 +/**
 + * vm_ctx_init_with_ranges() - initialize virtual memory for context.
 + * @ctx: pointer to the habanalabs context structure.
 + * @host_range_start: host virtual addresses range start.
 + * @host_range_end: host virtual addresses range end.
 + * @host_page_size: host page size.
 + * @host_huge_range_start: host virtual addresses range start for memory
 + *                         allocated with huge pages.
 + * @host_huge_range_end: host virtual addresses range end for memory allocated
 + *                        with huge pages.
 + * @host_huge_page_size: host huge page size.
 + * @dram_range_start: dram virtual addresses range start.
 + * @dram_range_end: dram virtual addresses range end.
 + * @dram_page_size: dram page size.
 + *
 + * This function initializes the following:
 + * - MMU for context.
 + * - Virtual address to area descriptor hashtable.
 + * - Virtual block list of available virtual memory.
 + */
 +static int vm_ctx_init_with_ranges(struct hl_ctx *ctx,
 +                                      u64 host_range_start,
 +                                      u64 host_range_end,
 +                                      u32 host_page_size,
 +                                      u64 host_huge_range_start,
 +                                      u64 host_huge_range_end,
 +                                      u32 host_huge_page_size,
 +                                      u64 dram_range_start,
 +                                      u64 dram_range_end,
 +                                      u32 dram_page_size)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      int i, rc;
 +
 +      for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX ; i++) {
 +              ctx->va_range[i] =
 +                      kzalloc(sizeof(struct hl_va_range), GFP_KERNEL);
 +              if (!ctx->va_range[i]) {
 +                      rc = -ENOMEM;
 +                      goto free_va_range;
 +              }
 +      }
 +
 +      rc = hl_mmu_ctx_init(ctx);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to init context %d\n", ctx->asid);
 +              goto free_va_range;
 +      }
 +
 +      mutex_init(&ctx->mem_hash_lock);
 +      hash_init(ctx->mem_hash);
 +
 +      mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +
 +      rc = va_range_init(hdev, ctx->va_range, HL_VA_RANGE_TYPE_HOST,
 +                      host_range_start, host_range_end, host_page_size);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to init host vm range\n");
 +              goto mmu_ctx_fini;
 +      }
 +
 +      if (hdev->pmmu_huge_range) {
 +              mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +
 +              rc = va_range_init(hdev,
 +                      ctx->va_range, HL_VA_RANGE_TYPE_HOST_HUGE,
 +                      host_huge_range_start, host_huge_range_end,
 +                      host_huge_page_size);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "failed to init host huge vm range\n");
 +                      goto clear_host_va_range;
 +              }
 +      } else {
 +              kfree(ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]);
 +              ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE] =
 +                              ctx->va_range[HL_VA_RANGE_TYPE_HOST];
 +      }
 +
 +      mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_DRAM]->lock);
 +
 +      rc = va_range_init(hdev, ctx->va_range, HL_VA_RANGE_TYPE_DRAM,
 +                      dram_range_start, dram_range_end, dram_page_size);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to init dram vm range\n");
 +              goto clear_host_huge_va_range;
 +      }
 +
 +      hl_debugfs_add_ctx_mem_hash(hdev, ctx);
 +
 +      return 0;
 +
 +clear_host_huge_va_range:
 +      mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_DRAM]->lock);
 +
 +      if (hdev->pmmu_huge_range) {
 +              mutex_lock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +              clear_va_list_locked(hdev,
 +                      &ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->list);
 +              mutex_unlock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +      }
 +clear_host_va_range:
 +      if (hdev->pmmu_huge_range)
 +              mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
 +      mutex_lock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +      clear_va_list_locked(hdev, &ctx->va_range[HL_VA_RANGE_TYPE_HOST]->list);
 +      mutex_unlock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +mmu_ctx_fini:
 +      mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
 +      mutex_destroy(&ctx->mem_hash_lock);
 +      hl_mmu_ctx_fini(ctx);
 +free_va_range:
 +      for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX ; i++)
 +              kfree(ctx->va_range[i]);
 +
 +      return rc;
 +}
 +
 +int hl_vm_ctx_init(struct hl_ctx *ctx)
 +{
 +      struct asic_fixed_properties *prop = &ctx->hdev->asic_prop;
 +      u64 host_range_start, host_range_end, host_huge_range_start,
 +              host_huge_range_end, dram_range_start, dram_range_end;
 +      u32 host_page_size, host_huge_page_size, dram_page_size;
 +
 +      atomic64_set(&ctx->dram_phys_mem, 0);
 +
 +      /*
 +       * - If MMU is enabled, init the ranges as usual.
 +       * - If MMU is disabled, in case of host mapping, the returned address
 +       *   is the given one.
 +       *   In case of DRAM mapping, the returned address is the physical
 +       *   address of the memory related to the given handle.
 +       */
 +      if (!ctx->hdev->mmu_enable)
 +              return 0;
 +
 +      dram_range_start = prop->dmmu.start_addr;
 +      dram_range_end = prop->dmmu.end_addr - 1;
 +      dram_page_size = prop->dram_page_size ?
 +                              prop->dram_page_size : prop->dmmu.page_size;
 +      host_range_start = prop->pmmu.start_addr;
 +      host_range_end = prop->pmmu.end_addr - 1;
 +      host_page_size = prop->pmmu.page_size;
 +      host_huge_range_start = prop->pmmu_huge.start_addr;
 +      host_huge_range_end = prop->pmmu_huge.end_addr - 1;
 +      host_huge_page_size = prop->pmmu_huge.page_size;
 +
 +      return vm_ctx_init_with_ranges(ctx, host_range_start, host_range_end,
 +                      host_page_size, host_huge_range_start,
 +                      host_huge_range_end, host_huge_page_size,
 +                      dram_range_start, dram_range_end, dram_page_size);
 +}
 +
 +/**
 + * hl_vm_ctx_fini() - virtual memory teardown of context.
 + * @ctx: pointer to the habanalabs context structure.
 + *
 + * This function performs teardown of the following:
 + * - Virtual block list of available virtual memory.
 + * - Virtual address to area descriptor hashtable.
 + * - MMU for context.
 + *
 + * In addition this function does the following:
 + * - Unmaps the existing hashtable nodes if the hashtable is not empty. The
 + *   hashtable should be empty as no valid mappings should exist at this
 + *   point.
 + * - Frees any existing physical page list from the idr which relates to the
 + *   current context asid.
 + * - This function checks the virtual block list for correctness. At this point
 + *   the list should contain one element which describes the whole virtual
 + *   memory range of the context. Otherwise, a warning is printed.
 + */
 +void hl_vm_ctx_fini(struct hl_ctx *ctx)
 +{
 +      struct hl_vm_phys_pg_pack *phys_pg_list, *tmp_phys_node;
 +      struct hl_device *hdev = ctx->hdev;
 +      struct hl_vm_hash_node *hnode;
 +      struct hl_vm *vm = &hdev->vm;
 +      struct hlist_node *tmp_node;
 +      struct list_head free_list;
 +      struct hl_mem_in args;
 +      int i;
 +
 +      if (!hdev->mmu_enable)
 +              return;
 +
 +      hl_debugfs_remove_ctx_mem_hash(hdev, ctx);
 +
 +      /*
 +       * If a hard reset is pending, something clearly went wrong already, so
 +       * there is no point in printing another side-effect error.
 +       */
 +      if (!hdev->reset_info.hard_reset_pending && !hash_empty(ctx->mem_hash))
 +              dev_dbg(hdev->dev,
 +                      "user released device without removing its memory mappings\n");
 +
 +      hash_for_each_safe(ctx->mem_hash, i, tmp_node, hnode, node) {
 +              dev_dbg(hdev->dev,
 +                      "hl_mem_hash_node of vaddr 0x%llx of asid %d is still alive\n",
 +                      hnode->vaddr, ctx->asid);
 +              args.unmap.device_virt_addr = hnode->vaddr;
 +              unmap_device_va(ctx, &args, true);
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +
 +      /* invalidate the cache once after the unmapping loop */
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_PHYS_PACK);
 +
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      INIT_LIST_HEAD(&free_list);
 +
 +      spin_lock(&vm->idr_lock);
 +      idr_for_each_entry(&vm->phys_pg_pack_handles, phys_pg_list, i)
 +              if (phys_pg_list->asid == ctx->asid) {
 +                      dev_dbg(hdev->dev,
 +                              "page list 0x%px of asid %d is still alive\n",
 +                              phys_pg_list, ctx->asid);
 +
 +                      atomic64_sub(phys_pg_list->total_size, &hdev->dram_used_mem);
 +                      idr_remove(&vm->phys_pg_pack_handles, i);
 +                      list_add(&phys_pg_list->node, &free_list);
 +              }
 +      spin_unlock(&vm->idr_lock);
 +
 +      list_for_each_entry_safe(phys_pg_list, tmp_phys_node, &free_list, node)
 +              free_phys_pg_pack(hdev, phys_pg_list);
 +
 +      va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_DRAM]);
 +      va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_HOST]);
 +
 +      if (hdev->pmmu_huge_range)
 +              va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]);
 +
 +      mutex_destroy(&ctx->mem_hash_lock);
 +      hl_mmu_ctx_fini(ctx);
 +
 +      /* In this case we need to clear the global accounting of DRAM usage
 +       * because the user notifies us about allocations. If the user is gone,
 +       * all of the DRAM is available again.
 +       */
 +      if (ctx->asid != HL_KERNEL_ASID_ID &&
 +                      !hdev->asic_prop.dram_supports_virtual_memory)
 +              atomic64_set(&hdev->dram_used_mem, 0);
 +}
 +
 +/**
 + * hl_vm_init() - initialize virtual memory module.
 + * @hdev: pointer to the habanalabs device structure.
 + *
 + * This function initializes the following:
 + * - MMU module.
 + * - DRAM physical pages pool of 2MB.
 + * - Idr for device memory allocation handles.
 + */
 +int hl_vm_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hl_vm *vm = &hdev->vm;
 +      int rc;
 +
 +      if (is_power_of_2(prop->dram_page_size))
 +              vm->dram_pg_pool =
 +                      gen_pool_create(__ffs(prop->dram_page_size), -1);
 +      else
 +              vm->dram_pg_pool =
 +                      gen_pool_create(__ffs(DRAM_POOL_PAGE_SIZE), -1);
 +
 +      if (!vm->dram_pg_pool) {
 +              dev_err(hdev->dev, "Failed to create dram page pool\n");
 +              return -ENOMEM;
 +      }
 +
 +      kref_init(&vm->dram_pg_pool_refcount);
 +
 +      rc = gen_pool_add(vm->dram_pg_pool, prop->dram_user_base_address,
 +                      prop->dram_end_address - prop->dram_user_base_address,
 +                      -1);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to dram page pool %d\n", rc);
 +              goto pool_add_err;
 +      }
 +
 +      spin_lock_init(&vm->idr_lock);
 +      idr_init(&vm->phys_pg_pack_handles);
 +
 +      atomic64_set(&hdev->dram_used_mem, 0);
 +
 +      vm->init_done = true;
 +
 +      return 0;
 +
 +pool_add_err:
 +      gen_pool_destroy(vm->dram_pg_pool);
 +
 +      return rc;
 +}
 +
 +/**
 + * hl_vm_fini() - virtual memory module teardown.
 + * @hdev: pointer to the habanalabs device structure.
 + *
 + * This function performs teardown of the following:
 + * - Idr for device memory allocation handles.
 + * - DRAM physical pages pool of 2MB.
 + * - MMU module.
 + */
 +void hl_vm_fini(struct hl_device *hdev)
 +{
 +      struct hl_vm *vm = &hdev->vm;
 +
 +      if (!vm->init_done)
 +              return;
 +
 +      /*
 +       * At this point all the contexts should be freed and hence no DRAM
 +       * memory should be in use, so the DRAM pool can be freed here.
 +       */
 +      if (kref_put(&vm->dram_pg_pool_refcount, dram_pg_pool_do_release) != 1)
 +              dev_warn(hdev->dev, "dram_pg_pool was not destroyed on %s\n",
 +                              __func__);
 +
 +      vm->init_done = false;
 +}
 +
 +/**
 + * hl_hw_block_mem_init() - HW block memory initialization.
 + * @ctx: pointer to the habanalabs context structure.
 + *
 + * This function initializes the HW block virtual mapped addresses list and
 + * its lock.
 + */
 +void hl_hw_block_mem_init(struct hl_ctx *ctx)
 +{
 +      mutex_init(&ctx->hw_block_list_lock);
 +      INIT_LIST_HEAD(&ctx->hw_block_mem_list);
 +}
 +
 +/**
 + * hl_hw_block_mem_fini() - HW block memory teardown.
 + * @ctx: pointer to the habanalabs context structure.
 + *
 + * This function clears the HW block virtual mapped addresses list and destroys
 + * its lock.
 + */
 +void hl_hw_block_mem_fini(struct hl_ctx *ctx)
 +{
 +      struct hl_vm_hw_block_list_node *lnode, *tmp;
 +
 +      if (!list_empty(&ctx->hw_block_mem_list))
 +              dev_crit(ctx->hdev->dev, "HW block mem list isn't empty\n");
 +
 +      list_for_each_entry_safe(lnode, tmp, &ctx->hw_block_mem_list, node) {
 +              list_del(&lnode->node);
 +              kfree(lnode);
 +      }
 +
 +      mutex_destroy(&ctx->hw_block_list_lock);
 +}
index 71debe862c865fd1e4440036db32a63933ba8583,0000000000000000000000000000000000000000..bb858b94e1e81348853601019a907a268a4883fa
mode 100644,000000..100644
--- /dev/null
@@@ -1,9282 -1,0 +1,9282 @@@
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2016-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include "gaudiP.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +#include "../include/hw_ip/mmu/mmu_v1_1.h"
 +#include "../include/gaudi/gaudi_masks.h"
 +#include "../include/gaudi/gaudi_fw_if.h"
 +#include "../include/gaudi/gaudi_reg_map.h"
 +#include "../include/gaudi/gaudi_async_ids_map_extended.h"
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/firmware.h>
 +#include <linux/hwmon.h>
 +#include <linux/iommu.h>
 +#include <linux/seq_file.h>
 +
 +/*
 + * Gaudi security scheme:
 + *
 + * 1. Host is protected by:
 + *        - Range registers
 + *        - MMU
 + *
 + * 2. DDR is protected by:
 + *        - Range registers (protect the first 512MB)
 + *
 + * 3. Configuration is protected by:
 + *        - Range registers
 + *        - Protection bits
 + *
 + * MMU is always enabled.
 + *
 + * QMAN DMA channels 0,1 (PCI DMA):
 + *     - DMA is not secured.
 + *     - PQ and CQ are secured.
 + *     - CP is secured: the driver needs to parse the CB, but WREG must be
 + *                      allowed because of TDMA (tensor DMA). Hence, WREG is
 + *                      never secured.
 + *
 + * When the driver needs to use DMA it will check that Gaudi is idle, set DMA
 + * channel 0 to be secured, execute the DMA and change it back to not secured.
 + * Currently, the driver doesn't use the DMA while there are compute jobs
 + * running.
 + *
 + * The current use cases for the driver to use the DMA are:
 + *     - Clear SRAM on context switch (happens on context switch when device is
 + *       idle)
 + *     - MMU page tables area clear (happens on init)
 + *
 + * QMAN DMA 2-7, TPC, MME, NIC:
 + * PQ is secured and is located on the Host (HBM CON TPC3 bug)
 + * CQ, CP and the engine are not secured
 + *
 + */
 +
 +#define GAUDI_BOOT_FIT_FILE   "habanalabs/gaudi/gaudi-boot-fit.itb"
 +#define GAUDI_LINUX_FW_FILE   "habanalabs/gaudi/gaudi-fit.itb"
 +#define GAUDI_TPC_FW_FILE     "habanalabs/gaudi/gaudi_tpc.bin"
 +
 +#define GAUDI_DMA_POOL_BLK_SIZE               0x100 /* 256 bytes */
 +
 +#define GAUDI_RESET_TIMEOUT_MSEC      2000            /* 2000ms */
 +#define GAUDI_RESET_WAIT_MSEC         1               /* 1ms */
 +#define GAUDI_CPU_RESET_WAIT_MSEC     200             /* 200ms */
 +#define GAUDI_TEST_QUEUE_WAIT_USEC    100000          /* 100ms */
 +
 +#define GAUDI_PLDM_RESET_WAIT_MSEC    1000            /* 1s */
 +#define GAUDI_PLDM_HRESET_TIMEOUT_MSEC        20000           /* 20s */
 +#define GAUDI_PLDM_TEST_QUEUE_WAIT_USEC       1000000         /* 1s */
 +#define GAUDI_PLDM_MMU_TIMEOUT_USEC   (MMU_CONFIG_TIMEOUT_USEC * 100)
 +#define GAUDI_PLDM_QMAN0_TIMEOUT_USEC (HL_DEVICE_TIMEOUT_USEC * 30)
 +#define GAUDI_PLDM_TPC_KERNEL_WAIT_USEC       (HL_DEVICE_TIMEOUT_USEC * 30)
 +#define GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC       4000000         /* 4s */
 +#define GAUDI_MSG_TO_CPU_TIMEOUT_USEC 4000000         /* 4s */
 +#define GAUDI_WAIT_FOR_BL_TIMEOUT_USEC        15000000        /* 15s */
 +
 +#define GAUDI_QMAN0_FENCE_VAL         0x72E91AB9
 +
 +#define GAUDI_MAX_STRING_LEN          20
 +
 +#define GAUDI_CB_POOL_CB_CNT          512
 +#define GAUDI_CB_POOL_CB_SIZE         0x20000 /* 128KB */
 +
 +#define GAUDI_ALLOC_CPU_MEM_RETRY_CNT 3
 +
 +#define GAUDI_NUM_OF_TPC_INTR_CAUSE   20
 +
 +#define GAUDI_NUM_OF_QM_ERR_CAUSE     16
 +
 +#define GAUDI_NUM_OF_QM_ARB_ERR_CAUSE 3
 +
 +#define GAUDI_ARB_WDT_TIMEOUT         0xEE6b27FF /* 8 seconds */
 +
 +#define HBM_SCRUBBING_TIMEOUT_US      1000000 /* 1s */
 +
 +#define BIN_REG_STRING_SIZE   sizeof("0b10101010101010101010101010101010")
 +
 +#define MONITOR_SOB_STRING_SIZE               256
 +
 +static u32 gaudi_stream_master[GAUDI_STREAM_MASTER_ARR_SIZE] = {
 +      GAUDI_QUEUE_ID_DMA_0_0,
 +      GAUDI_QUEUE_ID_DMA_0_1,
 +      GAUDI_QUEUE_ID_DMA_0_2,
 +      GAUDI_QUEUE_ID_DMA_0_3,
 +      GAUDI_QUEUE_ID_DMA_1_0,
 +      GAUDI_QUEUE_ID_DMA_1_1,
 +      GAUDI_QUEUE_ID_DMA_1_2,
 +      GAUDI_QUEUE_ID_DMA_1_3
 +};
 +
 +static const char gaudi_irq_name[GAUDI_MSI_ENTRIES][GAUDI_MAX_STRING_LEN] = {
 +              "gaudi cq 0_0", "gaudi cq 0_1", "gaudi cq 0_2", "gaudi cq 0_3",
 +              "gaudi cq 1_0", "gaudi cq 1_1", "gaudi cq 1_2", "gaudi cq 1_3",
 +              "gaudi cq 5_0", "gaudi cq 5_1", "gaudi cq 5_2", "gaudi cq 5_3",
 +              "gaudi cpu eq"
 +};
 +
 +static const u8 gaudi_dma_assignment[GAUDI_DMA_MAX] = {
 +      [GAUDI_PCI_DMA_1] = GAUDI_ENGINE_ID_DMA_0,
 +      [GAUDI_PCI_DMA_2] = GAUDI_ENGINE_ID_DMA_1,
 +      [GAUDI_HBM_DMA_1] = GAUDI_ENGINE_ID_DMA_2,
 +      [GAUDI_HBM_DMA_2] = GAUDI_ENGINE_ID_DMA_3,
 +      [GAUDI_HBM_DMA_3] = GAUDI_ENGINE_ID_DMA_4,
 +      [GAUDI_HBM_DMA_4] = GAUDI_ENGINE_ID_DMA_5,
 +      [GAUDI_HBM_DMA_5] = GAUDI_ENGINE_ID_DMA_6,
 +      [GAUDI_HBM_DMA_6] = GAUDI_ENGINE_ID_DMA_7
 +};
 +
 +static const u8 gaudi_cq_assignment[NUMBER_OF_CMPLT_QUEUES] = {
 +      [0] = GAUDI_QUEUE_ID_DMA_0_0,
 +      [1] = GAUDI_QUEUE_ID_DMA_0_1,
 +      [2] = GAUDI_QUEUE_ID_DMA_0_2,
 +      [3] = GAUDI_QUEUE_ID_DMA_0_3,
 +      [4] = GAUDI_QUEUE_ID_DMA_1_0,
 +      [5] = GAUDI_QUEUE_ID_DMA_1_1,
 +      [6] = GAUDI_QUEUE_ID_DMA_1_2,
 +      [7] = GAUDI_QUEUE_ID_DMA_1_3,
 +};
 +
 +static const u16 gaudi_packet_sizes[MAX_PACKET_ID] = {
 +      [PACKET_WREG_32]        = sizeof(struct packet_wreg32),
 +      [PACKET_WREG_BULK]      = sizeof(struct packet_wreg_bulk),
 +      [PACKET_MSG_LONG]       = sizeof(struct packet_msg_long),
 +      [PACKET_MSG_SHORT]      = sizeof(struct packet_msg_short),
 +      [PACKET_CP_DMA]         = sizeof(struct packet_cp_dma),
 +      [PACKET_REPEAT]         = sizeof(struct packet_repeat),
 +      [PACKET_MSG_PROT]       = sizeof(struct packet_msg_prot),
 +      [PACKET_FENCE]          = sizeof(struct packet_fence),
 +      [PACKET_LIN_DMA]        = sizeof(struct packet_lin_dma),
 +      [PACKET_NOP]            = sizeof(struct packet_nop),
 +      [PACKET_STOP]           = sizeof(struct packet_stop),
 +      [PACKET_ARB_POINT]      = sizeof(struct packet_arb_point),
 +      [PACKET_WAIT]           = sizeof(struct packet_wait),
 +      [PACKET_LOAD_AND_EXE]   = sizeof(struct packet_load_and_exe)
 +};
 +
 +static inline bool validate_packet_id(enum packet_id id)
 +{
 +      switch (id) {
 +      case PACKET_WREG_32:
 +      case PACKET_WREG_BULK:
 +      case PACKET_MSG_LONG:
 +      case PACKET_MSG_SHORT:
 +      case PACKET_CP_DMA:
 +      case PACKET_REPEAT:
 +      case PACKET_MSG_PROT:
 +      case PACKET_FENCE:
 +      case PACKET_LIN_DMA:
 +      case PACKET_NOP:
 +      case PACKET_STOP:
 +      case PACKET_ARB_POINT:
 +      case PACKET_WAIT:
 +      case PACKET_LOAD_AND_EXE:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static const char * const
 +gaudi_tpc_interrupts_cause[GAUDI_NUM_OF_TPC_INTR_CAUSE] = {
 +      "tpc_address_exceed_slm",
 +      "tpc_div_by_0",
 +      "tpc_spu_mac_overflow",
 +      "tpc_spu_addsub_overflow",
 +      "tpc_spu_abs_overflow",
 +      "tpc_spu_fp_dst_nan_inf",
 +      "tpc_spu_fp_dst_denorm",
 +      "tpc_vpu_mac_overflow",
 +      "tpc_vpu_addsub_overflow",
 +      "tpc_vpu_abs_overflow",
 +      "tpc_vpu_fp_dst_nan_inf",
 +      "tpc_vpu_fp_dst_denorm",
 +      "tpc_assertions",
 +      "tpc_illegal_instruction",
 +      "tpc_pc_wrap_around",
 +      "tpc_qm_sw_err",
 +      "tpc_hbw_rresp_err",
 +      "tpc_hbw_bresp_err",
 +      "tpc_lbw_rresp_err",
 +      "tpc_lbw_bresp_err"
 +};
 +
 +static const char * const
 +gaudi_qman_error_cause[GAUDI_NUM_OF_QM_ERR_CAUSE] = {
 +      "PQ AXI HBW error",
 +      "CQ AXI HBW error",
 +      "CP AXI HBW error",
 +      "CP error due to undefined OPCODE",
 +      "CP encountered STOP OPCODE",
 +      "CP AXI LBW error",
 +      "CP WRREG32 or WRBULK returned error",
 +      "N/A",
 +      "FENCE 0 inc over max value and clipped",
 +      "FENCE 1 inc over max value and clipped",
 +      "FENCE 2 inc over max value and clipped",
 +      "FENCE 3 inc over max value and clipped",
 +      "FENCE 0 dec under min value and clipped",
 +      "FENCE 1 dec under min value and clipped",
 +      "FENCE 2 dec under min value and clipped",
 +      "FENCE 3 dec under min value and clipped"
 +};
 +
 +static const char * const
 +gaudi_qman_arb_error_cause[GAUDI_NUM_OF_QM_ARB_ERR_CAUSE] = {
 +      "Choice push while full error",
 +      "Choice Q watchdog error",
 +      "MSG AXI LBW returned with error"
 +};
 +
 +static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_0 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_1 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_2 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_3 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_0 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_1 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_2 */
 +      QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_3 */
 +      QUEUE_TYPE_CPU, /* GAUDI_QUEUE_ID_CPU_PQ */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_3 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_0 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_1 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_2 */
 +      QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_3 */
 +};
 +
 +static struct hl_hw_obj_name_entry gaudi_so_id_to_str[] = {
 +      { .id = 0,  .name = "SYNC_OBJ_DMA_DOWN_FEEDBACK" },
 +      { .id = 1,  .name = "SYNC_OBJ_DMA_UP_FEEDBACK" },
 +      { .id = 2,  .name = "SYNC_OBJ_DMA_STATIC_DRAM_SRAM_FEEDBACK" },
 +      { .id = 3,  .name = "SYNC_OBJ_DMA_SRAM_DRAM_FEEDBACK" },
 +      { .id = 4,  .name = "SYNC_OBJ_FIRST_COMPUTE_FINISH" },
 +      { .id = 5,  .name = "SYNC_OBJ_HOST_DRAM_DONE" },
 +      { .id = 6,  .name = "SYNC_OBJ_DBG_CTR_DEPRECATED" },
 +      { .id = 7,  .name = "SYNC_OBJ_DMA_ACTIVATIONS_DRAM_SRAM_FEEDBACK" },
 +      { .id = 8,  .name = "SYNC_OBJ_ENGINE_SEM_MME_0" },
 +      { .id = 9,  .name = "SYNC_OBJ_ENGINE_SEM_MME_1" },
 +      { .id = 10, .name = "SYNC_OBJ_ENGINE_SEM_TPC_0" },
 +      { .id = 11, .name = "SYNC_OBJ_ENGINE_SEM_TPC_1" },
 +      { .id = 12, .name = "SYNC_OBJ_ENGINE_SEM_TPC_2" },
 +      { .id = 13, .name = "SYNC_OBJ_ENGINE_SEM_TPC_3" },
 +      { .id = 14, .name = "SYNC_OBJ_ENGINE_SEM_TPC_4" },
 +      { .id = 15, .name = "SYNC_OBJ_ENGINE_SEM_TPC_5" },
 +      { .id = 16, .name = "SYNC_OBJ_ENGINE_SEM_TPC_6" },
 +      { .id = 17, .name = "SYNC_OBJ_ENGINE_SEM_TPC_7" },
 +      { .id = 18, .name = "SYNC_OBJ_ENGINE_SEM_DMA_1" },
 +      { .id = 19, .name = "SYNC_OBJ_ENGINE_SEM_DMA_2" },
 +      { .id = 20, .name = "SYNC_OBJ_ENGINE_SEM_DMA_3" },
 +      { .id = 21, .name = "SYNC_OBJ_ENGINE_SEM_DMA_4" },
 +      { .id = 22, .name = "SYNC_OBJ_ENGINE_SEM_DMA_5" },
 +      { .id = 23, .name = "SYNC_OBJ_ENGINE_SEM_DMA_6" },
 +      { .id = 24, .name = "SYNC_OBJ_ENGINE_SEM_DMA_7" },
 +      { .id = 25, .name = "SYNC_OBJ_DBG_CTR_0" },
 +      { .id = 26, .name = "SYNC_OBJ_DBG_CTR_1" },
 +};
 +
 +static struct hl_hw_obj_name_entry gaudi_monitor_id_to_str[] = {
 +      { .id = 200, .name = "MON_OBJ_DMA_DOWN_FEEDBACK_RESET" },
 +      { .id = 201, .name = "MON_OBJ_DMA_UP_FEEDBACK_RESET" },
 +      { .id = 203, .name = "MON_OBJ_DRAM_TO_SRAM_QUEUE_FENCE" },
 +      { .id = 204, .name = "MON_OBJ_TPC_0_CLK_GATE" },
 +      { .id = 205, .name = "MON_OBJ_TPC_1_CLK_GATE" },
 +      { .id = 206, .name = "MON_OBJ_TPC_2_CLK_GATE" },
 +      { .id = 207, .name = "MON_OBJ_TPC_3_CLK_GATE" },
 +      { .id = 208, .name = "MON_OBJ_TPC_4_CLK_GATE" },
 +      { .id = 209, .name = "MON_OBJ_TPC_5_CLK_GATE" },
 +      { .id = 210, .name = "MON_OBJ_TPC_6_CLK_GATE" },
 +      { .id = 211, .name = "MON_OBJ_TPC_7_CLK_GATE" },
 +};
 +
 +static s64 gaudi_state_dump_specs_props[] = {
 +      [SP_SYNC_OBJ_BASE_ADDR] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0,
 +      [SP_NEXT_SYNC_OBJ_ADDR] = NEXT_SYNC_OBJ_ADDR_INTERVAL,
 +      [SP_SYNC_OBJ_AMOUNT] = NUM_OF_SOB_IN_BLOCK,
 +      [SP_MON_OBJ_WR_ADDR_LOW] =
 +              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0,
 +      [SP_MON_OBJ_WR_ADDR_HIGH] =
 +              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0,
 +      [SP_MON_OBJ_WR_DATA] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_DATA_0,
 +      [SP_MON_OBJ_ARM_DATA] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_ARM_0,
 +      [SP_MON_OBJ_STATUS] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0,
 +      [SP_MONITORS_AMOUNT] = NUM_OF_MONITORS_IN_BLOCK,
 +      [SP_TPC0_CMDQ] = mmTPC0_QM_GLBL_CFG0,
 +      [SP_TPC0_CFG_SO] = mmTPC0_CFG_QM_SYNC_OBJECT_ADDR,
 +      [SP_NEXT_TPC] = mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0,
 +      [SP_MME_CMDQ] = mmMME0_QM_GLBL_CFG0,
 +      [SP_MME_CFG_SO] = mmMME0_CTRL_ARCH_DESC_SYNC_OBJECT_ADDR_LOW_LOCAL,
 +      [SP_NEXT_MME] = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0,
 +      [SP_DMA_CMDQ] = mmDMA0_QM_GLBL_CFG0,
 +      [SP_DMA_CFG_SO] = mmDMA0_CORE_WR_COMP_ADDR_LO,
 +      [SP_DMA_QUEUES_OFFSET] = mmDMA1_QM_GLBL_CFG0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_NUM_OF_MME_ENGINES] = NUM_OF_MME_ENGINES,
 +      [SP_SUB_MME_ENG_NUM] = NUM_OF_MME_SUB_ENGINES,
 +      [SP_NUM_OF_DMA_ENGINES] = NUM_OF_DMA_ENGINES,
 +      [SP_NUM_OF_TPC_ENGINES] = NUM_OF_TPC_ENGINES,
 +      [SP_ENGINE_NUM_OF_QUEUES] = NUM_OF_QUEUES,
 +      [SP_ENGINE_NUM_OF_STREAMS] = NUM_OF_STREAMS,
 +      [SP_ENGINE_NUM_OF_FENCES] = NUM_OF_FENCES,
 +      [SP_FENCE0_CNT_OFFSET] =
 +              mmDMA0_QM_CP_FENCE0_CNT_0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_FENCE0_RDATA_OFFSET] =
 +              mmDMA0_QM_CP_FENCE0_RDATA_0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_CP_STS_OFFSET] = mmDMA0_QM_CP_STS_0 - mmDMA0_QM_GLBL_CFG0,
 +      [SP_NUM_CORES] = 1,
 +};
 +
 +static const int gaudi_queue_id_to_engine_id[] = {
 +      [GAUDI_QUEUE_ID_DMA_0_0...GAUDI_QUEUE_ID_DMA_0_3] = GAUDI_ENGINE_ID_DMA_0,
 +      [GAUDI_QUEUE_ID_DMA_1_0...GAUDI_QUEUE_ID_DMA_1_3] = GAUDI_ENGINE_ID_DMA_1,
 +      [GAUDI_QUEUE_ID_CPU_PQ] = GAUDI_ENGINE_ID_SIZE,
 +      [GAUDI_QUEUE_ID_DMA_2_0...GAUDI_QUEUE_ID_DMA_2_3] = GAUDI_ENGINE_ID_DMA_2,
 +      [GAUDI_QUEUE_ID_DMA_3_0...GAUDI_QUEUE_ID_DMA_3_3] = GAUDI_ENGINE_ID_DMA_3,
 +      [GAUDI_QUEUE_ID_DMA_4_0...GAUDI_QUEUE_ID_DMA_4_3] = GAUDI_ENGINE_ID_DMA_4,
 +      [GAUDI_QUEUE_ID_DMA_5_0...GAUDI_QUEUE_ID_DMA_5_3] = GAUDI_ENGINE_ID_DMA_5,
 +      [GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3] = GAUDI_ENGINE_ID_DMA_6,
 +      [GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3] = GAUDI_ENGINE_ID_DMA_7,
 +      [GAUDI_QUEUE_ID_MME_0_0...GAUDI_QUEUE_ID_MME_0_3] = GAUDI_ENGINE_ID_MME_0,
 +      [GAUDI_QUEUE_ID_MME_1_0...GAUDI_QUEUE_ID_MME_1_3] = GAUDI_ENGINE_ID_MME_2,
 +      [GAUDI_QUEUE_ID_TPC_0_0...GAUDI_QUEUE_ID_TPC_0_3] = GAUDI_ENGINE_ID_TPC_0,
 +      [GAUDI_QUEUE_ID_TPC_1_0...GAUDI_QUEUE_ID_TPC_1_3] = GAUDI_ENGINE_ID_TPC_1,
 +      [GAUDI_QUEUE_ID_TPC_2_0...GAUDI_QUEUE_ID_TPC_2_3] = GAUDI_ENGINE_ID_TPC_2,
 +      [GAUDI_QUEUE_ID_TPC_3_0...GAUDI_QUEUE_ID_TPC_3_3] = GAUDI_ENGINE_ID_TPC_3,
 +      [GAUDI_QUEUE_ID_TPC_4_0...GAUDI_QUEUE_ID_TPC_4_3] = GAUDI_ENGINE_ID_TPC_4,
 +      [GAUDI_QUEUE_ID_TPC_5_0...GAUDI_QUEUE_ID_TPC_5_3] = GAUDI_ENGINE_ID_TPC_5,
 +      [GAUDI_QUEUE_ID_TPC_6_0...GAUDI_QUEUE_ID_TPC_6_3] = GAUDI_ENGINE_ID_TPC_6,
 +      [GAUDI_QUEUE_ID_TPC_7_0...GAUDI_QUEUE_ID_TPC_7_3] = GAUDI_ENGINE_ID_TPC_7,
 +      [GAUDI_QUEUE_ID_NIC_0_0...GAUDI_QUEUE_ID_NIC_0_3] = GAUDI_ENGINE_ID_NIC_0,
 +      [GAUDI_QUEUE_ID_NIC_1_0...GAUDI_QUEUE_ID_NIC_1_3] = GAUDI_ENGINE_ID_NIC_1,
 +      [GAUDI_QUEUE_ID_NIC_2_0...GAUDI_QUEUE_ID_NIC_2_3] = GAUDI_ENGINE_ID_NIC_2,
 +      [GAUDI_QUEUE_ID_NIC_3_0...GAUDI_QUEUE_ID_NIC_3_3] = GAUDI_ENGINE_ID_NIC_3,
 +      [GAUDI_QUEUE_ID_NIC_4_0...GAUDI_QUEUE_ID_NIC_4_3] = GAUDI_ENGINE_ID_NIC_4,
 +      [GAUDI_QUEUE_ID_NIC_5_0...GAUDI_QUEUE_ID_NIC_5_3] = GAUDI_ENGINE_ID_NIC_5,
 +      [GAUDI_QUEUE_ID_NIC_6_0...GAUDI_QUEUE_ID_NIC_6_3] = GAUDI_ENGINE_ID_NIC_6,
 +      [GAUDI_QUEUE_ID_NIC_7_0...GAUDI_QUEUE_ID_NIC_7_3] = GAUDI_ENGINE_ID_NIC_7,
 +      [GAUDI_QUEUE_ID_NIC_8_0...GAUDI_QUEUE_ID_NIC_8_3] = GAUDI_ENGINE_ID_NIC_8,
 +      [GAUDI_QUEUE_ID_NIC_9_0...GAUDI_QUEUE_ID_NIC_9_3] = GAUDI_ENGINE_ID_NIC_9,
 +};
 +
 +/* The order here is opposite to the order of the indexing in the h/w.
 + * i.e. SYNC_MGR_W_S is actually 0, SYNC_MGR_E_S is 1, etc.
 + */
 +static const char * const gaudi_sync_manager_names[] = {
 +      "SYNC_MGR_E_N",
 +      "SYNC_MGR_W_N",
 +      "SYNC_MGR_E_S",
 +      "SYNC_MGR_W_S",
 +      NULL
 +};
 +
 +struct ecc_info_extract_params {
 +      u64 block_address;
 +      u32 num_memories;
 +      bool derr;
 +};
 +
 +static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
 +                                                              u64 phys_addr);
 +static int gaudi_send_job_on_qman0(struct hl_device *hdev,
 +                                      struct hl_cs_job *job);
 +static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
 +                                      u32 size, u64 val);
 +static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
 +                                      u32 num_regs, u32 val);
 +static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel,
 +                              u32 tpc_id);
 +static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev);
 +static int gaudi_cpucp_info_get(struct hl_device *hdev);
 +static void gaudi_disable_clock_gating(struct hl_device *hdev);
 +static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid);
 +static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
 +                              u32 size, bool eb);
 +static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
 +                              struct hl_gen_wait_properties *prop);
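+
+/* Collective role of a queue is derived from its type and ID: external queues
+ * act as collective masters, while the DMA5, TPC7 and NIC queue ranges act as
+ * collective slaves. All other queues do not take part in collective
+ * operations.
+ */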
 +static inline enum hl_collective_mode
 +get_collective_mode(struct hl_device *hdev, u32 queue_id)
 +{
 +      if (gaudi_queue_type[queue_id] == QUEUE_TYPE_EXT)
 +              return HL_COLLECTIVE_MASTER;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_DMA_5_0 &&
 +                      queue_id <= GAUDI_QUEUE_ID_DMA_5_3)
 +              return HL_COLLECTIVE_SLAVE;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_TPC_7_0 &&
 +                      queue_id <= GAUDI_QUEUE_ID_TPC_7_3)
 +              return HL_COLLECTIVE_SLAVE;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_NIC_0_0 &&
 +                      queue_id <= GAUDI_QUEUE_ID_NIC_9_3)
 +              return HL_COLLECTIVE_SLAVE;
 +
 +      return HL_COLLECTIVE_NOT_SUPPORTED;
 +}
 +
 +static inline void set_default_power_values(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      if (hdev->card_type == cpucp_card_type_pmc) {
 +              prop->max_power_default = MAX_POWER_DEFAULT_PMC;
 +
 +              if (prop->fw_security_enabled)
 +                      prop->dc_power_default = DC_POWER_DEFAULT_PMC_SEC;
 +              else
 +                      prop->dc_power_default = DC_POWER_DEFAULT_PMC;
 +      } else {
 +              prop->max_power_default = MAX_POWER_DEFAULT_PCI;
 +              prop->dc_power_default = DC_POWER_DEFAULT_PCI;
 +      }
 +}
 +
 +static int gaudi_set_fixed_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 num_sync_stream_queues = 0;
 +      int i;
 +
 +      prop->max_queues = GAUDI_QUEUE_ID_SIZE;
 +      prop->hw_queues_props = kcalloc(prop->max_queues,
 +                      sizeof(struct hw_queue_properties),
 +                      GFP_KERNEL);
 +
 +      if (!prop->hw_queues_props)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < prop->max_queues ; i++) {
 +              if (gaudi_queue_type[i] == QUEUE_TYPE_EXT) {
 +                      prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
 +                      prop->hw_queues_props[i].driver_only = 0;
 +                      prop->hw_queues_props[i].supports_sync_stream = 1;
 +                      prop->hw_queues_props[i].cb_alloc_flags =
 +                              CB_ALLOC_KERNEL;
 +                      num_sync_stream_queues++;
 +              } else if (gaudi_queue_type[i] == QUEUE_TYPE_CPU) {
 +                      prop->hw_queues_props[i].type = QUEUE_TYPE_CPU;
 +                      prop->hw_queues_props[i].driver_only = 1;
 +                      prop->hw_queues_props[i].supports_sync_stream = 0;
 +                      prop->hw_queues_props[i].cb_alloc_flags =
 +                              CB_ALLOC_KERNEL;
 +              } else if (gaudi_queue_type[i] == QUEUE_TYPE_INT) {
 +                      prop->hw_queues_props[i].type = QUEUE_TYPE_INT;
 +                      prop->hw_queues_props[i].driver_only = 0;
 +                      prop->hw_queues_props[i].supports_sync_stream = 0;
 +                      prop->hw_queues_props[i].cb_alloc_flags =
 +                              CB_ALLOC_USER;
 +
 +              }
 +              prop->hw_queues_props[i].collective_mode =
 +                                              get_collective_mode(hdev, i);
 +      }
 +
 +      prop->cache_line_size = DEVICE_CACHE_LINE_SIZE;
 +      prop->cfg_base_address = CFG_BASE;
 +      prop->device_dma_offset_for_host_access = HOST_PHYS_BASE;
 +      prop->host_base_address = HOST_PHYS_BASE;
 +      prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE;
 +      prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
 +      prop->completion_mode = HL_COMPLETION_MODE_JOB;
 +      prop->collective_first_sob = 0;
 +      prop->collective_first_mon = 0;
 +
 +      /* 2 SOBs per internal queue stream are reserved for collective */
 +      prop->sync_stream_first_sob =
 +                      ALIGN(NUMBER_OF_SOBS_IN_GRP, HL_MAX_SOBS_PER_MONITOR)
 +                      * QMAN_STREAMS * HL_RSVD_SOBS;
 +
+      /* 1 monitor per internal queue stream is reserved for collective
+       * 2 monitors per external queue stream are reserved for collective
+       */
 +      prop->sync_stream_first_mon =
 +                      (NUMBER_OF_COLLECTIVE_QUEUES * QMAN_STREAMS) +
 +                      (NUMBER_OF_EXT_HW_QUEUES * 2);
 +
 +      prop->dram_base_address = DRAM_PHYS_BASE;
 +      prop->dram_size = GAUDI_HBM_SIZE_32GB;
 +      prop->dram_end_address = prop->dram_base_address + prop->dram_size;
 +      prop->dram_user_base_address = DRAM_BASE_ADDR_USER;
 +
 +      prop->sram_base_address = SRAM_BASE_ADDR;
 +      prop->sram_size = SRAM_SIZE;
 +      prop->sram_end_address = prop->sram_base_address + prop->sram_size;
 +      prop->sram_user_base_address =
 +                      prop->sram_base_address + SRAM_USER_BASE_OFFSET;
 +
 +      prop->mmu_cache_mng_addr = MMU_CACHE_MNG_ADDR;
 +      prop->mmu_cache_mng_size = MMU_CACHE_MNG_SIZE;
 +
 +      prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
 +      if (hdev->pldm)
 +              prop->mmu_pgt_size = 0x800000; /* 8MB */
 +      else
 +              prop->mmu_pgt_size = MMU_PAGE_TABLES_SIZE;
 +      prop->mmu_pte_size = HL_PTE_SIZE;
 +      prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
 +      prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
 +      prop->dram_page_size = PAGE_SIZE_2MB;
 +      prop->device_mem_alloc_default_page_size = prop->dram_page_size;
 +      prop->dram_supports_virtual_memory = false;
 +
 +      prop->pmmu.hop_shifts[MMU_HOP0] = MMU_V1_1_HOP0_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP1] = MMU_V1_1_HOP1_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP2] = MMU_V1_1_HOP2_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP3] = MMU_V1_1_HOP3_SHIFT;
 +      prop->pmmu.hop_shifts[MMU_HOP4] = MMU_V1_1_HOP4_SHIFT;
 +      prop->pmmu.hop_masks[MMU_HOP0] = MMU_V1_1_HOP0_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP1] = MMU_V1_1_HOP1_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP2] = MMU_V1_1_HOP2_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP3] = MMU_V1_1_HOP3_MASK;
 +      prop->pmmu.hop_masks[MMU_HOP4] = MMU_V1_1_HOP4_MASK;
 +      prop->pmmu.start_addr = VA_HOST_SPACE_START;
 +      prop->pmmu.end_addr =
 +                      (VA_HOST_SPACE_START + VA_HOST_SPACE_SIZE / 2) - 1;
 +      prop->pmmu.page_size = PAGE_SIZE_4KB;
 +      prop->pmmu.num_hops = MMU_ARCH_5_HOPS;
 +      prop->pmmu.last_mask = LAST_MASK;
+      /* TODO: will be duplicated until per-MMU props are implemented */
 +      prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
+      /* PMMU and HPMMU are the same except for page size */
 +      memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +      prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
 +
 +      /* shifts and masks are the same in PMMU and DMMU */
 +      memcpy(&prop->dmmu, &prop->pmmu, sizeof(prop->pmmu));
 +      prop->dmmu.start_addr = (VA_HOST_SPACE_START + VA_HOST_SPACE_SIZE / 2);
 +      prop->dmmu.end_addr = VA_HOST_SPACE_END;
 +      prop->dmmu.page_size = PAGE_SIZE_2MB;
 +
 +      prop->cfg_size = CFG_SIZE;
 +      prop->max_asid = MAX_ASID;
 +      prop->num_of_events = GAUDI_EVENT_SIZE;
 +      prop->tpc_enabled_mask = TPC_ENABLED_MASK;
 +
 +      set_default_power_values(hdev);
 +
 +      prop->cb_pool_cb_cnt = GAUDI_CB_POOL_CB_CNT;
 +      prop->cb_pool_cb_size = GAUDI_CB_POOL_CB_SIZE;
 +
 +      prop->pcie_dbi_base_address = mmPCIE_DBI_BASE;
 +      prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
 +
 +      strncpy(prop->cpucp_info.card_name, GAUDI_DEFAULT_CARD_NAME,
 +                                      CARD_NAME_MAX_LEN);
 +
 +      prop->max_pending_cs = GAUDI_MAX_PENDING_CS;
 +
 +      prop->first_available_user_sob[HL_GAUDI_WS_DCORE] =
 +                      prop->sync_stream_first_sob +
 +                      (num_sync_stream_queues * HL_RSVD_SOBS);
 +      prop->first_available_user_mon[HL_GAUDI_WS_DCORE] =
 +                      prop->sync_stream_first_mon +
 +                      (num_sync_stream_queues * HL_RSVD_MONS);
 +
 +      prop->first_available_user_interrupt = USHRT_MAX;
 +
 +      for (i = 0 ; i < HL_MAX_DCORES ; i++)
 +              prop->first_available_cq[i] = USHRT_MAX;
 +
 +      prop->fw_cpu_boot_dev_sts0_valid = false;
 +      prop->fw_cpu_boot_dev_sts1_valid = false;
 +      prop->hard_reset_done_by_fw = false;
 +      prop->gic_interrupts_enable = true;
 +
 +      prop->server_type = HL_SERVER_TYPE_UNKNOWN;
 +
 +      prop->clk_pll_index = HL_GAUDI_MME_PLL;
 +      prop->max_freq_value = GAUDI_MAX_CLK_FREQ;
 +
 +      prop->use_get_power_for_reset_history = true;
 +
 +      prop->configurable_stop_on_err = true;
 +
 +      prop->set_max_power_on_device_init = true;
 +
 +      prop->dma_mask = 48;
 +
 +      prop->hbw_flush_reg = mmPCIE_WRAP_RR_ELBI_RD_SEC_REG_CTRL;
 +
 +      return 0;
 +}
 +
 +static int gaudi_pci_bars_map(struct hl_device *hdev)
 +{
 +      static const char * const name[] = {"SRAM", "CFG", "HBM"};
 +      bool is_wc[3] = {false, false, true};
 +      int rc;
 +
 +      rc = hl_pci_bars_map(hdev, name, is_wc);
 +      if (rc)
 +              return rc;
 +
 +      hdev->rmmio = hdev->pcie_bar[CFG_BAR_ID] +
 +                      (CFG_BASE - SPI_FLASH_BASE_ADDR);
 +
 +      return 0;
 +}
 +
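+/* Re-program the HBM BAR to point at the given device address and return the
+ * previous BAR base so the caller can restore it. Returns U64_MAX if the iATU
+ * is configured by the firmware or if setting the inbound region fails.
+ */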
 +static u64 gaudi_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct hl_inbound_pci_region pci_region;
 +      u64 old_addr = addr;
 +      int rc;
 +
 +      if ((gaudi) && (gaudi->hbm_bar_cur_addr == addr))
 +              return old_addr;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return U64_MAX;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to HBM */
 +      pci_region.mode = PCI_BAR_MATCH_MODE;
 +      pci_region.bar = HBM_BAR_ID;
 +      pci_region.addr = addr;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
 +      if (rc)
 +              return U64_MAX;
 +
 +      if (gaudi) {
 +              old_addr = gaudi->hbm_bar_cur_addr;
 +              gaudi->hbm_bar_cur_addr = addr;
 +      }
 +
 +      return old_addr;
 +}
 +
 +static int gaudi_init_iatu(struct hl_device *hdev)
 +{
 +      struct hl_inbound_pci_region inbound_region;
 +      struct hl_outbound_pci_region outbound_region;
 +      int rc;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return 0;
 +
 +      /* Inbound Region 0 - Bar 0 - Point to SRAM + CFG */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = SRAM_BAR_ID;
 +      inbound_region.addr = SRAM_BASE_ADDR;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Inbound Region 1 - Bar 2 - Point to SPI FLASH */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = CFG_BAR_ID;
 +      inbound_region.addr = SPI_FLASH_BASE_ADDR;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to HBM */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = HBM_BAR_ID;
 +      inbound_region.addr = DRAM_PHYS_BASE;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Outbound Region 0 - Point to Host */
 +      outbound_region.addr = HOST_PHYS_BASE;
 +      outbound_region.size = HOST_PHYS_SIZE;
 +      rc = hl_pci_set_outbound_region(hdev, &outbound_region);
 +
 +done:
 +      return rc;
 +}
 +
 +static enum hl_device_hw_state gaudi_get_hw_state(struct hl_device *hdev)
 +{
 +      return RREG32(mmHW_STATE);
 +}
 +
 +static int gaudi_early_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_dev *pdev = hdev->pdev;
 +      resource_size_t pci_bar_size;
 +      u32 fw_boot_status;
 +      int rc;
 +
 +      rc = gaudi_set_fixed_properties(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed setting fixed properties\n");
 +              return rc;
 +      }
 +
 +      /* Check BAR sizes */
 +      pci_bar_size = pci_resource_len(pdev, SRAM_BAR_ID);
 +
 +      if (pci_bar_size != SRAM_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      SRAM_BAR_ID, &pci_bar_size, SRAM_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      pci_bar_size = pci_resource_len(pdev, CFG_BAR_ID);
 +
 +      if (pci_bar_size != CFG_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      prop->dram_pci_bar_size = pci_resource_len(pdev, HBM_BAR_ID);
 +      hdev->dram_pci_bar_start = pci_resource_start(pdev, HBM_BAR_ID);
 +
 +      /* If FW security is enabled at this point it means no access to ELBI */
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
+              /*
+               * The GIC security bit can ONLY be set by CPUCP, so at this
+               * stage the decision can only be taken based on PCI ID security.
+               */
 +              hdev->asic_prop.gic_interrupts_enable = false;
 +              goto pci_init;
 +      }
 +
 +      rc = hl_pci_elbi_read(hdev, CFG_BASE + mmCPU_BOOT_DEV_STS0,
 +                              &fw_boot_status);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Check whether FW is configuring iATU */
 +      if ((fw_boot_status & CPU_BOOT_DEV_STS0_ENABLED) &&
 +                      (fw_boot_status & CPU_BOOT_DEV_STS0_FW_IATU_CONF_EN))
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +pci_init:
 +      rc = hl_pci_init(hdev);
 +      if (rc)
 +              goto free_queue_props;
 +
+      /* Before continuing with the initialization, we need to read the preboot
+       * version to determine whether we are running with security-enabled firmware
+       */
 +      rc = hl_fw_read_preboot_status(hdev);
 +      if (rc) {
 +              if (hdev->reset_on_preboot_fail)
 +                      hdev->asic_funcs->hw_fini(hdev, true, false);
 +              goto pci_fini;
 +      }
 +
 +      if (gaudi_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
 +              dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
 +              hdev->asic_funcs->hw_fini(hdev, true, false);
 +      }
 +
 +      return 0;
 +
 +pci_fini:
 +      hl_pci_fini(hdev);
 +free_queue_props:
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      return rc;
 +}
 +
 +static int gaudi_early_fini(struct hl_device *hdev)
 +{
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      hl_pci_fini(hdev);
 +
 +      return 0;
 +}
 +
+/**
+ * gaudi_fetch_psoc_frequency - Fetch PSOC frequency values
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
 +static int gaudi_fetch_psoc_frequency(struct hl_device *hdev)
 +{
 +      u32 nr = 0, nf = 0, od = 0, div_fctr = 0, pll_clk, div_sel;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS], freq;
 +      int rc;
 +
 +      if ((hdev->fw_components & FW_TYPE_LINUX) &&
 +                      (prop->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_PLL_INFO_EN)) {
 +              struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +                      return 0;
 +
 +              rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI_CPU_PLL, pll_freq_arr);
 +
 +              if (rc)
 +                      return rc;
 +
 +              freq = pll_freq_arr[2];
 +      } else {
 +              /* Backward compatibility */
 +              div_fctr = RREG32(mmPSOC_CPU_PLL_DIV_FACTOR_2);
 +              div_sel = RREG32(mmPSOC_CPU_PLL_DIV_SEL_2);
 +              nr = RREG32(mmPSOC_CPU_PLL_NR);
 +              nf = RREG32(mmPSOC_CPU_PLL_NF);
 +              od = RREG32(mmPSOC_CPU_PLL_OD);
 +
 +              if (div_sel == DIV_SEL_REF_CLK ||
 +                              div_sel == DIV_SEL_DIVIDED_REF) {
 +                      if (div_sel == DIV_SEL_REF_CLK)
 +                              freq = PLL_REF_CLK;
 +                      else
 +                              freq = PLL_REF_CLK / (div_fctr + 1);
 +              } else if (div_sel == DIV_SEL_PLL_CLK ||
 +                      div_sel == DIV_SEL_DIVIDED_PLL) {
 +                      pll_clk = PLL_REF_CLK * (nf + 1) /
 +                                      ((nr + 1) * (od + 1));
 +                      if (div_sel == DIV_SEL_PLL_CLK)
 +                              freq = pll_clk;
 +                      else
 +                              freq = pll_clk / (div_fctr + 1);
 +              } else {
 +                      dev_warn(hdev->dev, "Received invalid div select value: %#x", div_sel);
 +                      freq = 0;
 +              }
 +      }
 +
 +      prop->psoc_timestamp_frequency = freq;
 +      prop->psoc_pci_pll_nr = nr;
 +      prop->psoc_pci_pll_nf = nf;
 +      prop->psoc_pci_pll_od = od;
 +      prop->psoc_pci_pll_div_factor = div_fctr;
 +
 +      return 0;
 +}
 +
 +static int _gaudi_init_tpc_mem(struct hl_device *hdev,
 +              dma_addr_t tpc_kernel_src_addr, u32 tpc_kernel_size)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct packet_lin_dma *init_tpc_mem_pkt;
 +      struct hl_cs_job *job;
 +      struct hl_cb *cb;
 +      u64 dst_addr;
 +      u32 cb_size, ctl;
 +      u8 tpc_id;
 +      int rc;
 +
 +      cb = hl_cb_kernel_create(hdev, PAGE_SIZE, false);
 +      if (!cb)
 +              return -EFAULT;
 +
 +      init_tpc_mem_pkt = cb->kernel_address;
 +      cb_size = sizeof(*init_tpc_mem_pkt);
 +      memset(init_tpc_mem_pkt, 0, cb_size);
 +
 +      init_tpc_mem_pkt->tsize = cpu_to_le32(tpc_kernel_size);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
 +      ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_LIN_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      init_tpc_mem_pkt->ctl = cpu_to_le32(ctl);
 +
 +      init_tpc_mem_pkt->src_addr = cpu_to_le64(tpc_kernel_src_addr);
 +
 +      /* TPC_CMD is configured with I$ prefetch enabled, so address should be aligned to 8KB */
 +      dst_addr = FIELD_PREP(GAUDI_PKT_LIN_DMA_DST_ADDR_MASK,
 +                              round_up(prop->sram_user_base_address, SZ_8K));
 +      init_tpc_mem_pkt->dst_addr |= cpu_to_le64(dst_addr);
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = job->user_cb_size + sizeof(struct packet_msg_prot);
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = gaudi_send_job_on_qman0(hdev, job);
 +
 +      if (rc)
 +              goto free_job;
 +
 +      for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
 +              rc = gaudi_run_tpc_kernel(hdev, dst_addr, tpc_id);
 +              if (rc)
 +                      break;
 +      }
 +
 +free_job:
 +      hl_userptr_delete_list(hdev, &job->userptr_list);
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +/*
 + * gaudi_init_tpc_mem() - Initialize TPC memories.
 + * @hdev: Pointer to hl_device structure.
 + *
 + * Copy TPC kernel fw from firmware file and run it to initialize TPC memories.
 + *
 + * Return: 0 for success, negative value for error.
 + */
 +static int gaudi_init_tpc_mem(struct hl_device *hdev)
 +{
 +      const struct firmware *fw;
 +      size_t fw_size;
 +      void *cpu_addr;
 +      dma_addr_t dma_handle;
 +      int rc, count = 5;
 +
 +again:
 +      rc = request_firmware(&fw, GAUDI_TPC_FW_FILE, hdev->dev);
 +      if (rc == -EINTR && count-- > 0) {
 +              msleep(50);
 +              goto again;
 +      }
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to load firmware file %s\n",
 +                              GAUDI_TPC_FW_FILE);
 +              goto out;
 +      }
 +
 +      fw_size = fw->size;
 +      cpu_addr = hl_asic_dma_alloc_coherent(hdev, fw_size, &dma_handle, GFP_KERNEL | __GFP_ZERO);
 +      if (!cpu_addr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate %zu of dma memory for TPC kernel\n",
 +                      fw_size);
 +              rc = -ENOMEM;
 +              goto out;
 +      }
 +
 +      memcpy(cpu_addr, fw->data, fw_size);
 +
 +      rc = _gaudi_init_tpc_mem(hdev, dma_handle, fw_size);
 +
 +      hl_asic_dma_free_coherent(hdev, fw->size, cpu_addr, dma_handle);
 +
 +out:
 +      release_firmware(fw);
 +      return rc;
 +}
 +
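+/* Map the SOBs of the stream's current SOB group to the collective slave
+ * queues: one SOB per NIC engine queue, and one shared SOB for the DMA5 and
+ * TPC7 reduction queues of the same stream.
+ */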
 +static void gaudi_collective_map_sobs(struct hl_device *hdev, u32 stream)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_collective_properties *prop = &gaudi->collective_props;
 +      struct hl_hw_queue *q;
 +      u32 i, sob_id, sob_group_id, queue_id;
 +
 +      /* Iterate through SOB groups and assign a SOB for each slave queue */
 +      sob_group_id =
 +              stream * HL_RSVD_SOBS + prop->curr_sob_group_idx[stream];
 +      sob_id = prop->hw_sob_group[sob_group_id].base_sob_id;
 +
 +      queue_id = GAUDI_QUEUE_ID_NIC_0_0 + stream;
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
 +              q = &hdev->kernel_queues[queue_id + (4 * i)];
 +              q->sync_stream_prop.collective_sob_id = sob_id + i;
 +      }
 +
+      /* Both DMA5 and TPC7 use the same resources since only a single
+       * engine needs to participate in the reduction process
+       */
 +      queue_id = GAUDI_QUEUE_ID_DMA_5_0 + stream;
 +      q = &hdev->kernel_queues[queue_id];
 +      q->sync_stream_prop.collective_sob_id =
 +                      sob_id + NIC_NUMBER_OF_ENGINES;
 +
 +      queue_id = GAUDI_QUEUE_ID_TPC_7_0 + stream;
 +      q = &hdev->kernel_queues[queue_id];
 +      q->sync_stream_prop.collective_sob_id =
 +                      sob_id + NIC_NUMBER_OF_ENGINES;
 +}
 +
 +static void gaudi_sob_group_hw_reset(struct kref *ref)
 +{
 +      struct gaudi_hw_sob_group *hw_sob_group =
 +              container_of(ref, struct gaudi_hw_sob_group, kref);
 +      struct hl_device *hdev = hw_sob_group->hdev;
 +      int i;
 +
 +      for (i = 0 ; i < NUMBER_OF_SOBS_IN_GRP ; i++)
 +              WREG32((mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      (hw_sob_group->base_sob_id * 4) + (i * 4)), 0);
 +
 +      kref_init(&hw_sob_group->kref);
 +}
 +
 +static void gaudi_sob_group_reset_error(struct kref *ref)
 +{
 +      struct gaudi_hw_sob_group *hw_sob_group =
 +              container_of(ref, struct gaudi_hw_sob_group, kref);
 +      struct hl_device *hdev = hw_sob_group->hdev;
 +
 +      dev_crit(hdev->dev,
 +              "SOB release shouldn't be called here, base_sob_id: %d\n",
 +              hw_sob_group->base_sob_id);
 +}
 +
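+/* Build the master SOB masks: one bit per enabled NIC engine, grouped into
+ * masks of HL_MAX_SOBS_PER_MONITOR bits, with the collective (reduction)
+ * engine bit set right after the last NIC bit.
+ */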
 +static void gaudi_collective_mstr_sob_mask_set(struct gaudi_device *gaudi)
 +{
 +      struct gaudi_collective_properties *prop;
 +      int i;
 +
 +      prop = &gaudi->collective_props;
 +
 +      memset(prop->mstr_sob_mask, 0, sizeof(prop->mstr_sob_mask));
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++)
 +              if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + i))
 +                      prop->mstr_sob_mask[i / HL_MAX_SOBS_PER_MONITOR] |=
 +                                      BIT(i % HL_MAX_SOBS_PER_MONITOR);
 +      /* Set collective engine bit */
 +      prop->mstr_sob_mask[i / HL_MAX_SOBS_PER_MONITOR] |=
 +                              BIT(i % HL_MAX_SOBS_PER_MONITOR);
 +}
 +
 +static int gaudi_collective_init(struct hl_device *hdev)
 +{
 +      u32 i, sob_id, reserved_sobs_per_group;
 +      struct gaudi_collective_properties *prop;
 +      struct gaudi_device *gaudi;
 +
 +      gaudi = hdev->asic_specific;
 +      prop = &gaudi->collective_props;
 +      sob_id = hdev->asic_prop.collective_first_sob;
 +
 +      /* First sob in group must be aligned to HL_MAX_SOBS_PER_MONITOR */
 +      reserved_sobs_per_group =
 +              ALIGN(NUMBER_OF_SOBS_IN_GRP, HL_MAX_SOBS_PER_MONITOR);
 +
 +      /* Init SOB groups */
 +      for (i = 0 ; i < NUM_SOB_GROUPS; i++) {
 +              prop->hw_sob_group[i].hdev = hdev;
 +              prop->hw_sob_group[i].base_sob_id = sob_id;
 +              sob_id += reserved_sobs_per_group;
 +              gaudi_sob_group_hw_reset(&prop->hw_sob_group[i].kref);
 +      }
 +
 +      for (i = 0 ; i < QMAN_STREAMS; i++) {
 +              prop->next_sob_group_val[i] = 1;
 +              prop->curr_sob_group_idx[i] = 0;
 +              gaudi_collective_map_sobs(hdev, i);
 +      }
 +
 +      gaudi_collective_mstr_sob_mask_set(gaudi);
 +
 +      return 0;
 +}
 +
 +static void gaudi_reset_sob_group(struct hl_device *hdev, u16 sob_group)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_collective_properties *cprop = &gaudi->collective_props;
 +
 +      kref_put(&cprop->hw_sob_group[sob_group].kref,
 +                                      gaudi_sob_group_hw_reset);
 +}
 +
 +static void gaudi_collective_master_init_job(struct hl_device *hdev,
 +              struct hl_cs_job *job, u32 stream, u32 sob_group_offset)
 +{
 +      u32 master_sob_base, master_monitor, queue_id, cb_size = 0;
 +      struct gaudi_collective_properties *cprop;
 +      struct hl_gen_wait_properties wait_prop;
 +      struct hl_sync_stream_properties *prop;
 +      struct gaudi_device *gaudi;
 +
 +      gaudi = hdev->asic_specific;
 +      cprop = &gaudi->collective_props;
 +      queue_id = job->hw_queue_id;
 +      prop = &hdev->kernel_queues[queue_id].sync_stream_prop;
 +
 +      master_sob_base =
 +              cprop->hw_sob_group[sob_group_offset].base_sob_id;
 +      master_monitor = prop->collective_mstr_mon_id[0];
 +
 +      cprop->hw_sob_group[sob_group_offset].queue_id = queue_id;
 +
 +      dev_dbg(hdev->dev,
 +              "Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
 +              master_sob_base, cprop->mstr_sob_mask[0],
 +              cprop->next_sob_group_val[stream],
 +              master_monitor, queue_id);
 +
 +      wait_prop.data = (void *) job->patched_cb;
 +      wait_prop.sob_base = master_sob_base;
 +      wait_prop.sob_mask = cprop->mstr_sob_mask[0];
 +      wait_prop.sob_val = cprop->next_sob_group_val[stream];
 +      wait_prop.mon_id = master_monitor;
 +      wait_prop.q_idx = queue_id;
 +      wait_prop.size = cb_size;
 +      cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
 +
 +      master_sob_base += HL_MAX_SOBS_PER_MONITOR;
 +      master_monitor = prop->collective_mstr_mon_id[1];
 +
 +      dev_dbg(hdev->dev,
 +              "Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
 +              master_sob_base, cprop->mstr_sob_mask[1],
 +              cprop->next_sob_group_val[stream],
 +              master_monitor, queue_id);
 +
 +      wait_prop.sob_base = master_sob_base;
 +      wait_prop.sob_mask = cprop->mstr_sob_mask[1];
 +      wait_prop.mon_id = master_monitor;
 +      wait_prop.size = cb_size;
 +      cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
 +}
 +
 +static void gaudi_collective_slave_init_job(struct hl_device *hdev,
 +              struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
 +{
 +      struct hl_gen_wait_properties wait_prop;
 +      struct hl_sync_stream_properties *prop;
 +      u32 queue_id, cb_size = 0;
 +
 +      queue_id = job->hw_queue_id;
 +      prop = &hdev->kernel_queues[queue_id].sync_stream_prop;
 +
 +      if (job->cs->encaps_signals) {
+              /* use the encaps signal handle stored earlier in the flow
+               * and set the SOB information from the encaps
+               * signal's handle
+               */
 +              hl_hw_queue_encaps_sig_set_sob_info(hdev, job->cs, job,
 +                                              cs_cmpl);
 +
 +              dev_dbg(hdev->dev, "collective wait: Sequence %llu found, sob_id: %u,  wait for sob_val: %u\n",
 +                              job->cs->sequence,
 +                              cs_cmpl->hw_sob->sob_id,
 +                              cs_cmpl->sob_val);
 +      }
 +
 +      /* Add to wait CBs using slave monitor */
 +      wait_prop.data = (void *) job->user_cb;
 +      wait_prop.sob_base = cs_cmpl->hw_sob->sob_id;
 +      wait_prop.sob_mask = 0x1;
 +      wait_prop.sob_val = cs_cmpl->sob_val;
 +      wait_prop.mon_id = prop->collective_slave_mon_id;
 +      wait_prop.q_idx = queue_id;
 +      wait_prop.size = cb_size;
 +
 +      dev_dbg(hdev->dev,
 +              "Generate slave wait CB, sob %d, val:%x, mon %d, q %d\n",
 +              cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
 +              prop->collective_slave_mon_id, queue_id);
 +
 +      cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
 +
 +      dev_dbg(hdev->dev,
 +              "generate signal CB, sob_id: %d, sob val: 1, q_idx: %d\n",
 +              prop->collective_sob_id, queue_id);
 +
 +      cb_size += gaudi_gen_signal_cb(hdev, job->user_cb,
 +                      prop->collective_sob_id, cb_size, false);
 +}
 +
 +static int gaudi_collective_wait_init_cs(struct hl_cs *cs)
 +{
 +      struct hl_cs_compl *signal_cs_cmpl =
 +              container_of(cs->signal_fence, struct hl_cs_compl, base_fence);
 +      struct hl_cs_compl *cs_cmpl =
 +              container_of(cs->fence, struct hl_cs_compl, base_fence);
 +      struct hl_cs_encaps_sig_handle *handle = cs->encaps_sig_hdl;
 +      struct gaudi_collective_properties *cprop;
 +      u32 stream, queue_id, sob_group_offset;
 +      struct gaudi_device *gaudi;
 +      struct hl_device *hdev;
 +      struct hl_cs_job *job;
 +      struct hl_ctx *ctx;
 +
 +      ctx = cs->ctx;
 +      hdev = ctx->hdev;
 +      gaudi = hdev->asic_specific;
 +      cprop = &gaudi->collective_props;
 +
 +      if (cs->encaps_signals) {
 +              cs_cmpl->hw_sob = handle->hw_sob;
+              /* at this checkpoint we only need the hw_sob pointer
+               * for the completion check before starting to go over the jobs
+               * of the master/slaves; the sob_value will be taken later on
+               * in gaudi_collective_slave_init_job, depending on each
+               * job's wait offset value.
+               */
 +              cs_cmpl->sob_val = 0;
 +      } else {
 +              /* copy the SOB id and value of the signal CS */
 +              cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
 +              cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
 +      }
 +
+      /* Check again if the signal cs has already completed.
+       * If so, don't send any wait cs since the hw_sob
+       * could already be in reset. If the signal has not completed,
+       * take a refcount on the hw_sob to prevent the sob from being
+       * reset while the wait cs is not yet submitted.
+       * Note that this check is protected by two locks,
+       * the hw queue lock and the completion object lock,
+       * and the same completion object lock also protects
+       * the hw_sob reset handler function.
+       * The hw_queue lock prevents the hw_sob refcount value,
+       * which is changed by the signal/wait flows, from going out of sync.
+       */
 +      spin_lock(&signal_cs_cmpl->lock);
 +
 +      if (completion_done(&cs->signal_fence->completion)) {
 +              spin_unlock(&signal_cs_cmpl->lock);
 +              return -EINVAL;
 +      }
 +      /* Increment kref since all slave queues are now waiting on it */
 +      kref_get(&cs_cmpl->hw_sob->kref);
 +
 +      spin_unlock(&signal_cs_cmpl->lock);
 +
 +      /* Calculate the stream from collective master queue (1st job) */
 +      job = list_first_entry(&cs->job_list, struct hl_cs_job, cs_node);
 +      stream = job->hw_queue_id % 4;
 +      sob_group_offset =
 +              stream * HL_RSVD_SOBS + cprop->curr_sob_group_idx[stream];
 +
 +      list_for_each_entry(job, &cs->job_list, cs_node) {
 +              queue_id = job->hw_queue_id;
 +
 +              if (hdev->kernel_queues[queue_id].collective_mode ==
 +                              HL_COLLECTIVE_MASTER)
 +                      gaudi_collective_master_init_job(hdev, job, stream,
 +                                              sob_group_offset);
 +              else
 +                      gaudi_collective_slave_init_job(hdev, job, cs_cmpl);
 +      }
 +
 +      cs_cmpl->sob_group = sob_group_offset;
 +
 +      /* Handle sob group kref and wraparound */
 +      kref_get(&cprop->hw_sob_group[sob_group_offset].kref);
 +      cprop->next_sob_group_val[stream]++;
 +
 +      if (cprop->next_sob_group_val[stream] == HL_MAX_SOB_VAL) {
 +              /*
 +               * Decrement as we reached the max value.
 +               * The release function won't be called here as we've
 +               * just incremented the refcount.
 +               */
 +              kref_put(&cprop->hw_sob_group[sob_group_offset].kref,
 +                              gaudi_sob_group_reset_error);
 +              cprop->next_sob_group_val[stream] = 1;
 +              /* only two SOBs are currently in use */
 +              cprop->curr_sob_group_idx[stream] =
 +                      (cprop->curr_sob_group_idx[stream] + 1) &
 +                                                      (HL_RSVD_SOBS - 1);
 +
 +              gaudi_collective_map_sobs(hdev, stream);
 +
 +              dev_dbg(hdev->dev, "switched to SOB group %d, stream: %d\n",
 +                              cprop->curr_sob_group_idx[stream], stream);
 +      }
 +
 +      mb();
 +      hl_fence_put(cs->signal_fence);
 +      cs->signal_fence = NULL;
 +
 +      return 0;
 +}
 +
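+/* A patched CB ends with two msg_prot packets (completion and MSI).
+ * If these packets would cross the cache line in which the user CB ends,
+ * pad the CB up to the next cache line boundary before appending them;
+ * otherwise only the two packets are appended.
+ */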
 +static u32 gaudi_get_patched_cb_extra_size(u32 user_cb_size)
 +{
 +      u32 cacheline_end, additional_commands;
 +
 +      cacheline_end = round_up(user_cb_size, DEVICE_CACHE_LINE_SIZE);
 +      additional_commands = sizeof(struct packet_msg_prot) * 2;
 +
 +      if (user_cb_size + additional_commands > cacheline_end)
 +              return cacheline_end - user_cb_size + additional_commands;
 +      else
 +              return additional_commands;
 +}
 +
 +static int gaudi_collective_wait_create_job(struct hl_device *hdev,
 +              struct hl_ctx *ctx, struct hl_cs *cs,
 +              enum hl_collective_mode mode, u32 queue_id, u32 wait_queue_id,
 +              u32 encaps_signal_offset)
 +{
 +      struct hw_queue_properties *hw_queue_prop;
 +      struct hl_cs_counters_atomic *cntr;
 +      struct hl_cs_job *job;
 +      struct hl_cb *cb;
 +      u32 cb_size;
 +      bool patched_cb;
 +
 +      cntr = &hdev->aggregated_cs_counters;
 +
 +      if (mode == HL_COLLECTIVE_MASTER) {
 +              /* CB size of collective master queue contains
 +               * 4 msg short packets for monitor 1 configuration
 +               * 1 fence packet
 +               * 4 msg short packets for monitor 2 configuration
 +               * 1 fence packet
 +               * 2 msg prot packets for completion and MSI
 +               */
 +              cb_size = sizeof(struct packet_msg_short) * 8 +
 +                              sizeof(struct packet_fence) * 2 +
 +                              sizeof(struct packet_msg_prot) * 2;
 +              patched_cb = true;
 +      } else {
 +              /* CB size of collective slave queues contains
 +               * 4 msg short packets for monitor configuration
 +               * 1 fence packet
 +               * 1 additional msg short packet for sob signal
 +               */
 +              cb_size = sizeof(struct packet_msg_short) * 5 +
 +                              sizeof(struct packet_fence);
 +              patched_cb = false;
 +      }
 +
 +      hw_queue_prop = &hdev->asic_prop.hw_queues_props[queue_id];
 +      job = hl_cs_allocate_job(hdev, hw_queue_prop->type, true);
 +      if (!job) {
 +              atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
 +              atomic64_inc(&cntr->out_of_mem_drop_cnt);
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              return -ENOMEM;
 +      }
 +
+      /* Allocate internal mapped CB for non-patched CBs */
 +      cb = hl_cb_kernel_create(hdev, cb_size,
 +                      hdev->mmu_enable && !patched_cb);
 +      if (!cb) {
 +              atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
 +              atomic64_inc(&cntr->out_of_mem_drop_cnt);
 +              kfree(job);
 +              return -EFAULT;
 +      }
 +
 +      job->id = 0;
 +      job->cs = cs;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = queue_id;
 +
+      /* since it's guaranteed to have only one chunk in the collective wait
+       * cs, we can use this chunk to set the encapsulated signal offset
+       * in the jobs.
+       */
 +      if (cs->encaps_signals)
 +              job->encaps_sig_wait_offset = encaps_signal_offset;
 +
+      /*
+       * No need for parsing, as the user CB is the patched CB.
+       * We call hl_cb_destroy() for two reasons - we don't need
+       * the CB in the CB idr anymore, and to decrement its refcount as
+       * it was incremented inside hl_cb_kernel_create().
+       */
 +      if (patched_cb)
 +              job->patched_cb = job->user_cb;
 +      else
 +              job->patched_cb = NULL;
 +
 +      job->job_cb_size = job->user_cb_size;
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
+      /* increment refcount since for external queues we get a completion */
 +      if (hw_queue_prop->type == QUEUE_TYPE_EXT)
 +              cs_get(cs);
 +
 +      cs->jobs_in_queue_cnt[job->hw_queue_id]++;
 +
 +      list_add_tail(&job->cs_node, &cs->job_list);
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      return 0;
 +}
 +
 +static int gaudi_collective_wait_create_jobs(struct hl_device *hdev,
 +              struct hl_ctx *ctx, struct hl_cs *cs,
 +              u32 wait_queue_id, u32 collective_engine_id,
 +              u32 encaps_signal_offset)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct hw_queue_properties *hw_queue_prop;
 +      u32 queue_id, collective_queue, num_jobs;
 +      u32 stream, nic_queue, nic_idx = 0;
 +      bool skip;
 +      int i, rc = 0;
 +
 +      /* Verify wait queue id is configured as master */
 +      hw_queue_prop = &hdev->asic_prop.hw_queues_props[wait_queue_id];
 +      if (!(hw_queue_prop->collective_mode == HL_COLLECTIVE_MASTER)) {
 +              dev_err(hdev->dev,
 +                      "Queue %d is not configured as collective master\n",
 +                      wait_queue_id);
 +              return -EINVAL;
 +      }
 +
 +      /* Verify engine id is supported */
 +      if (collective_engine_id != GAUDI_ENGINE_ID_DMA_5 &&
 +                      collective_engine_id != GAUDI_ENGINE_ID_TPC_7) {
 +              dev_err(hdev->dev,
 +                      "Collective wait does not support engine %u\n",
 +                      collective_engine_id);
 +              return -EINVAL;
 +      }
 +
 +      stream = wait_queue_id % 4;
 +
 +      if (collective_engine_id == GAUDI_ENGINE_ID_DMA_5)
 +              collective_queue = GAUDI_QUEUE_ID_DMA_5_0 + stream;
 +      else
 +              collective_queue = GAUDI_QUEUE_ID_TPC_7_0 + stream;
 +
 +      num_jobs = NUMBER_OF_SOBS_IN_GRP + 1;
 +      nic_queue = GAUDI_QUEUE_ID_NIC_0_0 + stream;
 +
+      /* The first job goes to the collective master queue; it will wait for
+       * the collective slave queues to finish execution.
+       * The synchronization is done using two monitors:
+       * the first monitor for NICs 0-7, the second monitor for NICs 8-9 and
+       * the reduction engine (DMA5/TPC7).
+       *
+       * The rest of the jobs go to the collective slave queues, which will
+       * all wait for the user to signal sob 'cs_cmpl->sob_val'.
+       */
 +      for (i = 0 ; i < num_jobs ; i++) {
 +              if (i == 0) {
 +                      queue_id = wait_queue_id;
 +                      rc = gaudi_collective_wait_create_job(hdev, ctx, cs,
 +                              HL_COLLECTIVE_MASTER, queue_id,
 +                              wait_queue_id, encaps_signal_offset);
 +              } else {
 +                      if (nic_idx < NIC_NUMBER_OF_ENGINES) {
 +                              if (gaudi->hw_cap_initialized &
 +                                      BIT(HW_CAP_NIC_SHIFT + nic_idx))
 +                                      skip = false;
 +                              else
 +                                      skip = true;
 +
 +                              queue_id = nic_queue;
 +                              nic_queue += 4;
 +                              nic_idx++;
 +
 +                              if (skip)
 +                                      continue;
 +                      } else {
 +                              queue_id = collective_queue;
 +                      }
 +
 +                      rc = gaudi_collective_wait_create_job(hdev, ctx, cs,
 +                              HL_COLLECTIVE_SLAVE, queue_id,
 +                              wait_queue_id, encaps_signal_offset);
 +              }
 +
 +              if (rc)
 +                      return rc;
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi_late_init(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      rc = gaudi->cpucp_info_get(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get cpucp info\n");
 +              return rc;
 +      }
 +
 +      if ((hdev->card_type == cpucp_card_type_pci) &&
 +                      (hdev->nic_ports_mask & 0x3)) {
 +              dev_info(hdev->dev,
 +                      "PCI card detected, only 8 ports are enabled\n");
 +              hdev->nic_ports_mask &= ~0x3;
 +
 +              /* Stop and disable unused NIC QMANs */
 +              WREG32(mmNIC0_QM0_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +              WREG32(mmNIC0_QM1_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                                      NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +              WREG32(mmNIC0_QM0_GLBL_CFG0, 0);
 +              WREG32(mmNIC0_QM1_GLBL_CFG0, 0);
 +
 +              gaudi->hw_cap_initialized &= ~(HW_CAP_NIC0 | HW_CAP_NIC1);
 +      }
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, 0x0);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
 +              return rc;
 +      }
 +
 +      /* Scrub both SRAM and DRAM */
 +      rc = hdev->asic_funcs->scrub_device_mem(hdev);
 +      if (rc)
 +              goto disable_pci_access;
 +
 +      rc = gaudi_fetch_psoc_frequency(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to fetch psoc frequency\n");
 +              goto disable_pci_access;
 +      }
 +
 +      rc = gaudi_mmu_clear_pgt_range(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear MMU page tables range\n");
 +              goto disable_pci_access;
 +      }
 +
 +      rc = gaudi_init_tpc_mem(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to initialize TPC memories\n");
 +              goto disable_pci_access;
 +      }
 +
 +      rc = gaudi_collective_init(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to init collective\n");
 +              goto disable_pci_access;
 +      }
 +
 +      /* We only support a single ASID for the user, so for the sake of optimization, just
 +       * initialize the ASID one time during device initialization with the fixed value of 1
 +       */
 +      gaudi_mmu_prepare(hdev, 1);
 +
 +      hl_fw_set_pll_profile(hdev);
 +
 +      return 0;
 +
 +disable_pci_access:
 +      hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +
 +      return rc;
 +}
 +
 +static void gaudi_late_fini(struct hl_device *hdev)
 +{
 +      hl_hwmon_release_resources(hdev);
 +}
 +
 +static int gaudi_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
 +{
 +      dma_addr_t dma_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
 +      void *virt_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {};
 +      int i, j, rc = 0;
 +
+      /*
+       * The device CPU works with 40-bit addresses, and bit 39 must be set
+       * to '1' when accessing the host.
+       * Bits 49:39 of the full host address are saved for a later
+       * configuration of the HW that extends the address to 50 bits.
+       * Because there is a single HW register that holds the extension bits,
+       * these bits must be identical across the entire allocated range.
+       */
 +
 +      for (i = 0 ; i < GAUDI_ALLOC_CPU_MEM_RETRY_CNT ; i++) {
 +              virt_addr_arr[i] = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
 +                                                              &dma_addr_arr[i],
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +              if (!virt_addr_arr[i]) {
 +                      rc = -ENOMEM;
 +                      goto free_dma_mem_arr;
 +              }
 +
 +              end_addr = dma_addr_arr[i] + HL_CPU_ACCESSIBLE_MEM_SIZE - 1;
 +              if (GAUDI_CPU_PCI_MSB_ADDR(dma_addr_arr[i]) ==
 +                              GAUDI_CPU_PCI_MSB_ADDR(end_addr))
 +                      break;
 +      }
 +
 +      if (i == GAUDI_ALLOC_CPU_MEM_RETRY_CNT) {
 +              dev_err(hdev->dev,
 +                      "MSB of CPU accessible DMA memory are not identical in all range\n");
 +              rc = -EFAULT;
 +              goto free_dma_mem_arr;
 +      }
 +
 +      hdev->cpu_accessible_dma_mem = virt_addr_arr[i];
 +      hdev->cpu_accessible_dma_address = dma_addr_arr[i];
 +      hdev->cpu_pci_msb_addr =
 +              GAUDI_CPU_PCI_MSB_ADDR(hdev->cpu_accessible_dma_address);
 +
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              GAUDI_PCI_TO_CPU_ADDR(hdev->cpu_accessible_dma_address);
 +
 +free_dma_mem_arr:
 +      for (j = 0 ; j < i ; j++)
 +              hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, virt_addr_arr[j],
 +                                              dma_addr_arr[j]);
 +
 +      return rc;
 +}
 +
 +static void gaudi_free_internal_qmans_pq_mem(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u32 i;
 +
 +      for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
 +              q = &gaudi->internal_qmans[i];
 +              if (!q->pq_kernel_addr)
 +                      continue;
 +              hl_asic_dma_free_coherent(hdev, q->pq_size, q->pq_kernel_addr, q->pq_dma_addr);
 +      }
 +}
 +
 +static int gaudi_alloc_internal_qmans_pq_mem(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      int rc, i;
 +
 +      for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
 +              if (gaudi_queue_type[i] != QUEUE_TYPE_INT)
 +                      continue;
 +
 +              q = &gaudi->internal_qmans[i];
 +
 +              switch (i) {
 +              case GAUDI_QUEUE_ID_DMA_2_0 ... GAUDI_QUEUE_ID_DMA_7_3:
 +                      q->pq_size = HBM_DMA_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              case GAUDI_QUEUE_ID_MME_0_0 ... GAUDI_QUEUE_ID_MME_1_3:
 +                      q->pq_size = MME_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              case GAUDI_QUEUE_ID_TPC_0_0 ... GAUDI_QUEUE_ID_TPC_7_3:
 +                      q->pq_size = TPC_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              case GAUDI_QUEUE_ID_NIC_0_0 ... GAUDI_QUEUE_ID_NIC_9_3:
 +                      q->pq_size = NIC_QMAN_SIZE_IN_BYTES;
 +                      break;
 +              default:
 +                      dev_err(hdev->dev, "Bad internal queue index %d", i);
 +                      rc = -EINVAL;
 +                      goto free_internal_qmans_pq_mem;
 +              }
 +
 +              q->pq_kernel_addr = hl_asic_dma_alloc_coherent(hdev, q->pq_size, &q->pq_dma_addr,
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +              if (!q->pq_kernel_addr) {
 +                      rc = -ENOMEM;
 +                      goto free_internal_qmans_pq_mem;
 +              }
 +      }
 +
 +      return 0;
 +
 +free_internal_qmans_pq_mem:
 +      gaudi_free_internal_qmans_pq_mem(hdev);
 +      return rc;
 +}
 +
 +static void gaudi_set_pci_memory_regions(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_mem_region *region;
 +
 +      /* CFG */
 +      region = &hdev->pci_mem_region[PCI_REGION_CFG];
 +      region->region_base = CFG_BASE;
 +      region->region_size = CFG_SIZE;
 +      region->offset_in_bar = CFG_BASE - SPI_FLASH_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SRAM];
 +      region->region_base = SRAM_BASE_ADDR;
 +      region->region_size = SRAM_SIZE;
 +      region->offset_in_bar = 0;
 +      region->bar_size = SRAM_BAR_SIZE;
 +      region->bar_id = SRAM_BAR_ID;
 +      region->used = 1;
 +
 +      /* DRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_DRAM];
 +      region->region_base = DRAM_PHYS_BASE;
 +      region->region_size = hdev->asic_prop.dram_size;
 +      region->offset_in_bar = 0;
 +      region->bar_size = prop->dram_pci_bar_size;
 +      region->bar_id = HBM_BAR_ID;
 +      region->used = 1;
 +
 +      /* SP SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SP_SRAM];
 +      region->region_base = PSOC_SCRATCHPAD_ADDR;
 +      region->region_size = PSOC_SCRATCHPAD_SIZE;
 +      region->offset_in_bar = PSOC_SCRATCHPAD_ADDR - SPI_FLASH_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = CFG_BAR_ID;
 +      region->used = 1;
 +}
 +
 +static int gaudi_sw_init(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi;
 +      u32 i, event_id = 0;
 +      int rc;
 +
 +      /* Allocate device structure */
 +      gaudi = kzalloc(sizeof(*gaudi), GFP_KERNEL);
 +      if (!gaudi)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < ARRAY_SIZE(gaudi_irq_map_table) ; i++) {
 +              if (gaudi_irq_map_table[i].valid) {
 +                      if (event_id == GAUDI_EVENT_SIZE) {
 +                              dev_err(hdev->dev,
 +                                      "Event array exceeds the limit of %u events\n",
 +                                      GAUDI_EVENT_SIZE);
 +                              rc = -EINVAL;
 +                              goto free_gaudi_device;
 +                      }
 +
 +                      gaudi->events[event_id++] =
 +                                      gaudi_irq_map_table[i].fc_id;
 +              }
 +      }
 +
 +      gaudi->cpucp_info_get = gaudi_cpucp_info_get;
 +
 +      hdev->asic_specific = gaudi;
 +
 +      /* Create DMA pool for small allocations */
 +      hdev->dma_pool = dma_pool_create(dev_name(hdev->dev),
 +                      &hdev->pdev->dev, GAUDI_DMA_POOL_BLK_SIZE, 8, 0);
 +      if (!hdev->dma_pool) {
 +              dev_err(hdev->dev, "failed to create DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_gaudi_device;
 +      }
 +
 +      rc = gaudi_alloc_cpu_accessible_dma_mem(hdev);
 +      if (rc)
 +              goto free_dma_pool;
 +
 +      hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 +      if (!hdev->cpu_accessible_dma_pool) {
 +              dev_err(hdev->dev,
 +                      "Failed to create CPU accessible DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_dma_mem;
 +      }
 +
 +      rc = gen_pool_add(hdev->cpu_accessible_dma_pool,
 +                              (uintptr_t) hdev->cpu_accessible_dma_mem,
 +                              HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to CPU accessible DMA pool\n");
 +              rc = -EFAULT;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      rc = gaudi_alloc_internal_qmans_pq_mem(hdev);
 +      if (rc)
 +              goto free_cpu_accessible_dma_pool;
 +
 +      spin_lock_init(&gaudi->hw_queues_lock);
 +
 +      hdev->supports_sync_stream = true;
 +      hdev->supports_coresight = true;
 +      hdev->supports_staged_submission = true;
 +      hdev->supports_wait_for_multi_cs = true;
 +
 +      hdev->asic_funcs->set_pci_memory_regions(hdev);
 +      hdev->stream_master_qid_arr =
 +                              hdev->asic_funcs->get_stream_master_qid_arr();
 +      hdev->stream_master_qid_arr_size = GAUDI_STREAM_MASTER_ARR_SIZE;
 +
 +      return 0;
 +
 +free_cpu_accessible_dma_pool:
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +free_cpu_dma_mem:
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              GAUDI_CPU_TO_PCI_ADDR(hdev->cpu_accessible_dma_address,
 +                                      hdev->cpu_pci_msb_addr);
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +free_dma_pool:
 +      dma_pool_destroy(hdev->dma_pool);
 +free_gaudi_device:
 +      kfree(gaudi);
 +      return rc;
 +}
 +
 +static int gaudi_sw_fini(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      gaudi_free_internal_qmans_pq_mem(hdev);
 +
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              GAUDI_CPU_TO_PCI_ADDR(hdev->cpu_accessible_dma_address,
 +                                      hdev->cpu_pci_msb_addr);
 +
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +
 +      dma_pool_destroy(hdev->dma_pool);
 +
 +      kfree(gaudi);
 +
 +      return 0;
 +}
 +
 +static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
 +{
 +      struct hl_device *hdev = arg;
 +      int i;
 +
 +      if (hdev->disabled)
 +              return IRQ_HANDLED;
 +
 +      for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
 +              hl_irq_handler_cq(irq, &hdev->completion_queue[i]);
 +
 +      hl_irq_handler_eq(irq, &hdev->event_queue);
 +
 +      return IRQ_HANDLED;
 +}
 +
 +/*
 + * For backward compatibility, new MSI interrupts should be set after the
 + * existing CPU and NIC interrupts.
 + */
 +static int gaudi_pci_irq_vector(struct hl_device *hdev, unsigned int nr,
 +                              bool cpu_eq)
 +{
 +      int msi_vec;
 +
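+      /*
+       * Vector layout implied by the mapping below: indices below
+       * GAUDI_EVENT_QUEUE_MSI_IDX map 1:1 to MSI vectors, the CPU event
+       * queue uses GAUDI_EVENT_QUEUE_MSI_IDX itself, and any higher index
+       * is shifted past the NIC vectors (nr + NIC_NUMBER_OF_ENGINES + 1).
+       */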
 +      if ((nr != GAUDI_EVENT_QUEUE_MSI_IDX) && (cpu_eq))
 +              dev_crit(hdev->dev, "CPU EQ must use IRQ %d\n",
 +                              GAUDI_EVENT_QUEUE_MSI_IDX);
 +
 +      msi_vec = ((nr < GAUDI_EVENT_QUEUE_MSI_IDX) || (cpu_eq)) ? nr :
 +                      (nr + NIC_NUMBER_OF_ENGINES + 1);
 +
 +      return pci_irq_vector(hdev->pdev, msi_vec);
 +}
 +
 +static int gaudi_enable_msi_single(struct hl_device *hdev)
 +{
 +      int rc, irq;
 +
 +      dev_dbg(hdev->dev, "Working in single MSI IRQ mode\n");
 +
 +      irq = gaudi_pci_irq_vector(hdev, 0, false);
 +      rc = request_irq(irq, gaudi_irq_handler_single, 0,
 +                      "gaudi single msi", hdev);
 +      if (rc)
 +              dev_err(hdev->dev,
 +                      "Failed to request single MSI IRQ\n");
 +
 +      return rc;
 +}
 +
 +static int gaudi_enable_msi_multi(struct hl_device *hdev)
 +{
 +      int cq_cnt = hdev->asic_prop.completion_queues_count;
 +      int rc, i, irq_cnt_init, irq;
 +
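+      /*
+       * Completion queues get their vectors via gaudi_pci_irq_vector(i),
+       * i.e. vectors 0..cq_cnt-1, while the CPU event queue uses the
+       * dedicated GAUDI_EVENT_QUEUE_MSI_IDX vector.
+       */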
 +      for (i = 0, irq_cnt_init = 0 ; i < cq_cnt ; i++, irq_cnt_init++) {
 +              irq = gaudi_pci_irq_vector(hdev, i, false);
 +              rc = request_irq(irq, hl_irq_handler_cq, 0, gaudi_irq_name[i],
 +                              &hdev->completion_queue[i]);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_irqs;
 +              }
 +      }
 +
 +      irq = gaudi_pci_irq_vector(hdev, GAUDI_EVENT_QUEUE_MSI_IDX, true);
 +      rc = request_irq(irq, hl_irq_handler_eq, 0, gaudi_irq_name[cq_cnt],
 +                              &hdev->event_queue);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_irqs;
 +      }
 +
 +      return 0;
 +
 +free_irqs:
 +      for (i = 0 ; i < irq_cnt_init ; i++)
 +              free_irq(gaudi_pci_irq_vector(hdev, i, false),
 +                              &hdev->completion_queue[i]);
 +      return rc;
 +}
 +
 +static int gaudi_enable_msi(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MSI)
 +              return 0;
 +
 +      rc = pci_alloc_irq_vectors(hdev->pdev, 1, 1, PCI_IRQ_MSI);
 +      if (rc < 0) {
 +              dev_err(hdev->dev, "MSI: Failed to enable support %d\n", rc);
 +              return rc;
 +      }
 +
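+      /*
+       * Only a single vector is requested above, so on success rc is 1 here
+       * and (assuming NUMBER_OF_INTERRUPTS is larger than 1) the single-MSI
+       * path below is the one effectively taken.
+       */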
 +      if (rc < NUMBER_OF_INTERRUPTS) {
 +              gaudi->multi_msi_mode = false;
 +              rc = gaudi_enable_msi_single(hdev);
 +      } else {
 +              gaudi->multi_msi_mode = true;
 +              rc = gaudi_enable_msi_multi(hdev);
 +      }
 +
 +      if (rc)
 +              goto free_pci_irq_vectors;
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_MSI;
 +
 +      return 0;
 +
 +free_pci_irq_vectors:
 +      pci_free_irq_vectors(hdev->pdev);
 +      return rc;
 +}
 +
 +static void gaudi_sync_irqs(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int i, cq_cnt = hdev->asic_prop.completion_queues_count;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MSI))
 +              return;
 +
 +      /* Wait for all pending IRQs to be finished */
 +      if (gaudi->multi_msi_mode) {
 +              for (i = 0 ; i < cq_cnt ; i++)
 +                      synchronize_irq(gaudi_pci_irq_vector(hdev, i, false));
 +
 +              synchronize_irq(gaudi_pci_irq_vector(hdev,
 +                                              GAUDI_EVENT_QUEUE_MSI_IDX,
 +                                              true));
 +      } else {
 +              synchronize_irq(gaudi_pci_irq_vector(hdev, 0, false));
 +      }
 +}
 +
 +static void gaudi_disable_msi(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int i, irq, cq_cnt = hdev->asic_prop.completion_queues_count;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MSI))
 +              return;
 +
 +      gaudi_sync_irqs(hdev);
 +
 +      if (gaudi->multi_msi_mode) {
 +              irq = gaudi_pci_irq_vector(hdev, GAUDI_EVENT_QUEUE_MSI_IDX,
 +                                              true);
 +              free_irq(irq, &hdev->event_queue);
 +
 +              for (i = 0 ; i < cq_cnt ; i++) {
 +                      irq = gaudi_pci_irq_vector(hdev, i, false);
 +                      free_irq(irq, &hdev->completion_queue[i]);
 +              }
 +      } else {
 +              free_irq(gaudi_pci_irq_vector(hdev, 0, false), hdev);
 +      }
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      gaudi->hw_cap_initialized &= ~HW_CAP_MSI;
 +}
 +
 +static void gaudi_init_scrambler_sram(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
 +                                              CPU_BOOT_DEV_STS0_SRAM_SCR_EN)
 +              return;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_SRAM_SCRAMBLER)
 +              return;
 +
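+      /*
+       * Enable SRAM scrambling on every NIF/SIF router and DMA_IF down
+       * channel
+       */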
 +      WREG32(mmNIF_RTR_CTRL_0_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_1_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_2_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_3_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_4_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_5_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_6_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_7_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_0_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_1_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_2_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_3_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_4_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_5_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_6_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_7_SCRAM_SRAM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_SCRAM_SRAM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_SRAM_SCRAMBLER;
 +}
 +
 +static void gaudi_init_scrambler_hbm(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
 +                                      CPU_BOOT_DEV_STS0_DRAM_SCR_EN)
 +              return;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_HBM_SCRAMBLER)
 +              return;
 +
 +      WREG32(mmNIF_RTR_CTRL_0_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_1_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_2_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_3_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_4_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_5_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_6_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_7_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_0_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_1_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_2_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_3_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_4_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_5_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_6_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_7_SCRAM_HBM_EN,
 +                      1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_SCRAM_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_HBM_SCRAMBLER;
 +}
 +
 +static void gaudi_init_e2e(struct hl_device *hdev)
 +{
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
 +                                      CPU_BOOT_DEV_STS0_E2E_CRED_EN)
 +              return;
 +
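+      /*
+       * Program the per-router E2E credit sizes first (several of the HBM
+       * write/read sizes are divided by 8 via the ">> 3"), then enable E2E
+       * for every SIF/NIF router and DMA_IF down channel.
+       */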
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_WR_SIZE, 247 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_RD_SIZE, 785 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_WR_SIZE, 49);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_RD_SIZE, 101);
 +
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_WR_SIZE, 297 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_RD_SIZE, 908 >> 3);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_RD_SIZE, 19);
 +
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_WR_SIZE, 318 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_RD_SIZE, 956 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_WR_SIZE, 79);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_RD_SIZE, 163);
 +
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_WR_SIZE, 176 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_RD_SIZE, 32 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_WR_SIZE, 19);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_RD_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_RD_SIZE, 32);
 +
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_WR_SIZE, 275 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_RD_SIZE, 614 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_WR_SIZE, 1);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_RD_SIZE, 39);
 +
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_WR_SIZE, 318 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_RD_SIZE, 956 >> 3);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_WR_SIZE, 79);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_RD_SIZE, 79);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
 +
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_EN,
 +                      1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_EN,
 +                      1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
 +      WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_EN,
 +                      1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
 +}
 +
 +static void gaudi_init_hbm_cred(struct hl_device *hdev)
 +{
 +      u32 hbm0_wr, hbm1_wr, hbm0_rd, hbm1_rd;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
 +                                              CPU_BOOT_DEV_STS0_HBM_CRED_EN)
 +              return;
 +
 +      hbm0_wr = 0x33333333;
 +      hbm0_rd = 0x77777777;
 +      hbm1_wr = 0x55555555;
 +      hbm1_rd = 0xDDDDDDDD;
 +
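+      /*
+       * The same per-channel credit counts are applied to every DMA_IF
+       * instance (E_N, E_S, W_N, W_S), and then read and write credits are
+       * enabled for both HBM channels of each instance.
+       */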
 +      WREG32(mmDMA_IF_E_N_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_E_N_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_E_N_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_E_N_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_E_S_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_E_S_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_E_S_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_E_S_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_W_N_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_W_N_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_W_N_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_W_N_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_W_S_HBM0_WR_CRED_CNT, hbm0_wr);
 +      WREG32(mmDMA_IF_W_S_HBM1_WR_CRED_CNT, hbm1_wr);
 +      WREG32(mmDMA_IF_W_S_HBM0_RD_CRED_CNT, hbm0_rd);
 +      WREG32(mmDMA_IF_W_S_HBM1_RD_CRED_CNT, hbm1_rd);
 +
 +      WREG32(mmDMA_IF_E_N_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_E_S_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_N_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_S_HBM_CRED_EN_0,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +
 +      WREG32(mmDMA_IF_E_N_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_E_S_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_N_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +      WREG32(mmDMA_IF_W_S_HBM_CRED_EN_1,
 +                      (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
 +                      (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
 +}
 +
 +static void gaudi_init_golden_registers(struct hl_device *hdev)
 +{
 +      u32 tpc_offset;
 +      int tpc_id, i;
 +
 +      gaudi_init_e2e(hdev);
 +      gaudi_init_hbm_cred(hdev);
 +
 +      for (tpc_id = 0, tpc_offset = 0;
 +                              tpc_id < TPC_NUMBER_OF_ENGINES;
 +                              tpc_id++, tpc_offset += TPC_CFG_OFFSET) {
 +              /* Mask all arithmetic interrupts from TPC */
 +              WREG32(mmTPC0_CFG_TPC_INTR_MASK + tpc_offset, 0x8FFE);
 +              /* Set 16 cache lines */
 +              WREG32_FIELD(TPC0_CFG_MSS_CONFIG, tpc_offset,
 +                              ICACHE_FETCH_LINE_NUM, 2);
 +      }
 +
+      /* Make sure the first 128 bytes in SRAM are 0 for Tensor DMA */
 +      for (i = 0 ; i < 128 ; i += 8)
 +              writeq(0, hdev->pcie_bar[SRAM_BAR_ID] + i);
 +
 +      WREG32(mmMME0_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +      WREG32(mmMME1_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +      WREG32(mmMME2_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +      WREG32(mmMME3_CTRL_EUS_ROLLUP_CNT_ADD, 3);
 +}
 +
 +static void gaudi_init_pci_dma_qman(struct hl_device *hdev, int dma_id,
 +                                      int qman_id, dma_addr_t qman_pq_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 q_off, dma_qm_offset;
 +      u32 dma_qm_err_cfg, irq_handler_offset;
 +
 +      dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +
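+      /*
+       * Cache the 32-bit halves of the east-north (EN) and west-south (WS)
+       * sync manager monitor-payload and sync-object base addresses; they
+       * are programmed into the CP MSG_BASE registers below.
+       */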
 +      mtr_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
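+      /*
+       * The per-stream QM registers are laid out as arrays of u32s, so
+       * qman_id * 4 selects this stream's copy of each register.
+       */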
 +      q_off = dma_qm_offset + qman_id * 4;
 +
 +      WREG32(mmDMA0_QM_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_pq_addr));
 +      WREG32(mmDMA0_QM_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_pq_addr));
 +
 +      WREG32(mmDMA0_QM_PQ_SIZE_0 + q_off, ilog2(HL_QUEUE_LENGTH));
 +      WREG32(mmDMA0_QM_PQ_PI_0 + q_off, 0);
 +      WREG32(mmDMA0_QM_PQ_CI_0 + q_off, 0);
 +
 +      WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off, QMAN_LDMA_SIZE_OFFSET);
 +      WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +      WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off, mtr_base_ws_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off, mtr_base_ws_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off, so_base_ws_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off, so_base_ws_hi);
 +
 +      WREG32(mmDMA0_QM_CP_BARRIER_CFG_0 + q_off, 0x100);
 +
 +      /* The following configuration is needed only once per QMAN */
 +      if (qman_id == 0) {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
 +
 +              /* Configure RAZWI IRQ */
 +              dma_qm_err_cfg = PCI_DMA_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      dma_qm_err_cfg |=
 +                              PCI_DMA_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_CFG + dma_qm_offset, dma_qm_err_cfg);
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_LO + dma_qm_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_HI + dma_qm_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_WDATA + dma_qm_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_DMA0_QM].cpu_id +
 +                                                                      dma_id);
 +
 +              WREG32(mmDMA0_QM_ARB_ERR_MSG_EN + dma_qm_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmDMA0_QM_ARB_SLV_CHOISE_WDT + dma_qm_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmDMA0_QM_GLBL_PROT + dma_qm_offset,
 +                              QMAN_EXTERNAL_MAKE_TRUSTED);
 +
 +              WREG32(mmDMA0_QM_GLBL_CFG1 + dma_qm_offset, 0);
 +      }
 +}
 +
 +static void gaudi_init_dma_core(struct hl_device *hdev, int dma_id)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 dma_err_cfg = 1 << DMA0_CORE_ERR_CFG_ERR_MSG_EN_SHIFT;
 +      u32 dma_offset = dma_id * DMA_CORE_OFFSET;
 +      u32 irq_handler_offset;
 +
 +      /* Set to maximum possible according to physical size */
 +      WREG32(mmDMA0_CORE_RD_MAX_OUTSTAND + dma_offset, 0);
 +      WREG32(mmDMA0_CORE_RD_MAX_SIZE + dma_offset, 0);
 +
 +      /* WA for H/W bug H3-2116 */
 +      WREG32(mmDMA0_CORE_LBW_MAX_OUTSTAND + dma_offset, 15);
 +
+      /* The STOP_ON bit means the operation gets no completion in case of a RAZWI error */
 +      if (hdev->stop_on_err)
 +              dma_err_cfg |= 1 << DMA0_CORE_ERR_CFG_STOP_ON_ERR_SHIFT;
 +
 +      WREG32(mmDMA0_CORE_ERR_CFG + dma_offset, dma_err_cfg);
 +
 +      irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                      mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                      le32_to_cpu(dyn_regs->gic_dma_core_irq_ctrl);
 +
 +      WREG32(mmDMA0_CORE_ERRMSG_ADDR_LO + dma_offset,
 +              lower_32_bits(CFG_BASE + irq_handler_offset));
 +      WREG32(mmDMA0_CORE_ERRMSG_ADDR_HI + dma_offset,
 +              upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      WREG32(mmDMA0_CORE_ERRMSG_WDATA + dma_offset,
 +              gaudi_irq_map_table[GAUDI_EVENT_DMA0_CORE].cpu_id + dma_id);
 +      WREG32(mmDMA0_CORE_PROT + dma_offset,
 +                      1 << DMA0_CORE_PROT_ERR_VAL_SHIFT);
 +      /* If the channel is secured, it should be in MMU bypass mode */
 +      WREG32(mmDMA0_CORE_SECURE_PROPS + dma_offset,
 +                      1 << DMA0_CORE_SECURE_PROPS_MMBP_SHIFT);
 +      WREG32(mmDMA0_CORE_CFG_0 + dma_offset, 1 << DMA0_CORE_CFG_0_EN_SHIFT);
 +}
 +
 +static void gaudi_enable_qman(struct hl_device *hdev, int dma_id,
 +                              u32 enable_mask)
 +{
 +      u32 dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +
 +      WREG32(mmDMA0_QM_GLBL_CFG0 + dma_qm_offset, enable_mask);
 +}
 +
 +static void gaudi_init_pci_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct hl_hw_queue *q;
 +      int i, j, dma_id, cpu_skip, nic_skip, cq_id = 0, q_idx, msi_vec = 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_PCI_DMA)
 +              return;
 +
 +      for (i = 0 ; i < PCI_DMA_NUMBER_OF_CHNLS ; i++) {
 +              dma_id = gaudi_dma_assignment[i];
+              /*
+               * Queues that come after the CPU queue need an extra offset of
+               * 1 to get the correct queue index. In addition, the CPU EQ
+               * and NIC IRQ vectors must be skipped over in order to get the
+               * correct MSI register.
+               */
 +              if (dma_id > 1) {
 +                      cpu_skip = 1;
 +                      nic_skip = NIC_NUMBER_OF_ENGINES;
 +              } else {
 +                      cpu_skip = 0;
 +                      nic_skip = 0;
 +              }
 +
 +              for (j = 0 ; j < QMAN_STREAMS ; j++) {
 +                      q_idx = 4 * dma_id + j + cpu_skip;
 +                      q = &hdev->kernel_queues[q_idx];
 +                      q->cq_id = cq_id++;
 +                      q->msi_vec = nic_skip + cpu_skip + msi_vec++;
 +                      gaudi_init_pci_dma_qman(hdev, dma_id, j,
 +                                              q->bus_address);
 +              }
 +
 +              gaudi_init_dma_core(hdev, dma_id);
 +
 +              gaudi_enable_qman(hdev, dma_id, PCI_DMA_QMAN_ENABLE);
 +      }
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_PCI_DMA;
 +}
 +
 +static void gaudi_init_hbm_dma_qman(struct hl_device *hdev, int dma_id,
 +                                      int qman_id, u64 qman_base_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 dma_qm_err_cfg, irq_handler_offset;
 +      u32 q_off, dma_qm_offset;
 +
 +      dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +
 +      mtr_base_en_lo = lower_32_bits(CFG_BASE +
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = dma_qm_offset + qman_id * 4;
 +
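+      /*
+       * qman_id 0-3 are the four upper CPs (one per stream), each with its
+       * own PQ; qman_id 4 is the lower CP, which also gets the per-QMAN
+       * error, arbitration and protection configuration.
+       */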
 +      if (qman_id < 4) {
 +              WREG32(mmDMA0_QM_PQ_BASE_LO_0 + q_off,
 +                                      lower_32_bits(qman_base_addr));
 +              WREG32(mmDMA0_QM_PQ_BASE_HI_0 + q_off,
 +                                      upper_32_bits(qman_base_addr));
 +
 +              WREG32(mmDMA0_QM_PQ_SIZE_0 + q_off, ilog2(HBM_DMA_QMAN_LENGTH));
 +              WREG32(mmDMA0_QM_PQ_PI_0 + q_off, 0);
 +              WREG32(mmDMA0_QM_PQ_CI_0 + q_off, 0);
 +
 +              WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SIZE_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SRC_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_DST_OFFSET);
 +      } else {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
 +
 +              WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +              WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +              /* Configure RAZWI IRQ */
 +              dma_qm_err_cfg = HBM_DMA_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      dma_qm_err_cfg |=
 +                              HBM_DMA_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_CFG + dma_qm_offset, dma_qm_err_cfg);
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_LO + dma_qm_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmDMA0_QM_GLBL_ERR_ADDR_HI + dma_qm_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmDMA0_QM_GLBL_ERR_WDATA + dma_qm_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_DMA0_QM].cpu_id +
 +                                                                      dma_id);
 +
 +              WREG32(mmDMA0_QM_ARB_ERR_MSG_EN + dma_qm_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmDMA0_QM_ARB_SLV_CHOISE_WDT + dma_qm_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmDMA0_QM_GLBL_CFG1 + dma_qm_offset, 0);
 +              WREG32(mmDMA0_QM_GLBL_PROT + dma_qm_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +
 +      /* Configure DMA5 CP_MSG_BASE 2/3 for sync stream collective */
 +      if (gaudi_dma_assignment[dma_id] == GAUDI_ENGINE_ID_DMA_5) {
 +              WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off,
 +                              mtr_base_ws_lo);
 +              WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off,
 +                              mtr_base_ws_hi);
 +              WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off,
 +                              so_base_ws_lo);
 +              WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off,
 +                              so_base_ws_hi);
 +      }
 +}
 +
 +static void gaudi_init_hbm_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      int i, j, dma_id, internal_q_index;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_HBM_DMA)
 +              return;
 +
 +      for (i = 0 ; i < HBM_DMA_NUMBER_OF_CHNLS ; i++) {
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_1 + i];
 +
 +              for (j = 0 ; j < QMAN_STREAMS ; j++) {
+                       /*
+                        * Account for the CPU queue in order to get the
+                        * correct queue number, as all internal queues are
+                        * placed after it
+                        */
 +                      internal_q_index = dma_id * QMAN_STREAMS + j + 1;
 +
 +                      q = &gaudi->internal_qmans[internal_q_index];
 +                      qman_base_addr = (u64) q->pq_dma_addr;
 +                      gaudi_init_hbm_dma_qman(hdev, dma_id, j,
 +                                              qman_base_addr);
 +              }
 +
 +              /* Initializing lower CP for HBM DMA QMAN */
 +              gaudi_init_hbm_dma_qman(hdev, dma_id, 4, 0);
 +
 +              gaudi_init_dma_core(hdev, dma_id);
 +
 +              gaudi_enable_qman(hdev, dma_id, HBM_DMA_QMAN_ENABLE);
 +      }
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_HBM_DMA;
 +}
 +
 +static void gaudi_init_mme_qman(struct hl_device *hdev, u32 mme_offset,
 +                                      int qman_id, u64 qman_base_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 irq_handler_offset;
 +      u32 q_off, mme_id;
 +      u32 mme_qm_err_cfg;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = mme_offset + qman_id * 4;
 +
 +      if (qman_id < 4) {
 +              WREG32(mmMME0_QM_PQ_BASE_LO_0 + q_off,
 +                                      lower_32_bits(qman_base_addr));
 +              WREG32(mmMME0_QM_PQ_BASE_HI_0 + q_off,
 +                                      upper_32_bits(qman_base_addr));
 +
 +              WREG32(mmMME0_QM_PQ_SIZE_0 + q_off, ilog2(MME_QMAN_LENGTH));
 +              WREG32(mmMME0_QM_PQ_PI_0 + q_off, 0);
 +              WREG32(mmMME0_QM_PQ_CI_0 + q_off, 0);
 +
 +              WREG32(mmMME0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SIZE_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SRC_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_DST_OFFSET);
 +      } else {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_mme_qm_irq_ctrl);
 +
 +              WREG32(mmMME0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +              WREG32(mmMME0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +              /* Configure RAZWI IRQ */
 +              mme_id = mme_offset /
 +                              (mmMME1_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0) / 2;
 +
 +              mme_qm_err_cfg = MME_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      mme_qm_err_cfg |=
 +                              MME_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmMME0_QM_GLBL_ERR_CFG + mme_offset, mme_qm_err_cfg);
 +
 +              WREG32(mmMME0_QM_GLBL_ERR_ADDR_LO + mme_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmMME0_QM_GLBL_ERR_ADDR_HI + mme_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmMME0_QM_GLBL_ERR_WDATA + mme_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_MME0_QM].cpu_id +
 +                                                                      mme_id);
 +
 +              WREG32(mmMME0_QM_ARB_ERR_MSG_EN + mme_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmMME0_QM_ARB_SLV_CHOISE_WDT + mme_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmMME0_QM_GLBL_CFG1 + mme_offset, 0);
 +              WREG32(mmMME0_QM_GLBL_PROT + mme_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +
 +      WREG32(mmMME0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_lo);
 +      WREG32(mmMME0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_hi);
 +      WREG32(mmMME0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_lo);
 +      WREG32(mmMME0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_hi);
 +}
 +
 +static void gaudi_init_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      u32 mme_offset;
 +      int i, internal_q_index;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MME)
 +              return;
 +
 +      /*
 +       * map GAUDI_QUEUE_ID_MME_0_X to the N_W_MME (mmMME2_QM_BASE)
 +       * and GAUDI_QUEUE_ID_MME_1_X to the S_W_MME (mmMME0_QM_BASE)
 +       */
 +
 +      mme_offset = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0;
 +
 +      for (i = 0 ; i < MME_NUMBER_OF_QMANS ; i++) {
 +              internal_q_index = GAUDI_QUEUE_ID_MME_0_0 + i;
 +              q = &gaudi->internal_qmans[internal_q_index];
 +              qman_base_addr = (u64) q->pq_dma_addr;
 +              gaudi_init_mme_qman(hdev, mme_offset, (i & 0x3),
 +                                      qman_base_addr);
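+              /*
+               * Once the four streams of the N_W MME (MME2) are set up,
+               * switch to the S_W MME (MME0) for the GAUDI_QUEUE_ID_MME_1_X
+               * streams
+               */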
 +              if (i == 3)
 +                      mme_offset = 0;
 +      }
 +
 +      /* Initializing lower CP for MME QMANs */
 +      mme_offset = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0;
 +      gaudi_init_mme_qman(hdev, mme_offset, 4, 0);
 +      gaudi_init_mme_qman(hdev, 0, 4, 0);
 +
 +      WREG32(mmMME2_QM_GLBL_CFG0, QMAN_MME_ENABLE);
 +      WREG32(mmMME0_QM_GLBL_CFG0, QMAN_MME_ENABLE);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_MME;
 +}
 +
 +static void gaudi_init_tpc_qman(struct hl_device *hdev, u32 tpc_offset,
 +                              int qman_id, u64 qman_base_addr)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 tpc_qm_err_cfg, irq_handler_offset;
 +      u32 q_off, tpc_id;
 +
 +      mtr_base_en_lo = lower_32_bits(CFG_BASE +
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = tpc_offset + qman_id * 4;
 +
 +      tpc_id = tpc_offset /
 +                      (mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0);
 +
 +      if (qman_id < 4) {
 +              WREG32(mmTPC0_QM_PQ_BASE_LO_0 + q_off,
 +                                      lower_32_bits(qman_base_addr));
 +              WREG32(mmTPC0_QM_PQ_BASE_HI_0 + q_off,
 +                                      upper_32_bits(qman_base_addr));
 +
 +              WREG32(mmTPC0_QM_PQ_SIZE_0 + q_off, ilog2(TPC_QMAN_LENGTH));
 +              WREG32(mmTPC0_QM_PQ_PI_0 + q_off, 0);
 +              WREG32(mmTPC0_QM_PQ_CI_0 + q_off, 0);
 +
 +              WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SIZE_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_SRC_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_CPDMA_DST_OFFSET);
 +      } else {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_tpc_qm_irq_ctrl);
 +
 +              WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +              WREG32(mmTPC0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +              /* Configure RAZWI IRQ */
 +              tpc_qm_err_cfg = TPC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      tpc_qm_err_cfg |=
 +                              TPC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmTPC0_QM_GLBL_ERR_CFG + tpc_offset, tpc_qm_err_cfg);
 +
 +              WREG32(mmTPC0_QM_GLBL_ERR_ADDR_LO + tpc_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmTPC0_QM_GLBL_ERR_ADDR_HI + tpc_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmTPC0_QM_GLBL_ERR_WDATA + tpc_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_TPC0_QM].cpu_id +
 +                                                                      tpc_id);
 +
 +              WREG32(mmTPC0_QM_ARB_ERR_MSG_EN + tpc_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmTPC0_QM_ARB_SLV_CHOISE_WDT + tpc_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmTPC0_QM_GLBL_CFG1 + tpc_offset, 0);
 +              WREG32(mmTPC0_QM_GLBL_PROT + tpc_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +
 +      /* Configure TPC7 CP_MSG_BASE 2/3 for sync stream collective */
 +      if (tpc_id == 6) {
 +              WREG32(mmTPC0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off,
 +                              mtr_base_ws_lo);
 +              WREG32(mmTPC0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off,
 +                              mtr_base_ws_hi);
 +              WREG32(mmTPC0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off,
 +                              so_base_ws_lo);
 +              WREG32(mmTPC0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off,
 +                              so_base_ws_hi);
 +      }
 +}
 +
 +static void gaudi_init_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      u32 so_base_hi, tpc_offset = 0;
 +      u32 tpc_delta = mmTPC1_CFG_SM_BASE_ADDRESS_HIGH -
 +                      mmTPC0_CFG_SM_BASE_ADDRESS_HIGH;
 +      int i, tpc_id, internal_q_index;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_TPC_MASK)
 +              return;
 +
 +      so_base_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
 +              for (i = 0 ; i < QMAN_STREAMS ; i++) {
 +                      internal_q_index = GAUDI_QUEUE_ID_TPC_0_0 +
 +                                              tpc_id * QMAN_STREAMS + i;
 +                      q = &gaudi->internal_qmans[internal_q_index];
 +                      qman_base_addr = (u64) q->pq_dma_addr;
 +                      gaudi_init_tpc_qman(hdev, tpc_offset, i,
 +                                              qman_base_addr);
 +
 +                      if (i == 3) {
 +                              /* Initializing lower CP for TPC QMAN */
 +                              gaudi_init_tpc_qman(hdev, tpc_offset, 4, 0);
 +
 +                              /* Enable the QMAN and TPC channel */
 +                              WREG32(mmTPC0_QM_GLBL_CFG0 + tpc_offset,
 +                                              QMAN_TPC_ENABLE);
 +                      }
 +              }
 +
 +              WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_HIGH + tpc_id * tpc_delta,
 +                              so_base_hi);
 +
 +              tpc_offset += mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0;
 +
 +              gaudi->hw_cap_initialized |=
 +                              FIELD_PREP(HW_CAP_TPC_MASK, 1 << tpc_id);
 +      }
 +}
 +
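 +/*
 + * Configure a single NIC QMAN stream: PQ base/size, LDMA offsets and the
 + * CP_MSG_BASE registers used by the sync stream. For stream 0 the function
 + * also routes the RAZWI/error interrupt, sets the arbiter watchdog timeout
 + * and marks the QMAN as trusted.
 + */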
 +static void gaudi_init_nic_qman(struct hl_device *hdev, u32 nic_offset,
 +                              int qman_id, u64 qman_base_addr, int nic_id)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
 +      u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
 +      u32 nic_qm_err_cfg, irq_handler_offset;
 +      u32 q_off;
 +
 +      mtr_base_en_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_en_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_en_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      mtr_base_ws_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_ws_lo = lower_32_bits((CFG_BASE & U32_MAX) +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_ws_hi = upper_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      q_off = nic_offset + qman_id * 4;
 +
 +      WREG32(mmNIC0_QM0_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_base_addr));
 +      WREG32(mmNIC0_QM0_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_base_addr));
 +
 +      WREG32(mmNIC0_QM0_PQ_SIZE_0 + q_off, ilog2(NIC_QMAN_LENGTH));
 +      WREG32(mmNIC0_QM0_PQ_PI_0 + q_off, 0);
 +      WREG32(mmNIC0_QM0_PQ_CI_0 + q_off, 0);
 +
 +      WREG32(mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SIZE_OFFSET);
 +      WREG32(mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_SRC_OFFSET);
 +      WREG32(mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
 +                                                      QMAN_LDMA_DST_OFFSET);
 +
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
 +
 +      /* Configure NIC CP_MSG_BASE 2/3 for sync stream collective */
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_0 + q_off, mtr_base_ws_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_0 + q_off, mtr_base_ws_hi);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_0 + q_off, so_base_ws_lo);
 +      WREG32(mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_0 + q_off, so_base_ws_hi);
 +
 +      if (qman_id == 0) {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_nic_qm_irq_ctrl);
 +
 +              /* Configure RAZWI IRQ */
 +              nic_qm_err_cfg = NIC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
 +              if (hdev->stop_on_err)
 +                      nic_qm_err_cfg |=
 +                              NIC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
 +
 +              WREG32(mmNIC0_QM0_GLBL_ERR_CFG + nic_offset, nic_qm_err_cfg);
 +
 +              WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_LO + nic_offset,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +              WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_HI + nic_offset,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +              WREG32(mmNIC0_QM0_GLBL_ERR_WDATA + nic_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_NIC0_QM0].cpu_id +
 +                                                                      nic_id);
 +
 +              WREG32(mmNIC0_QM0_ARB_ERR_MSG_EN + nic_offset,
 +                              QM_ARB_ERR_MSG_EN_MASK);
 +
 +              /* Set timeout to maximum */
 +              WREG32(mmNIC0_QM0_ARB_SLV_CHOISE_WDT + nic_offset, GAUDI_ARB_WDT_TIMEOUT);
 +
 +              WREG32(mmNIC0_QM0_GLBL_CFG1 + nic_offset, 0);
 +              WREG32(mmNIC0_QM0_GLBL_PROT + nic_offset,
 +                              QMAN_INTERNAL_MAKE_TRUSTED);
 +      }
 +}
 +
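 +/*
 + * Initialize the QMANs of every NIC port that is enabled in nic_ports_mask.
 + * There are two QMANs per NIC macro, so the register offset is advanced by
 + * the inter-QMAN delta for every port and re-based on the next NIC macro
 + * after each odd port.
 + */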
 +static void gaudi_init_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +      u64 qman_base_addr;
 +      u32 nic_offset = 0;
 +      u32 nic_delta_between_qmans =
 +                      mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      u32 nic_delta_between_nics =
 +                      mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      int i, nic_id, internal_q_index;
 +
 +      if (!hdev->nic_ports_mask)
 +              return;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC_MASK)
 +              return;
 +
 +      dev_dbg(hdev->dev, "Initializing NIC QMANs\n");
 +
 +      for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
 +              if (!(hdev->nic_ports_mask & (1 << nic_id))) {
 +                      nic_offset += nic_delta_between_qmans;
 +                      if (nic_id & 1) {
 +                              nic_offset -= (nic_delta_between_qmans * 2);
 +                              nic_offset += nic_delta_between_nics;
 +                      }
 +                      continue;
 +              }
 +
 +              for (i = 0 ; i < QMAN_STREAMS ; i++) {
 +                      internal_q_index = GAUDI_QUEUE_ID_NIC_0_0 +
 +                                              nic_id * QMAN_STREAMS + i;
 +                      q = &gaudi->internal_qmans[internal_q_index];
 +                      qman_base_addr = (u64) q->pq_dma_addr;
 +                      gaudi_init_nic_qman(hdev, nic_offset, (i & 0x3),
 +                                              qman_base_addr, nic_id);
 +              }
 +
 +              /* Enable the QMAN */
 +              WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, NIC_QMAN_ENABLE);
 +
 +              nic_offset += nic_delta_between_qmans;
 +              if (nic_id & 1) {
 +                      nic_offset -= (nic_delta_between_qmans * 2);
 +                      nic_offset += nic_delta_between_nics;
 +              }
 +
 +              gaudi->hw_cap_initialized |= 1 << (HW_CAP_NIC_SHIFT + nic_id);
 +      }
 +}
 +
 +static void gaudi_disable_pci_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
 +              return;
 +
 +      WREG32(mmDMA0_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA1_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA5_QM_GLBL_CFG0, 0);
 +}
 +
 +static void gaudi_disable_hbm_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
 +              return;
 +
 +      WREG32(mmDMA2_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA3_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA4_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA6_QM_GLBL_CFG0, 0);
 +      WREG32(mmDMA7_QM_GLBL_CFG0, 0);
 +}
 +
 +static void gaudi_disable_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      WREG32(mmMME2_QM_GLBL_CFG0, 0);
 +      WREG32(mmMME0_QM_GLBL_CFG0, 0);
 +}
 +
 +static void gaudi_disable_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 tpc_offset = 0;
 +      int tpc_id;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
 +              WREG32(mmTPC0_QM_GLBL_CFG0 + tpc_offset, 0);
 +              tpc_offset += mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0;
 +      }
 +}
 +
 +static void gaudi_disable_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 nic_mask, nic_offset = 0;
 +      u32 nic_delta_between_qmans =
 +                      mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      u32 nic_delta_between_nics =
 +                      mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
 +      int nic_id;
 +
 +      for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
 +              nic_mask = 1 << (HW_CAP_NIC_SHIFT + nic_id);
 +
 +              if (gaudi->hw_cap_initialized & nic_mask)
 +                      WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, 0);
 +
 +              nic_offset += nic_delta_between_qmans;
 +              if (nic_id & 1) {
 +                      nic_offset -= (nic_delta_between_qmans * 2);
 +                      nic_offset += nic_delta_between_nics;
 +              }
 +      }
 +}
 +
 +static void gaudi_stop_pci_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
 +              return;
 +
 +      /* Stop upper CPs of QMANs 0.0 to 1.3 and 5.0 to 5.3 */
 +      WREG32(mmDMA0_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA1_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA5_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_hbm_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
 +              return;
 +
 +      /* Stop CPs of HBM DMA QMANs */
 +
 +      WREG32(mmDMA2_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA3_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA4_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA6_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmDMA7_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      /* Stop CPs of MME QMANs */
 +      WREG32(mmMME2_QM_GLBL_CFG1, 0x1F << MME0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmMME0_QM_GLBL_CFG1, 0x1F << MME0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      WREG32(mmTPC0_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC1_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC2_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC3_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC4_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC5_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC6_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +      WREG32(mmTPC7_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +}
 +
 +static void gaudi_stop_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      /* Stop upper CPs of QMANs */
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC0)
 +              WREG32(mmNIC0_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC1)
 +              WREG32(mmNIC0_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC2)
 +              WREG32(mmNIC1_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC3)
 +              WREG32(mmNIC1_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC4)
 +              WREG32(mmNIC2_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC5)
 +              WREG32(mmNIC2_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC6)
 +              WREG32(mmNIC3_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC7)
 +              WREG32(mmNIC3_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC8)
 +              WREG32(mmNIC4_QM0_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC9)
 +              WREG32(mmNIC4_QM1_GLBL_CFG1,
 +                              NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
 +                              NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
 +}
 +
 +static void gaudi_pci_dma_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
 +              return;
 +
 +      WREG32(mmDMA0_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA1_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA5_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +}
 +
 +static void gaudi_hbm_dma_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
 +              return;
 +
 +      WREG32(mmDMA2_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA3_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA4_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA6_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +      WREG32(mmDMA7_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
 +}
 +
 +static void gaudi_mme_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      /* WA for H3-1800 bug: do ACC and SBAB writes twice */
 +      WREG32(mmMME0_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME0_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME0_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME0_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME1_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME1_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME1_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME1_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME2_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME2_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME2_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME2_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME3_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME3_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
 +      WREG32(mmMME3_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +      WREG32(mmMME3_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
 +}
 +
 +static void gaudi_tpc_stall(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      WREG32(mmTPC0_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC1_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC2_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC3_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC4_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC5_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC6_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC7_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +}
 +
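 +/*
 + * Clear the clock-gating configuration of all DMA, MME and TPC QMANs. This
 + * is skipped when the registers are protected by firmware security.
 + */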
 +static void gaudi_disable_clock_gating(struct hl_device *hdev)
 +{
 +      u32 qman_offset;
 +      int i;
 +
 +      if (hdev->asic_prop.fw_security_enabled)
 +              return;
 +
 +      for (i = 0, qman_offset = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
 +              WREG32(mmDMA0_QM_CGM_CFG + qman_offset, 0);
 +              WREG32(mmDMA0_QM_CGM_CFG1 + qman_offset, 0);
 +
 +              qman_offset += (mmDMA1_QM_CGM_CFG - mmDMA0_QM_CGM_CFG);
 +      }
 +
 +      WREG32(mmMME0_QM_CGM_CFG, 0);
 +      WREG32(mmMME0_QM_CGM_CFG1, 0);
 +      WREG32(mmMME2_QM_CGM_CFG, 0);
 +      WREG32(mmMME2_QM_CGM_CFG1, 0);
 +
 +      for (i = 0, qman_offset = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
 +              WREG32(mmTPC0_QM_CGM_CFG + qman_offset, 0);
 +              WREG32(mmTPC0_QM_CGM_CFG1 + qman_offset, 0);
 +
 +              qman_offset += (mmTPC1_QM_CGM_CFG - mmTPC0_QM_CGM_CFG);
 +      }
 +}
 +
 +static void gaudi_enable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +
 +      /* Zero the lower/upper parts of the 64-bit counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
 +
 +      /* Enable the counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
 +}
 +
 +static void gaudi_disable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +}
 +
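 +/*
 + * Halt the compute and NIC engines in three passes: stop the QMAN CPs,
 + * stall the engine cores, and finally disable the QMANs, with a reset-wait
 + * delay between the passes. When the reset is performed by the firmware,
 + * only the MSI is disabled.
 + */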
 +static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      u32 wait_timeout_ms;
 +
 +      if (hdev->pldm)
 +              wait_timeout_ms = GAUDI_PLDM_RESET_WAIT_MSEC;
 +      else
 +              wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
 +
 +      if (fw_reset)
 +              goto skip_engines;
 +
 +      gaudi_stop_nic_qmans(hdev);
 +      gaudi_stop_mme_qmans(hdev);
 +      gaudi_stop_tpc_qmans(hdev);
 +      gaudi_stop_hbm_dma_qmans(hdev);
 +      gaudi_stop_pci_dma_qmans(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      gaudi_pci_dma_stall(hdev);
 +      gaudi_hbm_dma_stall(hdev);
 +      gaudi_tpc_stall(hdev);
 +      gaudi_mme_stall(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      gaudi_disable_nic_qmans(hdev);
 +      gaudi_disable_mme_qmans(hdev);
 +      gaudi_disable_tpc_qmans(hdev);
 +      gaudi_disable_hbm_dma_qmans(hdev);
 +      gaudi_disable_pci_dma_qmans(hdev);
 +
 +      gaudi_disable_timestamp(hdev);
 +
 +skip_engines:
 +      gaudi_disable_msi(hdev);
 +}
 +
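 +/*
 + * Set up the device MMU: program the hop-0 table address for every ASID,
 + * configure the STLB cache-management page, invalidate the MMU cache and
 + * enable the MMU. The cache-invalidation PI starts at 1 by H/W convention.
 + */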
 +static int gaudi_mmu_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u64 hop0_addr;
 +      int rc, i;
 +
 +      if (!hdev->mmu_enable)
 +              return 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      for (i = 0 ; i < prop->max_asid ; i++) {
 +              hop0_addr = prop->mmu_pgt_addr +
 +                              (i * prop->mmu_hop_table_size);
 +
 +              rc = gaudi_mmu_update_asid_hop0_addr(hdev, i, hop0_addr);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "failed to set hop0 addr for asid %d\n", i);
 +                      goto err;
 +              }
 +      }
 +
 +      /* init MMU cache management page */
 +      WREG32(mmSTLB_CACHE_INV_BASE_39_8, prop->mmu_cache_mng_addr >> 8);
 +      WREG32(mmSTLB_CACHE_INV_BASE_49_40, prop->mmu_cache_mng_addr >> 40);
 +
 +      /* mem cache invalidation */
 +      WREG32(mmSTLB_MEM_CACHE_INVALIDATION, 1);
 +
 +      hl_mmu_invalidate_cache(hdev, true, 0);
 +
 +      WREG32(mmMMU_UP_MMU_ENABLE, 1);
 +      WREG32(mmMMU_UP_SPI_MASK, 0xF);
 +
 +      WREG32(mmSTLB_HOP_CONFIGURATION, 0x30440);
 +
 +      /*
 +       * The H/W expects the first PI after init to be 1. After wraparound
 +       * we'll write 0.
 +       */
 +      gaudi->mmu_cache_inv_pi = 1;
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_MMU;
 +
 +      return 0;
 +
 +err:
 +      return rc;
 +}
 +
 +static int gaudi_load_firmware_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[HBM_BAR_ID] + LINUX_FW_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GAUDI_LINUX_FW_FILE, dst, 0, 0);
 +}
 +
 +static int gaudi_load_boot_fit_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[SRAM_BAR_ID] + BOOT_FIT_SRAM_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GAUDI_BOOT_FIT_FILE, dst, 0, 0);
 +}
 +
 +static void gaudi_init_dynamic_firmware_loader(struct hl_device *hdev)
 +{
 +      struct dynamic_fw_load_mgr *dynamic_loader;
 +      struct cpu_dyn_regs *dyn_regs;
 +
 +      dynamic_loader = &hdev->fw_loader.dynamic_loader;
 +
 +      /*
 +       * Set initial values for a few specific dynamic regs here, since they
 +       * must be hard-coded before the first descriptor is read from the FW.
 +       * In later stages of the protocol these values are updated
 +       * automatically by reading the FW descriptor, so the data there is
 +       * always up-to-date.
 +       */
 +      dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
 +      dyn_regs->kmd_msg_to_cpu =
 +                              cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
 +      dyn_regs->cpu_cmd_status_to_host =
 +                              cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
 +
 +      dynamic_loader->wait_for_bl_timeout = GAUDI_WAIT_FOR_BL_TIMEOUT_USEC;
 +}
 +
 +static void gaudi_init_static_firmware_loader(struct hl_device *hdev)
 +{
 +      struct static_fw_load_mgr *static_loader;
 +
 +      static_loader = &hdev->fw_loader.static_loader;
 +
 +      static_loader->preboot_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->boot_fit_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->kmd_msg_to_cpu_reg = mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU;
 +      static_loader->cpu_cmd_status_to_host_reg = mmCPU_CMD_STATUS_TO_HOST;
 +      static_loader->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      static_loader->cpu_boot_dev_status0_reg = mmCPU_BOOT_DEV_STS0;
 +      static_loader->cpu_boot_dev_status1_reg = mmCPU_BOOT_DEV_STS1;
 +      static_loader->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      static_loader->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      static_loader->preboot_version_offset_reg = mmPREBOOT_VER_OFFSET;
 +      static_loader->boot_fit_version_offset_reg = mmUBOOT_VER_OFFSET;
 +      static_loader->sram_offset_mask = ~(lower_32_bits(SRAM_BASE_ADDR));
 +      static_loader->cpu_reset_wait_msec = hdev->pldm ?
 +                      GAUDI_PLDM_RESET_WAIT_MSEC :
 +                      GAUDI_CPU_RESET_WAIT_MSEC;
 +}
 +
 +static void gaudi_init_firmware_preload_params(struct hl_device *hdev)
 +{
 +      struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
 +
 +      pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
 +      pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
 +      pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      pre_fw_load->wait_for_preboot_timeout = GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC;
 +}
 +
 +static void gaudi_init_firmware_loader(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct fw_load_mgr *fw_loader = &hdev->fw_loader;
 +
 +      /* fill common fields */
 +      fw_loader->fw_comp_loaded = FW_TYPE_NONE;
 +      fw_loader->boot_fit_img.image_name = GAUDI_BOOT_FIT_FILE;
 +      fw_loader->linux_img.image_name = GAUDI_LINUX_FW_FILE;
 +      fw_loader->cpu_timeout = GAUDI_CPU_TIMEOUT_USEC;
 +      fw_loader->boot_fit_timeout = GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC;
 +      fw_loader->skip_bmc = !hdev->bmc_enable;
 +      fw_loader->sram_bar_id = SRAM_BAR_ID;
 +      fw_loader->dram_bar_id = HBM_BAR_ID;
 +
 +      if (prop->dynamic_fw_load)
 +              gaudi_init_dynamic_firmware_loader(hdev);
 +      else
 +              gaudi_init_static_firmware_loader(hdev);
 +}
 +
 +static int gaudi_init_cpu(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
 +              return 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_CPU)
 +              return 0;
 +
 +      /*
 +       * The device CPU works with 40-bit addresses.
 +       * This register sets the extension to 50 bits.
 +       */
 +      if (!hdev->asic_prop.fw_security_enabled)
 +              WREG32(mmCPU_IF_CPU_MSB_ADDR, hdev->cpu_pci_msb_addr);
 +
 +      rc = hl_fw_init_cpu(hdev);
 +
 +      if (rc)
 +              return rc;
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_CPU;
 +
 +      return 0;
 +}
 +
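 +/*
 + * Hand the CPU PQ, event queue and CQ buffer addresses to the device CPU,
 + * signal it through the PI-update interrupt and poll CPU_IF_QUEUE_INIT
 + * until the firmware reports it is ready for the host.
 + */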
 +static int gaudi_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 status, irq_handler_offset;
 +      struct hl_eq *eq;
 +      struct hl_hw_queue *cpu_pq =
 +                      &hdev->kernel_queues[GAUDI_QUEUE_ID_CPU_PQ];
 +      int err;
 +
 +      if (!hdev->cpu_queues_enable)
 +              return 0;
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q)
 +              return 0;
 +
 +      eq = &hdev->event_queue;
 +
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
 +
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
 +
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_LOW,
 +                      lower_32_bits(hdev->cpu_accessible_dma_address));
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_HIGH,
 +                      upper_32_bits(hdev->cpu_accessible_dma_address));
 +
 +      WREG32(mmCPU_IF_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
 +
 +      /* Used for EQ CI */
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, 0);
 +
 +      WREG32(mmCPU_IF_PF_PQ_PI, 0);
 +
 +      if (gaudi->multi_msi_mode)
 +              WREG32(mmCPU_IF_QUEUE_INIT, PQ_INIT_STATUS_READY_FOR_CP);
 +      else
 +              WREG32(mmCPU_IF_QUEUE_INIT,
 +                      PQ_INIT_STATUS_READY_FOR_CP_SINGLE_MSI);
 +
 +      irq_handler_offset = prop->gic_interrupts_enable ?
 +                      mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                      le32_to_cpu(dyn_regs->gic_host_pi_upd_irq);
 +
 +      WREG32(irq_handler_offset,
 +              gaudi_irq_map_table[GAUDI_EVENT_PI_UPDATE].cpu_id);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmCPU_IF_QUEUE_INIT,
 +              status,
 +              (status == PQ_INIT_STATUS_READY_FOR_HOST),
 +              1000,
 +              cpu_timeout);
 +
 +      if (err) {
 +              dev_err(hdev->dev,
 +                      "Failed to communicate with Device CPU (CPU-CP timeout)\n");
 +              return -EIO;
 +      }
 +
 +      /* update FW application security bits */
 +      if (prop->fw_cpu_boot_dev_sts0_valid)
 +              prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
 +      if (prop->fw_cpu_boot_dev_sts1_valid)
 +              prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
 +
 +      gaudi->hw_cap_initialized |= HW_CAP_CPU_Q;
 +      return 0;
 +}
 +
 +static void gaudi_pre_hw_init(struct hl_device *hdev)
 +{
 +      /* Perform read from the device to make sure device is up */
 +      RREG32(mmHW_STATE);
 +
 +      if (!hdev->asic_prop.fw_security_enabled) {
 +              /* Set the access through PCI bars (Linux driver only) as
 +               * secured
 +               */
 +              WREG32(mmPCIE_WRAP_LBW_PROT_OVR,
 +                              (PCIE_WRAP_LBW_PROT_OVR_RD_EN_MASK |
 +                              PCIE_WRAP_LBW_PROT_OVR_WR_EN_MASK));
 +
 +              /* Perform read to flush the waiting writes to ensure
 +               * configuration was set in the device
 +               */
 +              RREG32(mmPCIE_WRAP_LBW_PROT_OVR);
 +      }
 +
 +      /*
 +       * Let's mark in the H/W that we have reached this point. We check
 +       * this value in the reset_before_init function to understand whether
 +       * we need to reset the chip before doing H/W init. This register is
 +       * cleared by the H/W upon H/W reset
 +       */
 +      WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
 +}
 +
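 +/*
 + * Main H/W initialization flow: map the HBM bar to the DRAM base, bring up
 + * the device CPU, disable clock gating, initialize the scramblers, golden
 + * registers, MMU and security, then initialize all QMANs, enable the
 + * timestamp counter and MSI, and finally bring up the CPU queues.
 + */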
 +static int gaudi_hw_init(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int rc;
 +
 +      gaudi_pre_hw_init(hdev);
 +
 +      /* If iATU is done by FW, the HBM bar ALWAYS points to DRAM_PHYS_BASE.
 +       * So we set it here and if anyone tries to move it later to
 +       * a different address, there will be an error
 +       */
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              gaudi->hbm_bar_cur_addr = DRAM_PHYS_BASE;
 +
 +      /*
 +       * Before pushing u-boot/Linux to the device, the HBM bar must be set
 +       * to the DRAM base address
 +       */
 +      if (gaudi_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
 +              dev_err(hdev->dev,
 +                      "failed to map HBM bar to DRAM base address\n");
 +              return -EIO;
 +      }
 +
 +      rc = gaudi_init_cpu(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU\n");
 +              return rc;
 +      }
 +
 +      /* In case the clock gating was enabled in preboot we need to disable
 +       * it here before touching the MME/TPC registers.
 +       */
 +      gaudi_disable_clock_gating(hdev);
 +
 +      /* SRAM scrambler must be initialized after CPU is running from HBM */
 +      gaudi_init_scrambler_sram(hdev);
 +
 +      /* This is here just in case we are working without CPU */
 +      gaudi_init_scrambler_hbm(hdev);
 +
 +      gaudi_init_golden_registers(hdev);
 +
 +      rc = gaudi_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi_init_security(hdev);
 +
 +      gaudi_init_pci_dma_qmans(hdev);
 +
 +      gaudi_init_hbm_dma_qmans(hdev);
 +
 +      gaudi_init_mme_qmans(hdev);
 +
 +      gaudi_init_tpc_qmans(hdev);
 +
 +      gaudi_init_nic_qmans(hdev);
 +
 +      gaudi_enable_timestamp(hdev);
 +
 +      /* MSI must be enabled before CPU queues and NIC are initialized */
 +      rc = gaudi_enable_msi(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      /* must be called after MSI was enabled */
 +      rc = gaudi_init_cpu_queues(hdev, GAUDI_CPU_TIMEOUT_USEC);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n",
 +                      rc);
 +              goto disable_msi;
 +      }
 +
 +      /* Perform read from the device to flush all configuration */
 +      RREG32(mmHW_STATE);
 +
 +      return 0;
 +
 +disable_msi:
 +      gaudi_disable_msi(hdev);
 +disable_queues:
 +      gaudi_disable_mme_qmans(hdev);
 +      gaudi_disable_pci_dma_qmans(hdev);
 +
 +      return rc;
 +}
 +
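 +/*
 + * Hard-reset flow: ask the device CPU (through the GIC or the legacy
 + * COMMS/MSG_TO_CPU path) to halt, and either let the firmware perform the
 + * reset or drive the PSOC reset registers directly, depending on whether
 + * the driver is allowed to reset the device. Afterwards wait for the reset
 + * to deassert and clear the relevant hw_cap_initialized bits.
 + */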
 +static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 status, reset_timeout_ms, cpu_timeout_ms, irq_handler_offset;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      bool driver_performs_reset;
 +
 +      if (!hard_reset) {
 +              dev_err(hdev->dev, "GAUDI doesn't support soft-reset\n");
 +              return;
 +      }
 +
 +      if (hdev->pldm) {
 +              reset_timeout_ms = GAUDI_PLDM_HRESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GAUDI_PLDM_RESET_WAIT_MSEC;
 +      } else {
 +              reset_timeout_ms = GAUDI_RESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GAUDI_CPU_RESET_WAIT_MSEC;
 +      }
 +
 +      if (fw_reset) {
 +              dev_dbg(hdev->dev,
 +                      "Firmware performs HARD reset, going to wait %dms\n",
 +                      reset_timeout_ms);
 +
 +              goto skip_reset;
 +      }
 +
 +      driver_performs_reset = !!(!hdev->asic_prop.fw_security_enabled &&
 +                                      !hdev->asic_prop.hard_reset_done_by_fw);
 +
 +      /* Set device to handle FLR by H/W as we will put the device CPU to
 +       * halt mode
 +       */
 +      if (driver_performs_reset)
 +              WREG32(mmPCIE_AUX_FLR_CTRL, (PCIE_AUX_FLR_CTRL_HW_CTRL_MASK |
 +                                      PCIE_AUX_FLR_CTRL_INT_MASK_MASK));
 +
 +      /* If linux is loaded in the device CPU we need to communicate with it
 +       * via the GIC. Otherwise, we need to use COMMS or the MSG_TO_CPU
 +       * registers in case of old F/Ws
 +       */
 +      if (hdev->fw_loader.fw_comp_loaded & FW_TYPE_LINUX) {
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_host_halt_irq);
 +
 +              WREG32(irq_handler_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_HALT_MACHINE].cpu_id);
 +
 +              /* This is a hail-mary attempt to revive the card in the small chance that the
 +               * f/w has experienced a watchdog event, which caused it to return to preboot.
 +               * In that case, triggering reset through the GIC won't help. We need to trigger
 +               * the reset as if Linux wasn't loaded.
 +               *
 +               * We do it only if the reset cause was HB, because that would be the indication
 +               * of such an event.
 +               *
 +               * In case the watchdog hasn't expired but we still got HB, then this won't do
 +               * any damage.
 +               */
 +              if (hdev->reset_info.curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT) {
 +                      if (hdev->asic_prop.hard_reset_done_by_fw)
 +                              hl_fw_ask_hard_reset_without_linux(hdev);
 +                      else
 +                              hl_fw_ask_halt_machine_without_linux(hdev);
 +              }
 +      } else {
 +              if (hdev->asic_prop.hard_reset_done_by_fw)
 +                      hl_fw_ask_hard_reset_without_linux(hdev);
 +              else
 +                      hl_fw_ask_halt_machine_without_linux(hdev);
 +      }
 +
 +      if (driver_performs_reset) {
 +
 +              /* Configure the reset registers. Must be done as early as
 +               * possible in case we fail during H/W initialization
 +               */
 +              WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST_CFG_H,
 +                                              (CFG_RST_H_DMA_MASK |
 +                                              CFG_RST_H_MME_MASK |
 +                                              CFG_RST_H_SM_MASK |
 +                                              CFG_RST_H_TPC_7_MASK));
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST_CFG_L, CFG_RST_L_TPC_MASK);
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG_H,
 +                                              (CFG_RST_H_HBM_MASK |
 +                                              CFG_RST_H_TPC_7_MASK |
 +                                              CFG_RST_H_NIC_MASK |
 +                                              CFG_RST_H_SM_MASK |
 +                                              CFG_RST_H_DMA_MASK |
 +                                              CFG_RST_H_MME_MASK |
 +                                              CFG_RST_H_CPU_MASK |
 +                                              CFG_RST_H_MMU_MASK));
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG_L,
 +                                              (CFG_RST_L_IF_MASK |
 +                                              CFG_RST_L_PSOC_MASK |
 +                                              CFG_RST_L_TPC_MASK));
 +
 +              msleep(cpu_timeout_ms);
 +
 +              /* Tell ASIC not to re-initialize PCIe */
 +              WREG32(mmPREBOOT_PCIE_EN, LKD_HARD_RESET_MAGIC);
 +
 +              /* Restart BTL/BLR upon hard-reset */
 +              WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START, 1);
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST,
 +                      1 << PSOC_GLOBAL_CONF_SW_ALL_RST_IND_SHIFT);
 +
 +              dev_dbg(hdev->dev,
 +                      "Issued HARD reset command, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      } else {
 +              dev_dbg(hdev->dev,
 +                      "Firmware performs HARD reset, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      }
 +
 +skip_reset:
 +      /*
 +       * After hard reset, we can't poll the BTM_FSM register because the PSOC
 +       * itself is in reset. Need to wait until the reset is deasserted
 +       */
 +      msleep(reset_timeout_ms);
 +
 +      status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM);
 +      if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK)
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for device to reset 0x%x\n",
 +                      status);
 +
 +      if (gaudi) {
 +              gaudi->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q | HW_CAP_HBM |
 +                                              HW_CAP_PCI_DMA | HW_CAP_MME | HW_CAP_TPC_MASK |
 +                                              HW_CAP_HBM_DMA | HW_CAP_PLL | HW_CAP_NIC_MASK |
 +                                              HW_CAP_MMU | HW_CAP_SRAM_SCRAMBLER |
 +                                              HW_CAP_HBM_SCRAMBLER);
 +
 +              memset(gaudi->events_stat, 0, sizeof(gaudi->events_stat));
 +
 +              hdev->device_cpu_is_halted = false;
 +      }
 +}
 +
 +static int gaudi_suspend(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
 +
 +      return rc;
 +}
 +
 +static int gaudi_resume(struct hl_device *hdev)
 +{
 +      return gaudi_init_iatu(hdev);
 +}
 +
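 +/*
 + * Map CPU-accessible DMA memory to user space. HOST_PHYS_BASE, the
 + * device-side base of host memory, is subtracted to recover the raw host
 + * DMA address expected by dma_mmap_coherent().
 + */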
 +static int gaudi_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                      void *cpu_addr, dma_addr_t dma_addr, size_t size)
 +{
 +      int rc;
 +
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +      rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
 +                              (dma_addr - HOST_PHYS_BASE), size);
 +      if (rc)
 +              dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
 +
 +      return rc;
 +}
 +
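 +/*
 + * Ring a H/W queue doorbell: translate the queue ID to the matching QMAN
 + * PQ_PI register, write the new producer index, and for the CPU PQ also
 + * notify the device CPU through the PI-update interrupt.
 + */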
 +static void gaudi_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 db_reg_offset, db_value, dma_qm_offset, q_off, irq_handler_offset;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      bool invalid_queue = false;
 +      int dma_id;
 +
 +      switch (hw_queue_id) {
 +      case GAUDI_QUEUE_ID_DMA_0_0...GAUDI_QUEUE_ID_DMA_0_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_1];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + (hw_queue_id & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_1_0...GAUDI_QUEUE_ID_DMA_1_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_2];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + (hw_queue_id & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_2_0...GAUDI_QUEUE_ID_DMA_2_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_1];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_3_0...GAUDI_QUEUE_ID_DMA_3_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_2];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_4_0...GAUDI_QUEUE_ID_DMA_4_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_3];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_5_0...GAUDI_QUEUE_ID_DMA_5_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_4];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_5];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3:
 +              dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_6];
 +              dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_CPU_PQ:
 +              if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q)
 +                      db_reg_offset = mmCPU_IF_PF_PQ_PI;
 +              else
 +                      invalid_queue = true;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_0:
 +              db_reg_offset = mmMME2_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_1:
 +              db_reg_offset = mmMME2_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_2:
 +              db_reg_offset = mmMME2_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_0_3:
 +              db_reg_offset = mmMME2_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_0:
 +              db_reg_offset = mmMME0_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_1:
 +              db_reg_offset = mmMME0_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_2:
 +              db_reg_offset = mmMME0_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_MME_1_3:
 +              db_reg_offset = mmMME0_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_0:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_1:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_2:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_0_3:
 +              db_reg_offset = mmTPC0_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_0:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_1:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_2:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_1_3:
 +              db_reg_offset = mmTPC1_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_0:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_1:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_2:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_2_3:
 +              db_reg_offset = mmTPC2_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_0:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_1:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_2:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_3_3:
 +              db_reg_offset = mmTPC3_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_0:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_1:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_2:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_4_3:
 +              db_reg_offset = mmTPC4_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_0:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_1:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_2:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_5_3:
 +              db_reg_offset = mmTPC5_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_0:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_1:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_2:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_6_3:
 +              db_reg_offset = mmTPC6_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_0:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_0;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_1:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_1;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_2:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_2;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_TPC_7_3:
 +              db_reg_offset = mmTPC7_QM_PQ_PI_3;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_0_0...GAUDI_QUEUE_ID_NIC_0_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC0))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC0_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_1_0...GAUDI_QUEUE_ID_NIC_1_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC1))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC0_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_2_0...GAUDI_QUEUE_ID_NIC_2_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC2))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC1_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_3_0...GAUDI_QUEUE_ID_NIC_3_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC3))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC1_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_4_0...GAUDI_QUEUE_ID_NIC_4_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC4))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC2_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_5_0...GAUDI_QUEUE_ID_NIC_5_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC5))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC2_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_6_0...GAUDI_QUEUE_ID_NIC_6_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC6))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC3_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_7_0...GAUDI_QUEUE_ID_NIC_7_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC7))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC3_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_8_0...GAUDI_QUEUE_ID_NIC_8_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC8))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC4_QM0_PQ_PI_0 + q_off;
 +              break;
 +
 +      case GAUDI_QUEUE_ID_NIC_9_0...GAUDI_QUEUE_ID_NIC_9_3:
 +              if (!(gaudi->hw_cap_initialized & HW_CAP_NIC9))
 +                      invalid_queue = true;
 +
 +              q_off = ((hw_queue_id - 1) & 0x3) * 4;
 +              db_reg_offset = mmNIC4_QM1_PQ_PI_0 + q_off;
 +              break;
 +
 +      default:
 +              invalid_queue = true;
 +      }
 +
 +      if (invalid_queue) {
 +              /* Should never get here */
 +              dev_err(hdev->dev, "h/w queue %d is invalid. Can't set pi\n",
 +                      hw_queue_id);
 +              return;
 +      }
 +
 +      db_value = pi;
 +
 +      /* ring the doorbell */
 +      WREG32(db_reg_offset, db_value);
 +
 +      if (hw_queue_id == GAUDI_QUEUE_ID_CPU_PQ) {
 +              /* make sure device CPU will read latest data from host */
 +              mb();
 +
 +              irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                              mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                              le32_to_cpu(dyn_regs->gic_host_pi_upd_irq);
 +
 +              WREG32(irq_handler_offset,
 +                      gaudi_irq_map_table[GAUDI_EVENT_PI_UPDATE].cpu_id);
 +      }
 +}
 +
 +static void gaudi_pqe_write(struct hl_device *hdev, __le64 *pqe,
 +                              struct hl_bd *bd)
 +{
 +      __le64 *pbd = (__le64 *) bd;
 +
 +      /* The QMANs are in host memory so a simple copy suffices */
 +      pqe[0] = pbd[0];
 +      pqe[1] = pbd[1];
 +}
 +
 +static void *gaudi_dma_alloc_coherent(struct hl_device *hdev, size_t size,
 +                                      dma_addr_t *dma_handle, gfp_t flags)
 +{
 +      void *kernel_addr = dma_alloc_coherent(&hdev->pdev->dev, size,
 +                                              dma_handle, flags);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void gaudi_dma_free_coherent(struct hl_device *hdev, size_t size,
 +              void *cpu_addr, dma_addr_t dma_handle)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_handle = dma_handle - HOST_PHYS_BASE;
 +
 +      dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, fixed_dma_handle);
 +}
 +
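 +/*
 + * Scrub the device DRAM by programming the DMA cores in memset mode over
 + * 2GB chunks, spreading consecutive chunks across all DMA channels and
 + * polling each channel's status register until it is no longer busy.
 + */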
 +static int gaudi_scrub_device_dram(struct hl_device *hdev, u64 val)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 cur_addr = prop->dram_user_base_address;
 +      u32 chunk_size, busy;
 +      int rc, dma_id;
 +
 +      while (cur_addr < prop->dram_end_address) {
 +              for (dma_id = 0 ; dma_id < DMA_NUMBER_OF_CHANNELS ; dma_id++) {
 +                      u32 dma_offset = dma_id * DMA_CORE_OFFSET;
 +
 +                      chunk_size =
 +                      min((u64)SZ_2G, prop->dram_end_address - cur_addr);
 +
 +                      dev_dbg(hdev->dev,
 +                              "Doing HBM scrubbing for 0x%09llx - 0x%09llx\n",
 +                              cur_addr, cur_addr + chunk_size);
 +
 +                      WREG32(mmDMA0_CORE_SRC_BASE_LO + dma_offset,
 +                                      lower_32_bits(val));
 +                      WREG32(mmDMA0_CORE_SRC_BASE_HI + dma_offset,
 +                                      upper_32_bits(val));
 +                      WREG32(mmDMA0_CORE_DST_BASE_LO + dma_offset,
 +                                              lower_32_bits(cur_addr));
 +                      WREG32(mmDMA0_CORE_DST_BASE_HI + dma_offset,
 +                                              upper_32_bits(cur_addr));
 +                      WREG32(mmDMA0_CORE_DST_TSIZE_0 + dma_offset,
 +                                      chunk_size);
 +                      WREG32(mmDMA0_CORE_COMMIT + dma_offset,
 +                                      ((1 << DMA0_CORE_COMMIT_LIN_SHIFT) |
 +                                      (1 << DMA0_CORE_COMMIT_MEM_SET_SHIFT)));
 +
 +                      cur_addr += chunk_size;
 +
 +                      if (cur_addr == prop->dram_end_address)
 +                              break;
 +              }
 +
 +              for (dma_id = 0 ; dma_id < DMA_NUMBER_OF_CHANNELS ; dma_id++) {
 +                      u32 dma_offset = dma_id * DMA_CORE_OFFSET;
 +
 +                      rc = hl_poll_timeout(
 +                              hdev,
 +                              mmDMA0_CORE_STS0 + dma_offset,
 +                              busy,
 +                              ((busy & DMA0_CORE_STS0_BUSY_MASK) == 0),
 +                              1000,
 +                              HBM_SCRUBBING_TIMEOUT_US);
 +
 +                      if (rc) {
 +                              dev_err(hdev->dev,
 +                                      "DMA Timeout during HBM scrubbing of DMA #%d\n",
 +                                      dma_id);
 +                              return -EIO;
 +                      }
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi_scrub_device_mem(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 wait_to_idle_time = hdev->pdev ? HBM_SCRUBBING_TIMEOUT_US :
 +                      min_t(u64, HBM_SCRUBBING_TIMEOUT_US * 10, HL_SIM_MAX_TIMEOUT_US);
 +      u64 addr, size, val = hdev->memory_scrub_val;
 +      ktime_t timeout;
 +      int rc = 0;
 +
 +      if (!hdev->memory_scrub)
 +              return 0;
 +
 +      timeout = ktime_add_us(ktime_get(), wait_to_idle_time);
 +      while (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
 +              if (ktime_compare(ktime_get(), timeout) > 0) {
 +                      dev_err(hdev->dev, "waiting for idle timeout\n");
 +                      return -ETIMEDOUT;
 +              }
 +              usleep_range((1000 >> 2) + 1, 1000);
 +      }
 +
 +      /* Scrub SRAM */
 +      addr = prop->sram_user_base_address;
 +      size = hdev->pldm ? 0x10000 : prop->sram_size - SRAM_USER_BASE_OFFSET;
 +
 +      dev_dbg(hdev->dev, "Scrubbing SRAM: 0x%09llx - 0x%09llx val: 0x%llx\n",
 +                      addr, addr + size, val);
 +      rc = gaudi_memset_device_memory(hdev, addr, size, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear SRAM (%d)\n", rc);
 +              return rc;
 +      }
 +
 +      /* Scrub HBM using all DMA channels in parallel */
 +      rc = gaudi_scrub_device_dram(hdev, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear HBM (%d)\n", rc);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static void *gaudi_get_int_queue_base(struct hl_device *hdev,
 +                              u32 queue_id, dma_addr_t *dma_handle,
 +                              u16 *queue_len)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct gaudi_internal_qman_info *q;
 +
 +      if (queue_id >= GAUDI_QUEUE_ID_SIZE ||
 +                      gaudi_queue_type[queue_id] != QUEUE_TYPE_INT) {
 +              dev_err(hdev->dev, "Got invalid queue id %d\n", queue_id);
 +              return NULL;
 +      }
 +
 +      q = &gaudi->internal_qmans[queue_id];
 +      *dma_handle = q->pq_dma_addr;
 +      *queue_len = q->pq_size / QMAN_PQ_ENTRY_SIZE;
 +
 +      return q->pq_kernel_addr;
 +}
 +
 +static int gaudi_send_cpu_message(struct hl_device *hdev, u32 *msg,
 +                              u16 len, u32 timeout, u64 *result)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q)) {
 +              if (result)
 +                      *result = 0;
 +              return 0;
 +      }
 +
 +      if (!timeout)
 +              timeout = GAUDI_MSG_TO_CPU_TIMEOUT_USEC;
 +
 +      return hl_fw_send_cpu_message(hdev, GAUDI_QUEUE_ID_CPU_PQ, msg, len,
 +                                              timeout, result);
 +}
 +
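 +/*
 + * Test a single external H/W queue by sending a MSG_PROT fence packet to it
 + * and polling host memory until the fence value is written back.
 + */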
 +static int gaudi_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      dma_addr_t pkt_dma_addr;
 +      u32 fence_val, tmp, timeout_usec;
 +      dma_addr_t fence_dma_addr;
 +      u32 *fence_ptr;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI_PLDM_TEST_QUEUE_WAIT_USEC;
 +      else
 +              timeout_usec = GAUDI_TEST_QUEUE_WAIT_USEC;
 +
 +      fence_val = GAUDI_QMAN0_FENCE_VAL;
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate memory for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              return -ENOMEM;
 +      }
 +
 +      *fence_ptr = 0;
 +
 +      fence_pkt = hl_asic_dma_pool_zalloc(hdev, sizeof(struct packet_msg_prot), GFP_KERNEL,
 +                                              &pkt_dma_addr);
 +      if (!fence_pkt) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate packet for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              rc = -ENOMEM;
 +              goto free_fence_ptr;
 +      }
 +
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(fence_val);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
 +                                      sizeof(struct packet_msg_prot),
 +                                      pkt_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to send fence packet to H/W queue %d\n",
 +                      hw_queue_id);
 +              goto free_pkt;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp, (tmp == fence_val),
 +                                      1000, timeout_usec, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev,
 +                      "H/W queue %d test failed (scratch(0x%08llX) == 0x%08X)\n",
 +                      hw_queue_id, (unsigned long long) fence_dma_addr, tmp);
 +              rc = -EIO;
 +      }
 +
 +free_pkt:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_pkt, pkt_dma_addr);
 +free_fence_ptr:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +      return rc;
 +}
 +
 +static int gaudi_test_cpu_queue(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      /*
 +       * check capability here as send_cpu_message() won't update the result
 +       * value if no capability
 +       */
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_test_cpu_queue(hdev);
 +}
 +
 +static int gaudi_test_queues(struct hl_device *hdev)
 +{
 +      int i, rc, ret_val = 0;
 +
 +      for (i = 0 ; i < hdev->asic_prop.max_queues ; i++) {
 +              if (hdev->asic_prop.hw_queues_props[i].type == QUEUE_TYPE_EXT) {
 +                      rc = gaudi_test_queue(hdev, i);
 +                      if (rc)
 +                              ret_val = -EINVAL;
 +              }
 +      }
 +
 +      rc = gaudi_test_cpu_queue(hdev);
 +      if (rc)
 +              ret_val = -EINVAL;
 +
 +      return ret_val;
 +}
 +
 +static void *gaudi_dma_pool_zalloc(struct hl_device *hdev, size_t size,
 +              gfp_t mem_flags, dma_addr_t *dma_handle)
 +{
 +      void *kernel_addr;
 +
 +      if (size > GAUDI_DMA_POOL_BLK_SIZE)
 +              return NULL;
 +
 +      kernel_addr = dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void gaudi_dma_pool_free(struct hl_device *hdev, void *vaddr,
 +                      dma_addr_t dma_addr)
 +{
 +      /* Subtract the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_addr = dma_addr - HOST_PHYS_BASE;
 +
 +      dma_pool_free(hdev->dma_pool, vaddr, fixed_dma_addr);
 +}
 +
 +static void *gaudi_cpu_accessible_dma_pool_alloc(struct hl_device *hdev,
 +                                      size_t size, dma_addr_t *dma_handle)
 +{
 +      return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
 +}
 +
 +static void gaudi_cpu_accessible_dma_pool_free(struct hl_device *hdev,
 +                                              size_t size, void *vaddr)
 +{
 +      hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
 +}
 +
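 +/*
 + * Calculate how much space the patched CB needs for a host SG list: merge
 + * DMA-contiguous entries up to DMA_MAX_TRANSFER_SIZE and account for one
 + * LIN_DMA packet per resulting descriptor.
 + */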
 +static u32 gaudi_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt)
 +{
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt;
 +      u64 len, len_next;
 +      dma_addr_t addr, addr_next;
 +
 +      dma_desc_cnt = 0;
 +
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((addr + len == addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              dma_desc_cnt++;
 +      }
 +
 +      return dma_desc_cnt * sizeof(struct packet_lin_dma);
 +}
 +
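 +/*
 + * Pin the host buffer referenced by a user LIN_DMA packet (unless it is
 + * already pinned for this job), DMA-map it and add the size of its DMA
 + * descriptors to the patched CB size.
 + */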
 +static int gaudi_pin_memory_before_cs(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              u64 addr, enum dma_data_direction dir)
 +{
 +      struct hl_userptr *userptr;
 +      int rc;
 +
 +      if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                      parser->job_userptr_list, &userptr))
 +              goto already_pinned;
 +
 +      userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
 +      if (!userptr)
 +              return -ENOMEM;
 +
 +      rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                              userptr);
 +      if (rc)
 +              goto free_userptr;
 +
 +      list_add_tail(&userptr->job_node, parser->job_userptr_list);
 +
 +      rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, dir);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to map sgt with DMA region\n");
 +              goto unpin_memory;
 +      }
 +
 +      userptr->dma_mapped = true;
 +      userptr->dir = dir;
 +
 +already_pinned:
 +      parser->patched_cb_size +=
 +                      gaudi_get_dma_desc_list_size(hdev, userptr->sgt);
 +
 +      return 0;
 +
 +unpin_memory:
 +      list_del(&userptr->job_node);
 +      hl_unpin_host_memory(hdev, userptr);
 +free_userptr:
 +      kfree(userptr);
 +      return rc;
 +}
 +
 +static int gaudi_validate_dma_pkt_host(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              bool src_in_host)
 +{
 +      enum dma_data_direction dir;
 +      bool skip_host_mem_pin = false, user_memset;
 +      u64 addr;
 +      int rc = 0;
 +
 +      user_memset = (le32_to_cpu(user_dma_pkt->ctl) &
 +                      GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GAUDI_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      if (src_in_host) {
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +
 +              dev_dbg(hdev->dev, "DMA direction is HOST --> DEVICE\n");
 +              dir = DMA_TO_DEVICE;
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +      } else {
 +              dev_dbg(hdev->dev, "DMA direction is DEVICE --> HOST\n");
 +              dir = DMA_FROM_DEVICE;
 +              addr = (le64_to_cpu(user_dma_pkt->dst_addr) &
 +                              GAUDI_PKT_LIN_DMA_DST_ADDR_MASK) >>
 +                              GAUDI_PKT_LIN_DMA_DST_ADDR_SHIFT;
 +      }
 +
 +      if (skip_host_mem_pin)
 +              parser->patched_cb_size += sizeof(*user_dma_pkt);
 +      else
 +              rc = gaudi_pin_memory_before_cs(hdev, parser, user_dma_pkt,
 +                                              addr, dir);
 +
 +      return rc;
 +}
 +
 +static int gaudi_validate_dma_pkt_no_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      bool src_in_host = false;
 +      u64 dst_addr = (le64_to_cpu(user_dma_pkt->dst_addr) &
 +                      GAUDI_PKT_LIN_DMA_DST_ADDR_MASK) >>
 +                      GAUDI_PKT_LIN_DMA_DST_ADDR_SHIFT;
 +
 +      dev_dbg(hdev->dev, "DMA packet details:\n");
 +      dev_dbg(hdev->dev, "source == 0x%llx\n",
 +                              le64_to_cpu(user_dma_pkt->src_addr));
 +      dev_dbg(hdev->dev, "destination == 0x%llx\n", dst_addr);
 +      dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
 +
 +      /*
 +       * Special handling for DMA with size 0. Bypass all validations
 +       * because no transactions will be done except for WR_COMP, which
 +       * is not a security issue
 +       */
 +      if (!le32_to_cpu(user_dma_pkt->tsize)) {
 +              parser->patched_cb_size += sizeof(*user_dma_pkt);
 +              return 0;
 +      }
 +
 +      if (parser->hw_queue_id <= GAUDI_QUEUE_ID_DMA_0_3)
 +              src_in_host = true;
 +
 +      return gaudi_validate_dma_pkt_host(hdev, parser, user_dma_pkt,
 +                                              src_in_host);
 +}
 +
 +static int gaudi_validate_load_and_exe_pkt(struct hl_device *hdev,
 +                                      struct hl_cs_parser *parser,
 +                                      struct packet_load_and_exe *user_pkt)
 +{
 +      u32 cfg;
 +
 +      cfg = le32_to_cpu(user_pkt->cfg);
 +
 +      if (cfg & GAUDI_PKT_LOAD_AND_EXE_CFG_DST_MASK) {
 +              dev_err(hdev->dev,
 +                      "User not allowed to use Load and Execute\n");
 +              return -EPERM;
 +      }
 +
 +      parser->patched_cb_size += sizeof(struct packet_load_and_exe);
 +
 +      return 0;
 +}
 +
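 +/*
 + * Walk the user CB, reject packets that users are not allowed to submit,
 + * and accumulate the size required for the patched CB.
 + */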
 +static int gaudi_validate_cb(struct hl_device *hdev,
 +                      struct hl_cs_parser *parser, bool is_mmu)
 +{
 +      u32 cb_parsed_length = 0;
 +      int rc = 0;
 +
 +      parser->patched_cb_size = 0;
 +
 +      /* user_cb_size is greater than 0, so the loop will always be executed */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              struct gaudi_packet *user_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = gaudi_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_WREG_BULK:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use WREG_BULK\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_LOAD_AND_EXE:
 +                      rc = gaudi_validate_load_and_exe_pkt(hdev, parser,
 +                              (struct packet_load_and_exe *) user_pkt);
 +                      break;
 +
 +              case PACKET_LIN_DMA:
 +                      parser->contains_dma_pkt = true;
 +                      if (is_mmu)
 +                              parser->patched_cb_size += pkt_size;
 +                      else
 +                              rc = gaudi_validate_dma_pkt_no_mmu(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt);
 +                      break;
 +
 +              case PACKET_WREG_32:
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_REPEAT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +              case PACKET_ARB_POINT:
 +                      parser->patched_cb_size += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      /*
 +       * The new CB should have space at the end for optional NOP padding
 +       * (for cacheline alignment) and two MSG_PROT packets:
 +       * 1. A packet that will act as a completion packet
 +       * 2. A packet that will generate an MSI interrupt
 +       */
 +      if (parser->completion)
 +              parser->patched_cb_size += gaudi_get_patched_cb_extra_size(
 +                      parser->patched_cb_size);
 +
 +      return rc;
 +}
 +
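 +/*
 + * Expand a single user LIN_DMA packet into one LIN_DMA packet per merged SG
 + * entry of the pinned host buffer. The user's WR_COMP setting is restored
 + * only on the last generated packet.
 + */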
 +static int gaudi_patch_dma_packet(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              struct packet_lin_dma *new_dma_pkt,
 +                              u32 *new_dma_pkt_size)
 +{
 +      struct hl_userptr *userptr;
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt, user_wrcomp_en_mask, ctl;
 +      u64 len, len_next;
 +      dma_addr_t dma_addr, dma_addr_next;
 +      u64 device_memory_addr, addr;
 +      enum dma_data_direction dir;
 +      struct sg_table *sgt;
 +      bool src_in_host = false;
 +      bool skip_host_mem_pin = false;
 +      bool user_memset;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +
 +      if (parser->hw_queue_id <= GAUDI_QUEUE_ID_DMA_0_3)
 +              src_in_host = true;
 +
 +      user_memset = (ctl & GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GAUDI_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      if (src_in_host) {
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              dir = DMA_TO_DEVICE;
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +      } else {
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              dir = DMA_FROM_DEVICE;
 +      }
 +
 +      if ((!skip_host_mem_pin) &&
 +              (!hl_userptr_is_pinned(hdev, addr,
 +                                      le32_to_cpu(user_dma_pkt->tsize),
 +                                      parser->job_userptr_list, &userptr))) {
 +              dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
 +                              addr, le32_to_cpu(user_dma_pkt->tsize));
 +              return -EFAULT;
 +      }
 +
 +      if ((user_memset) && (dir == DMA_TO_DEVICE)) {
 +              memcpy(new_dma_pkt, user_dma_pkt, sizeof(*user_dma_pkt));
 +              *new_dma_pkt_size = sizeof(*user_dma_pkt);
 +              return 0;
 +      }
 +
 +      user_wrcomp_en_mask = ctl & GAUDI_PKT_LIN_DMA_CTL_WRCOMP_EN_MASK;
 +
 +      sgt = userptr->sgt;
 +      dma_desc_cnt = 0;
 +
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              dma_addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      dma_addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((dma_addr + len == dma_addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              ctl = le32_to_cpu(user_dma_pkt->ctl);
 +              if (likely(dma_desc_cnt))
 +                      ctl &= ~GAUDI_PKT_CTL_EB_MASK;
 +              ctl &= ~GAUDI_PKT_LIN_DMA_CTL_WRCOMP_EN_MASK;
 +              new_dma_pkt->ctl = cpu_to_le32(ctl);
 +              new_dma_pkt->tsize = cpu_to_le32(len);
 +
 +              if (dir == DMA_TO_DEVICE) {
 +                      new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
 +              } else {
 +                      new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
 +              }
 +
 +              if (!user_memset)
 +                      device_memory_addr += len;
 +              dma_desc_cnt++;
 +              new_dma_pkt++;
 +      }
 +
 +      if (!dma_desc_cnt) {
 +              dev_err(hdev->dev,
 +                      "Error of 0 SG entries when patching DMA packet\n");
 +              return -EFAULT;
 +      }
 +
 +      /* Fix the last dma packet - wrcomp must be as user set it */
 +      new_dma_pkt--;
 +      new_dma_pkt->ctl |= cpu_to_le32(user_wrcomp_en_mask);
 +
 +      *new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
 +
 +      return 0;
 +}
 +
 +static int gaudi_patch_cb(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser)
 +{
 +      u32 cb_parsed_length = 0;
 +      u32 cb_patched_cur_length = 0;
 +      int rc = 0;
 +
 +      /* user_cb_size is greater than 0, so the loop will always be executed */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              u32 new_pkt_size = 0;
 +              struct gaudi_packet *user_pkt, *kernel_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +              kernel_pkt = parser->patched_cb->kernel_address +
 +                                      cb_patched_cur_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = gaudi_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_LIN_DMA:
 +                      rc = gaudi_patch_dma_packet(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt,
 +                                      (struct packet_lin_dma *) kernel_pkt,
 +                                      &new_pkt_size);
 +                      cb_patched_cur_length += new_pkt_size;
 +                      break;
 +
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_WREG_32:
 +              case PACKET_WREG_BULK:
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_REPEAT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +              case PACKET_ARB_POINT:
 +              case PACKET_LOAD_AND_EXE:
 +                      memcpy(kernel_pkt, user_pkt, pkt_size);
 +                      cb_patched_cur_length += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi_parse_cb_mmu(struct hl_device *hdev,
 +              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      u32 patched_cb_size;
 +      struct hl_cb *user_cb;
 +      int rc;
 +
 +      /*
 +       * The new CB should have space at the end for optional NOP padding
 +       * (for cacheline alignment) and two MSG_PROT packets:
 +       * 1. A packet that will act as a completion packet
 +       * 2. A packet that will generate an MSI interrupt
 +       */
 +      if (parser->completion)
 +              parser->patched_cb_size = parser->user_cb_size +
 +                              gaudi_get_patched_cb_extra_size(parser->user_cb_size);
 +      else
 +              parser->patched_cb_size = parser->user_cb_size;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n",
 +                      rc);
 +              return rc;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      /*
 +       * We are protected from overflow because the check
 +       * "parser->user_cb_size <= parser->user_cb->size" was done in get_cb_from_cs_chunk()
 +       * in the common code. That check is done only if is_kernel_allocated_cb is true.
 +       *
 +       * There is no option to reach here without going through that check because:
 +       * 1. validate_queue_index() assigns true to is_kernel_allocated_cb for any submission to
 +       *    an external queue.
 +       * 2. For Gaudi, we only parse CBs that were submitted to the external queues.
 +       */
 +      memcpy(parser->patched_cb->kernel_address,
 +              parser->user_cb->kernel_address,
 +              parser->user_cb_size);
 +
 +      patched_cb_size = parser->patched_cb_size;
 +
 +      /* Validate patched CB instead of user CB */
 +      user_cb = parser->user_cb;
 +      parser->user_cb = parser->patched_cb;
 +      rc = gaudi_validate_cb(hdev, parser, true);
 +      parser->user_cb = user_cb;
 +
 +      if (rc) {
 +              hl_cb_put(parser->patched_cb);
 +              goto out;
 +      }
 +
 +      if (patched_cb_size != parser->patched_cb_size) {
 +              dev_err(hdev->dev, "user CB size mismatch\n");
 +              hl_cb_put(parser->patched_cb);
 +              rc = -EINVAL;
 +              goto out;
 +      }
 +
 +out:
 +      /*
 +       * Always call cb destroy here because we still hold one reference to
 +       * the CB from the earlier cb_get. cb_put will release it after the
 +       * job completes, but here we want to remove it from the idr.
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +      return rc;
 +}
 +
 +static int gaudi_parse_cb_no_mmu(struct hl_device *hdev,
 +              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      int rc;
 +
 +      rc = gaudi_validate_cb(hdev, parser, false);
 +
 +      if (rc)
 +              goto free_userptr;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n", rc);
 +              goto free_userptr;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail here */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      rc = gaudi_patch_cb(hdev, parser);
 +
 +      if (rc)
 +              hl_cb_put(parser->patched_cb);
 +
 +out:
 +      /*
 +       * Always call cb destroy here because we still hold one reference to
 +       * the CB from the earlier cb_get. cb_put will release it after the
 +       * job completes, but here we want to remove it from the idr.
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +free_userptr:
 +      if (rc)
 +              hl_userptr_delete_list(hdev, parser->job_userptr_list);
 +      return rc;
 +}
 +
 +static int gaudi_parse_cb_no_ext_queue(struct hl_device *hdev,
 +                                      struct hl_cs_parser *parser)
 +{
 +      struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 nic_queue_offset, nic_mask_q_id;
 +
 +      if ((parser->hw_queue_id >= GAUDI_QUEUE_ID_NIC_0_0) &&
 +                      (parser->hw_queue_id <= GAUDI_QUEUE_ID_NIC_9_3)) {
 +              nic_queue_offset = parser->hw_queue_id - GAUDI_QUEUE_ID_NIC_0_0;
 +              nic_mask_q_id = 1 << (HW_CAP_NIC_SHIFT + (nic_queue_offset >> 2));
 +
 +              if (!(gaudi->hw_cap_initialized & nic_mask_q_id)) {
 +                      dev_err(hdev->dev, "h/w queue %d is disabled\n", parser->hw_queue_id);
 +                      return -EINVAL;
 +              }
 +      }
 +
 +      /* For internal queue jobs just check if CB address is valid */
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->sram_user_base_address,
 +                                      asic_prop->sram_end_address))
 +              return 0;
 +
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->dram_user_base_address,
 +                                      asic_prop->dram_end_address))
 +              return 0;
 +
 +      /* PMMU and HPMMU addresses are equal, check only one of them */
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->pmmu.start_addr,
 +                                      asic_prop->pmmu.end_addr))
 +              return 0;
 +
 +      dev_err(hdev->dev,
 +              "CB address 0x%px + 0x%x for internal QMAN is not valid\n",
 +              parser->user_cb, parser->user_cb_size);
 +
 +      return -EFAULT;
 +}
 +
 +static int gaudi_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (parser->queue_type == QUEUE_TYPE_INT)
 +              return gaudi_parse_cb_no_ext_queue(hdev, parser);
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_MMU)
 +              return gaudi_parse_cb_mmu(hdev, parser);
 +      else
 +              return gaudi_parse_cb_no_mmu(hdev, parser);
 +}
 +
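 +/*
 + * Pad the patched CB with NOP packets up to the reserved area, then append
 + * two MSG_PROT packets: one that writes the completion value to the CQ and
 + * one that triggers the MSI interrupt.
 + */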
 +static void gaudi_add_end_of_cb_packets(struct hl_device *hdev, void *kernel_address,
 +                              u32 len, u32 original_len, u64 cq_addr, u32 cq_val,
 +                              u32 msi_vec, bool eb)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct packet_msg_prot *cq_pkt;
 +      struct packet_nop *cq_padding;
 +      u64 msi_addr;
 +      u32 tmp;
 +
 +      cq_padding = kernel_address + original_len;
 +      cq_pkt = kernel_address + len - (sizeof(struct packet_msg_prot) * 2);
 +
 +      while ((void *)cq_padding < (void *)cq_pkt) {
 +              cq_padding->ctl = cpu_to_le32(FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_NOP));
 +              cq_padding++;
 +      }
 +
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      if (eb)
 +              tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(cq_val);
 +      cq_pkt->addr = cpu_to_le64(cq_addr);
 +
 +      cq_pkt++;
 +
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(1);
 +
 +      if (gaudi->multi_msi_mode)
 +              msi_addr = mmPCIE_MSI_INTR_0 + msi_vec * 4;
 +      else
 +              msi_addr = mmPCIE_CORE_MSI_REQ;
 +
 +      cq_pkt->addr = cpu_to_le64(CFG_BASE + msi_addr);
 +}
 +
 +static void gaudi_update_eq_ci(struct hl_device *hdev, u32 val)
 +{
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, val);
 +}
 +
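 +/*
 + * Memset a device memory range by submitting a single LIN_DMA packet in
 + * memset mode through QMAN0 of DMA channel 0.
 + */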
 +static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
 +                                      u32 size, u64 val)
 +{
 +      struct packet_lin_dma *lin_dma_pkt;
 +      struct hl_cs_job *job;
 +      u32 cb_size, ctl, err_cause;
 +      struct hl_cb *cb;
 +      int rc;
 +
 +      cb = hl_cb_kernel_create(hdev, PAGE_SIZE, false);
 +      if (!cb)
 +              return -EFAULT;
 +
 +      lin_dma_pkt = cb->kernel_address;
 +      memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
 +      cb_size = sizeof(*lin_dma_pkt);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
 +      ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_LIN_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +
 +      lin_dma_pkt->ctl = cpu_to_le32(ctl);
 +      lin_dma_pkt->src_addr = cpu_to_le64(val);
 +      lin_dma_pkt->dst_addr |= cpu_to_le64(addr);
 +      lin_dma_pkt->tsize = cpu_to_le32(size);
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE);
 +      if (err_cause && !hdev->init_done) {
 +              dev_dbg(hdev->dev,
 +                      "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                      err_cause);
 +              WREG32(mmDMA0_CORE_ERR_CAUSE, err_cause);
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = job->user_cb_size + sizeof(struct packet_msg_prot);
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = gaudi_send_job_on_qman0(hdev, job);
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE);
 +      if (err_cause) {
 +              dev_err(hdev->dev, "DMA Failed, cause 0x%x\n", err_cause);
 +              rc = -EIO;
 +              if (!hdev->init_done) {
 +                      dev_dbg(hdev->dev,
 +                              "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                              err_cause);
 +                      WREG32(mmDMA0_CORE_ERR_CAUSE, err_cause);
 +              }
 +      }
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
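 +/*
 + * Write 'val' to 'num_regs' consecutive registers starting at 'reg_base',
 + * using a CB of MSG_LONG packets that is submitted through QMAN0.
 + */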
 +static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
 +                                      u32 num_regs, u32 val)
 +{
 +      struct packet_msg_long *pkt;
 +      struct hl_cs_job *job;
 +      u32 cb_size, ctl;
 +      struct hl_cb *cb;
 +      int i, rc;
 +
 +      cb_size = (sizeof(*pkt) * num_regs) + sizeof(struct packet_msg_prot);
 +
 +      if (cb_size > SZ_2M) {
 +              dev_err(hdev->dev, "CB size must be smaller than %uMB\n", SZ_2M >> 20);
 +              return -ENOMEM;
 +      }
 +
 +      cb = hl_cb_kernel_create(hdev, cb_size, false);
 +      if (!cb)
 +              return -EFAULT;
 +
 +      pkt = cb->kernel_address;
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_LONG_CTL_OP_MASK, 0); /* write the value */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_LONG);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      for (i = 0; i < num_regs ; i++, pkt++) {
 +              pkt->ctl = cpu_to_le32(ctl);
 +              pkt->value = cpu_to_le32(val);
 +              pkt->addr = cpu_to_le64(reg_base + (i * 4));
 +      }
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = cb_size;
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = gaudi_send_job_on_qman0(hdev, job);
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +static int gaudi_restore_sm_registers(struct hl_device *hdev)
 +{
 +      u64 base_addr;
 +      u32 num_regs;
 +      int rc;
 +
 +      base_addr = CFG_BASE + mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      num_regs = NUM_OF_SOB_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      num_regs = NUM_OF_SOB_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      num_regs = NUM_OF_SOB_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0;
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_MON_STATUS_0;
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_MON_STATUS_0;
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      (GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT * 4);
 +      num_regs = NUM_OF_SOB_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      base_addr = CFG_BASE +  mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0 +
 +                      (GAUDI_FIRST_AVAILABLE_W_S_MONITOR * 4);
 +      num_regs = NUM_OF_MONITORS_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_MONITOR;
 +      rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed resetting SM registers");
 +              return -ENOMEM;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi_restore_dma_registers(struct hl_device *hdev)
 +{
 +      u32 sob_delta = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_1 -
 +                      mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +      int i;
 +
 +      for (i = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
 +              u64 sob_addr = CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                              (i * sob_delta);
 +              u32 dma_offset = i * DMA_CORE_OFFSET;
 +
 +              WREG32(mmDMA0_CORE_WR_COMP_ADDR_LO + dma_offset,
 +                              lower_32_bits(sob_addr));
 +              WREG32(mmDMA0_CORE_WR_COMP_ADDR_HI + dma_offset,
 +                              upper_32_bits(sob_addr));
 +              WREG32(mmDMA0_CORE_WR_COMP_WDATA + dma_offset, 0x80000001);
 +
 +              /* For DMAs 2-7, need to restore WR_AWUSER_31_11 as it can be
 +               * modified by the user for SRAM reduction
 +               */
 +              if (i > 1)
 +                      WREG32(mmDMA0_CORE_WR_AWUSER_31_11 + dma_offset,
 +                                                              0x00000001);
 +      }
 +}
 +
 +static void gaudi_restore_qm_registers(struct hl_device *hdev)
 +{
 +      u32 qman_offset;
 +      int i;
 +
 +      for (i = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
 +              qman_offset = i * DMA_QMAN_OFFSET;
 +              WREG32(mmDMA0_QM_ARB_CFG_0 + qman_offset, 0);
 +      }
 +
 +      for (i = 0 ; i < MME_NUMBER_OF_MASTER_ENGINES ; i++) {
 +              qman_offset = i * (mmMME2_QM_BASE - mmMME0_QM_BASE);
 +              WREG32(mmMME0_QM_ARB_CFG_0 + qman_offset, 0);
 +      }
 +
 +      for (i = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
 +              qman_offset = i * TPC_QMAN_OFFSET;
 +              WREG32(mmTPC0_QM_ARB_CFG_0 + qman_offset, 0);
 +      }
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
 +              qman_offset = (i >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (i & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              WREG32(mmNIC0_QM0_ARB_CFG_0 + qman_offset, 0);
 +      }
 +}
 +
 +static int gaudi_restore_user_registers(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = gaudi_restore_sm_registers(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi_restore_dma_registers(hdev);
 +      gaudi_restore_qm_registers(hdev);
 +
 +      return 0;
 +}
 +
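 +/* Nothing to do on context switch - always report success */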
 +static int gaudi_context_switch(struct hl_device *hdev, u32 asid)
 +{
 +      return 0;
 +}
 +
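 +/*
 + * Zero the DRAM region that holds the MMU page tables and the MMU cache
 + * management area.
 + */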
 +static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev)
 +{
 +      u32 size = hdev->asic_prop.mmu_pgt_size +
 +                      hdev->asic_prop.mmu_cache_mng_size;
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u64 addr = hdev->asic_prop.mmu_pgt_addr;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      return gaudi_memset_device_memory(hdev, addr, size, 0);
 +}
 +
 +static void gaudi_restore_phase_topology(struct hl_device *hdev)
 +{
 +
 +}
 +
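 +/*
 + * Program the DMA core registers directly for a single transfer from 'addr'
 + * to 'dma_addr', poll until the engine is no longer busy and then check
 + * ERR_CAUSE.
 + */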
 +static int gaudi_dma_core_transfer(struct hl_device *hdev, int dma_id, u64 addr,
 +                                      u32 size_to_dma, dma_addr_t dma_addr)
 +{
 +      u32 err_cause, val;
 +      u64 dma_offset;
 +      int rc;
 +
 +      dma_offset = dma_id * DMA_CORE_OFFSET;
 +
 +      WREG32(mmDMA0_CORE_SRC_BASE_LO + dma_offset, lower_32_bits(addr));
 +      WREG32(mmDMA0_CORE_SRC_BASE_HI + dma_offset, upper_32_bits(addr));
 +      WREG32(mmDMA0_CORE_DST_BASE_LO + dma_offset, lower_32_bits(dma_addr));
 +      WREG32(mmDMA0_CORE_DST_BASE_HI + dma_offset, upper_32_bits(dma_addr));
 +      WREG32(mmDMA0_CORE_DST_TSIZE_0 + dma_offset, size_to_dma);
 +      WREG32(mmDMA0_CORE_COMMIT + dma_offset,
 +                      (1 << DMA0_CORE_COMMIT_LIN_SHIFT));
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmDMA0_CORE_STS0 + dma_offset,
 +              val,
 +              ((val & DMA0_CORE_STS0_BUSY_MASK) == 0),
 +              0,
 +              1000000);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "DMA %d timed-out during reading of 0x%llx\n",
 +                      dma_id, addr);
 +              return -EIO;
 +      }
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
 +      if (err_cause) {
 +              dev_err(hdev->dev, "DMA Failed, cause 0x%x\n", err_cause);
 +              dev_dbg(hdev->dev,
 +                      "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                      err_cause);
 +              WREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset, err_cause);
 +
 +              return -EIO;
 +      }
 +
 +      return 0;
 +}
 +
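 +/*
 + * Read device memory for debugfs through an idle PCI DMA channel, in chunks
 + * of up to 2MB, using a temporary DMA-coherent bounce buffer.
 + */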
 +static int gaudi_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size,
 +                              void *blob_addr)
 +{
 +      u32 dma_core_sts0, err_cause, cfg1, size_left, pos, size_to_dma;
 +      u32 qm_glbl_sts0, qm_cgm_sts;
 +      u64 dma_offset, qm_offset;
 +      dma_addr_t dma_addr;
 +      void *kernel_addr;
 +      bool is_eng_idle;
 +      int rc = 0, dma_id;
 +
 +      kernel_addr = hl_asic_dma_alloc_coherent(hdev, SZ_2M, &dma_addr, GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!kernel_addr)
 +              return -ENOMEM;
 +
 +      hdev->asic_funcs->hw_queues_lock(hdev);
 +
 +      dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_1];
 +      dma_offset = dma_id * DMA_CORE_OFFSET;
 +      qm_offset = dma_id * DMA_QMAN_OFFSET;
 +      dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + dma_offset);
 +      qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + qm_offset);
 +      qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + qm_offset);
 +      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                    IS_DMA_IDLE(dma_core_sts0);
 +
 +      if (!is_eng_idle) {
 +              dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_2];
 +              dma_offset = dma_id * DMA_CORE_OFFSET;
 +              qm_offset = dma_id * DMA_QMAN_OFFSET;
 +              dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + dma_offset);
 +              qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + qm_offset);
 +              qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + qm_offset);
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                            IS_DMA_IDLE(dma_core_sts0);
 +
 +              if (!is_eng_idle) {
 +                      dev_err_ratelimited(hdev->dev,
 +                              "Can't read via DMA because it is BUSY\n");
 +                      rc = -EAGAIN;
 +                      goto out;
 +              }
 +      }
 +
 +      cfg1 = RREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset);
 +      WREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset,
 +                      0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +
 +      /* TODO: remove this by mapping the DMA temporary buffer to the MMU
 +       * using the compute ctx ASID, if it exists. If not, use the kernel
 +       * ctx ASID
 +       */
 +      WREG32_OR(mmDMA0_CORE_PROT + dma_offset, BIT(DMA0_CORE_PROT_VAL_SHIFT));
 +
 +      /* Verify DMA is OK */
 +      err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
 +      if (err_cause) {
 +              dev_dbg(hdev->dev,
 +                      "Clearing DMA0 engine from errors (cause 0x%x)\n",
 +                      err_cause);
 +              WREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset, err_cause);
 +      }
 +
 +      pos = 0;
 +      size_left = size;
 +      size_to_dma = SZ_2M;
 +
 +      while (size_left > 0) {
 +
 +              if (size_left < SZ_2M)
 +                      size_to_dma = size_left;
 +
 +              rc = gaudi_dma_core_transfer(hdev, dma_id, addr, size_to_dma,
 +                                              dma_addr);
 +              if (rc)
 +                      break;
 +
 +              memcpy(blob_addr + pos, kernel_addr, size_to_dma);
 +
 +              if (size_left <= SZ_2M)
 +                      break;
 +
 +              pos += SZ_2M;
 +              addr += SZ_2M;
 +              size_left -= SZ_2M;
 +      }
 +
 +      /* TODO: remove this by mapping the DMA temporary buffer to the MMU
 +       * using the compute ctx ASID, if it exists. If not, use the kernel
 +       * ctx ASID
 +       */
 +      WREG32_AND(mmDMA0_CORE_PROT + dma_offset,
 +                      ~BIT(DMA0_CORE_PROT_VAL_SHIFT));
 +
 +      WREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset, cfg1);
 +
 +out:
 +      hdev->asic_funcs->hw_queues_unlock(hdev);
 +
 +      hl_asic_dma_free_coherent(hdev, SZ_2M, kernel_addr, dma_addr);
 +
 +      return rc;
 +}
 +
 +static u64 gaudi_read_pte(struct hl_device *hdev, u64 addr)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return U64_MAX;
 +
 +      return readq(hdev->pcie_bar[HBM_BAR_ID] +
 +                      (addr - gaudi->hbm_bar_cur_addr));
 +}
 +
 +static void gaudi_write_pte(struct hl_device *hdev, u64 addr, u64 val)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return;
 +
 +      writeq(val, hdev->pcie_bar[HBM_BAR_ID] +
 +                      (addr - gaudi->hbm_bar_cur_addr));
 +}
 +
 +void gaudi_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid)
 +{
 +      /* mask to zero the MMBP and ASID bits */
 +      WREG32_AND(reg, ~0x7FF);
 +      WREG32_OR(reg, asid);
 +}
 +
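 +/*
 + * Program the given ASID into the non-secure properties registers of the
 + * device's QMANs and engines.
 + */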
 +static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (asid & ~DMA0_QM_GLBL_NON_SECURE_PROPS_0_ASID_MASK) {
 +              dev_crit(hdev->dev, "asid %u is too big\n", asid);
 +              return;
 +      }
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmDMA0_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA1_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA2_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA3_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA4_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA5_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA6_CORE_NON_SECURE_PROPS, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmDMA7_CORE_NON_SECURE_PROPS, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC0_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC1_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC2_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC3_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC4_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC5_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC6_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_CFG_ARUSER_LO, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmTPC7_CFG_AWUSER_LO, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_2, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_3, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_4, asid);
 +
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME1_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME1_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME3_SBAB_ARUSER0, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME3_SBAB_ARUSER1, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME0_ACC_WBC, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME1_ACC_WBC, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME2_ACC_WBC, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmMME3_ACC_WBC, asid);
 +
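 +      /* NIC QMANs are configured only if the corresponding NIC was initialized */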
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC0) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC1) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC2) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC3) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC4) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC5) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC6) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC7) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC8) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      if (gaudi->hw_cap_initialized & HW_CAP_NIC9) {
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3,
 +                              asid);
 +              gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4,
 +                              asid);
 +      }
 +
 +      gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_ARUSER, asid);
 +      gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_AWUSER, asid);
 +}
 +
 +static int gaudi_send_job_on_qman0(struct hl_device *hdev,
 +              struct hl_cs_job *job)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      u32 *fence_ptr;
 +      dma_addr_t fence_dma_addr;
 +      struct hl_cb *cb;
 +      u32 tmp, timeout, dma_offset;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout = GAUDI_PLDM_QMAN0_TIMEOUT_USEC;
 +      else
 +              timeout = HL_DEVICE_TIMEOUT_USEC;
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate fence memory for QMAN0\n");
 +              return -ENOMEM;
 +      }
 +
 +      cb = job->patched_cb;
 +
 +      fence_pkt = cb->kernel_address +
 +                      job->job_cb_size - sizeof(struct packet_msg_prot);
 +
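 +      /*
 +       * The last packet of the patched CB is a MSG_PROT that writes
 +       * GAUDI_QMAN0_FENCE_VAL to the fence buffer when the QMAN processes it.
 +       */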
 +      tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
 +      tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(GAUDI_QMAN0_FENCE_VAL);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      dma_offset = gaudi_dma_assignment[GAUDI_PCI_DMA_1] * DMA_CORE_OFFSET;
 +
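 +      /*
 +       * Set both the PROT_ERR_VAL and PROT_VAL bits of the selected PCI DMA
 +       * core; PROT_VAL is dropped again at free_fence_ptr.
 +       */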
 +      WREG32(mmDMA0_CORE_PROT + dma_offset,
 +                      BIT(DMA0_CORE_PROT_ERR_VAL_SHIFT) | BIT(DMA0_CORE_PROT_VAL_SHIFT));
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, GAUDI_QUEUE_ID_DMA_0_0,
 +                                      job->job_cb_size, cb->bus_address);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to send CB on QMAN0, %d\n", rc);
 +              goto free_fence_ptr;
 +      }
 +
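 +      /* Wait for the fence value to be written, i.e. for the job to complete */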
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp,
 +                              (tmp == GAUDI_QMAN0_FENCE_VAL), 1000,
 +                              timeout, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, GAUDI_QUEUE_ID_DMA_0_0);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev, "QMAN0 Job timeout (0x%x)\n", tmp);
 +              goto free_fence_ptr;
 +      }
 +
 +free_fence_ptr:
 +      WREG32(mmDMA0_CORE_PROT + dma_offset, BIT(DMA0_CORE_PROT_ERR_VAL_SHIFT));
 +
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +      return rc;
 +}
 +
 +static void gaudi_get_event_desc(u16 event_type, char *desc, size_t size)
 +{
 +      if (event_type >= GAUDI_EVENT_SIZE)
 +              goto event_not_supported;
 +
 +      if (!gaudi_irq_map_table[event_type].valid)
 +              goto event_not_supported;
 +
 +      snprintf(desc, size, "%s", gaudi_irq_map_table[event_type].name);
 +
 +      return;
 +
 +event_not_supported:
 +      snprintf(desc, size, "N/A");
 +}
 +
 +static const char *gaudi_get_razwi_initiator_dma_name(struct hl_device *hdev, u32 x_y,
 +                                                      bool is_write, u16 *engine_id_1,
 +                                                      u16 *engine_id_2)
 +{
 +      u32 dma_id[2], dma_offset, err_cause[2], mask, i;
 +
 +      mask = is_write ? DMA0_CORE_ERR_CAUSE_HBW_WR_ERR_MASK :
 +                              DMA0_CORE_ERR_CAUSE_HBW_RD_ERR_MASK;
 +
 +      switch (x_y) {
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
 +              dma_id[0] = 0;
 +              dma_id[1] = 2;
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
 +              dma_id[0] = 1;
 +              dma_id[1] = 3;
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
 +              dma_id[0] = 4;
 +              dma_id[1] = 6;
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
 +              dma_id[0] = 5;
 +              dma_id[1] = 7;
 +              break;
 +      default:
 +              goto unknown_initiator;
 +      }
 +
 +      for (i = 0 ; i < 2 ; i++) {
 +              dma_offset = dma_id[i] * DMA_CORE_OFFSET;
 +              err_cause[i] = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
 +      }
 +
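 +      /*
 +       * Each DMA_IF location is shared by two DMA cores; the per-core HBW
 +       * rd/wr error cause tells which of them (or both) hit the RAZWI.
 +       */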
 +      switch (x_y) {
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_0;
 +                      return "DMA0";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_2;
 +                      return "DMA2";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_0;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_2;
 +                      return "DMA0 or DMA2";
 +              }
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_1;
 +                      return "DMA1";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_3;
 +                      return "DMA3";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_1;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_3;
 +                      return "DMA1 or DMA3";
 +              }
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_4;
 +                      return "DMA4";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_6;
 +                      return "DMA6";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_4;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_6;
 +                      return "DMA4 or DMA6";
 +              }
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
 +              if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_5;
 +                      return "DMA5";
 +              } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_7;
 +                      return "DMA7";
 +              } else {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_DMA_5;
 +                      *engine_id_2 = GAUDI_ENGINE_ID_DMA_7;
 +                      return "DMA5 or DMA7";
 +              }
 +      }
 +
 +unknown_initiator:
 +      return "unknown initiator";
 +}
 +
 +static const char *gaudi_get_razwi_initiator_name(struct hl_device *hdev, bool is_write,
 +                                                      u16 *engine_id_1, u16 *engine_id_2)
 +{
 +      u32 val, x_y, axi_id;
 +
 +      val = is_write ? RREG32(mmMMU_UP_RAZWI_WRITE_ID) :
 +                              RREG32(mmMMU_UP_RAZWI_READ_ID);
 +      x_y = val & ((RAZWI_INITIATOR_Y_MASK << RAZWI_INITIATOR_Y_SHIFT) |
 +                      (RAZWI_INITIATOR_X_MASK << RAZWI_INITIATOR_X_SHIFT));
 +      axi_id = val & (RAZWI_INITIATOR_AXI_ID_MASK <<
 +                      RAZWI_INITIATOR_AXI_ID_SHIFT);
 +
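 +      /*
 +       * x_y identifies the initiator's location on the chip; for locations
 +       * shared by several units, axi_id tells them apart.
 +       */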
 +      switch (x_y) {
 +      case RAZWI_INITIATOR_ID_X_Y_TPC0_NIC0:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_0;
 +                      return "TPC0";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_0;
 +                      return "NIC0";
 +              }
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_TPC1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_1;
 +              return "TPC1";
 +      case RAZWI_INITIATOR_ID_X_Y_MME0_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME0_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_0;
 +              return "MME0";
 +      case RAZWI_INITIATOR_ID_X_Y_MME1_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME1_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_1;
 +              return "MME1";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC2:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_2;
 +              return "TPC2";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC3_PCI_CPU_PSOC:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_3;
 +                      return "TPC3";
 +              }
 +              /* PCI, CPU and PSOC do not have an engine id */
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_PCI))
 +                      return "PCI";
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_CPU))
 +                      return "CPU";
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_PSOC))
 +                      return "PSOC";
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
 +      case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
 +              return gaudi_get_razwi_initiator_dma_name(hdev, x_y, is_write,
 +                              engine_id_1, engine_id_2);
 +      case RAZWI_INITIATOR_ID_X_Y_TPC4_NIC1_NIC2:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_4;
 +                      return "TPC4";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_1;
 +                      return "NIC1";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC_FT)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_2;
 +                      return "NIC2";
 +              }
 +              break;
 +      case RAZWI_INITIATOR_ID_X_Y_TPC5:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_5;
 +              return "TPC5";
 +      case RAZWI_INITIATOR_ID_X_Y_MME2_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME2_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_2;
 +              return "MME2";
 +      case RAZWI_INITIATOR_ID_X_Y_MME3_0:
 +      case RAZWI_INITIATOR_ID_X_Y_MME3_1:
 +              *engine_id_1 = GAUDI_ENGINE_ID_MME_3;
 +              return "MME3";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC6:
 +              *engine_id_1 = GAUDI_ENGINE_ID_TPC_6;
 +              return "TPC6";
 +      case RAZWI_INITIATOR_ID_X_Y_TPC7_NIC4_NIC5:
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_TPC_7;
 +                      return "TPC7";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_4;
 +                      return "NIC4";
 +              }
 +              if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC_FT)) {
 +                      *engine_id_1 = GAUDI_ENGINE_ID_NIC_5;
 +                      return "NIC5";
 +              }
 +              break;
 +      default:
 +              break;
 +      }
 +
 +      dev_err(hdev->dev,
 +              "Unknown RAZWI initiator ID 0x%x [Y=%d, X=%d, AXI_ID=%d]\n",
 +              val,
 +              (val >> RAZWI_INITIATOR_Y_SHIFT) & RAZWI_INITIATOR_Y_MASK,
 +              (val >> RAZWI_INITIATOR_X_SHIFT) & RAZWI_INITIATOR_X_MASK,
 +              (val >> RAZWI_INITIATOR_AXI_ID_SHIFT) &
 +                      RAZWI_INITIATOR_AXI_ID_MASK);
 +
 +      return "unknown initiator";
 +}
 +
 +static void gaudi_print_and_get_razwi_info(struct hl_device *hdev, u16 *engine_id_1,
 +                                              u16 *engine_id_2, bool *is_read, bool *is_write)
 +{
 +
 +      if (RREG32(mmMMU_UP_RAZWI_WRITE_VLD)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "RAZWI event caused by illegal write of %s\n",
 +                      gaudi_get_razwi_initiator_name(hdev, true, engine_id_1, engine_id_2));
 +              WREG32(mmMMU_UP_RAZWI_WRITE_VLD, 0);
 +              *is_write = true;
 +      }
 +
 +      if (RREG32(mmMMU_UP_RAZWI_READ_VLD)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "RAZWI event caused by illegal read of %s\n",
 +                      gaudi_get_razwi_initiator_name(hdev, false, engine_id_1, engine_id_2));
 +              WREG32(mmMMU_UP_RAZWI_READ_VLD, 0);
 +              *is_read = true;
 +      }
 +}
 +
 +static void gaudi_print_and_get_mmu_error_info(struct hl_device *hdev, u64 *addr, u64 *event_mask)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 val;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
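 +      /*
 +       * Each capture register holds a valid bit and VA bits 49:32; the low
 +       * 32 bits of the VA come from the companion _VA register.
 +       */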
 +      val = RREG32(mmMMU_UP_PAGE_ERROR_CAPTURE);
 +      if (val & MMU_UP_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
 +              *addr = val & MMU_UP_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
 +              *addr <<= 32;
 +              *addr |= RREG32(mmMMU_UP_PAGE_ERROR_CAPTURE_VA);
 +
 +              dev_err_ratelimited(hdev->dev, "MMU page fault on va 0x%llx\n", *addr);
 +              hl_handle_page_fault(hdev, *addr, 0, true, event_mask);
 +
 +              WREG32(mmMMU_UP_PAGE_ERROR_CAPTURE, 0);
 +      }
 +
 +      val = RREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE);
 +      if (val & MMU_UP_ACCESS_ERROR_CAPTURE_ENTRY_VALID_MASK) {
 +              *addr = val & MMU_UP_ACCESS_ERROR_CAPTURE_VA_49_32_MASK;
 +              *addr <<= 32;
 +              *addr |= RREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE_VA);
 +
 +              dev_err_ratelimited(hdev->dev, "MMU access error on va 0x%llx\n", *addr);
 +
 +              WREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE, 0);
 +      }
 +}
 +
 +/*
 + *  +-------------------+------------------------------------------------------+
 + *  | Configuration Reg |                     Description                      |
 + *  |      Address      |                                                      |
 + *  +-------------------+------------------------------------------------------+
 + *  |  0xF30 - 0xF3F    |ECC single error indication (1 bit per memory wrapper)|
 + *  |                   |0xF30 memory wrappers 31:0 (MSB to LSB)               |
 + *  |                   |0xF34 memory wrappers 63:32                           |
 + *  |                   |0xF38 memory wrappers 95:64                           |
 + *  |                   |0xF3C memory wrappers 127:96                          |
 + *  +-------------------+------------------------------------------------------+
 + *  |  0xF40 - 0xF4F    |ECC double error indication (1 bit per memory wrapper)|
 + *  |                   |0xF40 memory wrappers 31:0 (MSB to LSB)               |
 + *  |                   |0xF44 memory wrappers 63:32                           |
 + *  |                   |0xF48 memory wrappers 95:64                           |
 + *  |                   |0xF4C memory wrappers 127:96                          |
 + *  +-------------------+------------------------------------------------------+
 + */
 +static int gaudi_extract_ecc_info(struct hl_device *hdev,
 +              struct ecc_info_extract_params *params, u64 *ecc_address,
 +              u64 *ecc_syndrom, u8 *memory_wrapper_idx)
 +{
 +      u32 i, num_mem_regs, reg, err_bit;
 +      u64 err_addr, err_word = 0;
 +
 +      num_mem_regs = params->num_memories / 32 +
 +                      ((params->num_memories % 32) ? 1 : 0);
 +
 +      if (params->block_address >= CFG_BASE)
 +              params->block_address -= CFG_BASE;
 +
 +      if (params->derr)
 +              err_addr = params->block_address + GAUDI_ECC_DERR0_OFFSET;
 +      else
 +              err_addr = params->block_address + GAUDI_ECC_SERR0_OFFSET;
 +
 +      /* Set invalid wrapper index */
 +      *memory_wrapper_idx = 0xFF;
 +
 +      /* Iterate through memory wrappers, a single bit must be set */
 +      for (i = 0 ; i < num_mem_regs ; i++) {
 +              err_word = RREG32(err_addr + i * 4);
 +              if (err_word) {
 +                      err_bit = __ffs(err_word);
 +                      *memory_wrapper_idx = err_bit + (32 * i);
 +                      break;
 +              }
 +      }
 +
 +      if (*memory_wrapper_idx == 0xFF) {
 +              dev_err(hdev->dev, "ECC error information cannot be found\n");
 +              return -EINVAL;
 +      }
 +
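 +      /* Select the reported wrapper so its ECC address and syndrome can be read */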
 +      WREG32(params->block_address + GAUDI_ECC_MEM_SEL_OFFSET,
 +                      *memory_wrapper_idx);
 +
 +      *ecc_address =
 +              RREG32(params->block_address + GAUDI_ECC_ADDRESS_OFFSET);
 +      *ecc_syndrom =
 +              RREG32(params->block_address + GAUDI_ECC_SYNDROME_OFFSET);
 +
 +      /* Clear error indication */
 +      reg = RREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET);
 +      if (params->derr)
 +              reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_DERR_MASK, 1);
 +      else
 +              reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_SERR_MASK, 1);
 +
 +      WREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET, reg);
 +
 +      return 0;
 +}
 +
 +/*
 + * gaudi_queue_idx_dec - decrement queue index (pi/ci) and handle wrap
 + *
 + * @idx: the current pi/ci value
 + * @q_len: the queue length (power of 2)
 + *
 + * @return the cyclically decremented index
 + */
 +static inline u32 gaudi_queue_idx_dec(u32 idx, u32 q_len)
 +{
 +      u32 mask = q_len - 1;
 +
 +      /*
 +       * Modular decrement is equivalent to adding (q_len - 1); masking with
 +       * (q_len - 1) afterwards keeps the result in the range [0, q_len - 1].
 +       * E.g. with q_len = 8, idx 0 wraps to 7 and idx 5 becomes 4.
 +       */
 +      return (idx + q_len - 1) & mask;
 +}
 +
 +/**
 + * gaudi_handle_sw_config_stream_data - print SW config stream data
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @event_mask: mask of the last events that occurred
 + */
 +static void gaudi_handle_sw_config_stream_data(struct hl_device *hdev, u32 stream,
 +                                              u64 qman_base, u64 event_mask)
 +{
 +      u64 cq_ptr_lo, cq_ptr_hi, cq_tsize, cq_ptr;
 +      u32 cq_ptr_lo_off, size;
 +
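 +      /*
 +       * Per-stream CQ registers have a fixed stride; the TPC0_QM offsets are
 +       * used here as generic offsets within any QMAN block.
 +       */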
 +      cq_ptr_lo_off = mmTPC0_QM_CQ_PTR_LO_1 - mmTPC0_QM_CQ_PTR_LO_0;
 +
 +      cq_ptr_lo = qman_base + (mmTPC0_QM_CQ_PTR_LO_0 - mmTPC0_QM_BASE) +
 +                                              stream * cq_ptr_lo_off;
 +      cq_ptr_hi = cq_ptr_lo +
 +                              (mmTPC0_QM_CQ_PTR_HI_0 - mmTPC0_QM_CQ_PTR_LO_0);
 +      cq_tsize = cq_ptr_lo +
 +                              (mmTPC0_QM_CQ_TSIZE_0 - mmTPC0_QM_CQ_PTR_LO_0);
 +
 +      cq_ptr = (((u64) RREG32(cq_ptr_hi)) << 32) | RREG32(cq_ptr_lo);
 +      size = RREG32(cq_tsize);
 +      dev_info(hdev->dev, "stop on err: stream: %u, addr: %#llx, size: %u\n",
 +                                                      stream, cq_ptr, size);
 +
 +      if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
 +              hdev->captured_err_info.undef_opcode.cq_addr = cq_ptr;
 +              hdev->captured_err_info.undef_opcode.cq_size = size;
 +              hdev->captured_err_info.undef_opcode.stream_id = stream;
 +      }
 +}
 +
 +/**
 + * gaudi_handle_last_pqes_on_err - print last PQEs on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @event_mask: mask of the last events that occurred
 + * @pr_sw_conf: if true, print the SW config stream data (CQ PTR and SIZE)
 + */
 +static void gaudi_handle_last_pqes_on_err(struct hl_device *hdev, u32 qid_base,
 +                                              u32 stream, u64 qman_base,
 +                                              u64 event_mask,
 +                                              bool pr_sw_conf)
 +{
 +      u32 ci, qm_ci_stream_off, queue_len;
 +      struct hl_hw_queue *q;
 +      u64 pq_ci, addr[PQ_FETCHER_CACHE_SIZE];
 +      int i;
 +
 +      q = &hdev->kernel_queues[qid_base + stream];
 +
 +      qm_ci_stream_off = mmTPC0_QM_PQ_CI_1 - mmTPC0_QM_PQ_CI_0;
 +      pq_ci = qman_base + (mmTPC0_QM_PQ_CI_0 - mmTPC0_QM_BASE) +
 +                                              stream * qm_ci_stream_off;
 +
 +      queue_len = (q->queue_type == QUEUE_TYPE_INT) ?
 +                                      q->int_queue_len : HL_QUEUE_LENGTH;
 +
 +      hdev->asic_funcs->hw_queues_lock(hdev);
 +
 +      if (pr_sw_conf)
 +              gaudi_handle_sw_config_stream_data(hdev, stream, qman_base, event_mask);
 +
 +      ci = RREG32(pq_ci);
 +
 +      /* we should start printing from ci - 1 */
 +      ci = gaudi_queue_idx_dec(ci, queue_len);
 +      memset(addr, 0, sizeof(addr));
 +
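 +      /* Walk backwards over the last PQ_FETCHER_CACHE_SIZE PQEs, newest first */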
 +      for (i = 0; i < PQ_FETCHER_CACHE_SIZE; i++) {
 +              struct hl_bd *bd;
 +              u32 len;
 +
 +              bd = q->kernel_address;
 +              bd += ci;
 +
 +              len = le32_to_cpu(bd->len);
 +              /* len 0 means an uninitialized entry - break */
 +              if (!len)
 +                      break;
 +
 +              addr[i] = le64_to_cpu(bd->ptr);
 +
 +              dev_info(hdev->dev, "stop on err PQE(stream %u): ci: %u, addr: %#llx, size: %u\n",
 +                                                      stream, ci, addr[i], len);
 +
 +              /* get previous ci, wrap if needed */
 +              ci = gaudi_queue_idx_dec(ci, queue_len);
 +      }
 +
 +      if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
 +              struct undefined_opcode_info *undef_opcode = &hdev->captured_err_info.undef_opcode;
 +              u32 arr_idx = undef_opcode->cb_addr_streams_len;
 +
 +              if (arr_idx == 0) {
 +                      undef_opcode->timestamp = ktime_get();
 +                      undef_opcode->engine_id = gaudi_queue_id_to_engine_id[qid_base];
 +              }
 +
 +              memcpy(undef_opcode->cb_addr_streams[arr_idx], addr, sizeof(addr));
 +              undef_opcode->cb_addr_streams_len++;
 +      }
 +
 +      hdev->asic_funcs->hw_queues_unlock(hdev);
 +}
 +
 +/**
 + * handle_qman_data_on_err - extract QMAN data on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @event_mask: mask of the last events that occurred
 + *
 + * This function attempts to extract as much data as possible on a QMAN error.
 + * For an upper CP, print the SW config stream data and the last 8 PQEs.
 + * For the lower CP, print the SW config data and the last PQEs of all 4 upper CPs.
 + */
 +static void handle_qman_data_on_err(struct hl_device *hdev, u32 qid_base,
 +                                 u32 stream, u64 qman_base, u64 event_mask)
 +{
 +      u32 i;
 +
 +      if (stream != QMAN_STREAMS) {
 +              gaudi_handle_last_pqes_on_err(hdev, qid_base, stream,
 +                      qman_base, event_mask, true);
 +              return;
 +      }
 +
 +      /* handle Lower-CP */
 +      gaudi_handle_sw_config_stream_data(hdev, stream, qman_base, event_mask);
 +
 +      for (i = 0; i < QMAN_STREAMS; i++)
 +              gaudi_handle_last_pqes_on_err(hdev, qid_base, i,
 +                      qman_base, event_mask, false);
 +}
 +
 +static void gaudi_handle_qman_err_generic(struct hl_device *hdev,
 +                                        const char *qm_name,
 +                                        u64 qman_base,
 +                                        u32 qid_base,
 +                                        u64 *event_mask)
 +{
 +      u32 i, j, glbl_sts_val, arb_err_val, glbl_sts_clr_val;
 +      u64 glbl_sts_addr, arb_err_addr;
 +      char reg_desc[32];
 +
 +      glbl_sts_addr = qman_base + (mmTPC0_QM_GLBL_STS1_0 - mmTPC0_QM_BASE);
 +      arb_err_addr = qman_base + (mmTPC0_QM_ARB_ERR_CAUSE - mmTPC0_QM_BASE);
 +
 +      /* Iterate through all stream GLBL_STS1 registers + Lower CP */
 +      for (i = 0 ; i < QMAN_STREAMS + 1 ; i++) {
 +              glbl_sts_clr_val = 0;
 +              glbl_sts_val = RREG32(glbl_sts_addr + 4 * i);
 +
 +              if (!glbl_sts_val)
 +                      continue;
 +
 +              if (i == QMAN_STREAMS)
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "LowerCP");
 +              else
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "stream%u", i);
 +
 +              for (j = 0 ; j < GAUDI_NUM_OF_QM_ERR_CAUSE ; j++) {
 +                      if (glbl_sts_val & BIT(j)) {
 +                              dev_err_ratelimited(hdev->dev,
 +                                              "%s %s. err cause: %s\n",
 +                                              qm_name, reg_desc,
 +                                              gaudi_qman_error_cause[j]);
 +                              glbl_sts_clr_val |= BIT(j);
 +                      }
 +              }
 +              /* check for undefined opcode */
 +              if (glbl_sts_val & TPC0_QM_GLBL_STS1_CP_UNDEF_CMD_ERR_MASK &&
 +                              hdev->captured_err_info.undef_opcode.write_enable) {
 +                      memset(&hdev->captured_err_info.undef_opcode, 0,
 +                                              sizeof(hdev->captured_err_info.undef_opcode));
 +
 +                      hdev->captured_err_info.undef_opcode.write_enable = false;
 +                      *event_mask |= HL_NOTIFIER_EVENT_UNDEFINED_OPCODE;
 +              }
 +
 +              /* Write 1 to clear errors */
 +              if (!hdev->stop_on_err)
 +                      WREG32(glbl_sts_addr + 4 * i, glbl_sts_clr_val);
 +              else
 +                      handle_qman_data_on_err(hdev, qid_base, i, qman_base, *event_mask);
 +      }
 +
 +      arb_err_val = RREG32(arb_err_addr);
 +
 +      if (!arb_err_val)
 +              return;
 +
 +      for (j = 0 ; j < GAUDI_NUM_OF_QM_ARB_ERR_CAUSE ; j++) {
 +              if (arb_err_val & BIT(j)) {
 +                      dev_err_ratelimited(hdev->dev,
 +                                      "%s ARB_ERR. err cause: %s\n",
 +                                      qm_name,
 +                                      gaudi_qman_arb_error_cause[j]);
 +              }
 +      }
 +}
 +
 +static void gaudi_print_sm_sei_info(struct hl_device *hdev, u16 event_type,
 +              struct hl_eq_sm_sei_data *sei_data)
 +{
 +      u32 index = event_type - GAUDI_EVENT_DMA_IF_SEI_0;
 +
 +      /* Flip the bits as the enum is ordered in the opposite way */
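 +      /* (so event index 0 maps to gaudi_sync_manager_names[3] and vice versa) */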
 +      index = (index ^ 0x3) & 0x3;
 +
 +      switch (sei_data->sei_cause) {
 +      case SM_SEI_SO_OVERFLOW:
 +              dev_err_ratelimited(hdev->dev,
 +                      "%s SEI Error: SOB Group %u overflow/underflow",
 +                      gaudi_sync_manager_names[index],
 +                      le32_to_cpu(sei_data->sei_log));
 +              break;
 +      case SM_SEI_LBW_4B_UNALIGNED:
 +              dev_err_ratelimited(hdev->dev,
 +                      "%s SEI Error: Unaligned 4B LBW access, monitor agent address low - %#x",
 +                      gaudi_sync_manager_names[index],
 +                      le32_to_cpu(sei_data->sei_log));
 +              break;
 +      case SM_SEI_AXI_RESPONSE_ERR:
 +              dev_err_ratelimited(hdev->dev,
 +                      "%s SEI Error: AXI ID %u response error",
 +                      gaudi_sync_manager_names[index],
 +                      le32_to_cpu(sei_data->sei_log));
 +              break;
 +      default:
 +              dev_err_ratelimited(hdev->dev, "Unknown SM SEI cause %u",
 +                              le32_to_cpu(sei_data->sei_log));
 +              break;
 +      }
 +}
 +
 +static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
 +              struct hl_eq_ecc_data *ecc_data)
 +{
 +      struct ecc_info_extract_params params;
 +      u64 ecc_address = 0, ecc_syndrom = 0;
 +      u8 index, memory_wrapper_idx = 0;
 +      bool extract_info_from_fw;
 +      int rc;
 +
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              extract_info_from_fw = true;
 +              goto extract_ecc_info;
 +      }
 +
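 +      /*
 +       * For events the driver can decode on its own, derive the block address
 +       * and number of memory wrappers from the event type; otherwise rely on
 +       * the FW-supplied data.
 +       */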
 +      switch (event_type) {
 +      case GAUDI_EVENT_PCIE_CORE_SERR ... GAUDI_EVENT_PCIE_PHY_DERR:
 +      case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_MMU_DERR:
 +              extract_info_from_fw = true;
 +              break;
 +      case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
 +              index = event_type - GAUDI_EVENT_TPC0_SERR;
 +              params.block_address = mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
 +              params.num_memories = 90;
 +              params.derr = false;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
 +              index = event_type - GAUDI_EVENT_TPC0_DERR;
 +              params.block_address =
 +                      mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
 +              params.num_memories = 90;
 +              params.derr = true;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_ACC_SERR:
 +      case GAUDI_EVENT_MME1_ACC_SERR:
 +      case GAUDI_EVENT_MME2_ACC_SERR:
 +      case GAUDI_EVENT_MME3_ACC_SERR:
 +              index = (event_type - GAUDI_EVENT_MME0_ACC_SERR) / 4;
 +              params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 128;
 +              params.derr = false;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_ACC_DERR:
 +      case GAUDI_EVENT_MME1_ACC_DERR:
 +      case GAUDI_EVENT_MME2_ACC_DERR:
 +      case GAUDI_EVENT_MME3_ACC_DERR:
 +              index = (event_type - GAUDI_EVENT_MME0_ACC_DERR) / 4;
 +              params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 128;
 +              params.derr = true;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_SBAB_SERR:
 +      case GAUDI_EVENT_MME1_SBAB_SERR:
 +      case GAUDI_EVENT_MME2_SBAB_SERR:
 +      case GAUDI_EVENT_MME3_SBAB_SERR:
 +              index = (event_type - GAUDI_EVENT_MME0_SBAB_SERR) / 4;
 +              params.block_address =
 +                      mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 33;
 +              params.derr = false;
 +              extract_info_from_fw = false;
 +              break;
 +      case GAUDI_EVENT_MME0_SBAB_DERR:
 +      case GAUDI_EVENT_MME1_SBAB_DERR:
 +      case GAUDI_EVENT_MME2_SBAB_DERR:
 +      case GAUDI_EVENT_MME3_SBAB_DERR:
 +              index = (event_type - GAUDI_EVENT_MME0_SBAB_DERR) / 4;
 +              params.block_address =
 +                      mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
 +              params.num_memories = 33;
 +              params.derr = true;
 +              extract_info_from_fw = false;
 +              break;
 +      default:
 +              return;
 +      }
 +
 +extract_ecc_info:
 +      if (extract_info_from_fw) {
 +              ecc_address = le64_to_cpu(ecc_data->ecc_address);
 +              ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
 +              memory_wrapper_idx = ecc_data->memory_wrapper_idx;
 +      } else {
 +              rc = gaudi_extract_ecc_info(hdev, &params, &ecc_address,
 +                              &ecc_syndrom, &memory_wrapper_idx);
 +              if (rc)
 +                      return;
 +      }
 +
 +      dev_err(hdev->dev,
 +              "ECC error detected. address: %#llx. Syndrome: %#llx. block id %u\n",
 +              ecc_address, ecc_syndrom, memory_wrapper_idx);
 +}
 +
 +static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      u64 qman_base;
 +      char desc[32];
 +      u32 qid_base;
 +      u8 index;
 +
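 +      /* Map the event type to its QMAN base address, first QID and printable name */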
 +      switch (event_type) {
 +      case GAUDI_EVENT_TPC0_QM ... GAUDI_EVENT_TPC7_QM:
 +              index = event_type - GAUDI_EVENT_TPC0_QM;
 +              qid_base = GAUDI_QUEUE_ID_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmTPC0_QM_BASE + index * TPC_QMAN_OFFSET;
 +              snprintf(desc, ARRAY_SIZE(desc), "%s%d", "TPC_QM", index);
 +              break;
 +      case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
 +              if (event_type == GAUDI_EVENT_MME0_QM) {
 +                      index = 0;
 +                      qid_base = GAUDI_QUEUE_ID_MME_0_0;
 +              } else { /* event_type == GAUDI_EVENT_MME2_QM */
 +                      index = 2;
 +                      qid_base = GAUDI_QUEUE_ID_MME_1_0;
 +              }
 +              qman_base = mmMME0_QM_BASE + index * MME_QMAN_OFFSET;
 +              snprintf(desc, ARRAY_SIZE(desc), "%s%d", "MME_QM", index);
 +              break;
 +      case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
 +              index = event_type - GAUDI_EVENT_DMA0_QM;
 +              qid_base = GAUDI_QUEUE_ID_DMA_0_0 + index * QMAN_STREAMS;
 +              /* skip GAUDI_QUEUE_ID_CPU_PQ if necessary */
 +              if (index > 1)
 +                      qid_base++;
 +              qman_base = mmDMA0_QM_BASE + index * DMA_QMAN_OFFSET;
 +              snprintf(desc, ARRAY_SIZE(desc), "%s%d", "DMA_QM", index);
 +              break;
 +      case GAUDI_EVENT_NIC0_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_0_0;
 +              qman_base = mmNIC0_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC0_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_1_0;
 +              qman_base = mmNIC0_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC1_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_2_0;
 +              qman_base = mmNIC1_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC1_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_3_0;
 +              qman_base = mmNIC1_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC2_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_4_0;
 +              qman_base = mmNIC2_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC2_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_5_0;
 +              qman_base = mmNIC2_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC3_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_6_0;
 +              qman_base = mmNIC3_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC3_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_7_0;
 +              qman_base = mmNIC3_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM1");
 +              break;
 +      case GAUDI_EVENT_NIC4_QM0:
 +              qid_base = GAUDI_QUEUE_ID_NIC_8_0;
 +              qman_base = mmNIC4_QM0_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM0");
 +              break;
 +      case GAUDI_EVENT_NIC4_QM1:
 +              qid_base = GAUDI_QUEUE_ID_NIC_9_0;
 +              qman_base = mmNIC4_QM1_BASE;
 +              snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM1");
 +              break;
 +      default:
 +              return;
 +      }
 +
 +      gaudi_handle_qman_err_generic(hdev, desc, qman_base, qid_base, event_mask);
 +}
 +
 +static void gaudi_print_irq_info(struct hl_device *hdev, u16 event_type,
 +                                      bool razwi, u64 *event_mask)
 +{
 +      bool is_read = false, is_write = false;
 +      u16 engine_id[2], num_of_razwi_eng = 0;
 +      char desc[64] = "";
 +      u64 razwi_addr = 0;
 +      u8 razwi_flags = 0;
 +
 +      /*
 +       * Initialize the engine ids as invalid by default; they get a valid
 +       * value only if the RAZWI was initiated by an engine that has one.
 +       */
 +      engine_id[0] = HL_RAZWI_NA_ENG_ID;
 +      engine_id[1] = HL_RAZWI_NA_ENG_ID;
 +
 +      gaudi_get_event_desc(event_type, desc, sizeof(desc));
 +      dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 +              event_type, desc);
 +
 +      if (razwi) {
 +              gaudi_print_and_get_razwi_info(hdev, &engine_id[0], &engine_id[1], &is_read,
 +                                              &is_write);
 +              gaudi_print_and_get_mmu_error_info(hdev, &razwi_addr, event_mask);
 +
 +              if (is_read)
 +                      razwi_flags |= HL_RAZWI_READ;
 +              if (is_write)
 +                      razwi_flags |= HL_RAZWI_WRITE;
 +
 +              if (engine_id[0] != HL_RAZWI_NA_ENG_ID) {
 +                      if (engine_id[1] != HL_RAZWI_NA_ENG_ID)
 +                              num_of_razwi_eng = 2;
 +                      else
 +                              num_of_razwi_eng = 1;
 +              }
 +
 +              hl_handle_razwi(hdev, razwi_addr, engine_id, num_of_razwi_eng, razwi_flags,
 +                              event_mask);
 +      }
 +}
 +
 +static void gaudi_print_out_of_sync_info(struct hl_device *hdev,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI_QUEUE_ID_CPU_PQ];
 +
 +      dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
 +}
 +
 +static void gaudi_print_fw_alive_info(struct hl_device *hdev,
 +                                      struct hl_eq_fw_alive *fw_alive)
 +{
 +      dev_err(hdev->dev,
 +              "FW alive report: severity=%s, process_id=%u, thread_id=%u, uptime=%llu seconds\n",
 +              (fw_alive->severity == FW_ALIVE_SEVERITY_MINOR) ? "Minor" : "Critical",
 +              le32_to_cpu(fw_alive->process_id),
 +              le32_to_cpu(fw_alive->thread_id),
 +              le64_to_cpu(fw_alive->uptime_seconds));
 +}
 +
 +static void gaudi_print_nic_axi_irq_info(struct hl_device *hdev, u16 event_type,
 +                                              void *data)
 +{
 +      char desc[64] = "", *type;
 +      struct eq_nic_sei_event *eq_nic_sei = data;
 +      u16 nic_id = event_type - GAUDI_EVENT_NIC_SEI_0;
 +
 +      switch (eq_nic_sei->axi_error_cause) {
 +      case RXB:
 +              type = "RXB";
 +              break;
 +      case RXE:
 +              type = "RXE";
 +              break;
 +      case TXS:
 +              type = "TXS";
 +              break;
 +      case TXE:
 +              type = "TXE";
 +              break;
 +      case QPC_RESP:
 +              type = "QPC_RESP";
 +              break;
 +      case NON_AXI_ERR:
 +              type = "NON_AXI_ERR";
 +              break;
 +      case TMR:
 +              type = "TMR";
 +              break;
 +      default:
 +              dev_err(hdev->dev, "unknown NIC AXI cause %d\n",
 +                      eq_nic_sei->axi_error_cause);
 +              type = "N/A";
 +              break;
 +      }
 +
 +      snprintf(desc, sizeof(desc), "NIC%d_%s%d", nic_id, type,
 +                      eq_nic_sei->id);
 +      dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 +              event_type, desc);
 +}
 +
 +static int gaudi_compute_reset_late_init(struct hl_device *hdev)
 +{
 +      /* GAUDI doesn't support any reset except hard-reset */
 +      return -EPERM;
 +}
 +
 +static int gaudi_hbm_read_interrupts(struct hl_device *hdev, int device,
 +                      struct hl_eq_hbm_ecc_data *hbm_ecc_data)
 +{
 +      u32 base, val, val2, wr_par, rd_par, ca_par, derr, serr, type, ch;
 +      int rc = 0;
 +
 +      if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
 +                                      CPU_BOOT_DEV_STS0_HBM_ECC_EN) {
 +              if (!hbm_ecc_data) {
 +                      dev_err(hdev->dev, "No FW ECC data");
 +                      return 0;
 +              }
 +
 +              wr_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_WR_PAR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              rd_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_RD_PAR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              ca_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_CA_PAR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              derr = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_DERR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              serr = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_SERR_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              type = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_TYPE_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +              ch = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_HBM_CH_MASK,
 +                              le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
 +
 +              dev_err(hdev->dev,
 +                      "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
 +                      device, ch, wr_par, rd_par, ca_par, serr, derr);
 +              dev_err(hdev->dev,
 +                      "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%u, SEC_CNT=%d, DEC_CNT=%d\n",
 +                      device, ch, hbm_ecc_data->first_addr, type,
 +                      hbm_ecc_data->sec_cont_cnt, hbm_ecc_data->sec_cnt,
 +                      hbm_ecc_data->dec_cnt);
 +              return 0;
 +      }
 +
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              dev_info(hdev->dev, "Cannot access MC regs for ECC data while security is enabled\n");
 +              return 0;
 +      }
 +
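 +      /*
 +       * No FW-provided ECC data - read the HBM MC registers directly; each
 +       * channel reports two pc entries (ch * 2 and ch * 2 + 1) at separate
 +       * offsets.
 +       */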
 +      base = GAUDI_HBM_CFG_BASE + device * GAUDI_HBM_CFG_OFFSET;
 +      for (ch = 0 ; ch < GAUDI_HBM_CHANNELS ; ch++) {
 +              val = RREG32_MASK(base + ch * 0x1000 + 0x06C, 0x0000FFFF);
 +              val = (val & 0xFF) | ((val >> 8) & 0xFF);
 +              if (val) {
 +                      rc = -EIO;
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
 +                              device, ch * 2, val & 0x1, (val >> 1) & 0x1,
 +                              (val >> 2) & 0x1, (val >> 3) & 0x1,
 +                              (val >> 4) & 0x1);
 +
 +                      val2 = RREG32(base + ch * 0x1000 + 0x060);
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%d, SEC_CNT=%d, DEC_CNT=%d\n",
 +                              device, ch * 2,
 +                              RREG32(base + ch * 0x1000 + 0x064),
 +                              (val2 & 0x200) >> 9, (val2 & 0xFC00) >> 10,
 +                              (val2 & 0xFF0000) >> 16,
 +                              (val2 & 0xFF000000) >> 24);
 +              }
 +
 +              val = RREG32_MASK(base + ch * 0x1000 + 0x07C, 0x0000FFFF);
 +              val = (val & 0xFF) | ((val >> 8) & 0xFF);
 +              if (val) {
 +                      rc = -EIO;
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
 +                              device, ch * 2 + 1, val & 0x1, (val >> 1) & 0x1,
 +                              (val >> 2) & 0x1, (val >> 3) & 0x1,
 +                              (val >> 4) & 0x1);
 +
 +                      val2 = RREG32(base + ch * 0x1000 + 0x070);
 +                      dev_err(hdev->dev,
 +                              "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%d, SEC_CNT=%d, DEC_CNT=%d\n",
 +                              device, ch * 2 + 1,
 +                              RREG32(base + ch * 0x1000 + 0x074),
 +                              (val2 & 0x200) >> 9, (val2 & 0xFC00) >> 10,
 +                              (val2 & 0xFF0000) >> 16,
 +                              (val2 & 0xFF000000) >> 24);
 +              }
 +
 +              /* Clear interrupts */
 +              RMWREG32(base + (ch * 0x1000) + 0x060, 0x1C8, 0x1FF);
 +              RMWREG32(base + (ch * 0x1000) + 0x070, 0x1C8, 0x1FF);
 +              WREG32(base + (ch * 0x1000) + 0x06C, 0x1F1F);
 +              WREG32(base + (ch * 0x1000) + 0x07C, 0x1F1F);
 +              RMWREG32(base + (ch * 0x1000) + 0x060, 0x0, 0xF);
 +              RMWREG32(base + (ch * 0x1000) + 0x070, 0x0, 0xF);
 +      }
 +
 +      val  = RREG32(base + 0x8F30);
 +      val2 = RREG32(base + 0x8F34);
 +      if (val | val2) {
 +              rc = -EIO;
 +              dev_err(hdev->dev,
 +                      "HBM %d MC SRAM SERR info: Reg 0x8F30=0x%x, Reg 0x8F34=0x%x\n",
 +                      device, val, val2);
 +      }
 +      val  = RREG32(base + 0x8F40);
 +      val2 = RREG32(base + 0x8F44);
 +      if (val | val2) {
 +              rc = -EIO;
 +              dev_err(hdev->dev,
 +                      "HBM %d MC SRAM DERR info: Reg 0x8F40=0x%x, Reg 0x8F44=0x%x\n",
 +                      device, val, val2);
 +      }
 +
 +      return rc;
 +}
 +
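 +/* Translate an HBM SPI event type to the index (0-3) of the HBM device that raised it */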
 +static int gaudi_hbm_event_to_dev(u16 hbm_event_type)
 +{
 +      switch (hbm_event_type) {
 +      case GAUDI_EVENT_HBM0_SPI_0:
 +      case GAUDI_EVENT_HBM0_SPI_1:
 +              return 0;
 +      case GAUDI_EVENT_HBM1_SPI_0:
 +      case GAUDI_EVENT_HBM1_SPI_1:
 +              return 1;
 +      case GAUDI_EVENT_HBM2_SPI_0:
 +      case GAUDI_EVENT_HBM2_SPI_1:
 +              return 2;
 +      case GAUDI_EVENT_HBM3_SPI_0:
 +      case GAUDI_EVENT_HBM3_SPI_1:
 +              return 3;
 +      default:
 +              break;
 +      }
 +
 +      /* Should never happen */
 +      return 0;
 +}
 +
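 +/*
 + * Read and clear the TPC interrupt cause register, printing every asserted
 + * cause bit. Returns true if the cause was a QMAN error, in which case the
 + * caller must issue a soft reset.
 + */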
 +static bool gaudi_tpc_read_interrupts(struct hl_device *hdev, u8 tpc_id,
 +                                      char *interrupt_name)
 +{
 +      u32 tpc_offset = tpc_id * TPC_CFG_OFFSET, tpc_interrupts_cause, i;
 +      bool soft_reset_required = false;
 +
 +      tpc_interrupts_cause = RREG32(mmTPC0_CFG_TPC_INTR_CAUSE + tpc_offset) &
 +                              TPC0_CFG_TPC_INTR_CAUSE_CAUSE_MASK;
 +
 +      for (i = 0 ; i < GAUDI_NUM_OF_TPC_INTR_CAUSE ; i++)
 +              if (tpc_interrupts_cause & BIT(i)) {
 +                      dev_err_ratelimited(hdev->dev,
 +                                      "TPC%d_%s interrupt cause: %s\n",
 +                                      tpc_id, interrupt_name,
 +                                      gaudi_tpc_interrupts_cause[i]);
 +                      /* If this is a QM error, we need to soft-reset */
 +                      if (i == 15)
 +                              soft_reset_required = true;
 +              }
 +
 +      /* Clear interrupts */
 +      WREG32(mmTPC0_CFG_TPC_INTR_CAUSE + tpc_offset, 0);
 +
 +      return soft_reset_required;
 +}
 +
 +static int tpc_dec_event_to_tpc_id(u16 tpc_dec_event_type)
 +{
 +      return (tpc_dec_event_type - GAUDI_EVENT_TPC0_DEC) >> 1;
 +}
 +
 +static int tpc_krn_event_to_tpc_id(u16 tpc_dec_event_type)
 +{
 +      return (tpc_dec_event_type - GAUDI_EVENT_TPC0_KRN_ERR) / 6;
 +}
 +
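 +/*
 + * Track clock-throttling state: record the current and aggregated throttling
 + * reasons (power/thermal) with start/end timestamps under the clk_throttling
 + * lock and, for thermal events, mark HL_NOTIFIER_EVENT_USER_ENGINE_ERR in the
 + * event mask.
 + */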
 +static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      ktime_t zero_time = ktime_set(0, 0);
 +
 +      mutex_lock(&hdev->clk_throttling.lock);
 +
 +      switch (event_type) {
 +      case GAUDI_EVENT_FIX_POWER_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to power consumption\n");
 +              break;
 +
 +      case GAUDI_EVENT_FIX_POWER_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
 +              dev_info_ratelimited(hdev->dev,
 +                      "Power envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      case GAUDI_EVENT_FIX_THERMAL_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to overheating\n");
 +              break;
 +
 +      case GAUDI_EVENT_FIX_THERMAL_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Thermal envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid clock change event %d\n",
 +                      event_type);
 +              break;
 +      }
 +
 +      mutex_unlock(&hdev->clk_throttling.lock);
 +}
 +
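 +/*
 + * Main event-queue (EQ) entry handler: update the event statistics, dispatch
 + * per event type to the relevant printing/handling helpers, unmask the FW
 + * interrupt where applicable, and decide whether a (hard) reset is required
 + * before notifying user processes through the event mask.
 + */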
 +static void gaudi_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u64 data = le64_to_cpu(eq_entry->data[0]), event_mask = 0;
 +      u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
 +      u32 fw_fatal_err_flag = 0, flags = 0;
 +      u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
 +                      >> EQ_CTL_EVENT_TYPE_SHIFT);
 +      bool reset_required, reset_direct = false;
 +      u8 cause;
 +      int rc;
 +
 +      if (event_type >= GAUDI_EVENT_SIZE) {
 +              dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
 +                              event_type, GAUDI_EVENT_SIZE - 1);
 +              return;
 +      }
 +
 +      gaudi->events_stat[event_type]++;
 +      gaudi->events_stat_aggregate[event_type]++;
 +
 +      switch (event_type) {
 +      case GAUDI_EVENT_PCIE_CORE_DERR:
 +      case GAUDI_EVENT_PCIE_IF_DERR:
 +      case GAUDI_EVENT_PCIE_PHY_DERR:
 +      case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
 +      case GAUDI_EVENT_MME0_ACC_DERR:
 +      case GAUDI_EVENT_MME0_SBAB_DERR:
 +      case GAUDI_EVENT_MME1_ACC_DERR:
 +      case GAUDI_EVENT_MME1_SBAB_DERR:
 +      case GAUDI_EVENT_MME2_ACC_DERR:
 +      case GAUDI_EVENT_MME2_SBAB_DERR:
 +      case GAUDI_EVENT_MME3_ACC_DERR:
 +      case GAUDI_EVENT_MME3_SBAB_DERR:
 +      case GAUDI_EVENT_DMA0_DERR_ECC ... GAUDI_EVENT_DMA7_DERR_ECC:
 +              fallthrough;
 +      case GAUDI_EVENT_CPU_IF_ECC_DERR:
 +      case GAUDI_EVENT_PSOC_MEM_DERR:
 +      case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
 +      case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
 +      case GAUDI_EVENT_NIC0_DERR ... GAUDI_EVENT_NIC4_DERR:
 +      case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
 +      case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
 +      case GAUDI_EVENT_MMU_DERR:
 +      case GAUDI_EVENT_NIC0_CS_DBG_DERR ... GAUDI_EVENT_NIC4_CS_DBG_DERR:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_GIC500:
 +      case GAUDI_EVENT_AXI_ECC:
 +      case GAUDI_EVENT_L2_RAM_ECC:
 +      case GAUDI_EVENT_PLL0 ... GAUDI_EVENT_PLL17:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_HBM0_SPI_0:
 +      case GAUDI_EVENT_HBM1_SPI_0:
 +      case GAUDI_EVENT_HBM2_SPI_0:
 +      case GAUDI_EVENT_HBM3_SPI_0:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_hbm_read_interrupts(hdev,
 +                              gaudi_hbm_event_to_dev(event_type),
 +                              &eq_entry->hbm_ecc_data);
 +              fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_HBM0_SPI_1:
 +      case GAUDI_EVENT_HBM1_SPI_1:
 +      case GAUDI_EVENT_HBM2_SPI_1:
 +      case GAUDI_EVENT_HBM3_SPI_1:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_hbm_read_interrupts(hdev,
 +                              gaudi_hbm_event_to_dev(event_type),
 +                              &eq_entry->hbm_ecc_data);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_TPC0_DEC:
 +      case GAUDI_EVENT_TPC1_DEC:
 +      case GAUDI_EVENT_TPC2_DEC:
 +      case GAUDI_EVENT_TPC3_DEC:
 +      case GAUDI_EVENT_TPC4_DEC:
 +      case GAUDI_EVENT_TPC5_DEC:
 +      case GAUDI_EVENT_TPC6_DEC:
 +      case GAUDI_EVENT_TPC7_DEC:
 +              /* On a TPC DEC event, notify on a TPC assertion. Since there is
 +               * no dedicated assertion event yet, the FW generates a TPC DEC
 +               * event. The SW upper layer inspects an internally mapped area
 +               * to determine whether the event is a TPC assertion or a
 +               * "real" TPC DEC.
 +               */
 +              event_mask |= HL_NOTIFIER_EVENT_TPC_ASSERT;
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              reset_required = gaudi_tpc_read_interrupts(hdev,
 +                                      tpc_dec_event_to_tpc_id(event_type),
 +                                      "AXI_SLV_DEC_Error");
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              if (reset_required) {
 +                      dev_err(hdev->dev, "reset required due to %s\n",
 +                              gaudi_irq_map_table[event_type].name);
 +
 +                      reset_direct = true;
 +                      goto reset_device;
 +              } else {
 +                      hl_fw_unmask_irq(hdev, event_type);
 +                      event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +              }
 +              break;
 +
 +      case GAUDI_EVENT_TPC0_KRN_ERR:
 +      case GAUDI_EVENT_TPC1_KRN_ERR:
 +      case GAUDI_EVENT_TPC2_KRN_ERR:
 +      case GAUDI_EVENT_TPC3_KRN_ERR:
 +      case GAUDI_EVENT_TPC4_KRN_ERR:
 +      case GAUDI_EVENT_TPC5_KRN_ERR:
 +      case GAUDI_EVENT_TPC6_KRN_ERR:
 +      case GAUDI_EVENT_TPC7_KRN_ERR:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              reset_required = gaudi_tpc_read_interrupts(hdev,
 +                                      tpc_krn_event_to_tpc_id(event_type),
 +                                      "KRN_ERR");
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              if (reset_required) {
 +                      dev_err(hdev->dev, "reset required due to %s\n",
 +                              gaudi_irq_map_table[event_type].name);
 +
 +                      reset_direct = true;
 +                      goto reset_device;
 +              } else {
 +                      hl_fw_unmask_irq(hdev, event_type);
 +                      event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +              }
 +              break;
 +
 +      case GAUDI_EVENT_PCIE_CORE_SERR:
 +      case GAUDI_EVENT_PCIE_IF_SERR:
 +      case GAUDI_EVENT_PCIE_PHY_SERR:
 +      case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
 +      case GAUDI_EVENT_MME0_ACC_SERR:
 +      case GAUDI_EVENT_MME0_SBAB_SERR:
 +      case GAUDI_EVENT_MME1_ACC_SERR:
 +      case GAUDI_EVENT_MME1_SBAB_SERR:
 +      case GAUDI_EVENT_MME2_ACC_SERR:
 +      case GAUDI_EVENT_MME2_SBAB_SERR:
 +      case GAUDI_EVENT_MME3_ACC_SERR:
 +      case GAUDI_EVENT_MME3_SBAB_SERR:
 +      case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_DMA7_SERR_ECC:
 +      case GAUDI_EVENT_CPU_IF_ECC_SERR:
 +      case GAUDI_EVENT_PSOC_MEM_SERR:
 +      case GAUDI_EVENT_PSOC_CORESIGHT_SERR:
 +      case GAUDI_EVENT_SRAM0_SERR ... GAUDI_EVENT_SRAM28_SERR:
 +      case GAUDI_EVENT_NIC0_SERR ... GAUDI_EVENT_NIC4_SERR:
 +      case GAUDI_EVENT_DMA_IF0_SERR ... GAUDI_EVENT_DMA_IF3_SERR:
 +      case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
 +              fallthrough;
 +      case GAUDI_EVENT_MMU_SERR:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_PCIE_DEC:
 +      case GAUDI_EVENT_CPU_AXI_SPLITTER:
 +      case GAUDI_EVENT_PSOC_AXI_DEC:
 +      case GAUDI_EVENT_PSOC_PRSTN_FALL:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_MMU_PAGE_FAULT:
 +      case GAUDI_EVENT_MMU_WR_PERM:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_MME0_WBC_RSP:
 +      case GAUDI_EVENT_MME0_SBAB0_RSP:
 +      case GAUDI_EVENT_MME1_WBC_RSP:
 +      case GAUDI_EVENT_MME1_SBAB0_RSP:
 +      case GAUDI_EVENT_MME2_WBC_RSP:
 +      case GAUDI_EVENT_MME2_SBAB0_RSP:
 +      case GAUDI_EVENT_MME3_WBC_RSP:
 +      case GAUDI_EVENT_MME3_SBAB0_RSP:
 +      case GAUDI_EVENT_RAZWI_OR_ADC:
 +      case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
 +      case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
 +              fallthrough;
 +      case GAUDI_EVENT_NIC0_QM0:
 +      case GAUDI_EVENT_NIC0_QM1:
 +      case GAUDI_EVENT_NIC1_QM0:
 +      case GAUDI_EVENT_NIC1_QM1:
 +      case GAUDI_EVENT_NIC2_QM0:
 +      case GAUDI_EVENT_NIC2_QM1:
 +      case GAUDI_EVENT_NIC3_QM0:
 +      case GAUDI_EVENT_NIC3_QM1:
 +      case GAUDI_EVENT_NIC4_QM0:
 +      case GAUDI_EVENT_NIC4_QM1:
 +      case GAUDI_EVENT_DMA0_CORE ... GAUDI_EVENT_DMA7_CORE:
 +      case GAUDI_EVENT_TPC0_QM ... GAUDI_EVENT_TPC7_QM:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              gaudi_handle_qman_err(hdev, event_type, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= (HL_NOTIFIER_EVENT_USER_ENGINE_ERR | HL_NOTIFIER_EVENT_DEVICE_RESET);
 +              break;
 +
 +      case GAUDI_EVENT_RAZWI_OR_ADC_SW:
 +              gaudi_print_irq_info(hdev, event_type, true, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_TPC0_BMON_SPMU:
 +      case GAUDI_EVENT_TPC1_BMON_SPMU:
 +      case GAUDI_EVENT_TPC2_BMON_SPMU:
 +      case GAUDI_EVENT_TPC3_BMON_SPMU:
 +      case GAUDI_EVENT_TPC4_BMON_SPMU:
 +      case GAUDI_EVENT_TPC5_BMON_SPMU:
 +      case GAUDI_EVENT_TPC6_BMON_SPMU:
 +      case GAUDI_EVENT_TPC7_BMON_SPMU:
 +      case GAUDI_EVENT_DMA_BM_CH0 ... GAUDI_EVENT_DMA_BM_CH7:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_NIC_SEI_0 ... GAUDI_EVENT_NIC_SEI_4:
 +              gaudi_print_nic_axi_irq_info(hdev, event_type, &data);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_DMA_IF_SEI_0 ... GAUDI_EVENT_DMA_IF_SEI_3:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_print_sm_sei_info(hdev, event_type,
 +                                      &eq_entry->sm_sei_data);
 +              rc = hl_state_dump(hdev);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              if (rc)
 +                      dev_err(hdev->dev,
 +                              "Error during system state dump %d\n", rc);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GAUDI_EVENT_STATUS_NIC0_ENG0 ... GAUDI_EVENT_STATUS_NIC4_ENG1:
 +              break;
 +
 +      case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
 +              gaudi_print_clk_change_info(hdev, event_type, &event_mask);
 +              hl_fw_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GAUDI_EVENT_PSOC_GPIO_U16_0:
 +              cause = le64_to_cpu(eq_entry->data[0]) & 0xFF;
 +              dev_err(hdev->dev,
 +                      "Received high temp H/W interrupt %d (cause %d)\n",
 +                      event_type, cause);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI_EVENT_DEV_RESET_REQ:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_PKT_QUEUE_OUT_SYNC:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      case GAUDI_EVENT_FW_ALIVE_S:
 +              gaudi_print_irq_info(hdev, event_type, false, &event_mask);
 +              gaudi_print_fw_alive_info(hdev, &eq_entry->fw_alive);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              goto reset_device;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid H/W interrupt %d\n",
 +                              event_type);
 +              break;
 +      }
 +
 +      if (event_mask)
 +              hl_notifier_event_send_all(hdev, event_mask);
 +
 +      return;
 +
 +reset_device:
 +      reset_required = true;
 +
 +      if (hdev->asic_prop.fw_security_enabled && !reset_direct) {
 +              flags = HL_DRV_RESET_HARD | HL_DRV_RESET_BYPASS_REQ_TO_FW | fw_fatal_err_flag;
 +
 +              /* notify on device unavailable while the reset is triggered by FW */
 +              event_mask |= (HL_NOTIFIER_EVENT_DEVICE_RESET |
 +                                      HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE);
 +      } else if (hdev->hard_reset_on_fw_events) {
 +              flags = HL_DRV_RESET_HARD | HL_DRV_RESET_DELAY | fw_fatal_err_flag;
 +              event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +      } else {
 +              reset_required = false;
 +      }
 +
 +      if (reset_required) {
 +              hl_device_cond_reset(hdev, flags, event_mask);
 +      } else {
 +              hl_fw_unmask_irq(hdev, event_type);
 +              /* Notification on the occurred event needs to be sent even though reset is not executed */
 +              if (event_mask)
 +                      hl_notifier_event_send_all(hdev, event_mask);
 +      }
 +}
 +
 +static void *gaudi_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (aggregate) {
 +              *size = (u32) sizeof(gaudi->events_stat_aggregate);
 +              return gaudi->events_stat_aggregate;
 +      }
 +
 +      *size = (u32) sizeof(gaudi->events_stat);
 +      return gaudi->events_stat;
 +}
 +
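 +/*
 + * Invalidate the entire MMU cache (L0 & L1 STLB) and poll until the
 + * invalidation completes. Skipped when the MMU is not initialized or a hard
 + * reset is pending.
 + */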
 +static int gaudi_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard, u32 flags)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU) ||
 +              hdev->reset_info.hard_reset_pending)
 +              return 0;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      /* L0 & L1 invalidation */
 +      WREG32(mmSTLB_INV_PS, 3);
 +      WREG32(mmSTLB_CACHE_INV, gaudi->mmu_cache_inv_pi++);
 +      WREG32(mmSTLB_INV_PS, 2);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmSTLB_INV_PS,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      WREG32(mmSTLB_INV_SET, 0);
 +
 +      return rc;
 +}
 +
 +static int gaudi_mmu_invalidate_cache_range(struct hl_device *hdev,
 +                                              bool is_hard, u32 flags,
 +                                              u32 asid, u64 va, u64 size)
 +{
 +      /* Treat as invalidate all because there is no range invalidation
 +       * in Gaudi
 +       */
 +      return hdev->asic_funcs->mmu_invalidate_cache(hdev, is_hard, flags);
 +}
 +
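 +/*
 + * Program the hop-0 page-table physical address for the given ASID and poll
 + * the MMU busy bit until the configuration is accepted.
 + */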
 +static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid, u64 phys_addr)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      WREG32(MMU_ASID, asid);
 +      WREG32(MMU_HOP0_PA43_12, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
 +      WREG32(MMU_HOP0_PA49_44, phys_addr >> MMU_HOP0_PA49_44_SHIFT);
 +      WREG32(MMU_BUSY, 0x80000000);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              MMU_BUSY,
 +              status,
 +              !(status & 0x80000000),
 +              1000,
 +              timeout_usec);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout during MMU hop0 config of asid %d\n", asid);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi_send_heartbeat(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_heartbeat(hdev);
 +}
 +
 +static int gaudi_cpucp_info_get(struct hl_device *hdev)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0,
 +                                      mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
 +                                      mmCPU_BOOT_ERR1);
 +      if (rc)
 +              return rc;
 +
 +      if (!strlen(prop->cpucp_info.card_name))
 +              strncpy(prop->cpucp_info.card_name, GAUDI_DEFAULT_CARD_NAME,
 +                              CARD_NAME_MAX_LEN);
 +
 +      hdev->card_type = le32_to_cpu(hdev->asic_prop.cpucp_info.card_type);
 +
 +      set_default_power_values(hdev);
 +
 +      return 0;
 +}
 +
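 +/*
 + * Check whether all DMA, TPC, MME and NIC engines are idle by reading their
 + * QMAN/engine status registers. Busy engines are marked in the caller's mask,
 + * and an optional per-engine status table is written to the engines_data
 + * buffer.
 + */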
 +static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
 +              struct engines_data *e)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      const char *fmt = "%-5d%-9s%#-14x%#-12x%#x\n";
 +      const char *mme_slave_fmt = "%-5d%-9s%-14s%-12s%#x\n";
 +      const char *nic_fmt = "%-5d%-9s%#-14x%#x\n";
 +      unsigned long *mask = (unsigned long *)mask_arr;
 +      u32 qm_glbl_sts0, qm_cgm_sts, dma_core_sts0, tpc_cfg_sts, mme_arch_sts;
 +      bool is_idle = true, is_eng_idle, is_slave;
 +      u64 offset;
 +      int i, dma_id, port;
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nDMA  is_idle  QM_GLBL_STS0  QM_CGM_STS  DMA_CORE_STS0\n"
 +                      "---  -------  ------------  ----------  -------------\n");
 +
 +      for (i = 0 ; i < DMA_NUMBER_OF_CHNLS ; i++) {
 +              dma_id = gaudi_dma_assignment[i];
 +              offset = dma_id * DMA_QMAN_OFFSET;
 +
 +              qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + offset);
 +              qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + offset);
 +              dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + offset);
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                              IS_DMA_IDLE(dma_core_sts0);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GAUDI_ENGINE_ID_DMA_0 + dma_id, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, fmt, dma_id,
 +                              is_eng_idle ? "Y" : "N", qm_glbl_sts0,
 +                              qm_cgm_sts, dma_core_sts0);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nTPC  is_idle  QM_GLBL_STS0  QM_CGM_STS  CFG_STATUS\n"
 +                      "---  -------  ------------  ----------  ----------\n");
 +
 +      for (i = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
 +              offset = i * TPC_QMAN_OFFSET;
 +              qm_glbl_sts0 = RREG32(mmTPC0_QM_GLBL_STS0 + offset);
 +              qm_cgm_sts = RREG32(mmTPC0_QM_CGM_STS + offset);
 +              tpc_cfg_sts = RREG32(mmTPC0_CFG_STATUS + offset);
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
 +                              IS_TPC_IDLE(tpc_cfg_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GAUDI_ENGINE_ID_TPC_0 + i, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, fmt, i,
 +                              is_eng_idle ? "Y" : "N",
 +                              qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nMME  is_idle  QM_GLBL_STS0  QM_CGM_STS  ARCH_STATUS\n"
 +                      "---  -------  ------------  ----------  -----------\n");
 +
 +      for (i = 0 ; i < MME_NUMBER_OF_ENGINES ; i++) {
 +              offset = i * MME_QMAN_OFFSET;
 +              mme_arch_sts = RREG32(mmMME0_CTRL_ARCH_STATUS + offset);
 +              is_eng_idle = IS_MME_IDLE(mme_arch_sts);
 +
 +              /* MME 1 & 3 are slaves, no need to check their QMANs */
 +              is_slave = i % 2;
 +              if (!is_slave) {
 +                      qm_glbl_sts0 = RREG32(mmMME0_QM_GLBL_STS0 + offset);
 +                      qm_cgm_sts = RREG32(mmMME0_QM_CGM_STS + offset);
 +                      is_eng_idle &= IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
 +              }
 +
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GAUDI_ENGINE_ID_MME_0 + i, mask);
 +              if (e) {
 +                      if (!is_slave)
 +                              hl_engine_data_sprintf(e, fmt, i,
 +                                      is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, qm_cgm_sts, mme_arch_sts);
 +                      else
 +                              hl_engine_data_sprintf(e, mme_slave_fmt, i,
 +                                      is_eng_idle ? "Y" : "N", "-",
 +                                      "-", mme_arch_sts);
 +              }
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                              "\nNIC  is_idle  QM_GLBL_STS0  QM_CGM_STS\n"
 +                              "---  -------  ------------  ----------\n");
 +
 +      for (i = 0 ; i < (NIC_NUMBER_OF_ENGINES / 2) ; i++) {
 +              offset = i * NIC_MACRO_QMAN_OFFSET;
 +              port = 2 * i;
 +              if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + port)) {
 +                      qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
 +                      qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
 +                      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
 +                      if (e)
 +                              hl_engine_data_sprintf(e, nic_fmt, port,
 +                                              is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, qm_cgm_sts);
 +              }
 +
 +              port = 2 * i + 1;
 +              if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + port)) {
 +                      qm_glbl_sts0 = RREG32(mmNIC0_QM1_GLBL_STS0 + offset);
 +                      qm_cgm_sts = RREG32(mmNIC0_QM1_CGM_STS + offset);
 +                      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
 +                      if (e)
 +                              hl_engine_data_sprintf(e, nic_fmt, port,
 +                                              is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, qm_cgm_sts);
 +              }
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e, "\n");
 +
 +      return is_idle;
 +}
 +
 +static void gaudi_hw_queues_lock(struct hl_device *hdev)
 +      __acquires(&gaudi->hw_queues_lock)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      spin_lock(&gaudi->hw_queues_lock);
 +}
 +
 +static void gaudi_hw_queues_unlock(struct hl_device *hdev)
 +      __releases(&gaudi->hw_queues_lock)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      spin_unlock(&gaudi->hw_queues_lock);
 +}
 +
 +static u32 gaudi_get_pci_id(struct hl_device *hdev)
 +{
 +      return hdev->pdev->device;
 +}
 +
 +static int gaudi_get_eeprom_data(struct hl_device *hdev, void *data,
 +                              size_t max_size)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_eeprom_data(hdev, data, max_size);
 +}
 +
 +static int gaudi_get_monitor_dump(struct hl_device *hdev, void *data)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_monitor_dump(hdev, data);
 +}
 +
 +/*
 + * this function should be used only during initialization and/or after reset,
 + * when there are no active users.
 + */
 +static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel, u32 tpc_id)
 +{
 +      u64 kernel_timeout;
 +      u32 status, offset;
 +      int rc;
 +
 +      offset = tpc_id * (mmTPC1_CFG_STATUS - mmTPC0_CFG_STATUS);
 +
 +      if (hdev->pldm)
 +              kernel_timeout = GAUDI_PLDM_TPC_KERNEL_WAIT_USEC;
 +      else
 +              kernel_timeout = HL_DEVICE_TIMEOUT_USEC;
 +
 +      WREG32(mmTPC0_CFG_QM_KERNEL_BASE_ADDRESS_LOW + offset,
 +                      lower_32_bits(tpc_kernel));
 +      WREG32(mmTPC0_CFG_QM_KERNEL_BASE_ADDRESS_HIGH + offset,
 +                      upper_32_bits(tpc_kernel));
 +
 +      WREG32(mmTPC0_CFG_ICACHE_BASE_ADDERESS_LOW + offset,
 +                      lower_32_bits(tpc_kernel));
 +      WREG32(mmTPC0_CFG_ICACHE_BASE_ADDERESS_HIGH + offset,
 +                      upper_32_bits(tpc_kernel));
 +      /* set a valid LUT pointer, content is of no significance */
 +      WREG32(mmTPC0_CFG_LUT_FUNC256_BASE_ADDR_LO + offset,
 +                      lower_32_bits(tpc_kernel));
 +      WREG32(mmTPC0_CFG_LUT_FUNC256_BASE_ADDR_HI + offset,
 +                      upper_32_bits(tpc_kernel));
 +
 +      WREG32(mmTPC0_CFG_QM_SYNC_OBJECT_ADDR + offset,
 +                      lower_32_bits(CFG_BASE +
 +                              mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0));
 +
 +      WREG32(mmTPC0_CFG_TPC_CMD + offset,
 +                      (1 << TPC0_CFG_TPC_CMD_ICACHE_INVALIDATE_SHIFT |
 +                      1 << TPC0_CFG_TPC_CMD_ICACHE_PREFETCH_64KB_SHIFT));
 +      /* wait a bit for the engine to start executing */
 +      usleep_range(1000, 1500);
 +
 +      /* wait until engine has finished executing */
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_STATUS + offset,
 +              status,
 +              (status & TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK) ==
 +                              TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK,
 +              1000,
 +              kernel_timeout);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d icache prefetch\n",
 +                      tpc_id);
 +              return -EIO;
 +      }
 +
 +      WREG32(mmTPC0_CFG_TPC_EXECUTE + offset,
 +                      1 << TPC0_CFG_TPC_EXECUTE_V_SHIFT);
 +
 +      /* wait a bit for the engine to start executing */
 +      usleep_range(1000, 1500);
 +
 +      /* wait until engine has finished executing */
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_STATUS + offset,
 +              status,
 +              (status & TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK) ==
 +                              TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK,
 +              1000,
 +              kernel_timeout);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d vector pipe\n",
 +                      tpc_id);
 +              return -EIO;
 +      }
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_WQ_INFLIGHT_CNTR + offset,
 +              status,
 +              (status == 0),
 +              1000,
 +              kernel_timeout);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d kernel to execute\n",
 +                      tpc_id);
 +              return -EIO;
 +      }
 +
 +      return 0;
 +}
 +
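 +/*
 + * Allocate the internal (collective) command-buffer pool: a coherent DMA
 + * buffer managed by a gen_pool, reserved in the host VA range and mapped
 + * through the device MMU for the given context.
 + */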
 +static int gaudi_internal_cb_pool_init(struct hl_device *hdev,
 +              struct hl_ctx *ctx)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +      int min_alloc_order, rc, collective_cb_size;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      hdev->internal_cb_pool_virt_addr = hl_asic_dma_alloc_coherent(hdev,
 +                                                      HOST_SPACE_INTERNAL_CB_SZ,
 +                                                      &hdev->internal_cb_pool_dma_addr,
 +                                                      GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!hdev->internal_cb_pool_virt_addr)
 +              return -ENOMEM;
 +
 +      collective_cb_size = sizeof(struct packet_msg_short) * 5 +
 +                      sizeof(struct packet_fence);
 +      min_alloc_order = ilog2(collective_cb_size);
 +
 +      hdev->internal_cb_pool = gen_pool_create(min_alloc_order, -1);
 +      if (!hdev->internal_cb_pool) {
 +              dev_err(hdev->dev,
 +                      "Failed to create internal CB pool\n");
 +              rc = -ENOMEM;
 +              goto free_internal_cb_pool;
 +      }
 +
 +      rc = gen_pool_add(hdev->internal_cb_pool,
 +                              (uintptr_t) hdev->internal_cb_pool_virt_addr,
 +                              HOST_SPACE_INTERNAL_CB_SZ, -1);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to internal CB pool\n");
 +              rc = -EFAULT;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      hdev->internal_cb_va_base = hl_reserve_va_block(hdev, ctx,
 +                      HL_VA_RANGE_TYPE_HOST, HOST_SPACE_INTERNAL_CB_SZ,
 +                      HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
 +
 +      if (!hdev->internal_cb_va_base) {
 +              rc = -ENOMEM;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base,
 +                      hdev->internal_cb_pool_dma_addr,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +
 +      hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      if (rc)
 +              goto unreserve_internal_cb_pool;
 +
 +      return 0;
 +
 +unreserve_internal_cb_pool:
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +destroy_internal_cb_pool:
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +free_internal_cb_pool:
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +
 +      return rc;
 +}
 +
 +static void gaudi_internal_cb_pool_fini(struct hl_device *hdev,
 +              struct hl_ctx *ctx)
 +{
 +      struct gaudi_device *gaudi = hdev->asic_specific;
 +
 +      if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base,
 +                      HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +}
 +
 +static int gaudi_ctx_init(struct hl_ctx *ctx)
 +{
 +      int rc;
 +
 +      if (ctx->asid == HL_KERNEL_ASID_ID)
 +              return 0;
 +
 +      rc = gaudi_internal_cb_pool_init(ctx->hdev, ctx);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi_restore_user_registers(ctx->hdev);
 +      if (rc)
 +              gaudi_internal_cb_pool_fini(ctx->hdev, ctx);
 +
 +      return rc;
 +}
 +
 +static void gaudi_ctx_fini(struct hl_ctx *ctx)
 +{
 +      if (ctx->asid == HL_KERNEL_ASID_ID)
 +              return;
 +
 +      gaudi_internal_cb_pool_fini(ctx->hdev, ctx);
 +}
 +
 +static int gaudi_pre_schedule_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +static u32 gaudi_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 +{
 +      return gaudi_cq_assignment[cq_idx];
 +}
 +
 +static u32 gaudi_get_signal_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short) +
 +                      sizeof(struct packet_msg_prot) * 2;
 +}
 +
 +static u32 gaudi_get_wait_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short) * 4 +
 +                      sizeof(struct packet_fence) +
 +                      sizeof(struct packet_msg_prot) * 2;
 +}
 +
 +static u32 gaudi_get_sob_addr(struct hl_device *hdev, u32 sob_id)
 +{
 +      return mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 + (sob_id * 4);
 +}
 +
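 +/*
 + * Append a MSG_SHORT packet to the CB that adds 1 to the given sync object
 + * (W_S SOB base), with the requested engine-barrier setting. Returns the
 + * updated CB size.
 + */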
 +static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
 +                              u32 size, bool eb)
 +{
 +      struct hl_cb *cb = (struct hl_cb *) data;
 +      struct packet_msg_short *pkt;
 +      u32 value, ctl, pkt_size = sizeof(*pkt);
 +
 +      pkt = cb->kernel_address + size;
 +      memset(pkt, 0, pkt_size);
 +
 +      /* Inc by 1, Mode ADD */
 +      value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_MOD_MASK, 1);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 3); /* W_S SOB base */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, eb);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return size + pkt_size;
 +}
 +
 +static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
 +                                      u16 addr)
 +{
 +      u32 ctl, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, addr);
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2);  /* W_S MON base */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 0); /* last pkt MB */
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
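 +/*
 + * Build the MSG_SHORT packet that arms a monitor: bind it to the sync-object
 + * group, mask and target value (GREATER OR EQUAL mode) by writing to the
 + * monitor's ARM register. Returns the packet size, or 0 on an invalid SOB
 + * mask.
 + */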
 +static u32 gaudi_add_arm_monitor_pkt(struct hl_device *hdev,
 +              struct packet_msg_short *pkt, u16 sob_base, u8 sob_mask,
 +              u16 sob_val, u16 mon_id)
 +{
 +      u64 monitor_base;
 +      u32 ctl, value, pkt_size = sizeof(*pkt);
 +      u16 msg_addr_offset;
 +      u8 mask;
 +
 +      if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
 +              dev_err(hdev->dev,
 +                      "sob_base %u (mask %#x) is not valid\n",
 +                      sob_base, sob_mask);
 +              return 0;
 +      }
 +
 +      /*
 +       * monitor_base should be the content of the base0 address registers,
 +       * so it will be added to the msg short offsets
 +       */
 +      monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 +
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      /* Monitor config packet: bind the monitor to a sync object */
 +      value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MODE_MASK,
 +                      0); /* GREATER OR EQUAL */
 +      value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MASK_MASK, mask);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, msg_addr_offset);
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
 +      ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
 +{
 +      u32 ctl, cfg, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      cfg = FIELD_PREP(GAUDI_PKT_FENCE_CFG_DEC_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_ID_MASK, 2);
 +
 +      ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->cfg = cpu_to_le32(cfg);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
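 +/*
 + * Translate a queue ID to the CFG-space address of that queue's
 + * CP_FENCE2_RDATA register, which is used as the monitor payload address in
 + * the wait CB. Returns -EINVAL for unsupported queues.
 + */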
 +static int gaudi_get_fence_addr(struct hl_device *hdev, u32 queue_id, u64 *addr)
 +{
 +      u32 offset, nic_index;
 +
 +      switch (queue_id) {
 +      case GAUDI_QUEUE_ID_DMA_0_0:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_0_1:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_0_2:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_0_3:
 +              offset = mmDMA0_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_0:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_1:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_2:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_1_3:
 +              offset = mmDMA1_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_0:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_1:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_2:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_DMA_5_3:
 +              offset = mmDMA5_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_0:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_0;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_1:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_1;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_2:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_2;
 +              break;
 +      case GAUDI_QUEUE_ID_TPC_7_3:
 +              offset = mmTPC7_QM_CP_FENCE2_RDATA_3;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_0:
 +      case GAUDI_QUEUE_ID_NIC_1_0:
 +      case GAUDI_QUEUE_ID_NIC_2_0:
 +      case GAUDI_QUEUE_ID_NIC_3_0:
 +      case GAUDI_QUEUE_ID_NIC_4_0:
 +      case GAUDI_QUEUE_ID_NIC_5_0:
 +      case GAUDI_QUEUE_ID_NIC_6_0:
 +      case GAUDI_QUEUE_ID_NIC_7_0:
 +      case GAUDI_QUEUE_ID_NIC_8_0:
 +      case GAUDI_QUEUE_ID_NIC_9_0:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_0) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_0 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_1:
 +      case GAUDI_QUEUE_ID_NIC_1_1:
 +      case GAUDI_QUEUE_ID_NIC_2_1:
 +      case GAUDI_QUEUE_ID_NIC_3_1:
 +      case GAUDI_QUEUE_ID_NIC_4_1:
 +      case GAUDI_QUEUE_ID_NIC_5_1:
 +      case GAUDI_QUEUE_ID_NIC_6_1:
 +      case GAUDI_QUEUE_ID_NIC_7_1:
 +      case GAUDI_QUEUE_ID_NIC_8_1:
 +      case GAUDI_QUEUE_ID_NIC_9_1:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_1) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_1 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_2:
 +      case GAUDI_QUEUE_ID_NIC_1_2:
 +      case GAUDI_QUEUE_ID_NIC_2_2:
 +      case GAUDI_QUEUE_ID_NIC_3_2:
 +      case GAUDI_QUEUE_ID_NIC_4_2:
 +      case GAUDI_QUEUE_ID_NIC_5_2:
 +      case GAUDI_QUEUE_ID_NIC_6_2:
 +      case GAUDI_QUEUE_ID_NIC_7_2:
 +      case GAUDI_QUEUE_ID_NIC_8_2:
 +      case GAUDI_QUEUE_ID_NIC_9_2:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_2) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_2 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      case GAUDI_QUEUE_ID_NIC_0_3:
 +      case GAUDI_QUEUE_ID_NIC_1_3:
 +      case GAUDI_QUEUE_ID_NIC_2_3:
 +      case GAUDI_QUEUE_ID_NIC_3_3:
 +      case GAUDI_QUEUE_ID_NIC_4_3:
 +      case GAUDI_QUEUE_ID_NIC_5_3:
 +      case GAUDI_QUEUE_ID_NIC_6_3:
 +      case GAUDI_QUEUE_ID_NIC_7_3:
 +      case GAUDI_QUEUE_ID_NIC_8_3:
 +      case GAUDI_QUEUE_ID_NIC_9_3:
 +              nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_3) >> 2;
 +              offset = mmNIC0_QM0_CP_FENCE2_RDATA_3 +
 +                              (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
 +                              (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
 +              break;
 +      default:
 +              return -EINVAL;
 +      }
 +
 +      *addr = CFG_BASE + offset;
 +
 +      return 0;
 +}
 +
 +static u32 gaudi_add_mon_pkts(void *buf, u16 mon_id, u64 fence_addr)
 +{
 +      u64 monitor_base;
 +      u32 size = 0;
 +      u16 msg_addr_offset;
 +
 +      /*
 +       * monitor_base should be the content of the base0 address registers,
 +       * so it will be added to the msg short offsets
 +       */
 +      monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 +
 +      /* First monitor config packet: low address of the sync */
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
 +                                      msg_addr_offset);
 +
 +      /* Second monitor config packet: high address of the sync */
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
 +                                      msg_addr_offset);
 +
 +      /*
 +       * Third monitor config packet: the payload, i.e. what to write when the
 +       * sync triggers
 +       */
 +      msg_addr_offset =
 +              (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
 +
 +      return size;
 +}
 +
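 +/*
 + * Assemble a wait CB: three monitor config packets (payload address low/high
 + * and payload data), an arm-monitor packet and a fence packet. Returns the
 + * resulting CB size, or 0 if the queue ID has no fence address.
 + */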
 +static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
 +                              struct hl_gen_wait_properties *prop)
 +{
 +      struct hl_cb *cb = (struct hl_cb *) prop->data;
 +      void *buf = cb->kernel_address;
 +      u64 fence_addr = 0;
 +      u32 size = prop->size;
 +
 +      if (gaudi_get_fence_addr(hdev, prop->q_idx, &fence_addr)) {
 +              dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
 +                              prop->q_idx);
 +              return 0;
 +      }
 +
 +      size += gaudi_add_mon_pkts(buf + size, prop->mon_id, fence_addr);
 +      size += gaudi_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base,
 +                      prop->sob_mask, prop->sob_val, prop->mon_id);
 +      size += gaudi_add_fence_pkt(buf + size);
 +
 +      return size;
 +}
 +
 +static void gaudi_reset_sob(struct hl_device *hdev, void *data)
 +{
 +      struct hl_hw_sob *hw_sob = (struct hl_hw_sob *) data;
 +
 +      dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx,
 +              hw_sob->sob_id);
 +
 +      WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      hw_sob->sob_id * 4, 0);
 +
 +      kref_init(&hw_sob->kref);
 +}
 +
 +static u64 gaudi_get_device_time(struct hl_device *hdev)
 +{
 +      u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
 +
 +      return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
 +}
 +
 +static int gaudi_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
 +                              u32 *block_size, u32 *block_id)
 +{
 +      return -EPERM;
 +}
 +
 +static int gaudi_block_mmap(struct hl_device *hdev,
 +                              struct vm_area_struct *vma,
 +                              u32 block_id, u32 block_size)
 +{
 +      return -EPERM;
 +}
 +
 +static void gaudi_enable_events_from_fw(struct hl_device *hdev)
 +{
 +      struct cpu_dyn_regs *dyn_regs =
 +                      &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
 +                      mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
 +                      le32_to_cpu(dyn_regs->gic_host_ints_irq);
 +
 +      WREG32(irq_handler_offset,
 +              gaudi_irq_map_table[GAUDI_EVENT_INTS_REGISTER].cpu_id);
 +}
 +
 +static int gaudi_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
 +{
 +      return -EINVAL;
 +}
 +
 +static int gaudi_map_pll_idx_to_fw_idx(u32 pll_idx)
 +{
 +      switch (pll_idx) {
 +      case HL_GAUDI_CPU_PLL: return CPU_PLL;
 +      case HL_GAUDI_PCI_PLL: return PCI_PLL;
 +      case HL_GAUDI_NIC_PLL: return NIC_PLL;
 +      case HL_GAUDI_DMA_PLL: return DMA_PLL;
 +      case HL_GAUDI_MESH_PLL: return MESH_PLL;
 +      case HL_GAUDI_MME_PLL: return MME_PLL;
 +      case HL_GAUDI_TPC_PLL: return TPC_PLL;
 +      case HL_GAUDI_IF_PLL: return IF_PLL;
 +      case HL_GAUDI_SRAM_PLL: return SRAM_PLL;
 +      case HL_GAUDI_HBM_PLL: return HBM_PLL;
 +      default: return -EINVAL;
 +      }
 +}
 +
 +static int gaudi_add_sync_to_engine_map_entry(
 +      struct hl_sync_to_engine_map *map, u32 reg_value,
 +      enum hl_sync_engine_type engine_type, u32 engine_id)
 +{
 +      struct hl_sync_to_engine_map_entry *entry;
 +
 +      /* The register value represents a partial address of the sync object
 +       * and is used as a unique identifier. For that we need to clear the
 +       * cut-off CFG base bits from the value.
 +       */
 +      if (reg_value == 0 || reg_value == 0xffffffff)
 +              return 0;
 +      reg_value -= lower_32_bits(CFG_BASE);
 +
 +      /* create a new hash entry */
 +      entry = kzalloc(sizeof(*entry), GFP_KERNEL);
 +      if (!entry)
 +              return -ENOMEM;
 +      entry->engine_type = engine_type;
 +      entry->engine_id = engine_id;
 +      entry->sync_id = reg_value;
 +      hash_add(map->tb, &entry->node, reg_value);
 +
 +      return 0;
 +}
 +
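 +/*
 + * For the state dump: read the configured sync-object register of every TPC,
 + * MME and DMA engine and record a sync-ID to engine mapping in the hash
 + * table.
 + */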
 +static int gaudi_gen_sync_to_engine_map(struct hl_device *hdev,
 +                              struct hl_sync_to_engine_map *map)
 +{
 +      struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
 +      int i, j, rc;
 +      u32 reg_value;
 +
 +      /* Iterate over TPC engines */
 +      for (i = 0; i < sds->props[SP_NUM_OF_TPC_ENGINES]; ++i) {
 +
 +              reg_value = RREG32(sds->props[SP_TPC0_CFG_SO] +
 +                                      sds->props[SP_NEXT_TPC] * i);
 +
 +              rc = gaudi_add_sync_to_engine_map_entry(map, reg_value,
 +                                                      ENGINE_TPC, i);
 +              if (rc)
 +                      goto free_sync_to_engine_map;
 +      }
 +
 +      /* Iterate over MME engines */
 +      for (i = 0; i < sds->props[SP_NUM_OF_MME_ENGINES]; ++i) {
 +              for (j = 0; j < sds->props[SP_SUB_MME_ENG_NUM]; ++j) {
 +
 +                      reg_value = RREG32(sds->props[SP_MME_CFG_SO] +
 +                                              sds->props[SP_NEXT_MME] * i +
 +                                              j * sizeof(u32));
 +
 +                      rc = gaudi_add_sync_to_engine_map_entry(
 +                              map, reg_value, ENGINE_MME,
 +                              i * sds->props[SP_SUB_MME_ENG_NUM] + j);
 +                      if (rc)
 +                              goto free_sync_to_engine_map;
 +              }
 +      }
 +
 +      /* Iterate over DMA engines */
 +      for (i = 0; i < sds->props[SP_NUM_OF_DMA_ENGINES]; ++i) {
 +              reg_value = RREG32(sds->props[SP_DMA_CFG_SO] +
 +                                      sds->props[SP_DMA_QUEUES_OFFSET] * i);
 +              rc = gaudi_add_sync_to_engine_map_entry(map, reg_value,
 +                                                      ENGINE_DMA, i);
 +              if (rc)
 +                      goto free_sync_to_engine_map;
 +      }
 +
 +      return 0;
 +
 +free_sync_to_engine_map:
 +      hl_state_dump_free_sync_to_engine_map(map);
 +
 +      return rc;
 +}
 +
 +static int gaudi_monitor_valid(struct hl_mon_state_dump *mon)
 +{
 +      return FIELD_GET(
 +              SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0_VALID_MASK,
 +              mon->status);
 +}
 +
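 +/*
 + * Decode the monitor's ARM data (sync group ID and mask) into a
 + * comma-separated list of the sync-object IDs it monitors.
 + */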
 +static void gaudi_fill_sobs_from_mon(char *sobs, struct hl_mon_state_dump *mon)
 +{
 +      const size_t max_write = 10;
 +      u32 gid, mask, sob;
 +      int i, offset;
 +
 +      /* Sync object ID is calculated as follows:
 +       * (8 * group_id + cleared bits in mask)
 +       */
 +      gid = FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SID_MASK,
 +                      mon->arm_data);
 +      mask = FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_MASK_MASK,
 +                      mon->arm_data);
 +
 +      for (i = 0, offset = 0; mask && offset < MONITOR_SOB_STRING_SIZE -
 +              max_write; mask >>= 1, i++) {
 +              if (!(mask & 1)) {
 +                      sob = gid * MONITOR_MAX_SOBS + i;
 +
 +                      if (offset > 0)
 +                              offset += snprintf(sobs + offset, max_write,
 +                                                      ", ");
 +
 +                      offset += snprintf(sobs + offset, max_write, "%u", sob);
 +              }
 +      }
 +}
 +
 +static int gaudi_print_single_monitor(char **buf, size_t *size, size_t *offset,
 +                              struct hl_device *hdev,
 +                              struct hl_mon_state_dump *mon)
 +{
 +      const char *name;
 +      char scratch_buf1[BIN_REG_STRING_SIZE],
 +              scratch_buf2[BIN_REG_STRING_SIZE];
 +      char monitored_sobs[MONITOR_SOB_STRING_SIZE] = {0};
 +
 +      name = hl_state_dump_get_monitor_name(hdev, mon);
 +      if (!name)
 +              name = "";
 +
 +      gaudi_fill_sobs_from_mon(monitored_sobs, mon);
 +
 +      return hl_snprintf_resize(
 +              buf, size, offset,
 +              "Mon id: %u%s, wait for group id: %u mask %s to reach val: %u and write %u to address 0x%llx. Pending: %s. Means sync objects [%s] are being monitored.",
 +              mon->id, name,
 +              FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SID_MASK,
 +                              mon->arm_data),
 +              hl_format_as_binary(
 +                      scratch_buf1, sizeof(scratch_buf1),
 +                      FIELD_GET(
 +                              SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_MASK_MASK,
 +                              mon->arm_data)),
 +              FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SOD_MASK,
 +                              mon->arm_data),
 +              mon->wr_data,
 +              (((u64)mon->wr_addr_high) << 32) | mon->wr_addr_low,
 +              hl_format_as_binary(
 +                      scratch_buf2, sizeof(scratch_buf2),
 +                      FIELD_GET(
 +                              SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0_PENDING_MASK,
 +                              mon->status)),
 +              monitored_sobs);
 +}
 +
 +static int gaudi_print_fences_single_engine(
 +      struct hl_device *hdev, u64 base_offset, u64 status_base_offset,
 +      enum hl_sync_engine_type engine_type, u32 engine_id, char **buf,
 +      size_t *size, size_t *offset)
 +{
 +      struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
 +      int rc = -ENOMEM, i;
 +      u32 *statuses, *fences;
 +
 +      statuses = kcalloc(sds->props[SP_ENGINE_NUM_OF_QUEUES],
 +                      sizeof(*statuses), GFP_KERNEL);
 +      if (!statuses)
 +              goto out;
 +
 +      fences = kcalloc(sds->props[SP_ENGINE_NUM_OF_FENCES] *
 +                              sds->props[SP_ENGINE_NUM_OF_QUEUES],
 +                       sizeof(*fences), GFP_KERNEL);
 +      if (!fences)
 +              goto free_status;
 +
 +      for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_FENCES]; ++i)
 +              statuses[i] = RREG32(status_base_offset + i * sizeof(u32));
 +
 +      for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_FENCES] *
 +                              sds->props[SP_ENGINE_NUM_OF_QUEUES]; ++i)
 +              fences[i] = RREG32(base_offset + i * sizeof(u32));
 +
 +      /* The actual print */
 +      for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_QUEUES]; ++i) {
 +              u32 fence_id;
 +              u64 fence_cnt, fence_rdata;
 +              const char *engine_name;
 +
 +              if (!FIELD_GET(TPC0_QM_CP_STS_0_FENCE_IN_PROGRESS_MASK,
 +                      statuses[i]))
 +                      continue;
 +
 +              fence_id =
 +                      FIELD_GET(TPC0_QM_CP_STS_0_FENCE_ID_MASK, statuses[i]);
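 +              /* Compute the CFG-space addresses of the fence CNT and RDATA
 +               * registers for this stream: counters are laid out as
 +               * SP_ENGINE_NUM_OF_QUEUES consecutive u32s per fence id, and
 +               * the RDATA address is the CNT address with the FENCE0_CNT
 +               * offset replaced by the FENCE0_RDATA offset.
 +               */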
 +              fence_cnt = base_offset + CFG_BASE +
 +                      sizeof(u32) *
 +                      (i + fence_id * sds->props[SP_ENGINE_NUM_OF_QUEUES]);
 +              fence_rdata = fence_cnt - sds->props[SP_FENCE0_CNT_OFFSET] +
 +                              sds->props[SP_FENCE0_RDATA_OFFSET];
 +              engine_name = hl_sync_engine_to_string(engine_type);
 +
 +              rc = hl_snprintf_resize(
 +                      buf, size, offset,
 +                      "%s%u, stream %u: fence id %u cnt = 0x%llx (%s%u_QM.CP_FENCE%u_CNT_%u) rdata = 0x%llx (%s%u_QM.CP_FENCE%u_RDATA_%u) value = %u, cp_status = %u\n",
 +                      engine_name, engine_id,
 +                      i, fence_id,
 +                      fence_cnt, engine_name, engine_id, fence_id, i,
 +                      fence_rdata, engine_name, engine_id, fence_id, i,
 +                      fences[fence_id],
 +                      statuses[i]);
 +              if (rc)
 +                      goto free_fences;
 +      }
 +
 +      rc = 0;
 +
 +free_fences:
 +      kfree(fences);
 +free_status:
 +      kfree(statuses);
 +out:
 +      return rc;
 +}
 +
 +static struct hl_state_dump_specs_funcs gaudi_state_dump_funcs = {
 +      .monitor_valid = gaudi_monitor_valid,
 +      .print_single_monitor = gaudi_print_single_monitor,
 +      .gen_sync_to_engine_map = gaudi_gen_sync_to_engine_map,
 +      .print_fences_single_engine = gaudi_print_fences_single_engine,
 +};
 +
 +static void gaudi_state_dump_init(struct hl_device *hdev)
 +{
 +      struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
 +      int i;
 +
 +      for (i = 0; i < ARRAY_SIZE(gaudi_so_id_to_str); ++i)
 +              hash_add(sds->so_id_to_str_tb,
 +                      &gaudi_so_id_to_str[i].node,
 +                      gaudi_so_id_to_str[i].id);
 +
 +      for (i = 0; i < ARRAY_SIZE(gaudi_monitor_id_to_str); ++i)
 +              hash_add(sds->monitor_id_to_str_tb,
 +                      &gaudi_monitor_id_to_str[i].node,
 +                      gaudi_monitor_id_to_str[i].id);
 +
 +      sds->props = gaudi_state_dump_specs_props;
 +
 +      sds->sync_namager_names = gaudi_sync_manager_names;
 +
 +      sds->funcs = gaudi_state_dump_funcs;
 +}
 +
 +static u32 *gaudi_get_stream_master_qid_arr(void)
 +{
 +      return gaudi_stream_master;
 +}
 +
 +static int gaudi_set_dram_properties(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static int gaudi_set_binning_masks(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static void gaudi_check_if_razwi_happened(struct hl_device *hdev)
 +{
 +}
 +
 +static ssize_t infineon_ver_show(struct device *dev, struct device_attribute *attr, char *buf)
 +{
 +      struct hl_device *hdev = dev_get_drvdata(dev);
 +      struct cpucp_info *cpucp_info;
 +
 +      cpucp_info = &hdev->asic_prop.cpucp_info;
 +
 +      return sprintf(buf, "%#04x\n", le32_to_cpu(cpucp_info->infineon_version));
 +}
 +
 +static DEVICE_ATTR_RO(infineon_ver);
 +
 +static struct attribute *gaudi_vrm_dev_attrs[] = {
 +      &dev_attr_infineon_ver.attr,
 +      NULL,
 +};
 +
 +static void gaudi_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
 +                                      struct attribute_group *dev_vrm_attr_grp)
 +{
 +      hl_sysfs_add_dev_clk_attr(hdev, dev_clk_attr_grp);
 +      dev_vrm_attr_grp->attrs = gaudi_vrm_dev_attrs;
 +}
 +
 +static int gaudi_send_device_activity(struct hl_device *hdev, bool open)
 +{
 +      return 0;
 +}
 +
 +static const struct hl_asic_funcs gaudi_funcs = {
 +      .early_init = gaudi_early_init,
 +      .early_fini = gaudi_early_fini,
 +      .late_init = gaudi_late_init,
 +      .late_fini = gaudi_late_fini,
 +      .sw_init = gaudi_sw_init,
 +      .sw_fini = gaudi_sw_fini,
 +      .hw_init = gaudi_hw_init,
 +      .hw_fini = gaudi_hw_fini,
 +      .halt_engines = gaudi_halt_engines,
 +      .suspend = gaudi_suspend,
 +      .resume = gaudi_resume,
 +      .mmap = gaudi_mmap,
 +      .ring_doorbell = gaudi_ring_doorbell,
 +      .pqe_write = gaudi_pqe_write,
 +      .asic_dma_alloc_coherent = gaudi_dma_alloc_coherent,
 +      .asic_dma_free_coherent = gaudi_dma_free_coherent,
 +      .scrub_device_mem = gaudi_scrub_device_mem,
 +      .scrub_device_dram = gaudi_scrub_device_dram,
 +      .get_int_queue_base = gaudi_get_int_queue_base,
 +      .test_queues = gaudi_test_queues,
 +      .asic_dma_pool_zalloc = gaudi_dma_pool_zalloc,
 +      .asic_dma_pool_free = gaudi_dma_pool_free,
 +      .cpu_accessible_dma_pool_alloc = gaudi_cpu_accessible_dma_pool_alloc,
 +      .cpu_accessible_dma_pool_free = gaudi_cpu_accessible_dma_pool_free,
 +      .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
 +      .cs_parser = gaudi_cs_parser,
 +      .asic_dma_map_sgtable = hl_dma_map_sgtable,
 +      .add_end_of_cb_packets = gaudi_add_end_of_cb_packets,
 +      .update_eq_ci = gaudi_update_eq_ci,
 +      .context_switch = gaudi_context_switch,
 +      .restore_phase_topology = gaudi_restore_phase_topology,
 +      .debugfs_read_dma = gaudi_debugfs_read_dma,
 +      .add_device_attr = gaudi_add_device_attr,
 +      .handle_eqe = gaudi_handle_eqe,
 +      .get_events_stat = gaudi_get_events_stat,
 +      .read_pte = gaudi_read_pte,
 +      .write_pte = gaudi_write_pte,
 +      .mmu_invalidate_cache = gaudi_mmu_invalidate_cache,
 +      .mmu_invalidate_cache_range = gaudi_mmu_invalidate_cache_range,
 +      .mmu_prefetch_cache_range = NULL,
 +      .send_heartbeat = gaudi_send_heartbeat,
 +      .debug_coresight = gaudi_debug_coresight,
 +      .is_device_idle = gaudi_is_device_idle,
 +      .compute_reset_late_init = gaudi_compute_reset_late_init,
 +      .hw_queues_lock = gaudi_hw_queues_lock,
 +      .hw_queues_unlock = gaudi_hw_queues_unlock,
 +      .get_pci_id = gaudi_get_pci_id,
 +      .get_eeprom_data = gaudi_get_eeprom_data,
 +      .get_monitor_dump = gaudi_get_monitor_dump,
 +      .send_cpu_message = gaudi_send_cpu_message,
 +      .pci_bars_map = gaudi_pci_bars_map,
 +      .init_iatu = gaudi_init_iatu,
 +      .rreg = hl_rreg,
 +      .wreg = hl_wreg,
 +      .halt_coresight = gaudi_halt_coresight,
 +      .ctx_init = gaudi_ctx_init,
 +      .ctx_fini = gaudi_ctx_fini,
 +      .pre_schedule_cs = gaudi_pre_schedule_cs,
 +      .get_queue_id_for_cq = gaudi_get_queue_id_for_cq,
 +      .load_firmware_to_device = gaudi_load_firmware_to_device,
 +      .load_boot_fit_to_device = gaudi_load_boot_fit_to_device,
 +      .get_signal_cb_size = gaudi_get_signal_cb_size,
 +      .get_wait_cb_size = gaudi_get_wait_cb_size,
 +      .gen_signal_cb = gaudi_gen_signal_cb,
 +      .gen_wait_cb = gaudi_gen_wait_cb,
 +      .reset_sob = gaudi_reset_sob,
 +      .reset_sob_group = gaudi_reset_sob_group,
 +      .get_device_time = gaudi_get_device_time,
 +      .pb_print_security_errors = NULL,
 +      .collective_wait_init_cs = gaudi_collective_wait_init_cs,
 +      .collective_wait_create_jobs = gaudi_collective_wait_create_jobs,
 +      .get_dec_base_addr = NULL,
 +      .scramble_addr = hl_mmu_scramble_addr,
 +      .descramble_addr = hl_mmu_descramble_addr,
 +      .ack_protection_bits_errors = gaudi_ack_protection_bits_errors,
 +      .get_hw_block_id = gaudi_get_hw_block_id,
 +      .hw_block_mmap = gaudi_block_mmap,
 +      .enable_events_from_fw = gaudi_enable_events_from_fw,
 +      .ack_mmu_errors = gaudi_ack_mmu_page_fault_or_access_error,
 +      .map_pll_idx_to_fw_idx = gaudi_map_pll_idx_to_fw_idx,
 +      .init_firmware_preload_params = gaudi_init_firmware_preload_params,
 +      .init_firmware_loader = gaudi_init_firmware_loader,
 +      .init_cpu_scrambler_dram = gaudi_init_scrambler_hbm,
 +      .state_dump_init = gaudi_state_dump_init,
 +      .get_sob_addr = gaudi_get_sob_addr,
 +      .set_pci_memory_regions = gaudi_set_pci_memory_regions,
 +      .get_stream_master_qid_arr = gaudi_get_stream_master_qid_arr,
 +      .check_if_razwi_happened = gaudi_check_if_razwi_happened,
 +      .mmu_get_real_page_size = hl_mmu_get_real_page_size,
 +      .access_dev_mem = hl_access_dev_mem,
 +      .set_dram_bar_base = gaudi_set_hbm_bar_base,
 +      .send_device_activity = gaudi_send_device_activity,
 +      .set_dram_properties = gaudi_set_dram_properties,
 +      .set_binning_masks = gaudi_set_binning_masks,
 +};
 +
 +/**
 + * gaudi_set_asic_funcs - set GAUDI function pointers
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +void gaudi_set_asic_funcs(struct hl_device *hdev)
 +{
 +      hdev->asic_funcs = &gaudi_funcs;
 +}
index f1f2a58ee68c2aaae77b77fc956172fbef501e93,0000000000000000000000000000000000000000..6f415fa94eee9d314aefa406cf11875da4bec3b7
mode 100644,000000..100644
--- /dev/null
@@@ -1,10735 -1,0 +1,10735 @@@
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2020-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include "gaudi2P.h"
 +#include "gaudi2_masks.h"
 +#include "../include/gaudi2/gaudi2_special_blocks.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +#include "../include/hw_ip/mmu/mmu_v2_0.h"
 +#include "../include/gaudi2/gaudi2_packets.h"
 +#include "../include/gaudi2/gaudi2_reg_map.h"
 +#include "../include/gaudi2/gaudi2_async_ids_map_extended.h"
 +#include "../include/gaudi2/arc/gaudi2_arc_common_packets.h"
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/hwmon.h>
 +#include <linux/iommu.h>
 +
 +#define GAUDI2_DMA_POOL_BLK_SIZE              SZ_256          /* 256 bytes */
 +
 +#define GAUDI2_RESET_TIMEOUT_MSEC             2000            /* 2000ms */
 +#define GAUDI2_RESET_POLL_TIMEOUT_USEC                50000           /* 50ms */
 +#define GAUDI2_PLDM_HRESET_TIMEOUT_MSEC               25000           /* 25s */
 +#define GAUDI2_PLDM_SRESET_TIMEOUT_MSEC               25000           /* 25s */
 +#define GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC   3000000         /* 3s */
 +#define GAUDI2_RESET_POLL_CNT                 3
 +#define GAUDI2_RESET_WAIT_MSEC                        1               /* 1ms */
 +#define GAUDI2_CPU_RESET_WAIT_MSEC            100             /* 100ms */
 +#define GAUDI2_PLDM_RESET_WAIT_MSEC           1000            /* 1s */
 +#define GAUDI2_CB_POOL_CB_CNT                 512
 +#define GAUDI2_CB_POOL_CB_SIZE                        SZ_128K         /* 128KB */
 +#define GAUDI2_MSG_TO_CPU_TIMEOUT_USEC                4000000         /* 4s */
 +#define GAUDI2_WAIT_FOR_BL_TIMEOUT_USEC               25000000        /* 25s */
 +#define GAUDI2_TEST_QUEUE_WAIT_USEC           100000          /* 100ms */
 +#define GAUDI2_PLDM_TEST_QUEUE_WAIT_USEC      1000000         /* 1s */
 +
 +#define GAUDI2_ALLOC_CPU_MEM_RETRY_CNT                3
 +
 +/*
 + * Since the code already has built-in support for binning of up to MAX_FAULTY_TPCS TPCs,
 + * and relies on that value (for array sizes etc.), we define a separate value for the
 + * maximum number of faulty TPCs which reflects the cluster binning requirements.
 + */
 +#define MAX_CLUSTER_BINNING_FAULTY_TPCS               1
 +#define MAX_FAULTY_XBARS                      1
 +#define MAX_FAULTY_EDMAS                      1
 +#define MAX_FAULTY_DECODERS                   1
 +
 +#define GAUDI2_TPC_FULL_MASK                  0x1FFFFFF
 +#define GAUDI2_HIF_HMMU_FULL_MASK             0xFFFF
 +#define GAUDI2_DECODER_FULL_MASK              0x3FF
 +
 +#define GAUDI2_NA_EVENT_CAUSE                 0xFF
 +#define GAUDI2_NUM_OF_QM_ERR_CAUSE            18
 +#define GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE                25
 +#define GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE                3
 +#define GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE               14
 +#define GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE               3
 +#define GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE                2
 +#define GAUDI2_NUM_OF_ROT_ERR_CAUSE           22
 +#define GAUDI2_NUM_OF_TPC_INTR_CAUSE          30
 +#define GAUDI2_NUM_OF_DEC_ERR_CAUSE           25
 +#define GAUDI2_NUM_OF_MME_ERR_CAUSE           16
 +#define GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE      5
 +#define GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE               7
 +#define GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE     8
 +#define GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE               19
 +#define GAUDI2_NUM_OF_HBM_SEI_CAUSE           9
 +#define GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE                3
 +#define GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE 3
 +#define GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE    2
 +#define GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE     2
 +#define GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE     2
 +#define GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE                5
 +
 +#define GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC     (MMU_CONFIG_TIMEOUT_USEC * 10)
 +#define GAUDI2_PLDM_MMU_TIMEOUT_USEC          (MMU_CONFIG_TIMEOUT_USEC * 200)
 +#define GAUDI2_ARB_WDT_TIMEOUT                        (0x1000000)
 +
 +#define GAUDI2_VDEC_TIMEOUT_USEC              10000           /* 10ms */
 +#define GAUDI2_PLDM_VDEC_TIMEOUT_USEC         (GAUDI2_VDEC_TIMEOUT_USEC * 100)
 +
 +#define KDMA_TIMEOUT_USEC                     USEC_PER_SEC
 +
 +#define IS_DMA_IDLE(dma_core_idle_ind_mask)   \
 +      (!((dma_core_idle_ind_mask) &           \
 +      ((DCORE0_EDMA0_CORE_IDLE_IND_MASK_DESC_CNT_STS_MASK) | \
 +      (DCORE0_EDMA0_CORE_IDLE_IND_MASK_COMP_MASK))))
 +
 +#define IS_MME_IDLE(mme_arch_sts) (((mme_arch_sts) & MME_ARCH_IDLE_MASK) == MME_ARCH_IDLE_MASK)
 +
 +#define IS_TPC_IDLE(tpc_cfg_sts) (((tpc_cfg_sts) & (TPC_IDLE_MASK)) == (TPC_IDLE_MASK))
 +
 +#define IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) \
 +      ((((qm_glbl_sts0) & (QM_IDLE_MASK)) == (QM_IDLE_MASK)) && \
 +      (((qm_glbl_sts1) & (QM_ARC_IDLE_MASK)) == (QM_ARC_IDLE_MASK)) && \
 +      (((qm_cgm_sts) & (CGM_IDLE_MASK)) == (CGM_IDLE_MASK)))
 +
 +#define PCIE_DEC_EN_MASK                      0x300
 +#define DEC_WORK_STATE_IDLE                   0
 +#define DEC_WORK_STATE_PEND                   3
 +#define IS_DEC_IDLE(dec_swreg15) \
 +      (((dec_swreg15) & DCORE0_DEC0_CMD_SWREG15_SW_WORK_STATE_MASK) == DEC_WORK_STATE_IDLE || \
 +      ((dec_swreg15) & DCORE0_DEC0_CMD_SWREG15_SW_WORK_STATE_MASK) == DEC_WORK_STATE_PEND)
 +
 +/* HBM MMU address scrambling parameters */
 +#define GAUDI2_HBM_MMU_SCRM_MEM_SIZE          SZ_8M
 +#define GAUDI2_HBM_MMU_SCRM_DIV_SHIFT         26
 +#define GAUDI2_HBM_MMU_SCRM_MOD_SHIFT         0
 +#define GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK      DRAM_VA_HINT_MASK
 +#define GAUDI2_COMPENSATE_TLB_PAGE_SIZE_FACTOR        16
 +#define MMU_RANGE_INV_VA_LSB_SHIFT            12
 +#define MMU_RANGE_INV_VA_MSB_SHIFT            44
 +#define MMU_RANGE_INV_EN_SHIFT                        0
 +#define MMU_RANGE_INV_ASID_EN_SHIFT           1
 +#define MMU_RANGE_INV_ASID_SHIFT              2
 +
 +/* The last SPI_SEI cause bit, "burst_fifo_full", is expected to be triggered in the PMMU because it
 + * has a 2-entry FIFO, and hence it is not enabled for it.
 + */
 +#define GAUDI2_PMMU_SPI_SEI_ENABLE_MASK               GENMASK(GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE - 2, 0)
 +#define GAUDI2_HMMU_SPI_SEI_ENABLE_MASK               GENMASK(GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE - 1, 0)
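 +/* Hence the PMMU mask above excludes the top ("burst_fifo_full") cause bit, while the HMMU mask
 + * enables all GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE cause bits.
 + */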
 +
 +#define GAUDI2_MAX_STRING_LEN                 64
 +
 +#define GAUDI2_VDEC_MSIX_ENTRIES              (GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM - \
 +                                                      GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + 1)
 +
 +#define ENGINE_ID_DCORE_OFFSET (GAUDI2_DCORE1_ENGINE_ID_EDMA_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0)
 +
 +enum hl_pmmu_fatal_cause {
 +      LATENCY_RD_OUT_FIFO_OVERRUN,
 +      LATENCY_WR_OUT_FIFO_OVERRUN,
 +};
 +
 +enum hl_pcie_drain_ind_cause {
 +      LBW_AXI_DRAIN_IND,
 +      HBW_AXI_DRAIN_IND
 +};
 +
 +static const u32 cluster_hmmu_hif_enabled_mask[GAUDI2_HBM_NUM] = {
 +      [HBM_ID0] = 0xFFFC,
 +      [HBM_ID1] = 0xFFCF,
 +      [HBM_ID2] = 0xF7F7,
 +      [HBM_ID3] = 0x7F7F,
 +      [HBM_ID4] = 0xFCFF,
 +      [HBM_ID5] = 0xCFFF,
 +};
 +
 +static const u8 xbar_edge_to_hbm_cluster[EDMA_ID_SIZE] = {
 +      [0] = HBM_ID0,
 +      [1] = HBM_ID1,
 +      [2] = HBM_ID4,
 +      [3] = HBM_ID5,
 +};
 +
 +static const u8 edma_to_hbm_cluster[EDMA_ID_SIZE] = {
 +      [EDMA_ID_DCORE0_INSTANCE0] = HBM_ID0,
 +      [EDMA_ID_DCORE0_INSTANCE1] = HBM_ID2,
 +      [EDMA_ID_DCORE1_INSTANCE0] = HBM_ID1,
 +      [EDMA_ID_DCORE1_INSTANCE1] = HBM_ID3,
 +      [EDMA_ID_DCORE2_INSTANCE0] = HBM_ID2,
 +      [EDMA_ID_DCORE2_INSTANCE1] = HBM_ID4,
 +      [EDMA_ID_DCORE3_INSTANCE0] = HBM_ID3,
 +      [EDMA_ID_DCORE3_INSTANCE1] = HBM_ID5,
 +};
 +
 +static const int gaudi2_qman_async_event_id[] = {
 +      [GAUDI2_QUEUE_ID_PDMA_0_0] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_0_1] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_0_2] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_0_3] = GAUDI2_EVENT_PDMA0_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_0] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_1] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_2] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_PDMA_1_3] = GAUDI2_EVENT_PDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = GAUDI2_EVENT_HDMA0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = GAUDI2_EVENT_HDMA1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = GAUDI2_EVENT_MME0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = GAUDI2_EVENT_TPC0_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = GAUDI2_EVENT_TPC1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = GAUDI2_EVENT_TPC2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = GAUDI2_EVENT_TPC3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = GAUDI2_EVENT_TPC4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = GAUDI2_EVENT_TPC5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = GAUDI2_EVENT_TPC24_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = GAUDI2_EVENT_HDMA2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = GAUDI2_EVENT_HDMA3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = GAUDI2_EVENT_MME1_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = GAUDI2_EVENT_TPC6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = GAUDI2_EVENT_TPC7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = GAUDI2_EVENT_TPC8_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = GAUDI2_EVENT_TPC9_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = GAUDI2_EVENT_TPC10_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = GAUDI2_EVENT_TPC11_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = GAUDI2_EVENT_HDMA4_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = GAUDI2_EVENT_HDMA5_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = GAUDI2_EVENT_MME2_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = GAUDI2_EVENT_TPC12_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = GAUDI2_EVENT_TPC13_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = GAUDI2_EVENT_TPC14_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = GAUDI2_EVENT_TPC15_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = GAUDI2_EVENT_TPC16_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = GAUDI2_EVENT_TPC17_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = GAUDI2_EVENT_HDMA6_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = GAUDI2_EVENT_HDMA7_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = GAUDI2_EVENT_MME3_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = GAUDI2_EVENT_TPC18_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = GAUDI2_EVENT_TPC19_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = GAUDI2_EVENT_TPC20_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = GAUDI2_EVENT_TPC21_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = GAUDI2_EVENT_TPC22_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = GAUDI2_EVENT_TPC23_QM,
 +      [GAUDI2_QUEUE_ID_NIC_0_0] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_0_1] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_0_2] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_0_3] = GAUDI2_EVENT_NIC0_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_1_0] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_1_1] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_1_2] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_1_3] = GAUDI2_EVENT_NIC0_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_2_0] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_2_1] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_2_2] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_2_3] = GAUDI2_EVENT_NIC1_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_3_0] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_3_1] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_3_2] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_3_3] = GAUDI2_EVENT_NIC1_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_4_0] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_4_1] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_4_2] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_4_3] = GAUDI2_EVENT_NIC2_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_5_0] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_5_1] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_5_2] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_5_3] = GAUDI2_EVENT_NIC2_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_6_0] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_6_1] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_6_2] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_6_3] = GAUDI2_EVENT_NIC3_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_7_0] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_7_1] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_7_2] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_7_3] = GAUDI2_EVENT_NIC3_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_8_0] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_8_1] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_8_2] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_8_3] = GAUDI2_EVENT_NIC4_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_9_0] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_9_1] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_9_2] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_9_3] = GAUDI2_EVENT_NIC4_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_10_0] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_10_1] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_10_2] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_10_3] = GAUDI2_EVENT_NIC5_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_11_0] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_11_1] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_11_2] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_11_3] = GAUDI2_EVENT_NIC5_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_12_0] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_12_1] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_12_2] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_12_3] = GAUDI2_EVENT_NIC6_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_13_0] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_13_1] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_13_2] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_13_3] = GAUDI2_EVENT_NIC6_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_14_0] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_14_1] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_14_2] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_14_3] = GAUDI2_EVENT_NIC7_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_15_0] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_15_1] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_15_2] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_15_3] = GAUDI2_EVENT_NIC7_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_16_0] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_16_1] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_16_2] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_16_3] = GAUDI2_EVENT_NIC8_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_17_0] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_17_1] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_17_2] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_17_3] = GAUDI2_EVENT_NIC8_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_18_0] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_18_1] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_18_2] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_18_3] = GAUDI2_EVENT_NIC9_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_19_0] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_19_1] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_19_2] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_19_3] = GAUDI2_EVENT_NIC9_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_20_0] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_20_1] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_20_2] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_20_3] = GAUDI2_EVENT_NIC10_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_21_0] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_21_1] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_21_2] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_21_3] = GAUDI2_EVENT_NIC10_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_22_0] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_22_1] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_22_2] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_22_3] = GAUDI2_EVENT_NIC11_QM0,
 +      [GAUDI2_QUEUE_ID_NIC_23_0] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_23_1] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_23_2] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_NIC_23_3] = GAUDI2_EVENT_NIC11_QM1,
 +      [GAUDI2_QUEUE_ID_ROT_0_0] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_0_1] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_0_2] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_0_3] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_0] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_1] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_2] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
 +      [GAUDI2_QUEUE_ID_ROT_1_3] = GAUDI2_EVENT_ROTATOR1_ROT1_QM
 +};
 +
 +static const int gaudi2_dma_core_async_event_id[] = {
 +      [DMA_CORE_ID_EDMA0] = GAUDI2_EVENT_HDMA0_CORE,
 +      [DMA_CORE_ID_EDMA1] = GAUDI2_EVENT_HDMA1_CORE,
 +      [DMA_CORE_ID_EDMA2] = GAUDI2_EVENT_HDMA2_CORE,
 +      [DMA_CORE_ID_EDMA3] = GAUDI2_EVENT_HDMA3_CORE,
 +      [DMA_CORE_ID_EDMA4] = GAUDI2_EVENT_HDMA4_CORE,
 +      [DMA_CORE_ID_EDMA5] = GAUDI2_EVENT_HDMA5_CORE,
 +      [DMA_CORE_ID_EDMA6] = GAUDI2_EVENT_HDMA6_CORE,
 +      [DMA_CORE_ID_EDMA7] = GAUDI2_EVENT_HDMA7_CORE,
 +      [DMA_CORE_ID_PDMA0] = GAUDI2_EVENT_PDMA0_CORE,
 +      [DMA_CORE_ID_PDMA1] = GAUDI2_EVENT_PDMA1_CORE,
 +      [DMA_CORE_ID_KDMA] = GAUDI2_EVENT_KDMA0_CORE,
 +};
 +
 +static const char * const gaudi2_qm_sei_error_cause[GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE] = {
 +      "qman sei intr",
 +      "arc sei intr"
 +};
 +
 +static const char * const gaudi2_cpu_sei_error_cause[GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE] = {
 +      "AXI_TERMINATOR WR",
 +      "AXI_TERMINATOR RD",
 +      "AXI SPLIT SEI Status"
 +};
 +
 +static const char * const gaudi2_arc_sei_error_cause[GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE] = {
 +      "cbu_bresp_sei_intr_cause",
 +      "cbu_rresp_sei_intr_cause",
 +      "lbu_bresp_sei_intr_cause",
 +      "lbu_rresp_sei_intr_cause",
 +      "cbu_axi_split_intr_cause",
 +      "lbu_axi_split_intr_cause",
 +      "arc_ip_excptn_sei_intr_cause",
 +      "dmi_bresp_sei_intr_cause",
 +      "aux2apb_err_sei_intr_cause",
 +      "cfg_lbw_wr_terminated_intr_cause",
 +      "cfg_lbw_rd_terminated_intr_cause",
 +      "cfg_dccm_wr_terminated_intr_cause",
 +      "cfg_dccm_rd_terminated_intr_cause",
 +      "cfg_hbw_rd_terminated_intr_cause"
 +};
 +
 +static const char * const gaudi2_dec_error_cause[GAUDI2_NUM_OF_DEC_ERR_CAUSE] = {
 +      "msix_vcd_hbw_sei",
 +      "msix_l2c_hbw_sei",
 +      "msix_nrm_hbw_sei",
 +      "msix_abnrm_hbw_sei",
 +      "msix_vcd_lbw_sei",
 +      "msix_l2c_lbw_sei",
 +      "msix_nrm_lbw_sei",
 +      "msix_abnrm_lbw_sei",
 +      "apb_vcd_lbw_sei",
 +      "apb_l2c_lbw_sei",
 +      "apb_nrm_lbw_sei",
 +      "apb_abnrm_lbw_sei",
 +      "dec_sei",
 +      "dec_apb_sei",
 +      "trc_apb_sei",
 +      "lbw_mstr_if_sei",
 +      "axi_split_bresp_err_sei",
 +      "hbw_axi_wr_viol_sei",
 +      "hbw_axi_rd_viol_sei",
 +      "lbw_axi_wr_viol_sei",
 +      "lbw_axi_rd_viol_sei",
 +      "vcd_spi",
 +      "l2c_spi",
 +      "nrm_spi",
 +      "abnrm_spi",
 +};
 +
 +static const char * const gaudi2_qman_error_cause[GAUDI2_NUM_OF_QM_ERR_CAUSE] = {
 +      "PQ AXI HBW error",
 +      "CQ AXI HBW error",
 +      "CP AXI HBW error",
 +      "CP error due to undefined OPCODE",
 +      "CP encountered STOP OPCODE",
 +      "CP AXI LBW error",
 +      "CP WRREG32 or WRBULK returned error",
 +      "N/A",
 +      "FENCE 0 inc over max value and clipped",
 +      "FENCE 1 inc over max value and clipped",
 +      "FENCE 2 inc over max value and clipped",
 +      "FENCE 3 inc over max value and clipped",
 +      "FENCE 0 dec under min value and clipped",
 +      "FENCE 1 dec under min value and clipped",
 +      "FENCE 2 dec under min value and clipped",
 +      "FENCE 3 dec under min value and clipped",
 +      "CPDMA Up overflow",
 +      "PQC L2H error"
 +};
 +
 +static const char * const gaudi2_qman_lower_cp_error_cause[GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE] = {
 +      "RSVD0",
 +      "CQ AXI HBW error",
 +      "CP AXI HBW error",
 +      "CP error due to undefined OPCODE",
 +      "CP encountered STOP OPCODE",
 +      "CP AXI LBW error",
 +      "CP WRREG32 or WRBULK returned error",
 +      "N/A",
 +      "FENCE 0 inc over max value and clipped",
 +      "FENCE 1 inc over max value and clipped",
 +      "FENCE 2 inc over max value and clipped",
 +      "FENCE 3 inc over max value and clipped",
 +      "FENCE 0 dec under min value and clipped",
 +      "FENCE 1 dec under min value and clipped",
 +      "FENCE 2 dec under min value and clipped",
 +      "FENCE 3 dec under min value and clipped",
 +      "CPDMA Up overflow",
 +      "RSVD17",
 +      "CQ_WR_IFIFO_CI_ERR",
 +      "CQ_WR_CTL_CI_ERR",
 +      "ARC_CQF_RD_ERR",
 +      "ARC_CQ_WR_IFIFO_CI_ERR",
 +      "ARC_CQ_WR_CTL_CI_ERR",
 +      "ARC_AXI_ERR",
 +      "CP_SWITCH_WDT_ERR"
 +};
 +
 +static const char * const gaudi2_qman_arb_error_cause[GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE] = {
 +      "Choice push while full error",
 +      "Choice Q watchdog error",
 +      "MSG AXI LBW returned with error"
 +};
 +
 +static const char * const guadi2_rot_error_cause[GAUDI2_NUM_OF_ROT_ERR_CAUSE] = {
 +      "qm_axi_err",
 +      "qm_trace_fence_events",
 +      "qm_sw_err",
 +      "qm_cp_sw_stop",
 +      "lbw_mstr_rresp_err",
 +      "lbw_mstr_bresp_err",
 +      "lbw_msg_slverr",
 +      "hbw_msg_slverr",
 +      "wbc_slverr",
 +      "hbw_mstr_rresp_err",
 +      "hbw_mstr_bresp_err",
 +      "sb_resp_intr",
 +      "mrsb_resp_intr",
 +      "core_dw_status_0",
 +      "core_dw_status_1",
 +      "core_dw_status_2",
 +      "core_dw_status_3",
 +      "core_dw_status_4",
 +      "core_dw_status_5",
 +      "core_dw_status_6",
 +      "core_dw_status_7",
 +      "async_arc2cpu_sei_intr",
 +};
 +
 +static const char * const gaudi2_tpc_interrupts_cause[GAUDI2_NUM_OF_TPC_INTR_CAUSE] = {
 +      "tpc_address_exceed_slm",
 +      "tpc_div_by_0",
 +      "tpc_spu_mac_overflow",
 +      "tpc_spu_addsub_overflow",
 +      "tpc_spu_abs_overflow",
 +      "tpc_spu_fma_fp_dst_nan",
 +      "tpc_spu_fma_fp_dst_inf",
 +      "tpc_spu_convert_fp_dst_nan",
 +      "tpc_spu_convert_fp_dst_inf",
 +      "tpc_spu_fp_dst_denorm",
 +      "tpc_vpu_mac_overflow",
 +      "tpc_vpu_addsub_overflow",
 +      "tpc_vpu_abs_overflow",
 +      "tpc_vpu_convert_fp_dst_nan",
 +      "tpc_vpu_convert_fp_dst_inf",
 +      "tpc_vpu_fma_fp_dst_nan",
 +      "tpc_vpu_fma_fp_dst_inf",
 +      "tpc_vpu_fp_dst_denorm",
 +      "tpc_assertions",
 +      "tpc_illegal_instruction",
 +      "tpc_pc_wrap_around",
 +      "tpc_qm_sw_err",
 +      "tpc_hbw_rresp_err",
 +      "tpc_hbw_bresp_err",
 +      "tpc_lbw_rresp_err",
 +      "tpc_lbw_bresp_err",
 +      "st_unlock_already_locked",
 +      "invalid_lock_access",
 +      "LD_L protection violation",
 +      "ST_L protection violation",
 +};
 +
 +static const char * const guadi2_mme_error_cause[GAUDI2_NUM_OF_MME_ERR_CAUSE] = {
 +      "agu_resp_intr",
 +      "qman_axi_err",
 +      "wap sei (wbc axi err)",
 +      "arc sei",
 +      "cfg access error",
 +      "qm_sw_err",
 +      "sbte_dbg_intr_0",
 +      "sbte_dbg_intr_1",
 +      "sbte_dbg_intr_2",
 +      "sbte_dbg_intr_3",
 +      "sbte_dbg_intr_4",
 +      "sbte_prtn_intr_0",
 +      "sbte_prtn_intr_1",
 +      "sbte_prtn_intr_2",
 +      "sbte_prtn_intr_3",
 +      "sbte_prtn_intr_4",
 +};
 +
 +static const char * const guadi2_mme_sbte_error_cause[GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE] = {
 +      "i0",
 +      "i1",
 +      "i2",
 +      "i3",
 +      "i4",
 +};
 +
 +static const char * const guadi2_mme_wap_error_cause[GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE] = {
 +      "WBC ERR RESP_0",
 +      "WBC ERR RESP_1",
 +      "AP SOURCE POS INF",
 +      "AP SOURCE NEG INF",
 +      "AP SOURCE NAN",
 +      "AP RESULT POS INF",
 +      "AP RESULT NEG INF",
 +};
 +
 +static const char * const gaudi2_dma_core_interrupts_cause[GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE] = {
 +      "HBW Read returned with error RRESP",
 +      "HBW write returned with error BRESP",
 +      "LBW write returned with error BRESP",
 +      "descriptor_fifo_overflow",
 +      "KDMA SB LBW Read returned with error",
 +      "KDMA WBC LBW Write returned with error",
 +      "TRANSPOSE ENGINE DESC FIFO OVERFLOW",
 +      "WRONG CFG FOR COMMIT IN LIN DMA"
 +};
 +
 +static const char * const gaudi2_kdma_core_interrupts_cause[GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE] = {
 +      "HBW/LBW Read returned with error RRESP",
 +      "HBW/LBW write returned with error BRESP",
 +      "LBW write returned with error BRESP",
 +      "descriptor_fifo_overflow",
 +      "KDMA SB LBW Read returned with error",
 +      "KDMA WBC LBW Write returned with error",
 +      "TRANSPOSE ENGINE DESC FIFO OVERFLOW",
 +      "WRONG CFG FOR COMMIT IN LIN DMA"
 +};
 +
 +struct gaudi2_sm_sei_cause_data {
 +      const char *cause_name;
 +      const char *log_name;
 +};
 +
 +static const struct gaudi2_sm_sei_cause_data
 +gaudi2_sm_sei_cause[GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE] = {
 +      {"calculated SO value overflow/underflow", "SOB ID"},
 +      {"payload address of monitor is not aligned to 4B", "monitor addr"},
 +      {"armed monitor write got BRESP (SLVERR or DECERR)", "AXI id"},
 +};
 +
 +static const char * const
 +gaudi2_pmmu_fatal_interrupts_cause[GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE] = {
 +      "LATENCY_RD_OUT_FIFO_OVERRUN",
 +      "LATENCY_WR_OUT_FIFO_OVERRUN",
 +};
 +
 +static const char * const
 +gaudi2_hif_fatal_interrupts_cause[GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE] = {
 +      "LATENCY_RD_OUT_FIFO_OVERRUN",
 +      "LATENCY_WR_OUT_FIFO_OVERRUN",
 +};
 +
 +static const char * const
 +gaudi2_psoc_axi_drain_interrupts_cause[GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE] = {
 +      "AXI drain HBW",
 +      "AXI drain LBW",
 +};
 +
 +static const char * const
 +gaudi2_pcie_addr_dec_error_cause[GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE] = {
 +      "HBW error response",
 +      "LBW error response",
 +      "TLP is blocked by RR"
 +};
 +
 +const u32 gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_SIZE] = {
 +      [GAUDI2_QUEUE_ID_PDMA_0_0] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_0_1] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_0_2] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_0_3] = mmPDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_0] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_1] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_2] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_PDMA_1_3] = mmPDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = mmDCORE0_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = mmDCORE0_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = mmDCORE0_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = mmDCORE0_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = mmDCORE0_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = mmDCORE0_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = mmDCORE0_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = mmDCORE0_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = mmDCORE0_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = mmDCORE0_TPC6_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = mmDCORE1_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = mmDCORE1_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = mmDCORE1_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = mmDCORE1_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = mmDCORE1_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = mmDCORE1_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = mmDCORE1_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = mmDCORE1_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = mmDCORE1_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = mmDCORE2_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = mmDCORE2_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = mmDCORE2_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = mmDCORE2_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = mmDCORE2_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = mmDCORE2_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = mmDCORE2_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = mmDCORE2_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = mmDCORE2_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = mmDCORE3_EDMA0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = mmDCORE3_EDMA1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = mmDCORE3_MME_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = mmDCORE3_TPC0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = mmDCORE3_TPC1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = mmDCORE3_TPC2_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = mmDCORE3_TPC3_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = mmDCORE3_TPC4_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = mmDCORE3_TPC5_QM_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_0] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_1] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_2] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_0_3] = mmNIC0_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_0] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_1] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_2] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_1_3] = mmNIC0_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_0] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_1] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_2] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_2_3] = mmNIC1_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_0] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_1] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_2] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_3_3] = mmNIC1_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_0] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_1] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_2] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_4_3] = mmNIC2_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_0] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_1] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_2] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_5_3] = mmNIC2_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_0] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_1] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_2] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_6_3] = mmNIC3_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_0] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_1] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_2] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_7_3] = mmNIC3_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_0] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_1] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_2] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_8_3] = mmNIC4_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_0] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_1] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_2] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_9_3] = mmNIC4_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_0] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_1] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_2] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_10_3] = mmNIC5_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_0] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_1] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_2] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_11_3] = mmNIC5_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_0] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_1] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_2] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_12_3] = mmNIC6_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_0] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_1] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_2] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_13_3] = mmNIC6_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_0] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_1] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_2] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_14_3] = mmNIC7_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_0] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_1] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_2] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_15_3] = mmNIC7_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_0] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_1] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_2] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_16_3] = mmNIC8_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_0] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_1] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_2] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_17_3] = mmNIC8_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_0] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_1] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_2] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_18_3] = mmNIC9_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_0] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_1] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_2] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_19_3] = mmNIC9_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_0] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_1] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_2] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_20_3] = mmNIC10_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_0] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_1] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_2] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_21_3] = mmNIC10_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_0] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_1] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_2] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_22_3] = mmNIC11_QM0_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_0] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_1] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_2] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_NIC_23_3] = mmNIC11_QM1_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_0] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_1] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_2] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_0_3] = mmROT0_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_0] = mmROT1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_1] = mmROT1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_2] = mmROT1_QM_BASE,
 +      [GAUDI2_QUEUE_ID_ROT_1_3] = mmROT1_QM_BASE
 +};
 +
 +static const u32 gaudi2_arc_blocks_bases[NUM_ARC_CPUS] = {
 +      [CPU_ID_SCHED_ARC0] = mmARC_FARM_ARC0_AUX_BASE,
 +      [CPU_ID_SCHED_ARC1] = mmARC_FARM_ARC1_AUX_BASE,
 +      [CPU_ID_SCHED_ARC2] = mmARC_FARM_ARC2_AUX_BASE,
 +      [CPU_ID_SCHED_ARC3] = mmARC_FARM_ARC3_AUX_BASE,
 +      [CPU_ID_SCHED_ARC4] = mmDCORE1_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_SCHED_ARC5] = mmDCORE3_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC0] = mmDCORE0_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC1] = mmDCORE0_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC2] = mmDCORE0_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC3] = mmDCORE0_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC4] = mmDCORE0_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC5] = mmDCORE0_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC6] = mmDCORE1_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC7] = mmDCORE1_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC8] = mmDCORE1_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC9] = mmDCORE1_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC10] = mmDCORE1_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC11] = mmDCORE1_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC12] = mmDCORE2_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC13] = mmDCORE2_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC14] = mmDCORE2_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC15] = mmDCORE2_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC16] = mmDCORE2_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC17] = mmDCORE2_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC18] = mmDCORE3_TPC0_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC19] = mmDCORE3_TPC1_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC20] = mmDCORE3_TPC2_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC21] = mmDCORE3_TPC3_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC22] = mmDCORE3_TPC4_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC23] = mmDCORE3_TPC5_QM_ARC_AUX_BASE,
 +      [CPU_ID_TPC_QMAN_ARC24] = mmDCORE0_TPC6_QM_ARC_AUX_BASE,
 +      [CPU_ID_MME_QMAN_ARC0] = mmDCORE0_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_MME_QMAN_ARC1] = mmDCORE2_MME_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC0] = mmDCORE0_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC1] = mmDCORE0_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC2] = mmDCORE1_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC3] = mmDCORE1_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC4] = mmDCORE2_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC5] = mmDCORE2_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC6] = mmDCORE3_EDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC7] = mmDCORE3_EDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC0] = mmPDMA0_QM_ARC_AUX_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC1] = mmPDMA1_QM_ARC_AUX_BASE,
 +      [CPU_ID_ROT_QMAN_ARC0] = mmROT0_QM_ARC_AUX_BASE,
 +      [CPU_ID_ROT_QMAN_ARC1] = mmROT1_QM_ARC_AUX_BASE,
 +      [CPU_ID_NIC_QMAN_ARC0] = mmNIC0_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC1] = mmNIC0_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC2] = mmNIC1_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC3] = mmNIC1_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC4] = mmNIC2_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC5] = mmNIC2_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC6] = mmNIC3_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC7] = mmNIC3_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC8] = mmNIC4_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC9] = mmNIC4_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC10] = mmNIC5_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC11] = mmNIC5_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC12] = mmNIC6_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC13] = mmNIC6_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC14] = mmNIC7_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC15] = mmNIC7_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC16] = mmNIC8_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC17] = mmNIC8_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC18] = mmNIC9_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC19] = mmNIC9_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC20] = mmNIC10_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC21] = mmNIC10_QM_ARC_AUX1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC22] = mmNIC11_QM_ARC_AUX0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC23] = mmNIC11_QM_ARC_AUX1_BASE,
 +};
 +
 +static const u32 gaudi2_arc_dccm_bases[NUM_ARC_CPUS] = {
 +      [CPU_ID_SCHED_ARC0] = mmARC_FARM_ARC0_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC1] = mmARC_FARM_ARC1_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC2] = mmARC_FARM_ARC2_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC3] = mmARC_FARM_ARC3_DCCM0_BASE,
 +      [CPU_ID_SCHED_ARC4] = mmDCORE1_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_SCHED_ARC5] = mmDCORE3_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC0] = mmDCORE0_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC1] = mmDCORE0_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC2] = mmDCORE0_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC3] = mmDCORE0_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC4] = mmDCORE0_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC5] = mmDCORE0_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC6] = mmDCORE1_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC7] = mmDCORE1_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC8] = mmDCORE1_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC9] = mmDCORE1_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC10] = mmDCORE1_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC11] = mmDCORE1_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC12] = mmDCORE2_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC13] = mmDCORE2_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC14] = mmDCORE2_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC15] = mmDCORE2_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC16] = mmDCORE2_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC17] = mmDCORE2_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC18] = mmDCORE3_TPC0_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC19] = mmDCORE3_TPC1_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC20] = mmDCORE3_TPC2_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC21] = mmDCORE3_TPC3_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC22] = mmDCORE3_TPC4_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC23] = mmDCORE3_TPC5_QM_DCCM_BASE,
 +      [CPU_ID_TPC_QMAN_ARC24] = mmDCORE0_TPC6_QM_DCCM_BASE,
 +      [CPU_ID_MME_QMAN_ARC0] = mmDCORE0_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_MME_QMAN_ARC1] = mmDCORE2_MME_QM_ARC_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC0] = mmDCORE0_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC1] = mmDCORE0_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC2] = mmDCORE1_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC3] = mmDCORE1_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC4] = mmDCORE2_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC5] = mmDCORE2_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC6] = mmDCORE3_EDMA0_QM_DCCM_BASE,
 +      [CPU_ID_EDMA_QMAN_ARC7] = mmDCORE3_EDMA1_QM_DCCM_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC0] = mmPDMA0_QM_ARC_DCCM_BASE,
 +      [CPU_ID_PDMA_QMAN_ARC1] = mmPDMA1_QM_ARC_DCCM_BASE,
 +      [CPU_ID_ROT_QMAN_ARC0] = mmROT0_QM_ARC_DCCM_BASE,
 +      [CPU_ID_ROT_QMAN_ARC1] = mmROT1_QM_ARC_DCCM_BASE,
 +      [CPU_ID_NIC_QMAN_ARC0] = mmNIC0_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC1] = mmNIC0_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC2] = mmNIC1_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC3] = mmNIC1_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC4] = mmNIC2_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC5] = mmNIC2_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC6] = mmNIC3_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC7] = mmNIC3_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC8] = mmNIC4_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC9] = mmNIC4_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC10] = mmNIC5_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC11] = mmNIC5_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC12] = mmNIC6_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC13] = mmNIC6_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC14] = mmNIC7_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC15] = mmNIC7_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC16] = mmNIC8_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC17] = mmNIC8_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC18] = mmNIC9_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC19] = mmNIC9_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC20] = mmNIC10_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC21] = mmNIC10_QM_DCCM1_BASE,
 +      [CPU_ID_NIC_QMAN_ARC22] = mmNIC11_QM_DCCM0_BASE,
 +      [CPU_ID_NIC_QMAN_ARC23] = mmNIC11_QM_DCCM1_BASE,
 +};
 +
 +const u32 gaudi2_mme_ctrl_lo_blocks_bases[MME_ID_SIZE] = {
 +      [MME_ID_DCORE0] = mmDCORE0_MME_CTRL_LO_BASE,
 +      [MME_ID_DCORE1] = mmDCORE1_MME_CTRL_LO_BASE,
 +      [MME_ID_DCORE2] = mmDCORE2_MME_CTRL_LO_BASE,
 +      [MME_ID_DCORE3] = mmDCORE3_MME_CTRL_LO_BASE,
 +};
 +
 +static const u32 gaudi2_queue_id_to_arc_id[GAUDI2_QUEUE_ID_SIZE] = {
 +      [GAUDI2_QUEUE_ID_PDMA_0_0] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_0_1] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_0_2] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_0_3] = CPU_ID_PDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_PDMA_1_0] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_PDMA_1_1] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_PDMA_1_2] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_PDMA_1_3] = CPU_ID_PDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = CPU_ID_MME_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = CPU_ID_TPC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = CPU_ID_TPC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = CPU_ID_TPC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = CPU_ID_TPC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = CPU_ID_TPC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = CPU_ID_TPC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = CPU_ID_TPC_QMAN_ARC24,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = CPU_ID_SCHED_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = CPU_ID_TPC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = CPU_ID_TPC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = CPU_ID_TPC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = CPU_ID_TPC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = CPU_ID_TPC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = CPU_ID_TPC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = CPU_ID_MME_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = CPU_ID_TPC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = CPU_ID_TPC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = CPU_ID_TPC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = CPU_ID_TPC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = CPU_ID_TPC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = CPU_ID_TPC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = CPU_ID_SCHED_ARC5,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = CPU_ID_TPC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = CPU_ID_TPC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = CPU_ID_TPC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = CPU_ID_TPC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = CPU_ID_TPC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = CPU_ID_TPC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_0_0] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_0_1] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_0_2] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_0_3] = CPU_ID_NIC_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_NIC_1_0] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_1_1] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_1_2] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_1_3] = CPU_ID_NIC_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_NIC_2_0] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_2_1] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_2_2] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_2_3] = CPU_ID_NIC_QMAN_ARC2,
 +      [GAUDI2_QUEUE_ID_NIC_3_0] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_3_1] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_3_2] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_3_3] = CPU_ID_NIC_QMAN_ARC3,
 +      [GAUDI2_QUEUE_ID_NIC_4_0] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_4_1] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_4_2] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_4_3] = CPU_ID_NIC_QMAN_ARC4,
 +      [GAUDI2_QUEUE_ID_NIC_5_0] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_5_1] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_5_2] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_5_3] = CPU_ID_NIC_QMAN_ARC5,
 +      [GAUDI2_QUEUE_ID_NIC_6_0] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_6_1] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_6_2] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_6_3] = CPU_ID_NIC_QMAN_ARC6,
 +      [GAUDI2_QUEUE_ID_NIC_7_0] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_7_1] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_7_2] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_7_3] = CPU_ID_NIC_QMAN_ARC7,
 +      [GAUDI2_QUEUE_ID_NIC_8_0] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_8_1] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_8_2] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_8_3] = CPU_ID_NIC_QMAN_ARC8,
 +      [GAUDI2_QUEUE_ID_NIC_9_0] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_9_1] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_9_2] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_9_3] = CPU_ID_NIC_QMAN_ARC9,
 +      [GAUDI2_QUEUE_ID_NIC_10_0] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_10_1] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_10_2] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_10_3] = CPU_ID_NIC_QMAN_ARC10,
 +      [GAUDI2_QUEUE_ID_NIC_11_0] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_11_1] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_11_2] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_11_3] = CPU_ID_NIC_QMAN_ARC11,
 +      [GAUDI2_QUEUE_ID_NIC_12_0] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_12_1] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_12_2] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_12_3] = CPU_ID_NIC_QMAN_ARC12,
 +      [GAUDI2_QUEUE_ID_NIC_13_0] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_13_1] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_13_2] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_13_3] = CPU_ID_NIC_QMAN_ARC13,
 +      [GAUDI2_QUEUE_ID_NIC_14_0] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_14_1] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_14_2] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_14_3] = CPU_ID_NIC_QMAN_ARC14,
 +      [GAUDI2_QUEUE_ID_NIC_15_0] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_15_1] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_15_2] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_15_3] = CPU_ID_NIC_QMAN_ARC15,
 +      [GAUDI2_QUEUE_ID_NIC_16_0] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_16_1] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_16_2] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_16_3] = CPU_ID_NIC_QMAN_ARC16,
 +      [GAUDI2_QUEUE_ID_NIC_17_0] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_17_1] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_17_2] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_17_3] = CPU_ID_NIC_QMAN_ARC17,
 +      [GAUDI2_QUEUE_ID_NIC_18_0] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_18_1] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_18_2] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_18_3] = CPU_ID_NIC_QMAN_ARC18,
 +      [GAUDI2_QUEUE_ID_NIC_19_0] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_19_1] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_19_2] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_19_3] = CPU_ID_NIC_QMAN_ARC19,
 +      [GAUDI2_QUEUE_ID_NIC_20_0] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_20_1] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_20_2] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_20_3] = CPU_ID_NIC_QMAN_ARC20,
 +      [GAUDI2_QUEUE_ID_NIC_21_0] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_21_1] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_21_2] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_21_3] = CPU_ID_NIC_QMAN_ARC21,
 +      [GAUDI2_QUEUE_ID_NIC_22_0] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_22_1] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_22_2] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_22_3] = CPU_ID_NIC_QMAN_ARC22,
 +      [GAUDI2_QUEUE_ID_NIC_23_0] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_23_1] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_23_2] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_NIC_23_3] = CPU_ID_NIC_QMAN_ARC23,
 +      [GAUDI2_QUEUE_ID_ROT_0_0] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_0_1] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_0_2] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_0_3] = CPU_ID_ROT_QMAN_ARC0,
 +      [GAUDI2_QUEUE_ID_ROT_1_0] = CPU_ID_ROT_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_ROT_1_1] = CPU_ID_ROT_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_ROT_1_2] = CPU_ID_ROT_QMAN_ARC1,
 +      [GAUDI2_QUEUE_ID_ROT_1_3] = CPU_ID_ROT_QMAN_ARC1
 +};
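
The two lookup levels above can be chained: a queue ID resolves to an ARC CPU ID through gaudi2_queue_id_to_arc_id, and that CPU ID in turn indexes gaudi2_arc_blocks_bases (or gaudi2_arc_dccm_bases) to reach the ARC's register block. A minimal sketch of such a chained lookup; the helper itself is illustrative and not part of this patch:

	/* Illustration only (not part of the patch): resolve a queue ID to its ARC AUX base. */
	static u32 example_queue_id_to_arc_aux_base(u32 queue_id)
	{
		u32 arc_id = gaudi2_queue_id_to_arc_id[queue_id];

		return gaudi2_arc_blocks_bases[arc_id];
	}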
 +
 +const u32 gaudi2_dma_core_blocks_bases[DMA_CORE_ID_SIZE] = {
 +      [DMA_CORE_ID_PDMA0] = mmPDMA0_CORE_BASE,
 +      [DMA_CORE_ID_PDMA1] = mmPDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA0] = mmDCORE0_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA1] = mmDCORE0_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA2] = mmDCORE1_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA3] = mmDCORE1_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA4] = mmDCORE2_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA5] = mmDCORE2_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_EDMA6] = mmDCORE3_EDMA0_CORE_BASE,
 +      [DMA_CORE_ID_EDMA7] = mmDCORE3_EDMA1_CORE_BASE,
 +      [DMA_CORE_ID_KDMA] = mmARC_FARM_KDMA_BASE
 +};
 +
 +const u32 gaudi2_mme_acc_blocks_bases[MME_ID_SIZE] = {
 +      [MME_ID_DCORE0] = mmDCORE0_MME_ACC_BASE,
 +      [MME_ID_DCORE1] = mmDCORE1_MME_ACC_BASE,
 +      [MME_ID_DCORE2] = mmDCORE2_MME_ACC_BASE,
 +      [MME_ID_DCORE3] = mmDCORE3_MME_ACC_BASE
 +};
 +
 +static const u32 gaudi2_tpc_cfg_blocks_bases[TPC_ID_SIZE] = {
 +      [TPC_ID_DCORE0_TPC0] = mmDCORE0_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC1] = mmDCORE0_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC2] = mmDCORE0_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC3] = mmDCORE0_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC4] = mmDCORE0_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC5] = mmDCORE0_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC0] = mmDCORE1_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC1] = mmDCORE1_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC2] = mmDCORE1_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC3] = mmDCORE1_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC4] = mmDCORE1_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE1_TPC5] = mmDCORE1_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC0] = mmDCORE2_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC1] = mmDCORE2_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC2] = mmDCORE2_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC3] = mmDCORE2_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC4] = mmDCORE2_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE2_TPC5] = mmDCORE2_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC0] = mmDCORE3_TPC0_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC1] = mmDCORE3_TPC1_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC2] = mmDCORE3_TPC2_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC3] = mmDCORE3_TPC3_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC4] = mmDCORE3_TPC4_CFG_BASE,
 +      [TPC_ID_DCORE3_TPC5] = mmDCORE3_TPC5_CFG_BASE,
 +      [TPC_ID_DCORE0_TPC6] = mmDCORE0_TPC6_CFG_BASE,
 +};
 +
 +const u32 gaudi2_rot_blocks_bases[ROTATOR_ID_SIZE] = {
 +      [ROTATOR_ID_0] = mmROT0_BASE,
 +      [ROTATOR_ID_1] = mmROT1_BASE
 +};
 +
 +static const u32 gaudi2_tpc_id_to_queue_id[TPC_ID_SIZE] = {
 +      [TPC_ID_DCORE0_TPC0] = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0,
 +      [TPC_ID_DCORE0_TPC1] = GAUDI2_QUEUE_ID_DCORE0_TPC_1_0,
 +      [TPC_ID_DCORE0_TPC2] = GAUDI2_QUEUE_ID_DCORE0_TPC_2_0,
 +      [TPC_ID_DCORE0_TPC3] = GAUDI2_QUEUE_ID_DCORE0_TPC_3_0,
 +      [TPC_ID_DCORE0_TPC4] = GAUDI2_QUEUE_ID_DCORE0_TPC_4_0,
 +      [TPC_ID_DCORE0_TPC5] = GAUDI2_QUEUE_ID_DCORE0_TPC_5_0,
 +      [TPC_ID_DCORE1_TPC0] = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0,
 +      [TPC_ID_DCORE1_TPC1] = GAUDI2_QUEUE_ID_DCORE1_TPC_1_0,
 +      [TPC_ID_DCORE1_TPC2] = GAUDI2_QUEUE_ID_DCORE1_TPC_2_0,
 +      [TPC_ID_DCORE1_TPC3] = GAUDI2_QUEUE_ID_DCORE1_TPC_3_0,
 +      [TPC_ID_DCORE1_TPC4] = GAUDI2_QUEUE_ID_DCORE1_TPC_4_0,
 +      [TPC_ID_DCORE1_TPC5] = GAUDI2_QUEUE_ID_DCORE1_TPC_5_0,
 +      [TPC_ID_DCORE2_TPC0] = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0,
 +      [TPC_ID_DCORE2_TPC1] = GAUDI2_QUEUE_ID_DCORE2_TPC_1_0,
 +      [TPC_ID_DCORE2_TPC2] = GAUDI2_QUEUE_ID_DCORE2_TPC_2_0,
 +      [TPC_ID_DCORE2_TPC3] = GAUDI2_QUEUE_ID_DCORE2_TPC_3_0,
 +      [TPC_ID_DCORE2_TPC4] = GAUDI2_QUEUE_ID_DCORE2_TPC_4_0,
 +      [TPC_ID_DCORE2_TPC5] = GAUDI2_QUEUE_ID_DCORE2_TPC_5_0,
 +      [TPC_ID_DCORE3_TPC0] = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0,
 +      [TPC_ID_DCORE3_TPC1] = GAUDI2_QUEUE_ID_DCORE3_TPC_1_0,
 +      [TPC_ID_DCORE3_TPC2] = GAUDI2_QUEUE_ID_DCORE3_TPC_2_0,
 +      [TPC_ID_DCORE3_TPC3] = GAUDI2_QUEUE_ID_DCORE3_TPC_3_0,
 +      [TPC_ID_DCORE3_TPC4] = GAUDI2_QUEUE_ID_DCORE3_TPC_4_0,
 +      [TPC_ID_DCORE3_TPC5] = GAUDI2_QUEUE_ID_DCORE3_TPC_5_0,
 +      [TPC_ID_DCORE0_TPC6] = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0,
 +};
 +
 +static const u32 gaudi2_rot_id_to_queue_id[ROTATOR_ID_SIZE] = {
 +      [ROTATOR_ID_0] = GAUDI2_QUEUE_ID_ROT_0_0,
 +      [ROTATOR_ID_1] = GAUDI2_QUEUE_ID_ROT_1_0,
 +};
 +
 +const u32 edma_stream_base[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
 +      GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0,
 +      GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0,
 +      GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0,
 +      GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0,
 +      GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0,
 +};
 +
 +static const char gaudi2_vdec_irq_name[GAUDI2_VDEC_MSIX_ENTRIES][GAUDI2_MAX_STRING_LEN] = {
 +      "gaudi2 vdec 0_0", "gaudi2 vdec 0_0 abnormal",
 +      "gaudi2 vdec 0_1", "gaudi2 vdec 0_1 abnormal",
 +      "gaudi2 vdec 1_0", "gaudi2 vdec 1_0 abnormal",
 +      "gaudi2 vdec 1_1", "gaudi2 vdec 1_1 abnormal",
 +      "gaudi2 vdec 2_0", "gaudi2 vdec 2_0 abnormal",
 +      "gaudi2 vdec 2_1", "gaudi2 vdec 2_1 abnormal",
 +      "gaudi2 vdec 3_0", "gaudi2 vdec 3_0 abnormal",
 +      "gaudi2 vdec 3_1", "gaudi2 vdec 3_1 abnormal",
 +      "gaudi2 vdec s_0", "gaudi2 vdec s_0 abnormal",
 +      "gaudi2 vdec s_1", "gaudi2 vdec s_1 abnormal"
 +};
 +
 +static const u32 rtr_coordinates_to_rtr_id[NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES] = {
 +      RTR_ID_X_Y(2, 4),
 +      RTR_ID_X_Y(3, 4),
 +      RTR_ID_X_Y(4, 4),
 +      RTR_ID_X_Y(5, 4),
 +      RTR_ID_X_Y(6, 4),
 +      RTR_ID_X_Y(7, 4),
 +      RTR_ID_X_Y(8, 4),
 +      RTR_ID_X_Y(9, 4),
 +      RTR_ID_X_Y(10, 4),
 +      RTR_ID_X_Y(11, 4),
 +      RTR_ID_X_Y(12, 4),
 +      RTR_ID_X_Y(13, 4),
 +      RTR_ID_X_Y(14, 4),
 +      RTR_ID_X_Y(15, 4),
 +      RTR_ID_X_Y(16, 4),
 +      RTR_ID_X_Y(17, 4),
 +      RTR_ID_X_Y(2, 11),
 +      RTR_ID_X_Y(3, 11),
 +      RTR_ID_X_Y(4, 11),
 +      RTR_ID_X_Y(5, 11),
 +      RTR_ID_X_Y(6, 11),
 +      RTR_ID_X_Y(7, 11),
 +      RTR_ID_X_Y(8, 11),
 +      RTR_ID_X_Y(9, 11),
 +      RTR_ID_X_Y(0, 0),/* 24 no id */
 +      RTR_ID_X_Y(0, 0),/* 25 no id */
 +      RTR_ID_X_Y(0, 0),/* 26 no id */
 +      RTR_ID_X_Y(0, 0),/* 27 no id */
 +      RTR_ID_X_Y(14, 11),
 +      RTR_ID_X_Y(15, 11),
 +      RTR_ID_X_Y(16, 11),
 +      RTR_ID_X_Y(17, 11)
 +};
 +
 +enum rtr_id {
 +      DCORE0_RTR0,
 +      DCORE0_RTR1,
 +      DCORE0_RTR2,
 +      DCORE0_RTR3,
 +      DCORE0_RTR4,
 +      DCORE0_RTR5,
 +      DCORE0_RTR6,
 +      DCORE0_RTR7,
 +      DCORE1_RTR0,
 +      DCORE1_RTR1,
 +      DCORE1_RTR2,
 +      DCORE1_RTR3,
 +      DCORE1_RTR4,
 +      DCORE1_RTR5,
 +      DCORE1_RTR6,
 +      DCORE1_RTR7,
 +      DCORE2_RTR0,
 +      DCORE2_RTR1,
 +      DCORE2_RTR2,
 +      DCORE2_RTR3,
 +      DCORE2_RTR4,
 +      DCORE2_RTR5,
 +      DCORE2_RTR6,
 +      DCORE2_RTR7,
 +      DCORE3_RTR0,
 +      DCORE3_RTR1,
 +      DCORE3_RTR2,
 +      DCORE3_RTR3,
 +      DCORE3_RTR4,
 +      DCORE3_RTR5,
 +      DCORE3_RTR6,
 +      DCORE3_RTR7,
 +};
 +
 +static const u32 gaudi2_tpc_initiator_hbw_rtr_id[NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1] = {
 +      DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR2, DCORE0_RTR2, DCORE0_RTR3, DCORE0_RTR3,
 +      DCORE1_RTR6, DCORE1_RTR6, DCORE1_RTR5, DCORE1_RTR5, DCORE1_RTR4, DCORE1_RTR4,
 +      DCORE2_RTR3, DCORE2_RTR3, DCORE2_RTR2, DCORE2_RTR2, DCORE2_RTR1, DCORE2_RTR1,
 +      DCORE3_RTR4, DCORE3_RTR4, DCORE3_RTR5, DCORE3_RTR5, DCORE3_RTR6, DCORE3_RTR6,
 +      DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_tpc_initiator_lbw_rtr_id[NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1] = {
 +      DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR2, DCORE0_RTR2,
 +      DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR6, DCORE1_RTR6, DCORE1_RTR5, DCORE1_RTR5,
 +      DCORE2_RTR2, DCORE2_RTR2, DCORE2_RTR1, DCORE2_RTR1, DCORE2_RTR0, DCORE2_RTR0,
 +      DCORE3_RTR5, DCORE3_RTR5, DCORE3_RTR6, DCORE3_RTR6, DCORE3_RTR7, DCORE3_RTR7,
 +      DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_dec_initiator_hbw_rtr_id[NUMBER_OF_DEC] = {
 +      DCORE0_RTR0, DCORE0_RTR0, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0, DCORE2_RTR0,
 +      DCORE3_RTR7, DCORE3_RTR7, DCORE0_RTR0, DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_dec_initiator_lbw_rtr_id[NUMBER_OF_DEC] = {
 +      DCORE0_RTR1, DCORE0_RTR1, DCORE1_RTR6, DCORE1_RTR6, DCORE2_RTR1, DCORE2_RTR1,
 +      DCORE3_RTR6, DCORE3_RTR6, DCORE0_RTR0, DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_nic_initiator_hbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
 +      DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0,
 +      DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
 +};
 +
 +static const u32 gaudi2_nic_initiator_lbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
 +      DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0,
 +      DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
 +};
 +
 +static const u32 gaudi2_edma_initiator_hbw_sft[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
 +      mmSFT0_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT0_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT1_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT1_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT2_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT2_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT3_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
 +      mmSFT3_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE
 +};
 +
 +static const u32 gaudi2_pdma_initiator_hbw_rtr_id[NUM_OF_PDMA] = {
 +      DCORE0_RTR0, DCORE0_RTR0
 +};
 +
 +static const u32 gaudi2_pdma_initiator_lbw_rtr_id[NUM_OF_PDMA] = {
 +      DCORE0_RTR2, DCORE0_RTR2
 +};
 +
 +static const u32 gaudi2_rot_initiator_hbw_rtr_id[NUM_OF_ROT] = {
 +      DCORE2_RTR0, DCORE3_RTR7
 +};
 +
 +static const u32 gaudi2_rot_initiator_lbw_rtr_id[NUM_OF_ROT] = {
 +      DCORE2_RTR2, DCORE3_RTR5
 +};
 +
 +struct mme_initiators_rtr_id {
 +      u32 wap0;
 +      u32 wap1;
 +      u32 write;
 +      u32 read;
 +      u32 sbte0;
 +      u32 sbte1;
 +      u32 sbte2;
 +      u32 sbte3;
 +      u32 sbte4;
 +};
 +
 +enum mme_initiators {
 +      MME_WAP0 = 0,
 +      MME_WAP1,
 +      MME_WRITE,
 +      MME_READ,
 +      MME_SBTE0,
 +      MME_SBTE1,
 +      MME_SBTE2,
 +      MME_SBTE3,
 +      MME_SBTE4,
 +      MME_INITIATORS_MAX
 +};
 +
 +static const struct mme_initiators_rtr_id
 +gaudi2_mme_initiator_rtr_id[NUM_OF_MME_PER_DCORE * NUM_OF_DCORES] = {
 +      { .wap0 = 5, .wap1 = 7, .write = 6, .read = 7,
 +      .sbte0 = 7, .sbte1 = 4, .sbte2 = 4, .sbte3 = 5, .sbte4 = 6},
 +      { .wap0 = 10, .wap1 = 8, .write = 9, .read = 8,
 +      .sbte0 = 11, .sbte1 = 11, .sbte2 = 10, .sbte3 = 9, .sbte4 = 8},
 +      { .wap0 = 21, .wap1 = 23, .write = 22, .read = 23,
 +      .sbte0 = 20, .sbte1 = 20, .sbte2 = 21, .sbte3 = 22, .sbte4 = 23},
 +      { .wap0 = 30, .wap1 = 28, .write = 29, .read = 30,
 +      .sbte0 = 31, .sbte1 = 31, .sbte2 = 30, .sbte3 = 29, .sbte4 = 28},
 +};
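
Assuming the numeric values in gaudi2_mme_initiator_rtr_id use the same flat numbering as the rtr_id enumeration above (eight routers per DCORE), an entry such as .wap0 = 5 would correspond to DCORE0_RTR5 and .wap0 = 21 to DCORE2_RTR5. A small illustrative decode under that assumption:

	/* Illustration only: split a flat router ID into its DCORE and per-DCORE index,
	 * assuming the table above follows the rtr_id enumeration.
	 */
	static void example_decode_rtr_id(u32 rtr_id, u32 *dcore, u32 *index)
	{
		*dcore = rtr_id / NUM_OF_RTR_PER_DCORE;
		*index = rtr_id % NUM_OF_RTR_PER_DCORE;
	}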
 +
 +enum razwi_event_sources {
 +      RAZWI_TPC,
 +      RAZWI_MME,
 +      RAZWI_EDMA,
 +      RAZWI_PDMA,
 +      RAZWI_NIC,
 +      RAZWI_DEC,
 +      RAZWI_ROT
 +};
 +
 +struct hbm_mc_error_causes {
 +      u32 mask;
 +      char cause[50];
 +};
 +
 +static struct hl_special_block_info gaudi2_special_blocks[] = GAUDI2_SPECIAL_BLOCKS;
 +
 +/* The special blocks iterator is currently used to configure security protection bits
 + * and to read global errors. Most HW blocks are addressable; those that aren't (N/A)
 + * must be skipped. The following configurations are commonly used for both the PB
 + * config and the global error reading, since currently they both share the same
 + * settings. Once that changes, we must remember to use a separate configuration for
 + * each one.
 + */
 +static int gaudi2_iterator_skip_block_types[] = {
 +              GAUDI2_BLOCK_TYPE_PLL,
 +              GAUDI2_BLOCK_TYPE_EU_BIST,
 +              GAUDI2_BLOCK_TYPE_HBM,
 +              GAUDI2_BLOCK_TYPE_XFT
 +};
 +
 +static struct range gaudi2_iterator_skip_block_ranges[] = {
 +              /* Skip all PSOC blocks except for PSOC_GLOBAL_CONF */
 +              {mmPSOC_I2C_M0_BASE, mmPSOC_EFUSE_BASE},
 +              {mmPSOC_BTL_BASE, mmPSOC_MSTR_IF_RR_SHRD_HBW_BASE},
 +              /* Skip all CPU blocks except for CPU_IF */
 +              {mmCPU_CA53_CFG_BASE, mmCPU_CA53_CFG_BASE},
 +              {mmCPU_TIMESTAMP_BASE, mmCPU_MSTR_IF_RR_SHRD_HBW_BASE}
 +};
 +
 +static struct hbm_mc_error_causes hbm_mc_spi[GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE] = {
 +      {HBM_MC_SPI_TEMP_PIN_CHG_MASK, "temperature pins changed"},
 +      {HBM_MC_SPI_THR_ENG_MASK, "temperature-based throttling engaged"},
 +      {HBM_MC_SPI_THR_DIS_ENG_MASK, "temperature-based throttling disengaged"},
 +      {HBM_MC_SPI_IEEE1500_COMP_MASK, "IEEE1500 op comp"},
 +      {HBM_MC_SPI_IEEE1500_PAUSED_MASK, "IEEE1500 op paused"},
 +};
 +
 +static const char * const hbm_mc_sei_cause[GAUDI2_NUM_OF_HBM_SEI_CAUSE] = {
 +      [HBM_SEI_CMD_PARITY_EVEN] = "SEI C/A parity even",
 +      [HBM_SEI_CMD_PARITY_ODD] = "SEI C/A parity odd",
 +      [HBM_SEI_READ_ERR] = "SEI read data error",
 +      [HBM_SEI_WRITE_DATA_PARITY_ERR] = "SEI write data parity error",
 +      [HBM_SEI_CATTRIP] = "SEI CATTRIP asserted",
 +      [HBM_SEI_MEM_BIST_FAIL] = "SEI memory BIST fail",
 +      [HBM_SEI_DFI] = "SEI DFI error",
 +      [HBM_SEI_INV_TEMP_READ_OUT] = "SEI invalid temp read",
 +      [HBM_SEI_BIST_FAIL] = "SEI BIST fail"
 +};
 +
 +struct mmu_spi_sei_cause {
 +      char cause[50];
 +      int clear_bit;
 +};
 +
 +static const struct mmu_spi_sei_cause gaudi2_mmu_spi_sei[GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE] = {
 +      {"page fault", 1},              /* INTERRUPT_CLR[1] */
 +      {"page access", 1},             /* INTERRUPT_CLR[1] */
 +      {"bypass ddr", 2},              /* INTERRUPT_CLR[2] */
 +      {"multi hit", 2},               /* INTERRUPT_CLR[2] */
 +      {"mmu rei0", -1},               /* no clear register bit */
 +      {"mmu rei1", -1},               /* no clear register bit */
 +      {"stlb rei0", -1},              /* no clear register bit */
 +      {"stlb rei1", -1},              /* no clear register bit */
 +      {"rr privileged write hit", 2}, /* INTERRUPT_CLR[2] */
 +      {"rr privileged read hit", 2},  /* INTERRUPT_CLR[2] */
 +      {"rr secure write hit", 2},     /* INTERRUPT_CLR[2] */
 +      {"rr secure read hit", 2},      /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"bist_fail no use", 2},        /* INTERRUPT_CLR[2] */
 +      {"slave error", 16},            /* INTERRUPT_CLR[16] */
 +      {"dec error", 17},              /* INTERRUPT_CLR[17] */
 +      {"burst fifo full", 2}          /* INTERRUPT_CLR[2] */
 +};
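
Each entry above pairs a human-readable cause with the INTERRUPT_CLR bit that acknowledges it, with -1 marking causes that have no clear bit. A hedged sketch of how such a table could be consumed; the helper below is illustrative and not taken from the patch:

	/* Illustration only: accumulate the INTERRUPT_CLR bits for the causes that fired,
	 * skipping causes that have no clear bit (clear_bit == -1).
	 */
	static u32 example_mmu_build_clear_mask(u32 cause_mask)
	{
		u32 clear_mask = 0;
		int i;

		for (i = 0 ; i < GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE ; i++) {
			if (!(cause_mask & BIT(i)))
				continue;

			if (gaudi2_mmu_spi_sei[i].clear_bit >= 0)
				clear_mask |= BIT(gaudi2_mmu_spi_sei[i].clear_bit);
		}

		return clear_mask;
	}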
 +
 +struct gaudi2_cache_invld_params {
 +      u64 start_va;
 +      u64 end_va;
 +      u32 inv_start_val;
 +      u32 flags;
 +      bool range_invalidation;
 +};
 +
 +struct gaudi2_tpc_idle_data {
 +      struct engines_data *e;
 +      unsigned long *mask;
 +      bool *is_idle;
 +      const char *tpc_fmt;
 +};
 +
 +struct gaudi2_tpc_mmu_data {
 +      u32 rw_asid;
 +};
 +
 +static s64 gaudi2_state_dump_specs_props[SP_MAX] = {0};
 +
 +static int gaudi2_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size, u64 val);
 +static bool gaudi2_is_queue_enabled(struct hl_device *hdev, u32 hw_queue_id);
 +static bool gaudi2_is_arc_enabled(struct hl_device *hdev, u64 arc_id);
 +static void gaudi2_clr_arc_id_cap(struct hl_device *hdev, u64 arc_id);
 +static void gaudi2_set_arc_id_cap(struct hl_device *hdev, u64 arc_id);
 +static void gaudi2_memset_device_lbw(struct hl_device *hdev, u32 addr, u32 size, u32 val);
 +static int gaudi2_send_job_to_kdma(struct hl_device *hdev, u64 src_addr, u64 dst_addr, u32 size,
 +                                                                              bool is_memset);
 +static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr);
 +
 +static void gaudi2_init_scrambler_hbm(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static u32 gaudi2_get_signal_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short);
 +}
 +
 +static u32 gaudi2_get_wait_cb_size(struct hl_device *hdev)
 +{
 +      return sizeof(struct packet_msg_short) * 4 + sizeof(struct packet_fence);
 +}
 +
 +void gaudi2_iterate_tpcs(struct hl_device *hdev, struct iterate_module_ctx *ctx)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int dcore, inst, tpc_seq;
 +      u32 offset;
 +
 +      /* init the return code */
 +      ctx->rc = 0;
 +
 +      for (dcore = 0; dcore < NUM_OF_DCORES; dcore++) {
 +              for (inst = 0; inst < NUM_OF_TPC_PER_DCORE; inst++) {
 +                      tpc_seq = dcore * NUM_OF_TPC_PER_DCORE + inst;
 +
 +                      if (!(prop->tpc_enabled_mask & BIT(tpc_seq)))
 +                              continue;
 +
 +                      offset = (DCORE_OFFSET * dcore) + (DCORE_TPC_OFFSET * inst);
 +
 +                      ctx->fn(hdev, dcore, inst, offset, ctx);
 +                      if (ctx->rc) {
 +                              dev_err(hdev->dev, "TPC iterator failed for DCORE%d TPC%d\n",
 +                                                      dcore, inst);
 +                              return;
 +                      }
 +              }
 +      }
 +
 +      if (!(prop->tpc_enabled_mask & BIT(TPC_ID_DCORE0_TPC6)))
 +              return;
 +
 +      /* special check for PCI TPC (DCORE0_TPC6) */
 +      offset = DCORE_TPC_OFFSET * (NUM_DCORE0_TPC - 1);
 +      ctx->fn(hdev, 0, NUM_DCORE0_TPC - 1, offset, ctx);
 +      if (ctx->rc)
 +              dev_err(hdev->dev, "TPC iterator failed for DCORE0 TPC6\n");
 +}
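
gaudi2_iterate_tpcs() visits every enabled TPC, including the PCI TPC (DCORE0_TPC6), and calls ctx->fn with the DCORE index, the instance within the DCORE and the register offset of that instance. A minimal usage sketch, assuming the fn member of struct iterate_module_ctx matches the call made by the iterator above; the callback and wrapper names are hypothetical:

	/* Illustration only: log each enabled TPC visited by the iterator. */
	static void example_tpc_cb(struct hl_device *hdev, int dcore, int inst, u32 offset,
					struct iterate_module_ctx *ctx)
	{
		dev_dbg(hdev->dev, "TPC: DCORE%d instance %d, register offset 0x%x\n",
				dcore, inst, offset);
	}

	static void example_iterate_tpcs(struct hl_device *hdev)
	{
		struct iterate_module_ctx ctx = { .fn = example_tpc_cb };

		gaudi2_iterate_tpcs(hdev, &ctx);
	}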
 +
 +static bool gaudi2_host_phys_addr_valid(u64 addr)
 +{
 +      if ((addr < HOST_PHYS_BASE_0 + HOST_PHYS_SIZE_0) || (addr >= HOST_PHYS_BASE_1))
 +              return true;
 +
 +      return false;
 +}
 +
 +static int set_number_of_functional_hbms(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 faulty_hbms = hweight64(hdev->dram_binning);
 +
 +      /* check if all HBMs should be used */
 +      if (!faulty_hbms) {
 +              dev_dbg(hdev->dev, "All HBMs are in use (no binning)\n");
 +              prop->num_functional_hbms = GAUDI2_HBM_NUM;
 +              return 0;
 +      }
 +
 +      /*
 +       * Check for the error condition in which the number of binning
 +       * candidates is higher than the maximum supported by the driver
 +       * (in which case the binning mask shall be ignored and the driver
 +       * will set the default).
 +       */
 +      if (faulty_hbms > MAX_FAULTY_HBMS) {
 +              dev_err(hdev->dev,
 +                      "HBM binning supports max of %d faulty HBMs, supplied mask 0x%llx.\n",
 +                      MAX_FAULTY_HBMS, hdev->dram_binning);
 +              return -EINVAL;
 +      }
 +
 +      /*
 +       * otherwise, the number of functional HBMs is the total number of
 +       * HBMs minus the faulty ones reported in the binning mask.
 +       */
 +      prop->num_functional_hbms = GAUDI2_HBM_NUM - faulty_hbms;
 +      return 0;
 +}
 +
 +static int gaudi2_set_dram_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 basic_hbm_page_size;
 +      int rc;
 +
 +      rc = set_number_of_functional_hbms(hdev);
 +      if (rc)
 +              return -EINVAL;
 +
 +      /*
 +       * Due to a HW bug in which the TLB is x16 smaller than expected, we work around it
 +       * by using a x16 bigger page size, so that the entire HBM mapping can still be
 +       * populated in the TLB.
 +       */
 +      basic_hbm_page_size = prop->num_functional_hbms * SZ_8M;
 +      prop->dram_page_size = GAUDI2_COMPENSATE_TLB_PAGE_SIZE_FACTOR * basic_hbm_page_size;
 +      prop->device_mem_alloc_default_page_size = prop->dram_page_size;
 +      prop->dram_size = prop->num_functional_hbms * SZ_16G;
 +      prop->dram_base_address = DRAM_PHYS_BASE;
 +      prop->dram_end_address = prop->dram_base_address + prop->dram_size;
 +      prop->dram_supports_virtual_memory = true;
 +
 +      prop->dram_user_base_address = DRAM_PHYS_BASE + prop->dram_page_size;
 +      prop->dram_hints_align_mask = ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK;
 +      prop->hints_dram_reserved_va_range.start_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HBM_START;
 +      prop->hints_dram_reserved_va_range.end_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HBM_END;
 +
 +      /* Since the DRAM page size differs from the DMMU page size, we need to allocate
 +       * DRAM memory in units of dram_page_size and map this memory in units of the
 +       * DMMU page size. We overcome this size mismatch using a scrambling routine
 +       * which takes a DRAM page and converts it to a DMMU page.
 +       * We therefore:
 +       * 1. partition the virtual address space into DRAM-page (whole) pages.
 +       *    (suppose we get n such pages)
 +       * 2. limit the amount of virtual address space we got from 1 above to
 +       *    a multiple of 64M, as we don't want the scrambled address to cross
 +       *    the DRAM virtual address space.
 +       *    (m = (n * DRAM_page_size) / DMMU_page_size)
 +       * 3. determine the end address accordingly:
 +       *    end_addr = start_addr + m * 48M
 +       *
 +       *    the DRAM address MSBs (63:48) are not part of the roundup calculation
 +       */
 +      prop->dmmu.start_addr = prop->dram_base_address +
 +                      (prop->dram_page_size *
 +                              DIV_ROUND_UP_SECTOR_T(prop->dram_size, prop->dram_page_size));
 +
 +      prop->dmmu.end_addr = prop->dmmu.start_addr + prop->dram_page_size *
 +                      div_u64((VA_HBM_SPACE_END - prop->dmmu.start_addr), prop->dmmu.page_size);
 +
 +      return 0;
 +}
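
As a worked example of the page-size compensation above, assuming GAUDI2_HBM_NUM is six and the compensation factor is the x16 mentioned in the comment (neither value is visible in this hunk), the no-binning case works out to:

	basic_hbm_page_size = 6 * 8 MB   = 48 MB
	dram_page_size      = 16 * 48 MB = 768 MB
	dram_size           = 6 * 16 GB  = 96 GB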
 +
 +static int gaudi2_set_fixed_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hw_queue_properties *q_props;
 +      u32 num_sync_stream_queues = 0;
 +      int i;
 +
 +      prop->max_queues = GAUDI2_QUEUE_ID_SIZE;
 +      prop->hw_queues_props = kcalloc(prop->max_queues, sizeof(struct hw_queue_properties),
 +                                      GFP_KERNEL);
 +
 +      if (!prop->hw_queues_props)
 +              return -ENOMEM;
 +
 +      q_props = prop->hw_queues_props;
 +
 +      for (i = 0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i++) {
 +              q_props[i].type = QUEUE_TYPE_HW;
 +              q_props[i].driver_only = 0;
 +
 +              if (i >= GAUDI2_QUEUE_ID_NIC_0_0 && i <= GAUDI2_QUEUE_ID_NIC_23_3) {
 +                      q_props[i].supports_sync_stream = 0;
 +              } else {
 +                      q_props[i].supports_sync_stream = 1;
 +                      num_sync_stream_queues++;
 +              }
 +
 +              q_props[i].cb_alloc_flags = CB_ALLOC_USER;
 +      }
 +
 +      q_props[GAUDI2_QUEUE_ID_CPU_PQ].type = QUEUE_TYPE_CPU;
 +      q_props[GAUDI2_QUEUE_ID_CPU_PQ].driver_only = 1;
 +      q_props[GAUDI2_QUEUE_ID_CPU_PQ].cb_alloc_flags = CB_ALLOC_KERNEL;
 +
 +      prop->cache_line_size = DEVICE_CACHE_LINE_SIZE;
 +      prop->cfg_base_address = CFG_BASE;
 +      prop->device_dma_offset_for_host_access = HOST_PHYS_BASE_0;
 +      prop->host_base_address = HOST_PHYS_BASE_0;
 +      prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE_0;
 +      prop->max_pending_cs = GAUDI2_MAX_PENDING_CS;
 +      prop->completion_queues_count = GAUDI2_RESERVED_CQ_NUMBER;
 +      prop->user_dec_intr_count = NUMBER_OF_DEC;
 +      prop->user_interrupt_count = GAUDI2_IRQ_NUM_USER_LAST - GAUDI2_IRQ_NUM_USER_FIRST + 1;
 +      prop->completion_mode = HL_COMPLETION_MODE_CS;
 +      prop->sync_stream_first_sob = GAUDI2_RESERVED_SOB_NUMBER;
 +      prop->sync_stream_first_mon = GAUDI2_RESERVED_MON_NUMBER;
 +
 +      prop->sram_base_address = SRAM_BASE_ADDR;
 +      prop->sram_size = SRAM_SIZE;
 +      prop->sram_end_address = prop->sram_base_address + prop->sram_size;
 +      prop->sram_user_base_address = prop->sram_base_address + SRAM_USER_BASE_OFFSET;
 +
 +      prop->hints_range_reservation = true;
 +
 +      if (hdev->pldm)
 +              prop->mmu_pgt_size = 0x800000; /* 8MB */
 +      else
 +              prop->mmu_pgt_size = MMU_PAGE_TABLES_INITIAL_SIZE;
 +
 +      prop->mmu_pte_size = HL_PTE_SIZE;
 +      prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
 +      prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
 +
 +      prop->dmmu.hop_shifts[MMU_HOP0] = DHOP0_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP1] = DHOP1_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP2] = DHOP2_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP3] = DHOP3_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP4] = DHOP4_SHIFT;
 +      prop->dmmu.hop_masks[MMU_HOP0] = DHOP0_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP1] = DHOP1_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP2] = DHOP2_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP3] = DHOP3_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP4] = DHOP4_MASK;
 +      prop->dmmu.page_size = PAGE_SIZE_1GB;
 +      prop->dmmu.num_hops = MMU_ARCH_6_HOPS;
 +      prop->dmmu.last_mask = LAST_MASK;
 +      prop->dmmu.host_resident = 1;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->dmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->dmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /*
 +       * This is done in order to be able to validate the FW descriptor (i.e. to validate
 +       * that the addresses and the allocated space for the FW image do not cross memory
 +       * bounds). For this reason we set the DRAM size to the minimum possible, and later
 +       * it will be modified according to what is reported in the cpucp info packet.
 +       */
 +      prop->dram_size = (GAUDI2_HBM_NUM - 1) * SZ_16G;
 +
 +      hdev->pmmu_huge_range = true;
 +      prop->pmmu.host_resident = 1;
 +      prop->pmmu.num_hops = MMU_ARCH_6_HOPS;
 +      prop->pmmu.last_mask = LAST_MASK;
 +      /* TODO: will be duplicated until implementing per-MMU props */
 +      prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      prop->hints_host_reserved_va_range.start_addr = RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START;
 +      prop->hints_host_reserved_va_range.end_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HOST_END;
 +      prop->hints_host_hpage_reserved_va_range.start_addr =
 +                      RESERVED_VA_RANGE_FOR_ARC_ON_HOST_HPAGE_START;
 +      prop->hints_host_hpage_reserved_va_range.end_addr =
 +                      RESERVED_VA_RANGE_FOR_ARC_ON_HOST_HPAGE_END;
 +
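 +      /* The PMMU page-table geometry follows the host kernel page size: with 64KB pages
 +       * the host mappings use 64KB pages and 16MB huge pages, otherwise 4KB pages and
 +       * 2MB huge pages. The HPMMU reuses the PMMU shifts and masks in both cases.
 +       */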
 +      if (PAGE_SIZE == SZ_64K) {
 +              prop->pmmu.hop_shifts[MMU_HOP0] = HOP0_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP1] = HOP1_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP2] = HOP2_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP3] = HOP3_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP4] = HOP4_SHIFT_64K;
 +              prop->pmmu.hop_shifts[MMU_HOP5] = HOP5_SHIFT_64K;
 +              prop->pmmu.hop_masks[MMU_HOP0] = HOP0_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP1] = HOP1_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP2] = HOP2_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP3] = HOP3_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP4] = HOP4_MASK_64K;
 +              prop->pmmu.hop_masks[MMU_HOP5] = HOP5_MASK_64K;
 +              prop->pmmu.start_addr = VA_HOST_SPACE_PAGE_START;
 +              prop->pmmu.end_addr = VA_HOST_SPACE_PAGE_END;
 +              prop->pmmu.page_size = PAGE_SIZE_64KB;
 +
 +              /* shifts and masks are the same in PMMU and HPMMU */
 +              memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +              prop->pmmu_huge.page_size = PAGE_SIZE_16MB;
 +              prop->pmmu_huge.start_addr = VA_HOST_SPACE_HPAGE_START;
 +              prop->pmmu_huge.end_addr = VA_HOST_SPACE_HPAGE_END;
 +      } else {
 +              prop->pmmu.hop_shifts[MMU_HOP0] = HOP0_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP1] = HOP1_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP2] = HOP2_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP3] = HOP3_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP4] = HOP4_SHIFT_4K;
 +              prop->pmmu.hop_shifts[MMU_HOP5] = HOP5_SHIFT_4K;
 +              prop->pmmu.hop_masks[MMU_HOP0] = HOP0_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP1] = HOP1_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP2] = HOP2_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP3] = HOP3_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP4] = HOP4_MASK_4K;
 +              prop->pmmu.hop_masks[MMU_HOP5] = HOP5_MASK_4K;
 +              prop->pmmu.start_addr = VA_HOST_SPACE_PAGE_START;
 +              prop->pmmu.end_addr = VA_HOST_SPACE_PAGE_END;
 +              prop->pmmu.page_size = PAGE_SIZE_4KB;
 +
 +              /* shifts and masks are the same in PMMU and HPMMU */
 +              memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +              prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
 +              prop->pmmu_huge.start_addr = VA_HOST_SPACE_HPAGE_START;
 +              prop->pmmu_huge.end_addr = VA_HOST_SPACE_HPAGE_END;
 +      }
 +
 +      prop->num_engine_cores = CPU_ID_MAX;
 +      prop->cfg_size = CFG_SIZE;
 +      prop->max_asid = MAX_ASID;
 +      prop->num_of_events = GAUDI2_EVENT_SIZE;
 +
 +      prop->dc_power_default = DC_POWER_DEFAULT;
 +
 +      prop->cb_pool_cb_cnt = GAUDI2_CB_POOL_CB_CNT;
 +      prop->cb_pool_cb_size = GAUDI2_CB_POOL_CB_SIZE;
 +      prop->pcie_dbi_base_address = CFG_BASE + mmPCIE_DBI_BASE;
 +      prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
 +
 +      strncpy(prop->cpucp_info.card_name, GAUDI2_DEFAULT_CARD_NAME, CARD_NAME_MAX_LEN);
 +
 +      prop->mme_master_slave_mode = 1;
 +
 +      prop->first_available_user_sob[0] = GAUDI2_RESERVED_SOB_NUMBER +
 +                                      (num_sync_stream_queues * HL_RSVD_SOBS);
 +
 +      prop->first_available_user_mon[0] = GAUDI2_RESERVED_MON_NUMBER +
 +                                      (num_sync_stream_queues * HL_RSVD_MONS);
 +
 +      prop->first_available_user_interrupt = GAUDI2_IRQ_NUM_USER_FIRST;
 +
 +      prop->first_available_cq[0] = GAUDI2_RESERVED_CQ_NUMBER;
 +
 +      prop->fw_cpu_boot_dev_sts0_valid = false;
 +      prop->fw_cpu_boot_dev_sts1_valid = false;
 +      prop->hard_reset_done_by_fw = false;
 +      prop->gic_interrupts_enable = true;
 +
 +      prop->server_type = HL_SERVER_TYPE_UNKNOWN;
 +
 +      prop->max_dec = NUMBER_OF_DEC;
 +
 +      prop->clk_pll_index = HL_GAUDI2_MME_PLL;
 +
 +      prop->dma_mask = 64;
 +
 +      prop->hbw_flush_reg = mmPCIE_WRAP_SPECIAL_GLBL_SPARE_0;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_pci_bars_map(struct hl_device *hdev)
 +{
 +      static const char * const name[] = {"CFG_SRAM", "MSIX", "DRAM"};
 +      bool is_wc[3] = {false, false, true};
 +      int rc;
 +
 +      rc = hl_pci_bars_map(hdev, name, is_wc);
 +      if (rc)
 +              return rc;
 +
 +      hdev->rmmio = hdev->pcie_bar[SRAM_CFG_BAR_ID] + (CFG_BASE - STM_FLASH_BASE_ADDR);
 +
 +      return 0;
 +}
 +
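 +/* Move the DRAM BAR so it points to the given device address. Returns the previous BAR
 + * base address on success, or U64_MAX if the iATU is owned by the FW or the remap failed.
 + */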
 +static u64 gaudi2_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct hl_inbound_pci_region pci_region;
 +      u64 old_addr = addr;
 +      int rc;
 +
 +      if ((gaudi2) && (gaudi2->dram_bar_cur_addr == addr))
 +              return old_addr;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return U64_MAX;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to DRAM */
 +      pci_region.mode = PCI_BAR_MATCH_MODE;
 +      pci_region.bar = DRAM_BAR_ID;
 +      pci_region.addr = addr;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
 +      if (rc)
 +              return U64_MAX;
 +
 +      if (gaudi2) {
 +              old_addr = gaudi2->dram_bar_cur_addr;
 +              gaudi2->dram_bar_cur_addr = addr;
 +      }
 +
 +      return old_addr;
 +}
 +
 +static int gaudi2_init_iatu(struct hl_device *hdev)
 +{
 +      struct hl_inbound_pci_region inbound_region;
 +      struct hl_outbound_pci_region outbound_region;
 +      u32 bar_addr_low, bar_addr_high;
 +      int rc;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return 0;
 +
 +      /* Temporary inbound Region 0 - Bar 0 - Point to CFG
 +       * We must map this region in BAR match mode in order to
 +       * fetch BAR physical base address
 +       */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      /* Base address must be aligned to Bar size which is 256 MB */
 +      inbound_region.addr = STM_FLASH_BASE_ADDR - STM_FLASH_ALIGNED_OFF;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Fetch physical BAR address */
 +      bar_addr_high = RREG32(mmPCIE_DBI_BAR1_REG + STM_FLASH_ALIGNED_OFF);
 +      bar_addr_low = RREG32(mmPCIE_DBI_BAR0_REG + STM_FLASH_ALIGNED_OFF) & ~0xF;
 +
 +      hdev->pcie_bar_phys[SRAM_CFG_BAR_ID] = (u64)bar_addr_high << 32 | bar_addr_low;
 +
 +      /* Inbound Region 0 - Bar 0 - Point to CFG */
 +      inbound_region.mode = PCI_ADDRESS_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      inbound_region.offset_in_bar = 0;
 +      inbound_region.addr = STM_FLASH_BASE_ADDR;
 +      inbound_region.size = CFG_REGION_SIZE;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Inbound Region 1 - Bar 0 - Point to BAR0_RESERVED + SRAM */
 +      inbound_region.mode = PCI_ADDRESS_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      inbound_region.offset_in_bar = CFG_REGION_SIZE;
 +      inbound_region.addr = BAR0_RSRVD_BASE_ADDR;
 +      inbound_region.size = BAR0_RSRVD_SIZE + SRAM_SIZE;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Inbound Region 2 - Bar 4 - Point to DRAM */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = DRAM_BAR_ID;
 +      inbound_region.addr = DRAM_PHYS_BASE;
 +      rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
 +      if (rc)
 +              return rc;
 +
 +      /* Outbound Region 0 - Point to Host */
 +      outbound_region.addr = HOST_PHYS_BASE_0;
 +      outbound_region.size = HOST_PHYS_SIZE_0;
 +      rc = hl_pci_set_outbound_region(hdev, &outbound_region);
 +
 +      return rc;
 +}
 +
 +static enum hl_device_hw_state gaudi2_get_hw_state(struct hl_device *hdev)
 +{
 +      return RREG32(mmHW_STATE);
 +}
 +
 +static int gaudi2_tpc_binning_init_prop(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (hweight64(hdev->tpc_binning) > MAX_CLUSTER_BINNING_FAULTY_TPCS) {
 +              dev_err(hdev->dev, "TPC binning is supported for max of %d faulty TPCs, provided mask 0x%llx\n",
 +                                      MAX_CLUSTER_BINNING_FAULTY_TPCS,
 +                                      hdev->tpc_binning);
 +              return -EINVAL;
 +      }
 +
 +      prop->tpc_binning_mask = hdev->tpc_binning;
 +      prop->tpc_enabled_mask = GAUDI2_TPC_FULL_MASK;
 +
 +      return 0;
 +}
 +
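 +/* For every binned (faulty) TPC, a designated substitute TPC (DCORE0_TPC6 first, then
 + * DCORE3_TPC5) is removed from the enabled mask and its queues are marked as binned.
 + */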
 +static int gaudi2_set_tpc_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hw_queue_properties *q_props = prop->hw_queues_props;
 +      u64 tpc_binning_mask;
 +      u8 subst_idx = 0;
 +      int i, rc;
 +
 +      rc = gaudi2_tpc_binning_init_prop(hdev);
 +      if (rc)
 +              return rc;
 +
 +      tpc_binning_mask = prop->tpc_binning_mask;
 +
 +      for (i = 0 ; i < MAX_FAULTY_TPCS ; i++) {
 +              u8 subst_seq, binned, qid_base;
 +
 +              if (tpc_binning_mask == 0)
 +                      break;
 +
 +              if (subst_idx == 0) {
 +                      subst_seq = TPC_ID_DCORE0_TPC6;
 +                      qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0;
 +              } else {
 +                      subst_seq = TPC_ID_DCORE3_TPC5;
 +                      qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_5_0;
 +              }
 +
 +              /* clear bit from mask */
 +              binned = __ffs(tpc_binning_mask);
 +              /*
 +               * Coverity complains about possible out-of-bound access in
 +               * clear_bit
 +               */
 +              if (binned >= TPC_ID_SIZE) {
 +                      dev_err(hdev->dev,
 +                              "Invalid binned TPC (binning mask: %llx)\n",
 +                              tpc_binning_mask);
 +                      return -EINVAL;
 +              }
 +              clear_bit(binned, (unsigned long *)&tpc_binning_mask);
 +
 +              /* also clear the replacing (substitute) TPC's bit from the enabled mask */
 +              clear_bit(subst_seq, (unsigned long *)&prop->tpc_enabled_mask);
 +
 +              /* bin the substitute TPC's queues */
 +              q_props[qid_base].binned = 1;
 +              q_props[qid_base + 1].binned = 1;
 +              q_props[qid_base + 2].binned = 1;
 +              q_props[qid_base + 3].binned = 1;
 +
 +              subst_idx++;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_dec_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 num_faulty;
 +
 +      num_faulty = hweight32(hdev->decoder_binning);
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (num_faulty > MAX_FAULTY_DECODERS) {
 +              dev_err(hdev->dev, "decoder binning is supported for max of single faulty decoder, provided mask 0x%x\n",
 +                                              hdev->decoder_binning);
 +              return -EINVAL;
 +      }
 +
 +      prop->decoder_binning_mask = (hdev->decoder_binning & GAUDI2_DECODER_FULL_MASK);
 +
 +      if (prop->decoder_binning_mask)
 +              prop->decoder_enabled_mask = (GAUDI2_DECODER_FULL_MASK & ~BIT(DEC_ID_PCIE_VDEC1));
 +      else
 +              prop->decoder_enabled_mask = GAUDI2_DECODER_FULL_MASK;
 +
 +      return 0;
 +}
 +
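 +/* A non-zero DRAM binning mask records the faulty cluster in faulty_dram_cluster_map and
 + * removes HBM_ID5 from the enabled mask.
 + */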
 +static void gaudi2_set_dram_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /* check if we should override default binning */
 +      if (!hdev->dram_binning) {
 +              prop->dram_binning_mask = 0;
 +              prop->dram_enabled_mask = GAUDI2_DRAM_FULL_MASK;
 +              return;
 +      }
 +
 +      /* set DRAM binning constraints */
 +      prop->faulty_dram_cluster_map |= hdev->dram_binning;
 +      prop->dram_binning_mask = hdev->dram_binning;
 +      prop->dram_enabled_mask = GAUDI2_DRAM_FULL_MASK & ~BIT(HBM_ID5);
 +}
 +
 +static int gaudi2_set_edma_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hw_queue_properties *q_props;
 +      u8 seq, num_faulty;
 +
 +      num_faulty = hweight32(hdev->edma_binning);
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (num_faulty > MAX_FAULTY_EDMAS) {
 +              dev_err(hdev->dev,
 +                      "EDMA binning is supported for max of single faulty EDMA, provided mask 0x%x\n",
 +                      hdev->edma_binning);
 +              return -EINVAL;
 +      }
 +
 +      if (!hdev->edma_binning) {
 +              prop->edma_binning_mask = 0;
 +              prop->edma_enabled_mask = GAUDI2_EDMA_FULL_MASK;
 +              return 0;
 +      }
 +
 +      seq = __ffs((unsigned long)hdev->edma_binning);
 +
 +      /* set binning constraints */
 +      prop->faulty_dram_cluster_map |= BIT(edma_to_hbm_cluster[seq]);
 +      prop->edma_binning_mask = hdev->edma_binning;
 +      prop->edma_enabled_mask = GAUDI2_EDMA_FULL_MASK & ~BIT(EDMA_ID_DCORE3_INSTANCE1);
 +
 +      /* bin substitute EDMA's queue */
 +      q_props = prop->hw_queues_props;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0].binned = 1;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1].binned = 1;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2].binned = 1;
 +      q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3].binned = 1;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_xbar_edge_enable_mask(struct hl_device *hdev, u32 xbar_edge_iso_mask)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 num_faulty, seq;
 +
 +      /* check if we should override default binning */
 +      if (!xbar_edge_iso_mask) {
 +              prop->xbar_edge_enabled_mask = GAUDI2_XBAR_EDGE_FULL_MASK;
 +              return 0;
 +      }
 +
 +      /*
 +       * Note that it can be set to a value other than 0 only after the cpucp packet (i.e.
 +       * only the FW can set a redundancy value). For the user it will always be 0.
 +       */
 +      num_faulty = hweight32(xbar_edge_iso_mask);
 +
 +      /*
 +       * check for error condition in which number of binning candidates
 +       * is higher than the maximum supported by the driver
 +       */
 +      if (num_faulty > MAX_FAULTY_XBARS) {
 +              dev_err(hdev->dev, "we cannot have more than %d faulty XBAR EDGE\n",
 +                                                                      MAX_FAULTY_XBARS);
 +              return -EINVAL;
 +      }
 +
 +      seq = __ffs((unsigned long)xbar_edge_iso_mask);
 +
 +      /* set binning constraints */
 +      prop->faulty_dram_cluster_map |= BIT(xbar_edge_to_hbm_cluster[seq]);
 +      prop->xbar_edge_enabled_mask = (~xbar_edge_iso_mask) & GAUDI2_XBAR_EDGE_FULL_MASK;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_cluster_binning_masks_common(struct hl_device *hdev, u8 xbar_edge_iso_mask)
 +{
 +      int rc;
 +
 +      /*
 +       * Mark all clusters as good; each component will "fail" a cluster
 +       * based on eFuse/user values.
 +       * If more than a single cluster is faulty, the chip is unusable.
 +       */
 +      hdev->asic_prop.faulty_dram_cluster_map = 0;
 +
 +      gaudi2_set_dram_binning_masks(hdev);
 +
 +      rc = gaudi2_set_edma_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_set_xbar_edge_enable_mask(hdev, xbar_edge_iso_mask);
 +      if (rc)
 +              return rc;
 +
 +      /* always initially set to full mask */
 +      hdev->asic_prop.hmmu_hif_enabled_mask = GAUDI2_HIF_HMMU_FULL_MASK;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_cluster_binning_masks(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      rc = gaudi2_set_cluster_binning_masks_common(hdev, prop->cpucp_info.xbar_binning_mask);
 +      if (rc)
 +              return rc;
 +
 +      /* if DRAM binning is reported by the FW, we should perform cluster configuration */
 +      if (prop->faulty_dram_cluster_map) {
 +              u8 cluster_seq = __ffs((unsigned long)prop->faulty_dram_cluster_map);
 +
 +              prop->hmmu_hif_enabled_mask = cluster_hmmu_hif_enabled_mask[cluster_seq];
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_set_binning_masks(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = gaudi2_set_cluster_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_set_tpc_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_set_dec_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_cpucp_info_get(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      long max_power;
 +      u64 dram_size;
 +      int rc;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      /* No point in asking for this information again when not doing a hard reset, as the
 +       * device CPU hasn't been reset
 +       */
 +      if (hdev->reset_info.in_compute_reset)
 +              return 0;
 +
 +      rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0, mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
 +                                                                              mmCPU_BOOT_ERR1);
 +      if (rc)
 +              return rc;
 +
 +      dram_size = le64_to_cpu(prop->cpucp_info.dram_size);
 +      if (dram_size) {
 +              /* we can have either 5 or 6 HBMs. other values are invalid */
 +
 +              if ((dram_size != ((GAUDI2_HBM_NUM - 1) * SZ_16G)) &&
 +                                      (dram_size != (GAUDI2_HBM_NUM * SZ_16G))) {
 +                      dev_err(hdev->dev,
 +                              "F/W reported invalid DRAM size %llu. Trying to use default size %llu\n",
 +                              dram_size, prop->dram_size);
 +                      dram_size = prop->dram_size;
 +              }
 +
 +              prop->dram_size = dram_size;
 +              prop->dram_end_address = prop->dram_base_address + dram_size;
 +      }
 +
 +      if (!strlen(prop->cpucp_info.card_name))
 +              strncpy(prop->cpucp_info.card_name, GAUDI2_DEFAULT_CARD_NAME, CARD_NAME_MAX_LEN);
 +
 +      /* Overwrite binning masks with the actual binning values from F/W */
 +      hdev->dram_binning = prop->cpucp_info.dram_binning_mask;
 +      hdev->edma_binning = prop->cpucp_info.edma_binning_mask;
 +      hdev->tpc_binning = le64_to_cpu(prop->cpucp_info.tpc_binning_mask);
 +      hdev->decoder_binning = lower_32_bits(le64_to_cpu(prop->cpucp_info.decoder_binning_mask));
 +
 +      /*
 +       * At this point the DRAM parameters need to be updated according to the data obtained
 +       * from the FW
 +       */
 +      rc = hdev->asic_funcs->set_dram_properties(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = hdev->asic_funcs->set_binning_masks(hdev);
 +      if (rc)
 +              return rc;
 +
 +      max_power = hl_fw_get_max_power(hdev);
 +      if (max_power < 0)
 +              return max_power;
 +
 +      prop->max_power_default = (u64) max_power;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_fetch_psoc_frequency(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS];
 +      int rc;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI2_CPU_PLL, pll_freq_arr);
 +      if (rc)
 +              return rc;
 +
 +      hdev->asic_prop.psoc_timestamp_frequency = pll_freq_arr[3];
 +
 +      return 0;
 +}
 +
 +static int gaudi2_early_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_dev *pdev = hdev->pdev;
 +      resource_size_t pci_bar_size;
 +      int rc;
 +
 +      rc = gaudi2_set_fixed_properties(hdev);
 +      if (rc)
 +              return rc;
 +
 +      /* Check BAR sizes */
 +      pci_bar_size = pci_resource_len(pdev, SRAM_CFG_BAR_ID);
 +
 +      if (pci_bar_size != CFG_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      SRAM_CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      pci_bar_size = pci_resource_len(pdev, MSIX_BAR_ID);
 +      if (pci_bar_size != MSIX_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      MSIX_BAR_ID, &pci_bar_size, MSIX_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      prop->dram_pci_bar_size = pci_resource_len(pdev, DRAM_BAR_ID);
 +      hdev->dram_pci_bar_start = pci_resource_start(pdev, DRAM_BAR_ID);
 +
 +      /*
 +       * The iATU is configured by the driver only in pldm; otherwise it is done by the FW
 +       */
 +      if (hdev->pldm)
 +              hdev->asic_prop.iatu_done_by_fw = false;
 +      else
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +      rc = hl_pci_init(hdev);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Before continuing with the initialization, we need to read the preboot version
 +       * to determine whether we run with a security-enabled firmware
 +       */
 +      rc = hl_fw_read_preboot_status(hdev);
 +      if (rc) {
 +              if (hdev->reset_on_preboot_fail)
 +                      hdev->asic_funcs->hw_fini(hdev, true, false);
 +              goto pci_fini;
 +      }
 +
 +      if (gaudi2_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
 +              dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
 +              hdev->asic_funcs->hw_fini(hdev, true, false);
 +      }
 +
 +      return 0;
 +
 +pci_fini:
 +      hl_pci_fini(hdev);
 +free_queue_props:
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      return rc;
 +}
 +
 +static int gaudi2_early_fini(struct hl_device *hdev)
 +{
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      hl_pci_fini(hdev);
 +
 +      return 0;
 +}
 +
 +static bool gaudi2_is_arc_nic_owned(u64 arc_id)
 +{
 +      switch (arc_id) {
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static bool gaudi2_is_arc_tpc_owned(u64 arc_id)
 +{
 +      switch (arc_id) {
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static void gaudi2_init_arcs(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 arc_id;
 +      u32 i;
 +
 +      for (i = CPU_ID_SCHED_ARC0 ; i <= CPU_ID_SCHED_ARC3 ; i++) {
 +              if (gaudi2_is_arc_enabled(hdev, i))
 +                      continue;
 +
 +              gaudi2_set_arc_id_cap(hdev, i);
 +      }
 +
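 +      /* Walk the H/W queues in steps of 4 (one entry per QMAN) and mark the owning ARC of
 +       * each enabled QMAN, skipping NIC/TPC ARCs whose engine is masked out.
 +       */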
 +      for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i += 4) {
 +              if (!gaudi2_is_queue_enabled(hdev, i))
 +                      continue;
 +
 +              arc_id = gaudi2_queue_id_to_arc_id[i];
 +              if (gaudi2_is_arc_enabled(hdev, arc_id))
 +                      continue;
 +
 +              if (gaudi2_is_arc_nic_owned(arc_id) &&
 +                              !(hdev->nic_ports_mask & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0)))
 +                      continue;
 +
 +              if (gaudi2_is_arc_tpc_owned(arc_id) && !(gaudi2->tpc_hw_cap_initialized &
 +                                                      BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0)))
 +                      continue;
 +
 +              gaudi2_set_arc_id_cap(hdev, arc_id);
 +      }
 +}
 +
 +static int gaudi2_scrub_arc_dccm(struct hl_device *hdev, u32 cpu_id)
 +{
 +      u32 reg_base, reg_val;
 +      int rc;
 +
 +      switch (cpu_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC3:
 +              /* Each ARC scheduler has 2 consecutive DCCM blocks */
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE * 2, true);
 +              if (rc)
 +                      return rc;
 +              break;
 +      case CPU_ID_SCHED_ARC4:
 +      case CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0:
 +      case CPU_ID_MME_QMAN_ARC1:
 +              reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +
 +              /* Scrub lower DCCM block */
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE, true);
 +              if (rc)
 +                      return rc;
 +
 +              /* Switch to upper DCCM block */
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_MME_ARC_UPPER_DCCM_EN_VAL_MASK, 1);
 +              WREG32(reg_base + ARC_DCCM_UPPER_EN_OFFSET, reg_val);
 +
 +              /* Scrub upper DCCM block */
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE, true);
 +              if (rc)
 +                      return rc;
 +
 +              /* Switch to lower DCCM block */
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_MME_ARC_UPPER_DCCM_EN_VAL_MASK, 0);
 +              WREG32(reg_base + ARC_DCCM_UPPER_EN_OFFSET, reg_val);
 +              break;
 +      default:
 +              rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
 +                                              ARC_DCCM_BLOCK_SIZE, true);
 +              if (rc)
 +                      return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_scrub_arcs_dccm(struct hl_device *hdev)
 +{
 +      u16 arc_id;
 +
 +      for (arc_id = CPU_ID_SCHED_ARC0 ; arc_id < CPU_ID_MAX ; arc_id++) {
 +              if (!gaudi2_is_arc_enabled(hdev, arc_id))
 +                      continue;
 +
 +              gaudi2_scrub_arc_dccm(hdev, arc_id);
 +      }
 +}
 +
 +static int gaudi2_late_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      hdev->asic_prop.supports_advanced_cpucp_rc = true;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS,
 +                                      gaudi2->virt_msix_db_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
 +              return rc;
 +      }
 +
 +      rc = gaudi2_fetch_psoc_frequency(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to fetch psoc frequency\n");
 +              goto disable_pci_access;
 +      }
 +
 +      gaudi2_init_arcs(hdev);
 +      gaudi2_scrub_arcs_dccm(hdev);
 +      gaudi2_init_security(hdev);
 +
 +      return 0;
 +
 +disable_pci_access:
 +      hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_late_fini(struct hl_device *hdev)
 +{
 +      hl_hwmon_release_resources(hdev);
 +}
 +
 +static void gaudi2_user_mapped_dec_init(struct gaudi2_device *gaudi2, u32 start_idx)
 +{
 +      struct user_mapped_block *blocks = gaudi2->mapped_blocks;
 +
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE0_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE0_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE1_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE1_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE2_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE2_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE3_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE3_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmPCIE_DEC0_CMD_BASE, HL_BLOCK_SIZE);
 +      HL_USR_MAPPED_BLK_INIT(&blocks[start_idx], mmPCIE_DEC1_CMD_BASE, HL_BLOCK_SIZE);
 +}
 +
 +static void gaudi2_user_mapped_blocks_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct user_mapped_block *blocks = gaudi2->mapped_blocks;
 +      u32 block_size, umr_start_idx, num_umr_blocks;
 +      int i;
 +
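 +      /* Mapped blocks layout: the ARC DCCM blocks come first, followed by the ACP engine
 +       * blocks, the NIC UMR blocks, and then the decoder command and sync manager blocks
 +       * at their fixed start indices.
 +       */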
 +      for (i = 0 ; i < NUM_ARC_CPUS ; i++) {
 +              if (i >= CPU_ID_SCHED_ARC0 && i <= CPU_ID_SCHED_ARC3)
 +                      block_size = ARC_DCCM_BLOCK_SIZE * 2;
 +              else
 +                      block_size = ARC_DCCM_BLOCK_SIZE;
 +
 +              blocks[i].address = gaudi2_arc_dccm_bases[i];
 +              blocks[i].size = block_size;
 +      }
 +
 +      blocks[NUM_ARC_CPUS].address = mmARC_FARM_ARC0_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 1].address = mmARC_FARM_ARC1_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 1].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 2].address = mmARC_FARM_ARC2_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 2].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 3].address = mmARC_FARM_ARC3_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 3].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 4].address = mmDCORE0_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 4].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 5].address = mmDCORE1_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 5].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 6].address = mmDCORE2_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 6].size = HL_BLOCK_SIZE;
 +
 +      blocks[NUM_ARC_CPUS + 7].address = mmDCORE3_MME_QM_ARC_ACP_ENG_BASE;
 +      blocks[NUM_ARC_CPUS + 7].size = HL_BLOCK_SIZE;
 +
 +      umr_start_idx = NUM_ARC_CPUS + NUM_OF_USER_ACP_BLOCKS;
 +      num_umr_blocks = NIC_NUMBER_OF_ENGINES * NUM_OF_USER_NIC_UMR_BLOCKS;
 +      for (i = 0 ; i < num_umr_blocks ; i++) {
 +              u8 nic_id, umr_block_id;
 +
 +              nic_id = i / NUM_OF_USER_NIC_UMR_BLOCKS;
 +              umr_block_id = i % NUM_OF_USER_NIC_UMR_BLOCKS;
 +
 +              blocks[umr_start_idx + i].address =
 +                      mmNIC0_UMR0_0_UNSECURE_DOORBELL0_BASE +
 +                      (nic_id / NIC_NUMBER_OF_QM_PER_MACRO) * NIC_OFFSET +
 +                      (nic_id % NIC_NUMBER_OF_QM_PER_MACRO) * NIC_QM_OFFSET +
 +                      umr_block_id * NIC_UMR_OFFSET;
 +              blocks[umr_start_idx + i].size = HL_BLOCK_SIZE;
 +      }
 +
 +      /* Expose decoder HW configuration block to user */
 +      gaudi2_user_mapped_dec_init(gaudi2, USR_MAPPED_BLK_DEC_START_IDX);
 +
 +      for (i = 1; i < NUM_OF_DCORES; ++i) {
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1)].size = SM_OBJS_BLOCK_SIZE;
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1) + 1].size = HL_BLOCK_SIZE;
 +
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1)].address =
 +                                              mmDCORE0_SYNC_MNGR_OBJS_BASE + i * DCORE_OFFSET;
 +
 +              blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1) + 1].address =
 +                                              mmDCORE0_SYNC_MNGR_GLBL_BASE + i * DCORE_OFFSET;
 +      }
 +}
 +
 +static int gaudi2_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
 +{
 +      dma_addr_t dma_addr_arr[GAUDI2_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
 +      void *virt_addr_arr[GAUDI2_ALLOC_CPU_MEM_RETRY_CNT] = {};
 +      int i, j, rc = 0;
 +
 +      /* The device ARC works with 32-bit addresses, and because there is a single HW register
 +       * that holds the extension bits (49..28), these bits must be identical across the entire
 +       * allocated range.
 +       */
 +
 +      for (i = 0 ; i < GAUDI2_ALLOC_CPU_MEM_RETRY_CNT ; i++) {
 +              virt_addr_arr[i] = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
 +                                                      &dma_addr_arr[i], GFP_KERNEL | __GFP_ZERO);
 +              if (!virt_addr_arr[i]) {
 +                      rc = -ENOMEM;
 +                      goto free_dma_mem_arr;
 +              }
 +
 +              end_addr = dma_addr_arr[i] + HL_CPU_ACCESSIBLE_MEM_SIZE - 1;
 +              if (GAUDI2_ARC_PCI_MSB_ADDR(dma_addr_arr[i]) == GAUDI2_ARC_PCI_MSB_ADDR(end_addr))
 +                      break;
 +      }
 +
 +      if (i == GAUDI2_ALLOC_CPU_MEM_RETRY_CNT) {
 +              dev_err(hdev->dev,
 +                      "MSB of ARC accessible DMA memory are not identical in all range\n");
 +              rc = -EFAULT;
 +              goto free_dma_mem_arr;
 +      }
 +
 +      hdev->cpu_accessible_dma_mem = virt_addr_arr[i];
 +      hdev->cpu_accessible_dma_address = dma_addr_arr[i];
 +
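 +      /* On success, free only the earlier rejected attempts; on error, free everything
 +       * that was allocated.
 +       */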
 +free_dma_mem_arr:
 +      for (j = 0 ; j < i ; j++)
 +              hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, virt_addr_arr[j],
 +                                              dma_addr_arr[j]);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_set_pci_memory_regions(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_mem_region *region;
 +
 +      /* CFG */
 +      region = &hdev->pci_mem_region[PCI_REGION_CFG];
 +      region->region_base = CFG_BASE;
 +      region->region_size = CFG_SIZE;
 +      region->offset_in_bar = CFG_BASE - STM_FLASH_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SRAM];
 +      region->region_base = SRAM_BASE_ADDR;
 +      region->region_size = SRAM_SIZE;
 +      region->offset_in_bar = CFG_REGION_SIZE + BAR0_RSRVD_SIZE;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* DRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_DRAM];
 +      region->region_base = DRAM_PHYS_BASE;
 +      region->region_size = hdev->asic_prop.dram_size;
 +      region->offset_in_bar = 0;
 +      region->bar_size = prop->dram_pci_bar_size;
 +      region->bar_id = DRAM_BAR_ID;
 +      region->used = 1;
 +}
 +
 +static void gaudi2_user_interrupt_setup(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int i, j, k;
 +
 +      /* Initialize common user CQ interrupt */
 +      HL_USR_INTR_STRUCT_INIT(hdev->common_user_cq_interrupt, hdev,
 +                              HL_COMMON_USER_CQ_INTERRUPT_ID, HL_USR_INTERRUPT_CQ);
 +
 +      /* Initialize common decoder interrupt */
 +      HL_USR_INTR_STRUCT_INIT(hdev->common_decoder_interrupt, hdev,
 +                              HL_COMMON_DEC_INTERRUPT_ID, HL_USR_INTERRUPT_DECODER);
 +
 +      /* The user interrupts structure holds both decoder and user interrupts from various
 +       * engines. We first initialize the decoder interrupts and then we add the user
 +       * interrupts. The only limitation is that the last decoder interrupt id must be
 +       * smaller than GAUDI2_IRQ_NUM_USER_FIRST. This is checked at compilation time.
 +       */
 +
 +      /* Initialize decoder interrupts and expose only the normal interrupts;
 +       * the error interrupts are to be handled by the driver
 +       */
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, j = 0 ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_NRM;
 +                                                                              i += 2, j++)
 +              HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i,
 +                                              HL_USR_INTERRUPT_DECODER);
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, k = 0 ; k < prop->user_interrupt_count; i++, j++, k++)
 +              HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i, HL_USR_INTERRUPT_CQ);
 +}
 +
 +static inline int gaudi2_get_non_zero_random_int(void)
 +{
 +      int rand = get_random_u32();
 +
 +      return rand ? rand : 1;
 +}
 +
 +static void gaudi2_special_blocks_free(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hl_skip_blocks_cfg *skip_special_blocks_cfg =
 +                      &prop->skip_special_blocks_cfg;
 +
 +      kfree(prop->special_blocks);
 +      kfree(skip_special_blocks_cfg->block_types);
 +      kfree(skip_special_blocks_cfg->block_ranges);
 +}
 +
 +static void gaudi2_special_blocks_iterator_free(struct hl_device *hdev)
 +{
 +      gaudi2_special_blocks_free(hdev);
 +}
 +
 +static bool gaudi2_special_block_skip(struct hl_device *hdev,
 +              struct hl_special_blocks_cfg *special_blocks_cfg,
 +              u32 blk_idx, u32 major, u32 minor, u32 sub_minor)
 +{
 +      return false;
 +}
 +
 +static int gaudi2_special_blocks_config(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int i, rc;
 +
 +      /* Configure Special blocks */
 +      prop->glbl_err_cause_num = GAUDI2_NUM_OF_GLBL_ERR_CAUSE;
 +      prop->num_of_special_blocks = ARRAY_SIZE(gaudi2_special_blocks);
 +      prop->special_blocks = kmalloc_array(prop->num_of_special_blocks,
 +                      sizeof(*prop->special_blocks), GFP_KERNEL);
 +      if (!prop->special_blocks)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < prop->num_of_special_blocks ; i++)
 +              memcpy(&prop->special_blocks[i], &gaudi2_special_blocks[i],
 +                              sizeof(*prop->special_blocks));
 +
 +      /* Configure when to skip Special blocks */
 +      memset(&prop->skip_special_blocks_cfg, 0, sizeof(prop->skip_special_blocks_cfg));
 +      prop->skip_special_blocks_cfg.skip_block_hook = gaudi2_special_block_skip;
 +
 +      if (ARRAY_SIZE(gaudi2_iterator_skip_block_types)) {
 +              prop->skip_special_blocks_cfg.block_types =
 +                              kmalloc_array(ARRAY_SIZE(gaudi2_iterator_skip_block_types),
 +                                      sizeof(gaudi2_iterator_skip_block_types[0]), GFP_KERNEL);
 +              if (!prop->skip_special_blocks_cfg.block_types) {
 +                      rc = -ENOMEM;
 +                      goto free_special_blocks;
 +              }
 +
 +              memcpy(prop->skip_special_blocks_cfg.block_types, gaudi2_iterator_skip_block_types,
 +                              sizeof(gaudi2_iterator_skip_block_types));
 +
 +              prop->skip_special_blocks_cfg.block_types_len =
 +                                      ARRAY_SIZE(gaudi2_iterator_skip_block_types);
 +      }
 +
 +      if (ARRAY_SIZE(gaudi2_iterator_skip_block_ranges)) {
 +              prop->skip_special_blocks_cfg.block_ranges =
 +                              kmalloc_array(ARRAY_SIZE(gaudi2_iterator_skip_block_ranges),
 +                                      sizeof(gaudi2_iterator_skip_block_ranges[0]), GFP_KERNEL);
 +              if (!prop->skip_special_blocks_cfg.block_ranges) {
 +                      rc = -ENOMEM;
 +                      goto free_skip_special_blocks_types;
 +              }
 +
 +              for (i = 0 ; i < ARRAY_SIZE(gaudi2_iterator_skip_block_ranges) ; i++)
 +                      memcpy(&prop->skip_special_blocks_cfg.block_ranges[i],
 +                                      &gaudi2_iterator_skip_block_ranges[i],
 +                                      sizeof(struct range));
 +
 +              prop->skip_special_blocks_cfg.block_ranges_len =
 +                                      ARRAY_SIZE(gaudi2_iterator_skip_block_ranges);
 +      }
 +
 +      return 0;
 +
 +free_skip_special_blocks_types:
 +      kfree(prop->skip_special_blocks_cfg.block_types);
 +free_special_blocks:
 +      kfree(prop->special_blocks);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_special_blocks_iterator_config(struct hl_device *hdev)
 +{
 +      return gaudi2_special_blocks_config(hdev);
 +}
 +
 +static int gaudi2_sw_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2;
 +      int i, rc;
 +
 +      /* Allocate device structure */
 +      gaudi2 = kzalloc(sizeof(*gaudi2), GFP_KERNEL);
 +      if (!gaudi2)
 +              return -ENOMEM;
 +
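 +      /* Collect only the valid, non-message entries of the IRQ map table into the H/W
 +       * events array.
 +       */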
 +      for (i = 0 ; i < ARRAY_SIZE(gaudi2_irq_map_table) ; i++) {
 +              if (gaudi2_irq_map_table[i].msg || !gaudi2_irq_map_table[i].valid)
 +                      continue;
 +
 +              if (gaudi2->num_of_valid_hw_events == GAUDI2_EVENT_SIZE) {
 +                      dev_err(hdev->dev, "H/W events array exceeds the limit of %u events\n",
 +                              GAUDI2_EVENT_SIZE);
 +                      rc = -EINVAL;
 +                      goto free_gaudi2_device;
 +              }
 +
 +              gaudi2->hw_events[gaudi2->num_of_valid_hw_events++] = gaudi2_irq_map_table[i].fc_id;
 +      }
 +
 +      for (i = 0 ; i < MME_NUM_OF_LFSR_SEEDS ; i++)
 +              gaudi2->lfsr_rand_seeds[i] = gaudi2_get_non_zero_random_int();
 +
 +      gaudi2->cpucp_info_get = gaudi2_cpucp_info_get;
 +
 +      hdev->asic_specific = gaudi2;
 +
 +      /* Create DMA pool for small allocations.
 +       * Use DEVICE_CACHE_LINE_SIZE for alignment since the NIC memory-mapped
 +       * PI/CI registers allocated from this pool have this restriction
 +       */
 +      hdev->dma_pool = dma_pool_create(dev_name(hdev->dev), &hdev->pdev->dev,
 +                                      GAUDI2_DMA_POOL_BLK_SIZE, DEVICE_CACHE_LINE_SIZE, 0);
 +      if (!hdev->dma_pool) {
 +              dev_err(hdev->dev, "failed to create DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_gaudi2_device;
 +      }
 +
 +      rc = gaudi2_alloc_cpu_accessible_dma_mem(hdev);
 +      if (rc)
 +              goto free_dma_pool;
 +
 +      hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 +      if (!hdev->cpu_accessible_dma_pool) {
 +              dev_err(hdev->dev, "Failed to create CPU accessible DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_dma_mem;
 +      }
 +
 +      rc = gen_pool_add(hdev->cpu_accessible_dma_pool, (uintptr_t) hdev->cpu_accessible_dma_mem,
 +                              HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to add memory to CPU accessible DMA pool\n");
 +              rc = -EFAULT;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
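 +      /* The virtual MSI-X doorbell page is carved from the CPU accessible pool; its DMA
 +       * address is later reported to the FW when PCI access is enabled in late_init.
 +       */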
 +      gaudi2->virt_msix_db_cpu_addr = hl_cpu_accessible_dma_pool_alloc(hdev, prop->pmmu.page_size,
 +                                                              &gaudi2->virt_msix_db_dma_addr);
 +      if (!gaudi2->virt_msix_db_cpu_addr) {
 +              dev_err(hdev->dev, "Failed to allocate DMA memory for virtual MSI-X doorbell\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      spin_lock_init(&gaudi2->hw_queues_lock);
 +
 +      gaudi2->scratchpad_kernel_address = hl_asic_dma_alloc_coherent(hdev, PAGE_SIZE,
 +                                                      &gaudi2->scratchpad_bus_address,
 +                                                      GFP_KERNEL | __GFP_ZERO);
 +      if (!gaudi2->scratchpad_kernel_address) {
 +              rc = -ENOMEM;
 +              goto free_virt_msix_db_mem;
 +      }
 +
 +      gaudi2_user_mapped_blocks_init(hdev);
 +
 +      /* Initialize user interrupts */
 +      gaudi2_user_interrupt_setup(hdev);
 +
 +      hdev->supports_coresight = true;
 +      hdev->supports_sync_stream = true;
 +      hdev->supports_cb_mapping = true;
 +      hdev->supports_wait_for_multi_cs = false;
 +
 +      prop->supports_compute_reset = true;
 +
 +      hdev->asic_funcs->set_pci_memory_regions(hdev);
 +
 +      rc = gaudi2_special_blocks_iterator_config(hdev);
 +      if (rc)
 +              goto free_scratchpad_mem;
 +
 +      return 0;
 +
 +free_scratchpad_mem:
 +      hl_asic_dma_pool_free(hdev, gaudi2->scratchpad_kernel_address,
 +                              gaudi2->scratchpad_bus_address);
 +free_virt_msix_db_mem:
 +      hl_cpu_accessible_dma_pool_free(hdev, prop->pmmu.page_size, gaudi2->virt_msix_db_cpu_addr);
 +free_cpu_accessible_dma_pool:
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +free_cpu_dma_mem:
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +free_dma_pool:
 +      dma_pool_destroy(hdev->dma_pool);
 +free_gaudi2_device:
 +      kfree(gaudi2);
 +      return rc;
 +}
 +
 +static int gaudi2_sw_fini(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      gaudi2_special_blocks_iterator_free(hdev);
 +
 +      hl_cpu_accessible_dma_pool_free(hdev, prop->pmmu.page_size, gaudi2->virt_msix_db_cpu_addr);
 +
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                              hdev->cpu_accessible_dma_address);
 +
 +      hl_asic_dma_free_coherent(hdev, PAGE_SIZE, gaudi2->scratchpad_kernel_address,
 +                                      gaudi2->scratchpad_bus_address);
 +
 +      dma_pool_destroy(hdev->dma_pool);
 +
 +      kfree(gaudi2);
 +
 +      return 0;
 +}
 +
 +static void gaudi2_stop_qman_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG1_OFFSET, QM_GLBL_CFG1_PQF_STOP |
 +                                              QM_GLBL_CFG1_CQF_STOP |
 +                                              QM_GLBL_CFG1_CP_STOP);
 +
 +      /* also stop the ARC */
 +      WREG32(reg_base + QM_GLBL_CFG2_OFFSET, QM_GLBL_CFG2_ARC_CQF_STOP);
 +}
 +
 +static void gaudi2_flush_qman_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG1_OFFSET, QM_GLBL_CFG1_PQF_FLUSH |
 +                                              QM_GLBL_CFG1_CQF_FLUSH |
 +                                              QM_GLBL_CFG1_CP_FLUSH);
 +}
 +
 +static void gaudi2_flush_qman_arc_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG2_OFFSET, QM_GLBL_CFG2_ARC_CQF_FLUSH);
 +}
 +
 +/**
 + * gaudi2_clear_qm_fence_counters_common - clear QM's fence counters
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @queue_id: queue whose fence counters should be cleared
 + * @skip_fence: if true, set the maximum fence value in all fence counters to avoid
 + *              getting stuck on any fence value. Otherwise, set all fence
 + *              counters to 0 (standard clear of fence counters)
 + */
 +static void gaudi2_clear_qm_fence_counters_common(struct hl_device *hdev, u32 queue_id,
 +                                              bool skip_fence)
 +{
 +      u32 size, reg_base;
 +      u32 addr, val;
 +
 +      reg_base = gaudi2_qm_blocks_bases[queue_id];
 +
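 +      /* The memset below covers the QM fence counter registers, which lie contiguously
 +       * from FENCE0_CNT_0 up to (but not including) the barrier config register.
 +       */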
 +      addr = reg_base + QM_CP_FENCE0_CNT_0_OFFSET;
 +      size = mmPDMA0_QM_CP_BARRIER_CFG - mmPDMA0_QM_CP_FENCE0_CNT_0;
 +
 +      /*
 +       * In case we want to make sure that a QM that is stuck on a fence will
 +       * be released, we should set the fence counter to a higher value than
 +       * the value the QM is waiting for. To comply with a fence counter of
 +       * any value, we set the maximum fence value in all counters.
 +       */
 +      val = skip_fence ? U32_MAX : 0;
 +      gaudi2_memset_device_lbw(hdev, addr, size, val);
 +}
 +
 +static void gaudi2_qman_manual_flush_common(struct hl_device *hdev, u32 queue_id)
 +{
 +      u32 reg_base = gaudi2_qm_blocks_bases[queue_id];
 +
 +      gaudi2_clear_qm_fence_counters_common(hdev, queue_id, true);
 +      gaudi2_flush_qman_common(hdev, reg_base);
 +      gaudi2_flush_qman_arc_common(hdev, reg_base);
 +}
 +
 +static void gaudi2_stop_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
 +              goto stop_edma_qmans;
 +
 +      /* Stop CPs of PDMA QMANs */
 +      gaudi2_stop_qman_common(hdev, mmPDMA0_QM_BASE);
 +      gaudi2_stop_qman_common(hdev, mmPDMA1_QM_BASE);
 +
 +stop_edma_qmans:
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +                      u32 qm_base;
 +
 +                      if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
 +                              continue;
 +
 +                      qm_base = mmDCORE0_EDMA0_QM_BASE + dcore * DCORE_OFFSET +
 +                                      inst * DCORE_EDMA_OFFSET;
 +
 +                      /* Stop CPs of EDMA QMANs */
 +                      gaudi2_stop_qman_common(hdev, qm_base);
 +              }
 +      }
 +}
 +
 +static void gaudi2_stop_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, i;
 +
 +      offset = mmDCORE1_MME_QM_BASE - mmDCORE0_MME_QM_BASE;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i)))
 +                      continue;
 +
 +              gaudi2_stop_qman_common(hdev, mmDCORE0_MME_QM_BASE + (i * offset));
 +      }
 +}
 +
 +static void gaudi2_stop_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (i = 0 ; i < TPC_ID_SIZE ; i++) {
 +              if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_tpc_id_to_queue_id[i]];
 +              gaudi2_stop_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_stop_rot_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
 +              return;
 +
 +      for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_rot_id_to_queue_id[i]];
 +              gaudi2_stop_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_stop_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base, queue_id;
 +      int i;
 +
 +      if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
 +              return;
 +
 +      queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[queue_id];
 +              gaudi2_stop_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_stall_dma_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      u32 reg_val;
 +
 +      reg_val = FIELD_PREP(PDMA0_CORE_CFG_1_HALT_MASK, 0x1);
 +      WREG32(reg_base + DMA_CORE_CFG_1_OFFSET, reg_val);
 +}
 +
 +static void gaudi2_dma_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
 +              goto stall_edma;
 +
 +      gaudi2_stall_dma_common(hdev, mmPDMA0_CORE_BASE);
 +      gaudi2_stall_dma_common(hdev, mmPDMA1_CORE_BASE);
 +
 +stall_edma:
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +                      u32 core_base;
 +
 +                      if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
 +                              continue;
 +
 +                      core_base = mmDCORE0_EDMA0_CORE_BASE + dcore * DCORE_OFFSET +
 +                                      inst * DCORE_EDMA_OFFSET;
 +
 +                      /* Stall the EDMA cores */
 +                      gaudi2_stall_dma_common(hdev, core_base);
 +              }
 +      }
 +}
 +
 +static void gaudi2_mme_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, i;
 +
 +      offset = mmDCORE1_MME_CTRL_LO_QM_STALL - mmDCORE0_MME_CTRL_LO_QM_STALL;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++)
 +              if (gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i))
 +                      WREG32(mmDCORE0_MME_CTRL_LO_QM_STALL + (i * offset), 1);
 +}
 +
 +static void gaudi2_tpc_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (i = 0 ; i < TPC_ID_SIZE ; i++) {
 +              if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_tpc_cfg_blocks_bases[i];
 +              WREG32(reg_base + TPC_CFG_STALL_OFFSET, 1);
 +      }
 +}
 +
 +static void gaudi2_rotator_stall(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_val;
 +      int i;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
 +              return;
 +
 +      reg_val = FIELD_PREP(ROT_MSS_HALT_WBC_MASK, 0x1) |
 +                      FIELD_PREP(ROT_MSS_HALT_RSB_MASK, 0x1) |
 +                      FIELD_PREP(ROT_MSS_HALT_MRSB_MASK, 0x1);
 +
 +      for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
 +                      continue;
 +
 +              WREG32(mmROT0_MSS_HALT + i * ROT_OFFSET, reg_val);
 +      }
 +}
 +
 +static void gaudi2_disable_qman_common(struct hl_device *hdev, u32 reg_base)
 +{
 +      WREG32(reg_base + QM_GLBL_CFG0_OFFSET, 0);
 +}
 +
 +static void gaudi2_disable_dma_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
 +              goto stop_edma_qmans;
 +
 +      gaudi2_disable_qman_common(hdev, mmPDMA0_QM_BASE);
 +      gaudi2_disable_qman_common(hdev, mmPDMA1_QM_BASE);
 +
 +stop_edma_qmans:
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +                      u32 qm_base;
 +
 +                      if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
 +                              continue;
 +
 +                      qm_base = mmDCORE0_EDMA0_QM_BASE + dcore * DCORE_OFFSET +
 +                                      inst * DCORE_EDMA_OFFSET;
 +
 +                      /* Disable CPs of EDMA QMANs */
 +                      gaudi2_disable_qman_common(hdev, qm_base);
 +              }
 +      }
 +}
 +
 +static void gaudi2_disable_mme_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, i;
 +
 +      offset = mmDCORE1_MME_QM_BASE - mmDCORE0_MME_QM_BASE;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++)
 +              if (gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i))
 +                      gaudi2_disable_qman_common(hdev, mmDCORE0_MME_QM_BASE + (i * offset));
 +}
 +
 +static void gaudi2_disable_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
 +              return;
 +
 +      for (i = 0 ; i < TPC_ID_SIZE ; i++) {
 +              if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_tpc_id_to_queue_id[i]];
 +              gaudi2_disable_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_disable_rot_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +      int i;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
 +              return;
 +
 +      for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
 +              if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[gaudi2_rot_id_to_queue_id[i]];
 +              gaudi2_disable_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_disable_nic_qmans(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base, queue_id;
 +      int i;
 +
 +      if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
 +              return;
 +
 +      queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              reg_base = gaudi2_qm_blocks_bases[queue_id];
 +              gaudi2_disable_qman_common(hdev, reg_base);
 +      }
 +}
 +
 +static void gaudi2_enable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE, 0);
 +
 +      /* Zero the lower/upper parts of the 64-bit counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE + 0xC, 0);
 +      WREG32(mmPSOC_TIMESTAMP_BASE + 0x8, 0);
 +
 +      /* Enable the counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE, 1);
 +}
 +
 +static void gaudi2_disable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE, 0);
 +}
 +
 +static const char *gaudi2_irq_name(u16 irq_number)
 +{
 +      switch (irq_number) {
 +      case GAUDI2_IRQ_NUM_EVENT_QUEUE:
 +              return "gaudi2 cpu eq";
 +      case GAUDI2_IRQ_NUM_COMPLETION:
 +              return "gaudi2 completion";
 +      case GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ... GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM:
 +              return gaudi2_vdec_irq_name[irq_number - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM];
 +      case GAUDI2_IRQ_NUM_USER_FIRST ... GAUDI2_IRQ_NUM_USER_LAST:
 +              return "gaudi2 user completion";
 +      default:
 +              return "invalid";
 +      }
 +}
 +
 +static void gaudi2_dec_disable_msix(struct hl_device *hdev, u32 max_irq_num)
 +{
 +      int i, irq, relative_idx;
 +      struct hl_dec *dec;
 +
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ; i < max_irq_num ; i++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              relative_idx = i - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM;
 +
 +              dec = hdev->dec + relative_idx / 2;
 +
 +              /* We pass different structures depending on the irq handler. For the abnormal
 +               * interrupt we pass hl_dec and for the regular interrupt we pass the relevant
 +               * user_interrupt entry
 +               */
 +              free_irq(irq, ((relative_idx % 2) ?
 +                              (void *) dec :
 +                              (void *) &hdev->user_interrupt[dec->core_id]));
 +      }
 +}
 +
 +static int gaudi2_dec_enable_msix(struct hl_device *hdev)
 +{
 +      int rc, i, irq_init_cnt, irq, relative_idx;
 +      irq_handler_t irq_handler;
 +      struct hl_dec *dec;
 +
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, irq_init_cnt = 0;
 +                      i <= GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM;
 +                      i++, irq_init_cnt++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              relative_idx = i - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM;
 +
 +              irq_handler = (relative_idx % 2) ?
 +                              hl_irq_handler_dec_abnrm :
 +                              hl_irq_handler_user_interrupt;
 +
 +              dec = hdev->dec + relative_idx / 2;
 +
 +              /* We pass different structures depending on the irq handler. For the abnormal
 +               * interrupt we pass hl_dec and for the regular interrupt we pass the relevant
 +               * user_interrupt entry
 +               */
 +              rc = request_irq(irq, irq_handler, 0, gaudi2_irq_name(i),
 +                              ((relative_idx % 2) ?
 +                              (void *) dec :
 +                              (void *) &hdev->user_interrupt[dec->core_id]));
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_dec_irqs;
 +              }
 +      }
 +
 +      return 0;
 +
 +free_dec_irqs:
 +      gaudi2_dec_disable_msix(hdev, (GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + irq_init_cnt));
 +      return rc;
 +}
 +
 +static int gaudi2_enable_msix(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc, irq, i, j, user_irq_init_cnt;
 +      irq_handler_t irq_handler;
 +      struct hl_cq *cq;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_MSIX)
 +              return 0;
 +
 +      rc = pci_alloc_irq_vectors(hdev->pdev, GAUDI2_MSIX_ENTRIES, GAUDI2_MSIX_ENTRIES,
 +                                      PCI_IRQ_MSIX);
 +      if (rc < 0) {
 +              dev_err(hdev->dev, "MSI-X: Failed to enable support -- %d/%d\n",
 +                      GAUDI2_MSIX_ENTRIES, rc);
 +              return rc;
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
 +      cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_CS_COMPLETION];
 +      rc = request_irq(irq, hl_irq_handler_cq, 0, gaudi2_irq_name(GAUDI2_IRQ_NUM_COMPLETION), cq);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_irq_vectors;
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
 +      rc = request_irq(irq, hl_irq_handler_eq, 0, gaudi2_irq_name(GAUDI2_IRQ_NUM_EVENT_QUEUE),
 +                      &hdev->event_queue);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_completion_irq;
 +      }
 +
 +      rc = gaudi2_dec_enable_msix(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to enable decoder IRQ");
 +              goto free_event_irq;
 +      }
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count, user_irq_init_cnt = 0;
 +                      user_irq_init_cnt < prop->user_interrupt_count;
 +                      i++, j++, user_irq_init_cnt++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              irq_handler = hl_irq_handler_user_interrupt;
 +
 +              rc = request_irq(irq, irq_handler, 0, gaudi2_irq_name(i), &hdev->user_interrupt[j]);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_user_irq;
 +              }
 +      }
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_MSIX;
 +
 +      return 0;
 +
 +free_user_irq:
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count;
 +                      i < GAUDI2_IRQ_NUM_USER_FIRST + user_irq_init_cnt ; i++, j++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              free_irq(irq, &hdev->user_interrupt[j]);
 +      }
 +
 +      gaudi2_dec_disable_msix(hdev, GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM + 1);
 +
 +free_event_irq:
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
+      free_irq(irq, &hdev->event_queue);
 +
 +free_completion_irq:
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
 +      free_irq(irq, cq);
 +
 +free_irq_vectors:
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_sync_irqs(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int i, j;
 +      int irq;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      /* Wait for all pending IRQs to be finished */
 +      synchronize_irq(pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION));
 +
 +      for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM ; i++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              synchronize_irq(irq);
 +      }
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = 0 ; j < hdev->asic_prop.user_interrupt_count;
 +                                                                              i++, j++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              synchronize_irq(irq);
 +      }
 +
 +      synchronize_irq(pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE));
 +}
 +
 +static void gaudi2_disable_msix(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct hl_cq *cq;
 +      int irq, i, j, k;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      gaudi2_sync_irqs(hdev);
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
 +      free_irq(irq, &hdev->event_queue);
 +
 +      gaudi2_dec_disable_msix(hdev, GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM + 1);
 +
 +      for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count, k = 0;
 +                      k < hdev->asic_prop.user_interrupt_count ; i++, j++, k++) {
 +
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              free_irq(irq, &hdev->user_interrupt[j]);
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
 +      cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_CS_COMPLETION];
 +      free_irq(irq, cq);
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      gaudi2->hw_cap_initialized &= ~HW_CAP_MSIX;
 +}
 +
 +static void gaudi2_stop_dcore_dec(struct hl_device *hdev, int dcore_id)
 +{
 +      u32 reg_val = FIELD_PREP(DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_STOP_MASK, 0x1);
 +      u32 graceful_pend_mask = DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_PEND_MASK;
 +      u32 timeout_usec, dec_id, dec_bit, offset, graceful;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI2_PLDM_VDEC_TIMEOUT_USEC;
 +      else
 +              timeout_usec = GAUDI2_VDEC_TIMEOUT_USEC;
 +
 +      for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
 +              dec_bit = dcore_id * NUM_OF_DEC_PER_DCORE + dec_id;
 +              if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                      continue;
 +
 +              offset = dcore_id * DCORE_OFFSET + dec_id * DCORE_VDEC_OFFSET;
 +
 +              WREG32(mmDCORE0_DEC0_CMD_SWREG16 + offset, 0);
 +
 +              WREG32(mmDCORE0_VDEC0_BRDG_CTRL_GRACEFUL + offset, reg_val);
 +
+              /* Wait until all traffic from the decoder stops
+               * before applying core reset.
+               */
 +              rc = hl_poll_timeout(
 +                              hdev,
 +                              mmDCORE0_VDEC0_BRDG_CTRL_GRACEFUL + offset,
 +                              graceful,
 +                              (graceful & graceful_pend_mask),
 +                              100,
 +                              timeout_usec);
 +              if (rc)
 +                      dev_err(hdev->dev,
 +                              "Failed to stop traffic from DCORE%d Decoder %d\n",
 +                              dcore_id, dec_id);
 +      }
 +}
 +
 +static void gaudi2_stop_pcie_dec(struct hl_device *hdev)
 +{
 +      u32 reg_val = FIELD_PREP(DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_STOP_MASK, 0x1);
 +      u32 graceful_pend_mask = PCIE_VDEC0_BRDG_CTRL_GRACEFUL_PEND_MASK;
 +      u32 timeout_usec, dec_id, dec_bit, offset, graceful;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI2_PLDM_VDEC_TIMEOUT_USEC;
 +      else
 +              timeout_usec = GAUDI2_VDEC_TIMEOUT_USEC;
 +
 +      for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
 +              dec_bit = PCIE_DEC_SHIFT + dec_id;
 +              if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                      continue;
 +
 +              offset = dec_id * PCIE_VDEC_OFFSET;
 +
 +              WREG32(mmPCIE_DEC0_CMD_SWREG16 + offset, 0);
 +
 +              WREG32(mmPCIE_VDEC0_BRDG_CTRL_GRACEFUL + offset, reg_val);
 +
+              /* Wait until all traffic from the decoder stops
+               * before applying core reset.
+               */
 +              rc = hl_poll_timeout(
 +                              hdev,
 +                              mmPCIE_VDEC0_BRDG_CTRL_GRACEFUL + offset,
 +                              graceful,
 +                              (graceful & graceful_pend_mask),
 +                              100,
 +                              timeout_usec);
 +              if (rc)
 +                      dev_err(hdev->dev,
 +                              "Failed to stop traffic from PCIe Decoder %d\n",
 +                              dec_id);
 +      }
 +}
 +
 +static void gaudi2_stop_dec(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore_id;
 +
 +      if ((gaudi2->dec_hw_cap_initialized & HW_CAP_DEC_MASK) == 0)
 +              return;
 +
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
 +              gaudi2_stop_dcore_dec(hdev, dcore_id);
 +
 +      gaudi2_stop_pcie_dec(hdev);
 +}
 +
 +static void gaudi2_set_arc_running_mode(struct hl_device *hdev, u32 cpu_id, u32 run_mode)
 +{
 +      u32 reg_base, reg_val;
 +
 +      reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +      if (run_mode == HL_ENGINE_CORE_RUN)
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_RUN_REQ_MASK, 1);
 +      else
 +              reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_HALT_REQ_MASK, 1);
 +
 +      WREG32(reg_base + ARC_HALT_REQ_OFFSET, reg_val);
 +}
 +
 +static void gaudi2_halt_arcs(struct hl_device *hdev)
 +{
 +      u16 arc_id;
 +
 +      for (arc_id = CPU_ID_SCHED_ARC0; arc_id < CPU_ID_MAX; arc_id++) {
 +              if (gaudi2_is_arc_enabled(hdev, arc_id))
 +                      gaudi2_set_arc_running_mode(hdev, arc_id, HL_ENGINE_CORE_HALT);
 +      }
 +}
 +
 +static int gaudi2_verify_arc_running_mode(struct hl_device *hdev, u32 cpu_id, u32 run_mode)
 +{
 +      int rc;
 +      u32 reg_base, val, ack_mask, timeout_usec = 100000;
 +
 +      if (hdev->pldm)
 +              timeout_usec *= 100;
 +
 +      reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +      if (run_mode == HL_ENGINE_CORE_RUN)
 +              ack_mask = ARC_FARM_ARC0_AUX_RUN_HALT_ACK_RUN_ACK_MASK;
 +      else
 +              ack_mask = ARC_FARM_ARC0_AUX_RUN_HALT_ACK_HALT_ACK_MASK;
 +
 +      rc = hl_poll_timeout(hdev, reg_base + ARC_HALT_ACK_OFFSET,
 +                              val, ((val & ack_mask) == ack_mask),
 +                              1000, timeout_usec);
 +
 +      if (!rc) {
+              /* Clear the run/halt request after it was acknowledged */
 +              val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_RUN_REQ_MASK, 0);
 +              WREG32(reg_base + ARC_HALT_REQ_OFFSET, val);
 +      }
 +
 +      return rc;
 +}
 +
 +static void gaudi2_reset_arcs(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u16 arc_id;
 +
 +      if (!gaudi2)
 +              return;
 +
 +      for (arc_id = CPU_ID_SCHED_ARC0; arc_id < CPU_ID_MAX; arc_id++)
 +              if (gaudi2_is_arc_enabled(hdev, arc_id))
 +                      gaudi2_clr_arc_id_cap(hdev, arc_id);
 +}
 +
 +static void gaudi2_nic_qmans_manual_flush(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 queue_id;
 +      int i;
 +
 +      if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
 +              return;
 +
 +      queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
 +
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              gaudi2_qman_manual_flush_common(hdev, queue_id);
 +      }
 +}
 +
 +static int gaudi2_set_engine_cores(struct hl_device *hdev, u32 *core_ids,
 +                                      u32 num_cores, u32 core_command)
 +{
 +      int i, rc;
+
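+      /* First request the new run mode on all enabled cores, then verify the
+       * acknowledgment of each core in a second pass.
+       */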
 +      for (i = 0 ; i < num_cores ; i++) {
 +              if (gaudi2_is_arc_enabled(hdev, core_ids[i]))
 +                      gaudi2_set_arc_running_mode(hdev, core_ids[i], core_command);
 +      }
 +
 +      for (i = 0 ; i < num_cores ; i++) {
 +              if (gaudi2_is_arc_enabled(hdev, core_ids[i])) {
 +                      rc = gaudi2_verify_arc_running_mode(hdev, core_ids[i], core_command);
 +
 +                      if (rc) {
 +                              dev_err(hdev->dev, "failed to %s arc: %d\n",
 +                                      (core_command == HL_ENGINE_CORE_HALT) ?
 +                                      "HALT" : "RUN", core_ids[i]);
+                              return -EIO;
 +                      }
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      u32 wait_timeout_ms;
 +
 +      if (hdev->pldm)
 +              wait_timeout_ms = GAUDI2_PLDM_RESET_WAIT_MSEC;
 +      else
 +              wait_timeout_ms = GAUDI2_RESET_WAIT_MSEC;
 +
 +      if (fw_reset)
 +              goto skip_engines;
 +
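+      /* Teardown order: stop the QMANs, halt the ARCs and stall the engines,
+       * stop the decoders, and finally disable the QMANs and the timestamp
+       * counter.
+       */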
 +      gaudi2_stop_dma_qmans(hdev);
 +      gaudi2_stop_mme_qmans(hdev);
 +      gaudi2_stop_tpc_qmans(hdev);
 +      gaudi2_stop_rot_qmans(hdev);
 +      gaudi2_stop_nic_qmans(hdev);
 +      msleep(wait_timeout_ms);
 +
 +      gaudi2_halt_arcs(hdev);
 +      gaudi2_dma_stall(hdev);
 +      gaudi2_mme_stall(hdev);
 +      gaudi2_tpc_stall(hdev);
 +      gaudi2_rotator_stall(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      gaudi2_stop_dec(hdev);
 +
+      /*
+       * In case of soft reset, do a manual flush for QMANs (currently done
+       * only for NIC QMANs).
+       */
 +      if (!hard_reset)
 +              gaudi2_nic_qmans_manual_flush(hdev);
 +
 +      gaudi2_disable_dma_qmans(hdev);
 +      gaudi2_disable_mme_qmans(hdev);
 +      gaudi2_disable_tpc_qmans(hdev);
 +      gaudi2_disable_rot_qmans(hdev);
 +      gaudi2_disable_nic_qmans(hdev);
 +      gaudi2_disable_timestamp(hdev);
 +
 +skip_engines:
 +      if (hard_reset) {
 +              gaudi2_disable_msix(hdev);
 +              return;
 +      }
 +
 +      gaudi2_sync_irqs(hdev);
 +}
 +
 +static void gaudi2_init_firmware_preload_params(struct hl_device *hdev)
 +{
 +      struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
 +
 +      pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
 +      pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
 +      pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      pre_fw_load->wait_for_preboot_timeout = GAUDI2_PREBOOT_REQ_TIMEOUT_USEC;
 +}
 +
 +static void gaudi2_init_firmware_loader(struct hl_device *hdev)
 +{
 +      struct fw_load_mgr *fw_loader = &hdev->fw_loader;
 +      struct dynamic_fw_load_mgr *dynamic_loader;
 +      struct cpu_dyn_regs *dyn_regs;
 +
 +      /* fill common fields */
 +      fw_loader->fw_comp_loaded = FW_TYPE_NONE;
 +      fw_loader->boot_fit_img.image_name = GAUDI2_BOOT_FIT_FILE;
 +      fw_loader->linux_img.image_name = GAUDI2_LINUX_FW_FILE;
 +      fw_loader->boot_fit_timeout = GAUDI2_BOOT_FIT_REQ_TIMEOUT_USEC;
 +      fw_loader->skip_bmc = false;
 +      fw_loader->sram_bar_id = SRAM_CFG_BAR_ID;
 +      fw_loader->dram_bar_id = DRAM_BAR_ID;
 +      fw_loader->cpu_timeout = GAUDI2_CPU_TIMEOUT_USEC;
 +
+      /* Set initial values for a few specific dynamic regs, as before reading
+       * the first descriptor from FW those values have to be hard-coded. In
+       * later stages of the protocol the values are updated automatically by
+       * reading the FW descriptor, so the data there is always up-to-date.
+       */
 +      dynamic_loader = &hdev->fw_loader.dynamic_loader;
 +      dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
 +      dyn_regs->kmd_msg_to_cpu = cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
 +      dyn_regs->cpu_cmd_status_to_host = cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
 +      dynamic_loader->wait_for_bl_timeout = GAUDI2_WAIT_FOR_BL_TIMEOUT_USEC;
 +}
 +
 +static int gaudi2_init_cpu(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
 +              return 0;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_CPU)
 +              return 0;
 +
 +      rc = hl_fw_init_cpu(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_CPU;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)
 +{
 +      struct hl_hw_queue *cpu_pq = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct cpu_dyn_regs *dyn_regs;
 +      struct hl_eq *eq;
 +      u32 status;
 +      int err;
 +
 +      if (!hdev->cpu_queues_enable)
 +              return 0;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)
 +              return 0;
 +
 +      eq = &hdev->event_queue;
 +
 +      dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
 +      WREG32(mmCPU_IF_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
 +
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
 +      WREG32(mmCPU_IF_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
 +
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_LOW, lower_32_bits(hdev->cpu_accessible_dma_address));
 +      WREG32(mmCPU_IF_CQ_BASE_ADDR_HIGH, upper_32_bits(hdev->cpu_accessible_dma_address));
 +
 +      WREG32(mmCPU_IF_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
 +      WREG32(mmCPU_IF_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
 +
 +      /* Used for EQ CI */
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, 0);
 +
 +      WREG32(mmCPU_IF_PF_PQ_PI, 0);
 +
 +      WREG32(mmCPU_IF_QUEUE_INIT, PQ_INIT_STATUS_READY_FOR_CP);
 +
+      /* Let the ARC know we are ready as it is now handling those queues */
 +
 +      WREG32(le32_to_cpu(dyn_regs->gic_host_pi_upd_irq),
 +              gaudi2_irq_map_table[GAUDI2_EVENT_CPU_PI_UPDATE].cpu_id);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmCPU_IF_QUEUE_INIT,
 +              status,
 +              (status == PQ_INIT_STATUS_READY_FOR_HOST),
 +              1000,
 +              cpu_timeout);
 +
 +      if (err) {
 +              dev_err(hdev->dev, "Failed to communicate with device CPU (timeout)\n");
 +              return -EIO;
 +      }
 +
 +      /* update FW application security bits */
 +      if (prop->fw_cpu_boot_dev_sts0_valid)
 +              prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
 +
 +      if (prop->fw_cpu_boot_dev_sts1_valid)
 +              prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_CPU_Q;
 +      return 0;
 +}
 +
 +static void gaudi2_init_qman_pq(struct hl_device *hdev, u32 reg_base,
 +                              u32 queue_id_base)
 +{
 +      struct hl_hw_queue *q;
 +      u32 pq_id, pq_offset;
 +
 +      for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++) {
 +              q = &hdev->kernel_queues[queue_id_base + pq_id];
 +              pq_offset = pq_id * 4;
 +
 +              WREG32(reg_base + QM_PQ_BASE_LO_0_OFFSET + pq_offset,
 +                              lower_32_bits(q->bus_address));
 +              WREG32(reg_base + QM_PQ_BASE_HI_0_OFFSET + pq_offset,
 +                              upper_32_bits(q->bus_address));
 +              WREG32(reg_base + QM_PQ_SIZE_0_OFFSET + pq_offset, ilog2(HL_QUEUE_LENGTH));
 +              WREG32(reg_base + QM_PQ_PI_0_OFFSET + pq_offset, 0);
 +              WREG32(reg_base + QM_PQ_CI_0_OFFSET + pq_offset, 0);
 +      }
 +}
 +
 +static void gaudi2_init_qman_cp(struct hl_device *hdev, u32 reg_base)
 +{
 +      u32 cp_id, cp_offset, mtr_base_lo, mtr_base_hi, so_base_lo, so_base_hi;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      for (cp_id = 0 ; cp_id < NUM_OF_CP_PER_QMAN; cp_id++) {
 +              cp_offset = cp_id * 4;
 +
 +              WREG32(reg_base + QM_CP_MSG_BASE0_ADDR_LO_0_OFFSET + cp_offset, mtr_base_lo);
 +              WREG32(reg_base + QM_CP_MSG_BASE0_ADDR_HI_0_OFFSET + cp_offset, mtr_base_hi);
 +              WREG32(reg_base + QM_CP_MSG_BASE1_ADDR_LO_0_OFFSET + cp_offset, so_base_lo);
 +              WREG32(reg_base + QM_CP_MSG_BASE1_ADDR_HI_0_OFFSET + cp_offset, so_base_hi);
 +      }
 +
 +      /* allow QMANs to accept work from ARC CQF */
 +      WREG32(reg_base + QM_CP_CFG_OFFSET, FIELD_PREP(PDMA0_QM_CP_CFG_SWITCH_EN_MASK, 0x1));
 +}
 +
 +static void gaudi2_init_qman_pqc(struct hl_device *hdev, u32 reg_base,
 +                              u32 queue_id_base)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 pq_id, pq_offset, so_base_lo, so_base_hi;
 +
 +      so_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
 +
 +      for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++) {
 +              pq_offset = pq_id * 4;
 +
 +              /* Configure QMAN HBW to scratchpad as it is not needed */
 +              WREG32(reg_base + QM_PQC_HBW_BASE_LO_0_OFFSET + pq_offset,
 +                              lower_32_bits(gaudi2->scratchpad_bus_address));
 +              WREG32(reg_base + QM_PQC_HBW_BASE_HI_0_OFFSET + pq_offset,
 +                              upper_32_bits(gaudi2->scratchpad_bus_address));
 +              WREG32(reg_base + QM_PQC_SIZE_0_OFFSET + pq_offset,
 +                              ilog2(PAGE_SIZE / sizeof(struct hl_cq_entry)));
 +
 +              WREG32(reg_base + QM_PQC_PI_0_OFFSET + pq_offset, 0);
 +              WREG32(reg_base + QM_PQC_LBW_WDATA_0_OFFSET + pq_offset, QM_PQC_LBW_WDATA);
 +              WREG32(reg_base + QM_PQC_LBW_BASE_LO_0_OFFSET + pq_offset, so_base_lo);
 +              WREG32(reg_base + QM_PQC_LBW_BASE_HI_0_OFFSET + pq_offset, so_base_hi);
 +      }
 +
 +      /* Enable QMAN H/W completion */
 +      WREG32(reg_base + QM_PQC_CFG_OFFSET, 1 << PDMA0_QM_PQC_CFG_EN_SHIFT);
 +}
 +
 +static u32 gaudi2_get_dyn_sp_reg(struct hl_device *hdev, u32 queue_id_base)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 sp_reg_addr;
 +
 +      switch (queue_id_base) {
 +      case GAUDI2_QUEUE_ID_PDMA_0_0...GAUDI2_QUEUE_ID_PDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE0_MME_0_0...GAUDI2_QUEUE_ID_DCORE0_MME_0_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE1_MME_0_0...GAUDI2_QUEUE_ID_DCORE1_MME_0_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE2_MME_0_0...GAUDI2_QUEUE_ID_DCORE2_MME_0_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE3_MME_0_0...GAUDI2_QUEUE_ID_DCORE3_MME_0_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_mme_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_6_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE1_TPC_5_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE2_TPC_5_3:
 +              fallthrough;
 +      case GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE3_TPC_5_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_tpc_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_ROT_0_0...GAUDI2_QUEUE_ID_ROT_1_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_rot_qm_irq_ctrl);
 +              break;
 +      case GAUDI2_QUEUE_ID_NIC_0_0...GAUDI2_QUEUE_ID_NIC_23_3:
 +              sp_reg_addr = le32_to_cpu(dyn_regs->gic_nic_qm_irq_ctrl);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Unexpected h/w queue %d\n", queue_id_base);
 +              return 0;
 +      }
 +
 +      return sp_reg_addr;
 +}
 +
 +static void gaudi2_init_qman_common(struct hl_device *hdev, u32 reg_base,
 +                                      u32 queue_id_base)
 +{
 +      u32 glbl_prot = QMAN_MAKE_TRUSTED, irq_handler_offset;
 +      int map_table_entry;
 +
 +      WREG32(reg_base + QM_GLBL_PROT_OFFSET, glbl_prot);
 +
 +      irq_handler_offset = gaudi2_get_dyn_sp_reg(hdev, queue_id_base);
 +      WREG32(reg_base + QM_GLBL_ERR_ADDR_LO_OFFSET, lower_32_bits(CFG_BASE + irq_handler_offset));
 +      WREG32(reg_base + QM_GLBL_ERR_ADDR_HI_OFFSET, upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      map_table_entry = gaudi2_qman_async_event_id[queue_id_base];
 +      WREG32(reg_base + QM_GLBL_ERR_WDATA_OFFSET,
 +              gaudi2_irq_map_table[map_table_entry].cpu_id);
 +
 +      WREG32(reg_base + QM_ARB_ERR_MSG_EN_OFFSET, QM_ARB_ERR_MSG_EN_MASK);
 +
 +      WREG32(reg_base + QM_ARB_SLV_CHOISE_WDT_OFFSET, GAUDI2_ARB_WDT_TIMEOUT);
 +      WREG32(reg_base + QM_GLBL_CFG1_OFFSET, 0);
 +      WREG32(reg_base + QM_GLBL_CFG2_OFFSET, 0);
 +
 +      /* Enable the QMAN channel.
+       * PDMA QMAN configuration is different, as we do not allow the user to
+       * access some of the CPs.
+       * PDMA0: CP2/3 are reserved for ARC usage.
+       * PDMA1: CP1/2/3 are reserved for ARC usage.
 +       */
 +      if (reg_base == gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_1_0])
 +              WREG32(reg_base + QM_GLBL_CFG0_OFFSET, PDMA1_QMAN_ENABLE);
 +      else if (reg_base == gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_0_0])
 +              WREG32(reg_base + QM_GLBL_CFG0_OFFSET, PDMA0_QMAN_ENABLE);
 +      else
 +              WREG32(reg_base + QM_GLBL_CFG0_OFFSET, QMAN_ENABLE);
 +}
 +
 +static void gaudi2_init_qman(struct hl_device *hdev, u32 reg_base,
 +              u32 queue_id_base)
 +{
 +      u32 pq_id;
 +
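+      /* All PQs of this QMAN report completions on the single reserved CS completion queue */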
 +      for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++)
 +              hdev->kernel_queues[queue_id_base + pq_id].cq_id = GAUDI2_RESERVED_CQ_CS_COMPLETION;
 +
 +      gaudi2_init_qman_pq(hdev, reg_base, queue_id_base);
 +      gaudi2_init_qman_cp(hdev, reg_base);
 +      gaudi2_init_qman_pqc(hdev, reg_base, queue_id_base);
 +      gaudi2_init_qman_common(hdev, reg_base, queue_id_base);
 +}
 +
 +static void gaudi2_init_dma_core(struct hl_device *hdev, u32 reg_base,
 +                              u32 dma_core_id, bool is_secure)
 +{
 +      u32 prot, irq_handler_offset;
 +      struct cpu_dyn_regs *dyn_regs;
 +      int map_table_entry;
 +
 +      prot = 1 << ARC_FARM_KDMA_PROT_ERR_VAL_SHIFT;
 +      if (is_secure)
 +              prot |= 1 << ARC_FARM_KDMA_PROT_VAL_SHIFT;
 +
 +      WREG32(reg_base + DMA_CORE_PROT_OFFSET, prot);
 +
 +      dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      irq_handler_offset = le32_to_cpu(dyn_regs->gic_dma_core_irq_ctrl);
 +
 +      WREG32(reg_base + DMA_CORE_ERRMSG_ADDR_LO_OFFSET,
 +                      lower_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      WREG32(reg_base + DMA_CORE_ERRMSG_ADDR_HI_OFFSET,
 +                      upper_32_bits(CFG_BASE + irq_handler_offset));
 +
 +      map_table_entry = gaudi2_dma_core_async_event_id[dma_core_id];
 +      WREG32(reg_base + DMA_CORE_ERRMSG_WDATA_OFFSET,
 +              gaudi2_irq_map_table[map_table_entry].cpu_id);
 +
 +      /* Enable the DMA channel */
 +      WREG32(reg_base + DMA_CORE_CFG_0_OFFSET, 1 << ARC_FARM_KDMA_CFG_0_EN_SHIFT);
 +}
 +
 +static void gaudi2_init_kdma(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_KDMA) == HW_CAP_KDMA)
 +              return;
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_KDMA];
 +
 +      gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_KDMA, true);
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_KDMA;
 +}
 +
 +static void gaudi2_init_pdma(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_base;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK) == HW_CAP_PDMA_MASK)
 +              return;
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_PDMA0];
 +      gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_PDMA0, false);
 +
 +      reg_base = gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_0_0];
 +      gaudi2_init_qman(hdev, reg_base, GAUDI2_QUEUE_ID_PDMA_0_0);
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_PDMA1];
 +      gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_PDMA1, false);
 +
 +      reg_base = gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_1_0];
 +      gaudi2_init_qman(hdev, reg_base, GAUDI2_QUEUE_ID_PDMA_1_0);
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_PDMA_MASK;
 +}
 +
 +static void gaudi2_init_edma_instance(struct hl_device *hdev, u8 seq)
 +{
 +      u32 reg_base, base_edma_core_id, base_edma_qman_id;
 +
 +      base_edma_core_id = DMA_CORE_ID_EDMA0 + seq;
 +      base_edma_qman_id = edma_stream_base[seq];
 +
 +      reg_base = gaudi2_dma_core_blocks_bases[base_edma_core_id];
 +      gaudi2_init_dma_core(hdev, reg_base, base_edma_core_id, false);
 +
 +      reg_base = gaudi2_qm_blocks_bases[base_edma_qman_id];
 +      gaudi2_init_qman(hdev, reg_base, base_edma_qman_id);
 +}
 +
 +static void gaudi2_init_edma(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int dcore, inst;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK) == HW_CAP_EDMA_MASK)
 +              return;
 +
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
 +                      u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(seq)))
 +                              continue;
 +
 +                      gaudi2_init_edma_instance(hdev, seq);
 +
 +                      gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_EDMA_SHIFT + seq);
 +              }
 +      }
 +}
 +
 +/*
 + * gaudi2_arm_monitors_for_virt_msix_db() - Arm monitors for writing to the virtual MSI-X doorbell.
 + * @hdev: pointer to habanalabs device structure.
 + * @sob_id: sync object ID.
 + * @first_mon_id: ID of first monitor out of 3 consecutive monitors.
 + * @interrupt_id: interrupt ID.
 + *
 + * Some initiators cannot have HBW address in their completion address registers, and thus cannot
 + * write directly to the HBW host memory of the virtual MSI-X doorbell.
 + * Instead, they are configured to LBW write to a sync object, and a monitor will do the HBW write.
 + *
 + * The mechanism in the sync manager block is composed of a master monitor with 3 messages.
 + * In addition to the HBW write, the other 2 messages are for preparing the monitor to next
 + * completion, by decrementing the sync object value and re-arming the monitor.
 + */
 +static void gaudi2_arm_monitors_for_virt_msix_db(struct hl_device *hdev, u32 sob_id,
 +                                                      u32 first_mon_id, u32 interrupt_id)
 +{
 +      u32 sob_offset, first_mon_offset, mon_offset, payload, sob_group, mode, arm, config;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 addr;
 +      u8 mask;
 +
 +      /* Reset the SOB value */
 +      sob_offset = sob_id * sizeof(u32);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset, 0);
 +
 +      /* Configure 3 monitors:
 +       * 1. Write interrupt ID to the virtual MSI-X doorbell (master monitor)
 +       * 2. Decrement SOB value by 1.
 +       * 3. Re-arm the master monitor.
 +       */
 +
 +      first_mon_offset = first_mon_id * sizeof(u32);
 +
 +      /* 2nd monitor: Decrement SOB value by 1 */
 +      mon_offset = first_mon_offset + sizeof(u32);
 +
 +      addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
 +
 +      payload = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 0x7FFF) | /* "-1" */
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_SIGN_MASK, 1) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
 +
 +      /* 3rd monitor: Re-arm the master monitor */
 +      mon_offset = first_mon_offset + 2 * sizeof(u32);
 +
 +      addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + first_mon_offset;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
 +
 +      sob_group = sob_id / 8;
 +      mask = ~BIT(sob_id & 0x7);
 +      mode = 0; /* comparison mode is "greater than or equal to" */
 +      arm = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SID_MASK, sob_group) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_MASK_MASK, mask) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOP_MASK, mode) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOD_MASK, 1);
 +
 +      payload = arm;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
 +
 +      /* 1st monitor (master): Write interrupt ID to the virtual MSI-X doorbell */
 +      mon_offset = first_mon_offset;
 +
 +      config = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_WR_NUM_MASK, 2); /* "2": 3 writes */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + mon_offset, config);
 +
 +      addr = gaudi2->virt_msix_db_dma_addr;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
 +
 +      payload = interrupt_id;
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
 +
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + mon_offset, arm);
 +}
 +
 +static void gaudi2_prepare_sm_for_virt_msix_db(struct hl_device *hdev)
 +{
 +      u32 decoder_id, sob_id, first_mon_id, interrupt_id;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /* Decoder normal/abnormal interrupts */
 +      for (decoder_id = 0 ; decoder_id < NUMBER_OF_DEC ; ++decoder_id) {
 +              if (!(prop->decoder_enabled_mask & BIT(decoder_id)))
 +                      continue;
 +
 +              sob_id = GAUDI2_RESERVED_SOB_DEC_NRM_FIRST + decoder_id;
 +              first_mon_id = GAUDI2_RESERVED_MON_DEC_NRM_FIRST + 3 * decoder_id;
 +              interrupt_id = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + 2 * decoder_id;
 +              gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
 +
 +              sob_id = GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + decoder_id;
 +              first_mon_id = GAUDI2_RESERVED_MON_DEC_ABNRM_FIRST + 3 * decoder_id;
 +              interrupt_id += 1;
 +              gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
 +      }
 +}
 +
 +static void gaudi2_init_sm(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 cq_address;
 +      u32 reg_val;
 +      int i;
 +
 +      /* Enable HBW/LBW CQ for completion monitors */
 +      reg_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_CQ_EN_MASK, 1);
 +      reg_val |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_LBW_EN_MASK, 1);
 +
 +      for (i = 0 ; i < GAUDI2_MAX_PENDING_CS ; i++)
 +              WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + (4 * i), reg_val);
 +
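+      /* Here i == GAUDI2_MAX_PENDING_CS, the index of the monitor used below for KDMA */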
 +      /* Enable only HBW CQ for KDMA completion monitor */
 +      reg_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_CQ_EN_MASK, 1);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + (4 * i), reg_val);
 +
 +      /* Init CQ0 DB - configure the monitor to trigger MSI-X interrupt */
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0, lower_32_bits(gaudi2->virt_msix_db_dma_addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0, upper_32_bits(gaudi2->virt_msix_db_dma_addr));
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0, GAUDI2_IRQ_NUM_COMPLETION);
 +
 +      for (i = 0 ; i < GAUDI2_RESERVED_CQ_NUMBER ; i++) {
+              cq_address = hdev->completion_queue[i].bus_address;
 +
 +              WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + (4 * i),
 +                                                      lower_32_bits(cq_address));
 +              WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + (4 * i),
 +                                                      upper_32_bits(cq_address));
 +              WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + (4 * i),
 +                                                      ilog2(HL_CQ_SIZE_IN_BYTES));
 +      }
 +
+      /* Configure kernel ASID and MMU BP */
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_SEC, 0x10000);
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_NONE_SEC_PRIV, 0);
 +
 +      /* Initialize sync objects and monitors which are used for the virtual MSI-X doorbell */
 +      gaudi2_prepare_sm_for_virt_msix_db(hdev);
 +}
 +
 +static void gaudi2_init_mme_acc(struct hl_device *hdev, u32 reg_base)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 reg_val;
 +      int i;
 +
 +      reg_val = FIELD_PREP(MME_ACC_INTR_MASK_WBC_ERR_RESP_MASK, 0);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_POS_INF_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_NEG_INF_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_NAN_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_RESULT_POS_INF_MASK, 1);
 +      reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_RESULT_NEG_INF_MASK, 1);
 +
 +      WREG32(reg_base + MME_ACC_INTR_MASK_OFFSET, reg_val);
 +      WREG32(reg_base + MME_ACC_AP_LFSR_POLY_OFFSET, 0x80DEADAF);
 +
 +      for (i = 0 ; i < MME_NUM_OF_LFSR_SEEDS ; i++) {
 +              WREG32(reg_base + MME_ACC_AP_LFSR_SEED_SEL_OFFSET, i);
 +              WREG32(reg_base + MME_ACC_AP_LFSR_SEED_WDATA_OFFSET, gaudi2->lfsr_rand_seeds[i]);
 +      }
 +}
 +
 +static void gaudi2_init_dcore_mme(struct hl_device *hdev, int dcore_id,
 +                                                      bool config_qman_only)
 +{
 +      u32 queue_id_base, reg_base;
 +
 +      switch (dcore_id) {
 +      case 0:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0;
 +              break;
 +      case 1:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0;
 +              break;
 +      case 2:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0;
 +              break;
 +      case 3:
 +              queue_id_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0;
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Invalid dcore id %u\n", dcore_id);
 +              return;
 +      }
 +
 +      if (!config_qman_only) {
 +              reg_base = gaudi2_mme_acc_blocks_bases[dcore_id];
 +              gaudi2_init_mme_acc(hdev, reg_base);
 +      }
 +
 +      reg_base = gaudi2_qm_blocks_bases[queue_id_base];
 +      gaudi2_init_qman(hdev, reg_base, queue_id_base);
 +}
 +
 +static void gaudi2_init_mme(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int i;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_MME_MASK) == HW_CAP_MME_MASK)
 +              return;
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              gaudi2_init_dcore_mme(hdev, i, false);
 +
 +              gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_MME_SHIFT + i);
 +      }
 +}
 +
 +static void gaudi2_init_tpc_cfg(struct hl_device *hdev, u32 reg_base)
 +{
 +      /* Mask arithmetic and QM interrupts in TPC */
 +      WREG32(reg_base + TPC_CFG_TPC_INTR_MASK_OFFSET, 0x23FFFE);
 +
 +      /* Set 16 cache lines */
 +      WREG32(reg_base + TPC_CFG_MSS_CONFIG_OFFSET,
 +                      2 << DCORE0_TPC0_CFG_MSS_CONFIG_ICACHE_FETCH_LINE_NUM_SHIFT);
 +}
 +
 +struct gaudi2_tpc_init_cfg_data {
 +      enum gaudi2_queue_id dcore_tpc_qid_base[NUM_OF_DCORES];
 +};
 +
 +static void gaudi2_init_tpc_config(struct hl_device *hdev, int dcore, int inst,
 +                                      u32 offset, struct iterate_module_ctx *ctx)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_tpc_init_cfg_data *cfg_data = ctx->data;
 +      u32 queue_id_base;
 +      u8 seq;
 +
 +      queue_id_base = cfg_data->dcore_tpc_qid_base[dcore] + (inst * NUM_OF_PQ_PER_QMAN);
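+      /* j indexes hdev->user_interrupt[]; it starts past the first
+       * user_dec_intr_count entries, which are used for the decoder interrupts.
+       */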
 +
 +      if (dcore == 0 && inst == (NUM_DCORE0_TPC - 1))
 +              /* gets last sequence number */
 +              seq = NUM_OF_DCORES * NUM_OF_TPC_PER_DCORE;
 +      else
 +              seq = dcore * NUM_OF_TPC_PER_DCORE + inst;
 +
 +      gaudi2_init_tpc_cfg(hdev, mmDCORE0_TPC0_CFG_BASE + offset);
 +      gaudi2_init_qman(hdev, mmDCORE0_TPC0_QM_BASE + offset, queue_id_base);
 +
 +      gaudi2->tpc_hw_cap_initialized |= BIT_ULL(HW_CAP_TPC_SHIFT + seq);
 +}
 +
 +static void gaudi2_init_tpc(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_tpc_init_cfg_data init_cfg_data;
 +      struct iterate_module_ctx tpc_iter;
 +
 +      if (!hdev->asic_prop.tpc_enabled_mask)
 +              return;
 +
 +      if ((gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK) == HW_CAP_TPC_MASK)
 +              return;
 +
 +      init_cfg_data.dcore_tpc_qid_base[0] = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0;
 +      init_cfg_data.dcore_tpc_qid_base[1] = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0;
 +      init_cfg_data.dcore_tpc_qid_base[2] = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0;
 +      init_cfg_data.dcore_tpc_qid_base[3] = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0;
 +      tpc_iter.fn = &gaudi2_init_tpc_config;
 +      tpc_iter.data = &init_cfg_data;
 +      gaudi2_iterate_tpcs(hdev, &tpc_iter);
 +}
 +
 +static void gaudi2_init_rotator(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 i, reg_base, queue_id;
 +
 +      queue_id = GAUDI2_QUEUE_ID_ROT_0_0;
 +
 +      for (i = 0 ; i < NUM_OF_ROT ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
 +              reg_base = gaudi2_qm_blocks_bases[queue_id];
 +              gaudi2_init_qman(hdev, reg_base, queue_id);
 +
 +              gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_ROT_SHIFT + i);
 +      }
 +}
 +
 +static void gaudi2_init_vdec_brdg_ctrl(struct hl_device *hdev, u64 base_addr, u32 decoder_id)
 +{
 +      u32 sob_id;
 +
 +      /* VCMD normal interrupt */
 +      sob_id = GAUDI2_RESERVED_SOB_DEC_NRM_FIRST + decoder_id;
 +      WREG32(base_addr + BRDG_CTRL_NRM_MSIX_LBW_AWADDR,
 +                      mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
 +      WREG32(base_addr + BRDG_CTRL_NRM_MSIX_LBW_WDATA, GAUDI2_SOB_INCREMENT_BY_ONE);
 +
 +      /* VCMD abnormal interrupt */
 +      sob_id = GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + decoder_id;
 +      WREG32(base_addr + BRDG_CTRL_ABNRM_MSIX_LBW_AWADDR,
 +                      mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
 +      WREG32(base_addr + BRDG_CTRL_ABNRM_MSIX_LBW_WDATA, GAUDI2_SOB_INCREMENT_BY_ONE);
 +}
 +
 +static void gaudi2_init_dec(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 dcore_id, dec_id, dec_bit;
 +      u64 base_addr;
 +
 +      if (!hdev->asic_prop.decoder_enabled_mask)
 +              return;
 +
 +      if ((gaudi2->dec_hw_cap_initialized & HW_CAP_DEC_MASK) == HW_CAP_DEC_MASK)
 +              return;
 +
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
 +              for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
 +                      dec_bit = dcore_id * NUM_OF_DEC_PER_DCORE + dec_id;
 +
 +                      if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                              continue;
 +
 +                      base_addr =  mmDCORE0_DEC0_CMD_BASE +
 +                                      BRDG_CTRL_BLOCK_OFFSET +
 +                                      dcore_id * DCORE_OFFSET +
 +                                      dec_id * DCORE_VDEC_OFFSET;
 +
 +                      gaudi2_init_vdec_brdg_ctrl(hdev, base_addr, dec_bit);
 +
 +                      gaudi2->dec_hw_cap_initialized |= BIT_ULL(HW_CAP_DEC_SHIFT + dec_bit);
 +              }
 +
 +      for (dec_id = 0 ; dec_id < NUM_OF_PCIE_VDEC ; dec_id++) {
 +              dec_bit = PCIE_DEC_SHIFT + dec_id;
 +              if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
 +                      continue;
 +
 +              base_addr = mmPCIE_DEC0_CMD_BASE + BRDG_CTRL_BLOCK_OFFSET +
 +                              dec_id * DCORE_VDEC_OFFSET;
 +
 +              gaudi2_init_vdec_brdg_ctrl(hdev, base_addr, dec_bit);
 +
 +              gaudi2->dec_hw_cap_initialized |= BIT_ULL(HW_CAP_DEC_SHIFT + dec_bit);
 +      }
 +}
 +
 +static int gaudi2_mmu_update_asid_hop0_addr(struct hl_device *hdev,
 +                                      u32 stlb_base, u32 asid, u64 phys_addr)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm || !hdev->pdev)
 +              timeout_usec = GAUDI2_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
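+      /* The hop0 physical address is programmed in two parts: bits 43:12 and bits 63:44 */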
 +      WREG32(stlb_base + STLB_ASID_OFFSET, asid);
 +      WREG32(stlb_base + STLB_HOP0_PA43_12_OFFSET, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
 +      WREG32(stlb_base + STLB_HOP0_PA63_44_OFFSET, phys_addr >> MMU_HOP0_PA63_44_SHIFT);
 +      WREG32(stlb_base + STLB_BUSY_OFFSET, 0x80000000);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + STLB_BUSY_OFFSET,
 +              status,
 +              !(status & 0x80000000),
 +              1000,
 +              timeout_usec);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Timeout during MMU hop0 config of asid %d\n", asid);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_mmu_send_invalidate_cache_cmd(struct hl_device *hdev, u32 stlb_base,
 +                                      u32 start_offset, u32 inv_start_val,
 +                                      u32 flags)
 +{
 +      /* clear PMMU mem line cache (only needed in mmu range invalidation) */
 +      if (flags & MMU_OP_CLEAR_MEMCACHE)
 +              WREG32(mmPMMU_HBW_STLB_MEM_CACHE_INVALIDATION, 0x1);
 +
 +      if (flags & MMU_OP_SKIP_LOW_CACHE_INV)
 +              return;
 +
 +      WREG32(stlb_base + start_offset, inv_start_val);
 +}
 +
 +static int gaudi2_mmu_invalidate_cache_status_poll(struct hl_device *hdev, u32 stlb_base,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 status, timeout_usec, start_offset;
 +      int rc;
 +
 +      timeout_usec = (hdev->pldm) ? GAUDI2_PLDM_MMU_TIMEOUT_USEC :
 +                                      GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC;
 +
 +      /* poll PMMU mem line cache (only needed in mmu range invalidation) */
 +      if (inv_params->flags & MMU_OP_CLEAR_MEMCACHE) {
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      mmPMMU_HBW_STLB_MEM_CACHE_INV_STATUS,
 +                      status,
 +                      status & 0x1,
 +                      1000,
 +                      timeout_usec);
 +
 +              if (rc)
 +                      return rc;
 +
 +              /* Need to manually reset the status to 0 */
 +              WREG32(mmPMMU_HBW_STLB_MEM_CACHE_INV_STATUS, 0x0);
 +      }
 +
 +      /* Lower cache does not work with cache lines, hence we can skip its
 +       * invalidation upon map and invalidate only upon unmap
 +       */
 +      if (inv_params->flags & MMU_OP_SKIP_LOW_CACHE_INV)
 +              return 0;
 +
 +      start_offset = inv_params->range_invalidation ?
 +                      STLB_RANGE_CACHE_INVALIDATION_OFFSET : STLB_INV_ALL_START_OFFSET;
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + start_offset,
 +              status,
 +              !(status & 0x1),
 +              1000,
 +              timeout_usec);
 +
 +      return rc;
 +}
 +
 +bool gaudi2_is_hmmu_enabled(struct hl_device *hdev, int dcore_id, int hmmu_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 hw_cap;
 +
 +      hw_cap = HW_CAP_DCORE0_DMMU0 << (NUM_OF_HMMU_PER_DCORE * dcore_id + hmmu_id);
 +
 +      if (gaudi2->hw_cap_initialized & hw_cap)
 +              return true;
 +
 +      return false;
 +}
 +
+/* This function shall be called only for HMMUs whose capability bit is set */
 +static inline u32 get_hmmu_stlb_base(int dcore_id, int hmmu_id)
 +{
 +      u32 offset;
 +
 +      offset =  (u32) (dcore_id * DCORE_OFFSET + hmmu_id * DCORE_HMMU_OFFSET);
 +      return (u32)(mmDCORE0_HMMU0_STLB_BASE + offset);
 +}
 +
 +static void gaudi2_mmu_invalidate_cache_trigger(struct hl_device *hdev, u32 stlb_base,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 start_offset;
 +
 +      if (inv_params->range_invalidation) {
+              /* Set the address range.
+               * Note: by design, the start address written to the register is
+               * not included in the invalidation range. That is why we program
+               * an address one lower than the first address we actually want
+               * to be included in the range invalidation.
+               */
 +              u64 start = inv_params->start_va - 1;
 +
 +              start_offset = STLB_RANGE_CACHE_INVALIDATION_OFFSET;
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_START_LSB_OFFSET,
 +                              start >> MMU_RANGE_INV_VA_LSB_SHIFT);
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_START_MSB_OFFSET,
 +                              start >> MMU_RANGE_INV_VA_MSB_SHIFT);
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_END_LSB_OFFSET,
 +                              inv_params->end_va >> MMU_RANGE_INV_VA_LSB_SHIFT);
 +
 +              WREG32(stlb_base + STLB_RANGE_INV_END_MSB_OFFSET,
 +                              inv_params->end_va >> MMU_RANGE_INV_VA_MSB_SHIFT);
 +      } else {
 +              start_offset = STLB_INV_ALL_START_OFFSET;
 +      }
 +
 +      gaudi2_mmu_send_invalidate_cache_cmd(hdev, stlb_base, start_offset,
 +                                              inv_params->inv_start_val, inv_params->flags);
 +}
 +
 +static inline void gaudi2_hmmu_invalidate_cache_trigger(struct hl_device *hdev,
 +                                              int dcore_id, int hmmu_id,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 stlb_base = get_hmmu_stlb_base(dcore_id, hmmu_id);
 +
 +      gaudi2_mmu_invalidate_cache_trigger(hdev, stlb_base, inv_params);
 +}
 +
 +static inline int gaudi2_hmmu_invalidate_cache_status_poll(struct hl_device *hdev,
 +                                              int dcore_id, int hmmu_id,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      u32 stlb_base = get_hmmu_stlb_base(dcore_id, hmmu_id);
 +
 +      return gaudi2_mmu_invalidate_cache_status_poll(hdev, stlb_base, inv_params);
 +}
 +
 +static int gaudi2_hmmus_invalidate_cache(struct hl_device *hdev,
 +                                              struct gaudi2_cache_invld_params *inv_params)
 +{
 +      int dcore_id, hmmu_id;
 +
 +      /* first send all invalidation commands */
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE ; hmmu_id++) {
 +                      if (!gaudi2_is_hmmu_enabled(hdev, dcore_id, hmmu_id))
 +                              continue;
 +
 +                      gaudi2_hmmu_invalidate_cache_trigger(hdev, dcore_id, hmmu_id, inv_params);
 +              }
 +      }
 +
 +      /* next, poll all invalidations status */
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE ; hmmu_id++) {
 +                      int rc;
 +
 +                      if (!gaudi2_is_hmmu_enabled(hdev, dcore_id, hmmu_id))
 +                              continue;
 +
 +                      rc = gaudi2_hmmu_invalidate_cache_status_poll(hdev, dcore_id, hmmu_id,
 +                                                                              inv_params);
 +                      if (rc)
 +                              return rc;
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard, u32 flags)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_cache_invld_params invld_params;
 +      int rc = 0;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return rc;
 +
 +      invld_params.range_invalidation = false;
 +      invld_params.inv_start_val = 1;
 +
 +      if ((flags & MMU_OP_USERPTR) && (gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
 +              invld_params.flags = flags;
 +              gaudi2_mmu_invalidate_cache_trigger(hdev, mmPMMU_HBW_STLB_BASE, &invld_params);
 +              rc = gaudi2_mmu_invalidate_cache_status_poll(hdev, mmPMMU_HBW_STLB_BASE,
 +                                                                              &invld_params);
 +      } else if (flags & MMU_OP_PHYS_PACK) {
 +              invld_params.flags = 0;
 +              rc = gaudi2_hmmus_invalidate_cache(hdev, &invld_params);
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi2_mmu_invalidate_cache_range(struct hl_device *hdev, bool is_hard,
 +                              u32 flags, u32 asid, u64 va, u64 size)
 +{
 +      struct gaudi2_cache_invld_params invld_params = {0};
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 start_va, end_va;
 +      u32 inv_start_val;
 +      int rc = 0;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return 0;
 +
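 +      /* Build the range-invalidation trigger value: enable range invalidation
 +       * and ASID matching for the given ASID
 +       */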
 +      inv_start_val = (1 << MMU_RANGE_INV_EN_SHIFT |
 +                      1 << MMU_RANGE_INV_ASID_EN_SHIFT |
 +                      asid << MMU_RANGE_INV_ASID_SHIFT);
 +      start_va = va;
 +      end_va = start_va + size;
 +
 +      if ((flags & MMU_OP_USERPTR) && (gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
 +              /* As range invalidation does not support a zero start address,
 +               * fall back to full invalidation in that case
 +               */
 +              if (start_va) {
 +                      invld_params.range_invalidation = true;
 +                      invld_params.start_va = start_va;
 +                      invld_params.end_va = end_va;
 +                      invld_params.inv_start_val = inv_start_val;
 +                      invld_params.flags = flags | MMU_OP_CLEAR_MEMCACHE;
 +              } else {
 +                      invld_params.range_invalidation = false;
 +                      invld_params.inv_start_val = 1;
 +                      invld_params.flags = flags;
 +              }
 +
 +              gaudi2_mmu_invalidate_cache_trigger(hdev, mmPMMU_HBW_STLB_BASE, &invld_params);
 +              rc = gaudi2_mmu_invalidate_cache_status_poll(hdev, mmPMMU_HBW_STLB_BASE,
 +                                                                              &invld_params);
 +              if (rc)
 +                      return rc;
 +
 +      } else if (flags & MMU_OP_PHYS_PACK) {
 +              invld_params.start_va = gaudi2_mmu_scramble_addr(hdev, start_va);
 +              invld_params.end_va = gaudi2_mmu_scramble_addr(hdev, end_va);
 +              invld_params.inv_start_val = inv_start_val;
 +              invld_params.flags = flags;
 +              rc = gaudi2_hmmus_invalidate_cache(hdev, &invld_params);
 +      }
 +
 +      return rc;
 +}
 +
 +static int gaudi2_mmu_update_hop0_addr(struct hl_device *hdev, u32 stlb_base)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 hop0_addr;
 +      u32 asid, max_asid = prop->max_asid;
 +      int rc;
 +
 +      /* it takes too much time to init all of the ASIDs on palladium */
 +      if (hdev->pldm)
 +              max_asid = min((u32) 8, max_asid);
 +
 +      for (asid = 0 ; asid < max_asid ; asid++) {
 +              hop0_addr = hdev->mmu_priv.hr.mmu_asid_hop0[asid].phys_addr;
 +              rc = gaudi2_mmu_update_asid_hop0_addr(hdev, stlb_base, asid, hop0_addr);
 +              if (rc) {
 +                      dev_err(hdev->dev, "failed to set hop0 addr for asid %d\n", asid);
 +                      return rc;
 +              }
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_init_common(struct hl_device *hdev, u32 mmu_base, u32 stlb_base)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm || !hdev->pdev)
 +              timeout_usec = GAUDI2_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC;
 +
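 +      /* Kick an invalidate-all; its completion is polled further below, after
 +       * the hop0 addresses are programmed
 +       */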
 +      WREG32(stlb_base + STLB_INV_ALL_START_OFFSET, 1);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + STLB_SRAM_INIT_OFFSET,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      if (rc)
 +              dev_notice_ratelimited(hdev->dev, "Timeout when waiting for MMU SRAM init\n");
 +
 +      rc = gaudi2_mmu_update_hop0_addr(hdev, stlb_base);
 +      if (rc)
 +              return rc;
 +
 +      WREG32(mmu_base + MMU_BYPASS_OFFSET, 0);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              stlb_base + STLB_INV_ALL_START_OFFSET,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      if (rc)
 +              dev_notice_ratelimited(hdev->dev, "Timeout when waiting for MMU invalidate all\n");
 +
 +      WREG32(mmu_base + MMU_ENABLE_OFFSET, 1);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_pci_mmu_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 mmu_base, stlb_base;
 +      int rc;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_PMMU)
 +              return 0;
 +
 +      mmu_base = mmPMMU_HBW_MMU_BASE;
 +      stlb_base = mmPMMU_HBW_STLB_BASE;
 +
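 +      /* Configure the PMMU STLB page-walk hops: first/last/follower hops and the
 +       * first lookup hop for small and large pages
 +       */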
 +      RMWREG32_SHIFTED(stlb_base + STLB_HOP_CONFIGURATION_OFFSET,
 +              (0 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_HOP_SHIFT) |
 +              (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_SHIFT) |
 +              (4 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_SHIFT) |
 +              (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_LAST_HOP_SHIFT) |
 +              (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_SHIFT),
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_LAST_HOP_MASK |
 +              PMMU_HBW_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK);
 +
 +      WREG32(stlb_base + STLB_LL_LOOKUP_MASK_63_32_OFFSET, 0);
 +
 +      if (PAGE_SIZE == SZ_64K) {
 +              /* Set page sizes to 64K on hop5 and 16M on hop4 + enable 8 bit hops */
 +              RMWREG32_SHIFTED(mmu_base + MMU_STATIC_MULTI_PAGE_SIZE_OFFSET,
 +                      FIELD_PREP(DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP5_PAGE_SIZE_MASK, 4) |
 +                      FIELD_PREP(DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK, 3) |
 +                      FIELD_PREP(
 +                              DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_CFG_8_BITS_HOP_MODE_EN_MASK,
 +                              1),
 +                      DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP5_PAGE_SIZE_MASK |
 +                      DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK |
 +                      DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_CFG_8_BITS_HOP_MODE_EN_MASK);
 +      }
 +
 +      WREG32(mmu_base + MMU_SPI_SEI_MASK_OFFSET, GAUDI2_PMMU_SPI_SEI_ENABLE_MASK);
 +
 +      rc = gaudi2_mmu_init_common(hdev, mmu_base, stlb_base);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2->hw_cap_initialized |= HW_CAP_PMMU;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_dcore_hmmu_init(struct hl_device *hdev, int dcore_id,
 +                              int hmmu_id)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 offset, mmu_base, stlb_base, hw_cap;
 +      u8 dmmu_seq;
 +      int rc;
 +
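 +      /* Sequential HMMU index across the whole chip, used to derive the matching
 +       * capability and isolation bits
 +       */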
 +      dmmu_seq = NUM_OF_HMMU_PER_DCORE * dcore_id + hmmu_id;
 +      hw_cap = HW_CAP_DCORE0_DMMU0 << dmmu_seq;
 +
 +      /*
 +       * return if DMMU is already initialized or if it's not out of
 +       * isolation (due to cluster binning)
 +       */
 +      if ((gaudi2->hw_cap_initialized & hw_cap) || !(prop->hmmu_hif_enabled_mask & BIT(dmmu_seq)))
 +              return 0;
 +
 +      offset = (u32) (dcore_id * DCORE_OFFSET + hmmu_id * DCORE_HMMU_OFFSET);
 +      mmu_base = mmDCORE0_HMMU0_MMU_BASE + offset;
 +      stlb_base = mmDCORE0_HMMU0_STLB_BASE + offset;
 +
 +      RMWREG32(mmu_base + MMU_STATIC_MULTI_PAGE_SIZE_OFFSET, 5 /* 64MB */,
 +                      MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK);
 +
 +      RMWREG32_SHIFTED(stlb_base + STLB_HOP_CONFIGURATION_OFFSET,
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK, 0) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK, 3) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK, 3) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_LAST_HOP_MASK, 3) |
 +              FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK, 3),
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_LAST_HOP_MASK |
 +                      DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK);
 +
 +      RMWREG32(stlb_base + STLB_HOP_CONFIGURATION_OFFSET, 1,
 +                      STLB_HOP_CONFIGURATION_ONLY_LARGE_PAGE_MASK);
 +
 +      WREG32(mmu_base + MMU_SPI_SEI_MASK_OFFSET, GAUDI2_HMMU_SPI_SEI_ENABLE_MASK);
 +
 +      rc = gaudi2_mmu_init_common(hdev, mmu_base, stlb_base);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2->hw_cap_initialized |= hw_cap;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_hbm_mmu_init(struct hl_device *hdev)
 +{
 +      int rc, dcore_id, hmmu_id;
 +
 +      for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
 +              for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE; hmmu_id++) {
 +                      rc = gaudi2_dcore_hmmu_init(hdev, dcore_id, hmmu_id);
 +                      if (rc)
 +                              return rc;
 +              }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_init(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = gaudi2_pci_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_hbm_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      return 0;
 +}
 +
 +static int gaudi2_hw_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      /* Let's mark in the H/W that we have reached this point. We check
 +       * this value in the reset_before_init function to understand whether
 +       * we need to reset the chip before doing H/W init. This register is
 +       * cleared by the H/W upon H/W reset
 +       */
 +      WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
 +
 +      /* Perform read from the device to make sure device is up */
 +      RREG32(mmHW_STATE);
 +
 +      /* If iATU is done by FW, the HBM bar ALWAYS points to DRAM_PHYS_BASE.
 +       * So we set it here and if anyone tries to move it later to
 +       * a different address, there will be an error
 +       */
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              gaudi2->dram_bar_cur_addr = DRAM_PHYS_BASE;
 +
 +      /*
 +       * Before pushing u-boot/linux to device, need to set the hbm bar to
 +       * base address of dram
 +       */
 +      if (gaudi2_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
 +              dev_err(hdev->dev, "failed to map HBM bar to DRAM base address\n");
 +              return -EIO;
 +      }
 +
 +      rc = gaudi2_init_cpu(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU\n");
 +              return rc;
 +      }
 +
 +      gaudi2_init_scrambler_hbm(hdev);
 +      gaudi2_init_kdma(hdev);
 +
 +      rc = gaudi2_init_cpu_queues(hdev, GAUDI2_CPU_TIMEOUT_USEC);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n", rc);
 +              return rc;
 +      }
 +
 +      rc = gaudi2->cpucp_info_get(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get cpucp info\n");
 +              return rc;
 +      }
 +
 +      rc = gaudi2_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      gaudi2_init_pdma(hdev);
 +      gaudi2_init_edma(hdev);
 +      gaudi2_init_sm(hdev);
 +      gaudi2_init_tpc(hdev);
 +      gaudi2_init_mme(hdev);
 +      gaudi2_init_rotator(hdev);
 +      gaudi2_init_dec(hdev);
 +      gaudi2_enable_timestamp(hdev);
 +
 +      rc = gaudi2_coresight_init(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      rc = gaudi2_enable_msix(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      /* Perform read from the device to flush all configuration */
 +      RREG32(mmHW_STATE);
 +
 +      return 0;
 +
 +disable_queues:
 +      gaudi2_disable_dma_qmans(hdev);
 +      gaudi2_disable_mme_qmans(hdev);
 +      gaudi2_disable_tpc_qmans(hdev);
 +      gaudi2_disable_rot_qmans(hdev);
 +      gaudi2_disable_nic_qmans(hdev);
 +
 +      gaudi2_disable_timestamp(hdev);
 +
 +      return rc;
 +}
 +
 +/**
 + * gaudi2_send_hard_reset_cmd - common function to handle reset
 + *
 + * @hdev: pointer to the habanalabs device structure
 + *
 + * This function handles the various possible scenarios for reset.
 + * It considers whether the reset is handled by the driver or by FW, and which FW components are loaded
 + */
 +static void gaudi2_send_hard_reset_cmd(struct hl_device *hdev)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      bool heartbeat_reset, preboot_only, cpu_initialized = false;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 cpu_boot_status;
 +
 +      preboot_only = (hdev->fw_loader.fw_comp_loaded == FW_TYPE_PREBOOT_CPU);
 +      heartbeat_reset = (hdev->reset_info.curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT);
 +
 +      /*
 +       * Handle the corner case where the failure occurred while loading the cpu
 +       * management app, and the driver did not detect any failure while loading
 +       * the FW. In that scenario the driver will send only HALT_MACHINE, and no
 +       * one will respond to the request since the FW is already back in preboot
 +       * and cannot handle such a command.
 +       * In this case, the next time the management app loads it will check the
 +       * events register, which will still hold the halt indication, and will
 +       * reboot the device.
 +       * The solution is to let preboot clear all relevant registers before the
 +       * next boot, once the driver sends COMMS_RST_DEV.
 +       */
 +      cpu_boot_status = RREG32(mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS);
 +
 +      if (gaudi2 && (gaudi2->hw_cap_initialized & HW_CAP_CPU) &&
 +                      (cpu_boot_status == CPU_BOOT_STATUS_SRAM_AVAIL))
 +              cpu_initialized = true;
 +
 +      /*
 +       * When Linux/Bootfit exists, this write to the SP can be interpreted in 2 ways:
 +       * 1. FW reset: FW initiates the reset sequence
 +       * 2. driver reset: FW will start the HALT sequence (the preparations for
 +       *                  the reset but not the reset itself, as it is not
 +       *                  implemented on its side) and LKD will wait to let FW
 +       *                  complete the sequence before issuing the reset
 +       */
 +      if (!preboot_only && cpu_initialized) {
 +              WREG32(le32_to_cpu(dyn_regs->gic_host_halt_irq),
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_HALT_MACHINE].cpu_id);
 +
 +              msleep(GAUDI2_CPU_RESET_WAIT_MSEC);
 +      }
 +
 +      /*
 +       * When working with preboot (without Linux/Boot fit) we can
 +       * communicate only using the COMMS commands to issue halt/reset.
 +       *
 +       * For the case in which we are working with Linux/Bootfit this is a hail-mary
 +       * attempt to revive the card in the small chance that the f/w has
 +       * experienced a watchdog event, which caused it to return back to preboot.
 +       * In that case, triggering reset through GIC won't help. We need to
 +       * trigger the reset as if Linux wasn't loaded.
 +       *
 +       * We do it only if the reset cause was HB, because that would be the
 +       * indication of such an event.
 +       *
 +       * In case watchdog hasn't expired but we still got HB, then this won't
 +       * do any damage.
 +       */
 +
 +      if (heartbeat_reset || preboot_only || !cpu_initialized) {
 +              if (hdev->asic_prop.hard_reset_done_by_fw)
 +                      hl_fw_ask_hard_reset_without_linux(hdev);
 +              else
 +                      hl_fw_ask_halt_machine_without_linux(hdev);
 +      }
 +}
 +
 +/**
 + * gaudi2_execute_hard_reset - execute hard reset by driver/FW
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @reset_sleep_ms: sleep time in msec after reset
 + *
 + * This function executes hard reset based on if driver/FW should do the reset
 + */
 +static void gaudi2_execute_hard_reset(struct hl_device *hdev, u32 reset_sleep_ms)
 +{
 +      if (hdev->asic_prop.hard_reset_done_by_fw) {
 +              gaudi2_send_hard_reset_cmd(hdev);
 +              return;
 +      }
 +
 +      /* Set device to handle FLR by H/W as we will put the device
 +       * CPU to halt mode
 +       */
 +      WREG32(mmPCIE_AUX_FLR_CTRL,
 +                      (PCIE_AUX_FLR_CTRL_HW_CTRL_MASK | PCIE_AUX_FLR_CTRL_INT_MASK_MASK));
 +
 +      gaudi2_send_hard_reset_cmd(hdev);
 +
 +      WREG32(mmPSOC_RESET_CONF_SW_ALL_RST, 1);
 +}
 +
 +/**
 + * gaudi2_execute_soft_reset - execute soft reset by driver/FW
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @reset_sleep_ms: sleep time in msec after reset
 + * @driver_performs_reset: true if driver should perform reset instead of f/w.
 + *
 + * This function executes soft reset based on if driver/FW should do the reset
 + */
 +static void gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms,
 +                                              bool driver_performs_reset)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +
 +      if (!driver_performs_reset) {
 +              /* set SP to indicate reset request sent to FW */
 +              if (dyn_regs->cpu_rst_status)
 +                      WREG32(le32_to_cpu(dyn_regs->cpu_rst_status), CPU_RST_STATUS_NA);
 +              else
 +                      WREG32(mmCPU_RST_STATUS_TO_HOST, CPU_RST_STATUS_NA);
 +
 +              WREG32(le32_to_cpu(dyn_regs->gic_host_soft_rst_irq),
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_SOFT_RESET].cpu_id);
 +              return;
 +      }
 +
 +      /* Block access to engines, QMANs and SM during reset, these
 +       * RRs will be reconfigured after soft reset.
 +       * PCIE_MSIX is left unsecured to allow NIC packets processing during the reset.
 +       */
 +      gaudi2_write_rr_to_all_lbw_rtrs(hdev, RR_TYPE_LONG, NUM_LONG_LBW_RR - 1,
 +                                      mmDCORE0_TPC0_QM_DCCM_BASE, mmPCIE_MSIX_BASE);
 +
 +      gaudi2_write_rr_to_all_lbw_rtrs(hdev, RR_TYPE_LONG, NUM_LONG_LBW_RR - 2,
 +                              mmPCIE_MSIX_BASE + HL_BLOCK_SIZE,
 +                              mmPCIE_VDEC1_MSTR_IF_RR_SHRD_HBW_BASE + HL_BLOCK_SIZE);
 +
 +      WREG32(mmPSOC_RESET_CONF_SOFT_RST, 1);
 +}
 +
 +static void gaudi2_poll_btm_indication(struct hl_device *hdev, u32 reset_sleep_ms,
 +                                                              u32 poll_timeout_us)
 +{
 +      int i, rc = 0;
 +      u32 reg_val;
 +
 +      /* without this sleep reset will not work */
 +      msleep(reset_sleep_ms);
 +
 +      /* We poll the BTM done indication multiple times after reset due to
 +       * HW erratum 'GAUDI2_0300'
 +       */
 +      for (i = 0 ; i < GAUDI2_RESET_POLL_CNT ; i++)
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      mmPSOC_GLOBAL_CONF_BTM_FSM,
 +                      reg_val,
 +                      reg_val == 0,
 +                      1000,
 +                      poll_timeout_us);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "Timeout while waiting for device to reset 0x%x\n", reg_val);
 +}
 +
 +static void gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll_timeout_us)
 +{
 +      int i, rc = 0;
 +      u32 reg_val;
 +
 +      for (i = 0 ; i < GAUDI2_RESET_POLL_CNT ; i++)
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      mmCPU_RST_STATUS_TO_HOST,
 +                      reg_val,
 +                      reg_val == CPU_RST_STATUS_SOFT_RST_DONE,
 +                      1000,
 +                      poll_timeout_us);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "Timeout while waiting for FW to complete soft reset (0x%x)\n",
 +                              reg_val);
 +}
 +
 +static void gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 poll_timeout_us, reset_sleep_ms;
 +      bool driver_performs_reset = false;
 +
 +      if (hdev->pldm) {
 +              reset_sleep_ms = hard_reset ? GAUDI2_PLDM_HRESET_TIMEOUT_MSEC :
 +                                              GAUDI2_PLDM_SRESET_TIMEOUT_MSEC;
 +              poll_timeout_us = GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC;
 +      } else {
 +              reset_sleep_ms = GAUDI2_RESET_TIMEOUT_MSEC;
 +              poll_timeout_us = GAUDI2_RESET_POLL_TIMEOUT_USEC;
 +      }
 +
 +      if (fw_reset)
 +              goto skip_reset;
 +
 +      gaudi2_reset_arcs(hdev);
 +
 +      if (hard_reset) {
 +              driver_performs_reset = !hdev->asic_prop.hard_reset_done_by_fw;
 +              gaudi2_execute_hard_reset(hdev, reset_sleep_ms);
 +      } else {
 +              /*
 +               * As we also have to support working with preboot only (which does not
 +               * support soft reset), we have to make sure that security is disabled
 +               * before letting the driver do the reset. The user shall control the BFE
 +               * flags to avoid requesting a soft reset on a secured device running
 +               * preboot only.
 +               */
 +              driver_performs_reset = (hdev->fw_components == FW_TYPE_PREBOOT_CPU &&
 +                                                      !hdev->asic_prop.fw_security_enabled);
 +              gaudi2_execute_soft_reset(hdev, reset_sleep_ms, driver_performs_reset);
 +      }
 +
 +skip_reset:
 +      if (driver_performs_reset || hard_reset)
 +              /*
 +               * Instead of waiting for BTM indication we should wait for preboot ready:
 +               * Consider the below scenario:
 +               * 1. FW update is being triggered
 +               *        - setting the dirty bit
 +               * 2. hard reset will be triggered due to the dirty bit
 +               * 3. FW initiates the reset:
 +               *        - dirty bit cleared
 +               *        - BTM indication cleared
 +               *        - preboot ready indication cleared
 +               * 4. during hard reset:
 +               *        - BTM indication will be set
 +               *        - BIST test performed and another reset triggered
 +               * 5. only after this reset the preboot will set the preboot ready
 +               *
 +               * When polling on the BTM indication alone we can lose sync with the FW
 +               * while trying to communicate with a FW that is in the middle of reset.
 +               * To overcome this we always wait for the preboot ready indication.
 +               */
 +              if ((hdev->fw_components & FW_TYPE_PREBOOT_CPU)) {
 +                      msleep(reset_sleep_ms);
 +                      hl_fw_wait_preboot_ready(hdev);
 +              } else {
 +                      gaudi2_poll_btm_indication(hdev, reset_sleep_ms, poll_timeout_us);
 +              }
 +      else
 +              gaudi2_get_soft_rst_done_indication(hdev, poll_timeout_us);
 +
 +      if (!gaudi2)
 +              return;
 +
 +      gaudi2->dec_hw_cap_initialized &= ~(HW_CAP_DEC_MASK);
 +      gaudi2->tpc_hw_cap_initialized &= ~(HW_CAP_TPC_MASK);
 +
 +      /*
 +       * Clear NIC capability mask in order for driver to re-configure
 +       * NIC QMANs. NIC ports will not be re-configured during soft
 +       * reset as we call gaudi2_nic_init only during hard reset
 +       */
 +      gaudi2->nic_hw_cap_initialized &= ~(HW_CAP_NIC_MASK);
 +
 +      if (hard_reset) {
 +              gaudi2->hw_cap_initialized &=
 +                      ~(HW_CAP_DRAM | HW_CAP_CLK_GATE | HW_CAP_HBM_SCRAMBLER_MASK |
 +                      HW_CAP_PMMU | HW_CAP_CPU | HW_CAP_CPU_Q |
 +                      HW_CAP_SRAM_SCRAMBLER | HW_CAP_DMMU_MASK |
 +                      HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_KDMA |
 +                      HW_CAP_MME_MASK | HW_CAP_ROT_MASK);
 +
 +              memset(gaudi2->events_stat, 0, sizeof(gaudi2->events_stat));
 +      } else {
 +              gaudi2->hw_cap_initialized &=
 +                      ~(HW_CAP_CLK_GATE | HW_CAP_HBM_SCRAMBLER_SW_RESET |
 +                      HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_MME_MASK |
 +                      HW_CAP_ROT_MASK);
 +      }
 +}
 +
 +static int gaudi2_suspend(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
 +
 +      return rc;
 +}
 +
 +static int gaudi2_resume(struct hl_device *hdev)
 +{
 +      return gaudi2_init_iatu(hdev);
 +}
 +
 +static int gaudi2_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +              void *cpu_addr, dma_addr_t dma_addr, size_t size)
 +{
 +      int rc;
 +
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +#ifdef _HAS_DMA_MMAP_COHERENT
 +
 +      rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr, dma_addr, size);
 +      if (rc)
 +              dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
 +
 +#else
 +
 +      rc = remap_pfn_range(vma, vma->vm_start,
 +                              virt_to_phys(cpu_addr) >> PAGE_SHIFT,
 +                              size, vma->vm_page_prot);
 +      if (rc)
 +              dev_err(hdev->dev, "remap_pfn_range error %d", rc);
 +
 +#endif
 +
 +      return rc;
 +}
 +
 +static bool gaudi2_is_queue_enabled(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 hw_cap_mask = 0;
 +      u64 hw_tpc_cap_bit = 0;
 +      u64 hw_nic_cap_bit = 0;
 +      u64 hw_test_cap_bit = 0;
 +
 +      switch (hw_queue_id) {
 +      case GAUDI2_QUEUE_ID_PDMA_0_0:
 +      case GAUDI2_QUEUE_ID_PDMA_0_1:
 +      case GAUDI2_QUEUE_ID_PDMA_1_0:
 +              hw_cap_mask = HW_CAP_PDMA_MASK;
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0) >> 2);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT + NUM_OF_EDMA_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0) >> 2);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT + 2 * NUM_OF_EDMA_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0) >> 2);
 +              break;
 +      case GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3:
 +              hw_test_cap_bit = HW_CAP_EDMA_SHIFT + 3 * NUM_OF_EDMA_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE0_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE0_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE1_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE1_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT + 1;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE2_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE2_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT + 2;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE3_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE3_MME_0_3:
 +              hw_test_cap_bit = HW_CAP_MME_SHIFT + 3;
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE0_TPC_0_0) >> 2);
 +
 +              /* special case where cap bit refers to the first queue id */
 +              if (!hw_tpc_cap_bit)
 +                      return !!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(0));
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE1_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + NUM_OF_TPC_PER_DCORE +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE1_TPC_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE2_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (2 * NUM_OF_TPC_PER_DCORE) +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE2_TPC_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE3_TPC_5_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (3 * NUM_OF_TPC_PER_DCORE) +
 +                      ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE3_TPC_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_DCORE0_TPC_6_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_6_3:
 +              hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (4 * NUM_OF_TPC_PER_DCORE);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_ROT_0_0 ... GAUDI2_QUEUE_ID_ROT_1_3:
 +              hw_test_cap_bit = HW_CAP_ROT_SHIFT + ((hw_queue_id - GAUDI2_QUEUE_ID_ROT_0_0) >> 2);
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_NIC_0_0 ... GAUDI2_QUEUE_ID_NIC_23_3:
 +              hw_nic_cap_bit = HW_CAP_NIC_SHIFT + ((hw_queue_id - GAUDI2_QUEUE_ID_NIC_0_0) >> 2);
 +
 +              /* special case where cap bit refers to the first queue id */
 +              if (!hw_nic_cap_bit)
 +                      return !!(gaudi2->nic_hw_cap_initialized & BIT_ULL(0));
 +              break;
 +
 +      case GAUDI2_QUEUE_ID_CPU_PQ:
 +              return !!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q);
 +
 +      default:
 +              return false;
 +      }
 +
 +      if (hw_tpc_cap_bit)
 +              return  !!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(hw_tpc_cap_bit));
 +
 +      if (hw_nic_cap_bit)
 +              return  !!(gaudi2->nic_hw_cap_initialized & BIT_ULL(hw_nic_cap_bit));
 +
 +      if (hw_test_cap_bit)
 +              hw_cap_mask = BIT_ULL(hw_test_cap_bit);
 +
 +      return !!(gaudi2->hw_cap_initialized & hw_cap_mask);
 +}
 +
 +static bool gaudi2_is_arc_enabled(struct hl_device *hdev, u64 arc_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      switch (arc_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
 +              return !!(gaudi2->active_hw_arc & BIT_ULL(arc_id));
 +
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              return !!(gaudi2->active_tpc_arc & BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0));
 +
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              return !!(gaudi2->active_nic_arc & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0));
 +
 +      default:
 +              return false;
 +      }
 +}
 +
 +static void gaudi2_clr_arc_id_cap(struct hl_device *hdev, u64 arc_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      switch (arc_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
 +              gaudi2->active_hw_arc &= ~(BIT_ULL(arc_id));
 +              break;
 +
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              gaudi2->active_tpc_arc &= ~(BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0));
 +              break;
 +
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              gaudi2->active_nic_arc &= ~(BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0));
 +              break;
 +
 +      default:
 +              return;
 +      }
 +}
 +
 +static void gaudi2_set_arc_id_cap(struct hl_device *hdev, u64 arc_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      switch (arc_id) {
 +      case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
 +      case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
 +              gaudi2->active_hw_arc |= BIT_ULL(arc_id);
 +              break;
 +
 +      case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
 +              gaudi2->active_tpc_arc |= BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0);
 +              break;
 +
 +      case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
 +              gaudi2->active_nic_arc |= BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0);
 +              break;
 +
 +      default:
 +              return;
 +      }
 +}
 +
 +static void gaudi2_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 +{
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 pq_offset, reg_base, db_reg_offset, db_value;
 +
 +      if (hw_queue_id != GAUDI2_QUEUE_ID_CPU_PQ) {
 +              /*
 +               * QMAN has 4 successive PQ_PI registers, 1 for each of the QMAN PQs.
 +               * Masking the H/W queue ID with 0x3 extracts the QMAN internal PQ
 +               * number.
 +               */
 +              pq_offset = (hw_queue_id & 0x3) * 4;
 +              reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +              db_reg_offset = reg_base + QM_PQ_PI_0_OFFSET + pq_offset;
 +      } else {
 +              db_reg_offset = mmCPU_IF_PF_PQ_PI;
 +      }
 +
 +      db_value = pi;
 +
 +      /* ring the doorbell */
 +      WREG32(db_reg_offset, db_value);
 +
 +      if (hw_queue_id == GAUDI2_QUEUE_ID_CPU_PQ) {
 +              /* make sure device CPU will read latest data from host */
 +              mb();
 +              WREG32(le32_to_cpu(dyn_regs->gic_host_pi_upd_irq),
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_PI_UPDATE].cpu_id);
 +      }
 +}
 +
 +static void gaudi2_pqe_write(struct hl_device *hdev, __le64 *pqe, struct hl_bd *bd)
 +{
 +      __le64 *pbd = (__le64 *) bd;
 +
 +      /* The QMANs are on the host memory so a simple copy suffices */
 +      pqe[0] = pbd[0];
 +      pqe[1] = pbd[1];
 +}
 +
 +static void *gaudi2_dma_alloc_coherent(struct hl_device *hdev, size_t size,
 +                              dma_addr_t *dma_handle, gfp_t flags)
 +{
 +      return dma_alloc_coherent(&hdev->pdev->dev, size, dma_handle, flags);
 +}
 +
 +static void gaudi2_dma_free_coherent(struct hl_device *hdev, size_t size,
 +                              void *cpu_addr, dma_addr_t dma_handle)
 +{
 +      dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, dma_handle);
 +}
 +
 +static int gaudi2_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 +                              u32 timeout, u64 *result)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)) {
 +              if (result)
 +                      *result = 0;
 +              return 0;
 +      }
 +
 +      if (!timeout)
 +              timeout = GAUDI2_MSG_TO_CPU_TIMEOUT_USEC;
 +
 +      return hl_fw_send_cpu_message(hdev, GAUDI2_QUEUE_ID_CPU_PQ, msg, len, timeout, result);
 +}
 +
 +static void *gaudi2_dma_pool_zalloc(struct hl_device *hdev, size_t size,
 +                              gfp_t mem_flags, dma_addr_t *dma_handle)
 +{
 +      if (size > GAUDI2_DMA_POOL_BLK_SIZE)
 +              return NULL;
 +
 +      return dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
 +}
 +
 +static void gaudi2_dma_pool_free(struct hl_device *hdev, void *vaddr, dma_addr_t dma_addr)
 +{
 +      dma_pool_free(hdev->dma_pool, vaddr, dma_addr);
 +}
 +
 +static void *gaudi2_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
 +                                              dma_addr_t *dma_handle)
 +{
 +      return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
 +}
 +
 +static void gaudi2_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size, void *vaddr)
 +{
 +      hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
 +}
 +
 +static dma_addr_t gaudi2_dma_map_single(struct hl_device *hdev, void *addr, int len,
 +                                      enum dma_data_direction dir)
 +{
 +      dma_addr_t dma_addr;
 +
 +      dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, dir);
 +      if (unlikely(dma_mapping_error(&hdev->pdev->dev, dma_addr)))
 +              return 0;
 +
 +      return dma_addr;
 +}
 +
 +static void gaudi2_dma_unmap_single(struct hl_device *hdev, dma_addr_t addr, int len,
 +                                      enum dma_data_direction dir)
 +{
 +      dma_unmap_single(&hdev->pdev->dev, addr, len, dir);
 +}
 +
 +static int gaudi2_validate_cb_address(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!gaudi2_is_queue_enabled(hdev, parser->hw_queue_id)) {
 +              dev_err(hdev->dev, "h/w queue %d is disabled\n", parser->hw_queue_id);
 +              return -EINVAL;
 +      }
 +
 +      /* Just check if CB address is valid */
 +
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->sram_user_base_address,
 +                                      asic_prop->sram_end_address))
 +              return 0;
 +
 +      if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->dram_user_base_address,
 +                                      asic_prop->dram_end_address))
 +              return 0;
 +
 +      if ((gaudi2->hw_cap_initialized & HW_CAP_DMMU_MASK) &&
 +              hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                              parser->user_cb_size,
 +                                              asic_prop->dmmu.start_addr,
 +                                              asic_prop->dmmu.end_addr))
 +              return 0;
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_PMMU) {
 +              if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->pmmu.start_addr,
 +                                      asic_prop->pmmu.end_addr) ||
 +                      hl_mem_area_inside_range(
 +                                      (u64) (uintptr_t) parser->user_cb,
 +                                      parser->user_cb_size,
 +                                      asic_prop->pmmu_huge.start_addr,
 +                                      asic_prop->pmmu_huge.end_addr))
 +                      return 0;
 +
 +      } else if (gaudi2_host_phys_addr_valid((u64) (uintptr_t) parser->user_cb)) {
 +              if (!hdev->pdev)
 +                      return 0;
 +
 +              if (!device_iommu_mapped(&hdev->pdev->dev))
 +                      return 0;
 +      }
 +
 +      dev_err(hdev->dev, "CB address %p + 0x%x for internal QMAN is not valid\n",
 +              parser->user_cb, parser->user_cb_size);
 +
 +      return -EFAULT;
 +}
 +
 +static int gaudi2_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!parser->is_kernel_allocated_cb)
 +              return gaudi2_validate_cb_address(hdev, parser);
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
 +              dev_err(hdev->dev, "PMMU not initialized - Unsupported mode in Gaudi2\n");
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_send_heartbeat(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_heartbeat(hdev);
 +}
 +
 +/* Internal helper used to update the KDMA MMU-bypass and ASID configuration.
 + * Should be called while holding the proper KDMA lock.
 + */
 +static void gaudi2_kdma_set_mmbp_asid(struct hl_device *hdev,
 +                                         bool mmu_bypass, u32 asid)
 +{
 +      u32 rw_asid, rw_mmu_bp;
 +
 +      rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
 +                    (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
 +
 +      rw_mmu_bp = (!!mmu_bypass << ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_RD_SHIFT) |
 +                      (!!mmu_bypass << ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_WR_SHIFT);
 +
 +      WREG32(mmARC_FARM_KDMA_CTX_AXUSER_HB_ASID, rw_asid);
 +      WREG32(mmARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP, rw_mmu_bp);
 +}
 +
 +static void gaudi2_arm_cq_monitor(struct hl_device *hdev, u32 sob_id, u32 mon_id, u32 cq_id,
 +                                              u32 mon_payload, u32 sync_value)
 +{
 +      u32 sob_offset, mon_offset, sync_group_id, mode, mon_arm;
 +      u8 mask;
 +
 +      sob_offset = sob_id * 4;
 +      mon_offset = mon_id * 4;
 +
 +      /* Reset the SOB value */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset, 0);
 +
 +      /* Configure this address with CQ_ID 0 because CQ_EN is set */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, cq_id);
 +
 +      /* Configure this address with CS index because CQ_EN is set */
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, mon_payload);
 +
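 +      /* Arm the monitor on the SOB's sync group: mask out all other SOBs in the
 +       * group and fire when the monitored SOB value equals sync_value
 +       */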
 +      sync_group_id = sob_id / 8;
 +      mask = ~(1 << (sob_id & 0x7));
 +      mode = 1; /* comparison mode is "equal to" */
 +
 +      mon_arm = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOD_MASK, sync_value);
 +      mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOP_MASK, mode);
 +      mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_MASK_MASK, mask);
 +      mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SID_MASK, sync_group_id);
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + mon_offset, mon_arm);
 +}
 +
 +/* Submit a single copy/memset job to the KDMA engine and wait for its completion */
 +static int gaudi2_send_job_to_kdma(struct hl_device *hdev,
 +                                      u64 src_addr, u64 dst_addr,
 +                                      u32 size, bool is_memset)
 +{
 +      u32 comp_val, commit_mask, *polling_addr, timeout, status = 0;
 +      struct hl_cq_entry *cq_base;
 +      struct hl_cq *cq;
 +      u64 comp_addr;
 +      int rc;
 +
 +      gaudi2_arm_cq_monitor(hdev, GAUDI2_RESERVED_SOB_KDMA_COMPLETION,
 +                              GAUDI2_RESERVED_MON_KDMA_COMPLETION,
 +                              GAUDI2_RESERVED_CQ_KDMA_COMPLETION, 1, 1);
 +
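 +      /* KDMA signals completion by writing comp_val to the reserved SOB; the
 +       * monitor armed above then pushes an entry to the reserved CQ, which is
 +       * polled below
 +       */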
 +      comp_addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 +
 +                      (GAUDI2_RESERVED_SOB_KDMA_COMPLETION * sizeof(u32));
 +
 +      comp_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1) |
 +                      FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1);
 +
 +      WREG32(mmARC_FARM_KDMA_CTX_SRC_BASE_LO, lower_32_bits(src_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_SRC_BASE_HI, upper_32_bits(src_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_DST_BASE_LO, lower_32_bits(dst_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_DST_BASE_HI, upper_32_bits(dst_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_ADDR_LO, lower_32_bits(comp_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_ADDR_HI, upper_32_bits(comp_addr));
 +      WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_WDATA, comp_val);
 +      WREG32(mmARC_FARM_KDMA_CTX_DST_TSIZE_0, size);
 +
 +      commit_mask = FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_LIN_MASK, 1) |
 +                              FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_WR_COMP_EN_MASK, 1);
 +
 +      if (is_memset)
 +              commit_mask |= FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_MEM_SET_MASK, 1);
 +
 +      WREG32(mmARC_FARM_KDMA_CTX_COMMIT, commit_mask);
 +
 +      /* Wait for completion */
 +      cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_KDMA_COMPLETION];
 +      cq_base = cq->kernel_address;
 +      polling_addr = (u32 *)&cq_base[cq->ci];
 +
 +      if (hdev->pldm)
 +              /* allow 20 seconds of timeout for each 1MB of transfer */
 +              timeout = ((size / SZ_1M) + 1) * USEC_PER_SEC * 20;
 +      else
 +              timeout = KDMA_TIMEOUT_USEC;
 +
 +      /* Polling */
 +      rc = hl_poll_timeout_memory(
 +                      hdev,
 +                      polling_addr,
 +                      status,
 +                      (status == 1),
 +                      1000,
 +                      timeout,
 +                      true);
 +
 +      *polling_addr = 0;
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "Timeout while waiting for KDMA to be idle\n");
 +              WREG32(mmARC_FARM_KDMA_CFG_1, 1 << ARC_FARM_KDMA_CFG_1_HALT_SHIFT);
 +              return rc;
 +      }
 +
 +      cq->ci = hl_cq_inc_ptr(cq->ci);
 +
 +      return 0;
 +}
 +
 +static void gaudi2_memset_device_lbw(struct hl_device *hdev, u32 addr, u32 size, u32 val)
 +{
 +      u32 i;
 +
 +      for (i = 0 ; i < size ; i += sizeof(u32))
 +              WREG32(addr + i, val);
 +}
 +
 +static void gaudi2_qman_set_test_mode(struct hl_device *hdev, u32 hw_queue_id, bool enable)
 +{
 +      u32 reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +
 +      if (enable) {
 +              WREG32(reg_base + QM_GLBL_PROT_OFFSET, QMAN_MAKE_TRUSTED_TEST_MODE);
 +              WREG32(reg_base + QM_PQC_CFG_OFFSET, 0);
 +      } else {
 +              WREG32(reg_base + QM_GLBL_PROT_OFFSET, QMAN_MAKE_TRUSTED);
 +              WREG32(reg_base + QM_PQC_CFG_OFFSET, 1 << PDMA0_QM_PQC_CFG_EN_SHIFT);
 +      }
 +}
 +
 +static int gaudi2_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      u32 sob_offset = hdev->asic_prop.first_available_user_sob[0] * 4;
 +      u32 sob_addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
 +      u32 timeout_usec, tmp, sob_base = 1, sob_val = 0x5a5a;
 +      struct packet_msg_short *msg_short_pkt;
 +      dma_addr_t pkt_dma_addr;
 +      size_t pkt_size;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GAUDI2_PLDM_TEST_QUEUE_WAIT_USEC;
 +      else
 +              timeout_usec = GAUDI2_TEST_QUEUE_WAIT_USEC;
 +
 +      pkt_size = sizeof(*msg_short_pkt);
 +      msg_short_pkt = hl_asic_dma_pool_zalloc(hdev, pkt_size, GFP_KERNEL, &pkt_dma_addr);
 +      if (!msg_short_pkt) {
 +              dev_err(hdev->dev, "Failed to allocate packet for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              return -ENOMEM;
 +      }
 +
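 +      /* Build a MSG_SHORT packet that writes sob_val to the selected SOB; the
 +       * queue is considered healthy if polling the SOB returns that value
 +       */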
 +      tmp = (PACKET_MSG_SHORT << GAUDI2_PKT_CTL_OPCODE_SHIFT) |
 +              (1 << GAUDI2_PKT_CTL_EB_SHIFT) |
 +              (1 << GAUDI2_PKT_CTL_MB_SHIFT) |
 +              (sob_base << GAUDI2_PKT_SHORT_CTL_BASE_SHIFT) |
 +              (sob_offset << GAUDI2_PKT_SHORT_CTL_ADDR_SHIFT);
 +
 +      msg_short_pkt->value = cpu_to_le32(sob_val);
 +      msg_short_pkt->ctl = cpu_to_le32(tmp);
 +
 +      /* Reset the SOB value */
 +      WREG32(sob_addr, 0);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, pkt_size, pkt_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to send msg_short packet to H/W queue %d\n",
 +                      hw_queue_id);
 +              goto free_pkt;
 +      }
 +
 +      rc = hl_poll_timeout(
 +                      hdev,
 +                      sob_addr,
 +                      tmp,
 +                      (tmp == sob_val),
 +                      1000,
 +                      timeout_usec);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev, "H/W queue %d test failed (SOB_OBJ_0 == 0x%x)\n",
 +                      hw_queue_id, tmp);
 +              rc = -EIO;
 +      }
 +
 +      /* Reset the SOB value */
 +      WREG32(sob_addr, 0);
 +
 +free_pkt:
 +      hl_asic_dma_pool_free(hdev, (void *) msg_short_pkt, pkt_dma_addr);
 +      return rc;
 +}
 +
 +static int gaudi2_test_cpu_queue(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      /*
 +       * check the capability here, as send_cpu_message() won't update the result
 +       * value if the capability is not set
 +       */
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_test_cpu_queue(hdev);
 +}
 +
 +static int gaudi2_test_queues(struct hl_device *hdev)
 +{
 +      int i, rc, ret_val = 0;
 +
 +      for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ; i++) {
 +              if (!gaudi2_is_queue_enabled(hdev, i))
 +                      continue;
 +
 +              gaudi2_qman_set_test_mode(hdev, i, true);
 +              rc = gaudi2_test_queue(hdev, i);
 +              gaudi2_qman_set_test_mode(hdev, i, false);
 +
 +              if (rc) {
 +                      ret_val = -EINVAL;
 +                      goto done;
 +              }
 +      }
 +
 +      rc = gaudi2_test_cpu_queue(hdev);
 +      if (rc) {
 +              ret_val = -EINVAL;
 +              goto done;
 +      }
 +
 +done:
 +      return ret_val;
 +}
 +
 +static int gaudi2_compute_reset_late_init(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      size_t irq_arr_size;
 +
 +      /* TODO: missing gaudi2_nic_resume.
 +       * Until implemented nic_hw_cap_initialized will remain zeroed
 +       */
 +      gaudi2_init_arcs(hdev);
 +      gaudi2_scrub_arcs_dccm(hdev);
 +      gaudi2_init_security(hdev);
 +
 +      /* Unmask all IRQs since some could have been received during the soft reset */
 +      irq_arr_size = gaudi2->num_of_valid_hw_events * sizeof(gaudi2->hw_events[0]);
 +      return hl_fw_unmask_irq_arr(hdev, gaudi2->hw_events, irq_arr_size);
 +}
 +
 +static void gaudi2_is_tpc_engine_idle(struct hl_device *hdev, int dcore, int inst, u32 offset,
 +                                      struct iterate_module_ctx *ctx)
 +{
 +      struct gaudi2_tpc_idle_data *idle_data = ctx->data;
 +      u32 tpc_cfg_sts, qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
 +      bool is_eng_idle;
 +      int engine_idx;
 +
 +      if ((dcore == 0) && (inst == (NUM_DCORE0_TPC - 1)))
 +              engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
 +      else
 +              engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_0 +
 +                              dcore * GAUDI2_ENGINE_ID_DCORE_OFFSET + inst;
 +
 +      tpc_cfg_sts = RREG32(mmDCORE0_TPC0_CFG_STATUS + offset);
 +      qm_glbl_sts0 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS0 + offset);
 +      qm_glbl_sts1 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS1 + offset);
 +      qm_cgm_sts = RREG32(mmDCORE0_TPC0_QM_CGM_STS + offset);
 +
 +      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
 +                                              IS_TPC_IDLE(tpc_cfg_sts);
 +      *(idle_data->is_idle) &= is_eng_idle;
 +
 +      if (idle_data->mask && !is_eng_idle)
 +              set_bit(engine_idx, idle_data->mask);
 +
 +      if (idle_data->e)
 +              hl_engine_data_sprintf(idle_data->e,
 +                                      idle_data->tpc_fmt, dcore, inst,
 +                                      is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
 +}
 +
 +static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
 +                                      struct engines_data *e)
 +{
 +      u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, dma_core_idle_ind_mask,
 +              mme_arch_sts, dec_swreg15, dec_enabled_bit;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      const char *rot_fmt = "%-6d%-5d%-9s%#-14x%#-12x%s\n";
 +      unsigned long *mask = (unsigned long *) mask_arr;
 +      const char *edma_fmt = "%-6d%-6d%-9s%#-14x%#x\n";
 +      const char *mme_fmt = "%-5d%-6s%-9s%#-14x%#x\n";
 +      const char *nic_fmt = "%-5d%-9s%#-14x%#-12x\n";
 +      const char *pdma_fmt = "%-6d%-9s%#-14x%#x\n";
 +      const char *pcie_dec_fmt = "%-10d%-9s%#x\n";
 +      const char *dec_fmt = "%-6d%-5d%-9s%#x\n";
 +      bool is_idle = true, is_eng_idle;
 +      u64 offset;
 +
 +      struct gaudi2_tpc_idle_data tpc_idle_data = {
 +              .tpc_fmt = "%-6d%-5d%-9s%#-14x%#-12x%#x\n",
 +              .e = e,
 +              .mask = mask,
 +              .is_idle = &is_idle,
 +      };
 +      struct iterate_module_ctx tpc_iter = {
 +              .fn = &gaudi2_is_tpc_engine_idle,
 +              .data = &tpc_idle_data,
 +      };
 +
 +      int engine_idx, i, j;
 +
 +      /* EDMA, Two engines per Dcore */
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  EDMA  is_idle  QM_GLBL_STS0  DMA_CORE_IDLE_IND_MASK\n"
 +                      "----  ----  -------  ------------  ----------------------\n");
 +
 +      for (i = 0; i < NUM_OF_DCORES; i++) {
 +              for (j = 0 ; j < NUM_OF_EDMA_PER_DCORE ; j++) {
 +                      int seq = i * NUM_OF_EDMA_PER_DCORE + j;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(seq)))
 +                              continue;
 +
 +                      engine_idx = GAUDI2_DCORE0_ENGINE_ID_EDMA_0 +
 +                                      i * GAUDI2_ENGINE_ID_DCORE_OFFSET + j;
 +                      offset = i * DCORE_OFFSET + j * DCORE_EDMA_OFFSET;
 +
 +                      dma_core_idle_ind_mask =
 +                              RREG32(mmDCORE0_EDMA0_CORE_IDLE_IND_MASK + offset);
 +
 +                      qm_glbl_sts0 = RREG32(mmDCORE0_EDMA0_QM_GLBL_STS0 + offset);
 +                      qm_glbl_sts1 = RREG32(mmDCORE0_EDMA0_QM_GLBL_STS1 + offset);
 +                      qm_cgm_sts = RREG32(mmDCORE0_EDMA0_QM_CGM_STS + offset);
 +
 +                      is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
 +                                      IS_DMA_IDLE(dma_core_idle_ind_mask);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(engine_idx, mask);
 +
 +                      if (e)
 +                              hl_engine_data_sprintf(e, edma_fmt, i, j,
 +                                                      is_eng_idle ? "Y" : "N",
 +                                                      qm_glbl_sts0,
 +                                                      dma_core_idle_ind_mask);
 +              }
 +      }
 +
 +      /* PDMA, Two engines in Full chip */
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                                      "\nPDMA  is_idle  QM_GLBL_STS0  DMA_CORE_IDLE_IND_MASK\n"
 +                                      "----  -------  ------------  ----------------------\n");
 +
 +      for (i = 0 ; i < NUM_OF_PDMA ; i++) {
 +              engine_idx = GAUDI2_ENGINE_ID_PDMA_0 + i;
 +              offset = i * PDMA_OFFSET;
 +              dma_core_idle_ind_mask = RREG32(mmPDMA0_CORE_IDLE_IND_MASK + offset);
 +
 +              qm_glbl_sts0 = RREG32(mmPDMA0_QM_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmPDMA0_QM_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmPDMA0_QM_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
 +                              IS_DMA_IDLE(dma_core_idle_ind_mask);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, pdma_fmt, i, is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, dma_core_idle_ind_mask);
 +      }
 +
 +      /* NIC, twelve macros in Full chip */
 +      if (e && hdev->nic_ports_mask)
 +              hl_engine_data_sprintf(e,
 +                                      "\nNIC  is_idle  QM_GLBL_STS0  QM_CGM_STS\n"
 +                                      "---  -------  ------------  ----------\n");
 +
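 +      /*
 +       * Two QMANs per NIC macro: an even engine index starts a new macro
 +       * offset and an odd index adds the intra-macro QM offset.
 +       */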
 +      for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
 +              if (!(i & 1))
 +                      offset = i / 2 * NIC_OFFSET;
 +              else
 +                      offset += NIC_QM_OFFSET;
 +
 +              if (!(hdev->nic_ports_mask & BIT(i)))
 +                      continue;
 +
 +              engine_idx = GAUDI2_ENGINE_ID_NIC0_0 + i;
 +
 +              qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmNIC0_QM0_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, nic_fmt, i, is_eng_idle ? "Y" : "N",
 +                                              qm_glbl_sts0, qm_cgm_sts);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                                      "\nMME  Stub  is_idle  QM_GLBL_STS0  MME_ARCH_STATUS\n"
 +                                      "---  ----  -------  ------------  ---------------\n");
 +      /* MME, one per Dcore */
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              engine_idx = GAUDI2_DCORE0_ENGINE_ID_MME + i * GAUDI2_ENGINE_ID_DCORE_OFFSET;
 +              offset = i * DCORE_OFFSET;
 +
 +              qm_glbl_sts0 = RREG32(mmDCORE0_MME_QM_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmDCORE0_MME_QM_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmDCORE0_MME_QM_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 +              is_idle &= is_eng_idle;
 +
 +              mme_arch_sts = RREG32(mmDCORE0_MME_CTRL_LO_ARCH_STATUS + offset);
 +              is_eng_idle &= IS_MME_IDLE(mme_arch_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, mme_fmt, i, "N",
 +                              is_eng_idle ? "Y" : "N",
 +                              qm_glbl_sts0,
 +                              mme_arch_sts);
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +      }
 +
 +      /* TPC */
 +      if (e && prop->tpc_enabled_mask)
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  TPC  is_idle  QM_GLBL_STS0  QM_CGM_STS  CFG_STATUS\n"
 +                      "----  ---  -------  ------------  ----------  ----------\n");
 +
 +      gaudi2_iterate_tpcs(hdev, &tpc_iter);
 +
 +      /* Decoders, two per Dcore and two shared PCIe decoders */
 +      if (e && (prop->decoder_enabled_mask & (~PCIE_DEC_EN_MASK)))
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  DEC  is_idle  VSI_CMD_SWREG15\n"
 +                      "----  ---  -------  ---------------\n");
 +
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++) {
 +              for (j = 0 ; j < NUM_OF_DEC_PER_DCORE ; j++) {
 +                      dec_enabled_bit = 1 << (i * NUM_OF_DEC_PER_DCORE + j);
 +                      if (!(prop->decoder_enabled_mask & dec_enabled_bit))
 +                              continue;
 +
 +                      engine_idx = GAUDI2_DCORE0_ENGINE_ID_DEC_0 +
 +                                      i * GAUDI2_ENGINE_ID_DCORE_OFFSET + j;
 +                      offset = i * DCORE_OFFSET + j * DCORE_DEC_OFFSET;
 +
 +                      dec_swreg15 = RREG32(mmDCORE0_DEC0_CMD_SWREG15 + offset);
 +                      is_eng_idle = IS_DEC_IDLE(dec_swreg15);
 +                      is_idle &= is_eng_idle;
 +
 +                      if (mask && !is_eng_idle)
 +                              set_bit(engine_idx, mask);
 +
 +                      if (e)
 +                              hl_engine_data_sprintf(e, dec_fmt, i, j,
 +                                                      is_eng_idle ? "Y" : "N", dec_swreg15);
 +              }
 +      }
 +
 +      if (e && (prop->decoder_enabled_mask & PCIE_DEC_EN_MASK))
 +              hl_engine_data_sprintf(e,
 +                      "\nPCIe DEC  is_idle  VSI_CMD_SWREG15\n"
 +                      "--------  -------  ---------------\n");
 +
 +      /* Check shared(PCIe) decoders */
 +      for (i = 0 ; i < NUM_OF_DEC_PER_DCORE ; i++) {
 +              dec_enabled_bit = PCIE_DEC_SHIFT + i;
 +              if (!(prop->decoder_enabled_mask & BIT(dec_enabled_bit)))
 +                      continue;
 +
 +              engine_idx = GAUDI2_PCIE_ENGINE_ID_DEC_0 + i;
 +              offset = i * DCORE_DEC_OFFSET;
 +              dec_swreg15 = RREG32(mmPCIE_DEC0_CMD_SWREG15 + offset);
 +              is_eng_idle = IS_DEC_IDLE(dec_swreg15);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, pcie_dec_fmt, i,
 +                                              is_eng_idle ? "Y" : "N", dec_swreg15);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nCORE  ROT  is_idle  QM_GLBL_STS0  QM_CGM_STS  DMA_CORE_STS0\n"
 +                      "----  ---  -------  ------------  ----------  -------------\n");
 +
 +      for (i = 0 ; i < NUM_OF_ROT ; i++) {
 +              engine_idx = GAUDI2_ENGINE_ID_ROT_0 + i;
 +
 +              offset = i * ROT_OFFSET;
 +
 +              qm_glbl_sts0 = RREG32(mmROT0_QM_GLBL_STS0 + offset);
 +              qm_glbl_sts1 = RREG32(mmROT0_QM_GLBL_STS1 + offset);
 +              qm_cgm_sts = RREG32(mmROT0_QM_CGM_STS + offset);
 +
 +              is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(engine_idx, mask);
 +
 +              if (e)
 +                      hl_engine_data_sprintf(e, rot_fmt, i, 0, is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, qm_cgm_sts, "-");
 +      }
 +
 +      return is_idle;
 +}
 +
 +static void gaudi2_hw_queues_lock(struct hl_device *hdev)
 +      __acquires(&gaudi2->hw_queues_lock)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      spin_lock(&gaudi2->hw_queues_lock);
 +}
 +
 +static void gaudi2_hw_queues_unlock(struct hl_device *hdev)
 +      __releases(&gaudi2->hw_queues_lock)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      spin_unlock(&gaudi2->hw_queues_lock);
 +}
 +
 +static u32 gaudi2_get_pci_id(struct hl_device *hdev)
 +{
 +      return hdev->pdev->device;
 +}
 +
 +static int gaudi2_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_eeprom_data(hdev, data, max_size);
 +}
 +
 +static void gaudi2_update_eq_ci(struct hl_device *hdev, u32 val)
 +{
 +      WREG32(mmCPU_IF_EQ_RD_OFFS, val);
 +}
 +
 +static void *gaudi2_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (aggregate) {
 +              *size = (u32) sizeof(gaudi2->events_stat_aggregate);
 +              return gaudi2->events_stat_aggregate;
 +      }
 +
 +      *size = (u32) sizeof(gaudi2->events_stat);
 +      return gaudi2->events_stat;
 +}
 +
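 +/* Set MMU bypass and ASID on all AXUSER interfaces of the given dcore VDEC bridge */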
 +static void gaudi2_mmu_vdec_dcore_prepare(struct hl_device *hdev, int dcore_id,
 +                              int dcore_vdec_id, u32 rw_asid, u32 rw_mmu_bp)
 +{
 +      u32 offset = (mmDCORE0_VDEC1_BRDG_CTRL_BASE - mmDCORE0_VDEC0_BRDG_CTRL_BASE) *
 +                      dcore_vdec_id + DCORE_OFFSET * dcore_id;
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_ASID + offset, rw_asid);
 +}
 +
 +static void gaudi2_mmu_dcore_prepare(struct hl_device *hdev, int dcore_id, u32 asid)
 +{
 +      u32 rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
 +                      (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 dcore_offset = dcore_id * DCORE_OFFSET;
 +      u32 vdec_id, i, ports_offset, reg_val;
 +      u8 edma_seq_base;
 +
 +      /* EDMA */
 +      edma_seq_base = dcore_id * NUM_OF_EDMA_PER_DCORE;
 +      if (prop->edma_enabled_mask & BIT(edma_seq_base)) {
 +              WREG32(mmDCORE0_EDMA0_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
 +              WREG32(mmDCORE0_EDMA0_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
 +              WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +              WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_ASID + dcore_offset, rw_asid);
 +      }
 +
 +      if (prop->edma_enabled_mask & BIT(edma_seq_base + 1)) {
 +              WREG32(mmDCORE0_EDMA1_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
 +              WREG32(mmDCORE0_EDMA1_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
 +              WREG32(mmDCORE0_EDMA1_CORE_CTX_AXUSER_HB_ASID + dcore_offset, rw_asid);
 +              WREG32(mmDCORE0_EDMA1_CORE_CTX_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +      }
 +
 +      /* Sync Mngr */
 +      WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_NONE_SEC_PRIV + dcore_offset, asid);
 +      /*
 +       * Sync Mngrs on dcores 1 - 3 are exposed to user, so must use user ASID
 +       * for any access type
 +       */
 +      if (dcore_id > 0) {
 +              reg_val = (asid << DCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID_RD_SHIFT) |
 +                        (asid << DCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID_WR_SHIFT);
 +              WREG32(mmDCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID + dcore_offset, reg_val);
 +              WREG32(mmDCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +      }
 +
 +      WREG32(mmDCORE0_MME_CTRL_LO_MME_AXUSER_HB_MMU_BP + dcore_offset, 0);
 +      WREG32(mmDCORE0_MME_CTRL_LO_MME_AXUSER_HB_ASID + dcore_offset, rw_asid);
 +
 +      for (i = 0 ; i < NUM_OF_MME_SBTE_PORTS ; i++) {
 +              ports_offset = i * DCORE_MME_SBTE_OFFSET;
 +              WREG32(mmDCORE0_MME_SBTE0_MSTR_IF_AXUSER_HB_MMU_BP +
 +                              dcore_offset + ports_offset, 0);
 +              WREG32(mmDCORE0_MME_SBTE0_MSTR_IF_AXUSER_HB_ASID +
 +                              dcore_offset + ports_offset, rw_asid);
 +      }
 +
 +      for (i = 0 ; i < NUM_OF_MME_WB_PORTS ; i++) {
 +              ports_offset = i * DCORE_MME_WB_OFFSET;
 +              WREG32(mmDCORE0_MME_WB0_MSTR_IF_AXUSER_HB_MMU_BP +
 +                              dcore_offset + ports_offset, 0);
 +              WREG32(mmDCORE0_MME_WB0_MSTR_IF_AXUSER_HB_ASID +
 +                              dcore_offset + ports_offset, rw_asid);
 +      }
 +
 +      WREG32(mmDCORE0_MME_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
 +      WREG32(mmDCORE0_MME_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
 +
 +      /* Decoders */
 +      for (vdec_id = 0 ; vdec_id < NUM_OF_DEC_PER_DCORE ; vdec_id++) {
 +              if (prop->decoder_enabled_mask & BIT(dcore_id * NUM_OF_DEC_PER_DCORE + vdec_id))
 +                      gaudi2_mmu_vdec_dcore_prepare(hdev, dcore_id, vdec_id, rw_asid, 0);
 +      }
 +}
 +
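 +/* Same as the dcore VDEC preparation, but for the shared (PCIe) VDEC bridges */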
 +static void gudi2_mmu_vdec_shared_prepare(struct hl_device *hdev,
 +                              int shared_vdec_id, u32 rw_asid, u32 rw_mmu_bp)
 +{
 +      u32 offset = (mmPCIE_VDEC1_BRDG_CTRL_BASE - mmPCIE_VDEC0_BRDG_CTRL_BASE) * shared_vdec_id;
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_ASID + offset, rw_asid);
 +
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_ASID + offset, rw_asid);
 +}
 +
 +static void gudi2_mmu_arc_farm_arc_dup_eng_prepare(struct hl_device *hdev, int arc_farm_id,
 +                                                      u32 rw_asid, u32 rw_mmu_bp)
 +{
 +      u32 offset = (mmARC_FARM_ARC1_DUP_ENG_BASE - mmARC_FARM_ARC0_DUP_ENG_BASE) * arc_farm_id;
 +
 +      WREG32(mmARC_FARM_ARC0_DUP_ENG_AXUSER_HB_MMU_BP + offset, rw_mmu_bp);
 +      WREG32(mmARC_FARM_ARC0_DUP_ENG_AXUSER_HB_ASID + offset, rw_asid);
 +}
 +
 +static void gaudi2_arc_mmu_prepare(struct hl_device *hdev, u32 cpu_id, u32 asid)
 +{
 +      u32 reg_base, reg_offset, reg_val = 0;
 +
 +      reg_base = gaudi2_arc_blocks_bases[cpu_id];
 +
 +      /* Enable MMU and configure asid for all relevant ARC regions */
 +      reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_ARC_REGION_CFG_MMU_BP_MASK, 0);
 +      reg_val |= FIELD_PREP(ARC_FARM_ARC0_AUX_ARC_REGION_CFG_0_ASID_MASK, asid);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION3_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION4_HBM0_FW);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION5_HBM1_GC_DATA);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION6_HBM2_GC_DATA);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION7_HBM3_GC_DATA);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION9_PCIE);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION10_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION11_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION12_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION13_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +
 +      reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION14_GENERAL);
 +      WREG32(reg_base + reg_offset, reg_val);
 +}
 +
 +static int gaudi2_arc_mmu_prepare_all(struct hl_device *hdev, u32 asid)
 +{
 +      int i;
 +
 +      if (hdev->fw_components & FW_TYPE_BOOT_CPU)
 +              return hl_fw_cpucp_engine_core_asid_set(hdev, asid);
 +
 +      for (i = CPU_ID_SCHED_ARC0 ; i < NUM_OF_ARC_FARMS_ARC ; i++)
 +              gaudi2_arc_mmu_prepare(hdev, i, asid);
 +
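 +      /*
 +       * Walk the queues in steps of 4 - one entry per QMAN, as each QMAN
 +       * exposes 4 streams - and prepare the ARC of every enabled queue.
 +       */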
 +      for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i += 4) {
 +              if (!gaudi2_is_queue_enabled(hdev, i))
 +                      continue;
 +
 +              gaudi2_arc_mmu_prepare(hdev, gaudi2_queue_id_to_arc_id[i], asid);
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_mmu_shared_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 rw_asid, offset;
 +      int rc, i;
 +
 +      rw_asid = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_MASK, asid) |
 +                      FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_MASK, asid);
 +
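 +      /* PDMA */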
 +      WREG32(mmPDMA0_QM_AXUSER_NONSECURED_HB_ASID, rw_asid);
 +      WREG32(mmPDMA0_QM_AXUSER_NONSECURED_HB_MMU_BP, 0);
 +      WREG32(mmPDMA0_CORE_CTX_AXUSER_HB_ASID, rw_asid);
 +      WREG32(mmPDMA0_CORE_CTX_AXUSER_HB_MMU_BP, 0);
 +
 +      WREG32(mmPDMA1_QM_AXUSER_NONSECURED_HB_ASID, rw_asid);
 +      WREG32(mmPDMA1_QM_AXUSER_NONSECURED_HB_MMU_BP, 0);
 +      WREG32(mmPDMA1_CORE_CTX_AXUSER_HB_ASID, rw_asid);
 +      WREG32(mmPDMA1_CORE_CTX_AXUSER_HB_MMU_BP, 0);
 +
 +      /* ROT */
 +      for (i = 0 ; i < NUM_OF_ROT ; i++) {
 +              offset = i * ROT_OFFSET;
 +              WREG32(mmROT0_QM_AXUSER_NONSECURED_HB_ASID + offset, rw_asid);
 +              WREG32(mmROT0_QM_AXUSER_NONSECURED_HB_MMU_BP + offset, 0);
 +              RMWREG32(mmROT0_CPL_QUEUE_AWUSER + offset, asid, MMUBP_ASID_MASK);
 +              RMWREG32(mmROT0_DESC_HBW_ARUSER_LO + offset, asid, MMUBP_ASID_MASK);
 +              RMWREG32(mmROT0_DESC_HBW_AWUSER_LO + offset, asid, MMUBP_ASID_MASK);
 +      }
 +
 +      /* Shared Decoders are the last bits in the decoders mask */
 +      if (prop->decoder_enabled_mask & BIT(NUM_OF_DCORES * NUM_OF_DEC_PER_DCORE + 0))
 +              gudi2_mmu_vdec_shared_prepare(hdev, 0, rw_asid, 0);
 +
 +      if (prop->decoder_enabled_mask & BIT(NUM_OF_DCORES * NUM_OF_DEC_PER_DCORE + 1))
 +              gudi2_mmu_vdec_shared_prepare(hdev, 1, rw_asid, 0);
 +
 +      /* arc farm arc dup eng */
 +      for (i = 0 ; i < NUM_OF_ARC_FARMS_ARC ; i++)
 +              gudi2_mmu_arc_farm_arc_dup_eng_prepare(hdev, i, rw_asid, 0);
 +
 +      rc = gaudi2_arc_mmu_prepare_all(hdev, asid);
 +      if (rc)
 +              return rc;
 +
 +      return 0;
 +}
 +
 +static void gaudi2_tpc_mmu_prepare(struct hl_device *hdev, int dcore, int inst, u32 offset,
 +                                      struct iterate_module_ctx *ctx)
 +{
 +      struct gaudi2_tpc_mmu_data *mmu_data = ctx->data;
 +
 +      WREG32(mmDCORE0_TPC0_CFG_AXUSER_HB_MMU_BP + offset, 0);
 +      WREG32(mmDCORE0_TPC0_CFG_AXUSER_HB_ASID + offset, mmu_data->rw_asid);
 +      WREG32(mmDCORE0_TPC0_QM_AXUSER_NONSECURED_HB_MMU_BP + offset, 0);
 +      WREG32(mmDCORE0_TPC0_QM_AXUSER_NONSECURED_HB_ASID + offset, mmu_data->rw_asid);
 +}
 +
 +/* zero the MMUBP and set the ASID */
 +static int gaudi2_mmu_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      struct gaudi2_tpc_mmu_data tpc_mmu_data;
 +      struct iterate_module_ctx tpc_iter = {
 +              .fn = &gaudi2_tpc_mmu_prepare,
 +              .data = &tpc_mmu_data,
 +      };
 +      int rc, i;
 +
 +      if (asid & ~DCORE0_HMMU0_STLB_ASID_ASID_MASK) {
 +              dev_crit(hdev->dev, "asid %u is too big\n", asid);
 +              return -EINVAL;
 +      }
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_MMU_MASK))
 +              return 0;
 +
 +      rc = gaudi2_mmu_shared_prepare(hdev, asid);
 +      if (rc)
 +              return rc;
 +
 +      /* configure DCORE MMUs */
 +      tpc_mmu_data.rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
 +                              (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
 +      gaudi2_iterate_tpcs(hdev, &tpc_iter);
 +      for (i = 0 ; i < NUM_OF_DCORES ; i++)
 +              gaudi2_mmu_dcore_prepare(hdev, i, asid);
 +
 +      return 0;
 +}
 +
 +static inline bool is_info_event(u32 event)
 +{
 +      switch (event) {
 +      case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S ... GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 +
 +      /* Return true in case of a NIC status event - these events are received
 +       * periodically and are not an indication of an error.
 +       */
 +      case GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 ... GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
 +
 +static void gaudi2_print_event(struct hl_device *hdev, u16 event_type,
 +                      bool ratelimited, const char *fmt, ...)
 +{
 +      struct va_format vaf;
 +      va_list args;
 +
 +      va_start(args, fmt);
 +      vaf.fmt = fmt;
 +      vaf.va = &args;
 +
 +      if (ratelimited)
 +              dev_err_ratelimited(hdev->dev, "%s: %pV\n",
 +                      gaudi2_irq_map_table[event_type].valid ?
 +                      gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf);
 +      else
 +              dev_err(hdev->dev, "%s: %pV\n",
 +                      gaudi2_irq_map_table[event_type].valid ?
 +                      gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf);
 +
 +      va_end(args);
 +}
 +
 +static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type,
 +              struct hl_eq_ecc_data *ecc_data)
 +{
 +      u64 ecc_address = 0, ecc_syndrom = 0;
 +      u8 memory_wrapper_idx = 0;
 +
 +      ecc_address = le64_to_cpu(ecc_data->ecc_address);
 +      ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
 +      memory_wrapper_idx = ecc_data->memory_wrapper_idx;
 +
 +      gaudi2_print_event(hdev, event_type, !ecc_data->is_critical,
 +              "ECC error detected. address: %#llx. Syndrome: %#llx. block id %u. critical %u.\n",
 +              ecc_address, ecc_syndrom, memory_wrapper_idx, ecc_data->is_critical);
 +
 +      return !!ecc_data->is_critical;
 +}
 +
 +/*
 + * gaudi2_queue_idx_dec - decrement queue index (pi/ci) and handle wrap
 + *
 + * @idx: the current pi/ci value
 + * @q_len: the queue length (power of 2)
 + *
 + * @return the cyclically decremented index
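 + *
 + * Example: with q_len = 8 the mask is 7, so idx = 0 wraps around to 7 and
 + * idx = 5 decrements to 4.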
 + */
 +static inline u32 gaudi2_queue_idx_dec(u32 idx, u32 q_len)
 +{
 +      u32 mask = q_len - 1;
 +
 +      /*
 +       * Modular decrement is equivalent to adding (q_len - 1); we then keep
 +       * only the LSBs to make sure the value is in the range [0, q_len - 1].
 +       */
 +      return (idx + q_len - 1) & mask;
 +}
 +
 +/**
 + * gaudi2_print_sw_config_stream_data - print SW config stream data
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + */
 +static void gaudi2_print_sw_config_stream_data(struct hl_device *hdev,
 +                                              u32 stream, u64 qman_base)
 +{
 +      u64 cq_ptr_lo, cq_ptr_hi, cq_tsize, cq_ptr;
 +      u32 cq_ptr_lo_off, size;
 +
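 +      /* per-stream register stride, taken from two consecutive CQ_PTR_LO registers */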
 +      cq_ptr_lo_off = mmDCORE0_TPC0_QM_CQ_PTR_LO_1 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0;
 +
 +      cq_ptr_lo = qman_base + (mmDCORE0_TPC0_QM_CQ_PTR_LO_0 - mmDCORE0_TPC0_QM_BASE) +
 +                                                                      stream * cq_ptr_lo_off;
 +
 +      cq_ptr_hi = cq_ptr_lo + (mmDCORE0_TPC0_QM_CQ_PTR_HI_0 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0);
 +
 +      cq_tsize = cq_ptr_lo + (mmDCORE0_TPC0_QM_CQ_TSIZE_0 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0);
 +
 +      cq_ptr = (((u64) RREG32(cq_ptr_hi)) << 32) | RREG32(cq_ptr_lo);
 +      size = RREG32(cq_tsize);
 +      dev_info(hdev->dev, "stop on err: stream: %u, addr: %#llx, size: %x\n",
 +              stream, cq_ptr, size);
 +}
 +
 +/**
 + * gaudi2_print_last_pqes_on_err - print last PQEs on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + * @pr_sw_conf: if true print the SW config stream data (CQ PTR and SIZE)
 + */
 +static void gaudi2_print_last_pqes_on_err(struct hl_device *hdev, u32 qid_base, u32 stream,
 +                                              u64 qman_base, bool pr_sw_conf)
 +{
 +      u32 ci, qm_ci_stream_off;
 +      struct hl_hw_queue *q;
 +      u64 pq_ci;
 +      int i;
 +
 +      q = &hdev->kernel_queues[qid_base + stream];
 +
 +      qm_ci_stream_off = mmDCORE0_TPC0_QM_PQ_CI_1 - mmDCORE0_TPC0_QM_PQ_CI_0;
 +      pq_ci = qman_base + (mmDCORE0_TPC0_QM_PQ_CI_0 - mmDCORE0_TPC0_QM_BASE) +
 +                                              stream * qm_ci_stream_off;
 +
 +      hdev->asic_funcs->hw_queues_lock(hdev);
 +
 +      if (pr_sw_conf)
 +              gaudi2_print_sw_config_stream_data(hdev, stream, qman_base);
 +
 +      ci = RREG32(pq_ci);
 +
 +      /* we should start printing from ci - 1 */
 +      ci = gaudi2_queue_idx_dec(ci, HL_QUEUE_LENGTH);
 +
 +      for (i = 0 ; i < PQ_FETCHER_CACHE_SIZE ; i++) {
 +              struct hl_bd *bd;
 +              u64 addr;
 +              u32 len;
 +
 +              bd = q->kernel_address;
 +              bd += ci;
 +
 +              len = le32_to_cpu(bd->len);
 +              /* len 0 means an uninitialized entry - break */
 +              if (!len)
 +                      break;
 +
 +              addr = le64_to_cpu(bd->ptr);
 +
 +              dev_info(hdev->dev, "stop on err PQE(stream %u): ci: %u, addr: %#llx, size: %x\n",
 +                      stream, ci, addr, len);
 +
 +              /* get previous ci, wrap if needed */
 +              ci = gaudi2_queue_idx_dec(ci, HL_QUEUE_LENGTH);
 +      }
 +
 +      hdev->asic_funcs->hw_queues_unlock(hdev);
 +}
 +
 +/**
 + * print_qman_data_on_err - extract QMAN data on error
 + *
 + * @hdev: pointer to the habanalabs device structure
 + * @qid_base: first QID of the QMAN (out of 4 streams)
 + * @stream: the QMAN's stream
 + * @qman_base: base address of QMAN registers block
 + *
 + * This function attempts to extract as much data as possible on a QMAN error.
 + * For an upper CP, print the SW config stream data and the last 8 PQEs.
 + * For the lower CP, print the SW config data and the last PQEs of all 4 upper CPs.
 + */
 +static void print_qman_data_on_err(struct hl_device *hdev, u32 qid_base, u32 stream, u64 qman_base)
 +{
 +      u32 i;
 +
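 +      /* a stream index equal to QMAN_STREAMS denotes the lower CP */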
 +      if (stream != QMAN_STREAMS) {
 +              gaudi2_print_last_pqes_on_err(hdev, qid_base, stream, qman_base, true);
 +              return;
 +      }
 +
 +      gaudi2_print_sw_config_stream_data(hdev, stream, qman_base);
 +
 +      for (i = 0 ; i < QMAN_STREAMS ; i++)
 +              gaudi2_print_last_pqes_on_err(hdev, qid_base, i, qman_base, false);
 +}
 +
 +static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, u16 event_type,
 +                                                      u64 qman_base, u32 qid_base)
 +{
 +      u32 i, j, glbl_sts_val, arb_err_val, num_error_causes, error_count = 0;
 +      u64 glbl_sts_addr, arb_err_addr;
 +      char reg_desc[32];
 +
 +      glbl_sts_addr = qman_base + (mmDCORE0_TPC0_QM_GLBL_ERR_STS_0 - mmDCORE0_TPC0_QM_BASE);
 +      arb_err_addr = qman_base + (mmDCORE0_TPC0_QM_ARB_ERR_CAUSE - mmDCORE0_TPC0_QM_BASE);
 +
 +      /* Iterate through all stream GLBL_ERR_STS registers + Lower CP */
 +      for (i = 0 ; i < QMAN_STREAMS + 1 ; i++) {
 +              glbl_sts_val = RREG32(glbl_sts_addr + 4 * i);
 +
 +              if (!glbl_sts_val)
 +                      continue;
 +
 +              if (i == QMAN_STREAMS) {
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "LowerCP");
 +                      num_error_causes = GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE;
 +              } else {
 +                      snprintf(reg_desc, ARRAY_SIZE(reg_desc), "stream%u", i);
 +                      num_error_causes = GAUDI2_NUM_OF_QM_ERR_CAUSE;
 +              }
 +
 +              for (j = 0 ; j < num_error_causes ; j++)
 +                      if (glbl_sts_val & BIT(j)) {
 +                              gaudi2_print_event(hdev, event_type, true,
 +                                      "%s. err cause: %s", reg_desc,
 +                                      i == QMAN_STREAMS ?
 +                                      gaudi2_qman_lower_cp_error_cause[j] :
 +                                      gaudi2_qman_error_cause[j]);
 +                              error_count++;
 +                      }
 +
 +              print_qman_data_on_err(hdev, qid_base, i, qman_base);
 +      }
 +
 +      arb_err_val = RREG32(arb_err_addr);
 +
 +      if (!arb_err_val)
 +              goto out;
 +
 +      for (j = 0 ; j < GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE ; j++) {
 +              if (arb_err_val & BIT(j)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "ARB_ERR. err cause: %s",
 +                              gaudi2_qman_arb_error_cause[j]);
 +                      error_count++;
 +              }
 +      }
 +
 +out:
 +      return error_count;
 +}
 +
 +static void gaudi2_razwi_rr_hbw_shared_printf_info(struct hl_device *hdev,
 +                      u64 rtr_mstr_if_base_addr, bool is_write, char *name,
 +                      enum gaudi2_engine_id id, u64 *event_mask)
 +{
 +      u32 razwi_hi, razwi_lo, razwi_xy;
 +      u16 eng_id = id;
 +      u8 rd_wr_flag;
 +
 +      if (is_write) {
 +              razwi_hi = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HI);
 +              razwi_lo = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_LO);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +      } else {
 +              razwi_hi = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HI);
 +              razwi_lo = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_LO);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_READ;
 +      }
 +
 +      hl_handle_razwi(hdev, (u64)razwi_hi << 32 | razwi_lo, &eng_id, 1,
 +                              rd_wr_flag | HL_RAZWI_HBW, event_mask);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "%s-RAZWI SHARED RR HBW %s error, address %#llx, Initiator coordinates 0x%x\n",
 +              name, is_write ? "WR" : "RD", (u64)razwi_hi << 32 | razwi_lo, razwi_xy);
 +}
 +
 +static void gaudi2_razwi_rr_lbw_shared_printf_info(struct hl_device *hdev,
 +                      u64 rtr_mstr_if_base_addr, bool is_write, char *name,
 +                      enum gaudi2_engine_id id, u64 *event_mask)
 +{
 +      u64 razwi_addr = CFG_BASE;
 +      u32 razwi_xy;
 +      u16 eng_id = id;
 +      u8 rd_wr_flag;
 +
 +      if (is_write) {
 +              razwi_addr += RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +      } else {
 +              razwi_addr += RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI);
 +              razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_XY);
 +              rd_wr_flag = HL_RAZWI_READ;
 +      }
 +
 +      hl_handle_razwi(hdev, razwi_addr, &eng_id, 1, rd_wr_flag | HL_RAZWI_LBW, event_mask);
 +      dev_err_ratelimited(hdev->dev,
 +                              "%s-RAZWI SHARED RR LBW %s error, mstr_if 0x%llx, captured address 0x%llX, Initiator coordinates 0x%x\n",
 +                              name, is_write ? "WR" : "RD", rtr_mstr_if_base_addr, razwi_addr,
 +                                              razwi_xy);
 +}
 +
 +static enum gaudi2_engine_id gaudi2_razwi_calc_engine_id(struct hl_device *hdev,
 +                                              enum razwi_event_sources module, u8 module_idx)
 +{
 +      switch (module) {
 +      case RAZWI_TPC:
 +              if (module_idx == (NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES))
 +                      return GAUDI2_DCORE0_ENGINE_ID_TPC_6;
 +              return (((module_idx / NUM_OF_TPC_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
 +                              (module_idx % NUM_OF_TPC_PER_DCORE) +
 +                              (GAUDI2_DCORE0_ENGINE_ID_TPC_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0));
 +
 +      case RAZWI_MME:
 +              return ((GAUDI2_DCORE0_ENGINE_ID_MME - GAUDI2_DCORE0_ENGINE_ID_EDMA_0) +
 +                      (module_idx * ENGINE_ID_DCORE_OFFSET));
 +
 +      case RAZWI_EDMA:
 +              return (((module_idx / NUM_OF_EDMA_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
 +                      (module_idx % NUM_OF_EDMA_PER_DCORE));
 +
 +      case RAZWI_PDMA:
 +              return (GAUDI2_ENGINE_ID_PDMA_0 + module_idx);
 +
 +      case RAZWI_NIC:
 +              return (GAUDI2_ENGINE_ID_NIC0_0 + (NIC_NUMBER_OF_QM_PER_MACRO * module_idx));
 +
 +      case RAZWI_DEC:
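 +              /* module indices 8 and 9 are the two shared PCIe decoders */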
 +              if (module_idx == 8)
 +                      return GAUDI2_PCIE_ENGINE_ID_DEC_0;
 +
 +              if (module_idx == 9)
 +                      return GAUDI2_PCIE_ENGINE_ID_DEC_1;
 +              return (((module_idx / NUM_OF_DEC_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
 +                              (module_idx % NUM_OF_DEC_PER_DCORE) +
 +                              (GAUDI2_DCORE0_ENGINE_ID_DEC_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0));
 +
 +      case RAZWI_ROT:
 +              return GAUDI2_ENGINE_ID_ROT_0 + module_idx;
 +
 +      default:
 +              return GAUDI2_ENGINE_ID_SIZE;
 +      }
 +}
 +
 +/*
 + * This function handles RR (Range Register) hit events raised by
 + * initiators, not PSOC RAZWI.
 + */
 +static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
 +                              enum razwi_event_sources module, u8 module_idx,
 +                              u8 module_sub_idx, u64 *event_mask)
 +{
 +      bool via_sft = false;
 +      u32 hbw_rtr_id, lbw_rtr_id, dcore_id, dcore_rtr_id, eng_id;
 +      u64 hbw_rtr_mstr_if_base_addr, lbw_rtr_mstr_if_base_addr;
 +      u32 hbw_shrd_aw = 0, hbw_shrd_ar = 0;
 +      u32 lbw_shrd_aw = 0, lbw_shrd_ar = 0;
 +      char initiator_name[64];
 +
 +      switch (module) {
 +      case RAZWI_TPC:
 +              hbw_rtr_id = gaudi2_tpc_initiator_hbw_rtr_id[module_idx];
 +
 +              /* TODO : remove this check and depend only on tpc routers table
 +               * when SW-118828 is resolved
 +               */
 +              if (!hdev->asic_prop.fw_security_enabled &&
 +                              ((module_idx == 0) || (module_idx == 1)))
 +                      lbw_rtr_id = DCORE0_RTR0;
 +              else
 +                      lbw_rtr_id = gaudi2_tpc_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "TPC_%u", module_idx);
 +              break;
 +      case RAZWI_MME:
 +              sprintf(initiator_name, "MME_%u", module_idx);
 +              switch (module_sub_idx) {
 +              case MME_WAP0:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].wap0;
 +                      break;
 +              case MME_WAP1:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].wap1;
 +                      break;
 +              case MME_WRITE:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].write;
 +                      break;
 +              case MME_READ:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].read;
 +                      break;
 +              case MME_SBTE0:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte0;
 +                      break;
 +              case MME_SBTE1:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte1;
 +                      break;
 +              case MME_SBTE2:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte2;
 +                      break;
 +              case MME_SBTE3:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte3;
 +                      break;
 +              case MME_SBTE4:
 +                      hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte4;
 +                      break;
 +              default:
 +                      return;
 +              }
 +              lbw_rtr_id = hbw_rtr_id;
 +              break;
 +      case RAZWI_EDMA:
 +              hbw_rtr_mstr_if_base_addr = gaudi2_edma_initiator_hbw_sft[module_idx];
 +              dcore_id = module_idx / NUM_OF_EDMA_PER_DCORE;
 +              /* SFT has separate MSTR_IF for LBW, only there we can
 +               * read the LBW razwi related registers
 +               */
 +              lbw_rtr_mstr_if_base_addr = mmSFT0_LBW_RTR_IF_MSTR_IF_RR_SHRD_HBW_BASE +
 +                                                              dcore_id * SFT_DCORE_OFFSET;
 +              via_sft = true;
 +              sprintf(initiator_name, "EDMA_%u", module_idx);
 +              break;
 +      case RAZWI_PDMA:
 +              hbw_rtr_id = gaudi2_pdma_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_pdma_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "PDMA_%u", module_idx);
 +              break;
 +      case RAZWI_NIC:
 +              hbw_rtr_id = gaudi2_nic_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_nic_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "NIC_%u", module_idx);
 +              break;
 +      case RAZWI_DEC:
 +              hbw_rtr_id = gaudi2_dec_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_dec_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "DEC_%u", module_idx);
 +              break;
 +      case RAZWI_ROT:
 +              hbw_rtr_id = gaudi2_rot_initiator_hbw_rtr_id[module_idx];
 +              lbw_rtr_id = gaudi2_rot_initiator_lbw_rtr_id[module_idx];
 +              sprintf(initiator_name, "ROT_%u", module_idx);
 +              break;
 +      default:
 +              return;
 +      }
 +
 +      /* Find router mstr_if register base */
 +      if (!via_sft) {
 +              dcore_id = hbw_rtr_id / NUM_OF_RTR_PER_DCORE;
 +              dcore_rtr_id = hbw_rtr_id % NUM_OF_RTR_PER_DCORE;
 +              hbw_rtr_mstr_if_base_addr = mmDCORE0_RTR0_CTRL_BASE +
 +                              dcore_id * DCORE_OFFSET +
 +                              dcore_rtr_id * DCORE_RTR_OFFSET +
 +                              RTR_MSTR_IF_OFFSET;
 +              lbw_rtr_mstr_if_base_addr = hbw_rtr_mstr_if_base_addr +
 +                              (((s32)lbw_rtr_id - hbw_rtr_id) * DCORE_RTR_OFFSET);
 +      }
 +
 +      /* Find out event cause by reading "RAZWI_HAPPENED" registers */
 +      hbw_shrd_aw = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED);
 +      hbw_shrd_ar = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED);
 +      lbw_shrd_aw = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED);
 +      lbw_shrd_ar = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED);
 +
 +      eng_id = gaudi2_razwi_calc_engine_id(hdev, module, module_idx);
 +      if (hbw_shrd_aw) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, hbw_rtr_mstr_if_base_addr, true,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED, hbw_shrd_aw);
 +      }
 +
 +      if (hbw_shrd_ar) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, hbw_rtr_mstr_if_base_addr, false,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED, hbw_shrd_ar);
 +      }
 +
 +      if (lbw_shrd_aw) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, lbw_rtr_mstr_if_base_addr, true,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED, lbw_shrd_aw);
 +      }
 +
 +      if (lbw_shrd_ar) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, lbw_rtr_mstr_if_base_addr, false,
 +                                              initiator_name, eng_id, event_mask);
 +
 +              /* Clear event indication */
 +              WREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED, lbw_shrd_ar);
 +      }
 +}
 +
 +static void gaudi2_check_if_razwi_happened(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u8 mod_idx, sub_mod;
 +
 +      /* check all TPCs */
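 +      /* the extra index (TPC24) is reported as DCORE0 TPC6 */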
 +      for (mod_idx = 0 ; mod_idx < (NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1) ; mod_idx++) {
 +              if (prop->tpc_enabled_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, mod_idx, 0, NULL);
 +      }
 +
 +      /* check all MMEs */
 +      for (mod_idx = 0 ; mod_idx < (NUM_OF_MME_PER_DCORE * NUM_OF_DCORES) ; mod_idx++)
 +              for (sub_mod = MME_WAP0 ; sub_mod < MME_INITIATORS_MAX ; sub_mod++)
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mod_idx,
 +                                                                      sub_mod, NULL);
 +
 +      /* check all EDMAs */
 +      for (mod_idx = 0 ; mod_idx < (NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES) ; mod_idx++)
 +              if (prop->edma_enabled_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, mod_idx, 0, NULL);
 +
 +      /* check all PDMAs */
 +      for (mod_idx = 0 ; mod_idx < NUM_OF_PDMA ; mod_idx++)
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_PDMA, mod_idx, 0, NULL);
 +
 +      /* check all NICs */
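 +      /* two NIC ports share a macro, hence the module index is mod_idx >> 1 */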
 +      for (mod_idx = 0 ; mod_idx < NIC_NUMBER_OF_PORTS ; mod_idx++)
 +              if (hdev->nic_ports_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_NIC, mod_idx >> 1, 0,
 +                                                              NULL);
 +
 +      /* check all DECs */
 +      for (mod_idx = 0 ; mod_idx < NUMBER_OF_DEC ; mod_idx++)
 +              if (prop->decoder_enabled_mask & BIT(mod_idx))
 +                      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_DEC, mod_idx, 0, NULL);
 +
 +      /* check all ROTs */
 +      for (mod_idx = 0 ; mod_idx < NUM_OF_ROT ; mod_idx++)
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, mod_idx, 0, NULL);
 +}
 +
 +static const char *gaudi2_get_initiators_name(u32 rtr_id)
 +{
 +      switch (rtr_id) {
 +      case DCORE0_RTR0:
 +              return "DEC0/1/8/9, TPC24, PDMA0/1, PMMU, PCIE_IF, EDMA0/2, HMMU0/2/4/6, CPU";
 +      case DCORE0_RTR1:
 +              return "TPC0/1";
 +      case DCORE0_RTR2:
 +              return "TPC2/3";
 +      case DCORE0_RTR3:
 +              return "TPC4/5";
 +      case DCORE0_RTR4:
 +              return "MME0_SBTE0/1";
 +      case DCORE0_RTR5:
 +              return "MME0_WAP0/SBTE2";
 +      case DCORE0_RTR6:
 +              return "MME0_CTRL_WR/SBTE3";
 +      case DCORE0_RTR7:
 +              return "MME0_WAP1/CTRL_RD/SBTE4";
 +      case DCORE1_RTR0:
 +              return "MME1_WAP1/CTRL_RD/SBTE4";
 +      case DCORE1_RTR1:
 +              return "MME1_CTRL_WR/SBTE3";
 +      case DCORE1_RTR2:
 +              return "MME1_WAP0/SBTE2";
 +      case DCORE1_RTR3:
 +              return "MME1_SBTE0/1";
 +      case DCORE1_RTR4:
 +              return "TPC10/11";
 +      case DCORE1_RTR5:
 +              return "TPC8/9";
 +      case DCORE1_RTR6:
 +              return "TPC6/7";
 +      case DCORE1_RTR7:
 +              return "DEC2/3, NIC0/1/2/3/4, ARC_FARM, KDMA, EDMA1/3, HMMU1/3/5/7";
 +      case DCORE2_RTR0:
 +              return "DEC4/5, NIC5/6/7/8, EDMA4/6, HMMU8/10/12/14, ROT0";
 +      case DCORE2_RTR1:
 +              return "TPC16/17";
 +      case DCORE2_RTR2:
 +              return "TPC14/15";
 +      case DCORE2_RTR3:
 +              return "TPC12/13";
 +      case DCORE2_RTR4:
 +              return "MME2_SBTE0/1";
 +      case DCORE2_RTR5:
 +              return "MME2_WAP0/SBTE2";
 +      case DCORE2_RTR6:
 +              return "MME2_CTRL_WR/SBTE3";
 +      case DCORE2_RTR7:
 +              return "MME2_WAP1/CTRL_RD/SBTE4";
 +      case DCORE3_RTR0:
 +              return "MME3_WAP1/CTRL_RD/SBTE4";
 +      case DCORE3_RTR1:
 +              return "MME3_CTRL_WR/SBTE3";
 +      case DCORE3_RTR2:
 +              return "MME3_WAP0/SBTE2";
 +      case DCORE3_RTR3:
 +              return "MME3_SBTE0/1";
 +      case DCORE3_RTR4:
 +              return "TPC18/19";
 +      case DCORE3_RTR5:
 +              return "TPC20/21";
 +      case DCORE3_RTR6:
 +              return "TPC22/23";
 +      case DCORE3_RTR7:
 +              return "DEC6/7, NIC9/10/11, EDMA5/7, HMMU9/11/13/15, ROT1, PSOC";
 +      default:
 +              return "N/A";
 +      }
 +}
 +
 +static u16 gaudi2_get_razwi_initiators(u32 rtr_id, u16 *engines)
 +{
 +      switch (rtr_id) {
 +      case DCORE0_RTR0:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_PCIE_ENGINE_ID_DEC_0;
 +              engines[3] = GAUDI2_PCIE_ENGINE_ID_DEC_1;
 +              engines[4] = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
 +              engines[5] = GAUDI2_ENGINE_ID_PDMA_0;
 +              engines[6] = GAUDI2_ENGINE_ID_PDMA_1;
 +              engines[7] = GAUDI2_ENGINE_ID_PCIE;
 +              engines[8] = GAUDI2_DCORE0_ENGINE_ID_EDMA_0;
 +              engines[9] = GAUDI2_DCORE1_ENGINE_ID_EDMA_0;
 +              engines[10] = GAUDI2_ENGINE_ID_PSOC;
 +              return 11;
 +
 +      case DCORE0_RTR1:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_1;
 +              return 2;
 +
 +      case DCORE0_RTR2:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_3;
 +              return 2;
 +
 +      case DCORE0_RTR3:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_5;
 +              return 2;
 +
 +      case DCORE0_RTR4:
 +      case DCORE0_RTR5:
 +      case DCORE0_RTR6:
 +      case DCORE0_RTR7:
 +              engines[0] = GAUDI2_DCORE0_ENGINE_ID_MME;
 +              return 1;
 +
 +      case DCORE1_RTR0:
 +      case DCORE1_RTR1:
 +      case DCORE1_RTR2:
 +      case DCORE1_RTR3:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_MME;
 +              return 1;
 +
 +      case DCORE1_RTR4:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_5;
 +              return 2;
 +
 +      case DCORE1_RTR5:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_3;
 +              return 2;
 +
 +      case DCORE1_RTR6:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_1;
 +              return 2;
 +
 +      case DCORE1_RTR7:
 +              engines[0] = GAUDI2_DCORE1_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE1_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_ENGINE_ID_NIC0_0;
 +              engines[3] = GAUDI2_ENGINE_ID_NIC1_0;
 +              engines[4] = GAUDI2_ENGINE_ID_NIC2_0;
 +              engines[5] = GAUDI2_ENGINE_ID_NIC3_0;
 +              engines[6] = GAUDI2_ENGINE_ID_NIC4_0;
 +              engines[7] = GAUDI2_ENGINE_ID_ARC_FARM;
 +              engines[8] = GAUDI2_ENGINE_ID_KDMA;
 +              engines[9] = GAUDI2_DCORE0_ENGINE_ID_EDMA_1;
 +              engines[10] = GAUDI2_DCORE1_ENGINE_ID_EDMA_1;
 +              return 11;
 +
 +      case DCORE2_RTR0:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_ENGINE_ID_NIC5_0;
 +              engines[3] = GAUDI2_ENGINE_ID_NIC6_0;
 +              engines[4] = GAUDI2_ENGINE_ID_NIC7_0;
 +              engines[5] = GAUDI2_ENGINE_ID_NIC8_0;
 +              engines[6] = GAUDI2_DCORE2_ENGINE_ID_EDMA_0;
 +              engines[7] = GAUDI2_DCORE3_ENGINE_ID_EDMA_0;
 +              engines[8] = GAUDI2_ENGINE_ID_ROT_0;
 +              return 9;
 +
 +      case DCORE2_RTR1:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_5;
 +              return 2;
 +
 +      case DCORE2_RTR2:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_3;
 +              return 2;
 +
 +      case DCORE2_RTR3:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_1;
 +              return 2;
 +
 +      case DCORE2_RTR4:
 +      case DCORE2_RTR5:
 +      case DCORE2_RTR6:
 +      case DCORE2_RTR7:
 +              engines[0] = GAUDI2_DCORE2_ENGINE_ID_MME;
 +              return 1;
 +      case DCORE3_RTR0:
 +      case DCORE3_RTR1:
 +      case DCORE3_RTR2:
 +      case DCORE3_RTR3:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_MME;
 +              return 1;
 +      case DCORE3_RTR4:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_0;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_1;
 +              return 2;
 +      case DCORE3_RTR5:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_2;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_3;
 +              return 2;
 +      case DCORE3_RTR6:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_4;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_5;
 +              return 2;
 +      case DCORE3_RTR7:
 +              engines[0] = GAUDI2_DCORE3_ENGINE_ID_DEC_0;
 +              engines[1] = GAUDI2_DCORE3_ENGINE_ID_DEC_1;
 +              engines[2] = GAUDI2_ENGINE_ID_NIC9_0;
 +              engines[3] = GAUDI2_ENGINE_ID_NIC10_0;
 +              engines[4] = GAUDI2_ENGINE_ID_NIC11_0;
 +              engines[5] = GAUDI2_DCORE2_ENGINE_ID_EDMA_1;
 +              engines[6] = GAUDI2_DCORE3_ENGINE_ID_EDMA_1;
 +              engines[7] = GAUDI2_ENGINE_ID_ROT_1;
 +              engines[8] = GAUDI2_ENGINE_ID_ROT_0;
 +              return 9;
 +      default:
 +              return 0;
 +      }
 +}
 +
 +static void gaudi2_razwi_unmapped_addr_hbw_printf_info(struct hl_device *hdev, u32 rtr_id,
 +                                                      u64 rtr_ctrl_base_addr, bool is_write,
 +                                                      u64 *event_mask)
 +{
 +      u16 engines[HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR], num_of_eng;
 +      u32 razwi_hi, razwi_lo;
 +      u8 rd_wr_flag;
 +
 +      num_of_eng = gaudi2_get_razwi_initiators(rtr_id, &engines[0]);
 +
 +      if (is_write) {
 +              razwi_hi = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_ADDR_HI);
 +              razwi_lo = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_ADDR_LO);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_SET, 0x1);
 +      } else {
 +              razwi_hi = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_ADDR_HI);
 +              razwi_lo = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_ADDR_LO);
 +              rd_wr_flag = HL_RAZWI_READ;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_SET, 0x1);
 +      }
 +
 +      hl_handle_razwi(hdev, (u64)razwi_hi << 32 | razwi_lo, &engines[0], num_of_eng,
 +                              rd_wr_flag | HL_RAZWI_HBW, event_mask);
 +      dev_err_ratelimited(hdev->dev,
 +              "RAZWI PSOC unmapped HBW %s error, rtr id %u, address %#llx\n",
 +              is_write ? "WR" : "RD", rtr_id, (u64)razwi_hi << 32 | razwi_lo);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "Initiators: %s\n", gaudi2_get_initiators_name(rtr_id));
 +}
 +
 +static void gaudi2_razwi_unmapped_addr_lbw_printf_info(struct hl_device *hdev, u32 rtr_id,
 +                                                      u64 rtr_ctrl_base_addr, bool is_write,
 +                                                      u64 *event_mask)
 +{
 +      u16 engines[HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR], num_of_eng;
 +      u64 razwi_addr = CFG_BASE;
 +      u8 rd_wr_flag;
 +
 +      num_of_eng = gaudi2_get_razwi_initiators(rtr_id, &engines[0]);
 +
 +      if (is_write) {
 +              razwi_addr += RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_ADDR);
 +              rd_wr_flag = HL_RAZWI_WRITE;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_SET, 0x1);
 +      } else {
 +              razwi_addr += RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_ADDR);
 +              rd_wr_flag = HL_RAZWI_READ;
 +
 +              /* Clear set indication */
 +              WREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_SET, 0x1);
 +      }
 +
 +      hl_handle_razwi(hdev, razwi_addr, &engines[0], num_of_eng, rd_wr_flag | HL_RAZWI_LBW,
 +                      event_mask);
 +      dev_err_ratelimited(hdev->dev,
 +              "RAZWI PSOC unmapped LBW %s error, rtr id %u, address 0x%llX\n",
 +              is_write ? "WR" : "RD", rtr_id, razwi_addr);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "Initiators: %s\n", gaudi2_get_initiators_name(rtr_id));
 +}
 +
 +/* PSOC RAZWI interrupt occurs only when trying to access a bad address */
 +static int gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *event_mask)
 +{
 +      u32 hbw_aw_set, hbw_ar_set, lbw_aw_set, lbw_ar_set, rtr_id, dcore_id, dcore_rtr_id, xy,
 +                                              razwi_mask_info, razwi_intr = 0, error_count = 0;
 +      int rtr_map_arr_len = NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES;
 +      u64 rtr_ctrl_base_addr;
 +
 +      if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX)) {
 +              razwi_intr = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT);
 +              if (!razwi_intr)
 +                      return 0;
 +      }
 +
 +      razwi_mask_info = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_MASK_INFO);
 +      xy = FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_AXUSER_L_MASK, razwi_mask_info);
 +
 +      dev_err_ratelimited(hdev->dev,
 +              "PSOC RAZWI interrupt: Mask %d, AR %d, AW %d, AXUSER_L 0x%x AXUSER_H 0x%x\n",
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_MASK_MASK, razwi_mask_info),
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_WAS_AR_MASK, razwi_mask_info),
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_WAS_AW_MASK, razwi_mask_info),
 +              xy,
 +              FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_AXUSER_H_MASK, razwi_mask_info));
 +
 +      if (xy == 0) {
 +              dev_err_ratelimited(hdev->dev,
 +                              "PSOC RAZWI interrupt: received event from 0 rtr coordinates\n");
 +              goto clear;
 +      }
 +
 +      /* Find router id by router coordinates */
 +      for (rtr_id = 0 ; rtr_id < rtr_map_arr_len ; rtr_id++)
 +              if (rtr_coordinates_to_rtr_id[rtr_id] == xy)
 +                      break;
 +
 +      if (rtr_id == rtr_map_arr_len) {
 +              dev_err_ratelimited(hdev->dev,
 +                              "PSOC RAZWI interrupt: invalid rtr coordinates (0x%x)\n", xy);
 +              goto clear;
 +      }
 +
 +      /* Find router mstr_if register base */
 +      dcore_id = rtr_id / NUM_OF_RTR_PER_DCORE;
 +      dcore_rtr_id = rtr_id % NUM_OF_RTR_PER_DCORE;
 +      rtr_ctrl_base_addr = mmDCORE0_RTR0_CTRL_BASE + dcore_id * DCORE_OFFSET +
 +                              dcore_rtr_id * DCORE_RTR_OFFSET;
 +
 +      hbw_aw_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_SET);
 +      hbw_ar_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_SET);
 +      lbw_aw_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_SET);
 +      lbw_ar_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_SET);
 +
 +      if (hbw_aw_set)
 +              gaudi2_razwi_unmapped_addr_hbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, true, event_mask);
 +
 +      if (hbw_ar_set)
 +              gaudi2_razwi_unmapped_addr_hbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, false, event_mask);
 +
 +      if (lbw_aw_set)
 +              gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, true, event_mask);
 +
 +      if (lbw_ar_set)
 +              gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id,
 +                                              rtr_ctrl_base_addr, false, event_mask);
 +
 +      error_count++;
 +
 +clear:
 +      /* Clear Interrupts only on pldm or if f/w doesn't handle interrupts */
 +      if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX))
 +              WREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT, razwi_intr);
 +
 +      return error_count;
 +}
 +
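+/* Decode and clear a single QMAN SEI status register. Returns the number of
+ * error causes that were set.
+ */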
 +static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base, u16 event_type)
 +{
 +      u32 i, sts_val, sts_clr_val = 0, error_count = 0;
 +
 +      sts_val = RREG32(qman_base + QM_SEI_STATUS_OFFSET);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_qm_sei_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      WREG32(qman_base + QM_SEI_STATUS_OFFSET, sts_clr_val);
 +
 +      return error_count;
 +}
 +
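+/* Map an AXI error response event to its QMAN block and RAZWI module, then
+ * handle the QMAN SEI status (and, optionally, RAZWI and global errors).
+ */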
 +static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type,
 +                                      bool extended_err_check, u64 *event_mask)
 +{
 +      enum razwi_event_sources module;
 +      u32 error_count = 0;
 +      u64 qman_base;
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC23_AXI_ERR_RSP:
 +              index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP;
 +              qman_base = mmDCORE0_TPC0_QM_BASE +
 +                              (index / NUM_OF_TPC_PER_DCORE) * DCORE_OFFSET +
 +                              (index % NUM_OF_TPC_PER_DCORE) * DCORE_TPC_OFFSET;
 +              module = RAZWI_TPC;
 +              break;
 +      case GAUDI2_EVENT_TPC24_AXI_ERR_RSP:
 +              qman_base = mmDCORE0_TPC6_QM_BASE;
 +              module = RAZWI_TPC;
 +              break;
 +      case GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME2_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME3_CTRL_AXI_ERROR_RESPONSE:
 +              index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) /
 +                              (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE -
 +                                              GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE);
 +              qman_base = mmDCORE0_MME_QM_BASE + index * DCORE_OFFSET;
 +              module = RAZWI_MME;
 +              break;
 +      case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP:
 +              index = event_type - GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP;
 +              qman_base = mmPDMA0_QM_BASE + index * PDMA_OFFSET;
 +              module = RAZWI_PDMA;
 +              break;
 +      case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE:
 +              index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE;
 +              qman_base = mmROT0_QM_BASE + index * ROT_OFFSET;
 +              module = RAZWI_ROT;
 +              break;
 +      default:
 +              return 0;
 +      }
 +
 +      error_count = _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
 +
+      /* There is a single event per NIC macro, so we should check both of its QMAN blocks */
 +      if (event_type >= GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE &&
 +                      event_type <= GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE)
 +              error_count += _gaudi2_handle_qm_sei_err(hdev,
 +                                      qman_base + NIC_QM_OFFSET, event_type);
 +
 +      if (extended_err_check) {
 +              /* check if RAZWI happened */
 +              gaudi2_ack_module_razwi_event_handler(hdev, module, 0, 0, event_mask);
 +              hl_check_for_glbl_errors(hdev);
 +      }
 +
 +      return error_count;
 +}
 +
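+/* Map a QMAN error event to its queue-id base and QMAN block and handle the
+ * generic QMAN error. EDMA QMs also get SEI/RAZWI handling here since they
+ * have no dedicated AXI error response event.
+ */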
 +static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      u32 qid_base, error_count = 0;
 +      u64 qman_base;
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_TPC5_QM:
 +              index = event_type - GAUDI2_EVENT_TPC0_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE0_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC6_QM ... GAUDI2_EVENT_TPC11_QM:
 +              index = event_type - GAUDI2_EVENT_TPC6_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE1_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC12_QM ... GAUDI2_EVENT_TPC17_QM:
 +              index = event_type - GAUDI2_EVENT_TPC12_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE2_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC18_QM ... GAUDI2_EVENT_TPC23_QM:
 +              index = event_type - GAUDI2_EVENT_TPC18_QM;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 + index * QMAN_STREAMS;
 +              qman_base = mmDCORE3_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_TPC24_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0;
 +              qman_base = mmDCORE0_TPC6_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME0_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0;
 +              qman_base = mmDCORE0_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME1_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0;
 +              qman_base = mmDCORE1_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME2_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0;
 +              qman_base = mmDCORE2_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_MME3_QM:
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0;
 +              qman_base = mmDCORE3_MME_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA0_QM:
 +              index = 0;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0;
 +              qman_base = mmDCORE0_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA1_QM:
 +              index = 1;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0;
 +              qman_base = mmDCORE0_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA2_QM:
 +              index = 2;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0;
 +              qman_base = mmDCORE1_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA3_QM:
 +              index = 3;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0;
 +              qman_base = mmDCORE1_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA4_QM:
 +              index = 4;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0;
 +              qman_base = mmDCORE2_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA5_QM:
 +              index = 5;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0;
 +              qman_base = mmDCORE2_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA6_QM:
 +              index = 6;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0;
 +              qman_base = mmDCORE3_EDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_HDMA7_QM:
 +              index = 7;
 +              qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0;
 +              qman_base = mmDCORE3_EDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_PDMA0_QM:
 +              qid_base = GAUDI2_QUEUE_ID_PDMA_0_0;
 +              qman_base = mmPDMA0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_PDMA1_QM:
 +              qid_base = GAUDI2_QUEUE_ID_PDMA_1_0;
 +              qman_base = mmPDMA1_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_ROTATOR0_ROT0_QM:
 +              qid_base = GAUDI2_QUEUE_ID_ROT_0_0;
 +              qman_base = mmROT0_QM_BASE;
 +              break;
 +      case GAUDI2_EVENT_ROTATOR1_ROT1_QM:
 +              qid_base = GAUDI2_QUEUE_ID_ROT_1_0;
 +              qman_base = mmROT1_QM_BASE;
 +              break;
 +      default:
 +              return 0;
 +      }
 +
 +      error_count = gaudi2_handle_qman_err_generic(hdev, event_type, qman_base, qid_base);
 +
 +      /* Handle EDMA QM SEI here because there is no AXI error response event for EDMA */
 +      if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM) {
 +              error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, index, 0, event_mask);
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
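+/* Decode and clear the ARC farm SEI interrupt status register */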
 +static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev, u16 event_type)
 +{
 +      u32 i, sts_val, sts_clr_val = 0, error_count = 0;
 +
 +      sts_val = RREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_STS);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_arc_sei_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_CLR, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev, u16 event_type)
 +{
 +      u32 i, sts_val, sts_clr_val = 0, error_count = 0;
 +
 +      sts_val = RREG32(mmCPU_IF_CPU_SEI_INTR_STS);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_cpu_sei_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(mmCPU_IF_CPU_SEI_INTR_CLR, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, u16 event_type,
 +                                      struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause,
 +                                      u64 *event_mask)
 +{
 +      u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data);
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_ROT_ERR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_rot_error_cause[i]);
 +                      error_count++;
 +              }
 +
 +      /* check if RAZWI happened */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, rot_index, 0, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
+static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, u16 event_type,
 +                                      struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause,
 +                                      u64 *event_mask)
 +{
 +      u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data);
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_TPC_INTR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "interrupt cause: %s",  gaudi2_tpc_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +
 +      /* check if RAZWI happened */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, tpc_index, 0, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, u16 event_type,
 +                                      u64 *event_mask)
 +{
 +      u32 sts_addr, sts_val, sts_clr_val = 0, error_count = 0;
 +      int i;
 +
 +      if (dec_index < NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES)
 +              /* DCORE DEC */
 +              sts_addr = mmDCORE0_VDEC0_BRDG_CTRL_CAUSE_INTR +
 +                              DCORE_OFFSET * (dec_index / NUM_OF_DEC_PER_DCORE) +
 +                              DCORE_VDEC_OFFSET * (dec_index % NUM_OF_DEC_PER_DCORE);
 +      else
 +              /* PCIE DEC */
 +              sts_addr = mmPCIE_VDEC0_BRDG_CTRL_CAUSE_INTR + PCIE_VDEC_OFFSET *
 +                              (dec_index - NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES);
 +
 +      sts_val = RREG32(sts_addr);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_DEC_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_dec_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      /* check if RAZWI happened */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_DEC, dec_index, 0, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
+      /* Write 1 to clear errors */
 +      WREG32(sts_addr, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, u16 event_type,
 +                                      u64 *event_mask)
 +{
 +      u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0;
 +      int i;
 +
 +      sts_addr = mmDCORE0_MME_CTRL_LO_INTR_CAUSE + DCORE_OFFSET * mme_index;
 +      sts_clr_addr = mmDCORE0_MME_CTRL_LO_INTR_CLEAR + DCORE_OFFSET * mme_index;
 +
 +      sts_val = RREG32(sts_addr);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MME_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_mme_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      /* check if RAZWI happened */
 +      for (i = MME_WRITE ; i < MME_INITIATORS_MAX ; i++)
 +              gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, i, event_mask);
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(sts_clr_addr, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data)
 +{
 +      int i, error_count = 0;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_mme_sbte_error_cause[i]);
 +                      error_count++;
 +              }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, u16 event_type,
 +                                      u64 *event_mask)
 +{
 +      u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0;
 +      int i;
 +
 +      sts_addr = mmDCORE0_MME_ACC_INTR_CAUSE + DCORE_OFFSET * mme_index;
 +      sts_clr_addr = mmDCORE0_MME_ACC_INTR_CLEAR + DCORE_OFFSET * mme_index;
 +
 +      sts_val = RREG32(sts_addr);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE ; i++) {
 +              if (sts_val & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", guadi2_mme_wap_error_cause[i]);
 +                      sts_clr_val |= BIT(i);
 +                      error_count++;
 +              }
 +      }
 +
 +      /* check if RAZWI happened on WAP0/1 */
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, MME_WAP0, event_mask);
 +      gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, MME_WAP1, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      WREG32(sts_clr_addr, sts_clr_val);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      /* If an AXI read or write error is received, an error is reported and
+       * an interrupt message is sent. Due to an HW erratum, when reading the cause
+       * register of the KDMA engine, the reported error is always HBW even if
+       * the actual error was caused by an LBW KDMA transaction.
 +       */
 +      for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_kdma_core_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
 +              if (intr_cause_data & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_dma_core_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
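+/* Check all four RAZWI-happened indications (HBW/LBW x AW/AR) of the PCIE
+ * master RR interface, report each one that is set and clear it.
+ */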
 +static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, u64 *event_mask)
 +{
 +      u32 mstr_if_base_addr = mmPCIE_MSTR_RR_MSTR_IF_RR_SHRD_HBW_BASE, razwi_happened_addr;
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, mstr_if_base_addr, true, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_hbw_shared_printf_info(hdev, mstr_if_base_addr, false, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, mstr_if_base_addr, true, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +
 +      razwi_happened_addr = mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED;
 +      if (RREG32(razwi_happened_addr)) {
 +              gaudi2_razwi_rr_lbw_shared_printf_info(hdev, mstr_if_base_addr, false, "PCIE",
 +                                                      GAUDI2_ENGINE_ID_PCIE, event_mask);
 +              WREG32(razwi_happened_addr, 0x1);
 +      }
 +}
 +
 +static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u16 event_type,
 +                                      u64 intr_cause_data, u64 *event_mask)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE ; i++) {
 +              if (!(intr_cause_data & BIT_ULL(i)))
 +                      continue;
 +
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "err cause: %s", gaudi2_pcie_addr_dec_error_cause[i]);
 +              error_count++;
 +
 +              switch (intr_cause_data & BIT_ULL(i)) {
 +              case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_AXI_LBW_ERR_INTR_MASK:
 +                      hl_check_for_glbl_errors(hdev);
 +                      break;
 +              case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_BAD_ACCESS_INTR_MASK:
 +                      gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(hdev, event_mask);
 +                      break;
 +              }
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u16 event_type,
 +                              u64 intr_cause_data)
 +
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE ; i++) {
 +              if (intr_cause_data & BIT_ULL(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_pmmu_fatal_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE ; i++) {
 +              if (intr_cause_data & BIT_ULL(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_hif_fatal_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +      }
 +
 +      return error_count;
 +}
 +
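+/* If a page-error capture is valid, reconstruct the faulting VA and AXI
+ * transaction id, report the page fault and clear the capture register.
+ */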
 +static void gaudi2_handle_page_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu,
 +                                      u64 *event_mask)
 +{
 +      u32 valid, val, axid_l, axid_h;
 +      u64 addr;
 +
 +      valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
 +
 +      if (!(valid & DCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID_PAGE_ERR_VALID_ENTRY_MASK))
 +              return;
 +
 +      val = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE));
 +      addr = val & DCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA_63_32_MASK;
 +      addr <<= 32;
 +      addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA));
 +
 +      axid_l = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_LSB));
 +      axid_h = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_MSB));
 +
 +      dev_err_ratelimited(hdev->dev, "%s page fault on va 0x%llx, transaction id 0x%llX\n",
 +                              is_pmmu ? "PMMU" : "HMMU", addr, ((u64)axid_h << 32) + axid_l);
 +      hl_handle_page_fault(hdev, addr, 0, is_pmmu, event_mask);
 +
 +      WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE), 0);
 +}
 +
 +static void gaudi2_handle_access_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu)
 +{
 +      u32 valid, val;
 +      u64 addr;
 +
 +      valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
 +
 +      if (!(valid & DCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID_ACCESS_ERR_VALID_ENTRY_MASK))
 +              return;
 +
 +      val = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE));
 +      addr = val & DCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE_VA_63_32_MASK;
 +      addr <<= 32;
 +      addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE_VA));
 +
 +      dev_err_ratelimited(hdev->dev, "%s access error on va 0x%llx\n",
 +                              is_pmmu ? "PMMU" : "HMMU", addr);
 +      WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE), 0);
 +}
 +
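+/* Decode the MMU SPI/SEI cause register: cause 0 is a page error and cause 1
+ * is an access error; the cause register and interrupt are cleared afterwards.
+ */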
 +static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, u16 event_type,
 +                                              u64 mmu_base, bool is_pmmu, u64 *event_mask)
 +{
 +      u32 spi_sei_cause, interrupt_clr = 0x0, error_count = 0;
 +      int i;
 +
 +      spi_sei_cause = RREG32(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET);
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE ; i++) {
 +              if (spi_sei_cause & BIT(i)) {
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s", gaudi2_mmu_spi_sei[i].cause);
 +
 +                      if (i == 0)
 +                              gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, event_mask);
 +                      else if (i == 1)
 +                              gaudi2_handle_access_error(hdev, mmu_base, is_pmmu);
 +
 +                      if (gaudi2_mmu_spi_sei[i].clear_bit >= 0)
 +                              interrupt_clr |= BIT(gaudi2_mmu_spi_sei[i].clear_bit);
 +
 +                      error_count++;
 +              }
 +      }
 +
 +      /* Clear cause */
 +      WREG32_AND(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET, ~spi_sei_cause);
 +
 +      /* Clear interrupt */
 +      WREG32(mmu_base + MMU_INTERRUPT_CLR_OFFSET, interrupt_clr);
 +
 +      return error_count;
 +}
 +
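+/* Handle sync manager errors: decode the SM SEI cause (with its matching log
+ * value) and the CQ interrupt, clearing both indications.
+ */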
 +static int gaudi2_handle_sm_err(struct hl_device *hdev, u16 event_type, u8 sm_index)
 +{
 +      u32 sei_cause_addr, sei_cause_val, sei_cause_cause, sei_cause_log,
 +              cq_intr_addr, cq_intr_val, cq_intr_queue_index, error_count = 0;
 +      int i;
 +
 +      sei_cause_addr = mmDCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE + DCORE_OFFSET * sm_index;
 +      cq_intr_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_INTR + DCORE_OFFSET * sm_index;
 +
 +      sei_cause_val = RREG32(sei_cause_addr);
 +      sei_cause_cause = FIELD_GET(DCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE_CAUSE_MASK, sei_cause_val);
 +      cq_intr_val = RREG32(cq_intr_addr);
 +
 +      /* SEI interrupt */
 +      if (sei_cause_cause) {
 +              /* There are corresponding SEI_CAUSE_log bits for every SEI_CAUSE_cause bit */
 +              sei_cause_log = FIELD_GET(DCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE_LOG_MASK,
 +                                      sei_cause_val);
 +
 +              for (i = 0 ; i < GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE ; i++) {
 +                      if (!(sei_cause_cause & BIT(i)))
 +                              continue;
 +
 +                      gaudi2_print_event(hdev, event_type, true,
 +                              "err cause: %s. %s: 0x%X\n",
 +                              gaudi2_sm_sei_cause[i].cause_name,
 +                              gaudi2_sm_sei_cause[i].log_name,
 +                              sei_cause_log);
 +                      error_count++;
 +                      break;
 +              }
 +
 +              /* Clear SM_SEI_CAUSE */
 +              WREG32(sei_cause_addr, 0);
 +      }
 +
 +      /* CQ interrupt */
 +      if (cq_intr_val & DCORE0_SYNC_MNGR_GLBL_CQ_INTR_CQ_SEC_INTR_MASK) {
 +              cq_intr_queue_index =
 +                              FIELD_GET(DCORE0_SYNC_MNGR_GLBL_CQ_INTR_CQ_INTR_QUEUE_INDEX_MASK,
 +                                      cq_intr_val);
 +
 +              dev_err_ratelimited(hdev->dev, "SM%u err. err cause: CQ_INTR. queue index: %u\n",
 +                              sm_index, cq_intr_queue_index);
 +              error_count++;
 +
 +              /* Clear CQ_INTR */
 +              WREG32(cq_intr_addr, 0);
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
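+/* Resolve the HMMU/PMMU block base from the event type and delegate to the
+ * generic MMU SPI/SEI handler.
+ */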
 +static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      bool is_pmmu = false;
 +      u32 error_count = 0;
 +      u64 mmu_base;
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU3_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM) / 3;
 +              mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_3_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP);
 +              mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU11_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM) / 3;
 +              mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_11_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP);
 +              mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU4_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM) / 3;
 +              mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_4_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP);
 +              mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
 +              index = (event_type - GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM) / 3;
 +              mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP:
 +              index = (event_type - GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP);
 +              mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
 +              break;
 +      case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR:
 +      case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0:
 +              is_pmmu = true;
 +              mmu_base = mmPMMU_HBW_MMU_BASE;
 +              break;
 +      default:
 +              return 0;
 +      }
 +
 +      error_count = gaudi2_handle_mmu_spi_sei_generic(hdev, event_type, mmu_base,
 +                                                      is_pmmu, event_mask);
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
+
 +/* returns true if hard reset is required (ECC DERR or Read parity), false otherwise (ECC SERR) */
 +static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev,
 +                      struct hl_eq_hbm_sei_read_err_intr_info *rd_err_data, u32 err_cnt)
 +{
 +      u32 addr, beat, beat_shift;
 +      bool rc = false;
 +
 +      dev_err_ratelimited(hdev->dev,
 +                      "READ ERROR count: ECC SERR: %d, ECC DERR: %d, RD_PARITY: %d\n",
 +                      FIELD_GET(HBM_ECC_SERR_CNTR_MASK, err_cnt),
 +                      FIELD_GET(HBM_ECC_DERR_CNTR_MASK, err_cnt),
 +                      FIELD_GET(HBM_RD_PARITY_CNTR_MASK, err_cnt));
 +
 +      addr = le32_to_cpu(rd_err_data->dbg_rd_err_addr.rd_addr_val);
 +      dev_err_ratelimited(hdev->dev,
 +                      "READ ERROR address: sid(%u), bg(%u), ba(%u), col(%u), row(%u)\n",
 +                      FIELD_GET(HBM_RD_ADDR_SID_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_BG_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_BA_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_COL_MASK, addr),
 +                      FIELD_GET(HBM_RD_ADDR_ROW_MASK, addr));
 +
 +      /* For each beat (RDQS edge), look for possible errors and print relevant info */
 +      for (beat = 0 ; beat < 4 ; beat++) {
 +              if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                      (HBM_RD_ERR_SERR_BEAT0_MASK << beat))
 +                      dev_err_ratelimited(hdev->dev, "Beat%d ECC SERR: DM: %#x, Syndrome: %#x\n",
 +                                              beat,
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_dm),
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_syndrome));
 +
 +              if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                      (HBM_RD_ERR_DERR_BEAT0_MASK << beat)) {
 +                      dev_err_ratelimited(hdev->dev, "Beat%d ECC DERR: DM: %#x, Syndrome: %#x\n",
 +                                              beat,
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_dm),
 +                                              le32_to_cpu(rd_err_data->dbg_rd_err_syndrome));
 +                      rc |= true;
 +              }
 +
 +              beat_shift = beat * HBM_RD_ERR_BEAT_SHIFT;
 +              if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                      (HBM_RD_ERR_PAR_ERR_BEAT0_MASK << beat_shift)) {
 +                      dev_err_ratelimited(hdev->dev,
 +                                      "Beat%d read PARITY: DM: %#x, PAR data: %#x\n",
 +                                      beat,
 +                                      le32_to_cpu(rd_err_data->dbg_rd_err_dm),
 +                                      (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
 +                                              (HBM_RD_ERR_PAR_DATA_BEAT0_MASK << beat_shift)) >>
 +                                              (HBM_RD_ERR_PAR_DATA_BEAT0_SHIFT + beat_shift));
 +                      rc |= true;
 +              }
 +
 +              dev_err_ratelimited(hdev->dev, "Beat%d DQ data:\n", beat);
 +              dev_err_ratelimited(hdev->dev, "\t0x%08x\n",
 +                                      le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2]));
 +              dev_err_ratelimited(hdev->dev, "\t0x%08x\n",
 +                                      le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2 + 1]));
 +      }
 +
 +      return rc;
 +}
 +
 +static void gaudi2_hbm_sei_print_wr_par_info(struct hl_device *hdev,
 +                      struct hl_eq_hbm_sei_wr_par_intr_info *wr_par_err_data, u32 err_cnt)
 +{
 +      struct hbm_sei_wr_cmd_address *wr_cmd_addr = wr_par_err_data->dbg_last_wr_cmds;
 +      u32 i, curr_addr, derr = wr_par_err_data->dbg_derr;
 +
 +      dev_err_ratelimited(hdev->dev, "WRITE PARITY ERROR count: %d\n", err_cnt);
 +
 +      dev_err_ratelimited(hdev->dev, "CK-0 DERR: 0x%02x, CK-1 DERR: 0x%02x\n",
 +                              derr & 0x3, derr & 0xc);
 +
 +      /* JIRA H6-3286 - the following prints may not be valid */
 +      dev_err_ratelimited(hdev->dev, "Last latched write commands addresses:\n");
 +      for (i = 0 ; i < HBM_WR_PAR_CMD_LIFO_LEN ; i++) {
 +              curr_addr = le32_to_cpu(wr_cmd_addr[i].dbg_wr_cmd_addr);
 +              dev_err_ratelimited(hdev->dev,
 +                              "\twrite cmd[%u]: Address: SID(%u) BG(%u) BA(%u) COL(%u).\n",
 +                              i,
 +                              FIELD_GET(WR_PAR_LAST_CMD_SID_MASK, curr_addr),
 +                              FIELD_GET(WR_PAR_LAST_CMD_BG_MASK, curr_addr),
 +                              FIELD_GET(WR_PAR_LAST_CMD_BA_MASK, curr_addr),
 +                              FIELD_GET(WR_PAR_LAST_CMD_COL_MASK, curr_addr));
 +      }
 +}
 +
 +static void gaudi2_hbm_sei_print_ca_par_info(struct hl_device *hdev,
 +              struct hl_eq_hbm_sei_ca_par_intr_info *ca_par_err_data, u32 err_cnt)
 +{
 +      __le32 *col_cmd = ca_par_err_data->dbg_col;
 +      __le16 *row_cmd = ca_par_err_data->dbg_row;
 +      u32 i;
 +
 +      dev_err_ratelimited(hdev->dev, "CA ERROR count: %d\n", err_cnt);
 +
 +      dev_err_ratelimited(hdev->dev, "Last latched C&R bus commands:\n");
 +      for (i = 0 ; i < HBM_CA_ERR_CMD_LIFO_LEN ; i++)
 +              dev_err_ratelimited(hdev->dev, "cmd%u: ROW(0x%04x) COL(0x%05x)\n", i,
 +                      le16_to_cpu(row_cmd[i]) & (u16)GENMASK(13, 0),
 +                      le32_to_cpu(col_cmd[i]) & (u32)GENMASK(17, 0));
 +}
 +
 +/* Returns true if hard reset is needed or false otherwise */
 +static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type,
 +                                      struct hl_eq_hbm_sei_data *sei_data)
 +{
 +      bool require_hard_reset = false;
 +      u32 hbm_id, mc_id, cause_idx;
 +
 +      hbm_id = (event_type - GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE) / 4;
 +      mc_id = ((event_type - GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE) / 2) % 2;
 +
 +      cause_idx = sei_data->hdr.sei_cause;
 +      if (cause_idx > GAUDI2_NUM_OF_HBM_SEI_CAUSE - 1) {
+              gaudi2_print_event(hdev, event_type, true,
+                      "Invalid HBM SEI event cause (%d) provided by FW\n", cause_idx);
 +              return true;
 +      }
 +
 +      gaudi2_print_event(hdev, event_type, !sei_data->hdr.is_critical,
 +              "System %s Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n",
 +              sei_data->hdr.is_critical ? "Critical" : "Non-critical",
 +              hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel,
 +              hbm_mc_sei_cause[cause_idx]);
 +
 +      /* Print error-specific info */
 +      switch (cause_idx) {
 +      case HBM_SEI_CATTRIP:
 +              require_hard_reset = true;
 +              break;
 +
+      case HBM_SEI_CMD_PARITY_EVEN:
 +              gaudi2_hbm_sei_print_ca_par_info(hdev, &sei_data->ca_parity_even_info,
 +                                              le32_to_cpu(sei_data->hdr.cnt));
 +              require_hard_reset = true;
 +              break;
 +
+      case HBM_SEI_CMD_PARITY_ODD:
 +              gaudi2_hbm_sei_print_ca_par_info(hdev, &sei_data->ca_parity_odd_info,
 +                                              le32_to_cpu(sei_data->hdr.cnt));
 +              require_hard_reset = true;
 +              break;
 +
 +      case HBM_SEI_WRITE_DATA_PARITY_ERR:
 +              gaudi2_hbm_sei_print_wr_par_info(hdev, &sei_data->wr_parity_info,
 +                                              le32_to_cpu(sei_data->hdr.cnt));
 +              require_hard_reset = true;
 +              break;
 +
 +      case HBM_SEI_READ_ERR:
 +              /* Unlike other SEI events, read error requires further processing of the
 +               * raw data in order to determine the root cause.
 +               */
 +              require_hard_reset = gaudi2_hbm_sei_handle_read_err(hdev,
 +                                                              &sei_data->read_err_info,
 +                                                              le32_to_cpu(sei_data->hdr.cnt));
 +              break;
 +
 +      default:
 +              break;
 +      }
 +
 +      require_hard_reset |= !!sei_data->hdr.is_critical;
 +
 +      return require_hard_reset;
 +}
 +
 +static int gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u16 event_type,
 +                              u64 intr_cause_data)
 +{
 +      if (intr_cause_data) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "temperature error cause: %#llx", intr_cause_data);
 +              return 1;
 +      }
 +
 +      return 0;
 +}
 +
 +static int gaudi2_handle_hbm_mc_spi(struct hl_device *hdev, u64 intr_cause_data)
 +{
 +      u32 i, error_count = 0;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE ; i++)
 +              if (intr_cause_data & hbm_mc_spi[i].mask) {
 +                      dev_dbg(hdev->dev, "HBM spi event: notification cause(%s)\n",
 +                              hbm_mc_spi[i].cause);
 +                      error_count++;
 +              }
 +
 +      return error_count;
 +}
 +
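+/* Update the clock throttling bookkeeping (reason bits and timestamps) for
+ * power/thermal throttling events and log the transition.
+ */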
 +static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 +{
 +      ktime_t zero_time = ktime_set(0, 0);
 +
 +      mutex_lock(&hdev->clk_throttling.lock);
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
 +              dev_dbg_ratelimited(hdev->dev, "Clock throttling due to power consumption\n");
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
+              dev_dbg_ratelimited(hdev->dev, "Power envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              dev_info_ratelimited(hdev->dev, "Clock throttling due to overheating\n");
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
 +              *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+              dev_info_ratelimited(hdev->dev, "Thermal envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid clock change event %d\n", event_type);
 +              break;
 +      }
 +
 +      mutex_unlock(&hdev->clk_throttling.lock);
 +}
 +
 +static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, u16 event_type,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
 +
 +      gaudi2_print_event(hdev, event_type, false,
 +              "FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci),
 +              q->pi, atomic_read(&q->ci));
 +}
 +
 +static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev, u16 event_type)
 +{
 +      u32 p2p_intr, msix_gw_intr, error_count = 0;
 +
 +      p2p_intr = RREG32(mmPCIE_WRAP_P2P_INTR);
 +      msix_gw_intr = RREG32(mmPCIE_WRAP_MSIX_GW_INTR);
 +
 +      if (p2p_intr) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "pcie p2p transaction terminated due to security, req_id(0x%x)\n",
 +                      RREG32(mmPCIE_WRAP_P2P_REQ_ID));
 +
 +              WREG32(mmPCIE_WRAP_P2P_INTR, 0x1);
 +              error_count++;
 +      }
 +
 +      if (msix_gw_intr) {
 +              gaudi2_print_event(hdev, event_type, true,
 +                      "pcie msi-x gen denied due to vector num check failure, vec(0x%X)\n",
 +                      RREG32(mmPCIE_WRAP_MSIX_GW_VEC));
 +
 +              WREG32(mmPCIE_WRAP_MSIX_GW_INTR, 0x1);
 +              error_count++;
 +      }
 +
 +      return error_count;
 +}
 +
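+/* Report completion of the PCIE AXI drain (LBW and/or HBW) according to the
+ * cause bits in the drain data.
+ */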
 +static int gaudi2_handle_pcie_drain(struct hl_device *hdev,
 +                      struct hl_eq_pcie_drain_ind_data *drain_data)
 +{
 +      u64 lbw_rd, lbw_wr, hbw_rd, hbw_wr, cause, error_count = 0;
 +
 +      cause = le64_to_cpu(drain_data->intr_cause.intr_cause_data);
 +      lbw_rd = le64_to_cpu(drain_data->drain_rd_addr_lbw);
 +      lbw_wr = le64_to_cpu(drain_data->drain_wr_addr_lbw);
 +      hbw_rd = le64_to_cpu(drain_data->drain_rd_addr_hbw);
 +      hbw_wr = le64_to_cpu(drain_data->drain_wr_addr_hbw);
 +
 +      if (cause & BIT_ULL(0)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "PCIE AXI drain LBW completed, read_err %u, write_err %u\n",
 +                      !!lbw_rd, !!lbw_wr);
 +              error_count++;
 +      }
 +
 +      if (cause & BIT_ULL(1)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "PCIE AXI drain HBW completed, raddr %#llx, waddr %#llx\n",
 +                      hbw_rd, hbw_wr);
 +              error_count++;
 +      }
 +
 +      return error_count;
 +}
 +
 +static int gaudi2_handle_psoc_drain(struct hl_device *hdev, u64 intr_cause_data)
 +{
 +      u32 error_count = 0;
 +      int i;
 +
 +      for (i = 0 ; i < GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE ; i++) {
 +              if (intr_cause_data & BIT_ULL(i)) {
 +                      dev_err_ratelimited(hdev->dev, "PSOC %s completed\n",
 +                              gaudi2_psoc_axi_drain_interrupts_cause[i]);
 +                      error_count++;
 +              }
 +      }
 +
 +      hl_check_for_glbl_errors(hdev);
 +
 +      return error_count;
 +}
 +
 +static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, u16 event_type,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
 +
 +      gaudi2_print_event(hdev, event_type, false,
 +              "FW reported sanity check failure, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
 +}
 +
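+/* Handle an engine ARC interrupt. Currently only the DCCM queue-full
+ * interrupt type is recognized.
+ */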
 +static int hl_arc_event_handle(struct hl_device *hdev, u16 event_type,
 +                                      struct hl_eq_engine_arc_intr_data *data)
 +{
 +      struct hl_engine_arc_dccm_queue_full_irq *q;
 +      u32 intr_type, engine_id;
 +      u64 payload;
 +
 +      intr_type = le32_to_cpu(data->intr_type);
 +      engine_id = le32_to_cpu(data->engine_id);
 +      payload = le64_to_cpu(data->payload);
 +
 +      switch (intr_type) {
 +      case ENGINE_ARC_DCCM_QUEUE_FULL_IRQ:
 +              q = (struct hl_engine_arc_dccm_queue_full_irq *) &payload;
 +
 +              gaudi2_print_event(hdev, event_type, true,
 +                              "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u\n",
 +                              engine_id, intr_type, q->queue_index);
 +              return 1;
 +      default:
 +              gaudi2_print_event(hdev, event_type, true, "Unknown ARC event type\n");
 +              return 0;
 +      }
 +}
 +
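+/* Top-level event queue dispatcher: validate the event type, update the event
+ * statistics and route the event to its handler while accumulating the error
+ * count, reset flags and user notification mask.
+ */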
 +static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      bool reset_required = false, is_critical = false;
 +      u32 index, ctl, reset_flags = HL_DRV_RESET_HARD, error_count = 0;
 +      u64 event_mask = 0;
 +      u16 event_type;
 +
 +      ctl = le32_to_cpu(eq_entry->hdr.ctl);
 +      event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK) >> EQ_CTL_EVENT_TYPE_SHIFT);
 +
 +      if (event_type >= GAUDI2_EVENT_SIZE) {
 +              dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
 +                              event_type, GAUDI2_EVENT_SIZE - 1);
 +              return;
 +      }
 +
 +      gaudi2->events_stat[event_type]++;
 +      gaudi2->events_stat_aggregate[event_type]++;
 +
 +      switch (event_type) {
 +      case GAUDI2_EVENT_PCIE_CORE_SERR ... GAUDI2_EVENT_ARC0_ECC_DERR:
 +              fallthrough;
 +      case GAUDI2_EVENT_ROTATOR0_SERR ... GAUDI2_EVENT_ROTATOR1_DERR:
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              reset_required = gaudi2_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
 +              is_critical = eq_entry->ecc_data.is_critical;
 +              error_count++;
 +              break;
 +
 +      case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_PDMA1_QM:
 +              fallthrough;
 +      case GAUDI2_EVENT_ROTATOR0_ROT0_QM ... GAUDI2_EVENT_ROTATOR1_ROT1_QM:
 +              fallthrough;
 +      case GAUDI2_EVENT_NIC0_QM0 ... GAUDI2_EVENT_NIC11_QM1:
 +              error_count = gaudi2_handle_qman_err(hdev, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_ARC_AXI_ERROR_RESPONSE_0:
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              error_count = gaudi2_handle_arc_farm_sei_err(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_AXI_ERR_RSP:
 +              error_count = gaudi2_handle_cpu_sei_err(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP:
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              error_count = gaudi2_handle_qm_sei_err(hdev, event_type, true, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE:
 +              index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE;
 +              error_count = gaudi2_handle_rot_err(hdev, index, event_type,
 +                                      &eq_entry->razwi_with_intr_cause, &event_mask);
 +              error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC24_AXI_ERR_RSP:
 +              index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP;
 +              error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type,
 +                                              &eq_entry->razwi_with_intr_cause, &event_mask);
 +              error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE ... GAUDI2_EVENT_DEC9_AXI_ERR_RSPONSE:
 +              index = event_type - GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE;
 +              error_count = gaudi2_handle_dec_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_TPC0_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC1_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC2_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC3_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC4_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC5_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC6_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC7_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC8_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC9_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC10_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC11_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC12_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC13_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC14_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC15_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC16_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC17_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC18_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC19_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC20_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC21_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC22_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC23_KERNEL_ERR:
 +      case GAUDI2_EVENT_TPC24_KERNEL_ERR:
 +              index = (event_type - GAUDI2_EVENT_TPC0_KERNEL_ERR) /
 +                      (GAUDI2_EVENT_TPC1_KERNEL_ERR - GAUDI2_EVENT_TPC0_KERNEL_ERR);
 +              error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type,
 +                                      &eq_entry->razwi_with_intr_cause, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_DEC0_SPI:
 +      case GAUDI2_EVENT_DEC1_SPI:
 +      case GAUDI2_EVENT_DEC2_SPI:
 +      case GAUDI2_EVENT_DEC3_SPI:
 +      case GAUDI2_EVENT_DEC4_SPI:
 +      case GAUDI2_EVENT_DEC5_SPI:
 +      case GAUDI2_EVENT_DEC6_SPI:
 +      case GAUDI2_EVENT_DEC7_SPI:
 +      case GAUDI2_EVENT_DEC8_SPI:
 +      case GAUDI2_EVENT_DEC9_SPI:
 +              index = (event_type - GAUDI2_EVENT_DEC0_SPI) /
 +                              (GAUDI2_EVENT_DEC1_SPI - GAUDI2_EVENT_DEC0_SPI);
 +              error_count = gaudi2_handle_dec_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME2_CTRL_AXI_ERROR_RESPONSE:
 +      case GAUDI2_EVENT_MME3_CTRL_AXI_ERROR_RESPONSE:
 +              index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) /
 +                              (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE -
 +                                              GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE);
 +              error_count = gaudi2_handle_mme_err(hdev, index, event_type, &event_mask);
 +              error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_MME0_QMAN_SW_ERROR:
 +      case GAUDI2_EVENT_MME1_QMAN_SW_ERROR:
 +      case GAUDI2_EVENT_MME2_QMAN_SW_ERROR:
 +      case GAUDI2_EVENT_MME3_QMAN_SW_ERROR:
 +              index = (event_type - GAUDI2_EVENT_MME0_QMAN_SW_ERROR) /
 +                              (GAUDI2_EVENT_MME1_QMAN_SW_ERROR -
 +                                      GAUDI2_EVENT_MME0_QMAN_SW_ERROR);
 +              error_count = gaudi2_handle_mme_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID:
 +      case GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID:
 +      case GAUDI2_EVENT_MME2_WAP_SOURCE_RESULT_INVALID:
 +      case GAUDI2_EVENT_MME3_WAP_SOURCE_RESULT_INVALID:
 +              index = (event_type - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID) /
 +                              (GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID -
 +                                      GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID);
 +              error_count = gaudi2_handle_mme_wap_err(hdev, index, event_type, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_KDMA_CH0_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_KDMA0_CORE:
 +              error_count = gaudi2_handle_kdma_core_event(hdev, event_type,
 +                                      le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_PDMA1_CORE:
 +              error_count = gaudi2_handle_dma_core_event(hdev, event_type,
 +                                      le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR:
 +              error_count = gaudi2_print_pcie_addr_dec_info(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data), &event_mask);
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
 +      case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR:
 +      case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0:
 +              error_count = gaudi2_handle_mmu_spi_sei_err(hdev, event_type, &event_mask);
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HIF0_FATAL ... GAUDI2_EVENT_HIF12_FATAL:
 +              error_count = gaudi2_handle_hif_fatal(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PMMU_FATAL_0:
 +              error_count = gaudi2_handle_pif_fatal(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PSOC63_RAZWI_OR_PID_MIN_MAX_INTERRUPT:
 +              error_count = gaudi2_ack_psoc_razwi_event_handler(hdev, &event_mask);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE ... GAUDI2_EVENT_HBM5_MC1_SEI_NON_SEVERE:
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              if (gaudi2_handle_hbm_mc_sei_err(hdev, event_type, &eq_entry->sei_data)) {
 +                      reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +                      reset_required = true;
 +              }
 +              error_count++;
 +              break;
 +
 +      case GAUDI2_EVENT_HBM_CATTRIP_0 ... GAUDI2_EVENT_HBM_CATTRIP_5:
 +              error_count = gaudi2_handle_hbm_cattrip(hdev, event_type,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_HBM0_MC0_SPI ... GAUDI2_EVENT_HBM5_MC1_SPI:
 +              error_count = gaudi2_handle_hbm_mc_spi(hdev,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_DRAIN_COMPLETE:
 +              error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data);
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN:
 +              error_count = gaudi2_handle_psoc_drain(hdev,
 +                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_AXI_ECC:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_CPU_L2_RAM_ECC:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME0_SBTE4_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME1_SBTE4_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_MME2_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME2_SBTE4_AXI_ERR_RSP:
 +      case GAUDI2_EVENT_MME3_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME3_SBTE4_AXI_ERR_RSP:
 +              error_count = gaudi2_handle_mme_sbte_err(hdev, event_type,
 +                                              le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +      case GAUDI2_EVENT_VM0_ALARM_A ... GAUDI2_EVENT_VM3_ALARM_B:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PSOC_AXI_ERR_RSP:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PSOC_PRSTN_FALL:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PCIE_APB_TIMEOUT:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_PCIE_FATAL_ERR:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_TPC0_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC1_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC2_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC3_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC4_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC5_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC6_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC7_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC8_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC9_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC10_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC11_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC12_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC13_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC14_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC15_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC16_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC17_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC18_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC19_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC20_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC21_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC22_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC23_BMON_SPMU:
 +      case GAUDI2_EVENT_TPC24_BMON_SPMU:
 +      case GAUDI2_EVENT_MME0_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME0_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME0_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_MME1_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME1_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME1_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_MME2_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME2_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME2_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_MME3_CTRL_BMON_SPMU:
 +      case GAUDI2_EVENT_MME3_SBTE_BMON_SPMU:
 +      case GAUDI2_EVENT_MME3_WAP_BMON_SPMU:
 +      case GAUDI2_EVENT_HDMA2_BM_SPMU ... GAUDI2_EVENT_PDMA1_BM_SPMU:
 +              fallthrough;
 +      case GAUDI2_EVENT_DEC0_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC1_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC2_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC3_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC4_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC5_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC6_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC7_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC8_BMON_SPMU:
 +      case GAUDI2_EVENT_DEC9_BMON_SPMU:
 +      case GAUDI2_EVENT_ROTATOR0_BMON_SPMU ... GAUDI2_EVENT_SM3_BMON_SPMU:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
 +      case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
 +      case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 +              gaudi2_print_clk_change_info(hdev, event_type, &event_mask);
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_PKT_QUEUE_OUT_SYNC:
 +              gaudi2_print_out_of_sync_info(hdev, event_type, &eq_entry->pkt_sync_err);
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_FLR_REQUESTED:
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              /* Do nothing - FW will handle it */
 +              break;
 +
 +      case GAUDI2_EVENT_PCIE_P2P_MSIX:
 +              error_count = gaudi2_handle_pcie_p2p_msix(hdev, event_type);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE ... GAUDI2_EVENT_SM3_AXI_ERROR_RESPONSE:
 +              index = event_type - GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE;
 +              error_count = gaudi2_handle_sm_err(hdev, event_type, index);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_PSOC_MME_PLL_LOCK_ERR ... GAUDI2_EVENT_DCORE2_HBM_PLL_LOCK_ERR:
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
 +              dev_info(hdev->dev, "CPLD shutdown cause, reset reason: 0x%llx\n",
 +                                              le64_to_cpu(eq_entry->data[0]));
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +      case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_EVENT:
 +              dev_err(hdev->dev, "CPLD shutdown event, reset reason: 0x%llx\n",
 +                                              le64_to_cpu(eq_entry->data[0]));
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_PKT_SANITY_FAILED:
 +              gaudi2_print_cpu_pkt_failure_info(hdev, event_type, &eq_entry->pkt_sync_err);
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_ARC_DCCM_FULL:
 +              error_count = hl_arc_event_handle(hdev, event_type, &eq_entry->arc_data);
 +              event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 +              break;
 +
 +      case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED:
 +      case GAUDI2_EVENT_DEV_RESET_REQ:
 +              event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
 +              error_count = GAUDI2_NA_EVENT_CAUSE;
 +              is_critical = true;
 +              break;
 +
 +      default:
 +              if (gaudi2_irq_map_table[event_type].valid) {
 +                      dev_err_ratelimited(hdev->dev, "Cannot find handler for event %d\n",
 +                                              event_type);
 +                      error_count = GAUDI2_NA_EVENT_CAUSE;
 +              }
 +      }
 +
 +      /* Make sure an error is dumped in case no error cause was printed so far.
 +       * Note that although the errors were counted, the count is used here only
 +       * as a boolean.
 +       */
 +      if (error_count == GAUDI2_NA_EVENT_CAUSE && !is_info_event(event_type))
 +              gaudi2_print_event(hdev, event_type, true, "%d", event_type);
 +      else if (error_count == 0)
 +              gaudi2_print_event(hdev, event_type, true,
 +                              "No error cause for H/W event %u\n", event_type);
 +
 +      if ((gaudi2_irq_map_table[event_type].reset || reset_required) &&
 +                              (hdev->hard_reset_on_fw_events ||
 +                              (hdev->asic_prop.fw_security_enabled && is_critical)))
 +              goto reset_device;
 +
 +      /* Send unmask irq only for interrupts not classified as MSG */
 +      if (!gaudi2_irq_map_table[event_type].msg)
 +              hl_fw_unmask_irq(hdev, event_type);
 +
 +      if (event_mask)
 +              hl_notifier_event_send_all(hdev, event_mask);
 +
 +      return;
 +
 +reset_device:
 +      if (hdev->asic_prop.fw_security_enabled && is_critical) {
 +              reset_flags |= HL_DRV_RESET_BYPASS_REQ_TO_FW;
 +              event_mask |= HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE;
 +      } else {
 +              reset_flags |= HL_DRV_RESET_DELAY;
 +      }
 +      event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
 +      hl_device_cond_reset(hdev, reset_flags, event_mask);
 +}
 +
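For reference, a minimal standalone sketch - not part of the patch - of the engine-index arithmetic the switch above uses for contiguous event-ID groups: the distance from the group's first event ID, divided by the per-engine stride. The enum values below are made up purely for illustration and are not the real GAUDI2_EVENT_* definitions.

#include <stdio.h>

/* hypothetical event IDs, only to illustrate the stride arithmetic */
enum { EVT_TPC0_KERNEL_ERR = 100, EVT_TPC1_KERNEL_ERR = 106 };

int main(void)
{
	unsigned int event_type = 118; /* pretend this is the TPC3 kernel-error event */
	unsigned int stride = EVT_TPC1_KERNEL_ERR - EVT_TPC0_KERNEL_ERR;
	unsigned int index = (event_type - EVT_TPC0_KERNEL_ERR) / stride;

	printf("TPC index = %u\n", index); /* prints 3 */
	return 0;
}
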
 +static int gaudi2_memset_memory_chunk_using_edma_qm(struct hl_device *hdev,
 +                      struct packet_lin_dma *lin_dma_pkt, dma_addr_t pkt_dma_addr,
 +                      u32 hw_queue_id, u32 size, u64 addr, u32 val)
 +{
 +      u32 ctl, pkt_size;
 +      int rc = 0;
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_LIN_DMA_CTL_MEMSET_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_LIN_DMA_CTL_WRCOMP_MASK, 1);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 1);
 +
 +      lin_dma_pkt->ctl = cpu_to_le32(ctl);
 +      lin_dma_pkt->src_addr = cpu_to_le64(val);
 +      lin_dma_pkt->dst_addr = cpu_to_le64(addr);
 +      lin_dma_pkt->tsize = cpu_to_le32(size);
 +
 +      pkt_size = sizeof(struct packet_lin_dma);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, pkt_size, pkt_dma_addr);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to send lin dma packet to H/W queue %d\n",
 +                              hw_queue_id);
 +
 +      return rc;
 +}
 +
 +static int gaudi2_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size, u64 val)
 +{
 +      u32 edma_queues_id[] = {GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0,
 +                                      GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0,
 +                                      GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0,
 +                                      GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0};
 +      u32 chunk_size, dcore, edma_idx, sob_offset, sob_addr, comp_val,
 +              old_mmubp, mmubp, num_of_pkts, busy, pkt_size;
 +      u64 comp_addr, cur_addr = addr, end_addr = addr + size;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      void *lin_dma_pkts_arr;
 +      dma_addr_t pkt_dma_addr;
 +      int rc = 0, dma_num = 0;
 +
 +      if (prop->edma_enabled_mask == 0) {
 +              dev_info(hdev->dev, "none of the EDMA engines is enabled - skip dram scrubbing\n");
 +              return -EIO;
 +      }
 +
 +      sob_offset = hdev->asic_prop.first_available_user_sob[0] * 4;
 +      sob_addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
 +      comp_addr = CFG_BASE + sob_addr;
 +      comp_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1) |
 +              FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1);
 +      mmubp = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_WR_MASK, 1) |
 +              FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_RD_MASK, 1);
 +
 +      /* Calculate how many lin dma pkts we'll need */
 +      num_of_pkts = div64_u64(round_up(size, SZ_2G), SZ_2G);
 +      pkt_size = sizeof(struct packet_lin_dma);
 +
 +      lin_dma_pkts_arr = hl_asic_dma_alloc_coherent(hdev, pkt_size * num_of_pkts,
 +                                      &pkt_dma_addr, GFP_KERNEL);
 +      if (!lin_dma_pkts_arr)
 +              return -ENOMEM;
 +
 +      /*
 +       * Set MMU bypass for the scrubbing - all EDMAs are configured the same, so
 +       * save only the first one to restore later.
 +       * Also set the SOB address for all EDMA cores for completion.
 +       * Set the QM as trusted to allow it to access physical addresses with MMU bypass.
 +       */
 +      old_mmubp = RREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP);
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
 +                      u32 edma_offset = dcore * DCORE_OFFSET + edma_idx * DCORE_EDMA_OFFSET;
 +                      u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(edma_bit)))
 +                              continue;
 +
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP +
 +                                      edma_offset, mmubp);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_LO + edma_offset,
 +                                      lower_32_bits(comp_addr));
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_HI + edma_offset,
 +                                      upper_32_bits(comp_addr));
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_WDATA + edma_offset,
 +                                      comp_val);
 +                      gaudi2_qman_set_test_mode(hdev,
 +                                      edma_queues_id[dcore] + 4 * edma_idx, true);
 +              }
 +      }
 +
 +      WREG32(sob_addr, 0);
 +
 +      while (cur_addr < end_addr) {
 +              for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +                      for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
 +                              u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
 +
 +                              if (!(prop->edma_enabled_mask & BIT(edma_bit)))
 +                                      continue;
 +
 +                              chunk_size = min_t(u64, SZ_2G, end_addr - cur_addr);
 +
 +                              rc = gaudi2_memset_memory_chunk_using_edma_qm(hdev,
 +                                      (struct packet_lin_dma *)lin_dma_pkts_arr + dma_num,
 +                                      pkt_dma_addr + dma_num * pkt_size,
 +                                      edma_queues_id[dcore] + edma_idx * 4,
 +                                      chunk_size, cur_addr, val);
 +                              if (rc)
 +                                      goto end;
 +
 +                              dma_num++;
 +                              cur_addr += chunk_size;
 +                              if (cur_addr == end_addr)
 +                                      break;
 +                      }
 +              }
 +      }
 +
 +      rc = hl_poll_timeout(hdev, sob_addr, busy, (busy == dma_num), 1000, 1000000);
 +      if (rc) {
 +              dev_err(hdev->dev, "DMA Timeout during HBM scrubbing\n");
 +              goto end;
 +      }
 +end:
 +      for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
 +              for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
 +                      u32 edma_offset = dcore * DCORE_OFFSET + edma_idx * DCORE_EDMA_OFFSET;
 +                      u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
 +
 +                      if (!(prop->edma_enabled_mask & BIT(edma_bit)))
 +                              continue;
 +
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP + edma_offset, old_mmubp);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_LO + edma_offset, 0);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_HI + edma_offset, 0);
 +                      WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_WDATA + edma_offset, 0);
 +                      gaudi2_qman_set_test_mode(hdev,
 +                                      edma_queues_id[dcore] + 4 * edma_idx, false);
 +              }
 +      }
 +
 +      WREG32(sob_addr, 0);
 +      hl_asic_dma_free_coherent(hdev, pkt_size * num_of_pkts, lin_dma_pkts_arr, pkt_dma_addr);
 +
 +      return rc;
 +}
 +
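For reference, a minimal user-space sketch - not part of the patch - of the packet-count arithmetic used by gaudi2_memset_device_memory(): round the scrub size up to a 2 GiB boundary and divide, one LIN_DMA packet per chunk. SZ_2G and the helper below are local to this illustration, not the kernel definitions.

#include <stdio.h>
#include <stdint.h>

#define SZ_2G (2ULL * 1024 * 1024 * 1024)

/* round the scrub size up to the next 2 GiB boundary, then divide */
static uint64_t lin_dma_pkt_count(uint64_t size)
{
	return (size + SZ_2G - 1) / SZ_2G;
}

int main(void)
{
	printf("%llu\n", (unsigned long long)lin_dma_pkt_count(96ULL << 30)); /* 48 */
	printf("%llu\n", (unsigned long long)lin_dma_pkt_count(5ULL << 30));  /* 3 */
	return 0;
}
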
 +static int gaudi2_scrub_device_dram(struct hl_device *hdev, u64 val)
 +{
 +      int rc;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 size = prop->dram_end_address - prop->dram_user_base_address;
 +
 +      rc = gaudi2_memset_device_memory(hdev, prop->dram_user_base_address, size, val);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to scrub dram, address: 0x%llx size: %llu\n",
 +                              prop->dram_user_base_address, size);
 +      return rc;
 +}
 +
 +static int gaudi2_scrub_device_mem(struct hl_device *hdev)
 +{
 +      int rc;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 val = hdev->memory_scrub_val;
 +      u64 addr, size;
 +
 +      if (!hdev->memory_scrub)
 +              return 0;
 +
 +      /* scrub SRAM */
 +      addr = prop->sram_user_base_address;
 +      size = hdev->pldm ? 0x10000 : (prop->sram_size - SRAM_USER_BASE_OFFSET);
 +      dev_dbg(hdev->dev, "Scrubbing SRAM: 0x%09llx - 0x%09llx, val: 0x%llx\n",
 +                      addr, addr + size, val);
 +      rc = gaudi2_memset_device_memory(hdev, addr, size, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "scrubbing SRAM failed (%d)\n", rc);
 +              return rc;
 +      }
 +
 +      /* scrub DRAM */
 +      rc = gaudi2_scrub_device_dram(hdev, val);
 +      if (rc) {
 +              dev_err(hdev->dev, "scrubbing DRAM failed (%d)\n", rc);
 +              return rc;
 +      }
 +      return 0;
 +}
 +
 +static void gaudi2_restore_user_sm_registers(struct hl_device *hdev)
 +{
 +      u64 addr, mon_sts_addr, mon_cfg_addr, cq_lbw_l_addr, cq_lbw_h_addr,
 +              cq_lbw_data_addr, cq_base_l_addr, cq_base_h_addr, cq_size_addr;
 +      u32 val, size, offset;
 +      int dcore_id;
 +
 +      offset = hdev->asic_prop.first_available_cq[0] * 4;
 +      cq_lbw_l_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + offset;
 +      cq_lbw_h_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 + offset;
 +      cq_lbw_data_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0 + offset;
 +      cq_base_l_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + offset;
 +      cq_base_h_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + offset;
 +      cq_size_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + offset;
 +      size = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 -
 +                      (mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + offset);
 +
 +      /* memset dcore0 CQ registers */
 +      gaudi2_memset_device_lbw(hdev, cq_lbw_l_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_lbw_h_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_lbw_data_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_base_l_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_base_h_addr, size, 0);
 +      gaudi2_memset_device_lbw(hdev, cq_size_addr, size, 0);
 +
 +      cq_lbw_l_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + DCORE_OFFSET;
 +      cq_lbw_h_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 + DCORE_OFFSET;
 +      cq_lbw_data_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0 + DCORE_OFFSET;
 +      cq_base_l_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + DCORE_OFFSET;
 +      cq_base_h_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + DCORE_OFFSET;
 +      cq_size_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + DCORE_OFFSET;
 +      size = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 - mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0;
 +
 +      for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              gaudi2_memset_device_lbw(hdev, cq_lbw_l_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_lbw_h_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_lbw_data_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_base_l_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_base_h_addr, size, 0);
 +              gaudi2_memset_device_lbw(hdev, cq_size_addr, size, 0);
 +
 +              cq_lbw_l_addr += DCORE_OFFSET;
 +              cq_lbw_h_addr += DCORE_OFFSET;
 +              cq_lbw_data_addr += DCORE_OFFSET;
 +              cq_base_l_addr += DCORE_OFFSET;
 +              cq_base_h_addr += DCORE_OFFSET;
 +              cq_size_addr += DCORE_OFFSET;
 +      }
 +
 +      offset = hdev->asic_prop.first_available_user_mon[0] * 4;
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + offset;
 +      val = 1 << DCORE0_SYNC_MNGR_OBJS_MON_STATUS_PROT_SHIFT;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_SM_SEC_0 - (mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + offset);
 +
 +      /* memset dcore0 monitors */
 +      gaudi2_memset_device_lbw(hdev, addr, size, val);
 +
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + offset;
 +      gaudi2_memset_device_lbw(hdev, addr, size, 0);
 +
 +      mon_sts_addr = mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + DCORE_OFFSET;
 +      mon_cfg_addr = mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + DCORE_OFFSET;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_SM_SEC_0 - mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0;
 +
 +      for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              gaudi2_memset_device_lbw(hdev, mon_sts_addr, size, val);
 +              gaudi2_memset_device_lbw(hdev, mon_cfg_addr, size, 0);
 +              mon_sts_addr += DCORE_OFFSET;
 +              mon_cfg_addr += DCORE_OFFSET;
 +      }
 +
 +      offset = hdev->asic_prop.first_available_user_sob[0] * 4;
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset;
 +      val = 0;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 -
 +                      (mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset);
 +
 +      /* memset dcore0 sobs */
 +      gaudi2_memset_device_lbw(hdev, addr, size, val);
 +
 +      addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + DCORE_OFFSET;
 +      size = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 - mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0;
 +
 +      for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
 +              gaudi2_memset_device_lbw(hdev, addr, size, val);
 +              addr += DCORE_OFFSET;
 +      }
 +
 +      /* Flush all WREG to prevent race */
 +      val = RREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset);
 +}
 +
 +static void gaudi2_restore_user_qm_registers(struct hl_device *hdev)
 +{
 +      u32 reg_base, hw_queue_id;
 +
 +      for (hw_queue_id = GAUDI2_QUEUE_ID_PDMA_0_0 ; hw_queue_id <= GAUDI2_QUEUE_ID_ROT_1_0;
 +                                                      hw_queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!gaudi2_is_queue_enabled(hdev, hw_queue_id))
 +                      continue;
 +
 +              gaudi2_clear_qm_fence_counters_common(hdev, hw_queue_id, false);
 +
 +              reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +              WREG32(reg_base + QM_ARB_CFG_0_OFFSET, 0);
 +      }
 +
 +      /* Flush all WREG to prevent race */
 +      RREG32(mmPDMA0_QM_ARB_CFG_0);
 +}
 +
 +static void gaudi2_restore_nic_qm_registers(struct hl_device *hdev)
 +{
 +      u32 reg_base, hw_queue_id;
 +
 +      for (hw_queue_id = GAUDI2_QUEUE_ID_NIC_0_0 ; hw_queue_id <= GAUDI2_QUEUE_ID_NIC_23_3;
 +                                                      hw_queue_id += NUM_OF_PQ_PER_QMAN) {
 +              if (!gaudi2_is_queue_enabled(hdev, hw_queue_id))
 +                      continue;
 +
 +              gaudi2_clear_qm_fence_counters_common(hdev, hw_queue_id, false);
 +
 +              reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
 +              WREG32(reg_base + QM_ARB_CFG_0_OFFSET, 0);
 +      }
 +
 +      /* Flush all WREG to prevent race */
 +      RREG32(mmPDMA0_QM_ARB_CFG_0);
 +}
 +
 +static int gaudi2_context_switch(struct hl_device *hdev, u32 asid)
 +{
 +      return 0;
 +}
 +
 +static void gaudi2_restore_phase_topology(struct hl_device *hdev)
 +{
 +}
 +
 +static void gaudi2_init_block_instances(struct hl_device *hdev, u32 block_idx,
 +                                              struct dup_block_ctx *cfg_ctx)
 +{
 +      u64 block_base = cfg_ctx->base + block_idx * cfg_ctx->block_off;
 +      u8 seq;
 +      int i;
 +
 +      for (i = 0 ; i < cfg_ctx->instances ; i++) {
 +              seq = block_idx * cfg_ctx->instances + i;
 +
 +              /* skip disabled instance */
 +              if (!(cfg_ctx->enabled_mask & BIT_ULL(seq)))
 +                      continue;
 +
 +              cfg_ctx->instance_cfg_fn(hdev, block_base + i * cfg_ctx->instance_off,
 +                                      cfg_ctx->data);
 +      }
 +}
 +
 +static void gaudi2_init_blocks_with_mask(struct hl_device *hdev, struct dup_block_ctx *cfg_ctx,
 +                                              u64 mask)
 +{
 +      int i;
 +
 +      cfg_ctx->enabled_mask = mask;
 +
 +      for (i = 0 ; i < cfg_ctx->blocks ; i++)
 +              gaudi2_init_block_instances(hdev, i, cfg_ctx);
 +}
 +
 +void gaudi2_init_blocks(struct hl_device *hdev, struct dup_block_ctx *cfg_ctx)
 +{
 +      gaudi2_init_blocks_with_mask(hdev, cfg_ctx, U64_MAX);
 +}
 +
 +static int gaudi2_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, void *blob_addr)
 +{
 +      void *host_mem_virtual_addr;
 +      dma_addr_t host_mem_dma_addr;
 +      u64 reserved_va_base;
 +      u32 pos, size_left, size_to_dma;
 +      struct hl_ctx *ctx;
 +      int rc = 0;
 +
 +      /* Fetch the ctx */
 +      ctx = hl_get_compute_ctx(hdev);
 +      if (!ctx) {
 +              dev_err(hdev->dev, "No ctx available\n");
 +              return -EINVAL;
 +      }
 +
 +      /* Allocate buffers for read and for poll */
 +      host_mem_virtual_addr = hl_asic_dma_alloc_coherent(hdev, SZ_2M, &host_mem_dma_addr,
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +      if (host_mem_virtual_addr == NULL) {
 +              dev_err(hdev->dev, "Failed to allocate memory for KDMA read\n");
 +              rc = -ENOMEM;
 +              goto put_ctx;
 +      }
 +
 +      /* Reserve VM region on asic side */
 +      reserved_va_base = hl_reserve_va_block(hdev, ctx, HL_VA_RANGE_TYPE_HOST, SZ_2M,
 +                                              HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
 +      if (!reserved_va_base) {
 +              dev_err(hdev->dev, "Failed to reserve vmem on asic\n");
 +              rc = -ENOMEM;
 +              goto free_data_buffer;
 +      }
 +
 +      /* Create mapping on asic side */
 +      mutex_lock(&hdev->mmu_lock);
 +      rc = hl_mmu_map_contiguous(ctx, reserved_va_base, host_mem_dma_addr, SZ_2M);
 +      hl_mmu_invalidate_cache_range(hdev, false,
 +                                    MMU_OP_USERPTR | MMU_OP_SKIP_LOW_CACHE_INV,
 +                                    ctx->asid, reserved_va_base, SZ_2M);
 +      mutex_unlock(&hdev->mmu_lock);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to create mapping on asic mmu\n");
 +              goto unreserve_va;
 +      }
 +
 +      /* Enable MMU on KDMA */
 +      gaudi2_kdma_set_mmbp_asid(hdev, false, ctx->asid);
 +
 +      pos = 0;
 +      size_left = size;
 +      size_to_dma = SZ_2M;
 +
 +      while (size_left > 0) {
 +              if (size_left < SZ_2M)
 +                      size_to_dma = size_left;
 +
 +              rc = gaudi2_send_job_to_kdma(hdev, addr, reserved_va_base, size_to_dma, false);
 +              if (rc)
 +                      break;
 +
 +              memcpy(blob_addr + pos, host_mem_virtual_addr, size_to_dma);
 +
 +              if (size_left <= SZ_2M)
 +                      break;
 +
 +              pos += SZ_2M;
 +              addr += SZ_2M;
 +              size_left -= SZ_2M;
 +      }
 +
 +      gaudi2_kdma_set_mmbp_asid(hdev, true, HL_KERNEL_ASID_ID);
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      hl_mmu_unmap_contiguous(ctx, reserved_va_base, SZ_2M);
 +      hl_mmu_invalidate_cache_range(hdev, false, MMU_OP_USERPTR,
 +                                    ctx->asid, reserved_va_base, SZ_2M);
 +      mutex_unlock(&hdev->mmu_lock);
 +unreserve_va:
 +      hl_unreserve_va_block(hdev, ctx, reserved_va_base, SZ_2M);
 +free_data_buffer:
 +      hl_asic_dma_free_coherent(hdev, SZ_2M, host_mem_virtual_addr, host_mem_dma_addr);
 +put_ctx:
 +      hl_ctx_put(ctx);
 +
 +      return rc;
 +}
 +
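For reference, a standalone sketch - not part of the patch - of the chunking done by the KDMA read loop above: the region is copied through a 2 MiB bounce buffer, with the final chunk trimmed to the remaining size. The values below are hypothetical.

#include <stdio.h>
#include <stdint.h>

#define SZ_2M (2ULL * 1024 * 1024)

int main(void)
{
	uint64_t size = 5ULL * 1024 * 1024;	/* hypothetical 5 MiB read */
	uint64_t size_left = size, size_to_dma = SZ_2M, chunks = 0;

	while (size_left > 0) {
		if (size_left < SZ_2M)
			size_to_dma = size_left;	/* final partial chunk */
		chunks++;				/* one KDMA transfer of size_to_dma bytes */
		if (size_left <= SZ_2M)
			break;
		size_left -= SZ_2M;
	}

	printf("chunks = %llu\n", (unsigned long long)chunks); /* prints 3 */
	return 0;
}
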
 +static int gaudi2_internal_cb_pool_init(struct hl_device *hdev, struct hl_ctx *ctx)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int min_alloc_order, rc;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU))
 +              return 0;
 +
 +      hdev->internal_cb_pool_virt_addr = hl_asic_dma_alloc_coherent(hdev,
 +                                                              HOST_SPACE_INTERNAL_CB_SZ,
 +                                                              &hdev->internal_cb_pool_dma_addr,
 +                                                              GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!hdev->internal_cb_pool_virt_addr)
 +              return -ENOMEM;
 +
 +      min_alloc_order = ilog2(min(gaudi2_get_signal_cb_size(hdev),
 +                                      gaudi2_get_wait_cb_size(hdev)));
 +
 +      hdev->internal_cb_pool = gen_pool_create(min_alloc_order, -1);
 +      if (!hdev->internal_cb_pool) {
 +              dev_err(hdev->dev, "Failed to create internal CB pool\n");
 +              rc = -ENOMEM;
 +              goto free_internal_cb_pool;
 +      }
 +
 +      rc = gen_pool_add(hdev->internal_cb_pool, (uintptr_t) hdev->internal_cb_pool_virt_addr,
 +                              HOST_SPACE_INTERNAL_CB_SZ, -1);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to add memory to internal CB pool\n");
 +              rc = -EFAULT;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      hdev->internal_cb_va_base = hl_reserve_va_block(hdev, ctx, HL_VA_RANGE_TYPE_HOST,
 +                                      HOST_SPACE_INTERNAL_CB_SZ, HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
 +
 +      if (!hdev->internal_cb_va_base) {
 +              rc = -ENOMEM;
 +              goto destroy_internal_cb_pool;
 +      }
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base, hdev->internal_cb_pool_dma_addr,
 +                                      HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      if (rc)
 +              goto unreserve_internal_cb_pool;
 +
 +      return 0;
 +
 +unreserve_internal_cb_pool:
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
 +destroy_internal_cb_pool:
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +free_internal_cb_pool:
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_internal_cb_pool_fini(struct hl_device *hdev, struct hl_ctx *ctx)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU))
 +              return;
 +
 +      mutex_lock(&hdev->mmu_lock);
 +      hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
 +      mutex_unlock(&hdev->mmu_lock);
 +
 +      gen_pool_destroy(hdev->internal_cb_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
 +                                      hdev->internal_cb_pool_dma_addr);
 +}
 +
 +static void gaudi2_restore_user_registers(struct hl_device *hdev)
 +{
 +      gaudi2_restore_user_sm_registers(hdev);
 +      gaudi2_restore_user_qm_registers(hdev);
 +}
 +
 +static int gaudi2_map_virtual_msix_doorbell_memory(struct hl_ctx *ctx)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int rc;
 +
 +      rc = hl_mmu_map_page(ctx, RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START,
 +                              gaudi2->virt_msix_db_dma_addr, prop->pmmu.page_size, true);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to map VA %#llx for virtual MSI-X doorbell memory\n",
 +                      RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_unmap_virtual_msix_doorbell_memory(struct hl_ctx *ctx)
 +{
 +      struct hl_device *hdev = ctx->hdev;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      rc = hl_mmu_unmap_page(ctx, RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START,
 +                              prop->pmmu.page_size, true);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to unmap VA %#llx of virtual MSI-X doorbell memory\n",
 +                      RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START);
 +}
 +
 +static int gaudi2_ctx_init(struct hl_ctx *ctx)
 +{
 +      int rc;
 +
 +      rc = gaudi2_mmu_prepare(ctx->hdev, ctx->asid);
 +      if (rc)
 +              return rc;
 +
 +      /* No need to clear the user registers if the device has just
 +       * performed a reset; we restore only the NIC QM registers.
 +       */
 +      if (ctx->hdev->reset_upon_device_release)
 +              gaudi2_restore_nic_qm_registers(ctx->hdev);
 +      else
 +              gaudi2_restore_user_registers(ctx->hdev);
 +
 +      rc = gaudi2_internal_cb_pool_init(ctx->hdev, ctx);
 +      if (rc)
 +              return rc;
 +
 +      rc = gaudi2_map_virtual_msix_doorbell_memory(ctx);
 +      if (rc)
 +              gaudi2_internal_cb_pool_fini(ctx->hdev, ctx);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_ctx_fini(struct hl_ctx *ctx)
 +{
 +      if (ctx->asid == HL_KERNEL_ASID_ID)
 +              return;
 +
 +      gaudi2_internal_cb_pool_fini(ctx->hdev, ctx);
 +
 +      gaudi2_unmap_virtual_msix_doorbell_memory(ctx);
 +}
 +
 +static int gaudi2_pre_schedule_cs(struct hl_cs *cs)
 +{
 +      struct hl_device *hdev = cs->ctx->hdev;
 +      int index = cs->sequence & (hdev->asic_prop.max_pending_cs - 1);
 +      u32 mon_payload, sob_id, mon_id;
 +
 +      if (!cs_needs_completion(cs))
 +              return 0;
 +
 +      /*
 +       * The first 64 SOB/MON are reserved for the driver's QMAN auto-completion
 +       * mechanism. Each SOB/MON pair is used for a pending CS with the same
 +       * cyclic index. The SOB value is increased when each of the CS jobs is
 +       * completed. When the SOB reaches the number of CS jobs, the monitor
 +       * generates an MSI-X interrupt.
 +       */
 +
 +      sob_id = mon_id = index;
 +      mon_payload = (1 << CQ_ENTRY_SHADOW_INDEX_VALID_SHIFT) |
 +                              (1 << CQ_ENTRY_READY_SHIFT) | index;
 +
 +      gaudi2_arm_cq_monitor(hdev, sob_id, mon_id, GAUDI2_RESERVED_CQ_CS_COMPLETION, mon_payload,
 +                              cs->jobs_cnt);
 +
 +      return 0;
 +}
 +
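For reference, a standalone sketch - not part of the patch - of the cyclic-index computation in gaudi2_pre_schedule_cs(): since max_pending_cs is assumed to be a power of two, the bitwise AND acts as a cheap modulo that maps a CS sequence number onto one of the reserved SOB/MON pairs. The values below are hypothetical.

#include <stdio.h>

int main(void)
{
	unsigned long long sequence = 70;	/* hypothetical CS sequence number */
	unsigned int max_pending_cs = 64;	/* assumed to be a power of two */
	unsigned int index = sequence & (max_pending_cs - 1);

	printf("SOB/MON index = %u\n", index);	/* prints 6, i.e. 70 % 64 */
	return 0;
}
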
 +static u32 gaudi2_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 +{
 +      return HL_INVALID_QUEUE;
 +}
 +
 +static u32 gaudi2_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id, u32 size, bool eb)
 +{
 +      struct hl_cb *cb = data;
 +      struct packet_msg_short *pkt;
 +      u32 value, ctl, pkt_size = sizeof(*pkt);
 +
 +      pkt = (struct packet_msg_short *) (uintptr_t) (cb->kernel_address + size);
 +      memset(pkt, 0, pkt_size);
 +
 +      /* Inc by 1, Mode ADD */
 +      value = FIELD_PREP(GAUDI2_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_SOB_MOD_MASK, 1);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 1); /* SOB base */
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, eb);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return size + pkt_size;
 +}
 +
 +static u32 gaudi2_add_mon_msg_short(struct packet_msg_short *pkt, u32 value, u16 addr)
 +{
 +      u32 ctl, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, addr);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 0);  /* MON base */
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 0);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi2_add_arm_monitor_pkt(struct hl_device *hdev, struct packet_msg_short *pkt,
 +                                      u16 sob_base, u8 sob_mask, u16 sob_val, u16 addr)
 +{
 +      u32 ctl, value, pkt_size = sizeof(*pkt);
 +      u8 mask;
 +
 +      if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
 +              dev_err(hdev->dev, "sob_base %u (mask %#x) is not valid\n", sob_base, sob_mask);
 +              return 0;
 +      }
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      value = FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_MODE_MASK, 0); /* GREATER OR EQUAL */
 +      value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_MASK_MASK, mask);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, addr);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 0); /* MON base */
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->value = cpu_to_le32(value);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi2_add_fence_pkt(struct packet_fence *pkt)
 +{
 +      u32 ctl, cfg, pkt_size = sizeof(*pkt);
 +
 +      memset(pkt, 0, pkt_size);
 +
 +      cfg = FIELD_PREP(GAUDI2_PKT_FENCE_CFG_DEC_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI2_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
 +      cfg |= FIELD_PREP(GAUDI2_PKT_FENCE_CFG_ID_MASK, 2);
 +
 +      ctl = FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
 +      ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
 +
 +      pkt->cfg = cpu_to_le32(cfg);
 +      pkt->ctl = cpu_to_le32(ctl);
 +
 +      return pkt_size;
 +}
 +
 +static u32 gaudi2_gen_wait_cb(struct hl_device *hdev, struct hl_gen_wait_properties *prop)
 +{
 +      struct hl_cb *cb = prop->data;
 +      void *buf = (void *) (uintptr_t) (cb->kernel_address);
 +
 +      u64 monitor_base, fence_addr = 0;
 +      u32 stream_index, size = prop->size;
 +      u16 msg_addr_offset;
 +
 +      stream_index = prop->q_idx % 4;
 +      fence_addr = CFG_BASE + gaudi2_qm_blocks_bases[prop->q_idx] +
 +                      QM_FENCE2_OFFSET + stream_index * 4;
 +
 +      /*
 +       * monitor_base should be the content of the base0 address registers,
 +       * so it will be added to the msg short offsets
 +       */
 +      monitor_base = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 +
 +      /* First monitor config packet: low address of the sync */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + prop->mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi2_add_mon_msg_short(buf + size, (u32) fence_addr, msg_addr_offset);
 +
 +      /* Second monitor config packet: high address of the sync */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + prop->mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi2_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32), msg_addr_offset);
 +
 +      /*
 +       * Third monitor config packet: the payload, i.e. what to write when the
 +       * sync triggers
 +       */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + prop->mon_id * 4) -
 +                              monitor_base;
 +
 +      size += gaudi2_add_mon_msg_short(buf + size, 1, msg_addr_offset);
 +
 +      /* Fourth monitor config packet: bind the monitor to a sync object */
 +      msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) - monitor_base;
 +
 +      size += gaudi2_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base, prop->sob_mask,
 +                                              prop->sob_val, msg_addr_offset);
 +
 +      /* Fence packet */
 +      size += gaudi2_add_fence_pkt(buf + size);
 +
 +      return size;
 +}
 +
 +static void gaudi2_reset_sob(struct hl_device *hdev, void *data)
 +{
 +      struct hl_hw_sob *hw_sob = data;
 +
 +      dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx, hw_sob->sob_id);
 +
 +      WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + hw_sob->sob_id * 4, 0);
 +
 +      kref_init(&hw_sob->kref);
 +}
 +
 +static void gaudi2_reset_sob_group(struct hl_device *hdev, u16 sob_group)
 +{
 +}
 +
 +static u64 gaudi2_get_device_time(struct hl_device *hdev)
 +{
 +      u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
 +
 +      return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
 +}
 +
 +static int gaudi2_collective_wait_init_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +static int gaudi2_collective_wait_create_jobs(struct hl_device *hdev, struct hl_ctx *ctx,
 +                                      struct hl_cs *cs, u32 wait_queue_id,
 +                                      u32 collective_engine_id, u32 encaps_signal_offset)
 +{
 +      return -EINVAL;
 +}
 +
 +/*
 + * gaudi2_mmu_scramble_addr - converts a DRAM (non-power-of-2) page-size aligned
 + *                            address to a DMMU page-size (64MB) address before
 + *                            mapping it in the MMU.
 + * The operation is performed on both the virtual and physical addresses.
 + * For a device with 6 HBMs the scramble is:
 + * (addr[47:0] / 48M) * 64M + addr % 48M + addr[63:48]
 + *
 + * Example:
 + * =============================================================================
 + * Allocated DRAM  Reserved VA      scrambled VA for MMU mapping    Scrambled PA
 + * Phys address                                                     in MMU last
 + *                                                                    HOP
 + * =============================================================================
 + * PA1 0x3000000  VA1 0x9C000000  SVA1= (VA1/48M)*64M 0xD0000000  <- PA1/48M 0x1
 + * PA2 0x9000000  VA2 0x9F000000  SVA2= (VA2/48M)*64M 0xD4000000  <- PA2/48M 0x3
 + * =============================================================================
 + */
 +static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 divisor, mod_va;
 +      u64 div_va;
 +
 +      /* accept any address in the DRAM address space */
 +      if (hl_mem_area_inside_range(raw_addr, sizeof(raw_addr), DRAM_PHYS_BASE,
 +                                                                      VA_HBM_SPACE_END)) {
 +
 +              divisor = prop->num_functional_hbms * GAUDI2_HBM_MMU_SCRM_MEM_SIZE;
 +              div_va = div_u64_rem(raw_addr & GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK, divisor, &mod_va);
 +              return (raw_addr & ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK) |
 +                      (div_va << GAUDI2_HBM_MMU_SCRM_DIV_SHIFT) |
 +                      (mod_va << GAUDI2_HBM_MMU_SCRM_MOD_SHIFT);
 +      }
 +
 +      return raw_addr;
 +}
 +
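For reference, a standalone sketch - not part of the patch - of the 6-HBM scramble described in the comment above, applied to the VA examples from the table. The 48-bit mask and the 48M/64M constants below are simplified stand-ins for the GAUDI2_HBM_MMU_SCRM_* definitions used by the driver.

#include <stdio.h>
#include <stdint.h>

#define SZ_48M (48ULL * 1024 * 1024)
#define SZ_64M (64ULL * 1024 * 1024)

/* (addr[47:0] / 48M) * 64M + addr % 48M, keeping addr[63:48] untouched */
static uint64_t scramble(uint64_t addr)
{
	uint64_t low = addr & 0x0000ffffffffffffULL;
	uint64_t high = addr & 0xffff000000000000ULL;

	return high + (low / SZ_48M) * SZ_64M + (low % SZ_48M);
}

int main(void)
{
	printf("0x%llx\n", (unsigned long long)scramble(0x9C000000ULL)); /* 0xd0000000 */
	printf("0x%llx\n", (unsigned long long)scramble(0x9F000000ULL)); /* 0xd4000000 */
	return 0;
}
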
 +static u64 gaudi2_mmu_descramble_addr(struct hl_device *hdev, u64 scrambled_addr)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 divisor, mod_va;
 +      u64 div_va;
 +
 +      /* accept any address in the DRAM address space */
 +      if (hl_mem_area_inside_range(scrambled_addr, sizeof(scrambled_addr), DRAM_PHYS_BASE,
 +                                                                      VA_HBM_SPACE_END)) {
 +
 +              divisor = prop->num_functional_hbms * GAUDI2_HBM_MMU_SCRM_MEM_SIZE;
 +              div_va = div_u64_rem(scrambled_addr & GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK,
 +                                      PAGE_SIZE_64MB, &mod_va);
 +
 +              return ((scrambled_addr & ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK) +
 +                                      (div_va * divisor + mod_va));
 +      }
 +
 +      return scrambled_addr;
 +}
 +
 +static u32 gaudi2_get_dec_base_addr(struct hl_device *hdev, u32 core_id)
 +{
 +      u32 base = 0, dcore_id, dec_id;
 +
 +      if (core_id >= NUMBER_OF_DEC) {
 +              dev_err(hdev->dev, "Unexpected core number %d for DEC\n", core_id);
 +              goto out;
 +      }
 +
 +      if (core_id < 8) {
 +              dcore_id = core_id / NUM_OF_DEC_PER_DCORE;
 +              dec_id = core_id % NUM_OF_DEC_PER_DCORE;
 +
 +              base = mmDCORE0_DEC0_CMD_BASE + dcore_id * DCORE_OFFSET +
 +                              dec_id * DCORE_VDEC_OFFSET;
 +      } else {
 +              /* PCIe Shared Decoder */
 +              base = mmPCIE_DEC0_CMD_BASE + ((core_id % 8) * PCIE_VDEC_OFFSET);
 +      }
 +out:
 +      return base;
 +}
 +
 +static int gaudi2_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
 +                              u32 *block_size, u32 *block_id)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      int i;
 +
 +      for (i = 0 ; i < NUM_USER_MAPPED_BLOCKS ; i++) {
 +              if (block_addr == CFG_BASE + gaudi2->mapped_blocks[i].address) {
 +                      *block_id = i;
 +                      if (block_size)
 +                              *block_size = gaudi2->mapped_blocks[i].size;
 +                      return 0;
 +              }
 +      }
 +
 +      dev_err(hdev->dev, "Invalid block address %#llx", block_addr);
 +
 +      return -EINVAL;
 +}
 +
 +static int gaudi2_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                      u32 block_id, u32 block_size)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u64 offset_in_bar;
 +      u64 address;
 +      int rc;
 +
 +      if (block_id >= NUM_USER_MAPPED_BLOCKS) {
 +              dev_err(hdev->dev, "Invalid block id %u", block_id);
 +              return -EINVAL;
 +      }
 +
 +      /* we allow mapping only an entire block */
 +      if (block_size != gaudi2->mapped_blocks[block_id].size) {
 +              dev_err(hdev->dev, "Invalid block size %u", block_size);
 +              return -EINVAL;
 +      }
 +
 +      offset_in_bar = CFG_BASE + gaudi2->mapped_blocks[block_id].address - STM_FLASH_BASE_ADDR;
 +
 +      address = pci_resource_start(hdev->pdev, SRAM_CFG_BAR_ID) + offset_in_bar;
 +
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +      rc = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
 +                      block_size, vma->vm_page_prot);
 +      if (rc)
 +              dev_err(hdev->dev, "remap_pfn_range error %d", rc);
 +
 +      return rc;
 +}
 +
 +static void gaudi2_enable_events_from_fw(struct hl_device *hdev)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
 +      u32 irq_handler_offset = le32_to_cpu(dyn_regs->gic_host_ints_irq);
 +
 +      if (gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)
 +              WREG32(irq_handler_offset,
 +                      gaudi2_irq_map_table[GAUDI2_EVENT_CPU_INTS_REGISTER].cpu_id);
 +}
 +
 +static int gaudi2_get_mmu_base(struct hl_device *hdev, u64 mmu_id, u32 *mmu_base)
 +{
 +      switch (mmu_id) {
 +      case HW_CAP_DCORE0_DMMU0:
 +              *mmu_base = mmDCORE0_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE0_DMMU1:
 +              *mmu_base = mmDCORE0_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE0_DMMU2:
 +              *mmu_base = mmDCORE0_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE0_DMMU3:
 +              *mmu_base = mmDCORE0_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU0:
 +              *mmu_base = mmDCORE1_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU1:
 +              *mmu_base = mmDCORE1_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU2:
 +              *mmu_base = mmDCORE1_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE1_DMMU3:
 +              *mmu_base = mmDCORE1_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU0:
 +              *mmu_base = mmDCORE2_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU1:
 +              *mmu_base = mmDCORE2_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU2:
 +              *mmu_base = mmDCORE2_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE2_DMMU3:
 +              *mmu_base = mmDCORE2_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU0:
 +              *mmu_base = mmDCORE3_HMMU0_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU1:
 +              *mmu_base = mmDCORE3_HMMU1_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU2:
 +              *mmu_base = mmDCORE3_HMMU2_MMU_BASE;
 +              break;
 +      case HW_CAP_DCORE3_DMMU3:
 +              *mmu_base = mmDCORE3_HMMU3_MMU_BASE;
 +              break;
 +      case HW_CAP_PMMU:
 +              *mmu_base = mmPMMU_HBW_MMU_BASE;
 +              break;
 +      default:
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +static void gaudi2_ack_mmu_error(struct hl_device *hdev, u64 mmu_id)
 +{
 +      bool is_pmmu = (mmu_id == HW_CAP_PMMU);
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +      u32 mmu_base;
 +
 +      if (!(gaudi2->hw_cap_initialized & mmu_id))
 +              return;
 +
 +      if (gaudi2_get_mmu_base(hdev, mmu_id, &mmu_base))
 +              return;
 +
 +      gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, NULL);
 +      gaudi2_handle_access_error(hdev, mmu_base, is_pmmu);
 +}
 +
 +static int gaudi2_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
 +{
 +      u32 i, mmu_id, num_of_hmmus = NUM_OF_HMMU_PER_DCORE * NUM_OF_DCORES;
 +
 +      /* check all HMMUs */
 +      for (i = 0 ; i < num_of_hmmus ; i++) {
 +              mmu_id = HW_CAP_DCORE0_DMMU0 << i;
 +
 +              if (mmu_cap_mask & mmu_id)
 +                      gaudi2_ack_mmu_error(hdev, mmu_id);
 +      }
 +
 +      /* check PMMU */
 +      if (mmu_cap_mask & HW_CAP_PMMU)
 +              gaudi2_ack_mmu_error(hdev, HW_CAP_PMMU);
 +
 +      return 0;
 +}
 +
 +static void gaudi2_get_msi_info(__le32 *table)
 +{
 +      table[CPUCP_EVENT_QUEUE_MSI_TYPE] = cpu_to_le32(GAUDI2_EVENT_QUEUE_MSIX_IDX);
 +}
 +
 +static int gaudi2_map_pll_idx_to_fw_idx(u32 pll_idx)
 +{
 +      switch (pll_idx) {
 +      case HL_GAUDI2_CPU_PLL: return CPU_PLL;
 +      case HL_GAUDI2_PCI_PLL: return PCI_PLL;
 +      case HL_GAUDI2_NIC_PLL: return NIC_PLL;
 +      case HL_GAUDI2_DMA_PLL: return DMA_PLL;
 +      case HL_GAUDI2_MESH_PLL: return MESH_PLL;
 +      case HL_GAUDI2_MME_PLL: return MME_PLL;
 +      case HL_GAUDI2_TPC_PLL: return TPC_PLL;
 +      case HL_GAUDI2_IF_PLL: return IF_PLL;
 +      case HL_GAUDI2_SRAM_PLL: return SRAM_PLL;
 +      case HL_GAUDI2_HBM_PLL: return HBM_PLL;
 +      case HL_GAUDI2_VID_PLL: return VID_PLL;
 +      case HL_GAUDI2_MSS_PLL: return MSS_PLL;
 +      default: return -EINVAL;
 +      }
 +}
 +
 +static int gaudi2_gen_sync_to_engine_map(struct hl_device *hdev, struct hl_sync_to_engine_map *map)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int gaudi2_monitor_valid(struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int gaudi2_print_single_monitor(char **buf, size_t *size, size_t *offset,
 +                              struct hl_device *hdev, struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +
 +static int gaudi2_print_fences_single_engine(struct hl_device *hdev, u64 base_offset,
 +                              u64 status_base_offset, enum hl_sync_engine_type engine_type,
 +                              u32 engine_id, char **buf, size_t *size, size_t *offset)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +
 +static struct hl_state_dump_specs_funcs gaudi2_state_dump_funcs = {
 +      .monitor_valid = gaudi2_monitor_valid,
 +      .print_single_monitor = gaudi2_print_single_monitor,
 +      .gen_sync_to_engine_map = gaudi2_gen_sync_to_engine_map,
 +      .print_fences_single_engine = gaudi2_print_fences_single_engine,
 +};
 +
 +static void gaudi2_state_dump_init(struct hl_device *hdev)
 +{
 +      /* Not implemented */
 +      hdev->state_dump_specs.props = gaudi2_state_dump_specs_props;
 +      hdev->state_dump_specs.funcs = gaudi2_state_dump_funcs;
 +}
 +
 +static u32 gaudi2_get_sob_addr(struct hl_device *hdev, u32 sob_id)
 +{
 +      return 0;
 +}
 +
 +static u32 *gaudi2_get_stream_master_qid_arr(void)
 +{
 +      return NULL;
 +}
 +
 +static void gaudi2_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
 +                              struct attribute_group *dev_vrm_attr_grp)
 +{
 +      hl_sysfs_add_dev_clk_attr(hdev, dev_clk_attr_grp);
 +      hl_sysfs_add_dev_vrm_attr(hdev, dev_vrm_attr_grp);
 +}
 +
 +static int gaudi2_mmu_get_real_page_size(struct hl_device *hdev, struct hl_mmu_properties *mmu_prop,
 +                                      u32 page_size, u32 *real_page_size, bool is_dram_addr)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +
 +      /* for host pages the page size must be a multiple of the MMU page size */
 +      if (!is_dram_addr) {
 +              if (page_size % mmu_prop->page_size)
 +                      goto page_size_err;
 +
 +              *real_page_size = mmu_prop->page_size;
 +              return 0;
 +      }
 +
 +      if ((page_size % prop->dram_page_size) || (prop->dram_page_size > mmu_prop->page_size))
 +              goto page_size_err;
 +
 +      /*
 +       * The MMU page size is different from the DRAM page size (more precisely, the DMMU
 +       * page is greater than the DRAM page size).
 +       * For this reason, work with the DRAM page size and let the MMU scrambling routine
 +       * handle this mismatch when calculating the address to place in the MMU page table.
 +       * (In that case, also make sure that dram_page_size is not greater than the
 +       * MMU page size.)
 +       */
 +      *real_page_size = prop->dram_page_size;
 +
 +      return 0;
 +
 +page_size_err:
 +      dev_err(hdev->dev, "page size of %u is not %uKB aligned, can't map\n",
 +                                                      page_size, mmu_prop->page_size >> 10);
 +      return -EFAULT;
 +}
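To make the two branches above concrete, with hypothetical sizes rather than the real ASIC properties: a host mapping is accepted only if its page size is an exact multiple of the PMMU page size, while a DRAM mapping is always mapped with dram_page_size and the scrambling routine absorbs the DMMU/DRAM mismatch.

/* Hedged illustration, hypothetical sizes only (helper name is made up). */
static u32 sketch_host_real_page_size(u32 page_size)
{
	u32 pmmu_page = SZ_4K;

	/* e.g. a 64KB request passes (64KB % 4KB == 0) and is mapped as 16 x 4KB pages */
	return (page_size % pmmu_page) ? 0 : pmmu_page;
}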
 +
 +static int gaudi2_get_monitor_dump(struct hl_device *hdev, void *data)
 +{
 +      return -EOPNOTSUPP;
 +}
 +
 +int gaudi2_send_device_activity(struct hl_device *hdev, bool open)
 +{
 +      struct gaudi2_device *gaudi2 = hdev->asic_specific;
 +
 +      if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_device_activity(hdev, open);
 +}
 +
 +static const struct hl_asic_funcs gaudi2_funcs = {
 +      .early_init = gaudi2_early_init,
 +      .early_fini = gaudi2_early_fini,
 +      .late_init = gaudi2_late_init,
 +      .late_fini = gaudi2_late_fini,
 +      .sw_init = gaudi2_sw_init,
 +      .sw_fini = gaudi2_sw_fini,
 +      .hw_init = gaudi2_hw_init,
 +      .hw_fini = gaudi2_hw_fini,
 +      .halt_engines = gaudi2_halt_engines,
 +      .suspend = gaudi2_suspend,
 +      .resume = gaudi2_resume,
 +      .mmap = gaudi2_mmap,
 +      .ring_doorbell = gaudi2_ring_doorbell,
 +      .pqe_write = gaudi2_pqe_write,
 +      .asic_dma_alloc_coherent = gaudi2_dma_alloc_coherent,
 +      .asic_dma_free_coherent = gaudi2_dma_free_coherent,
 +      .scrub_device_mem = gaudi2_scrub_device_mem,
 +      .scrub_device_dram = gaudi2_scrub_device_dram,
 +      .get_int_queue_base = NULL,
 +      .test_queues = gaudi2_test_queues,
 +      .asic_dma_pool_zalloc = gaudi2_dma_pool_zalloc,
 +      .asic_dma_pool_free = gaudi2_dma_pool_free,
 +      .cpu_accessible_dma_pool_alloc = gaudi2_cpu_accessible_dma_pool_alloc,
 +      .cpu_accessible_dma_pool_free = gaudi2_cpu_accessible_dma_pool_free,
 +      .asic_dma_unmap_single = gaudi2_dma_unmap_single,
 +      .asic_dma_map_single = gaudi2_dma_map_single,
 +      .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
 +      .cs_parser = gaudi2_cs_parser,
 +      .asic_dma_map_sgtable = hl_dma_map_sgtable,
 +      .add_end_of_cb_packets = NULL,
 +      .update_eq_ci = gaudi2_update_eq_ci,
 +      .context_switch = gaudi2_context_switch,
 +      .restore_phase_topology = gaudi2_restore_phase_topology,
 +      .debugfs_read_dma = gaudi2_debugfs_read_dma,
 +      .add_device_attr = gaudi2_add_device_attr,
 +      .handle_eqe = gaudi2_handle_eqe,
 +      .get_events_stat = gaudi2_get_events_stat,
 +      .read_pte = NULL,
 +      .write_pte = NULL,
 +      .mmu_invalidate_cache = gaudi2_mmu_invalidate_cache,
 +      .mmu_invalidate_cache_range = gaudi2_mmu_invalidate_cache_range,
 +      .mmu_prefetch_cache_range = NULL,
 +      .send_heartbeat = gaudi2_send_heartbeat,
 +      .debug_coresight = gaudi2_debug_coresight,
 +      .is_device_idle = gaudi2_is_device_idle,
 +      .compute_reset_late_init = gaudi2_compute_reset_late_init,
 +      .hw_queues_lock = gaudi2_hw_queues_lock,
 +      .hw_queues_unlock = gaudi2_hw_queues_unlock,
 +      .get_pci_id = gaudi2_get_pci_id,
 +      .get_eeprom_data = gaudi2_get_eeprom_data,
 +      .get_monitor_dump = gaudi2_get_monitor_dump,
 +      .send_cpu_message = gaudi2_send_cpu_message,
 +      .pci_bars_map = gaudi2_pci_bars_map,
 +      .init_iatu = gaudi2_init_iatu,
 +      .rreg = hl_rreg,
 +      .wreg = hl_wreg,
 +      .halt_coresight = gaudi2_halt_coresight,
 +      .ctx_init = gaudi2_ctx_init,
 +      .ctx_fini = gaudi2_ctx_fini,
 +      .pre_schedule_cs = gaudi2_pre_schedule_cs,
 +      .get_queue_id_for_cq = gaudi2_get_queue_id_for_cq,
 +      .load_firmware_to_device = NULL,
 +      .load_boot_fit_to_device = NULL,
 +      .get_signal_cb_size = gaudi2_get_signal_cb_size,
 +      .get_wait_cb_size = gaudi2_get_wait_cb_size,
 +      .gen_signal_cb = gaudi2_gen_signal_cb,
 +      .gen_wait_cb = gaudi2_gen_wait_cb,
 +      .reset_sob = gaudi2_reset_sob,
 +      .reset_sob_group = gaudi2_reset_sob_group,
 +      .get_device_time = gaudi2_get_device_time,
 +      .pb_print_security_errors = gaudi2_pb_print_security_errors,
 +      .collective_wait_init_cs = gaudi2_collective_wait_init_cs,
 +      .collective_wait_create_jobs = gaudi2_collective_wait_create_jobs,
 +      .get_dec_base_addr = gaudi2_get_dec_base_addr,
 +      .scramble_addr = gaudi2_mmu_scramble_addr,
 +      .descramble_addr = gaudi2_mmu_descramble_addr,
 +      .ack_protection_bits_errors = gaudi2_ack_protection_bits_errors,
 +      .get_hw_block_id = gaudi2_get_hw_block_id,
 +      .hw_block_mmap = gaudi2_block_mmap,
 +      .enable_events_from_fw = gaudi2_enable_events_from_fw,
 +      .ack_mmu_errors = gaudi2_ack_mmu_page_fault_or_access_error,
 +      .get_msi_info = gaudi2_get_msi_info,
 +      .map_pll_idx_to_fw_idx = gaudi2_map_pll_idx_to_fw_idx,
 +      .init_firmware_preload_params = gaudi2_init_firmware_preload_params,
 +      .init_firmware_loader = gaudi2_init_firmware_loader,
 +      .init_cpu_scrambler_dram = gaudi2_init_scrambler_hbm,
 +      .state_dump_init = gaudi2_state_dump_init,
 +      .get_sob_addr = &gaudi2_get_sob_addr,
 +      .set_pci_memory_regions = gaudi2_set_pci_memory_regions,
 +      .get_stream_master_qid_arr = gaudi2_get_stream_master_qid_arr,
 +      .check_if_razwi_happened = gaudi2_check_if_razwi_happened,
 +      .mmu_get_real_page_size = gaudi2_mmu_get_real_page_size,
 +      .access_dev_mem = hl_access_dev_mem,
 +      .set_dram_bar_base = gaudi2_set_hbm_bar_base,
 +      .set_engine_cores = gaudi2_set_engine_cores,
 +      .send_device_activity = gaudi2_send_device_activity,
 +      .set_dram_properties = gaudi2_set_dram_properties,
 +      .set_binning_masks = gaudi2_set_binning_masks,
 +};
 +
 +void gaudi2_set_asic_funcs(struct hl_device *hdev)
 +{
 +      hdev->asic_funcs = &gaudi2_funcs;
 +}
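gaudi2_funcs is the per-ASIC dispatch table: common habanalabs code does not call gaudi2_*() directly, it goes through hdev->asic_funcs. A minimal caller-side sketch (function name made up), using the same hw_fini() call shape that goya_early_init() uses later in this patch:

/* Sketch only: ASIC-agnostic code sees nothing but the ops table. */
static void example_reset_on_dirty_state(struct hl_device *hdev)
{
	hdev->asic_funcs->hw_fini(hdev, true, false);
}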
index 2b135e856607da737187cea7d796f56de5b6413a,0000000000000000000000000000000000000000..df65e9bdc18aa945b5fa7dcabdd2a7ec6ea7857e
mode 100644,000000..100644
--- /dev/null
@@@ -1,5544 -1,0 +1,5544 @@@
-       vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
-                       VM_DONTCOPY | VM_NORESERVE;
 +// SPDX-License-Identifier: GPL-2.0
 +
 +/*
 + * Copyright 2016-2022 HabanaLabs, Ltd.
 + * All Rights Reserved.
 + */
 +
 +#include "goyaP.h"
 +#include "../include/hw_ip/mmu/mmu_general.h"
 +#include "../include/hw_ip/mmu/mmu_v1_0.h"
 +#include "../include/goya/asic_reg/goya_masks.h"
 +#include "../include/goya/goya_reg_map.h"
 +
 +#include <linux/pci.h>
 +#include <linux/hwmon.h>
 +#include <linux/iommu.h>
 +#include <linux/seq_file.h>
 +
 +/*
 + * GOYA security scheme:
 + *
 + * 1. Host is protected by:
 + *        - Range registers (When MMU is enabled, DMA RR does NOT protect host)
 + *        - MMU
 + *
 + * 2. DRAM is protected by:
 + *        - Range registers (protect the first 512MB)
 + *        - MMU (isolation between users)
 + *
 + * 3. Configuration is protected by:
 + *        - Range registers
 + *        - Protection bits
 + *
 + * When MMU is disabled:
 + *
 + * QMAN DMA: PQ, CQ, CP, DMA are secured.
 + * PQ, CB and the data are on the host.
 + *
 + * QMAN TPC/MME:
 + * PQ, CQ and CP are not secured.
 + * PQ, CB and the data are on the SRAM/DRAM.
 + *
 + * Since QMAN DMA is secured, the driver is parsing the DMA CB:
 + *     - checks DMA pointer
 + *     - WREG, MSG_PROT are not allowed.
 + *     - MSG_LONG/SHORT are allowed.
 + *
 + * A read/write transaction by the QMAN to a protected area will succeed if
 + * and only if the QMAN's CP is secured and MSG_PROT is used
 + *
 + *
 + * When MMU is enabled:
 + *
 + * QMAN DMA: PQ, CQ and CP are secured.
 + * MMU is set to bypass on the Secure props register of the QMAN.
 + * The reasons we don't enable MMU for PQ, CQ and CP are:
 + *     - PQ entry is in kernel address space and the driver doesn't map it.
 + *     - CP writes to MSIX register and to kernel address space (completion
 + *       queue).
 + *
 + * DMA is not secured, but because CP is secured, the driver still needs to parse
 + * the CB; it doesn't, however, need to check the DMA addresses.
 + *
 + * For QMAN DMA 0, DMA is also secured because only the driver uses this DMA and
 + * the driver doesn't map memory in MMU.
 + *
 + * QMAN TPC/MME: PQ, CQ and CP aren't secured (no change from MMU disabled mode)
 + *
 + * DMA RR does NOT protect host because DMA is not secured
 + *
 + */
 +
 +#define GOYA_BOOT_FIT_FILE    "habanalabs/goya/goya-boot-fit.itb"
 +#define GOYA_LINUX_FW_FILE    "habanalabs/goya/goya-fit.itb"
 +
 +#define GOYA_MMU_REGS_NUM             63
 +
 +#define GOYA_DMA_POOL_BLK_SIZE                0x100           /* 256 bytes */
 +
 +#define GOYA_RESET_TIMEOUT_MSEC               500             /* 500ms */
 +#define GOYA_PLDM_RESET_TIMEOUT_MSEC  20000           /* 20s */
 +#define GOYA_RESET_WAIT_MSEC          1               /* 1ms */
 +#define GOYA_CPU_RESET_WAIT_MSEC      100             /* 100ms */
 +#define GOYA_PLDM_RESET_WAIT_MSEC     1000            /* 1s */
 +#define GOYA_TEST_QUEUE_WAIT_USEC     100000          /* 100ms */
 +#define GOYA_PLDM_MMU_TIMEOUT_USEC    (MMU_CONFIG_TIMEOUT_USEC * 100)
 +#define GOYA_PLDM_QMAN0_TIMEOUT_USEC  (HL_DEVICE_TIMEOUT_USEC * 30)
 +#define GOYA_BOOT_FIT_REQ_TIMEOUT_USEC        1000000         /* 1s */
 +#define GOYA_MSG_TO_CPU_TIMEOUT_USEC  4000000         /* 4s */
 +#define GOYA_WAIT_FOR_BL_TIMEOUT_USEC 15000000        /* 15s */
 +
 +#define GOYA_QMAN0_FENCE_VAL          0xD169B243
 +
 +#define GOYA_MAX_STRING_LEN           20
 +
 +#define GOYA_CB_POOL_CB_CNT           512
 +#define GOYA_CB_POOL_CB_SIZE          0x20000         /* 128KB */
 +
 +#define IS_QM_IDLE(engine, qm_glbl_sts0) \
 +      (((qm_glbl_sts0) & engine##_QM_IDLE_MASK) == engine##_QM_IDLE_MASK)
 +#define IS_DMA_QM_IDLE(qm_glbl_sts0)  IS_QM_IDLE(DMA, qm_glbl_sts0)
 +#define IS_TPC_QM_IDLE(qm_glbl_sts0)  IS_QM_IDLE(TPC, qm_glbl_sts0)
 +#define IS_MME_QM_IDLE(qm_glbl_sts0)  IS_QM_IDLE(MME, qm_glbl_sts0)
 +
 +#define IS_CMDQ_IDLE(engine, cmdq_glbl_sts0) \
 +      (((cmdq_glbl_sts0) & engine##_CMDQ_IDLE_MASK) == \
 +                      engine##_CMDQ_IDLE_MASK)
 +#define IS_TPC_CMDQ_IDLE(cmdq_glbl_sts0) \
 +      IS_CMDQ_IDLE(TPC, cmdq_glbl_sts0)
 +#define IS_MME_CMDQ_IDLE(cmdq_glbl_sts0) \
 +      IS_CMDQ_IDLE(MME, cmdq_glbl_sts0)
 +
 +#define IS_DMA_IDLE(dma_core_sts0) \
 +      !((dma_core_sts0) & DMA_CH_0_STS0_DMA_BUSY_MASK)
 +
 +#define IS_TPC_IDLE(tpc_cfg_sts) \
 +      (((tpc_cfg_sts) & TPC_CFG_IDLE_MASK) == TPC_CFG_IDLE_MASK)
 +
 +#define IS_MME_IDLE(mme_arch_sts) \
 +      (((mme_arch_sts) & MME_ARCH_IDLE_MASK) == MME_ARCH_IDLE_MASK)
 +
 +static const char goya_irq_name[GOYA_MSIX_ENTRIES][GOYA_MAX_STRING_LEN] = {
 +              "goya cq 0", "goya cq 1", "goya cq 2", "goya cq 3",
 +              "goya cq 4", "goya cpu eq"
 +};
 +
 +static u16 goya_packet_sizes[MAX_PACKET_ID] = {
 +      [PACKET_WREG_32]        = sizeof(struct packet_wreg32),
 +      [PACKET_WREG_BULK]      = sizeof(struct packet_wreg_bulk),
 +      [PACKET_MSG_LONG]       = sizeof(struct packet_msg_long),
 +      [PACKET_MSG_SHORT]      = sizeof(struct packet_msg_short),
 +      [PACKET_CP_DMA]         = sizeof(struct packet_cp_dma),
 +      [PACKET_MSG_PROT]       = sizeof(struct packet_msg_prot),
 +      [PACKET_FENCE]          = sizeof(struct packet_fence),
 +      [PACKET_LIN_DMA]        = sizeof(struct packet_lin_dma),
 +      [PACKET_NOP]            = sizeof(struct packet_nop),
 +      [PACKET_STOP]           = sizeof(struct packet_stop)
 +};
 +
 +static inline bool validate_packet_id(enum packet_id id)
 +{
 +      switch (id) {
 +      case PACKET_WREG_32:
 +      case PACKET_WREG_BULK:
 +      case PACKET_MSG_LONG:
 +      case PACKET_MSG_SHORT:
 +      case PACKET_CP_DMA:
 +      case PACKET_MSG_PROT:
 +      case PACKET_FENCE:
 +      case PACKET_LIN_DMA:
 +      case PACKET_NOP:
 +      case PACKET_STOP:
 +              return true;
 +      default:
 +              return false;
 +      }
 +}
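validate_packet_id() and goya_packet_sizes[] are the two pieces a command-buffer walk needs: check the id read from a packet header, then advance by that packet's size. A hedged sketch of such a walk; the header layout and function name are illustrative, not the driver's actual parser:

static int sketch_walk_cb(const u8 *cb, u32 cb_size)
{
	u32 offset = 0;

	while (offset + sizeof(u64) <= cb_size) {
		u64 header = *(const u64 *)(cb + offset);
		enum packet_id id = (header >> 56) & 0x1f;	/* made-up field position */

		if (!validate_packet_id(id) || !goya_packet_sizes[id])
			return -EINVAL;

		offset += goya_packet_sizes[id];
	}

	return offset == cb_size ? 0 : -EINVAL;
}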
 +
 +static u64 goya_mmu_regs[GOYA_MMU_REGS_NUM] = {
 +      mmDMA_QM_0_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_1_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_2_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_3_GLBL_NON_SECURE_PROPS,
 +      mmDMA_QM_4_GLBL_NON_SECURE_PROPS,
 +      mmTPC0_QM_GLBL_SECURE_PROPS,
 +      mmTPC0_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC0_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC0_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC0_CFG_ARUSER,
 +      mmTPC0_CFG_AWUSER,
 +      mmTPC1_QM_GLBL_SECURE_PROPS,
 +      mmTPC1_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC1_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC1_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC1_CFG_ARUSER,
 +      mmTPC1_CFG_AWUSER,
 +      mmTPC2_QM_GLBL_SECURE_PROPS,
 +      mmTPC2_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC2_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC2_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC2_CFG_ARUSER,
 +      mmTPC2_CFG_AWUSER,
 +      mmTPC3_QM_GLBL_SECURE_PROPS,
 +      mmTPC3_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC3_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC3_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC3_CFG_ARUSER,
 +      mmTPC3_CFG_AWUSER,
 +      mmTPC4_QM_GLBL_SECURE_PROPS,
 +      mmTPC4_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC4_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC4_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC4_CFG_ARUSER,
 +      mmTPC4_CFG_AWUSER,
 +      mmTPC5_QM_GLBL_SECURE_PROPS,
 +      mmTPC5_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC5_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC5_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC5_CFG_ARUSER,
 +      mmTPC5_CFG_AWUSER,
 +      mmTPC6_QM_GLBL_SECURE_PROPS,
 +      mmTPC6_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC6_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC6_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC6_CFG_ARUSER,
 +      mmTPC6_CFG_AWUSER,
 +      mmTPC7_QM_GLBL_SECURE_PROPS,
 +      mmTPC7_QM_GLBL_NON_SECURE_PROPS,
 +      mmTPC7_CMDQ_GLBL_SECURE_PROPS,
 +      mmTPC7_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmTPC7_CFG_ARUSER,
 +      mmTPC7_CFG_AWUSER,
 +      mmMME_QM_GLBL_SECURE_PROPS,
 +      mmMME_QM_GLBL_NON_SECURE_PROPS,
 +      mmMME_CMDQ_GLBL_SECURE_PROPS,
 +      mmMME_CMDQ_GLBL_NON_SECURE_PROPS,
 +      mmMME_SBA_CONTROL_DATA,
 +      mmMME_SBB_CONTROL_DATA,
 +      mmMME_SBC_CONTROL_DATA,
 +      mmMME_WBC_CONTROL_DATA,
 +      mmPCIE_WRAP_PSOC_ARUSER,
 +      mmPCIE_WRAP_PSOC_AWUSER
 +};
 +
 +static u32 goya_all_events[] = {
 +      GOYA_ASYNC_EVENT_ID_PCIE_IF,
 +      GOYA_ASYNC_EVENT_ID_TPC0_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC1_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC2_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC3_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC4_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC5_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC6_ECC,
 +      GOYA_ASYNC_EVENT_ID_TPC7_ECC,
 +      GOYA_ASYNC_EVENT_ID_MME_ECC,
 +      GOYA_ASYNC_EVENT_ID_MME_ECC_EXT,
 +      GOYA_ASYNC_EVENT_ID_MMU_ECC,
 +      GOYA_ASYNC_EVENT_ID_DMA_MACRO,
 +      GOYA_ASYNC_EVENT_ID_DMA_ECC,
 +      GOYA_ASYNC_EVENT_ID_CPU_IF_ECC,
 +      GOYA_ASYNC_EVENT_ID_PSOC_MEM,
 +      GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT,
 +      GOYA_ASYNC_EVENT_ID_SRAM0,
 +      GOYA_ASYNC_EVENT_ID_SRAM1,
 +      GOYA_ASYNC_EVENT_ID_SRAM2,
 +      GOYA_ASYNC_EVENT_ID_SRAM3,
 +      GOYA_ASYNC_EVENT_ID_SRAM4,
 +      GOYA_ASYNC_EVENT_ID_SRAM5,
 +      GOYA_ASYNC_EVENT_ID_SRAM6,
 +      GOYA_ASYNC_EVENT_ID_SRAM7,
 +      GOYA_ASYNC_EVENT_ID_SRAM8,
 +      GOYA_ASYNC_EVENT_ID_SRAM9,
 +      GOYA_ASYNC_EVENT_ID_SRAM10,
 +      GOYA_ASYNC_EVENT_ID_SRAM11,
 +      GOYA_ASYNC_EVENT_ID_SRAM12,
 +      GOYA_ASYNC_EVENT_ID_SRAM13,
 +      GOYA_ASYNC_EVENT_ID_SRAM14,
 +      GOYA_ASYNC_EVENT_ID_SRAM15,
 +      GOYA_ASYNC_EVENT_ID_SRAM16,
 +      GOYA_ASYNC_EVENT_ID_SRAM17,
 +      GOYA_ASYNC_EVENT_ID_SRAM18,
 +      GOYA_ASYNC_EVENT_ID_SRAM19,
 +      GOYA_ASYNC_EVENT_ID_SRAM20,
 +      GOYA_ASYNC_EVENT_ID_SRAM21,
 +      GOYA_ASYNC_EVENT_ID_SRAM22,
 +      GOYA_ASYNC_EVENT_ID_SRAM23,
 +      GOYA_ASYNC_EVENT_ID_SRAM24,
 +      GOYA_ASYNC_EVENT_ID_SRAM25,
 +      GOYA_ASYNC_EVENT_ID_SRAM26,
 +      GOYA_ASYNC_EVENT_ID_SRAM27,
 +      GOYA_ASYNC_EVENT_ID_SRAM28,
 +      GOYA_ASYNC_EVENT_ID_SRAM29,
 +      GOYA_ASYNC_EVENT_ID_GIC500,
 +      GOYA_ASYNC_EVENT_ID_PLL0,
 +      GOYA_ASYNC_EVENT_ID_PLL1,
 +      GOYA_ASYNC_EVENT_ID_PLL3,
 +      GOYA_ASYNC_EVENT_ID_PLL4,
 +      GOYA_ASYNC_EVENT_ID_PLL5,
 +      GOYA_ASYNC_EVENT_ID_PLL6,
 +      GOYA_ASYNC_EVENT_ID_AXI_ECC,
 +      GOYA_ASYNC_EVENT_ID_L2_RAM_ECC,
 +      GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET,
 +      GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT,
 +      GOYA_ASYNC_EVENT_ID_PCIE_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC0_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC1_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC2_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC3_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC4_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC5_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC6_DEC,
 +      GOYA_ASYNC_EVENT_ID_TPC7_DEC,
 +      GOYA_ASYNC_EVENT_ID_MME_WACS,
 +      GOYA_ASYNC_EVENT_ID_MME_WACSD,
 +      GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER,
 +      GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC,
 +      GOYA_ASYNC_EVENT_ID_PSOC,
 +      GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR,
 +      GOYA_ASYNC_EVENT_ID_TPC0_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC1_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC2_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC3_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC4_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC5_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC6_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC7_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_TPC0_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC1_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC2_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC3_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC4_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC5_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC6_QM,
 +      GOYA_ASYNC_EVENT_ID_TPC7_QM,
 +      GOYA_ASYNC_EVENT_ID_MME_QM,
 +      GOYA_ASYNC_EVENT_ID_MME_CMDQ,
 +      GOYA_ASYNC_EVENT_ID_DMA0_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA1_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA2_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA3_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA4_QM,
 +      GOYA_ASYNC_EVENT_ID_DMA0_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA1_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA2_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA3_CH,
 +      GOYA_ASYNC_EVENT_ID_DMA4_CH,
 +      GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH0,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH1,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH2,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH3,
 +      GOYA_ASYNC_EVENT_ID_DMA_BM_CH4,
 +      GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S,
 +      GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E,
 +      GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S,
 +      GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E
 +};
 +
 +static s64 goya_state_dump_specs_props[SP_MAX] = {0};
 +
 +static int goya_mmu_clear_pgt_range(struct hl_device *hdev);
 +static int goya_mmu_set_dram_default_page(struct hl_device *hdev);
 +static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev);
 +static void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
 +
 +int goya_set_fixed_properties(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int i;
 +
 +      prop->max_queues = GOYA_QUEUE_ID_SIZE;
 +      prop->hw_queues_props = kcalloc(prop->max_queues,
 +                      sizeof(struct hw_queue_properties),
 +                      GFP_KERNEL);
 +
 +      if (!prop->hw_queues_props)
 +              return -ENOMEM;
 +
 +      for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
 +              prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
 +              prop->hw_queues_props[i].driver_only = 0;
 +              prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_KERNEL;
 +      }
 +
 +      for (; i < NUMBER_OF_EXT_HW_QUEUES + NUMBER_OF_CPU_HW_QUEUES ; i++) {
 +              prop->hw_queues_props[i].type = QUEUE_TYPE_CPU;
 +              prop->hw_queues_props[i].driver_only = 1;
 +              prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_KERNEL;
 +      }
 +
 +      for (; i < NUMBER_OF_EXT_HW_QUEUES + NUMBER_OF_CPU_HW_QUEUES +
 +                      NUMBER_OF_INT_HW_QUEUES; i++) {
 +              prop->hw_queues_props[i].type = QUEUE_TYPE_INT;
 +              prop->hw_queues_props[i].driver_only = 0;
 +              prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_USER;
 +      }
 +
 +      prop->cfg_base_address = CFG_BASE;
 +      prop->device_dma_offset_for_host_access = HOST_PHYS_BASE;
 +      prop->host_base_address = HOST_PHYS_BASE;
 +      prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE;
 +      prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
 +      prop->completion_mode = HL_COMPLETION_MODE_JOB;
 +      prop->dram_base_address = DRAM_PHYS_BASE;
 +      prop->dram_size = DRAM_PHYS_DEFAULT_SIZE;
 +      prop->dram_end_address = prop->dram_base_address + prop->dram_size;
 +      prop->dram_user_base_address = DRAM_BASE_ADDR_USER;
 +
 +      prop->sram_base_address = SRAM_BASE_ADDR;
 +      prop->sram_size = SRAM_SIZE;
 +      prop->sram_end_address = prop->sram_base_address + prop->sram_size;
 +      prop->sram_user_base_address = prop->sram_base_address +
 +                                              SRAM_USER_BASE_OFFSET;
 +
 +      prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
 +      prop->mmu_dram_default_page_addr = MMU_DRAM_DEFAULT_PAGE_ADDR;
 +      if (hdev->pldm)
 +              prop->mmu_pgt_size = 0x800000; /* 8MB */
 +      else
 +              prop->mmu_pgt_size = MMU_PAGE_TABLES_SIZE;
 +      prop->mmu_pte_size = HL_PTE_SIZE;
 +      prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
 +      prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
 +      prop->dram_page_size = PAGE_SIZE_2MB;
 +      prop->device_mem_alloc_default_page_size = prop->dram_page_size;
 +      prop->dram_supports_virtual_memory = true;
 +
 +      prop->dmmu.hop_shifts[MMU_HOP0] = MMU_V1_0_HOP0_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP1] = MMU_V1_0_HOP1_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP2] = MMU_V1_0_HOP2_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP3] = MMU_V1_0_HOP3_SHIFT;
 +      prop->dmmu.hop_shifts[MMU_HOP4] = MMU_V1_0_HOP4_SHIFT;
 +      prop->dmmu.hop_masks[MMU_HOP0] = MMU_V1_0_HOP0_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP1] = MMU_V1_0_HOP1_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP2] = MMU_V1_0_HOP2_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP3] = MMU_V1_0_HOP3_MASK;
 +      prop->dmmu.hop_masks[MMU_HOP4] = MMU_V1_0_HOP4_MASK;
 +      prop->dmmu.start_addr = VA_DDR_SPACE_START;
 +      prop->dmmu.end_addr = VA_DDR_SPACE_END;
 +      prop->dmmu.page_size = PAGE_SIZE_2MB;
 +      prop->dmmu.num_hops = MMU_ARCH_5_HOPS;
 +      prop->dmmu.last_mask = LAST_MASK;
 +      /* TODO: duplicated until per-MMU props are implemented */
 +      prop->dmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->dmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /* shifts and masks are the same in PMMU and DMMU */
 +      memcpy(&prop->pmmu, &prop->dmmu, sizeof(prop->dmmu));
 +      prop->pmmu.start_addr = VA_HOST_SPACE_START;
 +      prop->pmmu.end_addr = VA_HOST_SPACE_END;
 +      prop->pmmu.page_size = PAGE_SIZE_4KB;
 +      prop->pmmu.num_hops = MMU_ARCH_5_HOPS;
 +      prop->pmmu.last_mask = LAST_MASK;
 +      /* TODO: duplicated until per-MMU props are implemented */
 +      prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
 +      prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
 +
 +      /* PMMU and HPMMU are the same except for the page size */
 +      memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
 +      prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
 +
 +      prop->dram_size_for_default_page_mapping = VA_DDR_SPACE_END;
 +      prop->cfg_size = CFG_SIZE;
 +      prop->max_asid = MAX_ASID;
 +      prop->num_of_events = GOYA_ASYNC_EVENT_ID_SIZE;
 +      prop->high_pll = PLL_HIGH_DEFAULT;
 +      prop->cb_pool_cb_cnt = GOYA_CB_POOL_CB_CNT;
 +      prop->cb_pool_cb_size = GOYA_CB_POOL_CB_SIZE;
 +      prop->max_power_default = MAX_POWER_DEFAULT;
 +      prop->dc_power_default = DC_POWER_DEFAULT;
 +      prop->tpc_enabled_mask = TPC_ENABLED_MASK;
 +      prop->pcie_dbi_base_address = mmPCIE_DBI_BASE;
 +      prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
 +
 +      strncpy(prop->cpucp_info.card_name, GOYA_DEFAULT_CARD_NAME,
 +              CARD_NAME_MAX_LEN);
 +
 +      prop->max_pending_cs = GOYA_MAX_PENDING_CS;
 +
 +      prop->first_available_user_interrupt = USHRT_MAX;
 +
 +      for (i = 0 ; i < HL_MAX_DCORES ; i++)
 +              prop->first_available_cq[i] = USHRT_MAX;
 +
 +      prop->fw_cpu_boot_dev_sts0_valid = false;
 +      prop->fw_cpu_boot_dev_sts1_valid = false;
 +      prop->hard_reset_done_by_fw = false;
 +      prop->gic_interrupts_enable = true;
 +
 +      prop->server_type = HL_SERVER_TYPE_UNKNOWN;
 +
 +      prop->clk_pll_index = HL_GOYA_MME_PLL;
 +
 +      prop->use_get_power_for_reset_history = true;
 +
 +      prop->configurable_stop_on_err = true;
 +
 +      prop->set_max_power_on_device_init = true;
 +
 +      prop->dma_mask = 48;
 +
 +      return 0;
 +}
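The dmmu/pmmu hop_shifts and hop_masks filled in above encode how a virtual address is decomposed into per-hop page-table indices during the 5-hop walk. A sketch of the arithmetic those properties describe (not the driver's actual walker; the helper is hypothetical):

/* Index into hop <h> of the page table for virtual address <va>. */
static u64 hop_index(const struct hl_mmu_properties *mmu, u64 va, int h)
{
	return (va & mmu->hop_masks[h]) >> mmu->hop_shifts[h];
}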
 +
 +/*
 + * goya_pci_bars_map - Map PCI BARS of Goya device
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Request PCI regions and map them to kernel virtual addresses.
 + * Returns 0 on success
 + *
 + */
 +static int goya_pci_bars_map(struct hl_device *hdev)
 +{
 +      static const char * const name[] = {"SRAM_CFG", "MSIX", "DDR"};
 +      bool is_wc[3] = {false, false, true};
 +      int rc;
 +
 +      rc = hl_pci_bars_map(hdev, name, is_wc);
 +      if (rc)
 +              return rc;
 +
 +      hdev->rmmio = hdev->pcie_bar[SRAM_CFG_BAR_ID] +
 +                      (CFG_BASE - SRAM_BASE_ADDR);
 +
 +      return 0;
 +}
 +
 +static u64 goya_set_ddr_bar_base(struct hl_device *hdev, u64 addr)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct hl_inbound_pci_region pci_region;
 +      u64 old_addr = addr;
 +      int rc;
 +
 +      if ((goya) && (goya->ddr_bar_cur_addr == addr))
 +              return old_addr;
 +
 +      /* Inbound Region 1 - Bar 4 - Point to DDR */
 +      pci_region.mode = PCI_BAR_MATCH_MODE;
 +      pci_region.bar = DDR_BAR_ID;
 +      pci_region.addr = addr;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &pci_region);
 +      if (rc)
 +              return U64_MAX;
 +
 +      if (goya) {
 +              old_addr = goya->ddr_bar_cur_addr;
 +              goya->ddr_bar_cur_addr = addr;
 +      }
 +
 +      return old_addr;
 +}
 +
 +/*
 + * goya_init_iatu - Initialize the iATU unit inside the PCI controller
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * This is needed in case the firmware doesn't initialize the iATU
 + *
 + */
 +static int goya_init_iatu(struct hl_device *hdev)
 +{
 +      struct hl_inbound_pci_region inbound_region;
 +      struct hl_outbound_pci_region outbound_region;
 +      int rc;
 +
 +      if (hdev->asic_prop.iatu_done_by_fw)
 +              return 0;
 +
 +      /* Inbound Region 0 - Bar 0 - Point to SRAM and CFG */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = SRAM_CFG_BAR_ID;
 +      inbound_region.addr = SRAM_BASE_ADDR;
 +      rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Inbound Region 1 - Bar 4 - Point to DDR */
 +      inbound_region.mode = PCI_BAR_MATCH_MODE;
 +      inbound_region.bar = DDR_BAR_ID;
 +      inbound_region.addr = DRAM_PHYS_BASE;
 +      rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
 +      if (rc)
 +              goto done;
 +
 +      /* Outbound Region 0 - Point to Host  */
 +      outbound_region.addr = HOST_PHYS_BASE;
 +      outbound_region.size = HOST_PHYS_SIZE;
 +      rc = hl_pci_set_outbound_region(hdev, &outbound_region);
 +
 +done:
 +      return rc;
 +}
 +
 +static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
 +{
 +      return RREG32(mmHW_STATE);
 +}
 +
 +/*
 + * goya_early_init - GOYA early initialization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Verify PCI bars
 + * Set DMA masks
 + * PCI controller initialization
 + * Map PCI bars
 + *
 + */
 +static int goya_early_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_dev *pdev = hdev->pdev;
 +      resource_size_t pci_bar_size;
 +      u32 fw_boot_status, val;
 +      int rc;
 +
 +      rc = goya_set_fixed_properties(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get fixed properties\n");
 +              return rc;
 +      }
 +
 +      /* Check BAR sizes */
 +      pci_bar_size = pci_resource_len(pdev, SRAM_CFG_BAR_ID);
 +
 +      if (pci_bar_size != CFG_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      SRAM_CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      pci_bar_size = pci_resource_len(pdev, MSIX_BAR_ID);
 +
 +      if (pci_bar_size != MSIX_BAR_SIZE) {
 +              dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
 +                      MSIX_BAR_ID, &pci_bar_size, MSIX_BAR_SIZE);
 +              rc = -ENODEV;
 +              goto free_queue_props;
 +      }
 +
 +      prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
 +      hdev->dram_pci_bar_start = pci_resource_start(pdev, DDR_BAR_ID);
 +
 +      /* If FW security is enabled at this point it means no access to ELBI */
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +              goto pci_init;
 +      }
 +
 +      rc = hl_pci_elbi_read(hdev, CFG_BASE + mmCPU_BOOT_DEV_STS0,
 +                              &fw_boot_status);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Check whether FW is configuring iATU */
 +      if ((fw_boot_status & CPU_BOOT_DEV_STS0_ENABLED) &&
 +                      (fw_boot_status & CPU_BOOT_DEV_STS0_FW_IATU_CONF_EN))
 +              hdev->asic_prop.iatu_done_by_fw = true;
 +
 +pci_init:
 +      rc = hl_pci_init(hdev);
 +      if (rc)
 +              goto free_queue_props;
 +
 +      /* Before continuing with the initialization, we need to read the preboot
 +       * version to determine whether we are running with security-enabled firmware.
 +       */
 +      rc = hl_fw_read_preboot_status(hdev);
 +      if (rc) {
 +              if (hdev->reset_on_preboot_fail)
 +                      hdev->asic_funcs->hw_fini(hdev, true, false);
 +              goto pci_fini;
 +      }
 +
 +      if (goya_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
 +              dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
 +              hdev->asic_funcs->hw_fini(hdev, true, false);
 +      }
 +
 +      if (!hdev->pldm) {
 +              val = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
 +              if (val & PSOC_GLOBAL_CONF_BOOT_STRAP_PINS_SRIOV_EN_MASK)
 +                      dev_warn(hdev->dev,
 +                              "PCI strap is not configured correctly, PCI bus errors may occur\n");
 +      }
 +
 +      return 0;
 +
 +pci_fini:
 +      hl_pci_fini(hdev);
 +free_queue_props:
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      return rc;
 +}
 +
 +/*
 + * goya_early_fini - GOYA early finalization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Unmap PCI bars
 + *
 + */
 +static int goya_early_fini(struct hl_device *hdev)
 +{
 +      kfree(hdev->asic_prop.hw_queues_props);
 +      hl_pci_fini(hdev);
 +
 +      return 0;
 +}
 +
 +static void goya_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid)
 +{
 +      /* mask to zero the MMBP and ASID bits */
 +      WREG32_AND(reg, ~0x7FF);
 +      WREG32_OR(reg, asid);
 +}
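goya_mmu_prepare_reg() is a read-modify-write of a single register: clear the MMBP/ASID field, then OR in the new ASID. goya_mmu_prepare(), which is only forward-declared above, presumably applies this across goya_mmu_regs[]; a hedged sketch of that shape:

/* Sketch: program the same ASID into every register in goya_mmu_regs[]. */
static void sketch_mmu_prepare_all(struct hl_device *hdev, u32 asid)
{
	int i;

	for (i = 0 ; i < GOYA_MMU_REGS_NUM ; i++)
		goya_mmu_prepare_reg(hdev, goya_mmu_regs[i], asid);
}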
 +
 +static void goya_qman0_set_security(struct hl_device *hdev, bool secure)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (secure)
 +              WREG32(mmDMA_QM_0_GLBL_PROT, QMAN_DMA_FULLY_TRUSTED);
 +      else
 +              WREG32(mmDMA_QM_0_GLBL_PROT, QMAN_DMA_PARTLY_TRUSTED);
 +
 +      RREG32(mmDMA_QM_0_GLBL_PROT);
 +}
 +
 +/*
 + * goya_fetch_psoc_frequency - Fetch PSOC frequency values
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static void goya_fetch_psoc_frequency(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u32 nr = 0, nf = 0, od = 0, div_fctr = 0, pll_clk, div_sel;
 +      u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS], freq;
 +      int rc;
 +
 +      if (hdev->asic_prop.fw_security_enabled) {
 +              struct goya_device *goya = hdev->asic_specific;
 +
 +              if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +                      return;
 +
 +              rc = hl_fw_cpucp_pll_info_get(hdev, HL_GOYA_PCI_PLL,
 +                              pll_freq_arr);
 +
 +              if (rc)
 +                      return;
 +
 +              freq = pll_freq_arr[1];
 +      } else {
 +              div_fctr = RREG32(mmPSOC_PCI_PLL_DIV_FACTOR_1);
 +              div_sel = RREG32(mmPSOC_PCI_PLL_DIV_SEL_1);
 +              nr = RREG32(mmPSOC_PCI_PLL_NR);
 +              nf = RREG32(mmPSOC_PCI_PLL_NF);
 +              od = RREG32(mmPSOC_PCI_PLL_OD);
 +
 +              if (div_sel == DIV_SEL_REF_CLK ||
 +                              div_sel == DIV_SEL_DIVIDED_REF) {
 +                      if (div_sel == DIV_SEL_REF_CLK)
 +                              freq = PLL_REF_CLK;
 +                      else
 +                              freq = PLL_REF_CLK / (div_fctr + 1);
 +              } else if (div_sel == DIV_SEL_PLL_CLK ||
 +                              div_sel == DIV_SEL_DIVIDED_PLL) {
 +                      pll_clk = PLL_REF_CLK * (nf + 1) /
 +                                      ((nr + 1) * (od + 1));
 +                      if (div_sel == DIV_SEL_PLL_CLK)
 +                              freq = pll_clk;
 +                      else
 +                              freq = pll_clk / (div_fctr + 1);
 +              } else {
 +                      dev_warn(hdev->dev,
 +                              "Received invalid div select value: %d",
 +                              div_sel);
 +                      freq = 0;
 +              }
 +      }
 +
 +      prop->psoc_timestamp_frequency = freq;
 +      prop->psoc_pci_pll_nr = nr;
 +      prop->psoc_pci_pll_nf = nf;
 +      prop->psoc_pci_pll_od = od;
 +      prop->psoc_pci_pll_div_factor = div_fctr;
 +}
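With hypothetical divider values (not real Goya settings), the PLL arithmetic above works out as follows:

/* Worked example only; every value below is made up. */
static u32 sketch_pci_pll_freq(void)
{
	u32 ref = 50;						/* MHz, hypothetical reference clock */
	u32 nf = 79, nr = 1, od = 1, div_fctr = 3;
	u32 pll_clk = ref * (nf + 1) / ((nr + 1) * (od + 1));	/* 50 * 80 / (2 * 2) = 1000 MHz */

	return pll_clk / (div_fctr + 1);			/* 1000 / 4 = 250 MHz */
}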
 +
 +/*
 + * goya_set_frequency - set the frequency of the device
 + *
 + * @hdev: pointer to habanalabs device structure
 + * @freq: the new frequency value
 + *
 + * Change the frequency if needed. This function has no protection against
 + * concurrency, so it is assumed that the caller protects itself against
 + * calling this function from multiple threads with different values.
 + *
 + * Returns 0 if no change was done, otherwise returns 1
 + */
 +int goya_set_frequency(struct hl_device *hdev, enum hl_pll_frequency freq)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if ((goya->pm_mng_profile == PM_MANUAL) ||
 +                      (goya->curr_pll_profile == freq))
 +              return 0;
 +
 +      dev_dbg(hdev->dev, "Changing device frequency to %s\n",
 +              freq == PLL_HIGH ? "high" : "low");
 +
 +      goya_set_pll_profile(hdev, freq);
 +
 +      goya->curr_pll_profile = freq;
 +
 +      return 1;
 +}
 +
 +static void goya_set_freq_to_low_job(struct work_struct *work)
 +{
 +      struct goya_work_freq *goya_work = container_of(work,
 +                                              struct goya_work_freq,
 +                                              work_freq.work);
 +      struct hl_device *hdev = goya_work->hdev;
 +
 +      mutex_lock(&hdev->fpriv_list_lock);
 +
 +      if (!hdev->is_compute_ctx_active)
 +              goya_set_frequency(hdev, PLL_LOW);
 +
 +      mutex_unlock(&hdev->fpriv_list_lock);
 +
 +      schedule_delayed_work(&goya_work->work_freq,
 +                      usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
 +}
 +
 +int goya_late_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      int rc;
 +
 +      goya_fetch_psoc_frequency(hdev);
 +
 +      rc = goya_mmu_clear_pgt_range(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to clear MMU page tables range %d\n", rc);
 +              return rc;
 +      }
 +
 +      rc = goya_mmu_set_dram_default_page(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to set DRAM default page %d\n", rc);
 +              return rc;
 +      }
 +
 +      rc = goya_mmu_add_mappings_for_device_cpu(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = goya_init_cpu_queues(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = goya_test_cpu_queue(hdev);
 +      if (rc)
 +              return rc;
 +
 +      rc = goya_cpucp_info_get(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to get cpucp info %d\n", rc);
 +              return rc;
 +      }
 +
 +      /* Now that we have the DRAM size in the ASIC properties, we can configure
 +       * the DMA_IF DDR wrap protection (which is in the MMU block) accordingly.
 +       * The value written is the log2 of the DRAM size.
 +       */
 +      WREG32(mmMMU_LOG2_DDR_SIZE, ilog2(prop->dram_size));
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, 0x0);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to enable PCI access from CPU %d\n", rc);
 +              return rc;
 +      }
 +
 +      /* force setting to low frequency */
 +      goya->curr_pll_profile = PLL_LOW;
 +
 +      goya->pm_mng_profile = PM_AUTO;
 +
 +      goya_set_pll_profile(hdev, PLL_LOW);
 +
 +      schedule_delayed_work(&goya->goya_work->work_freq,
 +              usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
 +
 +      return 0;
 +}
 +
 +/*
 + * goya_late_fini - GOYA late tear-down code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Free sensors allocated structures
 + */
 +void goya_late_fini(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      cancel_delayed_work_sync(&goya->goya_work->work_freq);
 +
 +      hl_hwmon_release_resources(hdev);
 +}
 +
 +static void goya_set_pci_memory_regions(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct pci_mem_region *region;
 +
 +      /* CFG */
 +      region = &hdev->pci_mem_region[PCI_REGION_CFG];
 +      region->region_base = CFG_BASE;
 +      region->region_size = CFG_SIZE;
 +      region->offset_in_bar = CFG_BASE - SRAM_BASE_ADDR;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* SRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_SRAM];
 +      region->region_base = SRAM_BASE_ADDR;
 +      region->region_size = SRAM_SIZE;
 +      region->offset_in_bar = 0;
 +      region->bar_size = CFG_BAR_SIZE;
 +      region->bar_id = SRAM_CFG_BAR_ID;
 +      region->used = 1;
 +
 +      /* DRAM */
 +      region = &hdev->pci_mem_region[PCI_REGION_DRAM];
 +      region->region_base = DRAM_PHYS_BASE;
 +      region->region_size = hdev->asic_prop.dram_size;
 +      region->offset_in_bar = 0;
 +      region->bar_size = prop->dram_pci_bar_size;
 +      region->bar_id = DDR_BAR_ID;
 +      region->used = 1;
 +}
 +
 +/*
 + * goya_sw_init - Goya software initialization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static int goya_sw_init(struct hl_device *hdev)
 +{
 +      struct goya_device *goya;
 +      int rc;
 +
 +      /* Allocate device structure */
 +      goya = kzalloc(sizeof(*goya), GFP_KERNEL);
 +      if (!goya)
 +              return -ENOMEM;
 +
 +      /* according to goya_init_iatu */
 +      goya->ddr_bar_cur_addr = DRAM_PHYS_BASE;
 +
 +      goya->mme_clk = GOYA_PLL_FREQ_LOW;
 +      goya->tpc_clk = GOYA_PLL_FREQ_LOW;
 +      goya->ic_clk = GOYA_PLL_FREQ_LOW;
 +
 +      hdev->asic_specific = goya;
 +
 +      /* Create DMA pool for small allocations */
 +      hdev->dma_pool = dma_pool_create(dev_name(hdev->dev),
 +                      &hdev->pdev->dev, GOYA_DMA_POOL_BLK_SIZE, 8, 0);
 +      if (!hdev->dma_pool) {
 +              dev_err(hdev->dev, "failed to create DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_goya_device;
 +      }
 +
 +      hdev->cpu_accessible_dma_mem = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
 +                                                      &hdev->cpu_accessible_dma_address,
 +                                                      GFP_KERNEL | __GFP_ZERO);
 +
 +      if (!hdev->cpu_accessible_dma_mem) {
 +              rc = -ENOMEM;
 +              goto free_dma_pool;
 +      }
 +
 +      dev_dbg(hdev->dev, "cpu accessible memory at bus address %pad\n",
 +              &hdev->cpu_accessible_dma_address);
 +
 +      hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
 +      if (!hdev->cpu_accessible_dma_pool) {
 +              dev_err(hdev->dev,
 +                      "Failed to create CPU accessible DMA pool\n");
 +              rc = -ENOMEM;
 +              goto free_cpu_dma_mem;
 +      }
 +
 +      rc = gen_pool_add(hdev->cpu_accessible_dma_pool,
 +                              (uintptr_t) hdev->cpu_accessible_dma_mem,
 +                              HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to add memory to CPU accessible DMA pool\n");
 +              rc = -EFAULT;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      spin_lock_init(&goya->hw_queues_lock);
 +      hdev->supports_coresight = true;
 +      hdev->asic_prop.supports_compute_reset = true;
 +      hdev->asic_prop.allow_inference_soft_reset = true;
 +      hdev->supports_wait_for_multi_cs = false;
 +      hdev->supports_ctx_switch = true;
 +
 +      hdev->asic_funcs->set_pci_memory_regions(hdev);
 +
 +      goya->goya_work = kmalloc(sizeof(struct goya_work_freq), GFP_KERNEL);
 +      if (!goya->goya_work) {
 +              rc = -ENOMEM;
 +              goto free_cpu_accessible_dma_pool;
 +      }
 +
 +      goya->goya_work->hdev = hdev;
 +      INIT_DELAYED_WORK(&goya->goya_work->work_freq, goya_set_freq_to_low_job);
 +
 +      return 0;
 +
 +free_cpu_accessible_dma_pool:
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +free_cpu_dma_mem:
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +free_dma_pool:
 +      dma_pool_destroy(hdev->dma_pool);
 +free_goya_device:
 +      kfree(goya);
 +
 +      return rc;
 +}
 +
 +/*
 + * goya_sw_fini - Goya software tear-down code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static int goya_sw_fini(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      gen_pool_destroy(hdev->cpu_accessible_dma_pool);
 +
 +      hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
 +                                      hdev->cpu_accessible_dma_address);
 +
 +      dma_pool_destroy(hdev->dma_pool);
 +
 +      kfree(goya->goya_work);
 +      kfree(goya);
 +
 +      return 0;
 +}
 +
 +static void goya_init_dma_qman(struct hl_device *hdev, int dma_id,
 +              dma_addr_t bus_address)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u32 reg_off = dma_id * (mmDMA_QM_1_PQ_PI - mmDMA_QM_0_PQ_PI);
 +      u32 dma_err_cfg = QMAN_DMA_ERR_MSG_EN;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmDMA_QM_0_PQ_BASE_LO + reg_off, lower_32_bits(bus_address));
 +      WREG32(mmDMA_QM_0_PQ_BASE_HI + reg_off, upper_32_bits(bus_address));
 +
 +      WREG32(mmDMA_QM_0_PQ_SIZE + reg_off, ilog2(HL_QUEUE_LENGTH));
 +      WREG32(mmDMA_QM_0_PQ_PI + reg_off, 0);
 +      WREG32(mmDMA_QM_0_PQ_CI + reg_off, 0);
 +
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
 +      WREG32(mmDMA_QM_0_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
 +      WREG32(mmDMA_QM_0_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmDMA_QM_0_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
 +      WREG32(mmDMA_QM_0_GLBL_ERR_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_DMA0_QM + dma_id);
 +
 +      /* PQ has a buffer of 2 cache lines, while CQ has 8 lines */
 +      WREG32(mmDMA_QM_0_PQ_CFG1 + reg_off, 0x00020002);
 +      WREG32(mmDMA_QM_0_CQ_CFG1 + reg_off, 0x00080008);
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              WREG32(mmDMA_QM_0_GLBL_PROT + reg_off, QMAN_DMA_PARTLY_TRUSTED);
 +      else
 +              WREG32(mmDMA_QM_0_GLBL_PROT + reg_off, QMAN_DMA_FULLY_TRUSTED);
 +
 +      if (hdev->stop_on_err)
 +              dma_err_cfg |= 1 << DMA_QM_0_GLBL_ERR_CFG_DMA_STOP_ON_ERR_SHIFT;
 +
 +      WREG32(mmDMA_QM_0_GLBL_ERR_CFG + reg_off, dma_err_cfg);
 +      WREG32(mmDMA_QM_0_GLBL_CFG0 + reg_off, QMAN_DMA_ENABLE);
 +}
 +
 +static void goya_init_dma_ch(struct hl_device *hdev, int dma_id)
 +{
 +      u32 gic_base_lo, gic_base_hi;
 +      u64 sob_addr;
 +      u32 reg_off = dma_id * (mmDMA_CH_1_CFG1 - mmDMA_CH_0_CFG1);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmDMA_CH_0_ERRMSG_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmDMA_CH_0_ERRMSG_ADDR_HI + reg_off, gic_base_hi);
 +      WREG32(mmDMA_CH_0_ERRMSG_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_DMA0_CH + dma_id);
 +
 +      if (dma_id)
 +              sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
 +                              (dma_id - 1) * 4;
 +      else
 +              sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
 +
 +      WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off, upper_32_bits(sob_addr));
 +      WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x80000001);
 +}
 +
 +/*
 + * goya_init_dma_qmans - Initialize QMAN DMA registers
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Initialize the H/W registers of the QMAN DMA channels
 + *
 + */
 +void goya_init_dma_qmans(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct hl_hw_queue *q;
 +      int i;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_DMA)
 +              return;
 +
 +      q = &hdev->kernel_queues[0];
 +
 +      for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++, q++) {
 +              q->cq_id = q->msi_vec = i;
 +              goya_init_dma_qman(hdev, i, q->bus_address);
 +              goya_init_dma_ch(hdev, i);
 +      }
 +
 +      goya->hw_cap_initialized |= HW_CAP_DMA;
 +}
 +
 +/*
 + * goya_disable_external_queues - Disable external queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static void goya_disable_external_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_DMA))
 +              return;
 +
 +      WREG32(mmDMA_QM_0_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_1_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_2_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_3_GLBL_CFG0, 0);
 +      WREG32(mmDMA_QM_4_GLBL_CFG0, 0);
 +}
 +
 +static int goya_stop_queue(struct hl_device *hdev, u32 cfg_reg,
 +                              u32 cp_sts_reg, u32 glbl_sts0_reg)
 +{
 +      int rc;
 +      u32 status;
 +
 +      /* use the values of TPC0 as they are all the same */
 +
 +      WREG32(cfg_reg, 1 << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
 +
 +      status = RREG32(cp_sts_reg);
 +      if (status & TPC0_QM_CP_STS_FENCE_IN_PROGRESS_MASK) {
 +              rc = hl_poll_timeout(
 +                      hdev,
 +                      cp_sts_reg,
 +                      status,
 +                      !(status & TPC0_QM_CP_STS_FENCE_IN_PROGRESS_MASK),
 +                      1000,
 +                      QMAN_FENCE_TIMEOUT_USEC);
 +
 +              /* if QMAN is stuck in fence no need to check for stop */
 +              if (rc)
 +                      return 0;
 +      }
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              glbl_sts0_reg,
 +              status,
 +              (status & TPC0_QM_GLBL_STS0_CP_IS_STOP_MASK),
 +              1000,
 +              QMAN_STOP_TIMEOUT_USEC);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for QMAN to stop\n");
 +              return -EINVAL;
 +      }
 +
 +      return 0;
 +}
 +
 +/*
 + * goya_stop_external_queues - Stop external queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +static int goya_stop_external_queues(struct hl_device *hdev)
 +{
 +      int rc, retval = 0;
 +
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_DMA))
 +              return retval;
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_0_GLBL_CFG1,
 +                      mmDMA_QM_0_CP_STS,
 +                      mmDMA_QM_0_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 0\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_1_GLBL_CFG1,
 +                      mmDMA_QM_1_CP_STS,
 +                      mmDMA_QM_1_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 1\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_2_GLBL_CFG1,
 +                      mmDMA_QM_2_CP_STS,
 +                      mmDMA_QM_2_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 2\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_3_GLBL_CFG1,
 +                      mmDMA_QM_3_CP_STS,
 +                      mmDMA_QM_3_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 3\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmDMA_QM_4_GLBL_CFG1,
 +                      mmDMA_QM_4_CP_STS,
 +                      mmDMA_QM_4_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop DMA QMAN 4\n");
 +              retval = -EIO;
 +      }
 +
 +      return retval;
 +}
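The five stop calls above differ only in the QM block; assuming the DMA QM blocks are evenly spaced (the same assumption goya_init_dma_qman() makes with its reg_off computation), the logic could be expressed table-driven. This is a sketch of that alternative, not the committed code:

static int sketch_stop_all_dma_qmans(struct hl_device *hdev)
{
	u32 stride = mmDMA_QM_1_GLBL_CFG1 - mmDMA_QM_0_GLBL_CFG1;
	int i, rc, retval = 0;

	for (i = 0 ; i < 5 ; i++) {
		rc = goya_stop_queue(hdev,
				mmDMA_QM_0_GLBL_CFG1 + i * stride,
				mmDMA_QM_0_CP_STS + i * stride,
				mmDMA_QM_0_GLBL_STS0 + i * stride);
		if (rc) {
			dev_err(hdev->dev, "failed to stop DMA QMAN %d\n", i);
			retval = -EIO;
		}
	}

	return retval;
}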
 +
 +/*
 + * goya_init_cpu_queues - Initialize PQ/CQ/EQ of CPU
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +int goya_init_cpu_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct hl_eq *eq;
 +      u32 status;
 +      struct hl_hw_queue *cpu_pq = &hdev->kernel_queues[GOYA_QUEUE_ID_CPU_PQ];
 +      int err;
 +
 +      if (!hdev->cpu_queues_enable)
 +              return 0;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_CPU_Q)
 +              return 0;
 +
 +      eq = &hdev->event_queue;
 +
 +      WREG32(mmCPU_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
 +      WREG32(mmCPU_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
 +
 +      WREG32(mmCPU_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
 +      WREG32(mmCPU_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
 +
 +      WREG32(mmCPU_CQ_BASE_ADDR_LOW,
 +                      lower_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
 +      WREG32(mmCPU_CQ_BASE_ADDR_HIGH,
 +                      upper_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
 +
 +      WREG32(mmCPU_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
 +      WREG32(mmCPU_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
 +      WREG32(mmCPU_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
 +
 +      /* Used for EQ CI */
 +      WREG32(mmCPU_EQ_CI, 0);
 +
 +      WREG32(mmCPU_IF_PF_PQ_PI, 0);
 +
 +      WREG32(mmCPU_PQ_INIT_STATUS, PQ_INIT_STATUS_READY_FOR_CP);
 +
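 +      /*
 +       * Trigger the PI-update interrupt, presumably so the device CPU picks
 +       * up the queue configuration written above
 +       */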
 +      WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                      GOYA_ASYNC_EVENT_ID_PI_UPDATE);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmCPU_PQ_INIT_STATUS,
 +              status,
 +              (status == PQ_INIT_STATUS_READY_FOR_HOST),
 +              1000,
 +              GOYA_CPU_TIMEOUT_USEC);
 +
 +      if (err) {
 +              dev_err(hdev->dev,
 +                      "Failed to setup communication with device CPU\n");
 +              return -EIO;
 +      }
 +
 +      /* update FW application security bits */
 +      if (prop->fw_cpu_boot_dev_sts0_valid)
 +              prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
 +
 +      if (prop->fw_cpu_boot_dev_sts1_valid)
 +              prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
 +
 +      goya->hw_cap_initialized |= HW_CAP_CPU_Q;
 +      return 0;
 +}
 +
 +static void goya_set_pll_refclk(struct hl_device *hdev)
 +{
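 +      /*
 +       * Zero every PLL DIV_SEL register, which presumably switches all PLLs
 +       * back to the reference clock
 +       */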
 +      WREG32(mmCPU_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmCPU_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmCPU_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmCPU_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmIC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmIC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmIC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmIC_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmMC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmMC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmMC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmMC_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmPSOC_MME_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmPSOC_PCI_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmPSOC_EMMC_PLL_DIV_SEL_3, 0x0);
 +
 +      WREG32(mmTPC_PLL_DIV_SEL_0, 0x0);
 +      WREG32(mmTPC_PLL_DIV_SEL_1, 0x0);
 +      WREG32(mmTPC_PLL_DIV_SEL_2, 0x0);
 +      WREG32(mmTPC_PLL_DIV_SEL_3, 0x0);
 +}
 +
 +static void goya_disable_clk_rlx(struct hl_device *hdev)
 +{
 +      WREG32(mmPSOC_MME_PLL_CLK_RLX_0, 0x100010);
 +      WREG32(mmIC_PLL_CLK_RLX_0, 0x100010);
 +}
 +
 +static void _goya_tpc_mbist_workaround(struct hl_device *hdev, u8 tpc_id)
 +{
 +      u64 tpc_eml_address;
 +      u32 val, tpc_offset, tpc_eml_offset, tpc_slm_offset;
 +      int err, slm_index;
 +
 +      tpc_offset = tpc_id * 0x40000;
 +      tpc_eml_offset = tpc_id * 0x200000;
 +      tpc_eml_address = (mmTPC0_EML_CFG_BASE + tpc_eml_offset - CFG_BASE);
 +      tpc_slm_offset = tpc_eml_address + 0x100000;
 +
 +      /*
 +       * Workaround for Bug H2 #2443:
 +       * "TPC SB is not initialized on chip reset"
 +       */
 +
 +      val = RREG32(mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset);
 +      if (val & TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_ACTIVE_MASK)
 +              dev_warn(hdev->dev, "TPC%d MBIST ACTIVE is not cleared\n",
 +                      tpc_id);
 +
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_PAT + tpc_offset, val & 0xFFFFF000);
 +
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_0 + tpc_offset, 0x37FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_1 + tpc_offset, 0x303F);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_2 + tpc_offset, 0x71FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_3 + tpc_offset, 0x71FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_4 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_5 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_6 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_7 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_8 + tpc_offset, 0x70FF);
 +      WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_9 + tpc_offset, 0x70FF);
 +
 +      WREG32_OR(mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset,
 +              1 << TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_START_SHIFT);
 +
 +      err = hl_poll_timeout(
 +              hdev,
 +              mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset,
 +              val,
 +              (val & TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_DONE_MASK),
 +              1000,
 +              HL_DEVICE_TIMEOUT_USEC);
 +
 +      if (err)
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for TPC%d MBIST DONE\n", tpc_id);
 +
 +      WREG32_OR(mmTPC0_EML_CFG_DBG_CNT + tpc_eml_offset,
 +              1 << TPC0_EML_CFG_DBG_CNT_CORE_RST_SHIFT);
 +
 +      msleep(GOYA_RESET_WAIT_MSEC);
 +
 +      WREG32_AND(mmTPC0_EML_CFG_DBG_CNT + tpc_eml_offset,
 +              ~(1 << TPC0_EML_CFG_DBG_CNT_CORE_RST_SHIFT));
 +
 +      msleep(GOYA_RESET_WAIT_MSEC);
 +
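 +      /* Zero the first 256 dwords (1 KB) of the TPC's SLM region */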
 +      for (slm_index = 0 ; slm_index < 256 ; slm_index++)
 +              WREG32(tpc_slm_offset + (slm_index << 2), 0);
 +
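 +      /*
 +       * Read back one of the locations that was just written, presumably to
 +       * flush the posted writes before returning
 +       */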
 +      val = RREG32(tpc_slm_offset);
 +}
 +
 +static void goya_tpc_mbist_workaround(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i;
 +
 +      if (hdev->pldm)
 +              return;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_TPC_MBIST)
 +              return;
 +
 +      /* Workaround for H2 #2443 */
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++)
 +              _goya_tpc_mbist_workaround(hdev, i);
 +
 +      goya->hw_cap_initialized |= HW_CAP_TPC_MBIST;
 +}
 +
 +/*
 + * goya_init_golden_registers - Initialize golden registers
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Initialize the H/W registers of the device
 + *
 + */
 +static void goya_init_golden_registers(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 polynom[10], tpc_intr_mask, offset;
 +      int i;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_GOLDEN)
 +              return;
 +
 +      polynom[0] = 0x00020080;
 +      polynom[1] = 0x00401000;
 +      polynom[2] = 0x00200800;
 +      polynom[3] = 0x00002000;
 +      polynom[4] = 0x00080200;
 +      polynom[5] = 0x00040100;
 +      polynom[6] = 0x00100400;
 +      polynom[7] = 0x00004000;
 +      polynom[8] = 0x00010000;
 +      polynom[9] = 0x00008000;
 +
 +      /* Mask all arithmetic interrupts from TPC */
 +      tpc_intr_mask = 0x7FFF;
 +
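 +      /*
 +       * Arbitration settings for the SRAM routers; each of the six
 +       * iterations presumably covers one row of the SRAM router grid
 +       * (the register blocks are 0x20000 apart)
 +       */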
 +      for (i = 0, offset = 0 ; i < 6 ; i++, offset += 0x20000) {
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_L_ARB + offset, 0x204);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_E_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_E_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_E_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_E_ARB + offset, 0x207);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_E_ARB + offset, 0x207);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_W_ARB + offset, 0x207);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_W_ARB + offset, 0x207);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_W_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_W_ARB + offset, 0x206);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_W_ARB + offset, 0x206);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_WR_RS_E_ARB + offset, 0x101);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_WR_RS_E_ARB + offset, 0x102);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_WR_RS_E_ARB + offset, 0x103);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_WR_RS_E_ARB + offset, 0x104);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_WR_RS_E_ARB + offset, 0x105);
 +
 +              WREG32(mmSRAM_Y0_X0_RTR_HBW_WR_RS_W_ARB + offset, 0x105);
 +              WREG32(mmSRAM_Y0_X1_RTR_HBW_WR_RS_W_ARB + offset, 0x104);
 +              WREG32(mmSRAM_Y0_X2_RTR_HBW_WR_RS_W_ARB + offset, 0x103);
 +              WREG32(mmSRAM_Y0_X3_RTR_HBW_WR_RS_W_ARB + offset, 0x102);
 +              WREG32(mmSRAM_Y0_X4_RTR_HBW_WR_RS_W_ARB + offset, 0x101);
 +      }
 +
 +      WREG32(mmMME_STORE_MAX_CREDIT, 0x21);
 +      WREG32(mmMME_AGU, 0x0f0f0f10);
 +      WREG32(mmMME_SEI_MASK, ~0x0);
 +
 +      WREG32(mmMME6_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmMME5_RTR_HBW_RD_RQ_N_ARB, 0x01040101);
 +      WREG32(mmMME4_RTR_HBW_RD_RQ_N_ARB, 0x01030101);
 +      WREG32(mmMME3_RTR_HBW_RD_RQ_N_ARB, 0x01020101);
 +      WREG32(mmMME2_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmMME1_RTR_HBW_RD_RQ_N_ARB, 0x07010701);
 +      WREG32(mmMME6_RTR_HBW_RD_RQ_S_ARB, 0x04010401);
 +      WREG32(mmMME5_RTR_HBW_RD_RQ_S_ARB, 0x04050401);
 +      WREG32(mmMME4_RTR_HBW_RD_RQ_S_ARB, 0x03070301);
 +      WREG32(mmMME3_RTR_HBW_RD_RQ_S_ARB, 0x01030101);
 +      WREG32(mmMME2_RTR_HBW_RD_RQ_S_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_RD_RQ_S_ARB, 0x01050105);
 +      WREG32(mmMME6_RTR_HBW_RD_RQ_W_ARB, 0x01010501);
 +      WREG32(mmMME5_RTR_HBW_RD_RQ_W_ARB, 0x01010501);
 +      WREG32(mmMME4_RTR_HBW_RD_RQ_W_ARB, 0x01040301);
 +      WREG32(mmMME3_RTR_HBW_RD_RQ_W_ARB, 0x01030401);
 +      WREG32(mmMME2_RTR_HBW_RD_RQ_W_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_RD_RQ_W_ARB, 0x01050101);
 +      WREG32(mmMME6_RTR_HBW_WR_RQ_N_ARB, 0x02020202);
 +      WREG32(mmMME5_RTR_HBW_WR_RQ_N_ARB, 0x01070101);
 +      WREG32(mmMME4_RTR_HBW_WR_RQ_N_ARB, 0x02020201);
 +      WREG32(mmMME3_RTR_HBW_WR_RQ_N_ARB, 0x07020701);
 +      WREG32(mmMME2_RTR_HBW_WR_RQ_N_ARB, 0x01020101);
 +      WREG32(mmMME1_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
 +      WREG32(mmMME6_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
 +      WREG32(mmMME5_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
 +      WREG32(mmMME4_RTR_HBW_WR_RQ_S_ARB, 0x07020701);
 +      WREG32(mmMME3_RTR_HBW_WR_RQ_S_ARB, 0x02020201);
 +      WREG32(mmMME2_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
 +      WREG32(mmMME1_RTR_HBW_WR_RQ_S_ARB, 0x01020102);
 +      WREG32(mmMME6_RTR_HBW_WR_RQ_W_ARB, 0x01020701);
 +      WREG32(mmMME5_RTR_HBW_WR_RQ_W_ARB, 0x01020701);
 +      WREG32(mmMME4_RTR_HBW_WR_RQ_W_ARB, 0x07020707);
 +      WREG32(mmMME3_RTR_HBW_WR_RQ_W_ARB, 0x01020201);
 +      WREG32(mmMME2_RTR_HBW_WR_RQ_W_ARB, 0x01070201);
 +      WREG32(mmMME1_RTR_HBW_WR_RQ_W_ARB, 0x01070201);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_N_ARB, 0x01070102);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_N_ARB, 0x01070102);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_N_ARB, 0x01060102);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_N_ARB, 0x01040102);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_N_ARB, 0x01020102);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_N_ARB, 0x01020107);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_S_ARB, 0x01020106);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_S_ARB, 0x01020102);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_S_ARB, 0x01040102);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_S_ARB, 0x01060102);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_S_ARB, 0x01070102);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_S_ARB, 0x01070102);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_E_ARB, 0x01020702);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_E_ARB, 0x01020702);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_E_ARB, 0x01040602);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_E_ARB, 0x01060402);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_E_ARB, 0x01070202);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_E_ARB, 0x01070102);
 +      WREG32(mmMME6_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME5_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME4_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME3_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME2_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME1_RTR_HBW_RD_RS_W_ARB, 0x01060401);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_N_ARB, 0x01050101);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_N_ARB, 0x01040101);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_N_ARB, 0x01030101);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_N_ARB, 0x01020101);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_N_ARB, 0x01010101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_N_ARB, 0x01010107);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_S_ARB, 0x01010107);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_S_ARB, 0x01010101);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_S_ARB, 0x01020101);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_S_ARB, 0x01030101);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_S_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_S_ARB, 0x01050101);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_E_ARB, 0x01010501);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_E_ARB, 0x01010501);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_E_ARB, 0x01040301);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_E_ARB, 0x01030401);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_E_ARB, 0x01040101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_E_ARB, 0x01050101);
 +      WREG32(mmMME6_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME5_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME4_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME3_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME2_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +      WREG32(mmMME1_RTR_HBW_WR_RS_W_ARB, 0x01010101);
 +
 +      WREG32(mmTPC1_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_RD_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_RD_RQ_E_ARB, 0x01060101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RQ_N_ARB, 0x02020102);
 +      WREG32(mmTPC1_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RQ_E_ARB, 0x02070202);
 +      WREG32(mmTPC1_RTR_HBW_RD_RS_N_ARB, 0x01020201);
 +      WREG32(mmTPC1_RTR_HBW_RD_RS_S_ARB, 0x01070201);
 +      WREG32(mmTPC1_RTR_HBW_RD_RS_W_ARB, 0x01070202);
 +      WREG32(mmTPC1_RTR_HBW_WR_RS_N_ARB, 0x01010101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RS_S_ARB, 0x01050101);
 +      WREG32(mmTPC1_RTR_HBW_WR_RS_W_ARB, 0x01050101);
 +
 +      WREG32(mmTPC2_RTR_HBW_RD_RQ_N_ARB, 0x01020101);
 +      WREG32(mmTPC2_RTR_HBW_RD_RQ_S_ARB, 0x01050101);
 +      WREG32(mmTPC2_RTR_HBW_RD_RQ_E_ARB, 0x01010201);
 +      WREG32(mmTPC2_RTR_HBW_WR_RQ_N_ARB, 0x02040102);
 +      WREG32(mmTPC2_RTR_HBW_WR_RQ_S_ARB, 0x01050101);
 +      WREG32(mmTPC2_RTR_HBW_WR_RQ_E_ARB, 0x02060202);
 +      WREG32(mmTPC2_RTR_HBW_RD_RS_N_ARB, 0x01020201);
 +      WREG32(mmTPC2_RTR_HBW_RD_RS_S_ARB, 0x01070201);
 +      WREG32(mmTPC2_RTR_HBW_RD_RS_W_ARB, 0x01070202);
 +      WREG32(mmTPC2_RTR_HBW_WR_RS_N_ARB, 0x01010101);
 +      WREG32(mmTPC2_RTR_HBW_WR_RS_S_ARB, 0x01040101);
 +      WREG32(mmTPC2_RTR_HBW_WR_RS_W_ARB, 0x01040101);
 +
 +      WREG32(mmTPC3_RTR_HBW_RD_RQ_N_ARB, 0x01030101);
 +      WREG32(mmTPC3_RTR_HBW_RD_RQ_S_ARB, 0x01040101);
 +      WREG32(mmTPC3_RTR_HBW_RD_RQ_E_ARB, 0x01040301);
 +      WREG32(mmTPC3_RTR_HBW_WR_RQ_N_ARB, 0x02060102);
 +      WREG32(mmTPC3_RTR_HBW_WR_RQ_S_ARB, 0x01040101);
 +      WREG32(mmTPC3_RTR_HBW_WR_RQ_E_ARB, 0x01040301);
 +      WREG32(mmTPC3_RTR_HBW_RD_RS_N_ARB, 0x01040201);
 +      WREG32(mmTPC3_RTR_HBW_RD_RS_S_ARB, 0x01060201);
 +      WREG32(mmTPC3_RTR_HBW_RD_RS_W_ARB, 0x01060402);
 +      WREG32(mmTPC3_RTR_HBW_WR_RS_N_ARB, 0x01020101);
 +      WREG32(mmTPC3_RTR_HBW_WR_RS_S_ARB, 0x01030101);
 +      WREG32(mmTPC3_RTR_HBW_WR_RS_W_ARB, 0x01030401);
 +
 +      WREG32(mmTPC4_RTR_HBW_RD_RQ_N_ARB, 0x01040101);
 +      WREG32(mmTPC4_RTR_HBW_RD_RQ_S_ARB, 0x01030101);
 +      WREG32(mmTPC4_RTR_HBW_RD_RQ_E_ARB, 0x01030401);
 +      WREG32(mmTPC4_RTR_HBW_WR_RQ_N_ARB, 0x02070102);
 +      WREG32(mmTPC4_RTR_HBW_WR_RQ_S_ARB, 0x01030101);
 +      WREG32(mmTPC4_RTR_HBW_WR_RQ_E_ARB, 0x02060702);
 +      WREG32(mmTPC4_RTR_HBW_RD_RS_N_ARB, 0x01060201);
 +      WREG32(mmTPC4_RTR_HBW_RD_RS_S_ARB, 0x01040201);
 +      WREG32(mmTPC4_RTR_HBW_RD_RS_W_ARB, 0x01040602);
 +      WREG32(mmTPC4_RTR_HBW_WR_RS_N_ARB, 0x01030101);
 +      WREG32(mmTPC4_RTR_HBW_WR_RS_S_ARB, 0x01020101);
 +      WREG32(mmTPC4_RTR_HBW_WR_RS_W_ARB, 0x01040301);
 +
 +      WREG32(mmTPC5_RTR_HBW_RD_RQ_N_ARB, 0x01050101);
 +      WREG32(mmTPC5_RTR_HBW_RD_RQ_S_ARB, 0x01020101);
 +      WREG32(mmTPC5_RTR_HBW_RD_RQ_E_ARB, 0x01200501);
 +      WREG32(mmTPC5_RTR_HBW_WR_RQ_N_ARB, 0x02070102);
 +      WREG32(mmTPC5_RTR_HBW_WR_RQ_S_ARB, 0x01020101);
 +      WREG32(mmTPC5_RTR_HBW_WR_RQ_E_ARB, 0x02020602);
 +      WREG32(mmTPC5_RTR_HBW_RD_RS_N_ARB, 0x01070201);
 +      WREG32(mmTPC5_RTR_HBW_RD_RS_S_ARB, 0x01020201);
 +      WREG32(mmTPC5_RTR_HBW_RD_RS_W_ARB, 0x01020702);
 +      WREG32(mmTPC5_RTR_HBW_WR_RS_N_ARB, 0x01040101);
 +      WREG32(mmTPC5_RTR_HBW_WR_RS_S_ARB, 0x01010101);
 +      WREG32(mmTPC5_RTR_HBW_WR_RS_W_ARB, 0x01010501);
 +
 +      WREG32(mmTPC6_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RQ_E_ARB, 0x01010601);
 +      WREG32(mmTPC6_RTR_HBW_WR_RQ_N_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RQ_E_ARB, 0x02020702);
 +      WREG32(mmTPC6_RTR_HBW_RD_RS_N_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RS_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_RD_RS_W_ARB, 0x01020702);
 +      WREG32(mmTPC6_RTR_HBW_WR_RS_N_ARB, 0x01050101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RS_S_ARB, 0x01010101);
 +      WREG32(mmTPC6_RTR_HBW_WR_RS_W_ARB, 0x01010501);
 +
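 +      /*
 +       * Program the split polynomial coefficients (shifted right by 7) into
 +       * every MME/TPC/PCI/DMA router
 +       */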
 +      for (i = 0, offset = 0 ; i < 10 ; i++, offset += 4) {
 +              WREG32(mmMME1_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME2_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME3_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME4_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME5_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmMME6_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +
 +              WREG32(mmTPC0_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC1_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC2_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC3_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC4_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC5_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC6_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmTPC7_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +
 +              WREG32(mmPCI_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +              WREG32(mmDMA_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
 +      }
 +
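 +      /* Enable scrambling and non-linear scrambling on the MME routers */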
 +      for (i = 0, offset = 0 ; i < 6 ; i++, offset += 0x40000) {
 +              WREG32(mmMME1_RTR_SCRAMB_EN + offset,
 +                              1 << MME1_RTR_SCRAMB_EN_VAL_SHIFT);
 +              WREG32(mmMME1_RTR_NON_LIN_SCRAMB + offset,
 +                              1 << MME1_RTR_NON_LIN_SCRAMB_EN_SHIFT);
 +      }
 +
 +      for (i = 0, offset = 0 ; i < 8 ; i++, offset += 0x40000) {
 +              /*
 +               * Workaround for Bug H2 #2441 :
 +               * "ST.NOP set trace event illegal opcode"
 +               */
 +              WREG32(mmTPC0_CFG_TPC_INTR_MASK + offset, tpc_intr_mask);
 +
 +              WREG32(mmTPC0_NRTR_SCRAMB_EN + offset,
 +                              1 << TPC0_NRTR_SCRAMB_EN_VAL_SHIFT);
 +              WREG32(mmTPC0_NRTR_NON_LIN_SCRAMB + offset,
 +                              1 << TPC0_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
 +
 +              WREG32_FIELD(TPC0_CFG_MSS_CONFIG, offset,
 +                              ICACHE_FETCH_LINE_NUM, 2);
 +      }
 +
 +      WREG32(mmDMA_NRTR_SCRAMB_EN, 1 << DMA_NRTR_SCRAMB_EN_VAL_SHIFT);
 +      WREG32(mmDMA_NRTR_NON_LIN_SCRAMB,
 +                      1 << DMA_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
 +
 +      WREG32(mmPCI_NRTR_SCRAMB_EN, 1 << PCI_NRTR_SCRAMB_EN_VAL_SHIFT);
 +      WREG32(mmPCI_NRTR_NON_LIN_SCRAMB,
 +                      1 << PCI_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
 +
 +      /*
 +       * Workaround for H2 #HW-23 bug
 +       * Set DMA max outstanding read requests to 240 on DMA CH 1.
 +       * This limit is still large enough to not affect Gen4 bandwidth.
 +       * We only need to limit that DMA channel because the user can only
 +       * read from the Host using DMA CH 1
 +       */
 +      WREG32(mmDMA_CH_1_CFG0, 0x0fff00F0);
 +
 +      WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
 +
 +      goya->hw_cap_initialized |= HW_CAP_GOLDEN;
 +}
 +
 +static void goya_init_mme_qman(struct hl_device *hdev)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u64 qman_base_addr;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
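 +      /*
 +       * The MME QMAN's PQ resides in SRAM, MME_QMAN_BASE_OFFSET bytes from
 +       * the SRAM base address
 +       */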
 +      qman_base_addr = hdev->asic_prop.sram_base_address +
 +                              MME_QMAN_BASE_OFFSET;
 +
 +      WREG32(mmMME_QM_PQ_BASE_LO, lower_32_bits(qman_base_addr));
 +      WREG32(mmMME_QM_PQ_BASE_HI, upper_32_bits(qman_base_addr));
 +      WREG32(mmMME_QM_PQ_SIZE, ilog2(MME_QMAN_LENGTH));
 +      WREG32(mmMME_QM_PQ_PI, 0);
 +      WREG32(mmMME_QM_PQ_CI, 0);
 +      WREG32(mmMME_QM_CP_LDMA_SRC_BASE_LO_OFFSET, 0x10C0);
 +      WREG32(mmMME_QM_CP_LDMA_SRC_BASE_HI_OFFSET, 0x10C4);
 +      WREG32(mmMME_QM_CP_LDMA_TSIZE_OFFSET, 0x10C8);
 +      WREG32(mmMME_QM_CP_LDMA_COMMIT_OFFSET, 0x10CC);
 +
 +      WREG32(mmMME_QM_CP_MSG_BASE0_ADDR_LO, mtr_base_lo);
 +      WREG32(mmMME_QM_CP_MSG_BASE0_ADDR_HI, mtr_base_hi);
 +      WREG32(mmMME_QM_CP_MSG_BASE1_ADDR_LO, so_base_lo);
 +      WREG32(mmMME_QM_CP_MSG_BASE1_ADDR_HI, so_base_hi);
 +
 +      /* QMAN CQ has 8 cache lines */
 +      WREG32(mmMME_QM_CQ_CFG1, 0x00080008);
 +
 +      WREG32(mmMME_QM_GLBL_ERR_ADDR_LO, gic_base_lo);
 +      WREG32(mmMME_QM_GLBL_ERR_ADDR_HI, gic_base_hi);
 +
 +      WREG32(mmMME_QM_GLBL_ERR_WDATA, GOYA_ASYNC_EVENT_ID_MME_QM);
 +
 +      WREG32(mmMME_QM_GLBL_ERR_CFG, QMAN_MME_ERR_MSG_EN);
 +
 +      WREG32(mmMME_QM_GLBL_PROT, QMAN_MME_ERR_PROT);
 +
 +      WREG32(mmMME_QM_GLBL_CFG0, QMAN_MME_ENABLE);
 +}
 +
 +static void goya_init_mme_cmdq(struct hl_device *hdev)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE0_ADDR_LO, mtr_base_lo);
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE0_ADDR_HI, mtr_base_hi);
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE1_ADDR_LO, so_base_lo);
 +      WREG32(mmMME_CMDQ_CP_MSG_BASE1_ADDR_HI, so_base_hi);
 +
 +      /* CMDQ CQ has 20 cache lines */
 +      WREG32(mmMME_CMDQ_CQ_CFG1, 0x00140014);
 +
 +      WREG32(mmMME_CMDQ_GLBL_ERR_ADDR_LO, gic_base_lo);
 +      WREG32(mmMME_CMDQ_GLBL_ERR_ADDR_HI, gic_base_hi);
 +
 +      WREG32(mmMME_CMDQ_GLBL_ERR_WDATA, GOYA_ASYNC_EVENT_ID_MME_CMDQ);
 +
 +      WREG32(mmMME_CMDQ_GLBL_ERR_CFG, CMDQ_MME_ERR_MSG_EN);
 +
 +      WREG32(mmMME_CMDQ_GLBL_PROT, CMDQ_MME_ERR_PROT);
 +
 +      WREG32(mmMME_CMDQ_GLBL_CFG0, CMDQ_MME_ENABLE);
 +}
 +
 +void goya_init_mme_qmans(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 so_base_lo, so_base_hi;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MME)
 +              return;
 +
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      WREG32(mmMME_SM_BASE_ADDRESS_LOW, so_base_lo);
 +      WREG32(mmMME_SM_BASE_ADDRESS_HIGH, so_base_hi);
 +
 +      goya_init_mme_qman(hdev);
 +      goya_init_mme_cmdq(hdev);
 +
 +      goya->hw_cap_initialized |= HW_CAP_MME;
 +}
 +
 +static void goya_init_tpc_qman(struct hl_device *hdev, u32 base_off, int tpc_id)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u64 qman_base_addr;
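 +      /*
 +       * The per-TPC QMAN register blocks are identically laid out, so the
 +       * TPC0->TPC1 distance gives the per-TPC register stride
 +       */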
 +      u32 reg_off = tpc_id * (mmTPC1_QM_PQ_PI - mmTPC0_QM_PQ_PI);
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      qman_base_addr = hdev->asic_prop.sram_base_address + base_off;
 +
 +      WREG32(mmTPC0_QM_PQ_BASE_LO + reg_off, lower_32_bits(qman_base_addr));
 +      WREG32(mmTPC0_QM_PQ_BASE_HI + reg_off, upper_32_bits(qman_base_addr));
 +      WREG32(mmTPC0_QM_PQ_SIZE + reg_off, ilog2(TPC_QMAN_LENGTH));
 +      WREG32(mmTPC0_QM_PQ_PI + reg_off, 0);
 +      WREG32(mmTPC0_QM_PQ_CI + reg_off, 0);
 +      WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET + reg_off, 0x10C0);
 +      WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_HI_OFFSET + reg_off, 0x10C4);
 +      WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET + reg_off, 0x10C8);
 +      WREG32(mmTPC0_QM_CP_LDMA_COMMIT_OFFSET + reg_off, 0x10CC);
 +
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
 +      WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
 +
 +      WREG32(mmTPC0_QM_CQ_CFG1 + reg_off, 0x00080008);
 +
 +      WREG32(mmTPC0_QM_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmTPC0_QM_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
 +
 +      WREG32(mmTPC0_QM_GLBL_ERR_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_TPC0_QM + tpc_id);
 +
 +      WREG32(mmTPC0_QM_GLBL_ERR_CFG + reg_off, QMAN_TPC_ERR_MSG_EN);
 +
 +      WREG32(mmTPC0_QM_GLBL_PROT + reg_off, QMAN_TPC_ERR_PROT);
 +
 +      WREG32(mmTPC0_QM_GLBL_CFG0 + reg_off, QMAN_TPC_ENABLE);
 +}
 +
 +static void goya_init_tpc_cmdq(struct hl_device *hdev, int tpc_id)
 +{
 +      u32 mtr_base_lo, mtr_base_hi;
 +      u32 so_base_lo, so_base_hi;
 +      u32 gic_base_lo, gic_base_hi;
 +      u32 reg_off = tpc_id * (mmTPC1_CMDQ_CQ_CFG1 - mmTPC0_CMDQ_CQ_CFG1);
 +
 +      mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      gic_base_lo =
 +              lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +      gic_base_hi =
 +              upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
 +
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
 +      WREG32(mmTPC0_CMDQ_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
 +
 +      WREG32(mmTPC0_CMDQ_CQ_CFG1 + reg_off, 0x00140014);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_WDATA + reg_off,
 +                      GOYA_ASYNC_EVENT_ID_TPC0_CMDQ + tpc_id);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_ERR_CFG + reg_off, CMDQ_TPC_ERR_MSG_EN);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_PROT + reg_off, CMDQ_TPC_ERR_PROT);
 +
 +      WREG32(mmTPC0_CMDQ_GLBL_CFG0 + reg_off, CMDQ_TPC_ENABLE);
 +}
 +
 +void goya_init_tpc_qmans(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 so_base_lo, so_base_hi;
 +      u32 cfg_off = mmTPC1_CFG_SM_BASE_ADDRESS_LOW -
 +                      mmTPC0_CFG_SM_BASE_ADDRESS_LOW;
 +      int i;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_TPC)
 +              return;
 +
 +      so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++) {
 +              WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_LOW + i * cfg_off,
 +                              so_base_lo);
 +              WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_HIGH + i * cfg_off,
 +                              so_base_hi);
 +      }
 +
 +      goya_init_tpc_qman(hdev, TPC0_QMAN_BASE_OFFSET, 0);
 +      goya_init_tpc_qman(hdev, TPC1_QMAN_BASE_OFFSET, 1);
 +      goya_init_tpc_qman(hdev, TPC2_QMAN_BASE_OFFSET, 2);
 +      goya_init_tpc_qman(hdev, TPC3_QMAN_BASE_OFFSET, 3);
 +      goya_init_tpc_qman(hdev, TPC4_QMAN_BASE_OFFSET, 4);
 +      goya_init_tpc_qman(hdev, TPC5_QMAN_BASE_OFFSET, 5);
 +      goya_init_tpc_qman(hdev, TPC6_QMAN_BASE_OFFSET, 6);
 +      goya_init_tpc_qman(hdev, TPC7_QMAN_BASE_OFFSET, 7);
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++)
 +              goya_init_tpc_cmdq(hdev, i);
 +
 +      goya->hw_cap_initialized |= HW_CAP_TPC;
 +}
 +
 +/*
 + * goya_disable_internal_queues - Disable internal queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + */
 +static void goya_disable_internal_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MME))
 +              goto disable_tpc;
 +
 +      WREG32(mmMME_QM_GLBL_CFG0, 0);
 +      WREG32(mmMME_CMDQ_GLBL_CFG0, 0);
 +
 +disable_tpc:
 +      if (!(goya->hw_cap_initialized & HW_CAP_TPC))
 +              return;
 +
 +      WREG32(mmTPC0_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC0_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC1_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC1_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC2_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC2_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC3_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC3_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC4_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC4_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC5_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC5_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC6_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC6_CMDQ_GLBL_CFG0, 0);
 +
 +      WREG32(mmTPC7_QM_GLBL_CFG0, 0);
 +      WREG32(mmTPC7_CMDQ_GLBL_CFG0, 0);
 +}
 +
 +/*
 + * goya_stop_internal_queues - Stop internal queues
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +static int goya_stop_internal_queues(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int rc, retval = 0;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MME))
 +              goto stop_tpc;
 +
 +      /*
 +       * Each queue (QMAN) is a separate H/W logic block. That means each
 +       * QMAN can be stopped independently, and a failure to stop one does
 +       * NOT prevent us from trying to stop the other QMANs
 +       */
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmMME_QM_GLBL_CFG1,
 +                      mmMME_QM_CP_STS,
 +                      mmMME_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop MME QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmMME_CMDQ_GLBL_CFG1,
 +                      mmMME_CMDQ_CP_STS,
 +                      mmMME_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop MME CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +stop_tpc:
 +      if (!(goya->hw_cap_initialized & HW_CAP_TPC))
 +              return retval;
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC0_QM_GLBL_CFG1,
 +                      mmTPC0_QM_CP_STS,
 +                      mmTPC0_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 0 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC0_CMDQ_GLBL_CFG1,
 +                      mmTPC0_CMDQ_CP_STS,
 +                      mmTPC0_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 0 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC1_QM_GLBL_CFG1,
 +                      mmTPC1_QM_CP_STS,
 +                      mmTPC1_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 1 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC1_CMDQ_GLBL_CFG1,
 +                      mmTPC1_CMDQ_CP_STS,
 +                      mmTPC1_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 1 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC2_QM_GLBL_CFG1,
 +                      mmTPC2_QM_CP_STS,
 +                      mmTPC2_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 2 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC2_CMDQ_GLBL_CFG1,
 +                      mmTPC2_CMDQ_CP_STS,
 +                      mmTPC2_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 2 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC3_QM_GLBL_CFG1,
 +                      mmTPC3_QM_CP_STS,
 +                      mmTPC3_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 3 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC3_CMDQ_GLBL_CFG1,
 +                      mmTPC3_CMDQ_CP_STS,
 +                      mmTPC3_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 3 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC4_QM_GLBL_CFG1,
 +                      mmTPC4_QM_CP_STS,
 +                      mmTPC4_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 4 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC4_CMDQ_GLBL_CFG1,
 +                      mmTPC4_CMDQ_CP_STS,
 +                      mmTPC4_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 4 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC5_QM_GLBL_CFG1,
 +                      mmTPC5_QM_CP_STS,
 +                      mmTPC5_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 5 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC5_CMDQ_GLBL_CFG1,
 +                      mmTPC5_CMDQ_CP_STS,
 +                      mmTPC5_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 5 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC6_QM_GLBL_CFG1,
 +                      mmTPC6_QM_CP_STS,
 +                      mmTPC6_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 6 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC6_CMDQ_GLBL_CFG1,
 +                      mmTPC6_CMDQ_CP_STS,
 +                      mmTPC6_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 6 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC7_QM_GLBL_CFG1,
 +                      mmTPC7_QM_CP_STS,
 +                      mmTPC7_QM_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 7 QMAN\n");
 +              retval = -EIO;
 +      }
 +
 +      rc = goya_stop_queue(hdev,
 +                      mmTPC7_CMDQ_GLBL_CFG1,
 +                      mmTPC7_CMDQ_CP_STS,
 +                      mmTPC7_CMDQ_GLBL_STS0);
 +
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to stop TPC 7 CMDQ\n");
 +              retval = -EIO;
 +      }
 +
 +      return retval;
 +}
 +
 +static void goya_dma_stall(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_DMA))
 +              return;
 +
 +      WREG32(mmDMA_QM_0_GLBL_CFG1, 1 << DMA_QM_0_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_1_GLBL_CFG1, 1 << DMA_QM_1_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_2_GLBL_CFG1, 1 << DMA_QM_2_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_3_GLBL_CFG1, 1 << DMA_QM_3_GLBL_CFG1_DMA_STOP_SHIFT);
 +      WREG32(mmDMA_QM_4_GLBL_CFG1, 1 << DMA_QM_4_GLBL_CFG1_DMA_STOP_SHIFT);
 +}
 +
 +static void goya_tpc_stall(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_TPC))
 +              return;
 +
 +      WREG32(mmTPC0_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC1_CFG_TPC_STALL, 1 << TPC1_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC2_CFG_TPC_STALL, 1 << TPC2_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC3_CFG_TPC_STALL, 1 << TPC3_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC4_CFG_TPC_STALL, 1 << TPC4_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC5_CFG_TPC_STALL, 1 << TPC5_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC6_CFG_TPC_STALL, 1 << TPC6_CFG_TPC_STALL_V_SHIFT);
 +      WREG32(mmTPC7_CFG_TPC_STALL, 1 << TPC7_CFG_TPC_STALL_V_SHIFT);
 +}
 +
 +static void goya_mme_stall(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MME))
 +              return;
 +
 +      WREG32(mmMME_STALL, 0xFFFFFFFF);
 +}
 +
 +static int goya_enable_msix(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int cq_cnt = hdev->asic_prop.completion_queues_count;
 +      int rc, i, irq_cnt_init, irq;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MSIX)
 +              return 0;
 +
 +      rc = pci_alloc_irq_vectors(hdev->pdev, GOYA_MSIX_ENTRIES,
 +                              GOYA_MSIX_ENTRIES, PCI_IRQ_MSIX);
 +      if (rc < 0) {
 +              dev_err(hdev->dev,
 +                      "MSI-X: Failed to enable support -- %d/%d\n",
 +                      GOYA_MSIX_ENTRIES, rc);
 +              return rc;
 +      }
 +
 +      for (i = 0, irq_cnt_init = 0 ; i < cq_cnt ; i++, irq_cnt_init++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              rc = request_irq(irq, hl_irq_handler_cq, 0, goya_irq_name[i],
 +                              &hdev->completion_queue[i]);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +                      goto free_irqs;
 +              }
 +      }
 +
 +      irq = pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX);
 +
 +      rc = request_irq(irq, hl_irq_handler_eq, 0,
 +                      goya_irq_name[GOYA_EVENT_QUEUE_MSIX_IDX],
 +                      &hdev->event_queue);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to request IRQ %d", irq);
 +              goto free_irqs;
 +      }
 +
 +      goya->hw_cap_initialized |= HW_CAP_MSIX;
 +      return 0;
 +
 +free_irqs:
 +      for (i = 0 ; i < irq_cnt_init ; i++)
 +              free_irq(pci_irq_vector(hdev->pdev, i),
 +                      &hdev->completion_queue[i]);
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +      return rc;
 +}
 +
 +static void goya_sync_irqs(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      /* Wait for all in-flight IRQ handlers to finish */
 +      for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
 +              synchronize_irq(pci_irq_vector(hdev->pdev, i));
 +
 +      synchronize_irq(pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX));
 +}
 +
 +static void goya_disable_msix(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i, irq;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MSIX))
 +              return;
 +
 +      goya_sync_irqs(hdev);
 +
 +      irq = pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX);
 +      free_irq(irq, &hdev->event_queue);
 +
 +      for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++) {
 +              irq = pci_irq_vector(hdev->pdev, i);
 +              free_irq(irq, &hdev->completion_queue[i]);
 +      }
 +
 +      pci_free_irq_vectors(hdev->pdev);
 +
 +      goya->hw_cap_initialized &= ~HW_CAP_MSIX;
 +}
 +
 +static void goya_enable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +
 +      /* Zero the lower/upper parts of the 64-bit counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
 +
 +      /* Enable the counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
 +}
 +
 +static void goya_disable_timestamp(struct hl_device *hdev)
 +{
 +      /* Disable the timestamp counter */
 +      WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
 +}
 +
 +static void goya_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      u32 wait_timeout_ms;
 +
 +      if (hdev->pldm)
 +              wait_timeout_ms = GOYA_PLDM_RESET_WAIT_MSEC;
 +      else
 +              wait_timeout_ms = GOYA_RESET_WAIT_MSEC;
 +
 +      goya_stop_external_queues(hdev);
 +      goya_stop_internal_queues(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      goya_dma_stall(hdev);
 +      goya_tpc_stall(hdev);
 +      goya_mme_stall(hdev);
 +
 +      msleep(wait_timeout_ms);
 +
 +      goya_disable_external_queues(hdev);
 +      goya_disable_internal_queues(hdev);
 +
 +      goya_disable_timestamp(hdev);
 +
 +      if (hard_reset) {
 +              goya_disable_msix(hdev);
 +              goya_mmu_remove_device_cpu_mappings(hdev);
 +      } else {
 +              goya_sync_irqs(hdev);
 +      }
 +}
 +
 +/*
 + * goya_load_firmware_to_device() - Load LINUX FW code to device.
 + * @hdev: Pointer to hl_device structure.
 + *
 + * Copy LINUX fw code from firmware file to HBM BAR.
 + *
 + * Return: 0 on success, non-zero for failure.
 + */
 +static int goya_load_firmware_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[DDR_BAR_ID] + LINUX_FW_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GOYA_LINUX_FW_FILE, dst, 0, 0);
 +}
 +
 +/*
 + * goya_load_boot_fit_to_device() - Load boot fit to device.
 + * @hdev: Pointer to hl_device structure.
 + *
 + * Copy boot fit file to SRAM BAR.
 + *
 + * Return: 0 on success, non-zero for failure.
 + */
 +static int goya_load_boot_fit_to_device(struct hl_device *hdev)
 +{
 +      void __iomem *dst;
 +
 +      dst = hdev->pcie_bar[SRAM_CFG_BAR_ID] + BOOT_FIT_SRAM_OFFSET;
 +
 +      return hl_fw_load_fw_to_device(hdev, GOYA_BOOT_FIT_FILE, dst, 0, 0);
 +}
 +
 +static void goya_init_dynamic_firmware_loader(struct hl_device *hdev)
 +{
 +      struct dynamic_fw_load_mgr *dynamic_loader;
 +      struct cpu_dyn_regs *dyn_regs;
 +
 +      dynamic_loader = &hdev->fw_loader.dynamic_loader;
 +
 +      /*
 +       * Here we set initial values for a few specific dynamic regs (before
 +       * the first descriptor is read from the FW, those values have to be
 +       * hard-coded). In later stages of the protocol these values are
 +       * updated automatically by reading the FW descriptor, so the data
 +       * there is always up-to-date
 +       */
 +      dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
 +      dyn_regs->kmd_msg_to_cpu =
 +                              cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
 +      dyn_regs->cpu_cmd_status_to_host =
 +                              cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
 +
 +      dynamic_loader->wait_for_bl_timeout = GOYA_WAIT_FOR_BL_TIMEOUT_USEC;
 +}
 +
 +static void goya_init_static_firmware_loader(struct hl_device *hdev)
 +{
 +      struct static_fw_load_mgr *static_loader;
 +
 +      static_loader = &hdev->fw_loader.static_loader;
 +
 +      static_loader->preboot_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->boot_fit_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
 +      static_loader->kmd_msg_to_cpu_reg = mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU;
 +      static_loader->cpu_cmd_status_to_host_reg = mmCPU_CMD_STATUS_TO_HOST;
 +      static_loader->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      static_loader->cpu_boot_dev_status0_reg = mmCPU_BOOT_DEV_STS0;
 +      static_loader->cpu_boot_dev_status1_reg = mmCPU_BOOT_DEV_STS1;
 +      static_loader->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      static_loader->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      static_loader->preboot_version_offset_reg = mmPREBOOT_VER_OFFSET;
 +      static_loader->boot_fit_version_offset_reg = mmUBOOT_VER_OFFSET;
 +      static_loader->sram_offset_mask = ~(lower_32_bits(SRAM_BASE_ADDR));
 +}
 +
 +static void goya_init_firmware_preload_params(struct hl_device *hdev)
 +{
 +      struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
 +
 +      pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
 +      pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
 +      pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
 +      pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
 +      pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
 +      pre_fw_load->wait_for_preboot_timeout = GOYA_BOOT_FIT_REQ_TIMEOUT_USEC;
 +}
 +
 +static void goya_init_firmware_loader(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct fw_load_mgr *fw_loader = &hdev->fw_loader;
 +
 +      /* fill common fields */
 +      fw_loader->fw_comp_loaded = FW_TYPE_NONE;
 +      fw_loader->boot_fit_img.image_name = GOYA_BOOT_FIT_FILE;
 +      fw_loader->linux_img.image_name = GOYA_LINUX_FW_FILE;
 +      fw_loader->cpu_timeout = GOYA_CPU_TIMEOUT_USEC;
 +      fw_loader->boot_fit_timeout = GOYA_BOOT_FIT_REQ_TIMEOUT_USEC;
 +      fw_loader->skip_bmc = false;
 +      fw_loader->sram_bar_id = SRAM_CFG_BAR_ID;
 +      fw_loader->dram_bar_id = DDR_BAR_ID;
 +
 +      if (prop->dynamic_fw_load)
 +              goya_init_dynamic_firmware_loader(hdev);
 +      else
 +              goya_init_static_firmware_loader(hdev);
 +}
 +
 +static int goya_init_cpu(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int rc;
 +
 +      if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
 +              return 0;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_CPU)
 +              return 0;
 +
 +      /*
 +       * Before pushing u-boot/linux to the device, we need to set the DDR
 +       * BAR to the base address of the DRAM
 +       */
 +      if (goya_set_ddr_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
 +              dev_err(hdev->dev,
 +                      "failed to map DDR bar to DRAM base address\n");
 +              return -EIO;
 +      }
 +
 +      rc = hl_fw_init_cpu(hdev);
 +
 +      if (rc)
 +              return rc;
 +
 +      goya->hw_cap_initialized |= HW_CAP_CPU;
 +
 +      return 0;
 +}
 +
 +static int goya_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
 +                                              u64 phys_addr)
 +{
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GOYA_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
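 +      /*
 +       * Program the hop0 physical address for this ASID and kick the H/W.
 +       * Bit 31 of MMU_ASID_BUSY is presumably the busy indication that is
 +       * polled below
 +       */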
 +      WREG32(MMU_HOP0_PA43_12, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
 +      WREG32(MMU_HOP0_PA49_44, phys_addr >> MMU_HOP0_PA49_44_SHIFT);
 +      WREG32(MMU_ASID_BUSY, 0x80000000 | asid);
 +
 +      rc = hl_poll_timeout(
 +              hdev,
 +              MMU_ASID_BUSY,
 +              status,
 +              !(status & 0x80000000),
 +              1000,
 +              timeout_usec);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Timeout during MMU hop0 config of asid %d\n", asid);
 +              return rc;
 +      }
 +
 +      return 0;
 +}
 +
 +int goya_mmu_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 hop0_addr;
 +      int rc, i;
 +
 +      if (!hdev->mmu_enable)
 +              return 0;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      hdev->dram_default_page_mapping = true;
 +
 +      for (i = 0 ; i < prop->max_asid ; i++) {
 +              hop0_addr = prop->mmu_pgt_addr +
 +                              (i * prop->mmu_hop_table_size);
 +
 +              rc = goya_mmu_update_asid_hop0_addr(hdev, i, hop0_addr);
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "failed to set hop0 addr for asid %d\n", i);
 +                      goto err;
 +              }
 +      }
 +
 +      goya->hw_cap_initialized |= HW_CAP_MMU;
 +
 +      /* Init the MMU cache management page */
 +      WREG32(mmSTLB_CACHE_INV_BASE_39_8,
 +                              lower_32_bits(MMU_CACHE_MNG_ADDR >> 8));
 +      WREG32(mmSTLB_CACHE_INV_BASE_49_40, MMU_CACHE_MNG_ADDR >> 40);
 +
 +      /* Remove follower feature due to performance bug */
 +      WREG32_AND(mmSTLB_STLB_FEATURE_EN,
 +                      (~STLB_STLB_FEATURE_EN_FOLLOWER_EN_MASK));
 +
 +      hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR | MMU_OP_PHYS_PACK);
 +
 +      WREG32(mmMMU_MMU_ENABLE, 1);
 +      WREG32(mmMMU_SPI_MASK, 0xF);
 +
 +      return 0;
 +
 +err:
 +      return rc;
 +}
 +
 +/*
 + * goya_hw_init - Goya hardware initialization code
 + *
 + * @hdev: pointer to hl_device structure
 + *
 + * Returns 0 on success
 + *
 + */
 +static int goya_hw_init(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      int rc;
 +
 +      /* Perform read from the device to make sure device is up */
 +      RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
 +
 +      /*
 +       * Let's mark in the H/W that we have reached this point. We check
 +       * this value in the reset_before_init function to understand whether
 +       * we need to reset the chip before doing H/W init. This register is
 +       * cleared by the H/W upon H/W reset
 +       */
 +      WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
 +
 +      rc = goya_init_cpu(hdev);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to initialize CPU\n");
 +              return rc;
 +      }
 +
 +      goya_tpc_mbist_workaround(hdev);
 +
 +      goya_init_golden_registers(hdev);
 +
 +      /*
 +       * After CPU initialization is finished, change DDR bar mapping inside
 +       * iATU to point to the start address of the MMU page tables
 +       */
 +      if (goya_set_ddr_bar_base(hdev, (MMU_PAGE_TABLES_ADDR &
 +                      ~(prop->dram_pci_bar_size - 0x1ull))) == U64_MAX) {
 +              dev_err(hdev->dev,
 +                      "failed to map DDR bar to MMU page tables\n");
 +              return -EIO;
 +      }
 +
 +      rc = goya_mmu_init(hdev);
 +      if (rc)
 +              return rc;
 +
 +      goya_init_security(hdev);
 +
 +      goya_init_dma_qmans(hdev);
 +
 +      goya_init_mme_qmans(hdev);
 +
 +      goya_init_tpc_qmans(hdev);
 +
 +      goya_enable_timestamp(hdev);
 +
 +      /* MSI-X must be enabled before CPU queues are initialized */
 +      rc = goya_enable_msix(hdev);
 +      if (rc)
 +              goto disable_queues;
 +
 +      /* Perform read from the device to flush all MSI-X configuration */
 +      RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
 +
 +      return 0;
 +
 +disable_queues:
 +      goya_disable_internal_queues(hdev);
 +      goya_disable_external_queues(hdev);
 +
 +      return rc;
 +}
 +
 +static void goya_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 reset_timeout_ms, cpu_timeout_ms, status;
 +
 +      if (hdev->pldm) {
 +              reset_timeout_ms = GOYA_PLDM_RESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GOYA_PLDM_RESET_WAIT_MSEC;
 +      } else {
 +              reset_timeout_ms = GOYA_RESET_TIMEOUT_MSEC;
 +              cpu_timeout_ms = GOYA_CPU_RESET_WAIT_MSEC;
 +      }
 +
 +      if (hard_reset) {
 +              /* We don't know what the state of the CPU is, so make sure it
 +               * is stopped by any means necessary
 +               */
 +              WREG32(mmPSOC_GLOBAL_CONF_UBOOT_MAGIC, KMD_MSG_GOTO_WFE);
 +              WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                      GOYA_ASYNC_EVENT_ID_HALT_MACHINE);
 +
 +              msleep(cpu_timeout_ms);
 +
 +              goya_set_ddr_bar_base(hdev, DRAM_PHYS_BASE);
 +              goya_disable_clk_rlx(hdev);
 +              goya_set_pll_refclk(hdev);
 +
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG, RESET_ALL);
 +              dev_dbg(hdev->dev,
 +                      "Issued HARD reset command, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      } else {
 +              WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG, DMA_MME_TPC_RESET);
 +              dev_dbg(hdev->dev,
 +                      "Issued SOFT reset command, going to wait %dms\n",
 +                      reset_timeout_ms);
 +      }
 +
 +      /*
 +       * After a hard reset, we can't poll the BTM_FSM register because the
 +       * PSOC itself is in reset. In either type of reset we need to wait
 +       * until the reset is deasserted
 +       */
 +      msleep(reset_timeout_ms);
 +
 +      status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM);
 +      if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK)
 +              dev_err(hdev->dev,
 +                      "Timeout while waiting for device to reset 0x%x\n",
 +                      status);
 +
 +      if (!hard_reset && goya) {
 +              goya->hw_cap_initialized &= ~(HW_CAP_DMA | HW_CAP_MME |
 +                                              HW_CAP_GOLDEN | HW_CAP_TPC);
 +              WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                              GOYA_ASYNC_EVENT_ID_SOFT_RESET);
 +              return;
 +      }
 +
 +      /* Chicken bit to re-initiate boot sequencer flow */
 +      WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START,
 +              1 << PSOC_GLOBAL_CONF_BOOT_SEQ_RE_START_IND_SHIFT);
 +      /* Move boot manager FSM to pre boot sequencer init state */
 +      WREG32(mmPSOC_GLOBAL_CONF_SW_BTM_FSM,
 +                      0xA << PSOC_GLOBAL_CONF_SW_BTM_FSM_CTRL_SHIFT);
 +
 +      if (goya) {
 +              goya->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q |
 +                              HW_CAP_DDR_0 | HW_CAP_DDR_1 |
 +                              HW_CAP_DMA | HW_CAP_MME |
 +                              HW_CAP_MMU | HW_CAP_TPC_MBIST |
 +                              HW_CAP_GOLDEN | HW_CAP_TPC);
 +
 +              memset(goya->events_stat, 0, sizeof(goya->events_stat));
 +      }
 +}
 +
 +int goya_suspend(struct hl_device *hdev)
 +{
 +      int rc;
 +
 +      rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
 +      if (rc)
 +              dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
 +
 +      return rc;
 +}
 +
 +int goya_resume(struct hl_device *hdev)
 +{
 +      return goya_init_iatu(hdev);
 +}
 +
 +static int goya_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                      void *cpu_addr, dma_addr_t dma_addr, size_t size)
 +{
 +      int rc;
 +
++      vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++                      VM_DONTCOPY | VM_NORESERVE);
 +
 +      rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
 +                              (dma_addr - HOST_PHYS_BASE), size);
 +      if (rc)
 +              dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
 +
 +      return rc;
 +}
 +
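 +/*
 + * Write the new producer index to the doorbell register of the given H/W
 + * queue. For the CPU queue, a memory barrier is issued and a GIC event is
 + * raised so the device CPU will notice the PI update.
 + */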
 +void goya_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
 +{
 +      u32 db_reg_offset, db_value;
 +
 +      switch (hw_queue_id) {
 +      case GOYA_QUEUE_ID_DMA_0:
 +              db_reg_offset = mmDMA_QM_0_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_1:
 +              db_reg_offset = mmDMA_QM_1_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_2:
 +              db_reg_offset = mmDMA_QM_2_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_3:
 +              db_reg_offset = mmDMA_QM_3_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_DMA_4:
 +              db_reg_offset = mmDMA_QM_4_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_CPU_PQ:
 +              db_reg_offset = mmCPU_IF_PF_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_MME:
 +              db_reg_offset = mmMME_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC0:
 +              db_reg_offset = mmTPC0_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC1:
 +              db_reg_offset = mmTPC1_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC2:
 +              db_reg_offset = mmTPC2_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC3:
 +              db_reg_offset = mmTPC3_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC4:
 +              db_reg_offset = mmTPC4_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC5:
 +              db_reg_offset = mmTPC5_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC6:
 +              db_reg_offset = mmTPC6_QM_PQ_PI;
 +              break;
 +
 +      case GOYA_QUEUE_ID_TPC7:
 +              db_reg_offset = mmTPC7_QM_PQ_PI;
 +              break;
 +
 +      default:
 +              /* Should never get here */
 +              dev_err(hdev->dev, "H/W queue %d is invalid. Can't set pi\n",
 +                      hw_queue_id);
 +              return;
 +      }
 +
 +      db_value = pi;
 +
 +      /* ring the doorbell */
 +      WREG32(db_reg_offset, db_value);
 +
 +      if (hw_queue_id == GOYA_QUEUE_ID_CPU_PQ) {
 +              /* make sure the device CPU will read the latest data from the host */
 +              mb();
 +              WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                              GOYA_ASYNC_EVENT_ID_PI_UPDATE);
 +      }
 +}
 +
 +void goya_pqe_write(struct hl_device *hdev, __le64 *pqe, struct hl_bd *bd)
 +{
 +      /* The QMANs are on the SRAM so we need to copy to the I/O space */
 +      memcpy_toio((void __iomem *) pqe, bd, sizeof(struct hl_bd));
 +}
 +
 +static void *goya_dma_alloc_coherent(struct hl_device *hdev, size_t size,
 +                                      dma_addr_t *dma_handle, gfp_t flags)
 +{
 +      void *kernel_addr = dma_alloc_coherent(&hdev->pdev->dev, size,
 +                                              dma_handle, flags);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void goya_dma_free_coherent(struct hl_device *hdev, size_t size,
 +                                      void *cpu_addr, dma_addr_t dma_handle)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_handle = dma_handle - HOST_PHYS_BASE;
 +
 +      dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, fixed_dma_handle);
 +}
 +
 +int goya_scrub_device_mem(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +void *goya_get_int_queue_base(struct hl_device *hdev, u32 queue_id,
 +                              dma_addr_t *dma_handle, u16 *queue_len)
 +{
 +      void *base;
 +      u32 offset;
 +
 +      *dma_handle = hdev->asic_prop.sram_base_address;
 +
 +      base = (__force void *) hdev->pcie_bar[SRAM_CFG_BAR_ID];
 +
 +      switch (queue_id) {
 +      case GOYA_QUEUE_ID_MME:
 +              offset = MME_QMAN_BASE_OFFSET;
 +              *queue_len = MME_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC0:
 +              offset = TPC0_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC1:
 +              offset = TPC1_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC2:
 +              offset = TPC2_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC3:
 +              offset = TPC3_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC4:
 +              offset = TPC4_QMAN_BASE_OFFSET;
 +       * Always call cb_destroy here because we still hold one reference
 +       * to it from the earlier cb_get call. After the job completes,
 +       * cb_put will release it, but here we want to remove it from the
 +       * idr
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC6:
 +              offset = TPC6_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      case GOYA_QUEUE_ID_TPC7:
 +              offset = TPC7_QMAN_BASE_OFFSET;
 +              *queue_len = TPC_QMAN_LENGTH;
 +              break;
 +      default:
 +              dev_err(hdev->dev, "Got invalid queue id %d\n", queue_id);
 +              return NULL;
 +      }
 +
 +      base += offset;
 +      *dma_handle += offset;
 +
 +      return base;
 +}
 +
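 +/*
 + * Send a driver-internal job on QMAN0 (DMA_0). A MSG_PROT fence packet at
 + * the end of the patched CB writes a known value to host memory, which is
 + * polled until the job completes or the timeout expires. QMAN0 security is
 + * enabled around the submission.
 + */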
 +static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      u32 *fence_ptr;
 +      dma_addr_t fence_dma_addr;
 +      struct hl_cb *cb;
 +      u32 tmp, timeout;
 +      int rc;
 +
 +      if (hdev->pldm)
 +              timeout = GOYA_PLDM_QMAN0_TIMEOUT_USEC;
 +      else
 +              timeout = HL_DEVICE_TIMEOUT_USEC;
 +
 +      if (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
 +              dev_err_ratelimited(hdev->dev,
 +                      "Can't send driver job on QMAN0 because the device is not idle\n");
 +              return -EBUSY;
 +      }
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate fence memory for QMAN0\n");
 +              return -ENOMEM;
 +      }
 +
 +      goya_qman0_set_security(hdev, true);
 +
 +      cb = job->patched_cb;
 +
 +      fence_pkt = cb->kernel_address +
 +                      job->job_cb_size - sizeof(struct packet_msg_prot);
 +
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_EB_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(GOYA_QMAN0_FENCE_VAL);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, GOYA_QUEUE_ID_DMA_0,
 +                                      job->job_cb_size, cb->bus_address);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to send CB on QMAN0, %d\n", rc);
 +              goto free_fence_ptr;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp,
 +                              (tmp == GOYA_QMAN0_FENCE_VAL), 1000,
 +                              timeout, true);
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, GOYA_QUEUE_ID_DMA_0);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev, "QMAN0 Job timeout (0x%x)\n", tmp);
 +              goto free_fence_ptr;
 +      }
 +
 +free_fence_ptr:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +
 +      goya_qman0_set_security(hdev, false);
 +
 +      return rc;
 +}
 +
 +int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 +                              u32 timeout, u64 *result)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q)) {
 +              if (result)
 +                      *result = 0;
 +              return 0;
 +      }
 +
 +      if (!timeout)
 +              timeout = GOYA_MSG_TO_CPU_TIMEOUT_USEC;
 +
 +      return hl_fw_send_cpu_message(hdev, GOYA_QUEUE_ID_CPU_PQ, msg, len,
 +                                      timeout, result);
 +}
 +
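 +/*
 + * Sanity-test a single external H/W queue: send a MSG_PROT packet that
 + * writes a fence value to host memory and poll that memory until the value
 + * arrives or GOYA_TEST_QUEUE_WAIT_USEC expires.
 + */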
 +int goya_test_queue(struct hl_device *hdev, u32 hw_queue_id)
 +{
 +      struct packet_msg_prot *fence_pkt;
 +      dma_addr_t pkt_dma_addr;
 +      u32 fence_val, tmp;
 +      dma_addr_t fence_dma_addr;
 +      u32 *fence_ptr;
 +      int rc;
 +
 +      fence_val = GOYA_QMAN0_FENCE_VAL;
 +
 +      fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
 +      if (!fence_ptr) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate memory for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              return -ENOMEM;
 +      }
 +
 +      *fence_ptr = 0;
 +
 +      fence_pkt = hl_asic_dma_pool_zalloc(hdev, sizeof(struct packet_msg_prot), GFP_KERNEL,
 +                                              &pkt_dma_addr);
 +      if (!fence_pkt) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate packet for H/W queue %d testing\n",
 +                      hw_queue_id);
 +              rc = -ENOMEM;
 +              goto free_fence_ptr;
 +      }
 +
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_EB_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      fence_pkt->ctl = cpu_to_le32(tmp);
 +      fence_pkt->value = cpu_to_le32(fence_val);
 +      fence_pkt->addr = cpu_to_le64(fence_dma_addr);
 +
 +      rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
 +                                      sizeof(struct packet_msg_prot),
 +                                      pkt_dma_addr);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to send fence packet to H/W queue %d\n",
 +                      hw_queue_id);
 +              goto free_pkt;
 +      }
 +
 +      rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp, (tmp == fence_val),
 +      dev_err(hdev->dev, "Reading via DMA is not implemented yet\n");
 +
 +      hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
 +
 +      if (rc == -ETIMEDOUT) {
 +              dev_err(hdev->dev,
 +                      "H/W queue %d test failed (scratch(0x%08llX) == 0x%08X)\n",
 +                      hw_queue_id, (unsigned long long) fence_dma_addr, tmp);
 +              rc = -EIO;
 +      }
 +
 +free_pkt:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_pkt, pkt_dma_addr);
 +free_fence_ptr:
 +      hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
 +      return rc;
 +}
 +
 +int goya_test_cpu_queue(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      /*
 +       * Check the capability here because send_cpu_message() won't update
 +       * the result value if the CPU queue capability is not initialized
 +       */
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_test_cpu_queue(hdev);
 +}
 +
 +int goya_test_queues(struct hl_device *hdev)
 +{
 +      int i, rc, ret_val = 0;
 +
 +      for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
 +              rc = goya_test_queue(hdev, i);
 +              if (rc)
 +                      ret_val = -EINVAL;
 +      }
 +
 +      return ret_val;
 +}
 +
 +static void *goya_dma_pool_zalloc(struct hl_device *hdev, size_t size,
 +                                      gfp_t mem_flags, dma_addr_t *dma_handle)
 +{
 +      void *kernel_addr;
 +
 +      if (size > GOYA_DMA_POOL_BLK_SIZE)
 +              return NULL;
 +
 +      kernel_addr =  dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
 +
 +      /* Shift to the device's base physical address of host memory */
 +      if (kernel_addr)
 +              *dma_handle += HOST_PHYS_BASE;
 +
 +      return kernel_addr;
 +}
 +
 +static void goya_dma_pool_free(struct hl_device *hdev, void *vaddr,
 +                              dma_addr_t dma_addr)
 +{
 +      /* Cancel the device's base physical address of host memory */
 +      dma_addr_t fixed_dma_addr = dma_addr - HOST_PHYS_BASE;
 +
 +      dma_pool_free(hdev->dma_pool, vaddr, fixed_dma_addr);
 +}
 +
 +void *goya_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
 +                                      dma_addr_t *dma_handle)
 +{
 +      void *vaddr;
 +
 +      vaddr = hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
 +      *dma_handle = (*dma_handle) - hdev->cpu_accessible_dma_address +
 +                      VA_CPU_ACCESSIBLE_MEM_ADDR;
 +
 +      return vaddr;
 +}
 +
 +void goya_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
 +                                      void *vaddr)
 +{
 +      hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
 +}
 +
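 +/*
 + * Return the size of the LIN_DMA descriptor list needed for this SG table.
 + * Adjacent entries that are physically contiguous are merged as long as the
 + * combined length does not exceed DMA_MAX_TRANSFER_SIZE, and each resulting
 + * descriptor is a packet_lin_dma.
 + */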
 +u32 goya_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt)
 +{
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt;
 +      u64 len, len_next;
 +      dma_addr_t addr, addr_next;
 +
 +      dma_desc_cnt = 0;
 +
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((addr + len == addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              dma_desc_cnt++;
 +      }
 +
 +      return dma_desc_cnt * sizeof(struct packet_lin_dma);
 +}
 +
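 +/*
 + * Pin the host memory referenced by a user DMA packet (unless this job has
 + * already pinned it), DMA-map it, and add the size of the descriptor list
 + * it will produce to the patched CB size.
 + */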
 +static int goya_pin_memory_before_cs(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              u64 addr, enum dma_data_direction dir)
 +{
 +      struct hl_userptr *userptr;
 +      int rc;
 +
 +      if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                      parser->job_userptr_list, &userptr))
 +              goto already_pinned;
 +
 +      userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
 +      if (!userptr)
 +              return -ENOMEM;
 +
 +      rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
 +                              userptr);
 +      if (rc)
 +              goto free_userptr;
 +
 +      list_add_tail(&userptr->job_node, parser->job_userptr_list);
 +
 +      rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, dir);
 +      if (rc) {
 +              dev_err(hdev->dev, "failed to map sgt with DMA region\n");
 +              goto unpin_memory;
 +      }
 +
 +      userptr->dma_mapped = true;
 +      userptr->dir = dir;
 +
 +already_pinned:
 +      parser->patched_cb_size +=
 +                      goya_get_dma_desc_list_size(hdev, userptr->sgt);
 +
 +      return 0;
 +
 +unpin_memory:
 +      list_del(&userptr->job_node);
 +      hl_unpin_host_memory(hdev, userptr);
 +free_userptr:
 +      kfree(userptr);
 +      return rc;
 +}
 +
 +static int goya_validate_dma_pkt_host(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      u64 device_memory_addr, addr;
 +      enum dma_data_direction dir;
 +      enum hl_goya_dma_direction user_dir;
 +      bool sram_addr = true;
 +      bool skip_host_mem_pin = false;
 +      bool user_memset;
 +      u32 ctl;
 +      int rc = 0;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      switch (user_dir) {
 +      case HL_DMA_HOST_TO_DRAM:
 +              dev_dbg(hdev->dev, "DMA direction is HOST --> DRAM\n");
 +              dir = DMA_TO_DEVICE;
 +              sram_addr = false;
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +              break;
 +
 +      case HL_DMA_DRAM_TO_HOST:
 +              dev_dbg(hdev->dev, "DMA direction is DRAM --> HOST\n");
 +              dir = DMA_FROM_DEVICE;
 +              sram_addr = false;
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              break;
 +
 +      case HL_DMA_HOST_TO_SRAM:
 +              dev_dbg(hdev->dev, "DMA direction is HOST --> SRAM\n");
 +              dir = DMA_TO_DEVICE;
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +              break;
 +
 +      case HL_DMA_SRAM_TO_HOST:
 +              dev_dbg(hdev->dev, "DMA direction is SRAM --> HOST\n");
 +              dir = DMA_FROM_DEVICE;
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              break;
 +      default:
 +              dev_err(hdev->dev, "DMA direction %d is unsupported/undefined\n", user_dir);
 +              return -EFAULT;
 +      }
 +
 +      if (sram_addr) {
 +              if (!hl_mem_area_inside_range(device_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.sram_user_base_address,
 +                              hdev->asic_prop.sram_end_address)) {
 +
 +                      dev_err(hdev->dev,
 +                              "SRAM address 0x%llx + 0x%x is invalid\n",
 +                              device_memory_addr,
 +                              user_dma_pkt->tsize);
 +                      return -EFAULT;
 +              }
 +      } else {
 +              if (!hl_mem_area_inside_range(device_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.dram_user_base_address,
 +                              hdev->asic_prop.dram_end_address)) {
 +
 +                      dev_err(hdev->dev,
 +                              "DRAM address 0x%llx + 0x%x is invalid\n",
 +                              device_memory_addr,
 +                              user_dma_pkt->tsize);
 +                      return -EFAULT;
 +              }
 +      }
 +
 +      if (skip_host_mem_pin)
 +              parser->patched_cb_size += sizeof(*user_dma_pkt);
 +      else {
 +              if ((dir == DMA_TO_DEVICE) &&
 +                              (parser->hw_queue_id > GOYA_QUEUE_ID_DMA_1)) {
 +                      dev_err(hdev->dev,
 +                              "Can't DMA from host on queue other than 1\n");
 +                      return -EFAULT;
 +              }
 +
 +              rc = goya_pin_memory_before_cs(hdev, parser, user_dma_pkt,
 +                                              addr, dir);
 +      }
 +
 +      return rc;
 +}
 +
 +static int goya_validate_dma_pkt_no_host(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      u64 sram_memory_addr, dram_memory_addr;
 +      enum hl_goya_dma_direction user_dir;
 +      u32 ctl;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      if (user_dir == HL_DMA_DRAM_TO_SRAM) {
 +              dev_dbg(hdev->dev, "DMA direction is DRAM --> SRAM\n");
 +              dram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              sram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +      } else {
 +              dev_dbg(hdev->dev, "DMA direction is SRAM --> DRAM\n");
 +              sram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              dram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +      }
 +
 +      if (!hl_mem_area_inside_range(sram_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.sram_user_base_address,
 +                              hdev->asic_prop.sram_end_address)) {
 +              dev_err(hdev->dev, "SRAM address 0x%llx + 0x%x is invalid\n",
 +                      sram_memory_addr, user_dma_pkt->tsize);
 +              return -EFAULT;
 +      }
 +
 +      if (!hl_mem_area_inside_range(dram_memory_addr,
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.dram_user_base_address,
 +                              hdev->asic_prop.dram_end_address)) {
 +              dev_err(hdev->dev, "DRAM address 0x%llx + 0x%x is invalid\n",
 +                      dram_memory_addr, user_dma_pkt->tsize);
 +              return -EFAULT;
 +      }
 +
 +      parser->patched_cb_size += sizeof(*user_dma_pkt);
 +
 +      return 0;
 +}
 +
 +static int goya_validate_dma_pkt_no_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      enum hl_goya_dma_direction user_dir;
 +      u32 ctl;
 +      int rc;
 +
 +      dev_dbg(hdev->dev, "DMA packet details:\n");
 +      dev_dbg(hdev->dev, "source == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->src_addr));
 +      dev_dbg(hdev->dev, "destination == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->dst_addr));
 +      dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      /*
 +       * Special handling for DMA with size 0. The H/W has a bug where
 +       * this can cause the QMAN DMA to get stuck, so block it here.
 +       */
 +      if (user_dma_pkt->tsize == 0) {
 +              dev_err(hdev->dev,
 +                      "Got DMA with size 0, might reset the device\n");
 +              return -EINVAL;
 +      }
 +
 +      if ((user_dir == HL_DMA_DRAM_TO_SRAM) || (user_dir == HL_DMA_SRAM_TO_DRAM))
 +              rc = goya_validate_dma_pkt_no_host(hdev, parser, user_dma_pkt);
 +      else
 +              rc = goya_validate_dma_pkt_host(hdev, parser, user_dma_pkt);
 +
 +      return rc;
 +}
 +
 +static int goya_validate_dma_pkt_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt)
 +{
 +      dev_dbg(hdev->dev, "DMA packet details:\n");
 +      dev_dbg(hdev->dev, "source == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->src_addr));
 +      dev_dbg(hdev->dev, "destination == 0x%llx\n",
 +              le64_to_cpu(user_dma_pkt->dst_addr));
 +      dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
 +
 +      /*
 +       * WA for HW-23.
 +       * We can't allow the user to read from host using QMANs other than 1.
 +       * PMMU and HPMMU addresses are equal, check only one of them.
 +       */
 +      if (parser->hw_queue_id != GOYA_QUEUE_ID_DMA_1 &&
 +              hl_mem_area_inside_range(le64_to_cpu(user_dma_pkt->src_addr),
 +                              le32_to_cpu(user_dma_pkt->tsize),
 +                              hdev->asic_prop.pmmu.start_addr,
 +                              hdev->asic_prop.pmmu.end_addr)) {
 +              dev_err(hdev->dev,
 +                      "Can't DMA from host on queue other than 1\n");
 +              return -EFAULT;
 +      }
 +
 +      if (user_dma_pkt->tsize == 0) {
 +              dev_err(hdev->dev,
 +                      "Got DMA with size 0, might reset the device\n");
 +              return -EINVAL;
 +      }
 +
 +      parser->patched_cb_size += sizeof(*user_dma_pkt);
 +
 +      return 0;
 +}
 +
 +static int goya_validate_wreg32(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_wreg32 *wreg_pkt)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 sob_start_addr, sob_end_addr;
 +      u16 reg_offset;
 +
 +      reg_offset = le32_to_cpu(wreg_pkt->ctl) &
 +                      GOYA_PKT_WREG32_CTL_REG_OFFSET_MASK;
 +
 +      dev_dbg(hdev->dev, "WREG32 packet details:\n");
 +      dev_dbg(hdev->dev, "reg_offset == 0x%x\n", reg_offset);
 +      dev_dbg(hdev->dev, "value      == 0x%x\n",
 +              le32_to_cpu(wreg_pkt->value));
 +
 +      if (reg_offset != (mmDMA_CH_0_WR_COMP_ADDR_LO & 0x1FFF)) {
 +              dev_err(hdev->dev, "WREG32 packet with illegal address 0x%x\n",
 +                      reg_offset);
 +              return -EPERM;
 +      }
 +
 +      /*
 +       * With the MMU enabled, DMA channels are not secured, so it doesn't
 +       * matter where the WR COMP is written because it will go out as
 +       * non-secured anyway
 +       */
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      sob_start_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
 +      sob_end_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1023);
 +
 +      if ((le32_to_cpu(wreg_pkt->value) < sob_start_addr) ||
 +                      (le32_to_cpu(wreg_pkt->value) > sob_end_addr)) {
 +
 +              dev_err(hdev->dev, "WREG32 packet with illegal value 0x%x\n",
 +                      wreg_pkt->value);
 +              return -EPERM;
 +      }
 +
 +      return 0;
 +}
 +
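 +/*
 + * Walk the user CB packet by packet: reject packet types that user-space is
 + * not allowed to submit (WREG_BULK, MSG_PROT, CP_DMA, STOP), validate
 + * WREG32 and LIN_DMA packets, and accumulate the size of the patched CB.
 + */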
 +static int goya_validate_cb(struct hl_device *hdev,
 +                      struct hl_cs_parser *parser, bool is_mmu)
 +{
 +      u32 cb_parsed_length = 0;
 +      int rc = 0;
 +
 +      parser->patched_cb_size = 0;
 +
 +      /* user_cb_size is greater than 0 so the loop will always execute */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              struct goya_packet *user_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = goya_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_WREG_32:
 +                      /*
 +                       * Although it is validated after the copy in patch_cb(),
 +                       * we need to validate it here as well because patch_cb()
 +                       * is not called in the MMU path while this function is
 +                       */
 +                      rc = goya_validate_wreg32(hdev,
 +                              parser, (struct packet_wreg32 *) user_pkt);
 +                      parser->patched_cb_size += pkt_size;
 +                      break;
 +
 +              case PACKET_WREG_BULK:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use WREG_BULK\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_LIN_DMA:
 +                      if (is_mmu)
 +                              rc = goya_validate_dma_pkt_mmu(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt);
 +                      else
 +                              rc = goya_validate_dma_pkt_no_mmu(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt);
 +                      break;
 +
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +                      parser->patched_cb_size += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      /*
 +       * The new CB should have space at the end for two MSG_PROT packets:
 +       * 1. A packet that will act as a completion packet
 +       * 2. A packet that will generate an MSI-X interrupt
 +       */
 +      parser->patched_cb_size += sizeof(struct packet_msg_prot) * 2;
 +
 +      return rc;
 +}
 +
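 +/*
 + * Expand a user LIN_DMA packet that targets host memory into one packet per
 + * (merged) SG entry of the pinned user buffer. Only the first descriptor
 + * keeps the EB bit and only the last one keeps the user's RDCOMP/WRCOMP
 + * bits. DRAM<->SRAM transfers, zero-size packets and memsets toward the
 + * device are copied unchanged.
 + */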
 +static int goya_patch_dma_packet(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser,
 +                              struct packet_lin_dma *user_dma_pkt,
 +                              struct packet_lin_dma *new_dma_pkt,
 +                              u32 *new_dma_pkt_size)
 +{
 +      struct hl_userptr *userptr;
 +      struct scatterlist *sg, *sg_next_iter;
 +      u32 count, dma_desc_cnt;
 +      u64 len, len_next;
 +      dma_addr_t dma_addr, dma_addr_next;
 +      enum hl_goya_dma_direction user_dir;
 +      u64 device_memory_addr, addr;
 +      enum dma_data_direction dir;
 +      struct sg_table *sgt;
 +      bool skip_host_mem_pin = false;
 +      bool user_memset;
 +      u32 user_rdcomp_mask, user_wrcomp_mask, ctl;
 +
 +      ctl = le32_to_cpu(user_dma_pkt->ctl);
 +
 +      user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +
 +      user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
 +                      GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
 +
 +      if ((user_dir == HL_DMA_DRAM_TO_SRAM) || (user_dir == HL_DMA_SRAM_TO_DRAM) ||
 +                      (user_dma_pkt->tsize == 0)) {
 +              memcpy(new_dma_pkt, user_dma_pkt, sizeof(*new_dma_pkt));
 +              *new_dma_pkt_size = sizeof(*new_dma_pkt);
 +              return 0;
 +      }
 +
 +      if ((user_dir == HL_DMA_HOST_TO_DRAM) || (user_dir == HL_DMA_HOST_TO_SRAM)) {
 +              addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              dir = DMA_TO_DEVICE;
 +              if (user_memset)
 +                      skip_host_mem_pin = true;
 +      } else {
 +              addr = le64_to_cpu(user_dma_pkt->dst_addr);
 +              device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
 +              dir = DMA_FROM_DEVICE;
 +      }
 +
 +      if ((!skip_host_mem_pin) &&
 +              (hl_userptr_is_pinned(hdev, addr,
 +                      le32_to_cpu(user_dma_pkt->tsize),
 +                      parser->job_userptr_list, &userptr) == false)) {
 +              dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
 +                              addr, user_dma_pkt->tsize);
 +              return -EFAULT;
 +      }
 +
 +      if ((user_memset) && (dir == DMA_TO_DEVICE)) {
 +              memcpy(new_dma_pkt, user_dma_pkt, sizeof(*user_dma_pkt));
 +              *new_dma_pkt_size = sizeof(*user_dma_pkt);
 +              return 0;
 +      }
 +
 +      user_rdcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK;
 +
 +      user_wrcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK;
 +
 +      sgt = userptr->sgt;
 +      dma_desc_cnt = 0;
 +
 +      for_each_sgtable_dma_sg(sgt, sg, count) {
 +              len = sg_dma_len(sg);
 +              dma_addr = sg_dma_address(sg);
 +
 +              if (len == 0)
 +                      break;
 +
 +              while ((count + 1) < sgt->nents) {
 +                      sg_next_iter = sg_next(sg);
 +                      len_next = sg_dma_len(sg_next_iter);
 +                      dma_addr_next = sg_dma_address(sg_next_iter);
 +
 +                      if (len_next == 0)
 +                              break;
 +
 +                      if ((dma_addr + len == dma_addr_next) &&
 +                              (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
 +                              len += len_next;
 +                              count++;
 +                              sg = sg_next_iter;
 +                      } else {
 +                              break;
 +                      }
 +              }
 +
 +              ctl = le32_to_cpu(user_dma_pkt->ctl);
 +              if (likely(dma_desc_cnt))
 +                      ctl &= ~GOYA_PKT_CTL_EB_MASK;
 +              ctl &= ~(GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK |
 +                              GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK);
 +              new_dma_pkt->ctl = cpu_to_le32(ctl);
 +              new_dma_pkt->tsize = cpu_to_le32((u32) len);
 +
 +              if (dir == DMA_TO_DEVICE) {
 +                      new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
 +              } else {
 +                      new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
 +                      new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
 +              }
 +
 +              if (!user_memset)
 +                      device_memory_addr += len;
 +              dma_desc_cnt++;
 +              new_dma_pkt++;
 +      }
 +
 +      if (!dma_desc_cnt) {
 +              dev_err(hdev->dev,
 +                      "Got 0 SG entries when patching DMA packet\n");
 +              return -EFAULT;
 +      }
 +
 +      /* Fix the last dma packet - rdcomp/wrcomp must be as user set them */
 +      new_dma_pkt--;
 +      new_dma_pkt->ctl |= cpu_to_le32(user_rdcomp_mask | user_wrcomp_mask);
 +
 +      *new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
 +
 +      return 0;
 +}
 +
 +static int goya_patch_cb(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser)
 +{
 +      u32 cb_parsed_length = 0;
 +      u32 cb_patched_cur_length = 0;
 +      int rc = 0;
 +
 +      /* user_cb_size is greater than 0 so the loop will always execute */
 +      while (cb_parsed_length < parser->user_cb_size) {
 +              enum packet_id pkt_id;
 +              u16 pkt_size;
 +              u32 new_pkt_size = 0;
 +              struct goya_packet *user_pkt, *kernel_pkt;
 +
 +              user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
 +              kernel_pkt = parser->patched_cb->kernel_address +
 +                                      cb_patched_cur_length;
 +
 +              pkt_id = (enum packet_id) (
 +                              (le64_to_cpu(user_pkt->header) &
 +                              PACKET_HEADER_PACKET_ID_MASK) >>
 +                                      PACKET_HEADER_PACKET_ID_SHIFT);
 +
 +              if (!validate_packet_id(pkt_id)) {
 +                      dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              pkt_size = goya_packet_sizes[pkt_id];
 +              cb_parsed_length += pkt_size;
 +              if (cb_parsed_length > parser->user_cb_size) {
 +                      dev_err(hdev->dev,
 +                              "packet 0x%x is out of CB boundary\n", pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              switch (pkt_id) {
 +              case PACKET_LIN_DMA:
 +                      rc = goya_patch_dma_packet(hdev, parser,
 +                                      (struct packet_lin_dma *) user_pkt,
 +                                      (struct packet_lin_dma *) kernel_pkt,
 +                                      &new_pkt_size);
 +                      cb_patched_cur_length += new_pkt_size;
 +                      break;
 +
 +              case PACKET_WREG_32:
 +                      memcpy(kernel_pkt, user_pkt, pkt_size);
 +                      cb_patched_cur_length += pkt_size;
 +                      rc = goya_validate_wreg32(hdev, parser,
 +                                      (struct packet_wreg32 *) kernel_pkt);
 +                      break;
 +
 +              case PACKET_WREG_BULK:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use WREG_BULK\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_MSG_PROT:
 +                      dev_err(hdev->dev,
 +                              "User not allowed to use MSG_PROT\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_CP_DMA:
 +                      dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_STOP:
 +                      dev_err(hdev->dev, "User not allowed to use STOP\n");
 +                      rc = -EPERM;
 +                      break;
 +
 +              case PACKET_MSG_LONG:
 +              case PACKET_MSG_SHORT:
 +              case PACKET_FENCE:
 +              case PACKET_NOP:
 +                      memcpy(kernel_pkt, user_pkt, pkt_size);
 +                      cb_patched_cur_length += pkt_size;
 +                      break;
 +
 +              default:
 +                      dev_err(hdev->dev, "Invalid packet header 0x%x\n",
 +                              pkt_id);
 +                      rc = -EINVAL;
 +                      break;
 +              }
 +
 +              if (rc)
 +                      break;
 +      }
 +
 +      return rc;
 +}
 +
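 +/*
 + * MMU path: the user CB is copied verbatim into a newly created kernel CB
 + * (with room reserved for the two trailing MSG_PROT packets) and is then
 + * validated in place; LIN_DMA packets are not expanded per SG entry here.
 + */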
 +static int goya_parse_cb_mmu(struct hl_device *hdev,
 +              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      u32 patched_cb_size;
 +      struct hl_cb *user_cb;
 +      int rc;
 +
 +      /*
 +       * The new CB should have space at the end for two MSG_PROT packets:
 +       * 1. A packet that will act as a completion packet
 +       * 2. A packet that will generate an MSI-X interrupt
 +       */
 +      parser->patched_cb_size = parser->user_cb_size +
 +                      sizeof(struct packet_msg_prot) * 2;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n",
 +                      rc);
 +              return rc;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail here */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      /*
 +       * The check that parser->user_cb_size <= parser->user_cb->size was done
 +       * in validate_queue_index().
 +       */
 +      memcpy(parser->patched_cb->kernel_address,
 +              parser->user_cb->kernel_address,
 +              parser->user_cb_size);
 +
 +      patched_cb_size = parser->patched_cb_size;
 +
 +      /* validate patched CB instead of user CB */
 +      user_cb = parser->user_cb;
 +      parser->user_cb = parser->patched_cb;
 +      rc = goya_validate_cb(hdev, parser, true);
 +      parser->user_cb = user_cb;
 +
 +      if (rc) {
 +              hl_cb_put(parser->patched_cb);
 +              goto out;
 +      }
 +
 +      if (patched_cb_size != parser->patched_cb_size) {
 +              dev_err(hdev->dev, "user CB size mismatch\n");
 +              hl_cb_put(parser->patched_cb);
 +              rc = -EINVAL;
 +              goto out;
 +      }
 +
 +out:
 +      /*
 +       * Always call cb destroy here because we still have 1 reference
 +       * to it by calling cb_get earlier. After the job will be completed,
 +       * cb_put will release it, but here we want to remove it from the
 +       * idr
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +      return rc;
 +}
 +
 +static int goya_parse_cb_no_mmu(struct hl_device *hdev,
 +                              struct hl_cs_parser *parser)
 +{
 +      u64 handle;
 +      int rc;
 +
 +      rc = goya_validate_cb(hdev, parser, false);
 +
 +      if (rc)
 +              goto free_userptr;
 +
 +      rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
 +                              parser->patched_cb_size, false, false,
 +                              &handle);
 +      if (rc) {
 +              dev_err(hdev->dev,
 +                      "Failed to allocate patched CB for DMA CS %d\n", rc);
 +              goto free_userptr;
 +      }
 +
 +      parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
 +      /* hl_cb_get should never fail here */
 +      if (!parser->patched_cb) {
 +              dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
 +              rc = -EFAULT;
 +              goto out;
 +      }
 +
 +      rc = goya_patch_cb(hdev, parser);
 +
 +      if (rc)
 +              hl_cb_put(parser->patched_cb);
 +
 +out:
 +      /*
 +       * Always call cb_destroy here because we still hold one reference
 +       * to it from the earlier cb_get call. After the job completes,
 +       * cb_put will release it, but here we want to remove it from the
 +       * idr
 +       */
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
 +
 +free_userptr:
 +      if (rc)
 +              hl_userptr_delete_list(hdev, parser->job_userptr_list);
 +      return rc;
 +}
 +
 +static int goya_parse_cb_no_ext_queue(struct hl_device *hdev,
 +                                      struct hl_cs_parser *parser)
 +{
 +      struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return 0;
 +
 +      /* For internal queue jobs, just check if CB address is valid */
 +      if (hl_mem_area_inside_range(
 +                      (u64) (uintptr_t) parser->user_cb,
 +                      parser->user_cb_size,
 +                      asic_prop->sram_user_base_address,
 +                      asic_prop->sram_end_address))
 +              return 0;
 +
 +      if (hl_mem_area_inside_range(
 +                      (u64) (uintptr_t) parser->user_cb,
 +                      parser->user_cb_size,
 +                      asic_prop->dram_user_base_address,
 +                      asic_prop->dram_end_address))
 +              return 0;
 +
 +      dev_err(hdev->dev,
 +              "Internal CB address 0x%px + 0x%x is neither in SRAM nor in DRAM\n",
 +              parser->user_cb, parser->user_cb_size);
 +
 +      return -EFAULT;
 +}
 +
 +int goya_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (parser->queue_type == QUEUE_TYPE_INT)
 +              return goya_parse_cb_no_ext_queue(hdev, parser);
 +
 +      if (goya->hw_cap_initialized & HW_CAP_MMU)
 +              return goya_parse_cb_mmu(hdev, parser);
 +      else
 +              return goya_parse_cb_no_mmu(hdev, parser);
 +}
 +
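 +/*
 + * Fill in the two MSG_PROT packets that the CS parser reserved at the end
 + * of the CB: the first writes cq_val to the completion queue address, the
 + * second writes the MSI-X vector number to the PCIe MSI-X doorbell to raise
 + * the interrupt.
 + */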
 +void goya_add_end_of_cb_packets(struct hl_device *hdev, void *kernel_address,
 +                              u32 len, u32 original_len, u64 cq_addr, u32 cq_val,
 +                              u32 msix_vec, bool eb)
 +{
 +      struct packet_msg_prot *cq_pkt;
 +      u32 tmp;
 +
 +      cq_pkt = kernel_address + len - (sizeof(struct packet_msg_prot) * 2);
 +
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_EB_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(cq_val);
 +      cq_pkt->addr = cpu_to_le64(cq_addr);
 +
 +      cq_pkt++;
 +
 +      tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                      (1 << GOYA_PKT_CTL_MB_SHIFT);
 +      cq_pkt->ctl = cpu_to_le32(tmp);
 +      cq_pkt->value = cpu_to_le32(msix_vec & 0x7FF);
 +      cq_pkt->addr = cpu_to_le64(CFG_BASE + mmPCIE_DBI_MSIX_DOORBELL_OFF);
 +}
 +
 +void goya_update_eq_ci(struct hl_device *hdev, u32 val)
 +{
 +      WREG32(mmCPU_EQ_CI, val);
 +}
 +
 +void goya_restore_phase_topology(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static void goya_clear_sm_regs(struct hl_device *hdev)
 +{
 +      int i, num_of_sob_in_longs, num_of_mon_in_longs;
 +
 +      num_of_sob_in_longs =
 +              ((mmSYNC_MNGR_SOB_OBJ_1023 - mmSYNC_MNGR_SOB_OBJ_0) + 4);
 +
 +      num_of_mon_in_longs =
 +              ((mmSYNC_MNGR_MON_STATUS_255 - mmSYNC_MNGR_MON_STATUS_0) + 4);
 +
 +      for (i = 0 ; i < num_of_sob_in_longs ; i += 4)
 +              WREG32(mmSYNC_MNGR_SOB_OBJ_0 + i, 0);
 +
 +      for (i = 0 ; i < num_of_mon_in_longs ; i += 4)
 +              WREG32(mmSYNC_MNGR_MON_STATUS_0 + i, 0);
 +
 +      /* Flush all WREG to prevent race */
 +      i = RREG32(mmSYNC_MNGR_SOB_OBJ_0);
 +}
 +
 +static int goya_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, void *blob_addr)
 +{
 +      dev_err(hdev->dev, "Reading via DMA is unimplemented yet\n");
 +      return -EPERM;
 +}
 +
 +static u64 goya_read_pte(struct hl_device *hdev, u64 addr)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return U64_MAX;
 +
 +      return readq(hdev->pcie_bar[DDR_BAR_ID] +
 +                      (addr - goya->ddr_bar_cur_addr));
 +}
 +
 +static void goya_write_pte(struct hl_device *hdev, u64 addr, u64 val)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (hdev->reset_info.hard_reset_pending)
 +              return;
 +
 +      writeq(val, hdev->pcie_bar[DDR_BAR_ID] +
 +                      (addr - goya->ddr_bar_cur_addr));
 +}
 +
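 +/*
 + * Return a printf-style template (e.g. "TPC%d_ecc") for an async event ID.
 + * goya_get_event_desc() below computes the instance index and formats it
 + * into the caller's buffer.
 + */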
 +static const char *_goya_get_event_desc(u16 event_type)
 +{
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_PCIE_IF:
 +              return "PCIe_if";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
 +              return "TPC%d_ecc";
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC:
 +              return "MME_ecc";
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC_EXT:
 +              return "MME_ecc_ext";
 +      case GOYA_ASYNC_EVENT_ID_MMU_ECC:
 +              return "MMU_ecc";
 +      case GOYA_ASYNC_EVENT_ID_DMA_MACRO:
 +              return "DMA_macro";
 +      case GOYA_ASYNC_EVENT_ID_DMA_ECC:
 +              return "DMA_ecc";
 +      case GOYA_ASYNC_EVENT_ID_CPU_IF_ECC:
 +              return "CPU_if_ecc";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_MEM:
 +              return "PSOC_mem";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT:
 +              return "PSOC_coresight";
 +      case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
 +              return "SRAM%d";
 +      case GOYA_ASYNC_EVENT_ID_GIC500:
 +              return "GIC500";
 +      case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
 +              return "PLL%d";
 +      case GOYA_ASYNC_EVENT_ID_AXI_ECC:
 +              return "AXI_ecc";
 +      case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
 +              return "L2_ram_ecc";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
 +              return "PSOC_gpio_05_sw_reset";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT:
 +              return "PSOC_gpio_10_vrhot_icrit";
 +      case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
 +              return "PCIe_dec";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
 +              return "TPC%d_dec";
 +      case GOYA_ASYNC_EVENT_ID_MME_WACS:
 +              return "MME_wacs";
 +      case GOYA_ASYNC_EVENT_ID_MME_WACSD:
 +              return "MME_wacsd";
 +      case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
 +              return "CPU_axi_splitter";
 +      case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
 +              return "PSOC_axi_dec";
 +      case GOYA_ASYNC_EVENT_ID_PSOC:
 +              return "PSOC";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
 +              return "TPC%d_krn_err";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
 +              return "TPC%d_cq";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
 +              return "TPC%d_qm";
 +      case GOYA_ASYNC_EVENT_ID_MME_QM:
 +              return "MME_qm";
 +      case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
 +              return "MME_cq";
 +      case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
 +              return "DMA%d_qm";
 +      case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
 +              return "DMA%d_ch";
 +      case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
 +              return "TPC%d_bmon_spmu";
 +      case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
 +              return "DMA_bm_ch%d";
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
 +              return "POWER_ENV_S";
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
 +              return "POWER_ENV_E";
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
 +              return "THERMAL_ENV_S";
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
 +              return "THERMAL_ENV_E";
 +      case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
 +              return "QUEUE_OUT_OF_SYNC";
 +      default:
 +              return "N/A";
 +      }
 +}
 +
 +static void goya_get_event_desc(u16 event_type, char *desc, size_t size)
 +{
 +      u8 index;
 +
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_ECC) / 3;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_SRAM0;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_PLL0;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_DEC) / 3;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR) / 10;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_CMDQ;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_QM;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_QM;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_CH;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
 +              index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU) / 10;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
 +              index = event_type - GOYA_ASYNC_EVENT_ID_DMA_BM_CH0;
 +              snprintf(desc, size, _goya_get_event_desc(event_type), index);
 +              break;
 +      case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
 +              snprintf(desc, size, _goya_get_event_desc(event_type));
 +              break;
 +      default:
 +              snprintf(desc, size, _goya_get_event_desc(event_type));
 +              break;
 +      }
 +}
 +
 +static void goya_print_razwi_info(struct hl_device *hdev)
 +{
 +      if (RREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal write to LBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD, 0);
 +      }
 +
 +      if (RREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal read from LBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD, 0);
 +      }
 +
 +      if (RREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal write to HBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD, 0);
 +      }
 +
 +      if (RREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD)) {
 +              dev_err_ratelimited(hdev->dev, "Illegal read from HBW\n");
 +              WREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD, 0);
 +      }
 +}
 +
 +static void goya_print_mmu_error_info(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 addr;
 +      u32 val;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      val = RREG32(mmMMU_PAGE_ERROR_CAPTURE);
 +      if (val & MMU_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
 +              addr = val & MMU_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
 +              addr <<= 32;
 +              addr |= RREG32(mmMMU_PAGE_ERROR_CAPTURE_VA);
 +
 +              dev_err_ratelimited(hdev->dev, "MMU page fault on va 0x%llx\n",
 +                                      addr);
 +
 +              WREG32(mmMMU_PAGE_ERROR_CAPTURE, 0);
 +      }
 +}
 +
 +static void goya_print_out_of_sync_info(struct hl_device *hdev,
 +                                      struct cpucp_pkt_sync_err *sync_err)
 +{
 +      struct hl_hw_queue *q = &hdev->kernel_queues[GOYA_QUEUE_ID_CPU_PQ];
 +
 +      dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
 +              le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
 +}
 +
 +static void goya_print_irq_info(struct hl_device *hdev, u16 event_type,
 +                              bool razwi)
 +{
 +      char desc[20] = "";
 +
 +      goya_get_event_desc(event_type, desc, sizeof(desc));
 +      dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 +              event_type, desc);
 +
 +      if (razwi) {
 +              goya_print_razwi_info(hdev);
 +              goya_print_mmu_error_info(hdev);
 +      }
 +}
 +
 +static int goya_unmask_irq_arr(struct hl_device *hdev, u32 *irq_arr,
 +              size_t irq_arr_size)
 +{
 +      struct cpucp_unmask_irq_arr_packet *pkt;
 +      size_t total_pkt_size;
 +      u64 result;
 +      int rc;
 +      int irq_num_entries, irq_arr_index;
 +      __le32 *goya_irq_arr;
 +
 +      total_pkt_size = sizeof(struct cpucp_unmask_irq_arr_packet) +
 +                      irq_arr_size;
 +
 +      /* data should be aligned to 8 bytes in order for CPU-CP to copy it */
 +      total_pkt_size = (total_pkt_size + 0x7) & ~0x7;
 +
 +      /* total_pkt_size is cast to u16 later on */
 +      if (total_pkt_size > USHRT_MAX) {
 +              dev_err(hdev->dev, "too many elements in IRQ array\n");
 +              return -EINVAL;
 +      }
 +
 +      pkt = kzalloc(total_pkt_size, GFP_KERNEL);
 +      if (!pkt)
 +              return -ENOMEM;
 +
 +      irq_num_entries = irq_arr_size / sizeof(irq_arr[0]);
 +      pkt->length = cpu_to_le32(irq_num_entries);
 +
 +      /* We must perform any necessary endianness conversion on the irq
 +       * array being passed to the goya hardware
 +       */
 +      for (irq_arr_index = 0, goya_irq_arr = (__le32 *) &pkt->irqs;
 +                      irq_arr_index < irq_num_entries ; irq_arr_index++)
 +              goya_irq_arr[irq_arr_index] =
 +                              cpu_to_le32(irq_arr[irq_arr_index]);
 +
 +      pkt->cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_UNMASK_RAZWI_IRQ_ARRAY <<
 +                                              CPUCP_PKT_CTL_OPCODE_SHIFT);
 +
 +      rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) pkt,
 +                                              total_pkt_size, 0, &result);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "failed to unmask IRQ array\n");
 +
 +      kfree(pkt);
 +
 +      return rc;
 +}
 +
 +static int goya_compute_reset_late_init(struct hl_device *hdev)
 +{
 +      /*
 +       * Unmask all IRQs since some could have been received
 +       * during the soft reset
 +       */
 +      return goya_unmask_irq_arr(hdev, goya_all_events,
 +                                      sizeof(goya_all_events));
 +}
 +
 +static int goya_unmask_irq(struct hl_device *hdev, u16 event_type)
 +{
 +      struct cpucp_packet pkt;
 +      u64 result;
 +      int rc;
 +
 +      memset(&pkt, 0, sizeof(pkt));
 +
 +      pkt.ctl = cpu_to_le32(CPUCP_PACKET_UNMASK_RAZWI_IRQ <<
 +                              CPUCP_PKT_CTL_OPCODE_SHIFT);
 +      pkt.value = cpu_to_le64(event_type);
 +
 +      rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
 +                                              0, &result);
 +
 +      if (rc)
 +              dev_err(hdev->dev, "failed to unmask RAZWI IRQ %d\n", event_type);
 +
 +      return rc;
 +}
 +
 +static void goya_print_clk_change_info(struct hl_device *hdev, u16 event_type)
 +{
 +      ktime_t zero_time = ktime_set(0, 0);
 +
 +      mutex_lock(&hdev->clk_throttling.lock);
 +
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to power consumption\n");
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
 +              dev_info_ratelimited(hdev->dev,
 +                      "Power envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
 +              hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
 +              dev_info_ratelimited(hdev->dev,
 +                      "Clock throttling due to overheating\n");
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
 +              hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 +              hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
 +              dev_info_ratelimited(hdev->dev,
 +                      "Thermal envelope is safe, back to optimal clock\n");
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid clock change event %d\n",
 +                      event_type);
 +              break;
 +      }
 +
 +      mutex_unlock(&hdev->clk_throttling.lock);
 +}
 +
 +void goya_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
 +{
 +      u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
 +      u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
 +                              >> EQ_CTL_EVENT_TYPE_SHIFT);
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (event_type >= GOYA_ASYNC_EVENT_ID_SIZE) {
 +              dev_err(hdev->dev, "Event type %u exceeds maximum of %u\n",
 +                              event_type, GOYA_ASYNC_EVENT_ID_SIZE - 1);
 +              return;
 +      }
 +
 +      goya->events_stat[event_type]++;
 +      goya->events_stat_aggregate[event_type]++;
 +
 +      switch (event_type) {
 +      case GOYA_ASYNC_EVENT_ID_PCIE_IF:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC:
 +      case GOYA_ASYNC_EVENT_ID_MME_ECC_EXT:
 +      case GOYA_ASYNC_EVENT_ID_MMU_ECC:
 +      case GOYA_ASYNC_EVENT_ID_DMA_MACRO:
 +      case GOYA_ASYNC_EVENT_ID_DMA_ECC:
 +      case GOYA_ASYNC_EVENT_ID_CPU_IF_ECC:
 +      case GOYA_ASYNC_EVENT_ID_PSOC_MEM:
 +      case GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT:
 +      case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
 +      case GOYA_ASYNC_EVENT_ID_GIC500:
 +      case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
 +      case GOYA_ASYNC_EVENT_ID_AXI_ECC:
 +      case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
 +              goya_print_irq_info(hdev, event_type, false);
 +              if (hdev->hard_reset_on_fw_events)
 +                      hl_device_reset(hdev, (HL_DRV_RESET_HARD |
 +                                              HL_DRV_RESET_FW_FATAL_ERR));
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
 +              goya_print_irq_info(hdev, event_type, false);
 +              if (hdev->hard_reset_on_fw_events)
 +                      hl_device_reset(hdev, HL_DRV_RESET_HARD);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
 +      case GOYA_ASYNC_EVENT_ID_MME_WACS:
 +      case GOYA_ASYNC_EVENT_ID_MME_WACSD:
 +      case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
 +      case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
 +      case GOYA_ASYNC_EVENT_ID_PSOC:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
 +      case GOYA_ASYNC_EVENT_ID_MME_QM:
 +      case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
 +      case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
 +      case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
 +              goya_print_irq_info(hdev, event_type, true);
 +              goya_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT:
 +      case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
 +      case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
 +              goya_print_irq_info(hdev, event_type, false);
 +              goya_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
 +      case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
 +      case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
 +              goya_print_clk_change_info(hdev, event_type);
 +              goya_unmask_irq(hdev, event_type);
 +              break;
 +
 +      case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
 +              goya_print_irq_info(hdev, event_type, false);
 +              goya_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
 +              if (hdev->hard_reset_on_fw_events)
 +                      hl_device_reset(hdev, HL_DRV_RESET_HARD);
 +              else
 +                      hl_fw_unmask_irq(hdev, event_type);
 +              break;
 +
 +      default:
 +              dev_err(hdev->dev, "Received invalid H/W interrupt %d\n",
 +                              event_type);
 +              break;
 +      }
 +}
 +
 +void *goya_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (aggregate) {
 +              *size = (u32) sizeof(goya->events_stat_aggregate);
 +              return goya->events_stat_aggregate;
 +      }
 +
 +      *size = (u32) sizeof(goya->events_stat);
 +      return goya->events_stat;
 +}
 +
 +static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size,
 +                              u64 val, bool is_dram)
 +{
 +      struct packet_lin_dma *lin_dma_pkt;
 +      struct hl_cs_job *job;
 +      u32 cb_size, ctl;
 +      struct hl_cb *cb;
 +      int rc, lin_dma_pkts_cnt;
 +
 +      lin_dma_pkts_cnt = DIV_ROUND_UP_ULL(size, SZ_2G);
 +      cb_size = lin_dma_pkts_cnt * sizeof(struct packet_lin_dma) +
 +                                              sizeof(struct packet_msg_prot);
 +      cb = hl_cb_kernel_create(hdev, cb_size, false);
 +      if (!cb)
 +              return -ENOMEM;
 +
 +      lin_dma_pkt = cb->kernel_address;
 +
 +      do {
 +              memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
 +
 +              ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
 +                              (1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
 +                              (1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
 +                              (1 << GOYA_PKT_CTL_RB_SHIFT) |
 +                              (1 << GOYA_PKT_CTL_MB_SHIFT));
 +              ctl |= (is_dram ? HL_DMA_HOST_TO_DRAM : HL_DMA_HOST_TO_SRAM) <<
 +                              GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
 +              lin_dma_pkt->ctl = cpu_to_le32(ctl);
 +
 +              lin_dma_pkt->src_addr = cpu_to_le64(val);
 +              lin_dma_pkt->dst_addr = cpu_to_le64(addr);
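 +              /* every chunk except the last transfers a full 2GB; the last carries the remainder */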
 +              if (lin_dma_pkts_cnt > 1)
 +                      lin_dma_pkt->tsize = cpu_to_le32(SZ_2G);
 +              else
 +                      lin_dma_pkt->tsize = cpu_to_le32(size);
 +
 +              size -= SZ_2G;
 +              addr += SZ_2G;
 +              lin_dma_pkt++;
 +      } while (--lin_dma_pkts_cnt);
 +
 +      job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
 +      if (!job) {
 +              dev_err(hdev->dev, "Failed to allocate a new job\n");
 +              rc = -ENOMEM;
 +              goto release_cb;
 +      }
 +
 +      job->id = 0;
 +      job->user_cb = cb;
 +      atomic_inc(&job->user_cb->cs_cnt);
 +      job->user_cb_size = cb_size;
 +      job->hw_queue_id = GOYA_QUEUE_ID_DMA_0;
 +      job->patched_cb = job->user_cb;
 +      job->job_cb_size = job->user_cb_size;
 +
 +      hl_debugfs_add_job(hdev, job);
 +
 +      rc = goya_send_job_on_qman0(hdev, job);
 +
 +      hl_debugfs_remove_job(hdev, job);
 +      kfree(job);
 +      atomic_dec(&cb->cs_cnt);
 +
 +release_cb:
 +      hl_cb_put(cb);
 +      hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
 +
 +      return rc;
 +}
 +
 +int goya_context_switch(struct hl_device *hdev, u32 asid)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 addr = prop->sram_base_address, sob_addr;
 +      u32 size = hdev->pldm ? 0x10000 : prop->sram_size;
 +      u64 val = 0x7777777777777777ull;
 +      int rc, dma_id;
 +      u32 channel_off = mmDMA_CH_1_WR_COMP_ADDR_LO -
 +                                      mmDMA_CH_0_WR_COMP_ADDR_LO;
 +
 +      rc = goya_memset_device_memory(hdev, addr, size, val, false);
 +      if (rc) {
 +              dev_err(hdev->dev, "Failed to clear SRAM in context switch\n");
 +              return rc;
 +      }
 +
 +      /* we need to reset registers that the user is allowed to change */
 +      sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
 +      WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO, lower_32_bits(sob_addr));
 +
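 +      /* the remaining DMA channels complete to consecutive SOBs starting at SOB 1000 */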
 +      for (dma_id = 1 ; dma_id < NUMBER_OF_EXT_HW_QUEUES ; dma_id++) {
 +              sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
 +                                                      (dma_id - 1) * 4;
 +              WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + channel_off * dma_id,
 +                                              lower_32_bits(sob_addr));
 +      }
 +
 +      WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
 +
 +      goya_clear_sm_regs(hdev);
 +
 +      return 0;
 +}
 +
 +static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 addr = prop->mmu_pgt_addr;
 +      u32 size = prop->mmu_pgt_size + MMU_DRAM_DEFAULT_PAGE_SIZE +
 +                      MMU_CACHE_MNG_SIZE;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      return goya_memset_device_memory(hdev, addr, size, 0, true);
 +}
 +
 +static int goya_mmu_set_dram_default_page(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u64 addr = hdev->asic_prop.mmu_dram_default_page_addr;
 +      u32 size = MMU_DRAM_DEFAULT_PAGE_SIZE;
 +      u64 val = 0x9999999999999999ull;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      return goya_memset_device_memory(hdev, addr, size, val, true);
 +}
 +
 +static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      s64 off, cpu_off;
 +      int rc;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return 0;
 +
 +      for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB) {
 +              rc = hl_mmu_map_page(hdev->kernel_ctx,
 +                      prop->dram_base_address + off,
 +                      prop->dram_base_address + off, PAGE_SIZE_2MB,
 +                      (off + PAGE_SIZE_2MB) == CPU_FW_IMAGE_SIZE);
 +              if (rc) {
 +                      dev_err(hdev->dev, "Map failed for address 0x%llx\n",
 +                              prop->dram_base_address + off);
 +                      goto unmap;
 +              }
 +      }
 +
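 +      /*
 +       * Use a single 2MB mapping when the CPU-accessible DMA address is
 +       * 2MB aligned, otherwise fall back to 4KB pages.
 +       */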
 +      if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
 +              rc = hl_mmu_map_page(hdev->kernel_ctx,
 +                      VA_CPU_ACCESSIBLE_MEM_ADDR,
 +                      hdev->cpu_accessible_dma_address,
 +                      PAGE_SIZE_2MB, true);
 +
 +              if (rc) {
 +                      dev_err(hdev->dev,
 +                              "Map failed for CPU accessible memory\n");
 +                      off -= PAGE_SIZE_2MB;
 +                      goto unmap;
 +              }
 +      } else {
 +              for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB) {
 +                      rc = hl_mmu_map_page(hdev->kernel_ctx,
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
 +                              hdev->cpu_accessible_dma_address + cpu_off,
 +                              PAGE_SIZE_4KB, true);
 +                      if (rc) {
 +                              dev_err(hdev->dev,
 +                                      "Map failed for CPU accessible memory\n");
 +                              cpu_off -= PAGE_SIZE_4KB;
 +                              goto unmap_cpu;
 +                      }
 +              }
 +      }
 +
 +      goya_mmu_prepare_reg(hdev, mmCPU_IF_ARUSER_OVR, HL_KERNEL_ASID_ID);
 +      goya_mmu_prepare_reg(hdev, mmCPU_IF_AWUSER_OVR, HL_KERNEL_ASID_ID);
 +      WREG32(mmCPU_IF_ARUSER_OVR_EN, 0x7FF);
 +      WREG32(mmCPU_IF_AWUSER_OVR_EN, 0x7FF);
 +
 +      /* Make sure configuration is flushed to device */
 +      RREG32(mmCPU_IF_AWUSER_OVR_EN);
 +
 +      goya->device_cpu_mmu_mappings_done = true;
 +
 +      return 0;
 +
 +unmap_cpu:
 +      for (; cpu_off >= 0 ; cpu_off -= PAGE_SIZE_4KB)
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
 +                              PAGE_SIZE_4KB, true))
 +                      dev_warn_ratelimited(hdev->dev,
 +                              "failed to unmap address 0x%llx\n",
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
 +unmap:
 +      for (; off >= 0 ; off -= PAGE_SIZE_2MB)
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              prop->dram_base_address + off, PAGE_SIZE_2MB,
 +                              true))
 +                      dev_warn_ratelimited(hdev->dev,
 +                              "failed to unmap address 0x%llx\n",
 +                              prop->dram_base_address + off);
 +
 +      return rc;
 +}
 +
 +void goya_mmu_remove_device_cpu_mappings(struct hl_device *hdev)
 +{
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 off, cpu_off;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (!goya->device_cpu_mmu_mappings_done)
 +              return;
 +
 +      WREG32(mmCPU_IF_ARUSER_OVR_EN, 0);
 +      WREG32(mmCPU_IF_AWUSER_OVR_EN, 0);
 +
 +      if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              VA_CPU_ACCESSIBLE_MEM_ADDR,
 +                              PAGE_SIZE_2MB, true))
 +                      dev_warn(hdev->dev,
 +                              "Failed to unmap CPU accessible memory\n");
 +      } else {
 +              for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB)
 +                      if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                                      VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
 +                                      PAGE_SIZE_4KB,
 +                                      (cpu_off + PAGE_SIZE_4KB) >= SZ_2M))
 +                              dev_warn_ratelimited(hdev->dev,
 +                                      "failed to unmap address 0x%llx\n",
 +                                      VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
 +      }
 +
 +      for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB)
 +              if (hl_mmu_unmap_page(hdev->kernel_ctx,
 +                              prop->dram_base_address + off, PAGE_SIZE_2MB,
 +                              (off + PAGE_SIZE_2MB) >= CPU_FW_IMAGE_SIZE))
 +                      dev_warn_ratelimited(hdev->dev,
 +                                      "Failed to unmap address 0x%llx\n",
 +                                      prop->dram_base_address + off);
 +
 +      goya->device_cpu_mmu_mappings_done = false;
 +}
 +
 +static void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      int i;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU))
 +              return;
 +
 +      if (asid & ~MME_QM_GLBL_SECURE_PROPS_ASID_MASK) {
 +              dev_crit(hdev->dev, "asid %u is too big\n", asid);
 +              return;
 +      }
 +
 +      /* zero the MMBP and ASID bits and then set the ASID */
 +      for (i = 0 ; i < GOYA_MMU_REGS_NUM ; i++)
 +              goya_mmu_prepare_reg(hdev, goya_mmu_regs[i], asid);
 +}
 +
 +static int goya_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,
 +                                      u32 flags)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      u32 status, timeout_usec;
 +      int rc;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_MMU) ||
 +              hdev->reset_info.hard_reset_pending)
 +              return 0;
 +
 +      /* no need for L1-only invalidation in Goya */
 +      if (!is_hard)
 +              return 0;
 +
 +      if (hdev->pldm)
 +              timeout_usec = GOYA_PLDM_MMU_TIMEOUT_USEC;
 +      else
 +              timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
 +
 +      /* L0 & L1 invalidation */
 +      WREG32(mmSTLB_INV_ALL_START, 1);
 +
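 +      /* wait for the h/w to clear the register, indicating the invalidation is done */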
 +      rc = hl_poll_timeout(
 +              hdev,
 +              mmSTLB_INV_ALL_START,
 +              status,
 +              !status,
 +              1000,
 +              timeout_usec);
 +
 +      return rc;
 +}
 +
 +static int goya_mmu_invalidate_cache_range(struct hl_device *hdev,
 +                                              bool is_hard, u32 flags,
 +                                              u32 asid, u64 va, u64 size)
 +{
 +      /* Treat as invalidate all because there is no range invalidation
 +       * in Goya
 +       */
 +      return hl_mmu_invalidate_cache(hdev, is_hard, flags);
 +}
 +
 +int goya_send_heartbeat(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_send_heartbeat(hdev);
 +}
 +
 +int goya_cpucp_info_get(struct hl_device *hdev)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +      struct asic_fixed_properties *prop = &hdev->asic_prop;
 +      u64 dram_size;
 +      int rc;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0,
 +                                      mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
 +                                      mmCPU_BOOT_ERR1);
 +      if (rc)
 +              return rc;
 +
 +      dram_size = le64_to_cpu(prop->cpucp_info.dram_size);
 +      if (dram_size) {
 +              if ((!is_power_of_2(dram_size)) ||
 +                              (dram_size < DRAM_PHYS_DEFAULT_SIZE)) {
 +                      dev_err(hdev->dev,
 +                              "F/W reported invalid DRAM size %llu. Trying to use default size\n",
 +                              dram_size);
 +                      dram_size = DRAM_PHYS_DEFAULT_SIZE;
 +              }
 +
 +              prop->dram_size = dram_size;
 +              prop->dram_end_address = prop->dram_base_address + dram_size;
 +      }
 +
 +      if (!strlen(prop->cpucp_info.card_name))
 +              strncpy(prop->cpucp_info.card_name, GOYA_DEFAULT_CARD_NAME,
 +                              CARD_NAME_MAX_LEN);
 +
 +      return 0;
 +}
 +
 +static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
 +                              struct engines_data *e)
 +{
 +      const char *fmt = "%-5d%-9s%#-14x%#-16x%#x\n";
 +      const char *dma_fmt = "%-5d%-9s%#-14x%#x\n";
 +      unsigned long *mask = (unsigned long *)mask_arr;
 +      u32 qm_glbl_sts0, cmdq_glbl_sts0, dma_core_sts0, tpc_cfg_sts,
 +              mme_arch_sts;
 +      bool is_idle = true, is_eng_idle;
 +      u64 offset;
 +      int i;
 +
 +      if (e)
 +              hl_engine_data_sprintf(e, "\nDMA  is_idle  QM_GLBL_STS0  DMA_CORE_STS0\n"
 +                                      "---  -------  ------------  -------------\n");
 +
 +      offset = mmDMA_QM_1_GLBL_STS0 - mmDMA_QM_0_GLBL_STS0;
 +
 +      for (i = 0 ; i < DMA_MAX_NUM ; i++) {
 +              qm_glbl_sts0 = RREG32(mmDMA_QM_0_GLBL_STS0 + i * offset);
 +              dma_core_sts0 = RREG32(mmDMA_CH_0_STS0 + i * offset);
 +              is_eng_idle = IS_DMA_QM_IDLE(qm_glbl_sts0) &&
 +                              IS_DMA_IDLE(dma_core_sts0);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GOYA_ENGINE_ID_DMA_0 + i, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, dma_fmt, i, is_eng_idle ? "Y" : "N",
 +                                      qm_glbl_sts0, dma_core_sts0);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nTPC  is_idle  QM_GLBL_STS0  CMDQ_GLBL_STS0  CFG_STATUS\n"
 +                      "---  -------  ------------  --------------  ----------\n");
 +
 +      offset = mmTPC1_QM_GLBL_STS0 - mmTPC0_QM_GLBL_STS0;
 +
 +      for (i = 0 ; i < TPC_MAX_NUM ; i++) {
 +              qm_glbl_sts0 = RREG32(mmTPC0_QM_GLBL_STS0 + i * offset);
 +              cmdq_glbl_sts0 = RREG32(mmTPC0_CMDQ_GLBL_STS0 + i * offset);
 +              tpc_cfg_sts = RREG32(mmTPC0_CFG_STATUS + i * offset);
 +              is_eng_idle = IS_TPC_QM_IDLE(qm_glbl_sts0) &&
 +                              IS_TPC_CMDQ_IDLE(cmdq_glbl_sts0) &&
 +                              IS_TPC_IDLE(tpc_cfg_sts);
 +              is_idle &= is_eng_idle;
 +
 +              if (mask && !is_eng_idle)
 +                      set_bit(GOYA_ENGINE_ID_TPC_0 + i, mask);
 +              if (e)
 +                      hl_engine_data_sprintf(e, fmt, i, is_eng_idle ? "Y" : "N",
 +                              qm_glbl_sts0, cmdq_glbl_sts0, tpc_cfg_sts);
 +      }
 +
 +      if (e)
 +              hl_engine_data_sprintf(e,
 +                      "\nMME  is_idle  QM_GLBL_STS0  CMDQ_GLBL_STS0  ARCH_STATUS\n"
 +                      "---  -------  ------------  --------------  -----------\n");
 +
 +      qm_glbl_sts0 = RREG32(mmMME_QM_GLBL_STS0);
 +      cmdq_glbl_sts0 = RREG32(mmMME_CMDQ_GLBL_STS0);
 +      mme_arch_sts = RREG32(mmMME_ARCH_STATUS);
 +      is_eng_idle = IS_MME_QM_IDLE(qm_glbl_sts0) &&
 +                      IS_MME_CMDQ_IDLE(cmdq_glbl_sts0) &&
 +                      IS_MME_IDLE(mme_arch_sts);
 +      is_idle &= is_eng_idle;
 +
 +      if (mask && !is_eng_idle)
 +              set_bit(GOYA_ENGINE_ID_MME_0, mask);
 +      if (e) {
 +              hl_engine_data_sprintf(e, fmt, 0, is_eng_idle ? "Y" : "N", qm_glbl_sts0,
 +                              cmdq_glbl_sts0, mme_arch_sts);
 +              hl_engine_data_sprintf(e, "\n");
 +      }
 +
 +      return is_idle;
 +}
 +
 +static void goya_hw_queues_lock(struct hl_device *hdev)
 +      __acquires(&goya->hw_queues_lock)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      spin_lock(&goya->hw_queues_lock);
 +}
 +
 +static void goya_hw_queues_unlock(struct hl_device *hdev)
 +      __releases(&goya->hw_queues_lock)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      spin_unlock(&goya->hw_queues_lock);
 +}
 +
 +static u32 goya_get_pci_id(struct hl_device *hdev)
 +{
 +      return hdev->pdev->device;
 +}
 +
 +static int goya_get_eeprom_data(struct hl_device *hdev, void *data,
 +                              size_t max_size)
 +{
 +      struct goya_device *goya = hdev->asic_specific;
 +
 +      if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
 +              return 0;
 +
 +      return hl_fw_get_eeprom_data(hdev, data, max_size);
 +}
 +
 +static void goya_cpu_init_scrambler_dram(struct hl_device *hdev)
 +{
 +
 +}
 +
 +static int goya_ctx_init(struct hl_ctx *ctx)
 +{
 +      if (ctx->asid != HL_KERNEL_ASID_ID)
 +              goya_mmu_prepare(ctx->hdev, ctx->asid);
 +
 +      return 0;
 +}
 +
 +static int goya_pre_schedule_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +u32 goya_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
 +{
 +      return cq_idx;
 +}
 +
 +static u32 goya_get_signal_cb_size(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static u32 goya_get_wait_cb_size(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static u32 goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
 +                              u32 size, bool eb)
 +{
 +      return 0;
 +}
 +
 +static u32 goya_gen_wait_cb(struct hl_device *hdev,
 +              struct hl_gen_wait_properties *prop)
 +{
 +      return 0;
 +}
 +
 +static void goya_reset_sob(struct hl_device *hdev, void *data)
 +{
 +
 +}
 +
 +static void goya_reset_sob_group(struct hl_device *hdev, u16 sob_group)
 +{
 +
 +}
 +
 +u64 goya_get_device_time(struct hl_device *hdev)
 +{
 +      u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
 +
 +      return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
 +}
 +
 +static int goya_collective_wait_init_cs(struct hl_cs *cs)
 +{
 +      return 0;
 +}
 +
 +static int goya_collective_wait_create_jobs(struct hl_device *hdev,
 +              struct hl_ctx *ctx, struct hl_cs *cs, u32 wait_queue_id,
 +              u32 collective_engine_id, u32 encaps_signal_offset)
 +{
 +      return -EINVAL;
 +}
 +
 +static void goya_ctx_fini(struct hl_ctx *ctx)
 +{
 +
 +}
 +
 +static int goya_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
 +                      u32 *block_size, u32 *block_id)
 +{
 +      return -EPERM;
 +}
 +
 +static int goya_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
 +                              u32 block_id, u32 block_size)
 +{
 +      return -EPERM;
 +}
 +
 +static void goya_enable_events_from_fw(struct hl_device *hdev)
 +{
 +      WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
 +                      GOYA_ASYNC_EVENT_ID_INTS_REGISTER);
 +}
 +
 +static int goya_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
 +{
 +      return -EINVAL;
 +}
 +
 +static int goya_map_pll_idx_to_fw_idx(u32 pll_idx)
 +{
 +      switch (pll_idx) {
 +      case HL_GOYA_CPU_PLL: return CPU_PLL;
 +      case HL_GOYA_PCI_PLL: return PCI_PLL;
 +      case HL_GOYA_MME_PLL: return MME_PLL;
 +      case HL_GOYA_TPC_PLL: return TPC_PLL;
 +      case HL_GOYA_IC_PLL: return IC_PLL;
 +      case HL_GOYA_MC_PLL: return MC_PLL;
 +      case HL_GOYA_EMMC_PLL: return EMMC_PLL;
 +      default: return -EINVAL;
 +      }
 +}
 +
 +static int goya_gen_sync_to_engine_map(struct hl_device *hdev,
 +                              struct hl_sync_to_engine_map *map)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int goya_monitor_valid(struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int goya_print_single_monitor(char **buf, size_t *size, size_t *offset,
 +                              struct hl_device *hdev,
 +                              struct hl_mon_state_dump *mon)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static int goya_print_fences_single_engine(
 +      struct hl_device *hdev, u64 base_offset, u64 status_base_offset,
 +      enum hl_sync_engine_type engine_type, u32 engine_id, char **buf,
 +      size_t *size, size_t *offset)
 +{
 +      /* Not implemented */
 +      return 0;
 +}
 +
 +static struct hl_state_dump_specs_funcs goya_state_dump_funcs = {
 +      .monitor_valid = goya_monitor_valid,
 +      .print_single_monitor = goya_print_single_monitor,
 +      .gen_sync_to_engine_map = goya_gen_sync_to_engine_map,
 +      .print_fences_single_engine = goya_print_fences_single_engine,
 +};
 +
 +static void goya_state_dump_init(struct hl_device *hdev)
 +{
 +      /* Not implemented */
 +      hdev->state_dump_specs.props = goya_state_dump_specs_props;
 +      hdev->state_dump_specs.funcs = goya_state_dump_funcs;
 +}
 +
 +static u32 goya_get_sob_addr(struct hl_device *hdev, u32 sob_id)
 +{
 +      return 0;
 +}
 +
 +static u32 *goya_get_stream_master_qid_arr(void)
 +{
 +      return NULL;
 +}
 +
 +static int goya_get_monitor_dump(struct hl_device *hdev, void *data)
 +{
 +      return -EOPNOTSUPP;
 +}
 +
 +static void goya_check_if_razwi_happened(struct hl_device *hdev)
 +{
 +}
 +
 +static int goya_scrub_device_dram(struct hl_device *hdev, u64 val)
 +{
 +      return -EOPNOTSUPP;
 +}
 +
 +static int goya_set_dram_properties(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static int goya_set_binning_masks(struct hl_device *hdev)
 +{
 +      return 0;
 +}
 +
 +static int goya_send_device_activity(struct hl_device *hdev, bool open)
 +{
 +      return 0;
 +}
 +
 +static const struct hl_asic_funcs goya_funcs = {
 +      .early_init = goya_early_init,
 +      .early_fini = goya_early_fini,
 +      .late_init = goya_late_init,
 +      .late_fini = goya_late_fini,
 +      .sw_init = goya_sw_init,
 +      .sw_fini = goya_sw_fini,
 +      .hw_init = goya_hw_init,
 +      .hw_fini = goya_hw_fini,
 +      .halt_engines = goya_halt_engines,
 +      .suspend = goya_suspend,
 +      .resume = goya_resume,
 +      .mmap = goya_mmap,
 +      .ring_doorbell = goya_ring_doorbell,
 +      .pqe_write = goya_pqe_write,
 +      .asic_dma_alloc_coherent = goya_dma_alloc_coherent,
 +      .asic_dma_free_coherent = goya_dma_free_coherent,
 +      .scrub_device_mem = goya_scrub_device_mem,
 +      .scrub_device_dram = goya_scrub_device_dram,
 +      .get_int_queue_base = goya_get_int_queue_base,
 +      .test_queues = goya_test_queues,
 +      .asic_dma_pool_zalloc = goya_dma_pool_zalloc,
 +      .asic_dma_pool_free = goya_dma_pool_free,
 +      .cpu_accessible_dma_pool_alloc = goya_cpu_accessible_dma_pool_alloc,
 +      .cpu_accessible_dma_pool_free = goya_cpu_accessible_dma_pool_free,
 +      .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
 +      .cs_parser = goya_cs_parser,
 +      .asic_dma_map_sgtable = hl_dma_map_sgtable,
 +      .add_end_of_cb_packets = goya_add_end_of_cb_packets,
 +      .update_eq_ci = goya_update_eq_ci,
 +      .context_switch = goya_context_switch,
 +      .restore_phase_topology = goya_restore_phase_topology,
 +      .debugfs_read_dma = goya_debugfs_read_dma,
 +      .add_device_attr = goya_add_device_attr,
 +      .handle_eqe = goya_handle_eqe,
 +      .get_events_stat = goya_get_events_stat,
 +      .read_pte = goya_read_pte,
 +      .write_pte = goya_write_pte,
 +      .mmu_invalidate_cache = goya_mmu_invalidate_cache,
 +      .mmu_invalidate_cache_range = goya_mmu_invalidate_cache_range,
 +      .mmu_prefetch_cache_range = NULL,
 +      .send_heartbeat = goya_send_heartbeat,
 +      .debug_coresight = goya_debug_coresight,
 +      .is_device_idle = goya_is_device_idle,
 +      .compute_reset_late_init = goya_compute_reset_late_init,
 +      .hw_queues_lock = goya_hw_queues_lock,
 +      .hw_queues_unlock = goya_hw_queues_unlock,
 +      .get_pci_id = goya_get_pci_id,
 +      .get_eeprom_data = goya_get_eeprom_data,
 +      .get_monitor_dump = goya_get_monitor_dump,
 +      .send_cpu_message = goya_send_cpu_message,
 +      .pci_bars_map = goya_pci_bars_map,
 +      .init_iatu = goya_init_iatu,
 +      .rreg = hl_rreg,
 +      .wreg = hl_wreg,
 +      .halt_coresight = goya_halt_coresight,
 +      .ctx_init = goya_ctx_init,
 +      .ctx_fini = goya_ctx_fini,
 +      .pre_schedule_cs = goya_pre_schedule_cs,
 +      .get_queue_id_for_cq = goya_get_queue_id_for_cq,
 +      .load_firmware_to_device = goya_load_firmware_to_device,
 +      .load_boot_fit_to_device = goya_load_boot_fit_to_device,
 +      .get_signal_cb_size = goya_get_signal_cb_size,
 +      .get_wait_cb_size = goya_get_wait_cb_size,
 +      .gen_signal_cb = goya_gen_signal_cb,
 +      .gen_wait_cb = goya_gen_wait_cb,
 +      .reset_sob = goya_reset_sob,
 +      .reset_sob_group = goya_reset_sob_group,
 +      .get_device_time = goya_get_device_time,
 +      .pb_print_security_errors = NULL,
 +      .collective_wait_init_cs = goya_collective_wait_init_cs,
 +      .collective_wait_create_jobs = goya_collective_wait_create_jobs,
 +      .get_dec_base_addr = NULL,
 +      .scramble_addr = hl_mmu_scramble_addr,
 +      .descramble_addr = hl_mmu_descramble_addr,
 +      .ack_protection_bits_errors = goya_ack_protection_bits_errors,
 +      .get_hw_block_id = goya_get_hw_block_id,
 +      .hw_block_mmap = goya_block_mmap,
 +      .enable_events_from_fw = goya_enable_events_from_fw,
 +      .ack_mmu_errors = goya_ack_mmu_page_fault_or_access_error,
 +      .map_pll_idx_to_fw_idx = goya_map_pll_idx_to_fw_idx,
 +      .init_firmware_preload_params = goya_init_firmware_preload_params,
 +      .init_firmware_loader = goya_init_firmware_loader,
 +      .init_cpu_scrambler_dram = goya_cpu_init_scrambler_dram,
 +      .state_dump_init = goya_state_dump_init,
 +      .get_sob_addr = &goya_get_sob_addr,
 +      .set_pci_memory_regions = goya_set_pci_memory_regions,
 +      .get_stream_master_qid_arr = goya_get_stream_master_qid_arr,
 +      .check_if_razwi_happened = goya_check_if_razwi_happened,
 +      .mmu_get_real_page_size = hl_mmu_get_real_page_size,
 +      .access_dev_mem = hl_access_dev_mem,
 +      .set_dram_bar_base = goya_set_ddr_bar_base,
 +      .send_device_activity = goya_send_device_activity,
 +      .set_dram_properties = goya_set_dram_properties,
 +      .set_binning_masks = goya_set_binning_masks,
 +};
 +
 +/*
 + * goya_set_asic_funcs - set Goya function pointers
 + *
 + * @*hdev: pointer to hl_device structure
 + *
 + */
 +void goya_set_asic_funcs(struct hl_device *hdev)
 +{
 +      hdev->asic_funcs = &goya_funcs;
 +}
index 01d47d3bad5bbb517158e6a37d6181996f4acec3,0000000000000000000000000000000000000000..52b339aefadcae0dd01f5d0d5d1a20aec44c1412
mode 100644,000000..100644
--- /dev/null
@@@ -1,749 -1,0 +1,749 @@@
-       vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND;
 +// SPDX-License-Identifier: GPL-2.0-only
 +/*
 + * Copyright (C) 2020-2023 Intel Corporation
 + */
 +
 +#include <linux/dma-buf.h>
 +#include <linux/highmem.h>
 +#include <linux/module.h>
 +#include <linux/set_memory.h>
 +#include <linux/xarray.h>
 +
 +#include <drm/drm_cache.h>
 +#include <drm/drm_debugfs.h>
 +#include <drm/drm_file.h>
 +#include <drm/drm_utils.h>
 +
 +#include "ivpu_drv.h"
 +#include "ivpu_gem.h"
 +#include "ivpu_hw.h"
 +#include "ivpu_mmu.h"
 +#include "ivpu_mmu_context.h"
 +
 +MODULE_IMPORT_NS(DMA_BUF);
 +
 +static const struct drm_gem_object_funcs ivpu_gem_funcs;
 +
 +static struct lock_class_key prime_bo_lock_class_key;
 +
 +static int __must_check prime_alloc_pages_locked(struct ivpu_bo *bo)
 +{
 +      /* Pages are managed by the underlying dma-buf */
 +      return 0;
 +}
 +
 +static void prime_free_pages_locked(struct ivpu_bo *bo)
 +{
 +      /* Pages are managed by the underlying dma-buf */
 +}
 +
 +static int prime_map_pages_locked(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      struct sg_table *sgt;
 +
 +      sgt = dma_buf_map_attachment_unlocked(bo->base.import_attach, DMA_BIDIRECTIONAL);
 +      if (IS_ERR(sgt)) {
 +              ivpu_err(vdev, "Failed to map attachment: %ld\n", PTR_ERR(sgt));
 +              return PTR_ERR(sgt);
 +      }
 +
 +      bo->sgt = sgt;
 +      return 0;
 +}
 +
 +static void prime_unmap_pages_locked(struct ivpu_bo *bo)
 +{
 +      dma_buf_unmap_attachment_unlocked(bo->base.import_attach, bo->sgt, DMA_BIDIRECTIONAL);
 +      bo->sgt = NULL;
 +}
 +
 +static const struct ivpu_bo_ops prime_ops = {
 +      .type = IVPU_BO_TYPE_PRIME,
 +      .name = "prime",
 +      .alloc_pages = prime_alloc_pages_locked,
 +      .free_pages = prime_free_pages_locked,
 +      .map_pages = prime_map_pages_locked,
 +      .unmap_pages = prime_unmap_pages_locked,
 +};
 +
 +static int __must_check shmem_alloc_pages_locked(struct ivpu_bo *bo)
 +{
 +      int npages = bo->base.size >> PAGE_SHIFT;
 +      struct page **pages;
 +
 +      pages = drm_gem_get_pages(&bo->base);
 +      if (IS_ERR(pages))
 +              return PTR_ERR(pages);
 +
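 +      /* apply the requested caching attributes (WC or uncached) to the new pages */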
 +      if (bo->flags & DRM_IVPU_BO_WC)
 +              set_pages_array_wc(pages, npages);
 +      else if (bo->flags & DRM_IVPU_BO_UNCACHED)
 +              set_pages_array_uc(pages, npages);
 +
 +      bo->pages = pages;
 +      return 0;
 +}
 +
 +static void shmem_free_pages_locked(struct ivpu_bo *bo)
 +{
 +      if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
 +              set_pages_array_wb(bo->pages, bo->base.size >> PAGE_SHIFT);
 +
 +      drm_gem_put_pages(&bo->base, bo->pages, true, false);
 +      bo->pages = NULL;
 +}
 +
 +static int ivpu_bo_map_pages_locked(struct ivpu_bo *bo)
 +{
 +      int npages = bo->base.size >> PAGE_SHIFT;
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      struct sg_table *sgt;
 +      int ret;
 +
 +      sgt = drm_prime_pages_to_sg(&vdev->drm, bo->pages, npages);
 +      if (IS_ERR(sgt)) {
 +              ivpu_err(vdev, "Failed to allocate sgtable\n");
 +              return PTR_ERR(sgt);
 +      }
 +
 +      ret = dma_map_sgtable(vdev->drm.dev, sgt, DMA_BIDIRECTIONAL, 0);
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
 +              goto err_free_sgt;
 +      }
 +
 +      bo->sgt = sgt;
 +      return 0;
 +
 +err_free_sgt:
 +      kfree(sgt);
 +      return ret;
 +}
 +
 +static void ivpu_bo_unmap_pages_locked(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +
 +      dma_unmap_sgtable(vdev->drm.dev, bo->sgt, DMA_BIDIRECTIONAL, 0);
 +      sg_free_table(bo->sgt);
 +      kfree(bo->sgt);
 +      bo->sgt = NULL;
 +}
 +
 +static const struct ivpu_bo_ops shmem_ops = {
 +      .type = IVPU_BO_TYPE_SHMEM,
 +      .name = "shmem",
 +      .alloc_pages = shmem_alloc_pages_locked,
 +      .free_pages = shmem_free_pages_locked,
 +      .map_pages = ivpu_bo_map_pages_locked,
 +      .unmap_pages = ivpu_bo_unmap_pages_locked,
 +};
 +
 +static int __must_check internal_alloc_pages_locked(struct ivpu_bo *bo)
 +{
 +      unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
 +      struct page **pages;
 +      int ret;
 +
 +      pages = kvmalloc_array(npages, sizeof(*bo->pages), GFP_KERNEL);
 +      if (!pages)
 +              return -ENOMEM;
 +
 +      for (i = 0; i < npages; i++) {
 +              pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
 +              if (!pages[i]) {
 +                      ret = -ENOMEM;
 +                      goto err_free_pages;
 +              }
 +              cond_resched();
 +      }
 +
 +      bo->pages = pages;
 +      return 0;
 +
 +err_free_pages:
 +      while (i--)
 +              put_page(pages[i]);
 +      kvfree(pages);
 +      return ret;
 +}
 +
 +static void internal_free_pages_locked(struct ivpu_bo *bo)
 +{
 +      unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
 +
 +      for (i = 0; i < npages; i++)
 +              put_page(bo->pages[i]);
 +
 +      kvfree(bo->pages);
 +      bo->pages = NULL;
 +}
 +
 +static const struct ivpu_bo_ops internal_ops = {
 +      .type = IVPU_BO_TYPE_INTERNAL,
 +      .name = "internal",
 +      .alloc_pages = internal_alloc_pages_locked,
 +      .free_pages = internal_free_pages_locked,
 +      .map_pages = ivpu_bo_map_pages_locked,
 +      .unmap_pages = ivpu_bo_unmap_pages_locked,
 +};
 +
 +static int __must_check ivpu_bo_alloc_and_map_pages_locked(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      int ret;
 +
 +      lockdep_assert_held(&bo->lock);
 +      drm_WARN_ON(&vdev->drm, bo->sgt);
 +
 +      ret = bo->ops->alloc_pages(bo);
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to allocate pages for BO: %d", ret);
 +              return ret;
 +      }
 +
 +      ret = bo->ops->map_pages(bo);
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to map pages for BO: %d", ret);
 +              goto err_free_pages;
 +      }
 +      return ret;
 +
 +err_free_pages:
 +      bo->ops->free_pages(bo);
 +      return ret;
 +}
 +
 +static void ivpu_bo_unmap_and_free_pages(struct ivpu_bo *bo)
 +{
 +      mutex_lock(&bo->lock);
 +
 +      WARN_ON(!bo->sgt);
 +      bo->ops->unmap_pages(bo);
 +      WARN_ON(bo->sgt);
 +      bo->ops->free_pages(bo);
 +      WARN_ON(bo->pages);
 +
 +      mutex_unlock(&bo->lock);
 +}
 +
 +/*
 + * ivpu_bo_pin() - pin the backing physical pages and map them to VPU.
 + *
 + * This function pins the physical memory pages, maps them into the IOMMU
 + * address space, and finally updates the VPU MMU page tables so the VPU
 + * can translate its VPU addresses into IOMMU addresses.
 + */
 +int __must_check ivpu_bo_pin(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      int ret = 0;
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->vpu_addr) {
 +              ivpu_err(vdev, "vpu_addr not set for BO ctx_id: %d handle: %d\n",
 +                       bo->ctx->id, bo->handle);
 +              ret = -EINVAL;
 +              goto unlock;
 +      }
 +
 +      if (!bo->sgt) {
 +              ret = ivpu_bo_alloc_and_map_pages_locked(bo);
 +              if (ret)
 +                      goto unlock;
 +      }
 +
 +      if (!bo->mmu_mapped) {
 +              ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, bo->sgt,
 +                                             ivpu_bo_is_snooped(bo));
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to map BO in MMU: %d\n", ret);
 +                      goto unlock;
 +              }
 +              bo->mmu_mapped = true;
 +      }
 +
 +unlock:
 +      mutex_unlock(&bo->lock);
 +
 +      return ret;
 +}
 +
 +static int
 +ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
 +                     const struct ivpu_addr_range *range)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      int ret;
 +
 +      if (!range) {
 +              if (bo->flags & DRM_IVPU_BO_HIGH_MEM)
 +                      range = &vdev->hw->ranges.user_high;
 +              else
 +                      range = &vdev->hw->ranges.user_low;
 +      }
 +
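 +      /* reserve a VPU address range for the BO and track it on the context's BO list */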
 +      mutex_lock(&ctx->lock);
 +      ret = ivpu_mmu_context_insert_node_locked(ctx, range, bo->base.size, &bo->mm_node);
 +      if (!ret) {
 +              bo->ctx = ctx;
 +              bo->vpu_addr = bo->mm_node.start;
 +              list_add_tail(&bo->ctx_node, &ctx->bo_list);
 +      }
 +      mutex_unlock(&ctx->lock);
 +
 +      return ret;
 +}
 +
 +static void ivpu_bo_free_vpu_addr(struct ivpu_bo *bo)
 +{
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +      struct ivpu_mmu_context *ctx = bo->ctx;
 +
 +      ivpu_dbg(vdev, BO, "remove from ctx: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
 +               ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (bo->mmu_mapped) {
 +              drm_WARN_ON(&vdev->drm, !bo->sgt);
 +              ivpu_mmu_context_unmap_sgt(vdev, ctx, bo->vpu_addr, bo->sgt);
 +              bo->mmu_mapped = false;
 +      }
 +
 +      mutex_lock(&ctx->lock);
 +      list_del(&bo->ctx_node);
 +      bo->vpu_addr = 0;
 +      bo->ctx = NULL;
 +      ivpu_mmu_context_remove_node_locked(ctx, &bo->mm_node);
 +      mutex_unlock(&ctx->lock);
 +
 +      mutex_unlock(&bo->lock);
 +}
 +
 +void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx)
 +{
 +      struct ivpu_bo *bo, *tmp;
 +
 +      list_for_each_entry_safe(bo, tmp, &ctx->bo_list, ctx_node)
 +              ivpu_bo_free_vpu_addr(bo);
 +}
 +
 +static struct ivpu_bo *
 +ivpu_bo_alloc(struct ivpu_device *vdev, struct ivpu_mmu_context *mmu_context,
 +            u64 size, u32 flags, const struct ivpu_bo_ops *ops,
 +            const struct ivpu_addr_range *range, u64 user_ptr)
 +{
 +      struct ivpu_bo *bo;
 +      int ret = 0;
 +
 +      if (drm_WARN_ON(&vdev->drm, size == 0 || !PAGE_ALIGNED(size)))
 +              return ERR_PTR(-EINVAL);
 +
 +      switch (flags & DRM_IVPU_BO_CACHE_MASK) {
 +      case DRM_IVPU_BO_CACHED:
 +      case DRM_IVPU_BO_UNCACHED:
 +      case DRM_IVPU_BO_WC:
 +              break;
 +      default:
 +              return ERR_PTR(-EINVAL);
 +      }
 +
 +      bo = kzalloc(sizeof(*bo), GFP_KERNEL);
 +      if (!bo)
 +              return ERR_PTR(-ENOMEM);
 +
 +      mutex_init(&bo->lock);
 +      bo->base.funcs = &ivpu_gem_funcs;
 +      bo->flags = flags;
 +      bo->ops = ops;
 +      bo->user_ptr = user_ptr;
 +
 +      if (ops->type == IVPU_BO_TYPE_SHMEM)
 +              ret = drm_gem_object_init(&vdev->drm, &bo->base, size);
 +      else
 +              drm_gem_private_object_init(&vdev->drm, &bo->base, size);
 +
 +      if (ret) {
 +              ivpu_err(vdev, "Failed to initialize drm object\n");
 +              goto err_free;
 +      }
 +
 +      if (flags & DRM_IVPU_BO_MAPPABLE) {
 +              ret = drm_gem_create_mmap_offset(&bo->base);
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to allocate mmap offset\n");
 +                      goto err_release;
 +              }
 +      }
 +
 +      if (mmu_context) {
 +              ret = ivpu_bo_alloc_vpu_addr(bo, mmu_context, range);
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to add BO to context: %d\n", ret);
 +                      goto err_release;
 +              }
 +      }
 +
 +      return bo;
 +
 +err_release:
 +      drm_gem_object_release(&bo->base);
 +err_free:
 +      kfree(bo);
 +      return ERR_PTR(ret);
 +}
 +
 +static void ivpu_bo_free(struct drm_gem_object *obj)
 +{
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +
 +      if (bo->ctx)
 +              ivpu_dbg(vdev, BO, "free: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
 +                       bo->ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
 +      else
 +              ivpu_dbg(vdev, BO, "free: ctx (released) allocated %d mmu_mapped %d\n",
 +                       (bool)bo->sgt, bo->mmu_mapped);
 +
 +      drm_WARN_ON(&vdev->drm, !dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ));
 +
 +      vunmap(bo->kvaddr);
 +
 +      if (bo->ctx)
 +              ivpu_bo_free_vpu_addr(bo);
 +
 +      if (bo->sgt)
 +              ivpu_bo_unmap_and_free_pages(bo);
 +
 +      if (bo->base.import_attach)
 +              drm_prime_gem_destroy(&bo->base, bo->sgt);
 +
 +      drm_gem_object_release(&bo->base);
 +
 +      mutex_destroy(&bo->lock);
 +      kfree(bo);
 +}
 +
 +static int ivpu_bo_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 +{
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 +
 +      ivpu_dbg(vdev, BO, "mmap: ctx %u handle %u vpu_addr 0x%llx size %zu type %s",
 +               bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.size, bo->ops->name);
 +
 +      if (obj->import_attach) {
 +              /* Drop the reference drm_gem_mmap_obj() acquired. */
 +              drm_gem_object_put(obj);
 +              vma->vm_private_data = NULL;
 +              return dma_buf_mmap(obj->dma_buf, vma, 0);
 +      }
 +
++      vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND);
 +      vma->vm_page_prot = ivpu_bo_pgprot(bo, vm_get_page_prot(vma->vm_flags));
 +
 +      return 0;
 +}
 +
 +static struct sg_table *ivpu_bo_get_sg_table(struct drm_gem_object *obj)
 +{
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      loff_t npages = obj->size >> PAGE_SHIFT;
 +      int ret = 0;
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->sgt)
 +              ret = ivpu_bo_alloc_and_map_pages_locked(bo);
 +
 +      mutex_unlock(&bo->lock);
 +
 +      if (ret)
 +              return ERR_PTR(ret);
 +
 +      return drm_prime_pages_to_sg(obj->dev, bo->pages, npages);
 +}
 +
 +static vm_fault_t ivpu_vm_fault(struct vm_fault *vmf)
 +{
 +      struct vm_area_struct *vma = vmf->vma;
 +      struct drm_gem_object *obj = vma->vm_private_data;
 +      struct ivpu_bo *bo = to_ivpu_bo(obj);
 +      loff_t npages = obj->size >> PAGE_SHIFT;
 +      pgoff_t page_offset;
 +      struct page *page;
 +      vm_fault_t ret;
 +      int err;
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->sgt) {
 +              err = ivpu_bo_alloc_and_map_pages_locked(bo);
 +              if (err) {
 +                      ret = vmf_error(err);
 +                      goto unlock;
 +              }
 +      }
 +
 +      /* We don't use vmf->pgoff since that has the fake offset */
 +      page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
 +      if (page_offset >= npages) {
 +              ret = VM_FAULT_SIGBUS;
 +      } else {
 +              page = bo->pages[page_offset];
 +              ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
 +      }
 +
 +unlock:
 +      mutex_unlock(&bo->lock);
 +
 +      return ret;
 +}
 +
 +static const struct vm_operations_struct ivpu_vm_ops = {
 +      .fault = ivpu_vm_fault,
 +      .open = drm_gem_vm_open,
 +      .close = drm_gem_vm_close,
 +};
 +
 +static const struct drm_gem_object_funcs ivpu_gem_funcs = {
 +      .free = ivpu_bo_free,
 +      .mmap = ivpu_bo_mmap,
 +      .vm_ops = &ivpu_vm_ops,
 +      .get_sg_table = ivpu_bo_get_sg_table,
 +};
 +
 +int
 +ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 +{
 +      struct ivpu_file_priv *file_priv = file->driver_priv;
 +      struct ivpu_device *vdev = file_priv->vdev;
 +      struct drm_ivpu_bo_create *args = data;
 +      u64 size = PAGE_ALIGN(args->size);
 +      struct ivpu_bo *bo;
 +      int ret;
 +
 +      if (args->flags & ~DRM_IVPU_BO_FLAGS)
 +              return -EINVAL;
 +
 +      if (size == 0)
 +              return -EINVAL;
 +
 +      bo = ivpu_bo_alloc(vdev, &file_priv->ctx, size, args->flags, &shmem_ops, NULL, 0);
 +      if (IS_ERR(bo)) {
 +              ivpu_err(vdev, "Failed to create BO: %pe (ctx %u size %llu flags 0x%x)",
 +                       bo, file_priv->ctx.id, args->size, args->flags);
 +              return PTR_ERR(bo);
 +      }
 +
 +      ret = drm_gem_handle_create(file, &bo->base, &bo->handle);
 +      if (!ret) {
 +              args->vpu_addr = bo->vpu_addr;
 +              args->handle = bo->handle;
 +      }
 +
 +      drm_gem_object_put(&bo->base);
 +
 +      ivpu_dbg(vdev, BO, "alloc shmem: ctx %u vpu_addr 0x%llx size %zu flags 0x%x\n",
 +               file_priv->ctx.id, bo->vpu_addr, bo->base.size, bo->flags);
 +
 +      return ret;
 +}
 +
 +struct ivpu_bo *
 +ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags)
 +{
 +      const struct ivpu_addr_range *range;
 +      struct ivpu_addr_range fixed_range;
 +      struct ivpu_bo *bo;
 +      pgprot_t prot;
 +      int ret;
 +
 +      drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(vpu_addr));
 +      drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(size));
 +
 +      if (vpu_addr) {
 +              fixed_range.start = vpu_addr;
 +              fixed_range.end = vpu_addr + size;
 +              range = &fixed_range;
 +      } else {
 +              range = &vdev->hw->ranges.global_low;
 +      }
 +
 +      bo = ivpu_bo_alloc(vdev, &vdev->gctx, size, flags, &internal_ops, range, 0);
 +      if (IS_ERR(bo)) {
 +              ivpu_err(vdev, "Failed to create BO: %pe (vpu_addr 0x%llx size %llu flags 0x%x)",
 +                       bo, vpu_addr, size, flags);
 +              return NULL;
 +      }
 +
 +      ret = ivpu_bo_pin(bo);
 +      if (ret)
 +              goto err_put;
 +
 +      if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
 +              drm_clflush_pages(bo->pages, bo->base.size >> PAGE_SHIFT);
 +
 +      prot = ivpu_bo_pgprot(bo, PAGE_KERNEL);
 +      bo->kvaddr = vmap(bo->pages, bo->base.size >> PAGE_SHIFT, VM_MAP, prot);
 +      if (!bo->kvaddr) {
 +              ivpu_err(vdev, "Failed to map BO into kernel virtual memory\n");
 +              goto err_put;
 +      }
 +
 +      ivpu_dbg(vdev, BO, "alloc internal: ctx 0 vpu_addr 0x%llx size %zu flags 0x%x\n",
 +               bo->vpu_addr, bo->base.size, flags);
 +
 +      return bo;
 +
 +err_put:
 +      drm_gem_object_put(&bo->base);
 +      return NULL;
 +}
 +
 +void ivpu_bo_free_internal(struct ivpu_bo *bo)
 +{
 +      drm_gem_object_put(&bo->base);
 +}
 +
 +struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
 +{
 +      struct ivpu_device *vdev = to_ivpu_device(dev);
 +      struct dma_buf_attachment *attach;
 +      struct ivpu_bo *bo;
 +
 +      attach = dma_buf_attach(buf, dev->dev);
 +      if (IS_ERR(attach))
 +              return ERR_CAST(attach);
 +
 +      get_dma_buf(buf);
 +
 +      bo = ivpu_bo_alloc(vdev, NULL, buf->size, DRM_IVPU_BO_MAPPABLE, &prime_ops, NULL, 0);
 +      if (IS_ERR(bo)) {
 +              ivpu_err(vdev, "Failed to import BO: %pe (size %lu)", bo, buf->size);
 +              goto err_detach;
 +      }
 +
 +      lockdep_set_class(&bo->lock, &prime_bo_lock_class_key);
 +
 +      bo->base.import_attach = attach;
 +
 +      return &bo->base;
 +
 +err_detach:
 +      dma_buf_detach(buf, attach);
 +      dma_buf_put(buf);
 +      return ERR_CAST(bo);
 +}
 +
 +int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 +{
 +      struct ivpu_file_priv *file_priv = file->driver_priv;
 +      struct ivpu_device *vdev = to_ivpu_device(dev);
 +      struct drm_ivpu_bo_info *args = data;
 +      struct drm_gem_object *obj;
 +      struct ivpu_bo *bo;
 +      int ret = 0;
 +
 +      obj = drm_gem_object_lookup(file, args->handle);
 +      if (!obj)
 +              return -ENOENT;
 +
 +      bo = to_ivpu_bo(obj);
 +
 +      mutex_lock(&bo->lock);
 +
 +      if (!bo->ctx) {
 +              ret = ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, NULL);
 +              if (ret) {
 +                      ivpu_err(vdev, "Failed to allocate vpu_addr: %d\n", ret);
 +                      goto unlock;
 +              }
 +      }
 +
 +      args->flags = bo->flags;
 +      args->mmap_offset = drm_vma_node_offset_addr(&obj->vma_node);
 +      args->vpu_addr = bo->vpu_addr;
 +      args->size = obj->size;
 +unlock:
 +      mutex_unlock(&bo->lock);
 +      drm_gem_object_put(obj);
 +      return ret;
 +}
 +
 +int ivpu_bo_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 +{
 +      struct drm_ivpu_bo_wait *args = data;
 +      struct drm_gem_object *obj;
 +      unsigned long timeout;
 +      long ret;
 +
 +      timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
 +
 +      obj = drm_gem_object_lookup(file, args->handle);
 +      if (!obj)
 +              return -EINVAL;
 +
 +      ret = dma_resv_wait_timeout(obj->resv, DMA_RESV_USAGE_READ, true, timeout);
 +      if (ret == 0) {
 +              ret = -ETIMEDOUT;
 +      } else if (ret > 0) {
 +              ret = 0;
 +              args->job_status = to_ivpu_bo(obj)->job_status;
 +      }
 +
 +      drm_gem_object_put(obj);
 +
 +      return ret;
 +}
 +
 +static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
 +{
 +      unsigned long dma_refcount = 0;
 +
 +      if (bo->base.dma_buf && bo->base.dma_buf->file)
 +              dma_refcount = atomic_long_read(&bo->base.dma_buf->file->f_count);
 +
 +      drm_printf(p, "%5u %6d %16llx %10lu %10u %12lu %14s\n",
 +                 bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.size,
 +                 kref_read(&bo->base.refcount), dma_refcount, bo->ops->name);
 +}
 +
 +void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p)
 +{
 +      struct ivpu_device *vdev = to_ivpu_device(dev);
 +      struct ivpu_file_priv *file_priv;
 +      unsigned long ctx_id;
 +      struct ivpu_bo *bo;
 +
 +      drm_printf(p, "%5s %6s %16s %10s %10s %12s %14s\n",
 +                 "ctx", "handle", "vpu_addr", "size", "refcount", "dma_refcount", "type");
 +
 +      mutex_lock(&vdev->gctx.lock);
 +      list_for_each_entry(bo, &vdev->gctx.bo_list, ctx_node)
 +              ivpu_bo_print_info(bo, p);
 +      mutex_unlock(&vdev->gctx.lock);
 +
 +      xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
 +              file_priv = ivpu_file_priv_get_by_ctx_id(vdev, ctx_id);
 +              if (!file_priv)
 +                      continue;
 +
 +              mutex_lock(&file_priv->ctx.lock);
 +              list_for_each_entry(bo, &file_priv->ctx.bo_list, ctx_node)
 +                      ivpu_bo_print_info(bo, p);
 +              mutex_unlock(&file_priv->ctx.lock);
 +
 +              ivpu_file_priv_put(&file_priv);
 +      }
 +}
 +
 +void ivpu_bo_list_print(struct drm_device *dev)
 +{
 +      struct drm_printer p = drm_info_printer(dev->dev);
 +
 +      ivpu_bo_list(dev, &p);
 +}
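
The only kernel-internal allocation path added in the ivpu GEM code above is ivpu_bo_alloc_internal()/ivpu_bo_free_internal(). A minimal usage sketch follows; only the two helper signatures, bo->kvaddr and bo->base.size come from the hunk above, while the caller name, the SZ_64K size and the DRM_IVPU_BO_CACHED flag choice are illustrative assumptions.

/* Sketch only: allocate a pinned, vmap()ed buffer for driver-internal use
 * and release it again.  Error handling is kept minimal on purpose. */
static int example_alloc_scratch(struct ivpu_device *vdev)
{
	struct ivpu_bo *bo;

	/* vpu_addr == 0 lets the helper pick a range (ranges.global_low) */
	bo = ivpu_bo_alloc_internal(vdev, 0, SZ_64K, DRM_IVPU_BO_CACHED);
	if (!bo)
		return -ENOMEM;

	memset(bo->kvaddr, 0, bo->base.size);	/* kvaddr was set up by vmap() */

	ivpu_bo_free_internal(bo);		/* drops the GEM reference */
	return 0;
}
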
diff --combined drivers/block/brd.c
index a8a77a1efe1e369c77eef227639de9ef3179cb86,37dce184eb56c6bffbeaa7272bd686f330ec5931..34177f1bd97dc09a4ccdd55f2ea1972f431d7e45
@@@ -78,25 -78,32 +78,25 @@@ static struct page *brd_lookup_page(str
  }
  
  /*
 - * Look up and return a brd's page for a given sector.
 - * If one does not exist, allocate an empty page, and insert that. Then
 - * return it.
 + * Insert a new page for a given sector, if one does not already exist.
   */
 -static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 +static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
  {
        pgoff_t idx;
        struct page *page;
 -      gfp_t gfp_flags;
 +      int ret = 0;
  
        page = brd_lookup_page(brd, sector);
        if (page)
 -              return page;
 +              return 0;
  
 -      /*
 -       * Must use NOIO because we don't want to recurse back into the
 -       * block or filesystem layers from page reclaim.
 -       */
 -      gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
 -      page = alloc_page(gfp_flags);
 +      page = alloc_page(gfp | __GFP_ZERO | __GFP_HIGHMEM);
        if (!page)
 -              return NULL;
 +              return -ENOMEM;
  
 -      if (radix_tree_preload(GFP_NOIO)) {
 +      if (radix_tree_maybe_preload(gfp)) {
                __free_page(page);
 -              return NULL;
 +              return -ENOMEM;
        }
  
        spin_lock(&brd->brd_lock);
        if (radix_tree_insert(&brd->brd_pages, idx, page)) {
                __free_page(page);
                page = radix_tree_lookup(&brd->brd_pages, idx);
 -              BUG_ON(!page);
 -              BUG_ON(page->index != idx);
 +              if (!page)
 +                      ret = -ENOMEM;
 +              else if (page->index != idx)
 +                      ret = -EIO;
        } else {
                brd->brd_nr_pages++;
        }
        spin_unlock(&brd->brd_lock);
  
        radix_tree_preload_end();
 -
 -      return page;
 +      return ret;
  }
  
  /*
@@@ -164,22 -170,20 +164,22 @@@ static void brd_free_pages(struct brd_d
  /*
   * copy_to_brd_setup must be called before copy_to_brd. It may sleep.
   */
 -static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
 +static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
 +                           gfp_t gfp)
  {
        unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
        size_t copy;
 +      int ret;
  
        copy = min_t(size_t, n, PAGE_SIZE - offset);
 -      if (!brd_insert_page(brd, sector))
 -              return -ENOSPC;
 +      ret = brd_insert_page(brd, sector, gfp);
 +      if (ret)
 +              return ret;
        if (copy < n) {
                sector += copy >> SECTOR_SHIFT;
 -              if (!brd_insert_page(brd, sector))
 -                      return -ENOSPC;
 +              ret = brd_insert_page(brd, sector, gfp);
        }
 -      return 0;
 +      return ret;
  }
  
  /*
@@@ -252,26 -256,20 +252,26 @@@ static void copy_from_brd(void *dst, st
   * Process a single bvec of a bio.
   */
  static int brd_do_bvec(struct brd_device *brd, struct page *page,
 -                      unsigned int len, unsigned int off, enum req_op op,
 +                      unsigned int len, unsigned int off, blk_opf_t opf,
                        sector_t sector)
  {
        void *mem;
        int err = 0;
  
 -      if (op_is_write(op)) {
 -              err = copy_to_brd_setup(brd, sector, len);
 +      if (op_is_write(opf)) {
 +              /*
 +               * Must use NOIO because we don't want to recurse back into the
 +               * block or filesystem layers from page reclaim.
 +               */
 +              gfp_t gfp = opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO;
 +
 +              err = copy_to_brd_setup(brd, sector, len, gfp);
                if (err)
                        goto out;
        }
  
        mem = kmap_atomic(page);
 -      if (!op_is_write(op)) {
 +      if (!op_is_write(opf)) {
                copy_from_brd(mem + off, brd, sector, len);
                flush_dcache_page(page);
        } else {
@@@ -300,12 -298,8 +300,12 @@@ static void brd_submit_bio(struct bio *
                                (len & (SECTOR_SIZE - 1)));
  
                err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
 -                                bio_op(bio), sector);
 +                                bio->bi_opf, sector);
                if (err) {
 +                      if (err == -ENOMEM && bio->bi_opf & REQ_NOWAIT) {
 +                              bio_wouldblock_error(bio);
 +                              return;
 +                      }
                        bio_io_error(bio);
                        return;
                }
        bio_endio(bio);
  }
  
- static int brd_rw_page(struct block_device *bdev, sector_t sector,
-                      struct page *page, enum req_op op)
- {
-       struct brd_device *brd = bdev->bd_disk->private_data;
-       int err;
-       if (PageTransHuge(page))
-               return -ENOTSUPP;
-       err = brd_do_bvec(brd, page, PAGE_SIZE, 0, op, sector);
-       page_endio(page, op_is_write(op), err);
-       return err;
- }
  static const struct block_device_operations brd_fops = {
        .owner =                THIS_MODULE,
        .submit_bio =           brd_submit_bio,
-       .rw_page =              brd_rw_page,
  };
  
  /*
@@@ -417,8 -397,8 +403,9 @@@ static int brd_alloc(int i
  
        /* Tell the block layer that this is not a rotational device */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
+       blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, disk->queue);
 +      blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
        err = add_disk(disk);
        if (err)
                goto out_cleanup_disk;
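
For reference, a sketch of how a submitter opts into the non-blocking path that the brd change above enables. The helper below is hypothetical; the bio calls (bio_alloc(), __bio_add_page(), submit_bio()) are used with their usual signatures and are not part of this patch.

/* Sketch: with QUEUE_FLAG_NOWAIT set on the disk and REQ_NOWAIT in bi_opf,
 * a would-block page allocation in brd now completes the bio with
 * BLK_STS_AGAIN instead of sleeping in GFP_NOIO reclaim. */
static void example_nowait_write(struct block_device *bdev, struct page *page,
				 sector_t sector, bio_end_io_t *done)
{
	struct bio *bio;

	bio = bio_alloc(bdev, 1, REQ_OP_WRITE | REQ_NOWAIT, GFP_NOIO);
	if (!bio)
		return;

	bio->bi_iter.bi_sector = sector;
	__bio_add_page(bio, page, PAGE_SIZE, 0);
	bio->bi_end_io = done;	/* sees BLK_STS_AGAIN if brd would have blocked */
	submit_bio(bio);
}
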
index bd8ae4822dc3ef7e14421efa44957b43051fa838,25526707f607c5498dfb64149ca140d77f0c5f3d..aa490da3cef233409e2b85db33a7f8c88d3cba29
@@@ -190,7 -190,7 +190,7 @@@ static inline bool valid_io_request(str
  
        end = start + (size >> SECTOR_SHIFT);
        bound = zram->disksize >> SECTOR_SHIFT;
-       /* out of range range */
+       /* out of range */
        if (unlikely(start >= bound || end > bound || start > end))
                return false;
  
@@@ -703,7 -703,9 +703,7 @@@ static ssize_t writeback_store(struct d
        for (; nr_pages != 0; index++, nr_pages--) {
                struct bio_vec bvec;
  
 -              bvec.bv_page = page;
 -              bvec.bv_len = PAGE_SIZE;
 -              bvec.bv_offset = 0;
 +              bvec_set_page(&bvec, page, PAGE_SIZE, 0);
  
                spin_lock(&zram->wb_limit_lock);
                if (zram->wb_limit_enable && !zram->bd_wb_limit) {
@@@ -1138,7 -1140,7 +1138,7 @@@ static ssize_t recomp_algorithm_store(s
        while (*args) {
                args = next_arg(args, &param, &val);
  
-               if (!*val)
+               if (!val || !*val)
                        return -EINVAL;
  
                if (!strcmp(param, "algo")) {
@@@ -1378,9 -1380,12 +1378,9 @@@ out
  static int zram_bvec_read_from_bdev(struct zram *zram, struct page *page,
                                    u32 index, struct bio *bio, bool partial_io)
  {
 -      struct bio_vec bvec = {
 -              .bv_page = page,
 -              .bv_len = PAGE_SIZE,
 -              .bv_offset = 0,
 -      };
 +      struct bio_vec bvec;
  
 +      bvec_set_page(&bvec, page, PAGE_SIZE, 0);
        return read_from_bdev(zram, &bvec, zram_get_element(zram, index), bio,
                              partial_io);
  }
@@@ -1448,10 -1453,6 +1448,6 @@@ static int __zram_bvec_read(struct zra
                /* Slot should be unlocked before the function call */
                zram_slot_unlock(zram, index);
  
-               /* A null bio means rw_page was used, we must fallback to bio */
-               if (!bio)
-                       return -EOPNOTSUPP;
                ret = zram_bvec_read_from_bdev(zram, page, index, bio,
                                               partial_io);
        }
@@@ -1647,7 -1648,9 +1643,7 @@@ static int zram_bvec_write(struct zram 
                memcpy_from_bvec(dst + offset, bvec);
                kunmap_atomic(dst);
  
 -              vec.bv_page = page;
 -              vec.bv_len = PAGE_SIZE;
 -              vec.bv_offset = 0;
 +              bvec_set_page(&vec, page, PAGE_SIZE, 0);
        }
  
        ret = __zram_bvec_write(zram, &vec, index, bio);
@@@ -1817,7 -1820,7 +1813,7 @@@ static ssize_t recompress_store(struct 
        while (*args) {
                args = next_arg(args, &param, &val);
  
-               if (!*val)
+               if (!val || !*val)
                        return -EINVAL;
  
                if (!strcmp(param, "type")) {
@@@ -2074,61 -2077,6 +2070,6 @@@ static void zram_slot_free_notify(struc
        zram_slot_unlock(zram, index);
  }
  
- static int zram_rw_page(struct block_device *bdev, sector_t sector,
-                      struct page *page, enum req_op op)
- {
-       int offset, ret;
-       u32 index;
-       struct zram *zram;
-       struct bio_vec bv;
-       unsigned long start_time;
-       if (PageTransHuge(page))
-               return -ENOTSUPP;
-       zram = bdev->bd_disk->private_data;
-       if (!valid_io_request(zram, sector, PAGE_SIZE)) {
-               atomic64_inc(&zram->stats.invalid_io);
-               ret = -EINVAL;
-               goto out;
-       }
-       index = sector >> SECTORS_PER_PAGE_SHIFT;
-       offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
-       bv.bv_page = page;
-       bv.bv_len = PAGE_SIZE;
-       bv.bv_offset = 0;
-       start_time = bdev_start_io_acct(bdev->bd_disk->part0,
-                       SECTORS_PER_PAGE, op, jiffies);
-       ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
-       bdev_end_io_acct(bdev->bd_disk->part0, op, start_time);
- out:
-       /*
-        * If I/O fails, just return error(ie, non-zero) without
-        * calling page_endio.
-        * It causes resubmit the I/O with bio request by upper functions
-        * of rw_page(e.g., swap_readpage, __swap_writepage) and
-        * bio->bi_end_io does things to handle the error
-        * (e.g., SetPageError, set_page_dirty and extra works).
-        */
-       if (unlikely(ret < 0))
-               return ret;
-       switch (ret) {
-       case 0:
-               page_endio(page, op_is_write(op), 0);
-               break;
-       case 1:
-               ret = 0;
-               break;
-       default:
-               WARN_ON(1);
-       }
-       return ret;
- }
  static void zram_destroy_comps(struct zram *zram)
  {
        u32 prio;
@@@ -2283,7 -2231,6 +2224,6 @@@ static const struct block_device_operat
        .open = zram_open,
        .submit_bio = zram_submit_bio,
        .swap_slot_free_notify = zram_slot_free_notify,
-       .rw_page = zram_rw_page,
        .owner = THIS_MODULE
  };
  
@@@ -2378,10 -2325,11 +2318,11 @@@ static int zram_add(void
        zram->disk->private_data = zram;
        snprintf(zram->disk->disk_name, 16, "zram%d", device_id);
  
-       /* Actual capacity set using syfs (/sys/block/zram<id>/disksize */
+       /* Actual capacity set using sysfs (/sys/block/zram<id>/disksize */
        set_capacity(zram->disk, 0);
        /* zram devices sort of resembles non-rotational disks */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
+       blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, zram->disk->queue);
  
        /*
index 59823ad1d9aea3068875b93edc52198dc3816427,733fe10339104728e97bbb1385235748ff3be4f2..133d1c07b3a73969cc83eb25c82a3307c48e5905
@@@ -95,6 -95,8 +95,6 @@@
  #define QM_VFT_CFG_RDY                        0x10006c
  #define QM_VFT_CFG_OP_WR              0x100058
  #define QM_VFT_CFG_TYPE                       0x10005c
 -#define QM_SQC_VFT                    0x0
 -#define QM_CQC_VFT                    0x1
  #define QM_VFT_CFG                    0x100060
  #define QM_VFT_CFG_OP_ENABLE          0x100054
  #define QM_PM_CTRL                    0x100148
  #define QM_SQC_VFT_BASE_SHIFT_V2      28
  #define QM_SQC_VFT_BASE_MASK_V2               GENMASK(15, 0)
  #define QM_SQC_VFT_NUM_SHIFT_V2               45
 -#define QM_SQC_VFT_NUM_MASK_v2                GENMASK(9, 0)
 +#define QM_SQC_VFT_NUM_MASK_V2                GENMASK(9, 0)
  
  #define QM_ABNORMAL_INT_SOURCE                0x100000
  #define QM_ABNORMAL_INT_MASK          0x100004
  
  /* interfunction communication */
  #define QM_IFC_READY_STATUS           0x100128
 -#define QM_IFC_C_STS_M                        0x10012C
  #define QM_IFC_INT_SET_P              0x100130
  #define QM_IFC_INT_CFG                        0x100134
  #define QM_IFC_INT_SOURCE_P           0x100138
  
  #define PCI_BAR_2                     2
  #define PCI_BAR_4                     4
 -#define QM_SQE_DATA_ALIGN_MASK                GENMASK(6, 0)
  #define QMC_ALIGN(sz)                 ALIGN(sz, 32)
  
  #define QM_DBG_READ_LEN               256
  #define QM_DRIVER_REMOVING            0
  #define QM_RST_SCHED                  1
  #define QM_QOS_PARAM_NUM              2
 -#define QM_QOS_VAL_NUM                        1
 -#define QM_QOS_BDF_PARAM_NUM          4
  #define QM_QOS_MAX_VAL                        1000
  #define QM_QOS_RATE                   100
  #define QM_QOS_EXPAND_RATE            1000
  #define QM_SHAPER_FACTOR_CBS_B_SHIFT  15
  #define QM_SHAPER_FACTOR_CBS_S_SHIFT  19
  #define QM_SHAPER_CBS_B                       1
 -#define QM_SHAPER_CBS_S                       16
  #define QM_SHAPER_VFT_OFFSET          6
 -#define WAIT_FOR_QOS_VF                       100
  #define QM_QOS_MIN_ERROR_RATE         5
 -#define QM_QOS_TYPICAL_NUM            8
  #define QM_SHAPER_MIN_CBS_S           8
  #define QM_QOS_TICK                   0x300U
  #define QM_QOS_DIVISOR_CLK            0x1f40U
  #define QM_QOS_MAX_CIR_B              200
  #define QM_QOS_MIN_CIR_B              100
  #define QM_QOS_MAX_CIR_U              6
 -#define QM_QOS_MAX_CIR_S              11
  #define QM_AUTOSUSPEND_DELAY          3000
  
  #define QM_MK_CQC_DW3_V1(hop_num, pg_sz, buf_sz, cqe_sz) \
 -      (((hop_num) << QM_CQ_HOP_NUM_SHIFT)     | \
 -      ((pg_sz) << QM_CQ_PAGE_SIZE_SHIFT)      | \
 -      ((buf_sz) << QM_CQ_BUF_SIZE_SHIFT)      | \
 +      (((hop_num) << QM_CQ_HOP_NUM_SHIFT) | \
 +      ((pg_sz) << QM_CQ_PAGE_SIZE_SHIFT) | \
 +      ((buf_sz) << QM_CQ_BUF_SIZE_SHIFT) | \
        ((cqe_sz) << QM_CQ_CQE_SIZE_SHIFT))
  
  #define QM_MK_CQC_DW3_V2(cqe_sz, cq_depth) \
        ((((u32)cq_depth) - 1) | ((cqe_sz) << QM_CQ_CQE_SIZE_SHIFT))
  
  #define QM_MK_SQC_W13(priority, orders, alg_type) \
 -      (((priority) << QM_SQ_PRIORITY_SHIFT)   | \
 -      ((orders) << QM_SQ_ORDERS_SHIFT)        | \
 +      (((priority) << QM_SQ_PRIORITY_SHIFT) | \
 +      ((orders) << QM_SQ_ORDERS_SHIFT) | \
        (((alg_type) & QM_SQ_TYPE_MASK) << QM_SQ_TYPE_SHIFT))
  
  #define QM_MK_SQC_DW3_V1(hop_num, pg_sz, buf_sz, sqe_sz) \
 -      (((hop_num) << QM_SQ_HOP_NUM_SHIFT)     | \
 -      ((pg_sz) << QM_SQ_PAGE_SIZE_SHIFT)      | \
 -      ((buf_sz) << QM_SQ_BUF_SIZE_SHIFT)      | \
 +      (((hop_num) << QM_SQ_HOP_NUM_SHIFT) | \
 +      ((pg_sz) << QM_SQ_PAGE_SIZE_SHIFT) | \
 +      ((buf_sz) << QM_SQ_BUF_SIZE_SHIFT) | \
        ((u32)ilog2(sqe_sz) << QM_SQ_SQE_SIZE_SHIFT))
  
  #define QM_MK_SQC_DW3_V2(sqe_sz, sq_depth) \
@@@ -696,7 -706,7 +696,7 @@@ static void qm_db_v2(struct hisi_qm *qm
  
        doorbell = qn | ((u64)cmd << QM_DB_CMD_SHIFT_V2) |
                   ((u64)randata << QM_DB_RAND_SHIFT_V2) |
 -                 ((u64)index << QM_DB_INDEX_SHIFT_V2)  |
 +                 ((u64)index << QM_DB_INDEX_SHIFT_V2) |
                   ((u64)priority << QM_DB_PRIORITY_SHIFT_V2);
  
        writeq(doorbell, io_base);
@@@ -895,7 -905,7 +895,7 @@@ static void qm_work_process(struct work
        }
  }
  
 -static bool do_qm_irq(struct hisi_qm *qm)
 +static bool do_qm_eq_irq(struct hisi_qm *qm)
  {
        struct qm_eqe *eqe = qm->eqe + qm->status.eq_head;
        struct hisi_qm_poll_data *poll_data;
        return false;
  }
  
 -static irqreturn_t qm_irq(int irq, void *data)
 +static irqreturn_t qm_eq_irq(int irq, void *data)
  {
        struct hisi_qm *qm = data;
        bool ret;
  
 -      ret = do_qm_irq(qm);
 +      ret = do_qm_eq_irq(qm);
        if (ret)
                return IRQ_HANDLED;
  
@@@ -1294,7 -1304,7 +1294,7 @@@ static int qm_get_vft_v2(struct hisi_q
        sqc_vft = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
                  ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) << 32);
        *base = QM_SQC_VFT_BASE_MASK_V2 & (sqc_vft >> QM_SQC_VFT_BASE_SHIFT_V2);
 -      *number = (QM_SQC_VFT_NUM_MASK_v2 &
 +      *number = (QM_SQC_VFT_NUM_MASK_V2 &
                   (sqc_vft >> QM_SQC_VFT_NUM_SHIFT_V2)) + 1;
  
        return 0;
@@@ -1882,7 -1892,8 +1882,7 @@@ static struct hisi_qp *qm_create_qp_nol
   * @qm: The qm we create a qp from.
   * @alg_type: Accelerator specific algorithm type in sqc.
   *
 - * return created qp, -EBUSY if all qps in qm allocated, -ENOMEM if allocating
 - * qp memory fails.
 + * Return created qp, negative error code if failed.
   */
  static struct hisi_qp *hisi_qm_create_qp(struct hisi_qm *qm, u8 alg_type)
  {
@@@ -2051,7 -2062,7 +2051,7 @@@ static int qm_start_qp_nolock(struct hi
   * @arg: Accelerator specific argument.
   *
   * After this function, qp can receive request from user. Return 0 if
 - * successful, Return -EBUSY if failed.
 + * successful, negative error code if failed.
   */
  int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
  {
@@@ -2352,7 -2363,7 +2352,7 @@@ static int hisi_qm_uacce_mmap(struct ua
                                return -EINVAL;
                }
  
-               vma->vm_flags |= VM_IO;
+               vm_flags_set(vma, VM_IO);
  
                return remap_pfn_range(vma, vma->vm_start,
                                       phys_base >> PAGE_SHIFT,
@@@ -3063,6 -3074,7 +3063,6 @@@ static int qm_stop_started_qp(struct hi
        return 0;
  }
  
 -
  /**
   * qm_clear_queues() - Clear all queues memory in a qm.
   * @qm: The qm in which the queues will be cleared.
@@@ -3359,7 -3371,7 +3359,7 @@@ static int qm_vf_q_assign(struct hisi_q
                        act_q_num = q_num;
                }
  
 -              act_q_num = min_t(int, act_q_num, max_qp_num);
 +              act_q_num = min(act_q_num, max_qp_num);
                ret = hisi_qm_set_vft(qm, i, q_base, act_q_num);
                if (ret) {
                        for (j = num_vfs; j > i; j--)
@@@ -3546,7 -3558,7 +3546,7 @@@ static ssize_t qm_algqos_read(struct fi
        qos_val = ir / QM_QOS_RATE;
        ret = scnprintf(tbuf, QM_DBG_READ_LEN, "%u\n", qos_val);
  
 -      ret =  simple_read_from_buffer(buf, count, pos, tbuf, ret);
 +      ret = simple_read_from_buffer(buf, count, pos, tbuf, ret);
  
  err_get_status:
        clear_bit(QM_RESETTING, &qm->misc_ctl);
@@@ -4037,10 -4049,13 +4037,10 @@@ static void qm_dev_ecc_mbit_handle(stru
        if (!qm->err_status.is_dev_ecc_mbit &&
            qm->err_status.is_qm_ecc_mbit &&
            qm->err_ini->close_axi_master_ooo) {
 -
                qm->err_ini->close_axi_master_ooo(qm);
 -
        } else if (qm->err_status.is_dev_ecc_mbit &&
                   !qm->err_status.is_qm_ecc_mbit &&
                   !qm->err_ini->close_axi_master_ooo) {
 -
                nfe_enb = readl(qm->io_base + QM_RAS_NFE_ENABLE);
                writel(nfe_enb & QM_RAS_NFE_MBIT_DISABLE,
                       qm->io_base + QM_RAS_NFE_ENABLE);
@@@ -4484,6 -4499,7 +4484,6 @@@ static irqreturn_t qm_abnormal_irq(int 
        return IRQ_HANDLED;
  }
  
 -
  /**
   * hisi_qm_dev_shutdown() - Shutdown device.
   * @pdev: The device will be shutdown.
@@@ -4887,7 -4903,7 +4887,7 @@@ static int qm_register_eq_irq(struct hi
                return 0;
  
        irq_vector = val & QM_IRQ_VECTOR_MASK;
 -      ret = request_irq(pci_irq_vector(pdev, irq_vector), qm_irq, 0, qm->dev_name, qm);
 +      ret = request_irq(pci_irq_vector(pdev, irq_vector), qm_eq_irq, 0, qm->dev_name, qm);
        if (ret)
                dev_err(&pdev->dev, "failed to request eq irq, ret = %d", ret);
  
index ed1164a87fced0c699a49ee75cfda8a42423870b,a69fd6fdabb4e10c78af916e0d9c034d23475959..d8e683688daab516833dd8d6c423082db4f6b485
@@@ -34,7 -34,6 +34,7 @@@
  #include <drm/amdgpu_drm.h>
  #include <drm/drm_drv.h>
  #include <drm/drm_gem_ttm_helper.h>
 +#include <drm/ttm/ttm_tt.h>
  
  #include "amdgpu.h"
  #include "amdgpu_display.h"
@@@ -62,10 -61,10 +62,10 @@@ static vm_fault_t amdgpu_gem_fault(stru
                        goto unlock;
                }
  
 -               ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
 -                                              TTM_BO_VM_NUM_PREFAULT);
 +              ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
 +                                             TTM_BO_VM_NUM_PREFAULT);
  
 -               drm_dev_exit(idx);
 +              drm_dev_exit(idx);
        } else {
                ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
        }
@@@ -258,7 -257,7 +258,7 @@@ static int amdgpu_gem_object_mmap(struc
         */
        if (is_cow_mapping(vma->vm_flags) &&
            !(vma->vm_flags & VM_ACCESS_FLAGS))
-               vma->vm_flags &= ~VM_MAYWRITE;
+               vm_flags_clear(vma, VM_MAYWRITE);
  
        return drm_gem_ttm_mmap(obj, vma);
  }
index 072fa4fbd27fc526d910433486a727a6f1d84bd5,d0933dd9af06d4f06efd2d967fa28809c89c72d2..a0e30f21e12e70a763b0a773895b20daee59b13a
@@@ -1065,20 -1065,6 +1065,20 @@@ static int kfd_ioctl_alloc_memory_of_gp
                mutex_unlock(&p->svms.lock);
                return -EADDRINUSE;
        }
 +
 +      /* When registering a user buffer, check whether it has already been
 +       * registered by SVM via its CPU virtual address.
 +       */
 +      if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) &&
 +          interval_tree_iter_first(&p->svms.objects,
 +                                   args->mmap_offset >> PAGE_SHIFT,
 +                                   (args->mmap_offset  + args->size - 1) >> PAGE_SHIFT)) {
 +              pr_err("User Buffer Address: 0x%llx already allocated by SVM\n",
 +                      args->mmap_offset);
 +              mutex_unlock(&p->svms.lock);
 +              return -EADDRINUSE;
 +      }
 +
        mutex_unlock(&p->svms.lock);
  #endif
        mutex_lock(&p->mutex);
        }
  
        /* Update the VRAM usage count */
 -      if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
 -              WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
 +      if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
 +              uint64_t size = args->size;
 +
 +              if (flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM)
 +                      size >>= 1;
 +              WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + PAGE_ALIGN(size));
 +      }
  
        mutex_unlock(&p->mutex);
  
@@@ -2898,8 -2879,8 +2898,8 @@@ static int kfd_mmio_mmap(struct kfd_de
  
        address = dev->adev->rmmio_remap.bus_addr;
  
-       vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
-                               VM_DONTDUMP | VM_PFNMAP;
+       vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
+                               VM_DONTDUMP | VM_PFNMAP);
  
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
  
index 72df6286e2407efbbe405eec82a148bc8f0028cf,1fad0ecdfaeb7e64e9d692ee803b4ee8d9d406c5..7acd55a814b2f32999ba4aa9b68db91da30ccfc7
@@@ -1330,7 -1330,7 +1330,7 @@@ bool kfd_process_xnack_mode(struct kfd_
                 * per-process XNACK mode selection. But let the dev->noretry
                 * setting still influence the default XNACK mode.
                 */
 -              if (supported && KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2))
 +              if (supported && KFD_SUPPORT_XNACK_PER_PROCESS(dev))
                        continue;
  
                /* GFXv10 and later GPUs do not support shader preemption
@@@ -1563,8 -1563,6 +1563,8 @@@ err_free_pdd
  int kfd_process_device_init_vm(struct kfd_process_device *pdd,
                               struct file *drm_file)
  {
 +      struct amdgpu_fpriv *drv_priv;
 +      struct amdgpu_vm *avm;
        struct kfd_process *p;
        struct kfd_dev *dev;
        int ret;
        if (pdd->drm_priv)
                return -EBUSY;
  
 +      ret = amdgpu_file_to_fpriv(drm_file, &drv_priv);
 +      if (ret)
 +              return ret;
 +      avm = &drv_priv->vm;
 +
        p = pdd->process;
        dev = pdd->dev;
  
 -      ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, drm_file,
 +      ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, avm,
                                                     &p->kgd_process_info,
                                                     &p->ef);
        if (ret) {
        if (ret)
                goto err_init_cwsr;
  
 -      ret = amdgpu_amdkfd_gpuvm_set_vm_pasid(dev->adev, drm_file, p->pasid);
 +      ret = amdgpu_amdkfd_gpuvm_set_vm_pasid(dev->adev, avm, p->pasid);
        if (ret)
                goto err_set_pasid;
  
@@@ -1614,7 -1607,6 +1614,7 @@@ err_init_cwsr
        kfd_process_device_destroy_ib_mem(pdd);
  err_reserve_ib_mem:
        pdd->drm_priv = NULL;
 +      amdgpu_amdkfd_gpuvm_destroy_cb(dev->adev, avm);
  
        return ret;
  }
@@@ -1986,8 -1978,8 +1986,8 @@@ int kfd_reserved_mem_mmap(struct kfd_de
                return -ENOMEM;
        }
  
-       vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND
-               | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
+       vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND
+               | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP);
        /* Mapping pages to user process */
        return remap_pfn_range(vma, vma->vm_start,
                               PFN_DOWN(__pa(qpd->cwsr_kaddr)),
index 59a0bb5ebd8520c4abc0a8e2d9794c8c2c71e7ed,54c76003d2ccefab148993764e799eb388f83b6d..7a3cb08dc942e1703b0ab7e2767328d501e44821
@@@ -169,20 -169,6 +169,20 @@@ void drm_gem_private_object_init(struc
  }
  EXPORT_SYMBOL(drm_gem_private_object_init);
  
 +/**
 + * drm_gem_private_object_fini - Finalize a failed drm_gem_object
 + * @obj: drm_gem_object
 + *
 + * Uninitialize an already allocated GEM object when its initialization failed
 + */
 +void drm_gem_private_object_fini(struct drm_gem_object *obj)
 +{
 +      WARN_ON(obj->dma_buf);
 +
 +      dma_resv_fini(&obj->_resv);
 +}
 +EXPORT_SYMBOL(drm_gem_private_object_fini);
 +
  /**
   * drm_gem_object_handle_free - release resources bound to userspace handles
   * @obj: GEM object to clean up.
@@@ -944,11 -930,12 +944,11 @@@ drm_gem_release(struct drm_device *dev
  void
  drm_gem_object_release(struct drm_gem_object *obj)
  {
 -      WARN_ON(obj->dma_buf);
 -
        if (obj->filp)
                fput(obj->filp);
  
 -      dma_resv_fini(&obj->_resv);
 +      drm_gem_private_object_fini(obj);
 +
        drm_gem_free_mmap_offset(obj);
        drm_gem_lru_remove(obj);
  }
@@@ -1060,7 -1047,7 +1060,7 @@@ int drm_gem_mmap_obj(struct drm_gem_obj
                        goto err_drm_gem_object_put;
                }
  
-               vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+               vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
                vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
                vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
        }
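
The vm_flags_set()/vm_flags_clear()/vm_flags_mod()/vm_flags_reset() conversions recur throughout the driver hunks in this merge. A condensed sketch of the pattern follows; the function itself is illustrative, and only the helper names and their set/clear semantics come from the hunks in this diff.

/* Sketch of the vma->vm_flags accessor pattern applied across these hunks. */
static void example_setup_vma(struct vm_area_struct *vma, vm_flags_t flags)
{
	vm_flags_set(vma, VM_IO | VM_DONTEXPAND);	/* was: vma->vm_flags |= ...  */
	vm_flags_clear(vma, VM_MAYWRITE);		/* was: vma->vm_flags &= ~... */
	vm_flags_mod(vma, VM_MIXEDMAP, VM_PFNMAP);	/* set VM_MIXEDMAP, clear VM_PFNMAP */
	vm_flags_reset(vma, flags);			/* was: vma->vm_flags = flags */
}
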
index df89fbd2d35cd6883aeb3144e3b169f4e9e8223b,fb2c764accc63b428f280193b8a3d3acbd0ec725..870b90b78bc4eba4f25bb36de50139a528c14787
@@@ -477,8 -477,8 +477,8 @@@ drm_gem_dma_prime_import_sg_table(struc
        dma_obj->dma_addr = sg_dma_address(sgt->sgl);
        dma_obj->sgt = sgt;
  
 -      DRM_DEBUG_PRIME("dma_addr = %pad, size = %zu\n", &dma_obj->dma_addr,
 -                      attach->dmabuf->size);
 +      drm_dbg_prime(dev, "dma_addr = %pad, size = %zu\n", &dma_obj->dma_addr,
 +                    attach->dmabuf->size);
  
        return &dma_obj->base;
  }
@@@ -530,8 -530,7 +530,7 @@@ int drm_gem_dma_mmap(struct drm_gem_dma
         * the whole buffer.
         */
        vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
-       vma->vm_flags &= ~VM_PFNMAP;
-       vma->vm_flags |= VM_DONTEXPAND;
+       vm_flags_mod(vma, VM_DONTEXPAND, VM_PFNMAP);
  
        if (dma_obj->map_noncoherent) {
                vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
index 259176d78f3b9ff02f13640fdf5f2290c4f5ee7f,a2c28483e01033d93e68ae13fec8a87695ff647d..7e5c6a8d02123d80f9ff7349f4c1ce9496dd3af3
@@@ -79,10 -79,8 +79,10 @@@ __drm_gem_shmem_create(struct drm_devic
        } else {
                ret = drm_gem_object_init(dev, obj, size);
        }
 -      if (ret)
 +      if (ret) {
 +              drm_gem_private_object_fini(obj);
                goto err_free;
 +      }
  
        ret = drm_gem_create_mmap_offset(obj);
        if (ret)
@@@ -415,7 -413,7 +415,7 @@@ void drm_gem_shmem_vunmap(struct drm_ge
  }
  EXPORT_SYMBOL(drm_gem_shmem_vunmap);
  
 -static struct drm_gem_shmem_object *
 +static int
  drm_gem_shmem_create_with_handle(struct drm_file *file_priv,
                                 struct drm_device *dev, size_t size,
                                 uint32_t *handle)
  
        shmem = drm_gem_shmem_create(dev, size);
        if (IS_ERR(shmem))
 -              return shmem;
 +              return PTR_ERR(shmem);
  
        /*
         * Allocate an id of idr table where the obj is registered
        ret = drm_gem_handle_create(file_priv, &shmem->base, handle);
        /* drop reference from allocate - handle holds it now. */
        drm_gem_object_put(&shmem->base);
 -      if (ret)
 -              return ERR_PTR(ret);
  
 -      return shmem;
 +      return ret;
  }
  
  /* Update madvise status, returns true if not purged, else
@@@ -518,6 -518,7 +518,6 @@@ int drm_gem_shmem_dumb_create(struct dr
                              struct drm_mode_create_dumb *args)
  {
        u32 min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
 -      struct drm_gem_shmem_object *shmem;
  
        if (!args->pitch || !args->size) {
                args->pitch = min_pitch;
                        args->size = PAGE_ALIGN(args->pitch * args->height);
        }
  
 -      shmem = drm_gem_shmem_create_with_handle(file, dev, args->size, &args->handle);
 -
 -      return PTR_ERR_OR_ZERO(shmem);
 +      return drm_gem_shmem_create_with_handle(file, dev, args->size, &args->handle);
  }
  EXPORT_SYMBOL_GPL(drm_gem_shmem_dumb_create);
  
@@@ -630,7 -633,7 +630,7 @@@ int drm_gem_shmem_mmap(struct drm_gem_s
        if (ret)
                return ret;
  
-       vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
        vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
        if (shmem->map_wc)
                vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
@@@ -678,7 -681,23 +678,7 @@@ struct sg_table *drm_gem_shmem_get_sg_t
  }
  EXPORT_SYMBOL_GPL(drm_gem_shmem_get_sg_table);
  
 -/**
 - * drm_gem_shmem_get_pages_sgt - Pin pages, dma map them, and return a
 - *                             scatter/gather table for a shmem GEM object.
 - * @shmem: shmem GEM object
 - *
 - * This function returns a scatter/gather table suitable for driver usage. If
 - * the sg table doesn't exist, the pages are pinned, dma-mapped, and a sg
 - * table created.
 - *
 - * This is the main function for drivers to get at backing storage, and it hides
 - * and difference between dma-buf imported and natively allocated objects.
 - * drm_gem_shmem_get_sg_table() should not be directly called by drivers.
 - *
 - * Returns:
 - * A pointer to the scatter/gather table of pinned pages or errno on failure.
 - */
 -struct sg_table *drm_gem_shmem_get_pages_sgt(struct drm_gem_shmem_object *shmem)
 +static struct sg_table *drm_gem_shmem_get_pages_sgt_locked(struct drm_gem_shmem_object *shmem)
  {
        struct drm_gem_object *obj = &shmem->base;
        int ret;
  
        WARN_ON(obj->import_attach);
  
 -      ret = drm_gem_shmem_get_pages(shmem);
 +      ret = drm_gem_shmem_get_pages_locked(shmem);
        if (ret)
                return ERR_PTR(ret);
  
@@@ -711,40 -730,10 +711,40 @@@ err_free_sgt
        sg_free_table(sgt);
        kfree(sgt);
  err_put_pages:
 -      drm_gem_shmem_put_pages(shmem);
 +      drm_gem_shmem_put_pages_locked(shmem);
        return ERR_PTR(ret);
  }
 -EXPORT_SYMBOL_GPL(drm_gem_shmem_get_pages_sgt);
 +
 +/**
 + * drm_gem_shmem_get_pages_sgt - Pin pages, dma map them, and return a
 + *                             scatter/gather table for a shmem GEM object.
 + * @shmem: shmem GEM object
 + *
 + * This function returns a scatter/gather table suitable for driver usage. If
 + * the sg table doesn't exist, the pages are pinned, dma-mapped, and a sg
 + * table created.
 + *
 + * This is the main function for drivers to get at backing storage, and it hides
 + * any difference between dma-buf imported and natively allocated objects.
 + * drm_gem_shmem_get_sg_table() should not be directly called by drivers.
 + *
 + * Returns:
 + * A pointer to the scatter/gather table of pinned pages or errno on failure.
 + */
 +struct sg_table *drm_gem_shmem_get_pages_sgt(struct drm_gem_shmem_object *shmem)
 +{
 +      int ret;
 +      struct sg_table *sgt;
 +
 +      ret = mutex_lock_interruptible(&shmem->pages_lock);
 +      if (ret)
 +              return ERR_PTR(ret);
 +      sgt = drm_gem_shmem_get_pages_sgt_locked(shmem);
 +      mutex_unlock(&shmem->pages_lock);
 +
 +      return sgt;
 +}
 +EXPORT_SYMBOL(drm_gem_shmem_get_pages_sgt);
  
  /**
   * drm_gem_shmem_prime_import_sg_table - Produce a shmem GEM object from
@@@ -775,7 -764,7 +775,7 @@@ drm_gem_shmem_prime_import_sg_table(str
  
        shmem->sgt = sgt;
  
 -      DRM_DEBUG_PRIME("size = %zu\n", size);
 +      drm_dbg_prime(dev, "size = %zu\n", size);
  
        return &shmem->base;
  }
index f471e0cb72980fba5f31ebd6b1fdf355a801432b,a9276c8a3e4e704a2acce06816e112b0f745875d..50611eb7f13486702f1ceabf2489f7dc7f46bc2e
  
  #include <drm/drm.h>
  #include <drm/drm_crtc.h>
 +#include <drm/drm_crtc_helper.h>
  #include <drm/drm_fb_helper.h>
  #include <drm/drm_fourcc.h>
  #include <drm/drm_framebuffer.h>
  #include <drm/drm_gem_framebuffer_helper.h>
 +#include <drm/drm_modeset_helper.h>
  
  #include "framebuffer.h"
  #include "gem.h"
@@@ -141,7 -139,7 +141,7 @@@ static int psbfb_mmap(struct fb_info *i
         */
        vma->vm_ops = &psbfb_vm_ops;
        vma->vm_private_data = (void *)fb;
-       vma->vm_flags |= VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP);
        return 0;
  }
  
@@@ -299,6 -297,11 +299,6 @@@ static int psbfb_create(struct drm_fb_h
        info->screen_base = dev_priv->vram_addr + backing->offset;
        info->screen_size = size;
  
 -      if (dev_priv->gtt.stolen_size) {
 -              info->apertures->ranges[0].base = dev_priv->fb_base;
 -              info->apertures->ranges[0].size = dev_priv->gtt.stolen_size;
 -      }
 -
        drm_fb_helper_fill_info(info, fb_helper, sizes);
  
        info->fix.mmio_start = pci_resource_start(pdev, 0);
@@@ -409,7 -412,7 +409,7 @@@ int psb_fbdev_init(struct drm_device *d
  
        dev_priv->fb_helper = fb_helper;
  
 -      drm_fb_helper_prepare(dev, fb_helper, &psb_fb_helper_funcs);
 +      drm_fb_helper_prepare(dev, fb_helper, 32, &psb_fb_helper_funcs);
  
        ret = drm_fb_helper_init(dev, fb_helper);
        if (ret)
        /* disable all the possible outputs/crtcs before entering KMS mode */
        drm_helper_disable_unused_functions(dev);
  
 -      ret = drm_fb_helper_initial_config(fb_helper, 32);
 +      ret = drm_fb_helper_initial_config(fb_helper);
        if (ret)
                goto fini;
  
  fini:
        drm_fb_helper_fini(fb_helper);
  free:
 +      drm_fb_helper_unprepare(fb_helper);
        kfree(fb_helper);
        return ret;
  }
@@@ -440,7 -442,6 +440,7 @@@ static void psb_fbdev_fini(struct drm_d
                return;
  
        psb_fbdev_destroy(dev, dev_priv->fb_helper);
 +      drm_fb_helper_unprepare(dev_priv->fb_helper);
        kfree(dev_priv->fb_helper);
        dev_priv->fb_helper = NULL;
  }
index 2aac6bf7874039b7eeddd96055eae04adb98b974,e95f4c729ca5c3d654c340e22a741f6149f34f41..d3c1dee16af2b557520202ec07b6272a7db15066
@@@ -395,7 -395,7 +395,7 @@@ retry
        /* Finally, remap it using the new GTT offset */
        ret = remap_io_mapping(area,
                               area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT),
 -                             (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT,
 +                             (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT,
                               min_t(u64, vma->size, area->vm_end - area->vm_start),
                               &ggtt->iomap);
        if (ret)
@@@ -697,7 -697,7 +697,7 @@@ insert
        GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
  out:
        if (file)
 -              drm_vma_node_allow(&mmo->vma_node, file);
 +              drm_vma_node_allow_once(&mmo->vma_node, file);
        return mmo;
  
  err:
@@@ -979,7 -979,7 +979,7 @@@ int i915_gem_mmap(struct file *filp, st
                        i915_gem_object_put(obj);
                        return -EINVAL;
                }
-               vma->vm_flags &= ~VM_MAYWRITE;
+               vm_flags_clear(vma, VM_MAYWRITE);
        }
  
        anon = mmap_singleton(to_i915(dev));
                return PTR_ERR(anon);
        }
  
-       vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+       vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
  
        /*
         * We keep the ref on mmo->obj, not vm_file, but we require
index ec0518aa931528451284777fc777fd4f8adf0d79,28659514bf205477070228c8421c74c2189316d0..a25b28d3ee9029b26f5f7fce4de1f8fbd44e2444
  
  static int mtk_drm_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
  
 +static const struct vm_operations_struct vm_ops = {
 +      .open = drm_gem_vm_open,
 +      .close = drm_gem_vm_close,
 +};
 +
  static const struct drm_gem_object_funcs mtk_drm_gem_object_funcs = {
        .free = mtk_drm_gem_free_object,
        .get_sg_table = mtk_gem_prime_get_sg_table,
        .vmap = mtk_drm_gem_prime_vmap,
        .vunmap = mtk_drm_gem_prime_vunmap,
        .mmap = mtk_drm_gem_object_mmap,
 -      .vm_ops = &drm_gem_dma_vm_ops,
 +      .vm_ops = &vm_ops,
  };
  
  static struct mtk_drm_gem_obj *mtk_drm_gem_init(struct drm_device *dev,
@@@ -163,12 -158,14 +163,12 @@@ static int mtk_drm_gem_object_mmap(stru
         * dma_alloc_attrs() allocated a struct page table for mtk_gem, so clear
         * VM_PFNMAP flag that was set by drm_gem_mmap_obj()/drm_gem_mmap().
         */
-       vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
        vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
        vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
  
        ret = dma_mmap_attrs(priv->dma_dev, vma, mtk_gem->cookie,
                             mtk_gem->dma_addr, obj->size, mtk_gem->dma_attrs);
 -      if (ret)
 -              drm_gem_vm_close(vma);
  
        return ret;
  }
@@@ -265,6 -262,6 +265,6 @@@ void mtk_drm_gem_prime_vunmap(struct dr
                return;
  
        vunmap(vaddr);
 -      mtk_gem->kvaddr = 0;
 +      mtk_gem->kvaddr = NULL;
        kfree(mtk_gem->pages);
  }
index d6b4934fa0fd64683f81149032228abb34b8c761,19fef933904b222720cf5bdd49c8a611c3f95519..6b58a5bb7b44b4ccdf9b9e251a9e256f9a0dc9b7
@@@ -543,8 -543,7 +543,7 @@@ int omap_gem_mmap_obj(struct drm_gem_ob
  {
        struct omap_gem_object *omap_obj = to_omap_bo(obj);
  
-       vma->vm_flags &= ~VM_PFNMAP;
-       vma->vm_flags |= VM_MIXEDMAP;
+       vm_flags_mod(vma, VM_MIXEDMAP, VM_PFNMAP);
  
        if (omap_obj->flags & OMAP_BO_WC) {
                vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
@@@ -605,7 -604,7 +604,7 @@@ int omap_gem_dumb_create(struct drm_fil
  }
  
  /**
 - * omap_gem_dumb_map  -       buffer mapping for dumb interface
 + * omap_gem_dumb_map_offset - create an offset for a dumb buffer
   * @file: our drm client file
   * @dev: drm device
   * @handle: GEM handle to the object (from dumb_create)
index 3ecda6db24b846e5c2a8faf591d194488a5f6c14,c00207582c7440aa8ba884c21517cb69a9ac8c29..ca7744b852f5e717403d86169b1dbf85dcb29126
  
  #define pr_fmt(fmt) "[TTM] " fmt
  
 -#include <drm/ttm/ttm_bo_driver.h>
 +#include <drm/ttm/ttm_bo.h>
  #include <drm/ttm/ttm_placement.h>
 -#include <drm/drm_vma_manager.h>
 +#include <drm/ttm/ttm_tt.h>
 +
  #include <drm/drm_drv.h>
  #include <drm/drm_managed.h>
 -#include <linux/mm.h>
 -#include <linux/pfn_t.h>
 -#include <linux/rbtree.h>
 -#include <linux/module.h>
 -#include <linux/uaccess.h>
 -#include <linux/mem_encrypt.h>
  
  static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
                                struct vm_fault *vmf)
@@@ -441,14 -446,6 +441,14 @@@ static const struct vm_operations_struc
        .access = ttm_bo_vm_access,
  };
  
 +/**
 + * ttm_bo_mmap_obj - mmap memory backed by a ttm buffer object.
 + *
 + * @vma:       vma as input from the fbdev mmap method.
 + * @bo:        The bo backing the address space.
 + *
 + * Maps a buffer object.
 + */
  int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo)
  {
        /* Enforce no COW since would have really strange behavior with it. */
  
        vma->vm_private_data = bo;
  
-       vma->vm_flags |= VM_PFNMAP;
-       vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
        return 0;
  }
  EXPORT_SYMBOL(ttm_bo_mmap_obj);
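
A short sketch of the call site that the new ttm_bo_mmap_obj() kernel-doc above describes; the fbdev driver below and its use of info->par to reach the buffer object are assumptions for illustration only.

/* Sketch: an fbdev mmap method forwarding to TTM, per the kernel-doc above. */
static int example_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
	struct ttm_buffer_object *bo = info->par;	/* assumed driver layout */

	return ttm_bo_mmap_obj(vma, bo);
}
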
index 7c5d487ec9168e290af4fb32e60731635c5fcf4e,c6e59bc480f9066b0fe05a68461b7ce35e9a70ce..47e4c808420757a54548c91b029797461c25b4d3
@@@ -403,7 -403,7 +403,7 @@@ static int hfi1_file_mmap(struct file *
                        ret = -EPERM;
                        goto done;
                }
-               vma->vm_flags &= ~VM_MAYWRITE;
+               vm_flags_clear(vma, VM_MAYWRITE);
                addr = vma->vm_start;
                for (i = 0 ; i < uctxt->egrbufs.numbufs; i++) {
                        memlen = uctxt->egrbufs.buffers[i].len;
                goto done;
        }
  
-       vma->vm_flags = flags;
+       vm_flags_reset(vma, flags);
        hfi1_cdbg(PROC,
                  "%u:%u type:%u io/vf:%d/%d, addr:0x%llx, len:%lu(%lu), flags:0x%lx\n",
                    ctxt, subctxt, type, mapio, vmf, memaddr, memlen,
@@@ -1318,15 -1318,12 +1318,15 @@@ static int user_exp_rcv_setup(struct hf
                addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
                if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
                                 sizeof(tinfo.tidcnt)))
 -                      return -EFAULT;
 +                      ret = -EFAULT;
  
                addr = arg + offsetof(struct hfi1_tid_info, length);
 -              if (copy_to_user((void __user *)addr, &tinfo.length,
 +              if (!ret && copy_to_user((void __user *)addr, &tinfo.length,
                                 sizeof(tinfo.length)))
                        ret = -EFAULT;
 +
 +              if (ret)
 +                      hfi1_user_exp_rcv_invalid(fd, &tinfo);
        }
  
        return ret;
index dc32e4518a28053b6e4e11370a03a86089f1ca14,e3c97aa2c46cd25e80af748eb633fc29f4eb7f37..471c3455dfebd2aaa7ab58f8e743897a883ae3b4
@@@ -2087,7 -2087,7 +2087,7 @@@ static int mlx5_ib_mmap_clock_info_page
  
        if (vma->vm_flags & (VM_WRITE | VM_EXEC))
                return -EPERM;
-       vma->vm_flags &= ~VM_MAYWRITE;
+       vm_flags_clear(vma, VM_MAYWRITE);
  
        if (!dev->mdev->clock_info)
                return -EOPNOTSUPP;
@@@ -2311,7 -2311,7 +2311,7 @@@ static int mlx5_ib_mmap(struct ib_ucont
  
                if (vma->vm_flags & VM_WRITE)
                        return -EPERM;
-               vma->vm_flags &= ~VM_MAYWRITE;
+               vm_flags_clear(vma, VM_MAYWRITE);
  
                /* Don't expose to user-space information it shouldn't have */
                if (PAGE_SIZE > 4096)
@@@ -3012,63 -3012,26 +3012,63 @@@ static void mlx5_eth_lag_cleanup(struc
        }
  }
  
 -static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u32 port_num)
 +static void mlx5_netdev_notifier_register(struct mlx5_roce *roce,
 +                                        struct net_device *netdev)
  {
        int err;
  
 -      dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
 -      err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
 -      if (err) {
 -              dev->port[port_num].roce.nb.notifier_call = NULL;
 -              return err;
 -      }
 +      if (roce->tracking_netdev)
 +              return;
 +      roce->tracking_netdev = netdev;
 +      roce->nb.notifier_call = mlx5_netdev_event;
 +      err = register_netdevice_notifier_dev_net(netdev, &roce->nb, &roce->nn);
 +      WARN_ON(err);
 +}
  
 -      return 0;
 +static void mlx5_netdev_notifier_unregister(struct mlx5_roce *roce)
 +{
 +      if (!roce->tracking_netdev)
 +              return;
 +      unregister_netdevice_notifier_dev_net(roce->tracking_netdev, &roce->nb,
 +                                            &roce->nn);
 +      roce->tracking_netdev = NULL;
  }
  
 -static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u32 port_num)
 +static int mlx5e_mdev_notifier_event(struct notifier_block *nb,
 +                                   unsigned long event, void *data)
  {
 -      if (dev->port[port_num].roce.nb.notifier_call) {
 -              unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
 -              dev->port[port_num].roce.nb.notifier_call = NULL;
 +      struct mlx5_roce *roce = container_of(nb, struct mlx5_roce, mdev_nb);
 +      struct net_device *netdev = data;
 +
 +      switch (event) {
 +      case MLX5_DRIVER_EVENT_UPLINK_NETDEV:
 +              if (netdev)
 +                      mlx5_netdev_notifier_register(roce, netdev);
 +              else
 +                      mlx5_netdev_notifier_unregister(roce);
 +              break;
 +      default:
 +              return NOTIFY_DONE;
        }
 +
 +      return NOTIFY_OK;
 +}
 +
 +static void mlx5_mdev_netdev_track(struct mlx5_ib_dev *dev, u32 port_num)
 +{
 +      struct mlx5_roce *roce = &dev->port[port_num].roce;
 +
 +      roce->mdev_nb.notifier_call = mlx5e_mdev_notifier_event;
 +      mlx5_blocking_notifier_register(dev->mdev, &roce->mdev_nb);
 +      mlx5_core_uplink_netdev_event_replay(dev->mdev);
 +}
 +
 +static void mlx5_mdev_netdev_untrack(struct mlx5_ib_dev *dev, u32 port_num)
 +{
 +      struct mlx5_roce *roce = &dev->port[port_num].roce;
 +
 +      mlx5_blocking_notifier_unregister(dev->mdev, &roce->mdev_nb);
 +      mlx5_netdev_notifier_unregister(roce);
  }
  
  static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
@@@ -3175,7 -3138,7 +3175,7 @@@ static void mlx5_ib_unbind_slave_port(s
        if (mpi->mdev_events.notifier_call)
                mlx5_notifier_unregister(mpi->mdev, &mpi->mdev_events);
        mpi->mdev_events.notifier_call = NULL;
 -      mlx5_remove_netdev_notifier(ibdev, port_num);
 +      mlx5_mdev_netdev_untrack(ibdev, port_num);
        spin_lock(&port->mp.mpi_lock);
  
        comps = mpi->mdev_refcnt;
@@@ -3233,7 -3196,12 +3233,7 @@@ static bool mlx5_ib_bind_slave_port(str
        if (err)
                goto unbind;
  
 -      err = mlx5_add_netdev_notifier(ibdev, port_num);
 -      if (err) {
 -              mlx5_ib_err(ibdev, "failed adding netdev notifier for port %u\n",
 -                          port_num + 1);
 -              goto unbind;
 -      }
 +      mlx5_mdev_netdev_track(ibdev, port_num);
  
        mpi->mdev_events.notifier_call = mlx5_ib_event_slave_port;
        mlx5_notifier_register(mpi->mdev, &mpi->mdev_events);
@@@ -3941,7 -3909,9 +3941,7 @@@ static int mlx5_ib_roce_init(struct mlx
                port_num = mlx5_core_native_port_num(dev->mdev) - 1;
  
                /* Register only for native ports */
 -              err = mlx5_add_netdev_notifier(dev, port_num);
 -              if (err)
 -                      return err;
 +              mlx5_mdev_netdev_track(dev, port_num);
  
                err = mlx5_enable_eth(dev);
                if (err)
  
        return 0;
  cleanup:
 -      mlx5_remove_netdev_notifier(dev, port_num);
 +      mlx5_mdev_netdev_untrack(dev, port_num);
        return err;
  }
  
@@@ -3968,7 -3938,7 +3968,7 @@@ static void mlx5_ib_roce_cleanup(struc
                mlx5_disable_eth(dev);
  
                port_num = mlx5_core_native_port_num(dev->mdev) - 1;
 -              mlx5_remove_netdev_notifier(dev, port_num);
 +              mlx5_mdev_netdev_untrack(dev, port_num);
        }
  }
  
index e668cd89b0e48cdf3fd0a8a63618a9b7b7196bcc,dc310c7b5769ef8185d867f4249284879e5cd144..aa5f059d022271c3ffe879edbd2d5027b51c710a
@@@ -157,6 -157,10 +157,6 @@@ static vm_fault_t fb_deferred_io_track_
        /* protect against the workqueue changing the page list */
        mutex_lock(&fbdefio->lock);
  
 -      /* first write in this cycle, notify the driver */
 -      if (fbdefio->first_io && list_empty(&fbdefio->pagereflist))
 -              fbdefio->first_io(info);
 -
        pageref = fb_deferred_io_pageref_get(info, offset, page);
        if (WARN_ON_ONCE(!pageref)) {
                ret = VM_FAULT_OOM;
@@@ -228,9 -232,9 +228,9 @@@ static const struct address_space_opera
  int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
  {
        vma->vm_ops = &fb_deferred_io_vm_ops;
-       vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
        if (!(info->flags & FBINFO_VIRTFB))
-               vma->vm_flags |= VM_IO;
+               vm_flags_set(vma, VM_IO);
        vma->vm_private_data = info;
        return 0;
  }
@@@ -309,7 -313,7 +309,7 @@@ void fb_deferred_io_open(struct fb_inf
  }
  EXPORT_SYMBOL_GPL(fb_deferred_io_open);
  
 -void fb_deferred_io_cleanup(struct fb_info *info)
 +void fb_deferred_io_release(struct fb_info *info)
  {
        struct fb_deferred_io *fbdefio = info->fbdefio;
        struct page *page;
                page = fb_deferred_io_page(info, i);
                page->mapping = NULL;
        }
 +}
 +EXPORT_SYMBOL_GPL(fb_deferred_io_release);
 +
 +void fb_deferred_io_cleanup(struct fb_info *info)
 +{
 +      struct fb_deferred_io *fbdefio = info->fbdefio;
 +
 +      fb_deferred_io_release(info);
  
        kvfree(info->pagerefs);
        mutex_destroy(&fbdefio->lock);
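The fb_defio hunks above split the page-mapping teardown into fb_deferred_io_release() and switch direct writes of vma->vm_flags to the new vm_flags_set() helper. Below is a minimal sketch of an mmap handler using that helper; example_mmap() is a hypothetical file operation, not fbdev code.

/* Illustrative only: set VMA flags through vm_flags_set() instead of
 * writing vma->vm_flags directly. */
#include <linux/mm.h>
#include <linux/fs.h>

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
        /* vm_flags is no longer meant to be modified directly; the helper
         * keeps the write inside the locking-aware mm API. */
        vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
        return 0;
}

A matching vm_flags_clear() exists for removing flags, which is why open-coded "&= ~" updates disappear in conversions like the one above.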
diff --combined fs/afs/write.c
index 2d17891b618e6e94890ee454f4b4311db8cc8aa0,2d3b08b7406ca77c418a8c9bf33da1c3984b0cef..571f3b9a417e5f8a8fd5d6c900c0961da83c6eae
@@@ -704,85 -704,87 +704,87 @@@ static int afs_writepages_region(struc
                                 bool max_one_loop)
  {
        struct folio *folio;
-       struct page *head_page;
+       struct folio_batch fbatch;
        ssize_t ret;
+       unsigned int i;
        int n, skips = 0;
  
        _enter("%llx,%llx,", start, end);
+       folio_batch_init(&fbatch);
  
        do {
                pgoff_t index = start / PAGE_SIZE;
  
-               n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
-                                            PAGECACHE_TAG_DIRTY, 1, &head_page);
+               n = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
+                                       PAGECACHE_TAG_DIRTY, &fbatch);
                if (!n)
                        break;
+               for (i = 0; i < n; i++) {
+                       folio = fbatch.folios[i];
+                       start = folio_pos(folio); /* May regress with THPs */
  
-               folio = page_folio(head_page);
-               start = folio_pos(folio); /* May regress with THPs */
-               _debug("wback %lx", folio_index(folio));
+                       _debug("wback %lx", folio_index(folio));
  
-               /* At this point we hold neither the i_pages lock nor the
-                * page lock: the page may be truncated or invalidated
-                * (changing page->mapping to NULL), or even swizzled
-                * back from swapper_space to tmpfs file mapping
-                */
-               if (wbc->sync_mode != WB_SYNC_NONE) {
-                       ret = folio_lock_killable(folio);
-                       if (ret < 0) {
-                               folio_put(folio);
-                               return ret;
-                       }
-               } else {
-                       if (!folio_trylock(folio)) {
-                               folio_put(folio);
-                               return 0;
+                       /* At this point we hold neither the i_pages lock nor the
+                        * page lock: the page may be truncated or invalidated
+                        * (changing page->mapping to NULL), or even swizzled
+                        * back from swapper_space to tmpfs file mapping
+                        */
+                       if (wbc->sync_mode != WB_SYNC_NONE) {
+                               ret = folio_lock_killable(folio);
+                               if (ret < 0) {
+                                       folio_batch_release(&fbatch);
+                                       return ret;
+                               }
+                       } else {
+                               if (!folio_trylock(folio))
+                                       continue;
                        }
-               }
  
-               if (folio_mapping(folio) != mapping ||
-                   !folio_test_dirty(folio)) {
-                       start += folio_size(folio);
-                       folio_unlock(folio);
-                       folio_put(folio);
-                       continue;
-               }
+                       if (folio->mapping != mapping ||
+                           !folio_test_dirty(folio)) {
+                               start += folio_size(folio);
+                               folio_unlock(folio);
+                               continue;
+                       }
  
-               if (folio_test_writeback(folio) ||
-                   folio_test_fscache(folio)) {
-                       folio_unlock(folio);
-                       if (wbc->sync_mode != WB_SYNC_NONE) {
-                               folio_wait_writeback(folio);
+                       if (folio_test_writeback(folio) ||
+                           folio_test_fscache(folio)) {
+                               folio_unlock(folio);
+                               if (wbc->sync_mode != WB_SYNC_NONE) {
+                                       folio_wait_writeback(folio);
  #ifdef CONFIG_AFS_FSCACHE
-                               folio_wait_fscache(folio);
+                                       folio_wait_fscache(folio);
  #endif
-                       } else {
-                               start += folio_size(folio);
+                               } else {
+                                       start += folio_size(folio);
+                               }
+                               if (wbc->sync_mode == WB_SYNC_NONE) {
+                                       if (skips >= 5 || need_resched()) {
+                                               *_next = start;
+                                               _leave(" = 0 [%llx]", *_next);
+                                               return 0;
+                                       }
+                                       skips++;
+                               }
+                               continue;
                        }
-                       folio_put(folio);
-                       if (wbc->sync_mode == WB_SYNC_NONE) {
-                               if (skips >= 5 || need_resched())
-                                       break;
-                               skips++;
+                       if (!folio_clear_dirty_for_io(folio))
+                               BUG();
+                       ret = afs_write_back_from_locked_folio(mapping, wbc,
+                                       folio, start, end);
+                       if (ret < 0) {
+                               _leave(" = %zd", ret);
+                               folio_batch_release(&fbatch);
+                               return ret;
                        }
-                       continue;
-               }
  
-               if (!folio_clear_dirty_for_io(folio))
-                       BUG();
-               ret = afs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
-               folio_put(folio);
-               if (ret < 0) {
-                       _leave(" = %zd", ret);
-                       return ret;
+                       start += ret;
                }
  
-               start += ret;
-               if (max_one_loop)
-                       break;
+               folio_batch_release(&fbatch);
                cond_resched();
        } while (wbc->nr_to_write > 0);
  
@@@ -992,7 -994,7 +994,7 @@@ int afs_launder_folio(struct folio *fol
  {
        struct afs_vnode *vnode = AFS_FS_I(folio_inode(folio));
        struct iov_iter iter;
 -      struct bio_vec bv[1];
 +      struct bio_vec bv;
        unsigned long priv;
        unsigned int f, t;
        int ret = 0;
                        t = afs_folio_dirty_to(folio, priv);
                }
  
 -              bv[0].bv_page = &folio->page;
 -              bv[0].bv_offset = f;
 -              bv[0].bv_len = t - f;
 -              iov_iter_bvec(&iter, ITER_SOURCE, bv, 1, bv[0].bv_len);
 +              bvec_set_folio(&bv, folio, t - f, f);
 +              iov_iter_bvec(&iter, ITER_SOURCE, &bv, 1, bv.bv_len);
  
                trace_afs_folio_dirty(vnode, tracepoint_string("launder"), folio);
                ret = afs_store_data(vnode, &iter, folio_pos(folio) + f, true);
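The afs_writepages_region() conversion above moves from find_get_pages_range_tag() to the folio_batch based filemap_get_folios_tag(). The sketch below shows that lookup/lock/release loop in isolation; example_walk_dirty() is a hypothetical caller and the actual writeback submission is elided.

/* A minimal sketch (not AFS code) of the filemap_get_folios_tag()
 * batching pattern used in the hunk above. */
#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/sched.h>

static int example_walk_dirty(struct address_space *mapping,
                              pgoff_t index, pgoff_t end)
{
        struct folio_batch fbatch;
        unsigned int i, nr;

        folio_batch_init(&fbatch);
        while ((nr = filemap_get_folios_tag(mapping, &index, end,
                                            PAGECACHE_TAG_DIRTY, &fbatch))) {
                for (i = 0; i < nr; i++) {
                        struct folio *folio = fbatch.folios[i];

                        if (!folio_trylock(folio))
                                continue;
                        if (folio->mapping == mapping &&
                            folio_clear_dirty_for_io(folio)) {
                                /* a real caller submits writeback here,
                                 * while the folio is still locked */
                        }
                        folio_unlock(folio);
                }
                /* drop the references the batch lookup took */
                folio_batch_release(&fbatch);
                cond_resched();
        }
        return 0;
}

folio_batch_release() drops the references taken by the lookup, which is why the per-folio folio_put() calls disappear in the converted code above.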
diff --combined fs/btrfs/extent_io.c
index c25fa74d7615f7eb4e5fe9ac9aa6df5ee05d95c1,0a2d6fb611c6d15362a93c231cc36f04392cfb77..40300e8e5f99c90eec23a51f89dc731e5e5f3ba7
@@@ -36,7 -36,6 +36,7 @@@
  #include "file.h"
  #include "dev-replace.h"
  #include "super.h"
 +#include "transaction.h"
  
  static struct kmem_cache *extent_buffer_cache;
  
@@@ -100,6 -99,7 +100,6 @@@ struct btrfs_bio_ctrl 
        struct bio *bio;
        int mirror_num;
        enum btrfs_compression_type compress_type;
 -      u32 len_to_stripe_boundary;
        u32 len_to_oe_boundary;
        btrfs_bio_end_io_t end_io_func;
  
@@@ -126,7 -126,7 +126,7 @@@ static void submit_one_bio(struct btrfs
  {
        struct bio *bio;
        struct bio_vec *bv;
 -      struct btrfs_inode *inode;
 +      struct inode *inode;
        int mirror_num;
  
        if (!bio_ctrl->bio)
  
        bio = bio_ctrl->bio;
        bv = bio_first_bvec_all(bio);
 -      inode = BTRFS_I(bv->bv_page->mapping->host);
 +      inode = bv->bv_page->mapping->host;
        mirror_num = bio_ctrl->mirror_num;
  
        /* Caller should ensure the bio has at least some range added */
        ASSERT(bio->bi_iter.bi_size);
  
 -      btrfs_bio(bio)->file_offset = page_offset(bv->bv_page) + bv->bv_offset;
 -
 -      if (!is_data_inode(&inode->vfs_inode)) {
 +      if (!is_data_inode(inode)) {
                if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
                        /*
                         * For metadata read, we should have the parent_check,
                               bio_ctrl->parent_check,
                               sizeof(struct btrfs_tree_parent_check));
                }
 -              btrfs_submit_metadata_bio(inode, bio, mirror_num);
 -      } else if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 -              btrfs_submit_data_write_bio(inode, bio, mirror_num);
 -      } else {
 -              btrfs_submit_data_read_bio(inode, bio, mirror_num,
 -                                         bio_ctrl->compress_type);
 +              bio->bi_opf |= REQ_META;
        }
  
 +      if (btrfs_op(bio) == BTRFS_MAP_READ &&
 +          bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
 +              btrfs_submit_compressed_read(inode, bio, mirror_num);
 +      else
 +              btrfs_submit_bio(bio, mirror_num);
 +
        /* The bio is owned by the end_io handler now */
        bio_ctrl->bio = NULL;
  }
@@@ -514,6 -515,266 +514,6 @@@ void extent_clear_unlock_delalloc(struc
                               start, end, page_ops, NULL);
  }
  
 -static int insert_failrec(struct btrfs_inode *inode,
 -                        struct io_failure_record *failrec)
 -{
 -      struct rb_node *exist;
 -
 -      spin_lock(&inode->io_failure_lock);
 -      exist = rb_simple_insert(&inode->io_failure_tree, failrec->bytenr,
 -                               &failrec->rb_node);
 -      spin_unlock(&inode->io_failure_lock);
 -
 -      return (exist == NULL) ? 0 : -EEXIST;
 -}
 -
 -static struct io_failure_record *get_failrec(struct btrfs_inode *inode, u64 start)
 -{
 -      struct rb_node *node;
 -      struct io_failure_record *failrec = ERR_PTR(-ENOENT);
 -
 -      spin_lock(&inode->io_failure_lock);
 -      node = rb_simple_search(&inode->io_failure_tree, start);
 -      if (node)
 -              failrec = rb_entry(node, struct io_failure_record, rb_node);
 -      spin_unlock(&inode->io_failure_lock);
 -      return failrec;
 -}
 -
 -static void free_io_failure(struct btrfs_inode *inode,
 -                          struct io_failure_record *rec)
 -{
 -      spin_lock(&inode->io_failure_lock);
 -      rb_erase(&rec->rb_node, &inode->io_failure_tree);
 -      spin_unlock(&inode->io_failure_lock);
 -
 -      kfree(rec);
 -}
 -
 -static int next_mirror(const struct io_failure_record *failrec, int cur_mirror)
 -{
 -      if (cur_mirror == failrec->num_copies)
 -              return cur_mirror + 1 - failrec->num_copies;
 -      return cur_mirror + 1;
 -}
 -
 -static int prev_mirror(const struct io_failure_record *failrec, int cur_mirror)
 -{
 -      if (cur_mirror == 1)
 -              return failrec->num_copies;
 -      return cur_mirror - 1;
 -}
 -
 -/*
 - * each time an IO finishes, we do a fast check in the IO failure tree
 - * to see if we need to process or clean up an io_failure_record
 - */
 -int btrfs_clean_io_failure(struct btrfs_inode *inode, u64 start,
 -                         struct page *page, unsigned int pg_offset)
 -{
 -      struct btrfs_fs_info *fs_info = inode->root->fs_info;
 -      struct extent_io_tree *io_tree = &inode->io_tree;
 -      u64 ino = btrfs_ino(inode);
 -      u64 locked_start, locked_end;
 -      struct io_failure_record *failrec;
 -      int mirror;
 -      int ret;
 -
 -      failrec = get_failrec(inode, start);
 -      if (IS_ERR(failrec))
 -              return 0;
 -
 -      BUG_ON(!failrec->this_mirror);
 -
 -      if (sb_rdonly(fs_info->sb))
 -              goto out;
 -
 -      ret = find_first_extent_bit(io_tree, failrec->bytenr, &locked_start,
 -                                  &locked_end, EXTENT_LOCKED, NULL);
 -      if (ret || locked_start > failrec->bytenr ||
 -          locked_end < failrec->bytenr + failrec->len - 1)
 -              goto out;
 -
 -      mirror = failrec->this_mirror;
 -      do {
 -              mirror = prev_mirror(failrec, mirror);
 -              btrfs_repair_io_failure(fs_info, ino, start, failrec->len,
 -                                failrec->logical, page, pg_offset, mirror);
 -      } while (mirror != failrec->failed_mirror);
 -
 -out:
 -      free_io_failure(inode, failrec);
 -      return 0;
 -}
 -
 -/*
 - * Can be called when
 - * - hold extent lock
 - * - under ordered extent
 - * - the inode is freeing
 - */
 -void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
 -{
 -      struct io_failure_record *failrec;
 -      struct rb_node *node, *next;
 -
 -      if (RB_EMPTY_ROOT(&inode->io_failure_tree))
 -              return;
 -
 -      spin_lock(&inode->io_failure_lock);
 -      node = rb_simple_search_first(&inode->io_failure_tree, start);
 -      while (node) {
 -              failrec = rb_entry(node, struct io_failure_record, rb_node);
 -              if (failrec->bytenr > end)
 -                      break;
 -
 -              next = rb_next(node);
 -              rb_erase(&failrec->rb_node, &inode->io_failure_tree);
 -              kfree(failrec);
 -
 -              node = next;
 -      }
 -      spin_unlock(&inode->io_failure_lock);
 -}
 -
 -static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode,
 -                                                           struct btrfs_bio *bbio,
 -                                                           unsigned int bio_offset)
 -{
 -      struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 -      u64 start = bbio->file_offset + bio_offset;
 -      struct io_failure_record *failrec;
 -      const u32 sectorsize = fs_info->sectorsize;
 -      int ret;
 -
 -      failrec = get_failrec(BTRFS_I(inode), start);
 -      if (!IS_ERR(failrec)) {
 -              btrfs_debug(fs_info,
 -      "Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu",
 -                      failrec->logical, failrec->bytenr, failrec->len);
 -              /*
 -               * when data can be on disk more than twice, add to failrec here
 -               * (e.g. with a list for failed_mirror) to make
 -               * clean_io_failure() clean all those errors at once.
 -               */
 -              ASSERT(failrec->this_mirror == bbio->mirror_num);
 -              ASSERT(failrec->len == fs_info->sectorsize);
 -              return failrec;
 -      }
 -
 -      failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
 -      if (!failrec)
 -              return ERR_PTR(-ENOMEM);
 -
 -      RB_CLEAR_NODE(&failrec->rb_node);
 -      failrec->bytenr = start;
 -      failrec->len = sectorsize;
 -      failrec->failed_mirror = bbio->mirror_num;
 -      failrec->this_mirror = bbio->mirror_num;
 -      failrec->logical = (bbio->iter.bi_sector << SECTOR_SHIFT) + bio_offset;
 -
 -      btrfs_debug(fs_info,
 -                  "new io failure record logical %llu start %llu",
 -                  failrec->logical, start);
 -
 -      failrec->num_copies = btrfs_num_copies(fs_info, failrec->logical, sectorsize);
 -      if (failrec->num_copies == 1) {
 -              /*
 -               * We only have a single copy of the data, so don't bother with
 -               * all the retry and error correction code that follows. No
 -               * matter what the error is, it is very likely to persist.
 -               */
 -              btrfs_debug(fs_info,
 -                      "cannot repair logical %llu num_copies %d",
 -                      failrec->logical, failrec->num_copies);
 -              kfree(failrec);
 -              return ERR_PTR(-EIO);
 -      }
 -
 -      /* Set the bits in the private failure tree */
 -      ret = insert_failrec(BTRFS_I(inode), failrec);
 -      if (ret) {
 -              kfree(failrec);
 -              return ERR_PTR(ret);
 -      }
 -
 -      return failrec;
 -}
 -
 -int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_bbio,
 -                          u32 bio_offset, struct page *page, unsigned int pgoff,
 -                          bool submit_buffered)
 -{
 -      u64 start = failed_bbio->file_offset + bio_offset;
 -      struct io_failure_record *failrec;
 -      struct btrfs_fs_info *fs_info = inode->root->fs_info;
 -      struct bio *failed_bio = &failed_bbio->bio;
 -      const int icsum = bio_offset >> fs_info->sectorsize_bits;
 -      struct bio *repair_bio;
 -      struct btrfs_bio *repair_bbio;
 -
 -      btrfs_debug(fs_info,
 -                 "repair read error: read error at %llu", start);
 -
 -      BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 -
 -      failrec = btrfs_get_io_failure_record(&inode->vfs_inode, failed_bbio, bio_offset);
 -      if (IS_ERR(failrec))
 -              return PTR_ERR(failrec);
 -
 -      /*
 -       * There are two premises:
 -       * a) deliver good data to the caller
 -       * b) correct the bad sectors on disk
 -       *
 -       * Since we're only doing repair for one sector, we only need to get
 -       * a good copy of the failed sector and if we succeed, we have setup
 -       * everything for btrfs_repair_io_failure to do the rest for us.
 -       */
 -      failrec->this_mirror = next_mirror(failrec, failrec->this_mirror);
 -      if (failrec->this_mirror == failrec->failed_mirror) {
 -              btrfs_debug(fs_info,
 -                      "failed to repair num_copies %d this_mirror %d failed_mirror %d",
 -                      failrec->num_copies, failrec->this_mirror, failrec->failed_mirror);
 -              free_io_failure(inode, failrec);
 -              return -EIO;
 -      }
 -
 -      repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->end_io,
 -                                   failed_bbio->private);
 -      repair_bbio = btrfs_bio(repair_bio);
 -      repair_bbio->file_offset = start;
 -      repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
 -
 -      if (failed_bbio->csum) {
 -              const u32 csum_size = fs_info->csum_size;
 -
 -              repair_bbio->csum = repair_bbio->csum_inline;
 -              memcpy(repair_bbio->csum,
 -                     failed_bbio->csum + csum_size * icsum, csum_size);
 -      }
 -
 -      bio_add_page(repair_bio, page, failrec->len, pgoff);
 -      repair_bbio->iter = repair_bio->bi_iter;
 -
 -      btrfs_debug(fs_info,
 -                  "repair read error: submitting new read to mirror %d",
 -                  failrec->this_mirror);
 -
 -      /*
 -       * At this point we have a bio, so any errors from bio submission will
 -       * be handled by the endio on the repair_bio, so we can't return an
 -       * error here.
 -       */
 -      if (submit_buffered)
 -              btrfs_submit_data_read_bio(inode, repair_bio,
 -                                         failrec->this_mirror, 0);
 -      else
 -              btrfs_submit_dio_repair_bio(inode, repair_bio, failrec->this_mirror);
 -
 -      return BLK_STS_OK;
 -}
 -
  static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
  {
        struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
                btrfs_subpage_end_reader(fs_info, page, start, len);
  }
  
 -static void end_sector_io(struct page *page, u64 offset, bool uptodate)
 -{
 -      struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
 -      const u32 sectorsize = inode->root->fs_info->sectorsize;
 -
 -      end_page_read(page, uptodate, offset, sectorsize);
 -      unlock_extent(&inode->io_tree, offset, offset + sectorsize - 1, NULL);
 -}
 -
 -static void submit_data_read_repair(struct inode *inode,
 -                                  struct btrfs_bio *failed_bbio,
 -                                  u32 bio_offset, const struct bio_vec *bvec,
 -                                  unsigned int error_bitmap)
 -{
 -      const unsigned int pgoff = bvec->bv_offset;
 -      struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 -      struct page *page = bvec->bv_page;
 -      const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
 -      const u64 end = start + bvec->bv_len - 1;
 -      const u32 sectorsize = fs_info->sectorsize;
 -      const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
 -      int i;
 -
 -      BUG_ON(bio_op(&failed_bbio->bio) == REQ_OP_WRITE);
 -
 -      /* This repair is only for data */
 -      ASSERT(is_data_inode(inode));
 -
 -      /* We're here because we had some read errors or csum mismatch */
 -      ASSERT(error_bitmap);
 -
 -      /*
 -       * We only get called on buffered IO, thus page must be mapped and bio
 -       * must not be cloned.
 -       */
 -      ASSERT(page->mapping && !bio_flagged(&failed_bbio->bio, BIO_CLONED));
 -
 -      /* Iterate through all the sectors in the range */
 -      for (i = 0; i < nr_bits; i++) {
 -              const unsigned int offset = i * sectorsize;
 -              bool uptodate = false;
 -              int ret;
 -
 -              if (!(error_bitmap & (1U << i))) {
 -                      /*
 -                       * This sector has no error, just end the page read
 -                       * and unlock the range.
 -                       */
 -                      uptodate = true;
 -                      goto next;
 -              }
 -
 -              ret = btrfs_repair_one_sector(BTRFS_I(inode), failed_bbio,
 -                              bio_offset + offset, page, pgoff + offset,
 -                              true);
 -              if (!ret) {
 -                      /*
 -                       * We have submitted the read repair, the page release
 -                       * will be handled by the endio function of the
 -                       * submitted repair bio.
 -                       * Thus we don't need to do any thing here.
 -                       */
 -                      continue;
 -              }
 -              /*
 -               * Continue on failed repair, otherwise the remaining sectors
 -               * will not be properly unlocked.
 -               */
 -next:
 -              end_sector_io(page, start + offset, uptodate);
 -      }
 -}
 -
  /* lots and lots of room for performance fixes in the end_bio funcs */
  
  void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
@@@ -585,6 -919,7 +585,6 @@@ static void end_bio_extent_writepage(st
        u64 start;
        u64 end;
        struct bvec_iter_all iter_all;
 -      bool first_bvec = true;
  
        ASSERT(!bio_flagged(bio, BIO_CLONED));
        bio_for_each_segment_all(bvec, bio, iter_all) {
                start = page_offset(page) + bvec->bv_offset;
                end = start + bvec->bv_len - 1;
  
 -              if (first_bvec) {
 -                      btrfs_record_physical_zoned(inode, start, bio);
 -                      first_bvec = false;
 -              }
 -
                end_extent_writepage(page, error, start, end);
  
                btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
@@@ -753,6 -1093,8 +753,6 @@@ static void end_bio_extent_readpage(str
                struct inode *inode = page->mapping->host;
                struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
                const u32 sectorsize = fs_info->sectorsize;
 -              unsigned int error_bitmap = (unsigned int)-1;
 -              bool repair = false;
                u64 start;
                u64 end;
                u32 len;
                len = bvec->bv_len;
  
                mirror = bbio->mirror_num;
 -              if (likely(uptodate)) {
 -                      if (is_data_inode(inode)) {
 -                              error_bitmap = btrfs_verify_data_csum(bbio,
 -                                              bio_offset, page, start, end);
 -                              if (error_bitmap)
 -                                      uptodate = false;
 -                      } else {
 -                              if (btrfs_validate_metadata_buffer(bbio,
 -                                              page, start, end, mirror))
 -                                      uptodate = false;
 -                      }
 -              }
 +              if (uptodate && !is_data_inode(inode) &&
 +                  btrfs_validate_metadata_buffer(bbio, page, start, end, mirror))
 +                      uptodate = false;
  
                if (likely(uptodate)) {
                        loff_t i_size = i_size_read(inode);
                        pgoff_t end_index = i_size >> PAGE_SHIFT;
  
 -                      btrfs_clean_io_failure(BTRFS_I(inode), start, page, 0);
 -
                        /*
                         * Zero out the remaining part if this range straddles
                         * i_size.
                                zero_user_segment(page, zero_start,
                                                  offset_in_page(end) + 1);
                        }
 -              } else if (is_data_inode(inode)) {
 -                      /*
 -                       * Only try to repair bios that actually made it to a
 -                       * device.  If the bio failed to be submitted mirror
 -                       * is 0 and we need to fail it without retrying.
 -                       *
 -                       * This also includes the high level bios for compressed
 -                       * extents - these never make it to a device and repair
 -                       * is already handled on the lower compressed bio.
 -                       */
 -                      if (mirror > 0)
 -                              repair = true;
 -              } else {
 +              } else if (!is_data_inode(inode)) {
                        struct extent_buffer *eb;
  
                        eb = find_extent_buffer_readpage(fs_info, page, start);
                        atomic_dec(&eb->io_pages);
                }
  
 -              if (repair) {
 -                      /*
 -                       * submit_data_read_repair() will handle all the good
 -                       * and bad sectors, we just continue to the next bvec.
 -                       */
 -                      submit_data_read_repair(inode, bbio, bio_offset, bvec,
 -                                              error_bitmap);
 -              } else {
 -                      /* Update page status and unlock */
 -                      end_page_read(page, uptodate, start, len);
 -                      endio_readpage_release_extent(&processed, BTRFS_I(inode),
 -                                      start, end, PageUptodate(page));
 -              }
 +              /* Update page status and unlock. */
 +              end_page_read(page, uptodate, start, len);
 +              endio_readpage_release_extent(&processed, BTRFS_I(inode),
 +                                            start, end, PageUptodate(page));
  
                ASSERT(bio_offset + len > bio_offset);
                bio_offset += len;
        }
        /* Release the last extent */
        endio_readpage_release_extent(&processed, NULL, 0, 0, false);
 -      btrfs_bio_free_csum(bbio);
        bio_put(bio);
  }
  
@@@ -895,10 -1270,11 +895,10 @@@ static int btrfs_bio_add_page(struct bt
        u32 real_size;
        const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
        bool contig = false;
 -      int ret;
  
        ASSERT(bio);
        /* The limit should be calculated when bio_ctrl->bio is allocated */
 -      ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary);
 +      ASSERT(bio_ctrl->len_to_oe_boundary);
        if (bio_ctrl->compress_type != compress_type)
                return 0;
  
        if (!contig)
                return 0;
  
 -      real_size = min(bio_ctrl->len_to_oe_boundary,
 -                      bio_ctrl->len_to_stripe_boundary) - bio_size;
 -      real_size = min(real_size, size);
 +      real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size);
  
        /*
         * If real_size is 0, never call bio_add_*_page(), as even size is 0,
        if (real_size == 0)
                return 0;
  
 -      if (bio_op(bio) == REQ_OP_ZONE_APPEND)
 -              ret = bio_add_zone_append_page(bio, page, real_size, pg_offset);
 -      else
 -              ret = bio_add_page(bio, page, real_size, pg_offset);
 -
 -      return ret;
 +      return bio_add_page(bio, page, real_size, pg_offset);
  }
  
 -static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 -                             struct btrfs_inode *inode, u64 file_offset)
 +static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 +                              struct btrfs_inode *inode, u64 file_offset)
  {
 -      struct btrfs_fs_info *fs_info = inode->root->fs_info;
 -      struct btrfs_io_geometry geom;
        struct btrfs_ordered_extent *ordered;
 -      struct extent_map *em;
 -      u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
 -      int ret;
  
        /*
 -       * Pages for compressed extent are never submitted to disk directly,
 -       * thus it has no real boundary, just set them to U32_MAX.
 -       *
 -       * The split happens for real compressed bio, which happens in
 -       * btrfs_submit_compressed_read/write().
 +       * Limit the extent to the ordered boundary for Zone Append.
 +       * Compressed bios aren't submitted directly, so it doesn't apply to
 +       * them.
         */
 -      if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) {
 -              bio_ctrl->len_to_oe_boundary = U32_MAX;
 -              bio_ctrl->len_to_stripe_boundary = U32_MAX;
 -              return 0;
 -      }
 -      em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
 -      if (IS_ERR(em))
 -              return PTR_ERR(em);
 -      ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
 -                                  logical, &geom);
 -      free_extent_map(em);
 -      if (ret < 0) {
 -              return ret;
 -      }
 -      if (geom.len > U32_MAX)
 -              bio_ctrl->len_to_stripe_boundary = U32_MAX;
 -      else
 -              bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
 -
 -      if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
 -              bio_ctrl->len_to_oe_boundary = U32_MAX;
 -              return 0;
 -      }
 -
 -      /* Ordered extent not yet created, so we're good */
 -      ordered = btrfs_lookup_ordered_extent(inode, file_offset);
 -      if (!ordered) {
 -              bio_ctrl->len_to_oe_boundary = U32_MAX;
 -              return 0;
 +      if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE &&
 +          btrfs_use_zone_append(btrfs_bio(bio_ctrl->bio))) {
 +              ordered = btrfs_lookup_ordered_extent(inode, file_offset);
 +              if (ordered) {
 +                      bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
 +                                      ordered->file_offset +
 +                                      ordered->disk_num_bytes - file_offset);
 +                      btrfs_put_ordered_extent(ordered);
 +                      return;
 +              }
        }
  
 -      bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
 -              ordered->disk_bytenr + ordered->disk_num_bytes - logical);
 -      btrfs_put_ordered_extent(ordered);
 -      return 0;
 +      bio_ctrl->len_to_oe_boundary = U32_MAX;
  }
  
 -static int alloc_new_bio(struct btrfs_inode *inode,
 -                       struct btrfs_bio_ctrl *bio_ctrl,
 -                       struct writeback_control *wbc,
 -                       blk_opf_t opf,
 -                       u64 disk_bytenr, u32 offset, u64 file_offset,
 -                       enum btrfs_compression_type compress_type)
 +static void alloc_new_bio(struct btrfs_inode *inode,
 +                        struct btrfs_bio_ctrl *bio_ctrl,
 +                        struct writeback_control *wbc, blk_opf_t opf,
 +                        u64 disk_bytenr, u32 offset, u64 file_offset,
 +                        enum btrfs_compression_type compress_type)
  {
        struct btrfs_fs_info *fs_info = inode->root->fs_info;
        struct bio *bio;
 -      int ret;
  
 -      ASSERT(bio_ctrl->end_io_func);
 -
 -      bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, bio_ctrl->end_io_func, NULL);
 +      bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, inode, bio_ctrl->end_io_func,
 +                            NULL);
        /*
         * For compressed page range, its disk_bytenr is always @disk_bytenr
         * passed in, no matter if we have added any range into previous bio.
                bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
        else
                bio->bi_iter.bi_sector = (disk_bytenr + offset) >> SECTOR_SHIFT;
 +      btrfs_bio(bio)->file_offset = file_offset;
        bio_ctrl->bio = bio;
        bio_ctrl->compress_type = compress_type;
 -      ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
 -      if (ret < 0)
 -              goto error;
 +      calc_bio_boundaries(bio_ctrl, inode, file_offset);
  
        if (wbc) {
                /*
 -               * For Zone append we need the correct block_device that we are
 -               * going to write to set in the bio to be able to respect the
 -               * hardware limitation.  Look it up here:
 +               * Pick the last added device to support cgroup writeback.  For
 +               * multi-device file systems this means blk-cgroup policies have
 +               * to always be set on the last added/replaced device.
 +               * This is a bit odd but has been like that for a long time.
                 */
 -              if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 -                      struct btrfs_device *dev;
 -
 -                      dev = btrfs_zoned_get_device(fs_info, disk_bytenr,
 -                                                   fs_info->sectorsize);
 -                      if (IS_ERR(dev)) {
 -                              ret = PTR_ERR(dev);
 -                              goto error;
 -                      }
 -
 -                      bio_set_dev(bio, dev->bdev);
 -              } else {
 -                      /*
 -                       * Otherwise pick the last added device to support
 -                       * cgroup writeback.  For multi-device file systems this
 -                       * means blk-cgroup policies have to always be set on the
 -                       * last added/replaced device.  This is a bit odd but has
 -                       * been like that for a long time.
 -                       */
 -                      bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
 -              }
 +              bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
                wbc_init_bio(wbc, bio);
 -      } else {
 -              ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND);
        }
 -      return 0;
 -error:
 -      bio_ctrl->bio = NULL;
 -      btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
 -      return ret;
  }
  
  /*
@@@ -1030,6 -1472,7 +1030,6 @@@ static int submit_extent_page(blk_opf_
                              enum btrfs_compression_type compress_type,
                              bool force_bio_submit)
  {
 -      int ret = 0;
        struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
        unsigned int cur = pg_offset;
  
  
                /* Allocate new bio if needed */
                if (!bio_ctrl->bio) {
 -                      ret = alloc_new_bio(inode, bio_ctrl, wbc, opf,
 -                                          disk_bytenr, offset,
 -                                          page_offset(page) + cur,
 -                                          compress_type);
 -                      if (ret < 0)
 -                              return ret;
 +                      alloc_new_bio(inode, bio_ctrl, wbc, opf, disk_bytenr,
 +                                    offset, page_offset(page) + cur,
 +                                    compress_type);
                }
                /*
                 * We must go through btrfs_bio_add_page() to ensure each
@@@ -1608,6 -2054,10 +1608,6 @@@ static noinline_for_stack int __extent_
                 * find_next_dirty_byte() are all exclusive
                 */
                iosize = min(min(em_end, end + 1), dirty_range_end) - cur;
 -
 -              if (btrfs_use_zone_append(inode, em->block_start))
 -                      op = REQ_OP_ZONE_APPEND;
 -
                free_extent_map(em);
                em = NULL;
  
@@@ -1910,6 -2360,13 +1910,6 @@@ static void set_btree_ioerr(struct pag
         */
        mapping_set_error(page->mapping, -EIO);
  
 -      /*
 -       * If we error out, we should add back the dirty_metadata_bytes
 -       * to make it consistent.
 -       */
 -      percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
 -                               eb->len, fs_info->dirty_metadata_batch);
 -
        /*
         * If writeback for a btree extent that doesn't belong to a log tree
         * failed, increment the counter transaction->eb_write_errors.
@@@ -2408,14 -2865,14 +2408,14 @@@ int btree_write_cache_pages(struct addr
        int ret = 0;
        int done = 0;
        int nr_to_write_done = 0;
-       struct pagevec pvec;
-       int nr_pages;
+       struct folio_batch fbatch;
+       unsigned int nr_folios;
        pgoff_t index;
        pgoff_t end;            /* Inclusive */
        int scanned = 0;
        xa_mark_t tag;
  
-       pagevec_init(&pvec);
+       folio_batch_init(&fbatch);
        if (wbc->range_cyclic) {
                index = mapping->writeback_index; /* Start from prev offset */
                end = -1;
@@@ -2438,14 -2895,15 +2438,15 @@@ retry
        if (wbc->sync_mode == WB_SYNC_ALL)
                tag_pages_for_writeback(mapping, index, end);
        while (!done && !nr_to_write_done && (index <= end) &&
-              (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
-                       tag))) {
+              (nr_folios = filemap_get_folios_tag(mapping, &index, end,
+                                           tag, &fbatch))) {
                unsigned i;
  
-               for (i = 0; i < nr_pages; i++) {
-                       struct page *page = pvec.pages[i];
+               for (i = 0; i < nr_folios; i++) {
+                       struct folio *folio = fbatch.folios[i];
  
-                       ret = submit_eb_page(page, wbc, &bio_ctrl, &eb_context);
+                       ret = submit_eb_page(&folio->page, wbc, &bio_ctrl,
+                                       &eb_context);
                        if (ret == 0)
                                continue;
                        if (ret < 0) {
                         */
                        nr_to_write_done = wbc->nr_to_write <= 0;
                }
-               pagevec_release(&pvec);
+               folio_batch_release(&fbatch);
                cond_resched();
        }
        if (!scanned && !done) {
@@@ -2535,8 -2993,8 +2536,8 @@@ static int extent_write_cache_pages(str
        int ret = 0;
        int done = 0;
        int nr_to_write_done = 0;
-       struct pagevec pvec;
-       int nr_pages;
+       struct folio_batch fbatch;
+       unsigned int nr_folios;
        pgoff_t index;
        pgoff_t end;            /* Inclusive */
        pgoff_t done_index;
        if (!igrab(inode))
                return 0;
  
-       pagevec_init(&pvec);
+       folio_batch_init(&fbatch);
        if (wbc->range_cyclic) {
                index = mapping->writeback_index; /* Start from prev offset */
                end = -1;
@@@ -2594,14 -3052,14 +2595,14 @@@ retry
                tag_pages_for_writeback(mapping, index, end);
        done_index = index;
        while (!done && !nr_to_write_done && (index <= end) &&
-                       (nr_pages = pagevec_lookup_range_tag(&pvec, mapping,
-                                               &index, end, tag))) {
+                       (nr_folios = filemap_get_folios_tag(mapping, &index,
+                                                       end, tag, &fbatch))) {
                unsigned i;
  
-               for (i = 0; i < nr_pages; i++) {
-                       struct page *page = pvec.pages[i];
+               for (i = 0; i < nr_folios; i++) {
+                       struct folio *folio = fbatch.folios[i];
  
-                       done_index = page->index + 1;
+                       done_index = folio->index + folio_nr_pages(folio);
                        /*
                         * At this point we hold neither the i_pages lock nor
                         * the page lock: the page may be truncated or
                         * or even swizzled back from swapper_space to
                         * tmpfs file mapping
                         */
-                       if (!trylock_page(page)) {
+                       if (!folio_trylock(folio)) {
                                submit_write_bio(bio_ctrl, 0);
-                               lock_page(page);
+                               folio_lock(folio);
                        }
  
-                       if (unlikely(page->mapping != mapping)) {
-                               unlock_page(page);
+                       if (unlikely(folio->mapping != mapping)) {
+                               folio_unlock(folio);
                                continue;
                        }
  
                        if (wbc->sync_mode != WB_SYNC_NONE) {
-                               if (PageWriteback(page))
+                               if (folio_test_writeback(folio))
                                        submit_write_bio(bio_ctrl, 0);
-                               wait_on_page_writeback(page);
+                               folio_wait_writeback(folio);
                        }
  
-                       if (PageWriteback(page) ||
-                           !clear_page_dirty_for_io(page)) {
-                               unlock_page(page);
+                       if (folio_test_writeback(folio) ||
+                           !folio_clear_dirty_for_io(folio)) {
+                               folio_unlock(folio);
                                continue;
                        }
  
-                       ret = __extent_writepage(page, wbc, bio_ctrl);
+                       ret = __extent_writepage(&folio->page, wbc, bio_ctrl);
                        if (ret < 0) {
                                done = 1;
                                break;
                         */
                        nr_to_write_done = wbc->nr_to_write <= 0;
                }
-               pagevec_release(&pvec);
+               folio_batch_release(&fbatch);
                cond_resched();
        }
        if (!scanned && !done) {
@@@ -3369,7 -3827,6 +3370,7 @@@ int extent_fiemap(struct btrfs_inode *i
        lockend = round_up(start + len, inode->root->fs_info->sectorsize);
        prev_extent_end = lockstart;
  
 +      btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
        lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
  
        ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end);
@@@ -3563,7 -4020,6 +3564,7 @@@ check_eof_delalloc
  
  out_unlock:
        unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 +      btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
  out:
        free_extent_state(delalloc_cached_state);
        btrfs_free_backref_share_ctx(backref_ctx);
@@@ -4267,25 -4723,12 +4268,25 @@@ static void clear_subpage_extent_buffer
        WARN_ON(atomic_read(&eb->refs) == 0);
  }
  
 -void clear_extent_buffer_dirty(const struct extent_buffer *eb)
 +void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 +                            struct extent_buffer *eb)
  {
 +      struct btrfs_fs_info *fs_info = eb->fs_info;
        int i;
        int num_pages;
        struct page *page;
  
 +      btrfs_assert_tree_write_locked(eb);
 +
 +      if (trans && btrfs_header_generation(eb) != trans->transid)
 +              return;
 +
 +      if (!test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags))
 +              return;
 +
 +      percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, -eb->len,
 +                               fs_info->dirty_metadata_batch);
 +
        if (eb->fs_info->nodesize < PAGE_SIZE)
                return clear_subpage_extent_buffer_dirty(eb);
  
diff --combined fs/buffer.c
index 623e77d6ef770b1ce8e6db156f78d4bbc5f021af,7e42d67bcaadf7178d9a8d6789af0d402ced2473..9e1e2add541e07a593bcc743d73eb05ed51bfbd9
@@@ -48,7 -48,6 +48,7 @@@
  #include <linux/sched/mm.h>
  #include <trace/events/block.h>
  #include <linux/fscrypt.h>
 +#include <linux/fsverity.h>
  
  #include "internal.h"
  
@@@ -61,7 -60,7 +61,7 @@@ static void submit_bh_wbc(blk_opf_t opf
  inline void touch_buffer(struct buffer_head *bh)
  {
        trace_block_touch_buffer(bh);
-       mark_page_accessed(bh->b_page);
+       folio_mark_accessed(bh->b_folio);
  }
  EXPORT_SYMBOL(touch_buffer);
  
@@@ -247,18 -246,18 +247,18 @@@ static void end_buffer_async_read(struc
        unsigned long flags;
        struct buffer_head *first;
        struct buffer_head *tmp;
-       struct page *page;
-       int page_uptodate = 1;
+       struct folio *folio;
+       int folio_uptodate = 1;
  
        BUG_ON(!buffer_async_read(bh));
  
-       page = bh->b_page;
+       folio = bh->b_folio;
        if (uptodate) {
                set_buffer_uptodate(bh);
        } else {
                clear_buffer_uptodate(bh);
                buffer_io_error(bh, ", async page read");
-               SetPageError(page);
+               folio_set_error(folio);
        }
  
        /*
         * two buffer heads end IO at almost the same time and both
         * decide that the page is now completely done.
         */
-       first = page_buffers(page);
+       first = folio_buffers(folio);
        spin_lock_irqsave(&first->b_uptodate_lock, flags);
        clear_buffer_async_read(bh);
        unlock_buffer(bh);
        tmp = bh;
        do {
                if (!buffer_uptodate(tmp))
-                       page_uptodate = 0;
+                       folio_uptodate = 0;
                if (buffer_async_read(tmp)) {
                        BUG_ON(!buffer_locked(tmp));
                        goto still_busy;
         * If all of the buffers are uptodate then we can set the page
         * uptodate.
         */
-       if (page_uptodate)
-               SetPageUptodate(page);
-       unlock_page(page);
+       if (folio_uptodate)
+               folio_mark_uptodate(folio);
+       folio_unlock(folio);
        return;
  
  still_busy:
        return;
  }
  
 -struct decrypt_bh_ctx {
 +struct postprocess_bh_ctx {
        struct work_struct work;
        struct buffer_head *bh;
  };
  
 +static void verify_bh(struct work_struct *work)
 +{
 +      struct postprocess_bh_ctx *ctx =
 +              container_of(work, struct postprocess_bh_ctx, work);
 +      struct buffer_head *bh = ctx->bh;
 +      bool valid;
 +
 +      valid = fsverity_verify_blocks(page_folio(bh->b_page), bh->b_size,
 +                                     bh_offset(bh));
 +      end_buffer_async_read(bh, valid);
 +      kfree(ctx);
 +}
 +
 +static bool need_fsverity(struct buffer_head *bh)
 +{
 +      struct page *page = bh->b_page;
 +      struct inode *inode = page->mapping->host;
 +
 +      return fsverity_active(inode) &&
 +              /* needed by ext4 */
 +              page->index < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
 +}
 +
  static void decrypt_bh(struct work_struct *work)
  {
 -      struct decrypt_bh_ctx *ctx =
 -              container_of(work, struct decrypt_bh_ctx, work);
 +      struct postprocess_bh_ctx *ctx =
 +              container_of(work, struct postprocess_bh_ctx, work);
        struct buffer_head *bh = ctx->bh;
        int err;
  
 -      err = fscrypt_decrypt_pagecache_blocks(bh->b_page, bh->b_size,
 -                                             bh_offset(bh));
 +      err = fscrypt_decrypt_pagecache_blocks(page_folio(bh->b_page),
 +                                             bh->b_size, bh_offset(bh));
 +      if (err == 0 && need_fsverity(bh)) {
 +              /*
 +               * We use different work queues for decryption and for verity
 +               * because verity may require reading metadata pages that need
 +               * decryption, and we shouldn't recurse to the same workqueue.
 +               */
 +              INIT_WORK(&ctx->work, verify_bh);
 +              fsverity_enqueue_verify_work(&ctx->work);
 +              return;
 +      }
        end_buffer_async_read(bh, err == 0);
        kfree(ctx);
  }
   */
  static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
  {
-       struct inode *inode = bh->b_page->mapping->host;
 -      /* Decrypt if needed */
 -      if (uptodate &&
 -          fscrypt_inode_uses_fs_layer_crypto(bh->b_folio->mapping->host)) {
 -              struct decrypt_bh_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
++      struct inode *inode = bh->b_folio->mapping->host;
 +      bool decrypt = fscrypt_inode_uses_fs_layer_crypto(inode);
 +      bool verify = need_fsverity(bh);
 +
 +      /* Decrypt (with fscrypt) and/or verify (with fsverity) if needed. */
 +      if (uptodate && (decrypt || verify)) {
 +              struct postprocess_bh_ctx *ctx =
 +                      kmalloc(sizeof(*ctx), GFP_ATOMIC);
  
                if (ctx) {
 -                      INIT_WORK(&ctx->work, decrypt_bh);
                        ctx->bh = bh;
 -                      fscrypt_enqueue_decrypt_work(&ctx->work);
 +                      if (decrypt) {
 +                              INIT_WORK(&ctx->work, decrypt_bh);
 +                              fscrypt_enqueue_decrypt_work(&ctx->work);
 +                      } else {
 +                              INIT_WORK(&ctx->work, verify_bh);
 +                              fsverity_enqueue_verify_work(&ctx->work);
 +                      }
                        return;
                }
                uptodate = 0;
@@@ -387,21 -344,21 +387,21 @@@ void end_buffer_async_write(struct buff
        unsigned long flags;
        struct buffer_head *first;
        struct buffer_head *tmp;
-       struct page *page;
+       struct folio *folio;
  
        BUG_ON(!buffer_async_write(bh));
  
-       page = bh->b_page;
+       folio = bh->b_folio;
        if (uptodate) {
                set_buffer_uptodate(bh);
        } else {
                buffer_io_error(bh, ", lost async page write");
                mark_buffer_write_io_error(bh);
                clear_buffer_uptodate(bh);
-               SetPageError(page);
+               folio_set_error(folio);
        }
  
-       first = page_buffers(page);
+       first = folio_buffers(folio);
        spin_lock_irqsave(&first->b_uptodate_lock, flags);
  
        clear_buffer_async_write(bh);
                tmp = tmp->b_this_page;
        }
        spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
-       end_page_writeback(page);
+       folio_end_writeback(folio);
        return;
  
  still_busy:
@@@ -613,7 -570,7 +613,7 @@@ void write_boundary_block(struct block_
  void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
  {
        struct address_space *mapping = inode->i_mapping;
-       struct address_space *buffer_mapping = bh->b_page->mapping;
+       struct address_space *buffer_mapping = bh->b_folio->mapping;
  
        mark_buffer_dirty(bh);
        if (!mapping->private_data) {
@@@ -1116,7 -1073,7 +1116,7 @@@ __getblk_slow(struct block_device *bdev
   * and then attach the address_space's inode to its superblock's dirty
   * inode list.
   *
-  * mark_buffer_dirty() is atomic.  It takes bh->b_page->mapping->private_lock,
+  * mark_buffer_dirty() is atomic.  It takes bh->b_folio->mapping->private_lock,
   * i_pages lock and mapping->host->i_lock.
   */
  void mark_buffer_dirty(struct buffer_head *bh)
        }
  
        if (!test_set_buffer_dirty(bh)) {
-               struct page *page = bh->b_page;
+               struct folio *folio = bh->b_folio;
                struct address_space *mapping = NULL;
  
-               lock_page_memcg(page);
-               if (!TestSetPageDirty(page)) {
-                       mapping = page_mapping(page);
+               folio_memcg_lock(folio);
+               if (!folio_test_set_dirty(folio)) {
+                       mapping = folio->mapping;
                        if (mapping)
-                               __set_page_dirty(page, mapping, 0);
+                               __folio_mark_dirty(folio, mapping, 0);
                }
-               unlock_page_memcg(page);
+               folio_memcg_unlock(folio);
                if (mapping)
                        __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
        }
@@@ -1160,8 -1117,8 +1160,8 @@@ void mark_buffer_write_io_error(struct 
  
        set_buffer_write_io_error(bh);
        /* FIXME: do we need to set this in both places? */
-       if (bh->b_page && bh->b_page->mapping)
-               mapping_set_error(bh->b_page->mapping, -EIO);
+       if (bh->b_folio && bh->b_folio->mapping)
+               mapping_set_error(bh->b_folio->mapping, -EIO);
        if (bh->b_assoc_map)
                mapping_set_error(bh->b_assoc_map, -EIO);
        rcu_read_lock();
@@@ -1197,7 -1154,7 +1197,7 @@@ void __bforget(struct buffer_head *bh
  {
        clear_buffer_dirty(bh);
        if (bh->b_assoc_map) {
-               struct address_space *buffer_mapping = bh->b_page->mapping;
+               struct address_space *buffer_mapping = bh->b_folio->mapping;
  
                spin_lock(&buffer_mapping->private_lock);
                list_del_init(&bh->b_assoc_buffers);
@@@ -2288,11 -2245,6 +2288,11 @@@ int block_read_full_folio(struct folio 
        int nr, i;
        int fully_mapped = 1;
        bool page_error = false;
 +      loff_t limit = i_size_read(inode);
 +
 +      /* This is needed for ext4. */
 +      if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
 +              limit = inode->i_sb->s_maxbytes;
  
        VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
  
        bbits = block_size_bits(blocksize);
  
        iblock = (sector_t)folio->index << (PAGE_SHIFT - bbits);
 -      lblock = (i_size_read(inode)+blocksize-1) >> bbits;
 +      lblock = (limit+blocksize-1) >> bbits;
        bh = head;
        nr = 0;
        i = 0;
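The fs/buffer.c hunks above generalize the decrypt-only context into postprocess_bh_ctx so that read-completion work (fscrypt decryption, fsverity verification) can be deferred to a workqueue. Below is a generic sketch of that defer-to-worker pattern; the example_* names are hypothetical and schedule_work() stands in for the dedicated fscrypt/fsverity queues.

/* Generic sketch of the deferred-work pattern: stash the object in a small
 * context, queue a work item, and recover the context with container_of()
 * in the worker. */
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/container_of.h>

struct example_ctx {
        struct work_struct work;
        void *payload;
};

static void example_worker(struct work_struct *work)
{
        struct example_ctx *ctx = container_of(work, struct example_ctx, work);

        /* heavy post-processing of ctx->payload runs in process context */
        kfree(ctx);
}

static bool example_defer(void *payload)
{
        /* GFP_ATOMIC because completion handlers may run in interrupt context */
        struct example_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);

        if (!ctx)
                return false;   /* caller falls back to handling it inline */
        ctx->payload = payload;
        INIT_WORK(&ctx->work, example_worker);
        schedule_work(&ctx->work);
        return true;
}

As in end_buffer_async_read_io() above, the allocation failure path matters: when the context cannot be allocated the completion is handled inline (or marked as an error) rather than silently dropped.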
diff --combined fs/ceph/addr.c
index cac4083e387a50964674282ebbd0664f658c64d4,905268bf974109ed531c357fc6650a023da7a68e..d5335f445233600d81568432f7e1e41e7904df45
@@@ -305,7 -305,7 +305,7 @@@ static void ceph_netfs_issue_read(struc
        struct inode *inode = rreq->inode;
        struct ceph_inode_info *ci = ceph_inode(inode);
        struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 -      struct ceph_osd_request *req;
 +      struct ceph_osd_request *req = NULL;
        struct ceph_vino vino = ceph_vino(inode);
        struct iov_iter iter;
        struct page **pages;
        int err = 0;
        u64 len = subreq->len;
  
 +      if (ceph_inode_is_shutdown(inode)) {
 +              err = -EIO;
 +              goto out;
 +      }
 +
        if (ceph_has_inline_data(ci) && ceph_netfs_issue_op_inline(subreq))
                return;
  
@@@ -568,9 -563,6 +568,9 @@@ static int writepage_nounlock(struct pa
  
        dout("writepage %p idx %lu\n", page, page->index);
  
 +      if (ceph_inode_is_shutdown(inode))
 +              return -EIO;
 +
        /* verify this is a writeable snap context */
        snapc = page_snap_context(page);
        if (!snapc) {
@@@ -800,7 -792,7 +800,7 @@@ static int ceph_writepages_start(struc
        struct ceph_vino vino = ceph_vino(inode);
        pgoff_t index, start_index, end = -1;
        struct ceph_snap_context *snapc = NULL, *last_snapc = NULL, *pgsnapc;
-       struct pagevec pvec;
+       struct folio_batch fbatch;
        int rc = 0;
        unsigned int wsize = i_blocksize(inode);
        struct ceph_osd_request *req = NULL;
        if (fsc->mount_options->wsize < wsize)
                wsize = fsc->mount_options->wsize;
  
-       pagevec_init(&pvec);
+       folio_batch_init(&fbatch);
  
        start_index = wbc->range_cyclic ? mapping->writeback_index : 0;
        index = start_index;
@@@ -877,7 -869,7 +877,7 @@@ retry
  
        while (!done && index <= end) {
                int num_ops = 0, op_idx;
-               unsigned i, pvec_pages, max_pages, locked_pages = 0;
+               unsigned i, nr_folios, max_pages, locked_pages = 0;
                struct page **pages = NULL, **data_pages;
                struct page *page;
                pgoff_t strip_unit_end = 0;
                max_pages = wsize >> PAGE_SHIFT;
  
  get_more_pages:
-               pvec_pages = pagevec_lookup_range_tag(&pvec, mapping, &index,
-                                               end, PAGECACHE_TAG_DIRTY);
-               dout("pagevec_lookup_range_tag got %d\n", pvec_pages);
-               if (!pvec_pages && !locked_pages)
+               nr_folios = filemap_get_folios_tag(mapping, &index,
+                               end, PAGECACHE_TAG_DIRTY, &fbatch);
+               dout("filemap_get_folios_tag got %d\n", nr_folios);
+               if (!nr_folios && !locked_pages)
                        break;
-               for (i = 0; i < pvec_pages && locked_pages < max_pages; i++) {
-                       page = pvec.pages[i];
+               for (i = 0; i < nr_folios && locked_pages < max_pages; i++) {
+                       page = &fbatch.folios[i]->page;
                        dout("? %p idx %lu\n", page, page->index);
                        if (locked_pages == 0)
                                lock_page(page);  /* first page */
                                len = 0;
                        }
  
-                       /* note position of first page in pvec */
+                       /* note position of first page in fbatch */
                        dout("%p will write page %p idx %lu\n",
                             inode, page, page->index);
  
                                fsc->write_congested = true;
  
                        pages[locked_pages++] = page;
-                       pvec.pages[i] = NULL;
+                       fbatch.folios[i] = NULL;
  
                        len += thp_size(page);
                }
  
                /* did we get anything? */
                if (!locked_pages)
-                       goto release_pvec_pages;
+                       goto release_folios;
                if (i) {
                        unsigned j, n = 0;
-                       /* shift unused page to beginning of pvec */
-                       for (j = 0; j < pvec_pages; j++) {
-                               if (!pvec.pages[j])
+                       /* shift unused page to beginning of fbatch */
+                       for (j = 0; j < nr_folios; j++) {
+                               if (!fbatch.folios[j])
                                        continue;
                                if (n < j)
-                                       pvec.pages[n] = pvec.pages[j];
+                                       fbatch.folios[n] = fbatch.folios[j];
                                n++;
                        }
-                       pvec.nr = n;
+                       fbatch.nr = n;
  
-                       if (pvec_pages && i == pvec_pages &&
+                       if (nr_folios && i == nr_folios &&
                            locked_pages < max_pages) {
-                               dout("reached end pvec, trying for more\n");
-                               pagevec_release(&pvec);
+                               dout("reached end fbatch, trying for more\n");
+                               folio_batch_release(&fbatch);
                                goto get_more_pages;
                        }
                }
@@@ -1172,10 -1164,10 +1172,10 @@@ new_request
                if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
                        done = true;
  
- release_pvec_pages:
-               dout("pagevec_release on %d pages (%p)\n", (int)pvec.nr,
-                    pvec.nr ? pvec.pages[0] : NULL);
-               pagevec_release(&pvec);
+ release_folios:
+               dout("folio_batch release on %d folios (%p)\n", (int)fbatch.nr,
+                    fbatch.nr ? fbatch.folios[0] : NULL);
+               folio_batch_release(&fbatch);
        }
  
        if (should_loop && !done) {
                        unsigned i, nr;
                        index = 0;
                        while ((index <= end) &&
-                              (nr = pagevec_lookup_tag(&pvec, mapping, &index,
-                                               PAGECACHE_TAG_WRITEBACK))) {
+                              (nr = filemap_get_folios_tag(mapping, &index,
+                                               (pgoff_t)-1,
+                                               PAGECACHE_TAG_WRITEBACK,
+                                               &fbatch))) {
                                for (i = 0; i < nr; i++) {
-                                       page = pvec.pages[i];
+                                       page = &fbatch.folios[i]->page;
                                        if (page_snap_context(page) != snapc)
                                                continue;
                                        wait_on_page_writeback(page);
                                }
-                               pagevec_release(&pvec);
+                               folio_batch_release(&fbatch);
                                cond_resched();
                        }
                }
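
The ceph conversion above, like the cifs one further down, follows the same shape everywhere: a folio_batch filled by filemap_get_folios_tag() replaces the old pagevec plus pagevec_lookup_range_tag(), the caller walks fbatch.folios[], and folio_batch_release() drops the references before the next pass. A minimal sketch of that loop, assuming only a hypothetical per-folio callback passed in by the caller:

    #include <linux/pagemap.h>
    #include <linux/pagevec.h>
    #include <linux/sched.h>

    static void walk_dirty_folios(struct address_space *mapping,
                                  pgoff_t index, pgoff_t end,
                                  void (*process_dirty_folio)(struct folio *))
    {
            struct folio_batch fbatch;
            unsigned int i, nr;

            folio_batch_init(&fbatch);
            while ((nr = filemap_get_folios_tag(mapping, &index, end,
                                                PAGECACHE_TAG_DIRTY,
                                                &fbatch))) {
                    for (i = 0; i < nr; i++)
                            process_dirty_folio(fbatch.folios[i]);
                    /* Drop the references taken by the lookup. */
                    folio_batch_release(&fbatch);
                    cond_resched();
            }
    }
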
@@@ -1651,7 -1645,7 +1653,7 @@@ int ceph_uninline_data(struct file *fil
        struct ceph_inode_info *ci = ceph_inode(inode);
        struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
        struct ceph_osd_request *req = NULL;
 -      struct ceph_cap_flush *prealloc_cf;
 +      struct ceph_cap_flush *prealloc_cf = NULL;
        struct folio *folio = NULL;
        u64 inline_version = CEPH_INLINE_NONE;
        struct page *pages[1];
        dout("uninline_data %p %llx.%llx inline_version %llu\n",
             inode, ceph_vinop(inode), inline_version);
  
 +      if (ceph_inode_is_shutdown(inode)) {
 +              err = -EIO;
 +              goto out;
 +      }
 +
        if (inline_version == CEPH_INLINE_NONE)
                return 0;
  
diff --combined fs/cifs/file.c
index 0e602173ac76c8abf04e492632ceb36660cf302b,162fab5a4583d88ca21c66b4abde0888dd94c31f..5365a329908884712935fd3d5551f6bfc4be1f4c
@@@ -9,7 -9,6 +9,7 @@@
   *
   */
  #include <linux/fs.h>
 +#include <linux/filelock.h>
  #include <linux/backing-dev.h>
  #include <linux/stat.h>
  #include <linux/fcntl.h>
  #include "cifs_ioctl.h"
  #include "cached_dir.h"
  
 +/*
 + * Remove the dirty flags from a span of pages.
 + */
 +static void cifs_undirty_folios(struct inode *inode, loff_t start, unsigned int len)
 +{
 +      struct address_space *mapping = inode->i_mapping;
 +      struct folio *folio;
 +      pgoff_t end;
 +
 +      XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
 +
 +      rcu_read_lock();
 +
 +      end = (start + len - 1) / PAGE_SIZE;
 +      xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) {
 +              xas_pause(&xas);
 +              rcu_read_unlock();
 +              folio_lock(folio);
 +              folio_clear_dirty_for_io(folio);
 +              folio_unlock(folio);
 +              rcu_read_lock();
 +      }
 +
 +      rcu_read_unlock();
 +}
 +
 +/*
 + * Completion of write to server.
 + */
 +void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
 +{
 +      struct address_space *mapping = inode->i_mapping;
 +      struct folio *folio;
 +      pgoff_t end;
 +
 +      XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
 +
 +      if (!len)
 +              return;
 +
 +      rcu_read_lock();
 +
 +      end = (start + len - 1) / PAGE_SIZE;
 +      xas_for_each(&xas, folio, end) {
 +              if (!folio_test_writeback(folio)) {
 +                      WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
 +                                len, start, folio_index(folio), end);
 +                      continue;
 +              }
 +
 +              folio_detach_private(folio);
 +              folio_end_writeback(folio);
 +      }
 +
 +      rcu_read_unlock();
 +}
 +
 +/*
 + * Failure of write to server.
 + */
 +void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len)
 +{
 +      struct address_space *mapping = inode->i_mapping;
 +      struct folio *folio;
 +      pgoff_t end;
 +
 +      XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
 +
 +      if (!len)
 +              return;
 +
 +      rcu_read_lock();
 +
 +      end = (start + len - 1) / PAGE_SIZE;
 +      xas_for_each(&xas, folio, end) {
 +              if (!folio_test_writeback(folio)) {
 +                      WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
 +                                len, start, folio_index(folio), end);
 +                      continue;
 +              }
 +
 +              folio_set_error(folio);
 +              folio_end_writeback(folio);
 +      }
 +
 +      rcu_read_unlock();
 +}
 +
 +/*
 + * Redirty pages after a temporary failure.
 + */
 +void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len)
 +{
 +      struct address_space *mapping = inode->i_mapping;
 +      struct folio *folio;
 +      pgoff_t end;
 +
 +      XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
 +
 +      if (!len)
 +              return;
 +
 +      rcu_read_lock();
 +
 +      end = (start + len - 1) / PAGE_SIZE;
 +      xas_for_each(&xas, folio, end) {
 +              if (!folio_test_writeback(folio)) {
 +                      WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
 +                                len, start, folio_index(folio), end);
 +                      continue;
 +              }
 +
 +              filemap_dirty_folio(folio->mapping, folio);
 +              folio_end_writeback(folio);
 +      }
 +
 +      rcu_read_unlock();
 +}
 +
  /*
   * Mark as invalid, all open files on tree connections since they
   * were closed when session to server was lost.
@@@ -380,15 -260,14 +380,15 @@@ static int cifs_nt_open(const char *ful
        if (f_flags & O_DIRECT)
                create_options |= CREATE_NO_BUFFER;
  
 -      oparms.tcon = tcon;
 -      oparms.cifs_sb = cifs_sb;
 -      oparms.desired_access = desired_access;
 -      oparms.create_options = cifs_create_options(cifs_sb, create_options);
 -      oparms.disposition = disposition;
 -      oparms.path = full_path;
 -      oparms.fid = fid;
 -      oparms.reconnect = false;
 +      oparms = (struct cifs_open_parms) {
 +              .tcon = tcon,
 +              .cifs_sb = cifs_sb,
 +              .desired_access = desired_access,
 +              .create_options = cifs_create_options(cifs_sb, create_options),
 +              .disposition = disposition,
 +              .path = full_path,
 +              .fid = fid,
 +      };
  
        rc = server->ops->open(xid, &oparms, oplock, buf);
        if (rc)
@@@ -969,16 -848,14 +969,16 @@@ cifs_reopen_file(struct cifsFileInfo *c
        if (server->ops->get_lease_key)
                server->ops->get_lease_key(inode, &cfile->fid);
  
 -      oparms.tcon = tcon;
 -      oparms.cifs_sb = cifs_sb;
 -      oparms.desired_access = desired_access;
 -      oparms.create_options = cifs_create_options(cifs_sb, create_options);
 -      oparms.disposition = disposition;
 -      oparms.path = full_path;
 -      oparms.fid = &cfile->fid;
 -      oparms.reconnect = true;
 +      oparms = (struct cifs_open_parms) {
 +              .tcon = tcon,
 +              .cifs_sb = cifs_sb,
 +              .desired_access = desired_access,
 +              .create_options = cifs_create_options(cifs_sb, create_options),
 +              .disposition = disposition,
 +              .path = full_path,
 +              .fid = &cfile->fid,
 +              .reconnect = true,
 +      };
  
        /*
         * Can not refresh inode by passing in file_info buf to be returned by
@@@ -2418,6 -2295,7 +2418,6 @@@ cifs_writedata_release(struct kref *ref
        if (wdata->cfile)
                cifsFileInfo_put(wdata->cfile);
  
 -      kvfree(wdata->pages);
        kfree(wdata);
  }
  
  static void
  cifs_writev_requeue(struct cifs_writedata *wdata)
  {
 -      int i, rc = 0;
 +      int rc = 0;
        struct inode *inode = d_inode(wdata->cfile->dentry);
        struct TCP_Server_Info *server;
 -      unsigned int rest_len;
 +      unsigned int rest_len = wdata->bytes;
 +      loff_t fpos = wdata->offset;
  
        server = tlink_tcon(wdata->cfile->tlink)->ses->server;
 -      i = 0;
 -      rest_len = wdata->bytes;
        do {
                struct cifs_writedata *wdata2;
 -              unsigned int j, nr_pages, wsize, tailsz, cur_len;
 +              unsigned int wsize, cur_len;
  
                wsize = server->ops->wp_retry_size(inode);
                if (wsize < rest_len) {
 -                      nr_pages = wsize / PAGE_SIZE;
 -                      if (!nr_pages) {
 +                      if (wsize < PAGE_SIZE) {
                                rc = -EOPNOTSUPP;
                                break;
                        }
 -                      cur_len = nr_pages * PAGE_SIZE;
 -                      tailsz = PAGE_SIZE;
 +                      cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
                } else {
 -                      nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE);
                        cur_len = rest_len;
 -                      tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
                }
  
 -              wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
 +              wdata2 = cifs_writedata_alloc(cifs_writev_complete);
                if (!wdata2) {
                        rc = -ENOMEM;
                        break;
                }
  
 -              for (j = 0; j < nr_pages; j++) {
 -                      wdata2->pages[j] = wdata->pages[i + j];
 -                      lock_page(wdata2->pages[j]);
 -                      clear_page_dirty_for_io(wdata2->pages[j]);
 -              }
 -
                wdata2->sync_mode = wdata->sync_mode;
 -              wdata2->nr_pages = nr_pages;
 -              wdata2->offset = page_offset(wdata2->pages[0]);
 -              wdata2->pagesz = PAGE_SIZE;
 -              wdata2->tailsz = tailsz;
 -              wdata2->bytes = cur_len;
 +              wdata2->offset  = fpos;
 +              wdata2->bytes   = cur_len;
 +              wdata2->iter    = wdata->iter;
 +
 +              iov_iter_advance(&wdata2->iter, fpos - wdata->offset);
 +              iov_iter_truncate(&wdata2->iter, wdata2->bytes);
 +
 +              if (iov_iter_is_xarray(&wdata2->iter))
 +                      /* Check for pages having been redirtied and clean
 +                       * them.  We can do this by walking the xarray.  If
 +                       * it's not an xarray, then it's a DIO and we shouldn't
 +                       * be mucking around with the page bits.
 +                       */
 +                      cifs_undirty_folios(inode, fpos, cur_len);
  
                rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
                                            &wdata2->cfile);
                                                       cifs_writedata_release);
                }
  
 -              for (j = 0; j < nr_pages; j++) {
 -                      unlock_page(wdata2->pages[j]);
 -                      if (rc != 0 && !is_retryable_error(rc)) {
 -                              SetPageError(wdata2->pages[j]);
 -                              end_page_writeback(wdata2->pages[j]);
 -                              put_page(wdata2->pages[j]);
 -                      }
 -              }
 -
                kref_put(&wdata2->refcount, cifs_writedata_release);
                if (rc) {
                        if (is_retryable_error(rc))
                                continue;
 -                      i += nr_pages;
 +                      fpos += cur_len;
 +                      rest_len -= cur_len;
                        break;
                }
  
 +              fpos += cur_len;
                rest_len -= cur_len;
 -              i += nr_pages;
 -      } while (i < wdata->nr_pages);
 +      } while (rest_len > 0);
  
 -      /* cleanup remaining pages from the original wdata */
 -      for (; i < wdata->nr_pages; i++) {
 -              SetPageError(wdata->pages[i]);
 -              end_page_writeback(wdata->pages[i]);
 -              put_page(wdata->pages[i]);
 -      }
 +      /* Clean up remaining pages from the original wdata */
 +      if (iov_iter_is_xarray(&wdata->iter))
 +              cifs_pages_write_failed(inode, fpos, rest_len);
  
        if (rc != 0 && !is_retryable_error(rc))
                mapping_set_error(inode->i_mapping, rc);
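
In cifs_writev_requeue() above, resending a subrange no longer copies page pointers into a second array: the child request clones the parent's iterator and is narrowed to the retry window with iov_iter_advance() and iov_iter_truncate(). A minimal sketch of that narrowing on bare iov_iters, with the names parent_offset, fpos and cur_len used purely for illustration:

    #include <linux/uio.h>

    /* Give the retry request a view of [fpos, fpos + cur_len) within the
     * parent's payload, without touching any page pointers. */
    static void narrow_retry_iter(struct iov_iter *child,
                                  const struct iov_iter *parent,
                                  loff_t parent_offset, loff_t fpos,
                                  size_t cur_len)
    {
            *child = *parent;
            iov_iter_advance(child, fpos - parent_offset);
            iov_iter_truncate(child, cur_len);
    }
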
@@@ -2513,6 -2404,7 +2513,6 @@@ cifs_writev_complete(struct work_struc
        struct cifs_writedata *wdata = container_of(work,
                                                struct cifs_writedata, work);
        struct inode *inode = d_inode(wdata->cfile->dentry);
 -      int i = 0;
  
        if (wdata->result == 0) {
                spin_lock(&inode->i_lock);
        } else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
                return cifs_writev_requeue(wdata);
  
 -      for (i = 0; i < wdata->nr_pages; i++) {
 -              struct page *page = wdata->pages[i];
 +      if (wdata->result == -EAGAIN)
 +              cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes);
 +      else if (wdata->result < 0)
 +              cifs_pages_write_failed(inode, wdata->offset, wdata->bytes);
 +      else
 +              cifs_pages_written_back(inode, wdata->offset, wdata->bytes);
  
 -              if (wdata->result == -EAGAIN)
 -                      __set_page_dirty_nobuffers(page);
 -              else if (wdata->result < 0)
 -                      SetPageError(page);
 -              end_page_writeback(page);
 -              cifs_readpage_to_fscache(inode, page);
 -              put_page(page);
 -      }
        if (wdata->result != -EAGAIN)
                mapping_set_error(inode->i_mapping, wdata->result);
        kref_put(&wdata->refcount, cifs_writedata_release);
  }
  
 -struct cifs_writedata *
 -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
 -{
 -      struct cifs_writedata *writedata = NULL;
 -      struct page **pages =
 -              kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
 -      if (pages) {
 -              writedata = cifs_writedata_direct_alloc(pages, complete);
 -              if (!writedata)
 -                      kvfree(pages);
 -      }
 -
 -      return writedata;
 -}
 -
 -struct cifs_writedata *
 -cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
 +struct cifs_writedata *cifs_writedata_alloc(work_func_t complete)
  {
        struct cifs_writedata *wdata;
  
        wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
        if (wdata != NULL) {
 -              wdata->pages = pages;
                kref_init(&wdata->refcount);
                INIT_LIST_HEAD(&wdata->list);
                init_completion(&wdata->done);
        return wdata;
  }
  
 -
  static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
  {
        struct address_space *mapping = page->mapping;
        return rc;
  }
  
 -static struct cifs_writedata *
 -wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
 -                        pgoff_t end, pgoff_t *index,
 -                        unsigned int *found_pages)
 +/*
 + * Extend the region to be written back to include subsequent contiguously
 + * dirty pages if possible, but don't sleep while doing so.
 + */
 +static void cifs_extend_writeback(struct address_space *mapping,
 +                                long *_count,
 +                                loff_t start,
 +                                int max_pages,
 +                                size_t max_len,
 +                                unsigned int *_len)
  {
 -      struct cifs_writedata *wdata;
 -      struct folio_batch fbatch;
 -      unsigned int i, idx, p, nr;
 -      wdata = cifs_writedata_alloc((unsigned int)tofind,
 -                                   cifs_writev_complete);
 -      if (!wdata)
 -              return NULL;
 -
 -      folio_batch_init(&fbatch);
 -      *found_pages = 0;
 -
 -again:
 -      nr = filemap_get_folios_tag(mapping, index, end,
 -                              PAGECACHE_TAG_DIRTY, &fbatch);
 -      if (!nr)
 -              goto out; /* No dirty pages left in the range */
 -
 -      for (i = 0; i < nr; i++) {
 -              struct folio *folio = fbatch.folios[i];
 -
 -              idx = 0;
 -              p = folio_nr_pages(folio);
 -add_more:
 -              wdata->pages[*found_pages] = folio_page(folio, idx);
 -              folio_get(folio);
 -              if (++*found_pages == tofind) {
 -                      folio_batch_release(&fbatch);
 -                      goto out;
 -              }
 -              if (++idx < p)
 -                      goto add_more;
 -      }
 -      folio_batch_release(&fbatch);
 -      goto again;
 -out:
 -      return wdata;
 -}
 +      struct folio_batch batch;
 +      struct folio *folio;
 +      unsigned int psize, nr_pages;
 +      size_t len = *_len;
 +      pgoff_t index = (start + len) / PAGE_SIZE;
 +      bool stop = true;
 +      unsigned int i;
 +      XA_STATE(xas, &mapping->i_pages, index);
  
 -static unsigned int
 -wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
 -                  struct address_space *mapping,
 -                  struct writeback_control *wbc,
 -                  pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
 -{
 -      unsigned int nr_pages = 0, i;
 -      struct page *page;
 +      folio_batch_init(&batch);
  
 -      for (i = 0; i < found_pages; i++) {
 -              page = wdata->pages[i];
 -              /*
 -               * At this point we hold neither the i_pages lock nor the
 -               * page lock: the page may be truncated or invalidated
 -               * (changing page->mapping to NULL), or even swizzled
 -               * back from swapper_space to tmpfs file mapping
 +      do {
 +              /* Firstly, we gather up a batch of contiguous dirty pages
 +               * under the RCU read lock - but we can't clear the dirty flags
 +               * there if any of those pages are mapped.
                 */
 +              rcu_read_lock();
  
 -              if (nr_pages == 0)
 -                      lock_page(page);
 -              else if (!trylock_page(page))
 -                      break;
 -
 -              if (unlikely(page->mapping != mapping)) {
 -                      unlock_page(page);
 -                      break;
 -              }
 +              xas_for_each(&xas, folio, ULONG_MAX) {
 +                      stop = true;
 +                      if (xas_retry(&xas, folio))
 +                              continue;
 +                      if (xa_is_value(folio))
 +                              break;
 +                      if (folio_index(folio) != index)
 +                              break;
 +                      if (!folio_try_get_rcu(folio)) {
 +                              xas_reset(&xas);
 +                              continue;
 +                      }
 +                      nr_pages = folio_nr_pages(folio);
 +                      if (nr_pages > max_pages)
 +                              break;
  
 -              if (!wbc->range_cyclic && page->index > end) {
 -                      *done = true;
 -                      unlock_page(page);
 -                      break;
 -              }
 +                      /* Has the page moved or been split? */
 +                      if (unlikely(folio != xas_reload(&xas))) {
 +                              folio_put(folio);
 +                              break;
 +                      }
  
 -              if (*next && (page->index != *next)) {
 -                      /* Not next consecutive page */
 -                      unlock_page(page);
 -                      break;
 -              }
 +                      if (!folio_trylock(folio)) {
 +                              folio_put(folio);
 +                              break;
 +                      }
 +                      if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
 +                              folio_unlock(folio);
 +                              folio_put(folio);
 +                              break;
 +                      }
  
 -              if (wbc->sync_mode != WB_SYNC_NONE)
 -                      wait_on_page_writeback(page);
 +                      max_pages -= nr_pages;
 +                      psize = folio_size(folio);
 +                      len += psize;
 +                      stop = false;
 +                      if (max_pages <= 0 || len >= max_len || *_count <= 0)
 +                              stop = true;
  
 -              if (PageWriteback(page) ||
 -                              !clear_page_dirty_for_io(page)) {
 -                      unlock_page(page);
 -                      break;
 +                      index += nr_pages;
 +                      if (!folio_batch_add(&batch, folio))
 +                              break;
 +                      if (stop)
 +                              break;
                }
  
 -              /*
 -               * This actually clears the dirty bit in the radix tree.
 -               * See cifs_writepage() for more commentary.
 +              if (!stop)
 +                      xas_pause(&xas);
 +              rcu_read_unlock();
 +
 +              /* Now, if we obtained any pages, we can shift them to being
 +               * writable and mark them for caching.
                 */
 -              set_page_writeback(page);
 -              if (page_offset(page) >= i_size_read(mapping->host)) {
 -                      *done = true;
 -                      unlock_page(page);
 -                      end_page_writeback(page);
 +              if (!folio_batch_count(&batch))
                        break;
 -              }
  
 -              wdata->pages[i] = page;
 -              *next = page->index + 1;
 -              ++nr_pages;
 -      }
 +              for (i = 0; i < folio_batch_count(&batch); i++) {
 +                      folio = batch.folios[i];
 +                      /* The folio should be locked, dirty and not undergoing
 +                       * writeback from the loop above.
 +                       */
 +                      if (!folio_clear_dirty_for_io(folio))
 +                              WARN_ON(1);
 +                      if (folio_start_writeback(folio))
 +                              WARN_ON(1);
  
 -      /* reset index to refind any pages skipped */
 -      if (nr_pages == 0)
 -              *index = wdata->pages[0]->index + 1;
 +                      *_count -= folio_nr_pages(folio);
 +                      folio_unlock(folio);
 +              }
  
 -      /* put any pages we aren't going to use */
 -      for (i = nr_pages; i < found_pages; i++) {
 -              put_page(wdata->pages[i]);
 -              wdata->pages[i] = NULL;
 -      }
 +              folio_batch_release(&batch);
 +              cond_resched();
 +      } while (!stop);
  
 -      return nr_pages;
 +      *_len = len;
  }
  
 -static int
 -wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
 -               struct address_space *mapping, struct writeback_control *wbc)
 +/*
 + * Write back the locked page and any subsequent non-locked dirty pages.
 + */
 +static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
 +                                               struct writeback_control *wbc,
 +                                               struct folio *folio,
 +                                               loff_t start, loff_t end)
  {
 +      struct inode *inode = mapping->host;
 +      struct TCP_Server_Info *server;
 +      struct cifs_writedata *wdata;
 +      struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 +      struct cifs_credits credits_on_stack;
 +      struct cifs_credits *credits = &credits_on_stack;
 +      struct cifsFileInfo *cfile = NULL;
 +      unsigned int xid, wsize, len;
 +      loff_t i_size = i_size_read(inode);
 +      size_t max_len;
 +      long count = wbc->nr_to_write;
        int rc;
  
 -      wdata->sync_mode = wbc->sync_mode;
 -      wdata->nr_pages = nr_pages;
 -      wdata->offset = page_offset(wdata->pages[0]);
 -      wdata->pagesz = PAGE_SIZE;
 -      wdata->tailsz = min(i_size_read(mapping->host) -
 -                      page_offset(wdata->pages[nr_pages - 1]),
 -                      (loff_t)PAGE_SIZE);
 -      wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz;
 -      wdata->pid = wdata->cfile->pid;
 -
 -      rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
 -      if (rc)
 -              return rc;
 -
 -      if (wdata->cfile->invalidHandle)
 -              rc = -EAGAIN;
 -      else
 -              rc = wdata->server->ops->async_writev(wdata,
 -                                                    cifs_writedata_release);
 +      /* The folio should be locked, dirty and not undergoing writeback. */
 +      if (folio_start_writeback(folio))
 +              WARN_ON(1);
  
 -      return rc;
 -}
 +      count -= folio_nr_pages(folio);
 +      len = folio_size(folio);
  
 -static int
 -cifs_writepage_locked(struct page *page, struct writeback_control *wbc);
 +      xid = get_xid();
 +      server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
  
 -static int cifs_write_one_page(struct folio *folio,
 -              struct writeback_control *wbc, void *data)
 -{
 -      struct address_space *mapping = data;
 -      int ret;
 +      rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
 +      if (rc) {
 +              cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
 +              goto err_xid;
 +      }
  
 -      ret = cifs_writepage_locked(&folio->page, wbc);
 -      folio_unlock(folio);
 -      mapping_set_error(mapping, ret);
 -      return ret;
 -}
 +      rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
 +                                         &wsize, credits);
 +      if (rc != 0)
 +              goto err_close;
  
 -static int cifs_writepages(struct address_space *mapping,
 -                         struct writeback_control *wbc)
 -{
 -      struct inode *inode = mapping->host;
 -      struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 -      struct TCP_Server_Info *server;
 -      bool done = false, scanned = false, range_whole = false;
 -      pgoff_t end, index;
 -      struct cifs_writedata *wdata;
 -      struct cifsFileInfo *cfile = NULL;
 -      int rc = 0;
 -      int saved_rc = 0;
 -      unsigned int xid;
 +      wdata = cifs_writedata_alloc(cifs_writev_complete);
 +      if (!wdata) {
 +              rc = -ENOMEM;
 +              goto err_uncredit;
 +      }
  
 -      /*
 -       * If wsize is smaller than the page cache size, default to writing
 -       * one page at a time.
 +      wdata->sync_mode = wbc->sync_mode;
 +      wdata->offset = folio_pos(folio);
 +      wdata->pid = cfile->pid;
 +      wdata->credits = credits_on_stack;
 +      wdata->cfile = cfile;
 +      wdata->server = server;
 +      cfile = NULL;
 +
 +      /* Find all consecutive lockable dirty pages, stopping when we find a
 +       * page that is not immediately lockable, is not dirty or is missing,
 +       * or we reach the end of the range.
         */
 -      if (cifs_sb->ctx->wsize < PAGE_SIZE)
 -              return write_cache_pages(mapping, wbc, cifs_write_one_page,
 -                              mapping);
 +      if (start < i_size) {
 +              /* Trim the write to the EOF; the extra data is ignored.  Also
 +               * put an upper limit on the size of a single storedata op.
 +               */
 +              max_len = wsize;
 +              max_len = min_t(unsigned long long, max_len, end - start + 1);
 +              max_len = min_t(unsigned long long, max_len, i_size - start);
  
 -      xid = get_xid();
 -      if (wbc->range_cyclic) {
 -              index = mapping->writeback_index; /* Start from prev offset */
 -              end = -1;
 -      } else {
 -              index = wbc->range_start >> PAGE_SHIFT;
 -              end = wbc->range_end >> PAGE_SHIFT;
 -              if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 -                      range_whole = true;
 -              scanned = true;
 +              if (len < max_len) {
 +                      int max_pages = INT_MAX;
 +
 +#ifdef CONFIG_CIFS_SMB_DIRECT
 +                      if (server->smbd_conn)
 +                              max_pages = server->smbd_conn->max_frmr_depth;
 +#endif
 +                      max_pages -= folio_nr_pages(folio);
 +
 +                      if (max_pages > 0)
 +                              cifs_extend_writeback(mapping, &count, start,
 +                                                    max_pages, max_len, &len);
 +              }
 +              len = min_t(loff_t, len, max_len);
        }
 -      server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
  
 -retry:
 -      while (!done && index <= end) {
 -              unsigned int i, nr_pages, found_pages, wsize;
 -              pgoff_t next = 0, tofind, saved_index = index;
 -              struct cifs_credits credits_on_stack;
 -              struct cifs_credits *credits = &credits_on_stack;
 -              int get_file_rc = 0;
 +      wdata->bytes = len;
  
 -              if (cfile)
 -                      cifsFileInfo_put(cfile);
 +      /* We now have a contiguous set of dirty pages, each with writeback
 +       * set; the first page is still locked at this point, but all the rest
 +       * have been unlocked.
 +       */
 +      folio_unlock(folio);
  
 -              rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
 +      if (start < i_size) {
 +              iov_iter_xarray(&wdata->iter, ITER_SOURCE, &mapping->i_pages,
 +                              start, len);
  
 -              /* in case of an error store it to return later */
 +              rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
                if (rc)
 -                      get_file_rc = rc;
 +                      goto err_wdata;
  
 -              rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
 -                                                 &wsize, credits);
 -              if (rc != 0) {
 -                      done = true;
 -                      break;
 +              if (wdata->cfile->invalidHandle)
 +                      rc = -EAGAIN;
 +              else
 +                      rc = wdata->server->ops->async_writev(wdata,
 +                                                            cifs_writedata_release);
 +              if (rc >= 0) {
 +                      kref_put(&wdata->refcount, cifs_writedata_release);
 +                      goto err_close;
                }
 +      } else {
 +              /* The dirty region was entirely beyond the EOF. */
 +              cifs_pages_written_back(inode, start, len);
 +              rc = 0;
 +      }
  
 -              tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1;
 +err_wdata:
 +      kref_put(&wdata->refcount, cifs_writedata_release);
 +err_uncredit:
 +      add_credits_and_wake_if(server, credits, 0);
 +err_close:
 +      if (cfile)
 +              cifsFileInfo_put(cfile);
 +err_xid:
 +      free_xid(xid);
 +      if (rc == 0) {
 +              wbc->nr_to_write = count;
 +      } else if (is_retryable_error(rc)) {
 +              cifs_pages_write_redirty(inode, start, len);
 +      } else {
 +              cifs_pages_write_failed(inode, start, len);
 +              mapping_set_error(mapping, rc);
 +      }
 +      /* Indication to update ctime and mtime as close is deferred */
 +      set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
 +      return rc;
 +}
  
 -              wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
 -                                                &found_pages);
 -              if (!wdata) {
 -                      rc = -ENOMEM;
 -                      done = true;
 -                      add_credits_and_wake_if(server, credits, 0);
 -                      break;
 -              }
 +/*
 + * write a region of pages back to the server
 + */
 +static int cifs_writepages_region(struct address_space *mapping,
 +                                struct writeback_control *wbc,
 +                                loff_t start, loff_t end, loff_t *_next)
 +{
-       struct folio *folio;
-       struct page *head_page;
-       ssize_t ret;
-       int n, skips = 0;
++      struct folio_batch fbatch;
++      int skips = 0;
  
 -              if (found_pages == 0) {
 -                      kref_put(&wdata->refcount, cifs_writedata_release);
 -                      add_credits_and_wake_if(server, credits, 0);
++      folio_batch_init(&fbatch);
 +      do {
++              int nr;
 +              pgoff_t index = start / PAGE_SIZE;
 +
-               n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
-                                            PAGECACHE_TAG_DIRTY, 1, &head_page);
-               if (!n)
++              nr = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
++                                          PAGECACHE_TAG_DIRTY, &fbatch);
++              if (!nr)
                        break;
 -              }
  
-               folio = page_folio(head_page);
-               start = folio_pos(folio); /* May regress with THPs */
 -              nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc,
 -                                             end, &index, &next, &done);
++              for (int i = 0; i < nr; i++) {
++                      ssize_t ret;
++                      struct folio *folio = fbatch.folios[i];
  
-               /* At this point we hold neither the i_pages lock nor the
-                * page lock: the page may be truncated or invalidated
-                * (changing page->mapping to NULL), or even swizzled
-                * back from swapper_space to tmpfs file mapping
-                */
-               if (wbc->sync_mode != WB_SYNC_NONE) {
-                       ret = folio_lock_killable(folio);
-                       if (ret < 0) {
-                               folio_put(folio);
-                               return ret;
 -              /* nothing to write? */
 -              if (nr_pages == 0) {
 -                      kref_put(&wdata->refcount, cifs_writedata_release);
 -                      add_credits_and_wake_if(server, credits, 0);
 -                      continue;
 -              }
++redo_folio:
++                      start = folio_pos(folio); /* May regress with THPs */
 -              wdata->credits = credits_on_stack;
 -              wdata->cfile = cfile;
 -              wdata->server = server;
 -              cfile = NULL;
++                      /* At this point we hold neither the i_pages lock nor the
++                       * page lock: the page may be truncated or invalidated
++                       * (changing page->mapping to NULL), or even swizzled
++                       * back from swapper_space to tmpfs file mapping
++                       */
++                      if (wbc->sync_mode != WB_SYNC_NONE) {
++                              ret = folio_lock_killable(folio);
++                              if (ret < 0)
++                                      goto write_error;
++                      } else {
++                              if (!folio_trylock(folio))
++                                      goto skip_write;
 +                      }
-               } else {
-                       if (!folio_trylock(folio)) {
-                               folio_put(folio);
-                               return 0;
 -              if (!wdata->cfile) {
 -                      cifs_dbg(VFS, "No writable handle in writepages rc=%d\n",
 -                               get_file_rc);
 -                      if (is_retryable_error(get_file_rc))
 -                              rc = get_file_rc;
 -                      else
 -                              rc = -EBADF;
 -              } else
 -                      rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
++                      if (folio_mapping(folio) != mapping ||
++                          !folio_test_dirty(folio)) {
++                              folio_unlock(folio);
++                              goto skip_write;
 +                      }
-               }
  
-               if (folio_mapping(folio) != mapping ||
-                   !folio_test_dirty(folio)) {
-                       start += folio_size(folio);
-                       folio_unlock(folio);
-                       folio_put(folio);
-                       continue;
-               }
 -              for (i = 0; i < nr_pages; ++i)
 -                      unlock_page(wdata->pages[i]);
++                      if (folio_test_writeback(folio) ||
++                          folio_test_fscache(folio)) {
++                              folio_unlock(folio);
++                              if (wbc->sync_mode == WB_SYNC_NONE)
++                                      goto skip_write;
  
-               if (folio_test_writeback(folio) ||
-                   folio_test_fscache(folio)) {
-                       folio_unlock(folio);
-                       if (wbc->sync_mode != WB_SYNC_NONE) {
 -              /* send failure -- clean up the mess */
 -              if (rc != 0) {
 -                      add_credits_and_wake_if(server, &wdata->credits, 0);
 -                      for (i = 0; i < nr_pages; ++i) {
 -                              if (is_retryable_error(rc))
 -                                      redirty_page_for_writepage(wbc,
 -                                                         wdata->pages[i]);
 -                              else
 -                                      SetPageError(wdata->pages[i]);
 -                              end_page_writeback(wdata->pages[i]);
 -                              put_page(wdata->pages[i]);
 +                              folio_wait_writeback(folio);
 +#ifdef CONFIG_CIFS_FSCACHE
 +                              folio_wait_fscache(folio);
 +#endif
-                       } else {
-                               start += folio_size(folio);
++                              goto redo_folio;
                        }
-                       folio_put(folio);
-                       if (wbc->sync_mode == WB_SYNC_NONE) {
-                               if (skips >= 5 || need_resched())
-                                       break;
-                               skips++;
-                       }
-                       continue;
 -                      if (!is_retryable_error(rc))
 -                              mapping_set_error(mapping, rc);
--              }
 -              kref_put(&wdata->refcount, cifs_writedata_release);
  
-               if (!folio_clear_dirty_for_io(folio))
-                       /* We hold the page lock - it should've been dirty. */
-                       WARN_ON(1);
 -              if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
 -                      index = saved_index;
 -                      continue;
 -              }
++                      if (!folio_clear_dirty_for_io(folio))
++                              /* We hold the page lock - it should've been dirty. */
++                              WARN_ON(1);
  
-               ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
-               folio_put(folio);
-               if (ret < 0)
 -              /* Return immediately if we received a signal during writing */
 -              if (is_interrupt_error(rc)) {
 -                      done = true;
 -                      break;
 -              }
++                      ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
++                      if (ret < 0)
++                              goto write_error;
 -              if (rc != 0 && saved_rc == 0)
 -                      saved_rc = rc;
++                      start += ret;
++                      continue;
 -              wbc->nr_to_write -= nr_pages;
 -              if (wbc->nr_to_write <= 0)
 -                      done = true;
++write_error:
++                      folio_batch_release(&fbatch);
++                      *_next = start;
 +                      return ret;
  
-               start += ret;
 -              index = next;
 -      }
++skip_write:
++                      /*
++                       * Too many skipped writes, or need to reschedule?
++                       * Treat it as a write error without an error code.
++                       */
++                      if (skips >= 5 || need_resched()) {
++                              ret = 0;
++                              goto write_error;
++                      }
 -      if (!scanned && !done) {
 -              /*
 -               * We hit the last page and there is more work to be done: wrap
 -               * back to the start of the file
 -               */
 -              scanned = true;
 -              index = 0;
 -              goto retry;
 -      }
++                      /* Otherwise, just skip that folio and go on to the next */
++                      skips++;
++                      start += folio_size(folio);
++                      continue;
++              }
 -      if (saved_rc != 0)
 -              rc = saved_rc;
++              folio_batch_release(&fbatch);
 +              cond_resched();
 +      } while (wbc->nr_to_write > 0);
  
 -      if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
 -              mapping->writeback_index = index;
 +      *_next = start;
 +      return 0;
 +}
  
 -      if (cfile)
 -              cifsFileInfo_put(cfile);
 -      free_xid(xid);
 -      /* Indication to update ctime and mtime as close is deferred */
 -      set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
 -      return rc;
 +/*
 + * Write some of the pending data back to the server
 + */
 +static int cifs_writepages(struct address_space *mapping,
 +                         struct writeback_control *wbc)
 +{
 +      loff_t start, next;
 +      int ret;
 +
 +      /* We have to be careful as we can end up racing with setattr()
 +       * truncating the pagecache since the caller doesn't take a lock here
 +       * to prevent it.
 +       */
 +
 +      if (wbc->range_cyclic) {
 +              start = mapping->writeback_index * PAGE_SIZE;
 +              ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
 +              if (ret == 0) {
 +                      mapping->writeback_index = next / PAGE_SIZE;
 +                      if (start > 0 && wbc->nr_to_write > 0) {
 +                              ret = cifs_writepages_region(mapping, wbc, 0,
 +                                                           start, &next);
 +                              if (ret == 0)
 +                                      mapping->writeback_index =
 +                                              next / PAGE_SIZE;
 +                      }
 +              }
 +      } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
 +              ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
 +              if (wbc->nr_to_write > 0 && ret == 0)
 +                      mapping->writeback_index = next / PAGE_SIZE;
 +      } else {
 +              ret = cifs_writepages_region(mapping, wbc,
 +                                           wbc->range_start, wbc->range_end, &next);
 +      }
 +
 +      return ret;
  }
  
  static int
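
Note how the rewritten writepages path above never builds a page array: cifs_write_back_from_locked_folio() records only an offset and length, and the data handed to the transport is described by an ITER_XARRAY iterator pointing straight at the pagecache. A minimal sketch of that step, wrapped in a hypothetical helper with mapping, start and len as in the function above:

    #include <linux/uio.h>
    #include <linux/pagemap.h>

    static void iter_over_pagecache(struct address_space *mapping,
                                    loff_t start, size_t len,
                                    struct iov_iter *iter)
    {
            /* Describe [start, start + len) of the pagecache as the data
             * source; no page array is allocated or pinned for the I/O. */
            iov_iter_xarray(iter, ITER_SOURCE, &mapping->i_pages, start, len);
    }
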
@@@ -3024,7 -2902,6 +3038,7 @@@ static int cifs_write_end(struct file *
        struct inode *inode = mapping->host;
        struct cifsFileInfo *cfile = file->private_data;
        struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb);
 +      struct folio *folio = page_folio(page);
        __u32 pid;
  
        if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
        cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n",
                 page, pos, copied);
  
 -      if (PageChecked(page)) {
 +      if (folio_test_checked(folio)) {
                if (copied == len)
 -                      SetPageUptodate(page);
 -              ClearPageChecked(page);
 -      } else if (!PageUptodate(page) && copied == PAGE_SIZE)
 -              SetPageUptodate(page);
 +                      folio_mark_uptodate(folio);
 +              folio_clear_checked(folio);
 +      } else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE)
 +              folio_mark_uptodate(folio);
  
 -      if (!PageUptodate(page)) {
 +      if (!folio_test_uptodate(folio)) {
                char *page_data;
                unsigned offset = pos & (PAGE_SIZE - 1);
                unsigned int xid;
@@@ -3202,13 -3079,57 +3216,13 @@@ int cifs_flush(struct file *file, fl_ow
        return rc;
  }
  
 -static int
 -cifs_write_allocate_pages(struct page **pages, unsigned long num_pages)
 -{
 -      int rc = 0;
 -      unsigned long i;
 -
 -      for (i = 0; i < num_pages; i++) {
 -              pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
 -              if (!pages[i]) {
 -                      /*
 -                       * save number of pages we have already allocated and
 -                       * return with ENOMEM error
 -                       */
 -                      num_pages = i;
 -                      rc = -ENOMEM;
 -                      break;
 -              }
 -      }
 -
 -      if (rc) {
 -              for (i = 0; i < num_pages; i++)
 -                      put_page(pages[i]);
 -      }
 -      return rc;
 -}
 -
 -static inline
 -size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len)
 -{
 -      size_t num_pages;
 -      size_t clen;
 -
 -      clen = min_t(const size_t, len, wsize);
 -      num_pages = DIV_ROUND_UP(clen, PAGE_SIZE);
 -
 -      if (cur_len)
 -              *cur_len = clen;
 -
 -      return num_pages;
 -}
 -
  static void
  cifs_uncached_writedata_release(struct kref *refcount)
  {
 -      int i;
        struct cifs_writedata *wdata = container_of(refcount,
                                        struct cifs_writedata, refcount);
  
        kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
 -      for (i = 0; i < wdata->nr_pages; i++)
 -              put_page(wdata->pages[i]);
        cifs_writedata_release(refcount);
  }
  
@@@ -3234,6 -3155,48 +3248,6 @@@ cifs_uncached_writev_complete(struct wo
        kref_put(&wdata->refcount, cifs_uncached_writedata_release);
  }
  
 -static int
 -wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
 -                    size_t *len, unsigned long *num_pages)
 -{
 -      size_t save_len, copied, bytes, cur_len = *len;
 -      unsigned long i, nr_pages = *num_pages;
 -
 -      save_len = cur_len;
 -      for (i = 0; i < nr_pages; i++) {
 -              bytes = min_t(const size_t, cur_len, PAGE_SIZE);
 -              copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from);
 -              cur_len -= copied;
 -              /*
 -               * If we didn't copy as much as we expected, then that
 -               * may mean we trod into an unmapped area. Stop copying
 -               * at that point. On the next pass through the big
 -               * loop, we'll likely end up getting a zero-length
 -               * write and bailing out of it.
 -               */
 -              if (copied < bytes)
 -                      break;
 -      }
 -      cur_len = save_len - cur_len;
 -      *len = cur_len;
 -
 -      /*
 -       * If we have no data to send, then that probably means that
 -       * the copy above failed altogether. That's most likely because
 -       * the address in the iovec was bogus. Return -EFAULT and let
 -       * the caller free anything we allocated and bail out.
 -       */
 -      if (!cur_len)
 -              return -EFAULT;
 -
 -      /*
 -       * i + 1 now represents the number of pages we actually used in
 -       * the copy phase above.
 -       */
 -      *num_pages = i + 1;
 -      return 0;
 -}
 -
  static int
  cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list,
        struct cifs_aio_ctx *ctx)
@@@ -3304,57 -3267,23 +3318,57 @@@ fail
        return rc;
  }
  
 +/*
 + * Select span of a bvec iterator we're going to use.  Limit it by both maximum
 + * size and maximum number of segments.
 + */
 +static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_size,
 +                                   size_t max_segs, unsigned int *_nsegs)
 +{
 +      const struct bio_vec *bvecs = iter->bvec;
 +      unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0;
 +      size_t len, span = 0, n = iter->count;
 +      size_t skip = iter->iov_offset;
 +
 +      if (WARN_ON(!iov_iter_is_bvec(iter)) || n == 0)
 +              return 0;
 +
 +      while (n && ix < nbv && skip) {
 +              len = bvecs[ix].bv_len;
 +              if (skip < len)
 +                      break;
 +              skip -= len;
 +              n -= len;
 +              ix++;
 +      }
 +
 +      while (n && ix < nbv) {
 +              len = min3(n, bvecs[ix].bv_len - skip, max_size);
 +              span += len;
 +              nsegs++;
 +              ix++;
 +              if (span >= max_size || nsegs >= max_segs)
 +                      break;
 +              skip = 0;
 +              n -= len;
 +      }
 +
 +      *_nsegs = nsegs;
 +      return span;
 +}
 +
  static int
 -cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 +cifs_write_from_iter(loff_t fpos, size_t len, struct iov_iter *from,
                     struct cifsFileInfo *open_file,
                     struct cifs_sb_info *cifs_sb, struct list_head *wdata_list,
                     struct cifs_aio_ctx *ctx)
  {
        int rc = 0;
 -      size_t cur_len;
 -      unsigned long nr_pages, num_pages, i;
 +      size_t cur_len, max_len;
        struct cifs_writedata *wdata;
 -      struct iov_iter saved_from = *from;
 -      loff_t saved_offset = offset;
        pid_t pid;
        struct TCP_Server_Info *server;
 -      struct page **pagevec;
 -      size_t start;
 -      unsigned int xid;
 +      unsigned int xid, max_segs = INT_MAX;
  
        if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
                pid = open_file->pid;
        server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
        xid = get_xid();
  
 +#ifdef CONFIG_CIFS_SMB_DIRECT
 +      if (server->smbd_conn)
 +              max_segs = server->smbd_conn->max_frmr_depth;
 +#endif
 +
        do {
 -              unsigned int wsize;
                struct cifs_credits credits_on_stack;
                struct cifs_credits *credits = &credits_on_stack;
 +              unsigned int wsize, nsegs = 0;
 +
 +              if (signal_pending(current)) {
 +                      rc = -EINTR;
 +                      break;
 +              }
  
                if (open_file->invalidHandle) {
                        rc = cifs_reopen_file(open_file, false);
                if (rc)
                        break;
  
 -              cur_len = min_t(const size_t, len, wsize);
 -
 -              if (ctx->direct_io) {
 -                      ssize_t result;
 -
 -                      result = iov_iter_get_pages_alloc2(
 -                              from, &pagevec, cur_len, &start);
 -                      if (result < 0) {
 -                              cifs_dbg(VFS,
 -                                       "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
 -                                       result, iov_iter_type(from),
 -                                       from->iov_offset, from->count);
 -                              dump_stack();
 -
 -                              rc = result;
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 -                      cur_len = (size_t)result;
 -
 -                      nr_pages =
 -                              (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
 -
 -                      wdata = cifs_writedata_direct_alloc(pagevec,
 -                                           cifs_uncached_writev_complete);
 -                      if (!wdata) {
 -                              rc = -ENOMEM;
 -                              for (i = 0; i < nr_pages; i++)
 -                                      put_page(pagevec[i]);
 -                              kvfree(pagevec);
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 -
 -
 -                      wdata->page_offset = start;
 -                      wdata->tailsz =
 -                              nr_pages > 1 ?
 -                                      cur_len - (PAGE_SIZE - start) -
 -                                      (nr_pages - 2) * PAGE_SIZE :
 -                                      cur_len;
 -              } else {
 -                      nr_pages = get_numpages(wsize, len, &cur_len);
 -                      wdata = cifs_writedata_alloc(nr_pages,
 -                                           cifs_uncached_writev_complete);
 -                      if (!wdata) {
 -                              rc = -ENOMEM;
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 -
 -                      rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
 -                      if (rc) {
 -                              kvfree(wdata->pages);
 -                              kfree(wdata);
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 -
 -                      num_pages = nr_pages;
 -                      rc = wdata_fill_from_iovec(
 -                              wdata, from, &cur_len, &num_pages);
 -                      if (rc) {
 -                              for (i = 0; i < nr_pages; i++)
 -                                      put_page(wdata->pages[i]);
 -                              kvfree(wdata->pages);
 -                              kfree(wdata);
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 +              max_len = min_t(const size_t, len, wsize);
 +              if (!max_len) {
 +                      rc = -EAGAIN;
 +                      add_credits_and_wake_if(server, credits, 0);
 +                      break;
 +              }
  
 -                      /*
 -                       * Bring nr_pages down to the number of pages we
 -                       * actually used, and free any pages that we didn't use.
 -                       */
 -                      for ( ; nr_pages > num_pages; nr_pages--)
 -                              put_page(wdata->pages[nr_pages - 1]);
 +              cur_len = cifs_limit_bvec_subset(from, max_len, max_segs, &nsegs);
 +              cifs_dbg(FYI, "write_from_iter len=%zx/%zx nsegs=%u/%lu/%u\n",
 +                       cur_len, max_len, nsegs, from->nr_segs, max_segs);
 +              if (cur_len == 0) {
 +                      rc = -EIO;
 +                      add_credits_and_wake_if(server, credits, 0);
 +                      break;
 +              }
  
 -                      wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
 +              wdata = cifs_writedata_alloc(cifs_uncached_writev_complete);
 +              if (!wdata) {
 +                      rc = -ENOMEM;
 +                      add_credits_and_wake_if(server, credits, 0);
 +                      break;
                }
  
                wdata->sync_mode = WB_SYNC_ALL;
 -              wdata->nr_pages = nr_pages;
 -              wdata->offset = (__u64)offset;
 -              wdata->cfile = cifsFileInfo_get(open_file);
 -              wdata->server = server;
 -              wdata->pid = pid;
 -              wdata->bytes = cur_len;
 -              wdata->pagesz = PAGE_SIZE;
 -              wdata->credits = credits_on_stack;
 -              wdata->ctx = ctx;
 +              wdata->offset   = (__u64)fpos;
 +              wdata->cfile    = cifsFileInfo_get(open_file);
 +              wdata->server   = server;
 +              wdata->pid      = pid;
 +              wdata->bytes    = cur_len;
 +              wdata->credits  = credits_on_stack;
 +              wdata->iter     = *from;
 +              wdata->ctx      = ctx;
                kref_get(&ctx->refcount);
  
 +              iov_iter_truncate(&wdata->iter, cur_len);
 +
                rc = adjust_credits(server, &wdata->credits, wdata->bytes);
  
                if (!rc) {
                        add_credits_and_wake_if(server, &wdata->credits, 0);
                        kref_put(&wdata->refcount,
                                 cifs_uncached_writedata_release);
 -                      if (rc == -EAGAIN) {
 -                              *from = saved_from;
 -                              iov_iter_advance(from, offset - saved_offset);
 +                      if (rc == -EAGAIN)
                                continue;
 -                      }
                        break;
                }
  
                list_add_tail(&wdata->list, wdata_list);
 -              offset += cur_len;
 +              iov_iter_advance(from, cur_len);
 +              fpos += cur_len;
                len -= cur_len;
        } while (len > 0);
  
@@@ -3548,8 -3526,20 +3562,8 @@@ static ssize_t __cifs_writev
        struct cifs_tcon *tcon;
        struct cifs_sb_info *cifs_sb;
        struct cifs_aio_ctx *ctx;
 -      struct iov_iter saved_from = *from;
 -      size_t len = iov_iter_count(from);
        int rc;
  
 -      /*
 -       * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
 -       * In this case, fall back to non-direct write function.
 -       * this could be improved by getting pages directly in ITER_KVEC
 -       */
 -      if (direct && iov_iter_is_kvec(from)) {
 -              cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n");
 -              direct = false;
 -      }
 -
        rc = generic_write_checks(iocb, from);
        if (rc <= 0)
                return rc;
                ctx->iocb = iocb;
  
        ctx->pos = iocb->ki_pos;
 +      ctx->direct_io = direct;
 +      ctx->nr_pinned_pages = 0;
  
 -      if (direct) {
 -              ctx->direct_io = true;
 -              ctx->iter = *from;
 -              ctx->len = len;
 -      } else {
 -              rc = setup_aio_ctx_iter(ctx, from, ITER_SOURCE);
 -              if (rc) {
 +      if (user_backed_iter(from)) {
 +              /*
 +               * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as
 +               * they contain references to the calling process's virtual
 +               * memory layout which won't be available in an async worker
 +               * thread.  This also takes a pin on every folio involved.
 +               */
 +              rc = netfs_extract_user_iter(from, iov_iter_count(from),
 +                                           &ctx->iter, 0);
 +              if (rc < 0) {
                        kref_put(&ctx->refcount, cifs_aio_ctx_release);
                        return rc;
                }
 +
 +              ctx->nr_pinned_pages = rc;
 +              ctx->bv = (void *)ctx->iter.bvec;
 +              ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter);
 +      } else if ((iov_iter_is_bvec(from) || iov_iter_is_kvec(from)) &&
 +                 !is_sync_kiocb(iocb)) {
 +              /*
 +               * If the op is asynchronous, we need to copy the list attached
 +               * to a BVEC/KVEC-type iterator, but we assume that the storage
 +               * will be pinned by the caller; in any case, we may or may not
 +               * be able to pin the pages, so we don't try.
 +               */
 +              ctx->bv = (void *)dup_iter(&ctx->iter, from, GFP_KERNEL);
 +              if (!ctx->bv) {
 +                      kref_put(&ctx->refcount, cifs_aio_ctx_release);
 +                      return -ENOMEM;
 +              }
 +      } else {
 +              /*
 +               * Otherwise, we just pass the iterator down as-is and rely on
 +               * the caller to make sure the pages referred to by the
 +               * iterator don't evaporate.
 +               */
 +              ctx->iter = *from;
        }
  
 +      ctx->len = iov_iter_count(&ctx->iter);
 +
        /* grab a lock here because read response handlers can access ctx */
        mutex_lock(&ctx->aio_mutex);
  
 -      rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from,
 +      rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &ctx->iter,
                                  cfile, cifs_sb, &ctx->list, ctx);
  
        /*
        return written;
  }
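A side note on the dup_iter() branch in __cifs_writev above: only the descriptor list is copied because the async worker may run after the submitter's stack frame (and any on-stack kvec/bvec array) has gone away, while the underlying buffers are expected to be kept alive by the caller. Below is a user-space analogue of that ownership split; the names are invented for the sketch and it is not part of the patch.

#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* A queued request owns a private copy of the descriptor array only; the
 * data buffers themselves remain owned and kept alive by the submitter. */
struct async_write {
        struct iovec *iov;
        int iovcnt;
};

static struct async_write *queue_writev(const struct iovec *iov, int iovcnt)
{
        struct async_write *req = malloc(sizeof(*req));

        if (!req)
                return NULL;
        req->iov = calloc(iovcnt, sizeof(*iov));
        if (!req->iov) {
                free(req);
                return NULL;
        }
        memcpy(req->iov, iov, iovcnt * sizeof(*iov));   /* cf. dup_iter() */
        req->iovcnt = iovcnt;
        return req;     /* a worker thread would consume and free this later */
}

int main(void)
{
        char a[] = "hello ", b[] = "world\n";
        struct iovec on_stack[2] = {
                { .iov_base = a, .iov_len = sizeof(a) - 1 },
                { .iov_base = b, .iov_len = sizeof(b) - 1 },
        };
        struct async_write *req = queue_writev(on_stack, 2);

        if (!req)
                return 1;
        /* The queued copy no longer depends on the submitter's array. */
        if (writev(STDOUT_FILENO, req->iov, req->iovcnt) < 0)
                return 1;
        free(req->iov);
        free(req);
        return 0;
}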
  
 -static struct cifs_readdata *
 -cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
 +static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete)
  {
        struct cifs_readdata *rdata;
  
        rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
 -      if (rdata != NULL) {
 -              rdata->pages = pages;
 +      if (rdata) {
                kref_init(&rdata->refcount);
                INIT_LIST_HEAD(&rdata->list);
                init_completion(&rdata->done);
        return rdata;
  }
  
 -static struct cifs_readdata *
 -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
 -{
 -      struct page **pages =
 -              kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
 -      struct cifs_readdata *ret = NULL;
 -
 -      if (pages) {
 -              ret = cifs_readdata_direct_alloc(pages, complete);
 -              if (!ret)
 -                      kfree(pages);
 -      }
 -
 -      return ret;
 -}
 -
  void
  cifs_readdata_release(struct kref *refcount)
  {
        struct cifs_readdata *rdata = container_of(refcount,
                                        struct cifs_readdata, refcount);
 +
 +      if (rdata->ctx)
 +              kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
  #ifdef CONFIG_CIFS_SMB_DIRECT
        if (rdata->mr) {
                smbd_deregister_mr(rdata->mr);
        if (rdata->cfile)
                cifsFileInfo_put(rdata->cfile);
  
 -      kvfree(rdata->pages);
        kfree(rdata);
  }
  
 -static int
 -cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
 -{
 -      int rc = 0;
 -      struct page *page;
 -      unsigned int i;
 -
 -      for (i = 0; i < nr_pages; i++) {
 -              page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
 -              if (!page) {
 -                      rc = -ENOMEM;
 -                      break;
 -              }
 -              rdata->pages[i] = page;
 -      }
 -
 -      if (rc) {
 -              unsigned int nr_page_failed = i;
 -
 -              for (i = 0; i < nr_page_failed; i++) {
 -                      put_page(rdata->pages[i]);
 -                      rdata->pages[i] = NULL;
 -              }
 -      }
 -      return rc;
 -}
 -
 -static void
 -cifs_uncached_readdata_release(struct kref *refcount)
 -{
 -      struct cifs_readdata *rdata = container_of(refcount,
 -                                      struct cifs_readdata, refcount);
 -      unsigned int i;
 -
 -      kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
 -      for (i = 0; i < rdata->nr_pages; i++) {
 -              put_page(rdata->pages[i]);
 -      }
 -      cifs_readdata_release(refcount);
 -}
 -
 -/**
 - * cifs_readdata_to_iov - copy data from pages in response to an iovec
 - * @rdata:    the readdata response with list of pages holding data
 - * @iter:     destination for our data
 - *
 - * This function copies data from a list of pages in a readdata response into
 - * an array of iovecs. It will first calculate where the data should go
 - * based on the info in the readdata and then copy the data into that spot.
 - */
 -static int
 -cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
 -{
 -      size_t remaining = rdata->got_bytes;
 -      unsigned int i;
 -
 -      for (i = 0; i < rdata->nr_pages; i++) {
 -              struct page *page = rdata->pages[i];
 -              size_t copy = min_t(size_t, remaining, PAGE_SIZE);
 -              size_t written;
 -
 -              if (unlikely(iov_iter_is_pipe(iter))) {
 -                      void *addr = kmap_atomic(page);
 -
 -                      written = copy_to_iter(addr, copy, iter);
 -                      kunmap_atomic(addr);
 -              } else
 -                      written = copy_page_to_iter(page, 0, copy, iter);
 -              remaining -= written;
 -              if (written < copy && iov_iter_count(iter) > 0)
 -                      break;
 -      }
 -      return remaining ? -EFAULT : 0;
 -}
 -
  static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
  
  static void
@@@ -3807,7 -3857,81 +3821,7 @@@ cifs_uncached_readv_complete(struct wor
        complete(&rdata->done);
        collect_uncached_read_data(rdata->ctx);
        /* the below call can possibly free the last ref to aio ctx */
 -      kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 -}
 -
 -static int
 -uncached_fill_pages(struct TCP_Server_Info *server,
 -                  struct cifs_readdata *rdata, struct iov_iter *iter,
 -                  unsigned int len)
 -{
 -      int result = 0;
 -      unsigned int i;
 -      unsigned int nr_pages = rdata->nr_pages;
 -      unsigned int page_offset = rdata->page_offset;
 -
 -      rdata->got_bytes = 0;
 -      rdata->tailsz = PAGE_SIZE;
 -      for (i = 0; i < nr_pages; i++) {
 -              struct page *page = rdata->pages[i];
 -              size_t n;
 -              unsigned int segment_size = rdata->pagesz;
 -
 -              if (i == 0)
 -                      segment_size -= page_offset;
 -              else
 -                      page_offset = 0;
 -
 -
 -              if (len <= 0) {
 -                      /* no need to hold page hostage */
 -                      rdata->pages[i] = NULL;
 -                      rdata->nr_pages--;
 -                      put_page(page);
 -                      continue;
 -              }
 -
 -              n = len;
 -              if (len >= segment_size)
 -                      /* enough data to fill the page */
 -                      n = segment_size;
 -              else
 -                      rdata->tailsz = len;
 -              len -= n;
 -
 -              if (iter)
 -                      result = copy_page_from_iter(
 -                                      page, page_offset, n, iter);
 -#ifdef CONFIG_CIFS_SMB_DIRECT
 -              else if (rdata->mr)
 -                      result = n;
 -#endif
 -              else
 -                      result = cifs_read_page_from_socket(
 -                                      server, page, page_offset, n);
 -              if (result < 0)
 -                      break;
 -
 -              rdata->got_bytes += result;
 -      }
 -
 -      return rdata->got_bytes > 0 && result != -ECONNABORTED ?
 -                                              rdata->got_bytes : result;
 -}
 -
 -static int
 -cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
 -                            struct cifs_readdata *rdata, unsigned int len)
 -{
 -      return uncached_fill_pages(server, rdata, NULL, len);
 -}
 -
 -static int
 -cifs_uncached_copy_into_pages(struct TCP_Server_Info *server,
 -                            struct cifs_readdata *rdata,
 -                            struct iov_iter *iter)
 -{
 -      return uncached_fill_pages(server, rdata, iter, iter->count);
 +      kref_put(&rdata->refcount, cifs_readdata_release);
  }
  
  static int cifs_resend_rdata(struct cifs_readdata *rdata,
        } while (rc == -EAGAIN);
  
  fail:
 -      kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 +      kref_put(&rdata->refcount, cifs_readdata_release);
        return rc;
  }
  
  static int
 -cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 +cifs_send_async_read(loff_t fpos, size_t len, struct cifsFileInfo *open_file,
                     struct cifs_sb_info *cifs_sb, struct list_head *rdata_list,
                     struct cifs_aio_ctx *ctx)
  {
        struct cifs_readdata *rdata;
 -      unsigned int npages, rsize;
 +      unsigned int rsize, nsegs, max_segs = INT_MAX;
        struct cifs_credits credits_on_stack;
        struct cifs_credits *credits = &credits_on_stack;
 -      size_t cur_len;
 +      size_t cur_len, max_len;
        int rc;
        pid_t pid;
        struct TCP_Server_Info *server;
 -      struct page **pagevec;
 -      size_t start;
 -      struct iov_iter direct_iov = ctx->iter;
  
        server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
  
 +#ifdef CONFIG_CIFS_SMB_DIRECT
 +      if (server->smbd_conn)
 +              max_segs = server->smbd_conn->max_frmr_depth;
 +#endif
 +
        if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
                pid = open_file->pid;
        else
                pid = current->tgid;
  
 -      if (ctx->direct_io)
 -              iov_iter_advance(&direct_iov, offset - ctx->pos);
 -
        do {
                if (open_file->invalidHandle) {
                        rc = cifs_reopen_file(open_file, true);
                if (rc)
                        break;
  
 -              cur_len = min_t(const size_t, len, rsize);
 -
 -              if (ctx->direct_io) {
 -                      ssize_t result;
 -
 -                      result = iov_iter_get_pages_alloc2(
 -                                      &direct_iov, &pagevec,
 -                                      cur_len, &start);
 -                      if (result < 0) {
 -                              cifs_dbg(VFS,
 -                                       "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
 -                                       result, iov_iter_type(&direct_iov),
 -                                       direct_iov.iov_offset,
 -                                       direct_iov.count);
 -                              dump_stack();
 -
 -                              rc = result;
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 -                      cur_len = (size_t)result;
 -
 -                      rdata = cifs_readdata_direct_alloc(
 -                                      pagevec, cifs_uncached_readv_complete);
 -                      if (!rdata) {
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              rc = -ENOMEM;
 -                              break;
 -                      }
 -
 -                      npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
 -                      rdata->page_offset = start;
 -                      rdata->tailsz = npages > 1 ?
 -                              cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
 -                              cur_len;
 +              max_len = min_t(size_t, len, rsize);
  
 -              } else {
 -
 -                      npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
 -                      /* allocate a readdata struct */
 -                      rdata = cifs_readdata_alloc(npages,
 -                                          cifs_uncached_readv_complete);
 -                      if (!rdata) {
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              rc = -ENOMEM;
 -                              break;
 -                      }
 -
 -                      rc = cifs_read_allocate_pages(rdata, npages);
 -                      if (rc) {
 -                              kvfree(rdata->pages);
 -                              kfree(rdata);
 -                              add_credits_and_wake_if(server, credits, 0);
 -                              break;
 -                      }
 +              cur_len = cifs_limit_bvec_subset(&ctx->iter, max_len,
 +                                               max_segs, &nsegs);
 +              cifs_dbg(FYI, "read-to-iter len=%zx/%zx nsegs=%u/%lu/%u\n",
 +                       cur_len, max_len, nsegs, ctx->iter.nr_segs, max_segs);
 +              if (cur_len == 0) {
 +                      rc = -EIO;
 +                      add_credits_and_wake_if(server, credits, 0);
 +                      break;
 +              }
  
 -                      rdata->tailsz = PAGE_SIZE;
 +              rdata = cifs_readdata_alloc(cifs_uncached_readv_complete);
 +              if (!rdata) {
 +                      add_credits_and_wake_if(server, credits, 0);
 +                      rc = -ENOMEM;
 +                      break;
                }
  
 -              rdata->server = server;
 -              rdata->cfile = cifsFileInfo_get(open_file);
 -              rdata->nr_pages = npages;
 -              rdata->offset = offset;
 -              rdata->bytes = cur_len;
 -              rdata->pid = pid;
 -              rdata->pagesz = PAGE_SIZE;
 -              rdata->read_into_pages = cifs_uncached_read_into_pages;
 -              rdata->copy_into_pages = cifs_uncached_copy_into_pages;
 -              rdata->credits = credits_on_stack;
 -              rdata->ctx = ctx;
 +              rdata->server   = server;
 +              rdata->cfile    = cifsFileInfo_get(open_file);
 +              rdata->offset   = fpos;
 +              rdata->bytes    = cur_len;
 +              rdata->pid      = pid;
 +              rdata->credits  = credits_on_stack;
 +              rdata->ctx      = ctx;
                kref_get(&ctx->refcount);
  
 +              rdata->iter     = ctx->iter;
 +              iov_iter_truncate(&rdata->iter, cur_len);
 +
                rc = adjust_credits(server, &rdata->credits, rdata->bytes);
  
                if (!rc) {
  
                if (rc) {
                        add_credits_and_wake_if(server, &rdata->credits, 0);
 -                      kref_put(&rdata->refcount,
 -                              cifs_uncached_readdata_release);
 -                      if (rc == -EAGAIN) {
 -                              iov_iter_revert(&direct_iov, cur_len);
 +                      kref_put(&rdata->refcount, cifs_readdata_release);
 +                      if (rc == -EAGAIN)
                                continue;
 -                      }
                        break;
                }
  
                list_add_tail(&rdata->list, rdata_list);
 -              offset += cur_len;
 +              iov_iter_advance(&ctx->iter, cur_len);
 +              fpos += cur_len;
                len -= cur_len;
        } while (len > 0);
  
@@@ -4019,6 -4187,22 +4033,6 @@@ again
                                list_del_init(&rdata->list);
                                INIT_LIST_HEAD(&tmp_list);
  
 -                              /*
 -                               * Got a part of data and then reconnect has
 -                               * happened -- fill the buffer and continue
 -                               * reading.
 -                               */
 -                              if (got_bytes && got_bytes < rdata->bytes) {
 -                                      rc = 0;
 -                                      if (!ctx->direct_io)
 -                                              rc = cifs_readdata_to_iov(rdata, to);
 -                                      if (rc) {
 -                                              kref_put(&rdata->refcount,
 -                                                      cifs_uncached_readdata_release);
 -                                              continue;
 -                                      }
 -                              }
 -
                                if (ctx->direct_io) {
                                        /*
                                         * Re-use rdata as this is a
                                                &tmp_list, ctx);
  
                                        kref_put(&rdata->refcount,
 -                                              cifs_uncached_readdata_release);
 +                                              cifs_readdata_release);
                                }
  
                                list_splice(&tmp_list, &ctx->list);
                                goto again;
                        } else if (rdata->result)
                                rc = rdata->result;
 -                      else if (!ctx->direct_io)
 -                              rc = cifs_readdata_to_iov(rdata, to);
  
                        /* if there was a short read -- discard anything left */
                        if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
                        ctx->total_len += rdata->got_bytes;
                }
                list_del_init(&rdata->list);
 -              kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 +              kref_put(&rdata->refcount, cifs_readdata_release);
        }
  
        if (!ctx->direct_io)
@@@ -4083,6 -4269,16 +4097,6 @@@ static ssize_t __cifs_readv
        loff_t offset = iocb->ki_pos;
        struct cifs_aio_ctx *ctx;
  
 -      /*
 -       * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
 -       * fall back to data copy read path
 -       * this could be improved by getting pages directly in ITER_KVEC
 -       */
 -      if (direct && iov_iter_is_kvec(to)) {
 -              cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
 -              direct = false;
 -      }
 -
        len = iov_iter_count(to);
        if (!len)
                return 0;
        if (!ctx)
                return -ENOMEM;
  
 -      ctx->cfile = cifsFileInfo_get(cfile);
 +      ctx->pos        = offset;
 +      ctx->direct_io  = direct;
 +      ctx->len        = len;
 +      ctx->cfile      = cifsFileInfo_get(cfile);
 +      ctx->nr_pinned_pages = 0;
  
        if (!is_sync_kiocb(iocb))
                ctx->iocb = iocb;
  
 -      if (user_backed_iter(to))
 -              ctx->should_dirty = true;
 -
 -      if (direct) {
 -              ctx->pos = offset;
 -              ctx->direct_io = true;
 -              ctx->iter = *to;
 -              ctx->len = len;
 -      } else {
 -              rc = setup_aio_ctx_iter(ctx, to, ITER_DEST);
 -              if (rc) {
 +      if (user_backed_iter(to)) {
 +              /*
 +               * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as
 +               * they contain references to the calling process's virtual
 +               * memory layout which won't be available in an async worker
 +               * thread.  This also takes a pin on every folio involved.
 +               */
 +              rc = netfs_extract_user_iter(to, iov_iter_count(to),
 +                                           &ctx->iter, 0);
 +              if (rc < 0) {
                        kref_put(&ctx->refcount, cifs_aio_ctx_release);
                        return rc;
                }
 -              len = ctx->len;
 +
 +              ctx->nr_pinned_pages = rc;
 +              ctx->bv = (void *)ctx->iter.bvec;
 +              ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter);
 +              ctx->should_dirty = true;
 +      } else if ((iov_iter_is_bvec(to) || iov_iter_is_kvec(to)) &&
 +                 !is_sync_kiocb(iocb)) {
 +              /*
 +               * If the op is asynchronous, we need to copy the list attached
 +               * to a BVEC/KVEC-type iterator, but we assume that the storage
 +               * will be retained by the caller; in any case, we may or may
 +               * not be able to pin the pages, so we don't try.
 +               */
 +              ctx->bv = (void *)dup_iter(&ctx->iter, to, GFP_KERNEL);
 +              if (!ctx->bv) {
 +                      kref_put(&ctx->refcount, cifs_aio_ctx_release);
 +                      return -ENOMEM;
 +              }
 +      } else {
 +              /*
 +               * Otherwise, we just pass the iterator down as-is and rely on
 +               * the caller to make sure the pages referred to by the
 +               * iterator don't evaporate.
 +               */
 +              ctx->iter = *to;
        }
  
        if (direct) {
@@@ -4346,22 -4515,23 +4360,22 @@@ cifs_read(struct file *file, char *read
   * If the page is mmap'ed into a process' page tables, then we need to make
   * sure that it doesn't change while being written back.
   */
 -static vm_fault_t
 -cifs_page_mkwrite(struct vm_fault *vmf)
 +static vm_fault_t cifs_page_mkwrite(struct vm_fault *vmf)
  {
 -      struct page *page = vmf->page;
 +      struct folio *folio = page_folio(vmf->page);
  
 -      /* Wait for the page to be written to the cache before we allow it to
 -       * be modified.  We then assume the entire page will need writing back.
 +      /* Wait for the folio to be written to the cache before we allow it to
 +       * be modified.  We then assume the entire folio will need writing back.
         */
  #ifdef CONFIG_CIFS_FSCACHE
 -      if (PageFsCache(page) &&
 -          wait_on_page_fscache_killable(page) < 0)
 +      if (folio_test_fscache(folio) &&
 +          folio_wait_fscache_killable(folio) < 0)
                return VM_FAULT_RETRY;
  #endif
  
 -      wait_on_page_writeback(page);
 +      folio_wait_writeback(folio);
  
 -      if (lock_page_killable(page) < 0)
 +      if (folio_lock_killable(folio) < 0)
                return VM_FAULT_RETRY;
        return VM_FAULT_LOCKED;
  }
@@@ -4409,72 -4579,149 +4423,72 @@@ int cifs_file_mmap(struct file *file, s
        return rc;
  }
  
 -static void
 -cifs_readv_complete(struct work_struct *work)
 +/*
 + * Unlock a bunch of folios in the pagecache.
 + */
 +static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last)
  {
 -      unsigned int i, got_bytes;
 -      struct cifs_readdata *rdata = container_of(work,
 -                                              struct cifs_readdata, work);
 -
 -      got_bytes = rdata->got_bytes;
 -      for (i = 0; i < rdata->nr_pages; i++) {
 -              struct page *page = rdata->pages[i];
 -
 -              if (rdata->result == 0 ||
 -                  (rdata->result == -EAGAIN && got_bytes)) {
 -                      flush_dcache_page(page);
 -                      SetPageUptodate(page);
 -              } else
 -                      SetPageError(page);
 -
 -              if (rdata->result == 0 ||
 -                  (rdata->result == -EAGAIN && got_bytes))
 -                      cifs_readpage_to_fscache(rdata->mapping->host, page);
 -
 -              unlock_page(page);
 +      struct folio *folio;
 +      XA_STATE(xas, &mapping->i_pages, first);
  
 -              got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);
 -
 -              put_page(page);
 -              rdata->pages[i] = NULL;
 +      rcu_read_lock();
 +      xas_for_each(&xas, folio, last) {
 +              folio_unlock(folio);
        }
 -      kref_put(&rdata->refcount, cifs_readdata_release);
 +      rcu_read_unlock();
  }
  
 -static int
 -readpages_fill_pages(struct TCP_Server_Info *server,
 -                   struct cifs_readdata *rdata, struct iov_iter *iter,
 -                   unsigned int len)
 +static void cifs_readahead_complete(struct work_struct *work)
  {
 -      int result = 0;
 -      unsigned int i;
 -      u64 eof;
 -      pgoff_t eof_index;
 -      unsigned int nr_pages = rdata->nr_pages;
 -      unsigned int page_offset = rdata->page_offset;
 -
 -      /* determine the eof that the server (probably) has */
 -      eof = CIFS_I(rdata->mapping->host)->server_eof;
 -      eof_index = eof ? (eof - 1) >> PAGE_SHIFT : 0;
 -      cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index);
 -
 -      rdata->got_bytes = 0;
 -      rdata->tailsz = PAGE_SIZE;
 -      for (i = 0; i < nr_pages; i++) {
 -              struct page *page = rdata->pages[i];
 -              unsigned int to_read = rdata->pagesz;
 -              size_t n;
 -
 -              if (i == 0)
 -                      to_read -= page_offset;
 -              else
 -                      page_offset = 0;
 -
 -              n = to_read;
 -
 -              if (len >= to_read) {
 -                      len -= to_read;
 -              } else if (len > 0) {
 -                      /* enough for partial page, fill and zero the rest */
 -                      zero_user(page, len + page_offset, to_read - len);
 -                      n = rdata->tailsz = len;
 -                      len = 0;
 -              } else if (page->index > eof_index) {
 -                      /*
 -                       * The VFS will not try to do readahead past the
 -                       * i_size, but it's possible that we have outstanding
 -                       * writes with gaps in the middle and the i_size hasn't
 -                       * caught up yet. Populate those with zeroed out pages
 -                       * to prevent the VFS from repeatedly attempting to
 -                       * fill them until the writes are flushed.
 -                       */
 -                      zero_user(page, 0, PAGE_SIZE);
 -                      flush_dcache_page(page);
 -                      SetPageUptodate(page);
 -                      unlock_page(page);
 -                      put_page(page);
 -                      rdata->pages[i] = NULL;
 -                      rdata->nr_pages--;
 -                      continue;
 -              } else {
 -                      /* no need to hold page hostage */
 -                      unlock_page(page);
 -                      put_page(page);
 -                      rdata->pages[i] = NULL;
 -                      rdata->nr_pages--;
 -                      continue;
 -              }
 +      struct cifs_readdata *rdata = container_of(work,
 +                                                 struct cifs_readdata, work);
 +      struct folio *folio;
 +      pgoff_t last;
 +      bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes);
  
 -              if (iter)
 -                      result = copy_page_from_iter(
 -                                      page, page_offset, n, iter);
 -#ifdef CONFIG_CIFS_SMB_DIRECT
 -              else if (rdata->mr)
 -                      result = n;
 -#endif
 -              else
 -                      result = cifs_read_page_from_socket(
 -                                      server, page, page_offset, n);
 -              if (result < 0)
 -                      break;
 +      XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE);
  
 -              rdata->got_bytes += result;
 -      }
 +      if (good)
 +              cifs_readahead_to_fscache(rdata->mapping->host,
 +                                        rdata->offset, rdata->bytes);
  
 -      return rdata->got_bytes > 0 && result != -ECONNABORTED ?
 -                                              rdata->got_bytes : result;
 -}
 +      if (iov_iter_count(&rdata->iter) > 0)
 +              iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter);
  
 -static int
 -cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
 -                             struct cifs_readdata *rdata, unsigned int len)
 -{
 -      return readpages_fill_pages(server, rdata, NULL, len);
 -}
 +      last = (rdata->offset + rdata->bytes - 1) / PAGE_SIZE;
  
 -static int
 -cifs_readpages_copy_into_pages(struct TCP_Server_Info *server,
 -                             struct cifs_readdata *rdata,
 -                             struct iov_iter *iter)
 -{
 -      return readpages_fill_pages(server, rdata, iter, iter->count);
 +      rcu_read_lock();
 +      xas_for_each(&xas, folio, last) {
 +              if (good) {
 +                      flush_dcache_folio(folio);
 +                      folio_mark_uptodate(folio);
 +              }
 +              folio_unlock(folio);
 +      }
 +      rcu_read_unlock();
 +
 +      kref_put(&rdata->refcount, cifs_readdata_release);
  }
  
  static void cifs_readahead(struct readahead_control *ractl)
  {
 -      int rc;
        struct cifsFileInfo *open_file = ractl->file->private_data;
        struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file);
        struct TCP_Server_Info *server;
 -      pid_t pid;
 -      unsigned int xid, nr_pages, last_batch_size = 0, cache_nr_pages = 0;
 -      pgoff_t next_cached = ULONG_MAX;
 +      unsigned int xid, nr_pages, cache_nr_pages = 0;
 +      unsigned int ra_pages;
 +      pgoff_t next_cached = ULONG_MAX, ra_index;
        bool caching = fscache_cookie_enabled(cifs_inode_cookie(ractl->mapping->host)) &&
                cifs_inode_cookie(ractl->mapping->host)->cache_priv;
        bool check_cache = caching;
 +      pid_t pid;
 +      int rc = 0;
 +
 +      /* Note that readahead_count() lags behind our dequeuing of pages from
 +       * the ractl, so we have to keep track for ourselves.
 +       */
 +      ra_pages = readahead_count(ractl);
 +      ra_index = readahead_index(ractl);
  
        xid = get_xid();
  
        else
                pid = current->tgid;
  
 -      rc = 0;
        server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
  
        cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
 -               __func__, ractl->file, ractl->mapping, readahead_count(ractl));
 +               __func__, ractl->file, ractl->mapping, ra_pages);
  
        /*
         * Chop the readahead request up into rsize-sized read requests.
         */
 -      while ((nr_pages = readahead_count(ractl) - last_batch_size)) {
 -              unsigned int i, got, rsize;
 -              struct page *page;
 +      while ((nr_pages = ra_pages)) {
 +              unsigned int i, rsize;
                struct cifs_readdata *rdata;
                struct cifs_credits credits_on_stack;
                struct cifs_credits *credits = &credits_on_stack;
 -              pgoff_t index = readahead_index(ractl) + last_batch_size;
 +              struct folio *folio;
 +              pgoff_t fsize;
  
                /*
                 * Find out if we have anything cached in the range of
                if (caching) {
                        if (check_cache) {
                                rc = cifs_fscache_query_occupancy(
 -                                      ractl->mapping->host, index, nr_pages,
 +                                      ractl->mapping->host, ra_index, nr_pages,
                                        &next_cached, &cache_nr_pages);
                                if (rc < 0)
                                        caching = false;
                                check_cache = false;
                        }
  
 -                      if (index == next_cached) {
 +                      if (ra_index == next_cached) {
                                /*
                                 * TODO: Send a whole batch of pages to be read
                                 * by the cache.
                                 */
 -                              struct folio *folio = readahead_folio(ractl);
 -
 -                              last_batch_size = folio_nr_pages(folio);
 +                              folio = readahead_folio(ractl);
 +                              fsize = folio_nr_pages(folio);
 +                              ra_pages -= fsize;
 +                              ra_index += fsize;
                                if (cifs_readpage_from_fscache(ractl->mapping->host,
                                                               &folio->page) < 0) {
                                        /*
                                        caching = false;
                                }
                                folio_unlock(folio);
 -                              next_cached++;
 -                              cache_nr_pages--;
 +                              next_cached += fsize;
 +                              cache_nr_pages -= fsize;
                                if (cache_nr_pages == 0)
                                        check_cache = true;
                                continue;
                                                   &rsize, credits);
                if (rc)
                        break;
 -              nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl));
 -              nr_pages = min_t(size_t, nr_pages, next_cached - index);
 +              nr_pages = min_t(size_t, rsize / PAGE_SIZE, ra_pages);
 +              if (next_cached != ULONG_MAX)
 +                      nr_pages = min_t(size_t, nr_pages, next_cached - ra_index);
  
                /*
                 * Give up immediately if rsize is too small to read an entire
                        break;
                }
  
 -              rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
 +              rdata = cifs_readdata_alloc(cifs_readahead_complete);
                if (!rdata) {
                        /* best to give up if we're out of mem */
                        add_credits_and_wake_if(server, credits, 0);
                        break;
                }
  
 -              got = __readahead_batch(ractl, rdata->pages, nr_pages);
 -              if (got != nr_pages) {
 -                      pr_warn("__readahead_batch() returned %u/%u\n",
 -                              got, nr_pages);
 -                      nr_pages = got;
 -              }
 -
 -              rdata->nr_pages = nr_pages;
 -              rdata->bytes    = readahead_batch_length(ractl);
 +              rdata->offset   = ra_index * PAGE_SIZE;
 +              rdata->bytes    = nr_pages * PAGE_SIZE;
                rdata->cfile    = cifsFileInfo_get(open_file);
                rdata->server   = server;
                rdata->mapping  = ractl->mapping;
 -              rdata->offset   = readahead_pos(ractl);
                rdata->pid      = pid;
 -              rdata->pagesz   = PAGE_SIZE;
 -              rdata->tailsz   = PAGE_SIZE;
 -              rdata->read_into_pages = cifs_readpages_read_into_pages;
 -              rdata->copy_into_pages = cifs_readpages_copy_into_pages;
                rdata->credits  = credits_on_stack;
  
 +              for (i = 0; i < nr_pages; i++) {
 +                      if (!readahead_folio(ractl))
 +                              WARN_ON(1);
 +              }
 +              ra_pages -= nr_pages;
 +              ra_index += nr_pages;
 +
 +              iov_iter_xarray(&rdata->iter, ITER_DEST, &rdata->mapping->i_pages,
 +                              rdata->offset, rdata->bytes);
 +
                rc = adjust_credits(server, &rdata->credits, rdata->bytes);
                if (!rc) {
                        if (rdata->cfile->invalidHandle)
  
                if (rc) {
                        add_credits_and_wake_if(server, &rdata->credits, 0);
 -                      for (i = 0; i < rdata->nr_pages; i++) {
 -                              page = rdata->pages[i];
 -                              unlock_page(page);
 -                              put_page(page);
 -                      }
 +                      cifs_unlock_folios(rdata->mapping,
 +                                         rdata->offset / PAGE_SIZE,
 +                                         (rdata->offset + rdata->bytes - 1) / PAGE_SIZE);
                        /* Fallback to the readpage in error/reconnect cases */
                        kref_put(&rdata->refcount, cifs_readdata_release);
                        break;
                }
  
                kref_put(&rdata->refcount, cifs_readdata_release);
 -              last_batch_size = nr_pages;
        }
  
        free_xid(xid);
@@@ -4658,6 -4909,10 +4672,6 @@@ static int cifs_readpage_worker(struct 
  
        flush_dcache_page(page);
        SetPageUptodate(page);
 -
 -      /* send this page to the cache */
 -      cifs_readpage_to_fscache(file_inode(file), page);
 -
        rc = 0;
  
  io_error:
@@@ -5044,19 -5299,3 +5058,19 @@@ const struct address_space_operations c
        .launder_folio = cifs_launder_folio,
        .migrate_folio = filemap_migrate_folio,
  };
 +
 +/*
 + * Splice data from a file into a pipe.
 + */
 +ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
 +                       struct pipe_inode_info *pipe, size_t len,
 +                       unsigned int flags)
 +{
 +      if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
 +              return 0;
 +      if (unlikely(!len))
 +              return 0;
 +      if (in->f_flags & O_DIRECT)
 +              return direct_splice_read(in, ppos, pipe, len, flags);
 +      return filemap_splice_read(in, ppos, pipe, len, flags);
 +}
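To make the splice path concrete: a read-side splice() issued from user space on a file opened from a cifs mount is what eventually lands in cifs_splice_read() above. A small self-contained demonstration follows; it works against any regular file and is illustrative only, not part of the patch.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int pfd[2], fd;
        ssize_t n;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file-on-a-cifs-mount>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || pipe(pfd) < 0) {
                perror("setup");
                return 1;
        }
        /* For a cifs file this read is served by cifs_splice_read() in the
         * kernel; O_DIRECT opens take the direct_splice_read() leg. */
        n = splice(fd, NULL, pfd[1], NULL, 65536, 0);
        if (n < 0) {
                perror("splice");
                return 1;
        }
        fprintf(stderr, "spliced %zd bytes from %s into the pipe\n", n, argv[1]);
        return 0;
}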
diff --combined fs/coredump.c
index a141dca681653c2829feb29a911397bf7131a18c,f27d734f3102944a8aa4b4d21c7e340f5eeae664..5df1e6e1eb2bea1ff98e69f3056baa4aba02cd11
@@@ -644,7 -644,7 +644,7 @@@ void do_coredump(const kernel_siginfo_
                        goto close_fail;
                }
        } else {
 -              struct user_namespace *mnt_userns;
 +              struct mnt_idmap *idmap;
                struct inode *inode;
                int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW |
                                 O_LARGEFILE | O_EXCL;
                 * a process dumps core while its cwd is e.g. on a vfat
                 * filesystem.
                 */
 -              mnt_userns = file_mnt_user_ns(cprm.file);
 -              if (!vfsuid_eq_kuid(i_uid_into_vfsuid(mnt_userns, inode),
 +              idmap = file_mnt_idmap(cprm.file);
 +              if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
                                    current_fsuid())) {
                        pr_info_ratelimited("Core dump to %s aborted: cannot preserve file owner\n",
                                            cn.corename);
                }
                if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
                        goto close_fail;
 -              if (do_truncate(mnt_userns, cprm.file->f_path.dentry,
 +              if (do_truncate(idmap, cprm.file->f_path.dentry,
                                0, 0, cprm.file))
                        goto close_fail;
        }
@@@ -838,33 -838,13 +838,33 @@@ static int __dump_skip(struct coredump_
        }
  }
  
 +int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
 +{
 +      if (cprm->to_skip) {
 +              if (!__dump_skip(cprm, cprm->to_skip))
 +                      return 0;
 +              cprm->to_skip = 0;
 +      }
 +      return __dump_emit(cprm, addr, nr);
 +}
 +EXPORT_SYMBOL(dump_emit);
 +
 +void dump_skip_to(struct coredump_params *cprm, unsigned long pos)
 +{
 +      cprm->to_skip = pos - cprm->pos;
 +}
 +EXPORT_SYMBOL(dump_skip_to);
 +
 +void dump_skip(struct coredump_params *cprm, size_t nr)
 +{
 +      cprm->to_skip += nr;
 +}
 +EXPORT_SYMBOL(dump_skip);
 +
 +#ifdef CONFIG_ELF_CORE
  static int dump_emit_page(struct coredump_params *cprm, struct page *page)
  {
 -      struct bio_vec bvec = {
 -              .bv_page        = page,
 -              .bv_offset      = 0,
 -              .bv_len         = PAGE_SIZE,
 -      };
 +      struct bio_vec bvec;
        struct iov_iter iter;
        struct file *file = cprm->file;
        loff_t pos;
        if (dump_interrupted())
                return 0;
        pos = file->f_pos;
 +      bvec_set_page(&bvec, page, PAGE_SIZE, 0);
        iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
        n = __kernel_write_iter(cprm->file, &iter, &pos);
        if (n != PAGE_SIZE)
        return 1;
  }
  
 -int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
 -{
 -      if (cprm->to_skip) {
 -              if (!__dump_skip(cprm, cprm->to_skip))
 -                      return 0;
 -              cprm->to_skip = 0;
 -      }
 -      return __dump_emit(cprm, addr, nr);
 -}
 -EXPORT_SYMBOL(dump_emit);
 -
 -void dump_skip_to(struct coredump_params *cprm, unsigned long pos)
 -{
 -      cprm->to_skip = pos - cprm->pos;
 -}
 -EXPORT_SYMBOL(dump_skip_to);
 -
 -void dump_skip(struct coredump_params *cprm, size_t nr)
 -{
 -      cprm->to_skip += nr;
 -}
 -EXPORT_SYMBOL(dump_skip);
 -
 -#ifdef CONFIG_ELF_CORE
  int dump_user_range(struct coredump_params *cprm, unsigned long start,
                    unsigned long len)
  {
@@@ -1108,14 -1111,14 +1108,14 @@@ whole
   * Helper function for iterating across a vma list.  It ensures that the caller
   * will visit `gate_vma' prior to terminating the search.
   */
- static struct vm_area_struct *coredump_next_vma(struct ma_state *mas,
+ static struct vm_area_struct *coredump_next_vma(struct vma_iterator *vmi,
                                       struct vm_area_struct *vma,
                                       struct vm_area_struct *gate_vma)
  {
        if (gate_vma && (vma == gate_vma))
                return NULL;
  
-       vma = mas_next(mas, ULONG_MAX);
+       vma = vma_next(vmi);
        if (vma)
                return vma;
        return gate_vma;
@@@ -1143,7 -1146,7 +1143,7 @@@ static bool dump_vma_snapshot(struct co
  {
        struct vm_area_struct *gate_vma, *vma = NULL;
        struct mm_struct *mm = current->mm;
-       MA_STATE(mas, &mm->mm_mt, 0, 0);
+       VMA_ITERATOR(vmi, mm, 0);
        int i = 0;
  
        /*
                return false;
        }
  
-       while ((vma = coredump_next_vma(&mas, vma, gate_vma)) != NULL) {
+       while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) {
                struct core_vma_metadata *m = cprm->vma_meta + i;
  
                m->start = vma->vm_start;
diff --combined fs/erofs/data.c
index 032e12dccb843f5b3bc7d840e07f45ec8d8f690b,f32d65987578369b757f1c962fd012d33b034538..e16545849ea7f3e6c49718479918e218b0e47411
@@@ -74,7 -74,8 +74,7 @@@ void *erofs_read_metabuf(struct erofs_b
  }
  
  static int erofs_map_blocks_flatmode(struct inode *inode,
 -                                   struct erofs_map_blocks *map,
 -                                   int flags)
 +                                   struct erofs_map_blocks *map)
  {
        erofs_blk_t nblocks, lastblk;
        u64 offset = map->m_la;
                map->m_pa = blknr_to_addr(vi->raw_blkaddr) + map->m_la;
                map->m_plen = blknr_to_addr(lastblk) - offset;
        } else if (tailendpacking) {
 -              /* 2 - inode inline B: inode, [xattrs], inline last blk... */
 -              struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
 -
 -              map->m_pa = iloc(sbi, vi->nid) + vi->inode_isize +
 -                      vi->xattr_isize + erofs_blkoff(map->m_la);
 +              map->m_pa = erofs_iloc(inode) + vi->inode_isize +
 +                      vi->xattr_isize + erofs_blkoff(offset);
                map->m_plen = inode->i_size - offset;
  
                /* inline data should be located in the same meta block */
        return 0;
  }
  
 -int erofs_map_blocks(struct inode *inode,
 -                   struct erofs_map_blocks *map, int flags)
 +int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map)
  {
        struct super_block *sb = inode->i_sb;
        struct erofs_inode *vi = EROFS_I(inode);
        void *kaddr;
        int err = 0;
  
 -      trace_erofs_map_blocks_enter(inode, map, flags);
 +      trace_erofs_map_blocks_enter(inode, map, 0);
        map->m_deviceid = 0;
        if (map->m_la >= inode->i_size) {
                /* leave out-of-bound access unmapped */
        }
  
        if (vi->datalayout != EROFS_INODE_CHUNK_BASED) {
 -              err = erofs_map_blocks_flatmode(inode, map, flags);
 +              err = erofs_map_blocks_flatmode(inode, map);
                goto out;
        }
  
                unit = EROFS_BLOCK_MAP_ENTRY_SIZE;      /* block map */
  
        chunknr = map->m_la >> vi->chunkbits;
 -      pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize +
 +      pos = ALIGN(erofs_iloc(inode) + vi->inode_isize +
                    vi->xattr_isize, unit) + unit * chunknr;
  
        kaddr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos), EROFS_KMAP);
@@@ -187,7 -192,7 +187,7 @@@ out_unlock
  out:
        if (!err)
                map->m_llen = map->m_plen;
 -      trace_erofs_map_blocks_exit(inode, map, flags, 0);
 +      trace_erofs_map_blocks_exit(inode, map, 0, err);
        return err;
  }
  
@@@ -250,7 -255,7 +250,7 @@@ static int erofs_iomap_begin(struct ino
        map.m_la = offset;
        map.m_llen = length;
  
 -      ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
 +      ret = erofs_map_blocks(inode, &map);
        if (ret < 0)
                return ret;
  
@@@ -424,7 -429,7 +424,7 @@@ static int erofs_file_mmap(struct file 
                return -EINVAL;
  
        vma->vm_ops = &erofs_dax_vm_ops;
-       vma->vm_flags |= VM_HUGEPAGE;
+       vm_flags_set(vma, VM_HUGEPAGE);
        return 0;
  }
  #else
diff --combined fs/exec.c
index 5c00670d25f360da65829cf33809e84dfa55a89e,d2e2a15e5cfe62d20966dd8d354fbdaef3829132..7c44d0c65b1b4c7bcb91905110afc330d52a2bc8
+++ b/fs/exec.c
@@@ -270,7 -270,7 +270,7 @@@ static int __bprm_mm_init(struct linux_
        BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
        vma->vm_end = STACK_TOP_MAX;
        vma->vm_start = vma->vm_end - PAGE_SIZE;
-       vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
+       vm_flags_init(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP);
        vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
  
        err = insert_vm_struct(mm, vma);
@@@ -699,7 -699,7 +699,7 @@@ static int shift_arg_pages(struct vm_ar
        /*
         * cover the whole range: [new_start, old_end)
         */
-       if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
+       if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
                return -ENOMEM;
  
        /*
        }
        tlb_finish_mmu(&tlb);
  
-       /*
-        * Shrink the vma to just the new range.  Always succeeds.
-        */
-       vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
-       return 0;
+       vma_prev(&vmi);
+       /* Shrink the vma to just the new range */
+       return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
  }
  
  /*
@@@ -758,6 -755,7 +755,7 @@@ int setup_arg_pages(struct linux_binpr
        unsigned long stack_expand;
        unsigned long rlim_stack;
        struct mmu_gather tlb;
+       struct vma_iterator vmi;
  
  #ifdef CONFIG_STACK_GROWSUP
        /* Limit stack size */
        vm_flags |= mm->def_flags;
        vm_flags |= VM_STACK_INCOMPLETE_SETUP;
  
+       vma_iter_init(&vmi, mm, vma->vm_start);
        tlb_gather_mmu(&tlb, mm);
-       ret = mprotect_fixup(&tlb, vma, &prev, vma->vm_start, vma->vm_end,
+       ret = mprotect_fixup(&vmi, &tlb, vma, &prev, vma->vm_start, vma->vm_end,
                        vm_flags);
        tlb_finish_mmu(&tlb);
  
        }
  
        /* mprotect_fixup is overkill to remove the temporary stack flags */
-       vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
+       vm_flags_clear(vma, VM_STACK_INCOMPLETE_SETUP);
  
        stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
        stack_size = vma->vm_end - vma->vm_start;
@@@ -1010,7 -1010,6 +1010,7 @@@ static int exec_mmap(struct mm_struct *
        active_mm = tsk->active_mm;
        tsk->active_mm = mm;
        tsk->mm = mm;
 +      mm_init_cid(mm);
        /*
         * This prevents preemption while active_mm is being loaded and
         * it and mm are being updated, which could cause problems for
@@@ -1415,15 -1414,15 +1415,15 @@@ EXPORT_SYMBOL(begin_new_exec)
  void would_dump(struct linux_binprm *bprm, struct file *file)
  {
        struct inode *inode = file_inode(file);
 -      struct user_namespace *mnt_userns = file_mnt_user_ns(file);
 -      if (inode_permission(mnt_userns, inode, MAY_READ) < 0) {
 +      struct mnt_idmap *idmap = file_mnt_idmap(file);
 +      if (inode_permission(idmap, inode, MAY_READ) < 0) {
                struct user_namespace *old, *user_ns;
                bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
  
                /* Ensure mm->user_ns contains the executable */
                user_ns = old = bprm->mm->user_ns;
                while ((user_ns != &init_user_ns) &&
 -                     !privileged_wrt_inode_uidgid(user_ns, mnt_userns, inode))
 +                     !privileged_wrt_inode_uidgid(user_ns, idmap, inode))
                        user_ns = user_ns->parent;
  
                if (old != user_ns) {
@@@ -1597,7 -1596,7 +1597,7 @@@ static void check_unsafe_exec(struct li
  static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
  {
        /* Handle suid and sgid on files */
 -      struct user_namespace *mnt_userns;
 +      struct mnt_idmap *idmap;
        struct inode *inode = file_inode(file);
        unsigned int mode;
        vfsuid_t vfsuid;
        if (!(mode & (S_ISUID|S_ISGID)))
                return;
  
 -      mnt_userns = file_mnt_user_ns(file);
 +      idmap = file_mnt_idmap(file);
  
        /* Be careful if suid/sgid is set */
        inode_lock(inode);
  
        /* reload atomically mode/uid/gid now that lock held */
        mode = inode->i_mode;
 -      vfsuid = i_uid_into_vfsuid(mnt_userns, inode);
 -      vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
 +      vfsuid = i_uid_into_vfsuid(idmap, inode);
 +      vfsgid = i_gid_into_vfsgid(idmap, inode);
        inode_unlock(inode);
  
        /* We ignore suid/sgid if there are no mappings for them in the ns */
@@@ -1823,7 -1822,6 +1823,7 @@@ static int bprm_execve(struct linux_bin
         */
        check_unsafe_exec(bprm);
        current->in_execve = 1;
 +      sched_mm_cid_before_execve(current);
  
        file = do_open_execat(fd, filename, flags);
        retval = PTR_ERR(file);
        if (retval < 0)
                goto out;
  
 +      sched_mm_cid_after_execve(current);
        /* execve succeeded */
        current->fs->in_exec = 0;
        current->in_execve = 0;
@@@ -1874,7 -1871,6 +1874,7 @@@ out
                force_fatal_sig(SIGSEGV);
  
  out_unmark:
 +      sched_mm_cid_after_execve(current);
        current->fs->in_exec = 0;
        current->in_execve = 0;
  
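
A minimal sketch, not part of the commit: the fs/exec.c hunks above thread a struct vma_iterator into mprotect_fixup()/vma_expand()/vma_shrink(). A hypothetical read-only walk over the same maple-tree based iterator, for illustration:

	#include <linux/mm.h>
	#include <linux/mm_types.h>
	#include <linux/printk.h>

	/* hypothetical helper: print every VMA of an mm under the read lock */
	static void example_dump_vmas(struct mm_struct *mm)
	{
		VMA_ITERATOR(vmi, mm, 0);
		struct vm_area_struct *vma;

		mmap_read_lock(mm);
		for_each_vma(vmi, vma)
			pr_info("vma %08lx-%08lx flags %08lx\n",
				vma->vm_start, vma->vm_end, vma->vm_flags);
		mmap_read_unlock(mm);
	}
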
diff --combined fs/ext4/inode.c
index b936ee3af51e21c93efd9eb3a470a09c325130d3,98c018dcd3fd3230349ae7bd3575844c0d78b429..f67d0e1bf4e01daca7c6b4f952ddbc313e28e832
@@@ -1136,8 -1136,7 +1136,8 @@@ static int ext4_block_write_begin(struc
                for (i = 0; i < nr_wait; i++) {
                        int err2;
  
 -                      err2 = fscrypt_decrypt_pagecache_blocks(page, blocksize,
 +                      err2 = fscrypt_decrypt_pagecache_blocks(page_folio(page),
 +                                                              blocksize,
                                                                bh_offset(wait[i]));
                        if (err2) {
                                clear_buffer_uptodate(wait[i]);
@@@ -2596,8 -2595,8 +2596,8 @@@ static bool ext4_page_nomap_can_writeou
  static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
  {
        struct address_space *mapping = mpd->inode->i_mapping;
-       struct pagevec pvec;
-       unsigned int nr_pages;
+       struct folio_batch fbatch;
+       unsigned int nr_folios;
        long left = mpd->wbc->nr_to_write;
        pgoff_t index = mpd->first_page;
        pgoff_t end = mpd->last_page;
                tag = PAGECACHE_TAG_TOWRITE;
        else
                tag = PAGECACHE_TAG_DIRTY;
-       pagevec_init(&pvec);
+       folio_batch_init(&fbatch);
        mpd->map.m_len = 0;
        mpd->next_page = index;
        while (index <= end) {
-               nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
-                               tag);
-               if (nr_pages == 0)
+               nr_folios = filemap_get_folios_tag(mapping, &index, end,
+                               tag, &fbatch);
+               if (nr_folios == 0)
                        break;
  
-               for (i = 0; i < nr_pages; i++) {
-                       struct page *page = pvec.pages[i];
+               for (i = 0; i < nr_folios; i++) {
+                       struct folio *folio = fbatch.folios[i];
  
                        /*
                         * Accumulated enough dirty pages? This doesn't apply
                                goto out;
  
                        /* If we can't merge this page, we are done. */
-                       if (mpd->map.m_len > 0 && mpd->next_page != page->index)
+                       if (mpd->map.m_len > 0 && mpd->next_page != folio->index)
                                goto out;
  
-                       lock_page(page);
+                       folio_lock(folio);
                        /*
                         * If the page is no longer dirty, or its mapping no
                         * longer corresponds to inode we are writing (which
                         * page is already under writeback and we are not doing
                         * a data integrity writeback, skip the page
                         */
-                       if (!PageDirty(page) ||
-                           (PageWriteback(page) &&
+                       if (!folio_test_dirty(folio) ||
+                           (folio_test_writeback(folio) &&
                             (mpd->wbc->sync_mode == WB_SYNC_NONE)) ||
-                           unlikely(page->mapping != mapping)) {
-                               unlock_page(page);
+                           unlikely(folio->mapping != mapping)) {
+                               folio_unlock(folio);
                                continue;
                        }
  
-                       wait_on_page_writeback(page);
-                       BUG_ON(PageWriteback(page));
+                       folio_wait_writeback(folio);
+                       BUG_ON(folio_test_writeback(folio));
  
                        /*
                         * Should never happen but for buggy code in
                         *
                         * [1] https://lore.kernel.org/linux-mm/[email protected]
                         */
-                       if (!page_has_buffers(page)) {
-                               ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", page->index);
-                               ClearPageDirty(page);
-                               unlock_page(page);
+                       if (!folio_buffers(folio)) {
+                               ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", folio->index);
+                               folio_clear_dirty(folio);
+                               folio_unlock(folio);
                                continue;
                        }
  
                        if (mpd->map.m_len == 0)
-                               mpd->first_page = page->index;
-                       mpd->next_page = page->index + 1;
+                               mpd->first_page = folio->index;
+                       mpd->next_page = folio->index + folio_nr_pages(folio);
                        /*
                         * Writeout for transaction commit where we cannot
                         * modify metadata is simple. Just submit the page.
                         */
                        if (!mpd->can_map) {
-                               if (ext4_page_nomap_can_writeout(page)) {
-                                       err = mpage_submit_page(mpd, page);
+                               if (ext4_page_nomap_can_writeout(&folio->page)) {
+                                       err = mpage_submit_page(mpd, &folio->page);
                                        if (err < 0)
                                                goto out;
                                } else {
-                                       unlock_page(page);
-                                       mpd->first_page++;
+                                       folio_unlock(folio);
+                                       mpd->first_page += folio_nr_pages(folio);
                                }
                        } else {
                                /* Add all dirty buffers to mpd */
-                               lblk = ((ext4_lblk_t)page->index) <<
+                               lblk = ((ext4_lblk_t)folio->index) <<
                                        (PAGE_SHIFT - blkbits);
-                               head = page_buffers(page);
+                               head = folio_buffers(folio);
                                err = mpage_process_page_bufs(mpd, head, head,
-                                                             lblk);
+                                               lblk);
                                if (err <= 0)
                                        goto out;
                                err = 0;
                        }
-                       left--;
+                       left -= folio_nr_pages(folio);
                }
-               pagevec_release(&pvec);
+               folio_batch_release(&fbatch);
                cond_resched();
        }
        mpd->scanned_until_end = 1;
        return 0;
  out:
-       pagevec_release(&pvec);
+       folio_batch_release(&fbatch);
        return err;
  }
  
- static int ext4_writepage_cb(struct page *page, struct writeback_control *wbc,
+ static int ext4_writepage_cb(struct folio *folio, struct writeback_control *wbc,
                             void *data)
  {
-       return ext4_writepage(page, wbc);
+       return ext4_writepage(&folio->page, wbc);
  }
  
  static int ext4_do_writepages(struct mpage_da_data *mpd)
@@@ -3859,8 -3857,7 +3858,8 @@@ static int __ext4_block_zero_page_range
                if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
                        /* We expect the key to be set. */
                        BUG_ON(!fscrypt_has_encryption_key(inode));
 -                      err = fscrypt_decrypt_pagecache_blocks(page, blocksize,
 +                      err = fscrypt_decrypt_pagecache_blocks(page_folio(page),
 +                                                             blocksize,
                                                               bh_offset(bh));
                        if (err) {
                                clear_buffer_uptodate(bh);
@@@ -5436,7 -5433,7 +5435,7 @@@ static void ext4_wait_for_tail_page_com
   *
   * Called with inode->i_rwsem down.
   */
 -int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 +int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
                 struct iattr *attr)
  {
        struct inode *inode = d_inode(dentry);
                                  ATTR_GID | ATTR_TIMES_SET))))
                return -EPERM;
  
 -      error = setattr_prepare(mnt_userns, dentry, attr);
 +      error = setattr_prepare(idmap, dentry, attr);
        if (error)
                return error;
  
        if (error)
                return error;
  
 -      if (is_quota_modification(mnt_userns, inode, attr)) {
 +      if (is_quota_modification(idmap, inode, attr)) {
                error = dquot_initialize(inode);
                if (error)
                        return error;
        }
  
 -      if (i_uid_needs_update(mnt_userns, attr, inode) ||
 -          i_gid_needs_update(mnt_userns, attr, inode)) {
 +      if (i_uid_needs_update(idmap, attr, inode) ||
 +          i_gid_needs_update(idmap, attr, inode)) {
                handle_t *handle;
  
                /* (user+group)*(old+new) structure, inode write (sb,
                 * counts xattr inode references.
                 */
                down_read(&EXT4_I(inode)->xattr_sem);
 -              error = dquot_transfer(mnt_userns, inode, attr);
 +              error = dquot_transfer(idmap, inode, attr);
                up_read(&EXT4_I(inode)->xattr_sem);
  
                if (error) {
                }
                /* Update corresponding info in inode so that everything is in
                 * one transaction */
 -              i_uid_update(mnt_userns, attr, inode);
 -              i_gid_update(mnt_userns, attr, inode);
 +              i_uid_update(idmap, attr, inode);
 +              i_gid_update(idmap, attr, inode);
                error = ext4_mark_inode_dirty(handle, inode);
                ext4_journal_stop(handle);
                if (unlikely(error)) {
@@@ -5632,7 -5629,7 +5631,7 @@@ out_mmap_sem
        if (!error) {
                if (inc_ivers)
                        inode_inc_iversion(inode);
 -              setattr_copy(mnt_userns, inode, attr);
 +              setattr_copy(idmap, inode, attr);
                mark_inode_dirty(inode);
        }
  
                ext4_orphan_del(NULL, inode);
  
        if (!error && (ia_valid & ATTR_MODE))
 -              rc = posix_acl_chmod(mnt_userns, dentry, inode->i_mode);
 +              rc = posix_acl_chmod(idmap, dentry, inode->i_mode);
  
  err_out:
        if  (error)
@@@ -5670,7 -5667,7 +5669,7 @@@ u32 ext4_dio_alignment(struct inode *in
        return 1; /* use the iomap defaults */
  }
  
 -int ext4_getattr(struct user_namespace *mnt_userns, const struct path *path,
 +int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
                 struct kstat *stat, u32 request_mask, unsigned int query_flags)
  {
        struct inode *inode = d_inode(path->dentry);
                                  STATX_ATTR_NODUMP |
                                  STATX_ATTR_VERITY);
  
 -      generic_fillattr(mnt_userns, inode, stat);
 +      generic_fillattr(idmap, inode, stat);
        return 0;
  }
  
 -int ext4_file_getattr(struct user_namespace *mnt_userns,
 +int ext4_file_getattr(struct mnt_idmap *idmap,
                      const struct path *path, struct kstat *stat,
                      u32 request_mask, unsigned int query_flags)
  {
        struct inode *inode = d_inode(path->dentry);
        u64 delalloc_blocks;
  
 -      ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
 +      ext4_getattr(idmap, path, stat, request_mask, query_flags);
  
        /*
         * If there is inline data in the inode, the inode will normally not
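
A minimal sketch, not part of the commit, of the pagevec -> folio_batch conversion pattern used in mpage_prepare_extent_to_map() above: tagged folios are looked up in batches and multi-page folios are accounted explicitly via folio_nr_pages(). The helper name is hypothetical.

	#include <linux/pagemap.h>
	#include <linux/pagevec.h>
	#include <linux/sched.h>

	static long example_count_dirty_pages(struct address_space *mapping,
					      pgoff_t index, pgoff_t end)
	{
		struct folio_batch fbatch;
		unsigned int nr_folios, i;
		long nr_pages = 0;

		folio_batch_init(&fbatch);
		while (index <= end) {
			nr_folios = filemap_get_folios_tag(mapping, &index, end,
					PAGECACHE_TAG_DIRTY, &fbatch);
			if (!nr_folios)
				break;
			for (i = 0; i < nr_folios; i++)
				nr_pages += folio_nr_pages(fbatch.folios[i]);
			folio_batch_release(&fbatch);
			cond_resched();
		}
		return nr_pages;
	}
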
diff --combined fs/ext4/super.c
index 2ae46d11aa308baef09daa71dcd76db29685312e,49a8942b1e51c7c4f653f55a511562c91b8e8c42..faae0549347175eea372524ece2b0f7b7cf19d75
@@@ -482,7 -482,7 +482,7 @@@ static void ext4_journal_commit_callbac
   *
   * However, we may have to redirty a page (see below.)
   */
- static int ext4_journalled_writepage_callback(struct page *page,
+ static int ext4_journalled_writepage_callback(struct folio *folio,
                                              struct writeback_control *wbc,
                                              void *data)
  {
        struct buffer_head *bh, *head;
        struct journal_head *jh;
  
-       bh = head = page_buffers(page);
+       bh = head = folio_buffers(folio);
        do {
                /*
                 * We have to redirty a page in these cases:
                if (buffer_dirty(bh) ||
                    (jh && (jh->b_transaction != transaction ||
                            jh->b_next_transaction))) {
-                       redirty_page_for_writepage(wbc, page);
+                       folio_redirty_for_writepage(wbc, folio);
                        goto out;
                }
        } while ((bh = bh->b_this_page) != head);
@@@ -2635,6 -2635,7 +2635,6 @@@ static int ext4_check_test_dummy_encryp
  {
        const struct ext4_fs_context *ctx = fc->fs_private;
        const struct ext4_sb_info *sbi = EXT4_SB(sb);
 -      int err;
  
        if (!fscrypt_is_dummy_policy_set(&ctx->dummy_enc_policy))
                return 0;
                         "Conflicting test_dummy_encryption options");
                return -EINVAL;
        }
 -      /*
 -       * fscrypt_add_test_dummy_key() technically changes the super_block, so
 -       * technically it should be delayed until ext4_apply_options() like the
 -       * other changes.  But since we never get here for remounts (see above),
 -       * and this is the last chance to report errors, we do it here.
 -       */
 -      err = fscrypt_add_test_dummy_key(sb, &ctx->dummy_enc_policy);
 -      if (err)
 -              ext4_msg(NULL, KERN_WARNING,
 -                       "Error adding test dummy encryption key [%d]", err);
 -      return err;
 +      return 0;
  }
  
  static void ext4_apply_test_dummy_encryption(struct ext4_fs_context *ctx,
@@@ -5325,6 -5336,11 +5325,6 @@@ static int __ext4_fill_super(struct fs_
                }
        }
  
 -      if (ext4_has_feature_verity(sb) && sb->s_blocksize != PAGE_SIZE) {
 -              ext4_msg(sb, KERN_ERR, "Unsupported blocksize for fs-verity");
 -              goto failed_mount_wq;
 -      }
 -
        /*
         * Get the # of file system overhead blocks from the
         * superblock if present.
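
A minimal sketch, not part of the commit: the converted ext4_journalled_writepage_callback() above swaps page_buffers(page) for folio_buffers(folio); the circular buffer_head walk itself is unchanged. Hypothetical helper:

	#include <linux/buffer_head.h>

	static bool example_folio_has_dirty_buffers(struct folio *folio)
	{
		struct buffer_head *bh, *head;

		head = folio_buffers(folio);
		if (!head)
			return false;
		bh = head;
		do {
			if (buffer_dirty(bh))
				return true;
		} while ((bh = bh->b_this_page) != head);

		return false;
	}
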
diff --combined fs/f2fs/data.c
index 8630df80fedb65dcd8ee3c85b6f57a0cd20543dd,b02c5b3842045c2b0316278347307c47bca2ff66..41addc605350f72b897ceb08fe4544795dc783dc
@@@ -2053,7 -2053,8 +2053,7 @@@ out
  
  static inline loff_t f2fs_readpage_limit(struct inode *inode)
  {
 -      if (IS_ENABLED(CONFIG_FS_VERITY) &&
 -          (IS_VERITY(inode) || f2fs_verity_in_progress(inode)))
 +      if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
                return inode->i_sb->s_maxbytes;
  
        return i_size_read(inode);
@@@ -2956,6 -2957,7 +2956,7 @@@ static int f2fs_write_cache_pages(struc
        int ret = 0;
        int done = 0, retry = 0;
        struct page *pages[F2FS_ONSTACK_PAGES];
+       struct folio_batch fbatch;
        struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
        struct bio *bio = NULL;
        sector_t last_block;
                .private = NULL,
        };
  #endif
+       int nr_folios, p, idx;
        int nr_pages;
        pgoff_t index;
        pgoff_t end;            /* Inclusive */
        int submitted = 0;
        int i;
  
+       folio_batch_init(&fbatch);
        if (get_dirty_pages(mapping->host) <=
                                SM_I(F2FS_M_SB(mapping))->min_hot_blocks)
                set_inode_flag(mapping->host, FI_HOT_DATA);
@@@ -3011,13 -3016,38 +3015,38 @@@ retry
                tag_pages_for_writeback(mapping, index, end);
        done_index = index;
        while (!done && !retry && (index <= end)) {
-               nr_pages = find_get_pages_range_tag(mapping, &index, end,
-                               tag, F2FS_ONSTACK_PAGES, pages);
-               if (nr_pages == 0)
+               nr_pages = 0;
+ again:
+               nr_folios = filemap_get_folios_tag(mapping, &index, end,
+                               tag, &fbatch);
+               if (nr_folios == 0) {
+                       if (nr_pages)
+                               goto write;
                        break;
+               }
  
+               for (i = 0; i < nr_folios; i++) {
+                       struct folio *folio = fbatch.folios[i];
+                       idx = 0;
+                       p = folio_nr_pages(folio);
+ add_more:
+                       pages[nr_pages] = folio_page(folio, idx);
+                       folio_get(folio);
+                       if (++nr_pages == F2FS_ONSTACK_PAGES) {
+                               index = folio->index + idx + 1;
+                               folio_batch_release(&fbatch);
+                               goto write;
+                       }
+                       if (++idx < p)
+                               goto add_more;
+               }
+               folio_batch_release(&fbatch);
+               goto again;
+ write:
                for (i = 0; i < nr_pages; i++) {
                        struct page *page = pages[i];
+                       struct folio *folio = page_folio(page);
                        bool need_readd;
  readd:
                        need_readd = false;
                                }
  
                                if (!f2fs_cluster_can_merge_page(&cc,
-                                                               page->index)) {
+                                                               folio->index)) {
                                        ret = f2fs_write_multi_pages(&cc,
                                                &submitted, wbc, io_type);
                                        if (!ret)
                                }
  
                                if (unlikely(f2fs_cp_error(sbi)))
-                                       goto lock_page;
+                                       goto lock_folio;
  
                                if (!f2fs_cluster_is_empty(&cc))
-                                       goto lock_page;
+                                       goto lock_folio;
  
                                if (f2fs_all_cluster_page_ready(&cc,
                                        pages, i, nr_pages, true))
-                                       goto lock_page;
+                                       goto lock_folio;
  
                                ret2 = f2fs_prepare_compress_overwrite(
                                                        inode, &pagep,
-                                                       page->index, &fsdata);
+                                                       folio->index, &fsdata);
                                if (ret2 < 0) {
                                        ret = ret2;
                                        done = 1;
                                        break;
                                } else if (ret2 &&
                                        (!f2fs_compress_write_end(inode,
-                                               fsdata, page->index, 1) ||
+                                               fsdata, folio->index, 1) ||
                                         !f2fs_all_cluster_page_ready(&cc,
-                                               pages, i, nr_pages, false))) {
+                                               pages, i, nr_pages,
+                                               false))) {
                                        retry = 1;
                                        break;
                                }
                                break;
                        }
  #ifdef CONFIG_F2FS_FS_COMPRESSION
- lock_page:
+ lock_folio:
  #endif
-                       done_index = page->index;
+                       done_index = folio->index;
  retry_write:
-                       lock_page(page);
+                       folio_lock(folio);
  
-                       if (unlikely(page->mapping != mapping)) {
+                       if (unlikely(folio->mapping != mapping)) {
  continue_unlock:
-                               unlock_page(page);
+                               folio_unlock(folio);
                                continue;
                        }
  
-                       if (!PageDirty(page)) {
+                       if (!folio_test_dirty(folio)) {
                                /* someone wrote it for us */
                                goto continue_unlock;
                        }
  
-                       if (PageWriteback(page)) {
+                       if (folio_test_writeback(folio)) {
                                if (wbc->sync_mode != WB_SYNC_NONE)
-                                       f2fs_wait_on_page_writeback(page,
+                                       f2fs_wait_on_page_writeback(
+                                                       &folio->page,
                                                        DATA, true, true);
                                else
                                        goto continue_unlock;
                        }
  
-                       if (!clear_page_dirty_for_io(page))
+                       if (!folio_clear_dirty_for_io(folio))
                                goto continue_unlock;
  
  #ifdef CONFIG_F2FS_FS_COMPRESSION
                        if (f2fs_compressed_file(inode)) {
-                               get_page(page);
-                               f2fs_compress_ctx_add_page(&cc, page);
+                               folio_get(folio);
+                               f2fs_compress_ctx_add_page(&cc, &folio->page);
                                continue;
                        }
  #endif
-                       ret = f2fs_write_single_data_page(page, &submitted,
-                                       &bio, &last_block, wbc, io_type,
-                                       0, true);
+                       ret = f2fs_write_single_data_page(&folio->page,
+                                       &submitted, &bio, &last_block,
+                                       wbc, io_type, 0, true);
                        if (ret == AOP_WRITEPAGE_ACTIVATE)
-                               unlock_page(page);
+                               folio_unlock(folio);
  #ifdef CONFIG_F2FS_FS_COMPRESSION
  result:
  #endif
                                        }
                                        goto next;
                                }
-                               done_index = page->index + 1;
+                               done_index = folio->index +
+                                       folio_nr_pages(folio);
                                done = 1;
                                break;
                        }
diff --combined fs/fuse/file.c
index 82710d103556b468b8cb9a2fd75578aa92776656,3648747fb64d29e88d9a82fa0b78b269dc4fd44a..ff0b3ef774d492cbc3bb891d757cc895152cc8fc
@@@ -18,7 -18,6 +18,7 @@@
  #include <linux/falloc.h>
  #include <linux/uio.h>
  #include <linux/fs.h>
 +#include <linux/filelock.h>
  
  static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
                          unsigned int open_flags, int opcode,
@@@ -1314,8 -1313,7 +1314,8 @@@ static ssize_t fuse_cache_write_iter(st
                        return err;
  
                if (fc->handle_killpriv_v2 &&
 -                  setattr_should_drop_suidgid(&init_user_ns, file_inode(file))) {
 +                  setattr_should_drop_suidgid(&nop_mnt_idmap,
 +                                              file_inode(file))) {
                        goto writethrough;
                }
  
@@@ -2186,7 -2184,7 +2186,7 @@@ static bool fuse_writepage_need_send(st
        return false;
  }
  
- static int fuse_writepages_fill(struct page *page,
+ static int fuse_writepages_fill(struct folio *folio,
                struct writeback_control *wbc, void *_data)
  {
        struct fuse_fill_wb_data *data = _data;
                        goto out_unlock;
        }
  
-       if (wpa && fuse_writepage_need_send(fc, page, ap, data)) {
+       if (wpa && fuse_writepage_need_send(fc, &folio->page, ap, data)) {
                fuse_writepages_send(data);
                data->wpa = NULL;
        }
                data->max_pages = 1;
  
                ap = &wpa->ia.ap;
-               fuse_write_args_fill(&wpa->ia, data->ff, page_offset(page), 0);
+               fuse_write_args_fill(&wpa->ia, data->ff, folio_pos(folio), 0);
                wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
                wpa->next = NULL;
                ap->args.in_pages = true;
                ap->num_pages = 0;
                wpa->inode = inode;
        }
-       set_page_writeback(page);
+       folio_start_writeback(folio);
  
-       copy_highpage(tmp_page, page);
+       copy_highpage(tmp_page, &folio->page);
        ap->pages[ap->num_pages] = tmp_page;
        ap->descs[ap->num_pages].offset = 0;
        ap->descs[ap->num_pages].length = PAGE_SIZE;
-       data->orig_pages[ap->num_pages] = page;
+       data->orig_pages[ap->num_pages] = &folio->page;
  
        inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
        inc_node_page_state(tmp_page, NR_WRITEBACK_TEMP);
                spin_lock(&fi->lock);
                ap->num_pages++;
                spin_unlock(&fi->lock);
-       } else if (fuse_writepage_add(wpa, page)) {
+       } else if (fuse_writepage_add(wpa, &folio->page)) {
                data->wpa = wpa;
        } else {
-               end_page_writeback(page);
+               folio_end_writeback(folio);
        }
  out_unlock:
-       unlock_page(page);
+       folio_unlock(folio);
  
        return err;
  }
diff --combined fs/gfs2/aops.c
index 2748a82de42a80ff3a5d225f4baf67eb1e7d76c6,0a47068f9acce76c3cb617f6d0b9c5dbea625124..a5f4be6b9213edff2b7744c1be2209ca9ba609de
  #include "aops.h"
  
  
 -void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
 -                          unsigned int from, unsigned int len)
 +void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
 +                           unsigned int from, unsigned int len)
  {
 -      struct buffer_head *head = page_buffers(page);
 +      struct buffer_head *head = folio_buffers(folio);
        unsigned int bsize = head->b_size;
        struct buffer_head *bh;
        unsigned int to = from + len;
@@@ -127,6 -127,7 +127,6 @@@ static int __gfs2_jdata_writepage(struc
  {
        struct inode *inode = page->mapping->host;
        struct gfs2_inode *ip = GFS2_I(inode);
 -      struct gfs2_sbd *sdp = GFS2_SB(inode);
  
        if (PageChecked(page)) {
                ClearPageChecked(page);
                        create_empty_buffers(page, inode->i_sb->s_blocksize,
                                             BIT(BH_Dirty)|BIT(BH_Uptodate));
                }
 -              gfs2_page_add_databufs(ip, page, 0, sdp->sd_vfs->s_blocksize);
 +              gfs2_trans_add_databufs(ip, page_folio(page), 0, PAGE_SIZE);
        }
        return gfs2_write_jdata_page(page, wbc);
  }
@@@ -194,67 -195,71 +194,71 @@@ static int gfs2_writepages(struct addre
  }
  
  /**
-  * gfs2_write_jdata_pagevec - Write back a pagevec's worth of pages
+  * gfs2_write_jdata_batch - Write back a folio batch's worth of folios
   * @mapping: The mapping
   * @wbc: The writeback control
-  * @pvec: The vector of pages
-  * @nr_pages: The number of pages to write
+  * @fbatch: The batch of folios
   * @done_index: Page index
   *
   * Returns: non-zero if loop should terminate, zero otherwise
   */
  
- static int gfs2_write_jdata_pagevec(struct address_space *mapping,
+ static int gfs2_write_jdata_batch(struct address_space *mapping,
                                    struct writeback_control *wbc,
-                                   struct pagevec *pvec,
-                                   int nr_pages,
+                                   struct folio_batch *fbatch,
                                    pgoff_t *done_index)
  {
        struct inode *inode = mapping->host;
        struct gfs2_sbd *sdp = GFS2_SB(inode);
-       unsigned nrblocks = nr_pages * (PAGE_SIZE >> inode->i_blkbits);
+       unsigned nrblocks;
        int i;
        int ret;
+       int nr_pages = 0;
+       int nr_folios = folio_batch_count(fbatch);
+       for (i = 0; i < nr_folios; i++)
+               nr_pages += folio_nr_pages(fbatch->folios[i]);
+       nrblocks = nr_pages * (PAGE_SIZE >> inode->i_blkbits);
  
        ret = gfs2_trans_begin(sdp, nrblocks, nrblocks);
        if (ret < 0)
                return ret;
  
-       for(i = 0; i < nr_pages; i++) {
-               struct page *page = pvec->pages[i];
+       for (i = 0; i < nr_folios; i++) {
+               struct folio *folio = fbatch->folios[i];
  
-               *done_index = page->index;
+               *done_index = folio->index;
  
-               lock_page(page);
+               folio_lock(folio);
  
-               if (unlikely(page->mapping != mapping)) {
+               if (unlikely(folio->mapping != mapping)) {
  continue_unlock:
-                       unlock_page(page);
+                       folio_unlock(folio);
                        continue;
                }
  
-               if (!PageDirty(page)) {
+               if (!folio_test_dirty(folio)) {
                        /* someone wrote it for us */
                        goto continue_unlock;
                }
  
-               if (PageWriteback(page)) {
+               if (folio_test_writeback(folio)) {
                        if (wbc->sync_mode != WB_SYNC_NONE)
-                               wait_on_page_writeback(page);
+                               folio_wait_writeback(folio);
                        else
                                goto continue_unlock;
                }
  
-               BUG_ON(PageWriteback(page));
-               if (!clear_page_dirty_for_io(page))
+               BUG_ON(folio_test_writeback(folio));
+               if (!folio_clear_dirty_for_io(folio))
                        goto continue_unlock;
  
                trace_wbc_writepage(wbc, inode_to_bdi(inode));
  
-               ret = __gfs2_jdata_writepage(page, wbc);
+               ret = __gfs2_jdata_writepage(&folio->page, wbc);
                if (unlikely(ret)) {
                        if (ret == AOP_WRITEPAGE_ACTIVATE) {
-                               unlock_page(page);
+                               folio_unlock(folio);
                                ret = 0;
                        } else {
  
                                 * not be suitable for data integrity
                                 * writeout).
                                 */
-                               *done_index = page->index + 1;
+                               *done_index = folio->index +
+                                       folio_nr_pages(folio);
                                ret = 1;
                                break;
                        }
@@@ -304,8 -310,8 +309,8 @@@ static int gfs2_write_cache_jdata(struc
  {
        int ret = 0;
        int done = 0;
-       struct pagevec pvec;
-       int nr_pages;
+       struct folio_batch fbatch;
+       int nr_folios;
        pgoff_t writeback_index;
        pgoff_t index;
        pgoff_t end;
        int range_whole = 0;
        xa_mark_t tag;
  
-       pagevec_init(&pvec);
+       folio_batch_init(&fbatch);
        if (wbc->range_cyclic) {
                writeback_index = mapping->writeback_index; /* prev offset */
                index = writeback_index;
@@@ -340,17 -346,18 +345,18 @@@ retry
                tag_pages_for_writeback(mapping, index, end);
        done_index = index;
        while (!done && (index <= end)) {
-               nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
-                               tag);
-               if (nr_pages == 0)
+               nr_folios = filemap_get_folios_tag(mapping, &index, end,
+                               tag, &fbatch);
+               if (nr_folios == 0)
                        break;
  
-               ret = gfs2_write_jdata_pagevec(mapping, wbc, &pvec, nr_pages, &done_index);
+               ret = gfs2_write_jdata_batch(mapping, wbc, &fbatch,
+                               &done_index);
                if (ret)
                        done = 1;
                if (ret > 0)
                        ret = 0;
-               pagevec_release(&pvec);
+               folio_batch_release(&fbatch);
                cond_resched();
        }
  
diff --combined fs/gfs2/glops.c
index ad14818a790aa1df2c798a5af140037538a04ad8,081422644ec5efa8e41b6acfee08890a93ba1ce7..4d99cc77a29b7fd2e2e5dac9a0534c5f1755c7d6
@@@ -39,7 -39,7 +39,7 @@@ static void gfs2_ail_error(struct gfs2_
               "AIL buffer %p: blocknr %llu state 0x%08lx mapping %p page "
               "state 0x%lx\n",
               bh, (unsigned long long)bh->b_blocknr, bh->b_state,
-              bh->b_page->mapping, bh->b_page->flags);
+              bh->b_folio->mapping, bh->b_folio->flags);
        fs_err(sdp, "AIL glock %u:%llu mapping %p\n",
               gl->gl_name.ln_type, gl->gl_name.ln_number,
               gfs2_glock2aspace(gl));
@@@ -193,7 -193,7 +193,7 @@@ static int rgrp_go_sync(struct gfs2_glo
        struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(gl);
        int error;
  
 -      if (!test_and_clear_bit(GLF_DIRTY, &gl->gl_flags))
 +      if (!rgd || !test_and_clear_bit(GLF_DIRTY, &gl->gl_flags))
                return 0;
        GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_EXCLUSIVE);
  
@@@ -222,12 -222,9 +222,12 @@@ static void rgrp_go_inval(struct gfs2_g
        struct address_space *mapping = &sdp->sd_aspace;
        struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(gl);
        const unsigned bsize = sdp->sd_sb.sb_bsize;
 -      loff_t start = (rgd->rd_addr * bsize) & PAGE_MASK;
 -      loff_t end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) - 1;
 +      loff_t start, end;
  
 +      if (!rgd)
 +              return;
 +      start = (rgd->rd_addr * bsize) & PAGE_MASK;
 +      end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) - 1;
        gfs2_rgrp_brelse(rgd);
        WARN_ON_ONCE(!(flags & DIO_METADATA));
        truncate_inode_pages_range(mapping, start, end);
@@@ -648,18 -645,23 +648,18 @@@ static void iopen_go_callback(struct gf
        struct gfs2_inode *ip = gl->gl_object;
        struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
  
 -      if (!remote || sb_rdonly(sdp->sd_vfs))
 +      if (!remote || sb_rdonly(sdp->sd_vfs) ||
 +          test_bit(SDF_DEACTIVATING, &sdp->sd_flags))
                return;
  
        if (gl->gl_demote_state == LM_ST_UNLOCKED &&
            gl->gl_state == LM_ST_SHARED && ip) {
                gl->gl_lockref.count++;
 -              if (!queue_delayed_work(gfs2_delete_workqueue,
 -                                      &gl->gl_delete, 0))
 +              if (!gfs2_queue_try_to_evict(gl))
                        gl->gl_lockref.count--;
        }
  }
  
 -static int iopen_go_demote_ok(const struct gfs2_glock *gl)
 -{
 -       return !gfs2_delete_work_queued(gl);
 -}
 -
  /**
   * inode_go_free - wake up anyone waiting for dlm's unlock ast to free it
   * @gl: glock being freed
@@@ -765,6 -767,7 +765,6 @@@ const struct gfs2_glock_operations gfs2
        .go_type = LM_TYPE_IOPEN,
        .go_callback = iopen_go_callback,
        .go_dump = inode_go_dump,
 -      .go_demote_ok = iopen_go_demote_ok,
        .go_flags = GLOF_LRU | GLOF_NONDISK,
        .go_subclass = 1,
  };
diff --combined fs/gfs2/log.c
index 61323deb80bc7b70d91a161898fff3ea6a31e2a1,1fcc829f02ab291186edda465c6f811226302fd3..d750d1128bed7c433e075f1e9f33deea0098ed0d
@@@ -80,15 -80,6 +80,15 @@@ void gfs2_remove_from_ail(struct gfs2_b
        brelse(bd->bd_bh);
  }
  
- static int __gfs2_writepage(struct page *page, struct writeback_control *wbc,
++static int __gfs2_writepage(struct folio *folio, struct writeback_control *wbc,
 +                     void *data)
 +{
 +      struct address_space *mapping = data;
-       int ret = mapping->a_ops->writepage(page, wbc);
++      int ret = mapping->a_ops->writepage(&folio->page, wbc);
 +      mapping_set_error(mapping, ret);
 +      return ret;
 +}
 +
  /**
   * gfs2_ail1_start_one - Start I/O on a transaction
   * @sdp: The superblock
@@@ -136,11 -127,11 +136,11 @@@ __acquires(&sdp->sd_ail_lock
                        continue;
                gl = bd->bd_gl;
                list_move(&bd->bd_ail_st_list, &tr->tr_ail1_list);
-               mapping = bh->b_page->mapping;
+               mapping = bh->b_folio->mapping;
                if (!mapping)
                        continue;
                spin_unlock(&sdp->sd_ail_lock);
 -              ret = filemap_fdatawrite_wbc(mapping, wbc);
 +              ret = write_cache_pages(mapping, wbc, __gfs2_writepage, mapping);
                if (need_resched()) {
                        blk_finish_plug(plug);
                        cond_resched();
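
A minimal sketch, not part of the commit: after this merge the write_cache_pages() callback receives a folio, so a filesystem that still has a classic ->writepage can bridge the two with a shim like __gfs2_writepage() above. The names below are hypothetical.

	#include <linux/pagemap.h>
	#include <linux/writeback.h>

	static int example_writepage_cb(struct folio *folio,
					struct writeback_control *wbc, void *data)
	{
		struct address_space *mapping = data;
		int ret = mapping->a_ops->writepage(&folio->page, wbc);

		mapping_set_error(mapping, ret);
		return ret;
	}

	static int example_writepages(struct address_space *mapping,
				      struct writeback_control *wbc)
	{
		return write_cache_pages(mapping, wbc, example_writepage_cb, mapping);
	}
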
diff --combined fs/hugetlbfs/inode.c
index 0ce1cc4c2add6720fddc9d0186928c1a089de78f,cfd09f95551b85345acebff13c4c5beb4ff37e32..9062da6da567534c50c8edccf8ae50c82b21b602
@@@ -132,7 -132,7 +132,7 @@@ static int hugetlbfs_file_mmap(struct f
         * way when do_mmap unwinds (may be important on powerpc
         * and ia64).
         */
-       vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
+       vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
        vma->vm_ops = &hugetlb_vm_ops;
  
        ret = seal_check_future_write(info->seals, vma);
@@@ -388,9 -388,7 +388,7 @@@ static bool hugetlb_vma_maps_page(struc
  {
        pte_t *ptep, pte;
  
-       ptep = huge_pte_offset(vma->vm_mm, addr,
-                       huge_page_size(hstate_vma(vma)));
+       ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma)));
        if (!ptep)
                return false;
  
   */
  static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start)
  {
+       unsigned long offset = 0;
        if (vma->vm_pgoff < start)
-               return (start - vma->vm_pgoff) << PAGE_SHIFT;
-       else
-               return 0;
+               offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
+       return vma->vm_start + offset;
  }
  
  static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end)
@@@ -457,7 -457,7 +457,7 @@@ retry
                v_start = vma_offset_start(vma, start);
                v_end = vma_offset_end(vma, end);
  
-               if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+               if (!hugetlb_vma_maps_page(vma, v_start, page))
                        continue;
  
                if (!hugetlb_vma_trylock_write(vma)) {
                        break;
                }
  
-               unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-                               NULL, ZAP_FLAG_DROP_MARKER);
+               unmap_hugepage_range(vma, v_start, v_end, NULL,
+                                    ZAP_FLAG_DROP_MARKER);
                hugetlb_vma_unlock_write(vma);
        }
  
                 */
                v_start = vma_offset_start(vma, start);
                v_end = vma_offset_end(vma, end);
-               if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
-                       unmap_hugepage_range(vma, vma->vm_start + v_start,
-                                               v_end, NULL,
-                                               ZAP_FLAG_DROP_MARKER);
+               if (hugetlb_vma_maps_page(vma, v_start, page))
+                       unmap_hugepage_range(vma, v_start, v_end, NULL,
+                                            ZAP_FLAG_DROP_MARKER);
  
                kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
                hugetlb_vma_unlock_write(vma);
@@@ -540,8 -539,7 +539,7 @@@ hugetlb_vmdelete_list(struct rb_root_ca
                v_start = vma_offset_start(vma, start);
                v_end = vma_offset_end(vma, end);
  
-               unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-                                    NULL, zap_flags);
+               unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
  
                /*
                 * Note that vma lock only exists for shared/non-private
@@@ -813,7 -811,7 +811,7 @@@ static long hugetlbfs_fallocate(struct 
         * as input to create an allocation policy.
         */
        vma_init(&pseudo_vma, mm);
-       pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
+       vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
        pseudo_vma.vm_file = file;
  
        for (index = start; index < end; index++) {
                 * This is supposed to be the vaddr where the page is being
                 * faulted in, but we have no vaddr here.
                 */
-               struct page *page;
+               struct folio *folio;
                unsigned long addr;
+               bool present;
  
                cond_resched();
  
                mutex_lock(&hugetlb_fault_mutex_table[hash]);
  
                /* See if already present in mapping to avoid alloc/free */
-               page = find_get_page(mapping, index);
-               if (page) {
-                       put_page(page);
+               rcu_read_lock();
+               present = page_cache_next_miss(mapping, index, 1) != index;
+               rcu_read_unlock();
+               if (present) {
                        mutex_unlock(&hugetlb_fault_mutex_table[hash]);
                        hugetlb_drop_vma_policy(&pseudo_vma);
                        continue;
                }
  
                /*
-                * Allocate page without setting the avoid_reserve argument.
+                * Allocate folio without setting the avoid_reserve argument.
                 * There certainly are no reserves associated with the
                 * pseudo_vma.  However, there could be shared mappings with
                 * reserves for the file at the inode level.  If we fallocate
-                * pages in these areas, we need to consume the reserves
+                * folios in these areas, we need to consume the reserves
                 * to keep reservation accounting consistent.
                 */
-               page = alloc_huge_page(&pseudo_vma, addr, 0);
+               folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
                hugetlb_drop_vma_policy(&pseudo_vma);
-               if (IS_ERR(page)) {
+               if (IS_ERR(folio)) {
                        mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-                       error = PTR_ERR(page);
+                       error = PTR_ERR(folio);
                        goto out;
                }
-               clear_huge_page(page, addr, pages_per_huge_page(h));
-               __SetPageUptodate(page);
-               error = hugetlb_add_to_page_cache(page, mapping, index);
+               clear_huge_page(&folio->page, addr, pages_per_huge_page(h));
+               __folio_mark_uptodate(folio);
+               error = hugetlb_add_to_page_cache(folio, mapping, index);
                if (unlikely(error)) {
-                       restore_reserve_on_error(h, &pseudo_vma, addr, page);
-                       put_page(page);
+                       restore_reserve_on_error(h, &pseudo_vma, addr, folio);
+                       folio_put(folio);
                        mutex_unlock(&hugetlb_fault_mutex_table[hash]);
                        goto out;
                }
  
                mutex_unlock(&hugetlb_fault_mutex_table[hash]);
  
-               SetHPageMigratable(page);
+               folio_set_hugetlb_migratable(folio);
                /*
-                * unlock_page because locked by hugetlb_add_to_page_cache()
-                * put_page() due to reference from alloc_huge_page()
+                * folio_unlock because locked by hugetlb_add_to_page_cache()
+                * folio_put() due to reference from alloc_hugetlb_folio()
                 */
-               unlock_page(page);
-               put_page(page);
+               folio_unlock(folio);
+               folio_put(folio);
        }
  
        if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
@@@ -898,7 -898,7 +898,7 @@@ out
        return error;
  }
  
 -static int hugetlbfs_setattr(struct user_namespace *mnt_userns,
 +static int hugetlbfs_setattr(struct mnt_idmap *idmap,
                             struct dentry *dentry, struct iattr *attr)
  {
        struct inode *inode = d_inode(dentry);
        unsigned int ia_valid = attr->ia_valid;
        struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
  
 -      error = setattr_prepare(&init_user_ns, dentry, attr);
 +      error = setattr_prepare(&nop_mnt_idmap, dentry, attr);
        if (error)
                return error;
  
                hugetlb_vmtruncate(inode, newsize);
        }
  
 -      setattr_copy(&init_user_ns, inode, attr);
 +      setattr_copy(&nop_mnt_idmap, inode, attr);
        mark_inode_dirty(inode);
        return 0;
  }
@@@ -980,7 -980,7 +980,7 @@@ static struct inode *hugetlbfs_get_inod
                struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
  
                inode->i_ino = get_next_ino();
 -              inode_init_owner(&init_user_ns, inode, dir, mode);
 +              inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
                lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
                                &hugetlbfs_i_mmap_rwsem_key);
                inode->i_mapping->a_ops = &hugetlbfs_aops;
  /*
   * File creation. Allocate an inode, and we're done..
   */
 -static int hugetlbfs_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 +static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
                           struct dentry *dentry, umode_t mode, dev_t dev)
  {
        struct inode *inode;
        return 0;
  }
  
 -static int hugetlbfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 +static int hugetlbfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
                           struct dentry *dentry, umode_t mode)
  {
 -      int retval = hugetlbfs_mknod(&init_user_ns, dir, dentry,
 +      int retval = hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry,
                                     mode | S_IFDIR, 0);
        if (!retval)
                inc_nlink(dir);
        return retval;
  }
  
 -static int hugetlbfs_create(struct user_namespace *mnt_userns,
 +static int hugetlbfs_create(struct mnt_idmap *idmap,
                            struct inode *dir, struct dentry *dentry,
                            umode_t mode, bool excl)
  {
 -      return hugetlbfs_mknod(&init_user_ns, dir, dentry, mode | S_IFREG, 0);
 +      return hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0);
  }
  
 -static int hugetlbfs_tmpfile(struct user_namespace *mnt_userns,
 +static int hugetlbfs_tmpfile(struct mnt_idmap *idmap,
                             struct inode *dir, struct file *file,
                             umode_t mode)
  {
        return finish_open_simple(file, 0);
  }
  
 -static int hugetlbfs_symlink(struct user_namespace *mnt_userns,
 +static int hugetlbfs_symlink(struct mnt_idmap *idmap,
                             struct inode *dir, struct dentry *dentry,
                             const char *symname)
  {
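
A minimal sketch, not part of the commit, of the mnt_userns -> mnt_idmap conversion shown in the hugetlbfs and ext4 hunks above: inode operations now take a struct mnt_idmap, and callers that used to pass &init_user_ns pass &nop_mnt_idmap. Hypothetical ->setattr for a simple filesystem:

	#include <linux/fs.h>
	#include <linux/mnt_idmap.h>

	static int example_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
				   struct iattr *attr)
	{
		struct inode *inode = d_inode(dentry);
		int error;

		error = setattr_prepare(idmap, dentry, attr);
		if (error)
			return error;

		setattr_copy(idmap, inode, attr);
		mark_inode_dirty(inode);
		return 0;
	}
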
diff --combined fs/iomap/buffered-io.c
index d3c300563eb8ff5179da0845bd37981f62107647,292d273a2c80e0b2eb6cc494f90cd4cef4678b83..6f4c97a6d7e9dcdaa5aa9a06159d33d91c0aa9f6
@@@ -457,33 -457,6 +457,33 @@@ bool iomap_is_partially_uptodate(struc
  }
  EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
  
 +/**
 + * iomap_get_folio - get a folio reference for writing
 + * @iter: iteration structure
 + * @pos: start offset of write
 + *
 + * Returns a locked reference to the folio at @pos, or an error pointer if the
 + * folio could not be obtained.
 + */
 +struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
 +{
 +      unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
 +      struct folio *folio;
 +
 +      if (iter->flags & IOMAP_NOWAIT)
 +              fgp |= FGP_NOWAIT;
 +
 +      folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
 +                      fgp, mapping_gfp_mask(iter->inode->i_mapping));
 +      if (folio)
 +              return folio;
 +
 +      if (iter->flags & IOMAP_NOWAIT)
 +              return ERR_PTR(-EAGAIN);
 +      return ERR_PTR(-ENOMEM);
 +}
 +EXPORT_SYMBOL_GPL(iomap_get_folio);
 +
  bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
  {
        trace_iomap_release_folio(folio->mapping->host, folio_pos(folio),
@@@ -602,30 -575,6 +602,30 @@@ static int __iomap_write_begin(const st
        return 0;
  }
  
 +static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos,
 +              size_t len)
 +{
 +      const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
 +
 +      if (folio_ops && folio_ops->get_folio)
 +              return folio_ops->get_folio(iter, pos, len);
 +      else
 +              return iomap_get_folio(iter, pos);
 +}
 +
 +static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret,
 +              struct folio *folio)
 +{
 +      const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
 +
 +      if (folio_ops && folio_ops->put_folio) {
 +              folio_ops->put_folio(iter->inode, pos, ret, folio);
 +      } else {
 +              folio_unlock(folio);
 +              folio_put(folio);
 +      }
 +}
 +
  static int iomap_write_begin_inline(const struct iomap_iter *iter,
                struct folio *folio)
  {
  static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
                size_t len, struct folio **foliop)
  {
 -      const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
 +      const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
        const struct iomap *srcmap = iomap_iter_srcmap(iter);
        struct folio *folio;
 -      unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
        int status = 0;
  
 -      if (iter->flags & IOMAP_NOWAIT)
 -              fgp |= FGP_NOWAIT;
 -
        BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
        if (srcmap != &iter->iomap)
                BUG_ON(pos + len > srcmap->offset + srcmap->length);
        if (!mapping_large_folio_support(iter->inode->i_mapping))
                len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
  
 -      if (page_ops && page_ops->page_prepare) {
 -              status = page_ops->page_prepare(iter->inode, pos, len);
 -              if (status)
 -                      return status;
 -      }
 -
 -      folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
 -                      fgp, mapping_gfp_mask(iter->inode->i_mapping));
 -      if (!folio) {
 -              status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
 -              goto out_no_page;
 -      }
 +      folio = __iomap_get_folio(iter, pos, len);
 +      if (IS_ERR(folio))
 +              return PTR_ERR(folio);
  
        /*
         * Now we have a locked folio, before we do anything with it we need to
         * could do the wrong thing here (zero a page range incorrectly or fail
         * to zero) and corrupt data.
         */
 -      if (page_ops && page_ops->iomap_valid) {
 -              bool iomap_valid = page_ops->iomap_valid(iter->inode,
 -                                                      &iter->iomap);
 +      if (folio_ops && folio_ops->iomap_valid) {
 +              bool iomap_valid = folio_ops->iomap_valid(iter->inode,
 +                                                       &iter->iomap);
                if (!iomap_valid) {
                        iter->iomap.flags |= IOMAP_F_STALE;
                        status = 0;
        return 0;
  
  out_unlock:
 -      folio_unlock(folio);
 -      folio_put(folio);
 +      __iomap_put_folio(iter, pos, 0, folio);
        iomap_write_failed(iter->inode, pos, len);
  
 -out_no_page:
 -      if (page_ops && page_ops->page_done)
 -              page_ops->page_done(iter->inode, pos, 0, NULL);
        return status;
  }
  
@@@ -746,6 -712,7 +746,6 @@@ static size_t iomap_write_end_inline(co
  static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
                size_t copied, struct folio *folio)
  {
 -      const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
        const struct iomap *srcmap = iomap_iter_srcmap(iter);
        loff_t old_size = iter->inode->i_size;
        size_t ret;
                i_size_write(iter->inode, pos + ret);
                iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
        }
 -      folio_unlock(folio);
 +      __iomap_put_folio(iter, pos, ret, folio);
  
        if (old_size < pos)
                pagecache_isize_extended(iter->inode, old_size, pos);
 -      if (page_ops && page_ops->page_done)
 -              page_ops->page_done(iter->inode, pos, ret, &folio->page);
 -      folio_put(folio);
 -
        if (ret < len)
                iomap_write_failed(iter->inode, pos + ret, len - ret);
        return ret;
@@@ -1714,10 -1685,9 +1714,9 @@@ done
   * For unwritten space on the page, we need to start the conversion to
   * regular allocated space.
   */
- static int
- iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
+ static int iomap_do_writepage(struct folio *folio,
              struct writeback_control *wbc, void *data)
  {
-       struct folio *folio = page_folio(page);
        struct iomap_writepage_ctx *wpc = data;
        struct inode *inode = folio->mapping->host;
        u64 end_pos, isize;
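
A minimal sketch, not part of the commit: a hypothetical filesystem wiring up the new iomap_folio_ops get_folio/put_folio hooks introduced above, so it can bracket folio acquisition and release (for example to take and drop a filesystem-private lock) in place of the removed page_prepare/page_done pair. Signatures follow the hunks above and the gfs2 usage this merge adds; names are illustrative.

	#include <linux/iomap.h>
	#include <linux/pagemap.h>

	static struct folio *example_iomap_get_folio(struct iomap_iter *iter,
						     loff_t pos, unsigned len)
	{
		/* take filesystem-private state here if needed */
		return iomap_get_folio(iter, pos);
	}

	static void example_iomap_put_folio(struct inode *inode, loff_t pos,
					    unsigned copied, struct folio *folio)
	{
		folio_unlock(folio);
		folio_put(folio);
		/* release filesystem-private state here */
	}

	static const struct iomap_folio_ops example_folio_ops = {
		.get_folio	= example_iomap_get_folio,
		.put_folio	= example_iomap_put_folio,
	};
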
diff --combined fs/mpage.c
index ce53179428db4dd4ff4ac6366a9307a3b33021c0,b9b7f6dc9c376a7b5cd56aa3401a51ca3b914af9..22b9de5ddd684d5606b421d497be85fa52144089
@@@ -198,7 -198,7 +198,7 @@@ static struct bio *do_mpage_readpage(st
        /*
         * Then do more get_blocks calls until we are done with this folio.
         */
-       map_bh->b_page = &folio->page;
+       map_bh->b_folio = folio;
        while (page_block < blocks_per_page) {
                map_bh->b_state = 0;
                map_bh->b_size = 0;
  
  alloc_new:
        if (args->bio == NULL) {
-               if (first_hole == blocks_per_page) {
-                       if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9),
-                                                               &folio->page))
-                               goto out;
-               }
                args->bio = bio_alloc(bdev, bio_max_segs(args->nr_pages), opf,
                                      gfp);
                if (args->bio == NULL)
@@@ -445,15 -440,14 +440,14 @@@ void clean_page_buffers(struct page *pa
        clean_buffers(page, ~0U);
  }
  
- static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
+ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
                      void *data)
  {
        struct mpage_data *mpd = data;
        struct bio *bio = mpd->bio;
-       struct address_space *mapping = page->mapping;
-       struct inode *inode = page->mapping->host;
+       struct address_space *mapping = folio->mapping;
+       struct inode *inode = mapping->host;
        const unsigned blkbits = inode->i_blkbits;
-       unsigned long end_index;
        const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
        sector_t last_block;
        sector_t block_in_file;
        int boundary = 0;
        sector_t boundary_block = 0;
        struct block_device *boundary_bdev = NULL;
-       int length;
+       size_t length;
        struct buffer_head map_bh;
        loff_t i_size = i_size_read(inode);
        int ret = 0;
+       struct buffer_head *head = folio_buffers(folio);
  
-       if (page_has_buffers(page)) {
-               struct buffer_head *head = page_buffers(page);
+       if (head) {
                struct buffer_head *bh = head;
  
                /* If they're all mapped and dirty, do it */
        /*
         * The page has no buffers: map it to disk
         */
-       BUG_ON(!PageUptodate(page));
-       block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
+       BUG_ON(!folio_test_uptodate(folio));
+       block_in_file = (sector_t)folio->index << (PAGE_SHIFT - blkbits);
+       /*
+        * Whole page beyond EOF? Skip allocating blocks to avoid leaking
+        * space.
+        */
+       if (block_in_file >= (i_size + (1 << blkbits) - 1) >> blkbits)
+               goto page_is_mapped;
        last_block = (i_size - 1) >> blkbits;
-       map_bh.b_page = page;
+       map_bh.b_folio = folio;
        for (page_block = 0; page_block < blocks_per_page; ) {
  
                map_bh.b_state = 0;
                map_bh.b_size = 1 << blkbits;
                if (mpd->get_block(inode, block_in_file, &map_bh, 1))
                        goto confused;
 +              if (!buffer_mapped(&map_bh))
 +                      goto confused;
                if (buffer_new(&map_bh))
                        clean_bdev_bh_alias(&map_bh);
                if (buffer_boundary(&map_bh)) {
        first_unmapped = page_block;
  
  page_is_mapped:
-       end_index = i_size >> PAGE_SHIFT;
-       if (page->index >= end_index) {
+       /* Don't bother writing beyond EOF, truncate will discard the folio */
+       if (folio_pos(folio) >= i_size)
+               goto confused;
+       length = folio_size(folio);
+       if (folio_pos(folio) + length > i_size) {
                /*
                 * The page straddles i_size.  It must be zeroed out on each
                 * and every writepage invocation because it may be mmapped.
                 * "A file is mapped in multiples of the page size.  For a file
                 * that is not a multiple of the page size, the remaining memory
                 * is zeroed when mapped, and writes to that region are not
                 * written out to the file."
                 */
-               unsigned offset = i_size & (PAGE_SIZE - 1);
-               if (page->index > end_index || !offset)
-                       goto confused;
-               zero_user_segment(page, offset, PAGE_SIZE);
+               length = i_size - folio_pos(folio);
+               folio_zero_segment(folio, length, folio_size(folio));
        }
  
        /*
  
  alloc_new:
        if (bio == NULL) {
-               if (first_unmapped == blocks_per_page) {
-                       if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
-                                                               page, wbc))
-                               goto out;
-               }
                bio = bio_alloc(bdev, BIO_MAX_VECS,
                                REQ_OP_WRITE | wbc_to_write_flags(wbc),
                                GFP_NOFS);
         * the confused fail path above (OOM) will be very confused when
         * it finds all bh marked clean (i.e. it will not write anything)
         */
-       wbc_account_cgroup_owner(wbc, page, PAGE_SIZE);
+       wbc_account_cgroup_owner(wbc, &folio->page, folio_size(folio));
        length = first_unmapped << blkbits;
-       if (bio_add_page(bio, page, length, 0) < length) {
+       if (!bio_add_folio(bio, folio, length, 0)) {
                bio = mpage_bio_submit(bio);
                goto alloc_new;
        }
  
-       clean_buffers(page, first_unmapped);
+       clean_buffers(&folio->page, first_unmapped);
  
-       BUG_ON(PageWriteback(page));
-       set_page_writeback(page);
-       unlock_page(page);
+       BUG_ON(folio_test_writeback(folio));
+       folio_start_writeback(folio);
+       folio_unlock(folio);
        if (boundary || (first_unmapped != blocks_per_page)) {
                bio = mpage_bio_submit(bio);
                if (boundary_block) {
@@@ -628,7 -621,7 +623,7 @@@ confused
        /*
         * The caller has a ref on the inode, so *mapping is stable
         */
-       ret = block_write_full_page(page, mpd->get_block, wbc);
+       ret = block_write_full_page(&folio->page, mpd->get_block, wbc);
        mapping_set_error(mapping, ret);
  out:
        mpd->bio = bio;
   *
   * This is a library function, which implements the writepages()
   * address_space_operation.
-  *
-  * If a page is already under I/O, generic_writepages() skips it, even
-  * if it's dirty.  This is desirable behaviour for memory-cleaning writeback,
-  * but it is INCORRECT for data-integrity system calls such as fsync().  fsync()
-  * and msync() need to guarantee that all the data which was dirty at the time
-  * the call was made get new I/O started against them.  If wbc->sync_mode is
-  * WB_SYNC_ALL then we were called for data integrity and we must wait for
-  * existing IO to complete.
   */
  int
  mpage_writepages(struct address_space *mapping,
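
The paragraph removed from the mpage_writepages() comment above described generic_writepages(), which is gone in this release; filesystems call write_cache_pages() (or mpage_writepages()) directly, and the per-folio callback now receives a struct folio, as the converted __mpage_writepage() and iomap_do_writepage() show. A bare-bones sketch of that pattern; example_writepage_cb and example_writepages are placeholder names, not code from this diff:

        static int example_writepage_cb(struct folio *folio,
                        struct writeback_control *wbc, void *data)
        {
                /* A real callback would map blocks and submit I/O for the folio. */
                folio_start_writeback(folio);
                folio_unlock(folio);
                folio_end_writeback(folio);     /* stub: complete immediately, no I/O */
                return 0;
        }

        static int example_writepages(struct address_space *mapping,
                        struct writeback_control *wbc)
        {
                return write_cache_pages(mapping, wbc, example_writepage_cb, NULL);
        }
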
diff --combined fs/nfs/write.c
index b508c985eb14abdc7d41e793f1e37b1f8beafd8b,9d6432cb3f4465021855221aac55d94ee3bb0ff4..f4cca8f00c0c20f6906e4e2e08ea19fc1a692e00
@@@ -25,7 -25,6 +25,7 @@@
  #include <linux/freezer.h>
  #include <linux/wait.h>
  #include <linux/iversion.h>
 +#include <linux/filelock.h>
  
  #include <linux/uaccess.h>
  #include <linux/sched/mm.h>
@@@ -64,7 -63,7 +64,7 @@@ static void nfs_init_cinfo_from_inode(s
                                      struct inode *inode);
  static struct nfs_page *
  nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
 -                                              struct page *page);
 +                                              struct folio *folio);
  
  static struct kmem_cache *nfs_wdata_cachep;
  static mempool_t *nfs_wdata_mempool;
@@@ -171,28 -170,31 +171,28 @@@ nfs_cancel_remove_inode(struct nfs_pag
        return 0;
  }
  
 -static struct nfs_page *
 -nfs_page_private_request(struct page *page)
 +static struct nfs_page *nfs_folio_private_request(struct folio *folio)
  {
 -      if (!PagePrivate(page))
 -              return NULL;
 -      return (struct nfs_page *)page_private(page);
 +      return folio_get_private(folio);
  }
  
 -/*
 - * nfs_page_find_head_request_locked - find head request associated with @page
 +/**
 + * nfs_folio_find_private_request - find head request associated with a folio
 + * @folio: pointer to folio
   *
   * must be called while holding the inode lock.
   *
   * returns matching head request with reference held, or NULL if not found.
   */
 -static struct nfs_page *
 -nfs_page_find_private_request(struct page *page)
 +static struct nfs_page *nfs_folio_find_private_request(struct folio *folio)
  {
 -      struct address_space *mapping = page_file_mapping(page);
 +      struct address_space *mapping = folio_file_mapping(folio);
        struct nfs_page *req;
  
 -      if (!PagePrivate(page))
 +      if (!folio_test_private(folio))
                return NULL;
        spin_lock(&mapping->private_lock);
 -      req = nfs_page_private_request(page);
 +      req = nfs_folio_private_request(folio);
        if (req) {
                WARN_ON_ONCE(req->wb_head != req);
                kref_get(&req->wb_kref);
        return req;
  }
  
 -static struct nfs_page *
 -nfs_page_find_swap_request(struct page *page)
 +static struct nfs_page *nfs_folio_find_swap_request(struct folio *folio)
  {
 -      struct inode *inode = page_file_mapping(page)->host;
 +      struct inode *inode = folio_file_mapping(folio)->host;
        struct nfs_inode *nfsi = NFS_I(inode);
        struct nfs_page *req = NULL;
 -      if (!PageSwapCache(page))
 +      if (!folio_test_swapcache(folio))
                return NULL;
        mutex_lock(&nfsi->commit_mutex);
 -      if (PageSwapCache(page)) {
 +      if (folio_test_swapcache(folio)) {
                req = nfs_page_search_commits_for_head_request_locked(nfsi,
 -                      page);
 +                                                                    folio);
                if (req) {
                        WARN_ON_ONCE(req->wb_head != req);
                        kref_get(&req->wb_kref);
        return req;
  }
  
 -/*
 - * nfs_page_find_head_request - find head request associated with @page
 +/**
 + * nfs_folio_find_head_request - find head request associated with a folio
 + * @folio: pointer to folio
   *
   * returns matching head request with reference held, or NULL if not found.
   */
 -static struct nfs_page *nfs_page_find_head_request(struct page *page)
 +static struct nfs_page *nfs_folio_find_head_request(struct folio *folio)
  {
        struct nfs_page *req;
  
 -      req = nfs_page_find_private_request(page);
 +      req = nfs_folio_find_private_request(folio);
        if (!req)
 -              req = nfs_page_find_swap_request(page);
 +              req = nfs_folio_find_swap_request(folio);
        return req;
  }
  
 -static struct nfs_page *nfs_find_and_lock_page_request(struct page *page)
 +static struct nfs_page *nfs_folio_find_and_lock_request(struct folio *folio)
  {
 -      struct inode *inode = page_file_mapping(page)->host;
 +      struct inode *inode = folio_file_mapping(folio)->host;
        struct nfs_page *req, *head;
        int ret;
  
        for (;;) {
 -              req = nfs_page_find_head_request(page);
 +              req = nfs_folio_find_head_request(folio);
                if (!req)
                        return req;
                head = nfs_page_group_lock_head(req);
                        return ERR_PTR(ret);
                }
                /* Ensure that nobody removed the request before we locked it */
 -              if (head == nfs_page_private_request(page))
 +              if (head == nfs_folio_private_request(folio))
                        break;
 -              if (PageSwapCache(page))
 +              if (folio_test_swapcache(folio))
                        break;
                nfs_unlock_and_release_request(head);
        }
  }
  
  /* Adjust the file length if we're writing beyond the end */
 -static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
 +static void nfs_grow_file(struct folio *folio, unsigned int offset,
 +                        unsigned int count)
  {
 -      struct inode *inode = page_file_mapping(page)->host;
 +      struct inode *inode = folio_file_mapping(folio)->host;
        loff_t end, i_size;
        pgoff_t end_index;
  
        spin_lock(&inode->i_lock);
        i_size = i_size_read(inode);
 -      end_index = (i_size - 1) >> PAGE_SHIFT;
 -      if (i_size > 0 && page_index(page) < end_index)
 +      end_index = ((i_size - 1) >> folio_shift(folio)) << folio_order(folio);
 +      if (i_size > 0 && folio_index(folio) < end_index)
                goto out;
 -      end = page_file_offset(page) + ((loff_t)offset+count);
 +      end = folio_file_pos(folio) + (loff_t)offset + (loff_t)count;
        if (i_size >= end)
                goto out;
        trace_nfs_size_grow(inode, end);
@@@ -306,11 -307,11 +306,11 @@@ static void nfs_set_pageerror(struct ad
        spin_unlock(&inode->i_lock);
  }
  
 -static void nfs_mapping_set_error(struct page *page, int error)
 +static void nfs_mapping_set_error(struct folio *folio, int error)
  {
 -      struct address_space *mapping = page_file_mapping(page);
 +      struct address_space *mapping = folio_file_mapping(folio);
  
 -      SetPageError(page);
 +      folio_set_error(folio);
        filemap_set_wb_err(mapping, error);
        if (mapping->host)
                errseq_set(&mapping->host->i_sb->s_wb_err,
@@@ -357,9 -358,9 +357,9 @@@ nfs_page_group_search_locked(struct nfs
   */
  static bool nfs_page_group_covers_page(struct nfs_page *req)
  {
 +      unsigned int len = nfs_folio_length(nfs_page_to_folio(req));
        struct nfs_page *tmp;
        unsigned int pos = 0;
 -      unsigned int len = nfs_page_length(req->wb_page);
  
        nfs_page_group_lock(req);
  
   */
  static void nfs_mark_uptodate(struct nfs_page *req)
  {
 -      if (PageUptodate(req->wb_page))
 +      struct folio *folio = nfs_page_to_folio(req);
 +
 +      if (folio_test_uptodate(folio))
                return;
        if (!nfs_page_group_covers_page(req))
                return;
 -      SetPageUptodate(req->wb_page);
 +      folio_mark_uptodate(folio);
  }
  
  static int wb_priority(struct writeback_control *wbc)
@@@ -407,34 -406,35 +407,34 @@@ int nfs_congestion_kb
  #define NFS_CONGESTION_OFF_THRESH     \
        (NFS_CONGESTION_ON_THRESH - (NFS_CONGESTION_ON_THRESH >> 2))
  
 -static void nfs_set_page_writeback(struct page *page)
 +static void nfs_folio_set_writeback(struct folio *folio)
  {
 -      struct inode *inode = page_file_mapping(page)->host;
 -      struct nfs_server *nfss = NFS_SERVER(inode);
 -      int ret = test_set_page_writeback(page);
 -
 -      WARN_ON_ONCE(ret != 0);
 +      struct nfs_server *nfss = NFS_SERVER(folio_file_mapping(folio)->host);
  
 -      if (atomic_long_inc_return(&nfss->writeback) >
 -                      NFS_CONGESTION_ON_THRESH)
 +      folio_start_writeback(folio);
 +      if (atomic_long_inc_return(&nfss->writeback) > NFS_CONGESTION_ON_THRESH)
                nfss->write_congested = 1;
  }
  
 -static void nfs_end_page_writeback(struct nfs_page *req)
 +static void nfs_folio_end_writeback(struct folio *folio)
  {
 -      struct inode *inode = page_file_mapping(req->wb_page)->host;
 -      struct nfs_server *nfss = NFS_SERVER(inode);
 -      bool is_done;
 +      struct nfs_server *nfss = NFS_SERVER(folio_file_mapping(folio)->host);
  
 -      is_done = nfs_page_group_sync_on_bit(req, PG_WB_END);
 -      nfs_unlock_request(req);
 -      if (!is_done)
 -              return;
 -
 -      end_page_writeback(req->wb_page);
 -      if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
 +      folio_end_writeback(folio);
 +      if (atomic_long_dec_return(&nfss->writeback) <
 +          NFS_CONGESTION_OFF_THRESH)
                nfss->write_congested = 0;
  }
  
 +static void nfs_page_end_writeback(struct nfs_page *req)
 +{
 +      if (nfs_page_group_sync_on_bit(req, PG_WB_END)) {
 +              nfs_unlock_request(req);
 +              nfs_folio_end_writeback(nfs_page_to_folio(req));
 +      } else
 +              nfs_unlock_request(req);
 +}
 +
  /*
   * nfs_destroy_unlinked_subrequests - destroy recently unlinked subrequests
   *
@@@ -549,7 -549,7 +549,7 @@@ nfs_join_page_group(struct nfs_page *he
  
  /*
   * nfs_lock_and_join_requests - join all subreqs to the head req
 - * @page: the page used to lookup the "page group" of nfs_page structures
 + * @folio: the folio used to lookup the "page group" of nfs_page structures
   *
   * This function joins all sub requests to the head request by first
   * locking all requests in the group, cancelling any pending operations
   *
   * Returns a locked, referenced pointer to the head request - which after
   * this call is guaranteed to be the only request associated with the page.
 - * Returns NULL if no requests are found for @page, or a ERR_PTR if an
 + * Returns NULL if no requests are found for @folio, or a ERR_PTR if an
   * error was encountered.
   */
 -static struct nfs_page *
 -nfs_lock_and_join_requests(struct page *page)
 +static struct nfs_page *nfs_lock_and_join_requests(struct folio *folio)
  {
 -      struct inode *inode = page_file_mapping(page)->host;
 +      struct inode *inode = folio_file_mapping(folio)->host;
        struct nfs_page *head;
        int ret;
  
         * reference to the whole page group - the group will not be destroyed
         * until the head reference is released.
         */
 -      head = nfs_find_and_lock_page_request(page);
 +      head = nfs_folio_find_and_lock_request(folio);
        if (IS_ERR_OR_NULL(head))
                return head;
  
  
  static void nfs_write_error(struct nfs_page *req, int error)
  {
 -      trace_nfs_write_error(page_file_mapping(req->wb_page)->host, req,
 -                            error);
 -      nfs_mapping_set_error(req->wb_page, error);
 +      trace_nfs_write_error(nfs_page_to_inode(req), req, error);
 +      nfs_mapping_set_error(nfs_page_to_folio(req), error);
        nfs_inode_remove_request(req);
 -      nfs_end_page_writeback(req);
 +      nfs_page_end_writeback(req);
        nfs_release_request(req);
  }
  
   * Find an associated nfs write request, and prepare to flush it out
   * May return an error if the user signalled nfs_wait_on_request().
   */
 -static int nfs_page_async_flush(struct page *page,
 +static int nfs_page_async_flush(struct folio *folio,
                                struct writeback_control *wbc,
                                struct nfs_pageio_descriptor *pgio)
  {
        struct nfs_page *req;
        int ret = 0;
  
 -      req = nfs_lock_and_join_requests(page);
 +      req = nfs_lock_and_join_requests(folio);
        if (!req)
                goto out;
        ret = PTR_ERR(req);
        if (IS_ERR(req))
                goto out;
  
 -      nfs_set_page_writeback(page);
 +      nfs_folio_set_writeback(folio);
        WARN_ON_ONCE(test_bit(PG_CLEAN, &req->wb_flags));
  
        /* If there is a fatal error that covers this write, just exit */
                        goto out_launder;
                if (wbc->sync_mode == WB_SYNC_NONE)
                        ret = AOP_WRITEPAGE_ACTIVATE;
 -              redirty_page_for_writepage(wbc, page);
 +              folio_redirty_for_writepage(wbc, folio);
                nfs_redirty_request(req);
                pgio->pg_error = 0;
        } else
 -              nfs_add_stats(page_file_mapping(page)->host,
 -                              NFSIOS_WRITEPAGES, 1);
 +              nfs_add_stats(folio_file_mapping(folio)->host,
 +                            NFSIOS_WRITEPAGES, 1);
  out:
        return ret;
  out_launder:
        return 0;
  }
  
 -static int nfs_do_writepage(struct page *page, struct writeback_control *wbc,
 +static int nfs_do_writepage(struct folio *folio, struct writeback_control *wbc,
                            struct nfs_pageio_descriptor *pgio)
  {
 -      nfs_pageio_cond_complete(pgio, page_index(page));
 -      return nfs_page_async_flush(page, wbc, pgio);
 +      nfs_pageio_cond_complete(pgio, folio_index(folio));
 +      return nfs_page_async_flush(folio, wbc, pgio);
  }
  
  /*
   * Write an mmapped page to the server.
   */
 -static int nfs_writepage_locked(struct page *page,
 +static int nfs_writepage_locked(struct folio *folio,
                                struct writeback_control *wbc)
  {
        struct nfs_pageio_descriptor pgio;
 -      struct inode *inode = page_file_mapping(page)->host;
 +      struct inode *inode = folio_file_mapping(folio)->host;
        int err;
  
        if (wbc->sync_mode == WB_SYNC_NONE &&
                return AOP_WRITEPAGE_ACTIVATE;
  
        nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
 -      nfs_pageio_init_write(&pgio, inode, 0,
 -                              false, &nfs_async_write_completion_ops);
 -      err = nfs_do_writepage(page, wbc, &pgio);
 +      nfs_pageio_init_write(&pgio, inode, 0, false,
 +                            &nfs_async_write_completion_ops);
 +      err = nfs_do_writepage(folio, wbc, &pgio);
        pgio.pg_error = 0;
        nfs_pageio_complete(&pgio);
        return err;
  
  int nfs_writepage(struct page *page, struct writeback_control *wbc)
  {
 +      struct folio *folio = page_folio(page);
        int ret;
  
 -      ret = nfs_writepage_locked(page, wbc);
 +      ret = nfs_writepage_locked(folio, wbc);
        if (ret != AOP_WRITEPAGE_ACTIVATE)
                unlock_page(page);
        return ret;
  }
  
- static int nfs_writepages_callback(struct page *page,
+ static int nfs_writepages_callback(struct folio *folio,
 -              struct writeback_control *wbc, void *data)
 +                                 struct writeback_control *wbc, void *data)
  {
-       struct folio *folio = page_folio(page);
        int ret;
  
 -      ret = nfs_do_writepage(&folio->page, wbc, data);
 +      ret = nfs_do_writepage(folio, wbc, data);
        if (ret != AOP_WRITEPAGE_ACTIVATE)
-               unlock_page(page);
+               folio_unlock(folio);
        return ret;
  }
  
@@@ -750,11 -750,10 +749,11 @@@ out_err
  /*
   * Insert a write request into an inode
   */
 -static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
 +static void nfs_inode_add_request(struct nfs_page *req)
  {
 -      struct address_space *mapping = page_file_mapping(req->wb_page);
 -      struct nfs_inode *nfsi = NFS_I(inode);
 +      struct folio *folio = nfs_page_to_folio(req);
 +      struct address_space *mapping = folio_file_mapping(folio);
 +      struct nfs_inode *nfsi = NFS_I(mapping->host);
  
        WARN_ON_ONCE(req->wb_this_page != req);
  
         * with invalidate/truncate.
         */
        spin_lock(&mapping->private_lock);
 -      if (likely(!PageSwapCache(req->wb_page))) {
 +      if (likely(!folio_test_swapcache(folio))) {
                set_bit(PG_MAPPED, &req->wb_flags);
 -              SetPagePrivate(req->wb_page);
 -              set_page_private(req->wb_page, (unsigned long)req);
 +              folio_set_private(folio);
 +              folio->private = req;
        }
        spin_unlock(&mapping->private_lock);
        atomic_long_inc(&nfsi->nrequests);
   */
  static void nfs_inode_remove_request(struct nfs_page *req)
  {
 -      struct address_space *mapping = page_file_mapping(req->wb_page);
 -      struct inode *inode = mapping->host;
 -      struct nfs_inode *nfsi = NFS_I(inode);
 -      struct nfs_page *head;
 -
        if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
 -              head = req->wb_head;
 +              struct folio *folio = nfs_page_to_folio(req->wb_head);
 +              struct address_space *mapping = folio_file_mapping(folio);
  
                spin_lock(&mapping->private_lock);
 -              if (likely(head->wb_page && !PageSwapCache(head->wb_page))) {
 -                      set_page_private(head->wb_page, 0);
 -                      ClearPagePrivate(head->wb_page);
 -                      clear_bit(PG_MAPPED, &head->wb_flags);
 +              if (likely(folio && !folio_test_swapcache(folio))) {
 +                      folio->private = NULL;
 +                      folio_clear_private(folio);
 +                      clear_bit(PG_MAPPED, &req->wb_head->wb_flags);
                }
                spin_unlock(&mapping->private_lock);
        }
  
        if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
                nfs_release_request(req);
 -              atomic_long_dec(&nfsi->nrequests);
 +              atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
        }
  }
  
 -static void
 -nfs_mark_request_dirty(struct nfs_page *req)
 +static void nfs_mark_request_dirty(struct nfs_page *req)
  {
 -      if (req->wb_page)
 -              __set_page_dirty_nobuffers(req->wb_page);
 +      struct folio *folio = nfs_page_to_folio(req);
 +      if (folio)
 +              filemap_dirty_folio(folio_mapping(folio), folio);
  }
  
  /*
   * nfs_page_search_commits_for_head_request_locked
   *
 - * Search through commit lists on @inode for the head request for @page.
 + * Search through commit lists on @inode for the head request for @folio.
   * Must be called while holding the inode (which is cinfo) lock.
   *
   * Returns the head request if found, or NULL if not found.
   */
  static struct nfs_page *
  nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
 -                                              struct page *page)
 +                                              struct folio *folio)
  {
        struct nfs_page *freq, *t;
        struct nfs_commit_info cinfo;
        nfs_init_cinfo_from_inode(&cinfo, inode);
  
        /* search through pnfs commit lists */
 -      freq = pnfs_search_commit_reqs(inode, &cinfo, page);
 +      freq = pnfs_search_commit_reqs(inode, &cinfo, folio);
        if (freq)
                return freq->wb_head;
  
        /* Linearly search the commit list for the correct request */
        list_for_each_entry_safe(freq, t, &cinfo.mds->list, wb_list) {
 -              if (freq->wb_page == page)
 +              if (nfs_page_to_folio(freq) == folio)
                        return freq->wb_head;
        }
  
@@@ -885,7 -888,8 +884,7 @@@ nfs_request_add_commit_list(struct nfs_
        mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
        nfs_request_add_commit_list_locked(req, &cinfo->mds->list, cinfo);
        mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
 -      if (req->wb_page)
 -              nfs_mark_page_unstable(req->wb_page, cinfo);
 +      nfs_folio_mark_unstable(nfs_page_to_folio(req), cinfo);
  }
  EXPORT_SYMBOL_GPL(nfs_request_add_commit_list);
  
@@@ -944,15 -948,12 +943,15 @@@ nfs_mark_request_commit(struct nfs_pag
        nfs_request_add_commit_list(req, cinfo);
  }
  
 -static void
 -nfs_clear_page_commit(struct page *page)
 +static void nfs_folio_clear_commit(struct folio *folio)
  {
 -      dec_node_page_state(page, NR_WRITEBACK);
 -      dec_wb_stat(&inode_to_bdi(page_file_mapping(page)->host)->wb,
 -                  WB_WRITEBACK);
 +      if (folio) {
 +              long nr = folio_nr_pages(folio);
 +
 +              node_stat_mod_folio(folio, NR_WRITEBACK, -nr);
 +              wb_stat_mod(&inode_to_bdi(folio_file_mapping(folio)->host)->wb,
 +                          WB_WRITEBACK, -nr);
 +      }
  }
  
  /* Called holding the request lock on @req */
@@@ -970,7 -971,7 +969,7 @@@ nfs_clear_request_commit(struct nfs_pag
                        nfs_request_remove_commit_list(req, &cinfo);
                }
                mutex_unlock(&NFS_I(inode)->commit_mutex);
 -              nfs_clear_page_commit(req->wb_page);
 +              nfs_folio_clear_commit(nfs_page_to_folio(req));
        }
  }
  
@@@ -1002,8 -1003,7 +1001,8 @@@ static void nfs_write_completion(struc
                if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) &&
                    (hdr->good_bytes < bytes)) {
                        trace_nfs_comp_error(hdr->inode, req, hdr->error);
 -                      nfs_mapping_set_error(req->wb_page, hdr->error);
 +                      nfs_mapping_set_error(nfs_page_to_folio(req),
 +                                            hdr->error);
                        goto remove_req;
                }
                if (nfs_write_need_commit(hdr)) {
  remove_req:
                nfs_inode_remove_request(req);
  next:
 -              nfs_end_page_writeback(req);
 +              nfs_page_end_writeback(req);
                nfs_release_request(req);
        }
  out:
@@@ -1093,9 -1093,10 +1092,9 @@@ nfs_scan_commit(struct inode *inode, st
   * If the attempt fails, then the existing request is flushed out
   * to disk.
   */
 -static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 -              struct page *page,
 -              unsigned int offset,
 -              unsigned int bytes)
 +static struct nfs_page *nfs_try_to_update_request(struct folio *folio,
 +                                                unsigned int offset,
 +                                                unsigned int bytes)
  {
        struct nfs_page *req;
        unsigned int rqend;
  
        end = offset + bytes;
  
 -      req = nfs_lock_and_join_requests(page);
 +      req = nfs_lock_and_join_requests(folio);
        if (IS_ERR_OR_NULL(req))
                return req;
  
@@@ -1137,7 -1138,7 +1136,7 @@@ out_flushme
         */
        nfs_mark_request_dirty(req);
        nfs_unlock_and_release_request(req);
 -      error = nfs_wb_page(inode, page);
 +      error = nfs_wb_folio(folio_file_mapping(folio)->host, folio);
        return (error < 0) ? ERR_PTR(error) : NULL;
  }
  
   * if we have to add a new request. Also assumes that the caller has
   * already called nfs_flush_incompatible() if necessary.
   */
 -static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 -              struct page *page, unsigned int offset, unsigned int bytes)
 +static struct nfs_page *nfs_setup_write_request(struct nfs_open_context *ctx,
 +                                              struct folio *folio,
 +                                              unsigned int offset,
 +                                              unsigned int bytes)
  {
 -      struct inode *inode = page_file_mapping(page)->host;
 -      struct nfs_page *req;
 +      struct nfs_page *req;
  
 -      req = nfs_try_to_update_request(inode, page, offset, bytes);
 +      req = nfs_try_to_update_request(folio, offset, bytes);
        if (req != NULL)
                goto out;
 -      req = nfs_create_request(ctx, page, offset, bytes);
 +      req = nfs_page_create_from_folio(ctx, folio, offset, bytes);
        if (IS_ERR(req))
                goto out;
 -      nfs_inode_add_request(inode, req);
 +      nfs_inode_add_request(req);
  out:
        return req;
  }
  
 -static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 -              unsigned int offset, unsigned int count)
 +static int nfs_writepage_setup(struct nfs_open_context *ctx,
 +                             struct folio *folio, unsigned int offset,
 +                             unsigned int count)
  {
 -      struct nfs_page *req;
 +      struct nfs_page *req;
  
 -      req = nfs_setup_write_request(ctx, page, offset, count);
 +      req = nfs_setup_write_request(ctx, folio, offset, count);
        if (IS_ERR(req))
                return PTR_ERR(req);
        /* Update file length */
 -      nfs_grow_file(page, offset, count);
 +      nfs_grow_file(folio, offset, count);
        nfs_mark_uptodate(req);
        nfs_mark_request_dirty(req);
        nfs_unlock_and_release_request(req);
        return 0;
  }
  
 -int nfs_flush_incompatible(struct file *file, struct page *page)
 +int nfs_flush_incompatible(struct file *file, struct folio *folio)
  {
        struct nfs_open_context *ctx = nfs_file_open_context(file);
        struct nfs_lock_context *l_ctx;
         * dropped page.
         */
        do {
 -              req = nfs_page_find_head_request(page);
 +              req = nfs_folio_find_head_request(folio);
                if (req == NULL)
                        return 0;
                l_ctx = req->wb_lock_context;
 -              do_flush = req->wb_page != page ||
 -                      !nfs_match_open_context(nfs_req_openctx(req), ctx);
 +              do_flush = nfs_page_to_folio(req) != folio ||
 +                         !nfs_match_open_context(nfs_req_openctx(req), ctx);
                if (l_ctx && flctx &&
                    !(list_empty_careful(&flctx->flc_posix) &&
                      list_empty_careful(&flctx->flc_flock))) {
                nfs_release_request(req);
                if (!do_flush)
                        return 0;
 -              status = nfs_wb_page(page_file_mapping(page)->host, page);
 +              status = nfs_wb_folio(folio_file_mapping(folio)->host, folio);
        } while (status == 0);
        return status;
  }
@@@ -1285,9 -1284,9 +1284,9 @@@ out
   * the PageUptodate() flag. In this case, we will need to turn off
   * write optimisations that depend on the page contents being correct.
   */
 -static bool nfs_write_pageuptodate(struct page *page, struct inode *inode,
 -                                 unsigned int pagelen)
 +static bool nfs_folio_write_uptodate(struct folio *folio, unsigned int pagelen)
  {
 +      struct inode *inode = folio_file_mapping(folio)->host;
        struct nfs_inode *nfsi = NFS_I(inode);
  
        if (nfs_have_delegated_attributes(inode))
  out:
        if (nfsi->cache_validity & NFS_INO_INVALID_DATA && pagelen != 0)
                return false;
 -      return PageUptodate(page) != 0;
 +      return folio_test_uptodate(folio) != 0;
  }
  
  static bool
@@@ -1319,17 -1318,16 +1318,17 @@@ is_whole_file_wrlock(struct file_lock *
   * If the file is opened for synchronous writes then we can just skip the rest
   * of the checks.
   */
 -static int nfs_can_extend_write(struct file *file, struct page *page,
 -                              struct inode *inode, unsigned int pagelen)
 +static int nfs_can_extend_write(struct file *file, struct folio *folio,
 +                              unsigned int pagelen)
  {
 -      int ret;
 +      struct inode *inode = file_inode(file);
        struct file_lock_context *flctx = locks_inode_context(inode);
        struct file_lock *fl;
 +      int ret;
  
        if (file->f_flags & O_DSYNC)
                return 0;
 -      if (!nfs_write_pageuptodate(page, inode, pagelen))
 +      if (!nfs_folio_write_uptodate(folio, pagelen))
                return 0;
        if (NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE))
                return 1;
   * XXX: Keep an eye on generic_file_read to make sure it doesn't do bad
   * things with a page scheduled for an RPC call (e.g. invalidate it).
   */
 -int nfs_updatepage(struct file *file, struct page *page,
 -              unsigned int offset, unsigned int count)
 +int nfs_update_folio(struct file *file, struct folio *folio,
 +                   unsigned int offset, unsigned int count)
  {
        struct nfs_open_context *ctx = nfs_file_open_context(file);
 -      struct address_space *mapping = page_file_mapping(page);
 -      struct inode    *inode = mapping->host;
 -      unsigned int    pagelen = nfs_page_length(page);
 +      struct address_space *mapping = folio_file_mapping(folio);
 +      struct inode *inode = mapping->host;
 +      unsigned int pagelen = nfs_folio_length(folio);
        int             status = 0;
  
        nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
  
 -      dprintk("NFS:       nfs_updatepage(%pD2 %d@%lld)\n",
 -              file, count, (long long)(page_file_offset(page) + offset));
 +      dprintk("NFS:       nfs_update_folio(%pD2 %d@%lld)\n", file, count,
 +              (long long)(folio_file_pos(folio) + offset));
  
        if (!count)
                goto out;
  
 -      if (nfs_can_extend_write(file, page, inode, pagelen)) {
 +      if (nfs_can_extend_write(file, folio, pagelen)) {
                count = max(count + offset, pagelen);
                offset = 0;
        }
  
 -      status = nfs_writepage_setup(ctx, page, offset, count);
 +      status = nfs_writepage_setup(ctx, folio, offset, count);
        if (status < 0)
                nfs_set_pageerror(mapping);
  out:
 -      dprintk("NFS:       nfs_updatepage returns %d (isize %lld)\n",
 +      dprintk("NFS:       nfs_update_folio returns %d (isize %lld)\n",
                        status, (long long)i_size_read(inode));
        return status;
  }
@@@ -1423,13 -1421,13 +1422,13 @@@ static void nfs_initiate_write(struct n
   */
  static void nfs_redirty_request(struct nfs_page *req)
  {
 -      struct nfs_inode *nfsi = NFS_I(page_file_mapping(req->wb_page)->host);
 +      struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
  
        /* Bump the transmission count */
        req->wb_nio++;
        nfs_mark_request_dirty(req);
        atomic_long_inc(&nfsi->redirtied_pages);
 -      nfs_end_page_writeback(req);
 +      nfs_page_end_writeback(req);
        nfs_release_request(req);
  }
  
@@@ -1787,18 -1785,18 +1786,18 @@@ void nfs_retry_commit(struct list_head 
                req = nfs_list_entry(page_list->next);
                nfs_list_remove_request(req);
                nfs_mark_request_commit(req, lseg, cinfo, ds_commit_idx);
 -              if (!cinfo->dreq)
 -                      nfs_clear_page_commit(req->wb_page);
 +              nfs_folio_clear_commit(nfs_page_to_folio(req));
                nfs_unlock_and_release_request(req);
        }
  }
  EXPORT_SYMBOL_GPL(nfs_retry_commit);
  
 -static void
 -nfs_commit_resched_write(struct nfs_commit_info *cinfo,
 -              struct nfs_page *req)
 +static void nfs_commit_resched_write(struct nfs_commit_info *cinfo,
 +                                   struct nfs_page *req)
  {
 -      __set_page_dirty_nobuffers(req->wb_page);
 +      struct folio *folio = nfs_page_to_folio(req);
 +
 +      filemap_dirty_folio(folio_mapping(folio), folio);
  }
  
  /*
@@@ -1849,13 -1847,12 +1848,13 @@@ static void nfs_commit_release_pages(st
        int status = data->task.tk_status;
        struct nfs_commit_info cinfo;
        struct nfs_server *nfss;
 +      struct folio *folio;
  
        while (!list_empty(&data->pages)) {
                req = nfs_list_entry(data->pages.next);
                nfs_list_remove_request(req);
 -              if (req->wb_page)
 -                      nfs_clear_page_commit(req->wb_page);
 +              folio = nfs_page_to_folio(req);
 +              nfs_folio_clear_commit(folio);
  
                dprintk("NFS:       commit (%s/%llu %d@%lld)",
                        nfs_req_openctx(req)->dentry->d_sb->s_id,
                        req->wb_bytes,
                        (long long)req_offset(req));
                if (status < 0) {
 -                      if (req->wb_page) {
 +                      if (folio) {
                                trace_nfs_commit_error(data->inode, req,
                                                       status);
 -                              nfs_mapping_set_error(req->wb_page, status);
 +                              nfs_mapping_set_error(folio, status);
                                nfs_inode_remove_request(req);
                        }
                        dprintk_cont(", error = %d\n", status);
                 * returned by the server against all stored verfs. */
                if (nfs_write_match_verf(verf, req)) {
                        /* We have a match */
 -                      if (req->wb_page)
 +                      if (folio)
                                nfs_inode_remove_request(req);
                        dprintk_cont(" OK\n");
                        goto next;
@@@ -2058,7 -2055,7 +2057,7 @@@ int nfs_wb_folio_cancel(struct inode *i
  
        /* blocking call to cancel all requests and join to a single (head)
         * request */
 -      req = nfs_lock_and_join_requests(&folio->page);
 +      req = nfs_lock_and_join_requests(folio);
  
        if (IS_ERR(req)) {
                ret = PTR_ERR(req);
        return ret;
  }
  
 -/*
 - * Write back all requests on one page - we do this before reading it.
 +/**
 + * nfs_wb_folio - Write back all requests on one page
 + * @inode: pointer to inode
 + * @folio: pointer to folio
 + *
 + * Assumes that the folio has been locked by the caller, and will
 + * not unlock it.
   */
 -int nfs_wb_page(struct inode *inode, struct page *page)
 +int nfs_wb_folio(struct inode *inode, struct folio *folio)
  {
 -      loff_t range_start = page_file_offset(page);
 -      loff_t range_end = range_start + (loff_t)(PAGE_SIZE - 1);
 +      loff_t range_start = folio_file_pos(folio);
 +      loff_t range_end = range_start + (loff_t)folio_size(folio) - 1;
        struct writeback_control wbc = {
                .sync_mode = WB_SYNC_ALL,
                .nr_to_write = 0,
        };
        int ret;
  
 -      trace_nfs_writeback_page_enter(inode);
 +      trace_nfs_writeback_folio(inode, folio);
  
        for (;;) {
 -              wait_on_page_writeback(page);
 -              if (clear_page_dirty_for_io(page)) {
 -                      ret = nfs_writepage_locked(page, &wbc);
 +              folio_wait_writeback(folio);
 +              if (folio_clear_dirty_for_io(folio)) {
 +                      ret = nfs_writepage_locked(folio, &wbc);
                        if (ret < 0)
                                goto out_error;
                        continue;
                }
                ret = 0;
 -              if (!PagePrivate(page))
 +              if (!folio_test_private(folio))
                        break;
                ret = nfs_commit_inode(inode, FLUSH_SYNC);
                if (ret < 0)
                        goto out_error;
        }
  out_error:
 -      trace_nfs_writeback_page_exit(inode, ret);
 +      trace_nfs_writeback_folio_done(inode, folio, ret);
        return ret;
  }
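
The NFS conversion above is largely mechanical: each struct page accessor is swapped for its folio counterpart, and offsets and lengths come from folio_file_pos()/folio_size() rather than PAGE_SIZE so that multi-page folios stay correct. A condensed, purely illustrative summary of the substitutions used in these hunks (the function itself is hypothetical):

        static loff_t example_folio_conversion(struct page *page)
        {
                struct folio *folio = page_folio(page);         /* page -> owning folio */
                struct address_space *mapping = folio_file_mapping(folio);
                                                                /* was page_file_mapping(page) */

                if (!folio_test_uptodate(folio))                /* was PageUptodate(page) */
                        folio_mark_uptodate(folio);             /* was SetPageUptodate(page) */
                filemap_dirty_folio(mapping, folio);            /* was __set_page_dirty_nobuffers(page) */

                /* Ranges are folio-sized, not PAGE_SIZE-sized. */
                return folio_file_pos(folio) + folio_size(folio);
                                                                /* was page_file_offset() + PAGE_SIZE */
        }
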
  
diff --combined fs/ntfs3/inode.c
index 8ce2616b087f1853a68a5982aca5f83a0fc815d6,9c646615f7140f714be91beb8f2f6a3eaebaf810..309d9b46b5d5cb9baccd2ff5e75a3599ebe57315
@@@ -832,32 -832,29 +832,29 @@@ out
        return err;
  }
  
- static int ntfs_writepage(struct page *page, struct writeback_control *wbc)
+ static int ntfs_resident_writepage(struct folio *folio,
+               struct writeback_control *wbc, void *data)
  {
-       struct address_space *mapping = page->mapping;
-       struct inode *inode = mapping->host;
-       struct ntfs_inode *ni = ntfs_i(inode);
-       int err;
+       struct address_space *mapping = data;
+       struct ntfs_inode *ni = ntfs_i(mapping->host);
+       int ret;
  
-       if (is_resident(ni)) {
-               ni_lock(ni);
-               err = attr_data_write_resident(ni, page);
-               ni_unlock(ni);
-               if (err != E_NTFS_NONRESIDENT) {
-                       unlock_page(page);
-                       return err;
-               }
-       }
+       ni_lock(ni);
+       ret = attr_data_write_resident(ni, &folio->page);
+       ni_unlock(ni);
  
-       return block_write_full_page(page, ntfs_get_block, wbc);
+       if (ret != E_NTFS_NONRESIDENT)
+               folio_unlock(folio);
+       mapping_set_error(mapping, ret);
+       return ret;
  }
  
  static int ntfs_writepages(struct address_space *mapping,
                           struct writeback_control *wbc)
  {
-       /* Redirect call to 'ntfs_writepage' for resident files. */
        if (is_resident(ntfs_i(mapping->host)))
-               return generic_writepages(mapping, wbc);
+               return write_cache_pages(mapping, wbc, ntfs_resident_writepage,
+                                        mapping);
        return mpage_writepages(mapping, wbc, ntfs_get_block);
  }
  
@@@ -1185,7 -1182,7 +1182,7 @@@ out
   * 
   * NOTE: if fnd != NULL (ntfs_atomic_open) then @dir is locked
   */
 -struct inode *ntfs_create_inode(struct user_namespace *mnt_userns,
 +struct inode *ntfs_create_inode(struct mnt_idmap *idmap,
                                struct inode *dir, struct dentry *dentry,
                                const struct cpu_str *uni, umode_t mode,
                                dev_t dev, const char *symname, u32 size,
                goto out3;
        }
        inode = &ni->vfs_inode;
 -      inode_init_owner(mnt_userns, inode, dir, mode);
 +      inode_init_owner(idmap, inode, dir, mode);
        mode = inode->i_mode;
  
        inode->i_atime = inode->i_mtime = inode->i_ctime = ni->i_crtime =
  
  #ifdef CONFIG_NTFS3_FS_POSIX_ACL
        if (!S_ISLNK(mode) && (sb->s_flags & SB_POSIXACL)) {
 -              err = ntfs_init_acl(mnt_userns, inode, dir);
 +              err = ntfs_init_acl(idmap, inode, dir);
                if (err)
                        goto out7;
        } else
@@@ -2066,13 -2063,13 +2063,13 @@@ const struct inode_operations ntfs_link
  const struct address_space_operations ntfs_aops = {
        .read_folio     = ntfs_read_folio,
        .readahead      = ntfs_readahead,
-       .writepage      = ntfs_writepage,
        .writepages     = ntfs_writepages,
        .write_begin    = ntfs_write_begin,
        .write_end      = ntfs_write_end,
        .direct_IO      = ntfs_direct_IO,
        .bmap           = ntfs_bmap,
        .dirty_folio    = block_dirty_folio,
+       .migrate_folio  = buffer_migrate_folio,
        .invalidate_folio = block_invalidate_folio,
  };
  
diff --combined fs/orangefs/file.c
index 4ecb91a9bbebe4d4ed9a625735ef4864e775392d,a5e1ea8b71193a5a9f798c857b8444e4136a97b9..1a4301a38aa7c1a06e33c6da3fde1cee5b2c81cd
@@@ -14,7 -14,6 +14,7 @@@
  #include "orangefs-kernel.h"
  #include "orangefs-bufmap.h"
  #include <linux/fs.h>
 +#include <linux/filelock.h>
  #include <linux/pagemap.h>
  
  static int flush_racache(struct inode *inode)
@@@ -390,8 -389,7 +390,7 @@@ static int orangefs_file_mmap(struct fi
                     "orangefs_file_mmap: called on %pD\n", file);
  
        /* set the sequential readahead hint */
-       vma->vm_flags |= VM_SEQ_READ;
-       vma->vm_flags &= ~VM_RAND_READ;
+       vm_flags_mod(vma, VM_SEQ_READ, VM_RAND_READ);
  
        file_accessed(file);
        vma->vm_ops = &orangefs_file_vm_ops;
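
The mmap hunk above stops writing vma->vm_flags directly and uses the new vm_flags helpers from this merge, so that all flag updates funnel through one place. A short sketch of the variants; vm_flags_set() and vm_flags_clear() are assumed single-direction forms, only vm_flags_mod() actually appears in this hunk:

        static void example_vm_flags(struct vm_area_struct *vma)
        {
                vm_flags_set(vma, VM_SEQ_READ);         /* was vma->vm_flags |= VM_SEQ_READ   */
                vm_flags_clear(vma, VM_RAND_READ);      /* was vma->vm_flags &= ~VM_RAND_READ */
                /* or both in a single call, as the orangefs hunk does: */
                vm_flags_mod(vma, VM_SEQ_READ, VM_RAND_READ);
        }
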
diff --combined fs/orangefs/inode.c
index c036851a6efe472d7779715468a93b5e5641a1a5,c25468974c8ab325d5422d686642741698df879f..aefdf1d3be7c4a0689dc6ff54e7faf651b8e4966
@@@ -49,8 -49,10 +49,8 @@@ static int orangefs_writepage_locked(st
        /* Should've been handled in orangefs_invalidate_folio. */
        WARN_ON(off == len || off + wlen > len);
  
 -      bv.bv_page = page;
 -      bv.bv_len = wlen;
 -      bv.bv_offset = off % PAGE_SIZE;
        WARN_ON(wlen == 0);
 +      bvec_set_page(&bv, page, wlen, off % PAGE_SIZE);
        iov_iter_bvec(&iter, ITER_SOURCE, &bv, 1, wlen);
  
        ret = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter, wlen,
@@@ -100,11 -102,15 +100,11 @@@ static int orangefs_writepages_work(str
  
        for (i = 0; i < ow->npages; i++) {
                set_page_writeback(ow->pages[i]);
 -              ow->bv[i].bv_page = ow->pages[i];
 -              ow->bv[i].bv_len = min(page_offset(ow->pages[i]) + PAGE_SIZE,
 -                  ow->off + ow->len) -
 -                  max(ow->off, page_offset(ow->pages[i]));
 -              if (i == 0)
 -                      ow->bv[i].bv_offset = ow->off -
 -                          page_offset(ow->pages[i]);
 -              else
 -                      ow->bv[i].bv_offset = 0;
 +              bvec_set_page(&ow->bv[i], ow->pages[i],
 +                            min(page_offset(ow->pages[i]) + PAGE_SIZE,
 +                                ow->off + ow->len) -
 +                            max(ow->off, page_offset(ow->pages[i])),
 +                            i == 0 ? ow->off - page_offset(ow->pages[i]) : 0);
        }
        iov_iter_bvec(&iter, ITER_SOURCE, ow->bv, ow->npages, ow->len);
  
        return ret;
  }
  
- static int orangefs_writepages_callback(struct page *page,
-     struct writeback_control *wbc, void *data)
+ static int orangefs_writepages_callback(struct folio *folio,
+               struct writeback_control *wbc, void *data)
  {
        struct orangefs_writepages *ow = data;
-       struct orangefs_write_range *wr;
+       struct orangefs_write_range *wr = folio->private;
        int ret;
  
-       if (!PagePrivate(page)) {
-               unlock_page(page);
+       if (!wr) {
+               folio_unlock(folio);
                /* It's not private so there's nothing to write, right? */
                printk("writepages_callback not private!\n");
                BUG();
                return 0;
        }
-       wr = (struct orangefs_write_range *)page_private(page);
  
        ret = -1;
        if (ow->npages == 0) {
                ow->len = wr->len;
                ow->uid = wr->uid;
                ow->gid = wr->gid;
-               ow->pages[ow->npages++] = page;
+               ow->pages[ow->npages++] = &folio->page;
                ret = 0;
                goto done;
        }
        }
        if (ow->off + ow->len == wr->pos) {
                ow->len += wr->len;
-               ow->pages[ow->npages++] = page;
+               ow->pages[ow->npages++] = &folio->page;
                ret = 0;
                goto done;
        }
@@@ -192,10 -197,10 +191,10 @@@ done
                        orangefs_writepages_work(ow, wbc);
                        ow->npages = 0;
                }
-               ret = orangefs_writepage_locked(page, wbc);
-               mapping_set_error(page->mapping, ret);
-               unlock_page(page);
-               end_page_writeback(page);
+               ret = orangefs_writepage_locked(&folio->page, wbc);
+               mapping_set_error(folio->mapping, ret);
+               folio_unlock(folio);
+               folio_end_writeback(folio);
        } else {
                if (ow->npages == ow->maxpages) {
                        orangefs_writepages_work(ow, wbc);
@@@ -294,7 -299,9 +293,7 @@@ static int orangefs_read_folio(struct f
                orangefs_launder_folio(folio);
  
        off = folio_pos(folio);
 -      bv.bv_page = &folio->page;
 -      bv.bv_len = folio_size(folio);
 -      bv.bv_offset = 0;
 +      bvec_set_folio(&bv, folio, folio_size(folio), 0);
        iov_iter_bvec(&iter, ITER_DEST, &bv, 1, folio_size(folio));
  
        ret = wait_for_direct_io(ORANGEFS_IO_READ, inode, &off, &iter,
@@@ -814,7 -821,7 +813,7 @@@ again
                ORANGEFS_I(inode)->attr_uid = current_fsuid();
                ORANGEFS_I(inode)->attr_gid = current_fsgid();
        }
 -      setattr_copy(&init_user_ns, inode, iattr);
 +      setattr_copy(&nop_mnt_idmap, inode, iattr);
        spin_unlock(&inode->i_lock);
        mark_inode_dirty(inode);
  
@@@ -831,20 -838,20 +830,20 @@@ int __orangefs_setattr_mode(struct dent
        ret = __orangefs_setattr(inode, iattr);
        /* change mode on a file that has ACLs */
        if (!ret && (iattr->ia_valid & ATTR_MODE))
 -              ret = posix_acl_chmod(&init_user_ns, dentry, inode->i_mode);
 +              ret = posix_acl_chmod(&nop_mnt_idmap, dentry, inode->i_mode);
        return ret;
  }
  
  /*
   * Change attributes of an object referenced by dentry.
   */
 -int orangefs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 +int orangefs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
                     struct iattr *iattr)
  {
        int ret;
        gossip_debug(GOSSIP_INODE_DEBUG, "__orangefs_setattr: called on %pd\n",
            dentry);
 -      ret = setattr_prepare(&init_user_ns, dentry, iattr);
 +      ret = setattr_prepare(&nop_mnt_idmap, dentry, iattr);
        if (ret)
                goto out;
        ret = __orangefs_setattr_mode(dentry, iattr);
@@@ -858,7 -865,7 +857,7 @@@ out
  /*
   * Obtain attributes of an object given a dentry
   */
 -int orangefs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 +int orangefs_getattr(struct mnt_idmap *idmap, const struct path *path,
                     struct kstat *stat, u32 request_mask, unsigned int flags)
  {
        int ret;
        ret = orangefs_inode_getattr(inode,
            request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
        if (ret == 0) {
 -              generic_fillattr(&init_user_ns, inode, stat);
 +              generic_fillattr(&nop_mnt_idmap, inode, stat);
  
                /* override block size reported to stat */
                if (!(request_mask & STATX_SIZE))
        return ret;
  }
  
 -int orangefs_permission(struct user_namespace *mnt_userns,
 +int orangefs_permission(struct mnt_idmap *idmap,
                        struct inode *inode, int mask)
  {
        int ret;
        if (ret < 0)
                return ret;
  
 -      return generic_permission(&init_user_ns, inode, mask);
 +      return generic_permission(&nop_mnt_idmap, inode, mask);
  }
  
  int orangefs_update_time(struct inode *inode, struct timespec64 *time, int flags)
@@@ -936,7 -943,7 +935,7 @@@ static int orangefs_fileattr_get(struc
        return 0;
  }
  
 -static int orangefs_fileattr_set(struct user_namespace *mnt_userns,
 +static int orangefs_fileattr_set(struct mnt_idmap *idmap,
                                 struct dentry *dentry, struct fileattr *fa)
  {
        u64 val = 0;
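
The bio_vec setup in the orangefs hunks now goes through bvec_set_page()/bvec_set_folio() instead of assigning bv_page, bv_len and bv_offset by hand. A minimal sketch of the pattern together with the iov_iter that consumes it; the helper name is illustrative:

        static void example_bvec_iter(struct folio *folio, struct bio_vec *bv,
                                      struct iov_iter *iter)
        {
                /* was: bv->bv_page = &folio->page; bv->bv_len = ...; bv->bv_offset = 0; */
                bvec_set_folio(bv, folio, folio_size(folio), 0);
                iov_iter_bvec(iter, ITER_SOURCE, bv, 1, folio_size(folio));
        }
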
diff --combined fs/ramfs/file-nommu.c
index 5bf74c2f6042fdb463e40c087745a4e2243ec9c6,cd45376927513f04a8222cf5d49388dded426f4f..2f67516bb9bf6f6cf22dbd9ea600817891cc618e
@@@ -22,7 -22,7 +22,7 @@@
  #include <linux/uaccess.h>
  #include "internal.h"
  
 -static int ramfs_nommu_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 +static int ramfs_nommu_setattr(struct mnt_idmap *, struct dentry *, struct iattr *);
  static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
                                                   unsigned long addr,
                                                   unsigned long len,
@@@ -158,7 -158,7 +158,7 @@@ static int ramfs_nommu_resize(struct in
   * handle a change of attributes
   * - we're specifically interested in a change of size
   */
 -static int ramfs_nommu_setattr(struct user_namespace *mnt_userns,
 +static int ramfs_nommu_setattr(struct mnt_idmap *idmap,
                               struct dentry *dentry, struct iattr *ia)
  {
        struct inode *inode = d_inode(dentry);
        int ret = 0;
  
        /* POSIX UID/GID verification for setting inode attributes */
 -      ret = setattr_prepare(&init_user_ns, dentry, ia);
 +      ret = setattr_prepare(&nop_mnt_idmap, dentry, ia);
        if (ret)
                return ret;
  
                }
        }
  
 -      setattr_copy(&init_user_ns, inode, ia);
 +      setattr_copy(&nop_mnt_idmap, inode, ia);
   out:
        ia->ia_valid = old_ia_valid;
        return ret;
@@@ -264,7 -264,7 +264,7 @@@ out
   */
  static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma)
  {
-       if (!(vma->vm_flags & (VM_SHARED | VM_MAYSHARE)))
+       if (!is_nommu_shared_mapping(vma->vm_flags))
                return -ENOSYS;
  
        file_accessed(file);
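
The setattr paths above (orangefs and ramfs) now take a struct mnt_idmap instead of a struct user_namespace, with filesystems that do not support idmapped mounts passing &nop_mnt_idmap. A condensed sketch of the resulting shape, with the filesystem-specific work elided and a placeholder name:

        static int example_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
                                   struct iattr *ia)
        {
                struct inode *inode = d_inode(dentry);
                int ret;

                ret = setattr_prepare(idmap, dentry, ia);       /* was user_namespace based */
                if (ret)
                        return ret;
                /* ... apply size/ownership/mode changes here ... */
                setattr_copy(idmap, inode, ia);
                mark_inode_dirty(inode);
                return 0;
        }
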
diff --combined fs/udf/inode.c
index 3b2adf4cbc5793b2e85825c10962e51cccb4f907,34e416327dd4ee2e4f3e399a0b50eabdd7ff44e1..f7a9607c2b9578ce459dee3ebc87360ea834f327
  #define FE_DELETE_PERMS       (FE_PERM_U_DELETE | FE_PERM_G_DELETE | \
                         FE_PERM_O_DELETE)
  
 +struct udf_map_rq;
 +
  static umode_t udf_convert_permissions(struct fileEntry *);
  static int udf_update_inode(struct inode *, int);
  static int udf_sync_inode(struct inode *inode);
  static int udf_alloc_i_data(struct inode *inode, size_t size);
 -static sector_t inode_getblk(struct inode *, sector_t, int *, int *);
 -static int8_t udf_insert_aext(struct inode *, struct extent_position,
 -                            struct kernel_lb_addr, uint32_t);
 +static int inode_getblk(struct inode *inode, struct udf_map_rq *map);
 +static int udf_insert_aext(struct inode *, struct extent_position,
 +                         struct kernel_lb_addr, uint32_t);
  static void udf_split_extents(struct inode *, int *, int, udf_pblk_t,
                              struct kernel_long_ad *, int *);
  static void udf_prealloc_extents(struct inode *, int, int,
                                 struct kernel_long_ad *, int *);
  static void udf_merge_extents(struct inode *, struct kernel_long_ad *, int *);
 -static void udf_update_extents(struct inode *, struct kernel_long_ad *, int,
 -                             int, struct extent_position *);
 -static int udf_get_block(struct inode *, sector_t, struct buffer_head *, int);
 +static int udf_update_extents(struct inode *, struct kernel_long_ad *, int,
 +                            int, struct extent_position *);
 +static int udf_get_block_wb(struct inode *inode, sector_t block,
 +                          struct buffer_head *bh_result, int create);
  
  static void __udf_clear_extent_cache(struct inode *inode)
  {
@@@ -185,56 -182,14 +185,57 @@@ static void udf_write_failed(struct add
        }
  }
  
- static int udf_adinicb_writepage(struct page *page,
++static int udf_adinicb_writepage(struct folio *folio,
 +                               struct writeback_control *wbc, void *data)
 +{
++      struct page *page = &folio->page;
 +      struct inode *inode = page->mapping->host;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
 +
 +      BUG_ON(!PageLocked(page));
 +      memcpy_to_page(page, 0, iinfo->i_data + iinfo->i_lenEAttr,
 +                     i_size_read(inode));
 +      unlock_page(page);
 +      mark_inode_dirty(inode);
 +
 +      return 0;
 +}
 +
  static int udf_writepages(struct address_space *mapping,
 -                      struct writeback_control *wbc)
 +                        struct writeback_control *wbc)
  {
 -      return mpage_writepages(mapping, wbc, udf_get_block);
 +      struct inode *inode = mapping->host;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
 +
 +      if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB)
 +              return mpage_writepages(mapping, wbc, udf_get_block_wb);
 +      return write_cache_pages(mapping, wbc, udf_adinicb_writepage, NULL);
 +}
 +
 +static void udf_adinicb_readpage(struct page *page)
 +{
 +      struct inode *inode = page->mapping->host;
 +      char *kaddr;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
 +      loff_t isize = i_size_read(inode);
 +
 +      kaddr = kmap_local_page(page);
 +      memcpy(kaddr, iinfo->i_data + iinfo->i_lenEAttr, isize);
 +      memset(kaddr + isize, 0, PAGE_SIZE - isize);
 +      flush_dcache_page(page);
 +      SetPageUptodate(page);
 +      kunmap_local(kaddr);
  }
  
  static int udf_read_folio(struct file *file, struct folio *folio)
  {
 +      struct udf_inode_info *iinfo = UDF_I(file_inode(file));
 +
 +      if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
 +              udf_adinicb_readpage(&folio->page);
 +              folio_unlock(folio);
 +              return 0;
 +      }
        return mpage_read_folio(folio, udf_get_block);
  }
  
@@@ -244,49 -199,15 +245,49 @@@ static void udf_readahead(struct readah
  }
  
  static int udf_write_begin(struct file *file, struct address_space *mapping,
 -                      loff_t pos, unsigned len,
 -                      struct page **pagep, void **fsdata)
 +                         loff_t pos, unsigned len,
 +                         struct page **pagep, void **fsdata)
  {
 +      struct udf_inode_info *iinfo = UDF_I(file_inode(file));
 +      struct page *page;
        int ret;
  
 -      ret = block_write_begin(mapping, pos, len, pagep, udf_get_block);
 -      if (unlikely(ret))
 -              udf_write_failed(mapping, pos + len);
 -      return ret;
 +      if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB) {
 +              ret = block_write_begin(mapping, pos, len, pagep,
 +                                      udf_get_block);
 +              if (unlikely(ret))
 +                      udf_write_failed(mapping, pos + len);
 +              return ret;
 +      }
 +      if (WARN_ON_ONCE(pos >= PAGE_SIZE))
 +              return -EIO;
 +      page = grab_cache_page_write_begin(mapping, 0);
 +      if (!page)
 +              return -ENOMEM;
 +      *pagep = page;
 +      if (!PageUptodate(page))
 +              udf_adinicb_readpage(page);
 +      return 0;
 +}
 +
 +static int udf_write_end(struct file *file, struct address_space *mapping,
 +                       loff_t pos, unsigned len, unsigned copied,
 +                       struct page *page, void *fsdata)
 +{
 +      struct inode *inode = file_inode(file);
 +      loff_t last_pos;
 +
 +      if (UDF_I(inode)->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB)
 +              return generic_write_end(file, mapping, pos, len, copied, page,
 +                                       fsdata);
 +      last_pos = pos + copied;
 +      if (last_pos > inode->i_size)
 +              i_size_write(inode, last_pos);
 +      set_page_dirty(page);
 +      unlock_page(page);
 +      put_page(page);
 +
 +      return copied;
  }
  
  static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
        size_t count = iov_iter_count(iter);
        ssize_t ret;
  
 +      /* Fallback to buffered IO for in-ICB files */
 +      if (UDF_I(inode)->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
 +              return 0;
        ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block);
        if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE))
                udf_write_failed(mapping, iocb->ki_pos + count);
  
  static sector_t udf_bmap(struct address_space *mapping, sector_t block)
  {
 +      struct udf_inode_info *iinfo = UDF_I(mapping->host);
 +
 +      if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
 +              return -EINVAL;
        return generic_block_bmap(mapping, block, udf_get_block);
  }
  
@@@ -322,7 -236,7 +323,7 @@@ const struct address_space_operations u
        .readahead      = udf_readahead,
        .writepages     = udf_writepages,
        .write_begin    = udf_write_begin,
 -      .write_end      = generic_write_end,
 +      .write_end      = udf_write_end,
        .direct_IO      = udf_direct_IO,
        .bmap           = udf_bmap,
        .migrate_folio  = buffer_migrate_folio,
  /*
   * Expand file stored in ICB to a normal one-block-file
   *
 - * This function requires i_data_sem for writing and releases it.
   * This function requires i_mutex held
   */
  int udf_expand_file_adinicb(struct inode *inode)
  {
        struct page *page;
 -      char *kaddr;
        struct udf_inode_info *iinfo = UDF_I(inode);
        int err;
  
        WARN_ON_ONCE(!inode_is_locked(inode));
        if (!iinfo->i_lenAlloc) {
 +              down_write(&iinfo->i_data_sem);
                if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_USE_SHORT_AD))
                        iinfo->i_alloc_type = ICBTAG_FLAG_AD_SHORT;
                else
                mark_inode_dirty(inode);
                return 0;
        }
 -      /*
 -       * Release i_data_sem so that we can lock a page - page lock ranks
 -       * above i_data_sem. i_mutex still protects us against file changes.
 -       */
 -      up_write(&iinfo->i_data_sem);
  
        page = find_or_create_page(inode->i_mapping, 0, GFP_NOFS);
        if (!page)
                return -ENOMEM;
  
 -      if (!PageUptodate(page)) {
 -              kaddr = kmap_atomic(page);
 -              memset(kaddr + iinfo->i_lenAlloc, 0x00,
 -                     PAGE_SIZE - iinfo->i_lenAlloc);
 -              memcpy(kaddr, iinfo->i_data + iinfo->i_lenEAttr,
 -                      iinfo->i_lenAlloc);
 -              flush_dcache_page(page);
 -              SetPageUptodate(page);
 -              kunmap_atomic(kaddr);
 -      }
 +      if (!PageUptodate(page))
 +              udf_adinicb_readpage(page);
        down_write(&iinfo->i_data_sem);
        memset(iinfo->i_data + iinfo->i_lenEAttr, 0x00,
               iinfo->i_lenAlloc);
                iinfo->i_alloc_type = ICBTAG_FLAG_AD_SHORT;
        else
                iinfo->i_alloc_type = ICBTAG_FLAG_AD_LONG;
 -      /* from now on we have normal address_space methods */
 -      inode->i_data.a_ops = &udf_aops;
        set_page_dirty(page);
        unlock_page(page);
        up_write(&iinfo->i_data_sem);
                /* Restore everything back so that we don't lose data... */
                lock_page(page);
                down_write(&iinfo->i_data_sem);
 -              kaddr = kmap_atomic(page);
 -              memcpy(iinfo->i_data + iinfo->i_lenEAttr, kaddr, inode->i_size);
 -              kunmap_atomic(kaddr);
 +              memcpy_to_page(page, 0, iinfo->i_data + iinfo->i_lenEAttr,
 +                             inode->i_size);
                unlock_page(page);
                iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
 -              inode->i_data.a_ops = &udf_adinicb_aops;
                iinfo->i_lenAlloc = inode->i_size;
                up_write(&iinfo->i_data_sem);
        }
        return err;
  }
  
 -struct buffer_head *udf_expand_dir_adinicb(struct inode *inode,
 -                                          udf_pblk_t *block, int *err)
 -{
 -      udf_pblk_t newblock;
 -      struct buffer_head *dbh = NULL;
 -      struct kernel_lb_addr eloc;
 -      uint8_t alloctype;
 -      struct extent_position epos;
 +#define UDF_MAP_CREATE                0x01    /* Mapping can allocate new blocks */
 +#define UDF_MAP_NOPREALLOC    0x02    /* Do not preallocate blocks */
  
 -      struct udf_fileident_bh sfibh, dfibh;
 -      loff_t f_pos = udf_ext0_offset(inode);
 -      int size = udf_ext0_offset(inode) + inode->i_size;
 -      struct fileIdentDesc cfi, *sfi, *dfi;
 -      struct udf_inode_info *iinfo = UDF_I(inode);
 +#define UDF_BLK_MAPPED        0x01    /* Block was successfully mapped */
 +#define UDF_BLK_NEW   0x02    /* Block was freshly allocated */
  
 -      if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_USE_SHORT_AD))
 -              alloctype = ICBTAG_FLAG_AD_SHORT;
 -      else
 -              alloctype = ICBTAG_FLAG_AD_LONG;
 +struct udf_map_rq {
 +      sector_t lblk;
 +      udf_pblk_t pblk;
 +      int iflags;             /* UDF_MAP_ flags determining behavior */
 +      int oflags;             /* UDF_BLK_ flags reporting results */
 +};
  
 -      if (!inode->i_size) {
 -              iinfo->i_alloc_type = alloctype;
 -              mark_inode_dirty(inode);
 -              return NULL;
 -      }
 +static int udf_map_block(struct inode *inode, struct udf_map_rq *map)
 +{
 +      int err;
 +      struct udf_inode_info *iinfo = UDF_I(inode);
  
 -      /* alloc block, and copy data to it */
 -      *block = udf_new_block(inode->i_sb, inode,
 -                             iinfo->i_location.partitionReferenceNum,
 -                             iinfo->i_location.logicalBlockNum, err);
 -      if (!(*block))
 -              return NULL;
 -      newblock = udf_get_pblock(inode->i_sb, *block,
 -                                iinfo->i_location.partitionReferenceNum,
 -                              0);
 -      if (!newblock)
 -              return NULL;
 -      dbh = udf_tgetblk(inode->i_sb, newblock);
 -      if (!dbh)
 -              return NULL;
 -      lock_buffer(dbh);
 -      memset(dbh->b_data, 0x00, inode->i_sb->s_blocksize);
 -      set_buffer_uptodate(dbh);
 -      unlock_buffer(dbh);
 -      mark_buffer_dirty_inode(dbh, inode);
 -
 -      sfibh.soffset = sfibh.eoffset =
 -                      f_pos & (inode->i_sb->s_blocksize - 1);
 -      sfibh.sbh = sfibh.ebh = NULL;
 -      dfibh.soffset = dfibh.eoffset = 0;
 -      dfibh.sbh = dfibh.ebh = dbh;
 -      while (f_pos < size) {
 -              iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
 -              sfi = udf_fileident_read(inode, &f_pos, &sfibh, &cfi, NULL,
 -                                       NULL, NULL, NULL);
 -              if (!sfi) {
 -                      brelse(dbh);
 -                      return NULL;
 -              }
 -              iinfo->i_alloc_type = alloctype;
 -              sfi->descTag.tagLocation = cpu_to_le32(*block);
 -              dfibh.soffset = dfibh.eoffset;
 -              dfibh.eoffset += (sfibh.eoffset - sfibh.soffset);
 -              dfi = (struct fileIdentDesc *)(dbh->b_data + dfibh.soffset);
 -              if (udf_write_fi(inode, sfi, dfi, &dfibh, sfi->impUse,
 -                               udf_get_fi_ident(sfi))) {
 -                      iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
 -                      brelse(dbh);
 -                      return NULL;
 +      map->oflags = 0;
 +      if (!(map->iflags & UDF_MAP_CREATE)) {
 +              struct kernel_lb_addr eloc;
 +              uint32_t elen;
 +              sector_t offset;
 +              struct extent_position epos = {};
 +
 +              down_read(&iinfo->i_data_sem);
 +              if (inode_bmap(inode, map->lblk, &epos, &eloc, &elen, &offset)
 +                              == (EXT_RECORDED_ALLOCATED >> 30)) {
 +                      map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc,
 +                                                      offset);
 +                      map->oflags |= UDF_BLK_MAPPED;
                }
 -      }
 -      mark_buffer_dirty_inode(dbh, inode);
 +              up_read(&iinfo->i_data_sem);
 +              brelse(epos.bh);
  
 -      memset(iinfo->i_data + iinfo->i_lenEAttr, 0, iinfo->i_lenAlloc);
 -      iinfo->i_lenAlloc = 0;
 -      eloc.logicalBlockNum = *block;
 -      eloc.partitionReferenceNum =
 -                              iinfo->i_location.partitionReferenceNum;
 -      iinfo->i_lenExtents = inode->i_size;
 -      epos.bh = NULL;
 -      epos.block = iinfo->i_location;
 -      epos.offset = udf_file_entry_alloc_offset(inode);
 -      udf_add_aext(inode, &epos, &eloc, inode->i_size, 0);
 -      /* UniqueID stuff */
 -
 -      brelse(epos.bh);
 -      mark_inode_dirty(inode);
 -      return dbh;
 -}
 -
 -static int udf_get_block(struct inode *inode, sector_t block,
 -                       struct buffer_head *bh_result, int create)
 -{
 -      int err, new;
 -      sector_t phys = 0;
 -      struct udf_inode_info *iinfo;
 -
 -      if (!create) {
 -              phys = udf_block_map(inode, block);
 -              if (phys)
 -                      map_bh(bh_result, inode->i_sb, phys);
                return 0;
        }
  
 -      err = -EIO;
 -      new = 0;
 -      iinfo = UDF_I(inode);
 -
        down_write(&iinfo->i_data_sem);
 -      if (block == iinfo->i_next_alloc_block + 1) {
 -              iinfo->i_next_alloc_block++;
 -              iinfo->i_next_alloc_goal++;
 -      }
 -
        /*
         * Block beyond EOF and prealloc extents? Just discard preallocation
         * as it is not useful and complicates things.
         */
 -      if (((loff_t)block) << inode->i_blkbits > iinfo->i_lenExtents)
 +      if (((loff_t)map->lblk) << inode->i_blkbits >= iinfo->i_lenExtents)
                udf_discard_prealloc(inode);
        udf_clear_extent_cache(inode);
 -      phys = inode_getblk(inode, block, &err, &new);
 -      if (!phys)
 -              goto abort;
 -
 -      if (new)
 -              set_buffer_new(bh_result);
 -      map_bh(bh_result, inode->i_sb, phys);
 -
 -abort:
 +      err = inode_getblk(inode, map);
        up_write(&iinfo->i_data_sem);
        return err;
  }
  
 -static struct buffer_head *udf_getblk(struct inode *inode, udf_pblk_t block,
 -                                    int create, int *err)
 +static int __udf_get_block(struct inode *inode, sector_t block,
 +                         struct buffer_head *bh_result, int flags)
  {
 -      struct buffer_head *bh;
 -      struct buffer_head dummy;
 -
 -      dummy.b_state = 0;
 -      dummy.b_blocknr = -1000;
 -      *err = udf_get_block(inode, block, &dummy, create);
 -      if (!*err && buffer_mapped(&dummy)) {
 -              bh = sb_getblk(inode->i_sb, dummy.b_blocknr);
 -              if (buffer_new(&dummy)) {
 -                      lock_buffer(bh);
 -                      memset(bh->b_data, 0x00, inode->i_sb->s_blocksize);
 -                      set_buffer_uptodate(bh);
 -                      unlock_buffer(bh);
 -                      mark_buffer_dirty_inode(bh, inode);
 -              }
 -              return bh;
 +      int err;
 +      struct udf_map_rq map = {
 +              .lblk = block,
 +              .iflags = flags,
 +      };
 +
 +      err = udf_map_block(inode, &map);
 +      if (err < 0)
 +              return err;
 +      if (map.oflags & UDF_BLK_MAPPED) {
 +              map_bh(bh_result, inode->i_sb, map.pblk);
 +              if (map.oflags & UDF_BLK_NEW)
 +                      set_buffer_new(bh_result);
        }
 +      return 0;
 +}
  
 -      return NULL;
 +int udf_get_block(struct inode *inode, sector_t block,
 +                struct buffer_head *bh_result, int create)
 +{
 +      int flags = create ? UDF_MAP_CREATE : 0;
 +
 +      /*
 +       * We preallocate blocks only for regular files. It also makes sense
 +       * for directories but there's a problem when to drop the
 +       * preallocation. We might use some delayed work for that but I feel
 +       * it's overengineering for a filesystem like UDF.
 +       */
 +      if (!S_ISREG(inode->i_mode))
 +              flags |= UDF_MAP_NOPREALLOC;
 +      return __udf_get_block(inode, block, bh_result, flags);
 +}
 +
 +/*
 + * We shouldn't be allocating blocks on page writeback since we allocate them
 + * on page fault. We can spot dirty buffers without allocated blocks though
 + * when truncate expands file. These however don't have valid data so we can
 + * safely ignore them. So never allocate blocks from page writeback.
 + */
 +static int udf_get_block_wb(struct inode *inode, sector_t block,
 +                          struct buffer_head *bh_result, int create)
 +{
 +      return __udf_get_block(inode, block, bh_result, 0);
  }
  
  /* Extend the file with new blocks totaling 'new_block_bytes',
@@@ -518,7 -509,6 +519,7 @@@ static int udf_do_extend_file(struct in
                        ~(sb->s_blocksize - 1);
        }
  
 +      add = 0;
        /* Can we merge with the previous extent? */
        if ((last_ext->extLength & UDF_EXTENT_FLAG_MASK) ==
                                        EXT_NOT_RECORDED_NOT_ALLOCATED) {
        }
  
        if (fake) {
 -              udf_add_aext(inode, last_pos, &last_ext->extLocation,
 -                           last_ext->extLength, 1);
 +              err = udf_add_aext(inode, last_pos, &last_ext->extLocation,
 +                                 last_ext->extLength, 1);
 +              if (err < 0)
 +                      goto out_err;
                count++;
        } else {
                struct kernel_lb_addr tmploc;
                if (new_block_bytes)
                        udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
        }
 +      iinfo->i_lenExtents += add;
  
        /* Managed to do everything necessary? */
        if (!new_block_bytes)
                err = udf_add_aext(inode, last_pos, &last_ext->extLocation,
                                   last_ext->extLength, 1);
                if (err)
 -                      return err;
 +                      goto out_err;
 +              iinfo->i_lenExtents += add;
                count++;
        }
        if (new_block_bytes) {
                err = udf_add_aext(inode, last_pos, &last_ext->extLocation,
                                   last_ext->extLength, 1);
                if (err)
 -                      return err;
 +                      goto out_err;
 +              iinfo->i_lenExtents += new_block_bytes;
                count++;
        }
  
@@@ -594,11 -579,6 +595,11 @@@ out
                return -EIO;
  
        return count;
 +out_err:
 +      /* Remove extents we've created so far */
 +      udf_clear_extent_cache(inode);
 +      udf_truncate_extents(inode);
 +      return err;
  }
  
  /* Extend the final block of the file to final_block_len bytes */
@@@ -646,7 -626,6 +647,7 @@@ static int udf_extend_file(struct inod
        else
                BUG();
  
 +      down_write(&iinfo->i_data_sem);
        /*
         * When creating hole in file, just don't bother with preserving
         * preallocation. It likely won't be very useful anyway.
        if (err < 0)
                goto out;
        err = 0;
 -      iinfo->i_lenExtents = newsize;
  out:
        brelse(epos.bh);
 +      up_write(&iinfo->i_data_sem);
        return err;
  }
  
 -static sector_t inode_getblk(struct inode *inode, sector_t block,
 -                           int *err, int *new)
 +static int inode_getblk(struct inode *inode, struct udf_map_rq *map)
  {
        struct kernel_long_ad laarr[EXTENT_MERGE_SIZE];
        struct extent_position prev_epos, cur_epos, next_epos;
        struct kernel_lb_addr eloc, tmpeloc;
        int c = 1;
        loff_t lbcount = 0, b_off = 0;
 -      udf_pblk_t newblocknum, newblock = 0;
 +      udf_pblk_t newblocknum;
        sector_t offset = 0;
        int8_t etype;
        struct udf_inode_info *iinfo = UDF_I(inode);
        udf_pblk_t goal = 0, pgoal = iinfo->i_location.logicalBlockNum;
        int lastblock = 0;
        bool isBeyondEOF;
 +      int ret = 0;
  
 -      *err = 0;
 -      *new = 0;
        prev_epos.offset = udf_file_entry_alloc_offset(inode);
        prev_epos.block = iinfo->i_location;
        prev_epos.bh = NULL;
        cur_epos = next_epos = prev_epos;
 -      b_off = (loff_t)block << inode->i_sb->s_blocksize_bits;
 +      b_off = (loff_t)map->lblk << inode->i_sb->s_blocksize_bits;
  
        /* find the extent which contains the block we are looking for.
           alternate between laarr[0] and laarr[1] for locations of the
                        elen = EXT_RECORDED_ALLOCATED |
                                ((elen + inode->i_sb->s_blocksize - 1) &
                                 ~(inode->i_sb->s_blocksize - 1));
 +                      iinfo->i_lenExtents =
 +                              ALIGN(iinfo->i_lenExtents,
 +                                    inode->i_sb->s_blocksize);
                        udf_write_aext(inode, &cur_epos, &eloc, elen, 1);
                }
 -              newblock = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
 +              map->oflags = UDF_BLK_MAPPED;
 +              map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
                goto out_free;
        }
  
        /* Are we beyond EOF and preallocated extent? */
        if (etype == -1) {
 -              int ret;
                loff_t hole_len;
  
                isBeyondEOF = true;
                /* Create extents for the hole between EOF and offset */
                hole_len = (loff_t)offset << inode->i_blkbits;
                ret = udf_do_extend_file(inode, &prev_epos, laarr, hole_len);
 -              if (ret < 0) {
 -                      *err = ret;
 +              if (ret < 0)
                        goto out_free;
 -              }
                c = 0;
                offset = 0;
                count += ret;
 -              /* We are not covered by a preallocated extent? */
 -              if ((laarr[0].extLength & UDF_EXTENT_FLAG_MASK) !=
 -                                              EXT_NOT_RECORDED_ALLOCATED) {
 -                      /* Is there any real extent? - otherwise we overwrite
 -                       * the fake one... */
 -                      if (count)
 -                              c = !c;
 -                      laarr[c].extLength = EXT_NOT_RECORDED_NOT_ALLOCATED |
 -                              inode->i_sb->s_blocksize;
 -                      memset(&laarr[c].extLocation, 0x00,
 -                              sizeof(struct kernel_lb_addr));
 -                      count++;
 -              }
 +              /*
 +               * Is there any real extent? - otherwise we overwrite the fake
 +               * one...
 +               */
 +              if (count)
 +                      c = !c;
 +              laarr[c].extLength = EXT_NOT_RECORDED_NOT_ALLOCATED |
 +                      inode->i_sb->s_blocksize;
 +              memset(&laarr[c].extLocation, 0x00,
 +                      sizeof(struct kernel_lb_addr));
 +              count++;
                endnum = c + 1;
                lastblock = 1;
        } else {
        if ((laarr[c].extLength >> 30) == (EXT_NOT_RECORDED_ALLOCATED >> 30))
                newblocknum = laarr[c].extLocation.logicalBlockNum + offset;
        else { /* otherwise, allocate a new block */
 -              if (iinfo->i_next_alloc_block == block)
 +              if (iinfo->i_next_alloc_block == map->lblk)
                        goal = iinfo->i_next_alloc_goal;
  
                if (!goal) {
  
                newblocknum = udf_new_block(inode->i_sb, inode,
                                iinfo->i_location.partitionReferenceNum,
 -                              goal, err);
 -              if (!newblocknum) {
 -                      *err = -ENOSPC;
 +                              goal, &ret);
 +              if (!newblocknum)
                        goto out_free;
 -              }
                if (isBeyondEOF)
                        iinfo->i_lenExtents += inode->i_sb->s_blocksize;
        }
         * block */
        udf_split_extents(inode, &c, offset, newblocknum, laarr, &endnum);
  
 -      /* We preallocate blocks only for regular files. It also makes sense
 -       * for directories but there's a problem when to drop the
 -       * preallocation. We might use some delayed work for that but I feel
 -       * it's overengineering for a filesystem like UDF. */
 -      if (S_ISREG(inode->i_mode))
 +      if (!(map->iflags & UDF_MAP_NOPREALLOC))
                udf_prealloc_extents(inode, c, lastblock, laarr, &endnum);
  
        /* merge any continuous blocks in laarr */
        /* write back the new extents, inserting new extents if the new number
         * of extents is greater than the old number, and deleting extents if
         * the new number of extents is less than the old number */
 -      udf_update_extents(inode, laarr, startnum, endnum, &prev_epos);
 +      ret = udf_update_extents(inode, laarr, startnum, endnum, &prev_epos);
 +      if (ret < 0)
 +              goto out_free;
  
 -      newblock = udf_get_pblock(inode->i_sb, newblocknum,
 +      map->pblk = udf_get_pblock(inode->i_sb, newblocknum,
                                iinfo->i_location.partitionReferenceNum, 0);
 -      if (!newblock) {
 -              *err = -EIO;
 +      if (!map->pblk) {
 +              ret = -EFSCORRUPTED;
                goto out_free;
        }
 -      *new = 1;
 -      iinfo->i_next_alloc_block = block;
 -      iinfo->i_next_alloc_goal = newblocknum;
 +      map->oflags = UDF_BLK_NEW | UDF_BLK_MAPPED;
 +      iinfo->i_next_alloc_block = map->lblk + 1;
 +      iinfo->i_next_alloc_goal = newblocknum + 1;
        inode->i_ctime = current_time(inode);
  
        if (IS_SYNC(inode))
                udf_sync_inode(inode);
        else
                mark_inode_dirty(inode);
 +      ret = 0;
  out_free:
        brelse(prev_epos.bh);
        brelse(cur_epos.bh);
        brelse(next_epos.bh);
 -      return newblock;
 +      return ret;
  }
  
  static void udf_split_extents(struct inode *inode, int *c, int offset,
@@@ -1095,8 -1080,23 +1096,8 @@@ static void udf_merge_extents(struct in
                        blocksize - 1) >> blocksize_bits)))) {
  
                        if (((li->extLength & UDF_EXTENT_LENGTH_MASK) +
 -                              (lip1->extLength & UDF_EXTENT_LENGTH_MASK) +
 -                              blocksize - 1) & ~UDF_EXTENT_LENGTH_MASK) {
 -                              lip1->extLength = (lip1->extLength -
 -                                                (li->extLength &
 -                                                 UDF_EXTENT_LENGTH_MASK) +
 -                                                 UDF_EXTENT_LENGTH_MASK) &
 -                                                      ~(blocksize - 1);
 -                              li->extLength = (li->extLength &
 -                                               UDF_EXTENT_FLAG_MASK) +
 -                                              (UDF_EXTENT_LENGTH_MASK + 1) -
 -                                              blocksize;
 -                              lip1->extLocation.logicalBlockNum =
 -                                      li->extLocation.logicalBlockNum +
 -                                      ((li->extLength &
 -                                              UDF_EXTENT_LENGTH_MASK) >>
 -                                              blocksize_bits);
 -                      } else {
 +                           (lip1->extLength & UDF_EXTENT_LENGTH_MASK) +
 +                           blocksize - 1) <= UDF_EXTENT_LENGTH_MASK) {
                                li->extLength = lip1->extLength +
                                        (((li->extLength &
                                                UDF_EXTENT_LENGTH_MASK) +
        }
  }
  
 -static void udf_update_extents(struct inode *inode, struct kernel_long_ad *laarr,
 -                             int startnum, int endnum,
 -                             struct extent_position *epos)
 +static int udf_update_extents(struct inode *inode, struct kernel_long_ad *laarr,
 +                            int startnum, int endnum,
 +                            struct extent_position *epos)
  {
        int start = 0, i;
        struct kernel_lb_addr tmploc;
        uint32_t tmplen;
 +      int err;
  
        if (startnum > endnum) {
                for (i = 0; i < (startnum - endnum); i++)
                        udf_delete_aext(inode, *epos);
        } else if (startnum < endnum) {
                for (i = 0; i < (endnum - startnum); i++) {
 -                      udf_insert_aext(inode, *epos, laarr[i].extLocation,
 -                                      laarr[i].extLength);
 +                      err = udf_insert_aext(inode, *epos,
 +                                            laarr[i].extLocation,
 +                                            laarr[i].extLength);
 +                      /*
 +                       * If we fail here, we are likely corrupting the extent
 +                       * list and leaking blocks. At least stop early to
 +                       * limit the damage.
 +                       */
 +                      if (err < 0)
 +                              return err;
                        udf_next_aext(inode, epos, &laarr[i].extLocation,
                                      &laarr[i].extLength, 1);
                        start++;
                udf_write_aext(inode, epos, &laarr[i].extLocation,
                               laarr[i].extLength, 1);
        }
 +      return 0;
  }
  
  struct buffer_head *udf_bread(struct inode *inode, udf_pblk_t block,
                              int create, int *err)
  {
        struct buffer_head *bh = NULL;
 +      struct udf_map_rq map = {
 +              .lblk = block,
 +              .iflags = UDF_MAP_NOPREALLOC | (create ? UDF_MAP_CREATE : 0),
 +      };
  
 -      bh = udf_getblk(inode, block, create, err);
 -      if (!bh)
 +      *err = udf_map_block(inode, &map);
 +      if (*err || !(map.oflags & UDF_BLK_MAPPED))
                return NULL;
  
 +      bh = sb_getblk(inode->i_sb, map.pblk);
 +      if (!bh) {
 +              *err = -ENOMEM;
 +              return NULL;
 +      }
 +      if (map.oflags & UDF_BLK_NEW) {
 +              lock_buffer(bh);
 +              memset(bh->b_data, 0x00, inode->i_sb->s_blocksize);
 +              set_buffer_uptodate(bh);
 +              unlock_buffer(bh);
 +              mark_buffer_dirty_inode(bh, inode);
 +              return bh;
 +      }
 +
        if (bh_read(bh, 0) >= 0)
                return bh;
  
  
  int udf_setsize(struct inode *inode, loff_t newsize)
  {
 -      int err;
 +      int err = 0;
        struct udf_inode_info *iinfo;
        unsigned int bsize = i_blocksize(inode);
  
        if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
                return -EPERM;
  
 +      filemap_invalidate_lock(inode->i_mapping);
        iinfo = UDF_I(inode);
        if (newsize > inode->i_size) {
 -              down_write(&iinfo->i_data_sem);
                if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
 -                      if (bsize <
 +                      if (bsize >=
                            (udf_file_entry_alloc_offset(inode) + newsize)) {
 -                              err = udf_expand_file_adinicb(inode);
 -                              if (err)
 -                                      return err;
                                down_write(&iinfo->i_data_sem);
 -                      } else {
                                iinfo->i_lenAlloc = newsize;
 +                              up_write(&iinfo->i_data_sem);
                                goto set_size;
                        }
 +                      err = udf_expand_file_adinicb(inode);
 +                      if (err)
 +                              goto out_unlock;
                }
                err = udf_extend_file(inode, newsize);
 -              if (err) {
 -                      up_write(&iinfo->i_data_sem);
 -                      return err;
 -              }
 +              if (err)
 +                      goto out_unlock;
  set_size:
 -              up_write(&iinfo->i_data_sem);
                truncate_setsize(inode, newsize);
        } else {
                if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
                err = block_truncate_page(inode->i_mapping, newsize,
                                          udf_get_block);
                if (err)
 -                      return err;
 +                      goto out_unlock;
                truncate_setsize(inode, newsize);
                down_write(&iinfo->i_data_sem);
                udf_clear_extent_cache(inode);
                err = udf_truncate_extents(inode);
                up_write(&iinfo->i_data_sem);
                if (err)
 -                      return err;
 +                      goto out_unlock;
        }
  update_time:
        inode->i_mtime = inode->i_ctime = current_time(inode);
                udf_sync_inode(inode);
        else
                mark_inode_dirty(inode);
 -      return 0;
 +out_unlock:
 +      filemap_invalidate_unlock(inode->i_mapping);
 +      return err;
  }
  
  /*
@@@ -1408,7 -1381,6 +1409,7 @@@ reread
                ret = -EIO;
                goto out;
        }
 +      iinfo->i_hidden = hidden_inode;
        iinfo->i_unique = 0;
        iinfo->i_lenEAttr = 0;
        iinfo->i_lenExtents = 0;
        case ICBTAG_FILE_TYPE_REGULAR:
        case ICBTAG_FILE_TYPE_UNDEF:
        case ICBTAG_FILE_TYPE_VAT20:
 -              if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
 -                      inode->i_data.a_ops = &udf_adinicb_aops;
 -              else
 -                      inode->i_data.a_ops = &udf_aops;
 +              inode->i_data.a_ops = &udf_aops;
                inode->i_op = &udf_file_inode_operations;
                inode->i_fop = &udf_file_operations;
                inode->i_mode |= S_IFREG;
@@@ -1696,7 -1671,7 +1697,7 @@@ static int udf_update_inode(struct inod
        unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
        struct udf_inode_info *iinfo = UDF_I(inode);
  
 -      bh = udf_tgetblk(inode->i_sb,
 +      bh = sb_getblk(inode->i_sb,
                        udf_get_lb_pblock(inode->i_sb, &iinfo->i_location, 0));
        if (!bh) {
                udf_debug("getblk failure\n");
  
        if (S_ISDIR(inode->i_mode) && inode->i_nlink > 0)
                fe->fileLinkCount = cpu_to_le16(inode->i_nlink - 1);
 -      else
 -              fe->fileLinkCount = cpu_to_le16(inode->i_nlink);
 +      else {
 +              if (iinfo->i_hidden)
 +                      fe->fileLinkCount = cpu_to_le16(0);
 +              else
 +                      fe->fileLinkCount = cpu_to_le16(inode->i_nlink);
 +      }
  
        fe->informationLength = cpu_to_le64(inode->i_size);
  
@@@ -1917,13 -1888,8 +1918,13 @@@ struct inode *__udf_iget(struct super_b
        if (!inode)
                return ERR_PTR(-ENOMEM);
  
 -      if (!(inode->i_state & I_NEW))
 +      if (!(inode->i_state & I_NEW)) {
 +              if (UDF_I(inode)->i_hidden != hidden_inode) {
 +                      iput(inode);
 +                      return ERR_PTR(-EFSCORRUPTED);
 +              }
                return inode;
 +      }
  
        memcpy(&UDF_I(inode)->i_location, ino, sizeof(struct kernel_lb_addr));
        err = udf_read_inode(inode, hidden_inode);
@@@ -1956,7 -1922,7 +1957,7 @@@ int udf_setup_indirect_aext(struct inod
        neloc.logicalBlockNum = block;
        neloc.partitionReferenceNum = epos->block.partitionReferenceNum;
  
 -      bh = udf_tgetblk(sb, udf_get_lb_pblock(sb, &neloc, 0));
 +      bh = sb_getblk(sb, udf_get_lb_pblock(sb, &neloc, 0));
        if (!bh)
                return -EIO;
        lock_buffer(bh);
@@@ -2173,7 -2139,7 +2174,7 @@@ int8_t udf_next_aext(struct inode *inod
                epos->offset = sizeof(struct allocExtDesc);
                brelse(epos->bh);
                block = udf_get_lb_pblock(inode->i_sb, &epos->block, 0);
 -              epos->bh = udf_tread(inode->i_sb, block);
 +              epos->bh = sb_bread(inode->i_sb, block);
                if (!epos->bh) {
                        udf_debug("reading block %u failed!\n", block);
                        return -1;
@@@ -2237,13 -2203,12 +2238,13 @@@ int8_t udf_current_aext(struct inode *i
        return etype;
  }
  
 -static int8_t udf_insert_aext(struct inode *inode, struct extent_position epos,
 -                            struct kernel_lb_addr neloc, uint32_t nelen)
 +static int udf_insert_aext(struct inode *inode, struct extent_position epos,
 +                         struct kernel_lb_addr neloc, uint32_t nelen)
  {
        struct kernel_lb_addr oeloc;
        uint32_t oelen;
        int8_t etype;
 +      int err;
  
        if (epos.bh)
                get_bh(epos.bh);
                neloc = oeloc;
                nelen = (etype << 30) | oelen;
        }
 -      udf_add_aext(inode, &epos, &neloc, nelen, 1);
 +      err = udf_add_aext(inode, &epos, &neloc, nelen, 1);
        brelse(epos.bh);
  
 -      return (nelen >> 30);
 +      return err;
  }
  
  int8_t udf_delete_aext(struct inode *inode, struct extent_position epos)
@@@ -2374,3 -2339,28 +2375,3 @@@ int8_t inode_bmap(struct inode *inode, 
  
        return etype;
  }
 -
 -udf_pblk_t udf_block_map(struct inode *inode, sector_t block)
 -{
 -      struct kernel_lb_addr eloc;
 -      uint32_t elen;
 -      sector_t offset;
 -      struct extent_position epos = {};
 -      udf_pblk_t ret;
 -
 -      down_read(&UDF_I(inode)->i_data_sem);
 -
 -      if (inode_bmap(inode, block, &epos, &eloc, &elen, &offset) ==
 -                                              (EXT_RECORDED_ALLOCATED >> 30))
 -              ret = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
 -      else
 -              ret = 0;
 -
 -      up_read(&UDF_I(inode)->i_data_sem);
 -      brelse(epos.bh);
 -
 -      if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_VARCONV))
 -              return udf_fixed_to_variable(ret);
 -      else
 -              return ret;
 -}
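
The fs/udf/inode.c rework above folds udf_get_block(), udf_getblk() and udf_block_map() into a single mapping primitive, udf_map_block(), driven by a struct udf_map_rq: UDF_MAP_* flags describe the request, UDF_BLK_* flags report the result, and i_data_sem locking moves inside the helper. A condensed sketch of the calling pattern, following __udf_get_block() and udf_bread() from the diff; udf_example_lookup() and its -ENOENT hole handling are illustrative assumptions, not part of the patch.

/* Sketch only: the udf_map_rq request/response convention from the diff.
 * udf_example_lookup() is a hypothetical file-local helper. */
static int udf_example_lookup(struct inode *inode, sector_t lblk,
			      udf_pblk_t *pblk, bool may_alloc)
{
	struct udf_map_rq map = {
		.lblk   = lblk,
		.iflags = may_alloc ? UDF_MAP_CREATE : 0,
	};
	int err;

	err = udf_map_block(inode, &map);	/* takes i_data_sem itself */
	if (err < 0)
		return err;
	if (!(map.oflags & UDF_BLK_MAPPED))
		return -ENOENT;			/* hole; choice of errno is illustrative */
	*pblk = map.pblk;			/* UDF_BLK_NEW set when freshly allocated */
	return 0;
}
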
diff --combined fs/xfs/xfs_file.c
index d06c0cc62f612103b1a882c261dadb5d8f5aaed5,b0039a8fea2ef71027cd1ec2272707bd4ad07de7..705250f9f90a1b3d9fad16cf84df582145a21422
@@@ -1047,7 -1047,7 +1047,7 @@@ xfs_file_fallocate
  
                iattr.ia_valid = ATTR_SIZE;
                iattr.ia_size = new_size;
 -              error = xfs_vn_setattr_size(file_mnt_user_ns(file),
 +              error = xfs_vn_setattr_size(file_mnt_idmap(file),
                                            file_dentry(file), &iattr);
                if (error)
                        goto out_unlock;
@@@ -1429,7 -1429,7 +1429,7 @@@ xfs_file_mmap
        file_accessed(file);
        vma->vm_ops = &xfs_file_vm_ops;
        if (IS_DAX(inode))
-               vma->vm_flags |= VM_HUGEPAGE;
+               vm_flags_set(vma, VM_HUGEPAGE);
        return 0;
  }
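
The one-line xfs_file_mmap() change above is part of this merge's switch from open-coded vma->vm_flags updates to the vm_flags_set() accessor. A hypothetical ->mmap() handler showing the new pattern; everything except vm_flags_set() and the VFS helpers is illustrative.

/* Sketch only: set VMA flags through the accessor instead of writing
 * vma->vm_flags directly. example_mmap()/example_vm_ops are hypothetical. */
static const struct vm_operations_struct example_vm_ops = { };

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	file_accessed(file);
	vma->vm_ops = &example_vm_ops;
	if (IS_DAX(file_inode(file)))
		vm_flags_set(vma, VM_HUGEPAGE);	/* was: vma->vm_flags |= VM_HUGEPAGE */
	return 0;
}
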
  
diff --combined include/linux/blkdev.h
index b9637d63e6f0240053357f7af1b211686653438f,c5e59965b145e3a1333fa98510cb1e76f28abb92..41a41561b77325a5ae5469267d9a305a5df7b58f
@@@ -288,7 -288,6 +288,7 @@@ struct queue_limits 
        unsigned int            max_dev_sectors;
        unsigned int            chunk_sectors;
        unsigned int            max_sectors;
 +      unsigned int            max_user_sectors;
        unsigned int            max_segment_size;
        unsigned int            physical_block_size;
        unsigned int            logical_block_size;
@@@ -485,7 -484,6 +485,7 @@@ struct request_queue 
        DECLARE_BITMAP          (blkcg_pols, BLKCG_MAX_POLS);
        struct blkcg_gq         *root_blkg;
        struct list_head        blkg_list;
 +      struct mutex            blkcg_mutex;
  #endif
  
        struct queue_limits     limits;
  #define QUEUE_FLAG_IO_STAT    7       /* do disk/partitions IO accounting */
  #define QUEUE_FLAG_NOXMERGES  9       /* No extended merges */
  #define QUEUE_FLAG_ADD_RANDOM 10      /* Contributes to random pool */
+ #define QUEUE_FLAG_SYNCHRONOUS        11      /* always completes in submit context */
  #define QUEUE_FLAG_SAME_FORCE 12      /* force complete on same CPU */
  #define QUEUE_FLAG_INIT_DONE  14      /* queue is initialized */
  #define QUEUE_FLAG_STABLE_WRITES 15   /* don't modify blks until WB is done */
@@@ -1097,12 -1096,11 +1098,12 @@@ static inline bool bdev_is_partition(st
  enum blk_default_limits {
        BLK_MAX_SEGMENTS        = 128,
        BLK_SAFE_MAX_SECTORS    = 255,
 -      BLK_DEF_MAX_SECTORS     = 2560,
        BLK_MAX_SEGMENT_SIZE    = 65536,
        BLK_SEG_BOUNDARY_MASK   = 0xFFFFFFFFUL,
  };
  
 +#define BLK_DEF_MAX_SECTORS 2560u
 +
  static inline unsigned long queue_segment_boundary(const struct request_queue *q)
  {
        return q->limits.seg_boundary_mask;
@@@ -1253,6 -1251,12 +1254,12 @@@ static inline bool bdev_nonrot(struct b
        return blk_queue_nonrot(bdev_get_queue(bdev));
  }
  
+ static inline bool bdev_synchronous(struct block_device *bdev)
+ {
+       return test_bit(QUEUE_FLAG_SYNCHRONOUS,
+                       &bdev_get_queue(bdev)->queue_flags);
+ }
  static inline bool bdev_stable_writes(struct block_device *bdev)
  {
        return test_bit(QUEUE_FLAG_STABLE_WRITES,
@@@ -1286,12 -1290,12 +1293,12 @@@ static inline enum blk_zoned_model bdev
  
  static inline bool bdev_is_zoned(struct block_device *bdev)
  {
 -      struct request_queue *q = bdev_get_queue(bdev);
 -
 -      if (q)
 -              return blk_queue_is_zoned(q);
 +      return blk_queue_is_zoned(bdev_get_queue(bdev));
 +}
  
 -      return false;
 +static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
 +{
 +      return disk_zone_no(bdev->bd_disk, sec);
  }
  
  static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
@@@ -1312,18 -1316,6 +1319,18 @@@ static inline sector_t bdev_zone_sector
        return q->limits.chunk_sectors;
  }
  
 +static inline sector_t bdev_offset_from_zone_start(struct block_device *bdev,
 +                                                 sector_t sector)
 +{
 +      return sector & (bdev_zone_sectors(bdev) - 1);
 +}
 +
 +static inline bool bdev_is_zone_start(struct block_device *bdev,
 +                                    sector_t sector)
 +{
 +      return bdev_offset_from_zone_start(bdev, sector) == 0;
 +}
 +
  static inline int queue_dma_alignment(const struct request_queue *q)
  {
        return q ? q->limits.dma_alignment : 511;
@@@ -1397,7 -1389,6 +1404,6 @@@ struct block_device_operations 
                        unsigned int flags);
        int (*open) (struct block_device *, fmode_t);
        void (*release) (struct gendisk *, fmode_t);
-       int (*rw_page)(struct block_device *, sector_t, struct page *, enum req_op);
        int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
        int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
        unsigned int (*check_events) (struct gendisk *disk,
@@@ -1432,10 -1423,6 +1438,6 @@@ extern int blkdev_compat_ptr_ioctl(stru
  #define blkdev_compat_ptr_ioctl NULL
  #endif
  
- extern int bdev_read_page(struct block_device *, sector_t, struct page *);
- extern int bdev_write_page(struct block_device *, sector_t, struct page *,
-                                               struct writeback_control *);
  static inline void blk_wake_io_task(struct task_struct *waiter)
  {
        /*
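
Among the include/linux/blkdev.h changes above are small zoned-device helpers, bdev_zone_no(), bdev_offset_from_zone_start() and bdev_is_zone_start(), all derived from the power-of-two zone size, plus bdev_synchronous() keyed off the new QUEUE_FLAG_SYNCHRONOUS. A hypothetical caller-side sketch follows; the function and its -EINVAL policy are illustrative only.

/* Sketch only: use the new zone helpers to insist that an operation
 * starts at a zone boundary. example_check_zone_start() is hypothetical. */
static int example_check_zone_start(struct block_device *bdev, sector_t sector)
{
	if (!bdev_is_zoned(bdev))
		return 0;			/* nothing to enforce on regular devices */

	/* bdev_is_zone_start(): the offset within the zone is zero */
	if (!bdev_is_zone_start(bdev, sector))
		return -EINVAL;

	pr_debug("sector %llu starts zone %u\n",
		 (unsigned long long)sector, bdev_zone_no(bdev, sector));
	return 0;
}
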
diff --combined include/linux/fs.h
index d46ae1e525fc9f3f27e4bd965545670384e70bbb,d353c262d669a38999d1b7fcbe2b5c13106fc6d6..c85916e9f7db500474d043c1f6a6b5e5c8d22020
@@@ -166,6 -166,8 +166,8 @@@ typedef int (dio_iodone_t)(struct kioc
  /* File supports DIRECT IO */
  #define       FMODE_CAN_ODIRECT       ((__force fmode_t)0x400000)
  
+ #define       FMODE_NOREUSE           ((__force fmode_t)0x800000)
  /* File was opened by fanotify and shouldn't generate fanotify events */
  #define FMODE_NONOTIFY                ((__force fmode_t)0x4000000)
  
@@@ -1003,11 -1005,135 +1005,11 @@@ static inline struct file *get_file(str
  #define MAX_LFS_FILESIZE      ((loff_t)LLONG_MAX)
  #endif
  
 -#define FL_POSIX      1
 -#define FL_FLOCK      2
 -#define FL_DELEG      4       /* NFSv4 delegation */
 -#define FL_ACCESS     8       /* not trying to lock, just looking */
 -#define FL_EXISTS     16      /* when unlocking, test for existence */
 -#define FL_LEASE      32      /* lease held on this file */
 -#define FL_CLOSE      64      /* unlock on close */
 -#define FL_SLEEP      128     /* A blocking lock */
 -#define FL_DOWNGRADE_PENDING  256 /* Lease is being downgraded */
 -#define FL_UNLOCK_PENDING     512 /* Lease is being broken */
 -#define FL_OFDLCK     1024    /* lock is "owned" by struct file */
 -#define FL_LAYOUT     2048    /* outstanding pNFS layout */
 -#define FL_RECLAIM    4096    /* reclaiming from a reboot server */
 -
 -#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
 -
 -/*
 - * Special return value from posix_lock_file() and vfs_lock_file() for
 - * asynchronous locking.
 - */
 -#define FILE_LOCK_DEFERRED 1
 -
  /* legacy typedef, should eventually be removed */
  typedef void *fl_owner_t;
  
  struct file_lock;
  
 -struct file_lock_operations {
 -      void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
 -      void (*fl_release_private)(struct file_lock *);
 -};
 -
 -struct lock_manager_operations {
 -      void *lm_mod_owner;
 -      fl_owner_t (*lm_get_owner)(fl_owner_t);
 -      void (*lm_put_owner)(fl_owner_t);
 -      void (*lm_notify)(struct file_lock *);  /* unblock callback */
 -      int (*lm_grant)(struct file_lock *, int);
 -      bool (*lm_break)(struct file_lock *);
 -      int (*lm_change)(struct file_lock *, int, struct list_head *);
 -      void (*lm_setup)(struct file_lock *, void **);
 -      bool (*lm_breaker_owns_lease)(struct file_lock *);
 -      bool (*lm_lock_expirable)(struct file_lock *cfl);
 -      void (*lm_expire_lock)(void);
 -};
 -
 -struct lock_manager {
 -      struct list_head list;
 -      /*
 -       * NFSv4 and up also want opens blocked during the grace period;
 -       * NLM doesn't care:
 -       */
 -      bool block_opens;
 -};
 -
 -struct net;
 -void locks_start_grace(struct net *, struct lock_manager *);
 -void locks_end_grace(struct lock_manager *);
 -bool locks_in_grace(struct net *);
 -bool opens_in_grace(struct net *);
 -
 -/* that will die - we need it for nfs_lock_info */
 -#include <linux/nfs_fs_i.h>
 -
 -/*
 - * struct file_lock represents a generic "file lock". It's used to represent
 - * POSIX byte range locks, BSD (flock) locks, and leases. It's important to
 - * note that the same struct is used to represent both a request for a lock and
 - * the lock itself, but the same object is never used for both.
 - *
 - * FIXME: should we create a separate "struct lock_request" to help distinguish
 - * these two uses?
 - *
 - * The varous i_flctx lists are ordered by:
 - *
 - * 1) lock owner
 - * 2) lock range start
 - * 3) lock range end
 - *
 - * Obviously, the last two criteria only matter for POSIX locks.
 - */
 -struct file_lock {
 -      struct file_lock *fl_blocker;   /* The lock, that is blocking us */
 -      struct list_head fl_list;       /* link into file_lock_context */
 -      struct hlist_node fl_link;      /* node in global lists */
 -      struct list_head fl_blocked_requests;   /* list of requests with
 -                                               * ->fl_blocker pointing here
 -                                               */
 -      struct list_head fl_blocked_member;     /* node in
 -                                               * ->fl_blocker->fl_blocked_requests
 -                                               */
 -      fl_owner_t fl_owner;
 -      unsigned int fl_flags;
 -      unsigned char fl_type;
 -      unsigned int fl_pid;
 -      int fl_link_cpu;                /* what cpu's list is this on? */
 -      wait_queue_head_t fl_wait;
 -      struct file *fl_file;
 -      loff_t fl_start;
 -      loff_t fl_end;
 -
 -      struct fasync_struct *  fl_fasync; /* for lease break notifications */
 -      /* for lease breaks: */
 -      unsigned long fl_break_time;
 -      unsigned long fl_downgrade_time;
 -
 -      const struct file_lock_operations *fl_ops;      /* Callbacks for filesystems */
 -      const struct lock_manager_operations *fl_lmops; /* Callbacks for lockmanagers */
 -      union {
 -              struct nfs_lock_info    nfs_fl;
 -              struct nfs4_lock_info   nfs4_fl;
 -              struct {
 -                      struct list_head link;  /* link in AFS vnode's pending_locks list */
 -                      int state;              /* state of grant or error if -ve */
 -                      unsigned int    debug_id;
 -              } afs;
 -              struct {
 -                      struct inode *inode;
 -              } ceph;
 -      } fl_u;
 -} __randomize_layout;
 -
 -struct file_lock_context {
 -      spinlock_t              flc_lock;
 -      struct list_head        flc_flock;
 -      struct list_head        flc_posix;
 -      struct list_head        flc_lease;
 -};
 -
  /* The following constant reflects the upper bound of the file/locking space */
  #ifndef OFFSET_MAX
  #define OFFSET_MAX    type_max(loff_t)
  
  extern void send_sigio(struct fown_struct *fown, int fd, int band);
  
 -#define locks_inode(f) file_inode(f)
 -
 -#ifdef CONFIG_FILE_LOCKING
 -extern int fcntl_getlk(struct file *, unsigned int, struct flock *);
 -extern int fcntl_setlk(unsigned int, struct file *, unsigned int,
 -                      struct flock *);
 -
 -#if BITS_PER_LONG == 32
 -extern int fcntl_getlk64(struct file *, unsigned int, struct flock64 *);
 -extern int fcntl_setlk64(unsigned int, struct file *, unsigned int,
 -                      struct flock64 *);
 -#endif
 -
 -extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
 -extern int fcntl_getlease(struct file *filp);
 -
 -/* fs/locks.c */
 -void locks_free_lock_context(struct inode *inode);
 -void locks_free_lock(struct file_lock *fl);
 -extern void locks_init_lock(struct file_lock *);
 -extern struct file_lock * locks_alloc_lock(void);
 -extern void locks_copy_lock(struct file_lock *, struct file_lock *);
 -extern void locks_copy_conflock(struct file_lock *, struct file_lock *);
 -extern void locks_remove_posix(struct file *, fl_owner_t);
 -extern void locks_remove_file(struct file *);
 -extern void locks_release_private(struct file_lock *);
 -extern void posix_test_lock(struct file *, struct file_lock *);
 -extern int posix_lock_file(struct file *, struct file_lock *, struct file_lock *);
 -extern int locks_delete_block(struct file_lock *);
 -extern int vfs_test_lock(struct file *, struct file_lock *);
 -extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
 -extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
 -bool vfs_inode_has_locks(struct inode *inode);
 -extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
 -extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
 -extern void lease_get_mtime(struct inode *, struct timespec64 *time);
 -extern int generic_setlease(struct file *, long, struct file_lock **, void **priv);
 -extern int vfs_setlease(struct file *, long, struct file_lock **, void **);
 -extern int lease_modify(struct file_lock *, int, struct list_head *);
 -
 -struct notifier_block;
 -extern int lease_register_notifier(struct notifier_block *);
 -extern void lease_unregister_notifier(struct notifier_block *);
 -
 -struct files_struct;
 -extern void show_fd_locks(struct seq_file *f,
 -                       struct file *filp, struct files_struct *files);
 -extern bool locks_owner_has_blockers(struct file_lock_context *flctx,
 -                      fl_owner_t owner);
 -
 -static inline struct file_lock_context *
 -locks_inode_context(const struct inode *inode)
 -{
 -      return smp_load_acquire(&inode->i_flctx);
 -}
 -
 -#else /* !CONFIG_FILE_LOCKING */
 -static inline int fcntl_getlk(struct file *file, unsigned int cmd,
 -                            struct flock __user *user)
 -{
 -      return -EINVAL;
 -}
 -
 -static inline int fcntl_setlk(unsigned int fd, struct file *file,
 -                            unsigned int cmd, struct flock __user *user)
 -{
 -      return -EACCES;
 -}
 -
 -#if BITS_PER_LONG == 32
 -static inline int fcntl_getlk64(struct file *file, unsigned int cmd,
 -                              struct flock64 *user)
 -{
 -      return -EINVAL;
 -}
 -
 -static inline int fcntl_setlk64(unsigned int fd, struct file *file,
 -                              unsigned int cmd, struct flock64 *user)
 -{
 -      return -EACCES;
 -}
 -#endif
 -static inline int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 -{
 -      return -EINVAL;
 -}
 -
 -static inline int fcntl_getlease(struct file *filp)
 -{
 -      return F_UNLCK;
 -}
 -
 -static inline void
 -locks_free_lock_context(struct inode *inode)
 -{
 -}
 -
 -static inline void locks_init_lock(struct file_lock *fl)
 -{
 -      return;
 -}
 -
 -static inline void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
 -{
 -      return;
 -}
 -
 -static inline void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
 -{
 -      return;
 -}
 -
 -static inline void locks_remove_posix(struct file *filp, fl_owner_t owner)
 -{
 -      return;
 -}
 -
 -static inline void locks_remove_file(struct file *filp)
 -{
 -      return;
 -}
 -
 -static inline void posix_test_lock(struct file *filp, struct file_lock *fl)
 -{
 -      return;
 -}
 -
 -static inline int posix_lock_file(struct file *filp, struct file_lock *fl,
 -                                struct file_lock *conflock)
 -{
 -      return -ENOLCK;
 -}
 -
 -static inline int locks_delete_block(struct file_lock *waiter)
 -{
 -      return -ENOENT;
 -}
 -
 -static inline int vfs_test_lock(struct file *filp, struct file_lock *fl)
 -{
 -      return 0;
 -}
 -
 -static inline int vfs_lock_file(struct file *filp, unsigned int cmd,
 -                              struct file_lock *fl, struct file_lock *conf)
 -{
 -      return -ENOLCK;
 -}
 -
 -static inline int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
 -{
 -      return 0;
 -}
 -
 -static inline bool vfs_inode_has_locks(struct inode *inode)
 -{
 -      return false;
 -}
 -
 -static inline int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 -{
 -      return -ENOLCK;
 -}
 -
 -static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 -{
 -      return 0;
 -}
 -
 -static inline void lease_get_mtime(struct inode *inode,
 -                                 struct timespec64 *time)
 -{
 -      return;
 -}
 -
 -static inline int generic_setlease(struct file *filp, long arg,
 -                                  struct file_lock **flp, void **priv)
 -{
 -      return -EINVAL;
 -}
 -
 -static inline int vfs_setlease(struct file *filp, long arg,
 -                             struct file_lock **lease, void **priv)
 -{
 -      return -EINVAL;
 -}
 -
 -static inline int lease_modify(struct file_lock *fl, int arg,
 -                             struct list_head *dispose)
 -{
 -      return -EINVAL;
 -}
 -
 -struct files_struct;
 -static inline void show_fd_locks(struct seq_file *f,
 -                      struct file *filp, struct files_struct *files) {}
 -static inline bool locks_owner_has_blockers(struct file_lock_context *flctx,
 -                      fl_owner_t owner)
 -{
 -      return false;
 -}
 -
 -static inline struct file_lock_context *
 -locks_inode_context(const struct inode *inode)
 -{
 -      return NULL;
 -}
 -
 -#endif /* !CONFIG_FILE_LOCKING */
 -
  static inline struct inode *file_inode(const struct file *f)
  {
        return f->f_inode;
@@@ -1026,6 -1362,11 +1028,6 @@@ static inline struct dentry *file_dentr
        return d_real(file->f_path.dentry, file_inode(file));
  }
  
 -static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
 -{
 -      return locks_lock_inode_wait(locks_inode(filp), fl);
 -}
 -
  struct fasync_struct {
        rwlock_t                fa_lock;
        int                     magic;
@@@ -1296,22 -1637,22 +1298,22 @@@ static inline void i_gid_write(struct i
  }
  
  /**
 - * i_uid_into_vfsuid - map an inode's i_uid down into a mnt_userns
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * i_uid_into_vfsuid - map an inode's i_uid down according to an idmapping
 + * @idmap: idmap of the mount the inode was found from
   * @inode: inode to map
   *
 - * Return: whe inode's i_uid mapped down according to @mnt_userns.
 + * Return: the inode's i_uid mapped down according to @idmap.
   * If the inode's i_uid has no mapping INVALID_VFSUID is returned.
   */
 -static inline vfsuid_t i_uid_into_vfsuid(struct user_namespace *mnt_userns,
 +static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
                                         const struct inode *inode)
  {
 -      return make_vfsuid(mnt_userns, i_user_ns(inode), inode->i_uid);
 +      return make_vfsuid(idmap, i_user_ns(inode), inode->i_uid);
  }
  
  /**
   * i_uid_needs_update - check whether inode's i_uid needs to be updated
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * @idmap: idmap of the mount the inode was found from
   * @attr: the new attributes of @inode
   * @inode: the inode to update
   *
   *
   * Return: true if @inode's i_uid field needs to be updated, false if not.
   */
 -static inline bool i_uid_needs_update(struct user_namespace *mnt_userns,
 +static inline bool i_uid_needs_update(struct mnt_idmap *idmap,
                                      const struct iattr *attr,
                                      const struct inode *inode)
  {
        return ((attr->ia_valid & ATTR_UID) &&
                !vfsuid_eq(attr->ia_vfsuid,
 -                         i_uid_into_vfsuid(mnt_userns, inode)));
 +                         i_uid_into_vfsuid(idmap, inode)));
  }
  
  /**
   * i_uid_update - update @inode's i_uid field
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * @idmap: idmap of the mount the inode was found from
   * @attr: the new attributes of @inode
   * @inode: the inode to update
   *
   * Safely update @inode's i_uid field translating the vfsuid of any idmapped
   * mount into the filesystem kuid.
   */
 -static inline void i_uid_update(struct user_namespace *mnt_userns,
 +static inline void i_uid_update(struct mnt_idmap *idmap,
                                const struct iattr *attr,
                                struct inode *inode)
  {
        if (attr->ia_valid & ATTR_UID)
 -              inode->i_uid = from_vfsuid(mnt_userns, i_user_ns(inode),
 +              inode->i_uid = from_vfsuid(idmap, i_user_ns(inode),
                                           attr->ia_vfsuid);
  }
  
  /**
 - * i_gid_into_vfsgid - map an inode's i_gid down into a mnt_userns
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * i_gid_into_vfsgid - map an inode's i_gid down according to an idmapping
 + * @idmap: idmap of the mount the inode was found from
   * @inode: inode to map
   *
 - * Return: the inode's i_gid mapped down according to @mnt_userns.
 + * Return: the inode's i_gid mapped down according to @idmap.
   * If the inode's i_gid has no mapping INVALID_VFSGID is returned.
   */
 -static inline vfsgid_t i_gid_into_vfsgid(struct user_namespace *mnt_userns,
 +static inline vfsgid_t i_gid_into_vfsgid(struct mnt_idmap *idmap,
                                         const struct inode *inode)
  {
 -      return make_vfsgid(mnt_userns, i_user_ns(inode), inode->i_gid);
 +      return make_vfsgid(idmap, i_user_ns(inode), inode->i_gid);
  }
  
  /**
   * i_gid_needs_update - check whether inode's i_gid needs to be updated
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * @idmap: idmap of the mount the inode was found from
   * @attr: the new attributes of @inode
   * @inode: the inode to update
   *
   *
   * Return: true if @inode's i_gid field needs to be updated, false if not.
   */
 -static inline bool i_gid_needs_update(struct user_namespace *mnt_userns,
 +static inline bool i_gid_needs_update(struct mnt_idmap *idmap,
                                      const struct iattr *attr,
                                      const struct inode *inode)
  {
        return ((attr->ia_valid & ATTR_GID) &&
                !vfsgid_eq(attr->ia_vfsgid,
 -                         i_gid_into_vfsgid(mnt_userns, inode)));
 +                         i_gid_into_vfsgid(idmap, inode)));
  }
  
  /**
   * i_gid_update - update @inode's i_gid field
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * @idmap: idmap of the mount the inode was found from
   * @attr: the new attributes of @inode
   * @inode: the inode to update
   *
   * Safely update @inode's i_gid field translating the vfsgid of any idmapped
   * mount into the filesystem kgid.
   */
 -static inline void i_gid_update(struct user_namespace *mnt_userns,
 +static inline void i_gid_update(struct mnt_idmap *idmap,
                                const struct iattr *attr,
                                struct inode *inode)
  {
        if (attr->ia_valid & ATTR_GID)
 -              inode->i_gid = from_vfsgid(mnt_userns, i_user_ns(inode),
 +              inode->i_gid = from_vfsgid(idmap, i_user_ns(inode),
                                           attr->ia_vfsgid);
  }
  
  /**
   * inode_fsuid_set - initialize inode's i_uid field with callers fsuid
   * @inode: inode to initialize
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * @idmap: idmap of the mount the inode was found from
   *
   * Initialize the i_uid field of @inode. If the inode was found/created via
 - * an idmapped mount map the caller's fsuid according to @mnt_users.
 + * an idmapped mount map the caller's fsuid according to @idmap.
   */
  static inline void inode_fsuid_set(struct inode *inode,
 -                                 struct user_namespace *mnt_userns)
 +                                 struct mnt_idmap *idmap)
  {
 -      inode->i_uid = mapped_fsuid(mnt_userns, i_user_ns(inode));
 +      inode->i_uid = mapped_fsuid(idmap, i_user_ns(inode));
  }
  
  /**
   * inode_fsgid_set - initialize inode's i_gid field with callers fsgid
   * @inode: inode to initialize
 - * @mnt_userns: user namespace of the mount the inode was found from
 + * @idmap: idmap of the mount the inode was found from
   *
   * Initialize the i_gid field of @inode. If the inode was found/created via
 - * an idmapped mount map the caller's fsgid according to @mnt_users.
 + * an idmapped mount map the caller's fsgid according to @idmap.
   */
  static inline void inode_fsgid_set(struct inode *inode,
 -                                 struct user_namespace *mnt_userns)
 +                                 struct mnt_idmap *idmap)
  {
 -      inode->i_gid = mapped_fsgid(mnt_userns, i_user_ns(inode));
 +      inode->i_gid = mapped_fsgid(idmap, i_user_ns(inode));
  }
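
The idmap-aware ownership helpers above are drop-in replacements for the old mnt_userns variants; only the type of the first argument changes. A minimal sketch of how a setattr-style path might use them (the wrapper function below is hypothetical and not part of this diff):

/* Illustrative only: apply the vfsuid/vfsgid carried in @attr. */
static void example_apply_owner(struct mnt_idmap *idmap,
				struct inode *inode,
				const struct iattr *attr)
{
	if (i_uid_needs_update(idmap, attr, inode))
		i_uid_update(idmap, attr, inode);
	if (i_gid_needs_update(idmap, attr, inode))
		i_gid_update(idmap, attr, inode);
}
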
  
  /**
   * fsuidgid_has_mapping() - check whether caller's fsuid/fsgid is mapped
   * @sb: the superblock we want a mapping in
 - * @mnt_userns: user namespace of the relevant mount
 + * @idmap: idmap of the relevant mount
   *
   * Check whether the caller's fsuid and fsgid have a valid mapping in the
   * s_user_ns of the superblock @sb. If the caller is on an idmapped mount map
 - * the caller's fsuid and fsgid according to the @mnt_userns first.
 + * the caller's fsuid and fsgid according to the @idmap first.
   *
   * Return: true if fsuid and fsgid is mapped, false if not.
   */
  static inline bool fsuidgid_has_mapping(struct super_block *sb,
 -                                      struct user_namespace *mnt_userns)
 +                                      struct mnt_idmap *idmap)
  {
        struct user_namespace *fs_userns = sb->s_user_ns;
        kuid_t kuid;
        kgid_t kgid;
  
 -      kuid = mapped_fsuid(mnt_userns, fs_userns);
 +      kuid = mapped_fsuid(idmap, fs_userns);
        if (!uid_valid(kuid))
                return false;
 -      kgid = mapped_fsgid(mnt_userns, fs_userns);
 +      kgid = mapped_fsgid(idmap, fs_userns);
        if (!gid_valid(kgid))
                return false;
        return kuid_has_mapping(fs_userns, kuid) &&
@@@ -1602,42 -1943,42 +1604,42 @@@ static inline bool sb_start_intwrite_tr
        return __sb_start_write_trylock(sb, SB_FREEZE_FS);
  }
  
 -bool inode_owner_or_capable(struct user_namespace *mnt_userns,
 +bool inode_owner_or_capable(struct mnt_idmap *idmap,
                            const struct inode *inode);
  
  /*
   * VFS helper functions..
   */
 -int vfs_create(struct user_namespace *, struct inode *,
 +int vfs_create(struct mnt_idmap *, struct inode *,
               struct dentry *, umode_t, bool);
 -int vfs_mkdir(struct user_namespace *, struct inode *,
 +int vfs_mkdir(struct mnt_idmap *, struct inode *,
              struct dentry *, umode_t);
 -int vfs_mknod(struct user_namespace *, struct inode *, struct dentry *,
 +int vfs_mknod(struct mnt_idmap *, struct inode *, struct dentry *,
                umode_t, dev_t);
 -int vfs_symlink(struct user_namespace *, struct inode *,
 +int vfs_symlink(struct mnt_idmap *, struct inode *,
                struct dentry *, const char *);
 -int vfs_link(struct dentry *, struct user_namespace *, struct inode *,
 +int vfs_link(struct dentry *, struct mnt_idmap *, struct inode *,
             struct dentry *, struct inode **);
 -int vfs_rmdir(struct user_namespace *, struct inode *, struct dentry *);
 -int vfs_unlink(struct user_namespace *, struct inode *, struct dentry *,
 +int vfs_rmdir(struct mnt_idmap *, struct inode *, struct dentry *);
 +int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
               struct inode **);
  
  /**
   * struct renamedata - contains all information required for renaming
 - * @old_mnt_userns:    old user namespace of the mount the inode was found from
 + * @old_mnt_idmap:     idmap of the old mount the inode was found from
   * @old_dir:           parent of source
   * @old_dentry:                source
 - * @new_mnt_userns:    new user namespace of the mount the inode was found from
 + * @new_mnt_idmap:     idmap of the new mount the inode was found from
   * @new_dir:           parent of destination
   * @new_dentry:                destination
   * @delegated_inode:   returns an inode needing a delegation break
   * @flags:             rename flags
   */
  struct renamedata {
 -      struct user_namespace *old_mnt_userns;
 +      struct mnt_idmap *old_mnt_idmap;
        struct inode *old_dir;
        struct dentry *old_dentry;
 -      struct user_namespace *new_mnt_userns;
 +      struct mnt_idmap *new_mnt_idmap;
        struct inode *new_dir;
        struct dentry *new_dentry;
        struct inode **delegated_inode;
  
  int vfs_rename(struct renamedata *);
  
 -static inline int vfs_whiteout(struct user_namespace *mnt_userns,
 +static inline int vfs_whiteout(struct mnt_idmap *idmap,
                               struct inode *dir, struct dentry *dentry)
  {
 -      return vfs_mknod(mnt_userns, dir, dentry, S_IFCHR | WHITEOUT_MODE,
 +      return vfs_mknod(idmap, dir, dentry, S_IFCHR | WHITEOUT_MODE,
                         WHITEOUT_DEV);
  }
  
 -struct file *vfs_tmpfile_open(struct user_namespace *mnt_userns,
 +struct file *vfs_tmpfile_open(struct mnt_idmap *idmap,
                        const struct path *parentpath,
                        umode_t mode, int open_flag, const struct cred *cred);
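
A hedged sketch of a vfs_rename() caller filling in the renamed idmap fields of struct renamedata; the wrapper function and the use of a single idmap for both ends are illustrative assumptions, not part of this patch:

static int example_rename(struct mnt_idmap *idmap,
			  struct inode *old_dir, struct dentry *old_dentry,
			  struct inode *new_dir, struct dentry *new_dentry)
{
	struct renamedata rd = {
		.old_mnt_idmap	= idmap,	/* was .old_mnt_userns */
		.old_dir	= old_dir,
		.old_dentry	= old_dentry,
		.new_mnt_idmap	= idmap,	/* was .new_mnt_userns */
		.new_dir	= new_dir,
		.new_dentry	= new_dentry,
	};

	return vfs_rename(&rd);
}
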
  
@@@ -1677,10 -2018,10 +1679,10 @@@ extern long compat_ptr_ioctl(struct fil
  /*
   * VFS file helper functions.
   */
 -void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
 +void inode_init_owner(struct mnt_idmap *idmap, struct inode *inode,
                      const struct inode *dir, umode_t mode);
  extern bool may_open_dev(const struct path *path);
 -umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
 +umode_t mode_strip_sgid(struct mnt_idmap *idmap,
                        const struct inode *dir, umode_t mode);
  
  /*
@@@ -1798,26 -2139,27 +1800,26 @@@ struct file_operations 
  struct inode_operations {
        struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
        const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
 -      int (*permission) (struct user_namespace *, struct inode *, int);
 +      int (*permission) (struct mnt_idmap *, struct inode *, int);
        struct posix_acl * (*get_inode_acl)(struct inode *, int, bool);
  
        int (*readlink) (struct dentry *, char __user *,int);
  
 -      int (*create) (struct user_namespace *, struct inode *,struct dentry *,
 +      int (*create) (struct mnt_idmap *, struct inode *,struct dentry *,
                       umode_t, bool);
        int (*link) (struct dentry *,struct inode *,struct dentry *);
        int (*unlink) (struct inode *,struct dentry *);
 -      int (*symlink) (struct user_namespace *, struct inode *,struct dentry *,
 +      int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,
                        const char *);
 -      int (*mkdir) (struct user_namespace *, struct inode *,struct dentry *,
 +      int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,
                      umode_t);
        int (*rmdir) (struct inode *,struct dentry *);
 -      int (*mknod) (struct user_namespace *, struct inode *,struct dentry *,
 +      int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,
                      umode_t,dev_t);
 -      int (*rename) (struct user_namespace *, struct inode *, struct dentry *,
 +      int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
                        struct inode *, struct dentry *, unsigned int);
 -      int (*setattr) (struct user_namespace *, struct dentry *,
 -                      struct iattr *);
 -      int (*getattr) (struct user_namespace *, const struct path *,
 +      int (*setattr) (struct mnt_idmap *, struct dentry *, struct iattr *);
 +      int (*getattr) (struct mnt_idmap *, const struct path *,
                        struct kstat *, u32, unsigned int);
        ssize_t (*listxattr) (struct dentry *, char *, size_t);
        int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
        int (*atomic_open)(struct inode *, struct dentry *,
                           struct file *, unsigned open_flag,
                           umode_t create_mode);
 -      int (*tmpfile) (struct user_namespace *, struct inode *,
 +      int (*tmpfile) (struct mnt_idmap *, struct inode *,
                        struct file *, umode_t);
 -      struct posix_acl *(*get_acl)(struct user_namespace *, struct dentry *,
 +      struct posix_acl *(*get_acl)(struct mnt_idmap *, struct dentry *,
                                     int);
 -      int (*set_acl)(struct user_namespace *, struct dentry *,
 +      int (*set_acl)(struct mnt_idmap *, struct dentry *,
                       struct posix_acl *, int);
 -      int (*fileattr_set)(struct user_namespace *mnt_userns,
 +      int (*fileattr_set)(struct mnt_idmap *idmap,
                            struct dentry *dentry, struct fileattr *fa);
        int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
  } ____cacheline_aligned;
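
For filesystem code the inode_operations conversion is mostly mechanical: every mnt_userns parameter becomes a mnt_idmap. A minimal, hypothetical ->getattr implementation under the new prototype (generic_fillattr() takes the idmap as well, as seen further down in this diff):

static int example_getattr(struct mnt_idmap *idmap, const struct path *path,
			   struct kstat *stat, u32 request_mask,
			   unsigned int query_flags)
{
	struct inode *inode = d_inode(path->dentry);

	/* Fill in the common fields, mapped through @idmap. */
	generic_fillattr(idmap, inode, stat);
	return 0;
}

static const struct inode_operations example_iops = {
	.getattr = example_getattr,
};
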
@@@ -1986,11 -2328,11 +1988,11 @@@ static inline bool sb_rdonly(const stru
  #define IS_WHITEOUT(inode)    (S_ISCHR(inode->i_mode) && \
                                 (inode)->i_rdev == WHITEOUT_DEV)
  
 -static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns,
 +static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
                                   struct inode *inode)
  {
 -      return !vfsuid_valid(i_uid_into_vfsuid(mnt_userns, inode)) ||
 -             !vfsgid_valid(i_gid_into_vfsgid(mnt_userns, inode));
 +      return !vfsuid_valid(i_uid_into_vfsuid(idmap, inode)) ||
 +             !vfsgid_valid(i_gid_into_vfsgid(idmap, inode));
  }
  
  static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
@@@ -2284,6 -2626,96 +2286,6 @@@ extern struct kobject *fs_kobj
  
  #define MAX_RW_COUNT (INT_MAX & PAGE_MASK)
  
 -#ifdef CONFIG_FILE_LOCKING
 -static inline int break_lease(struct inode *inode, unsigned int mode)
 -{
 -      /*
 -       * Since this check is lockless, we must ensure that any refcounts
 -       * taken are done before checking i_flctx->flc_lease. Otherwise, we
 -       * could end up racing with tasks trying to set a new lease on this
 -       * file.
 -       */
 -      smp_mb();
 -      if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
 -              return __break_lease(inode, mode, FL_LEASE);
 -      return 0;
 -}
 -
 -static inline int break_deleg(struct inode *inode, unsigned int mode)
 -{
 -      /*
 -       * Since this check is lockless, we must ensure that any refcounts
 -       * taken are done before checking i_flctx->flc_lease. Otherwise, we
 -       * could end up racing with tasks trying to set a new lease on this
 -       * file.
 -       */
 -      smp_mb();
 -      if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
 -              return __break_lease(inode, mode, FL_DELEG);
 -      return 0;
 -}
 -
 -static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
 -{
 -      int ret;
 -
 -      ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
 -      if (ret == -EWOULDBLOCK && delegated_inode) {
 -              *delegated_inode = inode;
 -              ihold(inode);
 -      }
 -      return ret;
 -}
 -
 -static inline int break_deleg_wait(struct inode **delegated_inode)
 -{
 -      int ret;
 -
 -      ret = break_deleg(*delegated_inode, O_WRONLY);
 -      iput(*delegated_inode);
 -      *delegated_inode = NULL;
 -      return ret;
 -}
 -
 -static inline int break_layout(struct inode *inode, bool wait)
 -{
 -      smp_mb();
 -      if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
 -              return __break_lease(inode,
 -                              wait ? O_WRONLY : O_WRONLY | O_NONBLOCK,
 -                              FL_LAYOUT);
 -      return 0;
 -}
 -
 -#else /* !CONFIG_FILE_LOCKING */
 -static inline int break_lease(struct inode *inode, unsigned int mode)
 -{
 -      return 0;
 -}
 -
 -static inline int break_deleg(struct inode *inode, unsigned int mode)
 -{
 -      return 0;
 -}
 -
 -static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
 -{
 -      return 0;
 -}
 -
 -static inline int break_deleg_wait(struct inode **delegated_inode)
 -{
 -      BUG();
 -      return 0;
 -}
 -
 -static inline int break_layout(struct inode *inode, bool wait)
 -{
 -      return 0;
 -}
 -
 -#endif /* CONFIG_FILE_LOCKING */
 -
  /* fs/open.c */
  struct audit_names;
  struct filename {
  };
  static_assert(offsetof(struct filename, iname) % sizeof(long) == 0);
  
 -static inline struct user_namespace *file_mnt_user_ns(struct file *file)
 -{
 -      return mnt_user_ns(file->f_path.mnt);
 -}
 -
  static inline struct mnt_idmap *file_mnt_idmap(struct file *file)
  {
        return mnt_idmap(file->f_path.mnt);
@@@ -2314,7 -2751,7 +2316,7 @@@ static inline bool is_idmapped_mnt(cons
  }
  
  extern long vfs_truncate(const struct path *, loff_t);
 -int do_truncate(struct user_namespace *, struct dentry *, loff_t start,
 +int do_truncate(struct mnt_idmap *, struct dentry *, loff_t start,
                unsigned int time_attrs, struct file *filp);
  extern int vfs_fallocate(struct file *file, int mode, loff_t offset,
                        loff_t len);
@@@ -2469,21 -2906,21 +2471,21 @@@ static inline int bmap(struct inode *in
  }
  #endif
  
 -int notify_change(struct user_namespace *, struct dentry *,
 +int notify_change(struct mnt_idmap *, struct dentry *,
                  struct iattr *, struct inode **);
 -int inode_permission(struct user_namespace *, struct inode *, int);
 -int generic_permission(struct user_namespace *, struct inode *, int);
 +int inode_permission(struct mnt_idmap *, struct inode *, int);
 +int generic_permission(struct mnt_idmap *, struct inode *, int);
  static inline int file_permission(struct file *file, int mask)
  {
 -      return inode_permission(file_mnt_user_ns(file),
 +      return inode_permission(file_mnt_idmap(file),
                                file_inode(file), mask);
  }
  static inline int path_permission(const struct path *path, int mask)
  {
 -      return inode_permission(mnt_user_ns(path->mnt),
 +      return inode_permission(mnt_idmap(path->mnt),
                                d_inode(path->dentry), mask);
  }
 -int __check_sticky(struct user_namespace *mnt_userns, struct inode *dir,
 +int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
                   struct inode *inode);
  
  static inline bool execute_ok(struct inode *inode)
@@@ -2671,7 -3108,7 +2673,7 @@@ extern void __destroy_inode(struct inod
  extern struct inode *new_inode_pseudo(struct super_block *sb);
  extern struct inode *new_inode(struct super_block *sb);
  extern void free_inode_nonrcu(struct inode *inode);
 -extern int setattr_should_drop_suidgid(struct user_namespace *, struct inode *);
 +extern int setattr_should_drop_suidgid(struct mnt_idmap *, struct inode *);
  extern int file_remove_privs(struct file *);
  
  /*
@@@ -2728,12 -3165,6 +2730,12 @@@ ssize_t vfs_iocb_iter_write(struct fil
                            struct iov_iter *iter);
  
  /* fs/splice.c */
 +ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 +                          struct pipe_inode_info *pipe,
 +                          size_t len, unsigned int flags);
 +ssize_t direct_splice_read(struct file *in, loff_t *ppos,
 +                         struct pipe_inode_info *pipe,
 +                         size_t len, unsigned int flags);
  extern ssize_t generic_file_splice_read(struct file *, loff_t *,
                struct pipe_inode_info *, size_t, unsigned int);
  extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
@@@ -2836,7 -3267,7 +2838,7 @@@ extern void page_put_link(void *)
  extern int page_symlink(struct inode *inode, const char *symname, int len);
  extern const struct inode_operations page_symlink_inode_operations;
  extern void kfree_link(void *);
 -void generic_fillattr(struct user_namespace *, struct inode *, struct kstat *);
 +void generic_fillattr(struct mnt_idmap *, struct inode *, struct kstat *);
  void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
  extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
  extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
@@@ -2887,9 -3318,9 +2889,9 @@@ extern int dcache_dir_open(struct inod
  extern int dcache_dir_close(struct inode *, struct file *);
  extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
  extern int dcache_readdir(struct file *, struct dir_context *);
 -extern int simple_setattr(struct user_namespace *, struct dentry *,
 +extern int simple_setattr(struct mnt_idmap *, struct dentry *,
                          struct iattr *);
 -extern int simple_getattr(struct user_namespace *, const struct path *,
 +extern int simple_getattr(struct mnt_idmap *, const struct path *,
                          struct kstat *, u32, unsigned int);
  extern int simple_statfs(struct dentry *, struct kstatfs *);
  extern int simple_open(struct inode *inode, struct file *file);
@@@ -2898,7 -3329,7 +2900,7 @@@ extern int simple_unlink(struct inode *
  extern int simple_rmdir(struct inode *, struct dentry *);
  extern int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
                                  struct inode *new_dir, struct dentry *new_dentry);
 -extern int simple_rename(struct user_namespace *, struct inode *,
 +extern int simple_rename(struct mnt_idmap *, struct inode *,
                         struct dentry *, struct inode *, struct dentry *,
                         unsigned int);
  extern void simple_recursive_removal(struct dentry *,
@@@ -2940,11 -3371,11 +2942,11 @@@ extern int generic_check_addressable(un
  
  extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
  
 -int may_setattr(struct user_namespace *mnt_userns, struct inode *inode,
 +int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
                unsigned int ia_valid);
 -int setattr_prepare(struct user_namespace *, struct dentry *, struct iattr *);
 +int setattr_prepare(struct mnt_idmap *, struct dentry *, struct iattr *);
  extern int inode_newsize_ok(const struct inode *, loff_t offset);
 -void setattr_copy(struct user_namespace *, struct inode *inode,
 +void setattr_copy(struct mnt_idmap *, struct inode *inode,
                  const struct iattr *attr);
  
  extern int file_update_time(struct file *file);
@@@ -3111,13 -3542,13 +3113,13 @@@ static inline bool is_sxid(umode_t mode
        return mode & (S_ISUID | S_ISGID);
  }
  
 -static inline int check_sticky(struct user_namespace *mnt_userns,
 +static inline int check_sticky(struct mnt_idmap *idmap,
                               struct inode *dir, struct inode *inode)
  {
        if (!(dir->i_mode & S_ISVTX))
                return 0;
  
 -      return __check_sticky(mnt_userns, dir, inode);
 +      return __check_sticky(idmap, dir, inode);
  }
  
  static inline void inode_has_no_xattr(struct inode *inode)
diff --combined include/linux/hugetlb.h
index 9ab9d3105d5c2c2029966ab8893fdd24af0d316a,5f5e4177b2e0c38dfcb7492360248cbc744294f2..7c977d234aba3c546fa2f61f35a349d8281001d0
@@@ -2,6 -2,7 +2,7 @@@
  #ifndef _LINUX_HUGETLB_H
  #define _LINUX_HUGETLB_H
  
+ #include <linux/mm.h>
  #include <linux/mm_types.h>
  #include <linux/mmdebug.h>
  #include <linux/fs.h>
@@@ -170,11 -171,11 +171,11 @@@ bool hugetlb_reserve_pages(struct inod
                                                vm_flags_t vm_flags);
  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
                                                long freed);
- int isolate_hugetlb(struct page *page, struct list_head *list);
- int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison);
+ bool isolate_hugetlb(struct folio *folio, struct list_head *list);
+ int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
  int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
                                bool *migratable_cleared);
- void putback_active_hugepage(struct page *page);
+ void folio_putback_active_hugetlb(struct folio *folio);
  void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
  void free_huge_page(struct page *page);
  void hugetlb_fix_reserve_counts(struct inode *inode);
@@@ -193,6 -194,43 +194,43 @@@ extern struct list_head huge_boot_pages
  
  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
                        unsigned long addr, unsigned long sz);
+ /*
+  * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+  * Returns the pte_t* if found, or NULL if the address is not mapped.
+  *
+  * IMPORTANT: we should normally not call this function directly; it is
+  * only a common interface used to implement the arch-specific walkers.
+  * Please use hugetlb_walk() instead, because that will attempt to verify
+  * the locking for you.
+  *
+  * Since this function will walk all the pgtable pages (including not only
+  * high-level pgtable page, but also PUD entry that can be unshared
+  * concurrently for VM_SHARED), the caller of this function is
+  * responsible for its thread safety.  One can follow this rule:
+  *
+  *  (1) For private mappings: pmd unsharing is not possible, so holding the
+  *      mmap_lock for either read or write is sufficient. Most callers
+  *      already hold the mmap_lock, so normally, no special action is
+  *      required.
+  *
+  *  (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged
+  *      pgtable page can go away from under us!  It can be done by a pmd
+  *      unshare with a follow up munmap() on the other process), then we
+  *      need either:
+  *
+  *     (2.1) hugetlb vma lock read or write held, to make sure pmd unshare
+  *           won't happen upon the range (it also makes sure the pte_t we
+  *           read is the right and stable one), or,
+  *
+  *     (2.2) hugetlb mapping i_mmap_rwsem lock held read or write, to make
+  *           sure even if unshare happened the racy unmap() will wait until
+  *           i_mmap_rwsem is released.
+  *
+  * Option (2.1) is the safest: it guarantees pte stability from the pmd
+  * sharing point of view until the vma lock is released.  Option (2.2) does
+  * not protect against a concurrent pmd unshare, but it makes sure the
+  * pgtable page is safe to access.
+  */
  pte_t *huge_pte_offset(struct mm_struct *mm,
                       unsigned long addr, unsigned long sz);
  unsigned long hugetlb_mask_last_page(struct hstate *h);
@@@ -211,7 -249,7 +249,7 @@@ void hugetlb_vma_lock_release(struct kr
  
  int pmd_huge(pmd_t pmd);
  int pud_huge(pud_t pud);
- unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
+ long hugetlb_change_protection(struct vm_area_struct *vma,
                unsigned long address, unsigned long end, pgprot_t newprot,
                unsigned long cp_flags);
  
@@@ -375,12 -413,12 +413,12 @@@ static inline pte_t *huge_pte_offset(st
        return NULL;
  }
  
- static inline int isolate_hugetlb(struct page *page, struct list_head *list)
+ static inline bool isolate_hugetlb(struct folio *folio, struct list_head *list)
  {
-       return -EBUSY;
+       return false;
  }
  
- static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
+ static inline int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison)
  {
        return 0;
  }
@@@ -391,7 -429,7 +429,7 @@@ static inline int get_huge_page_for_hwp
        return 0;
  }
  
- static inline void putback_active_hugepage(struct page *page)
+ static inline void folio_putback_active_hugetlb(struct folio *folio)
  {
  }
  
@@@ -400,7 -438,7 +438,7 @@@ static inline void move_hugetlb_state(s
  {
  }
  
- static inline unsigned long hugetlb_change_protection(
+ static inline long hugetlb_change_protection(
                        struct vm_area_struct *vma, unsigned long address,
                        unsigned long end, pgprot_t newprot,
                        unsigned long cp_flags)
@@@ -679,16 -717,16 +717,16 @@@ struct huge_bootmem_page 
  };
  
  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
- struct page *alloc_huge_page(struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
                                unsigned long addr, int avoid_reserve);
- struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
                                nodemask_t *nmask, gfp_t gfp_mask);
- struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
                                unsigned long address);
- int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
+ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
                        pgoff_t idx);
  void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
-                               unsigned long address, struct page *page);
+                               unsigned long address, struct folio *folio);
  
  /* arch callback */
  int __init __alloc_bootmem_huge_page(struct hstate *h, int nid);
@@@ -743,10 -781,7 +781,10 @@@ static inline struct hstate *hstate_siz
        if (!page_size_log)
                return &default_hstate;
  
 -      return size_to_hstate(1UL << page_size_log);
 +      if (page_size_log < BITS_PER_LONG)
 +              return size_to_hstate(1UL << page_size_log);
 +
 +      return NULL;
  }
  
  static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
@@@ -843,9 -878,9 +881,9 @@@ extern int dissolve_free_huge_pages(uns
                                    unsigned long end_pfn);
  
  #ifdef CONFIG_MEMORY_FAILURE
- extern void hugetlb_clear_page_hwpoison(struct page *hpage);
+ extern void folio_clear_hugetlb_hwpoison(struct folio *folio);
  #else
- static inline void hugetlb_clear_page_hwpoison(struct page *hpage)
+ static inline void folio_clear_hugetlb_hwpoison(struct folio *folio)
  {
  }
  #endif
@@@ -998,21 -1033,21 +1036,21 @@@ static inline int isolate_or_dissolve_h
        return -ENOMEM;
  }
  
- static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
+ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
                                           unsigned long addr,
                                           int avoid_reserve)
  {
        return NULL;
  }
  
- static inline struct page *
- alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+ static inline struct folio *
+ alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
                        nodemask_t *nmask, gfp_t gfp_mask)
  {
        return NULL;
  }
  
- static inline struct page *alloc_huge_page_vma(struct hstate *h,
+ static inline struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
                                               struct vm_area_struct *vma,
                                               unsigned long address)
  {
@@@ -1213,4 -1248,35 +1251,35 @@@ bool want_pmd_share(struct vm_area_stru
  #define flush_hugetlb_tlb_range(vma, addr, end)       flush_tlb_range(vma, addr, end)
  #endif
  
+ static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
+ {
+       return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
+ }
+ /*
+  * Safe version of huge_pte_offset() to check the locks.  See comments
+  * above huge_pte_offset().
+  */
+ static inline pte_t *
+ hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
+ {
+ #if defined(CONFIG_HUGETLB_PAGE) && \
+       defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP)
+       struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+       /*
+        * If pmd sharing is possible, locking is needed to safely walk the
+        * hugetlb pgtables.  More information can be found at the comment
+        * above huge_pte_offset() in the same file.
+        *
+        * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP.
+        */
+       if (__vma_shareable_lock(vma))
+               WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) &&
+                            !lockdep_is_held(
+                                &vma->vm_file->f_mapping->i_mmap_rwsem));
+ #endif
+       return huge_pte_offset(vma->vm_mm, addr, sz);
+ }
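
A hedged usage sketch of hugetlb_walk() following option (2.1) from the comment above huge_pte_offset(); the hugetlb_vma_lock_read()/hugetlb_vma_unlock_read() helpers are assumed to be the vma lock API referenced there and are not part of this hunk:

static bool example_hugetlb_pte_present(struct vm_area_struct *vma,
					unsigned long addr)
{
	struct hstate *h = hstate_vma(vma);
	pte_t *ptep;
	bool present = false;

	/* Hold the hugetlb vma lock so a concurrent pmd unshare cannot
	 * free the PUD-ranged pgtable page while we walk and read it. */
	hugetlb_vma_lock_read(vma);
	ptep = hugetlb_walk(vma, addr, huge_page_size(h));
	if (ptep)
		present = pte_present(huge_ptep_get(ptep));
	hugetlb_vma_unlock_read(vma);

	return present;
}
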
  #endif /* _LINUX_HUGETLB_H */
diff --combined include/linux/memcontrol.h
index 1e38e99998c79bb4eca5191fe6197ae165e79adb,5567319027d1806f3a261109993a9cca9fad9ff3..b6eda2ab205dc7133472ba56a4f658cfaec5ff4f
@@@ -466,34 -466,34 +466,34 @@@ static inline struct mem_cgroup *folio_
  }
  
  /*
-  * page_memcg_check - get the memory cgroup associated with a page
-  * @page: a pointer to the page struct
+  * folio_memcg_check - Get the memory cgroup associated with a folio.
+  * @folio: Pointer to the folio.
   *
-  * Returns a pointer to the memory cgroup associated with the page,
-  * or NULL. This function unlike page_memcg() can take any page
-  * as an argument. It has to be used in cases when it's not known if a page
+  * Returns a pointer to the memory cgroup associated with the folio,
+  * or NULL. Unlike folio_memcg(), this function can take any folio
+  * as an argument. It has to be used in cases where it's not known if a folio
   * has an associated memory cgroup pointer or an object cgroups vector or
   * an object cgroup.
   *
-  * For a non-kmem page any of the following ensures page and memcg binding
+  * For a non-kmem folio any of the following ensures folio and memcg binding
   * stability:
   *
-  * - the page lock
+  * - the folio lock
   * - LRU isolation
-  * - lock_page_memcg()
+  * - lock_folio_memcg()
   * - exclusive reference
   * - mem_cgroup_trylock_pages()
   *
-  * For a kmem page a caller should hold an rcu read lock to protect memcg
-  * associated with a kmem page from being released.
+  * For a kmem folio, a caller should hold an rcu read lock to protect the
+  * memcg associated with the folio from being released.
   */
- static inline struct mem_cgroup *page_memcg_check(struct page *page)
+ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
  {
        /*
-        * Because page->memcg_data might be changed asynchronously
-        * for slab pages, READ_ONCE() should be used here.
+        * Because folio->memcg_data might be changed asynchronously
+        * for slabs, READ_ONCE() should be used here.
         */
-       unsigned long memcg_data = READ_ONCE(page->memcg_data);
+       unsigned long memcg_data = READ_ONCE(folio->memcg_data);
  
        if (memcg_data & MEMCG_DATA_OBJCGS)
                return NULL;
        return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
  }
  
+ static inline struct mem_cgroup *page_memcg_check(struct page *page)
+ {
+       if (PageTail(page))
+               return NULL;
+       return folio_memcg_check((struct folio *)page);
+ }
  static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
  {
        struct mem_cgroup *memcg;
@@@ -794,6 -801,11 +801,11 @@@ static inline void obj_cgroup_put(struc
        percpu_ref_put(&objcg->refcnt);
  }
  
+ static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+ {
+       return !memcg || css_tryget(&memcg->css);
+ }
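
A small illustrative pattern for the new mem_cgroup_tryget(): pin a memcg looked up under RCU and hand it to a caller that later drops it with mem_cgroup_put(). Note that a NULL memcg is treated as success, so the result may legitimately be NULL (the helper below is hypothetical):

static struct mem_cgroup *example_pin_folio_memcg(struct folio *folio)
{
	struct mem_cgroup *memcg;

	rcu_read_lock();
	memcg = folio_memcg_check(folio);
	if (!mem_cgroup_tryget(memcg))
		memcg = NULL;		/* cgroup is being destroyed */
	rcu_read_unlock();

	return memcg;			/* release with mem_cgroup_put() */
}
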
  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
  {
        if (memcg)
@@@ -878,7 -890,7 +890,7 @@@ static inline bool mm_match_cgroup(stru
        return match;
  }
  
- struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page);
+ struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio);
  ino_t page_cgroup_ino(struct page *page);
  
  static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
@@@ -1165,6 -1177,11 +1177,11 @@@ static inline struct mem_cgroup *folio_
        return NULL;
  }
  
+ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
+ {
+       return NULL;
+ }
  static inline struct mem_cgroup *page_memcg_check(struct page *page)
  {
        return NULL;
@@@ -1301,6 -1318,11 +1318,11 @@@ static inline void obj_cgroup_put(struc
  {
  }
  
+ static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+ {
+       return true;
+ }
  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
  {
  }
@@@ -1754,30 -1776,24 +1776,30 @@@ struct obj_cgroup *get_obj_cgroup_from_
  int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
  void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
  
- extern struct static_key_false memcg_kmem_enabled_key;
 +extern struct static_key_false memcg_bpf_enabled_key;
 +static inline bool memcg_bpf_enabled(void)
 +{
 +      return static_branch_likely(&memcg_bpf_enabled_key);
 +}
 +
+ extern struct static_key_false memcg_kmem_online_key;
  
- static inline bool memcg_kmem_enabled(void)
+ static inline bool memcg_kmem_online(void)
  {
-       return static_branch_likely(&memcg_kmem_enabled_key);
+       return static_branch_likely(&memcg_kmem_online_key);
  }
  
  static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp,
                                         int order)
  {
-       if (memcg_kmem_enabled())
+       if (memcg_kmem_online())
                return __memcg_kmem_charge_page(page, gfp, order);
        return 0;
  }
  
  static inline void memcg_kmem_uncharge_page(struct page *page, int order)
  {
-       if (memcg_kmem_enabled())
+       if (memcg_kmem_online())
                __memcg_kmem_uncharge_page(page, order);
  }
  
@@@ -1798,7 -1814,7 +1820,7 @@@ static inline void count_objcg_event(st
  {
        struct mem_cgroup *memcg;
  
-       if (!memcg_kmem_enabled())
+       if (!memcg_kmem_online())
                return;
  
        rcu_read_lock();
@@@ -1838,12 -1854,7 +1860,12 @@@ static inline struct obj_cgroup *get_ob
        return NULL;
  }
  
- static inline bool memcg_kmem_enabled(void)
 +static inline bool memcg_bpf_enabled(void)
 +{
 +      return false;
 +}
 +
+ static inline bool memcg_kmem_online(void)
  {
        return false;
  }
diff --combined include/linux/mm.h
index 716d30d93616ca83675893e84beb6a14f078e39a,2992a2d55aee65f53e866c8e1042eb68bcb68be5..1f79667824eb60cf4ae79353ab014a9cc04eecc2
@@@ -282,7 -282,12 +282,12 @@@ extern unsigned int kobjsize(const voi
  #define VM_MAYSHARE   0x00000080
  
  #define VM_GROWSDOWN  0x00000100      /* general info on the segment */
+ #ifdef CONFIG_MMU
  #define VM_UFFD_MISSING       0x00000200      /* missing pages tracking */
+ #else /* CONFIG_MMU */
+ #define VM_MAYOVERLAY 0x00000200      /* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
+ #define VM_UFFD_MISSING       0
+ #endif /* CONFIG_MMU */
  #define VM_PFNMAP     0x00000400      /* Page-ranges managed without "struct page", just pure PFN */
  #define VM_UFFD_WP    0x00001000      /* wrprotect pages tracking */
  
  /* This mask defines which mm->def_flags a process can inherit its parent */
  #define VM_INIT_DEF_MASK      VM_NOHUGEPAGE
  
- /* This mask is used to clear all the VMA flags used by mlock */
- #define VM_LOCKED_CLEAR_MASK  (~(VM_LOCKED | VM_LOCKONFAULT))
+ /* This mask represents all the VMA flag bits used by mlock */
+ #define VM_LOCKED_MASK        (VM_LOCKED | VM_LOCKONFAULT)
  
  /* Arch-specific flags to clear when updating VM flags on protection change */
  #ifndef VM_ARCH_CLEAR
@@@ -628,6 -633,63 +633,63 @@@ static inline void vma_init(struct vm_a
        INIT_LIST_HEAD(&vma->anon_vma_chain);
  }
  
+ /* Use when VMA is not part of the VMA tree and needs no locking */
+ static inline void vm_flags_init(struct vm_area_struct *vma,
+                                vm_flags_t flags)
+ {
+       ACCESS_PRIVATE(vma, __vm_flags) = flags;
+ }
+ /* Use when VMA is part of the VMA tree and modifications need coordination */
+ static inline void vm_flags_reset(struct vm_area_struct *vma,
+                                 vm_flags_t flags)
+ {
+       mmap_assert_write_locked(vma->vm_mm);
+       vm_flags_init(vma, flags);
+ }
+ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
+                                      vm_flags_t flags)
+ {
+       mmap_assert_write_locked(vma->vm_mm);
+       WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
+ }
+ static inline void vm_flags_set(struct vm_area_struct *vma,
+                               vm_flags_t flags)
+ {
+       mmap_assert_write_locked(vma->vm_mm);
+       ACCESS_PRIVATE(vma, __vm_flags) |= flags;
+ }
+ static inline void vm_flags_clear(struct vm_area_struct *vma,
+                                 vm_flags_t flags)
+ {
+       mmap_assert_write_locked(vma->vm_mm);
+       ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
+ }
+ /*
+  * Use only if VMA is not part of the VMA tree or has no other users and
+  * therefore needs no locking.
+  */
+ static inline void __vm_flags_mod(struct vm_area_struct *vma,
+                                 vm_flags_t set, vm_flags_t clear)
+ {
+       vm_flags_init(vma, (vma->vm_flags | set) & ~clear);
+ }
+ /*
+  * Use only when the order of set/clear operations is unimportant, otherwise
+  * use vm_flags_{set|clear} explicitly.
+  */
+ static inline void vm_flags_mod(struct vm_area_struct *vma,
+                               vm_flags_t set, vm_flags_t clear)
+ {
+       mmap_assert_write_locked(vma->vm_mm);
+       __vm_flags_mod(vma, set, clear);
+ }
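
The new accessors replace direct writes to vma->vm_flags; the locked variants assert that the mmap write lock is held. Two hedged examples (the function names are hypothetical):

static void example_mlock_vma(struct vm_area_struct *vma)
{
	/* VMA is in the tree: vm_flags_set() asserts the mmap write lock. */
	vm_flags_set(vma, VM_LOCKED);
}

static void example_setup_new_vma(struct vm_area_struct *vma, vm_flags_t flags)
{
	/* VMA not yet visible to others: plain init, no locking needed. */
	vm_flags_init(vma, flags);
}
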
  static inline void vma_set_anonymous(struct vm_area_struct *vma)
  {
        vma->vm_ops = NULL;
@@@ -671,16 -733,16 +733,16 @@@ static inline bool vma_is_accessible(st
  static inline
  struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max)
  {
-       return mas_find(&vmi->mas, max);
+       return mas_find(&vmi->mas, max - 1);
  }
  
  static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
  {
        /*
-        * Uses vma_find() to get the first VMA when the iterator starts.
+        * Uses mas_find() to get the first VMA when the iterator starts.
         * Calling mas_next() could skip the first entry.
         */
-       return vma_find(vmi, ULONG_MAX);
+       return mas_find(&vmi->mas, ULONG_MAX);
  }
  
  static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi)
@@@ -693,12 -755,50 +755,50 @@@ static inline unsigned long vma_iter_ad
        return vmi->mas.index;
  }
  
+ static inline unsigned long vma_iter_end(struct vma_iterator *vmi)
+ {
+       return vmi->mas.last + 1;
+ }
+ static inline int vma_iter_bulk_alloc(struct vma_iterator *vmi,
+                                     unsigned long count)
+ {
+       return mas_expected_entries(&vmi->mas, count);
+ }
+ /* Free any unused preallocations */
+ static inline void vma_iter_free(struct vma_iterator *vmi)
+ {
+       mas_destroy(&vmi->mas);
+ }
+ static inline int vma_iter_bulk_store(struct vma_iterator *vmi,
+                                     struct vm_area_struct *vma)
+ {
+       vmi->mas.index = vma->vm_start;
+       vmi->mas.last = vma->vm_end - 1;
+       mas_store(&vmi->mas, vma);
+       if (unlikely(mas_is_err(&vmi->mas)))
+               return -ENOMEM;
+       return 0;
+ }
+ static inline void vma_iter_invalidate(struct vma_iterator *vmi)
+ {
+       mas_pause(&vmi->mas);
+ }
+ static inline void vma_iter_set(struct vma_iterator *vmi, unsigned long addr)
+ {
+       mas_set(&vmi->mas, addr);
+ }
  #define for_each_vma(__vmi, __vma)                                    \
        while (((__vma) = vma_next(&(__vmi))) != NULL)
  
  /* The MM code likes to work with exclusive end addresses */
  #define for_each_vma_range(__vmi, __vma, __end)                               \
-       while (((__vma) = vma_find(&(__vmi), (__end) - 1)) != NULL)
+       while (((__vma) = vma_find(&(__vmi), (__end))) != NULL)
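
With this change vma_find() treats its @max argument as an exclusive end, so for_each_vma_range() no longer subtracts one at the call site. A hedged sketch of a range walk (it assumes the caller holds mmap_lock for read; VMA_ITERATOR() is the existing iterator declaration macro, not part of this hunk):

static unsigned long example_count_vmas(struct mm_struct *mm,
					unsigned long start,
					unsigned long end)
{
	VMA_ITERATOR(vmi, mm, start);
	struct vm_area_struct *vma;
	unsigned long nr = 0;

	/* @end is exclusive, matching the MM convention noted below. */
	for_each_vma_range(vmi, vma, end)
		nr++;

	return nr;
}
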
  
  #ifdef CONFIG_SHMEM
  /*
@@@ -720,11 -820,20 +820,20 @@@ int vma_is_stack_for_current(struct vm_
  struct mmu_gather;
  struct inode;
  
+ /*
+  * compound_order() can be called without holding a reference, which means
+  * that niceties like page_folio() don't work.  These callers should be
+  * prepared to handle wild return values.  For example, PG_head may be
+  * set before _folio_order is initialised, or this may be a tail page.
+  * See compaction.c for some good examples.
+  */
  static inline unsigned int compound_order(struct page *page)
  {
-       if (!PageHead(page))
+       struct folio *folio = (struct folio *)page;
+       if (!test_bit(PG_head, &folio->flags))
                return 0;
-       return page[1].compound_order;
+       return folio->_folio_order;
  }
  
  /**
@@@ -783,6 -892,13 +892,13 @@@ static inline bool get_page_unless_zero
        return page_ref_add_unless(page, 1, 0);
  }
  
+ static inline struct folio *folio_get_nontail_page(struct page *page)
+ {
+       if (unlikely(!get_page_unless_zero(page)))
+               return NULL;
+       return (struct folio *)page;
+ }
  extern int page_is_ram(unsigned long pfn);
  
  enum {
@@@ -832,34 -948,7 +948,7 @@@ static inline int is_vmalloc_or_module_
  static inline int folio_entire_mapcount(struct folio *folio)
  {
        VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
-       return atomic_read(folio_mapcount_ptr(folio)) + 1;
- }
- /*
-  * Mapcount of compound page as a whole, does not include mapped sub-pages.
-  * Must be called only on head of compound page.
-  */
- static inline int head_compound_mapcount(struct page *head)
- {
-       return atomic_read(compound_mapcount_ptr(head)) + 1;
- }
- /*
-  * If a 16GB hugetlb page were mapped by PTEs of all of its 4kB sub-pages,
-  * its subpages_mapcount would be 0x400000: choose the COMPOUND_MAPPED bit
-  * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE).  Hugetlb currently
-  * leaves subpages_mapcount at 0, but avoid surprise if it participates later.
-  */
- #define COMPOUND_MAPPED       0x800000
- #define SUBPAGES_MAPPED       (COMPOUND_MAPPED - 1)
- /*
-  * Number of sub-pages mapped by PTE, does not include compound mapcount.
-  * Must be called only on head of compound page.
-  */
- static inline int head_subpages_mapcount(struct page *head)
- {
-       return atomic_read(subpages_mapcount_ptr(head)) & SUBPAGES_MAPPED;
+       return atomic_read(&folio->_entire_mapcount) + 1;
  }
  
  /*
@@@ -872,25 -961,29 +961,29 @@@ static inline void page_mapcount_reset(
        atomic_set(&(page)->_mapcount, -1);
  }
  
- /*
-  * Mapcount of 0-order page; when compound sub-page, includes
-  * compound_mapcount of compound_head of page.
+ /**
+  * page_mapcount() - Number of times this precise page is mapped.
+  * @page: The page.
+  *
+  * The number of times this page is mapped.  If this page is part of
+  * a large folio, it includes the number of times this page is mapped
+  * as part of that folio.
   *
-  * Result is undefined for pages which cannot be mapped into userspace.
+  * The result is undefined for pages which cannot be mapped into userspace.
   * For example SLAB or special types of pages. See function page_has_type().
-  * They use this place in struct page differently.
+  * They use this field in struct page differently.
   */
  static inline int page_mapcount(struct page *page)
  {
        int mapcount = atomic_read(&page->_mapcount) + 1;
  
-       if (likely(!PageCompound(page)))
-               return mapcount;
-       page = compound_head(page);
-       return head_compound_mapcount(page) + mapcount;
+       if (unlikely(PageCompound(page)))
+               mapcount += folio_entire_mapcount(page_folio(page));
+       return mapcount;
  }
  
- int total_compound_mapcount(struct page *head);
+ int folio_total_mapcount(struct folio *folio);
  
  /**
   * folio_mapcount() - Calculate the number of mappings of this folio.
@@@ -907,24 -1000,24 +1000,24 @@@ static inline int folio_mapcount(struc
  {
        if (likely(!folio_test_large(folio)))
                return atomic_read(&folio->_mapcount) + 1;
-       return total_compound_mapcount(&folio->page);
+       return folio_total_mapcount(folio);
  }
  
  static inline int total_mapcount(struct page *page)
  {
        if (likely(!PageCompound(page)))
                return atomic_read(&page->_mapcount) + 1;
-       return total_compound_mapcount(compound_head(page));
+       return folio_total_mapcount(page_folio(page));
  }
  
  static inline bool folio_large_is_mapped(struct folio *folio)
  {
        /*
-        * Reading folio_mapcount_ptr() below could be omitted if hugetlb
-        * participated in incrementing subpages_mapcount when compound mapped.
+        * Reading _entire_mapcount below could be omitted if hugetlb
+        * participated in incrementing nr_pages_mapped when compound mapped.
         */
-       return atomic_read(folio_subpages_mapcount_ptr(folio)) > 0 ||
-               atomic_read(folio_mapcount_ptr(folio)) >= 0;
+       return atomic_read(&folio->_nr_pages_mapped) > 0 ||
+               atomic_read(&folio->_entire_mapcount) >= 0;
  }
  
  /**
@@@ -999,8 -1092,11 +1092,11 @@@ extern compound_page_dtor * const compo
  static inline void set_compound_page_dtor(struct page *page,
                enum compound_dtor_id compound_dtor)
  {
+       struct folio *folio = (struct folio *)page;
        VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
-       page[1].compound_dtor = compound_dtor;
+       VM_BUG_ON_PAGE(!PageHead(page), page);
+       folio->_folio_dtor = compound_dtor;
  }
  
  static inline void folio_set_compound_dtor(struct folio *folio,
  
  void destroy_large_folio(struct folio *folio);
  
- static inline int head_compound_pincount(struct page *head)
- {
-       return atomic_read(compound_pincount_ptr(head));
- }
  static inline void set_compound_order(struct page *page, unsigned int order)
  {
-       page[1].compound_order = order;
- #ifdef CONFIG_64BIT
-       page[1].compound_nr = 1U << order;
- #endif
- }
- /*
-  * folio_set_compound_order is generally passed a non-zero order to
-  * initialize a large folio.  However, hugetlb code abuses this by
-  * passing in zero when 'dissolving' a large folio.
-  */
- static inline void folio_set_compound_order(struct folio *folio,
-               unsigned int order)
- {
-       VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
+       struct folio *folio = (struct folio *)page;
  
        folio->_folio_order = order;
  #ifdef CONFIG_64BIT
-       folio->_folio_nr_pages = order ? 1U << order : 0;
- #endif
- }
- /* Returns the number of pages in this potentially compound page. */
- static inline unsigned long compound_nr(struct page *page)
- {
-       if (!PageHead(page))
-               return 1;
- #ifdef CONFIG_64BIT
-       return page[1].compound_nr;
- #else
-       return 1UL << compound_order(page);
+       folio->_folio_nr_pages = 1U << order;
  #endif
  }
  
@@@ -1075,16 -1140,6 +1140,6 @@@ static inline unsigned int thp_order(st
        return compound_order(page);
  }
  
- /**
-  * thp_nr_pages - The number of regular pages in this huge page.
-  * @page: The head page of a huge page.
-  */
- static inline int thp_nr_pages(struct page *page)
- {
-       VM_BUG_ON_PGFLAGS(PageTail(page), page);
-       return compound_nr(page);
- }
  /**
   * thp_size - Size of a transparent huge page.
   * @page: Head page of a transparent huge page.
@@@ -1226,8 -1281,6 +1281,6 @@@ static inline void get_page(struct pag
        folio_get(page_folio(page));
  }
  
- int __must_check try_grab_page(struct page *page, unsigned int flags);
  static inline __must_check bool try_get_page(struct page *page)
  {
        page = compound_head(page);
@@@ -1369,6 -1422,21 +1422,21 @@@ static inline bool is_cow_mapping(vm_fl
        return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
  }
  
+ #ifndef CONFIG_MMU
+ static inline bool is_nommu_shared_mapping(vm_flags_t flags)
+ {
+       /*
+        * NOMMU shared mappings are ordinary MAP_SHARED mappings and selected
+        * R/O MAP_PRIVATE file mappings that are an effective R/O overlay of
+        * a file mapping. R/O MAP_PRIVATE mappings might still modify
+        * underlying memory if ptrace is active, so this is only possible if
+        * ptrace does not apply. Note that there is no mprotect() to upgrade
+        * write permissions later.
+        */
+       return flags & (VM_MAYSHARE | VM_MAYOVERLAY);
+ }
+ #endif
  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
  #define SECTION_IN_PAGE_FLAGS
  #endif
@@@ -1643,11 -1711,6 +1711,6 @@@ static inline struct folio *pfn_folio(u
        return page_folio(pfn_to_page(pfn));
  }
  
- static inline atomic_t *folio_pincount_ptr(struct folio *folio)
- {
-       return &folio_page(folio, 1)->compound_pincount;
- }
  /**
   * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.
   * @folio: The folio.
   * expected to be able to deal gracefully with a false positive.
   *
   * For large folios, the result will be exactly correct. That's because
-  * we have more tracking data available: the compound_pincount is used
+  * we have more tracking data available: the _pincount field is used
   * instead of the GUP_PIN_COUNTING_BIAS scheme.
   *
   * For more information, please see Documentation/core-api/pin_user_pages.rst.
  static inline bool folio_maybe_dma_pinned(struct folio *folio)
  {
        if (folio_test_large(folio))
-               return atomic_read(folio_pincount_ptr(folio)) > 0;
+               return atomic_read(&folio->_pincount) > 0;
  
        /*
         * folio_ref_count() is signed. If that refcount overflows, then
@@@ -1784,6 -1847,33 +1847,33 @@@ static inline long folio_nr_pages(struc
  #endif
  }
  
+ /*
+  * compound_nr() returns the number of pages in this potentially compound
+  * page.  compound_nr() can be called on a tail page, and is defined to
+  * return 1 in that case.
+  */
+ static inline unsigned long compound_nr(struct page *page)
+ {
+       struct folio *folio = (struct folio *)page;
+       if (!test_bit(PG_head, &folio->flags))
+               return 1;
+ #ifdef CONFIG_64BIT
+       return folio->_folio_nr_pages;
+ #else
+       return 1L << folio->_folio_order;
+ #endif
+ }
+ /**
+  * thp_nr_pages - The number of regular pages in this huge page.
+  * @page: The head page of a huge page.
+  */
+ static inline int thp_nr_pages(struct page *page)
+ {
+       return folio_nr_pages((struct folio *)page);
+ }
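
The relationship between these helpers is unchanged even though they now read the folio fields directly: for a head page of order N, compound_order() returns N and compound_nr() returns 1 << N; for a base or tail page compound_nr() returns 1. A trivial illustrative use (hypothetical helper):

static size_t example_page_span(struct page *page)
{
	/* Number of bytes covered by this (possibly compound) page. */
	return (size_t)compound_nr(page) * PAGE_SIZE;
}
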
  /**
   * folio_next - Move to the next physical folio.
   * @folio: The folio we're currently operating on.
@@@ -1833,6 -1923,24 +1923,24 @@@ static inline size_t folio_size(struct 
        return PAGE_SIZE << folio_order(folio);
  }
  
+ /**
+  * folio_estimated_sharers - Estimate the number of sharers of a folio.
+  * @folio: The folio.
+  *
+  * folio_estimated_sharers() aims to serve as a function to efficiently
+  * estimate the number of processes sharing a folio. This is done by
+  * looking at the precise mapcount of the first subpage in the folio, and
+  * assuming the other subpages are the same. This may not be true for large
+  * folios. If you want exact mapcounts for exact calculations, look at
+  * page_mapcount() or folio_total_mapcount().
+  *
+  * Return: The estimated number of processes sharing a folio.
+  */
+ static inline int folio_estimated_sharers(struct folio *folio)
+ {
+       return page_mapcount(folio_page(folio, 0));
+ }
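
A hedged example of the intended use: a policy-style check that tolerates the estimate
being off for large folios. my_folio_looks_exclusive() is hypothetical.

#include <linux/mm.h>

static bool my_folio_looks_exclusive(struct folio *folio)
{
	/* Only the first subpage's mapcount is consulted: an estimate. */
	return folio_estimated_sharers(folio) <= 1;
}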
  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
  static inline int arch_make_page_accessible(struct page *page)
  {
@@@ -1928,6 -2036,21 +2036,21 @@@ static inline bool page_is_pfmemalloc(c
        return (uintptr_t)page->lru.next & BIT(1);
  }
  
+ /*
+  * Return true only if the folio has been allocated with
+  * ALLOC_NO_WATERMARKS and the low watermark was not
+  * met implying that the system is under some pressure.
+  */
+ static inline bool folio_is_pfmemalloc(const struct folio *folio)
+ {
+       /*
+        * lru.next has bit 1 set if the page is allocated from the
+        * pfmemalloc reserves.  Callers may simply overwrite it if
+        * they do not need to preserve that information.
+        */
+       return (uintptr_t)folio->lru.next & BIT(1);
+ }
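
A sketch of the recycling decision this mirrors from page_is_pfmemalloc();
my_can_recycle_folio() is illustrative, not a driver API.

#include <linux/mm.h>

static bool my_can_recycle_folio(struct folio *folio)
{
	/* Folios taken from the pfmemalloc reserves should go straight back. */
	return !folio_is_pfmemalloc(folio);
}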
  /*
   * Only to be called by the page allocator on a freshly allocated
   * page.
@@@ -1982,31 -2105,6 +2105,31 @@@ struct zap_details 
  /* Set in unmap_vmas() to indicate a final unmap call.  Only used by hugetlb */
  #define  ZAP_FLAG_UNMAP              ((__force zap_flags_t) BIT(1))
  
 +#ifdef CONFIG_SCHED_MM_CID
 +void sched_mm_cid_before_execve(struct task_struct *t);
 +void sched_mm_cid_after_execve(struct task_struct *t);
 +void sched_mm_cid_fork(struct task_struct *t);
 +void sched_mm_cid_exit_signals(struct task_struct *t);
 +static inline int task_mm_cid(struct task_struct *t)
 +{
 +      return t->mm_cid;
 +}
 +#else
 +static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
 +static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
 +static inline void sched_mm_cid_fork(struct task_struct *t) { }
 +static inline void sched_mm_cid_exit_signals(struct task_struct *t) { }
 +static inline int task_mm_cid(struct task_struct *t)
 +{
 +      /*
 +       * Use the processor id as a fall-back when the mm cid feature is
 +       * disabled. This provides functional per-cpu data structure accesses
 +       * in user-space, although it won't provide the memory usage benefits.
 +       */
 +      return raw_smp_processor_id();
 +}
 +#endif
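
A rough sketch of indexing a per-mm array with the concurrency id; my_counters and
my_bump_counter() are hypothetical, and the array is assumed to have one slot per
possible CPU so the fallback path stays in bounds.

#include <linux/atomic.h>
#include <linux/sched.h>

static void my_bump_counter(struct task_struct *t, atomic_t *my_counters)
{
	/* Compact per-mm index with CONFIG_SCHED_MM_CID, CPU id otherwise. */
	atomic_inc(&my_counters[task_mm_cid(t)]);
}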
 +
  #ifdef CONFIG_MMU
  extern bool can_do_mlock(void);
  #else
@@@ -2015,6 -2113,8 +2138,8 @@@ static inline bool can_do_mlock(void) 
  extern int user_shm_lock(size_t, struct ucounts *);
  extern void user_shm_unlock(size_t, struct ucounts *);
  
+ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
+                            pte_t pte);
  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
                             pte_t pte);
  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
  
  void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
                  unsigned long size);
- void zap_page_range(struct vm_area_struct *vma, unsigned long address,
-                   unsigned long size);
  void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
                           unsigned long size, struct zap_details *details);
+ static inline void zap_vma_pages(struct vm_area_struct *vma)
+ {
+       zap_page_range_single(vma, vma->vm_start,
+                             vma->vm_end - vma->vm_start, NULL);
+ }
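
A minimal sketch of the convenience the new wrapper provides, assuming the caller
already holds the appropriate mmap lock; my_drop_vma_contents() is hypothetical.

#include <linux/mm.h>

static void my_drop_vma_contents(struct vm_area_struct *vma)
{
	/* Unmap every page in the VMA without touching the VMA itself. */
	zap_vma_pages(vma);
}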
  void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
                struct vm_area_struct *start_vma, unsigned long start,
-               unsigned long end);
+               unsigned long end, bool mm_wr_locked);
  
  struct mmu_notifier_range;
  
@@@ -2126,6 -2229,8 +2254,6 @@@ int __account_locked_vm(struct mm_struc
                        struct task_struct *task, bool bypass_rlim);
  
  struct kvec;
 -int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 -                      struct page **pages);
  struct page *get_dump_page(unsigned long addr);
  
  bool folio_mark_dirty(struct folio *folio);
@@@ -2175,21 -2280,18 +2303,18 @@@ static inline bool vma_wants_manual_pte
  }
  bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
                             pte_t pte);
- extern unsigned long change_protection(struct mmu_gather *tlb,
+ extern long change_protection(struct mmu_gather *tlb,
                              struct vm_area_struct *vma, unsigned long start,
-                             unsigned long end, pgprot_t newprot,
-                             unsigned long cp_flags);
- extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
-                         struct vm_area_struct **pprev, unsigned long start,
-                         unsigned long end, unsigned long newflags);
+                             unsigned long end, unsigned long cp_flags);
+ extern int mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+         struct vm_area_struct *vma, struct vm_area_struct **pprev,
+         unsigned long start, unsigned long end, unsigned long newflags);
  
  /*
   * doesn't attempt to fault and will return short.
   */
  int get_user_pages_fast_only(unsigned long start, int nr_pages,
                             unsigned int gup_flags, struct page **pages);
- int pin_user_pages_fast_only(unsigned long start, int nr_pages,
-                            unsigned int gup_flags, struct page **pages);
  
  static inline bool get_user_page_fast_only(unsigned long addr,
                        unsigned int gup_flags, struct page **pagep)
@@@ -2813,23 -2915,21 +2938,21 @@@ void anon_vma_interval_tree_verify(stru
  
  /* mmap.c */
  extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
- extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
-       unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
-       struct vm_area_struct *expand);
- static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
-       unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
- {
-       return __vma_adjust(vma, start, end, pgoff, insert, NULL);
- }
- extern struct vm_area_struct *vma_merge(struct mm_struct *,
-       struct vm_area_struct *prev, unsigned long addr, unsigned long end,
-       unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
-       struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
+ extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+                     unsigned long start, unsigned long end, pgoff_t pgoff,
+                     struct vm_area_struct *next);
+ extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+                      unsigned long start, unsigned long end, pgoff_t pgoff);
+ extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
+       struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
+       unsigned long end, unsigned long vm_flags, struct anon_vma *,
+       struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
+       struct anon_vma_name *);
  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
- extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
-       unsigned long addr, int new_below);
- extern int split_vma(struct mm_struct *, struct vm_area_struct *,
-       unsigned long addr, int new_below);
+ extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+                      unsigned long addr, int new_below);
+ extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+                        unsigned long addr, int new_below);
  extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
  extern void unlink_file_vma(struct vm_area_struct *);
  extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
        bool *need_rmap_locks);
  extern void exit_mmap(struct mm_struct *);
  
- void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas);
- void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas);
  static inline int check_data_rlimit(unsigned long rlim,
                                    unsigned long new,
                                    unsigned long start,
@@@ -2887,7 -2984,7 +3007,7 @@@ extern unsigned long mmap_region(struc
  extern unsigned long do_mmap(struct file *file, unsigned long addr,
        unsigned long len, unsigned long prot, unsigned long flags,
        unsigned long pgoff, unsigned long *populate, struct list_head *uf);
- extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+ extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
                         unsigned long start, size_t len, struct list_head *uf,
                         bool downgrade);
  extern int do_munmap(struct mm_struct *, unsigned long, size_t,
  extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
  
  #ifdef CONFIG_MMU
+ extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
+                        unsigned long start, unsigned long end,
+                        struct list_head *uf, bool downgrade);
  extern int __mm_populate(unsigned long addr, unsigned long len,
                         int ignore_errors);
  static inline void mm_populate(unsigned long addr, unsigned long len)
@@@ -3100,81 -3200,6 +3223,6 @@@ static inline vm_fault_t vmf_error(int 
  struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
                         unsigned int foll_flags);
  
- #define FOLL_WRITE    0x01    /* check pte is writable */
- #define FOLL_TOUCH    0x02    /* mark page accessed */
- #define FOLL_GET      0x04    /* do get_page on page */
- #define FOLL_DUMP     0x08    /* give error on hole if it would be zero */
- #define FOLL_FORCE    0x10    /* get_user_pages read/write w/o permission */
- #define FOLL_NOWAIT   0x20    /* if a disk transfer is needed, start the IO
-                                * and return without waiting upon it */
- #define FOLL_NOFAULT  0x80    /* do not fault in pages */
- #define FOLL_HWPOISON 0x100   /* check page is hwpoisoned */
- #define FOLL_TRIED    0x800   /* a retry, previous pass started an IO */
- #define FOLL_REMOTE   0x2000  /* we are working on non-current tsk/mm */
- #define FOLL_ANON     0x8000  /* don't do file mappings */
- #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */
- #define FOLL_SPLIT_PMD        0x20000 /* split huge pmd before returning */
- #define FOLL_PIN      0x40000 /* pages must be released via unpin_user_page */
- #define FOLL_FAST_ONLY        0x80000 /* gup_fast: prevent fall-back to slow gup */
- #define FOLL_PCI_P2PDMA       0x100000 /* allow returning PCI P2PDMA pages */
- #define FOLL_INTERRUPTIBLE  0x200000 /* allow interrupts from generic signals */
- /*
-  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
-  * other. Here is what they mean, and how to use them:
-  *
-  * FOLL_LONGTERM indicates that the page will be held for an indefinite time
-  * period _often_ under userspace control.  This is in contrast to
-  * iov_iter_get_pages(), whose usages are transient.
-  *
-  * FIXME: For pages which are part of a filesystem, mappings are subject to the
-  * lifetime enforced by the filesystem and we need guarantees that longterm
-  * users like RDMA and V4L2 only establish mappings which coordinate usage with
-  * the filesystem.  Ideas for this coordination include revoking the longterm
-  * pin, delaying writeback, bounce buffer page writeback, etc.  As FS DAX was
-  * added after the problem with filesystems was found FS DAX VMAs are
-  * specifically failed.  Filesystem pages are still subject to bugs and use of
-  * FOLL_LONGTERM should be avoided on those pages.
-  *
-  * FIXME: Also NOTE that FOLL_LONGTERM is not supported in every GUP call.
-  * Currently only get_user_pages() and get_user_pages_fast() support this flag
-  * and calls to get_user_pages_[un]locked are specifically not allowed.  This
-  * is due to an incompatibility with the FS DAX check and
-  * FAULT_FLAG_ALLOW_RETRY.
-  *
-  * In the CMA case: long term pins in a CMA region would unnecessarily fragment
-  * that region.  And so, CMA attempts to migrate the page before pinning, when
-  * FOLL_LONGTERM is specified.
-  *
-  * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
-  * but an additional pin counting system) will be invoked. This is intended for
-  * anything that gets a page reference and then touches page data (for example,
-  * Direct IO). This lets the filesystem know that some non-file-system entity is
-  * potentially changing the pages' data. In contrast to FOLL_GET (whose pages
-  * are released via put_page()), FOLL_PIN pages must be released, ultimately, by
-  * a call to unpin_user_page().
-  *
-  * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different
-  * and separate refcounting mechanisms, however, and that means that each has
-  * its own acquire and release mechanisms:
-  *
-  *     FOLL_GET: get_user_pages*() to acquire, and put_page() to release.
-  *
-  *     FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release.
-  *
-  * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call.
-  * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based
-  * calls applied to them, and that's perfectly OK. This is a constraint on the
-  * callers, not on the pages.)
-  *
-  * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never
-  * directly by the caller. That's in order to help avoid mismatches when
-  * releasing pages: get_user_pages*() pages must be released via put_page(),
-  * while pin_user_pages*() pages must be released via unpin_user_page().
-  *
-  * Please see Documentation/core-api/pin_user_pages.rst for more information.
-  */
  static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
  {
        if (vm_fault & VM_FAULT_OOM)
        return 0;
  }
  
- /*
-  * Indicates for which pages that are write-protected in the page table,
-  * whether GUP has to trigger unsharing via FAULT_FLAG_UNSHARE such that the
-  * GUP pin will remain consistent with the pages mapped into the page tables
-  * of the MM.
-  *
-  * Temporary unmapping of PageAnonExclusive() pages or clearing of
-  * PageAnonExclusive() has to protect against concurrent GUP:
-  * * Ordinary GUP: Using the PT lock
-  * * GUP-fast and fork(): mm->write_protect_seq
-  * * GUP-fast and KSM or temporary unmapping (swap, migration): see
-  *    page_try_share_anon_rmap()
-  *
-  * Must be called with the (sub)page that's actually referenced via the
-  * page table entry, which might not necessarily be the head page for a
-  * PTE-mapped THP.
-  *
-  * If the vma is NULL, we're coming from the GUP-fast path and might have
-  * to fallback to the slow path just to lookup the vma.
-  */
- static inline bool gup_must_unshare(struct vm_area_struct *vma,
-                                   unsigned int flags, struct page *page)
- {
-       /*
-        * FOLL_WRITE is implicitly handled correctly as the page table entry
-        * has to be writable -- and if it references (part of) an anonymous
-        * folio, that part is required to be marked exclusive.
-        */
-       if ((flags & (FOLL_WRITE | FOLL_PIN)) != FOLL_PIN)
-               return false;
-       /*
-        * Note: PageAnon(page) is stable until the page is actually getting
-        * freed.
-        */
-       if (!PageAnon(page)) {
-               /*
-                * We only care about R/O long-term pining: R/O short-term
-                * pinning does not have the semantics to observe successive
-                * changes through the process page tables.
-                */
-               if (!(flags & FOLL_LONGTERM))
-                       return false;
-               /* We really need the vma ... */
-               if (!vma)
-                       return true;
-               /*
-                * ... because we only care about writable private ("COW")
-                * mappings where we have to break COW early.
-                */
-               return is_cow_mapping(vma->vm_flags);
-       }
-       /* Paired with a memory barrier in page_try_share_anon_rmap(). */
-       if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
-               smp_rmb();
-       /*
-        * Note that PageKsm() pages cannot be exclusive, and consequently,
-        * cannot get pinned.
-        */
-       return !PageAnonExclusive(page);
- }
  /*
   * Indicates whether GUP can follow a PROT_NONE mapped page, or whether
   * a (NUMA hinting) fault is required.
@@@ -3550,6 -3510,11 +3533,11 @@@ enum mf_action_page_type 
        MF_MSG_UNKNOWN,
  };
  
+ /*
+  * Sysfs entries for memory failure handling statistics.
+  */
+ extern const struct attribute_group memory_failure_attr_group;
  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
  extern void clear_huge_page(struct page *page,
                            unsigned long addr_hint,
@@@ -3667,7 -3632,7 +3655,7 @@@ static inline int seal_check_future_wri
                 * VM_MAYWRITE as we still want them to be COW-writable.
                 */
                if (vma->vm_flags & VM_SHARED)
-                       vma->vm_flags &= ~(VM_MAYWRITE);
+                       vm_flags_clear(vma, VM_MAYWRITE);
        }
  
        return 0;
diff --combined include/linux/mm_types.h
index af8119776ab18a57932378d12ae6285ab7284ac1,56753d0f096d06591c5c2ff29b05b72fb212c7b3..0722859c36478d92f06455c6841144a7bfe5d207
@@@ -140,30 -140,6 +140,6 @@@ struct page 
                };
                struct {        /* Tail pages of compound page */
                        unsigned long compound_head;    /* Bit zero is set */
-                       /* First tail page only */
-                       unsigned char compound_dtor;
-                       unsigned char compound_order;
-                       atomic_t compound_mapcount;
-                       atomic_t subpages_mapcount;
-                       atomic_t compound_pincount;
- #ifdef CONFIG_64BIT
-                       unsigned int compound_nr; /* 1 << compound_order */
- #endif
-               };
-               struct {        /* Second tail page of transparent huge page */
-                       unsigned long _compound_pad_1;  /* compound_head */
-                       unsigned long _compound_pad_2;
-                       /* For both global and memcg */
-                       struct list_head deferred_list;
-               };
-               struct {        /* Second tail page of hugetlb page */
-                       unsigned long _hugetlb_pad_1;   /* compound_head */
-                       void *hugetlb_subpool;
-                       void *hugetlb_cgroup;
-                       void *hugetlb_cgroup_rsvd;
-                       void *hugetlb_hwpoison;
-                       /* No more space on 32-bit: use third tail if more */
                };
                struct {        /* Page table pages */
                        unsigned long _pt_pad_1;        /* compound_head */
@@@ -302,20 -278,17 +278,17 @@@ static inline struct page *encoded_page
   * @_refcount: Do not access this member directly.  Use folio_ref_count()
   *    to find how many references there are to this folio.
   * @memcg_data: Memory Control Group data.
-  * @_flags_1: For large folios, additional page flags.
-  * @_head_1: Points to the folio.  Do not use.
   * @_folio_dtor: Which destructor to use for this folio.
   * @_folio_order: Do not use directly, call folio_order().
-  * @_compound_mapcount: Do not use directly, call folio_entire_mapcount().
-  * @_subpages_mapcount: Do not use directly, call folio_mapcount().
+  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
+  * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
   * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
   * @_folio_nr_pages: Do not use directly, call folio_nr_pages().
-  * @_flags_2: For alignment.  Do not use.
-  * @_head_2: Points to the folio.  Do not use.
   * @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h.
   * @_hugetlb_cgroup: Do not use directly, use accessor in hugetlb_cgroup.h.
   * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
   * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
+  * @_deferred_list: Folios to be split under memory pressure.
   *
   * A folio is a physically, virtually and logically contiguous set
   * of bytes.  It is a power-of-two in size, and it is aligned to that
@@@ -358,14 -331,16 +331,16 @@@ struct folio 
                struct {
                        unsigned long _flags_1;
                        unsigned long _head_1;
+       /* public: */
                        unsigned char _folio_dtor;
                        unsigned char _folio_order;
-                       atomic_t _compound_mapcount;
-                       atomic_t _subpages_mapcount;
+                       atomic_t _entire_mapcount;
+                       atomic_t _nr_pages_mapped;
                        atomic_t _pincount;
  #ifdef CONFIG_64BIT
                        unsigned int _folio_nr_pages;
  #endif
+       /* private: the union with struct page is transitional */
                };
                struct page __page_1;
        };
                struct {
                        unsigned long _flags_2;
                        unsigned long _head_2;
+       /* public: */
                        void *_hugetlb_subpool;
                        void *_hugetlb_cgroup;
                        void *_hugetlb_cgroup_rsvd;
                        void *_hugetlb_hwpoison;
+       /* private: the union with struct page is transitional */
+               };
+               struct {
+                       unsigned long _flags_2a;
+                       unsigned long _head_2a;
+       /* public: */
+                       struct list_head _deferred_list;
+       /* private: the union with struct page is transitional */
                };
                struct page __page_2;
        };
@@@ -401,53 -385,14 +385,14 @@@ FOLIO_MATCH(memcg_data, memcg_data)
                        offsetof(struct page, pg) + sizeof(struct page))
  FOLIO_MATCH(flags, _flags_1);
  FOLIO_MATCH(compound_head, _head_1);
- FOLIO_MATCH(compound_dtor, _folio_dtor);
- FOLIO_MATCH(compound_order, _folio_order);
- FOLIO_MATCH(compound_mapcount, _compound_mapcount);
- FOLIO_MATCH(subpages_mapcount, _subpages_mapcount);
- FOLIO_MATCH(compound_pincount, _pincount);
- #ifdef CONFIG_64BIT
- FOLIO_MATCH(compound_nr, _folio_nr_pages);
- #endif
  #undef FOLIO_MATCH
  #define FOLIO_MATCH(pg, fl)                                           \
        static_assert(offsetof(struct folio, fl) ==                     \
                        offsetof(struct page, pg) + 2 * sizeof(struct page))
  FOLIO_MATCH(flags, _flags_2);
  FOLIO_MATCH(compound_head, _head_2);
- FOLIO_MATCH(hugetlb_subpool, _hugetlb_subpool);
- FOLIO_MATCH(hugetlb_cgroup, _hugetlb_cgroup);
- FOLIO_MATCH(hugetlb_cgroup_rsvd, _hugetlb_cgroup_rsvd);
- FOLIO_MATCH(hugetlb_hwpoison, _hugetlb_hwpoison);
  #undef FOLIO_MATCH
  
- static inline atomic_t *folio_mapcount_ptr(struct folio *folio)
- {
-       struct page *tail = &folio->page + 1;
-       return &tail->compound_mapcount;
- }
- static inline atomic_t *folio_subpages_mapcount_ptr(struct folio *folio)
- {
-       struct page *tail = &folio->page + 1;
-       return &tail->subpages_mapcount;
- }
- static inline atomic_t *compound_mapcount_ptr(struct page *page)
- {
-       return &page[1].compound_mapcount;
- }
- static inline atomic_t *subpages_mapcount_ptr(struct page *page)
- {
-       return &page[1].subpages_mapcount;
- }
- static inline atomic_t *compound_pincount_ptr(struct page *page)
- {
-       return &page[1].compound_pincount;
- }
  /*
   * Used for sizing the vmemmap region on some architectures
   */
@@@ -546,7 -491,15 +491,15 @@@ struct vm_area_struct 
         * See vmf_insert_mixed_prot() for discussion.
         */
        pgprot_t vm_page_prot;
-       unsigned long vm_flags;         /* Flags, see mm.h. */
+       /*
+        * Flags, see mm.h.
+        * To modify use vm_flags_{init|reset|set|clear|mod} functions.
+        */
+       union {
+               const vm_flags_t vm_flags;
+               vm_flags_t __private __vm_flags;
+       };
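
A hedged sketch of what the now read-only vm_flags implies for callers: direct writes
such as "vma->vm_flags |= ..." no longer compile, and modifications go through the
vm_flags_{set,clear,mod} helpers named in the comment above. my_mark_vma_special() is
hypothetical driver-style code and assumes the mmap write lock is held.

#include <linux/mm.h>

static void my_mark_vma_special(struct vm_area_struct *vma)
{
	vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
	vm_flags_clear(vma, VM_MAYWRITE);
}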
  
        /*
         * For areas with an address space and backing store,
@@@ -645,20 -598,9 +598,20 @@@ struct mm_struct 
                 * &struct mm_struct is freed.
                 */
                atomic_t mm_count;
 -
 +#ifdef CONFIG_SCHED_MM_CID
 +              /**
 +               * @cid_lock: Protect cid bitmap updates vs lookups.
 +               *
 +               * Prevent situations where updates to the cid bitmap happen
 +               * concurrently with lookups. Those can lead to situations
 +               * where a lookup cannot find a free bit simply because it was
 +               * unlucky enough to load, non-atomically, bitmap words as they
 +               * were being concurrently updated by the updaters.
 +               */
 +              raw_spinlock_t cid_lock;
 +#endif
  #ifdef CONFIG_MMU
-               atomic_long_t pgtables_bytes;   /* PTE page table pages */
+               atomic_long_t pgtables_bytes;   /* size of all page tables */
  #endif
                int map_count;                  /* number of VMAs */
  
@@@ -915,41 -857,9 +868,39 @@@ struct vma_iterator 
  static inline void vma_iter_init(struct vma_iterator *vmi,
                struct mm_struct *mm, unsigned long addr)
  {
-       vmi->mas.tree = &mm->mm_mt;
-       vmi->mas.index = addr;
-       vmi->mas.node = MAS_START;
+       mas_init(&vmi->mas, &mm->mm_mt, addr);
  }
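
For context, a minimal walk using the iterator this initializer backs; VMA_ITERATOR()
and for_each_vma() are the existing mm.h helpers, while my_count_vmas() is hypothetical
and assumes mmap_read_lock() is held.

#include <linux/mm.h>

static unsigned long my_count_vmas(struct mm_struct *mm)
{
	VMA_ITERATOR(vmi, mm, 0);
	struct vm_area_struct *vma;
	unsigned long nr = 0;

	for_each_vma(vmi, vma)
		nr++;
	return nr;
}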
  
 +#ifdef CONFIG_SCHED_MM_CID
 +/* Accessor for struct mm_struct's cidmask. */
 +static inline cpumask_t *mm_cidmask(struct mm_struct *mm)
 +{
 +      unsigned long cid_bitmap = (unsigned long)mm;
 +
 +      cid_bitmap += offsetof(struct mm_struct, cpu_bitmap);
 +      /* Skip cpu_bitmap */
 +      cid_bitmap += cpumask_size();
 +      return (struct cpumask *)cid_bitmap;
 +}
 +
 +static inline void mm_init_cid(struct mm_struct *mm)
 +{
 +      raw_spin_lock_init(&mm->cid_lock);
 +      cpumask_clear(mm_cidmask(mm));
 +}
 +
 +static inline unsigned int mm_cid_size(void)
 +{
 +      return cpumask_size();
 +}
 +#else /* CONFIG_SCHED_MM_CID */
 +static inline void mm_init_cid(struct mm_struct *mm) { }
 +static inline unsigned int mm_cid_size(void)
 +{
 +      return 0;
 +}
 +#endif /* CONFIG_SCHED_MM_CID */
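
A sketch of the layout mm_cidmask() assumes: the cid bitmap is carved out right after
the trailing cpu_bitmap, so the mm_struct allocation has to grow by mm_cid_size() bytes.
my_mm_alloc_size() is illustrative only, not how kernel/fork.c spells it.

#include <linux/cpumask.h>
#include <linux/mm_types.h>
#include <linux/stddef.h>

static size_t my_mm_alloc_size(void)
{
	/* struct mm_struct, its trailing cpu_bitmap, then the cid bitmap. */
	return offsetof(struct mm_struct, cpu_bitmap) +
	       cpumask_size() + mm_cid_size();
}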
 +
  struct mmu_gather;
  extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
  extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
@@@ -1126,4 -1036,87 +1077,87 @@@ enum fault_flag 
  
  typedef unsigned int __bitwise zap_flags_t;
  
+ /*
+  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
+  * other. Here is what they mean, and how to use them:
+  *
+  *
+  * FIXME: For pages which are part of a filesystem, mappings are subject to the
+  * lifetime enforced by the filesystem and we need guarantees that longterm
+  * users like RDMA and V4L2 only establish mappings which coordinate usage with
+  * the filesystem.  Ideas for this coordination include revoking the longterm
+  * pin, delaying writeback, bounce buffer page writeback, etc.  As FS DAX was
+  * added after the problem with filesystems was found FS DAX VMAs are
+  * specifically failed.  Filesystem pages are still subject to bugs and use of
+  * FOLL_LONGTERM should be avoided on those pages.
+  *
+  * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+  * that region.  And so, CMA attempts to migrate the page before pinning, when
+  * FOLL_LONGTERM is specified.
+  *
+  * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
+  * but an additional pin counting system) will be invoked. This is intended for
+  * anything that gets a page reference and then touches page data (for example,
+  * Direct IO). This lets the filesystem know that some non-file-system entity is
+  * potentially changing the pages' data. In contrast to FOLL_GET (whose pages
+  * are released via put_page()), FOLL_PIN pages must be released, ultimately, by
+  * a call to unpin_user_page().
+  *
+  * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different
+  * and separate refcounting mechanisms, however, and that means that each has
+  * its own acquire and release mechanisms:
+  *
+  *     FOLL_GET: get_user_pages*() to acquire, and put_page() to release.
+  *
+  *     FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release.
+  *
+  * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call.
+  * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based
+  * calls applied to them, and that's perfectly OK. This is a constraint on the
+  * callers, not on the pages.)
+  *
+  * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never
+  * directly by the caller. That's in order to help avoid mismatches when
+  * releasing pages: get_user_pages*() pages must be released via put_page(),
+  * while pin_user_pages*() pages must be released via unpin_user_page().
+  *
+  * Please see Documentation/core-api/pin_user_pages.rst for more information.
+  */
+ enum {
+       /* check pte is writable */
+       FOLL_WRITE = 1 << 0,
+       /* do get_page on page */
+       FOLL_GET = 1 << 1,
+       /* give error on hole if it would be zero */
+       FOLL_DUMP = 1 << 2,
+       /* get_user_pages read/write w/o permission */
+       FOLL_FORCE = 1 << 3,
+       /*
+        * if a disk transfer is needed, start the IO and return without waiting
+        * upon it
+        */
+       FOLL_NOWAIT = 1 << 4,
+       /* do not fault in pages */
+       FOLL_NOFAULT = 1 << 5,
+       /* check page is hwpoisoned */
+       FOLL_HWPOISON = 1 << 6,
+       /* don't do file mappings */
+       FOLL_ANON = 1 << 7,
+       /*
+        * FOLL_LONGTERM indicates that the page will be held for an indefinite
+        * time period _often_ under userspace control.  This is in contrast to
+        * iov_iter_get_pages(), whose usages are transient.
+        */
+       FOLL_LONGTERM = 1 << 8,
+       /* split huge pmd before returning */
+       FOLL_SPLIT_PMD = 1 << 9,
+       /* allow returning PCI P2PDMA pages */
+       FOLL_PCI_P2PDMA = 1 << 10,
+       /* allow interrupts from generic signals */
+       FOLL_INTERRUPTIBLE = 1 << 11,
+       /* See also internal only FOLL flags in mm/internal.h */
+ };
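
A hedged sketch of the pin/unpin pairing these flags feed into, using the long-standing
pin_user_pages_fast()/unpin_user_pages() API; my_pin_user_buffer() is hypothetical and
skips partial-pin handling.

#include <linux/mm.h>

static int my_pin_user_buffer(unsigned long start, int nr_pages,
			      struct page **pages)
{
	int pinned;

	/* FOLL_PIN is added internally by pin_user_pages*(); never pass it. */
	pinned = pin_user_pages_fast(start, nr_pages,
				     FOLL_WRITE | FOLL_LONGTERM, pages);
	if (pinned < 0)
		return pinned;

	/* ... DMA to/from the pinned pages ... */

	unpin_user_pages(pages, pinned);
	return 0;
}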
  #endif /* _LINUX_MM_TYPES_H */
diff --combined include/linux/pagemap.h
index 2f5b36f446cce386f317b226a066329b7d0dfe13,6a32ac170d3df79217d674cd6c07e42c19e69697..0acb8e1fb7afdc92b8d17c1216294d02f6fa19d9
@@@ -546,6 -546,26 +546,26 @@@ static inline struct folio *filemap_loc
        return __filemap_get_folio(mapping, index, FGP_LOCK, 0);
  }
  
+ /**
+  * filemap_grab_folio - grab a folio from the page cache
+  * @mapping: The address space to search
+  * @index: The page index
+  *
+  * Looks up the page cache entry at @mapping & @index. If no folio is found,
+  * a new folio is created. The folio is locked, marked as accessed, and
+  * returned.
+  *
+  * Return: A found or created folio, or NULL if no folio was found and a new
+  * one could not be created.
+  */
+ static inline struct folio *filemap_grab_folio(struct address_space *mapping,
+                                       pgoff_t index)
+ {
+       return __filemap_get_folio(mapping, index,
+                       FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+                       mapping_gfp_mask(mapping));
+ }
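
A hedged sketch of the lookup-or-create pattern the new wrapper covers; my_touch_index()
is hypothetical and assumes a filesystem-style write path.

#include <linux/pagemap.h>

static int my_touch_index(struct address_space *mapping, pgoff_t index)
{
	struct folio *folio = filemap_grab_folio(mapping, index);

	if (!folio)
		return -ENOMEM;

	/* The folio comes back locked and marked accessed. */
	folio_mark_dirty(folio);
	folio_unlock(folio);
	folio_put(folio);
	return 0;
}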
  /**
   * find_get_page - find and get a page reference
   * @mapping: the address_space to search
@@@ -719,16 -739,8 +739,8 @@@ unsigned filemap_get_folios(struct addr
                pgoff_t end, struct folio_batch *fbatch);
  unsigned filemap_get_folios_contig(struct address_space *mapping,
                pgoff_t *start, pgoff_t end, struct folio_batch *fbatch);
- unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
-                       pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
-                       struct page **pages);
- static inline unsigned find_get_pages_tag(struct address_space *mapping,
-                       pgoff_t *index, xa_mark_t tag, unsigned int nr_pages,
-                       struct page **pages)
- {
-       return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag,
-                                       nr_pages, pages);
- }
+ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
+               pgoff_t end, xa_mark_t tag, struct folio_batch *fbatch);
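
A hedged writeback-style sketch of the batched tag lookup that replaces
find_get_pages_range_tag(); my_scan_dirty() is illustrative and elides locking and the
actual per-folio work.

#include <linux/pagemap.h>
#include <linux/pagevec.h>

static void my_scan_dirty(struct address_space *mapping)
{
	struct folio_batch fbatch;
	pgoff_t index = 0;
	unsigned int i, nr;

	folio_batch_init(&fbatch);
	while ((nr = filemap_get_folios_tag(mapping, &index, (pgoff_t)-1,
					    PAGECACHE_TAG_DIRTY, &fbatch))) {
		for (i = 0; i < nr; i++) {
			/* ... lock, write back and unlock fbatch.folios[i] ... */
		}
		folio_batch_release(&fbatch);
	}
}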
  
  struct page *grab_cache_page_write_begin(struct address_space *mapping,
                        pgoff_t index);
@@@ -744,6 -756,8 +756,8 @@@ static inline struct page *grab_cache_p
  
  struct folio *read_cache_folio(struct address_space *, pgoff_t index,
                filler_t *filler, struct file *file);
+ struct folio *mapping_read_folio_gfp(struct address_space *, pgoff_t index,
+               gfp_t flags);
  struct page *read_cache_page(struct address_space *, pgoff_t index,
                filler_t *filler, struct file *file);
  extern struct page * read_cache_page_gfp(struct address_space *mapping,
@@@ -977,6 -991,16 +991,6 @@@ static inline int folio_lock_killable(s
        return 0;
  }
  
 -/*
 - * lock_page_killable is like lock_page but can be interrupted by fatal
 - * signals.  It returns 0 if it locked the page and -EINTR if it was
 - * killed while waiting.
 - */
 -static inline int lock_page_killable(struct page *page)
 -{
 -      return folio_lock_killable(page_folio(page));
 -}
 -
  /*
   * folio_lock_or_retry - Lock the folio, unless this would block and the
   * caller indicated that it can handle a retry.
diff --combined init/main.c
index 669cb892e6c179f8934dd4638e64076a0eca8023,64cd2ff051c429de760ee1450ed42f7bac0adf22..4425d1783d5c21e8f1da703e503a5edc37a3d93f
@@@ -156,7 -156,7 +156,7 @@@ static char *extra_init_args
  
  #ifdef CONFIG_BOOT_CONFIG
  /* Is bootconfig on command line? */
 -static bool bootconfig_found;
 +static bool bootconfig_found = IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE);
  static size_t initargs_offs;
  #else
  # define bootconfig_found false
@@@ -855,8 -855,8 +855,8 @@@ static void __init mm_init(void
        pgtable_init();
        debug_objects_mem_init();
        vmalloc_init();
-       /* Should be run after vmap initialization */
-       if (early_page_ext_enabled())
+       /* If struct pages are not deferred, init page_ext now, as vmap is fully initialized */
+       if (!deferred_struct_pages)
                page_ext_init();
        /* Should be run before the first non-init thread is created */
        init_espfix_bsp();
@@@ -1628,7 -1628,7 +1628,7 @@@ static noinline void __init kernel_init
        padata_init();
        page_alloc_init_late();
        /* Initialize page ext after all struct pages are initialized. */
-       if (!early_page_ext_enabled())
+       if (deferred_struct_pages)
                page_ext_init();
  
        do_basic_setup();
diff --combined io_uring/io_uring.c
index 3b915deb4d08f4d705cf75618295941cede5dc98,3a934f73313672b5cbc7e0b64d4b52844fbb2ad9..1df68da89f998660d54bc4999430f6219000b8a2
@@@ -151,7 -151,7 +151,7 @@@ static void io_move_task_work_from_loca
  static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
  static __cold void io_fallback_tw(struct io_uring_task *tctx);
  
 -static struct kmem_cache *req_cachep;
 +struct kmem_cache *req_cachep;
  
  struct sock *io_uring_get_socket(struct file *file)
  {
@@@ -230,7 -230,6 +230,7 @@@ static inline void req_fail_link_node(s
  static inline void io_req_add_to_cache(struct io_kiocb *req, struct io_ring_ctx *ctx)
  {
        wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
 +      kasan_poison_object_data(req_cachep, req);
  }
  
  static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
@@@ -246,15 -245,17 +246,15 @@@ static __cold void io_fallback_req_func
                                                fallback_work.work);
        struct llist_node *node = llist_del_all(&ctx->fallback_llist);
        struct io_kiocb *req, *tmp;
 -      bool locked = false;
 +      bool locked = true;
  
 -      percpu_ref_get(&ctx->refs);
 +      mutex_lock(&ctx->uring_lock);
        llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
                req->io_task_work.func(req, &locked);
 -
 -      if (locked) {
 -              io_submit_flush_completions(ctx);
 -              mutex_unlock(&ctx->uring_lock);
 -      }
 -      percpu_ref_put(&ctx->refs);
 +      if (WARN_ON_ONCE(!locked))
 +              return;
 +      io_submit_flush_completions(ctx);
 +      mutex_unlock(&ctx->uring_lock);
  }
  
  static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
@@@ -315,7 -316,6 +315,7 @@@ static __cold struct io_ring_ctx *io_ri
        xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
        mutex_init(&ctx->uring_lock);
        init_waitqueue_head(&ctx->cq_wait);
 +      init_waitqueue_head(&ctx->poll_wq);
        spin_lock_init(&ctx->completion_lock);
        spin_lock_init(&ctx->timeout_lock);
        INIT_WQ_LIST(&ctx->iopoll_list);
@@@ -407,7 -407,7 +407,7 @@@ static inline void io_arm_ltimeout(stru
  
  static void io_prep_async_work(struct io_kiocb *req)
  {
 -      const struct io_op_def *def = &io_op_defs[req->opcode];
 +      const struct io_issue_def *def = &io_issue_defs[req->opcode];
        struct io_ring_ctx *ctx = req->ctx;
  
        if (!(req->flags & REQ_F_CREDS)) {
@@@ -572,8 -572,6 +572,8 @@@ static void io_eventfd_flush_signal(str
  
  void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
  {
 +      if (ctx->poll_activated)
 +              io_poll_wq_wake(ctx);
        if (ctx->off_timeout_used)
                io_flush_timeouts(ctx);
        if (ctx->drain_active) {
@@@ -620,25 -618,6 +620,25 @@@ static inline void __io_cq_unlock_post(
        io_cqring_wake(ctx);
  }
  
 +static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
 +      __releases(ctx->completion_lock)
 +{
 +      io_commit_cqring(ctx);
 +      __io_cq_unlock(ctx);
 +      io_commit_cqring_flush(ctx);
 +
 +      /*
 +       * As ->task_complete implies that the ring is single tasked, cq_wait
 +       * may only be waited on by the current in io_cqring_wait(), but since
 +       * it will re-check the wakeup conditions once we return we can safely
 +       * skip waking it up.
 +       */
 +      if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
 +              smp_mb();
 +              __io_cqring_wake(ctx);
 +      }
 +}
 +
  void io_cq_unlock_post(struct io_ring_ctx *ctx)
        __releases(ctx->completion_lock)
  {
@@@ -666,6 -645,7 +666,6 @@@ static void io_cqring_overflow_kill(str
        }
  }
  
 -/* Returns true if there are no backlogged entries after the flush */
  static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
  {
        size_t cqe_size = sizeof(struct io_uring_cqe);
@@@ -713,8 -693,7 +713,8 @@@ static void io_cqring_overflow_flush(st
                io_cqring_do_overflow_flush(ctx);
  }
  
 -void __io_put_task(struct task_struct *task, int nr)
 +/* can be called by any task */
 +static void io_put_task_remote(struct task_struct *task, int nr)
  {
        struct io_uring_task *tctx = task->io_uring;
  
        put_task_struct_many(task, nr);
  }
  
 +/* used by a task to put its own references */
 +static void io_put_task_local(struct task_struct *task, int nr)
 +{
 +      task->io_uring->cached_refs += nr;
 +}
 +
 +/* must be called shortly after putting a request */
 +static inline void io_put_task(struct task_struct *task, int nr)
 +{
 +      if (likely(task == current))
 +              io_put_task_local(task, nr);
 +      else
 +              io_put_task_remote(task, nr);
 +}
 +
  void io_task_refs_refill(struct io_uring_task *tctx)
  {
        unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR;
@@@ -981,15 -945,15 +981,15 @@@ static void __io_req_complete_post(stru
                                req->link = NULL;
                        }
                }
 +              io_put_kbuf_comp(req);
 +              io_dismantle_req(req);
                io_req_put_rsrc(req);
                /*
                 * Selected buffer deallocation in io_clean_op() assumes that
                 * we don't hold ->completion_lock. Clean them here to avoid
                 * deadlocks.
                 */
 -              io_put_kbuf_comp(req);
 -              io_dismantle_req(req);
 -              io_put_task(req->task, 1);
 +              io_put_task_remote(req->task, 1);
                wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
                ctx->locked_free_nr++;
        }
@@@ -1016,7 -980,7 +1016,7 @@@ void io_req_complete_post(struct io_kio
  void io_req_defer_failed(struct io_kiocb *req, s32 res)
        __must_hold(&ctx->uring_lock)
  {
 -      const struct io_op_def *def = &io_op_defs[req->opcode];
 +      const struct io_cold_def *def = &io_cold_defs[req->opcode];
  
        lockdep_assert_held(&req->ctx->uring_lock);
  
@@@ -1112,7 -1076,7 +1112,7 @@@ __cold void io_free_req(struct io_kioc
  
        io_req_put_rsrc(req);
        io_dismantle_req(req);
 -      io_put_task(req->task, 1);
 +      io_put_task_remote(req->task, 1);
  
        spin_lock(&ctx->completion_lock);
        wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
@@@ -1166,7 -1130,7 +1166,7 @@@ static unsigned int handle_tw_list(stru
  {
        unsigned int count = 0;
  
 -      while (node != last) {
 +      while (node && node != last) {
                struct llist_node *next = node->next;
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    io_task_work.node);
                        /* if not contended, grab and improve batching */
                        *locked = mutex_trylock(&(*ctx)->uring_lock);
                        percpu_ref_get(&(*ctx)->refs);
 -              }
 +              } else if (!*locked)
 +                      *locked = mutex_trylock(&(*ctx)->uring_lock);
                req->io_task_work.func(req, locked);
                node = next;
                count++;
 +              if (unlikely(need_resched())) {
 +                      ctx_flush_and_put(*ctx, locked);
 +                      *ctx = NULL;
 +                      cond_resched();
 +              }
        }
  
        return count;
@@@ -1232,29 -1190,23 +1232,29 @@@ void tctx_task_work(struct callback_hea
                                                  task_work);
        struct llist_node fake = {};
        struct llist_node *node;
 -      unsigned int loops = 1;
 -      unsigned int count;
 +      unsigned int loops = 0;
 +      unsigned int count = 0;
  
        if (unlikely(current->flags & PF_EXITING)) {
                io_fallback_tw(tctx);
                return;
        }
  
 -      node = io_llist_xchg(&tctx->task_list, &fake);
 -      count = handle_tw_list(node, &ctx, &uring_locked, NULL);
 -      node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
 -      while (node != &fake) {
 +      do {
                loops++;
                node = io_llist_xchg(&tctx->task_list, &fake);
                count += handle_tw_list(node, &ctx, &uring_locked, &fake);
 +
 +              /* skip expensive cmpxchg if there are items in the list */
 +              if (READ_ONCE(tctx->task_list.first) != &fake)
 +                      continue;
 +              if (uring_locked && !wq_list_empty(&ctx->submit_state.compl_reqs)) {
 +                      io_submit_flush_completions(ctx);
 +                      if (READ_ONCE(tctx->task_list.first) != &fake)
 +                              continue;
 +              }
                node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
 -      }
 +      } while (node != &fake);
  
        ctx_flush_and_put(ctx, &uring_locked);
  
@@@ -1289,7 -1241,7 +1289,7 @@@ static void io_req_local_work_add(struc
                percpu_ref_put(&ctx->refs);
                return;
        }
 -      /* need it for the following io_cqring_wake() */
 +      /* needed for the following wake up */
        smp_mb__after_atomic();
  
        if (unlikely(atomic_read(&req->task->io_uring->in_idle))) {
  
        if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
                atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 -
        if (ctx->has_evfd)
                io_eventfd_signal(ctx);
 -      __io_cqring_wake(ctx);
 +
 +      if (READ_ONCE(ctx->cq_waiting))
 +              wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
        percpu_ref_put(&ctx->refs);
  }
  
@@@ -1345,19 -1296,21 +1345,19 @@@ static void __cold io_move_task_work_fr
        }
  }
  
 -int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
 +static int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
  {
        struct llist_node *node;
 -      struct llist_node fake;
 -      struct llist_node *current_final = NULL;
 -      int ret;
 -      unsigned int loops = 1;
 +      unsigned int loops = 0;
 +      int ret = 0;
  
 -      if (unlikely(ctx->submitter_task != current))
 +      if (WARN_ON_ONCE(ctx->submitter_task != current))
                return -EEXIST;
 -
 -      node = io_llist_xchg(&ctx->work_llist, &fake);
 -      ret = 0;
 +      if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
 +              atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
  again:
 -      while (node != current_final) {
 +      node = io_llist_xchg(&ctx->work_llist, NULL);
 +      while (node) {
                struct llist_node *next = node->next;
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    io_task_work.node);
                ret++;
                node = next;
        }
 +      loops++;
  
 -      if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
 -              atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 -
 -      node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL);
 -      if (node != &fake) {
 -              loops++;
 -              current_final = &fake;
 -              node = io_llist_xchg(&ctx->work_llist, &fake);
 +      if (!llist_empty(&ctx->work_llist))
                goto again;
 -      }
 -
 -      if (*locked)
 +      if (*locked) {
                io_submit_flush_completions(ctx);
 +              if (!llist_empty(&ctx->work_llist))
 +                      goto again;
 +      }
        trace_io_uring_local_work_run(ctx, ret, loops);
        return ret;
 -
  }
  
 -int io_run_local_work(struct io_ring_ctx *ctx)
 +static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
  {
        bool locked;
        int ret;
        if (llist_empty(&ctx->work_llist))
                return 0;
  
 -      __set_current_state(TASK_RUNNING);
 -      locked = mutex_trylock(&ctx->uring_lock);
 +      locked = true;
 +      ret = __io_run_local_work(ctx, &locked);
 +      /* shouldn't happen! */
 +      if (WARN_ON_ONCE(!locked))
 +              mutex_lock(&ctx->uring_lock);
 +      return ret;
 +}
 +
 +static int io_run_local_work(struct io_ring_ctx *ctx)
 +{
 +      bool locked = mutex_trylock(&ctx->uring_lock);
 +      int ret;
 +
        ret = __io_run_local_work(ctx, &locked);
        if (locked)
                mutex_unlock(&ctx->uring_lock);
@@@ -1417,12 -1365,10 +1417,12 @@@ void io_req_task_submit(struct io_kioc
  {
        io_tw_lock(req->ctx, locked);
        /* req->task == current here, checking PF_EXITING is safe */
 -      if (likely(!(req->task->flags & PF_EXITING)))
 -              io_queue_sqe(req);
 -      else
 +      if (unlikely(req->task->flags & PF_EXITING))
                io_req_defer_failed(req, -EFAULT);
 +      else if (req->flags & REQ_F_FORCE_ASYNC)
 +              io_queue_iowq(req, locked);
 +      else
 +              io_queue_sqe(req);
  }
  
  void io_req_task_queue_fail(struct io_kiocb *req, int ret)
@@@ -1521,7 -1467,7 +1521,7 @@@ static void __io_submit_flush_completio
                        }
                }
        }
 -      __io_cq_unlock_post(ctx);
 +      __io_cq_unlock_post_flush(ctx);
  
        if (!wq_list_empty(&ctx->submit_state.compl_reqs)) {
                io_free_batch_list(ctx, state->compl_reqs.first);
@@@ -1762,8 -1708,8 +1762,8 @@@ unsigned int io_file_get_flags(struct f
  
  bool io_alloc_async_data(struct io_kiocb *req)
  {
 -      WARN_ON_ONCE(!io_op_defs[req->opcode].async_size);
 -      req->async_data = kmalloc(io_op_defs[req->opcode].async_size, GFP_KERNEL);
 +      WARN_ON_ONCE(!io_cold_defs[req->opcode].async_size);
 +      req->async_data = kmalloc(io_cold_defs[req->opcode].async_size, GFP_KERNEL);
        if (req->async_data) {
                req->flags |= REQ_F_ASYNC_DATA;
                return false;
  
  int io_req_prep_async(struct io_kiocb *req)
  {
 -      const struct io_op_def *def = &io_op_defs[req->opcode];
 +      const struct io_cold_def *cdef = &io_cold_defs[req->opcode];
 +      const struct io_issue_def *def = &io_issue_defs[req->opcode];
  
        /* assign early for deferred execution for non-fixed file */
        if (def->needs_file && !(req->flags & REQ_F_FIXED_FILE))
                req->file = io_file_get_normal(req, req->cqe.fd);
 -      if (!def->prep_async)
 +      if (!cdef->prep_async)
                return 0;
        if (WARN_ON_ONCE(req_has_async_data(req)))
                return -EFAULT;
 -      if (!io_op_defs[req->opcode].manual_alloc) {
 +      if (!def->manual_alloc) {
                if (io_alloc_async_data(req))
                        return -EAGAIN;
        }
 -      return def->prep_async(req);
 +      return cdef->prep_async(req);
  }
  
  static u32 io_get_sequence(struct io_kiocb *req)
@@@ -1820,12 -1765,17 +1820,12 @@@ queue
        }
        spin_unlock(&ctx->completion_lock);
  
 -      ret = io_req_prep_async(req);
 -      if (ret) {
 -fail:
 -              io_req_defer_failed(req, ret);
 -              return;
 -      }
        io_prep_async_link(req);
        de = kmalloc(sizeof(*de), GFP_KERNEL);
        if (!de) {
                ret = -ENOMEM;
 -              goto fail;
 +              io_req_defer_failed(req, ret);
 +              return;
        }
  
        spin_lock(&ctx->completion_lock);
@@@ -1851,7 -1801,7 +1851,7 @@@ static void io_clean_op(struct io_kioc
        }
  
        if (req->flags & REQ_F_NEED_CLEANUP) {
 -              const struct io_op_def *def = &io_op_defs[req->opcode];
 +              const struct io_cold_def *def = &io_cold_defs[req->opcode];
  
                if (def->cleanup)
                        def->cleanup(req);
        req->flags &= ~IO_REQ_CLEAN_FLAGS;
  }
  
 -static bool io_assign_file(struct io_kiocb *req, unsigned int issue_flags)
 +static bool io_assign_file(struct io_kiocb *req, const struct io_issue_def *def,
 +                         unsigned int issue_flags)
  {
 -      if (req->file || !io_op_defs[req->opcode].needs_file)
 +      if (req->file || !def->needs_file)
                return true;
  
        if (req->flags & REQ_F_FIXED_FILE)
  
  static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
  {
 -      const struct io_op_def *def = &io_op_defs[req->opcode];
 +      const struct io_issue_def *def = &io_issue_defs[req->opcode];
        const struct cred *creds = NULL;
        int ret;
  
 -      if (unlikely(!io_assign_file(req, issue_flags)))
 +      if (unlikely(!io_assign_file(req, def, issue_flags)))
                return -EBADF;
  
        if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
@@@ -1945,7 -1894,7 +1945,7 @@@ struct io_wq_work *io_wq_free_work(stru
  void io_wq_submit_work(struct io_wq_work *work)
  {
        struct io_kiocb *req = container_of(work, struct io_kiocb, work);
 -      const struct io_op_def *def = &io_op_defs[req->opcode];
 +      const struct io_issue_def *def = &io_issue_defs[req->opcode];
        unsigned int issue_flags = IO_URING_F_UNLOCKED | IO_URING_F_IOWQ;
        bool needs_poll = false;
        int ret = 0, err = -ECANCELED;
@@@ -1964,7 -1913,7 +1964,7 @@@ fail
                io_req_task_queue_fail(req, err);
                return;
        }
 -      if (!io_assign_file(req, issue_flags)) {
 +      if (!io_assign_file(req, def, issue_flags)) {
                err = -EBADF;
                work->flags |= IO_WQ_WORK_CANCEL;
                goto fail;
@@@ -2099,16 -2048,13 +2099,16 @@@ static void io_queue_sqe_fallback(struc
                req->flags &= ~REQ_F_HARDLINK;
                req->flags |= REQ_F_LINK;
                io_req_defer_failed(req, req->cqe.res);
 -      } else if (unlikely(req->ctx->drain_active)) {
 -              io_drain_req(req);
        } else {
                int ret = io_req_prep_async(req);
  
 -              if (unlikely(ret))
 +              if (unlikely(ret)) {
                        io_req_defer_failed(req, ret);
 +                      return;
 +              }
 +
 +              if (unlikely(req->ctx->drain_active))
 +                      io_drain_req(req);
                else
                        io_queue_iowq(req, NULL);
        }
@@@ -2160,7 -2106,7 +2160,7 @@@ static int io_init_req(struct io_ring_c
                       const struct io_uring_sqe *sqe)
        __must_hold(&ctx->uring_lock)
  {
 -      const struct io_op_def *def;
 +      const struct io_issue_def *def;
        unsigned int sqe_flags;
        int personality;
        u8 opcode;
                req->opcode = 0;
                return -EINVAL;
        }
 -      def = &io_op_defs[opcode];
 +      def = &io_issue_defs[opcode];
        if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) {
                /* enforce forwards compatibility on users */
                if (sqe_flags & ~SQE_VALID_FLAGS)
@@@ -2389,7 -2335,7 +2389,7 @@@ static void io_commit_sqring(struct io_
   * used, it's important that those reads are done through READ_ONCE() to
   * prevent a re-load down the line.
   */
 -static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 +static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
  {
        unsigned head, mask = ctx->sq_entries - 1;
        unsigned sq_idx = ctx->cached_sq_head++ & mask;
                /* double index for 128-byte SQEs, twice as long */
                if (ctx->flags & IORING_SETUP_SQE128)
                        head <<= 1;
 -              return &ctx->sq_sqes[head];
 +              *sqe = &ctx->sq_sqes[head];
 +              return true;
        }
  
        /* drop invalid entries */
        ctx->cq_extra--;
        WRITE_ONCE(ctx->rings->sq_dropped,
                   READ_ONCE(ctx->rings->sq_dropped) + 1);
 -      return NULL;
 +      return false;
  }
  
  int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
                const struct io_uring_sqe *sqe;
                struct io_kiocb *req;
  
 -              if (unlikely(!io_alloc_req_refill(ctx)))
 +              if (unlikely(!io_alloc_req(ctx, &req)))
                        break;
 -              req = io_alloc_req(ctx);
 -              sqe = io_get_sqe(ctx);
 -              if (unlikely(!sqe)) {
 +              if (unlikely(!io_get_sqe(ctx, &sqe))) {
                        io_req_add_to_cache(req, ctx);
                        break;
                }
@@@ -2473,13 -2420,13 +2473,13 @@@ struct io_wait_queue 
        struct io_ring_ctx *ctx;
        unsigned cq_tail;
        unsigned nr_timeouts;
 +      ktime_t timeout;
  };
  
  static inline bool io_has_work(struct io_ring_ctx *ctx)
  {
        return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
 -             ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
 -              !llist_empty(&ctx->work_llist));
 +             !llist_empty(&ctx->work_llist);
  }
  
  static inline bool io_should_wake(struct io_wait_queue *iowq)
  static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
                            int wake_flags, void *key)
  {
 -      struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
 -                                                      wq);
 -      struct io_ring_ctx *ctx = iowq->ctx;
 +      struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue, wq);
  
        /*
         * Cannot safely flush overflowed CQEs from here, ensure we wake up
         * the task, and the next invocation will do it.
         */
 -      if (io_should_wake(iowq) || io_has_work(ctx))
 +      if (io_should_wake(iowq) || io_has_work(iowq->ctx))
                return autoremove_wake_function(curr, mode, wake_flags, key);
        return -1;
  }
  
  int io_run_task_work_sig(struct io_ring_ctx *ctx)
  {
 -      if (io_run_task_work_ctx(ctx) > 0)
 +      if (!llist_empty(&ctx->work_llist)) {
 +              __set_current_state(TASK_RUNNING);
 +              if (io_run_local_work(ctx) > 0)
 +                      return 1;
 +      }
 +      if (io_run_task_work() > 0)
                return 1;
        if (task_sigpending(current))
                return -EINTR;
  
  /* when returns >0, the caller should retry */
  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 -                                        struct io_wait_queue *iowq,
 -                                        ktime_t *timeout)
 +                                        struct io_wait_queue *iowq)
  {
 -      int ret;
 -      unsigned long check_cq;
 -
 -      /* make sure we run task_work before checking for signals */
 -      ret = io_run_task_work_sig(ctx);
 -      if (ret || io_should_wake(iowq))
 -              return ret;
 -
 -      check_cq = READ_ONCE(ctx->check_cq);
 -      if (unlikely(check_cq)) {
 -              /* let the caller flush overflows, retry */
 -              if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
 -                      return 1;
 -              if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
 -                      return -EBADR;
 -      }
 -      if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
 +      if (unlikely(READ_ONCE(ctx->check_cq)))
 +              return 1;
 +      if (unlikely(!llist_empty(&ctx->work_llist)))
 +              return 1;
 +      if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
 +              return 1;
 +      if (unlikely(task_sigpending(current)))
 +              return -EINTR;
 +      if (unlikely(io_should_wake(iowq)))
 +              return 0;
 +      if (iowq->timeout == KTIME_MAX)
 +              schedule();
 +      else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
                return -ETIME;
 -
 -      /*
 -       * Run task_work after scheduling. If we got woken because of
 -       * task_work being processed, run it now rather than let the caller
 -       * do another wait loop.
 -       */
 -      ret = io_run_task_work_sig(ctx);
 -      return ret < 0 ? ret : 1;
 +      return 0;
  }
  
  /*
@@@ -2554,17 -2510,23 +2554,17 @@@ static int io_cqring_wait(struct io_rin
  {
        struct io_wait_queue iowq;
        struct io_rings *rings = ctx->rings;
 -      ktime_t timeout = KTIME_MAX;
        int ret;
  
        if (!io_allowed_run_tw(ctx))
                return -EEXIST;
 -
 -      do {
 -              /* always run at least 1 task work to process local work */
 -              ret = io_run_task_work_ctx(ctx);
 -              if (ret < 0)
 -                      return ret;
 -              io_cqring_overflow_flush(ctx);
 -
 -              /* if user messes with these they will just get an early return */
 -              if (__io_cqring_events_user(ctx) >= min_events)
 -                      return 0;
 -      } while (ret > 0);
 +      if (!llist_empty(&ctx->work_llist))
 +              io_run_local_work(ctx);
 +      io_run_task_work();
 +      io_cqring_overflow_flush(ctx);
 +      /* if user messes with these they will just get an early return */
 +      if (__io_cqring_events_user(ctx) >= min_events)
 +              return 0;
  
        if (sig) {
  #ifdef CONFIG_COMPAT
                        return ret;
        }
  
 -      if (uts) {
 -              struct timespec64 ts;
 -
 -              if (get_timespec64(&ts, uts))
 -                      return -EFAULT;
 -              timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
 -      }
 -
        init_waitqueue_func_entry(&iowq.wq, io_wake_function);
        iowq.wq.private = current;
        INIT_LIST_HEAD(&iowq.wq.entry);
        iowq.ctx = ctx;
        iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
        iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 +      iowq.timeout = KTIME_MAX;
 +
 +      if (uts) {
 +              struct timespec64 ts;
 +
 +              if (get_timespec64(&ts, uts))
 +                      return -EFAULT;
 +              iowq.timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
 +      }
  
        trace_io_uring_cqring_wait(ctx, min_events);
        do {
 -              if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
 -                      finish_wait(&ctx->cq_wait, &iowq.wq);
 -                      io_cqring_do_overflow_flush(ctx);
 +              unsigned long check_cq;
 +
 +              if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
 +                      WRITE_ONCE(ctx->cq_waiting, 1);
 +                      set_current_state(TASK_INTERRUPTIBLE);
 +              } else {
 +                      prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
 +                                                      TASK_INTERRUPTIBLE);
 +              }
 +
 +              ret = io_cqring_wait_schedule(ctx, &iowq);
 +              __set_current_state(TASK_RUNNING);
 +              WRITE_ONCE(ctx->cq_waiting, 0);
 +
 +              if (ret < 0)
 +                      break;
 +              /*
 +               * Run task_work after scheduling and before io_should_wake().
 +               * If we got woken because of task_work being processed, run it
 +               * now rather than let the caller do another wait loop.
 +               */
 +              io_run_task_work();
 +              if (!llist_empty(&ctx->work_llist))
 +                      io_run_local_work(ctx);
 +
 +              check_cq = READ_ONCE(ctx->check_cq);
 +              if (unlikely(check_cq)) {
 +                      /* let the caller flush overflows, retry */
 +                      if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
 +                              io_cqring_do_overflow_flush(ctx);
 +                      if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)) {
 +                              ret = -EBADR;
 +                              break;
 +                      }
                }
 -              prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
 -                                              TASK_INTERRUPTIBLE);
 -              ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
 -              if (__io_cqring_events_user(ctx) >= min_events)
 +
 +              if (io_should_wake(&iowq)) {
 +                      ret = 0;
                        break;
 +              }
                cond_resched();
 -      } while (ret > 0);
 +      } while (1);
  
 -      finish_wait(&ctx->cq_wait, &iowq.wq);
 +      if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
 +              finish_wait(&ctx->cq_wait, &iowq.wq);
        restore_saved_sigmask_unless(ret == -EINTR);
  
        return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
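
Editor's note: the rewritten io_cqring_wait() above is a specialised instance of the standard wait-queue sleep loop: queue yourself, re-check the condition, sleep, and finish the wait once the condition holds. A generic kernel-style sketch of that idiom (waitq and condition() are placeholders, not io_uring symbols):

    DEFINE_WAIT(wait);

    for (;;) {
            prepare_to_wait(&waitq, &wait, TASK_INTERRUPTIBLE);
            if (condition())                /* re-check after queueing ourselves */
                    break;
            if (signal_pending(current))    /* honour signals while sleeping */
                    break;
            schedule();                     /* or schedule_hrtimeout() for a deadline */
    }
    finish_wait(&waitq, &wait);

io_cqring_wait() departs from the textbook form in that it runs task_work between waking up and re-checking io_should_wake(), which is why the re-check now lives in the outer loop rather than in io_cqring_wait_schedule().
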
@@@ -2756,14 -2685,14 +2756,14 @@@ static int io_eventfd_unregister(struc
  
  static void io_req_caches_free(struct io_ring_ctx *ctx)
  {
 +      struct io_kiocb *req;
        int nr = 0;
  
        mutex_lock(&ctx->uring_lock);
        io_flush_cached_locked_reqs(ctx, &ctx->submit_state);
  
        while (!io_req_cache_empty(ctx)) {
 -              struct io_kiocb *req = io_alloc_req(ctx);
 -
 +              req = io_extract_req(ctx);
                kmem_cache_free(req_cachep, req);
                nr++;
        }
@@@ -2835,54 -2764,12 +2835,54 @@@ static __cold void io_ring_ctx_free(str
        kfree(ctx);
  }
  
 +static __cold void io_activate_pollwq_cb(struct callback_head *cb)
 +{
 +      struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
 +                                             poll_wq_task_work);
 +
 +      mutex_lock(&ctx->uring_lock);
 +      ctx->poll_activated = true;
 +      mutex_unlock(&ctx->uring_lock);
 +
 +      /*
 +       * Wake ups for some events between start of polling and activation
 +       * might've been lost due to loose synchronisation.
 +       */
 +      wake_up_all(&ctx->poll_wq);
 +      percpu_ref_put(&ctx->refs);
 +}
 +
 +static __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
 +{
 +      spin_lock(&ctx->completion_lock);
 +      /* already activated or in progress */
 +      if (ctx->poll_activated || ctx->poll_wq_task_work.func)
 +              goto out;
 +      if (WARN_ON_ONCE(!ctx->task_complete))
 +              goto out;
 +      if (!ctx->submitter_task)
 +              goto out;
 +      /*
 +       * with ->submitter_task only the submitter task completes requests, we
 +       * only need to sync with it, which is done by injecting a tw
 +       */
 +      init_task_work(&ctx->poll_wq_task_work, io_activate_pollwq_cb);
 +      percpu_ref_get(&ctx->refs);
 +      if (task_work_add(ctx->submitter_task, &ctx->poll_wq_task_work, TWA_SIGNAL))
 +              percpu_ref_put(&ctx->refs);
 +out:
 +      spin_unlock(&ctx->completion_lock);
 +}
 +
  static __poll_t io_uring_poll(struct file *file, poll_table *wait)
  {
        struct io_ring_ctx *ctx = file->private_data;
        __poll_t mask = 0;
  
 -      poll_wait(file, &ctx->cq_wait, wait);
 +      if (unlikely(!ctx->poll_activated))
 +              io_activate_pollwq(ctx);
 +
 +      poll_wait(file, &ctx->poll_wq, wait);
        /*
         * synchronizes with barrier from wq_has_sleeper call in
         * io_commit_cqring
         * pushes them to do the flush.
         */
  
 -      if (io_cqring_events(ctx) || io_has_work(ctx))
 +      if (__io_cqring_events_user(ctx) || io_has_work(ctx))
                mask |= EPOLLIN | EPOLLRDNORM;
  
        return mask;
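
Editor's note: from user space, the lazily activated poll_wq above only comes into play when the ring file descriptor itself is polled, e.g. added to an epoll set; EPOLLIN is reported once completions or other work are pending. A minimal sketch (ring_fd and epfd are placeholders for descriptors the application already owns):

    #include <sys/epoll.h>

    struct epoll_event ev = {
            .events  = EPOLLIN,
            .data.fd = ring_fd,           /* fd returned by io_uring_setup() */
    };
    epoll_ctl(epfd, EPOLL_CTL_ADD, ring_fd, &ev);
    /* epoll_wait() now reports EPOLLIN when CQEs are ready to reap */
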
@@@ -3168,12 -3055,10 +3168,12 @@@ static __cold bool io_uring_try_cancel_
                while (!wq_list_empty(&ctx->iopoll_list)) {
                        io_iopoll_try_reap_events(ctx);
                        ret = true;
 +                      cond_resched();
                }
        }
  
 -      if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 +      if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
 +          io_allowed_defer_tw_run(ctx))
                ret |= io_run_local_work(ctx) > 0;
        ret |= io_cancel_defer_files(ctx, task, cancel_all);
        mutex_lock(&ctx->uring_lock);
@@@ -3321,7 -3206,7 +3321,7 @@@ static __cold int io_uring_mmap(struct 
  
  static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
  {
-       return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -EINVAL;
+       return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
  }
  
  static unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
@@@ -3445,9 -3330,11 +3445,9 @@@ SYSCALL_DEFINE6(io_uring_enter, unsigne
                }
                if (flags & IORING_ENTER_SQ_WAKEUP)
                        wake_up(&ctx->sq_data->wait);
 -              if (flags & IORING_ENTER_SQ_WAIT) {
 -                      ret = io_sqpoll_wait_sq(ctx);
 -                      if (ret)
 -                              goto out;
 -              }
 +              if (flags & IORING_ENTER_SQ_WAIT)
 +                      io_sqpoll_wait_sq(ctx);
 +
                ret = to_submit;
        } else if (to_submit) {
                ret = io_uring_add_tctx_node(ctx);
@@@ -3687,13 -3574,6 +3687,13 @@@ static __cold int io_uring_create(unsig
            !(ctx->flags & IORING_SETUP_SQPOLL))
                ctx->task_complete = true;
  
 +      /*
 +       * lazy poll_wq activation relies on ->task_complete for synchronisation
 +       * purposes, see io_activate_pollwq()
 +       */
 +      if (!ctx->task_complete)
 +              ctx->poll_activated = true;
 +
        /*
         * When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
         * space applications don't need to do io completion events
                        IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
                        IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
                        IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
 -                      IORING_FEAT_LINKED_FILE;
 +                      IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
  
        if (copy_to_user(params, p, sizeof(*p))) {
                ret = -EFAULT;
  
        if (ctx->flags & IORING_SETUP_SINGLE_ISSUER
            && !(ctx->flags & IORING_SETUP_R_DISABLED))
 -              ctx->submitter_task = get_task_struct(current);
 +              WRITE_ONCE(ctx->submitter_task, get_task_struct(current));
  
        file = io_uring_get_file(ctx);
        if (IS_ERR(file)) {
@@@ -3882,7 -3762,7 +3882,7 @@@ static __cold int io_probe(struct io_ri
  
        for (i = 0; i < nr_args; i++) {
                p->ops[i].op = i;
 -              if (!io_op_defs[i].not_supported)
 +              if (!io_issue_defs[i].not_supported)
                        p->ops[i].flags = IO_URING_OP_SUPPORTED;
        }
        p->ops_len = i;
@@@ -3987,15 -3867,8 +3987,15 @@@ static int io_register_enable_rings(str
        if (!(ctx->flags & IORING_SETUP_R_DISABLED))
                return -EBADFD;
  
 -      if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && !ctx->submitter_task)
 -              ctx->submitter_task = get_task_struct(current);
 +      if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && !ctx->submitter_task) {
 +              WRITE_ONCE(ctx->submitter_task, get_task_struct(current));
 +              /*
 +               * Lazy activation attempts would fail if it was polled before
 +               * submitter_task is set.
 +               */
 +              if (wq_has_sleeper(&ctx->poll_wq))
 +                      io_activate_pollwq(ctx);
 +      }
  
        if (ctx->restrictions.registered)
                ctx->restricted = 1;
@@@ -4306,36 -4179,17 +4306,36 @@@ SYSCALL_DEFINE4(io_uring_register, unsi
        struct io_ring_ctx *ctx;
        long ret = -EBADF;
        struct fd f;
 +      bool use_registered_ring;
 +
 +      use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
 +      opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
  
        if (opcode >= IORING_REGISTER_LAST)
                return -EINVAL;
  
 -      f = fdget(fd);
 -      if (!f.file)
 -              return -EBADF;
 +      if (use_registered_ring) {
 +              /*
 +               * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
 +               * need only dereference our task private array to find it.
 +               */
 +              struct io_uring_task *tctx = current->io_uring;
  
 -      ret = -EOPNOTSUPP;
 -      if (!io_is_uring_fops(f.file))
 -              goto out_fput;
 +              if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
 +                      return -EINVAL;
 +              fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
 +              f.file = tctx->registered_rings[fd];
 +              f.flags = 0;
 +              if (unlikely(!f.file))
 +                      return -EBADF;
 +      } else {
 +              f = fdget(fd);
 +              if (unlikely(!f.file))
 +                      return -EBADF;
 +              ret = -EOPNOTSUPP;
 +              if (!io_is_uring_fops(f.file))
 +                      goto out_fput;
 +      }
  
        ctx = f.file->private_data;
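
Editor's note: the hunk above lets io_uring_register(2) itself be driven through a previously registered ring file: user space ORs IORING_REGISTER_USE_REGISTERED_RING into the opcode and passes the registered-ring index where the fd normally goes. A hedged raw-syscall sketch, assuming uapi headers new enough to define the flag:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/io_uring.h>

    static long register_via_ring_index(unsigned int ring_index, unsigned int opcode,
                                        void *arg, unsigned int nr_args)
    {
            /* ring_index is the slot obtained via IORING_REGISTER_RING_FDS,
             * not a regular file descriptor. */
            return syscall(__NR_io_uring_register, ring_index,
                           opcode | IORING_REGISTER_USE_REGISTERED_RING,
                           arg, nr_args);
    }
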
  
diff --combined kernel/bpf/syscall.c
index e3fcdc9836a6c2eb6467e9dc5aad5eee80075bc8,9f56b442daa95ff8cb01d6394739afd4e6f2e060..adc83cb82f379df0c9692b5fdd24cbb33319b444
@@@ -181,7 -181,7 +181,7 @@@ static int bpf_map_update_value(struct 
        int err;
  
        /* Need to create a kthread, thus must support schedule */
 -      if (bpf_map_is_dev_bound(map)) {
 +      if (bpf_map_is_offloaded(map)) {
                return bpf_map_offload_update_elem(map, key, value, flags);
        } else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
                   map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
@@@ -238,7 -238,7 +238,7 @@@ static int bpf_map_copy_value(struct bp
        void *ptr;
        int err;
  
 -      if (bpf_map_is_dev_bound(map))
 +      if (bpf_map_is_offloaded(map))
                return bpf_map_offload_lookup_elem(map, key, value);
  
        bpf_disable_instrumentation();
@@@ -309,7 -309,7 +309,7 @@@ static void *__bpf_map_area_alloc(u64 s
         * __GFP_RETRY_MAYFAIL to avoid such situations.
         */
  
 -      const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
 +      gfp_t gfp = bpf_memcg_flags(__GFP_NOWARN | __GFP_ZERO);
        unsigned int flags = 0;
        unsigned long align = 1;
        void *area;
@@@ -390,7 -390,7 +390,7 @@@ static int bpf_map_alloc_id(struct bpf_
        return id > 0 ? 0 : id;
  }
  
 -void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 +void bpf_map_free_id(struct bpf_map *map)
  {
        unsigned long flags;
  
        if (!map->id)
                return;
  
 -      if (do_idr_lock)
 -              spin_lock_irqsave(&map_idr_lock, flags);
 -      else
 -              __acquire(&map_idr_lock);
 +      spin_lock_irqsave(&map_idr_lock, flags);
  
        idr_remove(&map_idr, map->id);
        map->id = 0;
  
 -      if (do_idr_lock)
 -              spin_unlock_irqrestore(&map_idr_lock, flags);
 -      else
 -              __release(&map_idr_lock);
 +      spin_unlock_irqrestore(&map_idr_lock, flags);
  }
  
  #ifdef CONFIG_MEMCG_KMEM
@@@ -418,8 -424,7 +418,8 @@@ static void bpf_map_save_memcg(struct b
         * So we have to check map->objcg for being NULL each time it's
         * being used.
         */
 -      map->objcg = get_obj_cgroup_from_current();
 +      if (memcg_bpf_enabled())
 +              map->objcg = get_obj_cgroup_from_current();
  }
  
  static void bpf_map_release_memcg(struct bpf_map *map)
@@@ -465,21 -470,6 +465,21 @@@ void *bpf_map_kzalloc(const struct bpf_
        return ptr;
  }
  
 +void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
 +                     gfp_t flags)
 +{
 +      struct mem_cgroup *memcg, *old_memcg;
 +      void *ptr;
 +
 +      memcg = bpf_map_get_memcg(map);
 +      old_memcg = set_active_memcg(memcg);
 +      ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
 +      set_active_memcg(old_memcg);
 +      mem_cgroup_put(memcg);
 +
 +      return ptr;
 +}
 +
  void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
                                    size_t align, gfp_t flags)
  {
@@@ -537,6 -527,9 +537,6 @@@ void btf_record_free(struct btf_record 
                return;
        for (i = 0; i < rec->cnt; i++) {
                switch (rec->fields[i].type) {
 -              case BPF_SPIN_LOCK:
 -              case BPF_TIMER:
 -                      break;
                case BPF_KPTR_UNREF:
                case BPF_KPTR_REF:
                        if (rec->fields[i].kptr.module)
                        break;
                case BPF_LIST_HEAD:
                case BPF_LIST_NODE:
 -                      /* Nothing to release for bpf_list_head */
 +              case BPF_RB_ROOT:
 +              case BPF_RB_NODE:
 +              case BPF_SPIN_LOCK:
 +              case BPF_TIMER:
 +                      /* Nothing to release */
                        break;
                default:
                        WARN_ON_ONCE(1);
@@@ -582,6 -571,9 +582,6 @@@ struct btf_record *btf_record_dup(cons
        new_rec->cnt = 0;
        for (i = 0; i < rec->cnt; i++) {
                switch (fields[i].type) {
 -              case BPF_SPIN_LOCK:
 -              case BPF_TIMER:
 -                      break;
                case BPF_KPTR_UNREF:
                case BPF_KPTR_REF:
                        btf_get(fields[i].kptr.btf);
                        break;
                case BPF_LIST_HEAD:
                case BPF_LIST_NODE:
 -                      /* Nothing to acquire for bpf_list_head */
 +              case BPF_RB_ROOT:
 +              case BPF_RB_NODE:
 +              case BPF_SPIN_LOCK:
 +              case BPF_TIMER:
 +                      /* Nothing to acquire */
                        break;
                default:
                        ret = -EFAULT;
@@@ -676,13 -664,7 +676,13 @@@ void bpf_obj_free_fields(const struct b
                                continue;
                        bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
                        break;
 +              case BPF_RB_ROOT:
 +                      if (WARN_ON_ONCE(rec->spin_lock_off < 0))
 +                              continue;
 +                      bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
 +                      break;
                case BPF_LIST_NODE:
 +              case BPF_RB_NODE:
                        break;
                default:
                        WARN_ON_ONCE(1);
@@@ -724,13 -706,13 +724,13 @@@ static void bpf_map_put_uref(struct bpf
  }
  
  /* decrement map refcnt and schedule it for freeing via workqueue
 - * (unrelying map implementation ops->map_free() might sleep)
 + * (underlying map implementation ops->map_free() might sleep)
   */
 -static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 +void bpf_map_put(struct bpf_map *map)
  {
        if (atomic64_dec_and_test(&map->refcnt)) {
                /* bpf_map_free_id() must be called first */
 -              bpf_map_free_id(map, do_idr_lock);
 +              bpf_map_free_id(map);
                btf_put(map->btf);
                INIT_WORK(&map->work, bpf_map_free_deferred);
                /* Avoid spawning kworkers, since they all might contend
                queue_work(system_unbound_wq, &map->work);
        }
  }
 -
 -void bpf_map_put(struct bpf_map *map)
 -{
 -      __bpf_map_put(map, true);
 -}
  EXPORT_SYMBOL_GPL(bpf_map_put);
  
  void bpf_map_put_with_uref(struct bpf_map *map)
@@@ -895,10 -882,10 +895,10 @@@ static int bpf_map_mmap(struct file *fi
        /* set default open/close callbacks */
        vma->vm_ops = &bpf_map_default_vmops;
        vma->vm_private_data = map;
-       vma->vm_flags &= ~VM_MAYEXEC;
+       vm_flags_clear(vma, VM_MAYEXEC);
        if (!(vma->vm_flags & VM_WRITE))
                /* disallow re-mapping with PROT_WRITE */
-               vma->vm_flags &= ~VM_MAYWRITE;
+               vm_flags_clear(vma, VM_MAYWRITE);
  
        err = map->ops->map_mmap(map, vma);
        if (err)
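
Editor's note: the two-line change in bpf_map_mmap() is part of a tree-wide conversion away from writing vma->vm_flags directly; the vm_flags_set()/vm_flags_clear() helpers perform the same bit operations while giving core MM a single place to hang locking assertions and bookkeeping. Roughly equivalent before/after, shown with flags taken from the hunks in this diff:

    /* before (open-coded):  */  vma->vm_flags &= ~VM_MAYEXEC;
    /* after (helper-based): */  vm_flags_clear(vma, VM_MAYEXEC);

    /* before: */  vma->vm_flags |= VM_DONTCOPY | VM_DONTDUMP;
    /* after:  */  vm_flags_set(vma, VM_DONTCOPY | VM_DONTDUMP);
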
@@@ -1018,8 -1005,7 +1018,8 @@@ static int map_check_btf(struct bpf_ma
                return -EINVAL;
  
        map->record = btf_parse_fields(btf, value_type,
 -                                     BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
 +                                     BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
 +                                     BPF_RB_ROOT,
                                       map->value_size);
        if (!IS_ERR_OR_NULL(map->record)) {
                int i;
                                }
                                break;
                        case BPF_LIST_HEAD:
 +                      case BPF_RB_ROOT:
                                if (map->map_type != BPF_MAP_TYPE_HASH &&
                                    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
                                    map->map_type != BPF_MAP_TYPE_ARRAY) {
@@@ -1498,7 -1483,7 +1498,7 @@@ static int map_delete_elem(union bpf_at
                goto err_put;
        }
  
 -      if (bpf_map_is_dev_bound(map)) {
 +      if (bpf_map_is_offloaded(map)) {
                err = bpf_map_offload_delete_elem(map, key);
                goto out;
        } else if (IS_FD_PROG_ARRAY(map) ||
@@@ -1562,7 -1547,7 +1562,7 @@@ static int map_get_next_key(union bpf_a
        if (!next_key)
                goto free_key;
  
 -      if (bpf_map_is_dev_bound(map)) {
 +      if (bpf_map_is_offloaded(map)) {
                err = bpf_map_offload_get_next_key(map, key, next_key);
                goto out;
        }
@@@ -1620,7 -1605,7 +1620,7 @@@ int generic_map_delete_batch(struct bpf
                                   map->key_size))
                        break;
  
 -              if (bpf_map_is_dev_bound(map)) {
 +              if (bpf_map_is_offloaded(map)) {
                        err = bpf_map_offload_delete_elem(map, key);
                        break;
                }
@@@ -1866,7 -1851,7 +1866,7 @@@ static int map_lookup_and_delete_elem(u
                   map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
                   map->map_type == BPF_MAP_TYPE_LRU_HASH ||
                   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
 -              if (!bpf_map_is_dev_bound(map)) {
 +              if (!bpf_map_is_offloaded(map)) {
                        bpf_disable_instrumentation();
                        rcu_read_lock();
                        err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags);
@@@ -1959,7 -1944,7 +1959,7 @@@ static int find_prog_type(enum bpf_prog
        if (!ops)
                return -EINVAL;
  
 -      if (!bpf_prog_is_dev_bound(prog->aux))
 +      if (!bpf_prog_is_offloaded(prog->aux))
                prog->aux->ops = ops;
        else
                prog->aux->ops = &bpf_offload_prog_ops;
@@@ -1987,7 -1972,7 +1987,7 @@@ static void bpf_audit_prog(const struc
                return;
        if (audit_enabled == AUDIT_OFF)
                return;
 -      if (op == BPF_AUDIT_LOAD)
 +      if (!in_irq() && !irqs_disabled())
                ctx = audit_context();
        ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF);
        if (unlikely(!ab))
@@@ -2016,7 -2001,7 +2016,7 @@@ static int bpf_prog_alloc_id(struct bpf
        return id > 0 ? 0 : id;
  }
  
 -void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 +void bpf_prog_free_id(struct bpf_prog *prog)
  {
        unsigned long flags;
  
        if (!prog->aux->id)
                return;
  
 -      if (do_idr_lock)
 -              spin_lock_irqsave(&prog_idr_lock, flags);
 -      else
 -              __acquire(&prog_idr_lock);
 -
 +      spin_lock_irqsave(&prog_idr_lock, flags);
        idr_remove(&prog_idr, prog->aux->id);
        prog->aux->id = 0;
 -
 -      if (do_idr_lock)
 -              spin_unlock_irqrestore(&prog_idr_lock, flags);
 -      else
 -              __release(&prog_idr_lock);
 +      spin_unlock_irqrestore(&prog_idr_lock, flags);
  }
  
  static void __bpf_prog_put_rcu(struct rcu_head *rcu)
@@@ -2074,15 -2067,17 +2074,15 @@@ static void bpf_prog_put_deferred(struc
        prog = aux->prog;
        perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_UNLOAD, 0);
        bpf_audit_prog(prog, BPF_AUDIT_UNLOAD);
 +      bpf_prog_free_id(prog);
        __bpf_prog_put_noref(prog, true);
  }
  
 -static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 +static void __bpf_prog_put(struct bpf_prog *prog)
  {
        struct bpf_prog_aux *aux = prog->aux;
  
        if (atomic64_dec_and_test(&aux->refcnt)) {
 -              /* bpf_prog_free_id() must be called first */
 -              bpf_prog_free_id(prog, do_idr_lock);
 -
                if (in_irq() || irqs_disabled()) {
                        INIT_WORK(&aux->work, bpf_prog_put_deferred);
                        schedule_work(&aux->work);
  
  void bpf_prog_put(struct bpf_prog *prog)
  {
 -      __bpf_prog_put(prog, true);
 +      __bpf_prog_put(prog);
  }
  EXPORT_SYMBOL_GPL(bpf_prog_put);
  
@@@ -2260,7 -2255,7 +2260,7 @@@ bool bpf_prog_get_ok(struct bpf_prog *p
  
        if (prog->type != *attach_type)
                return false;
 -      if (bpf_prog_is_dev_bound(prog->aux) && !attach_drv)
 +      if (bpf_prog_is_offloaded(prog->aux) && !attach_drv)
                return false;
  
        return true;
@@@ -2496,8 -2491,7 +2496,8 @@@ static int bpf_prog_load(union bpf_att
                                 BPF_F_TEST_STATE_FREQ |
                                 BPF_F_SLEEPABLE |
                                 BPF_F_TEST_RND_HI32 |
 -                               BPF_F_XDP_HAS_FRAGS))
 +                               BPF_F_XDP_HAS_FRAGS |
 +                               BPF_F_XDP_DEV_BOUND_ONLY))
                return -EINVAL;
  
        if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
        prog->aux->attach_btf = attach_btf;
        prog->aux->attach_btf_id = attr->attach_btf_id;
        prog->aux->dst_prog = dst_prog;
 -      prog->aux->offload_requested = !!attr->prog_ifindex;
 +      prog->aux->dev_bound = !!attr->prog_ifindex;
        prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
        prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
  
        prog->gpl_compatible = is_gpl ? 1 : 0;
  
        if (bpf_prog_is_dev_bound(prog->aux)) {
 -              err = bpf_prog_offload_init(prog, attr);
 +              err = bpf_prog_dev_bound_init(prog, attr);
 +              if (err)
 +                      goto free_prog_sec;
 +      }
 +
 +      if (type == BPF_PROG_TYPE_EXT && dst_prog &&
 +          bpf_prog_is_dev_bound(dst_prog->aux)) {
 +              err = bpf_prog_dev_bound_inherit(prog, dst_prog);
                if (err)
                        goto free_prog_sec;
        }
@@@ -4010,7 -3997,7 +4010,7 @@@ static int bpf_prog_get_info_by_fd(stru
                        return -EFAULT;
        }
  
 -      if (bpf_prog_is_dev_bound(prog->aux)) {
 +      if (bpf_prog_is_offloaded(prog->aux)) {
                err = bpf_prog_offload_info_fill(&info, prog);
                if (err)
                        return err;
@@@ -4238,7 -4225,7 +4238,7 @@@ static int bpf_map_get_info_by_fd(struc
        }
        info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
  
 -      if (bpf_map_is_dev_bound(map)) {
 +      if (bpf_map_is_offloaded(map)) {
                err = bpf_map_offload_info_fill(&info, map);
                if (err)
                        return err;
@@@ -5332,6 -5319,7 +5332,6 @@@ static struct ctl_table bpf_syscall_tab
        {
                .procname       = "bpf_stats_enabled",
                .data           = &bpf_stats_enabled_key.key,
 -              .maxlen         = sizeof(bpf_stats_enabled_key),
                .mode           = 0644,
                .proc_handler   = bpf_stats_handler,
        },
diff --combined kernel/events/core.c
index 7099c77bc53bfc8297c078e8fc830dd99b0e35a3,55a82f12a42c2ef3c595b3867dafb5260c071513..a5a51dfdd622617d52d26dccaf9f69f50c967ddb
@@@ -4813,17 -4813,19 +4813,17 @@@ find_get_pmu_context(struct pmu *pmu, s
  
                cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
                epc = &cpc->epc;
 -
 +              raw_spin_lock_irq(&ctx->lock);
                if (!epc->ctx) {
                        atomic_set(&epc->refcount, 1);
                        epc->embedded = 1;
 -                      raw_spin_lock_irq(&ctx->lock);
                        list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
                        epc->ctx = ctx;
 -                      raw_spin_unlock_irq(&ctx->lock);
                } else {
                        WARN_ON_ONCE(epc->ctx != ctx);
                        atomic_inc(&epc->refcount);
                }
 -
 +              raw_spin_unlock_irq(&ctx->lock);
                return epc;
        }
  
@@@ -4894,30 -4896,33 +4894,30 @@@ static void free_epc_rcu(struct rcu_hea
  
  static void put_pmu_ctx(struct perf_event_pmu_context *epc)
  {
 +      struct perf_event_context *ctx = epc->ctx;
        unsigned long flags;
  
 -      if (!atomic_dec_and_test(&epc->refcount))
 +      /*
 +       * XXX
 +       *
 +       * lockdep_assert_held(&ctx->mutex);
 +       *
 +       * can't because of the call-site in _free_event()/put_event()
 +       * which isn't always called under ctx->mutex.
 +       */
 +      if (!atomic_dec_and_raw_lock_irqsave(&epc->refcount, &ctx->lock, flags))
                return;
  
 -      if (epc->ctx) {
 -              struct perf_event_context *ctx = epc->ctx;
 +      WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
  
 -              /*
 -               * XXX
 -               *
 -               * lockdep_assert_held(&ctx->mutex);
 -               *
 -               * can't because of the call-site in _free_event()/put_event()
 -               * which isn't always called under ctx->mutex.
 -               */
 -
 -              WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
 -              raw_spin_lock_irqsave(&ctx->lock, flags);
 -              list_del_init(&epc->pmu_ctx_entry);
 -              epc->ctx = NULL;
 -              raw_spin_unlock_irqrestore(&ctx->lock, flags);
 -      }
 +      list_del_init(&epc->pmu_ctx_entry);
 +      epc->ctx = NULL;
  
        WARN_ON_ONCE(!list_empty(&epc->pinned_active));
        WARN_ON_ONCE(!list_empty(&epc->flexible_active));
  
 +      raw_spin_unlock_irqrestore(&ctx->lock, flags);
 +
        if (epc->embedded)
                return;
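
Editor's note: put_pmu_ctx() above now relies on the dec-and-lock idiom, where the lock is only taken if the reference count actually drops to zero, so the common put path stays lock-free. The general shape, shown with the long-standing atomic_dec_and_lock() spinlock variant (obj, obj_list_lock and the teardown steps are illustrative):

    if (!atomic_dec_and_lock(&obj->refcount, &obj_list_lock))
            return;                         /* still referenced: no lock taken */

    /* refcount hit zero and the lock is now held: safe to unlink and free */
    list_del(&obj->node);
    spin_unlock(&obj_list_lock);
    kfree(obj);

The code above uses the raw_spinlock, irq-saving flavour of the same pattern because ctx->lock is a raw spinlock taken from contexts with interrupts disabled.
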
  
@@@ -6568,7 -6573,7 +6568,7 @@@ aux_unlock
         * Since pinned accounting is per vm we cannot allow fork() to copy our
         * vma.
         */
-       vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
        vma->vm_ops = &perf_mmap_vmops;
  
        if (event->pmu->event_mapped)
@@@ -7041,20 -7046,13 +7041,20 @@@ out_put
        ring_buffer_put(rb);
  }
  
 -static void __perf_event_header__init_id(struct perf_event_header *header,
 -                                       struct perf_sample_data *data,
 +/*
 + * A set of common sample data types saved even for non-sample records
 + * when event->attr.sample_id_all is set.
 + */
 +#define PERF_SAMPLE_ID_ALL  (PERF_SAMPLE_TID | PERF_SAMPLE_TIME |     \
 +                           PERF_SAMPLE_ID | PERF_SAMPLE_STREAM_ID |   \
 +                           PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER)
 +
 +static void __perf_event_header__init_id(struct perf_sample_data *data,
                                         struct perf_event *event,
                                         u64 sample_type)
  {
        data->type = event->attr.sample_type;
 -      header->size += event->id_header_size;
 +      data->sample_flags |= data->type & PERF_SAMPLE_ID_ALL;
  
        if (sample_type & PERF_SAMPLE_TID) {
                /* namespace issues */
@@@ -7081,10 -7079,8 +7081,10 @@@ void perf_event_header__init_id(struct 
                                struct perf_sample_data *data,
                                struct perf_event *event)
  {
 -      if (event->attr.sample_id_all)
 -              __perf_event_header__init_id(header, data, event, event->attr.sample_type);
 +      if (event->attr.sample_id_all) {
 +              header->size += event->id_header_size;
 +              __perf_event_header__init_id(data, event, event->attr.sample_type);
 +      }
  }
  
  static void __perf_event__output_id_sample(struct perf_output_handle *handle,
@@@ -7314,7 -7310,7 +7314,7 @@@ void perf_output_sample(struct perf_out
        }
  
        if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
 -              if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
 +              if (data->br_stack) {
                        size_t size;
  
                        size = data->br_stack->nr
@@@ -7558,68 -7554,83 +7558,68 @@@ perf_callchain(struct perf_event *event
        return callchain ?: &__empty_callchain;
  }
  
 -void perf_prepare_sample(struct perf_event_header *header,
 -                       struct perf_sample_data *data,
 +static __always_inline u64 __cond_set(u64 flags, u64 s, u64 d)
 +{
 +      return d * !!(flags & s);
 +}
 +
 +void perf_prepare_sample(struct perf_sample_data *data,
                         struct perf_event *event,
                         struct pt_regs *regs)
  {
        u64 sample_type = event->attr.sample_type;
        u64 filtered_sample_type;
  
 -      header->type = PERF_RECORD_SAMPLE;
 -      header->size = sizeof(*header) + event->header_size;
 -
 -      header->misc = 0;
 -      header->misc |= perf_misc_flags(regs);
 -
        /*
 -       * Clear the sample flags that have already been done by the
 -       * PMU driver.
 +       * Add the sample flags that are dependent to others.  And clear the
 +       * sample flags that have already been done by the PMU driver.
         */
 -      filtered_sample_type = sample_type & ~data->sample_flags;
 -      __perf_event_header__init_id(header, data, event, filtered_sample_type);
 -
 -      if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
 -              data->ip = perf_instruction_pointer(regs);
 -
 -      if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 -              int size = 1;
 -
 -              if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
 -                      data->callchain = perf_callchain(event, regs);
 +      filtered_sample_type = sample_type;
 +      filtered_sample_type |= __cond_set(sample_type, PERF_SAMPLE_CODE_PAGE_SIZE,
 +                                         PERF_SAMPLE_IP);
 +      filtered_sample_type |= __cond_set(sample_type, PERF_SAMPLE_DATA_PAGE_SIZE |
 +                                         PERF_SAMPLE_PHYS_ADDR, PERF_SAMPLE_ADDR);
 +      filtered_sample_type |= __cond_set(sample_type, PERF_SAMPLE_STACK_USER,
 +                                         PERF_SAMPLE_REGS_USER);
 +      filtered_sample_type &= ~data->sample_flags;
  
 -              size += data->callchain->nr;
 -
 -              header->size += size * sizeof(u64);
 +      if (filtered_sample_type == 0) {
 +              /* Make sure it has the correct data->type for output */
 +              data->type = event->attr.sample_type;
 +              return;
        }
  
 -      if (sample_type & PERF_SAMPLE_RAW) {
 -              struct perf_raw_record *raw = data->raw;
 -              int size;
 +      __perf_event_header__init_id(data, event, filtered_sample_type);
  
 -              if (raw && (data->sample_flags & PERF_SAMPLE_RAW)) {
 -                      struct perf_raw_frag *frag = &raw->frag;
 -                      u32 sum = 0;
 +      if (filtered_sample_type & PERF_SAMPLE_IP) {
 +              data->ip = perf_instruction_pointer(regs);
 +              data->sample_flags |= PERF_SAMPLE_IP;
 +      }
  
 -                      do {
 -                              sum += frag->size;
 -                              if (perf_raw_frag_last(frag))
 -                                      break;
 -                              frag = frag->next;
 -                      } while (1);
 +      if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
 +              perf_sample_save_callchain(data, event, regs);
  
 -                      size = round_up(sum + sizeof(u32), sizeof(u64));
 -                      raw->size = size - sizeof(u32);
 -                      frag->pad = raw->size - sum;
 -              } else {
 -                      size = sizeof(u64);
 -                      data->raw = NULL;
 -              }
 -
 -              header->size += size;
 +      if (filtered_sample_type & PERF_SAMPLE_RAW) {
 +              data->raw = NULL;
 +              data->dyn_size += sizeof(u64);
 +              data->sample_flags |= PERF_SAMPLE_RAW;
        }
  
 -      if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
 -              int size = sizeof(u64); /* nr */
 -              if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
 -                      if (branch_sample_hw_index(event))
 -                              size += sizeof(u64);
 -
 -                      size += data->br_stack->nr
 -                            * sizeof(struct perf_branch_entry);
 -              }
 -              header->size += size;
 +      if (filtered_sample_type & PERF_SAMPLE_BRANCH_STACK) {
 +              data->br_stack = NULL;
 +              data->dyn_size += sizeof(u64);
 +              data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
        }
  
 -      if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER))
 +      if (filtered_sample_type & PERF_SAMPLE_REGS_USER)
                perf_sample_regs_user(&data->regs_user, regs);
  
 -      if (sample_type & PERF_SAMPLE_REGS_USER) {
 +      /*
 +       * It cannot use the filtered_sample_type here as REGS_USER can be set
 +       * by STACK_USER (using __cond_set() above) and we don't want to update
 +       * the dyn_size if it's not requested by users.
 +       */
 +      if ((sample_type & ~data->sample_flags) & PERF_SAMPLE_REGS_USER) {
                /* regs dump ABI info */
                int size = sizeof(u64);
  
                        size += hweight64(mask) * sizeof(u64);
                }
  
 -              header->size += size;
 +              data->dyn_size += size;
 +              data->sample_flags |= PERF_SAMPLE_REGS_USER;
        }
  
 -      if (sample_type & PERF_SAMPLE_STACK_USER) {
 +      if (filtered_sample_type & PERF_SAMPLE_STACK_USER) {
                /*
                 * Either we need PERF_SAMPLE_STACK_USER bit to be always
                 * processed as the last one or have additional check added
                 * up the rest of the sample size.
                 */
                u16 stack_size = event->attr.sample_stack_user;
 +              u16 header_size = perf_sample_data_size(data, event);
                u16 size = sizeof(u64);
  
 -              stack_size = perf_sample_ustack_size(stack_size, header->size,
 +              stack_size = perf_sample_ustack_size(stack_size, header_size,
                                                     data->regs_user.regs);
  
                /*
                        size += sizeof(u64) + stack_size;
  
                data->stack_user_size = stack_size;
 -              header->size += size;
 +              data->dyn_size += size;
 +              data->sample_flags |= PERF_SAMPLE_STACK_USER;
        }
  
 -      if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
 +      if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
                data->weight.full = 0;
 +              data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
 +      }
  
 -      if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
 +      if (filtered_sample_type & PERF_SAMPLE_DATA_SRC) {
                data->data_src.val = PERF_MEM_NA;
 +              data->sample_flags |= PERF_SAMPLE_DATA_SRC;
 +      }
  
 -      if (filtered_sample_type & PERF_SAMPLE_TRANSACTION)
 +      if (filtered_sample_type & PERF_SAMPLE_TRANSACTION) {
                data->txn = 0;
 +              data->sample_flags |= PERF_SAMPLE_TRANSACTION;
 +      }
  
 -      if (sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_DATA_PAGE_SIZE)) {
 -              if (filtered_sample_type & PERF_SAMPLE_ADDR)
 -                      data->addr = 0;
 +      if (filtered_sample_type & PERF_SAMPLE_ADDR) {
 +              data->addr = 0;
 +              data->sample_flags |= PERF_SAMPLE_ADDR;
        }
  
 -      if (sample_type & PERF_SAMPLE_REGS_INTR) {
 +      if (filtered_sample_type & PERF_SAMPLE_REGS_INTR) {
                /* regs dump ABI info */
                int size = sizeof(u64);
  
                        size += hweight64(mask) * sizeof(u64);
                }
  
 -              header->size += size;
 +              data->dyn_size += size;
 +              data->sample_flags |= PERF_SAMPLE_REGS_INTR;
        }
  
 -      if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
 -          filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
 +      if (filtered_sample_type & PERF_SAMPLE_PHYS_ADDR) {
                data->phys_addr = perf_virt_to_phys(data->addr);
 +              data->sample_flags |= PERF_SAMPLE_PHYS_ADDR;
 +      }
  
  #ifdef CONFIG_CGROUP_PERF
 -      if (sample_type & PERF_SAMPLE_CGROUP) {
 +      if (filtered_sample_type & PERF_SAMPLE_CGROUP) {
                struct cgroup *cgrp;
  
                /* protected by RCU */
                cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup;
                data->cgroup = cgroup_id(cgrp);
 +              data->sample_flags |= PERF_SAMPLE_CGROUP;
        }
  #endif
  
         * require PERF_SAMPLE_ADDR, kernel implicitly retrieve the data->addr,
         * but the value will not dump to the userspace.
         */
 -      if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
 +      if (filtered_sample_type & PERF_SAMPLE_DATA_PAGE_SIZE) {
                data->data_page_size = perf_get_page_size(data->addr);
 +              data->sample_flags |= PERF_SAMPLE_DATA_PAGE_SIZE;
 +      }
  
 -      if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
 +      if (filtered_sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) {
                data->code_page_size = perf_get_page_size(data->ip);
 +              data->sample_flags |= PERF_SAMPLE_CODE_PAGE_SIZE;
 +      }
  
 -      if (sample_type & PERF_SAMPLE_AUX) {
 +      if (filtered_sample_type & PERF_SAMPLE_AUX) {
                u64 size;
 +              u16 header_size = perf_sample_data_size(data, event);
  
 -              header->size += sizeof(u64); /* size */
 +              header_size += sizeof(u64); /* size */
  
                /*
                 * Given the 16bit nature of header::size, an AUX sample can
                 * Make sure this doesn't happen by using up to U16_MAX bytes
                 * per sample in total (rounded down to 8 byte boundary).
                 */
 -              size = min_t(size_t, U16_MAX - header->size,
 +              size = min_t(size_t, U16_MAX - header_size,
                             event->attr.aux_sample_size);
                size = rounddown(size, 8);
                size = perf_prepare_sample_aux(event, data, size);
  
 -              WARN_ON_ONCE(size + header->size > U16_MAX);
 -              header->size += size;
 +              WARN_ON_ONCE(size + header_size > U16_MAX);
 +              data->dyn_size += size + sizeof(u64); /* size above */
 +              data->sample_flags |= PERF_SAMPLE_AUX;
        }
 +}
 +
 +void perf_prepare_header(struct perf_event_header *header,
 +                       struct perf_sample_data *data,
 +                       struct perf_event *event,
 +                       struct pt_regs *regs)
 +{
 +      header->type = PERF_RECORD_SAMPLE;
 +      header->size = perf_sample_data_size(data, event);
 +      header->misc = perf_misc_flags(regs);
 +
        /*
         * If you're adding more sample types here, you likely need to do
         * something about the overflowing header::size, like repurpose the
@@@ -7785,8 -7767,7 +7785,8 @@@ __perf_event_output(struct perf_event *
        /* protect the callchain buffers */
        rcu_read_lock();
  
 -      perf_prepare_sample(&header, data, event, regs);
 +      perf_prepare_sample(data, event, regs);
 +      perf_prepare_header(&header, data, event, regs);
  
        err = output_begin(&handle, data, event, header.size);
        if (err)
@@@ -10144,7 -10125,8 +10144,7 @@@ void perf_tp_event(u16 event_type, u64 
        };
  
        perf_sample_data_init(&data, 0, 0);
 -      data.raw = &raw;
 -      data.sample_flags |= PERF_SAMPLE_RAW;
 +      perf_sample_save_raw_data(&data, &raw);
  
        perf_trace_buf_update(record, event_type);
  
@@@ -10351,7 -10333,13 +10351,7 @@@ static void bpf_overflow_handler(struc
        rcu_read_lock();
        prog = READ_ONCE(event->prog);
        if (prog) {
 -              if (prog->call_get_stack &&
 -                  (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
 -                  !(data->sample_flags & PERF_SAMPLE_CALLCHAIN)) {
 -                      data->callchain = perf_callchain(event, regs);
 -                      data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
 -              }
 -
 +              perf_prepare_sample(data, event, regs);
                ret = bpf_prog_run(prog, &ctx);
        }
        rcu_read_unlock();
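
Editor's note: the perf_prepare_sample() rework above leans on the branch-free helper __cond_set(flags, s, d), which evaluates to d when any bit of s is set in flags and to 0 otherwise; it is used to pull in sample types that other requested types depend on (e.g. PERF_SAMPLE_CODE_PAGE_SIZE needs PERF_SAMPLE_IP). A stand-alone check of the arithmetic:

    #include <assert.h>
    #include <stdint.h>

    static uint64_t cond_set(uint64_t flags, uint64_t s, uint64_t d)
    {
            return d * !!(flags & s);       /* same expression as __cond_set() */
    }

    int main(void)
    {
            /* dependency bit present: the extra sample type is added */
            assert(cond_set(0x8, 0x8, 0x1) == 0x1);
            /* dependency bit absent: nothing is added */
            assert(cond_set(0x4, 0x8, 0x1) == 0);
            return 0;
    }
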
diff --combined kernel/fork.c
index 038b898dad523e408c6d8740abf4b45994382eec,5e3029ea8e1ec26efc5495c559c0f58600797625..f68954d05e89dce1f7d430610322fff0a3a379f1
@@@ -472,7 -472,7 +472,7 @@@ struct vm_area_struct *vm_area_dup(stru
                 * orig->shared.rb may be modified concurrently, but the clone
                 * will be reinitialized.
                 */
-               *new = data_race(*orig);
+               data_race(memcpy(new, orig, sizeof(*new)));
                INIT_LIST_HEAD(&new->anon_vma_chain);
                dup_anon_vma_name(orig, new);
        }
@@@ -585,8 -585,8 +585,8 @@@ static __latent_entropy int dup_mmap(st
        int retval;
        unsigned long charge = 0;
        LIST_HEAD(uf);
-       MA_STATE(old_mas, &oldmm->mm_mt, 0, 0);
-       MA_STATE(mas, &mm->mm_mt, 0, 0);
+       VMA_ITERATOR(old_vmi, oldmm, 0);
+       VMA_ITERATOR(vmi, mm, 0);
  
        uprobe_start_dup_mmap();
        if (mmap_write_lock_killable(oldmm)) {
                goto out;
        khugepaged_fork(mm, oldmm);
  
-       retval = mas_expected_entries(&mas, oldmm->map_count);
+       retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
        if (retval)
                goto out;
  
-       mas_for_each(&old_mas, mpnt, ULONG_MAX) {
+       for_each_vma(old_vmi, mpnt) {
                struct file *file;
  
                if (mpnt->vm_flags & VM_DONTCOPY) {
                        tmp->anon_vma = NULL;
                } else if (anon_vma_fork(tmp, mpnt))
                        goto fail_nomem_anon_vma_fork;
-               tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+               vm_flags_clear(tmp, VM_LOCKED_MASK);
                file = tmp->vm_file;
                if (file) {
                        struct address_space *mapping = file->f_mapping;
                        hugetlb_dup_vma_private(tmp);
  
                /* Link the vma into the MT */
-               mas.index = tmp->vm_start;
-               mas.last = tmp->vm_end - 1;
-               mas_store(&mas, tmp);
-               if (mas_is_err(&mas))
-                       goto fail_nomem_mas_store;
+               if (vma_iter_bulk_store(&vmi, tmp))
+                       goto fail_nomem_vmi_store;
  
                mm->map_count++;
                if (!(tmp->vm_flags & VM_WIPEONFORK))
        /* a new mm has just been created */
        retval = arch_dup_mmap(oldmm, mm);
  loop_out:
-       mas_destroy(&mas);
+       vma_iter_free(&vmi);
  out:
        mmap_write_unlock(mm);
        flush_tlb_mm(oldmm);
@@@ -712,7 -709,7 +709,7 @@@ fail_uprobe_end
        uprobe_end_dup_mmap();
        return retval;
  
- fail_nomem_mas_store:
+ fail_nomem_vmi_store:
        unlink_anon_vmas(tmp);
  fail_nomem_anon_vma_fork:
        mpol_put(vma_policy(tmp));
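
Editor's note: dup_mmap() above is one of several call sites in this merge converted from raw maple-tree state (MA_STATE/mas_for_each) to the VMA iterator wrappers. The basic walking pattern looks like this (kernel-side sketch; mm is a struct mm_struct pointer the caller already holds):

    struct vm_area_struct *vma;
    VMA_ITERATOR(vmi, mm, 0);               /* start the walk at address 0 */

    mmap_read_lock(mm);
    for_each_vma(vmi, vma) {
            /* visit VMAs in ascending address order */
    }
    mmap_read_unlock(mm);

dup_mmap() additionally uses vma_iter_bulk_alloc()/vma_iter_bulk_store() because it knows map_count up front and wants the tree nodes preallocated before the copy loop starts.
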
@@@ -1044,7 -1041,7 +1041,7 @@@ static struct task_struct *dup_task_str
  #endif
  
  #ifdef CONFIG_BLK_CGROUP
 -      tsk->throttle_queue = NULL;
 +      tsk->throttle_disk = NULL;
        tsk->use_memdelay = 0;
  #endif
  
        tsk->reported_split_lock = 0;
  #endif
  
 +#ifdef CONFIG_SCHED_MM_CID
 +      tsk->mm_cid = -1;
 +      tsk->mm_cid_active = 0;
 +#endif
        return tsk;
  
  free_stack:
@@@ -1173,7 -1166,6 +1170,7 @@@ static struct mm_struct *mm_init(struc
  
        mm->user_ns = get_user_ns(user_ns);
        lru_gen_init_mm(mm);
 +      mm_init_cid(mm);
        return mm;
  
  fail_pcpu:
@@@ -1606,7 -1598,6 +1603,7 @@@ static int copy_mm(unsigned long clone_
  
        tsk->mm = mm;
        tsk->active_mm = mm;
 +      sched_mm_cid_fork(tsk);
        return 0;
  }
  
@@@ -3040,7 -3031,7 +3037,7 @@@ void __init mm_cache_init(void
         * dynamically sized based on the maximum CPU number this system
         * can have, taking hotplug into account (nr_cpu_ids).
         */
 -      mm_size = sizeof(struct mm_struct) + cpumask_size();
 +      mm_size = sizeof(struct mm_struct) + cpumask_size() + mm_cid_size();
  
        mm_cachep = kmem_cache_create_usercopy("mm_struct",
                        mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
diff --combined kernel/pid_namespace.c
index fc21c5d5fd5de7932cfd3329840ac68e0fd341af,8a98b1af937659f8afe058aab18d4f24152b848e..46e0d5a3f91f0d21923966adbdbedaa71aa3968f
@@@ -23,6 -23,7 +23,7 @@@
  #include <linux/sched/task.h>
  #include <linux/sched/signal.h>
  #include <linux/idr.h>
+ #include "pid_sysctl.h"
  
  static DEFINE_MUTEX(pid_caches_mutex);
  static struct kmem_cache *pid_ns_cachep;
@@@ -110,6 -111,8 +111,8 @@@ static struct pid_namespace *create_pid
        ns->ucounts = ucounts;
        ns->pid_allocated = PIDNS_ADDING;
  
+       initialize_memfd_noexec_scope(ns);
        return ns;
  
  out_free_idr:
@@@ -244,24 -247,7 +247,24 @@@ void zap_pid_ns_processes(struct pid_na
                set_current_state(TASK_INTERRUPTIBLE);
                if (pid_ns->pid_allocated == init_pids)
                        break;
 +              /*
 +               * Release tasks_rcu_exit_srcu to avoid following deadlock:
 +               *
 +               * 1) TASK A unshare(CLONE_NEWPID)
 +               * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
 +               *    and TASK C
 +               * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
 +               * 4) TASK A calls synchronize_rcu_tasks()
 +               *                   -> synchronize_srcu(tasks_rcu_exit_srcu)
 +               * 5) *DEADLOCK*
 +               *
 +               * It is considered safe to release tasks_rcu_exit_srcu here
 +               * because we assume the current task can not be concurrently
 +               * reaped at this point.
 +               */
 +              exit_tasks_rcu_stop();
                schedule();
 +              exit_tasks_rcu_start();
        }
        __set_current_state(TASK_RUNNING);
  
@@@ -472,6 -458,8 +475,8 @@@ static __init int pid_namespaces_init(v
  #ifdef CONFIG_CHECKPOINT_RESTORE
        register_sysctl_paths(kern_path, pid_ns_ctl_table);
  #endif
+       register_pid_ns_sysctl_table_vm();
        return 0;
  }
  
diff --combined kernel/sched/fair.c
index ff4dbbae3b1078e5089ecf273c08fcee1b66a501,9c9950249d7b2b0b09141e45a21551f5130ddf79..7a1b1f855b9635e75282913850b70ffba2006322
@@@ -468,7 -468,7 +468,7 @@@ is_same_group(struct sched_entity *se, 
        return NULL;
  }
  
 -static inline struct sched_entity *parent_entity(struct sched_entity *se)
 +static inline struct sched_entity *parent_entity(const struct sched_entity *se)
  {
        return se->parent;
  }
@@@ -595,8 -595,8 +595,8 @@@ static inline u64 min_vruntime(u64 min_
        return min_vruntime;
  }
  
 -static inline bool entity_before(struct sched_entity *a,
 -                              struct sched_entity *b)
 +static inline bool entity_before(const struct sched_entity *a,
 +                               const struct sched_entity *b)
  {
        return (s64)(a->vruntime - b->vruntime) < 0;
  }
@@@ -1804,7 -1804,7 +1804,7 @@@ static void update_numa_stats(struct ta
                ns->nr_running += rq->cfs.h_nr_running;
                ns->compute_capacity += capacity_of(cpu);
  
 -              if (find_idle && !rq->nr_running && idle_cpu(cpu)) {
 +              if (find_idle && idle_core < 0 && !rq->nr_running && idle_cpu(cpu)) {
                        if (READ_ONCE(rq->numa_migrate_on) ||
                            !cpumask_test_cpu(cpu, env->p->cpus_ptr))
                                continue;
@@@ -1836,7 -1836,7 +1836,7 @@@ static void task_numa_assign(struct tas
                int start = env->dst_cpu;
  
                /* Find alternative idle CPU. */
 -              for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start) {
 +              for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start + 1) {
                        if (cpu == env->best_cpu || !idle_cpu(cpu) ||
                            !cpumask_test_cpu(cpu, env->p->cpus_ptr)) {
                                continue;
@@@ -2938,11 -2938,11 +2938,11 @@@ static void task_numa_work(struct callb
        struct task_struct *p = current;
        struct mm_struct *mm = p->mm;
        u64 runtime = p->se.sum_exec_runtime;
-       MA_STATE(mas, &mm->mm_mt, 0, 0);
        struct vm_area_struct *vma;
        unsigned long start, end;
        unsigned long nr_pte_updates = 0;
        long pages, virtpages;
+       struct vma_iterator vmi;
  
        SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
  
  
        if (!mmap_read_trylock(mm))
                return;
-       mas_set(&mas, start);
-       vma = mas_find(&mas, ULONG_MAX);
+       vma_iter_init(&vmi, mm, start);
+       vma = vma_next(&vmi);
        if (!vma) {
                reset_ptenuma_scan(p);
                start = 0;
-               mas_set(&mas, start);
-               vma = mas_find(&mas, ULONG_MAX);
+               vma_iter_set(&vmi, start);
+               vma = vma_next(&vmi);
        }
  
-       for (; vma; vma = mas_find(&mas, ULONG_MAX)) {
+       do {
                if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
                        is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
                        continue;
  
                        cond_resched();
                } while (end != vma->vm_end);
-       }
+       } for_each_vma(vmi, vma);
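
The hunk above switches task_numa_work()'s VMA walk from raw maple-tree mas_find() calls to the VMA iterator helpers. As a rough, hedged illustration of that iterator API (VMA_ITERATOR()/for_each_vma() from <linux/mm.h> are assumed; the surrounding function and its caller are hypothetical), a plain walk over an mm looks roughly like this:

/* Minimal sketch, not from this patch: walk every VMA of an mm. */
static void walk_all_vmas_example(struct mm_struct *mm)
{
        struct vm_area_struct *vma;
        VMA_ITERATOR(vmi, mm, 0);       /* start the walk at address 0 */

        mmap_read_lock(mm);
        for_each_vma(vmi, vma)          /* advances via vma_next() */
                pr_info("vma %#lx-%#lx\n", vma->vm_start, vma->vm_end);
        mmap_read_unlock(mm);
}
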
  
  out:
        /*
@@@ -4476,9 -4476,17 +4476,9 @@@ static inline int util_fits_cpu(unsigne
         *
         * For uclamp_max, we can tolerate a drop in performance level as the
         * goal is to cap the task. So it's okay if it's getting less.
 -       *
 -       * In case of capacity inversion we should honour the inverted capacity
 -       * for both uclamp_min and uclamp_max all the time.
         */
 -      capacity_orig = cpu_in_capacity_inversion(cpu);
 -      if (capacity_orig) {
 -              capacity_orig_thermal = capacity_orig;
 -      } else {
 -              capacity_orig = capacity_orig_of(cpu);
 -              capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
 -      }
 +      capacity_orig = capacity_orig_of(cpu);
 +      capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
  
        /*
         * We want to force a task to fit a cpu as implied by uclamp_max.
         * handle the case uclamp_min > uclamp_max.
         */
        uclamp_min = min(uclamp_min, uclamp_max);
 -      if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
 -              fits = fits && (uclamp_min <= capacity_orig_thermal);
 +      if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
 +              return -1;
  
        return fits;
  }
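
util_fits_cpu() now reports three states rather than a plain boolean: a positive value when the utilization and both uclamp hints fit, 0 when the utilization itself does not fit, and -1 when only the uclamp_min performance hint is missed because of thermal pressure. A hedged sketch of how a caller can fold that tri-state into a CPU choice (the helper and the fallback bookkeeping below are illustrative, not part of the patch; the real users are task_fits_cpu(), select_idle_capacity() and find_energy_efficient_cpu() later in this file):

/* Illustrative only: interpret util_fits_cpu()'s tri-state result. */
static int consider_cpu_example(unsigned long util, unsigned long umin,
                                unsigned long umax, int cpu, int *fallback)
{
        int fits = util_fits_cpu(util, umin, umax, cpu);

        if (fits > 0)           /* capacity and both uclamp hints satisfied */
                return cpu;
        if (fits < 0)           /* only uclamp_min fails, due to thermal pressure */
                *fallback = cpu;        /* keep it as a second-best candidate */
        return -1;              /* fits == 0: utilization does not fit at all */
}
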
@@@ -4564,11 -4572,7 +4564,11 @@@ static inline int task_fits_cpu(struct 
        unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
        unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
        unsigned long util = task_util_est(p);
 -      return util_fits_cpu(util, uclamp_min, uclamp_max, cpu);
 +      /*
 +       * Return true only if the cpu fully fits the task requirements, which
 +       * include the utilization but also the performance hints.
 +       */
 +      return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0);
  }
  
  static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
@@@ -4652,7 -4656,6 +4652,7 @@@ static voi
  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
  {
        u64 vruntime = cfs_rq->min_vruntime;
 +      u64 sleep_time;
  
        /*
         * The 'current' period is already promised to the current tasks,
                vruntime -= thresh;
        }
  
 -      /* ensure we never gain time by being placed backwards. */
 -      se->vruntime = max_vruntime(se->vruntime, vruntime);
 +      /*
 +       * Pull vruntime of the entity being placed to the base level of
 +       * cfs_rq, to prevent boosting it if placed backwards.  If the entity
 +       * slept for a long time, don't even try to compare its vruntime with
 +       * the base as it may be too far off and the comparison may get
 +       * inversed due to s64 overflow.
 +       */
 +      sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
 +      if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
 +              se->vruntime = vruntime;
 +      else
 +              se->vruntime = max_vruntime(se->vruntime, vruntime);
  }
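
The 60-second cut-off above exists because entity_before()/max_vruntime() order vruntimes with (s64)(a - b), which is only meaningful while the two values stay within 2^63 of each other. A small stand-alone illustration of that wrap-around (numbers invented, userspace types used for convenience):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t base  = 0;                             /* stand-in for min_vruntime */
        uint64_t stale = base - ((1ULL << 63) + 1);     /* sleeper 2^63+1 ns behind  */

        /*
         * entity_before()-style test: (s64)(stale - base) < 0 should say
         * "stale runs first", but the difference overflowed, so the truly
         * older value now compares as being ahead.
         */
        bool looks_ahead = (int64_t)(stale - base) > 0;
        printf("stale looks ahead of base: %d\n", looks_ahead); /* prints 1 */
        return 0;
}
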
  
  static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
@@@ -4903,13 -4896,7 +4903,13 @@@ check_preempt_tick(struct cfs_rq *cfs_r
        struct sched_entity *se;
        s64 delta;
  
 -      ideal_runtime = sched_slice(cfs_rq, curr);
 +      /*
 +       * When many tasks blow up the sched_period, it is possible that
 +       * sched_slice() reports unusually large results (when many tasks are
 +       * very light for example). Therefore impose a maximum.
 +       */
 +      ideal_runtime = min_t(u64, sched_slice(cfs_rq, curr), sysctl_sched_latency);
 +
        delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
        if (delta_exec > ideal_runtime) {
                resched_curr(rq_of(cfs_rq));
@@@ -5474,105 -5461,22 +5474,105 @@@ unthrottle_throttle
                resched_curr(rq);
  }
  
 -static void distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 +#ifdef CONFIG_SMP
 +static void __cfsb_csd_unthrottle(void *arg)
  {
 -      struct cfs_rq *cfs_rq;
 +      struct cfs_rq *cursor, *tmp;
 +      struct rq *rq = arg;
 +      struct rq_flags rf;
 +
 +      rq_lock(rq, &rf);
 +
 +      /*
 +       * Since we hold rq lock we're safe from concurrent manipulation of
 +       * the CSD list. However, this RCU critical section annotates the
 +       * fact that we pair with sched_free_group_rcu(), so that we cannot
 +       * race with group being freed in the window between removing it
 +       * from the list and advancing to the next entry in the list.
 +       */
 +      rcu_read_lock();
 +
 +      list_for_each_entry_safe(cursor, tmp, &rq->cfsb_csd_list,
 +                               throttled_csd_list) {
 +              list_del_init(&cursor->throttled_csd_list);
 +
 +              if (cfs_rq_throttled(cursor))
 +                      unthrottle_cfs_rq(cursor);
 +      }
 +
 +      rcu_read_unlock();
 +
 +      rq_unlock(rq, &rf);
 +}
 +
 +static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 +{
 +      struct rq *rq = rq_of(cfs_rq);
 +      bool first;
 +
 +      if (rq == this_rq()) {
 +              unthrottle_cfs_rq(cfs_rq);
 +              return;
 +      }
 +
 +      /* Already enqueued */
 +      if (SCHED_WARN_ON(!list_empty(&cfs_rq->throttled_csd_list)))
 +              return;
 +
 +      first = list_empty(&rq->cfsb_csd_list);
 +      list_add_tail(&cfs_rq->throttled_csd_list, &rq->cfsb_csd_list);
 +      if (first)
 +              smp_call_function_single_async(cpu_of(rq), &rq->cfsb_csd);
 +}
 +#else
 +static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 +{
 +      unthrottle_cfs_rq(cfs_rq);
 +}
 +#endif
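
For the remote case above, each rq keeps a cfsb_csd_list plus one call_single_data, and only the enqueue that makes the list non-empty sends the async IPI; later entries are drained by the same callback. A generic, hedged sketch of that pattern (the demo_* structures are hypothetical; INIT_CSD() and smp_call_function_single_async() are the real <linux/smp.h> interfaces):

/* Sketch of the "first enqueue kicks the CSD" pattern used above. */
struct demo_queue {
        struct list_head        list;   /* protected by the owner's lock */
        call_single_data_t      csd;
};

static void demo_queue_init(struct demo_queue *q, smp_call_func_t fn, void *info)
{
        INIT_LIST_HEAD(&q->list);
        INIT_CSD(&q->csd, fn, info);    /* fn runs on the target CPU */
}

static void demo_enqueue(struct demo_queue *q, struct list_head *item, int cpu)
{
        bool first = list_empty(&q->list);

        list_add_tail(item, &q->list);
        if (first)      /* later items piggy-back on the pending IPI */
                smp_call_function_single_async(cpu, &q->csd);
}
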
 +
 +static void unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 +{
 +      lockdep_assert_rq_held(rq_of(cfs_rq));
 +
 +      if (SCHED_WARN_ON(!cfs_rq_throttled(cfs_rq) ||
 +          cfs_rq->runtime_remaining <= 0))
 +              return;
 +
 +      __unthrottle_cfs_rq_async(cfs_rq);
 +}
 +
 +static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 +{
 +      struct cfs_rq *local_unthrottle = NULL;
 +      int this_cpu = smp_processor_id();
        u64 runtime, remaining = 1;
 +      bool throttled = false;
 +      struct cfs_rq *cfs_rq;
 +      struct rq_flags rf;
 +      struct rq *rq;
  
        rcu_read_lock();
        list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
                                throttled_list) {
 -              struct rq *rq = rq_of(cfs_rq);
 -              struct rq_flags rf;
 +              rq = rq_of(cfs_rq);
 +
 +              if (!remaining) {
 +                      throttled = true;
 +                      break;
 +              }
  
                rq_lock_irqsave(rq, &rf);
                if (!cfs_rq_throttled(cfs_rq))
                        goto next;
  
 -              /* By the above check, this should never be true */
 +#ifdef CONFIG_SMP
 +              /* Already queued for async unthrottle */
 +              if (!list_empty(&cfs_rq->throttled_csd_list))
 +                      goto next;
 +#endif
 +
 +              /* By the above checks, this should never be true */
                SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
  
                raw_spin_lock(&cfs_b->lock);
                cfs_rq->runtime_remaining += runtime;
  
                /* we check whether we're throttled above */
 -              if (cfs_rq->runtime_remaining > 0)
 -                      unthrottle_cfs_rq(cfs_rq);
 +              if (cfs_rq->runtime_remaining > 0) {
 +                      if (cpu_of(rq) != this_cpu ||
 +                          SCHED_WARN_ON(local_unthrottle))
 +                              unthrottle_cfs_rq_async(cfs_rq);
 +                      else
 +                              local_unthrottle = cfs_rq;
 +              } else {
 +                      throttled = true;
 +              }
  
  next:
                rq_unlock_irqrestore(rq, &rf);
 -
 -              if (!remaining)
 -                      break;
        }
        rcu_read_unlock();
 +
 +      if (local_unthrottle) {
 +              rq = cpu_rq(this_cpu);
 +              rq_lock_irqsave(rq, &rf);
 +              if (cfs_rq_throttled(local_unthrottle))
 +                      unthrottle_cfs_rq(local_unthrottle);
 +              rq_unlock_irqrestore(rq, &rf);
 +      }
 +
 +      return throttled;
  }
  
  /*
@@@ -5654,8 -5544,10 +5654,8 @@@ static int do_sched_cfs_period_timer(st
        while (throttled && cfs_b->runtime > 0) {
                raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
                /* we can't nest cfs_b->lock while distributing bandwidth */
 -              distribute_cfs_runtime(cfs_b);
 +              throttled = distribute_cfs_runtime(cfs_b);
                raw_spin_lock_irqsave(&cfs_b->lock, flags);
 -
 -              throttled = !list_empty(&cfs_b->throttled_cfs_rq);
        }
  
        /*
@@@ -5932,9 -5824,6 +5932,9 @@@ static void init_cfs_rq_runtime(struct 
  {
        cfs_rq->runtime_enabled = 0;
        INIT_LIST_HEAD(&cfs_rq->throttled_list);
 +#ifdef CONFIG_SMP
 +      INIT_LIST_HEAD(&cfs_rq->throttled_csd_list);
 +#endif
  }
  
  void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
  
  static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
  {
 +      int __maybe_unused i;
 +
        /* init_cfs_bandwidth() was not called */
        if (!cfs_b->throttled_cfs_rq.next)
                return;
  
        hrtimer_cancel(&cfs_b->period_timer);
        hrtimer_cancel(&cfs_b->slack_timer);
 +
 +      /*
 +       * It is possible that we still have some cfs_rq's pending on a CSD
 +       * list, though this race is very rare. In order for this to occur, we
 +       * must have raced with the last task leaving the group while there
 +       * exist throttled cfs_rq(s), and the period_timer must have queued the
 +       * CSD item but the remote cpu has not yet processed it. To handle this,
 +       * we can simply flush all pending CSD work inline here. We're
 +       * guaranteed at this point that no additional cfs_rq of this group can
 +       * join a CSD list.
 +       */
 +#ifdef CONFIG_SMP
 +      for_each_possible_cpu(i) {
 +              struct rq *rq = cpu_rq(i);
 +              unsigned long flags;
 +
 +              if (list_empty(&rq->cfsb_csd_list))
 +                      continue;
 +
 +              local_irq_save(flags);
 +              __cfsb_csd_unthrottle(rq);
 +              local_irq_restore(flags);
 +      }
 +#endif
  }
  
  /*
@@@ -6145,7 -6008,6 +6145,7 @@@ static inline bool cpu_overutilized(in
        unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
        unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
  
 +      /* Return true only if the utilization doesn't fit CPU's capacity */
        return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
  }
  
@@@ -6939,7 -6801,6 +6939,7 @@@ static in
  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
  {
        unsigned long task_util, util_min, util_max, best_cap = 0;
 +      int fits, best_fits = 0;
        int cpu, best_cpu = -1;
        struct cpumask *cpus;
  
        util_min = uclamp_eff_value(p, UCLAMP_MIN);
        util_max = uclamp_eff_value(p, UCLAMP_MAX);
  
 -      for_each_cpu_wrap(cpu, cpus, target) {
 +      for_each_cpu_wrap(cpu, cpus, target + 1) {
                unsigned long cpu_cap = capacity_of(cpu);
  
                if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
                        continue;
 -              if (util_fits_cpu(task_util, util_min, util_max, cpu))
 +
 +              fits = util_fits_cpu(task_util, util_min, util_max, cpu);
 +
 +              /* This CPU fits with all requirements */
 +              if (fits > 0)
                        return cpu;
 +              /*
 +               * Only the min performance hint (i.e. uclamp_min) doesn't fit.
 +               * Look for the CPU with best capacity.
 +               */
 +              else if (fits < 0)
 +                      cpu_cap = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
  
 -              if (cpu_cap > best_cap) {
 +              /*
 +               * First, select CPU which fits better (-1 being better than 0).
 +               * Then, select the one with best capacity at same level.
 +               */
 +              if ((fits < best_fits) ||
 +                  ((fits == best_fits) && (cpu_cap > best_cap))) {
                        best_cap = cpu_cap;
                        best_cpu = cpu;
 +                      best_fits = fits;
                }
        }
  
@@@ -6989,11 -6834,7 +6989,11 @@@ static inline bool asym_fits_cpu(unsign
                                 int cpu)
  {
        if (sched_asym_cpucap_active())
 -              return util_fits_cpu(util, util_min, util_max, cpu);
 +              /*
 +               * Return true only if the cpu fully fits the task requirements
 +               * which include the utilization and the performance hints.
 +               */
 +              return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
  
        return true;
  }
@@@ -7360,9 -7201,6 +7360,9 @@@ static int find_energy_efficient_cpu(st
        unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
        struct root_domain *rd = this_rq()->rd;
        int cpu, best_energy_cpu, target = -1;
 +      int prev_fits = -1, best_fits = -1;
 +      unsigned long best_thermal_cap = 0;
 +      unsigned long prev_thermal_cap = 0;
        struct sched_domain *sd;
        struct perf_domain *pd;
        struct energy_env eenv;
        eenv_task_busy_time(&eenv, p, prev_cpu);
  
        for (; pd; pd = pd->next) {
 +              unsigned long util_min = p_util_min, util_max = p_util_max;
                unsigned long cpu_cap, cpu_thermal_cap, util;
                unsigned long cur_delta, max_spare_cap = 0;
                unsigned long rq_util_min, rq_util_max;
 -              unsigned long util_min, util_max;
                unsigned long prev_spare_cap = 0;
                int max_spare_cap_cpu = -1;
                unsigned long base_energy;
 +              int fits, max_fits = -1;
  
                cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
  
                eenv.pd_cap = 0;
  
                for_each_cpu(cpu, cpus) {
 +                      struct rq *rq = cpu_rq(cpu);
 +
                        eenv.pd_cap += cpu_thermal_cap;
  
                        if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
                         * much capacity we can get out of the CPU; this is
                         * aligned with sched_cpu_util().
                         */
 -                      if (uclamp_is_used()) {
 -                              if (uclamp_rq_is_idle(cpu_rq(cpu))) {
 -                                      util_min = p_util_min;
 -                                      util_max = p_util_max;
 -                              } else {
 -                                      /*
 -                                       * Open code uclamp_rq_util_with() except for
 -                                       * the clamp() part. Ie: apply max aggregation
 -                                       * only. util_fits_cpu() logic requires to
 -                                       * operate on non clamped util but must use the
 -                                       * max-aggregated uclamp_{min, max}.
 -                                       */
 -                                      rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
 -                                      rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
 -
 -                                      util_min = max(rq_util_min, p_util_min);
 -                                      util_max = max(rq_util_max, p_util_max);
 -                              }
 +                      if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) {
 +                              /*
 +                               * Open code uclamp_rq_util_with() except for
 +                               * the clamp() part. Ie: apply max aggregation
 +                               * only. util_fits_cpu() logic requires to
 +                               * operate on non clamped util but must use the
 +                               * max-aggregated uclamp_{min, max}.
 +                               */
 +                              rq_util_min = uclamp_rq_get(rq, UCLAMP_MIN);
 +                              rq_util_max = uclamp_rq_get(rq, UCLAMP_MAX);
 +
 +                              util_min = max(rq_util_min, p_util_min);
 +                              util_max = max(rq_util_max, p_util_max);
                        }
 -                      if (!util_fits_cpu(util, util_min, util_max, cpu))
 +
 +                      fits = util_fits_cpu(util, util_min, util_max, cpu);
 +                      if (!fits)
                                continue;
  
                        lsub_positive(&cpu_cap, util);
                        if (cpu == prev_cpu) {
                                /* Always use prev_cpu as a candidate. */
                                prev_spare_cap = cpu_cap;
 -                      } else if (cpu_cap > max_spare_cap) {
 +                              prev_fits = fits;
 +                      } else if ((fits > max_fits) ||
 +                                 ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
                                /*
                                 * Find the CPU with the maximum spare capacity
                                 * among the remaining CPUs in the performance
                                 */
                                max_spare_cap = cpu_cap;
                                max_spare_cap_cpu = cpu;
 +                              max_fits = fits;
                        }
                }
  
                        if (prev_delta < base_energy)
                                goto unlock;
                        prev_delta -= base_energy;
 +                      prev_thermal_cap = cpu_thermal_cap;
                        best_delta = min(best_delta, prev_delta);
                }
  
                /* Evaluate the energy impact of using max_spare_cap_cpu. */
                if (max_spare_cap_cpu >= 0 && max_spare_cap > prev_spare_cap) {
 +                      /* Current best energy cpu fits better */
 +                      if (max_fits < best_fits)
 +                              continue;
 +
 +                      /*
 +                       * Both don't fit performance hint (i.e. uclamp_min)
 +                       * but best energy cpu has better capacity.
 +                       */
 +                      if ((max_fits < 0) &&
 +                          (cpu_thermal_cap <= best_thermal_cap))
 +                              continue;
 +
                        cur_delta = compute_energy(&eenv, pd, cpus, p,
                                                   max_spare_cap_cpu);
                        /* CPU utilization has changed */
                        if (cur_delta < base_energy)
                                goto unlock;
                        cur_delta -= base_energy;
 -                      if (cur_delta < best_delta) {
 -                              best_delta = cur_delta;
 -                              best_energy_cpu = max_spare_cap_cpu;
 -                      }
 +
 +                      /*
 +                       * Both fit for the task but best energy cpu has lower
 +                       * energy impact.
 +                       */
 +                      if ((max_fits > 0) && (best_fits > 0) &&
 +                          (cur_delta >= best_delta))
 +                              continue;
 +
 +                      best_delta = cur_delta;
 +                      best_energy_cpu = max_spare_cap_cpu;
 +                      best_fits = max_fits;
 +                      best_thermal_cap = cpu_thermal_cap;
                }
        }
        rcu_read_unlock();
  
 -      if (best_delta < prev_delta)
 +      if ((best_fits > prev_fits) ||
 +          ((best_fits > 0) && (best_delta < prev_delta)) ||
 +          ((best_fits < 0) && (best_thermal_cap > prev_thermal_cap)))
                target = best_energy_cpu;
  
        return target;
@@@ -9030,16 -8841,73 +9030,16 @@@ static unsigned long scale_rt_capacity(
  
  static void update_cpu_capacity(struct sched_domain *sd, int cpu)
  {
 -      unsigned long capacity_orig = arch_scale_cpu_capacity(cpu);
        unsigned long capacity = scale_rt_capacity(cpu);
        struct sched_group *sdg = sd->groups;
 -      struct rq *rq = cpu_rq(cpu);
  
 -      rq->cpu_capacity_orig = capacity_orig;
 +      cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu);
  
        if (!capacity)
                capacity = 1;
  
 -      rq->cpu_capacity = capacity;
 -
 -      /*
 -       * Detect if the performance domain is in capacity inversion state.
 -       *
 -       * Capacity inversion happens when another perf domain with equal or
 -       * lower capacity_orig_of() ends up having higher capacity than this
 -       * domain after subtracting thermal pressure.
 -       *
 -       * We only take into account thermal pressure in this detection as it's
 -       * the only metric that actually results in *real* reduction of
 -       * capacity due to performance points (OPPs) being dropped/become
 -       * unreachable due to thermal throttling.
 -       *
 -       * We assume:
 -       *   * That all cpus in a perf domain have the same capacity_orig
 -       *     (same uArch).
 -       *   * Thermal pressure will impact all cpus in this perf domain
 -       *     equally.
 -       */
 -      if (static_branch_unlikely(&sched_asym_cpucapacity)) {
 -              unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
 -              struct perf_domain *pd = rcu_dereference(rq->rd->pd);
 -
 -              rq->cpu_capacity_inverted = 0;
 -
 -              for (; pd; pd = pd->next) {
 -                      struct cpumask *pd_span = perf_domain_span(pd);
 -                      unsigned long pd_cap_orig, pd_cap;
 -
 -                      cpu = cpumask_any(pd_span);
 -                      pd_cap_orig = arch_scale_cpu_capacity(cpu);
 -
 -                      if (capacity_orig < pd_cap_orig)
 -                              continue;
 -
 -                      /*
 -                       * handle the case of multiple perf domains have the
 -                       * same capacity_orig but one of them is under higher
 -                       * thermal pressure. We record it as capacity
 -                       * inversion.
 -                       */
 -                      if (capacity_orig == pd_cap_orig) {
 -                              pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
 -
 -                              if (pd_cap > inv_cap) {
 -                                      rq->cpu_capacity_inverted = inv_cap;
 -                                      break;
 -                              }
 -                      } else if (pd_cap_orig > inv_cap) {
 -                              rq->cpu_capacity_inverted = inv_cap;
 -                              break;
 -                      }
 -              }
 -      }
 -
 -      trace_sched_cpu_capacity_tp(rq);
 +      cpu_rq(cpu)->cpu_capacity = capacity;
 +      trace_sched_cpu_capacity_tp(cpu_rq(cpu));
  
        sdg->sgc->capacity = capacity;
        sdg->sgc->min_capacity = capacity;
@@@ -10267,23 -10135,24 +10267,23 @@@ static struct sched_group *find_busiest
         */
        update_sd_lb_stats(env, &sds);
  
 -      if (sched_energy_enabled()) {
 -              struct root_domain *rd = env->dst_rq->rd;
 -
 -              if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
 -                      goto out_balanced;
 -      }
 -
 -      local = &sds.local_stat;
 -      busiest = &sds.busiest_stat;
 -
        /* There is no busy sibling group to pull tasks from */
        if (!sds.busiest)
                goto out_balanced;
  
 +      busiest = &sds.busiest_stat;
 +
        /* Misfit tasks should be dealt with regardless of the avg load */
        if (busiest->group_type == group_misfit_task)
                goto force_balance;
  
 +      if (sched_energy_enabled()) {
 +              struct root_domain *rd = env->dst_rq->rd;
 +
 +              if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
 +                      goto out_balanced;
 +      }
 +
        /* ASYM feature bypasses nice load balance check */
        if (busiest->group_type == group_asym_packing)
                goto force_balance;
        if (busiest->group_type == group_imbalanced)
                goto force_balance;
  
 +      local = &sds.local_stat;
        /*
         * If the local group is busier than the selected busiest group
         * don't try and pull any tasks.
@@@ -11860,8 -11728,7 +11860,8 @@@ static inline void task_tick_core(struc
  /*
   * se_fi_update - Update the cfs_rq->min_vruntime_fi in a CFS hierarchy if needed.
   */
 -static void se_fi_update(struct sched_entity *se, unsigned int fi_seq, bool forceidle)
 +static void se_fi_update(const struct sched_entity *se, unsigned int fi_seq,
 +                       bool forceidle)
  {
        for_each_sched_entity(se) {
                struct cfs_rq *cfs_rq = cfs_rq_of(se);
@@@ -11886,12 -11753,11 +11886,12 @@@ void task_vruntime_update(struct rq *rq
        se_fi_update(se, rq->core->core_forceidle_seq, in_fi);
  }
  
 -bool cfs_prio_less(struct task_struct *a, struct task_struct *b, bool in_fi)
 +bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
 +                      bool in_fi)
  {
        struct rq *rq = task_rq(a);
 -      struct sched_entity *sea = &a->se;
 -      struct sched_entity *seb = &b->se;
 +      const struct sched_entity *sea = &a->se;
 +      const struct sched_entity *seb = &b->se;
        struct cfs_rq *cfs_rqa;
        struct cfs_rq *cfs_rqb;
        s64 delta;
@@@ -12608,11 -12474,6 +12608,11 @@@ __init void init_sched_fair_class(void
        for_each_possible_cpu(i) {
                zalloc_cpumask_var_node(&per_cpu(load_balance_mask, i), GFP_KERNEL, cpu_to_node(i));
                zalloc_cpumask_var_node(&per_cpu(select_rq_mask,    i), GFP_KERNEL, cpu_to_node(i));
 +
 +#ifdef CONFIG_CFS_BANDWIDTH
 +              INIT_CSD(&cpu_rq(i)->cfsb_csd, __cfsb_csd_unthrottle, cpu_rq(i));
 +              INIT_LIST_HEAD(&cpu_rq(i)->cfsb_csd_list);
 +#endif
        }
  
        open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
diff --combined kernel/sys.c
index 88b31f096fb2d92f3758e368fd728f04b9a856b5,b3cab94545ed3f8b0289dc58825a0ae98e48d5ab..495cd87d9bf41b718e4b9acdf7b549a971c8d032
@@@ -1442,8 -1442,6 +1442,8 @@@ static int do_prlimit(struct task_struc
  
        if (resource >= RLIM_NLIMITS)
                return -EINVAL;
 +      resource = array_index_nospec(resource, RLIM_NLIMITS);
 +
        if (new_rlim) {
                if (new_rlim->rlim_cur > new_rlim->rlim_max)
                        return -EINVAL;
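
The new array_index_nospec() call clamps the already-bounds-checked resource index under speculation, so a mispredicted branch cannot be used to read past the rlimit table. A minimal sketch of the general pattern, assuming <linux/nospec.h>; the table and helper names are made up:

#include <linux/errno.h>
#include <linux/nospec.h>
#include <linux/types.h>

/* Illustrative Spectre-v1 hardening pattern (names are hypothetical). */
static int read_entry_example(const int *table, size_t nr,
                              unsigned int idx, int *out)
{
        if (idx >= nr)
                return -EINVAL;
        /*
         * Even if the branch above is mispredicted, idx is forced into
         * [0, nr) before it can be used as an array index.
         */
        idx = array_index_nospec(idx, nr);
        *out = table[idx];
        return 0;
}
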
@@@ -2350,6 -2348,33 +2350,33 @@@ static int prctl_set_vma(unsigned long 
  }
  #endif /* CONFIG_ANON_VMA_NAME */
  
+ static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
+                                unsigned long arg4, unsigned long arg5)
+ {
+       if (arg3 || arg4 || arg5)
+               return -EINVAL;
+       if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
+               return -EINVAL;
+       if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
+               set_bit(MMF_HAS_MDWE, &current->mm->flags);
+       else if (test_bit(MMF_HAS_MDWE, &current->mm->flags))
+               return -EPERM; /* Cannot unset the flag */
+       return 0;
+ }
+ static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
+                                unsigned long arg4, unsigned long arg5)
+ {
+       if (arg2 || arg3 || arg4 || arg5)
+               return -EINVAL;
+       return test_bit(MMF_HAS_MDWE, &current->mm->flags) ?
+               PR_MDWE_REFUSE_EXEC_GAIN : 0;
+ }
  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
                unsigned long, arg4, unsigned long, arg5)
  {
                error = sched_core_share_pid(arg2, arg3, arg4, arg5);
                break;
  #endif
+       case PR_SET_MDWE:
+               error = prctl_set_mdwe(arg2, arg3, arg4, arg5);
+               break;
+       case PR_GET_MDWE:
+               error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
+               break;
        case PR_SET_VMA:
                error = prctl_set_vma(arg2, arg3, arg4, arg5);
                break;
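
prctl_set_mdwe()/prctl_get_mdwe() above are reached through the new PR_SET_MDWE/PR_GET_MDWE cases. A hedged userspace sketch of how the interface is exercised, assuming the PR_* constants are exported by the installed <linux/prctl.h>; note that, per the helper above, the flag is deliberately one-way:

#include <stdio.h>
#include <sys/prctl.h>
#include <linux/prctl.h>

int main(void)
{
        if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0))
                perror("PR_SET_MDWE");

        /* The bit cannot be cleared again: this fails with EPERM. */
        if (prctl(PR_SET_MDWE, 0, 0, 0, 0))
                perror("clearing MDWE (expected to fail)");

        printf("MDWE bits: %d\n", prctl(PR_GET_MDWE, 0, 0, 0, 0));
        return 0;
}
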
diff --combined lib/Kconfig.debug
index 1dd4bd7dc27149f457f8ff2733987166b9ce9ba4,958087475edbbce84d98c668550429af441acf38..2e91421e096e72ea95653a7a6f678bce912707c2
@@@ -389,15 -389,6 +389,15 @@@ config PAHOLE_HAS_BTF_TA
          btf_decl_tag) or not. Currently only clang compiler implements
          these attributes, so make the config depend on CC_IS_CLANG.
  
 +config PAHOLE_HAS_LANG_EXCLUDE
 +      def_bool PAHOLE_VERSION >= 124
 +      help
 +        Support for the --lang_exclude flag which makes pahole exclude
 +        compilation units from the supplied language. Used in Kbuild to
 +        omit Rust CUs which are not supported in version 1.24 of pahole,
 +        otherwise it would emit malformed kernel and module binaries when
 +        using DEBUG_INFO_BTF_MODULES.
 +
  config DEBUG_INFO_BTF_MODULES
        def_bool y
        depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
@@@ -752,77 -743,6 +752,6 @@@ config SHRINKER_DEBU
          visibility into the kernel memory shrinkers subsystem.
          Disable it to avoid an extra memory footprint.
  
- config HAVE_DEBUG_KMEMLEAK
-       bool
- config DEBUG_KMEMLEAK
-       bool "Kernel memory leak detector"
-       depends on DEBUG_KERNEL && HAVE_DEBUG_KMEMLEAK
-       select DEBUG_FS
-       select STACKTRACE if STACKTRACE_SUPPORT
-       select KALLSYMS
-       select CRC32
-       select STACKDEPOT
-       select STACKDEPOT_ALWAYS_INIT if !DEBUG_KMEMLEAK_DEFAULT_OFF
-       help
-         Say Y here if you want to enable the memory leak
-         detector. The memory allocation/freeing is traced in a way
-         similar to the Boehm's conservative garbage collector, the
-         difference being that the orphan objects are not freed but
-         only shown in /sys/kernel/debug/kmemleak. Enabling this
-         feature will introduce an overhead to memory
-         allocations. See Documentation/dev-tools/kmemleak.rst for more
-         details.
-         Enabling DEBUG_SLAB or SLUB_DEBUG may increase the chances
-         of finding leaks due to the slab objects poisoning.
-         In order to access the kmemleak file, debugfs needs to be
-         mounted (usually at /sys/kernel/debug).
- config DEBUG_KMEMLEAK_MEM_POOL_SIZE
-       int "Kmemleak memory pool size"
-       depends on DEBUG_KMEMLEAK
-       range 200 1000000
-       default 16000
-       help
-         Kmemleak must track all the memory allocations to avoid
-         reporting false positives. Since memory may be allocated or
-         freed before kmemleak is fully initialised, use a static pool
-         of metadata objects to track such callbacks. After kmemleak is
-         fully initialised, this memory pool acts as an emergency one
-         if slab allocations fail.
- config DEBUG_KMEMLEAK_TEST
-       tristate "Simple test for the kernel memory leak detector"
-       depends on DEBUG_KMEMLEAK && m
-       help
-         This option enables a module that explicitly leaks memory.
-         If unsure, say N.
- config DEBUG_KMEMLEAK_DEFAULT_OFF
-       bool "Default kmemleak to off"
-       depends on DEBUG_KMEMLEAK
-       help
-         Say Y here to disable kmemleak by default. It can then be enabled
-         on the command line via kmemleak=on.
- config DEBUG_KMEMLEAK_AUTO_SCAN
-       bool "Enable kmemleak auto scan thread on boot up"
-       default y
-       depends on DEBUG_KMEMLEAK
-       help
-         Depending on the cpu, kmemleak scan may be cpu intensive and can
-         stall user tasks at times. This option enables/disables automatic
-         kmemleak scan at boot up.
-         Say N here to disable kmemleak auto scan thread to stop automatic
-         scanning. Disabling this option disables automatic reporting of
-         memory leaks.
-         If unsure, say Y.
  config DEBUG_STACK_USAGE
        bool "Stack utilization instrumentation"
        depends on DEBUG_KERNEL && !IA64
@@@ -1562,17 -1482,6 +1491,17 @@@ config TRACE_IRQFLAGS_NM
        depends on TRACE_IRQFLAGS
        depends on TRACE_IRQFLAGS_NMI_SUPPORT
  
 +config NMI_CHECK_CPU
 +      bool "Debugging for CPUs failing to respond to backtrace requests"
 +      depends on DEBUG_KERNEL
 +      depends on X86
 +      default n
 +      help
 +        Enables debug prints when a CPU fails to respond to a given
 +        backtrace NMI.  These prints provide some reasons why a CPU
 +        might legitimately be failing to respond, for example, if it
 +        is offline or if ignore_nmis is set.
 +
  config DEBUG_IRQFLAGS
        bool "Debug IRQ flag manipulation"
        help
@@@ -1938,7 -1847,7 +1867,7 @@@ config FUNCTION_ERROR_INJECTIO
        help
          Add fault injections into various functions that are annotated with
          ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
 -        value of theses functions. This is useful to test error paths of code.
 +        value of these functions. This is useful to test error paths of code.
  
          If unsure, say N
  
@@@ -2517,19 -2426,6 +2446,19 @@@ config LIST_KUNIT_TES
  
          If unsure, say N.
  
 +config HASHTABLE_KUNIT_TEST
 +      tristate "KUnit Test for Kernel Hashtable structures" if !KUNIT_ALL_TESTS
 +      depends on KUNIT
 +      default KUNIT_ALL_TESTS
 +      help
 +        This builds the hashtable KUnit test suite.
 +        It tests the basic functionality of the API defined in
 +        include/linux/hashtable.h. For more information on KUnit and
 +        unit tests in general please refer to the KUnit documentation
 +        in Documentation/dev-tools/kunit/.
 +
 +        If unsure, say N.
 +
  config LINEAR_RANGES_TEST
        tristate "KUnit test for linear_ranges"
        depends on KUNIT
@@@ -2600,15 -2496,6 +2529,15 @@@ config MEMCPY_KUNIT_TES
  
          If unsure, say N.
  
 +config MEMCPY_SLOW_KUNIT_TEST
 +      bool "Include exhaustive memcpy tests"
 +      depends on MEMCPY_KUNIT_TEST
 +      default y
 +      help
 +        Some memcpy tests are quite exhaustive in checking for overlaps
 +        and bit ranges. These can be very slow, so they are split out
 +        as a separate config, in case they need to be disabled.
 +
  config IS_SIGNED_TYPE_KUNIT_TEST
        tristate "Test is_signed_type() macro" if !KUNIT_ALL_TESTS
        depends on KUNIT
@@@ -2915,4 -2802,6 +2844,4 @@@ config RUST_BUILD_ASSERT_ALLO
  
  endmenu # "Rust"
  
 -source "Documentation/Kconfig"
 -
  endmenu # Kernel hacking
diff --combined mm/compaction.c
index 8238e83385a7916cfa4ff3398daadc641aa691ec,ad7409f70519048ecf3505240f75feb4346be194..5a9501e0ae0174f3093d764b3e339d0dd0d40b7f
@@@ -122,7 -122,6 +122,6 @@@ bool PageMovable(struct page *page
  
        return false;
  }
- EXPORT_SYMBOL(PageMovable);
  
  void __SetPageMovable(struct page *page, const struct movable_operations *mops)
  {
@@@ -977,7 -976,7 +976,7 @@@ isolate_migratepages_block(struct compa
                                        locked = NULL;
                                }
  
-                               if (!isolate_movable_page(page, mode))
+                               if (isolate_movable_page(page, mode))
                                        goto isolate_success;
                        }
  
@@@ -1102,12 -1101,12 +1101,12 @@@ isolate_success_no_list
  
                /*
                 * Avoid isolating too much unless this block is being
-                * rescanned (e.g. dirty/writeback pages, parallel allocation)
+                * fully scanned (e.g. dirty/writeback pages, parallel allocation)
                 * or a lock is contended. For contention, isolate quickly to
                 * potentially remove one source of contention.
                 */
                if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
-                   !cc->rescan && !cc->contended) {
+                   !cc->finish_pageblock && !cc->contended) {
                        ++low_pfn;
                        break;
                }
@@@ -1172,14 -1171,14 +1171,14 @@@ isolate_abort
        }
  
        /*
-        * Updated the cached scanner pfn once the pageblock has been scanned
+        * Update the cached scanner pfn once the pageblock has been scanned.
         * Pages will either be migrated in which case there is no point
         * scanning in the near future or migration failed in which case the
         * failure reason may persist. The block is marked for skipping if
         * there were no pages isolated in the block or if the block is
         * rescanned twice in a row.
         */
-       if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
+       if (low_pfn == end_pfn && (!nr_isolated || cc->finish_pageblock)) {
                if (valid_page && !skip_updated)
                        set_pageblock_skip(valid_page);
                update_cached_migrate(cc, low_pfn);
@@@ -1762,6 -1761,13 +1761,13 @@@ static unsigned long fast_find_migrateb
        if (cc->ignore_skip_hint)
                return pfn;
  
+       /*
+        * If the pageblock should be finished then do not select a different
+        * pageblock.
+        */
+       if (cc->finish_pageblock)
+               return pfn;
        /*
         * If the migrate_pfn is not at the start of a zone or the start
         * of a pageblock then assume this is a continuation of a previous
                                        pfn = cc->zone->zone_start_pfn;
                                cc->fast_search_fail = 0;
                                found_block = true;
 +                              set_pageblock_skip(freepage);
                                break;
                        }
                }
@@@ -2027,6 -2032,8 +2033,8 @@@ static unsigned int fragmentation_score
                struct zone *zone;
  
                zone = &pgdat->node_zones[zoneid];
+               if (!populated_zone(zone))
+                       continue;
                score += fragmentation_score_zone_weighted(zone);
        }
  
@@@ -2315,9 -2322,6 +2323,6 @@@ compact_zone(struct compact_control *cc
        if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
                return ret;
  
-       /* huh, compaction_suitable is returning something unexpected */
-       VM_BUG_ON(ret != COMPACT_CONTINUE);
        /*
         * Clear pageblock skip if there were failures recently and compaction
         * is about to be retried after being deferred.
                unsigned long iteration_start_pfn = cc->migrate_pfn;
  
                /*
-                * Avoid multiple rescans which can happen if a page cannot be
-                * isolated (dirty/writeback in async mode) or if the migrated
-                * pages are being allocated before the pageblock is cleared.
-                * The first rescan will capture the entire pageblock for
-                * migration. If it fails, it'll be marked skip and scanning
-                * will proceed as normal.
+                * Avoid multiple rescans of the same pageblock which can
+                * happen if a page cannot be isolated (dirty/writeback in
+                * async mode) or if the migrated pages are being allocated
+                * before the pageblock is cleared.  The first rescan will
+                * capture the entire pageblock for migration. If it fails,
+                * it'll be marked skip and scanning will proceed as normal.
                 */
-               cc->rescan = false;
+               cc->finish_pageblock = false;
                if (pageblock_start_pfn(last_migrated_pfn) ==
                    pageblock_start_pfn(iteration_start_pfn)) {
-                       cc->rescan = true;
+                       cc->finish_pageblock = true;
                }
  
+ rescan:
                switch (isolate_migratepages(cc)) {
                case ISOLATE_ABORT:
                        ret = COMPACT_CONTENDED;
                                goto out;
                        }
                        /*
-                        * We failed to migrate at least one page in the current
-                        * order-aligned block, so skip the rest of it.
+                        * If an ASYNC or SYNC_LIGHT fails to migrate a page
+                        * within the current order-aligned block, scan the
+                        * remainder of the pageblock. This will mark the
+                        * pageblock "skip" to avoid rescanning in the near
+                        * future. This will isolate more pages than necessary
+                        * for the request but avoid loops due to
+                        * fast_find_migrateblock revisiting blocks that were
+                        * recently partially scanned.
                         */
-                       if (cc->direct_compaction &&
-                                               (cc->mode == MIGRATE_ASYNC)) {
-                               cc->migrate_pfn = block_end_pfn(
-                                               cc->migrate_pfn - 1, cc->order);
-                               /* Draining pcplists is useless in this case */
-                               last_migrated_pfn = 0;
+                       if (cc->direct_compaction && !cc->finish_pageblock &&
+                                               (cc->mode < MIGRATE_SYNC)) {
+                               cc->finish_pageblock = true;
+                               /*
+                                * Draining pcplists does not help THP if
+                                * any page failed to migrate. Even after
+                                * drain, the pageblock will not be free.
+                                */
+                               if (cc->order == COMPACTION_HPAGE_ORDER)
+                                       last_migrated_pfn = 0;
+                               goto rescan;
                        }
                }
  
+               /* Stop if a page has been captured */
+               if (capc && capc->page) {
+                       ret = COMPACT_SUCCESS;
+                       break;
+               }
  check_drain:
                /*
                 * Has the migration scanner moved away from the previous
                                last_migrated_pfn = 0;
                        }
                }
-               /* Stop if a page has been captured */
-               if (capc && capc->page) {
-                       ret = COMPACT_SUCCESS;
-                       break;
-               }
        }
  
  out:
  
        trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
  
+       VM_BUG_ON(!list_empty(&cc->freepages));
+       VM_BUG_ON(!list_empty(&cc->migratepages));
        return ret;
  }
  
@@@ -2531,9 -2552,6 +2553,6 @@@ static enum compact_result compact_zone
  
        ret = compact_zone(&cc, &capc);
  
-       VM_BUG_ON(!list_empty(&cc.freepages));
-       VM_BUG_ON(!list_empty(&cc.migratepages));
        /*
         * Make sure we hide capture control first before we read the captured
         * page pointer, otherwise an interrupt could free and capture a page
@@@ -2665,8 -2683,10 +2684,10 @@@ static void proactive_compact_node(pg_d
  
                compact_zone(&cc, NULL);
  
-               VM_BUG_ON(!list_empty(&cc.freepages));
-               VM_BUG_ON(!list_empty(&cc.migratepages));
+               count_compact_events(KCOMPACTD_MIGRATE_SCANNED,
+                                    cc.total_migrate_scanned);
+               count_compact_events(KCOMPACTD_FREE_SCANNED,
+                                    cc.total_free_scanned);
        }
  }
  
@@@ -2694,9 -2714,6 +2715,6 @@@ static void compact_node(int nid
                cc.zone = zone;
  
                compact_zone(&cc, NULL);
-               VM_BUG_ON(!list_empty(&cc.freepages));
-               VM_BUG_ON(!list_empty(&cc.migratepages));
        }
  }
  
@@@ -2736,6 -2753,8 +2754,8 @@@ int compaction_proactiveness_sysctl_han
                                continue;
  
                        pgdat->proactive_compact_trigger = true;
+                       trace_mm_compaction_wakeup_kcompactd(pgdat->node_id, -1,
+                                                            pgdat->nr_zones - 1);
                        wake_up_interruptible(&pgdat->kcompactd_wait);
                }
        }
@@@ -2873,9 -2892,6 +2893,6 @@@ static void kcompactd_do_work(pg_data_
                                     cc.total_migrate_scanned);
                count_compact_events(KCOMPACTD_FREE_SCANNED,
                                     cc.total_free_scanned);
-               VM_BUG_ON(!list_empty(&cc.freepages));
-               VM_BUG_ON(!list_empty(&cc.migratepages));
        }
  
        /*
diff --combined mm/filemap.c
index b794943bce760d618cfea226b9008afbb2d94f5b,2ebcf500871d293d59255af15563dc5897ff16ce..2723104cc06a12477a002fff87fe7fe7ec8ccd66
@@@ -42,8 -42,6 +42,8 @@@
  #include <linux/ramfs.h>
  #include <linux/page_idle.h>
  #include <linux/migrate.h>
 +#include <linux/pipe_fs_i.h>
 +#include <linux/splice.h>
  #include <asm/pgalloc.h>
  #include <asm/tlbflush.h>
  #include "internal.h"
@@@ -99,7 -97,7 +99,7 @@@
   *    ->i_pages lock          (__sync_single_inode)
   *
   *  ->i_mmap_rwsem
-  *    ->anon_vma.lock         (vma_adjust)
+  *    ->anon_vma.lock         (vma_merge)
   *
   *  ->anon_vma.lock
   *    ->page_table_lock or pte_lock   (anon_vma_prepare and various)
@@@ -472,7 -470,7 +472,7 @@@ EXPORT_SYMBOL(filemap_flush)
  bool filemap_range_has_page(struct address_space *mapping,
                           loff_t start_byte, loff_t end_byte)
  {
-       struct page *page;
+       struct folio *folio;
        XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
        pgoff_t max = end_byte >> PAGE_SHIFT;
  
  
        rcu_read_lock();
        for (;;) {
-               page = xas_find(&xas, max);
-               if (xas_retry(&xas, page))
+               folio = xas_find(&xas, max);
+               if (xas_retry(&xas, folio))
                        continue;
                /* Shadow entries don't count */
-               if (xa_is_value(page))
+               if (xa_is_value(folio))
                        continue;
                /*
                 * We don't need to try to pin this page; we're about to
        }
        rcu_read_unlock();
  
-       return page != NULL;
+       return folio != NULL;
  }
  EXPORT_SYMBOL(filemap_range_has_page);
  
@@@ -505,25 -503,27 +505,27 @@@ static void __filemap_fdatawait_range(s
  {
        pgoff_t index = start_byte >> PAGE_SHIFT;
        pgoff_t end = end_byte >> PAGE_SHIFT;
-       struct pagevec pvec;
-       int nr_pages;
+       struct folio_batch fbatch;
+       unsigned nr_folios;
+       folio_batch_init(&fbatch);
  
-       pagevec_init(&pvec);
        while (index <= end) {
                unsigned i;
  
-               nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index,
-                               end, PAGECACHE_TAG_WRITEBACK);
-               if (!nr_pages)
+               nr_folios = filemap_get_folios_tag(mapping, &index, end,
+                               PAGECACHE_TAG_WRITEBACK, &fbatch);
+               if (!nr_folios)
                        break;
  
-               for (i = 0; i < nr_pages; i++) {
-                       struct page *page = pvec.pages[i];
+               for (i = 0; i < nr_folios; i++) {
+                       struct folio *folio = fbatch.folios[i];
  
-                       wait_on_page_writeback(page);
-                       ClearPageError(page);
+                       folio_wait_writeback(folio);
+                       folio_clear_error(folio);
                }
-               pagevec_release(&pvec);
+               folio_batch_release(&fbatch);
                cond_resched();
        }
  }
  EXPORT_SYMBOL(filemap_get_folios_contig);
  
  /**
-  * find_get_pages_range_tag - Find and return head pages matching @tag.
-  * @mapping:  the address_space to search
-  * @index:    the starting page index
-  * @end:      The final page index (inclusive)
-  * @tag:      the tag index
-  * @nr_pages: the maximum number of pages
-  * @pages:    where the resulting pages are placed
+  * filemap_get_folios_tag - Get a batch of folios matching @tag
+  * @mapping:    The address_space to search
+  * @start:      The starting page index
+  * @end:        The final page index (inclusive)
+  * @tag:        The tag index
+  * @fbatch:     The batch to fill
   *
-  * Like find_get_pages_range(), except we only return head pages which are
-  * tagged with @tag.  @index is updated to the index immediately after the
-  * last page we return, ready for the next iteration.
+  * Same as filemap_get_folios(), but only returning folios tagged with @tag.
   *
-  * Return: the number of pages which were found.
+  * Return: The number of folios found.
+  * Also update @start to index the next folio for traversal.
   */
- unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
-                       pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
-                       struct page **pages)
+ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
+                       pgoff_t end, xa_mark_t tag, struct folio_batch *fbatch)
  {
-       XA_STATE(xas, &mapping->i_pages, *index);
+       XA_STATE(xas, &mapping->i_pages, *start);
        struct folio *folio;
-       unsigned ret = 0;
-       if (unlikely(!nr_pages))
-               return 0;
  
        rcu_read_lock();
-       while ((folio = find_get_entry(&xas, end, tag))) {
+       while ((folio = find_get_entry(&xas, end, tag)) != NULL) {
                /*
                 * Shadow entries should never be tagged, but this iteration
                 * is lockless so there is a window for page reclaim to evict
-                * a page we saw tagged.  Skip over it.
+                * a page we saw tagged. Skip over it.
                 */
                if (xa_is_value(folio))
                        continue;
+               if (!folio_batch_add(fbatch, folio)) {
+                       unsigned long nr = folio_nr_pages(folio);
  
-               pages[ret] = &folio->page;
-               if (++ret == nr_pages) {
-                       *index = folio->index + folio_nr_pages(folio);
+                       if (folio_test_hugetlb(folio))
+                               nr = 1;
+                       *start = folio->index + nr;
                        goto out;
                }
        }
        /*
-        * We come here when we got to @end. We take care to not overflow the
-        * index @index as it confuses some of the callers. This breaks the
-        * iteration when there is a page at index -1 but that is already
-        * broken anyway.
+        * We come here when there is no page beyond @end. We take care to not
+        * overflow the index @start as it confuses some of the callers. This
+        * breaks the iteration when there is a page at index -1 but that is
+        * already broken anyway.
         */
        if (end == (pgoff_t)-1)
-               *index = (pgoff_t)-1;
+               *start = (pgoff_t)-1;
        else
-               *index = end + 1;
+               *start = end + 1;
  out:
        rcu_read_unlock();
  
-       return ret;
+       return folio_batch_count(fbatch);
  }
- EXPORT_SYMBOL(find_get_pages_range_tag);
+ EXPORT_SYMBOL(filemap_get_folios_tag);
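
filemap_get_folios_tag() fills a folio_batch with tagged folios and advances *start past the last one returned, so callers simply loop until it returns 0. A minimal usage sketch (essentially the shape __filemap_fdatawait_range() above was converted to; the helper name and the per-folio action are placeholders):

/* Sketch only: wait on every writeback-tagged folio in a range. */
static void wait_writeback_range_example(struct address_space *mapping,
                                         pgoff_t index, pgoff_t end)
{
        struct folio_batch fbatch;
        unsigned int i, nr;

        folio_batch_init(&fbatch);
        while ((nr = filemap_get_folios_tag(mapping, &index, end,
                                            PAGECACHE_TAG_WRITEBACK, &fbatch))) {
                for (i = 0; i < nr; i++)
                        folio_wait_writeback(fbatch.folios[i]);
                folio_batch_release(&fbatch);   /* drop the references taken */
                cond_resched();
        }
}
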
  
  /*
   * CD/DVDs are error prone. When a medium error occurs, the driver may fail
@@@ -2442,19 -2436,21 +2438,19 @@@ static int filemap_read_folio(struct fi
  }
  
  static bool filemap_range_uptodate(struct address_space *mapping,
 -              loff_t pos, struct iov_iter *iter, struct folio *folio)
 +              loff_t pos, size_t count, struct folio *folio,
 +              bool need_uptodate)
  {
 -      int count;
 -
        if (folio_test_uptodate(folio))
                return true;
        /* pipes can't handle partially uptodate pages */
 -      if (iov_iter_is_pipe(iter))
 +      if (need_uptodate)
                return false;
        if (!mapping->a_ops->is_partially_uptodate)
                return false;
        if (mapping->host->i_blkbits >= folio_shift(folio))
                return false;
  
 -      count = iter->count;
        if (folio_pos(folio) > pos) {
                count -= folio_pos(folio) - pos;
                pos = 0;
  }
  
  static int filemap_update_page(struct kiocb *iocb,
 -              struct address_space *mapping, struct iov_iter *iter,
 -              struct folio *folio)
 +              struct address_space *mapping, size_t count,
 +              struct folio *folio, bool need_uptodate)
  {
        int error;
  
                goto unlock;
  
        error = 0;
 -      if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, folio))
 +      if (filemap_range_uptodate(mapping, iocb->ki_pos, count, folio,
 +                                 need_uptodate))
                goto unlock;
  
        error = -EAGAIN;
@@@ -2578,8 -2573,8 +2574,8 @@@ static int filemap_readahead(struct kio
        return 0;
  }
  
 -static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
 -              struct folio_batch *fbatch)
 +static int filemap_get_pages(struct kiocb *iocb, size_t count,
 +              struct folio_batch *fbatch, bool need_uptodate)
  {
        struct file *filp = iocb->ki_filp;
        struct address_space *mapping = filp->f_mapping;
        struct folio *folio;
        int err = 0;
  
 -      last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
 +      /* "last_index" is the index of the page beyond the end of the read */
 +      last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
  retry:
        if (fatal_signal_pending(current))
                return -EINTR;
  
 -      filemap_get_read_batch(mapping, index, last_index, fbatch);
 +      filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
        if (!folio_batch_count(fbatch)) {
                if (iocb->ki_flags & IOCB_NOIO)
                        return -EAGAIN;
                page_cache_sync_readahead(mapping, ra, filp, index,
                                last_index - index);
 -              filemap_get_read_batch(mapping, index, last_index, fbatch);
 +              filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
        }
        if (!folio_batch_count(fbatch)) {
                if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
                if ((iocb->ki_flags & IOCB_WAITQ) &&
                    folio_batch_count(fbatch) > 1)
                        iocb->ki_flags |= IOCB_NOWAIT;
 -              err = filemap_update_page(iocb, mapping, iter, folio);
 +              err = filemap_update_page(iocb, mapping, count, folio,
 +                                        need_uptodate);
                if (err)
                        goto err;
        }
@@@ -2694,8 -2687,7 +2690,8 @@@ ssize_t filemap_read(struct kiocb *iocb
                if (unlikely(iocb->ki_pos >= i_size_read(inode)))
                        break;
  
 -              error = filemap_get_pages(iocb, iter, &fbatch);
 +              error = filemap_get_pages(iocb, iter->count, &fbatch,
 +                                        iov_iter_is_pipe(iter));
                if (error < 0)
                        break;
  
@@@ -2845,134 -2837,6 +2841,134 @@@ generic_file_read_iter(struct kiocb *io
  }
  EXPORT_SYMBOL(generic_file_read_iter);
  
 +/*
 + * Splice subpages from a folio into a pipe.
 + */
 +size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 +                            struct folio *folio, loff_t fpos, size_t size)
 +{
 +      struct page *page;
 +      size_t spliced = 0, offset = offset_in_folio(folio, fpos);
 +
 +      page = folio_page(folio, offset / PAGE_SIZE);
 +      size = min(size, folio_size(folio) - offset);
 +      offset %= PAGE_SIZE;
 +
 +      while (spliced < size &&
 +             !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
 +              struct pipe_buffer *buf = pipe_head_buf(pipe);
 +              size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
 +
 +              *buf = (struct pipe_buffer) {
 +                      .ops    = &page_cache_pipe_buf_ops,
 +                      .page   = page,
 +                      .offset = offset,
 +                      .len    = part,
 +              };
 +              folio_get(folio);
 +              pipe->head++;
 +              page++;
 +              spliced += part;
 +              offset = 0;
 +      }
 +
 +      return spliced;
 +}
 +
 +/*
 + * Splice folios from the pagecache of a buffered (ie. non-O_DIRECT) file into
 + * a pipe.
 + */
 +ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 +                          struct pipe_inode_info *pipe,
 +                          size_t len, unsigned int flags)
 +{
 +      struct folio_batch fbatch;
 +      struct kiocb iocb;
 +      size_t total_spliced = 0, used, npages;
 +      loff_t isize, end_offset;
 +      bool writably_mapped;
 +      int i, error = 0;
 +
 +      init_sync_kiocb(&iocb, in);
 +      iocb.ki_pos = *ppos;
 +
 +      /* Work out how much data we can actually add into the pipe */
 +      used = pipe_occupancy(pipe->head, pipe->tail);
 +      npages = max_t(ssize_t, pipe->max_usage - used, 0);
 +      len = min_t(size_t, len, npages * PAGE_SIZE);
 +
 +      folio_batch_init(&fbatch);
 +
 +      do {
 +              cond_resched();
 +
 +              if (*ppos >= i_size_read(file_inode(in)))
 +                      break;
 +
 +              iocb.ki_pos = *ppos;
 +              error = filemap_get_pages(&iocb, len, &fbatch, true);
 +              if (error < 0)
 +                      break;
 +
 +              /*
 +               * i_size must be checked after we know the pages are Uptodate.
 +               *
 +               * Checking i_size after the check allows us to calculate
 +               * the correct value for "nr", which means the zero-filled
 +               * part of the page is not copied back to userspace (unless
 +               * another truncate extends the file - this is desired though).
 +               */
 +              isize = i_size_read(file_inode(in));
 +              if (unlikely(*ppos >= isize))
 +                      break;
 +              end_offset = min_t(loff_t, isize, *ppos + len);
 +
 +              /*
 +               * Once we start copying data, we don't want to be touching any
 +               * cachelines that might be contended:
 +               */
 +              writably_mapped = mapping_writably_mapped(in->f_mapping);
 +
 +              for (i = 0; i < folio_batch_count(&fbatch); i++) {
 +                      struct folio *folio = fbatch.folios[i];
 +                      size_t n;
 +
 +                      if (folio_pos(folio) >= end_offset)
 +                              goto out;
 +                      folio_mark_accessed(folio);
 +
 +                      /*
 +                       * If users can be writing to this folio using arbitrary
 +                       * virtual addresses, take care of potential aliasing
 +                       * before reading the folio on the kernel side.
 +                       */
 +                      if (writably_mapped)
 +                              flush_dcache_folio(folio);
 +
 +                      n = min_t(loff_t, len, isize - *ppos);
 +                      n = splice_folio_into_pipe(pipe, folio, *ppos, n);
 +                      if (!n)
 +                              goto out;
 +                      len -= n;
 +                      total_spliced += n;
 +                      *ppos += n;
 +                      in->f_ra.prev_pos = *ppos;
 +                      if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
 +                              goto out;
 +              }
 +
 +              folio_batch_release(&fbatch);
 +      } while (len);
 +
 +out:
 +      folio_batch_release(&fbatch);
 +      file_accessed(in);
 +
 +      return total_spliced ? total_spliced : error;
 +}
 +EXPORT_SYMBOL(filemap_splice_read);
 +
  static inline loff_t folio_seek_hole_data(struct xa_state *xas,
                struct address_space *mapping, struct folio *folio,
                loff_t start, loff_t end, bool seek_data)
@@@ -3395,22 -3259,24 +3391,24 @@@ out_retry
  }
  EXPORT_SYMBOL(filemap_fault);
  
- static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
+ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
+               pgoff_t start)
  {
        struct mm_struct *mm = vmf->vma->vm_mm;
  
        /* Huge page is mapped? No need to proceed. */
        if (pmd_trans_huge(*vmf->pmd)) {
-               unlock_page(page);
-               put_page(page);
+               folio_unlock(folio);
+               folio_put(folio);
                return true;
        }
  
-       if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
+       if (pmd_none(*vmf->pmd) && folio_test_pmd_mappable(folio)) {
+               struct page *page = folio_file_page(folio, start);
                vm_fault_t ret = do_set_pmd(vmf, page);
                if (!ret) {
                        /* The page is mapped successfully, reference consumed. */
-                       unlock_page(page);
+                       folio_unlock(folio);
                        return true;
                }
        }
  
        /* See comment in handle_pte_fault() */
        if (pmd_devmap_trans_unstable(vmf->pmd)) {
-               unlock_page(page);
-               put_page(page);
+               folio_unlock(folio);
+               folio_put(folio);
                return true;
        }
  
@@@ -3504,7 -3370,7 +3502,7 @@@ vm_fault_t filemap_map_pages(struct vm_
        if (!folio)
                goto out;
  
-       if (filemap_map_pmd(vmf, &folio->page)) {
+       if (filemap_map_pmd(vmf, folio, start_pgoff)) {
                ret = VM_FAULT_NOPAGE;
                goto out;
        }
@@@ -3719,6 -3585,30 +3717,30 @@@ struct folio *read_cache_folio(struct a
  }
  EXPORT_SYMBOL(read_cache_folio);
  
+ /**
+  * mapping_read_folio_gfp - Read into page cache, using specified allocation flags.
+  * @mapping:  The address_space for the folio.
+  * @index:    The index that the allocated folio will contain.
+  * @gfp:      The page allocator flags to use if allocating.
+  *
+  * This is the same as "read_cache_folio(mapping, index, NULL, NULL)", but with
+  * any new memory allocations done using the specified allocation flags.
+  *
+  * The most likely error from this function is EIO, but ENOMEM is
+  * possible and so is EINTR.  If ->read_folio returns another error,
+  * that will be returned to the caller.
+  *
+  * The function expects mapping->invalidate_lock to be already held.
+  *
+  * Return: Uptodate folio on success, ERR_PTR() on failure.
+  */
+ struct folio *mapping_read_folio_gfp(struct address_space *mapping,
+               pgoff_t index, gfp_t gfp)
+ {
+       return do_read_cache_folio(mapping, index, NULL, NULL, gfp);
+ }
+ EXPORT_SYMBOL(mapping_read_folio_gfp);
  static struct page *do_read_cache_page(struct address_space *mapping,
                pgoff_t index, filler_t *filler, struct file *file, gfp_t gfp)
  {
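
The new splice_folio_into_pipe()/filemap_splice_read() pair added above gives buffered splice reads a folio-based path out of the page cache. As a hedged illustration only (not something introduced by this merge), a filesystem whose reads already go through the page cache could point its ->splice_read at the exported helper, since filemap_splice_read() has the splice_read prototype; the myfs_file_operations name below is hypothetical and assumes the usual <linux/fs.h> declarations.

/*
 * Hypothetical sketch, not part of this commit: a page-cache-backed
 * filesystem adopting the newly exported helper directly.
 */
static const struct file_operations myfs_file_operations = {
        .llseek         = generic_file_llseek,
        .read_iter      = generic_file_read_iter,
        .splice_read    = filemap_splice_read,  /* exported above */
        .mmap           = generic_file_mmap,
};
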
diff --combined mm/huge_memory.c
index 1b791b26d72d7aa678512a40488beab342c2648b,1343a7d88299be850afdc1f022607d675bf92d9b..4fc43859e59a31932a657cd2fac2b511c00e812b
@@@ -119,7 -119,8 +119,8 @@@ bool hugepage_vma_check(struct vm_area_
         * own flags.
         */
        if (!in_pf && shmem_file(vma->vm_file))
-               return shmem_huge_enabled(vma, !enforce_sysfs);
+               return shmem_is_huge(file_inode(vma->vm_file), vma->vm_pgoff,
+                                    !enforce_sysfs, vma->vm_mm, vm_flags);
  
        /* Enforce sysfs THP requirements as necessary */
        if (enforce_sysfs &&
@@@ -559,10 -560,11 +560,11 @@@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, stru
  }
  
  #ifdef CONFIG_MEMCG
- static inline struct deferred_split *get_deferred_split_queue(struct page *page)
+ static inline
+ struct deferred_split *get_deferred_split_queue(struct folio *folio)
  {
-       struct mem_cgroup *memcg = page_memcg(compound_head(page));
-       struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));
+       struct mem_cgroup *memcg = folio_memcg(folio);
+       struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
  
        if (memcg)
                return &memcg->deferred_split_queue;
                return &pgdat->deferred_split_queue;
  }
  #else
- static inline struct deferred_split *get_deferred_split_queue(struct page *page)
+ static inline
+ struct deferred_split *get_deferred_split_queue(struct folio *folio)
  {
-       struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));
+       struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
  
        return &pgdat->deferred_split_queue;
  }
  
  void prep_transhuge_page(struct page *page)
  {
-       /*
-        * we use page->mapping and page->index in second tail page
-        * as list_head: assuming THP order >= 2
-        */
+       struct folio *folio = (struct folio *)page;
  
-       INIT_LIST_HEAD(page_deferred_list(page));
+       VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
+       INIT_LIST_HEAD(&folio->_deferred_list);
        set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
  }
  
  static inline bool is_transparent_hugepage(struct page *page)
  {
+       struct folio *folio;
        if (!PageCompound(page))
                return false;
  
-       page = compound_head(page);
-       return is_huge_zero_page(page) ||
-              page[1].compound_dtor == TRANSHUGE_PAGE_DTOR;
+       folio = page_folio(page);
+       return is_huge_zero_page(&folio->page) ||
+              folio->_folio_dtor == TRANSHUGE_PAGE_DTOR;
  }
  
  static unsigned long __thp_get_unmapped_area(struct file *filp,
@@@ -1039,11 -1042,6 +1042,6 @@@ struct page *follow_devmap_pmd(struct v
  
        assert_spin_locked(pmd_lockptr(mm, pmd));
  
-       /* FOLL_GET and FOLL_PIN are mutually exclusive. */
-       if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
-                        (FOLL_PIN | FOLL_GET)))
-               return NULL;
        if (flags & FOLL_WRITE && !pmd_write(*pmd))
                return NULL;
  
@@@ -1202,11 -1200,6 +1200,6 @@@ struct page *follow_devmap_pud(struct v
        if (flags & FOLL_WRITE && !pud_write(*pud))
                return NULL;
  
-       /* FOLL_GET and FOLL_PIN are mutually exclusive. */
-       if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
-                        (FOLL_PIN | FOLL_GET)))
-               return NULL;
        if (pud_present(*pud) && pud_devmap(*pud))
                /* pass */;
        else
@@@ -1603,7 -1596,7 +1596,7 @@@ bool madvise_free_huge_pmd(struct mmu_g
  {
        spinlock_t *ptl;
        pmd_t orig_pmd;
-       struct page *page;
+       struct folio *folio;
        struct mm_struct *mm = tlb->mm;
        bool ret = false;
  
                goto out;
        }
  
-       page = pmd_page(orig_pmd);
+       folio = pfn_folio(pmd_pfn(orig_pmd));
        /*
-        * If other processes are mapping this page, we couldn't discard
-        * the page unless they all do MADV_FREE so let's skip the page.
+        * If other processes are mapping this folio, we couldn't discard
+        * the folio unless they all do MADV_FREE so let's skip the folio.
         */
-       if (total_mapcount(page) != 1)
+       if (folio_mapcount(folio) != 1)
                goto out;
  
-       if (!trylock_page(page))
+       if (!folio_trylock(folio))
                goto out;
  
        /*
         * will deactivate only them.
         */
        if (next - addr != HPAGE_PMD_SIZE) {
-               get_page(page);
+               folio_get(folio);
                spin_unlock(ptl);
-               split_huge_page(page);
-               unlock_page(page);
-               put_page(page);
+               split_folio(folio);
+               folio_unlock(folio);
+               folio_put(folio);
                goto out_unlocked;
        }
  
-       if (PageDirty(page))
-               ClearPageDirty(page);
-       unlock_page(page);
+       if (folio_test_dirty(folio))
+               folio_clear_dirty(folio);
+       folio_unlock(folio);
  
        if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
                pmdp_invalidate(vma, addr, pmd);
                tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
        }
  
-       mark_page_lazyfree(page);
+       folio_mark_lazyfree(folio);
        ret = true;
  out:
        spin_unlock(ptl);
@@@ -1920,17 -1913,15 +1913,15 @@@ int change_huge_pmd(struct mmu_gather *
        oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
  
        entry = pmd_modify(oldpmd, newprot);
-       if (uffd_wp) {
-               entry = pmd_wrprotect(entry);
+       if (uffd_wp)
                entry = pmd_mkuffd_wp(entry);
-       } else if (uffd_wp_resolve) {
+       else if (uffd_wp_resolve)
                /*
                 * Leave the write bit to be handled by PF interrupt
                 * handler, then things like COW could be properly
                 * handled.
                 */
                entry = pmd_clear_uffd_wp(entry);
-       }
  
        /* See change_pte_range(). */
        if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
@@@ -2022,7 -2013,7 +2013,7 @@@ void __split_huge_pud(struct vm_area_st
        spinlock_t *ptl;
        struct mmu_notifier_range range;
  
-       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
                                address & HPAGE_PUD_MASK,
                                (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
        mmu_notifier_invalidate_range_start(&range);
@@@ -2284,7 -2275,7 +2275,7 @@@ void __split_huge_pmd(struct vm_area_st
        spinlock_t *ptl;
        struct mmu_notifier_range range;
  
-       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
                                address & HPAGE_PMD_MASK,
                                (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
        mmu_notifier_invalidate_range_start(&range);
@@@ -2479,9 -2470,9 +2470,9 @@@ static void __split_huge_page_tail(stru
         * of swap cache pages that store the swp_entry_t in tail pages.
         * Fix up and warn once if private is unexpectedly set.
         *
-        * What of 32-bit systems, on which head[1].compound_pincount overlays
+        * What of 32-bit systems, on which folio->_pincount overlays
         * head[1].private?  No problem: THP_SWAP is not enabled on 32-bit, and
-        * compound_pincount must be 0 for folio_ref_freeze() to have succeeded.
+        * pincount must be 0 for folio_ref_freeze() to have succeeded.
         */
        if (!folio_test_swapcache(page_folio(head))) {
                VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
@@@ -2652,7 -2643,7 +2643,7 @@@ bool can_split_folio(struct folio *foli
  int split_huge_page_to_list(struct page *page, struct list_head *list)
  {
        struct folio *folio = page_folio(page);
-       struct deferred_split *ds_queue = get_deferred_split_queue(&folio->page);
+       struct deferred_split *ds_queue = get_deferred_split_queue(folio);
        XA_STATE(xas, &folio->mapping->i_pages, folio->index);
        struct anon_vma *anon_vma = NULL;
        struct address_space *mapping = NULL;
        /* Prevent deferred_split_scan() touching ->_refcount */
        spin_lock(&ds_queue->split_queue_lock);
        if (folio_ref_freeze(folio, 1 + extra_pins)) {
-               if (!list_empty(page_deferred_list(&folio->page))) {
+               if (!list_empty(&folio->_deferred_list)) {
                        ds_queue->split_queue_len--;
-                       list_del(page_deferred_list(&folio->page));
+                       list_del(&folio->_deferred_list);
                }
                spin_unlock(&ds_queue->split_queue_lock);
                if (mapping) {
  
  void free_transhuge_page(struct page *page)
  {
-       struct deferred_split *ds_queue = get_deferred_split_queue(page);
+       struct folio *folio = (struct folio *)page;
+       struct deferred_split *ds_queue = get_deferred_split_queue(folio);
        unsigned long flags;
  
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-       if (!list_empty(page_deferred_list(page))) {
+       if (!list_empty(&folio->_deferred_list)) {
                ds_queue->split_queue_len--;
-               list_del(page_deferred_list(page));
+               list_del(&folio->_deferred_list);
        }
        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
        free_compound_page(page);
  }
  
- void deferred_split_huge_page(struct page *page)
+ void deferred_split_folio(struct folio *folio)
  {
-       struct deferred_split *ds_queue = get_deferred_split_queue(page);
+       struct deferred_split *ds_queue = get_deferred_split_queue(folio);
  #ifdef CONFIG_MEMCG
-       struct mem_cgroup *memcg = page_memcg(compound_head(page));
+       struct mem_cgroup *memcg = folio_memcg(folio);
  #endif
        unsigned long flags;
  
-       VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+       VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
  
        /*
         * The try_to_unmap() in page reclaim path might reach here too,
         * this may cause a race condition to corrupt deferred split queue.
-        * And, if page reclaim is already handling the same page, it is
+        * And, if page reclaim is already handling the same folio, it is
         * unnecessary to handle it again in shrinker.
         *
-        * Check PageSwapCache to determine if the page is being
-        * handled by page reclaim since THP swap would add the page into
+        * Check the swapcache flag to determine if the folio is being
+        * handled by page reclaim since THP swap would add the folio into
         * swap cache before calling try_to_unmap().
         */
-       if (PageSwapCache(page))
+       if (folio_test_swapcache(folio))
+               return;
+       if (!list_empty(&folio->_deferred_list))
                return;
  
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-       if (list_empty(page_deferred_list(page))) {
+       if (list_empty(&folio->_deferred_list)) {
                count_vm_event(THP_DEFERRED_SPLIT_PAGE);
-               list_add_tail(page_deferred_list(page), &ds_queue->split_queue);
+               list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
                ds_queue->split_queue_len++;
  #ifdef CONFIG_MEMCG
                if (memcg)
-                       set_shrinker_bit(memcg, page_to_nid(page),
+                       set_shrinker_bit(memcg, folio_nid(folio),
                                         deferred_split_shrinker.id);
  #endif
        }
@@@ -2870,8 -2865,8 +2865,8 @@@ static unsigned long deferred_split_sca
        struct pglist_data *pgdata = NODE_DATA(sc->nid);
        struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
        unsigned long flags;
-       LIST_HEAD(list), *pos, *next;
-       struct page *page;
+       LIST_HEAD(list);
+       struct folio *folio, *next;
        int split = 0;
  
  #ifdef CONFIG_MEMCG
  
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
        /* Take pin on all head pages to avoid freeing them under us */
-       list_for_each_safe(pos, next, &ds_queue->split_queue) {
-               page = list_entry((void *)pos, struct page, deferred_list);
-               page = compound_head(page);
-               if (get_page_unless_zero(page)) {
-                       list_move(page_deferred_list(page), &list);
+       list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
+                                                       _deferred_list) {
+               if (folio_try_get(folio)) {
+                       list_move(&folio->_deferred_list, &list);
                } else {
-                       /* We lost race with put_compound_page() */
-                       list_del_init(page_deferred_list(page));
+                       /* We lost race with folio_put() */
+                       list_del_init(&folio->_deferred_list);
                        ds_queue->split_queue_len--;
                }
                if (!--sc->nr_to_scan)
        }
        spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
  
-       list_for_each_safe(pos, next, &list) {
-               page = list_entry((void *)pos, struct page, deferred_list);
-               if (!trylock_page(page))
+       list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+               if (!folio_trylock(folio))
                        goto next;
                /* split_huge_page() removes page from list on success */
-               if (!split_huge_page(page))
+               if (!split_folio(folio))
                        split++;
-               unlock_page(page);
+               folio_unlock(folio);
  next:
-               put_page(page);
+               folio_put(folio);
        }
  
        spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
@@@ -2934,6 -2927,7 +2927,7 @@@ static void split_huge_pages_all(void
  {
        struct zone *zone;
        struct page *page;
+       struct folio *folio;
        unsigned long pfn, max_zone_pfn;
        unsigned long total = 0, split = 0;
  
                        int nr_pages;
  
                        page = pfn_to_online_page(pfn);
-                       if (!page || !get_page_unless_zero(page))
+                       if (!page || PageTail(page))
+                               continue;
+                       folio = page_folio(page);
+                       if (!folio_try_get(folio))
                                continue;
  
-                       if (zone != page_zone(page))
+                       if (unlikely(page_folio(page) != folio))
                                goto next;
  
-                       if (!PageHead(page) || PageHuge(page) || !PageLRU(page))
+                       if (zone != folio_zone(folio))
+                               goto next;
+                       if (!folio_test_large(folio)
+                               || folio_test_hugetlb(folio)
+                               || !folio_test_lru(folio))
                                goto next;
  
                        total++;
-                       lock_page(page);
-                       nr_pages = thp_nr_pages(page);
-                       if (!split_huge_page(page))
+                       folio_lock(folio);
+                       nr_pages = folio_nr_pages(folio);
+                       if (!split_folio(folio))
                                split++;
                        pfn += nr_pages - 1;
-                       unlock_page(page);
+                       folio_unlock(folio);
  next:
-                       put_page(page);
+                       folio_put(folio);
                        cond_resched();
                }
        }
@@@ -3272,17 -3274,15 +3274,17 @@@ void remove_migration_pmd(struct page_v
        pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
        if (pmd_swp_soft_dirty(*pvmw->pmd))
                pmde = pmd_mksoft_dirty(pmde);
 -      if (is_writable_migration_entry(entry))
 -              pmde = maybe_pmd_mkwrite(pmde, vma);
        if (pmd_swp_uffd_wp(*pvmw->pmd))
-               pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
+               pmde = pmd_mkuffd_wp(pmde);
        if (!is_migration_entry_young(entry))
                pmde = pmd_mkold(pmde);
        /* NOTE: this may contain setting soft-dirty on some archs */
        if (PageDirty(new) && is_migration_entry_dirty(entry))
                pmde = pmd_mkdirty(pmde);
 +      if (is_writable_migration_entry(entry))
 +              pmde = maybe_pmd_mkwrite(pmde, vma);
 +      else
 +              pmde = pmd_wrprotect(pmde);
  
        if (PageAnon(new)) {
                rmap_t rmap_flags = RMAP_COMPOUND;
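
Most of the mm/huge_memory.c hunks above repeat one mechanical idiom: resolve the folio once via page_folio() or pfn_folio(), then do all reference counting, locking and flag tests on the folio rather than on a possibly-tail struct page. A minimal sketch of that idiom follows, assuming the caller may race with the folio being freed; example_hold_folio() is an invented name, not a function from this merge.

/*
 * Illustrative only: the page -> folio conversion pattern used throughout
 * this series.  Refcounting, locking and flag tests all go through the
 * folio (i.e. the head page).
 */
static bool example_hold_folio(struct page *page)
{
        struct folio *folio = page_folio(page);

        if (!folio_try_get(folio))              /* may race with freeing */
                return false;
        if (!folio_trylock(folio)) {
                folio_put(folio);
                return false;
        }
        /* caller must folio_unlock() and folio_put() when done */
        return true;
}
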
diff --combined mm/internal.h
index 6d4ca98f384495ceb844bf9e51dd9a5a28f9b40b,fc01fd092ea584d1954a680499a463e24b97a2de..7920a8b7982ec3b9753f520217d16bcc0f8270e2
@@@ -24,7 -24,7 +24,7 @@@ struct folio_batch
  #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
                        __GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
                        __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
-                       __GFP_ATOMIC|__GFP_NOLOCKDEP)
+                       __GFP_NOLOCKDEP)
  
  /* The GFP flags allowed during early boot */
  #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
  
  void page_writeback_init(void);
  
+ /*
+  * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
+  * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit
+  * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE).  Hugetlb currently
+  * leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
+  */
+ #define COMPOUND_MAPPED               0x800000
+ #define FOLIO_PAGES_MAPPED    (COMPOUND_MAPPED - 1)
+ /*
+  * How many individual pages have an elevated _mapcount.  Excludes
+  * the folio's entire_mapcount.
+  */
+ static inline int folio_nr_pages_mapped(struct folio *folio)
+ {
+       return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
+ }
  static inline void *folio_raw_mapping(struct folio *folio)
  {
        unsigned long mapping = (unsigned long)folio->mapping;
@@@ -141,17 -159,6 +159,6 @@@ static inline bool folio_evictable(stru
        return ret;
  }
  
- static inline bool page_evictable(struct page *page)
- {
-       bool ret;
-       /* Prevent address_space of inode and swap cache from being freed */
-       rcu_read_lock();
-       ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
-       rcu_read_unlock();
-       return ret;
- }
  /*
   * Turn a non-refcounted page (->_refcount == 0) into refcounted with
   * a count of one.
@@@ -180,8 -187,8 +187,8 @@@ pgprot_t __init early_memremap_pgprot_a
  /*
   * in mm/vmscan.c:
   */
- int isolate_lru_page(struct page *page);
- int folio_isolate_lru(struct folio *folio);
+ bool isolate_lru_page(struct page *page);
+ bool folio_isolate_lru(struct folio *folio);
  void putback_lru_page(struct page *page);
  void folio_putback_lru(struct folio *folio);
  extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
@@@ -378,6 -385,25 +385,25 @@@ extern void *memmap_alloc(phys_addr_t s
  int split_free_page(struct page *free_page,
                        unsigned int order, unsigned long split_pfn_offset);
  
+ /*
+  * This will have no effect, other than possibly generating a warning, if the
+  * caller passes in a non-large folio.
+  */
+ static inline void folio_set_order(struct folio *folio, unsigned int order)
+ {
+       if (WARN_ON_ONCE(!folio_test_large(folio)))
+               return;
+       folio->_folio_order = order;
+ #ifdef CONFIG_64BIT
+       /*
+        * When hugetlb dissolves a folio, we need to clear the tail
+        * page, rather than setting nr_pages to 1.
+        */
+       folio->_folio_nr_pages = order ? 1U << order : 0;
+ #endif
+ }
  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
  
  /*
@@@ -422,7 -448,11 +448,11 @@@ struct compact_control 
        bool proactive_compaction;      /* kcompactd proactive compaction */
        bool whole_zone;                /* Whole zone should/has been scanned */
        bool contended;                 /* Signal lock contention */
-       bool rescan;                    /* Rescanning the same pageblock */
+       bool finish_pageblock;          /* Scan the remainder of a pageblock. Used
+                                        * when there are potentially transient
+                                        * isolation or migration failures to
+                                        * ensure forward progress.
+                                        */
        bool alloc_contig;              /* alloc_contig_range allocation */
  };
  
@@@ -492,14 -522,13 +522,13 @@@ extern long faultin_vma_page_range(stru
  extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
                              unsigned long len);
  /*
-  * mlock_vma_page() and munlock_vma_page():
+  * mlock_vma_folio() and munlock_vma_folio():
   * should be called with vma's mmap_lock held for read or write,
   * under page table lock for the pte/pmd being added or removed.
   *
-  * mlock is usually called at the end of page_add_*_rmap(),
-  * munlock at the end of page_remove_rmap(); but new anon
-  * pages are managed by lru_cache_add_inactive_or_unevictable()
-  * calling mlock_new_page().
+  * mlock is usually called at the end of page_add_*_rmap(), munlock at
+  * the end of page_remove_rmap(); but new anon folios are managed by
+  * folio_add_lru_vma() calling mlock_new_folio().
   *
   * @compound is used to include pmd mappings of THPs, but filter out
   * pte mappings of THPs, which cannot be consistently counted: a pte
@@@ -522,24 -551,19 +551,19 @@@ static inline void mlock_vma_folio(stru
                mlock_folio(folio);
  }
  
- static inline void mlock_vma_page(struct page *page,
-                       struct vm_area_struct *vma, bool compound)
- {
-       mlock_vma_folio(page_folio(page), vma, compound);
- }
- void munlock_page(struct page *page);
- static inline void munlock_vma_page(struct page *page,
+ void munlock_folio(struct folio *folio);
+ static inline void munlock_vma_folio(struct folio *folio,
                        struct vm_area_struct *vma, bool compound)
  {
        if (unlikely(vma->vm_flags & VM_LOCKED) &&
-           (compound || !PageTransCompound(page)))
-               munlock_page(page);
+           (compound || !folio_test_large(folio)))
+               munlock_folio(folio);
  }
- void mlock_new_page(struct page *page);
- bool need_mlock_page_drain(int cpu);
- void mlock_page_drain_local(void);
- void mlock_page_drain_remote(int cpu);
+ void mlock_new_folio(struct folio *folio);
+ bool need_mlock_drain(int cpu);
+ void mlock_drain_local(void);
+ void mlock_drain_remote(int cpu);
  
  extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
  
@@@ -624,14 -648,10 +648,10 @@@ static inline struct file *maybe_unlock
  }
  #else /* !CONFIG_MMU */
  static inline void unmap_mapping_folio(struct folio *folio) { }
- static inline void mlock_vma_page(struct page *page,
-                       struct vm_area_struct *vma, bool compound) { }
- static inline void munlock_vma_page(struct page *page,
-                       struct vm_area_struct *vma, bool compound) { }
- static inline void mlock_new_page(struct page *page) { }
- static inline bool need_mlock_page_drain(int cpu) { return false; }
- static inline void mlock_page_drain_local(void) { }
- static inline void mlock_page_drain_remote(int cpu) { }
+ static inline void mlock_new_folio(struct folio *folio) { }
+ static inline bool need_mlock_drain(int cpu) { return false; }
+ static inline void mlock_drain_local(void) { }
+ static inline void mlock_drain_remote(int cpu) { }
  static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
  {
  }
@@@ -735,8 -755,13 +755,13 @@@ unsigned int reclaim_clean_pages_from_l
  #define ALLOC_OOM             ALLOC_NO_WATERMARKS
  #endif
  
- #define ALLOC_HARDER           0x10 /* try to alloc harder */
- #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
+ #define ALLOC_NON_BLOCK                0x10 /* Caller cannot block. Allow access
+                                      * to 25% of the min watermark or
+                                      * 62.5% if __GFP_HIGH is set.
+                                      */
+ #define ALLOC_MIN_RESERVE      0x20 /* __GFP_HIGH set. Allow access to 50%
+                                      * of the min watermark.
+                                      */
  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
  #define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
  #ifdef CONFIG_ZONE_DMA32
  #else
  #define ALLOC_NOFRAGMENT        0x0
  #endif
+ #define ALLOC_HIGHATOMIC      0x200 /* Allows access to MIGRATE_HIGHATOMIC */
  #define ALLOC_KSWAPD          0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
  
+ /* Flags that allow allocations below the min watermark. */
+ #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
  enum ttu_flags;
  struct tlbflush_unmap_batch;
  
@@@ -794,12 -823,6 +823,12 @@@ struct migration_target_control 
        gfp_t gfp_mask;
  };
  
 +/*
 + * mm/filemap.c
 + */
 +size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 +                            struct folio *folio, loff_t fpos, size_t size);
 +
  /*
   * mm/vmalloc.c
   */
@@@ -833,6 -856,87 +862,87 @@@ int migrate_device_coherent_page(struc
   * mm/gup.c
   */
  struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
+ int __must_check try_grab_page(struct page *page, unsigned int flags);
+ enum {
+       /* mark page accessed */
+       FOLL_TOUCH = 1 << 16,
+       /* a retry, previous pass started an IO */
+       FOLL_TRIED = 1 << 17,
+       /* we are working on non-current tsk/mm */
+       FOLL_REMOTE = 1 << 18,
+       /* pages must be released via unpin_user_page */
+       FOLL_PIN = 1 << 19,
+       /* gup_fast: prevent fall-back to slow gup */
+       FOLL_FAST_ONLY = 1 << 20,
+       /* allow unlocking the mmap lock */
+       FOLL_UNLOCKABLE = 1 << 21,
+ };
+ /*
+  * Indicates for which pages that are write-protected in the page table,
+  * whether GUP has to trigger unsharing via FAULT_FLAG_UNSHARE such that the
+  * GUP pin will remain consistent with the pages mapped into the page tables
+  * of the MM.
+  *
+  * Temporary unmapping of PageAnonExclusive() pages or clearing of
+  * PageAnonExclusive() has to protect against concurrent GUP:
+  * * Ordinary GUP: Using the PT lock
+  * * GUP-fast and fork(): mm->write_protect_seq
+  * * GUP-fast and KSM or temporary unmapping (swap, migration): see
+  *    page_try_share_anon_rmap()
+  *
+  * Must be called with the (sub)page that's actually referenced via the
+  * page table entry, which might not necessarily be the head page for a
+  * PTE-mapped THP.
+  *
+  * If the vma is NULL, we're coming from the GUP-fast path and might have
+  * to fallback to the slow path just to lookup the vma.
+  */
+ static inline bool gup_must_unshare(struct vm_area_struct *vma,
+                                   unsigned int flags, struct page *page)
+ {
+       /*
+        * FOLL_WRITE is implicitly handled correctly as the page table entry
+        * has to be writable -- and if it references (part of) an anonymous
+        * folio, that part is required to be marked exclusive.
+        */
+       if ((flags & (FOLL_WRITE | FOLL_PIN)) != FOLL_PIN)
+               return false;
+       /*
+        * Note: PageAnon(page) is stable until the page is actually getting
+        * freed.
+        */
+       if (!PageAnon(page)) {
+               /*
+                * We only care about R/O long-term pinning: R/O short-term
+                * pinning does not have the semantics to observe successive
+                * changes through the process page tables.
+                */
+               if (!(flags & FOLL_LONGTERM))
+                       return false;
+               /* We really need the vma ... */
+               if (!vma)
+                       return true;
+               /*
+                * ... because we only care about writable private ("COW")
+                * mappings where we have to break COW early.
+                */
+               return is_cow_mapping(vma->vm_flags);
+       }
+       /* Paired with a memory barrier in page_try_share_anon_rmap(). */
+       if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+               smp_rmb();
+       /*
+        * Note that PageKsm() pages cannot be exclusive, and consequently,
+        * cannot get pinned.
+        */
+       return !PageAnonExclusive(page);
+ }
  
  extern bool mirrored_kernelcore;
  
@@@ -854,4 -958,82 +964,82 @@@ static inline bool vma_soft_dirty_enabl
        return !(vma->vm_flags & VM_SOFTDIRTY);
  }
  
+ /*
+  * VMA Iterator functions shared between nommu and mmap
+  */
+ static inline int vma_iter_prealloc(struct vma_iterator *vmi)
+ {
+       return mas_preallocate(&vmi->mas, GFP_KERNEL);
+ }
+ static inline void vma_iter_clear(struct vma_iterator *vmi,
+                                 unsigned long start, unsigned long end)
+ {
+       mas_set_range(&vmi->mas, start, end - 1);
+       mas_store_prealloc(&vmi->mas, NULL);
+ }
+ static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi)
+ {
+       return mas_walk(&vmi->mas);
+ }
+ /* Store a VMA with preallocated memory */
+ static inline void vma_iter_store(struct vma_iterator *vmi,
+                                 struct vm_area_struct *vma)
+ {
+ #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
+       if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.index > vma->vm_start)) {
+               printk("%lu > %lu\n", vmi->mas.index, vma->vm_start);
+               printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+               printk("into slot    %lu-%lu", vmi->mas.index, vmi->mas.last);
+               mt_dump(vmi->mas.tree);
+       }
+       if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.last <  vma->vm_start)) {
+               printk("%lu < %lu\n", vmi->mas.last, vma->vm_start);
+               printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+               printk("into slot    %lu-%lu", vmi->mas.index, vmi->mas.last);
+               mt_dump(vmi->mas.tree);
+       }
+ #endif
+       if (vmi->mas.node != MAS_START &&
+           ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+               vma_iter_invalidate(vmi);
+       vmi->mas.index = vma->vm_start;
+       vmi->mas.last = vma->vm_end - 1;
+       mas_store_prealloc(&vmi->mas, vma);
+ }
+ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
+                       struct vm_area_struct *vma, gfp_t gfp)
+ {
+       if (vmi->mas.node != MAS_START &&
+           ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+               vma_iter_invalidate(vmi);
+       vmi->mas.index = vma->vm_start;
+       vmi->mas.last = vma->vm_end - 1;
+       mas_store_gfp(&vmi->mas, vma, gfp);
+       if (unlikely(mas_is_err(&vmi->mas)))
+               return -ENOMEM;
+       return 0;
+ }
+ /*
+  * VMA lock generalization
+  */
+ struct vma_prepare {
+       struct vm_area_struct *vma;
+       struct vm_area_struct *adj_next;
+       struct file *file;
+       struct address_space *mapping;
+       struct anon_vma *anon_vma;
+       struct vm_area_struct *insert;
+       struct vm_area_struct *remove;
+       struct vm_area_struct *remove2;
+ };
  #endif        /* __MM_INTERNAL_H */
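
The VMA iterator helpers added to mm/internal.h above separate maple tree node preallocation from the store itself, so the store cannot fail at a point where unwinding would be awkward. A hedged sketch of the intended calling pattern, assuming mmap_lock is held for write; example_insert_vma() is hypothetical, and these helpers are mm-internal.

/*
 * Hypothetical caller of the helpers above: preallocate nodes first, then
 * perform the store that consumes the preallocation.
 */
static int example_insert_vma(struct mm_struct *mm, struct vm_area_struct *vma)
{
        VMA_ITERATOR(vmi, mm, vma->vm_start);

        if (vma_iter_prealloc(&vmi))
                return -ENOMEM;
        vma_iter_store(&vmi, vma);      /* consumes the preallocation */
        return 0;
}
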
diff --combined mm/kasan/kasan.h
index 71c15438afcfc004876a8e47fc1becfdabca6c08,8fae87ab99cc53e02f827f00b3a9e8bddbe3d9cf..9377b0789edc2c9390872f7eaf0c83502ab80cf5
@@@ -42,6 -42,10 +42,10 @@@ enum kasan_mode 
  
  extern enum kasan_mode kasan_mode __ro_after_init;
  
+ extern unsigned long kasan_page_alloc_sample;
+ extern unsigned int kasan_page_alloc_sample_order;
+ DECLARE_PER_CPU(long, kasan_page_alloc_skip);
  static inline bool kasan_vmalloc_enabled(void)
  {
        return static_branch_likely(&kasan_flag_vmalloc);
@@@ -57,6 -61,24 +61,24 @@@ static inline bool kasan_sync_fault_pos
        return kasan_mode == KASAN_MODE_SYNC || kasan_mode == KASAN_MODE_ASYMM;
  }
  
+ static inline bool kasan_sample_page_alloc(unsigned int order)
+ {
+       /* Fast-path for when sampling is disabled. */
+       if (kasan_page_alloc_sample == 1)
+               return true;
+       if (order < kasan_page_alloc_sample_order)
+               return true;
+       if (this_cpu_dec_return(kasan_page_alloc_skip) < 0) {
+               this_cpu_write(kasan_page_alloc_skip,
+                              kasan_page_alloc_sample - 1);
+               return true;
+       }
+       return false;
+ }
  #else /* CONFIG_KASAN_HW_TAGS */
  
  static inline bool kasan_async_fault_possible(void)
@@@ -69,6 -91,11 +91,11 @@@ static inline bool kasan_sync_fault_pos
        return true;
  }
  
+ static inline bool kasan_sample_page_alloc(unsigned int order)
+ {
+       return true;
+ }
  #endif /* CONFIG_KASAN_HW_TAGS */
  
  #ifdef CONFIG_KASAN_GENERIC
@@@ -180,6 -207,7 +207,7 @@@ struct kasan_report_info 
        void *first_bad_addr;
        struct kmem_cache *cache;
        void *object;
+       size_t alloc_size;
  
        /* Filled in by the mode-specific reporting code. */
        const char *bug_type;
@@@ -269,7 -297,7 +297,7 @@@ static inline const void *kasan_shadow_
                << KASAN_SHADOW_SCALE_SHIFT);
  }
  
- static inline bool addr_has_metadata(const void *addr)
+ static __always_inline bool addr_has_metadata(const void *addr)
  {
        return (kasan_reset_tag(addr) >=
                kasan_shadow_to_mem((void *)KASAN_SHADOW_START));
@@@ -288,7 -316,7 +316,7 @@@ bool kasan_check_range(unsigned long ad
  
  #else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
  
- static inline bool addr_has_metadata(const void *addr)
+ static __always_inline bool addr_has_metadata(const void *addr)
  {
        return (is_vmalloc_addr(addr) || virt_addr_valid(addr));
  }
  #endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
  
  void *kasan_find_first_bad_addr(void *addr, size_t size);
+ size_t kasan_get_alloc_size(void *object, struct kmem_cache *cache);
  void kasan_complete_mode_report_info(struct kasan_report_info *info);
  void kasan_metadata_fetch_row(char *buffer, void *row);
  
@@@ -618,10 -647,6 +647,10 @@@ void __asan_set_shadow_f3(const void *a
  void __asan_set_shadow_f5(const void *addr, size_t size);
  void __asan_set_shadow_f8(const void *addr, size_t size);
  
 +void *__asan_memset(void *addr, int c, size_t len);
 +void *__asan_memmove(void *dest, const void *src, size_t len);
 +void *__asan_memcpy(void *dest, const void *src, size_t len);
 +
  void __hwasan_load1_noabort(unsigned long addr);
  void __hwasan_store1_noabort(unsigned long addr);
  void __hwasan_load2_noabort(unsigned long addr);
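
kasan_sample_page_alloc() above implements the new page_alloc sampling: orders below kasan_page_alloc_sample_order are always tagged, while larger allocations are tagged roughly once per kasan_page_alloc_sample calls on each CPU. A worked trace of the per-CPU skip counter, assuming a sample interval N greater than 1 and a counter starting at 0 (illustrative values only, not taken from the patch):

/*
 * Illustrative trace only (sample interval N > 1, per-CPU skip starts at 0):
 *
 *   call 1:   dec(0)   -> -1  < 0: reset skip to N-1, return true  (sampled)
 *   call 2:   dec(N-1) -> N-2 >= 0:                   return false (skipped)
 *   ...
 *   call N:   dec(1)   ->  0  >= 0:                   return false (skipped)
 *   call N+1: dec(0)   -> -1  < 0: reset skip to N-1, return true  (sampled)
 *
 * i.e. one tagged allocation out of every N large-order ones per CPU.
 */
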
diff --combined mm/khugepaged.c
index a26a28e3738c1075b5c0856b2989f3c3e92722b0,987281ead49ef6a53209cee6b37c37eb24c3e769..92e6f56a932da3b5ca401f62a5e956f1a7d230db
@@@ -490,32 -490,43 +490,43 @@@ void __khugepaged_exit(struct mm_struc
        }
  }
  
+ static void release_pte_folio(struct folio *folio)
+ {
+       node_stat_mod_folio(folio,
+                       NR_ISOLATED_ANON + folio_is_file_lru(folio),
+                       -folio_nr_pages(folio));
+       folio_unlock(folio);
+       folio_putback_lru(folio);
+ }
  static void release_pte_page(struct page *page)
  {
-       mod_node_page_state(page_pgdat(page),
-                       NR_ISOLATED_ANON + page_is_file_lru(page),
-                       -compound_nr(page));
-       unlock_page(page);
-       putback_lru_page(page);
+       release_pte_folio(page_folio(page));
  }
  
  static void release_pte_pages(pte_t *pte, pte_t *_pte,
                struct list_head *compound_pagelist)
  {
-       struct page *page, *tmp;
+       struct folio *folio, *tmp;
  
        while (--_pte >= pte) {
                pte_t pteval = *_pte;
+               unsigned long pfn;
  
-               page = pte_page(pteval);
-               if (!pte_none(pteval) && !is_zero_pfn(pte_pfn(pteval)) &&
-                               !PageCompound(page))
-                       release_pte_page(page);
+               if (pte_none(pteval))
+                       continue;
+               pfn = pte_pfn(pteval);
+               if (is_zero_pfn(pfn))
+                       continue;
+               folio = pfn_folio(pfn);
+               if (folio_test_large(folio))
+                       continue;
+               release_pte_folio(folio);
        }
  
-       list_for_each_entry_safe(page, tmp, compound_pagelist, lru) {
-               list_del(&page->lru);
-               release_pte_page(page);
+       list_for_each_entry_safe(folio, tmp, compound_pagelist, lru) {
+               list_del(&folio->lru);
+               release_pte_folio(folio);
        }
  }
  
@@@ -625,7 -636,7 +636,7 @@@ static int __collapse_huge_page_isolate
                 * Isolate the page to avoid collapsing an hugepage
                 * currently in use by the VM.
                 */
-               if (isolate_lru_page(page)) {
+               if (!isolate_lru_page(page)) {
                        unlock_page(page);
                        result = SCAN_DEL_PAGE_LRU;
                        goto out;
@@@ -1040,8 -1051,8 +1051,8 @@@ static int collapse_huge_page(struct mm
  
        anon_vma_lock_write(vma->anon_vma);
  
-       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
-                               address, address + HPAGE_PMD_SIZE);
+       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
+                               address + HPAGE_PMD_SIZE);
        mmu_notifier_invalidate_range_start(&range);
  
        pte = pte_offset_map(pmd, address);
@@@ -1412,7 -1423,7 +1423,7 @@@ static void collapse_and_free_pmd(struc
        if (vma->anon_vma)
                lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
  
-       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, addr,
+       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
                                addr + HPAGE_PMD_SIZE);
        mmu_notifier_invalidate_range_start(&range);
        pmd = pmdp_collapse_flush(vma, addr, pmdp);
@@@ -1939,7 -1950,7 +1950,7 @@@ static int collapse_file(struct mm_stru
                        goto out_unlock;
                }
  
-               if (folio_isolate_lru(folio)) {
+               if (!folio_isolate_lru(folio)) {
                        result = SCAN_DEL_PAGE_LRU;
                        goto out_unlock;
                }
@@@ -2611,7 -2622,6 +2622,7 @@@ static int madvise_collapse_errno(enum 
        case SCAN_CGROUP_CHARGE_FAIL:
                return -EBUSY;
        /* Resource temporarily unavailable - trying again might succeed */
 +      case SCAN_PAGE_COUNT:
        case SCAN_PAGE_LOCK:
        case SCAN_PAGE_LRU:
        case SCAN_DEL_PAGE_LRU:
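
The khugepaged.c changes above flip the isolate checks to test for failure with '!' because folio_isolate_lru() and isolate_lru_page() now return bool, true on success, as declared in mm/internal.h. A small hedged sketch of the new convention; example_steal_from_lru() is an invented helper and assumes the caller already holds a folio reference.

/*
 * Illustrative only: folio_isolate_lru() returns true when the folio was
 * taken off its LRU list; failure is the '!' branch.
 */
static bool example_steal_from_lru(struct folio *folio, struct list_head *list)
{
        if (!folio_isolate_lru(folio))
                return false;           /* not on an LRU, or someone else won */
        list_add(&folio->lru, list);    /* caller now owns the LRU linkage */
        return true;
}
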
diff --combined mm/madvise.c
index 18c2e2affac4398d92aa40285d39c16bb8143281,c2202f51e9ddf6aa41b881f58a676e61b25613ab..340125d08c03d9638cc5a47a19f6179039ef2b6c
@@@ -142,6 -142,7 +142,7 @@@ static int madvise_update_vma(struct vm
        struct mm_struct *mm = vma->vm_mm;
        int error;
        pgoff_t pgoff;
+       VMA_ITERATOR(vmi, mm, start);
  
        if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
                *prev = vma;
        }
  
        pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-       *prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
-                         vma->vm_file, pgoff, vma_policy(vma),
+       *prev = vma_merge(&vmi, mm, *prev, start, end, new_flags,
+                         vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
                          vma->vm_userfaultfd_ctx, anon_name);
        if (*prev) {
                vma = *prev;
        *prev = vma;
  
        if (start != vma->vm_start) {
-               if (unlikely(mm->map_count >= sysctl_max_map_count))
-                       return -ENOMEM;
-               error = __split_vma(mm, vma, start, 1);
+               error = split_vma(&vmi, vma, start, 1);
                if (error)
                        return error;
        }
  
        if (end != vma->vm_end) {
-               if (unlikely(mm->map_count >= sysctl_max_map_count))
-                       return -ENOMEM;
-               error = __split_vma(mm, vma, end, 0);
+               error = split_vma(&vmi, vma, end, 0);
                if (error)
                        return error;
        }
@@@ -179,7 -176,7 +176,7 @@@ success
        /*
         * vm_flags is protected by the mmap_lock held in write mode.
         */
-       vma->vm_flags = new_flags;
+       vm_flags_reset(vma, new_flags);
        if (!vma->vm_file || vma_is_anon_shmem(vma)) {
                error = replace_anon_vma_name(vma, anon_name);
                if (error)
@@@ -329,7 -326,7 +326,7 @@@ static inline bool can_do_file_pageout(
         * otherwise we'd be including shared non-exclusive mappings, which
         * opens a side channel.
         */
 -      return inode_owner_or_capable(&init_user_ns,
 +      return inode_owner_or_capable(&nop_mnt_idmap,
                                      file_inode(vma->vm_file)) ||
               file_permission(vma->vm_file, MAY_WRITE) == 0;
  }
@@@ -345,8 -342,8 +342,8 @@@ static int madvise_cold_or_pageout_pte_
        struct vm_area_struct *vma = walk->vma;
        pte_t *orig_pte, *pte, ptent;
        spinlock_t *ptl;
-       struct page *page = NULL;
-       LIST_HEAD(page_list);
+       struct folio *folio = NULL;
+       LIST_HEAD(folio_list);
        bool pageout_anon_only_filter;
  
        if (fatal_signal_pending(current))
                        goto huge_unlock;
                }
  
-               page = pmd_page(orig_pmd);
+               folio = pfn_folio(pmd_pfn(orig_pmd));
  
-               /* Do not interfere with other mappings of this page */
-               if (page_mapcount(page) != 1)
+               /* Do not interfere with other mappings of this folio */
+               if (folio_mapcount(folio) != 1)
                        goto huge_unlock;
  
-               if (pageout_anon_only_filter && !PageAnon(page))
+               if (pageout_anon_only_filter && !folio_test_anon(folio))
                        goto huge_unlock;
  
                if (next - addr != HPAGE_PMD_SIZE) {
                        int err;
  
-                       get_page(page);
+                       folio_get(folio);
                        spin_unlock(ptl);
-                       lock_page(page);
-                       err = split_huge_page(page);
-                       unlock_page(page);
-                       put_page(page);
+                       folio_lock(folio);
+                       err = split_folio(folio);
+                       folio_unlock(folio);
+                       folio_put(folio);
                        if (!err)
-                               goto regular_page;
+                               goto regular_folio;
                        return 0;
                }
  
                        tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
                }
  
-               ClearPageReferenced(page);
-               test_and_clear_page_young(page);
+               folio_clear_referenced(folio);
+               folio_test_clear_young(folio);
                if (pageout) {
-                       if (!isolate_lru_page(page)) {
-                               if (PageUnevictable(page))
-                                       putback_lru_page(page);
+                       if (folio_isolate_lru(folio)) {
+                               if (folio_test_unevictable(folio))
+                                       folio_putback_lru(folio);
                                else
-                                       list_add(&page->lru, &page_list);
+                                       list_add(&folio->lru, &folio_list);
                        }
                } else
-                       deactivate_page(page);
+                       folio_deactivate(folio);
  huge_unlock:
                spin_unlock(ptl);
                if (pageout)
-                       reclaim_pages(&page_list);
+                       reclaim_pages(&folio_list);
                return 0;
        }
  
- regular_page:
+ regular_folio:
        if (pmd_trans_unstable(pmd))
                return 0;
  #endif
                if (!pte_present(ptent))
                        continue;
  
-               page = vm_normal_page(vma, addr, ptent);
-               if (!page || is_zone_device_page(page))
+               folio = vm_normal_folio(vma, addr, ptent);
+               if (!folio || folio_is_zone_device(folio))
                        continue;
  
                /*
                 * Creating a THP page is expensive so split it only if we
                 * are sure it's worth. Split it if we are only owner.
                 */
-               if (PageTransCompound(page)) {
-                       if (page_mapcount(page) != 1)
+               if (folio_test_large(folio)) {
+                       if (folio_mapcount(folio) != 1)
                                break;
-                       if (pageout_anon_only_filter && !PageAnon(page))
+                       if (pageout_anon_only_filter && !folio_test_anon(folio))
                                break;
-                       get_page(page);
-                       if (!trylock_page(page)) {
-                               put_page(page);
+                       folio_get(folio);
+                       if (!folio_trylock(folio)) {
+                               folio_put(folio);
                                break;
                        }
                        pte_unmap_unlock(orig_pte, ptl);
-                       if (split_huge_page(page)) {
-                               unlock_page(page);
-                               put_page(page);
+                       if (split_folio(folio)) {
+                               folio_unlock(folio);
+                               folio_put(folio);
                                orig_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
                                break;
                        }
-                       unlock_page(page);
-                       put_page(page);
+                       folio_unlock(folio);
+                       folio_put(folio);
                        orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
                        pte--;
                        addr -= PAGE_SIZE;
                }
  
                /*
-                * Do not interfere with other mappings of this page and
-                * non-LRU page.
+                * Do not interfere with other mappings of this folio and
+                * non-LRU folio.
                 */
-               if (!PageLRU(page) || page_mapcount(page) != 1)
+               if (!folio_test_lru(folio) || folio_mapcount(folio) != 1)
                        continue;
  
-               if (pageout_anon_only_filter && !PageAnon(page))
+               if (pageout_anon_only_filter && !folio_test_anon(folio))
                        continue;
  
-               VM_BUG_ON_PAGE(PageTransCompound(page), page);
+               VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
  
                if (pte_young(ptent)) {
                        ptent = ptep_get_and_clear_full(mm, addr, pte,
                }
  
                /*
-                * We are deactivating a page for accelerating reclaiming.
-                * VM couldn't reclaim the page unless we clear PG_young.
+                * We are deactivating a folio for accelerating reclaiming.
+                * VM couldn't reclaim the folio unless we clear PG_young.
                 * As a side effect, it confuses idle-page tracking,
                 * which will miss the recent reference history.
                 */
-               ClearPageReferenced(page);
-               test_and_clear_page_young(page);
+               folio_clear_referenced(folio);
+               folio_test_clear_young(folio);
                if (pageout) {
-                       if (!isolate_lru_page(page)) {
-                               if (PageUnevictable(page))
-                                       putback_lru_page(page);
+                       if (folio_isolate_lru(folio)) {
+                               if (folio_test_unevictable(folio))
+                                       folio_putback_lru(folio);
                                else
-                                       list_add(&page->lru, &page_list);
+                                       list_add(&folio->lru, &folio_list);
                        }
                } else
-                       deactivate_page(page);
+                       folio_deactivate(folio);
        }
  
        arch_leave_lazy_mmu_mode();
        pte_unmap_unlock(orig_pte, ptl);
        if (pageout)
-               reclaim_pages(&page_list);
+               reclaim_pages(&folio_list);
        cond_resched();
  
        return 0;
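
The hunk above converts the MADV_COLD/MADV_PAGEOUT walker to folios without changing the user-visible madvise() semantics. As a reference point, a minimal user-space exercise of MADV_PAGEOUT, which drives this path (a sketch; the fallback #define carries the UAPI value for older libc headers, and the mapping size is arbitrary):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21			/* UAPI value, for older libc headers */
#endif

int main(void)
{
	size_t len = 64UL << 20;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0xaa, len);			/* fault the pages in */

	/* Ask the kernel to reclaim this range right away. */
	if (madvise(buf, len, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");

	munmap(buf, len);
	return 0;
}
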
@@@ -617,7 -614,6 +614,6 @@@ static int madvise_free_pte_range(pmd_
        spinlock_t *ptl;
        pte_t *orig_pte, *pte, ptent;
        struct folio *folio;
-       struct page *page;
        int nr_swap = 0;
        unsigned long next;
  
                        continue;
                }
  
-               page = vm_normal_page(vma, addr, ptent);
-               if (!page || is_zone_device_page(page))
+               folio = vm_normal_folio(vma, addr, ptent);
+               if (!folio || folio_is_zone_device(folio))
                        continue;
-               folio = page_folio(page);
  
                /*
                 * If pmd isn't transhuge but the folio is large and
                        set_pte_at(mm, addr, pte, ptent);
                        tlb_remove_tlb_entry(tlb, pte, addr);
                }
-               mark_page_lazyfree(&folio->page);
+               folio_mark_lazyfree(folio);
        }
  out:
        if (nr_swap) {
@@@ -765,7 -760,7 +760,7 @@@ static int madvise_free_single_vma(stru
        range.end = min(vma->vm_end, end_addr);
        if (range.end <= vma->vm_start)
                return -EINVAL;
-       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
                                range.start, range.end);
  
        lru_add_drain();
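
madvise_free_pte_range() and madvise_free_single_vma() above implement MADV_FREE, now folio-based and with the mmu_notifier range no longer carrying the vma. The user-space contract is unchanged; a small sketch of lazy freeing (the fallback #define is the UAPI value, and writing to the range again cancels the hint):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8			/* UAPI value, for older libc headers */
#endif

int main(void)
{
	size_t len = 16UL << 20;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0x55, len);

	/*
	 * Mark the range lazily freeable: under memory pressure the kernel
	 * may drop these clean anonymous pages instead of swapping them.
	 */
	if (madvise(buf, len, MADV_FREE))
		perror("madvise(MADV_FREE)");

	buf[0] = 1;	/* touching the range again makes it live data once more */

	munmap(buf, len);
	return 0;
}
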
diff --combined mm/memcontrol.c
index 49f40730e7117da7747940942016b97c7ff1e101,25f2465d5a37b55d9588e91f3f70e41785261ecd..5abffe6f8389e27a705068e028dee875c91efa91
@@@ -88,9 -88,6 +88,9 @@@ static bool cgroup_memory_nosocket __ro
  /* Kernel memory accounting disabled? */
  static bool cgroup_memory_nokmem __ro_after_init;
  
 +/* BPF memory accounting disabled? */
 +static bool cgroup_memory_nobpf __ro_after_init;
 +
  #ifdef CONFIG_CGROUP_WRITEBACK
  static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
  #endif
@@@ -348,29 -345,24 +348,27 @@@ static void memcg_reparent_objcgs(struc
   * conditional to this static branch, we'll have to allow modules that do
   * kmem_cache_alloc and the like to see this symbol as well
   */
- DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
- EXPORT_SYMBOL(memcg_kmem_enabled_key);
+ DEFINE_STATIC_KEY_FALSE(memcg_kmem_online_key);
+ EXPORT_SYMBOL(memcg_kmem_online_key);
 +
 +DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
 +EXPORT_SYMBOL(memcg_bpf_enabled_key);
  #endif
  
  /**
-  * mem_cgroup_css_from_page - css of the memcg associated with a page
-  * @page: page of interest
+  * mem_cgroup_css_from_folio - css of the memcg associated with a folio
+  * @folio: folio of interest
   *
   * If memcg is bound to the default hierarchy, css of the memcg associated
-  * with @page is returned.  The returned css remains associated with @page
+  * with @folio is returned.  The returned css remains associated with @folio
   * until it is released.
   *
   * If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup
   * is returned.
   */
- struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page)
+ struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio)
  {
-       struct mem_cgroup *memcg;
-       memcg = page_memcg(page);
+       struct mem_cgroup *memcg = folio_memcg(folio);
  
        if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
                memcg = root_mem_cgroup;
@@@ -483,6 -475,12 +481,12 @@@ static void mem_cgroup_update_tree(stru
        struct mem_cgroup_per_node *mz;
        struct mem_cgroup_tree_per_node *mctz;
  
+       if (lru_gen_enabled()) {
+               if (soft_limit_excess(memcg))
+                       lru_gen_soft_reclaim(&memcg->nodeinfo[nid]->lruvec);
+               return;
+       }
        mctz = soft_limit_tree.rb_tree_per_node[nid];
        if (!mctz)
                return;
@@@ -2945,13 -2943,13 +2949,13 @@@ struct mem_cgroup *mem_cgroup_from_obj_
        }
  
        /*
-        * page_memcg_check() is used here, because in theory we can encounter
+        * folio_memcg_check() is used here, because in theory we can encounter
         * a folio where the slab flag has been cleared already, but
         * slab->memcg_data has not been freed yet
-        * page_memcg_check(page) will guarantee that a proper memory
+        * folio_memcg_check() will guarantee that a proper memory
         * cgroup pointer or NULL will be returned.
         */
-       return page_memcg_check(folio_page(folio, 0));
+       return folio_memcg_check(folio);
  }
  
  /*
@@@ -3036,7 -3034,7 +3040,7 @@@ struct obj_cgroup *get_obj_cgroup_from_
  {
        struct obj_cgroup *objcg;
  
-       if (!memcg_kmem_enabled())
+       if (!memcg_kmem_online())
                return NULL;
  
        if (PageMemcgKmem(page)) {
@@@ -3532,6 -3530,9 +3536,9 @@@ unsigned long mem_cgroup_soft_limit_rec
        struct mem_cgroup_tree_per_node *mctz;
        unsigned long excess;
  
+       if (lru_gen_enabled())
+               return 0;
        if (order > 0)
                return 0;
  
@@@ -3745,7 -3746,7 +3752,7 @@@ static int memcg_online_kmem(struct mem
        objcg->memcg = memcg;
        rcu_assign_pointer(memcg->objcg, objcg);
  
-       static_branch_enable(&memcg_kmem_enabled_key);
+       static_branch_enable(&memcg_kmem_online_key);
  
        memcg->kmemcg_id = memcg->id.id;
  
@@@ -3920,6 -3921,10 +3927,10 @@@ static int mem_cgroup_move_charge_write
  {
        struct mem_cgroup *memcg = mem_cgroup_from_css(css);
  
+       pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
+                    "Please report your usecase to [email protected] if you "
+                    "depend on this functionality.\n");
        if (val & ~MOVE_MASK)
                return -EINVAL;
  
@@@ -5363,11 -5368,6 +5374,11 @@@ mem_cgroup_css_alloc(struct cgroup_subs
        if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
                static_branch_inc(&memcg_sockets_enabled_key);
  
 +#if defined(CONFIG_MEMCG_KMEM)
 +      if (!cgroup_memory_nobpf)
 +              static_branch_inc(&memcg_bpf_enabled_key);
 +#endif
 +
        return &memcg->css;
  }
  
@@@ -5393,6 -5393,7 +5404,7 @@@ static int mem_cgroup_css_online(struc
        if (unlikely(mem_cgroup_is_root(memcg)))
                queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
                                   2UL*HZ);
+       lru_gen_online_memcg(memcg);
        return 0;
  offline_kmem:
        memcg_offline_kmem(memcg);
@@@ -5424,6 -5425,7 +5436,7 @@@ static void mem_cgroup_css_offline(stru
        memcg_offline_kmem(memcg);
        reparent_shrinker_deferred(memcg);
        wb_memcg_offline(memcg);
+       lru_gen_offline_memcg(memcg);
  
        drain_all_stock(memcg);
  
@@@ -5435,6 -5437,7 +5448,7 @@@ static void mem_cgroup_css_released(str
        struct mem_cgroup *memcg = mem_cgroup_from_css(css);
  
        invalidate_reclaim_iterators(memcg);
+       lru_gen_release_memcg(memcg);
  }
  
  static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
        if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_active)
                static_branch_dec(&memcg_sockets_enabled_key);
  
 +#if defined(CONFIG_MEMCG_KMEM)
 +      if (!cgroup_memory_nobpf)
 +              static_branch_dec(&memcg_bpf_enabled_key);
 +#endif
 +
        vmpressure_cleanup(&memcg->vmpressure);
        cancel_work_sync(&memcg->high_work);
        mem_cgroup_remove_from_trees(memcg);
@@@ -5703,7 -5701,7 +5717,7 @@@ static struct page *mc_handle_file_pte(
   * @from: mem_cgroup which the page is moved from.
   * @to:       mem_cgroup which the page is moved to. @from != @to.
   *
-  * The caller must make sure the page is not on LRU (isolate_page() is useful.)
+  * The page must be locked and not on the LRU.
   *
   * This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
   * from old cgroup.
@@@ -5720,20 -5718,13 +5734,13 @@@ static int mem_cgroup_move_account(stru
        int nid, ret;
  
        VM_BUG_ON(from == to);
+       VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
        VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
        VM_BUG_ON(compound && !folio_test_large(folio));
  
-       /*
-        * Prevent mem_cgroup_migrate() from looking at
-        * page's memory cgroup of its source page while we change it.
-        */
-       ret = -EBUSY;
-       if (!folio_trylock(folio))
-               goto out;
        ret = -EINVAL;
        if (folio_memcg(folio) != from)
-               goto out_unlock;
+               goto out;
  
        pgdat = folio_pgdat(folio);
        from_vec = mem_cgroup_lruvec(from, pgdat);
        mem_cgroup_charge_statistics(from, -nr_pages);
        memcg_check_events(from, nid);
        local_irq_enable();
- out_unlock:
-       folio_unlock(folio);
  out:
        return ret;
  }
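
mem_cgroup_move_account() now expects the caller (get_mctgt_type(), below) to hand it a locked folio instead of taking the lock itself. The whole path is only reachable when cgroup v1 charge moving is enabled through memory.move_charge_at_immigrate, which the pr_warn_once() added earlier now flags as deprecated. A hedged user-space sketch of toggling that knob; the cgroup mount point and group name are assumptions about the local v1 hierarchy:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical path: assumes a legacy (v1) memory controller mount. */
	const char *knob =
		"/sys/fs/cgroup/memory/example/memory.move_charge_at_immigrate";
	int fd = open(knob, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* 3 == MOVE_ANON | MOVE_FILE: move both kinds of charges on attach. */
	if (write(fd, "3", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}
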
@@@ -5870,6 -5859,29 +5875,29 @@@ static enum mc_target_type get_mctgt_ty
        else if (is_swap_pte(ptent))
                page = mc_handle_swap_pte(vma, ptent, &ent);
  
+       if (target && page) {
+               if (!trylock_page(page)) {
+                       put_page(page);
+                       return ret;
+               }
+               /*
+                * page_mapped() must be stable during the move. This
+                * pte is locked, so if it's present, the page cannot
+                * become unmapped. If it isn't, we have only partial
+                * control over the mapped state: the page lock will
+                * prevent new faults against pagecache and swapcache,
+                * so an unmapped page cannot become mapped. However,
+                * if the page is already mapped elsewhere, it can
+                * unmap, and there is nothing we can do about it.
+                * Alas, skip moving the page in this case.
+                */
+               if (!pte_present(ptent) && page_mapped(page)) {
+                       unlock_page(page);
+                       put_page(page);
+                       return ret;
+               }
+       }
        if (!page && !ent.val)
                return ret;
        if (page) {
                        if (target)
                                target->page = page;
                }
-               if (!ret || !target)
+               if (!ret || !target) {
+                       if (target)
+                               unlock_page(page);
                        put_page(page);
+               }
        }
        /*
         * There is a swap entry and a page doesn't exist or isn't charged.
@@@ -5927,6 -5942,10 +5958,10 @@@ static enum mc_target_type get_mctgt_ty
                ret = MC_TARGET_PAGE;
                if (target) {
                        get_page(page);
+                       if (!trylock_page(page)) {
+                               put_page(page);
+                               return MC_TARGET_NONE;
+                       }
                        target->page = page;
                }
        }
@@@ -6157,7 -6176,7 +6192,7 @@@ static int mem_cgroup_move_charge_pte_r
                target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
                if (target_type == MC_TARGET_PAGE) {
                        page = target.page;
-                       if (!isolate_lru_page(page)) {
+                       if (isolate_lru_page(page)) {
                                if (!mem_cgroup_move_account(page, true,
                                                             mc.from, mc.to)) {
                                        mc.precharge -= HPAGE_PMD_NR;
                                }
                                putback_lru_page(page);
                        }
+                       unlock_page(page);
                        put_page(page);
                } else if (target_type == MC_TARGET_DEVICE) {
                        page = target.page;
                                mc.precharge -= HPAGE_PMD_NR;
                                mc.moved_charge += HPAGE_PMD_NR;
                        }
+                       unlock_page(page);
                        put_page(page);
                }
                spin_unlock(ptl);
@@@ -6205,7 -6226,7 +6242,7 @@@ retry
                         */
                        if (PageTransCompound(page))
                                goto put;
-                       if (!device && isolate_lru_page(page))
+                       if (!device && !isolate_lru_page(page))
                                goto put;
                        if (!mem_cgroup_move_account(page, false,
                                                mc.from, mc.to)) {
                        }
                        if (!device)
                                putback_lru_page(page);
- put:                  /* get_mctgt_type() gets the page */
+ put:                  /* get_mctgt_type() gets & locks the page */
+                       unlock_page(page);
                        put_page(page);
                        break;
                case MC_TARGET_SWAP:
@@@ -7285,8 -7307,6 +7323,8 @@@ static int __init cgroup_memory(char *s
                        cgroup_memory_nosocket = true;
                if (!strcmp(token, "nokmem"))
                        cgroup_memory_nokmem = true;
 +              if (!strcmp(token, "nobpf"))
 +                      cgroup_memory_nobpf = true;
        }
        return 1;
  }
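
The new "nobpf" token extends the existing cgroup.memory= boot parameter alongside "nosocket" and "nokmem"; booting with cgroup.memory=nobpf keeps memcg_bpf_enabled_key disabled so BPF allocations skip memcg accounting. A minimal sketch of gating on the key from kernel code; the wrapper name below is illustrative only, and the key's declaration is assumed to be visible via <linux/memcontrol.h>:

#include <linux/jump_label.h>
#include <linux/memcontrol.h>

/* Illustrative wrapper around the static key exported above. */
static inline bool bpf_memcg_accounting_active(void)
{
	return static_branch_likely(&memcg_bpf_enabled_key);
}
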
diff --combined mm/migrate.c
index cc5455614e018576ff85c70a637c664088fcc2c6,9a101c7bb8fffc19f14bc6f02a60b080d561a723..37865f85df6d4132f5343aac33f16a580bf2f001
@@@ -58,8 -58,9 +58,9 @@@
  
  #include "internal.h"
  
- int isolate_movable_page(struct page *page, isolate_mode_t mode)
+ bool isolate_movable_page(struct page *page, isolate_mode_t mode)
  {
+       struct folio *folio = folio_get_nontail_page(page);
        const struct movable_operations *mops;
  
        /*
         * the put_page() at the end of this block will take care of
         * releasing this page, thus avoiding a nasty leakage.
         */
-       if (unlikely(!get_page_unless_zero(page)))
+       if (!folio)
                goto out;
  
-       if (unlikely(PageSlab(page)))
-               goto out_putpage;
+       if (unlikely(folio_test_slab(folio)))
+               goto out_putfolio;
        /* Pairs with smp_wmb() in slab freeing, e.g. SLUB's __free_slab() */
        smp_rmb();
        /*
         * we use non-atomic bitops on newly allocated page flags so
         * unconditionally grabbing the lock ruins page's owner side.
         */
-       if (unlikely(!__PageMovable(page)))
-               goto out_putpage;
+       if (unlikely(!__folio_test_movable(folio)))
+               goto out_putfolio;
        /* Pairs with smp_wmb() in slab allocation, e.g. SLUB's alloc_slab_page() */
        smp_rmb();
-       if (unlikely(PageSlab(page)))
-               goto out_putpage;
+       if (unlikely(folio_test_slab(folio)))
+               goto out_putfolio;
  
        /*
         * As movable pages are not isolated from LRU lists, concurrent
         * let's be sure we have the page lock
         * before proceeding with the movable page isolation steps.
         */
-       if (unlikely(!trylock_page(page)))
-               goto out_putpage;
+       if (unlikely(!folio_trylock(folio)))
+               goto out_putfolio;
  
-       if (!PageMovable(page) || PageIsolated(page))
+       if (!folio_test_movable(folio) || folio_test_isolated(folio))
                goto out_no_isolated;
  
-       mops = page_movable_ops(page);
-       VM_BUG_ON_PAGE(!mops, page);
+       mops = folio_movable_ops(folio);
+       VM_BUG_ON_FOLIO(!mops, folio);
  
-       if (!mops->isolate_page(page, mode))
+       if (!mops->isolate_page(&folio->page, mode))
                goto out_no_isolated;
  
        /* Driver shouldn't use PG_isolated bit of page->flags */
-       WARN_ON_ONCE(PageIsolated(page));
-       SetPageIsolated(page);
-       unlock_page(page);
+       WARN_ON_ONCE(folio_test_isolated(folio));
+       folio_set_isolated(folio);
+       folio_unlock(folio);
  
-       return 0;
+       return true;
  
  out_no_isolated:
-       unlock_page(page);
- out_putpage:
-       put_page(page);
+       folio_unlock(folio);
+ out_putfolio:
+       folio_put(folio);
  out:
-       return -EBUSY;
+       return false;
  }
  
- static void putback_movable_page(struct page *page)
+ static void putback_movable_folio(struct folio *folio)
  {
-       const struct movable_operations *mops = page_movable_ops(page);
+       const struct movable_operations *mops = folio_movable_ops(folio);
  
-       mops->putback_page(page);
-       ClearPageIsolated(page);
+       mops->putback_page(&folio->page);
+       folio_clear_isolated(folio);
  }
  
  /*
   */
  void putback_movable_pages(struct list_head *l)
  {
-       struct page *page;
-       struct page *page2;
+       struct folio *folio;
+       struct folio *folio2;
  
-       list_for_each_entry_safe(page, page2, l, lru) {
-               if (unlikely(PageHuge(page))) {
-                       putback_active_hugepage(page);
+       list_for_each_entry_safe(folio, folio2, l, lru) {
+               if (unlikely(folio_test_hugetlb(folio))) {
+                       folio_putback_active_hugetlb(folio);
                        continue;
                }
-               list_del(&page->lru);
+               list_del(&folio->lru);
                /*
-                * We isolated non-lru movable page so here we can use
-                * __PageMovable because LRU page's mapping cannot have
+                * We isolated non-lru movable folio so here we can use
+                * __PageMovable because LRU folio's mapping cannot have
                 * PAGE_MAPPING_MOVABLE.
                 */
-               if (unlikely(__PageMovable(page))) {
-                       VM_BUG_ON_PAGE(!PageIsolated(page), page);
-                       lock_page(page);
-                       if (PageMovable(page))
-                               putback_movable_page(page);
+               if (unlikely(__folio_test_movable(folio))) {
+                       VM_BUG_ON_FOLIO(!folio_test_isolated(folio), folio);
+                       folio_lock(folio);
+                       if (folio_test_movable(folio))
+                               putback_movable_folio(folio);
                        else
-                               ClearPageIsolated(page);
-                       unlock_page(page);
-                       put_page(page);
+                               folio_clear_isolated(folio);
+                       folio_unlock(folio);
+                       folio_put(folio);
                } else {
-                       mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-                                       page_is_file_lru(page), -thp_nr_pages(page));
-                       putback_lru_page(page);
+                       node_stat_mod_folio(folio, NR_ISOLATED_ANON +
+                                       folio_is_file_lru(folio), -folio_nr_pages(folio));
+                       folio_putback_lru(folio);
                }
        }
  }
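
isolate_movable_page() now returns bool (true when the page was isolated) instead of 0 or -EBUSY, with the internals converted to folios. A sketch of the adjusted calling convention; the helper below is illustrative rather than an actual call site, and ISOLATE_UNEVICTABLE stands in for whatever isolate_mode_t the caller really uses:

#include <linux/list.h>
#include <linux/migrate.h>
#include <linux/mm.h>

/* Illustrative caller: queue a non-LRU movable page for migration. */
static bool example_isolate_for_migration(struct page *page,
					  struct list_head *migratepages)
{
	if (!__PageMovable(page))
		return false;
	if (!isolate_movable_page(page, ISOLATE_UNEVICTABLE))
		return false;		/* raced with free or driver, skip it */
	list_add_tail(&page->lru, migratepages);
	return true;
}
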
@@@ -224,8 -225,6 +225,8 @@@ static bool remove_migration_pte(struc
                        pte = maybe_mkwrite(pte, vma);
                else if (pte_swp_uffd_wp(*pvmw.pte))
                        pte = pte_mkuffd_wp(pte);
 +              else
 +                      pte = pte_wrprotect(pte);
  
                if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
                        rmap_flags |= RMAP_EXCLUSIVE;
                        set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
                }
                if (vma->vm_flags & VM_LOCKED)
-                       mlock_page_drain_local();
+                       mlock_drain_local();
  
                trace_remove_migration_pte(pvmw.address, pte_val(pte),
                                           compound_order(new));
@@@ -331,24 -330,41 +332,41 @@@ void migration_entry_wait(struct mm_str
  }
  
  #ifdef CONFIG_HUGETLB_PAGE
- void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
+ /*
+  * The vma read lock must be held upon entry. Holding that lock prevents either
+  * the pte or the ptl from being freed.
+  *
+  * This function will release the vma lock before returning.
+  */
+ void __migration_entry_wait_huge(struct vm_area_struct *vma,
+                                pte_t *ptep, spinlock_t *ptl)
  {
        pte_t pte;
  
+       hugetlb_vma_assert_locked(vma);
        spin_lock(ptl);
        pte = huge_ptep_get(ptep);
  
-       if (unlikely(!is_hugetlb_entry_migration(pte)))
+       if (unlikely(!is_hugetlb_entry_migration(pte))) {
                spin_unlock(ptl);
-       else
+               hugetlb_vma_unlock_read(vma);
+       } else {
+               /*
+                * If migration entry existed, safe to release vma lock
+                * here because the pgtable page won't be freed without the
+                * pgtable lock released.  See comment right above pgtable
+                * lock release in migration_entry_wait_on_locked().
+                */
+               hugetlb_vma_unlock_read(vma);
                migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
+       }
  }
  
  void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
  {
        spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
  
-       __migration_entry_wait_huge(pte, ptl);
+       __migration_entry_wait_huge(vma, pte, ptl);
  }
  #endif
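
The hugetlb wait path above now requires the caller to enter with the hugetlb VMA read lock held, and __migration_entry_wait_huge() drops that lock itself, so the page-table page cannot be freed under the waiter. A rough sketch of the expected calling pattern, modelled on hugetlb_fault(); it is simplified (the real caller also juggles the hugetlb fault mutex) and the helper name is invented:

#include <linux/hugetlb.h>
#include <linux/swapops.h>

static vm_fault_t example_wait_for_hugetlb_migration(struct vm_area_struct *vma,
						     struct mm_struct *mm,
						     unsigned long haddr)
{
	struct hstate *h = hstate_vma(vma);
	pte_t *ptep;

	hugetlb_vma_lock_read(vma);
	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
	if (ptep && is_hugetlb_entry_migration(huge_ptep_get(ptep))) {
		/* Releases the vma read lock before waiting. */
		migration_entry_wait_huge(vma, ptep);
		return 0;
	}
	hugetlb_vma_unlock_read(vma);
	return 0;
}
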
  
@@@ -975,7 -991,7 +993,7 @@@ static int move_to_new_folio(struct fol
                        goto out;
                }
  
-               mops = page_movable_ops(&src->page);
+               mops = folio_movable_ops(src);
                rc = mops->migrate_page(&dst->page, &src->page, mode);
                WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
                                !folio_test_isolated(src));
        return rc;
  }
  
- static int __unmap_and_move(struct folio *src, struct folio *dst,
-                               int force, enum migrate_mode mode)
+ /*
+  * To record some information during migration, we use some unused
+  * fields (mapping and private) of struct folio of the newly allocated
+  * destination folio.  This is safe because nobody is using them
+  * except us.
+  */
+ static void __migrate_folio_record(struct folio *dst,
+                                  unsigned long page_was_mapped,
+                                  struct anon_vma *anon_vma)
+ {
+       dst->mapping = (void *)anon_vma;
+       dst->private = (void *)page_was_mapped;
+ }
+ static void __migrate_folio_extract(struct folio *dst,
+                                  int *page_was_mappedp,
+                                  struct anon_vma **anon_vmap)
+ {
+       *anon_vmap = (void *)dst->mapping;
+       *page_was_mappedp = (unsigned long)dst->private;
+       dst->mapping = NULL;
+       dst->private = NULL;
+ }
+ /* Restore the source folio to the original state upon failure */
+ static void migrate_folio_undo_src(struct folio *src,
+                                  int page_was_mapped,
+                                  struct anon_vma *anon_vma,
+                                  bool locked,
+                                  struct list_head *ret)
+ {
+       if (page_was_mapped)
+               remove_migration_ptes(src, src, false);
+       /* Drop an anon_vma reference if we took one */
+       if (anon_vma)
+               put_anon_vma(anon_vma);
+       if (locked)
+               folio_unlock(src);
+       if (ret)
+               list_move_tail(&src->lru, ret);
+ }
+ /* Restore the destination folio to the original state upon failure */
+ static void migrate_folio_undo_dst(struct folio *dst,
+                                  bool locked,
+                                  free_page_t put_new_page,
+                                  unsigned long private)
+ {
+       if (locked)
+               folio_unlock(dst);
+       if (put_new_page)
+               put_new_page(&dst->page, private);
+       else
+               folio_put(dst);
+ }
+ /* Cleanup src folio upon migration success */
+ static void migrate_folio_done(struct folio *src,
+                              enum migrate_reason reason)
+ {
+       /*
+        * Compaction can migrate also non-LRU pages which are
+        * not accounted to NR_ISOLATED_*. They can be recognized
+        * as __PageMovable
+        */
+       if (likely(!__folio_test_movable(src)))
+               mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
+                                   folio_is_file_lru(src), -folio_nr_pages(src));
+       if (reason != MR_MEMORY_FAILURE)
+               /* We release the page in page_handle_poison. */
+               folio_put(src);
+ }
+ /* Obtain the lock on page, remove all ptes. */
+ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
+                              unsigned long private, struct folio *src,
+                              struct folio **dstp, int force, bool avoid_force_lock,
+                              enum migrate_mode mode, enum migrate_reason reason,
+                              struct list_head *ret)
  {
+       struct folio *dst;
        int rc = -EAGAIN;
-       bool page_was_mapped = false;
+       struct page *newpage = NULL;
+       int page_was_mapped = 0;
        struct anon_vma *anon_vma = NULL;
        bool is_lru = !__PageMovable(&src->page);
+       bool locked = false;
+       bool dst_locked = false;
+       if (folio_ref_count(src) == 1) {
+               /* Folio was freed from under us. So we are done. */
+               folio_clear_active(src);
+               folio_clear_unevictable(src);
+               /* free_pages_prepare() will clear PG_isolated. */
+               list_del(&src->lru);
+               migrate_folio_done(src, reason);
+               return MIGRATEPAGE_SUCCESS;
+       }
+       newpage = get_new_page(&src->page, private);
+       if (!newpage)
+               return -ENOMEM;
+       dst = page_folio(newpage);
+       *dstp = dst;
+       dst->private = NULL;
  
        if (!folio_trylock(src)) {
                if (!force || mode == MIGRATE_ASYNC)
                if (current->flags & PF_MEMALLOC)
                        goto out;
  
+               /*
+                * We have locked some folios and are going to wait to lock
+                * this folio.  To avoid a potential deadlock, let's bail
+                * out and not do that. The locked folios will be moved and
+                * unlocked, then we can wait to lock this folio.
+                */
+               if (avoid_force_lock) {
+                       rc = -EDEADLOCK;
+                       goto out;
+               }
                folio_lock(src);
        }
+       locked = true;
  
        if (folio_test_writeback(src)) {
                /*
                        break;
                default:
                        rc = -EBUSY;
-                       goto out_unlock;
+                       goto out;
                }
                if (!force)
-                       goto out_unlock;
+                       goto out;
                folio_wait_writeback(src);
        }
  
         * This is much like races on refcount of oldpage: just don't BUG().
         */
        if (unlikely(!folio_trylock(dst)))
-               goto out_unlock;
+               goto out;
+       dst_locked = true;
  
        if (unlikely(!is_lru)) {
-               rc = move_to_new_folio(dst, src, mode);
-               goto out_unlock_both;
+               __migrate_folio_record(dst, page_was_mapped, anon_vma);
+               return MIGRATEPAGE_UNMAP;
        }
  
        /*
        if (!src->mapping) {
                if (folio_test_private(src)) {
                        try_to_free_buffers(src);
-                       goto out_unlock_both;
+                       goto out;
                }
        } else if (folio_mapped(src)) {
                /* Establish migration ptes */
                VM_BUG_ON_FOLIO(folio_test_anon(src) &&
                               !folio_test_ksm(src) && !anon_vma, src);
-               try_to_migrate(src, 0);
-               page_was_mapped = true;
+               try_to_migrate(src, TTU_BATCH_FLUSH);
+               page_was_mapped = 1;
        }
  
-       if (!folio_mapped(src))
-               rc = move_to_new_folio(dst, src, mode);
+       if (!folio_mapped(src)) {
+               __migrate_folio_record(dst, page_was_mapped, anon_vma);
+               return MIGRATEPAGE_UNMAP;
+       }
+ out:
+       /*
+        * A folio that has not been unmapped will be restored to
+        * the right list unless we want to retry.
+        */
+       if (rc == -EAGAIN || rc == -EDEADLOCK)
+               ret = NULL;
+       migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
+       migrate_folio_undo_dst(dst, dst_locked, put_new_page, private);
+       return rc;
+ }
+ /* Migrate the folio to the newly allocated folio in dst. */
+ static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
+                             struct folio *src, struct folio *dst,
+                             enum migrate_mode mode, enum migrate_reason reason,
+                             struct list_head *ret)
+ {
+       int rc;
+       int page_was_mapped = 0;
+       struct anon_vma *anon_vma = NULL;
+       bool is_lru = !__PageMovable(&src->page);
+       struct list_head *prev;
+       __migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+       prev = dst->lru.prev;
+       list_del(&dst->lru);
+       rc = move_to_new_folio(dst, src, mode);
+       if (rc)
+               goto out;
+       if (unlikely(!is_lru))
+               goto out_unlock_both;
  
        /*
         * When successful, push dst to LRU immediately: so that if it
         * unsuccessful, and other cases when a page has been temporarily
         * isolated from the unevictable LRU: but this case is the easiest.
         */
-       if (rc == MIGRATEPAGE_SUCCESS) {
-               folio_add_lru(dst);
-               if (page_was_mapped)
-                       lru_add_drain();
-       }
+       folio_add_lru(dst);
+       if (page_was_mapped)
+               lru_add_drain();
  
        if (page_was_mapped)
-               remove_migration_ptes(src,
-                       rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
+               remove_migration_ptes(src, dst, false);
  
  out_unlock_both:
        folio_unlock(dst);
- out_unlock:
-       /* Drop an anon_vma reference if we took one */
-       if (anon_vma)
-               put_anon_vma(anon_vma);
-       folio_unlock(src);
- out:
+       set_page_owner_migrate_reason(&dst->page, reason);
        /*
         * If migration is successful, decrease refcount of dst,
         * which will not free the page because new page owner increased
         * refcounter.
         */
-       if (rc == MIGRATEPAGE_SUCCESS)
-               folio_put(dst);
+       folio_put(dst);
  
-       return rc;
- }
- /*
-  * Obtain the lock on folio, remove all ptes and migrate the folio
-  * to the newly allocated folio in dst.
-  */
- static int unmap_and_move(new_page_t get_new_page,
-                                  free_page_t put_new_page,
-                                  unsigned long private, struct folio *src,
-                                  int force, enum migrate_mode mode,
-                                  enum migrate_reason reason,
-                                  struct list_head *ret)
- {
-       struct folio *dst;
-       int rc = MIGRATEPAGE_SUCCESS;
-       struct page *newpage = NULL;
-       if (!thp_migration_supported() && folio_test_transhuge(src))
-               return -ENOSYS;
-       if (folio_ref_count(src) == 1) {
-               /* Folio was freed from under us. So we are done. */
-               folio_clear_active(src);
-               folio_clear_unevictable(src);
-               /* free_pages_prepare() will clear PG_isolated. */
-               goto out;
-       }
-       newpage = get_new_page(&src->page, private);
-       if (!newpage)
-               return -ENOMEM;
-       dst = page_folio(newpage);
-       dst->private = NULL;
-       rc = __unmap_and_move(src, dst, force, mode);
-       if (rc == MIGRATEPAGE_SUCCESS)
-               set_page_owner_migrate_reason(&dst->page, reason);
+       /*
+        * A folio that has been migrated has all references removed
+        * and will be freed.
+        */
+       list_del(&src->lru);
+       /* Drop an anon_vma reference if we took one */
+       if (anon_vma)
+               put_anon_vma(anon_vma);
+       folio_unlock(src);
+       migrate_folio_done(src, reason);
  
+       return rc;
  out:
-       if (rc != -EAGAIN) {
-               /*
-                * A folio that has been migrated has all references
-                * removed and will be freed. A folio that has not been
-                * migrated will have kept its references and be restored.
-                */
-               list_del(&src->lru);
-       }
        /*
-        * If migration is successful, releases reference grabbed during
-        * isolation. Otherwise, restore the folio to right list unless
-        * we want to retry.
+        * A folio that has not been migrated will be restored to
+        * the right list unless we want to retry.
         */
-       if (rc == MIGRATEPAGE_SUCCESS) {
-               /*
-                * Compaction can migrate also non-LRU folios which are
-                * not accounted to NR_ISOLATED_*. They can be recognized
-                * as __folio_test_movable
-                */
-               if (likely(!__folio_test_movable(src)))
-                       mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
-                                       folio_is_file_lru(src), -folio_nr_pages(src));
-               if (reason != MR_MEMORY_FAILURE)
-                       /*
-                        * We release the folio in page_handle_poison.
-                        */
-                       folio_put(src);
-       } else {
-               if (rc != -EAGAIN)
-                       list_add_tail(&src->lru, ret);
-               if (put_new_page)
-                       put_new_page(&dst->page, private);
-               else
-                       folio_put(dst);
+       if (rc == -EAGAIN) {
+               list_add(&dst->lru, prev);
+               __migrate_folio_record(dst, page_was_mapped, anon_vma);
+               return rc;
        }
  
+       migrate_folio_undo_src(src, page_was_mapped, anon_vma, true, ret);
+       migrate_folio_undo_dst(dst, true, put_new_page, private);
        return rc;
  }
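
This hunk splits the old unmap_and_move() into migrate_folio_unmap() and migrate_folio_move(); between the two phases the page_was_mapped flag and anon_vma pointer are parked in the destination folio with __migrate_folio_record() and pulled back out with __migrate_folio_extract(). Condensed, the per-folio flow implemented above looks like this (a restatement of the code, not additional logic):

	/* Phase 1: lock src/dst and replace the PTEs with migration entries. */
	rc = migrate_folio_unmap(get_new_page, put_new_page, private, src,
				 &dst, force, avoid_force_lock, mode, reason,
				 ret);
	if (rc == MIGRATEPAGE_UNMAP) {
		/* State travels in dst->mapping / dst->private ... */

		/* Phase 2: copy to dst and restore the migration entries. */
		rc = migrate_folio_move(put_new_page, private, src, dst,
					mode, reason, ret);
	}
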
  
@@@ -1271,19 -1377,9 +1379,9 @@@ static int unmap_and_move_huge_page(new
        struct anon_vma *anon_vma = NULL;
        struct address_space *mapping = NULL;
  
-       /*
-        * Migratability of hugepages depends on architectures and their size.
-        * This check is necessary because some callers of hugepage migration
-        * like soft offline and memory hotremove don't walk through page
-        * tables or check whether the hugepage is pmd-based or not before
-        * kicking migration.
-        */
-       if (!hugepage_migration_supported(page_hstate(hpage)))
-               return -ENOSYS;
        if (folio_ref_count(src) == 1) {
                /* page was freed from under us. So we are done. */
-               putback_active_hugepage(hpage);
+               folio_putback_active_hugetlb(src);
                return MIGRATEPAGE_SUCCESS;
        }
  
@@@ -1368,7 -1464,7 +1466,7 @@@ out_unlock
        folio_unlock(src);
  out:
        if (rc == MIGRATEPAGE_SUCCESS)
-               putback_active_hugepage(hpage);
+               folio_putback_active_hugetlb(src);
        else if (rc != -EAGAIN)
                list_move_tail(&src->lru, ret);
  
        if (put_new_page)
                put_new_page(new_hpage, private);
        else
-               putback_active_hugepage(new_hpage);
+               folio_putback_active_hugetlb(dst);
  
        return rc;
  }
@@@ -1398,61 -1494,153 +1496,153 @@@ static inline int try_split_folio(struc
        return rc;
  }
  
+ #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ #define NR_MAX_BATCHED_MIGRATION      HPAGE_PMD_NR
+ #else
+ #define NR_MAX_BATCHED_MIGRATION      512
+ #endif
+ #define NR_MAX_MIGRATE_PAGES_RETRY    10
+ struct migrate_pages_stats {
+       int nr_succeeded;       /* Normal and large folios migrated successfully, in
+                                  units of base pages */
+       int nr_failed_pages;    /* Normal and large folios failed to be migrated, in
+                                  units of base pages.  Untried folios aren't counted */
+       int nr_thp_succeeded;   /* THP migrated successfully */
+       int nr_thp_failed;      /* THP failed to be migrated */
+       int nr_thp_split;       /* THP split before migrating */
+ };
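
struct migrate_pages_stats replaces the pile of local counters that migrate_pages() used to carry. Roughly, once all batches are done the caller folds them into the existing vm events and tracepoint, along the lines of the sketch below (the exact placement inside migrate_pages() is not shown in this hunk):

	count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
	count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
	count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
	count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
	count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
	trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
			       stats.nr_thp_succeeded, stats.nr_thp_failed,
			       stats.nr_thp_split, mode, reason);
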
  /*
-  * migrate_pages - migrate the folios specified in a list, to the free folios
-  *               supplied as the target for the page migration
-  *
-  * @from:             The list of folios to be migrated.
-  * @get_new_page:     The function used to allocate free folios to be used
-  *                    as the target of the folio migration.
-  * @put_new_page:     The function used to free target folios if migration
-  *                    fails, or NULL if no special handling is necessary.
-  * @private:          Private data to be passed on to get_new_page()
-  * @mode:             The migration mode that specifies the constraints for
-  *                    folio migration, if any.
-  * @reason:           The reason for folio migration.
-  * @ret_succeeded:    Set to the number of folios migrated successfully if
-  *                    the caller passes a non-NULL pointer.
-  *
-  * The function returns after 10 attempts or if no folios are movable any more
-  * because the list has become empty or no retryable folios exist any more.
-  * It is caller's responsibility to call putback_movable_pages() to return folios
-  * to the LRU or free list only if ret != 0.
-  *
-  * Returns the number of {normal folio, large folio, hugetlb} that were not
-  * migrated, or an error code. The number of large folio splits will be
-  * considered as the number of non-migrated large folio, no matter how many
-  * split folios of the large folio are migrated successfully.
+  * Returns the number of hugetlb folios that were not migrated, or an error code
+  * after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no hugetlb folios are movable
+  * any more because the list has become empty or no retryable hugetlb folios
+  * exist any more. It is caller's responsibility to call putback_movable_pages()
+  * only if ret != 0.
   */
- int migrate_pages(struct list_head *from, new_page_t get_new_page,
-               free_page_t put_new_page, unsigned long private,
-               enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+ static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
+                           free_page_t put_new_page, unsigned long private,
+                           enum migrate_mode mode, int reason,
+                           struct migrate_pages_stats *stats,
+                           struct list_head *ret_folios)
  {
        int retry = 1;
+       int nr_failed = 0;
+       int nr_retry_pages = 0;
+       int pass = 0;
+       struct folio *folio, *folio2;
+       int rc, nr_pages;
+       for (pass = 0; pass < NR_MAX_MIGRATE_PAGES_RETRY && retry; pass++) {
+               retry = 0;
+               nr_retry_pages = 0;
+               list_for_each_entry_safe(folio, folio2, from, lru) {
+                       if (!folio_test_hugetlb(folio))
+                               continue;
+                       nr_pages = folio_nr_pages(folio);
+                       cond_resched();
+                       /*
+                        * Migratability of hugepages depends on architectures and
+                        * their size.  This check is necessary because some callers
+                        * of hugepage migration like soft offline and memory
+                        * hotremove don't walk through page tables or check whether
+                        * the hugepage is pmd-based or not before kicking migration.
+                        */
+                       if (!hugepage_migration_supported(folio_hstate(folio))) {
+                               nr_failed++;
+                               stats->nr_failed_pages += nr_pages;
+                               list_move_tail(&folio->lru, ret_folios);
+                               continue;
+                       }
+                       rc = unmap_and_move_huge_page(get_new_page,
+                                                     put_new_page, private,
+                                                     &folio->page, pass > 2, mode,
+                                                     reason, ret_folios);
+                       /*
+                        * The rules are:
+                        *      Success: hugetlb folio will be put back
+                        *      -EAGAIN: stay on the from list
+                        *      -ENOMEM: stay on the from list
+                        *      Other errno: put on ret_folios list
+                        */
+                       switch(rc) {
+                       case -ENOMEM:
+                               /*
+                                * When memory is low, don't bother to try to migrate
+                                * other folios, just exit.
+                                */
+                               stats->nr_failed_pages += nr_pages + nr_retry_pages;
+                               return -ENOMEM;
+                       case -EAGAIN:
+                               retry++;
+                               nr_retry_pages += nr_pages;
+                               break;
+                       case MIGRATEPAGE_SUCCESS:
+                               stats->nr_succeeded += nr_pages;
+                               break;
+                       default:
+                               /*
+                                * Permanent failure (-EBUSY, etc.):
+                                * unlike -EAGAIN case, the failed folio is
+                                * removed from migration folio list and not
+                                * retried in the next outer loop.
+                                */
+                               nr_failed++;
+                               stats->nr_failed_pages += nr_pages;
+                               break;
+                       }
+               }
+       }
+       /*
+        * nr_failed is number of hugetlb folios failed to be migrated.  After
+        * NR_MAX_MIGRATE_PAGES_RETRY attempts, give up and count retried hugetlb
+        * folios as failed.
+        */
+       nr_failed += retry;
+       stats->nr_failed_pages += nr_retry_pages;
+       return nr_failed;
+ }
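
With hugetlb folios handled by migrate_hugetlbs() up front, the remaining folios go through migrate_pages_batch() in chunks of at most NR_MAX_BATCHED_MIGRATION base pages. A condensed sketch of that orchestration in migrate_pages(); the real loop also carves the list into batches and stops early on -ENOMEM:

	rc = migrate_hugetlbs(from, get_new_page, put_new_page, private,
			      mode, reason, &stats, &ret_folios);
	if (rc >= 0)
		rc += migrate_pages_batch(from, get_new_page, put_new_page,
					  private, mode, reason, &ret_folios,
					  &stats);
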
+ /*
+  * migrate_pages_batch() first unmaps folios in the from list as many as
+  * possible, then move the unmapped folios.
+  */
+ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
+               free_page_t put_new_page, unsigned long private,
+               enum migrate_mode mode, int reason, struct list_head *ret_folios,
+               struct migrate_pages_stats *stats)
+ {
+       int retry;
        int large_retry = 1;
        int thp_retry = 1;
        int nr_failed = 0;
-       int nr_failed_pages = 0;
        int nr_retry_pages = 0;
-       int nr_succeeded = 0;
-       int nr_thp_succeeded = 0;
        int nr_large_failed = 0;
-       int nr_thp_failed = 0;
-       int nr_thp_split = 0;
        int pass = 0;
        bool is_large = false;
        bool is_thp = false;
-       struct folio *folio, *folio2;
-       int rc, nr_pages;
-       LIST_HEAD(ret_folios);
+       struct folio *folio, *folio2, *dst = NULL, *dst2;
+       int rc, rc_saved, nr_pages;
        LIST_HEAD(split_folios);
+       LIST_HEAD(unmap_folios);
+       LIST_HEAD(dst_folios);
        bool nosplit = (reason == MR_NUMA_MISPLACED);
        bool no_split_folio_counting = false;
-       trace_mm_migrate_pages_start(mode, reason);
- split_folio_migration:
-       for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
+       bool avoid_force_lock;
+ retry:
+       rc_saved = 0;
+       avoid_force_lock = false;
+       retry = 1;
+       for (pass = 0;
+            pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
+            pass++) {
                retry = 0;
                large_retry = 0;
                thp_retry = 0;
                         * folio. Capture required information that might get
                         * lost during migration.
                         */
-                       is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
+                       is_large = folio_test_large(folio);
                        is_thp = is_large && folio_test_pmd_mappable(folio);
                        nr_pages = folio_nr_pages(folio);
                        cond_resched();
  
-                       if (folio_test_hugetlb(folio))
-                               rc = unmap_and_move_huge_page(get_new_page,
-                                               put_new_page, private,
-                                               &folio->page, pass > 2, mode,
-                                               reason,
-                                               &ret_folios);
-                       else
-                               rc = unmap_and_move(get_new_page, put_new_page,
-                                               private, folio, pass > 2, mode,
-                                               reason, &ret_folios);
-                       /*
-                        * The rules are:
-                        *      Success: non hugetlb folio will be freed, hugetlb
-                        *               folio will be put back
-                        *      -EAGAIN: stay on the from list
-                        *      -ENOMEM: stay on the from list
-                        *      -ENOSYS: stay on the from list
-                        *      Other errno: put on ret_folios list then splice to
-                        *                   from list
-                        */
-                       switch(rc) {
                        /*
                         * Large folio migration might be unsupported or
-                        * the allocation could've failed so we should retry
+                        * the allocation might have failed so we should retry
                         * on the same folio with the large folio split
                         * to normal folios.
                         *
                         * we will migrate them after the rest of the
                         * list is processed.
                         */
-                       case -ENOSYS:
-                               /* Large folio migration is unsupported */
-                               if (is_large) {
-                                       nr_large_failed++;
-                                       nr_thp_failed += is_thp;
-                                       if (!try_split_folio(folio, &split_folios)) {
-                                               nr_thp_split += is_thp;
-                                               break;
-                                       }
-                               /* Hugetlb migration is unsupported */
-                               } else if (!no_split_folio_counting) {
-                                       nr_failed++;
+                       if (!thp_migration_supported() && is_thp) {
+                               nr_large_failed++;
+                               stats->nr_thp_failed++;
+                               if (!try_split_folio(folio, &split_folios)) {
+                                       stats->nr_thp_split++;
+                                       continue;
                                }
+                               stats->nr_failed_pages += nr_pages;
+                               list_move_tail(&folio->lru, ret_folios);
+                               continue;
+                       }
  
-                               nr_failed_pages += nr_pages;
-                               list_move_tail(&folio->lru, &ret_folios);
-                               break;
+                       rc = migrate_folio_unmap(get_new_page, put_new_page, private,
+                                                folio, &dst, pass > 2, avoid_force_lock,
+                                                mode, reason, ret_folios);
+                       /*
+                        * The rules are:
+                        *      Success: folio will be freed
+                        *      Unmap: folio will be put on unmap_folios list,
+                        *             dst folio put on dst_folios list
+                        *      -EAGAIN: stay on the from list
+                        *      -EDEADLOCK: stay on the from list
+                        *      -ENOMEM: stay on the from list
+                        *      Other errno: put on ret_folios list
+                        */
+                       switch(rc) {
                        case -ENOMEM:
                                /*
                                 * When memory is low, don't bother to try to migrate
-                                * other folios, just exit.
+                                * other folios, move unmapped folios, then exit.
                                 */
                                if (is_large) {
                                        nr_large_failed++;
-                                       nr_thp_failed += is_thp;
+                                       stats->nr_thp_failed += is_thp;
                                        /* Large folio NUMA faulting doesn't split to retry. */
                                        if (!nosplit) {
                                                int ret = try_split_folio(folio, &split_folios);
  
                                                if (!ret) {
-                                                       nr_thp_split += is_thp;
+                                                       stats->nr_thp_split += is_thp;
                                                        break;
                                                } else if (reason == MR_LONGTERM_PIN &&
                                                           ret == -EAGAIN) {
                                        nr_failed++;
                                }
  
-                               nr_failed_pages += nr_pages + nr_retry_pages;
+                               stats->nr_failed_pages += nr_pages + nr_retry_pages;
                                /*
                                 * There might be some split folios of fail-to-migrate large
-                                * folios left in split_folios list. Move them back to migration
+                                * folios left in split_folios list. Move them to ret_folios
                                 * list so that they could be put back to the right list by
                                 * the caller otherwise the folio refcnt will be leaked.
                                 */
-                               list_splice_init(&split_folios, from);
+                               list_splice_init(&split_folios, ret_folios);
                                /* nr_failed isn't updated for not used */
                                nr_large_failed += large_retry;
-                               nr_thp_failed += thp_retry;
-                               goto out;
+                               stats->nr_thp_failed += thp_retry;
+                               rc_saved = rc;
+                               if (list_empty(&unmap_folios))
+                                       goto out;
+                               else
+                                       goto move;
+                       case -EDEADLOCK:
+                               /*
+                                * The folio cannot be locked for potential deadlock.
+                                * Go move (and unlock) all locked folios.  Then we can
+                                * try again.
+                                */
+                               rc_saved = rc;
+                               goto move;
                        case -EAGAIN:
                                if (is_large) {
                                        large_retry++;
                                nr_retry_pages += nr_pages;
                                break;
                        case MIGRATEPAGE_SUCCESS:
-                               nr_succeeded += nr_pages;
-                               nr_thp_succeeded += is_thp;
+                               stats->nr_succeeded += nr_pages;
+                               stats->nr_thp_succeeded += is_thp;
+                               break;
+                       case MIGRATEPAGE_UNMAP:
+                               /*
+                                * We have locked some folios, don't force lock
+                                * to avoid deadlock.
+                                */
+                               avoid_force_lock = true;
+                               list_move_tail(&folio->lru, &unmap_folios);
+                               list_add_tail(&dst->lru, &dst_folios);
                                break;
                        default:
                                /*
                                 */
                                if (is_large) {
                                        nr_large_failed++;
-                                       nr_thp_failed += is_thp;
+                                       stats->nr_thp_failed += is_thp;
+                               } else if (!no_split_folio_counting) {
+                                       nr_failed++;
+                               }
+                               stats->nr_failed_pages += nr_pages;
+                               break;
+                       }
+               }
+       }
+       nr_failed += retry;
+       nr_large_failed += large_retry;
+       stats->nr_thp_failed += thp_retry;
+       stats->nr_failed_pages += nr_retry_pages;
+ move:
+       /* Flush TLBs for all unmapped folios */
+       try_to_unmap_flush();
+       retry = 1;
+       for (pass = 0;
+            pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
+            pass++) {
+               retry = 0;
+               large_retry = 0;
+               thp_retry = 0;
+               nr_retry_pages = 0;
+               dst = list_first_entry(&dst_folios, struct folio, lru);
+               dst2 = list_next_entry(dst, lru);
+               list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+                       is_large = folio_test_large(folio);
+                       is_thp = is_large && folio_test_pmd_mappable(folio);
+                       nr_pages = folio_nr_pages(folio);
+                       cond_resched();
+                       rc = migrate_folio_move(put_new_page, private,
+                                               folio, dst, mode,
+                                               reason, ret_folios);
+                       /*
+                        * The rules are:
+                        *      Success: folio will be freed
+                        *      -EAGAIN: stay on the unmap_folios list
+                        *      Other errno: put on ret_folios list
+                        */
+                       switch(rc) {
+                       case -EAGAIN:
+                               if (is_large) {
+                                       large_retry++;
+                                       thp_retry += is_thp;
+                               } else if (!no_split_folio_counting) {
+                                       retry++;
+                               }
+                               nr_retry_pages += nr_pages;
+                               break;
+                       case MIGRATEPAGE_SUCCESS:
+                               stats->nr_succeeded += nr_pages;
+                               stats->nr_thp_succeeded += is_thp;
+                               break;
+                       default:
+                               if (is_large) {
+                                       nr_large_failed++;
+                                       stats->nr_thp_failed += is_thp;
                                } else if (!no_split_folio_counting) {
                                        nr_failed++;
                                }
  
-                               nr_failed_pages += nr_pages;
+                               stats->nr_failed_pages += nr_pages;
                                break;
                        }
+                       dst = dst2;
+                       dst2 = list_next_entry(dst, lru);
                }
        }
        nr_failed += retry;
        nr_large_failed += large_retry;
-       nr_thp_failed += thp_retry;
-       nr_failed_pages += nr_retry_pages;
+       stats->nr_thp_failed += thp_retry;
+       stats->nr_failed_pages += nr_retry_pages;
+       if (rc_saved)
+               rc = rc_saved;
+       else
+               rc = nr_failed + nr_large_failed;
+ out:
+       /* Cleanup remaining folios */
+       dst = list_first_entry(&dst_folios, struct folio, lru);
+       dst2 = list_next_entry(dst, lru);
+       list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+               int page_was_mapped = 0;
+               struct anon_vma *anon_vma = NULL;
+               __migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+               migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
+                                      true, ret_folios);
+               list_del(&dst->lru);
+               migrate_folio_undo_dst(dst, true, put_new_page, private);
+               dst = dst2;
+               dst2 = list_next_entry(dst, lru);
+       }
        /*
         * Try to migrate split folios of fail-to-migrate large folios, no
         * nr_failed counting in this round, since all split folios of a
         * large folio is counted as 1 failure in the first round.
         */
-       if (!list_empty(&split_folios)) {
+       if (rc >= 0 && !list_empty(&split_folios)) {
                /*
-                * Move non-migrated folios (after 10 retries) to ret_folios
-                * to avoid migrating them again.
+                * Move non-migrated folios (after NR_MAX_MIGRATE_PAGES_RETRY
+                * retries) to ret_folios to avoid migrating them again.
                 */
-               list_splice_init(from, &ret_folios);
+               list_splice_init(from, ret_folios);
                list_splice_init(&split_folios, from);
                no_split_folio_counting = true;
-               retry = 1;
-               goto split_folio_migration;
+               goto retry;
        }
  
-       rc = nr_failed + nr_large_failed;
+       /*
+        * We have unlocked all locked folios, so we can force lock now, let's
+        * try again.
+        */
+       if (rc == -EDEADLOCK)
+               goto retry;
+       return rc;
+ }
+ /*
+  * migrate_pages - migrate the folios specified in a list, to the free folios
+  *               supplied as the target for the page migration
+  *
+  * @from:             The list of folios to be migrated.
+  * @get_new_page:     The function used to allocate free folios to be used
+  *                    as the target of the folio migration.
+  * @put_new_page:     The function used to free target folios if migration
+  *                    fails, or NULL if no special handling is necessary.
+  * @private:          Private data to be passed on to get_new_page()
+  * @mode:             The migration mode that specifies the constraints for
+  *                    folio migration, if any.
+  * @reason:           The reason for folio migration.
+  * @ret_succeeded:    Set to the number of folios migrated successfully if
+  *                    the caller passes a non-NULL pointer.
+  *
+  * The function returns after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no folios
+  * are movable any more because the list has become empty or no retryable folios
+  * exist any more. It is caller's responsibility to call putback_movable_pages()
+  * only if ret != 0.
+  *
+  * Returns the number of {normal folio, large folio, hugetlb} that were not
+  * migrated, or an error code. The number of large folio splits will be
+  * considered as the number of non-migrated large folio, no matter how many
+  * split folios of the large folio are migrated successfully.
+  */
+ int migrate_pages(struct list_head *from, new_page_t get_new_page,
+               free_page_t put_new_page, unsigned long private,
+               enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+ {
+       int rc, rc_gather;
+       int nr_pages;
+       struct folio *folio, *folio2;
+       LIST_HEAD(folios);
+       LIST_HEAD(ret_folios);
+       struct migrate_pages_stats stats;
+       trace_mm_migrate_pages_start(mode, reason);
+       memset(&stats, 0, sizeof(stats));
+       rc_gather = migrate_hugetlbs(from, get_new_page, put_new_page, private,
+                                    mode, reason, &stats, &ret_folios);
+       if (rc_gather < 0)
+               goto out;
+ again:
+       nr_pages = 0;
+       list_for_each_entry_safe(folio, folio2, from, lru) {
+               /* Retried hugetlb folios will be kept in list  */
+               if (folio_test_hugetlb(folio)) {
+                       list_move_tail(&folio->lru, &ret_folios);
+                       continue;
+               }
+               nr_pages += folio_nr_pages(folio);
+               if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+                       break;
+       }
+       if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+               list_cut_before(&folios, from, &folio->lru);
+       else
+               list_splice_init(from, &folios);
+       rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
+                                mode, reason, &ret_folios, &stats);
+       list_splice_tail_init(&folios, &ret_folios);
+       if (rc < 0) {
+               rc_gather = rc;
+               goto out;
+       }
+       rc_gather += rc;
+       if (!list_empty(from))
+               goto again;
  out:
        /*
         * Return 0 in case all split folios of fail-to-migrate large folios
         * are migrated successfully.
         */
        if (list_empty(from))
-               rc = 0;
+               rc_gather = 0;
  
-       count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
-       count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
-       count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
-       count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
-       count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
-       trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
-                              nr_thp_failed, nr_thp_split, mode, reason);
+       count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
+       count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
+       count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
+       count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
+       count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
+       trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
+                              stats.nr_thp_succeeded, stats.nr_thp_failed,
+                              stats.nr_thp_split, mode, reason);
  
        if (ret_succeeded)
-               *ret_succeeded = nr_succeeded;
+               *ret_succeeded = stats.nr_succeeded;
  
-       return rc;
+       return rc_gather;
  }
  
  struct page *alloc_migration_target(struct page *page, unsigned long private)
        struct migration_target_control *mtc;
        gfp_t gfp_mask;
        unsigned int order = 0;
+       struct folio *hugetlb_folio = NULL;
        struct folio *new_folio = NULL;
        int nid;
        int zidx;
                struct hstate *h = folio_hstate(folio);
  
                gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-               return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
+               hugetlb_folio = alloc_hugetlb_folio_nodemask(h, nid,
+                                               mtc->nmask, gfp_mask);
+               return &hugetlb_folio->page;
        }
  
        if (folio_test_large(folio)) {
@@@ -1727,6 -2095,7 +2097,7 @@@ static int add_page_for_migration(struc
        struct vm_area_struct *vma;
        struct page *page;
        int err;
+       bool isolated;
  
        mmap_read_lock(mm);
        err = -EFAULT;
  
        if (PageHuge(page)) {
                if (PageHead(page)) {
-                       err = isolate_hugetlb(page, pagelist);
-                       if (!err)
-                               err = 1;
+                       isolated = isolate_hugetlb(page_folio(page), pagelist);
+                       err = isolated ? 1 : -EBUSY;
                }
        } else {
                struct page *head;
  
                head = compound_head(page);
-               err = isolate_lru_page(head);
-               if (err)
+               isolated = isolate_lru_page(head);
+               if (!isolated) {
+                       err = -EBUSY;
                        goto out_putpage;
+               }
  
                err = 1;
                list_add_tail(&head->lru, pagelist);
@@@ -2173,7 -2543,7 +2545,7 @@@ static int numamigrate_isolate_page(pg_
                return 0;
        }
  
-       if (isolate_lru_page(page))
+       if (!isolate_lru_page(page))
                return 0;
  
        mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
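
The mm/migrate.c hunks above convert migrate_pages() to batched migration: the input list is cut into batches of at most NR_MAX_BATCHED_MIGRATION pages, and migrate_pages_batch() first unmaps every folio in a batch (flushing the TLB once at the "move:" label via try_to_unmap_flush()) before moving any of them, with counters accumulated in struct migrate_pages_stats. The stand-alone user-space sketch below mirrors only the batching pattern; every name, size and failure rule in it is invented for illustration and none of it is kernel code.

    #include <stdio.h>

    #define MAX_BATCHED_PAGES 512           /* stand-in for NR_MAX_BATCHED_MIGRATION */

    struct item {
            int nr_pages;                   /* like folio_nr_pages() */
    };

    struct stats {                          /* like struct migrate_pages_stats */
            long nr_succeeded;
            long nr_failed_pages;
    };

    /* Phase 1: "unmap" every item in the batch (a no-op in this toy). */
    static void batch_unmap(const struct item *batch, int n)
    {
            (void)batch;
            (void)n;
    }

    /* Phase 2: "move" the batch; pretend that odd-sized items fail. */
    static void batch_move(const struct item *batch, int n, struct stats *s)
    {
            for (int i = 0; i < n; i++) {
                    if (batch[i].nr_pages % 2)
                            s->nr_failed_pages += batch[i].nr_pages;
                    else
                            s->nr_succeeded += batch[i].nr_pages;
            }
    }

    int main(void)
    {
            struct item items[] = { {1}, {512}, {2}, {1}, {256}, {4} };
            int total = sizeof(items) / sizeof(items[0]);
            struct stats stats = { 0, 0 };
            int start = 0;

            while (start < total) {
                    int nr_pages = 0, end = start;

                    /* Grow the batch until one more item would exceed the cap. */
                    while (end < total &&
                           nr_pages + items[end].nr_pages <= MAX_BATCHED_PAGES)
                            nr_pages += items[end++].nr_pages;
                    if (end == start)       /* a single oversized item: take it alone */
                            end++;

                    batch_unmap(&items[start], end - start);
                    batch_move(&items[start], end - start, &stats);
                    start = end;
            }

            printf("succeeded=%ld pages, failed=%ld pages\n",
                   stats.nr_succeeded, stats.nr_failed_pages);
            return 0;
    }

Deferring the TLB flush so that one flush covers a whole batch is the point of the unmap/move split; that is why the real code keeps separate unmap_folios/dst_folios lists between the two loops.
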
diff --combined mm/page_alloc.c
index 3bb3484563eda591d7e1fba46b2e3263d870a11c,4c9ab8b93b1a3b9168f764dc93efacabdd4e5ac4..ac1fc986af44c46736d4891dea771473c3bee7f1
@@@ -430,6 -430,8 +430,8 @@@ EXPORT_SYMBOL(nr_online_nodes)
  
  int page_group_by_mobility_disabled __read_mostly;
  
+ bool deferred_struct_pages __meminitdata;
  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
  /*
   * During boot we initialize deferred pages on-demand, as needed, but once
@@@ -443,15 -445,15 +445,15 @@@ static inline bool deferred_pages_enabl
        return static_branch_unlikely(&deferred_pages);
  }
  
- /* Returns true if the struct page for the pfn is uninitialised */
- static inline bool __meminit early_page_uninitialised(unsigned long pfn)
+ /* Returns true if the struct page for the pfn is initialised */
+ static inline bool __meminit early_page_initialised(unsigned long pfn)
  {
        int nid = early_pfn_to_nid(pfn);
  
        if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn)
-               return true;
+               return false;
  
-       return false;
+       return true;
  }
  
  /*
@@@ -498,9 -500,9 +500,9 @@@ static inline bool deferred_pages_enabl
        return false;
  }
  
- static inline bool early_page_uninitialised(unsigned long pfn)
+ static inline bool early_page_initialised(unsigned long pfn)
  {
-       return false;
+       return true;
  }
  
  static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
@@@ -775,11 -777,13 +777,13 @@@ void free_compound_page(struct page *pa
  
  static void prep_compound_head(struct page *page, unsigned int order)
  {
+       struct folio *folio = (struct folio *)page;
        set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
        set_compound_order(page, order);
-       atomic_set(compound_mapcount_ptr(page), -1);
-       atomic_set(subpages_mapcount_ptr(page), 0);
-       atomic_set(compound_pincount_ptr(page), 0);
+       atomic_set(&folio->_entire_mapcount, -1);
+       atomic_set(&folio->_nr_pages_mapped, 0);
+       atomic_set(&folio->_pincount, 0);
  }
  
  static void prep_compound_tail(struct page *head, int tail_idx)
@@@ -805,7 -809,7 +809,7 @@@ void prep_compound_page(struct page *pa
  
  void destroy_large_folio(struct folio *folio)
  {
-       enum compound_dtor_id dtor = folio_page(folio, 1)->compound_dtor;
+       enum compound_dtor_id dtor = folio->_folio_dtor;
  
        VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
        compound_page_dtors[dtor](&folio->page);
@@@ -1291,6 -1295,7 +1295,7 @@@ static inline bool free_page_is_bad(str
  
  static int free_tail_pages_check(struct page *head_page, struct page *page)
  {
+       struct folio *folio = (struct folio *)head_page;
        int ret = 1;
  
        /*
        switch (page - head_page) {
        case 1:
                /* the first tail page: these may be in place of ->mapping */
-               if (unlikely(head_compound_mapcount(head_page))) {
-                       bad_page(page, "nonzero compound_mapcount");
+               if (unlikely(folio_entire_mapcount(folio))) {
+                       bad_page(page, "nonzero entire_mapcount");
                        goto out;
                }
-               if (unlikely(atomic_read(subpages_mapcount_ptr(head_page)))) {
-                       bad_page(page, "nonzero subpages_mapcount");
+               if (unlikely(atomic_read(&folio->_nr_pages_mapped))) {
+                       bad_page(page, "nonzero nr_pages_mapped");
                        goto out;
                }
-               if (unlikely(head_compound_pincount(head_page))) {
-                       bad_page(page, "nonzero compound_pincount");
+               if (unlikely(atomic_read(&folio->_pincount))) {
+                       bad_page(page, "nonzero pincount");
                        goto out;
                }
                break;
@@@ -1356,6 -1361,8 +1361,8 @@@ out
   *    see the comment next to it.
   * 3. Skipping poisoning is requested via __GFP_SKIP_KASAN_POISON,
   *    see the comment next to it.
+  * 4. The allocation is excluded from being checked due to sampling,
+  *    see the call to kasan_unpoison_pages.
   *
   * Poisoning pages during deferred memory init will greatly lengthen the
   * process and cause problem in large memory systems as the deferred pages
@@@ -1403,7 -1410,7 +1410,7 @@@ static __always_inline bool free_pages_
                 * Do not let hwpoison pages hit pcplists/buddy
                 * Untie memcg state and reset page's owner
                 */
-               if (memcg_kmem_enabled() && PageMemcgKmem(page))
+               if (memcg_kmem_online() && PageMemcgKmem(page))
                        __memcg_kmem_uncharge_page(page, order);
                reset_page_owner(page, order);
                page_table_check_free(page, order);
        }
        if (PageMappingFlags(page))
                page->mapping = NULL;
-       if (memcg_kmem_enabled() && PageMemcgKmem(page))
+       if (memcg_kmem_online() && PageMemcgKmem(page))
                __memcg_kmem_uncharge_page(page, order);
        if (check_free && free_page_is_bad(page))
                bad++;
@@@ -1641,7 -1648,7 +1648,7 @@@ static void __meminit init_reserved_pag
        pg_data_t *pgdat;
        int nid, zid;
  
-       if (!early_page_uninitialised(pfn))
+       if (early_page_initialised(pfn))
                return;
  
        nid = early_pfn_to_nid(pfn);
@@@ -1804,7 -1811,7 +1811,7 @@@ int __meminit early_pfn_to_nid(unsigne
  void __init memblock_free_pages(struct page *page, unsigned long pfn,
                                                        unsigned int order)
  {
-       if (early_page_uninitialised(pfn))
+       if (!early_page_initialised(pfn))
                return;
        if (!kmsan_memblock_free_pages(page, order)) {
                /* KMSAN will take care of these pages. */
@@@ -2468,7 -2475,8 +2475,8 @@@ inline void post_alloc_hook(struct pag
  {
        bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
                        !should_skip_init(gfp_flags);
-       bool init_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+       bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+       bool reset_tags = true;
        int i;
  
        set_page_private(page, 0);
         */
  
        /*
-        * If memory tags should be zeroed (which happens only when memory
-        * should be initialized as well).
+        * If memory tags should be zeroed
+        * (which happens only when memory should be initialized as well).
         */
-       if (init_tags) {
-               /* Initialize both memory and tags. */
+       if (zero_tags) {
+               /* Initialize both memory and memory tags. */
                for (i = 0; i != 1 << order; ++i)
                        tag_clear_highpage(page + i);
  
-               /* Note that memory is already initialized by the loop above. */
+               /* Take note that memory was initialized by the loop above. */
                init = false;
        }
        if (!should_skip_kasan_unpoison(gfp_flags)) {
-               /* Unpoison shadow memory or set memory tags. */
-               kasan_unpoison_pages(page, order, init);
-               /* Note that memory is already initialized by KASAN. */
-               if (kasan_has_integrated_init())
-                       init = false;
-       } else {
-               /* Ensure page_address() dereferencing does not fault. */
+               /* Try unpoisoning (or setting tags) and initializing memory. */
+               if (kasan_unpoison_pages(page, order, init)) {
+                       /* Take note that memory was initialized by KASAN. */
+                       if (kasan_has_integrated_init())
+                               init = false;
+                       /* Take note that memory tags were set by KASAN. */
+                       reset_tags = false;
+               } else {
+                       /*
+                        * KASAN decided to exclude this allocation from being
+                        * (un)poisoned due to sampling. Make KASAN skip
+                        * poisoning when the allocation is freed.
+                        */
+                       SetPageSkipKASanPoison(page);
+               }
+       }
+       /*
+        * If memory tags have not been set by KASAN, reset the page tags to
+        * ensure page_address() dereferencing does not fault.
+        */
+       if (reset_tags) {
                for (i = 0; i != 1 << order; ++i)
                        page_kasan_tag_reset(page + i);
        }
-       /* If memory is still not initialized, do it now. */
+       /* If memory is still not initialized, initialize it now. */
        if (init)
                kernel_init_pages(page, 1 << order);
        /* Propagate __GFP_SKIP_KASAN_POISON to page flags. */
@@@ -2582,10 -2603,10 +2603,10 @@@ struct page *__rmqueue_smallest(struct 
   *
   * The other migratetypes do not have fallbacks.
   */
- static int fallbacks[MIGRATE_TYPES][3] = {
-       [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
-       [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
-       [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
+ static int fallbacks[MIGRATE_TYPES][MIGRATE_PCPTYPES - 1] = {
+       [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE   },
+       [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
+       [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE   },
  };
  
  #ifdef CONFIG_CMA
@@@ -2844,11 -2865,8 +2865,8 @@@ int find_suitable_fallback(struct free_
                return -1;
  
        *can_steal = false;
-       for (i = 0;; i++) {
+       for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
                fallback_mt = fallbacks[migratetype][i];
-               if (fallback_mt == MIGRATE_TYPES)
-                       break;
                if (free_area_empty(area, fallback_mt))
                        continue;
  
@@@ -3706,10 -3724,20 +3724,20 @@@ struct page *rmqueue_buddy(struct zone 
                 * reserved for high-order atomic allocation, so order-0
                 * request should skip it.
                 */
-               if (order > 0 && alloc_flags & ALLOC_HARDER)
+               if (alloc_flags & ALLOC_HIGHATOMIC)
                        page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
                if (!page) {
                        page = __rmqueue(zone, order, migratetype, alloc_flags);
+                       /*
+                        * If the allocation fails, allow OOM handling access
+                        * to HIGHATOMIC reserves as failing now is worse than
+                        * failing a high-order atomic allocation in the
+                        * future.
+                        */
+                       if (!page && (alloc_flags & ALLOC_OOM))
+                               page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
                        if (!page) {
                                spin_unlock_irqrestore(&zone->lock, flags);
                                return NULL;
@@@ -3939,15 -3967,14 +3967,14 @@@ ALLOW_ERROR_INJECTION(should_fail_alloc
  static inline long __zone_watermark_unusable_free(struct zone *z,
                                unsigned int order, unsigned int alloc_flags)
  {
-       const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
        long unusable_free = (1 << order) - 1;
  
        /*
-        * If the caller does not have rights to ALLOC_HARDER then subtract
-        * the high-atomic reserves. This will over-estimate the size of the
-        * atomic reserve but it avoids a search.
+        * If the caller does not have rights to reserves below the min
+        * watermark then subtract the high-atomic reserves. This will
+        * over-estimate the size of the atomic reserve but it avoids a search.
         */
-       if (likely(!alloc_harder))
+       if (likely(!(alloc_flags & ALLOC_RESERVES)))
                unusable_free += z->nr_reserved_highatomic;
  
  #ifdef CONFIG_CMA
@@@ -3971,25 -3998,37 +3998,37 @@@ bool __zone_watermark_ok(struct zone *z
  {
        long min = mark;
        int o;
-       const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
  
        /* free_pages may go negative - that's OK */
        free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
  
-       if (alloc_flags & ALLOC_HIGH)
-               min -= min / 2;
+       if (unlikely(alloc_flags & ALLOC_RESERVES)) {
+               /*
+                * __GFP_HIGH allows access to 50% of the min reserve as well
+                * as OOM.
+                */
+               if (alloc_flags & ALLOC_MIN_RESERVE) {
+                       min -= min / 2;
+                       /*
+                        * Non-blocking allocations (e.g. GFP_ATOMIC) can
+                        * access more reserves than just __GFP_HIGH. Other
+                        * non-blocking allocations requests such as GFP_NOWAIT
+                        * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
+                        * access to the min reserve.
+                        */
+                       if (alloc_flags & ALLOC_NON_BLOCK)
+                               min -= min / 4;
+               }
  
-       if (unlikely(alloc_harder)) {
                /*
-                * OOM victims can try even harder than normal ALLOC_HARDER
+                * OOM victims can try even harder than the normal reserve
                 * users on the grounds that it's definitely going to be in
                 * the exit path shortly and free memory. Any allocation it
                 * makes during the free path will be small and short-lived.
                 */
                if (alloc_flags & ALLOC_OOM)
                        min -= min / 2;
-               else
-                       min -= min / 4;
        }
  
        /*
                        return true;
                }
  #endif
-               if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+               if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
+                   !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
                        return true;
+               }
        }
        return false;
  }
@@@ -4064,13 -4105,14 +4105,14 @@@ static inline bool zone_watermark_fast(
        if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
                                        free_pages))
                return true;
        /*
-        * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+        * Ignore watermark boosting for __GFP_HIGH order-0 allocations
         * when checking the min watermark. The min watermark is the
         * point where boosting is ignored so that kswapd is woken up
         * when below the low watermark.
         */
-       if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+       if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
                && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
                mark = z->_watermark[WMARK_MIN];
                return __zone_watermark_ok(z, order, mark, highest_zoneidx,
@@@ -4244,7 -4286,7 +4286,7 @@@ retry
                         * Watermark failed for this zone, but see if we can
                         * grow this zone if it contains deferred pages.
                         */
-                       if (static_branch_unlikely(&deferred_pages)) {
+                       if (deferred_pages_enabled()) {
                                if (_deferred_grow_zone(zone, order))
                                        goto try_this_zone;
                        }
@@@ -4286,14 -4328,14 +4328,14 @@@ try_this_zone
                         * If this is a high-order atomic allocation then check
                         * if the pageblock should be reserved for the future
                         */
-                       if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+                       if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
                                reserve_highatomic_pageblock(page, zone, order);
  
                        return page;
                } else {
  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
                        /* Try again if zone has deferred pages */
-                       if (static_branch_unlikely(&deferred_pages)) {
+                       if (deferred_pages_enabled()) {
                                if (_deferred_grow_zone(zone, order))
                                        goto try_this_zone;
                        }
@@@ -4813,41 -4855,48 +4855,48 @@@ static void wake_all_kswapds(unsigned i
  }
  
  static inline unsigned int
- gfp_to_alloc_flags(gfp_t gfp_mask)
+ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
  {
        unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
  
        /*
-        * __GFP_HIGH is assumed to be the same as ALLOC_HIGH
+        * __GFP_HIGH is assumed to be the same as ALLOC_MIN_RESERVE
         * and __GFP_KSWAPD_RECLAIM is assumed to be the same as ALLOC_KSWAPD
         * to save two branches.
         */
-       BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+       BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_MIN_RESERVE);
        BUILD_BUG_ON(__GFP_KSWAPD_RECLAIM != (__force gfp_t) ALLOC_KSWAPD);
  
        /*
         * The caller may dip into page reserves a bit more if the caller
         * cannot run direct reclaim, or if the caller has realtime scheduling
         * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
-        * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
+        * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
         */
        alloc_flags |= (__force int)
                (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
  
-       if (gfp_mask & __GFP_ATOMIC) {
+       if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
                /*
                 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
                 * if it can't schedule.
                 */
-               if (!(gfp_mask & __GFP_NOMEMALLOC))
-                       alloc_flags |= ALLOC_HARDER;
+               if (!(gfp_mask & __GFP_NOMEMALLOC)) {
+                       alloc_flags |= ALLOC_NON_BLOCK;
+                       if (order > 0)
+                               alloc_flags |= ALLOC_HIGHATOMIC;
+               }
                /*
-                * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
-                * comment for __cpuset_node_allowed().
+                * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+                * GFP_ATOMIC) rather than fail, see the comment for
+                * __cpuset_node_allowed().
                 */
-               alloc_flags &= ~ALLOC_CPUSET;
+               if (alloc_flags & ALLOC_MIN_RESERVE)
+                       alloc_flags &= ~ALLOC_CPUSET;
        } else if (unlikely(rt_task(current)) && in_task())
-               alloc_flags |= ALLOC_HARDER;
+               alloc_flags |= ALLOC_MIN_RESERVE;
  
        alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
  
@@@ -5028,14 -5077,6 +5077,6 @@@ __alloc_pages_slowpath(gfp_t gfp_mask, 
        unsigned int zonelist_iter_cookie;
        int reserve_flags;
  
-       /*
-        * We also sanity check to catch abuse of atomic reserves being used by
-        * callers that are not in atomic context.
-        */
-       if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
-                               (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
-               gfp_mask &= ~__GFP_ATOMIC;
  restart:
        compaction_retries = 0;
        no_progress_loops = 0;
         * kswapd needs to be woken up, and to avoid the cost of setting up
         * alloc_flags precisely. So we do that now.
         */
-       alloc_flags = gfp_to_alloc_flags(gfp_mask);
+       alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
  
        /*
         * We need to recalculate the starting point for the zonelist iterator
@@@ -5276,12 -5317,13 +5317,13 @@@ nopage
                WARN_ON_ONCE_GFP(costly_order, gfp_mask);
  
                /*
-                * Help non-failing allocations by giving them access to memory
-                * reserves but do not use ALLOC_NO_WATERMARKS because this
+                * Help non-failing allocations by giving some access to memory
+                * reserves normally used for high priority non-blocking
+                * allocations but do not use ALLOC_NO_WATERMARKS because this
                 * could deplete whole memory reserves which would just make
-                * the situation worse
+                * the situation worse.
                 */
-               page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+               page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
                if (page)
                        goto got_pg;
  
@@@ -5390,7 -5432,7 +5432,7 @@@ unsigned long __alloc_pages_bulk(gfp_t 
                goto out;
  
        /* Bulk allocator does not support memcg accounting. */
-       if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT))
+       if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT))
                goto failed;
  
        /* Use the single page allocator for one page. */
@@@ -5562,7 -5604,7 +5604,7 @@@ struct page *__alloc_pages(gfp_t gfp, u
        page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
  
  out:
-       if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
+       if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
            unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
                __free_pages(page, order);
                page = NULL;
@@@ -5631,12 -5673,9 +5673,12 @@@ EXPORT_SYMBOL(get_zeroed_page)
   */
  void __free_pages(struct page *page, unsigned int order)
  {
 +      /* get PageHead before we drop reference */
 +      int head = PageHead(page);
 +
        if (put_page_testzero(page))
                free_the_page(page, order);
 -      else if (!PageHead(page))
 +      else if (!head)
                while (order-- > 0)
                        free_the_page(page + (1 << order), order);
  }
@@@ -6764,8 -6803,10 +6806,10 @@@ void __meminit memmap_init_range(unsign
                if (context == MEMINIT_EARLY) {
                        if (overlap_memmap_init(zone, &pfn))
                                continue;
-                       if (defer_init(nid, pfn, zone_end_pfn))
+                       if (defer_init(nid, pfn, zone_end_pfn)) {
+                               deferred_struct_pages = true;
                                break;
+                       }
                }
  
                page = pfn_to_page(pfn);
@@@ -7929,6 -7970,7 +7973,7 @@@ static void __init free_area_init_node(
        pgdat_set_deferred_range(pgdat);
  
        free_area_init_core(pgdat);
+       lru_gen_init_pgdat(pgdat);
  }
  
  static void __init free_area_init_memoryless_node(int nid)
@@@ -8363,11 -8405,9 +8408,9 @@@ void __init free_area_init(unsigned lon
  
                        /* Allocator not initialized yet */
                        pgdat = arch_alloc_nodedata(nid);
-                       if (!pgdat) {
-                               pr_err("Cannot allocate %zuB for node %d.\n",
-                                               sizeof(*pgdat), nid);
-                               continue;
-                       }
+                       if (!pgdat)
+                               panic("Cannot allocate %zuB for node %d.\n",
+                                      sizeof(*pgdat), nid);
                        arch_refresh_nodedata(nid, pgdat);
                        free_area_init_memoryless_node(nid);
  
@@@ -8571,7 -8611,7 +8614,7 @@@ static int page_alloc_cpu_dead(unsigne
        struct zone *zone;
  
        lru_add_drain_cpu(cpu);
-       mlock_page_drain_remote(cpu);
+       mlock_drain_remote(cpu);
        drain_pages(cpu);
  
        /*
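
In the mm/page_alloc.c hunks above, __GFP_ATOMIC is gone and the old ALLOC_HARDER/ALLOC_HIGH pair is replaced by ALLOC_MIN_RESERVE, ALLOC_NON_BLOCK, ALLOC_HIGHATOMIC and ALLOC_OOM, with __zone_watermark_ok() deciding how far below the min watermark each class may dip. The stand-alone sketch below redoes only that arithmetic; the flag values are made up for the example, and the real definitions live in mm/internal.h.

    #include <stdio.h>

    #define ALLOC_MIN_RESERVE 0x1   /* illustrative values, not the kernel's */
    #define ALLOC_NON_BLOCK   0x2
    #define ALLOC_OOM         0x4
    #define ALLOC_RESERVES    (ALLOC_MIN_RESERVE | ALLOC_NON_BLOCK | ALLOC_OOM)

    /* Same structure as the new reserve handling in __zone_watermark_ok(). */
    static long effective_min(long min, unsigned int alloc_flags)
    {
            if (alloc_flags & ALLOC_RESERVES) {
                    if (alloc_flags & ALLOC_MIN_RESERVE) {
                            min -= min / 2;         /* __GFP_HIGH: half the reserve */
                            if (alloc_flags & ALLOC_NON_BLOCK)
                                    min -= min / 4; /* non-blocking: a further quarter */
                    }
                    if (alloc_flags & ALLOC_OOM)
                            min -= min / 2;         /* OOM victims: half the reserve */
            }
            return min;
    }

    int main(void)
    {
            long min = 1024;        /* pages in the min watermark, made up */

            printf("no reserve flags:  %ld\n", effective_min(min, 0));
            printf("__GFP_HIGH:        %ld\n", effective_min(min, ALLOC_MIN_RESERVE));
            printf("GFP_ATOMIC-style:  %ld\n",
                   effective_min(min, ALLOC_MIN_RESERVE | ALLOC_NON_BLOCK));
            printf("OOM victim:        %ld\n", effective_min(min, ALLOC_OOM));
            return 0;
    }

With min = 1024 this prints 1024, 512, 384 and 512, which is the same ordering of reserve access the new flags encode.
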
diff --combined mm/page_io.c
index 233f6e6eb1c50823fe148786e49d77eb4a1fb4a5,a805117f7fd73db984abc9902d60b58175d624b1..87b682d188503c2bb0d65267d03d290b106560d8
@@@ -18,7 -18,6 +18,6 @@@
  #include <linux/swap.h>
  #include <linux/bio.h>
  #include <linux/swapops.h>
- #include <linux/buffer_head.h>
  #include <linux/writeback.h>
  #include <linux/frontswap.h>
  #include <linux/blkdev.h>
@@@ -28,7 -27,7 +27,7 @@@
  #include <linux/delayacct.h>
  #include "swap.h"
  
- static void end_swap_bio_write(struct bio *bio)
+ static void __end_swap_bio_write(struct bio *bio)
  {
        struct page *page = bio_first_page_all(bio);
  
                ClearPageReclaim(page);
        }
        end_page_writeback(page);
+ }
+ static void end_swap_bio_write(struct bio *bio)
+ {
+       __end_swap_bio_write(bio);
        bio_put(bio);
  }
  
- static void end_swap_bio_read(struct bio *bio)
+ static void __end_swap_bio_read(struct bio *bio)
  {
        struct page *page = bio_first_page_all(bio);
-       struct task_struct *waiter = bio->bi_private;
  
        if (bio->bi_status) {
                SetPageError(page);
                pr_alert_ratelimited("Read-error on swap-device (%u:%u:%llu)\n",
                                     MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)),
                                     (unsigned long long)bio->bi_iter.bi_sector);
-               goto out;
+       } else {
+               SetPageUptodate(page);
        }
-       SetPageUptodate(page);
- out:
        unlock_page(page);
-       WRITE_ONCE(bio->bi_private, NULL);
+ }
+ static void end_swap_bio_read(struct bio *bio)
+ {
+       __end_swap_bio_read(bio);
        bio_put(bio);
-       if (waiter) {
-               blk_wake_io_task(waiter);
-               put_task_struct(waiter);
-       }
  }
  
  int generic_swapfile_activate(struct swap_info_struct *sis,
@@@ -181,11 -182,11 +182,11 @@@ bad_bmap
  int swap_writepage(struct page *page, struct writeback_control *wbc)
  {
        struct folio *folio = page_folio(page);
-       int ret = 0;
+       int ret;
  
        if (folio_free_swap(folio)) {
                folio_unlock(folio);
-               goto out;
+               return 0;
        }
        /*
         * Arch code may have to preserve more data than just the page
        if (ret) {
                folio_mark_dirty(folio);
                folio_unlock(folio);
-               goto out;
+               return ret;
        }
        if (frontswap_store(&folio->page) == 0) {
                folio_start_writeback(folio);
                folio_unlock(folio);
                folio_end_writeback(folio);
-               goto out;
+               return 0;
        }
-       ret = __swap_writepage(&folio->page, wbc);
- out:
-       return ret;
+       __swap_writepage(&folio->page, wbc);
+       return 0;
  }
  
  static inline void count_swpout_vm_event(struct page *page)
@@@ -292,7 -292,7 +292,7 @@@ static void sio_write_complete(struct k
        mempool_free(sio, sio_pool);
  }
  
- static int swap_writepage_fs(struct page *page, struct writeback_control *wbc)
+ static void swap_writepage_fs(struct page *page, struct writeback_control *wbc)
  {
        struct swap_iocb *sio = NULL;
        struct swap_info_struct *sis = page_swap_info(page);
                sio->pages = 0;
                sio->len = 0;
        }
 -      sio->bvec[sio->pages].bv_page = page;
 -      sio->bvec[sio->pages].bv_len = thp_size(page);
 -      sio->bvec[sio->pages].bv_offset = 0;
 +      bvec_set_page(&sio->bvec[sio->pages], page, thp_size(page), 0);
        sio->len += thp_size(page);
        sio->pages += 1;
        if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->swap_plug) {
        }
        if (wbc->swap_plug)
                *wbc->swap_plug = sio;
-       return 0;
  }
  
- int __swap_writepage(struct page *page, struct writeback_control *wbc)
+ static void swap_writepage_bdev_sync(struct page *page,
+               struct writeback_control *wbc, struct swap_info_struct *sis)
  {
-       struct bio *bio;
-       int ret;
-       struct swap_info_struct *sis = page_swap_info(page);
+       struct bio_vec bv;
+       struct bio bio;
  
-       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
-       /*
-        * ->flags can be updated non-atomicially (scan_swap_map_slots),
-        * but that will never affect SWP_FS_OPS, so the data_race
-        * is safe.
-        */
-       if (data_race(sis->flags & SWP_FS_OPS))
-               return swap_writepage_fs(page, wbc);
+       bio_init(&bio, sis->bdev, &bv, 1,
+                REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc));
+       bio.bi_iter.bi_sector = swap_page_sector(page);
+       bio_add_page(&bio, page, thp_size(page), 0);
  
-       ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
-       if (!ret) {
-               count_swpout_vm_event(page);
-               return 0;
-       }
+       bio_associate_blkg_from_page(&bio, page);
+       count_swpout_vm_event(page);
+       set_page_writeback(page);
+       unlock_page(page);
+       submit_bio_wait(&bio);
+       __end_swap_bio_write(&bio);
+ }
+ static void swap_writepage_bdev_async(struct page *page,
+               struct writeback_control *wbc, struct swap_info_struct *sis)
+ {
+       struct bio *bio;
  
        bio = bio_alloc(sis->bdev, 1,
                        REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc),
        set_page_writeback(page);
        unlock_page(page);
        submit_bio(bio);
+ }
  
-       return 0;
+ void __swap_writepage(struct page *page, struct writeback_control *wbc)
+ {
+       struct swap_info_struct *sis = page_swap_info(page);
+       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+       /*
+        * ->flags can be updated non-atomicially (scan_swap_map_slots),
+        * but that will never affect SWP_FS_OPS, so the data_race
+        * is safe.
+        */
+       if (data_race(sis->flags & SWP_FS_OPS))
+               swap_writepage_fs(page, wbc);
+       else if (sis->flags & SWP_SYNCHRONOUS_IO)
+               swap_writepage_bdev_sync(page, wbc, sis);
+       else
+               swap_writepage_bdev_async(page, wbc, sis);
  }
  
  void swap_write_unplug(struct swap_iocb *sio)
@@@ -430,7 -451,9 +449,7 @@@ static void swap_readpage_fs(struct pag
                sio->pages = 0;
                sio->len = 0;
        }
 -      sio->bvec[sio->pages].bv_page = page;
 -      sio->bvec[sio->pages].bv_len = thp_size(page);
 -      sio->bvec[sio->pages].bv_offset = 0;
 +      bvec_set_page(&sio->bvec[sio->pages], page, thp_size(page), 0);
        sio->len += thp_size(page);
        sio->pages += 1;
        if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) {
                *plug = sio;
  }
  
- int swap_readpage(struct page *page, bool synchronous,
-                 struct swap_iocb **plug)
+ static void swap_readpage_bdev_sync(struct page *page,
+               struct swap_info_struct *sis)
+ {
+       struct bio_vec bv;
+       struct bio bio;
+       bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_READ);
+       bio.bi_iter.bi_sector = swap_page_sector(page);
+       bio_add_page(&bio, page, thp_size(page), 0);
+       /*
+        * Keep this task valid during swap readpage because the oom killer may
+        * attempt to access it in the page fault retry time check.
+        */
+       get_task_struct(current);
+       count_vm_event(PSWPIN);
+       submit_bio_wait(&bio);
+       __end_swap_bio_read(&bio);
+       put_task_struct(current);
+ }
+ static void swap_readpage_bdev_async(struct page *page,
+               struct swap_info_struct *sis)
  {
        struct bio *bio;
-       int ret = 0;
+       bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
+       bio->bi_iter.bi_sector = swap_page_sector(page);
+       bio->bi_end_io = end_swap_bio_read;
+       bio_add_page(bio, page, thp_size(page), 0);
+       count_vm_event(PSWPIN);
+       submit_bio(bio);
+ }
+ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug)
+ {
        struct swap_info_struct *sis = page_swap_info(page);
        bool workingset = PageWorkingset(page);
        unsigned long pflags;
        if (frontswap_load(page) == 0) {
                SetPageUptodate(page);
                unlock_page(page);
-               goto out;
-       }
-       if (data_race(sis->flags & SWP_FS_OPS)) {
+       } else if (data_race(sis->flags & SWP_FS_OPS)) {
                swap_readpage_fs(page, plug);
-               goto out;
-       }
-       if (sis->flags & SWP_SYNCHRONOUS_IO) {
-               ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
-               if (!ret) {
-                       count_vm_event(PSWPIN);
-                       goto out;
-               }
-       }
-       ret = 0;
-       bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
-       bio->bi_iter.bi_sector = swap_page_sector(page);
-       bio->bi_end_io = end_swap_bio_read;
-       bio_add_page(bio, page, thp_size(page), 0);
-       /*
-        * Keep this task valid during swap readpage because the oom killer may
-        * attempt to access it in the page fault retry time check.
-        */
-       if (synchronous) {
-               get_task_struct(current);
-               bio->bi_private = current;
-       }
-       count_vm_event(PSWPIN);
-       bio_get(bio);
-       submit_bio(bio);
-       while (synchronous) {
-               set_current_state(TASK_UNINTERRUPTIBLE);
-               if (!READ_ONCE(bio->bi_private))
-                       break;
-               blk_io_schedule();
+       } else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) {
+               swap_readpage_bdev_sync(page, sis);
+       } else {
+               swap_readpage_bdev_async(page, sis);
        }
-       __set_current_state(TASK_RUNNING);
-       bio_put(bio);
  
- out:
        if (workingset) {
                delayacct_thrashing_end(&in_thrashing);
                psi_memstall_leave(&pflags);
        }
        delayacct_swapin_end();
-       return ret;
  }
  
  void __swap_read_unplug(struct swap_iocb *sio)
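
The mm/page_io.c changes above drop the bdev_read_page()/bdev_write_page() (->rw_page) fast path and the hand-rolled wait on bio->bi_private: synchronous swap I/O now builds a bio on the stack with bio_init() and waits with submit_bio_wait(), while the asynchronous path keeps bio_alloc() plus a ->bi_end_io completion. The toy pthread program below shows only the on-stack/wait versus heap/callback ownership split; all of its names are invented and it is an analogy, not block-layer code.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct request {
            int id;
            void (*done)(struct request *);         /* plays the role of bio->bi_end_io */
    };

    /* The "device": completes a request and, if present, runs its callback. */
    static void *do_io(void *arg)
    {
            struct request *rq = arg;

            printf("request %d completed\n", rq->id);
            if (rq->done)
                    rq->done(rq);
            return NULL;
    }

    static void async_done(struct request *rq)
    {
            free(rq);                               /* completion owns the heap request */
    }

    /* Sync path: request lives on the caller's stack, caller waits in place. */
    static void submit_sync(int id)
    {
            struct request rq = { .id = id, .done = NULL };
            pthread_t t;

            pthread_create(&t, NULL, do_io, &rq);
            pthread_join(&t, NULL);                 /* like submit_bio_wait() */
    }

    /* Async path: request is heap-allocated and freed by its completion callback. */
    static pthread_t submit_async(int id)
    {
            struct request *rq = malloc(sizeof(*rq));
            pthread_t t;

            rq->id = id;
            rq->done = async_done;
            pthread_create(&t, NULL, do_io, rq);    /* like submit_bio() */
            return t;
    }

    int main(void)
    {
            pthread_t async;

            submit_sync(1);
            async = submit_async(2);
            pthread_join(async, NULL);      /* only so the toy program does not exit early */
            return 0;
    }

The kernel analogues of the two helpers are swap_readpage_bdev_sync()/swap_writepage_bdev_sync() and the *_bdev_async() variants shown in the diff.
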
diff --combined mm/secretmem.c
index afcf46e99cda5ae1b58fa2b07af58542fa2b79b2,8453ada8f41d49b288a5c406d7183aa62efa97f8..0b502625cd30ed861fa593fbe16cee7207ac720c
@@@ -128,7 -128,7 +128,7 @@@ static int secretmem_mmap(struct file *
        if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
                return -EAGAIN;
  
-       vma->vm_flags |= VM_LOCKED | VM_DONTDUMP;
+       vm_flags_set(vma, VM_LOCKED | VM_DONTDUMP);
        vma->vm_ops = &secretmem_vm_ops;
  
        return 0;
@@@ -162,7 -162,7 +162,7 @@@ const struct address_space_operations s
        .migrate_folio  = secretmem_migrate_folio,
  };
  
 -static int secretmem_setattr(struct user_namespace *mnt_userns,
 +static int secretmem_setattr(struct mnt_idmap *idmap,
                             struct dentry *dentry, struct iattr *iattr)
  {
        struct inode *inode = d_inode(dentry);
        if ((ia_valid & ATTR_SIZE) && inode->i_size)
                ret = -EINVAL;
        else
 -              ret = simple_setattr(mnt_userns, dentry, iattr);
 +              ret = simple_setattr(idmap, dentry, iattr);
  
        filemap_invalidate_unlock(mapping);
  
@@@ -190,7 -190,7 +190,7 @@@ static struct vfsmount *secretmem_mnt
  
  static struct file *secretmem_file_create(unsigned long flags)
  {
-       struct file *file = ERR_PTR(-ENOMEM);
+       struct file *file;
        struct inode *inode;
        const char *anon_name = "[secretmem]";
        const struct qstr qname = QSTR_INIT(anon_name, strlen(anon_name));
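
The one functional change to mm/secretmem.c above is that secretmem_mmap() no longer ORs bits into vma->vm_flags directly but calls vm_flags_set(), one of the new vm_flags modifier helpers (in include/linux/mm.h) that can assert the owning mm's mmap lock is held for writing before the flags word is touched. A minimal stand-alone sketch of that wrapper pattern, with invented structures and flag values, looks like this:

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define VM_LOCKED   0x00002000UL        /* illustrative flag values */
    #define VM_DONTDUMP 0x04000000UL

    struct vm_area {
            unsigned long vm_flags;
            bool write_locked;              /* stand-in for the real mmap write lock */
    };

    /* All writes funnel through one helper, so locking rules can be checked. */
    static void vm_flags_set(struct vm_area *vma, unsigned long flags)
    {
            assert(vma->write_locked);      /* the kernel asserts mmap_assert_write_locked() */
            vma->vm_flags |= flags;
    }

    int main(void)
    {
            struct vm_area vma = { .vm_flags = 0, .write_locked = true };

            vm_flags_set(&vma, VM_LOCKED | VM_DONTDUMP);
            printf("vm_flags = %#lx\n", vma.vm_flags);
            return 0;
    }
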
diff --combined mm/shmem.c
index 41f82c5a5e28a1ffdb0054a52e46e6cf16d800b6,577b3838c6b9eb6622466d5e25105dc4dfcd431f..448f393d8ab2b1bd5f22eec603195977716a8727
@@@ -33,6 -33,7 +33,7 @@@
  #include <linux/random.h>
  #include <linux/sched/signal.h>
  #include <linux/export.h>
+ #include <linux/shmem_fs.h>
  #include <linux/swap.h>
  #include <linux/uio.h>
  #include <linux/hugetlb.h>
@@@ -58,7 -59,6 +59,6 @@@ static struct vfsmount *shm_mnt
  #include <linux/string.h>
  #include <linux/slab.h>
  #include <linux/backing-dev.h>
- #include <linux/shmem_fs.h>
  #include <linux/writeback.h>
  #include <linux/pagevec.h>
  #include <linux/percpu_counter.h>
@@@ -468,15 -468,14 +468,14 @@@ static bool shmem_confirm_swap(struct a
  
  static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
  
- bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
-                  pgoff_t index, bool shmem_huge_force)
+ bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
+                  struct mm_struct *mm, unsigned long vm_flags)
  {
        loff_t i_size;
  
        if (!S_ISREG(inode->i_mode))
                return false;
-       if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
-           test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
+       if (mm && ((vm_flags & VM_NOHUGEPAGE) || test_bit(MMF_DISABLE_THP, &mm->flags)))
                return false;
        if (shmem_huge == SHMEM_HUGE_DENY)
                return false;
                        return true;
                fallthrough;
        case SHMEM_HUGE_ADVISE:
-               if (vma && (vma->vm_flags & VM_HUGEPAGE))
+               if (mm && (vm_flags & VM_HUGEPAGE))
                        return true;
                fallthrough;
        default:
@@@ -676,8 -675,8 +675,8 @@@ static long shmem_unused_huge_count(str
  
  #define shmem_huge SHMEM_HUGE_DENY
  
- bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
-                  pgoff_t index, bool shmem_huge_force)
+ bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
+                  struct mm_struct *mm, unsigned long vm_flags)
  {
        return false;
  }
@@@ -1045,7 -1044,7 +1044,7 @@@ void shmem_truncate_range(struct inode 
  }
  EXPORT_SYMBOL_GPL(shmem_truncate_range);
  
 -static int shmem_getattr(struct user_namespace *mnt_userns,
 +static int shmem_getattr(struct mnt_idmap *idmap,
                         const struct path *path, struct kstat *stat,
                         u32 request_mask, unsigned int query_flags)
  {
        stat->attributes_mask |= (STATX_ATTR_APPEND |
                        STATX_ATTR_IMMUTABLE |
                        STATX_ATTR_NODUMP);
 -      generic_fillattr(&init_user_ns, inode, stat);
 +      generic_fillattr(idmap, inode, stat);
  
-       if (shmem_is_huge(NULL, inode, 0, false))
+       if (shmem_is_huge(inode, 0, false, NULL, 0))
                stat->blksize = HPAGE_PMD_SIZE;
  
        if (request_mask & STATX_BTIME) {
        return 0;
  }
  
 -static int shmem_setattr(struct user_namespace *mnt_userns,
 +static int shmem_setattr(struct mnt_idmap *idmap,
                         struct dentry *dentry, struct iattr *attr)
  {
        struct inode *inode = d_inode(dentry);
        bool update_mtime = false;
        bool update_ctime = true;
  
 -      error = setattr_prepare(&init_user_ns, dentry, attr);
 +      error = setattr_prepare(idmap, dentry, attr);
        if (error)
                return error;
  
+       if ((info->seals & F_SEAL_EXEC) && (attr->ia_valid & ATTR_MODE)) {
+               if ((inode->i_mode ^ attr->ia_mode) & 0111) {
+                       return -EPERM;
+               }
+       }
        if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
                loff_t oldsize = inode->i_size;
                loff_t newsize = attr->ia_size;
                }
        }
  
 -      setattr_copy(&init_user_ns, inode, attr);
 +      setattr_copy(idmap, inode, attr);
        if (attr->ia_valid & ATTR_MODE)
 -              error = posix_acl_chmod(&init_user_ns, dentry, inode->i_mode);
 +              error = posix_acl_chmod(idmap, dentry, inode->i_mode);
        if (!error && update_ctime) {
                inode->i_ctime = current_time(inode);
                if (update_mtime)
@@@ -1733,6 -1738,7 +1738,7 @@@ static int shmem_swapin_folio(struct in
        struct address_space *mapping = inode->i_mapping;
        struct shmem_inode_info *info = SHMEM_I(inode);
        struct mm_struct *charge_mm = vma ? vma->vm_mm : NULL;
+       struct swap_info_struct *si;
        struct folio *folio = NULL;
        swp_entry_t swap;
        int error;
        if (is_swapin_error_entry(swap))
                return -EIO;
  
+       si = get_swap_device(swap);
+       if (!si) {
+               if (!shmem_confirm_swap(mapping, index, swap))
+                       return -EEXIST;
+               else
+                       return -EINVAL;
+       }
        /* Look it up and read it in.. */
        folio = swap_cache_get_folio(swap, NULL, 0);
        if (!folio) {
        delete_from_swap_cache(folio);
        folio_mark_dirty(folio);
        swap_free(swap);
+       put_swap_device(si);
  
        *foliop = folio;
        return 0;
@@@ -1817,6 -1832,7 +1832,7 @@@ unlock
                folio_unlock(folio);
                folio_put(folio);
        }
+       put_swap_device(si);
  
        return error;
  }
@@@ -1909,7 -1925,8 +1925,8 @@@ repeat
                return 0;
        }
  
-       if (!shmem_is_huge(vma, inode, index, false))
+       if (!shmem_is_huge(inode, index, false,
+                          vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
                goto alloc_nohuge;
  
        huge_gfp = vma_thp_gfp_mask(vma);
@@@ -2287,7 -2304,7 +2304,7 @@@ static int shmem_mmap(struct file *file
                return ret;
  
        /* arm64 - allow memory tagging on RAM-based files */
-       vma->vm_flags |= VM_MTE_ALLOWED;
+       vm_flags_set(vma, VM_MTE_ALLOWED);
  
        file_accessed(file);
        /* This is anonymous shared memory if it is unlinked at the time of mmap */
@@@ -2327,9 -2344,8 +2344,9 @@@ static void shmem_set_inode_flags(struc
  #define shmem_initxattrs NULL
  #endif
  
 -static struct inode *shmem_get_inode(struct super_block *sb, struct inode *dir,
 -                                   umode_t mode, dev_t dev, unsigned long flags)
 +static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block *sb,
 +                                   struct inode *dir, umode_t mode, dev_t dev,
 +                                   unsigned long flags)
  {
        struct inode *inode;
        struct shmem_inode_info *info;
        inode = new_inode(sb);
        if (inode) {
                inode->i_ino = ino;
 -              inode_init_owner(&init_user_ns, inode, dir, mode);
 +              inode_init_owner(idmap, inode, dir, mode);
                inode->i_blocks = 0;
                inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
                inode->i_generation = get_random_u32();
@@@ -2562,33 -2578,23 +2579,23 @@@ shmem_write_end(struct file *file, stru
                        loff_t pos, unsigned len, unsigned copied,
                        struct page *page, void *fsdata)
  {
+       struct folio *folio = page_folio(page);
        struct inode *inode = mapping->host;
  
        if (pos + copied > inode->i_size)
                i_size_write(inode, pos + copied);
  
-       if (!PageUptodate(page)) {
-               struct page *head = compound_head(page);
-               if (PageTransCompound(page)) {
-                       int i;
-                       for (i = 0; i < HPAGE_PMD_NR; i++) {
-                               if (head + i == page)
-                                       continue;
-                               clear_highpage(head + i);
-                               flush_dcache_page(head + i);
-                       }
-               }
-               if (copied < PAGE_SIZE) {
-                       unsigned from = pos & (PAGE_SIZE - 1);
-                       zero_user_segments(page, 0, from,
-                                       from + copied, PAGE_SIZE);
+       if (!folio_test_uptodate(folio)) {
+               if (copied < folio_size(folio)) {
+                       size_t from = offset_in_folio(folio, pos);
+                       folio_zero_segments(folio, 0, from,
+                                       from + copied, folio_size(folio));
                }
-               SetPageUptodate(head);
+               folio_mark_uptodate(folio);
        }
-       set_page_dirty(page);
-       unlock_page(page);
-       put_page(page);
+       folio_mark_dirty(folio);
+       folio_unlock(folio);
+       folio_put(folio);
  
        return copied;
  }
@@@ -2914,13 -2920,13 +2921,13 @@@ static int shmem_statfs(struct dentry *
   * File creation. Allocate an inode, and we're done..
   */
  static int
 -shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 +shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
            struct dentry *dentry, umode_t mode, dev_t dev)
  {
        struct inode *inode;
        int error = -ENOSPC;
  
 -      inode = shmem_get_inode(dir->i_sb, dir, mode, dev, VM_NORESERVE);
 +      inode = shmem_get_inode(idmap, dir->i_sb, dir, mode, dev, VM_NORESERVE);
        if (inode) {
                error = simple_acl_create(dir, inode);
                if (error)
@@@ -2945,13 -2951,13 +2952,13 @@@ out_iput
  }
  
  static int
 -shmem_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 +shmem_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
              struct file *file, umode_t mode)
  {
        struct inode *inode;
        int error = -ENOSPC;
  
 -      inode = shmem_get_inode(dir->i_sb, dir, mode, 0, VM_NORESERVE);
 +      inode = shmem_get_inode(idmap, dir->i_sb, dir, mode, 0, VM_NORESERVE);
        if (inode) {
                error = security_inode_init_security(inode, dir,
                                                     NULL,
@@@ -2969,22 -2975,22 +2976,22 @@@ out_iput
        return error;
  }
  
 -static int shmem_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 +static int shmem_mkdir(struct mnt_idmap *idmap, struct inode *dir,
                       struct dentry *dentry, umode_t mode)
  {
        int error;
  
 -      if ((error = shmem_mknod(&init_user_ns, dir, dentry,
 -                               mode | S_IFDIR, 0)))
 +      error = shmem_mknod(idmap, dir, dentry, mode | S_IFDIR, 0);
 +      if (error)
                return error;
        inc_nlink(dir);
        return 0;
  }
  
 -static int shmem_create(struct user_namespace *mnt_userns, struct inode *dir,
 +static int shmem_create(struct mnt_idmap *idmap, struct inode *dir,
                        struct dentry *dentry, umode_t mode, bool excl)
  {
 -      return shmem_mknod(&init_user_ns, dir, dentry, mode | S_IFREG, 0);
 +      return shmem_mknod(idmap, dir, dentry, mode | S_IFREG, 0);
  }
  
  /*
@@@ -3044,7 -3050,7 +3051,7 @@@ static int shmem_rmdir(struct inode *di
        return shmem_unlink(dir, dentry);
  }
  
 -static int shmem_whiteout(struct user_namespace *mnt_userns,
 +static int shmem_whiteout(struct mnt_idmap *idmap,
                          struct inode *old_dir, struct dentry *old_dentry)
  {
        struct dentry *whiteout;
        if (!whiteout)
                return -ENOMEM;
  
 -      error = shmem_mknod(&init_user_ns, old_dir, whiteout,
 +      error = shmem_mknod(idmap, old_dir, whiteout,
                            S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
        dput(whiteout);
        if (error)
   * it exists so that the VFS layer correctly frees it when it
   * gets overwritten.
   */
 -static int shmem_rename2(struct user_namespace *mnt_userns,
 +static int shmem_rename2(struct mnt_idmap *idmap,
                         struct inode *old_dir, struct dentry *old_dentry,
                         struct inode *new_dir, struct dentry *new_dentry,
                         unsigned int flags)
        if (flags & RENAME_WHITEOUT) {
                int error;
  
 -              error = shmem_whiteout(&init_user_ns, old_dir, old_dentry);
 +              error = shmem_whiteout(idmap, old_dir, old_dentry);
                if (error)
                        return error;
        }
        return 0;
  }
  
 -static int shmem_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 +static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
                         struct dentry *dentry, const char *symname)
  {
        int error;
        if (len > PAGE_SIZE)
                return -ENAMETOOLONG;
  
 -      inode = shmem_get_inode(dir->i_sb, dir, S_IFLNK | 0777, 0,
 +      inode = shmem_get_inode(idmap, dir->i_sb, dir, S_IFLNK | 0777, 0,
                                VM_NORESERVE);
        if (!inode)
                return -ENOSPC;
@@@ -3228,7 -3234,7 +3235,7 @@@ static int shmem_fileattr_get(struct de
        return 0;
  }
  
 -static int shmem_fileattr_set(struct user_namespace *mnt_userns,
 +static int shmem_fileattr_set(struct mnt_idmap *idmap,
                              struct dentry *dentry, struct fileattr *fa)
  {
        struct inode *inode = d_inode(dentry);
@@@ -3302,7 -3308,7 +3309,7 @@@ static int shmem_xattr_handler_get(cons
  }
  
  static int shmem_xattr_handler_set(const struct xattr_handler *handler,
 -                                 struct user_namespace *mnt_userns,
 +                                 struct mnt_idmap *idmap,
                                   struct dentry *unused, struct inode *inode,
                                   const char *name, const void *value,
                                   size_t size, int flags)
@@@ -3818,8 -3824,7 +3825,8 @@@ static int shmem_fill_super(struct supe
  #endif
        uuid_gen(&sb->s_uuid);
  
 -      inode = shmem_get_inode(sb, NULL, S_IFDIR | sbinfo->mode, 0, VM_NORESERVE);
 +      inode = shmem_get_inode(&nop_mnt_idmap, sb, NULL, S_IFDIR | sbinfo->mode, 0,
 +                              VM_NORESERVE);
        if (!inode)
                goto failed;
        inode->i_uid = sbinfo->uid;
@@@ -4044,11 -4049,7 +4051,11 @@@ static struct file_system_type shmem_fs
        .parameters     = shmem_fs_parameters,
  #endif
        .kill_sb        = kill_litter_super,
 +#ifdef CONFIG_SHMEM
 +      .fs_flags       = FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
 +#else
        .fs_flags       = FS_USERNS_MOUNT,
 +#endif
  };
  
  void __init shmem_init(void)
@@@ -4200,7 -4201,7 +4207,7 @@@ EXPORT_SYMBOL_GPL(shmem_truncate_range)
  #define shmem_vm_ops                          generic_file_vm_ops
  #define shmem_anon_vm_ops                     generic_file_vm_ops
  #define shmem_file_operations                 ramfs_file_operations
 -#define shmem_get_inode(sb, dir, mode, dev, flags)    ramfs_get_inode(sb, dir, mode, dev)
 +#define shmem_get_inode(idmap, sb, dir, mode, dev, flags) ramfs_get_inode(sb, dir, mode, dev)
  #define shmem_acct_size(flags, size)          0
  #define shmem_unacct_size(flags, size)                do {} while (0)
  
@@@ -4223,11 -4224,8 +4230,11 @@@ static struct file *__shmem_file_setup(
        if (shmem_acct_size(flags, size))
                return ERR_PTR(-ENOMEM);
  
 -      inode = shmem_get_inode(mnt->mnt_sb, NULL, S_IFREG | S_IRWXUGO, 0,
 -                              flags);
 +      if (is_idmapped_mnt(mnt))
 +              return ERR_PTR(-EINVAL);
 +
 +      inode = shmem_get_inode(&nop_mnt_idmap, mnt->mnt_sb, NULL,
 +                              S_IFREG | S_IRWXUGO, 0, flags);
        if (unlikely(!inode)) {
                shmem_unacct_size(flags, size);
                return ERR_PTR(-ENOSPC);
@@@ -4313,9 -4311,9 +4320,9 @@@ int shmem_zero_setup(struct vm_area_str
  }
  
  /**
-  * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
-  * @mapping:  the page's address_space
-  * @index:    the page index
+  * shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
+  * @mapping:  the folio's address_space
+  * @index:    the folio index
   * @gfp:      the page allocator flags to use if allocating
   *
   * This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
   * i915_gem_object_get_pages_gtt() mixes __GFP_NORETRY | __GFP_NOWARN in
   * with the mapping_gfp_mask(), to avoid OOMing the machine unnecessarily.
   */
- struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
-                                        pgoff_t index, gfp_t gfp)
+ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
+               pgoff_t index, gfp_t gfp)
  {
  #ifdef CONFIG_SHMEM
        struct inode *inode = mapping->host;
        struct folio *folio;
-       struct page *page;
        int error;
  
        BUG_ON(!shmem_mapping(mapping));
                return ERR_PTR(error);
  
        folio_unlock(folio);
+       return folio;
+ #else
+       /*
+        * The tiny !SHMEM case uses ramfs without swap
+        */
+       return mapping_read_folio_gfp(mapping, index, gfp);
+ #endif
+ }
+ EXPORT_SYMBOL_GPL(shmem_read_folio_gfp);
+ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
+                                        pgoff_t index, gfp_t gfp)
+ {
+       struct folio *folio = shmem_read_folio_gfp(mapping, index, gfp);
+       struct page *page;
+       if (IS_ERR(folio))
+               return &folio->page;
        page = folio_file_page(folio, index);
        if (PageHWPoison(page)) {
                folio_put(folio);
        }
  
        return page;
- #else
-       /*
-        * The tiny !SHMEM case uses ramfs without swap
-        */
-       return read_cache_page_gfp(mapping, index, gfp);
- #endif
  }
  EXPORT_SYMBOL_GPL(shmem_read_mapping_page_gfp);
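
For illustration only (not part of the diff above): a minimal sketch of a caller switching from the page-based shmem_read_mapping_page_gfp() to the new shmem_read_folio_gfp(). The function name, the index of 0 and the use of mapping_gfp_mask() are assumptions; the declaration is assumed to live in <linux/shmem_fs.h> next to the existing helper.

#include <linux/err.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>

/* Hypothetical caller, not from this commit. */
static int example_read_first_folio(struct address_space *mapping)
{
        struct folio *folio;

        folio = shmem_read_folio_gfp(mapping, 0, mapping_gfp_mask(mapping));
        if (IS_ERR(folio))
                return PTR_ERR(folio);

        /* ... access the data, e.g. via kmap_local_folio(folio, 0) ... */

        folio_put(folio);
        return 0;
}
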
diff --combined mm/slab.c
index 74ece29e3a7ea12898c8dc20a33ff7e16def7b38,b77be9c6d6b1b3daec2192e4027b0055a937de58..dabc2a671fc6f7ffc5da481a3b7a9574500cdb0a
+++ b/mm/slab.c
@@@ -220,6 -220,7 +220,6 @@@ static inline void fixup_objfreelist_de
  static inline void fixup_slab_list(struct kmem_cache *cachep,
                                struct kmem_cache_node *n, struct slab *slab,
                                void **list);
 -static int slab_early_init = 1;
  
  #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node))
  
@@@ -1248,6 -1249,8 +1248,6 @@@ void __init kmem_cache_init(void
        slab_state = PARTIAL_NODE;
        setup_kmalloc_cache_index_table();
  
 -      slab_early_init = 0;
 -
        /* 5) Replace the bootstrap kmem_cache_node */
        {
                int nid;
@@@ -1370,7 -1373,7 +1370,7 @@@ static struct slab *kmem_getpages(struc
        /* Make the flag visible before any changes to folio->mapping */
        smp_wmb();
        /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
-       if (sk_memalloc_socks() && page_is_pfmemalloc(folio_page(folio, 0)))
+       if (sk_memalloc_socks() && folio_is_pfmemalloc(folio))
                slab_set_pfmemalloc(slab);
  
        return slab;
@@@ -1386,7 -1389,7 +1386,7 @@@ static void kmem_freepages(struct kmem_
  
        BUG_ON(!folio_test_slab(folio));
        __slab_clear_pfmemalloc(slab);
 -      page_mapcount_reset(folio_page(folio, 0));
 +      page_mapcount_reset(&folio->page);
        folio->mapping = NULL;
        /* Make the mapping reset visible before clearing the flag */
        smp_wmb();
        if (current->reclaim_state)
                current->reclaim_state->reclaimed_slab += 1 << order;
        unaccount_slab(slab, order, cachep);
 -      __free_pages(folio_page(folio, 0), order);
 +      __free_pages(&folio->page, order);
  }
  
  static void kmem_rcu_free(struct rcu_head *head)
  }
  
  #if DEBUG
 -static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
 +static inline bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
  {
 -      if (debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
 -              (cachep->size % PAGE_SIZE) == 0)
 -              return true;
 -
 -      return false;
 +      return debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
 +                      ((cachep->size % PAGE_SIZE) == 0);
  }
  
  #ifdef CONFIG_DEBUG_PAGEALLOC
@@@ -2205,8 -2211,6 +2205,8 @@@ static int drain_freelist(struct kmem_c
                raw_spin_unlock_irq(&n->list_lock);
                slab_destroy(cache, slab);
                nr_freed++;
 +
 +              cond_resched();
        }
  out:
        return nr_freed;
@@@ -3473,15 -3477,14 +3473,15 @@@ cache_alloc_debugcheck_after_bulk(struc
  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
                          void **p)
  {
 -      size_t i;
        struct obj_cgroup *objcg = NULL;
 +      unsigned long irqflags;
 +      size_t i;
  
        s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
        if (!s)
                return 0;
  
 -      local_irq_disable();
 +      local_irq_save(irqflags);
        for (i = 0; i < size; i++) {
                void *objp = kfence_alloc(s, s->object_size, flags) ?:
                             __do_cache_alloc(s, flags, NUMA_NO_NODE);
                        goto error;
                p[i] = objp;
        }
 -      local_irq_enable();
 +      local_irq_restore(irqflags);
  
        cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
  
        /* FIXME: Trace call missing. Christoph would like a bulk variant */
        return size;
  error:
 -      local_irq_enable();
 +      local_irq_restore(irqflags);
        cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
        slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
        kmem_cache_free_bulk(s, i, p);
@@@ -3605,9 -3608,8 +3605,9 @@@ EXPORT_SYMBOL(kmem_cache_free)
  
  void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
  {
 +      unsigned long flags;
  
 -      local_irq_disable();
 +      local_irq_save(flags);
        for (int i = 0; i < size; i++) {
                void *objp = p[i];
                struct kmem_cache *s;
  
                        /* called via kfree_bulk */
                        if (!folio_test_slab(folio)) {
 -                              local_irq_enable();
 +                              local_irq_restore(flags);
                                free_large_kmalloc(folio, objp);
 -                              local_irq_disable();
 +                              local_irq_save(flags);
                                continue;
                        }
                        s = folio_slab(folio)->slab_cache;
  
                __cache_free(s, objp, _RET_IP_);
        }
 -      local_irq_enable();
 +      local_irq_restore(flags);
  
        /* FIXME: add tracing */
  }
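
As a side note (not part of the diff), a minimal sketch of a bulk-API user; the cache pointer, the batch size of 8 and the GFP_KERNEL context are assumptions. The local_irq_save()/local_irq_restore() conversion above means these paths no longer unconditionally re-enable interrupts on return.

#include <linux/kernel.h>
#include <linux/slab.h>

/* Hypothetical bulk allocation/free cycle, not from this commit. */
static int example_bulk_cycle(struct kmem_cache *cache)
{
        void *objs[8];
        int n;

        n = kmem_cache_alloc_bulk(cache, GFP_KERNEL, ARRAY_SIZE(objs), objs);
        if (!n)
                return -ENOMEM;

        /* ... use objs[0..n-1] ... */

        kmem_cache_free_bulk(cache, n, objs);
        return 0;
}
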
diff --combined mm/slub.c
index 1013834fb7bb6a7c2d23c3ede30a5a481fd521a1,f8dba33e4d15ff71f4c72fc53a8f2285729c3cca..39327e98fce347a7c7c4d55850d7850f8c468907
+++ b/mm/slub.c
@@@ -1592,7 -1592,7 +1592,7 @@@ static int __init setup_slub_debug(cha
                } else {
                        slab_list_specified = true;
                        if (flags & SLAB_STORE_USER)
-                               stack_depot_want_early_init();
+                               stack_depot_request_early_init();
                }
        }
  
  out:
        slub_debug = global_flags;
        if (slub_debug & SLAB_STORE_USER)
-               stack_depot_want_early_init();
+               stack_depot_request_early_init();
        if (slub_debug != 0 || slub_debug_string)
                static_branch_enable(&slub_debug_enabled);
        else
@@@ -1859,7 -1859,7 +1859,7 @@@ static inline struct slab *alloc_slab_p
        __folio_set_slab(folio);
        /* Make the flag visible before any changes to folio->mapping */
        smp_wmb();
-       if (page_is_pfmemalloc(folio_page(folio, 0)))
+       if (folio_is_pfmemalloc(folio))
                slab_set_pfmemalloc(slab);
  
        return slab;
@@@ -2066,7 -2066,7 +2066,7 @@@ static void __free_slab(struct kmem_cac
        if (current->reclaim_state)
                current->reclaim_state->reclaimed_slab += pages;
        unaccount_slab(slab, order, s);
 -      __free_pages(folio_page(folio, 0), order);
 +      __free_pages(&folio->page, order);
  }
  
  static void rcu_free_slab(struct rcu_head *h)
@@@ -3913,7 -3913,6 +3913,7 @@@ static inline int __kmem_cache_alloc_bu
                        size_t size, void **p, struct obj_cgroup *objcg)
  {
        struct kmem_cache_cpu *c;
 +      unsigned long irqflags;
        int i;
  
        /*
         * handlers invoking normal fastpath.
         */
        c = slub_get_cpu_ptr(s->cpu_slab);
 -      local_lock_irq(&s->cpu_slab->lock);
 +      local_lock_irqsave(&s->cpu_slab->lock, irqflags);
  
        for (i = 0; i < size; i++) {
                void *object = kfence_alloc(s, s->object_size, flags);
                         */
                        c->tid = next_tid(c->tid);
  
 -                      local_unlock_irq(&s->cpu_slab->lock);
 +                      local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
  
                        /*
                         * Invoking slow path likely have side-effect
                        c = this_cpu_ptr(s->cpu_slab);
                        maybe_wipe_obj_freeptr(s, p[i]);
  
 -                      local_lock_irq(&s->cpu_slab->lock);
 +                      local_lock_irqsave(&s->cpu_slab->lock, irqflags);
  
                        continue; /* goto for-loop */
                }
                maybe_wipe_obj_freeptr(s, p[i]);
        }
        c->tid = next_tid(c->tid);
 -      local_unlock_irq(&s->cpu_slab->lock);
 +      local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
        slub_put_cpu_ptr(s->cpu_slab);
  
        return i;
@@@ -6450,7 -6449,7 +6450,7 @@@ static void debugfs_slab_add(struct kme
  
  void debugfs_slab_release(struct kmem_cache *s)
  {
 -      debugfs_remove_recursive(debugfs_lookup(s->name, slab_debugfs_root));
 +      debugfs_lookup_and_remove(s->name, slab_debugfs_root);
  }
  
  static int __init slab_debugfs_init(void)
diff --combined mm/swap.c
index 4c03ecab698ea49c268eb93f292b28f523201c72,2a51faa34e641c7ad8a303c1d63e0f4adecbc02b..57cb01b042f6231b87378529fe02c84116afb968
+++ b/mm/swap.c
@@@ -158,6 -158,36 +158,6 @@@ void put_pages_list(struct list_head *p
  }
  EXPORT_SYMBOL(put_pages_list);
  
 -/*
 - * get_kernel_pages() - pin kernel pages in memory
 - * @kiov:     An array of struct kvec structures
 - * @nr_segs:  number of segments to pin
 - * @write:    pinning for read/write, currently ignored
 - * @pages:    array that receives pointers to the pages pinned.
 - *            Should be at least nr_segs long.
 - *
 - * Returns number of pages pinned. This may be fewer than the number requested.
 - * If nr_segs is 0 or negative, returns 0.  If no pages were pinned, returns 0.
 - * Each page returned must be released with a put_page() call when it is
 - * finished with.
 - */
 -int get_kernel_pages(const struct kvec *kiov, int nr_segs, int write,
 -              struct page **pages)
 -{
 -      int seg;
 -
 -      for (seg = 0; seg < nr_segs; seg++) {
 -              if (WARN_ON(kiov[seg].iov_len != PAGE_SIZE))
 -                      return seg;
 -
 -              pages[seg] = kmap_to_page(kiov[seg].iov_base);
 -              get_page(pages[seg]);
 -      }
 -
 -      return seg;
 -}
 -EXPORT_SYMBOL_GPL(get_kernel_pages);
 -
  typedef void (*move_fn_t)(struct lruvec *lruvec, struct folio *folio);
  
  static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
         * Is an smp_mb__after_atomic() still required here, before
         * folio_evictable() tests the mlocked flag, to rule out the possibility
         * of stranding an evictable folio on an unevictable LRU?  I think
-        * not, because __munlock_page() only clears the mlocked flag
+        * not, because __munlock_folio() only clears the mlocked flag
         * while the LRU lock is held.
         *
         * (That is not true of __page_cache_release(), and not necessarily
                folio_set_unevictable(folio);
                /*
                 * folio->mlock_count = !!folio_test_mlocked(folio)?
-                * But that leaves __mlock_page() in doubt whether another
+                * But that leaves __mlock_folio() in doubt whether another
                 * actor has already counted the mlock or not.  Err on the
                 * safe side, underestimate, let page reclaim fix it, rather
                 * than leaving a page on the unevictable LRU indefinitely.
@@@ -532,7 -562,7 +532,7 @@@ void folio_add_lru_vma(struct folio *fo
        VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
  
        if (unlikely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED))
-               mlock_new_page(&folio->page);
+               mlock_new_folio(folio);
        else
                folio_add_lru(folio);
  }
@@@ -703,17 -733,15 +703,15 @@@ void deactivate_file_folio(struct foli
  }
  
  /*
-  * deactivate_page - deactivate a page
-  * @page: page to deactivate
+  * folio_deactivate - deactivate a folio
+  * @folio: folio to deactivate
   *
-  * deactivate_page() moves @page to the inactive list if @page was on the active
-  * list and was not an unevictable page.  This is done to accelerate the reclaim
-  * of @page.
+  * folio_deactivate() moves @folio to the inactive list if @folio was on the
+  * active list and was not unevictable. This is done to accelerate the
+  * reclaim of @folio.
   */
- void deactivate_page(struct page *page)
+ void folio_deactivate(struct folio *folio)
  {
-       struct folio *folio = page_folio(page);
        if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
            (folio_test_active(folio) || lru_gen_enabled())) {
                struct folio_batch *fbatch;
  }
  
  /**
-  * mark_page_lazyfree - make an anon page lazyfree
-  * @page: page to deactivate
+  * folio_mark_lazyfree - make an anon folio lazyfree
+  * @folio: folio to mark lazyfree
   *
-  * mark_page_lazyfree() moves @page to the inactive file list.
-  * This is done to accelerate the reclaim of @page.
+  * folio_mark_lazyfree() moves @folio to the inactive file list.
+  * This is done to accelerate the reclaim of @folio.
   */
- void mark_page_lazyfree(struct page *page)
+ void folio_mark_lazyfree(struct folio *folio)
  {
-       struct folio *folio = page_folio(page);
        if (folio_test_lru(folio) && folio_test_anon(folio) &&
            folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
            !folio_test_unevictable(folio)) {
@@@ -755,7 -781,7 +751,7 @@@ void lru_add_drain(void
        local_lock(&cpu_fbatches.lock);
        lru_add_drain_cpu(smp_processor_id());
        local_unlock(&cpu_fbatches.lock);
-       mlock_page_drain_local();
+       mlock_drain_local();
  }
  
  /*
@@@ -770,7 -796,7 +766,7 @@@ static void lru_add_and_bh_lrus_drain(v
        lru_add_drain_cpu(smp_processor_id());
        local_unlock(&cpu_fbatches.lock);
        invalidate_bh_lrus_cpu();
-       mlock_page_drain_local();
+       mlock_drain_local();
  }
  
  void lru_add_drain_cpu_zone(struct zone *zone)
        lru_add_drain_cpu(smp_processor_id());
        drain_local_pages(zone);
        local_unlock(&cpu_fbatches.lock);
-       mlock_page_drain_local();
+       mlock_drain_local();
  }
  
  #ifdef CONFIG_SMP
@@@ -802,7 -828,7 +798,7 @@@ static bool cpu_needs_drain(unsigned in
                folio_batch_count(&fbatches->lru_deactivate) ||
                folio_batch_count(&fbatches->lru_lazyfree) ||
                folio_batch_count(&fbatches->activate) ||
-               need_mlock_page_drain(cpu) ||
+               need_mlock_drain(cpu) ||
                has_bh_in_lru(cpu, NULL);
  }
  
@@@ -1089,16 -1115,6 +1085,6 @@@ void folio_batch_remove_exceptionals(st
        fbatch->nr = j;
  }
  
- unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
-               struct address_space *mapping, pgoff_t *index, pgoff_t end,
-               xa_mark_t tag)
- {
-       pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
-                                       PAGEVEC_SIZE, pvec->pages);
-       return pagevec_count(pvec);
- }
- EXPORT_SYMBOL(pagevec_lookup_range_tag);
  /*
   * Perform any setup for the swap system
   */
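
For context (not part of the diff), the conversion pattern for callers of the renamed helpers is roughly the following sketch; the wrapper name is an assumption, and it only shows how an old deactivate_page(page) caller now looks up the folio first.

#include <linux/mm.h>
#include <linux/swap.h>

/* Hypothetical wrapper, not from this commit. */
static void example_deactivate(struct page *page)
{
        struct folio *folio = page_folio(page);

        folio_deactivate(folio);
}
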
diff --combined mm/swapfile.c
index 0ab52d16bde6699b649191a750aedeb4a3242443,da8b8e1f97f14ccbd7a1cb73901b67bed1dc4c3d..62ba2bf577d7e7210398181b47f11fca9e5f5099
@@@ -1098,8 -1098,6 +1098,6 @@@ start_over
                spin_unlock(&si->lock);
                if (n_ret || size == SWAPFILE_CLUSTER)
                        goto check_out;
-               pr_debug("scan_swap_map of si %d failed to find offset\n",
-                       si->type);
                cond_resched();
  
                spin_lock(&swap_avail_lock);
@@@ -1844,13 -1842,13 +1842,13 @@@ static int unuse_pte_range(struct vm_ar
        pte_t *pte;
        struct swap_info_struct *si;
        int ret = 0;
-       volatile unsigned char *swap_map;
  
        si = swap_info[type];
        pte = pte_offset_map(pmd, addr);
        do {
                struct folio *folio;
                unsigned long offset;
+               unsigned char swp_count;
  
                if (!is_swap_pte(*pte))
                        continue;
  
                offset = swp_offset(entry);
                pte_unmap(pte);
-               swap_map = &si->swap_map[offset];
                folio = swap_cache_get_folio(entry, vma, addr);
                if (!folio) {
                        struct page *page;
                                folio = page_folio(page);
                }
                if (!folio) {
-                       if (*swap_map == 0 || *swap_map == SWAP_MAP_BAD)
+                       swp_count = READ_ONCE(si->swap_map[offset]);
+                       if (swp_count == 0 || swp_count == SWAP_MAP_BAD)
                                goto try_next;
                        return -ENOMEM;
                }
  
@@@ -3078,7 -3077,7 +3077,7 @@@ SYSCALL_DEFINE2(swapon, const char __us
        if (p->bdev && bdev_stable_writes(p->bdev))
                p->flags |= SWP_STABLE_WRITES;
  
-       if (p->bdev && p->bdev->bd_disk->fops->rw_page)
+       if (p->bdev && bdev_synchronous(p->bdev))
                p->flags |= SWP_SYNCHRONOUS_IO;
  
        if (p->bdev && bdev_nonrot(p->bdev)) {
@@@ -3651,7 -3650,7 +3650,7 @@@ void __cgroup_throttle_swaprate(struct 
         * We've already scheduled a throttle, avoid taking the global swap
         * lock.
         */
 -      if (current->throttle_queue)
 +      if (current->throttle_disk)
                return;
  
        spin_lock(&swap_avail_lock);
diff --combined mm/vmalloc.c
index 61f5bec0f2b6aed7b83041b313fbcdd5ec04724a,3b57260b6d3927063f6d90860741b84a8bc21338..ef910bf349e1361e64edc0ed0166605160a0fe52
@@@ -89,17 -89,6 +89,6 @@@ struct vfree_deferred 
  };
  static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);
  
- static void __vunmap(const void *, int);
- static void free_work(struct work_struct *w)
- {
-       struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
-       struct llist_node *t, *llnode;
-       llist_for_each_safe(llnode, t, llist_del_all(&p->list))
-               __vunmap((void *)llnode, 1);
- }
  /*** Page table manipulation functions ***/
  static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
                        phys_addr_t phys_addr, pgprot_t prot,
@@@ -656,7 -645,6 +645,7 @@@ int is_vmalloc_or_module_addr(const voi
  #endif
        return is_vmalloc_addr(x);
  }
 +EXPORT_SYMBOL_GPL(is_vmalloc_or_module_addr);
  
  /*
   * Walk a vmap address to the struct page it maps. Huge vmap mappings will
@@@ -1590,7 -1578,8 +1579,8 @@@ preload_this_cpu_lock(spinlock_t *lock
  static struct vmap_area *alloc_vmap_area(unsigned long size,
                                unsigned long align,
                                unsigned long vstart, unsigned long vend,
-                               int node, gfp_t gfp_mask)
+                               int node, gfp_t gfp_mask,
+                               unsigned long va_flags)
  {
        struct vmap_area *va;
        unsigned long freed;
        int purged = 0;
        int ret;
  
-       BUG_ON(!size);
-       BUG_ON(offset_in_page(size));
-       BUG_ON(!is_power_of_2(align));
+       if (unlikely(!size || offset_in_page(size) || !is_power_of_2(align)))
+               return ERR_PTR(-EINVAL);
  
        if (unlikely(!vmap_initialized))
                return ERR_PTR(-EBUSY);
@@@ -1636,6 -1624,7 +1625,7 @@@ retry
        va->va_start = addr;
        va->va_end = addr + size;
        va->vm = NULL;
+       va->flags = va_flags;
  
        spin_lock(&vmap_area_lock);
        insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
@@@ -1816,9 -1805,9 +1806,9 @@@ static void drain_vmap_area_work(struc
  }
  
  /*
-  * Free a vmap area, caller ensuring that the area has been unmapped
-  * and flush_cache_vunmap had been called for the correct range
-  * previously.
+  * Free a vmap area, caller ensuring that the area has been unmapped,
+  * unlinked and flush_cache_vunmap had been called for the correct
+  * range previously.
   */
  static void free_vmap_area_noflush(struct vmap_area *va)
  {
        unsigned long va_start = va->va_start;
        unsigned long nr_lazy;
  
-       spin_lock(&vmap_area_lock);
-       unlink_va(va, &vmap_area_root);
-       spin_unlock(&vmap_area_lock);
+       if (WARN_ON_ONCE(!list_empty(&va->list)))
+               return;
  
        nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >>
                                PAGE_SHIFT, &vmap_lazy_nr);
@@@ -1872,6 -1860,19 +1861,19 @@@ struct vmap_area *find_vmap_area(unsign
        return va;
  }
  
+ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
+ {
+       struct vmap_area *va;
+       spin_lock(&vmap_area_lock);
+       va = __find_vmap_area(addr, &vmap_area_root);
+       if (va)
+               unlink_va(va, &vmap_area_root);
+       spin_unlock(&vmap_area_lock);
+       return va;
+ }
  /*** Per cpu kva allocator ***/
  
  /*
  
  #define VMAP_BLOCK_SIZE               (VMAP_BBMAP_BITS * PAGE_SIZE)
  
+ #define VMAP_RAM              0x1 /* indicates vm_map_ram area */
+ #define VMAP_BLOCK            0x2 /* mark out the vmap_block sub-type */
+ #define VMAP_FLAGS_MASK               0x3
  struct vmap_block_queue {
        spinlock_t lock;
        struct list_head free;
@@@ -1911,6 -1916,7 +1917,7 @@@ struct vmap_block 
        spinlock_t lock;
        struct vmap_area *va;
        unsigned long free, dirty;
+       DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
        unsigned long dirty_min, dirty_max; /*< dirty range */
        struct list_head free_list;
        struct rcu_head rcu_head;
@@@ -1976,7 -1982,8 +1983,8 @@@ static void *new_vmap_block(unsigned in
  
        va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
                                        VMALLOC_START, VMALLOC_END,
-                                       node, gfp_mask);
+                                       node, gfp_mask,
+                                       VMAP_RAM|VMAP_BLOCK);
        if (IS_ERR(va)) {
                kfree(vb);
                return ERR_CAST(va);
        vb->va = va;
        /* At least something should be left free */
        BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
+       bitmap_zero(vb->used_map, VMAP_BBMAP_BITS);
        vb->free = VMAP_BBMAP_BITS - (1UL << order);
        vb->dirty = 0;
        vb->dirty_min = VMAP_BBMAP_BITS;
        vb->dirty_max = 0;
+       bitmap_set(vb->used_map, 0, (1UL << order));
        INIT_LIST_HEAD(&vb->free_list);
  
        vb_idx = addr_to_vb_idx(va->va_start);
@@@ -2016,6 -2025,10 +2026,10 @@@ static void free_vmap_block(struct vmap
        tmp = xa_erase(&vmap_blocks, addr_to_vb_idx(vb->va->va_start));
        BUG_ON(tmp != vb);
  
+       spin_lock(&vmap_area_lock);
+       unlink_va(vb->va, &vmap_area_root);
+       spin_unlock(&vmap_area_lock);
        free_vmap_area_noflush(vb->va);
        kfree_rcu(vb, rcu_head);
  }
@@@ -2096,6 -2109,7 +2110,7 @@@ static void *vb_alloc(unsigned long siz
                pages_off = VMAP_BBMAP_BITS - vb->free;
                vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
                vb->free -= 1UL << order;
+               bitmap_set(vb->used_map, pages_off, (1UL << order));
                if (vb->free == 0) {
                        spin_lock(&vbq->lock);
                        list_del_rcu(&vb->free_list);
@@@ -2129,6 -2143,9 +2144,9 @@@ static void vb_free(unsigned long addr
        order = get_order(size);
        offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
        vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
+       spin_lock(&vb->lock);
+       bitmap_clear(vb->used_map, offset, (1UL << order));
+       spin_unlock(&vb->lock);
  
        vunmap_range_noflush(addr, addr + size);
  
@@@ -2237,8 -2254,10 +2255,10 @@@ void vm_unmap_ram(const void *mem, unsi
                return;
        }
  
-       va = find_vmap_area(addr);
-       BUG_ON(!va);
+       va = find_unlink_vmap_area(addr);
+       if (WARN_ON_ONCE(!va))
+               return;
        debug_check_no_locks_freed((void *)va->va_start,
                                    (va->va_end - va->va_start));
        free_unmap_vmap_area(va);
@@@ -2273,7 -2292,8 +2293,8 @@@ void *vm_map_ram(struct page **pages, u
        } else {
                struct vmap_area *va;
                va = alloc_vmap_area(size, PAGE_SIZE,
-                               VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
+                               VMALLOC_START, VMALLOC_END,
+                               node, GFP_KERNEL, VMAP_RAM);
                if (IS_ERR(va))
                        return NULL;
  
@@@ -2417,48 -2437,6 +2438,6 @@@ static void vmap_init_free_space(void
        }
  }
  
- void __init vmalloc_init(void)
- {
-       struct vmap_area *va;
-       struct vm_struct *tmp;
-       int i;
-       /*
-        * Create the cache for vmap_area objects.
-        */
-       vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
-       for_each_possible_cpu(i) {
-               struct vmap_block_queue *vbq;
-               struct vfree_deferred *p;
-               vbq = &per_cpu(vmap_block_queue, i);
-               spin_lock_init(&vbq->lock);
-               INIT_LIST_HEAD(&vbq->free);
-               p = &per_cpu(vfree_deferred, i);
-               init_llist_head(&p->list);
-               INIT_WORK(&p->wq, free_work);
-       }
-       /* Import existing vmlist entries. */
-       for (tmp = vmlist; tmp; tmp = tmp->next) {
-               va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
-               if (WARN_ON_ONCE(!va))
-                       continue;
-               va->va_start = (unsigned long)tmp->addr;
-               va->va_end = va->va_start + tmp->size;
-               va->vm = tmp;
-               insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
-       }
-       /*
-        * Now we can initialize a free vmap space.
-        */
-       vmap_init_free_space();
-       vmap_initialized = true;
- }
  static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
        struct vmap_area *va, unsigned long flags, const void *caller)
  {
@@@ -2513,7 -2491,7 +2492,7 @@@ static struct vm_struct *__get_vm_area_
        if (!(flags & VM_NO_GUARD))
                size += PAGE_SIZE;
  
-       va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
+       va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
        if (IS_ERR(va)) {
                kfree(area);
                return NULL;
@@@ -2605,25 -2583,26 +2584,26 @@@ struct vm_struct *find_vm_area(const vo
  struct vm_struct *remove_vm_area(const void *addr)
  {
        struct vmap_area *va;
+       struct vm_struct *vm;
  
        might_sleep();
  
-       spin_lock(&vmap_area_lock);
-       va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
-       if (va && va->vm) {
-               struct vm_struct *vm = va->vm;
-               va->vm = NULL;
-               spin_unlock(&vmap_area_lock);
+       if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
+                       addr))
+               return NULL;
  
-               kasan_free_module_shadow(vm);
-               free_unmap_vmap_area(va);
+       va = find_unlink_vmap_area((unsigned long)addr);
+       if (!va || !va->vm)
+               return NULL;
+       vm = va->vm;
  
-               return vm;
-       }
+       debug_check_no_locks_freed(vm->addr, get_vm_area_size(vm));
+       debug_check_no_obj_freed(vm->addr, get_vm_area_size(vm));
+       kasan_free_module_shadow(vm);
+       kasan_poison_vmalloc(vm->addr, get_vm_area_size(vm));
  
-       spin_unlock(&vmap_area_lock);
-       return NULL;
+       free_unmap_vmap_area(va);
+       return vm;
  }
  
  static inline void set_area_direct_map(const struct vm_struct *area,
                        set_direct_map(area->pages[i]);
  }
  
- /* Handle removing and resetting vm mappings related to the vm_struct. */
- static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
+ /*
+  * Flush the vm mapping and reset the direct map.
+  */
+ static void vm_reset_perms(struct vm_struct *area)
  {
        unsigned long start = ULONG_MAX, end = 0;
        unsigned int page_order = vm_area_page_order(area);
-       int flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
        int flush_dmap = 0;
        int i;
  
-       remove_vm_area(area->addr);
-       /* If this is not VM_FLUSH_RESET_PERMS memory, no need for the below. */
-       if (!flush_reset)
-               return;
        /*
-        * If not deallocating pages, just do the flush of the VM area and
-        * return.
-        */
-       if (!deallocate_pages) {
-               vm_unmap_aliases();
-               return;
-       }
-       /*
-        * If execution gets here, flush the vm mapping and reset the direct
-        * map. Find the start and end range of the direct mappings to make sure
+        * Find the start and end range of the direct mappings to make sure that
         * the vm_unmap_aliases() flush includes the direct map.
         */
        for (i = 0; i < area->nr_pages; i += 1U << page_order) {
                unsigned long addr = (unsigned long)page_address(area->pages[i]);
                if (addr) {
                        unsigned long page_size;
  
        set_area_direct_map(area, set_direct_map_default_noflush);
  }
  
- static void __vunmap(const void *addr, int deallocate_pages)
+ static void delayed_vfree_work(struct work_struct *w)
  {
-       struct vm_struct *area;
-       if (!addr)
-               return;
-       if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
-                       addr))
-               return;
-       area = find_vm_area(addr);
-       if (unlikely(!area)) {
-               WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
-                               addr);
-               return;
-       }
-       debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
-       debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
-       kasan_poison_vmalloc(area->addr, get_vm_area_size(area));
-       vm_remove_mappings(area, deallocate_pages);
-       if (deallocate_pages) {
-               int i;
-               for (i = 0; i < area->nr_pages; i++) {
-                       struct page *page = area->pages[i];
-                       BUG_ON(!page);
-                       mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
-                       /*
-                        * High-order allocs for huge vmallocs are split, so
-                        * can be freed as an array of order-0 allocations
-                        */
-                       __free_pages(page, 0);
-                       cond_resched();
-               }
-               atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
-               kvfree(area->pages);
-       }
-       kfree(area);
- }
- static inline void __vfree_deferred(const void *addr)
- {
-       /*
-        * Use raw_cpu_ptr() because this can be called from preemptible
-        * context. Preemption is absolutely fine here, because the llist_add()
-        * implementation is lockless, so it works even if we are adding to
-        * another cpu's list. schedule_work() should be fine with this too.
-        */
-       struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);
+       struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
+       struct llist_node *t, *llnode;
  
-       if (llist_add((struct llist_node *)addr, &p->list))
-               schedule_work(&p->wq);
+       llist_for_each_safe(llnode, t, llist_del_all(&p->list))
+               vfree(llnode);
  }
  
  /**
   */
  void vfree_atomic(const void *addr)
  {
-       BUG_ON(in_nmi());
+       struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);
  
+       BUG_ON(in_nmi());
        kmemleak_free(addr);
  
-       if (!addr)
-               return;
-       __vfree_deferred(addr);
- }
- static void __vfree(const void *addr)
- {
-       if (unlikely(in_interrupt()))
-               __vfree_deferred(addr);
-       else
-               __vunmap(addr, 1);
+       /*
+        * Use raw_cpu_ptr() because this can be called from preemptible
+        * context. Preemption is absolutely fine here, because the llist_add()
+        * implementation is lockless, so it works even if we are adding to
+        * another cpu's list. schedule_work() should be fine with this too.
+        */
+       if (addr && llist_add((struct llist_node *)addr, &p->list))
+               schedule_work(&p->wq);
  }
  
  /**
   */
  void vfree(const void *addr)
  {
-       BUG_ON(in_nmi());
+       struct vm_struct *vm;
+       int i;
  
-       kmemleak_free(addr);
+       if (unlikely(in_interrupt())) {
+               vfree_atomic(addr);
+               return;
+       }
  
-       might_sleep_if(!in_interrupt());
+       BUG_ON(in_nmi());
+       kmemleak_free(addr);
+       might_sleep();
  
        if (!addr)
                return;
  
-       __vfree(addr);
+       vm = remove_vm_area(addr);
+       if (unlikely(!vm)) {
+               WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
+                               addr);
+               return;
+       }
+       if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
+               vm_reset_perms(vm);
+       for (i = 0; i < vm->nr_pages; i++) {
+               struct page *page = vm->pages[i];
+               BUG_ON(!page);
+               mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+               /*
+                * High-order allocs for huge vmallocs are split, so
+                * can be freed as an array of order-0 allocations
+                */
+               __free_pages(page, 0);
+               cond_resched();
+       }
+       atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+       kvfree(vm->pages);
+       kfree(vm);
  }
  EXPORT_SYMBOL(vfree);
  
   */
  void vunmap(const void *addr)
  {
+       struct vm_struct *vm;
        BUG_ON(in_interrupt());
        might_sleep();
-       if (addr)
-               __vunmap(addr, 0);
+       if (!addr)
+               return;
+       vm = remove_vm_area(addr);
+       if (unlikely(!vm)) {
+               WARN(1, KERN_ERR "Trying to vunmap() nonexistent vm area (%p)\n",
+                               addr);
+               return;
+       }
+       kfree(vm);
  }
  EXPORT_SYMBOL(vunmap);
  
@@@ -2850,6 -2799,9 +2800,9 @@@ void *vmap(struct page **pages, unsigne
  
        might_sleep();
  
+       if (WARN_ON_ONCE(flags & VM_FLUSH_RESET_PERMS))
+               return NULL;
        /*
         * Your top guard is someone else's bottom guard. Not having a top
         * guard compromises someone else's mappings too.
@@@ -3032,7 -2984,7 +2985,7 @@@ static void *__vmalloc_area_node(struc
        int ret;
  
        array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
-       gfp_mask |= __GFP_NOWARN;
        if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
                gfp_mask |= __GFP_HIGHMEM;
  
  
        /*
         * If not enough pages were obtained to accomplish an
-        * allocation request, free them via __vfree() if any.
+        * allocation request, free them via vfree() if any.
         */
        if (area->nr_pages != nr_small_pages) {
                warn_alloc(gfp_mask, NULL,
        return area->addr;
  
  fail:
-       __vfree(area->addr);
+       vfree(area->addr);
        return NULL;
  }
  
@@@ -3511,6 -3463,68 +3464,68 @@@ static int aligned_vread(char *buf, cha
        return copied;
  }
  
+ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
+ {
+       char *start;
+       struct vmap_block *vb;
+       unsigned long offset;
+       unsigned int rs, re, n;
+       /*
+        * If it's an area created directly by the vm_map_ram() interface,
+        * without further subdividing and delegating management to
+        * vmap_block, handle it here.
+        */
+       if (!(flags & VMAP_BLOCK)) {
+               aligned_vread(buf, addr, count);
+               return;
+       }
+       /*
+        * The area is split into regions and tracked with vmap_block; read
+        * out each region and zero-fill the holes between regions.
+        */
+       vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
+       if (!vb)
+               goto finished;
+       spin_lock(&vb->lock);
+       if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
+               spin_unlock(&vb->lock);
+               goto finished;
+       }
+       for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
+               if (!count)
+                       break;
+               start = vmap_block_vaddr(vb->va->va_start, rs);
+               while (addr < start) {
+                       if (count == 0)
+                               goto unlock;
+                       *buf = '\0';
+                       buf++;
+                       addr++;
+                       count--;
+               }
+               /* it could start reading from the middle of a used region */
+               offset = offset_in_page(addr);
+               n = ((re - rs + 1) << PAGE_SHIFT) - offset;
+               if (n > count)
+                       n = count;
+               aligned_vread(buf, start+offset, n);
+               buf += n;
+               addr += n;
+               count -= n;
+       }
+ unlock:
+       spin_unlock(&vb->lock);
+ finished:
+       /* zero-fill the remaining dirty or free regions */
+       if (count)
+               memset(buf, 0, count);
+ }
  /**
   * vread() - read vmalloc area in a safe way.
   * @buf:     buffer for reading data
@@@ -3541,7 -3555,7 +3556,7 @@@ long vread(char *buf, char *addr, unsig
        struct vm_struct *vm;
        char *vaddr, *buf_start = buf;
        unsigned long buflen = count;
-       unsigned long n;
+       unsigned long n, size, flags;
  
        addr = kasan_reset_tag(addr);
  
                if (!count)
                        break;
  
-               if (!va->vm)
+               vm = va->vm;
+               flags = va->flags & VMAP_FLAGS_MASK;
+               /*
+                * VMAP_BLOCK indicates a sub-type of vm_map_ram area; it
+                * needs to be set together with VMAP_RAM.
+                */
+               WARN_ON(flags == VMAP_BLOCK);
+               if (!vm && !flags)
                        continue;
  
-               vm = va->vm;
-               vaddr = (char *) vm->addr;
-               if (addr >= vaddr + get_vm_area_size(vm))
+               if (vm && (vm->flags & VM_UNINITIALIZED))
+                       continue;
+               /* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
+               smp_rmb();
+               vaddr = (char *) va->va_start;
+               size = vm ? get_vm_area_size(vm) : va_size(va);
+               if (addr >= vaddr + size)
                        continue;
                while (addr < vaddr) {
                        if (count == 0)
                        addr++;
                        count--;
                }
-               n = vaddr + get_vm_area_size(vm) - addr;
+               n = vaddr + size - addr;
                if (n > count)
                        n = count;
-               if (!(vm->flags & VM_IOREMAP))
+               if (flags & VMAP_RAM)
+                       vmap_ram_vread(buf, addr, n, flags);
+               else if (!(vm->flags & VM_IOREMAP))
                        aligned_vread(buf, addr, n);
                else /* IOREMAP area is treated as memory hole */
                        memset(buf, 0, n);
@@@ -3658,7 -3689,7 +3690,7 @@@ int remap_vmalloc_range_partial(struct 
                size -= PAGE_SIZE;
        } while (size > 0);
  
-       vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+       vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
  
        return 0;
  }
@@@ -4126,14 -4157,11 +4158,11 @@@ static int s_show(struct seq_file *m, v
  
        va = list_entry(p, struct vmap_area, list);
  
-       /*
-        * s_show can encounter race with remove_vm_area, !vm on behalf
-        * of vmap area is being tear down or vm_map_ram allocation.
-        */
        if (!va->vm) {
-               seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
-                       (void *)va->va_start, (void *)va->va_end,
-                       va->va_end - va->va_start);
+               if (va->flags & VMAP_RAM)
+                       seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
+                               (void *)va->va_start, (void *)va->va_end,
+                               va->va_end - va->va_start);
  
                goto final;
        }
@@@ -4203,3 -4231,45 +4232,45 @@@ static int __init proc_vmalloc_init(voi
  module_init(proc_vmalloc_init);
  
  #endif
+ void __init vmalloc_init(void)
+ {
+       struct vmap_area *va;
+       struct vm_struct *tmp;
+       int i;
+       /*
+        * Create the cache for vmap_area objects.
+        */
+       vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
+       for_each_possible_cpu(i) {
+               struct vmap_block_queue *vbq;
+               struct vfree_deferred *p;
+               vbq = &per_cpu(vmap_block_queue, i);
+               spin_lock_init(&vbq->lock);
+               INIT_LIST_HEAD(&vbq->free);
+               p = &per_cpu(vfree_deferred, i);
+               init_llist_head(&p->list);
+               INIT_WORK(&p->wq, delayed_vfree_work);
+       }
+       /* Import existing vmlist entries. */
+       for (tmp = vmlist; tmp; tmp = tmp->next) {
+               va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+               if (WARN_ON_ONCE(!va))
+                       continue;
+               va->va_start = (unsigned long)tmp->addr;
+               va->va_end = va->va_start + tmp->size;
+               va->vm = tmp;
+               insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+       }
+       /*
+        * Now we can initialize a free vmap space.
+        */
+       vmap_init_free_space();
+       vmap_initialized = true;
+ }
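
For illustration only (not part of the diff): a sketch of a vm_map_ram() user; the page array, count and node choice are assumptions. With the VMAP_RAM/VMAP_BLOCK tracking added above, such mappings become readable through vread() instead of being skipped.

#include <linux/mm.h>
#include <linux/numa.h>
#include <linux/string.h>
#include <linux/vmalloc.h>

/* Hypothetical mapping cycle, not from this commit. */
static int example_map_and_touch(struct page **pages, unsigned int count)
{
        void *addr = vm_map_ram(pages, count, NUMA_NO_NODE);

        if (!addr)
                return -ENOMEM;

        memset(addr, 0, (size_t)count * PAGE_SIZE);     /* use the mapping */
        vm_unmap_ram(addr, count);
        return 0;
}
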
diff --combined net/ipv4/tcp.c
index 33f559f491c8cfd1a66a6f58c7b4a8a19ffa6b4f,7db45cdc3e1a1fa21480dbab8b7b48032083f4ce..288693981b0060c028680033aee143466b2285e4
@@@ -435,7 -435,6 +435,7 @@@ void tcp_init_sock(struct sock *sk
  
        /* There's a bubble in the pipe until at least the first ACK. */
        tp->app_limited = ~0U;
 +      tp->rate_app_limited = 1;
  
        /* See draft-stevens-tcpca-spec-01 for discussion of the
         * initialization of these values.
@@@ -1891,10 -1890,10 +1891,10 @@@ int tcp_mmap(struct file *file, struct 
  {
        if (vma->vm_flags & (VM_WRITE | VM_EXEC))
                return -EPERM;
-       vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+       vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
  
        /* Instruct vm_insert_page() to not mmap_read_lock(mm) */
-       vma->vm_flags |= VM_MIXEDMAP;
+       vm_flags_set(vma, VM_MIXEDMAP);
  
        vma->vm_ops = &tcp_vm_ops;
        return 0;
@@@ -2093,7 -2092,7 +2093,7 @@@ static int tcp_zerocopy_vm_insert_batch
                maybe_zap_len = total_bytes_to_map -  /* All bytes to map */
                                *length + /* Mapped or pending */
                                (pages_remaining * PAGE_SIZE); /* Failed map. */
-               zap_page_range(vma, *address, maybe_zap_len);
+               zap_page_range_single(vma, *address, maybe_zap_len, NULL);
                err = 0;
        }
  
                unsigned long leftover_pages = pages_remaining;
                int bytes_mapped;
  
-               /* We called zap_page_range, try to reinsert. */
+               /* We called zap_page_range_single, try to reinsert. */
                err = vm_insert_pages(vma, *address,
                                      pending_pages,
                                      &pages_remaining);
@@@ -2235,7 -2234,8 +2235,8 @@@ static int tcp_zerocopy_receive(struct 
        total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
        if (total_bytes_to_map) {
                if (!(zc->flags & TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT))
-                       zap_page_range(vma, address, total_bytes_to_map);
+                       zap_page_range_single(vma, address, total_bytes_to_map,
+                                             NULL);
                zc->length = total_bytes_to_map;
                zc->recv_skip_hint = 0;
        } else {
@@@ -3179,7 -3179,6 +3180,7 @@@ int tcp_disconnect(struct sock *sk, in
        tp->plb_rehash = 0;
        /* There's a bubble in the pipe until at least the first ACK. */
        tp->app_limited = ~0U;
 +      tp->rate_app_limited = 1;
        tp->rack.mstamp = 0;
        tp->rack.advanced = 0;
        tp->rack.reo_wnd_steps = 1;
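
For context (not part of the diff), a minimal sketch of an mmap handler using the new VMA flag wrappers rather than writing vma->vm_flags directly; the handler name is an assumption and the flag choices mirror the tcp_mmap() hunk above.

#include <linux/mm.h>

/* Hypothetical ->mmap() implementation, not from this commit. */
static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
        if (vma->vm_flags & (VM_WRITE | VM_EXEC))
                return -EPERM;

        vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
        vm_flags_set(vma, VM_MIXEDMAP);
        return 0;
}
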
diff --combined tools/objtool/check.c
index 94b518c12f9afe3fe577b9937bb4f8521b4e8453,b1a5f658673f0ab37f960b5dbcc6d25a7cecab9a..35374812afdc409ca5d2b0b6df91ae8ad8b34dc7
@@@ -186,7 -186,6 +186,7 @@@ static bool __dead_end_function(struct 
                "snp_abort",
                "stop_this_cpu",
                "usercopy_abort",
 +              "xen_cpu_bringup_again",
                "xen_start_kernel",
        };
  
@@@ -377,7 -376,6 +377,7 @@@ static int decode_instructions(struct o
  
                if (!strcmp(sec->name, ".noinstr.text") ||
                    !strcmp(sec->name, ".entry.text") ||
 +                  !strcmp(sec->name, ".cpuidle.text") ||
                    !strncmp(sec->name, ".text.__x86.", 12))
                        sec->noinstr = true;
  
@@@ -1188,6 -1186,8 +1188,8 @@@ static const char *uaccess_safe_builtin
        "__tsan_atomic64_compare_exchange_val",
        "__tsan_atomic_thread_fence",
        "__tsan_atomic_signal_fence",
+       "__tsan_unaligned_read16",
+       "__tsan_unaligned_write16",
        /* KCOV */
        "write_comp_data",
        "check_kcov_mode",
        "__ubsan_handle_type_mismatch",
        "__ubsan_handle_type_mismatch_v1",
        "__ubsan_handle_shift_out_of_bounds",
 +      "__ubsan_handle_load_invalid_value",
        /* misc */
        "csum_partial_copy_generic",
        "copy_mc_fragile",
@@@ -3377,12 -3376,6 +3379,12 @@@ static inline bool noinstr_call_dest(st
        if (func->sec->noinstr)
                return true;
  
 +      /*
 +       * If the symbol is a static_call trampoline, we can't tell.
 +       */
 +      if (func->static_call_tramp)
 +              return true;
 +
        /*
         * The __ubsan_handle_*() calls are like WARN(), they only happen when
         * something 'BAD' happened. At the risk of taking the machine down,
@@@ -4180,12 -4173,6 +4182,12 @@@ static int validate_noinstr_sections(st
                warnings += validate_unwind_hints(file, sec);
        }
  
 +      sec = find_section_by_name(file->elf, ".cpuidle.text");
 +      if (sec) {
 +              warnings += validate_section(file, sec);
 +              warnings += validate_unwind_hints(file, sec);
 +      }
 +
        return warnings;
  }
  
index 0ebd9cce49b9661c5d9127a915a6b9fcfad91a56,56a29f2de8e665c99f8419288fb7563136d8ed0d..13a6837a0c6bc62902833292e10d055cafdbc6c1
@@@ -26,7 -26,6 +26,7 @@@ TARGETS += fp
  TARGETS += ftrace
  TARGETS += futex
  TARGETS += gpio
 +TARGETS += hid
  TARGETS += intel_pstate
  TARGETS += iommu
  TARGETS += ipc
@@@ -86,7 -85,7 +86,7 @@@ TARGETS += tmpf
  TARGETS += tpm2
  TARGETS += user
  TARGETS += vDSO
- TARGETS += vm
+ TARGETS += mm
  TARGETS += x86
  TARGETS += zram
  #Please keep the TARGETS list alphabetically sorted
@@@ -237,8 -236,8 +237,8 @@@ ifdef INSTALL_PAT
        @# included in the generated runlist.
        for TARGET in $(TARGETS); do \
                BUILD_TARGET=$$BUILD/$$TARGET;  \
 -              [ ! -d $(INSTALL_PATH)/$$TARGET ] && echo "Skipping non-existent dir: $$TARGET" && continue; \
 -              echo -ne "Emit Tests for $$TARGET\n"; \
 +              [ ! -d $(INSTALL_PATH)/$$TARGET ] && printf "Skipping non-existent dir: $$TARGET\n" && continue; \
 +              printf "Emit Tests for $$TARGET\n"; \
                $(MAKE) -s --no-print-directory OUTPUT=$$BUILD_TARGET COLLECTION=$$TARGET \
                        -C $$TARGET emit_tests >> $(TEST_LIST); \
        done;
index 0000000000000000000000000000000000000000,d90cdc06aa59e398b5d3833e6cedf20fda64788d..c31d952cff68fd3681951a9544eeb6c55eca7116
mode 000000,100644..100644
--- /dev/null
@@@ -1,0 -1,185 +1,185 @@@
 -CFLAGS = -Wall -I $(top_srcdir) -I $(top_srcdir)/usr/include $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+ # SPDX-License-Identifier: GPL-2.0
+ # Makefile for mm selftests
+ LOCAL_HDRS += $(selfdir)/mm/local_config.h $(top_srcdir)/mm/gup_test.h
+ include local_config.mk
+ ifeq ($(CROSS_COMPILE),)
+ uname_M := $(shell uname -m 2>/dev/null || echo not)
+ else
+ uname_M := $(shell echo $(CROSS_COMPILE) | grep -o '^[a-z0-9]\+')
+ endif
+ MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+ # Without this, failed build products remain, with up-to-date timestamps,
+ # thus tricking Make (and you!) into believing that All Is Well, in subsequent
+ # make invocations:
+ .DELETE_ON_ERROR:
+ # Avoid accidental wrong builds, due to built-in rules working just a little
+ # bit too well--but not quite as well as required for our situation here.
+ #
+ # In other words, "make userfaultfd" is supposed to fail to build at all,
+ # because this Makefile only supports either "make" (all), or "make /full/path".
+ # However,  the built-in rules, if not suppressed, will pick up CFLAGS and the
+ # initial LDLIBS (but not the target-specific LDLIBS, because those are only
+ # set for the full path target!). This causes it to get pretty far into building
+ # things despite using incorrect values such as an *occasionally* incomplete
+ # LDLIBS.
+ MAKEFLAGS += --no-builtin-rules
++CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+ LDLIBS = -lrt -lpthread
+ TEST_GEN_FILES = cow
+ TEST_GEN_FILES += compaction_test
+ TEST_GEN_FILES += gup_test
+ TEST_GEN_FILES += hmm-tests
+ TEST_GEN_FILES += hugetlb-madvise
+ TEST_GEN_FILES += hugepage-mmap
+ TEST_GEN_FILES += hugepage-mremap
+ TEST_GEN_FILES += hugepage-shm
+ TEST_GEN_FILES += hugepage-vmemmap
+ TEST_GEN_FILES += khugepaged
+ TEST_GEN_PROGS = madv_populate
+ TEST_GEN_FILES += map_fixed_noreplace
+ TEST_GEN_FILES += map_hugetlb
+ TEST_GEN_FILES += map_populate
+ TEST_GEN_FILES += memfd_secret
+ TEST_GEN_FILES += migration
+ TEST_GEN_FILES += mlock-random-test
+ TEST_GEN_FILES += mlock2-tests
+ TEST_GEN_FILES += mrelease_test
+ TEST_GEN_FILES += mremap_dontunmap
+ TEST_GEN_FILES += mremap_test
+ TEST_GEN_FILES += on-fault-limit
+ TEST_GEN_FILES += thuge-gen
+ TEST_GEN_FILES += transhuge-stress
+ TEST_GEN_FILES += userfaultfd
+ TEST_GEN_PROGS += soft-dirty
+ TEST_GEN_PROGS += split_huge_page_test
+ TEST_GEN_FILES += ksm_tests
+ TEST_GEN_PROGS += ksm_functional_tests
+ TEST_GEN_PROGS += mdwe_test
+ ifeq ($(MACHINE),x86_64)
+ CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
+ CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
+ CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
+ VMTARGETS := protection_keys
+ BINARIES_32 := $(VMTARGETS:%=%_32)
+ BINARIES_64 := $(VMTARGETS:%=%_64)
+ ifeq ($(CAN_BUILD_WITH_NOPIE),1)
+ CFLAGS += -no-pie
+ endif
+ ifeq ($(CAN_BUILD_I386),1)
+ TEST_GEN_FILES += $(BINARIES_32)
+ endif
+ ifeq ($(CAN_BUILD_X86_64),1)
+ TEST_GEN_FILES += $(BINARIES_64)
+ endif
+ else
+ ifneq (,$(findstring $(MACHINE),ppc64))
+ TEST_GEN_FILES += protection_keys
+ endif
+ endif
+ ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64))
+ TEST_GEN_FILES += va_128TBswitch
+ TEST_GEN_FILES += virtual_address_range
+ TEST_GEN_FILES += write_to_hugetlbfs
+ endif
+ TEST_PROGS := run_vmtests.sh
+ TEST_FILES := test_vmalloc.sh
+ TEST_FILES += test_hmm.sh
+ TEST_FILES += va_128TBswitch.sh
+ include ../lib.mk
+ $(OUTPUT)/cow: vm_util.c
+ $(OUTPUT)/khugepaged: vm_util.c
+ $(OUTPUT)/ksm_functional_tests: vm_util.c
+ $(OUTPUT)/madv_populate: vm_util.c
+ $(OUTPUT)/soft-dirty: vm_util.c
+ $(OUTPUT)/split_huge_page_test: vm_util.c
+ $(OUTPUT)/userfaultfd: vm_util.c
+ ifeq ($(MACHINE),x86_64)
+ BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
+ BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
+ define gen-target-rule-32
+ $(1) $(1)_32: $(OUTPUT)/$(1)_32
+ .PHONY: $(1) $(1)_32
+ endef
+ define gen-target-rule-64
+ $(1) $(1)_64: $(OUTPUT)/$(1)_64
+ .PHONY: $(1) $(1)_64
+ endef
+ ifeq ($(CAN_BUILD_I386),1)
+ $(BINARIES_32): CFLAGS += -m32 -mxsave
+ $(BINARIES_32): LDLIBS += -lrt -ldl -lm
+ $(BINARIES_32): $(OUTPUT)/%_32: %.c
+       $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+ $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-32,$(t))))
+ endif
+ ifeq ($(CAN_BUILD_X86_64),1)
+ $(BINARIES_64): CFLAGS += -m64 -mxsave
+ $(BINARIES_64): LDLIBS += -lrt -ldl
+ $(BINARIES_64): $(OUTPUT)/%_64: %.c
+       $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+ $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-64,$(t))))
+ endif
+ # x86_64 users should be encouraged to install 32-bit libraries
+ ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
+ all: warn_32bit_failure
+ warn_32bit_failure:
+       @echo "Warning: you seem to have a broken 32-bit build" 2>&1;           \
+       echo  "environment. This will reduce test coverage of 64-bit" 2>&1;     \
+       echo  "kernels. If you are using a Debian-like distribution," 2>&1;     \
+       echo  "try:"; 2>&1;                                                     \
+       echo  "";                                                               \
+       echo  "  apt-get install gcc-multilib libc6-i386 libc6-dev-i386";       \
+       echo  "";                                                               \
+       echo  "If you are using a Fedora-like distribution, try:";              \
+       echo  "";                                                               \
+       echo  "  yum install glibc-devel.*i686";                                \
+       exit 0;
+ endif
+ endif
+ # cow_EXTRA_LIBS may get set in local_config.mk, or it may be left empty.
+ $(OUTPUT)/cow: LDLIBS += $(COW_EXTRA_LIBS)
+ $(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap
+ $(OUTPUT)/ksm_tests: LDLIBS += -lnuma
+ $(OUTPUT)/migration: LDLIBS += -lnuma
+ local_config.mk local_config.h: check_config.sh
+       /bin/sh ./check_config.sh $(CC)
+ EXTRA_CLEAN += local_config.mk local_config.h
+ ifeq ($(COW_EXTRA_LIBS),)
+ all: warn_missing_liburing
+ warn_missing_liburing:
+       @echo ; \
+       echo "Warning: missing liburing support. Some COW tests will be skipped." ; \
+       echo
+ endif