Memory Resource Controller
==========================
-NOTE:
+.. caution::
This document is hopelessly outdated and it asks for a complete
rewrite. It still contains a useful information so we are keeping it
here but make sure to check the current code if you need a deeper
understanding.
-NOTE:
+.. note::
The Memory Resource Controller has generically been referred to as the
memory controller in this document. Do not confuse memory controller
used here with the memory controller that is used in hardware.
-(For editors) In this document:
+.. hint::
When we mention a cgroup (cgroupfs's directory) with memory controller,
we call it "memory cgroup". When you see git-log and source code, you'll
see patch's title and function names tend to use "memcg".
=============================================
The memory controller isolates the memory behaviour of a group of tasks
-from the rest of the system. The article on LWN [12] mentions some probable
+from the rest of the system. The article on LWN [12]_ mentions some probable
uses of the memory controller. The memory controller can be used to
a. Isolate an application or a group of applications
- Root cgroup has no limit controls.
Kernel memory support is a work in progress, and the current version provides
- basically functionality. (See Section 2.7)
+ basically functionality. (See :ref:`section 2.7
+ <cgroup-v1-memory-kernel-extension>`)
Brief summary of control files.
memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate set/show controls of moving charges
+ This knob is deprecated and shouldn't be
+ used.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
==========
The memory controller has a long history. A request for comments for the memory
-controller was posted by Balbir Singh [1]. At the time the RFC was posted
+controller was posted by Balbir Singh [1]_. At the time the RFC was posted
there were several implementations for memory control. The goal of the
RFC was to build consensus and agreement for the minimal features required
-for memory control. The first RSS controller was posted by Balbir Singh[2]
-in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
-RSS controller. At OLS, at the resource management BoF, everyone suggested
-that we handle both page cache and RSS together. Another request was raised
-to allow user space handling of OOM. The current memory controller is
+for memory control. The first RSS controller was posted by Balbir Singh [2]_
+in Feb 2007. Pavel Emelianov [3]_ [4]_ [5]_ has since posted three versions
+of the RSS controller. At OLS, at the resource management BoF, everyone
+suggested that we handle both page cache and RSS together. Another request was
+raised to allow user space handling of OOM. The current memory controller is
at version 6; it combines both mapped (RSS) and unmapped Page
-Cache Control [11].
+Cache Control [11]_.
2. Memory Control
=================
2.2. Accounting
---------------
-::
+.. code-block::
+ :caption: Figure 1: Hierarchy of Accounting
+--------------------+
| mem_cgroup |
| | | |
+---------------+ +---------------+
- (Figure 1: Hierarchy of Accounting)
Figure 1 shows the important aspects of the controller
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
-But see section 8.2: when moving a task to another cgroup, its pages may
-be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
+But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
+task to another cgroup, its pages may be recharged to the new cgroup, if
+move_charge_at_immigrate has been chosen.
2.4 Swap Extension
--------------------------------------
By using the memsw limit, you can avoid system OOM which can be caused by swap
shortage.
-**why 'memory+swap' rather than swap**
+2.4.1 why 'memory+swap' rather than swap
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
affecting global LRU, memory+swap limit is better than just limiting swap from
an OS point of view.
-**What happens when a cgroup hits memory.memsw.limit_in_bytes**
+2.4.2. What happens when a cgroup hits memory.memsw.limit_in_bytes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup. Then, swap-out will not be done by cgroup routine and file
to reclaim memory from the cgroup so as to make space for the new
pages that the cgroup has touched. If the reclaim is unsuccessful,
an OOM routine is invoked to select and kill the bulkiest task in the
-cgroup. (See 10. OOM Control below.)
+cgroup. (See :ref:`10. OOM Control <cgroup-v1-memory-oom-control>` below.)
The reclaim algorithm has not been modified for cgroups, except that
pages that are selected for reclaiming come from the per-cgroup LRU
list.
-NOTE:
- Reclaim does not work for the root cgroup, since we cannot set any
- limits on the root cgroup.
+.. note::
+ Reclaim does not work for the root cgroup, since we cannot set any
+ limits on the root cgroup.
-Note2:
- When panic_on_oom is set to "2", the whole system will panic.
+.. note::
+ When panic_on_oom is set to "2", the whole system will panic.
When oom event notifier is registered, event will be delivered.
-(See oom_control section)
+(See :ref:`oom_control <cgroup-v1-memory-oom-control>` section)
2.6 Locking
-----------
-Lock order is as follows:
+Lock order is as follows::
Page lock (PG_locked bit of page->flags)
mm->page_table_lock or split pte_lock
lruvec->lru_lock; PG_lru bit of page->flags is cleared before
isolating a page from its LRU under lruvec->lru_lock.
+.. _cgroup-v1-memory-kernel-extension:
+
2.7 Kernel Memory Extension
-----------------------------------------------
never greater than the total memory, and freely set U at the cost of his
QoS.
-WARNING:
- In the current implementation, memory reclaim will NOT be
- triggered for a cgroup when it hits K while staying below U, which makes
- this setup impractical.
+ .. warning::
+ In the current implementation, memory reclaim will NOT be triggered for
+ a cgroup when it hits K while staying below U, which makes this setup
+ impractical.
U != 0, K >= U:
Since kmem charges will also be fed to the user counter and reclaim will be
3. User Interface
=================
-3.0. Configuration
-------------------
-
-a. Enable CONFIG_CGROUPS
-b. Enable CONFIG_MEMCG
-
-3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
--------------------------------------------------------------------
+To use the user interface:
-::
+1. Enable CONFIG_CGROUPS and CONFIG_MEMCG options
+2. Prepare the cgroups (see :ref:`Why are cgroups needed?
+ <cgroups-why-needed>` for the background information)::
# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory
-3.2. Make the new group and move bash into it::
+3. Make the new group and move bash into it::
# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks
-Since now we're in the 0 cgroup, we can alter the memory limit::
+4. Since now we're in the 0 cgroup, we can alter the memory limit::
# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
-NOTE:
- We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
- mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
- Gibibytes.)
+ The limit can now be queried::
-NOTE:
- We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
+ # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
+ 4194304
-NOTE:
- We cannot set limits on the root cgroup any more.
+.. note::
+ We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
+ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
+ Gibibytes.)
-::
+.. note::
+ We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
+
+.. note::
+ We cannot set limits on the root cgroup any more.
- # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
- 4194304
We can check the usage::
But the above two are testing extreme situations.
Trying usual test under memory controller is always helpful.
+.. _cgroup-v1-memory-test-troubleshoot:
+
4.1 Troubleshooting
-------------------
A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
some of the pages cached in the cgroup (page cache pages).
-To know what happens, disabling OOM_Kill as per "10. OOM Control" (below) and
-seeing what happens will be helpful.
+To know what happens, disabling OOM_Kill as per :ref:`"10. OOM Control"
+<cgroup-v1-memory-oom-control>` (below) and seeing what happens will be
+helpful.
+
+.. _cgroup-v1-memory-test-task-migration:
4.2 Task migration
------------------
reclaimed.
You can move charges of a task along with task migration.
-See 8. "Move charges at task migration"
+See :ref:`8. "Move charges at task migration" <cgroup-v1-memory-move-charges>`
4.3 Removing a cgroup
---------------------
-A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
-cgroup might have some charge associated with it, even though all
-tasks have migrated away from it. (because we charge against pages, not
-against tasks.)
+A cgroup can be removed by rmdir, but as discussed in :ref:`sections 4.1
+<cgroup-v1-memory-test-troubleshoot>` and :ref:`4.2
+<cgroup-v1-memory-test-task-migration>`, a cgroup might have some charge
+associated with it, even though all tasks have migrated away from it. (because
+we charge against pages, not against tasks.)
We move the stats to parent, and no change on the charge except uncharging
from the child.
5.2 stat file
-------------
-memory.stat file includes following statistics
-
-per-memory cgroup local status
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-=============== ===============================================================
-cache # of bytes of page cache memory.
-rss # of bytes of anonymous and swap cache memory (includes
- transparent hugepages).
-rss_huge # of bytes of anonymous transparent hugepages.
-mapped_file # of bytes of mapped file (includes tmpfs/shmem)
-pgpgin # of charging events to the memory cgroup. The charging
- event happens each time a page is accounted as either mapped
- anon page(RSS) or cache page(Page Cache) to the cgroup.
-pgpgout # of uncharging events to the memory cgroup. The uncharging
- event happens each time a page is unaccounted from the cgroup.
-swap # of bytes of swap usage
-dirty # of bytes that are waiting to get written back to the disk.
-writeback # of bytes of file/anon cache that are queued for syncing to
- disk.
-inactive_anon # of bytes of anonymous and swap cache memory on inactive
- LRU list.
-active_anon # of bytes of anonymous and swap cache memory on active
- LRU list.
-inactive_file # of bytes of file-backed memory and MADV_FREE anonymous memory(
- LazyFree pages) on inactive LRU list.
-active_file # of bytes of file-backed memory on active LRU list.
-unevictable # of bytes of memory that cannot be reclaimed (mlocked etc).
-=============== ===============================================================
-
-status considering hierarchy (see memory.use_hierarchy settings)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-========================= ===================================================
-hierarchical_memory_limit # of bytes of memory limit with regard to hierarchy
- under which the memory cgroup is
-hierarchical_memsw_limit # of bytes of memory+swap limit with regard to
- hierarchy under which memory cgroup is.
-
-total_<counter> # hierarchical version of <counter>, which in
- addition to the cgroup's own value includes the
- sum of all hierarchical children's values of
- <counter>, i.e. total_cache
-========================= ===================================================
-
-The following additional stats are dependent on CONFIG_DEBUG_VM
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-========================= ========================================
-recent_rotated_anon VM internal parameter. (see mm/vmscan.c)
-recent_rotated_file VM internal parameter. (see mm/vmscan.c)
-recent_scanned_anon VM internal parameter. (see mm/vmscan.c)
-recent_scanned_file VM internal parameter. (see mm/vmscan.c)
-========================= ========================================
-
-Memo:
+memory.stat file includes following statistics:
+
+ * per-memory cgroup local status
+
+ =============== ===============================================================
+ cache # of bytes of page cache memory.
+ rss # of bytes of anonymous and swap cache memory (includes
+ transparent hugepages).
+ rss_huge # of bytes of anonymous transparent hugepages.
+ mapped_file # of bytes of mapped file (includes tmpfs/shmem)
+ pgpgin # of charging events to the memory cgroup. The charging
+ event happens each time a page is accounted as either mapped
+ anon page(RSS) or cache page(Page Cache) to the cgroup.
+ pgpgout # of uncharging events to the memory cgroup. The uncharging
+ event happens each time a page is unaccounted from the
+ cgroup.
+ swap # of bytes of swap usage
+ dirty # of bytes that are waiting to get written back to the disk.
+ writeback # of bytes of file/anon cache that are queued for syncing to
+ disk.
+ inactive_anon # of bytes of anonymous and swap cache memory on inactive
+ LRU list.
+ active_anon # of bytes of anonymous and swap cache memory on active
+ LRU list.
+ inactive_file # of bytes of file-backed memory and MADV_FREE anonymous
+ memory (LazyFree pages) on inactive LRU list.
+ active_file # of bytes of file-backed memory on active LRU list.
+ unevictable # of bytes of memory that cannot be reclaimed (mlocked etc).
+ =============== ===============================================================
+
+ * status considering hierarchy (see memory.use_hierarchy settings):
+
+ ========================= ===================================================
+ hierarchical_memory_limit # of bytes of memory limit with regard to
+ hierarchy
+ under which the memory cgroup is
+ hierarchical_memsw_limit # of bytes of memory+swap limit with regard to
+ hierarchy under which memory cgroup is.
+
+ total_<counter> # hierarchical version of <counter>, which in
+ addition to the cgroup's own value includes the
+ sum of all hierarchical children's values of
+ <counter>, i.e. total_cache
+ ========================= ===================================================
+
+ * additional vm parameters (depends on CONFIG_DEBUG_VM):
+
+ ========================= ========================================
+ recent_rotated_anon VM internal parameter. (see mm/vmscan.c)
+ recent_rotated_file VM internal parameter. (see mm/vmscan.c)
+ recent_scanned_anon VM internal parameter. (see mm/vmscan.c)
+ recent_scanned_file VM internal parameter. (see mm/vmscan.c)
+ ========================= ========================================
+
+.. hint::
recent_rotated means recent frequency of LRU rotation.
recent_scanned means recent # of scans to LRU.
showing for better debug please see the code for meanings.
-Note:
+.. note::
Only anonymous and swap cache memory is listed as part of 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup.
# echo 1G > memory.soft_limit_in_bytes
-NOTE1:
+.. note::
Soft limits take effect over a long period of time, since they involve
reclaiming memory for balancing between memory cgroups
-NOTE2:
+
+.. note::
It is recommended to set the soft limit always below the hard limit,
otherwise the hard limit will take precedence.
- 8. Move charges at task migration
- =================================
+.. _cgroup-v1-memory-move-charges:
+
+ 8. Move charges at task migration (DEPRECATED!)
+ ===============================================
+
+ THIS IS DEPRECATED!
+
+ It's expensive and unreliable! It's better practice to launch workload
+ tasks directly from inside their target cgroup. Use dedicated workload
+ cgroups to allow fine-grained policy adjustments without having to
+ move physical pages between control domains.
Users can move charges associated with a task along with task migration, that
is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
# echo (some positive value) > memory.move_charge_at_immigrate
-Note:
+.. note::
Each bits of move_charge_at_immigrate has its own meaning about what type
- of charges should be moved. See 8.2 for details.
-Note:
+ of charges should be moved. See :ref:`section 8.2
+ <cgroup-v1-memory-movable-charges>` for details.
+
+.. note::
Charges are moved only when you move mm->owner, in other words,
a leader of a thread group.
-Note:
+
+.. note::
If we cannot find enough space for the task in the destination cgroup, we
try to make space by reclaiming memory. Task migration may fail if we
cannot make enough space.
-Note:
+
+.. note::
It can take several seconds if you move charges much.
And if you want disable it again::
# echo 0 > memory.move_charge_at_immigrate
+.. _cgroup-v1-memory-movable-charges:
+
8.2 Type of charges which can be moved
--------------------------------------
It's applicable for root and non-root cgroup.
+.. _cgroup-v1-memory-oom-control:
+
10. OOM Control
===============
References
==========
-1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
-2. Singh, Balbir. Memory Controller (RSS Control),
+.. [1] Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
+.. [2] Singh, Balbir. Memory Controller (RSS Control),
http://lwn.net/Articles/222762/
-3. Emelianov, Pavel. Resource controllers based on process cgroups
+.. [3] Emelianov, Pavel. Resource controllers based on process cgroups
-4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
+.. [4] Emelianov, Pavel. RSS controller based on process cgroups (v2)
-5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
+.. [5] Emelianov, Pavel. RSS controller based on process cgroups (v3)
+
6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
subsystem (v3), http://lwn.net/Articles/235534/
10. Singh, Balbir. Memory controller v6 test results,
https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop
-11. Singh, Balbir. Memory controller introduction (v6),
- https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop
-12. Corbet, Jonathan, Controlling memory use in cgroups,
- http://lwn.net/Articles/243795/
+
+.. [11] Singh, Balbir. Memory controller introduction (v6),
+ https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop
+.. [12] Corbet, Jonathan, Controlling memory use in cgroups,
+ http://lwn.net/Articles/243795/
To let sysadmins enable or disable it and tune for the given system,
DAMON_RECLAIM utilizes module parameters. That is, you can put
``damon_reclaim.<parameter>=<value>`` on the kernel boot command line or write
-proper values to ``/sys/modules/damon_reclaim/parameters/<parameter>`` files.
+proper values to ``/sys/module/damon_reclaim/parameters/<parameter>`` files.
Below are the description of each parameter.
against. That is, DAMON_RECLAIM will find cold memory regions in this region
and reclaims. By default, biggest System RAM is used as the region.
+ skip_anon
+ ---------
+
+ Skip anonymous pages reclamation.
+
+ If this parameter is set as ``Y``, DAMON_RECLAIM does not reclaim anonymous
+ pages. By default, ``N``.
+
+
kdamond_pid
-----------
do nothing again, so that we can fall back to the LRU-list based page
granularity reclamation. ::
- # cd /sys/modules/damon_reclaim/parameters
+ # cd /sys/module/damon_reclaim/parameters
# echo 30000000 > min_age
# echo $((1 * 1024 * 1024 * 1024)) > quota_sz
# echo 1000 > quota_reset_interval_ms
-.. _hugetlbpage:
-
=============
HugeTLB Pages
=============
Note: When the feature of freeing unused vmemmap pages associated with each
hugetlb page is enabled, we can fail to free the huge pages triggered by
-the user when ths system is under memory pressure. Please try again later.
+the user when the system is under memory pressure. Please try again later.
Pages that are used as huge pages are reserved inside the kernel and cannot
be used for other purposes. Huge pages cannot be swapped out under
resulting effect on persistent huge page allocation is as follows:
#. Regardless of mempolicy mode [see
- :ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`],
+ Documentation/admin-guide/mm/numa_memory_policy.rst],
persistent huge pages will be distributed across the node or nodes
specified in the mempolicy as if "interleave" had been specified.
However, if a node in the policy does not contain sufficient contiguous
.. _map_hugetlb:
``map_hugetlb``
- see tools/testing/selftests/vm/map_hugetlb.c
+ see tools/testing/selftests/mm/map_hugetlb.c
``hugepage-shm``
- see tools/testing/selftests/vm/hugepage-shm.c
+ see tools/testing/selftests/mm/hugepage-shm.c
``hugepage-mmap``
- see tools/testing/selftests/vm/hugepage-mmap.c
+ see tools/testing/selftests/mm/hugepage-mmap.c
The `libhugetlbfs`_ library provides a wide range of userspace tools
to help with huge page usability, environment setup, and control.
-.. _idle_page_tracking:
-
==================
Idle Page Tracking
==================
are not reclaimable, he or she can filter them out using
``/proc/kpageflags``.
- The page-types tool in the tools/vm directory can be used to assist in this.
+ The page-types tool in the tools/mm directory can be used to assist in this.
If the tool is run initially with the appropriate option, it will mark all the
queried pages as idle. Subsequent runs of the tool can then show which pages have
their idle flag cleared in the interim.
-See :ref:`Documentation/admin-guide/mm/pagemap.rst <pagemap>` for more
-information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and
-``/proc/kpagecgroup``.
+See Documentation/admin-guide/mm/pagemap.rst for more information about
+``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``.
.. _impl_details:
- =============
-.. _numaperf:
-
+ =======================
+ NUMA Memory Performance
+ =======================
+
NUMA Locality
=============
IO initiators such as GPUs and NICs. Unlike access class 0, only
nodes containing CPUs are considered.
- ================
NUMA Performance
================
Access class 1 takes the same form but only includes values for CPU to
memory activity.
- ==========
NUMA Cache
==========
The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.
- ========
See Also
========
-.. _pagemap:
-
=============================
Examining Process Page Tables
=============================
* Bits 0-4 swap type if swapped
* Bits 5-54 swap offset if swapped
* Bit 55 pte is soft-dirty (see
- :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
+ Documentation/admin-guide/mm/soft-dirty.rst)
* Bit 56 page exclusively mapped (since 4.2)
* Bit 57 pte is uffd-wp write-protected (since 5.13) (see
- :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
+ Documentation/admin-guide/mm/userfaultfd.rst)
* Bits 58-60 zero
* Bit 61 page is file-page or shared-anon (since 3.5)
* Bit 62 page swapped
* ``/proc/kpagecount``. This file contains a 64-bit count of the number of
times each page is mapped, indexed by PFN.
- The page-types tool in the tools/vm directory can be used to query the
+ The page-types tool in the tools/mm directory can be used to query the
number of times a page is mapped.
* ``/proc/kpageflags``. This file contains a 64-bit set of flags for each
A compound page with order N consists of 2^N physically contiguous pages.
A compound page with order 2 takes the form of "HTTT", where H donates its
head page and T donates its tail page(s). The major consumers of compound
- pages are hugeTLB pages
- (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
+ pages are hugeTLB pages (Documentation/admin-guide/mm/hugetlbpage.rst),
the SLUB etc. memory allocators and various device drivers.
However in this interface, only huge/giga pages are made visible
to end users.
Zero page for pfn_zero or huge_zero page.
25 - IDLE
The page has not been accessed since it was marked idle (see
- :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
+ Documentation/admin-guide/mm/idle_page_tracking.rst).
Note that this flag may be stale in case the page was accessed via
a PTE. To make sure the flag is up-to-date one has to read
``/sys/kernel/mm/page_idle/bitmap`` first.
14 - SWAPBACKED
The page is backed by swap/RAM.
- The page-types tool in the tools/vm directory can be used to query the
+ The page-types tool in the tools/mm directory can be used to query the
above flags.
Using pagemap to do something useful
-.. _balance:
-
================
Memory Balancing
================
- Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+ Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
well as for non __GFP_IO allocations.
The first reason why a caller may avoid reclaim is that the caller can not
-.. _highmem:
-
====================
High Memory Handling
====================
It can be invoked from any context (including interrupts) but the mappings
can only be used in the context which acquired them.
- This function should be preferred, where feasible, over all the others.
+ This function should always be used, whereas kmap_atomic() and kmap() have
+ been deprecated.
These mappings are thread-local and CPU-local, meaning that the mapping
can only be accessed from within this thread and the thread is bound to the
for pages which are known to not come from ZONE_HIGHMEM. However, it is
always safe to use kmap_local_page() / kunmap_local().
- While it is significantly faster than kmap(), for the higmem case it
+ While it is significantly faster than kmap(), for the highmem case it
comes with restrictions about the pointers validity. Contrary to kmap()
mappings, the local mappings are only valid in the context of the caller
and cannot be handed to other contexts. This implies that users must
(included in the "Functions" section) for details on how to manage nested
mappings.
- * kmap_atomic(). This permits a very short duration mapping of a single
- page. Since the mapping is restricted to the CPU that issued it, it
- performs well, but the issuing task is therefore required to stay on that
- CPU until it has finished, lest some other task displace its mappings.
+ * kmap_atomic(). This function has been deprecated; use kmap_local_page().
+
+ NOTE: Conversions to kmap_local_page() must take care to follow the mapping
+ restrictions imposed on kmap_local_page(). Furthermore, the code between
+ calls to kmap_atomic() and kunmap_atomic() may implicitly depend on the side
+ effects of atomic mappings, i.e. disabling page faults or preemption, or both.
+ In that case, explicit calls to pagefault_disable() or preempt_disable() or
+ both must be made in conjunction with the use of kmap_local_page().
+
+ [Legacy documentation]
+
+ This permits a very short duration mapping of a single page. Since the
+ mapping is restricted to the CPU that issued it, it performs well, but
+ the issuing task is therefore required to stay on that CPU until it has
+ finished, lest some other task displace its mappings.
kmap_atomic() may also be used by interrupt contexts, since it does not
sleep and the callers too may not sleep until after kunmap_atomic() is
It is assumed that k[un]map_atomic() won't fail.
- * kmap(). This should be used to make short duration mapping of a single
- page with no restrictions on preemption or migration. It comes with an
- overhead as mapping space is restricted and protected by a global lock
- for synchronization. When mapping is no longer needed, the address that
- the page was mapped to must be released with kunmap().
+ * kmap(). This function has been deprecated; use kmap_local_page().
+
+ NOTE: Conversions to kmap_local_page() must take care to follow the mapping
+ restrictions imposed on kmap_local_page(). In particular, it is necessary to
+ make sure that the kernel virtual memory pointer is only valid in the thread
+ that obtained it.
+
+ [Legacy documentation]
+
+ This should be used to make short duration mapping of a single page with no
+ restrictions on preemption or migration. It comes with an overhead as mapping
+ space is restricted and protected by a global lock for synchronization. When
+ mapping is no longer needed, the address that the page was mapped to must be
+ released with kunmap().
Mapping changes must be propagated across all the CPUs. kmap() also
requires global TLB invalidation when the kmap's pool wraps and it might
-.. _hugetlbfs_reserve:
-
=====================
Hugetlbfs Reservation
=====================
Overview
========
-Huge pages as described at :ref:`hugetlbpage` are typically
+Huge pages as described at Documentation/mm/hugetlbpage.rst are typically
preallocated for application use. These huge pages are instantiated in a
task's address space at page fault time if the VMA indicates huge pages are
to be used. If no huge page exists at page fault time, the task is sent
Reservations are consumed when huge pages associated with the reservations
are allocated and instantiated in the corresponding mapping. The allocation
- is performed within the routine alloc_huge_page()::
+ is performed within the routine alloc_hugetlb_folio()::
- struct page *alloc_huge_page(struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
- alloc_huge_page is passed a VMA pointer and a virtual address, so it can
+ alloc_hugetlb_folio is passed a VMA pointer and a virtual address, so it can
consult the reservation map to determine if a reservation exists. In addition,
- alloc_huge_page takes the argument avoid_reserve which indicates reserves
+ alloc_hugetlb_folio takes the argument avoid_reserve which indicates reserves
should not be used even if it appears they have been set aside for the
specified address. The avoid_reserve argument is most often used in the case
of Copy on Write and Page Migration where additional copies of an existing
exists and can be used for the allocation, the routine dequeue_huge_page_vma()
is called. This routine takes two arguments related to reservations:
- - avoid_reserve, this is the same value/argument passed to alloc_huge_page()
+ - avoid_reserve, this is the same value/argument passed to
+ alloc_hugetlb_folio().
- chg, even though this argument is of type long only the values 0 or 1 are
passed to dequeue_huge_page_vma. If the value is 0, it indicates a
reservation exists (see the section "Memory Policy and Reservations" for
reservation based adjustments as above will be made: SetPagePrivate(page) and
resv_huge_pages--.
- After obtaining a new huge page, (page)->private is set to the value of
- the subpool associated with the page if it exists. This will be used for
- subpool accounting when the page is freed.
+ After obtaining a new hugetlb folio, (folio)->_hugetlb_subpool is set to the
+ value of the subpool associated with the page if it exists. This will be used
+ for subpool accounting when the folio is freed.
The routine vma_commit_reservation() is then called to adjust the reserve
map based on the consumption of the reservation. In general, this involves
entry must be created.
It is possible that the reserve map could have been changed between the call
- to vma_needs_reservation() at the beginning of alloc_huge_page() and the
- call to vma_commit_reservation() after the page was allocated. This would
+ to vma_needs_reservation() at the beginning of alloc_hugetlb_folio() and the
+ call to vma_commit_reservation() after the folio was allocated. This would
be possible if hugetlb_reserve_pages was called for the same page in a shared
mapping. In such cases, the reservation count and subpool free page count
will be off by one. This rare condition can be identified by comparing the
-.. _page_owner:
-
==================================================
page owner: Tracking about who allocated each page
==================================================
Although it doesn't mean that they have the right owner information,
at least, we can tell whether the page is allocated or not,
more accurately. On 2GB memory x86-64 VM box, 13343 early allocated pages
-are catched and marked, although they are mostly allocated from struct
+are caught and marked, although they are mostly allocated from struct
page extension feature. Anyway, after that, no page is left in
un-tracking state.
1) Build user-space helper::
- cd tools/vm
+ cd tools/mm
make page_owner_sort
2) Enable page owner: add "page_owner=on" to boot cmdline.
at alloc_ts timestamp of the page when it was allocated
ator allocator memory allocator for pages
- For --curl option:
+ For --cull option:
KEY LONG DESCRIPTION
p pid process ID
-.. _slub:
-
==========================
Short users guide for SLUB
==========================
running the command. ``slabinfo`` can be compiled with
::
- gcc -o slabinfo tools/vm/slabinfo.c
+ gcc -o slabinfo tools/mm/slabinfo.c
Some of the modes of operation of ``slabinfo`` require that slub debugging
be enabled on the command line. F.e. no tracking information will be
-.. _transhuge:
-
============================
Transparent Hugepage Support
============================
Refcounting on THP is mostly consistent with refcounting on other compound
pages:
- - get_page()/put_page() and GUP operate on head page's ->_refcount.
+ - get_page()/put_page() and GUP operate on the folio->_refcount.
- ->_refcount in tail pages is always zero: get_page_unless_zero() never
succeeds on tail pages.
- - map/unmap of PMD entry for the whole compound page increment/decrement
- ->compound_mapcount, stored in the first tail page of the compound page;
- and also increment/decrement ->subpages_mapcount (also in the first tail)
- by COMPOUND_MAPPED when compound_mapcount goes from -1 to 0 or 0 to -1.
+ - map/unmap of a PMD entry for the whole THP increment/decrement
+ folio->_entire_mapcount and also increment/decrement
+ folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
+ goes from -1 to 0 or 0 to -1.
- - map/unmap of sub-pages with PTE entry increment/decrement ->_mapcount
- on relevant sub-page of the compound page, and also increment/decrement
- ->subpages_mapcount, stored in first tail page of the compound page, when
- _mapcount goes from -1 to 0 or 0 to -1: counting sub-pages mapped by PTE.
+ - map/unmap of individual pages with PTE entry increment/decrement
+ page->_mapcount and also increment/decrement folio->_nr_pages_mapped
+ when page->_mapcount goes from -1 to 0 or 0 to -1 as this counts
+ the number of pages mapped by PTE.
split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
Note that split_huge_pmd() doesn't have any limitations on refcounting:
pmd can be split at any point and never fails.
- Partial unmap and deferred_split_huge_page()
- ============================================
+ Partial unmap and deferred_split_folio()
+ ========================================
Unmapping part of THP (with munmap() or other way) is not going to free
memory immediately. Instead, we detect that a subpage of THP is not in use
counterproductive since in many cases partial unmap happens during exit(2) if
a THP crosses a VMA boundary.
- The function deferred_split_huge_page() is used to queue a page for splitting.
+ The function deferred_split_folio() is used to queue a folio for splitting.
The splitting itself will happen when we get memory pressure via shrinker
interface.
-.. _unevictable_lru:
-
==============================
Unevictable LRU Infrastructure
==============================
This document describes the Linux memory manager's "Unevictable LRU"
infrastructure and the use of this to manage several types of "unevictable"
- pages.
+ folios.
The document attempts to provide the overall rationale behind this mechanism
and the rationale for some of the design decisions that drove the
===================
The Unevictable LRU facility adds an additional LRU list to track unevictable
- pages and to hide these pages from vmscan. This mechanism is based on a patch
- by Larry Woodman of Red Hat to address several scalability problems with page
+ folios and to hide these folios from vmscan. This mechanism is based on a patch
+ by Larry Woodman of Red Hat to address several scalability problems with folio
reclaim in Linux. The problems have been observed at customer sites on large
memory x86_64 systems.
unevictable, either by definition or by circumstance, in the future.
- The Unevictable LRU Page List
- -----------------------------
+ The Unevictable LRU Folio List
+ ------------------------------
- The Unevictable LRU page list is a lie. It was never an LRU-ordered list, but a
- companion to the LRU-ordered anonymous and file, active and inactive page lists;
- and now it is not even a page list. But following familiar convention, here in
- this document and in the source, we often imagine it as a fifth LRU page list.
+ The Unevictable LRU folio list is a lie. It was never an LRU-ordered
+ list, but a companion to the LRU-ordered anonymous and file, active and
+ inactive folio lists; and now it is not even a folio list. But following
+ familiar convention, here in this document and in the source, we often
+ imagine it as a fifth LRU folio list.
The Unevictable LRU infrastructure consists of an additional, per-node, LRU list
- called the "unevictable" list and an associated page flag, PG_unevictable, to
- indicate that the page is being managed on the unevictable list.
+ called the "unevictable" list and an associated folio flag, PG_unevictable, to
+ indicate that the folio is being managed on the unevictable list.
The PG_unevictable flag is analogous to, and mutually exclusive with, the
- PG_active flag in that it indicates on which LRU list a page resides when
+ PG_active flag in that it indicates on which LRU list a folio resides when
PG_lru is set.
- The Unevictable LRU infrastructure maintains unevictable pages as if they were
+ The Unevictable LRU infrastructure maintains unevictable folios as if they were
on an additional LRU list for a few reasons:
- (1) We get to "treat unevictable pages just like we treat other pages in the
+ (1) We get to "treat unevictable folios just like we treat other folios in the
system - which means we get to use the same code to manipulate them, the
same code to isolate them (for migrate, etc.), the same code to keep track
of the statistics, etc..." [Rik van Riel]
- (2) We want to be able to migrate unevictable pages between nodes for memory
+ (2) We want to be able to migrate unevictable folios between nodes for memory
defragmentation, workload management and memory hotplug. The Linux kernel
- can only migrate pages that it can successfully isolate from the LRU
+ can only migrate folios that it can successfully isolate from the LRU
lists (or "Movable" pages: outside of consideration here). If we were to
- maintain pages elsewhere than on an LRU-like list, where they can be
- detected by isolate_lru_page(), we would prevent their migration.
+ maintain folios elsewhere than on an LRU-like list, where they can be
+ detected by folio_isolate_lru(), we would prevent their migration.
- The unevictable list does not differentiate between file-backed and anonymous,
- swap-backed pages. This differentiation is only important while the pages are,
- in fact, evictable.
+ The unevictable list does not differentiate between file-backed and
+ anonymous, swap-backed folios. This differentiation is only important
+ while the folios are, in fact, evictable.
The unevictable list benefits from the "arrayification" of the per-node LRU
lists and statistics originally proposed and posted by Christoph Lameter.
Detecting Unevictable Pages
---------------------------
- The function page_evictable() in mm/internal.h determines whether a page is
+ The function folio_evictable() in mm/internal.h determines whether a folio is
evictable or not using the query function outlined above [see section
:ref:`Marking address spaces unevictable <mark_addr_space_unevict>`]
to check the AS_UNEVICTABLE flag.
might be), the lock action (e.g. SHM_LOCK) can be lazy, and need not populate
the page tables for the region as does, for example, mlock(), nor need it make
any special effort to push any pages in the SHM_LOCK'd area to the unevictable
- list. Instead, vmscan will do this if and when it encounters the pages during
+ list. Instead, vmscan will do this if and when it encounters the folios during
a reclamation scan.
On an unlock action (such as SHM_UNLOCK), the unlocker (e.g. shmctl()) must scan
the pages are also "rescued" from the unevictable list in the process of
freeing them.
- page_evictable() also checks for mlocked pages by testing an additional page
- flag, PG_mlocked (as wrapped by PageMlocked()), which is set when a page is
- faulted into a VM_LOCKED VMA, or found in a VMA being VM_LOCKED.
+ folio_evictable() also checks for mlocked folios by calling
+ folio_test_mlocked(), which is set when a folio is faulted into a
+ VM_LOCKED VMA, or found in a VMA being VM_LOCKED.
- Vmscan's Handling of Unevictable Pages
- --------------------------------------
+ Vmscan's Handling of Unevictable Folios
+ ---------------------------------------
- If unevictable pages are culled in the fault path, or moved to the unevictable
- list at mlock() or mmap() time, vmscan will not encounter the pages until they
+ If unevictable folios are culled in the fault path, or moved to the unevictable
+ list at mlock() or mmap() time, vmscan will not encounter the folios until they
have become evictable again (via munlock() for example) and have been "rescued"
from the unevictable list. However, there may be situations where we decide,
- for the sake of expediency, to leave an unevictable page on one of the regular
+ for the sake of expediency, to leave an unevictable folio on one of the regular
active/inactive LRU lists for vmscan to deal with. vmscan checks for such
- pages in all of the shrink_{active|inactive|page}_list() functions and will
- "cull" such pages that it encounters: that is, it diverts those pages to the
+ folios in all of the shrink_{active|inactive|page}_list() functions and will
+ "cull" such folios that it encounters: that is, it diverts those folios to the
unevictable list for the memory cgroup and node being scanned.
- There may be situations where a page is mapped into a VM_LOCKED VMA, but the
- page is not marked as PG_mlocked. Such pages will make it all the way to
- shrink_active_list() or shrink_page_list() where they will be detected when
- vmscan walks the reverse map in folio_referenced() or try_to_unmap(). The page
- is culled to the unevictable list when it is released by the shrinker.
+ There may be situations where a folio is mapped into a VM_LOCKED VMA,
+ but the folio does not have the mlocked flag set. Such folios will make
+ it all the way to shrink_active_list() or shrink_page_list() where they
+ will be detected when vmscan walks the reverse map in folio_referenced()
+ or try_to_unmap(). The folio is culled to the unevictable list when it
+ is released by the shrinker.
- To "cull" an unevictable page, vmscan simply puts the page back on the LRU list
- using putback_lru_page() - the inverse operation to isolate_lru_page() - after
- dropping the page lock. Because the condition which makes the page unevictable
- may change once the page is unlocked, __pagevec_lru_add_fn() will recheck the
- unevictable state of a page before placing it on the unevictable list.
+ To "cull" an unevictable folio, vmscan simply puts the folio back on
+ the LRU list using folio_putback_lru() - the inverse operation to
+ folio_isolate_lru() - after dropping the folio lock. Because the
+ condition which makes the folio unevictable may change once the folio
+ is unlocked, __pagevec_lru_add_fn() will recheck the unevictable state
+ of a folio before placing it on the unevictable list.
MLOCKED Pages
=============
- The unevictable page list is also useful for mlock(), in addition to ramfs and
+ The unevictable folio list is also useful for mlock(), in addition to ramfs and
SYSV SHM. Note that mlock() is only available in CONFIG_MMU=y situations; in
NOMMU situations, all mappings are effectively mlocked.
If the VMA passes some filtering as described in "Filtering Special VMAs"
below, mlock_fixup() will attempt to merge the VMA with its neighbors or split
off a subset of the VMA if the range does not cover the entire VMA. Any pages
- already present in the VMA are then marked as mlocked by mlock_page() via
+ already present in the VMA are then marked as mlocked by mlock_folio() via
mlock_pte_range() via walk_page_range() via mlock_vma_pages_range().
Before returning from the system call, do_mlock() or mlockall() will call
fault path - which is also how mlock2()'s MLOCK_ONFAULT areas are handled.
For each PTE (or PMD) being faulted into a VMA, the page add rmap function
- calls mlock_vma_page(), which calls mlock_page() when the VMA is VM_LOCKED
+ calls mlock_vma_folio(), which calls mlock_folio() when the VMA is VM_LOCKED
(unless it is a PTE mapping of a part of a transparent huge page). Or when
- it is a newly allocated anonymous page, lru_cache_add_inactive_or_unevictable()
- calls mlock_new_page() instead: similar to mlock_page(), but can make better
+ it is a newly allocated anonymous page, folio_add_lru_vma() calls
+ mlock_new_folio() instead: similar to mlock_folio(), but can make better
judgments, since this page is held exclusively and known not to be on LRU yet.
- mlock_page() sets PageMlocked immediately, then places the page on the CPU's
- mlock pagevec, to batch up the rest of the work to be done under lru_lock by
- __mlock_page(). __mlock_page() sets PageUnevictable, initializes mlock_count
+ mlock_folio() sets PG_mlocked immediately, then places the page on the CPU's
+ mlock folio batch, to batch up the rest of the work to be done under lru_lock by
+ __mlock_folio(). __mlock_folio() sets PG_unevictable, initializes mlock_count
and moves the page to unevictable state ("the unevictable LRU", but with
- mlock_count in place of LRU threading). Or if the page was already PageLRU
- and PageUnevictable and PageMlocked, it simply increments the mlock_count.
+ mlock_count in place of LRU threading). Or if the page was already PG_lru
+ and PG_unevictable and PG_mlocked, it simply increments the mlock_count.
But in practice that may not work ideally: the page may not yet be on an LRU, or
it may have been temporarily isolated from LRU. In such cases the mlock_count
- field cannot be touched, but will be set to 0 later when __pagevec_lru_add_fn()
+ field cannot be touched, but will be set to 0 later when __munlock_folio()
returns the page to "LRU". Races prohibit mlock_count from being set to 1 then:
rather than risk stranding a page indefinitely as unevictable, always err with
mlock_count on the low side, so that when munlocked the page will be rescued to
any "special" VMAs. So, those VMAs will be ignored for munlock.
If the VMA is VM_LOCKED, mlock_fixup() again attempts to merge or split off the
- specified range. All pages in the VMA are then munlocked by munlock_page() via
+ specified range. All pages in the VMA are then munlocked by munlock_folio() via
mlock_pte_range() via walk_page_range() via mlock_vma_pages_range() - the same
function used when mlocking a VMA range, with new flags for the VMA indicating
that it is munlock() being performed.
- munlock_page() uses the mlock pagevec to batch up work to be done under
- lru_lock by __munlock_page(). __munlock_page() decrements the page's
- mlock_count, and when that reaches 0 it clears PageMlocked and clears
- PageUnevictable, moving the page from unevictable state to inactive LRU.
+ munlock_folio() uses the mlock pagevec to batch up work to be done
+ under lru_lock by __munlock_folio(). __munlock_folio() decrements the
+ folio's mlock_count, and when that reaches 0 it clears the mlocked flag
+ and clears the unevictable flag, moving the folio from unevictable state
+ to the inactive LRU.
- But in practice that may not work ideally: the page may not yet have reached
+ But in practice that may not work ideally: the folio may not yet have reached
"the unevictable LRU", or it may have been temporarily isolated from it. In
those cases its mlock_count field is unusable and must be assumed to be 0: so
- that the page will be rescued to an evictable LRU, then perhaps be mlocked
+ that the folio will be rescued to an evictable LRU, then perhaps be mlocked
again later if vmscan finds it in a VM_LOCKED VMA.
before mlocking any pages already present, if one of those pages were migrated
before mlock_pte_range() reached it, it would get counted twice in mlock_count.
To prevent that, mlock_vma_pages_range() temporarily marks the VMA as VM_IO,
- so that mlock_vma_page() will skip it.
+ so that mlock_vma_folio() will skip it.
To complete page migration, we place the old and new pages back onto the LRU
afterwards. The "unneeded" page - old page on success, new page on failure -
way, so unmapping them required no processing.
For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
- munlock_vma_page(), which calls munlock_page() when the VMA is VM_LOCKED
+ munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).
- munlock_page() uses the mlock pagevec to batch up work to be done under
- lru_lock by __munlock_page(). __munlock_page() decrements the page's
- mlock_count, and when that reaches 0 it clears PageMlocked and clears
- PageUnevictable, moving the page from unevictable state to inactive LRU.
+ munlock_folio() uses the mlock pagevec to batch up work to be done
+ under lru_lock by __munlock_folio(). __munlock_folio() decrements the
+ folio's mlock_count, and when that reaches 0 it clears the mlocked flag
+ and clears the unevictable flag, moving the folio from unevictable state
+ to the inactive LRU.
- But in practice that may not work ideally: the page may not yet have reached
+ But in practice that may not work ideally: the folio may not yet have reached
"the unevictable LRU", or it may have been temporarily isolated from it. In
those cases its mlock_count field is unusable and must be assumed to be 0: so
- that the page will be rescued to an evictable LRU, then perhaps be mlocked
+ that the folio will be rescued to an evictable LRU, then perhaps be mlocked
again later if vmscan finds it in a VM_LOCKED VMA.
Mlocked pages can be munlocked and deleted in this way: like with munmap(),
for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
- munlock_vma_page(), which calls munlock_page() when the VMA is VM_LOCKED
+ munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).
However, if there is a racing munlock(), since mlock_vma_pages_range() starts
present, if one of those pages were unmapped by truncation or hole punch before
mlock_pte_range() reached it, it would not be recognized as mlocked by this VMA,
and would not be counted out of mlock_count. In this rare case, a page may
- still appear as PageMlocked after it has been fully unmapped: and it is left to
+ still appear as PG_mlocked after it has been fully unmapped: and it is left to
release_pages() (or __page_cache_release()) to clear it and update statistics
before freeing (this event is counted in /proc/vmstat unevictable_pgs_cleared,
which is usually 0).
vmscan's shrink_active_list() culls any obviously unevictable pages -
i.e. !page_evictable(page) pages - diverting those to the unevictable list.
However, shrink_active_list() only sees unevictable pages that made it onto the
- active/inactive LRU lists. Note that these pages do not have PageUnevictable
+ active/inactive LRU lists. Note that these pages do not have PG_unevictable
set - otherwise they would be on the unevictable list and shrink_active_list()
would never see them.
rmap's folio_referenced_one(), called via vmscan's shrink_active_list() or
shrink_page_list(), and rmap's try_to_unmap_one() called via shrink_page_list(),
- check for (3) pages still mapped into VM_LOCKED VMAs, and call mlock_vma_page()
+ check for (3) pages still mapped into VM_LOCKED VMAs, and call mlock_vma_folio()
to correct them. Such pages are culled to the unevictable list when released
by the shrinker.
-.. _zsmalloc:
-
========
zsmalloc
========
* ZS_ALMOST_FULL when n > N / f
* ZS_EMPTY when n == 0
* ZS_FULL when n == N
+
+
+ Internals
+ =========
+
+ zsmalloc has 255 size classes, each of which can hold a number of zspages.
+ Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
+ The optimal zspage chain size for each size class is calculated during the
+ creation of the zsmalloc pool (see calculate_zspage_chain_size()).
+
+ As an optimization, zsmalloc merges size classes that have similar
+ characteristics in terms of the number of pages per zspage and the number
+ of objects that each zspage can store.
+
+ For instance, consider the following size classes:::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 94 1536 0 0 0 0 0 3 0
+ 100 1632 0 0 0 0 0 2 0
+ ...
+
+
+ Size classes #95-99 are merged with size class #100. This means that when we
+ need to store an object of size, say, 1568 bytes, we end up using size class
+ #100 instead of size class #96. Size class #100 is meant for objects of size
+ 1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
+
+ Size class #100 consists of zspages with 2 physical pages each, which can
+ hold a total of 5 objects. If we need to store 13 objects of size 1568, we
+ end up allocating three zspages, or 6 physical pages.
+
+ However, if we take a closer look at size class #96 (which is meant for
+ objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
+ find that the most optimal zspage configuration for this class is a chain
+ of 5 physical pages:::
+
+ pages per zspage wasted bytes used%
+ 1 960 76
+ 2 352 95
+ 3 1312 89
+ 4 704 95
+ 5 96 99
+
+ This means that a class #96 configuration with 5 physical pages can store 13
+ objects of size 1568 in a single zspage, using a total of 5 physical pages.
+ This is more efficient than the class #100 configuration, which would use 6
+ physical pages to store the same number of objects.
+
+ As the zspage chain size for class #96 increases, its key characteristics
+ such as pages per-zspage and objects per-zspage also change. This leads to
+ dewer class mergers, resulting in a more compact grouping of classes, which
+ reduces memory wastage.
+
+ Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 202 3264 0 0 0 0 0 4 0
+ 254 4096 0 0 0 0 0 1 0
+ ...
+
+ Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
+ per zspage. Any object larger than 3264 bytes is considered huge and belongs
+ to size class #254, which stores each object in its own physical page (objects
+ in huge classes do not share pages).
+
+ Increasing the size of the chain of zspages also results in a higher watermark
+ for the huge size class and fewer huge classes overall. This allows for more
+ efficient storage of large objects.
+
+ For zspage chain size of 8, huge class watermark becomes 3632 bytes:::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 202 3264 0 0 0 0 0 4 0
+ 211 3408 0 0 0 0 0 5 0
+ 217 3504 0 0 0 0 0 6 0
+ 222 3584 0 0 0 0 0 7 0
+ 225 3632 0 0 0 0 0 8 0
+ 254 4096 0 0 0 0 0 1 0
+ ...
+
+ For zspage chain size of 16, huge class watermark becomes 3840 bytes:::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ 202 3264 0 0 0 0 0 4 0
+ 206 3328 0 0 0 0 0 13 0
+ 207 3344 0 0 0 0 0 9 0
+ 208 3360 0 0 0 0 0 14 0
+ 211 3408 0 0 0 0 0 5 0
+ 212 3424 0 0 0 0 0 16 0
+ 214 3456 0 0 0 0 0 11 0
+ 217 3504 0 0 0 0 0 6 0
+ 219 3536 0 0 0 0 0 13 0
+ 222 3584 0 0 0 0 0 7 0
+ 223 3600 0 0 0 0 0 15 0
+ 225 3632 0 0 0 0 0 8 0
+ 228 3680 0 0 0 0 0 9 0
+ 230 3712 0 0 0 0 0 10 0
+ 232 3744 0 0 0 0 0 11 0
+ 234 3776 0 0 0 0 0 12 0
+ 235 3792 0 0 0 0 0 13 0
+ 236 3808 0 0 0 0 0 14 0
+ 238 3840 0 0 0 0 0 15 0
+ 254 4096 0 0 0 0 0 1 0
+ ...
+
+ Overall the combined zspage chain size effect on zsmalloc pool configuration:::
+
+ pages per zspage number of size classes (clusters) huge size class watermark
+ 4 69 3264
+ 5 86 3408
+ 6 93 3504
+ 7 112 3584
+ 8 123 3632
+ 9 140 3680
+ 10 143 3712
+ 11 159 3744
+ 12 164 3776
+ 13 180 3792
+ 14 183 3808
+ 15 188 3840
+ 16 191 3840
+
+
+ A synthetic test
+ ----------------
+
+ zram as a build artifacts storage (Linux kernel compilation).
+
+ * `CONFIG_ZSMALLOC_CHAIN_SIZE=4`
+
+ zsmalloc classes stats:::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ Total 13 51 413836 412973 159955 3
+
+ zram mm_stat:::
+
+ 1691783168 628083717 655175680 0 655175680 60 0 34048 34049
+
+
+ * `CONFIG_ZSMALLOC_CHAIN_SIZE=8`
+
+ zsmalloc classes stats:::
+
+ class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+ ...
+ Total 18 87 414852 412978 156666 0
+
+ zram mm_stat:::
+
+ 1691803648 627793930 641703936 0 641703936 60 0 33591 33591
+
+ Using larger zspage chains may result in using fewer physical pages, as seen
+ in the example where the number of physical pages used decreased from 159955
+ to 156666, at the same time maximum zsmalloc pool memory usage went down from
+ 655175680 to 641703936 bytes.
+
+ However, this advantage may be offset by the potential for increased system
+ memory pressure (as some zspages have larger chain sizes) in cases where there
+ is heavy internal fragmentation and zspool compaction is unable to relocate
+ objects and release zspages. In these cases, it is recommended to decrease
+ the limit on the size of the zspage chains (as specified by the
+ CONFIG_ZSMALLOC_CHAIN_SIZE option).
概述
====
-:ref:`hugetlbpage` 中描述的巨页通常是预先分配给应用程序使用的。如果VMA指
+Documentation/mm/hugetlbpage.rst 中描述的巨页通常是预先分配给应用程序使用的。如果VMA指
示要使用巨页,这些巨页会在缺页异常时被实例化到任务的地址空间。如果在缺页异常
时没有巨页存在,任务就会被发送一个SIGBUS,并经常不高兴地死去。在加入巨页支
持后不久,人们决定,在mmap()时检测巨页的短缺情况会更好。这个想法是,如果
消耗预留/分配一个巨页
===========================
- 当与预留相关的巨页在相应的映射中被分配和实例化时,预留就被消耗了。该分配是在函数alloc_huge_page()
+ 当与预留相关的巨页在相应的映射中被分配和实例化时,预留就被消耗了。该分配是在函数alloc_hugetlb_folio()
中进行的::
- struct page *alloc_huge_page(struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
- alloc_huge_page被传递给一个VMA指针和一个虚拟地址,因此它可以查阅预留映射以确定是否存在预留。
- 此外,alloc_huge_page需要一个参数avoid_reserve,该参数表示即使看起来已经为指定的地址预留了
+ alloc_hugetlb_folio被传递给一个VMA指针和一个虚拟地址,因此它可以查阅预留映射以确定是否存在预留。
+ 此外,alloc_hugetlb_folio需要一个参数avoid_reserve,该参数表示即使看起来已经为指定的地址预留了
预留,也不应该使用预留。avoid_reserve参数最常被用于写时拷贝和页面迁移的情况下,即现有页面的额
外拷贝被分配。
确定预留是否存在并可用于分配后,调用dequeue_huge_page_vma()函数。这个函数需要两个与预留有关
的参数:
- - avoid_reserve,这是传递给alloc_huge_page()的同一个值/参数。
+ - avoid_reserve,这是传递给alloc_hugetlb_folio()的同一个值/参数。
- chg,尽管这个参数的类型是long,但只有0或1的值被传递给dequeue_huge_page_vma。如果该值为0,
则表明存在预留(关于可能的问题,请参见 “预留和内存策略” 一节)。如果值
为1,则表示不存在预留,如果可能的话,必须从全局空闲池中取出该页。
的剩余巨页和超额分配的问题。即使分配了一个多余的页面,也会进行与上面一样的基于预留的调整:
SetPagePrivate(page) 和 resv_huge_pages--.
- 在获得一个新的巨页后,(page)->private被设置为与该页面相关的子池的值,如果它存在的话。当页
+ 在获得一个新的巨页后,(folio)->_hugetlb_subpool被设置为与该页面相关的子池的值,如果它存在的话。当页
面被释放时,这将被用于子池的计数。
然后调用函数vma_commit_reservation(),根据预留的消耗情况调整预留映射。一般来说,这涉及
已经存在,所以不做任何改变。然而,如果共享映射中没有预留,或者这是一个私有映射,则必须创建
一个新的条目。
- 在alloc_huge_page()开始调用vma_needs_reservation()和页面分配后调用
+ 在alloc_hugetlb_folio()开始调用vma_needs_reservation()和页面分配后调用
vma_commit_reservation()之间,预留映射有可能被改变。如果hugetlb_reserve_pages在共
享映射中为同一页面被调用,这将是可能的。在这种情况下,预留计数和子池空闲页计数会有一个偏差。
这种罕见的情况可以通过比较vma_needs_reservation和vma_commit_reservation的返回值来
一样进行。这两个不可能的分支应该不会影响到分配的性能,特别是在静态键跳转标签修补
功能可用的情况下。以下是由于这个功能而导致的内核代码大小的变化。
-- 没有page owner::
-
- text data bss dec hex filename
- 48392 2333 644 51369 c8a9 mm/page_alloc.o
-
-- 有page owner::
-
- text data bss dec hex filename
- 48800 2445 644 51889 cab1 mm/page_alloc.o
- 6662 108 29 6799 1a8f mm/page_owner.o
- 1025 8 8 1041 411 mm/page_ext.o
-
-虽然总共增加了8KB的代码,但page_alloc.o增加了520字节,其中不到一半是在hotpath
-中。构建带有page owner的内核,并在需要时打开它,将是调试内核内存问题的最佳选择。
+尽管启用page owner会使内核的大小增加几千字节,但这些代码大部分都在页面分配器和
+热路径之外。构建带有page owner的内核,并在需要时打开它,将是调试内核内存问题的
+最佳选择。
有一个问题是由实现细节引起的。页所有者将信息存储到struct page扩展的内存中。这
个内存的初始化时间比稀疏内存系统中的页面分配器启动的时间要晚一些,所以,在初始化
1) 构建用户空间的帮助::
- cd tools/vm
+ cd tools/mm
make page_owner_sort
2) 启用page owner: 添加 "page_owner=on" 到 boot cmdline.
F: Documentation/ABI/testing/configfs-acpi
F: Documentation/ABI/testing/sysfs-bus-acpi
F: Documentation/firmware-guide/acpi/
+F: arch/x86/kernel/acpi/
+F: arch/x86/pci/acpi.c
F: drivers/acpi/
F: drivers/pci/*/*acpi*
F: drivers/pci/*acpi*
-L: devel@acpica.org
S: Supported
W: https://acpica.org/
W: https://github.com/acpica/acpica/
F: drivers/dma/ptdma/
AMD SEATTLE DEVICE TREE SUPPORT
S: Supported
F: arch/arm64/boot/dts/amd/
AMD XGBE DRIVER
S: Supported
F: include/linux/soc/actions/
N: owl
-ARM/ADS SPHERE MACHINE SUPPORT
-S: Maintained
-
-ARM/AFEB9260 MACHINE SUPPORT
-S: Maintained
-
-ARM/AJECO 1ARM MACHINE SUPPORT
-S: Maintained
-
ARM/Allwinner SoC Clock Support
S: Maintained
F: drivers/soc/sunxi/
N: allwinner
N: sun[x456789]i
-N: sun50i
+N: sun[25]0i
ARM/Amlogic Meson SoC CLOCK FRAMEWORK
F: arch/arm/boot/dts/highbank.dts
F: arch/arm/mach-highbank/
-ARM/CAVIUM NETWORKS CNS3XXX MACHINE SUPPORT
-S: Maintained
-F: arch/arm/mach-cns3xxx/
-
ARM/CAVIUM THUNDER NETWORK DRIVER
S: Maintained
+F: arch/arm/boot/compressed/misc-ep93xx.h
F: arch/arm/mach-ep93xx/
-F: arch/arm/mach-ep93xx/include/mach/
ARM/CLKDEV SUPPORT
F: arch/arm/boot/dts/cx92755*
N: digicolor
-ARM/CONTEC MICRO9 MACHINE SUPPORT
-S: Maintained
-F: arch/arm/mach-ep93xx/micro9.c
-
ARM/CORESIGHT FRAMEWORK AND DRIVERS
F: tools/perf/util/cs-etm-decoder/*
F: tools/perf/util/cs-etm.*
-ARM/CORGI MACHINE SUPPORT
-S: Maintained
-
ARM/CORTINA SYSTEMS GEMINI ARM ARCHITECTURE
F: include/linux/armada-37xx-rwtm-mailbox.h
F: include/linux/moxtet.h
-ARM/EZX SMARTPHONES (A780, A910, A1200, E680, ROKR E2 and ROKR E6)
-S: Maintained
-F: arch/arm/mach-pxa/ezx.c
-
ARM/FARADAY FA526 PORT
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git
X: drivers/media/i2c/
+F: arch/arm64/boot/dts/freescale/
+X: arch/arm64/boot/dts/freescale/fsl-*
+X: arch/arm64/boot/dts/freescale/qoriq-*
N: imx
N: mxs
F: arch/arm/boot/dts/vf*
F: arch/arm/mach-imx/*vf610*
-ARM/GLOMATION GESBC9312SX MACHINE SUPPORT
-S: Maintained
-
ARM/GUMSTIX MACHINE SUPPORT
S: Maintained
-ARM/H4700 (HP IPAQ HX4700) MACHINE SUPPORT
-S: Maintained
-F: arch/arm/mach-pxa/hx4700.c
-F: arch/arm/mach-pxa/include/mach/hx4700.h
-F: sound/soc/pxa/hx4700.c
-
ARM/HISILICON SOC SUPPORT
S: Maintained
+F: Documentation/hwmon/gxp-fan-ctrl.rst
F: Documentation/devicetree/bindings/arm/hpe,gxp.yaml
+F: Documentation/devicetree/bindings/hwmon/hpe,gxp-fan-ctrl.yaml
F: Documentation/devicetree/bindings/spi/hpe,gxp-spifi.yaml
F: Documentation/devicetree/bindings/timer/hpe,gxp-timer.yaml
F: arch/arm/boot/dts/hpe-bmc*
F: arch/arm/boot/dts/hpe-gxp*
F: arch/arm/mach-hpe/
F: drivers/clocksource/timer-gxp.c
+F: drivers/hwmon/gxp-fan-ctrl.c
F: drivers/spi/spi-gxp.c
F: drivers/watchdog/gxp-wdt.c
S: Maintained
F: arch/arm/boot/dts/omap3-igep*
-ARM/INCOME PXA270 SUPPORT
-S: Maintained
-F: arch/arm/mach-pxa/colibri-pxa270-income.c
-
-ARM/INTEL IOP32X ARM ARCHITECTURE
-S: Maintained
-
-ARM/INTEL IQ81342EX MACHINE SUPPORT
-S: Maintained
-
-ARM/INTEL IXDP2850 MACHINE SUPPORT
-S: Maintained
-
ARM/INTEL IXP4XX ARM ARCHITECTURE
S: Maintained
-ARM/IP FABRICS DOUBLE ESPRESSO MACHINE SUPPORT
-S: Maintained
-
ARM/LG1K ARCHITECTURE
S: Maintained
F: arch/arm64/boot/dts/lg/
-ARM/LOGICPD PXA270 MACHINE SUPPORT
-S: Maintained
-
ARM/LPC18XX ARCHITECTURE
F: drivers/watchdog/pnx4008_wdt.c
N: lpc32xx
-ARM/MAGICIAN MACHINE SUPPORT
-S: Maintained
-
ARM/Marvell Dove/MV78xx0/Orion SOC support
ARM/Mediatek SoC support
S: Maintained
W: https://mtk.wiki.kernel.org/
-C: irc://chat.freenode.net/linux-mediatek
+C: irc://irc.libera.chat/linux-mediatek
+F: arch/arm/boot/dts/mt2*
F: arch/arm/boot/dts/mt6*
F: arch/arm/boot/dts/mt7*
F: arch/arm/boot/dts/mt8*
F: arch/arm64/boot/dts/mediatek/
F: drivers/soc/mediatek/
N: mtk
-N: mt[678]
+N: mt[2678]
K: mediatek
ARM/Mediatek USB3 PHY DRIVER
F: arch/arm/mach-milbeaut/
N: milbeaut
-ARM/MIOA701 MACHINE SUPPORT
-S: Maintained
-F: arch/arm/mach-pxa/mioa701.c
-
ARM/MStar/Sigmastar Armv7 SoC support
F: include/dt-bindings/clock/mstar-*
F: include/dt-bindings/gpio/msc313-gpio.h
-ARM/NEC MOBILEPRO 900/c MACHINE SUPPORT
-S: Maintained
-
ARM/NOMADIK/Ux500 ARCHITECTURES
W: https://github.com/neuschaefer/wpcm450/wiki
F: Documentation/devicetree/bindings/*/*wpcm*
F: arch/arm/boot/dts/nuvoton-wpcm450*
+F: arch/arm/configs/wpcm450_defconfig
F: arch/arm/mach-npcm/wpcm450.c
F: drivers/*/*/*wpcm*
F: drivers/*/*wpcm*
S: Maintained
F: arch/arm64/boot/dts/freescale/s32g*.dts*
-ARM/OPENMOKO NEO FREERUNNER (GTA02) MACHINE SUPPORT
-S: Orphan
-W: http://wiki.openmoko.org/wiki/Neo_FreeRunner
-F: arch/arm/mach-s3c/gta02.h
-F: arch/arm/mach-s3c/mach-gta02.c
-
ARM/Orion SoC/Technologic Systems TS-78xx platform support
F: drivers/power/reset/oxnas-restart.c
N: oxnas
-ARM/PALM TREO SUPPORT
-S: Orphan
-F: arch/arm/mach-pxa/palmtreo.*
-
-ARM/PALMTX,PALMT5,PALMLD,PALMTE2,PALMTC SUPPORT
-S: Maintained
-W: http://hackndev.com
-F: arch/arm/mach-pxa/include/mach/palmld.h
-F: arch/arm/mach-pxa/include/mach/palmtc.h
-F: arch/arm/mach-pxa/include/mach/palmtx.h
-F: arch/arm/mach-pxa/palmld.c
-F: arch/arm/mach-pxa/palmt5.*
-F: arch/arm/mach-pxa/palmtc.c
-F: arch/arm/mach-pxa/palmte2.*
-F: arch/arm/mach-pxa/palmtx.c
-
-ARM/PALMZ72 SUPPORT
-S: Maintained
-W: http://hackndev.com
-F: arch/arm/mach-pxa/palmz72.*
-
-ARM/PLEB SUPPORT
-S: Maintained
-W: http://www.disy.cse.unsw.edu.au/Hardware/PLEB
-
-ARM/PT DIGITAL BOARD PORT
-S: Maintained
-W: http://www.armlinux.org.uk/
-
ARM/QUALCOMM SUPPORT
F: include/linux/*/qcom*
F: include/linux/soc/qcom/
-ARM/RADISYS ENP2611 MACHINE SUPPORT
-S: Maintained
-
ARM/RDA MICRO ARCHITECTURE
F: Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.yaml
F: Documentation/devicetree/bindings/spi/spi-rockchip.yaml
F: arch/arm/boot/dts/rk3*
-F: arch/arm/boot/dts/rv1108*
+F: arch/arm/boot/dts/rv11*
F: arch/arm/mach-rockchip/
F: drivers/*/*/*rockchip*
F: drivers/*/*rockchip*
F: include/linux/serial_s3c.h
F: include/linux/soc/samsung/
N: exynos
-N: s3c2410
N: s3c64xx
N: s5pv210
S: Maintained
W: http://www.stlinux.com
+F: Documentation/devicetree/bindings/spi/st,ssc-spi.yaml
F: Documentation/devicetree/bindings/i2c/i2c-st.txt
F: arch/arm/boot/dts/sti*
F: arch/arm/mach-sti/
S: Maintained
-F: arch/arm64/boot/dts/tesla*
+F: arch/arm64/boot/dts/tesla/
ARM/TETON BGA MACHINE SUPPORT
F: arch/arm64/boot/dts/ti/k3-*
F: include/dt-bindings/pinctrl/k3.h
-ARM/THECUS N2100 MACHINE SUPPORT
-S: Maintained
-
-ARM/TOSA MACHINE SUPPORT
-S: Maintained
-
ARM/TOSHIBA VISCONTI ARCHITECTURE
F: */*/*/vexpress*
F: */*/vexpress*
F: arch/arm/boot/dts/vexpress*
-F: arch/arm/mach-vexpress/
+F: arch/arm/mach-versatile/
F: arch/arm64/boot/dts/arm/
F: drivers/clk/versatile/clk-vexpress-osc.c
F: drivers/clocksource/timer-versatile.c
W: http://www.armlinux.org.uk/
F: arch/arm/vfp/
-ARM/VOIPAC PXA270 SUPPORT
-S: Maintained
-F: arch/arm/mach-pxa/include/mach/vpac270.h
-F: arch/arm/mach-pxa/vpac270.c
-
ARM/VT8500 ARM ARCHITECTURE
S: Orphan
F: drivers/video/fbdev/wm8505fb*
F: drivers/video/fbdev/wmt_ge_rops.*
-ARM/ZIPIT Z2 SUPPORT
-S: Maintained
-F: arch/arm/mach-pxa/include/mach/z2.h
-F: arch/arm/mach-pxa/z2.c
-
ARM/ZYNQ ARCHITECTURE
S: Maintained
-F: Documentation/devicetree/bindings/crypto/aspeed,ast2500-hace.yaml
+F: Documentation/devicetree/bindings/crypto/aspeed,*
F: drivers/crypto/aspeed/
ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
AUDIT SUBSYSTEM
S: Supported
W: https://github.com/linux-audit
T: git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
BONDING DRIVER
S: Supported
S: Maintained
F: tools/testing/selftests/bpf/
+BPF [DOCUMENTATION] (Related to Standardization)
+S: Maintained
+F: Documentation/bpf/instruction-set.rst
+
BPF [MISC]
S: Odd Fixes
S: Maintained
F: drivers/phy/broadcom/phy-brcm-usb*
+BROADCOM Broadband SoC High Speed SPI Controller DRIVER
+S: Maintained
+F: Documentation/devicetree/bindings/spi/brcm,bcm63xx-hsspi.yaml
+F: drivers/spi/spi-bcm63xx-hsspi.c
+F: drivers/spi/spi-bcmbca-hsspi.c
+
BROADCOM ETHERNET PHY DRIVERS
F: net/sched/sch_taprio.c
CC2520 IEEE-802.15.4 RADIO DRIVER
-S: Maintained
+S: Odd Fixes
F: Documentation/devicetree/bindings/net/ieee802154/cc2520.txt
F: drivers/net/ieee802154/cc2520.c
-F: include/linux/spi/cc2520.h
CCREE ARM TRUSTZONE CRYPTOCELL REE DRIVER
F: Documentation/devicetree/bindings/sound/google,cros-ec-codec.yaml
F: sound/soc/codecs/cros_ec_codec.*
+CHROMEOS EC UART DRIVER
+S: Maintained
+F: drivers/platform/chrome/cros_ec_uart.c
+
CHROMEOS EC SUBDRIVERS
S: Maintained
-F: drivers/platform/chrome/cros_ec_typec.c
+F: drivers/platform/chrome/cros_ec_typec.*
F: drivers/platform/chrome/cros_typec_switch.c
+F: drivers/platform/chrome/cros_typec_vdm.*
CHROMEOS EC USB PD NOTIFY DRIVER
S: Maintained
+ W: https://damonitor.github.io
+ P: Documentation/mm/damon/maintainer-profile.rst
+ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+ T: quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
+ T: git git://git.kernel.org/pub/scm/linux/kernel/git/sj/linux.git damon/next
F: Documentation/ABI/testing/sysfs-kernel-mm-damon
F: Documentation/admin-guide/mm/damon/
F: Documentation/mm/damon/
F: drivers/platform/x86/dell/dell-wmi-ddv.c
DELL WMI SYSMAN DRIVER
F: Documentation/networking/devlink
F: include/net/devlink.h
F: include/uapi/linux/devlink.h
-F: net/core/devlink.c
+F: net/devlink/
DH ELECTRONICS IMX6 DHCOM/DHCOR BOARD SUPPORT
T: git git://git.linbit.com/drbd-8.4.git
F: Documentation/admin-guide/blockdev/
F: drivers/block/drbd/
+F: include/linux/drbd*
F: lib/lru_cache.c
DRIVER COMPONENT FRAMEWORK
T: git git://anongit.freedesktop.org/drm/drm-misc
F: drivers/gpu/drm/tiny/gm12u320.c
+DRM DRIVER FOR HIMAX HX8394 MIPI-DSI LCD panels
+S: Maintained
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: Documentation/devicetree/bindings/display/panel/himax,hx8394.yaml
+F: drivers/gpu/drm/panel/panel-himax-hx8394.c
+
DRM DRIVER FOR HX8357D PANELS
S: Maintained
F: Documentation/devicetree/bindings/display/ilitek,ili9486.yaml
F: drivers/gpu/drm/tiny/ili9486.c
-DRM DRIVER FOR INTEL I810 VIDEO CARDS
-S: Orphan / Obsolete
-F: drivers/gpu/drm/i810/
-F: include/uapi/drm/i810_drm.h
-
DRM DRIVER FOR JADARD JD9365DA-H3 MIPI-DSI LCD PANELS
S: Maintained
F: Documentation/devicetree/bindings/display/panel/mantix,mlaf057we51-x.yaml
F: drivers/gpu/drm/panel/panel-mantix-mlaf057we51.c
-DRM DRIVER FOR MATROX G200/G400 GRAPHICS CARDS
-S: Orphan / Obsolete
-F: drivers/gpu/drm/mga/
-F: include/uapi/drm/mga_drm.h
-
DRM DRIVER FOR MGA G200 GRAPHICS CHIPS
F: drivers/gpu/drm/qxl/
F: include/uapi/drm/qxl_drm.h
-DRM DRIVER FOR RAGE 128 VIDEO CARDS
-S: Orphan / Obsolete
-F: drivers/gpu/drm/r128/
-F: include/uapi/drm/r128_drm.h
-
DRM DRIVER FOR RAYDIUM RM67191 PANELS
S: Maintained
F: Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml
F: drivers/gpu/drm/panel/panel-sitronix-st7703.c
-DRM DRIVER FOR SAVAGE VIDEO CARDS
-S: Orphan / Obsolete
-F: drivers/gpu/drm/savage/
-F: include/uapi/drm/savage_drm.h
-
DRM DRIVER FOR FIRMWARE FRAMEBUFFERS
F: include/linux/aperture.h
F: include/video/nomodeset.h
-DRM DRIVER FOR SIS VIDEO CARDS
-S: Orphan / Obsolete
-F: drivers/gpu/drm/sis/
-F: include/uapi/drm/sis_drm.h
-
DRM DRIVER FOR SITRONIX ST7586 PANELS
S: Maintained
F: Documentation/devicetree/bindings/display/ste,mcde.yaml
F: drivers/gpu/drm/mcde/
-DRM DRIVER FOR TDFX VIDEO CARDS
-S: Orphan / Obsolete
-F: drivers/gpu/drm/tdfx/
-
DRM DRIVER FOR TI DLPC3433 MIPI DSI TO DMD BRIDGE
S: Maintained
T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
F: Documentation/accel/
F: drivers/accel/
+F: include/drm/drm_accel.h
+
+DRM ACCEL DRIVERS FOR INTEL VPU
+S: Supported
+T: git git://anongit.freedesktop.org/drm/drm-misc
+F: drivers/accel/ivpu/
+F: include/uapi/drm/ivpu_accel.h
DRM DRIVERS FOR ALLWINNER A10
S: Maintained
F: Documentation/devicetree/bindings/display/imx/
-F: drivers/gpu/drm/imx/
+F: drivers/gpu/drm/imx/ipuv3/
F: drivers/gpu/ipu-v3/
DRM DRIVERS FOR FREESCALE IMX BRIDGE
DRM DRIVERS FOR HISILICON
S: Maintained
T: git git://anongit.freedesktop.org/drm/drm-misc
S: Supported
-T: git git://anongit.freedesktop.org/tegra/linux.git
+T: git https://gitlab.freedesktop.org/drm/tegra.git
F: Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
F: Documentation/devicetree/bindings/gpu/host1x/
F: drivers/gpu/drm/tegra/
F: drivers/firmware/efi/test/
EFI VARIABLE FILESYSTEM
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
+F: Documentation/ABI/testing/sysfs-fs-erofs
F: Documentation/filesystems/erofs.rst
F: fs/erofs/
F: include/trace/events/erofs.h
EXTRA BOOT CONFIG
+Q: https://patchwork.kernel.org/project/linux-trace-kernel/list/
S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
F: Documentation/admin-guide/bootconfig.rst
F: fs/proc/bootconfig.c
F: include/linux/bootconfig.h
FPU EMULATOR
S: Maintained
-W: http://floatingpoint.sourceforge.net/emulator/index.html
+W: https://floatingpoint.billm.au/
F: arch/x86/math-emu/
FRAMEBUFFER CORE
F: include/linux/fscache*.h
FSCRYPT: FILE SYSTEM LEVEL ENCRYPTION SUPPORT
S: Supported
Q: https://patchwork.kernel.org/project/linux-fscrypt/list/
-T: git git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git
+T: git https://git.kernel.org/pub/scm/fs/fscrypt/linux.git
F: Documentation/filesystems/fscrypt.rst
F: fs/crypto/
-F: include/linux/fscrypt*.h
+F: include/linux/fscrypt.h
F: include/uapi/linux/fscrypt.h
FSI SUBSYSTEM
FSVERITY: READ-ONLY FILE-BASED AUTHENTICITY PROTECTION
S: Supported
-Q: https://patchwork.kernel.org/project/linux-fscrypt/list/
-T: git git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git fsverity
+Q: https://patchwork.kernel.org/project/fsverity/list/
+T: git https://git.kernel.org/pub/scm/fs/fsverity/linux.git
F: Documentation/filesystems/fsverity.rst
F: fs/verity/
F: include/linux/fsverity.h
F: arch/*/*/*/*ftrace*
F: arch/*/*/*ftrace*
F: include/*/ftrace.h
+F: samples/ftrace
FUNGIBLE ETHERNET DRIVERS
HABANALABS PCI DRIVER
S: Supported
+C: irc://irc.oftc.net/dri-devel
T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git
F: Documentation/ABI/testing/debugfs-driver-habanalabs
F: Documentation/ABI/testing/sysfs-driver-habanalabs
-F: drivers/misc/habanalabs/
+F: drivers/accel/habanalabs/
F: include/trace/events/habanalabs.h
-F: include/uapi/misc/habanalabs.h
+F: include/uapi/drm/habanalabs_accel.h
HACKRF MEDIA DRIVER
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git
+F: Documentation/hid/
F: drivers/hid/
F: include/linux/hid*
F: include/uapi/linux/hid*
+F: samples/hid/
+F: tools/testing/selftests/hid/
HID LOGITECH DRIVERS
S: Maintained
F: drivers/hid/hid-logitech-*
+HID++ LOGITECH DRIVERS
+S: Maintained
+F: drivers/hid/hid-logitech-hidpp.c
+
HID PLAYSTATION DRIVER
HISILICON DMA DRIVER
-M: Jie Hai <haijie1@hisilicon.com>
+M: Jie Hai <haijie1@huawei.com>
S: Maintained
F: drivers/dma/hisi_dma.c
F: include/linux/hmm*
F: lib/test_hmm*
F: mm/hmm*
- F: tools/testing/selftests/vm/*hmm*
+ F: tools/testing/selftests/mm/*hmm*
HOST AP DRIVER
T: git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git
F: Documentation/filesystems/idmappings.rst
F: tools/testing/selftests/mount_setattr/
-F: include/linux/mnt_idmapping.h
+F: include/linux/mnt_idmapping.*
IDT VersaClock 5 CLOCK DRIVER
IEEE 802.15.4 SUBSYSTEM
S: Maintained
W: https://linux-wpan.org/
S: Maintained
F: drivers/iio/pressure/dps310.c
+INFINEON PEB2466 ASoC CODEC
+S: Maintained
+F: Documentation/devicetree/bindings/sound/infineon,peb2466.yaml
+F: sound/soc/codecs/peb2466.c
+
INFINIBAND SUBSYSTEM
F: Documentation/ABI/testing/sysfs-driver-intel-m10-bmc
F: Documentation/hwmon/intel-m10-bmc-hwmon.rst
F: drivers/hwmon/intel-m10-bmc-hwmon.c
-F: drivers/mfd/intel-m10-bmc.c
+F: drivers/mfd/intel-m10-bmc*
F: include/linux/mfd/intel-m10-bmc.h
INTEL MENLOW THERMAL DRIVER
F: arch/x86/include/asm/intel_telemetry.h
F: drivers/platform/x86/intel/telemetry/
+INTEL TPMI DRIVER
+S: Maintained
+F: drivers/platform/x86/intel/tpmi.c
+F: include/linux/intel_tpmi.h
+
INTEL UNCORE FREQUENCY CONTROL
S: Odd Fixes
F: drivers/tty/ipwireless/
+IRON DEVICE AUDIO CODEC DRIVERS
+S: Maintained
+F: Documentation/devicetree/bindings/sound/irondevice,*
+F: sound/soc/codecs/sma*
+
IRQ DOMAINS (IRQ NUMBER MAPPING LIBRARY)
S: Maintained
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
F: kernel/irq/
+F: include/linux/group_cpus.h
+F: lib/group_cpus.c
IRQCHIP DRIVERS
S: Maintained
F: tools/testing/ktest
+KTZ8866 BACKLIGHT DRIVER
+S: Maintained
+F: Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
+F: drivers/video/backlight/ktz8866.c
+
L3MDEV
S: Supported
W: https://landlock.io
-T: git https://github.com/landlock-lsm/linux.git
+T: git https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
F: Documentation/security/landlock.rst
F: Documentation/userspace-api/landlock.rst
F: include/uapi/linux/landlock.h
F: drivers/iio/potentiometer/mcp4531.c
MCR20A IEEE-802.15.4 RADIO DRIVER
-S: Maintained
+S: Odd Fixes
W: https://github.com/xueliu/mcr20a-linux
F: Documentation/devicetree/bindings/net/ieee802154/mcr20a.txt
F: drivers/net/ieee802154/mcr20a.c
S: Maintained
W: http://www.linux-mm.org
- T: git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
T: quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
F: include/linux/gfp.h
F: include/linux/gfp_types.h
F: include/linux/mmzone.h
F: include/linux/pagewalk.h
F: mm/
- F: tools/testing/selftests/vm/
+ F: tools/mm/
+ F: tools/testing/selftests/mm/
VMALLOC
S: Maintained
W: http://www.linux-mm.org
- T: git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: include/linux/vmalloc.h
F: mm/vmalloc.c
F: Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml
F: Documentation/devicetree/bindings/net/dsa/microchip,lan937x.yaml
F: drivers/net/dsa/microchip/*
+F: include/linux/dsa/ksz_common.h
F: include/linux/platform_data/microchip-ksz.h
F: net/dsa/tag_ksz.c
S: Maintained
F: Documentation/devicetree/bindings/mfd/mps,mp2629.yaml
F: Documentation/devicetree/bindings/regulator/mps,mp*.yaml
+F: drivers/hwmon/pmbus/mpq7932.c
F: drivers/iio/adc/mp2629_adc.c
F: drivers/mfd/mp2629.c
F: drivers/power/supply/mp2629_charger.c
S: Maintained
+F: Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml
F: drivers/net/phy/motorcomm.c
MOXA SMARTIO/INDUSTIO/INTELLIO SERIAL CARD
F: drivers/media/radio/radio-mr800.c
MRF24J40 IEEE 802.15.4 RADIO DRIVER
-S: Maintained
+S: Odd Fixes
F: Documentation/devicetree/bindings/net/ieee802154/mrf24j40.txt
F: drivers/net/ieee802154/mrf24j40.c
MULTIFUNCTION DEVICES (MFD)
-S: Supported
+S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
F: Documentation/devicetree/bindings/mfd/
F: drivers/mfd/
S: Maintained
F: Documentation/devicetree/bindings/net/dsa/
+F: Documentation/devicetree/bindings/net/ethernet-switch-port.yaml
+F: Documentation/devicetree/bindings/net/ethernet-switch.yaml
F: drivers/net/dsa/
F: include/linux/dsa/
F: include/linux/platform_data/dsa.h
T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
T: git git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
+F: Documentation/core-api/netlink.rst
F: Documentation/networking/
F: Documentation/process/maintainer-netdev.rst
+F: Documentation/userspace-api/netlink/
F: include/linux/in.h
F: include/linux/net.h
F: include/linux/netdevice.h
F: lib/net_utils.c
F: lib/random32.c
F: net/
+F: tools/net/
F: tools/testing/selftests/net/
NETWORKING [IPSEC]
NETWORKING [IPv4/IPv6]
S: Maintained
F: net/netlabel/
NETWORKING [MPTCP]
S: Supported
F: Documentation/devicetree/bindings/mfd/mscc,ocelot.yaml
F: drivers/mfd/ocelot*
+F: drivers/net/dsa/ocelot/ocelot_ext.c
F: include/linux/mfd/ocelot.h
OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
F: arch/arm/configs/omap1_defconfig
F: arch/arm/mach-omap1/
-F: arch/arm/plat-omap/
F: drivers/i2c/busses/i2c-omap.c
F: include/linux/platform_data/ams-delta-fiq.h
F: include/linux/platform_data/i2c-omap.h
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git
F: arch/arm/configs/omap2plus_defconfig
F: arch/arm/mach-omap2/
-F: arch/arm/plat-omap/
F: drivers/bus/ti-sysc.c
F: drivers/i2c/busses/i2c-omap.c
F: drivers/irqchip/irq-omap-intc.c
F: include/linux/mtd/onenand*.h
ONEXPLAYER FAN DRIVER
S: Maintained
S: Maintained
F: arch/mips/boot/dts/ralink/omega2p.dts
+ONSEMI ETHERNET PHY DRIVERS
+S: Supported
+W: http://www.onsemi.com
+F: drivers/net/phy/ncn*
+
OP-TEE DRIVER
S: Maintained
W: http://openrisc.io
T: git https://github.com/openrisc/linux.git
W: https://wireless.wiki.kernel.org/en/users/Drivers/p54
F: drivers/net/wireless/intersil/p54/
+PACKET SOCKETS
+S: Maintained
+F: include/uapi/linux/if_packet.h
+F: net/packet/af_packet.c
+
PACKING
F: arch/*/kernel/paravirt*
F: include/linux/hypervisor.h
-PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
-S: Maintained
-F: Documentation/admin-guide/blockdev/paride.rst
-F: drivers/block/paride/
-
PARISC ARCHITECTURE
PCI ENDPOINT SUBSYSTEM
Q: https://patchwork.kernel.org/project/linux-pci/list/
B: https://bugzilla.kernel.org
C: irc://irc.oftc.net/linux-pci
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
F: Documentation/PCI/endpoint/*
F: Documentation/misc-devices/pci-endpoint-test.rst
F: drivers/misc/pci_endpoint_test.c
Q: https://patchwork.kernel.org/project/linux-pci/list/
B: https://bugzilla.kernel.org
C: irc://irc.oftc.net/linux-pci
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
F: Documentation/driver-api/pci/p2pdma.rst
F: drivers/pci/p2pdma.c
F: include/linux/pci-p2pdma.h
PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS
S: Supported
Q: https://patchwork.kernel.org/project/linux-pci/list/
B: https://bugzilla.kernel.org
C: irc://irc.oftc.net/linux-pci
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
F: Documentation/devicetree/bindings/pci/
F: drivers/pci/controller/
F: drivers/pci/pci-bridge-emul.c
Q: https://patchwork.kernel.org/project/linux-pci/list/
B: https://bugzilla.kernel.org
C: irc://irc.oftc.net/linux-pci
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
F: Documentation/PCI/
F: Documentation/devicetree/bindings/pci/
F: arch/x86/kernel/early-quirks.c
S: Supported
F: Documentation/devicetree/bindings/iio/chemical/plantower,pms7003.yaml
F: drivers/iio/chemical/pms7003.c
+PLCA RECONCILIATION SUBLAYER (IEEE802.3 Clause 148)
+S: Maintained
+F: drivers/net/phy/mdio-open-alliance.h
+F: net/ethtool/plca.c
+
PLDMFW LIBRARY
S: Maintained
F: Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
F: drivers/net/wireless/ath/ath11k/
+QUALCOMM ATH12K WIRELESS DRIVER
+S: Supported
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
+F: drivers/net/wireless/ath/ath12k/
+
QUALCOMM ATHEROS ATH9K WIRELESS DRIVER
F: drivers/net/ethernet/renesas/
F: include/linux/sh_eth.h
+RENESAS IDT821034 ASoC CODEC
+S: Maintained
+F: Documentation/devicetree/bindings/sound/renesas,idt821034.yaml
+F: sound/soc/codecs/idt821034.c
+
RENESAS R-CAR GYROADC DRIVER
S: Supported
Q: https://patchwork.kernel.org/project/linux-riscv/list/
+C: irc://irc.libera.chat/riscv
P: Documentation/riscv/patch-acceptance.rst
T: git git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
F: arch/riscv/
S: Supported
W: https://github.com/Rust-for-Linux/linux
B: https://github.com/Rust-for-Linux/linux/issues
+C: zulip://rust-for-linux.zulipchat.com
T: git https://github.com/Rust-for-Linux/linux.git rust-next
F: Documentation/rust/
F: rust/
F: Documentation/s390/
F: arch/s390/
F: drivers/s390/
+F: drivers/watchdog/diag288_wdt.c
S390 COMMON I/O LAYER
F: drivers/pci/hotplug/s390_pci_hpc.c
F: Documentation/s390/pci.rst
+S390 SCM DRIVER
+S: Supported
+F: drivers/s390/block/scm*
+F: drivers/s390/cio/scm.c
+
S390 VFIO AP DRIVER
S: Supported
F: drivers/s390/scsi/zfcp_*
-S3C ADC BATTERY DRIVER
-S: Odd Fixes
-F: drivers/power/supply/s3c_adc_battery.c
-F: include/linux/s3c_adc_battery.h
-
-S3C24XX SD/MMC Driver
-S: Supported
-F: drivers/mmc/host/s3cmci.*
-
SAA6588 RDS RECEIVER DRIVER
F: Documentation/devicetree/bindings/clock/samsung,s3c*
F: drivers/clk/samsung/
F: include/dt-bindings/clock/exynos*.h
-F: include/dt-bindings/clock/s3c*.h
F: include/dt-bindings/clock/s5p*.h
F: include/dt-bindings/clock/samsung,*.h
F: include/linux/clk/samsung.h
-F: include/linux/platform_data/clk-s3c2410.h
SAMSUNG SPI DRIVERS
F: Documentation/devicetree/bindings/spi/samsung,spi*.yaml
F: drivers/spi/spi-s3c*
F: include/linux/platform_data/spi-s3c64xx.h
-F: include/linux/spi/s3c24xx-fiq.h
SAMSUNG SXGBE DRIVERS
F: include/target/
SCTP PROTOCOL
S: Maintained
W: http://lksctp.sourceforge.net
S: Supported
+F: Documentation/networking/devlink/sfc.rst
F: drivers/net/ethernet/sfc/
SFF/SFP/SFP+ MODULE SUPPORT
S: Supported
W: http://www.simtec.co.uk/products/EB110ATX/
-SIMTEC EB2410ITX (BAST)
-S: Supported
-W: http://www.simtec.co.uk/products/EB2410ITX/
-F: arch/arm/mach-s3c/bast-ide.c
-F: arch/arm/mach-s3c/bast-irq.c
-F: arch/arm/mach-s3c/mach-bast.c
-
SIOX
S: Orphan
F: sound/soc/uniphier/
+SOCKET TIMESTAMPING
+S: Maintained
+F: Documentation/networking/timestamping.rst
+F: include/uapi/linux/net_tstamp.h
+F: tools/testing/selftests/net/so_txtime.c
+
SOEKRIS NET48XX LED SUPPORT
S: Maintained
F: drivers/clk/starfive/clk-starfive-jh7100*
F: include/dt-bindings/clock/starfive-jh7100*.h
-STARFIVE JH7100 PINCTRL DRIVER
+STARFIVE JH71X0 PINCTRL DRIVERS
S: Maintained
-F: Documentation/devicetree/bindings/pinctrl/starfive,jh7100-pinctrl.yaml
-F: drivers/pinctrl/starfive/
+F: Documentation/devicetree/bindings/pinctrl/starfive,jh71*.yaml
+F: drivers/pinctrl/starfive/pinctrl-starfive-jh71*
F: include/dt-bindings/pinctrl/pinctrl-starfive-jh7100.h
+F: include/dt-bindings/pinctrl/starfive,jh7110-pinctrl.h
STARFIVE JH7100 RESET CONTROLLER DRIVER
F: drivers/reset/reset-starfive-jh7100.c
F: include/dt-bindings/reset/starfive-jh7100.h
+STARFIVE TRNG DRIVER
+S: Supported
+F: Documentation/devicetree/bindings/rng/starfive*
+F: drivers/char/hw_random/jh7110-trng.c
+
STATIC BRANCH/CALL
SUPERH
S: Maintained
Q: http://patchwork.kernel.org/project/linux-sh/list/
S: Supported
B: https://bugzilla.kernel.org
F: Documentation/power/
-F: arch/x86/kernel/acpi/
+F: arch/x86/kernel/acpi/sleep*
+F: arch/x86/kernel/acpi/wakeup*
F: drivers/base/power/
F: include/linux/freezer.h
F: include/linux/pm.h
F: drivers/platform/x86/system76_acpi.c
SYSV FILESYSTEM
-S: Maintained
+S: Orphan
F: Documentation/filesystems/sysv-fs.rst
F: fs/sysv/
F: include/linux/sysv_fs.h
Q: https://patchwork.kernel.org/project/linux-pm/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git thermal
F: Documentation/ABI/testing/sysfs-class-thermal
+F: Documentation/admin-guide/thermal/
F: Documentation/devicetree/bindings/thermal/
F: Documentation/driver-api/thermal/
F: drivers/thermal/
S: Maintained
-F: drivers/net/thunderbolt.c
+F: drivers/net/thunderbolt/
THUNDERX GPIO DRIVER
Q: http://patchwork.linuxtv.org/project/linux-media/list/
T: git git://linuxtv.org/mhadli/v4l-dvb-davinci_devices.git
F: drivers/media/platform/ti/davinci/
-F: drivers/staging/media/deprecated/vpfe_capture/
F: include/media/davinci/
TI ENHANCED CAPTURE (eCAP) DRIVER
S: Supported
F: drivers/ufs/host/*dwc*
+UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER EXYNOS HOOKS
+S: Maintained
+F: drivers/ufs/host/ufs-exynos*
+
UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER MEDIATEK HOOKS
S: Maintained
F: drivers/ufs/host/ufs-mediatek*
+UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER QUALCOMM HOOKS
+S: Maintained
+F: Documentation/devicetree/bindings/ufs/qcom,ufs.yaml
+F: drivers/ufs/host/ufs-qcom*
+
UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER RENESAS HOOKS
USB WEBCAM GADGET
S: Maintained
F: drivers/usb/gadget/function/*uvc*
F: Documentation/admin-guide/media/zr364xx*
F: drivers/staging/media/deprecated/zr364xx/
+USER DATAGRAM PROTOCOL (UDP)
+S: Maintained
+F: include/linux/udp.h
+F: net/ipv4/udp.c
+F: net/ipv6/udp.c
+
USER-MODE LINUX (UML)
T: git git://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
UUID HELPERS
S: Maintained
-T: git git://git.infradead.org/users/hch/uuid.git
F: include/linux/uuid.h
F: include/uapi/linux/uuid.h
F: lib/test_uuid.c
T: git git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git
F: drivers/platform/olpc/
F: drivers/platform/x86/
+F: include/linux/platform_data/x86/
X86 PLATFORM DRIVERS - ARCH
F: drivers/dma/xilinx/xilinx_dpdma.c
F: include/dt-bindings/dma/xlnx-zynqmp-dpdma.h
+XILINX ZYNQMP OCM EDAC DRIVER
+S: Maintained
+F: Documentation/devicetree/bindings/memory-controllers/xlnx,zynqmp-ocmc-1.0.yaml
+F: drivers/edac/zynqmp_edac.c
+
XILINX ZYNQMP PSGTR PHY DRIVER
arm_pm_idle();
else
cpu_do_idle();
- raw_local_irq_enable();
}
void arch_cpu_idle_prepare(void)
gate_vma.vm_page_prot = PAGE_READONLY_EXEC;
gate_vma.vm_start = 0xffff0000;
gate_vma.vm_end = 0xffff0000 + PAGE_SIZE;
- gate_vma.vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC;
+ vm_flags_init(&gate_vma, VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC);
return 0;
}
arch_initcall(gate_vma_init);
}
extern void __sync_icache_dcache(pte_t pteval);
+bool pgattr_change_is_safe(u64 old, u64 new);
/*
* PTE bits configuration in the presence of hardware Dirty Bit Management
* PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
*/
-static inline void __check_racy_pte_update(struct mm_struct *mm, pte_t *ptep,
+static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
pte_t pte)
{
pte_t old_pte;
VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte),
"%s: racy dirty state clearing: 0x%016llx -> 0x%016llx",
__func__, pte_val(old_pte), pte_val(pte));
+ VM_WARN_ONCE(!pgattr_change_is_safe(pte_val(old_pte), pte_val(pte)),
+ "%s: unsafe attribute change: 0x%016llx -> 0x%016llx",
+ __func__, pte_val(old_pte), pte_val(pte));
}
static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
mte_sync_tags(old_pte, pte);
}
- __check_racy_pte_update(mm, ptep, pte);
+ __check_safe_pte_update(mm, ptep, pte);
set_pte(ptep, pte);
}
return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
}
- #define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
return set_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
}
+
+#define pmdp_collapse_flush pmdp_collapse_flush
+extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmdp);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/*
- * Encode and decode a swap entry
+ * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
+ * are !pte_none() && !pte_present().
*
* Format of swap PTE:
* bit 0: _PAGE_PRESENT (zero)
* bit 1 to 3: _PAGE_LEAF (zero)
* bit 5: _PAGE_PROT_NONE (zero)
- * bits 6 to 10: swap type
- * bits 10 to XLEN-1: swap offset
+ * bit 6: exclusive marker
+ * bits 7 to 11: swap type
+ * bits 11 to XLEN-1: swap offset
*/
- #define __SWP_TYPE_SHIFT 6
+ #define __SWP_TYPE_SHIFT 7
#define __SWP_TYPE_BITS 5
#define __SWP_TYPE_MASK ((1UL << __SWP_TYPE_BITS) - 1)
#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
#define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
#define __swp_offset(x) ((x).val >> __SWP_OFFSET_SHIFT)
#define __swp_entry(type, offset) ((swp_entry_t) \
- { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
+ { (((type) & __SWP_TYPE_MASK) << __SWP_TYPE_SHIFT) | \
+ ((offset) << __SWP_OFFSET_SHIFT) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
+ static inline int pte_swp_exclusive(pte_t pte)
+ {
+ return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+ }
+
+ static inline pte_t pte_swp_mkexclusive(pte_t pte)
+ {
+ return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
+ }
+
+ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+ {
+ return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
+ }
+
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
#define __pmd_to_swp_entry(pmd) ((swp_entry_t) { pmd_val(pmd) })
#define __swp_entry_to_pmd(swp) __pmd((swp).val)
#include <asm/uv.h>
extern pgd_t swapper_pg_dir[];
+extern pgd_t invalid_pg_dir[];
extern void paging_init(void);
extern unsigned long s390_invalid_asce;
#define _PAGE_SOFT_DIRTY 0x000
#endif
+#define _PAGE_SW_BITS 0xffUL /* All SW bits */
+
#define _PAGE_SWP_EXCLUSIVE _PAGE_LARGE /* SW pte exclusive swap bit */
/* Set of bits not changed in pte_modify */
#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_SPECIAL | _PAGE_DIRTY | \
_PAGE_YOUNG | _PAGE_SOFT_DIRTY)
+/*
+ * Mask of bits that must not be changed with RDP. Allow only _PAGE_PROTECT
+ * HW bit and all SW bits.
+ */
+#define _PAGE_RDP_MASK ~(_PAGE_PROTECT | _PAGE_SW_BITS)
+
/*
* handle_pte_fault uses pte_present and pte_none to find out the pte type
* WITHOUT holding the page table lock. The _PAGE_PRESENT bit is used to
_REGION3_ENTRY_YOUNG | \
_REGION_ENTRY_PROTECT | \
_REGION_ENTRY_NOEXEC)
+#define REGION3_KERNEL_EXEC __pgprot(_REGION_ENTRY_TYPE_R3 | \
+ _REGION3_ENTRY_LARGE | \
+ _REGION3_ENTRY_READ | \
+ _REGION3_ENTRY_WRITE | \
+ _REGION3_ENTRY_YOUNG | \
+ _REGION3_ENTRY_DIRTY)
static inline bool mm_p4d_folded(struct mm_struct *mm)
{
}
#endif
- #define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
static inline int pte_swp_exclusive(pte_t pte)
{
return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
#define IPTE_NODAT 0x400
#define IPTE_GUEST_ASCE 0x800
+static __always_inline void __ptep_rdp(unsigned long addr, pte_t *ptep,
+ unsigned long opt, unsigned long asce,
+ int local)
+{
+ unsigned long pto;
+
+ pto = __pa(ptep) & ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
+ asm volatile(".insn rrf,0xb98b0000,%[r1],%[r2],%[asce],%[m4]"
+ : "+m" (*ptep)
+ : [r1] "a" (pto), [r2] "a" ((addr & PAGE_MASK) | opt),
+ [asce] "a" (asce), [m4] "i" (local));
+}
+
static __always_inline void __ptep_ipte(unsigned long address, pte_t *ptep,
unsigned long opt, unsigned long asce,
int local)
ptep_xchg_lazy(mm, addr, ptep, pte_wrprotect(pte));
}
+/*
+ * Check if PTEs only differ in _PAGE_PROTECT HW bit, but also allow SW PTE
+ * bits in the comparison. Those might change e.g. because of dirty and young
+ * tracking.
+ */
+static inline int pte_allow_rdp(pte_t old, pte_t new)
+{
+ /*
+ * Only allow changes from RO to RW
+ */
+ if (!(pte_val(old) & _PAGE_PROTECT) || pte_val(new) & _PAGE_PROTECT)
+ return 0;
+
+ return (pte_val(old) & _PAGE_RDP_MASK) == (pte_val(new) & _PAGE_RDP_MASK);
+}
+
+static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
+ unsigned long address)
+{
+ /*
+ * RDP might not have propagated the PTE protection reset to all CPUs,
+ * so there could be spurious TLB protection faults.
+ * NOTE: This will also be called when a racing pagetable update on
+ * another thread already installed the correct PTE. Both cases cannot
+ * really be distinguished.
+ * Therefore, only do the local TLB flush when RDP can be used, to avoid
+ * unnecessary overhead.
+ */
+ if (MACHINE_HAS_RDP)
+ asm volatile("ptlb" : : : "memory");
+}
+#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
+
+void ptep_reset_dat_prot(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+ pte_t new);
+
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
static inline int ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
{
if (pte_same(*ptep, entry))
return 0;
- ptep_xchg_direct(vma->vm_mm, addr, ptep, entry);
+ if (MACHINE_HAS_RDP && !mm_has_pgste(vma->vm_mm) && pte_allow_rdp(*ptep, entry))
+ ptep_reset_dat_prot(vma->vm_mm, addr, ptep, entry);
+ else
+ ptep_xchg_direct(vma->vm_mm, addr, ptep, entry);
return 1;
}
unsigned int __read_mostly vdso64_enabled = 1;
#endif
-void __init init_vdso_image(const struct vdso_image *image)
+int __init init_vdso_image(const struct vdso_image *image)
{
+ BUILD_BUG_ON(VDSO_CLOCKMODE_MAX >= 32);
BUG_ON(image->size % PAGE_SIZE != 0);
apply_alternatives((struct alt_instr *)(image->data + image->alt),
(struct alt_instr *)(image->data + image->alt +
image->alt_len));
+
+ return 0;
}
static const struct vm_special_mapping vvar_mapping;
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- unsigned long size = vma->vm_end - vma->vm_start;
-
if (vma_is_special_mapping(vma, &vvar_mapping))
- zap_page_range(vma, vma->vm_start, size);
+ zap_vma_pages(vma);
}
mmap_read_unlock(mm);
return 1;
}
__setup("vdso=", vdso_setup);
-
-static int __init init_vdso(void)
-{
- BUILD_BUG_ON(VDSO_CLOCKMODE_MAX >= 32);
-
- init_vdso_image(&vdso_image_64);
-
-#ifdef CONFIG_X86_X32_ABI
- init_vdso_image(&vdso_image_x32);
-#endif
-
- return 0;
-}
-subsys_initcall(init_vdso);
#endif /* CONFIG_X86_64 */
break;
}
+ old_flags = READ_ONCE(pg->flags);
do {
- old_flags = pg->flags;
new_flags = (old_flags & _PGMT_CLEAR_MASK) | memtype_flags;
- } while (cmpxchg(&pg->flags, old_flags, new_flags) != old_flags);
+ } while (!try_cmpxchg(&pg->flags, &old_flags, new_flags));
}
#else
static inline enum page_cache_mode get_page_memtype(struct page *pg)
u8 mtrr_type, uniform;
mtrr_type = mtrr_type_lookup(start, end, &uniform);
- if (mtrr_type != MTRR_TYPE_WRBACK &&
- mtrr_type != MTRR_TYPE_INVALID)
+ if (mtrr_type != MTRR_TYPE_WRBACK)
return _PAGE_CACHE_MODE_UC_MINUS;
return _PAGE_CACHE_MODE_WB;
ret = reserve_pfn_range(paddr, size, prot, 0);
if (ret == 0 && vma)
- vma->vm_flags |= VM_PAT;
+ vm_flags_set(vma, VM_PAT);
return ret;
}
* can be for the entire vma (in which case pfn, size are zero).
*/
void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
- unsigned long size)
+ unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
unsigned long prot;
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
- if (vma)
- vma->vm_flags &= ~VM_PAT;
+ if (vma) {
+ if (mm_wr_locked)
+ vm_flags_clear(vma, VM_PAT);
+ else
+ __vm_flags_mod(vma, 0, VM_PAT);
+ }
}
/*
*/
void untrack_pfn_moved(struct vm_area_struct *vma)
{
- vma->vm_flags &= ~VM_PAT;
+ vm_flags_clear(vma, VM_PAT);
}
pgprot_t pgprot_writecombine(pgprot_t prot)
--- /dev/null
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE;
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2016-2022 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include <uapi/drm/habanalabs_accel.h>
+#include "habanalabs.h"
+#include "../include/hw_ip/mmu/mmu_general.h"
+
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/pci-p2pdma.h>
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+#define HL_MMU_DEBUG 0
+
+/* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */
+#define DRAM_POOL_PAGE_SIZE SZ_8M
+
+#define MEM_HANDLE_INVALID ULONG_MAX
+
+static int allocate_timestamps_buffers(struct hl_fpriv *hpriv,
+ struct hl_mem_in *args, u64 *handle);
+
+static int set_alloc_page_size(struct hl_device *hdev, struct hl_mem_in *args, u32 *page_size)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 psize;
+
+ /*
+ * for ASIC that supports setting the allocation page size by user we will address
+ * user's choice only if it is not 0 (as 0 means taking the default page size)
+ */
+ if (prop->supports_user_set_page_size && args->alloc.page_size) {
+ psize = args->alloc.page_size;
+
+ if (!is_power_of_2(psize)) {
+ dev_err(hdev->dev, "user page size (%#llx) is not power of 2\n", psize);
+ return -EINVAL;
+ }
+ } else {
+ psize = prop->device_mem_alloc_default_page_size;
+ }
+
+ *page_size = psize;
+
+ return 0;
+}
+
+/*
+ * The va ranges in context object contain a list with the available chunks of
+ * device virtual memory.
+ * There is one range for host allocations and one for DRAM allocations.
+ *
+ * On initialization each range contains one chunk of all of its available
+ * virtual range which is a half of the total device virtual range.
+ *
+ * On each mapping of physical pages, a suitable virtual range chunk (with a
+ * minimum size) is selected from the list. If the chunk size equals the
+ * requested size, the chunk is returned. Otherwise, the chunk is split into
+ * two chunks - one to return as result and a remainder to stay in the list.
+ *
+ * On each Unmapping of a virtual address, the relevant virtual chunk is
+ * returned to the list. The chunk is added to the list and if its edges match
+ * the edges of the adjacent chunks (means a contiguous chunk can be created),
+ * the chunks are merged.
+ *
+ * On finish, the list is checked to have only one chunk of all the relevant
+ * virtual range (which is a half of the device total virtual range).
+ * If not (means not all mappings were unmapped), a warning is printed.
+ */
+
+/*
+ * alloc_device_memory() - allocate device memory.
+ * @ctx: pointer to the context structure.
+ * @args: host parameters containing the requested size.
+ * @ret_handle: result handle.
+ *
+ * This function does the following:
+ * - Allocate the requested size rounded up to 'dram_page_size' pages.
+ * - Return unique handle for later map/unmap/free.
+ */
+static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args,
+ u32 *ret_handle)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_vm *vm = &hdev->vm;
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ u64 paddr = 0, total_size, num_pgs, i;
+ u32 num_curr_pgs, page_size;
+ bool contiguous;
+ int handle, rc;
+
+ num_curr_pgs = 0;
+
+ rc = set_alloc_page_size(hdev, args, &page_size);
+ if (rc)
+ return rc;
+
+ num_pgs = DIV_ROUND_UP_ULL(args->alloc.mem_size, page_size);
+ total_size = num_pgs * page_size;
+
+ if (!total_size) {
+ dev_err(hdev->dev, "Cannot allocate 0 bytes\n");
+ return -EINVAL;
+ }
+
+ contiguous = args->flags & HL_MEM_CONTIGUOUS;
+
+ if (contiguous) {
+ if (is_power_of_2(page_size))
+ paddr = (uintptr_t) gen_pool_dma_alloc_align(vm->dram_pg_pool,
+ total_size, NULL, page_size);
+ else
+ paddr = gen_pool_alloc(vm->dram_pg_pool, total_size);
+ if (!paddr) {
+ dev_err(hdev->dev,
+ "Cannot allocate %llu contiguous pages with total size of %llu\n",
+ num_pgs, total_size);
+ return -ENOMEM;
+ }
+ }
+
+ phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
+ if (!phys_pg_pack) {
+ rc = -ENOMEM;
+ goto pages_pack_err;
+ }
+
+ phys_pg_pack->vm_type = VM_TYPE_PHYS_PACK;
+ phys_pg_pack->asid = ctx->asid;
+ phys_pg_pack->npages = num_pgs;
+ phys_pg_pack->page_size = page_size;
+ phys_pg_pack->total_size = total_size;
+ phys_pg_pack->flags = args->flags;
+ phys_pg_pack->contiguous = contiguous;
+
+ phys_pg_pack->pages = kvmalloc_array(num_pgs, sizeof(u64), GFP_KERNEL);
+ if (ZERO_OR_NULL_PTR(phys_pg_pack->pages)) {
+ rc = -ENOMEM;
+ goto pages_arr_err;
+ }
+
+ if (phys_pg_pack->contiguous) {
+ for (i = 0 ; i < num_pgs ; i++)
+ phys_pg_pack->pages[i] = paddr + i * page_size;
+ } else {
+ for (i = 0 ; i < num_pgs ; i++) {
+ if (is_power_of_2(page_size))
+ phys_pg_pack->pages[i] =
+ (uintptr_t)gen_pool_dma_alloc_align(vm->dram_pg_pool,
+ page_size, NULL,
+ page_size);
+ else
+ phys_pg_pack->pages[i] = gen_pool_alloc(vm->dram_pg_pool,
+ page_size);
+
+ if (!phys_pg_pack->pages[i]) {
+ dev_err(hdev->dev,
+ "Cannot allocate device memory (out of memory)\n");
+ rc = -ENOMEM;
+ goto page_err;
+ }
+
+ num_curr_pgs++;
+ }
+ }
+
+ spin_lock(&vm->idr_lock);
+ handle = idr_alloc(&vm->phys_pg_pack_handles, phys_pg_pack, 1, 0,
+ GFP_ATOMIC);
+ spin_unlock(&vm->idr_lock);
+
+ if (handle < 0) {
+ dev_err(hdev->dev, "Failed to get handle for page\n");
+ rc = -EFAULT;
+ goto idr_err;
+ }
+
+ for (i = 0 ; i < num_pgs ; i++)
+ kref_get(&vm->dram_pg_pool_refcount);
+
+ phys_pg_pack->handle = handle;
+
+ atomic64_add(phys_pg_pack->total_size, &ctx->dram_phys_mem);
+ atomic64_add(phys_pg_pack->total_size, &hdev->dram_used_mem);
+
+ *ret_handle = handle;
+
+ return 0;
+
+idr_err:
+page_err:
+ if (!phys_pg_pack->contiguous)
+ for (i = 0 ; i < num_curr_pgs ; i++)
+ gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[i],
+ page_size);
+
+ kvfree(phys_pg_pack->pages);
+pages_arr_err:
+ kfree(phys_pg_pack);
+pages_pack_err:
+ if (contiguous)
+ gen_pool_free(vm->dram_pg_pool, paddr, total_size);
+
+ return rc;
+}
+
+/**
+ * dma_map_host_va() - DMA mapping of the given host virtual address.
+ * @hdev: habanalabs device structure.
+ * @addr: the host virtual address of the memory area.
+ * @size: the size of the memory area.
+ * @p_userptr: pointer to result userptr structure.
+ *
+ * This function does the following:
+ * - Allocate userptr structure.
+ * - Pin the given host memory using the userptr structure.
+ * - Perform DMA mapping to have the DMA addresses of the pages.
+ */
+static int dma_map_host_va(struct hl_device *hdev, u64 addr, u64 size,
+ struct hl_userptr **p_userptr)
+{
+ struct hl_userptr *userptr;
+ int rc;
+
+ userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
+ if (!userptr) {
+ rc = -ENOMEM;
+ goto userptr_err;
+ }
+
+ rc = hl_pin_host_memory(hdev, addr, size, userptr);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to pin host memory\n");
+ goto pin_err;
+ }
+
+ userptr->dma_mapped = true;
+ userptr->dir = DMA_BIDIRECTIONAL;
+ userptr->vm_type = VM_TYPE_USERPTR;
+
+ *p_userptr = userptr;
+
+ rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, DMA_BIDIRECTIONAL);
+ if (rc) {
+ dev_err(hdev->dev, "failed to map sgt with DMA region\n");
+ goto dma_map_err;
+ }
+
+ return 0;
+
+dma_map_err:
+ hl_unpin_host_memory(hdev, userptr);
+pin_err:
+ kfree(userptr);
+userptr_err:
+
+ return rc;
+}
+
+/**
+ * dma_unmap_host_va() - DMA unmapping of the given host virtual address.
+ * @hdev: habanalabs device structure.
+ * @userptr: userptr to free.
+ *
+ * This function does the following:
+ * - Unpins the physical pages.
+ * - Frees the userptr structure.
+ */
+static void dma_unmap_host_va(struct hl_device *hdev,
+ struct hl_userptr *userptr)
+{
+ hl_unpin_host_memory(hdev, userptr);
+ kfree(userptr);
+}
+
+/**
+ * dram_pg_pool_do_release() - free DRAM pages pool
+ * @ref: pointer to reference object.
+ *
+ * This function does the following:
+ * - Frees the idr structure of physical pages handles.
+ * - Frees the generic pool of DRAM physical pages.
+ */
+static void dram_pg_pool_do_release(struct kref *ref)
+{
+ struct hl_vm *vm = container_of(ref, struct hl_vm,
+ dram_pg_pool_refcount);
+
+ /*
+ * free the idr here as only here we know for sure that there are no
+ * allocated physical pages and hence there are no handles in use
+ */
+ idr_destroy(&vm->phys_pg_pack_handles);
+ gen_pool_destroy(vm->dram_pg_pool);
+}
+
+/**
+ * free_phys_pg_pack() - free physical page pack.
+ * @hdev: habanalabs device structure.
+ * @phys_pg_pack: physical page pack to free.
+ *
+ * This function does the following:
+ * - For DRAM memory only
+ * - iterate over the pack, free each physical block structure by
+ * returning it to the general pool.
+ * - Free the hl_vm_phys_pg_pack structure.
+ */
+static void free_phys_pg_pack(struct hl_device *hdev,
+ struct hl_vm_phys_pg_pack *phys_pg_pack)
+{
+ struct hl_vm *vm = &hdev->vm;
+ u64 i;
+
+ if (phys_pg_pack->created_from_userptr)
+ goto end;
+
+ if (phys_pg_pack->contiguous) {
+ gen_pool_free(vm->dram_pg_pool, phys_pg_pack->pages[0],
+ phys_pg_pack->total_size);
+
+ for (i = 0; i < phys_pg_pack->npages ; i++)
+ kref_put(&vm->dram_pg_pool_refcount,
+ dram_pg_pool_do_release);
+ } else {
+ for (i = 0 ; i < phys_pg_pack->npages ; i++) {
+ gen_pool_free(vm->dram_pg_pool,
+ phys_pg_pack->pages[i],
+ phys_pg_pack->page_size);
+ kref_put(&vm->dram_pg_pool_refcount,
+ dram_pg_pool_do_release);
+ }
+ }
+
+end:
+ kvfree(phys_pg_pack->pages);
+ kfree(phys_pg_pack);
+
+ return;
+}
+
+/**
+ * free_device_memory() - free device memory.
+ * @ctx: pointer to the context structure.
+ * @args: host parameters containing the requested size.
+ *
+ * This function does the following:
+ * - Free the device memory related to the given handle.
+ */
+static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_vm *vm = &hdev->vm;
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ u32 handle = args->free.handle;
+
+ spin_lock(&vm->idr_lock);
+ phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
+ if (!phys_pg_pack) {
+ spin_unlock(&vm->idr_lock);
+ dev_err(hdev->dev, "free device memory failed, no match for handle %u\n", handle);
+ return -EINVAL;
+ }
+
+ if (atomic_read(&phys_pg_pack->mapping_cnt) > 0) {
+ spin_unlock(&vm->idr_lock);
+ dev_err(hdev->dev, "handle %u is mapped, cannot free\n", handle);
+ return -EINVAL;
+ }
+
+ /* must remove from idr before the freeing of the physical pages as the refcount of the pool
+ * is also the trigger of the idr destroy
+ */
+ idr_remove(&vm->phys_pg_pack_handles, handle);
+ spin_unlock(&vm->idr_lock);
+
+ atomic64_sub(phys_pg_pack->total_size, &ctx->dram_phys_mem);
+ atomic64_sub(phys_pg_pack->total_size, &hdev->dram_used_mem);
+
+ free_phys_pg_pack(hdev, phys_pg_pack);
+
+ return 0;
+}
+
+/**
+ * clear_va_list_locked() - free virtual addresses list.
+ * @hdev: habanalabs device structure.
+ * @va_list: list of virtual addresses to free.
+ *
+ * This function does the following:
+ * - Iterate over the list and free each virtual addresses block.
+ *
+ * This function should be called only when va_list lock is taken.
+ */
+static void clear_va_list_locked(struct hl_device *hdev,
+ struct list_head *va_list)
+{
+ struct hl_vm_va_block *va_block, *tmp;
+
+ list_for_each_entry_safe(va_block, tmp, va_list, node) {
+ list_del(&va_block->node);
+ kfree(va_block);
+ }
+}
+
+/**
+ * print_va_list_locked() - print virtual addresses list.
+ * @hdev: habanalabs device structure.
+ * @va_list: list of virtual addresses to print.
+ *
+ * This function does the following:
+ * - Iterate over the list and print each virtual addresses block.
+ *
+ * This function should be called only when va_list lock is taken.
+ */
+static void print_va_list_locked(struct hl_device *hdev,
+ struct list_head *va_list)
+{
+#if HL_MMU_DEBUG
+ struct hl_vm_va_block *va_block;
+
+ dev_dbg(hdev->dev, "print va list:\n");
+
+ list_for_each_entry(va_block, va_list, node)
+ dev_dbg(hdev->dev,
+ "va block, start: 0x%llx, end: 0x%llx, size: %llu\n",
+ va_block->start, va_block->end, va_block->size);
+#endif
+}
+
+/**
+ * merge_va_blocks_locked() - merge a virtual block if possible.
+ * @hdev: pointer to the habanalabs device structure.
+ * @va_list: pointer to the virtual addresses block list.
+ * @va_block: virtual block to merge with adjacent blocks.
+ *
+ * This function does the following:
+ * - Merge the given blocks with the adjacent blocks if their virtual ranges
+ * create a contiguous virtual range.
+ *
+ * This Function should be called only when va_list lock is taken.
+ */
+static void merge_va_blocks_locked(struct hl_device *hdev,
+ struct list_head *va_list, struct hl_vm_va_block *va_block)
+{
+ struct hl_vm_va_block *prev, *next;
+
+ prev = list_prev_entry(va_block, node);
+ if (&prev->node != va_list && prev->end + 1 == va_block->start) {
+ prev->end = va_block->end;
+ prev->size = prev->end - prev->start + 1;
+ list_del(&va_block->node);
+ kfree(va_block);
+ va_block = prev;
+ }
+
+ next = list_next_entry(va_block, node);
+ if (&next->node != va_list && va_block->end + 1 == next->start) {
+ next->start = va_block->start;
+ next->size = next->end - next->start + 1;
+ list_del(&va_block->node);
+ kfree(va_block);
+ }
+}
+
+/**
+ * add_va_block_locked() - add a virtual block to the virtual addresses list.
+ * @hdev: pointer to the habanalabs device structure.
+ * @va_list: pointer to the virtual addresses block list.
+ * @start: start virtual address.
+ * @end: end virtual address.
+ *
+ * This function does the following:
+ * - Add the given block to the virtual blocks list and merge with other blocks
+ * if a contiguous virtual block can be created.
+ *
+ * This Function should be called only when va_list lock is taken.
+ */
+static int add_va_block_locked(struct hl_device *hdev,
+ struct list_head *va_list, u64 start, u64 end)
+{
+ struct hl_vm_va_block *va_block, *res = NULL;
+ u64 size = end - start + 1;
+
+ print_va_list_locked(hdev, va_list);
+
+ list_for_each_entry(va_block, va_list, node) {
+ /* TODO: remove upon matureness */
+ if (hl_mem_area_crosses_range(start, size, va_block->start,
+ va_block->end)) {
+ dev_err(hdev->dev,
+ "block crossing ranges at start 0x%llx, end 0x%llx\n",
+ va_block->start, va_block->end);
+ return -EINVAL;
+ }
+
+ if (va_block->end < start)
+ res = va_block;
+ }
+
+ va_block = kmalloc(sizeof(*va_block), GFP_KERNEL);
+ if (!va_block)
+ return -ENOMEM;
+
+ va_block->start = start;
+ va_block->end = end;
+ va_block->size = size;
+
+ if (!res)
+ list_add(&va_block->node, va_list);
+ else
+ list_add(&va_block->node, &res->node);
+
+ merge_va_blocks_locked(hdev, va_list, va_block);
+
+ print_va_list_locked(hdev, va_list);
+
+ return 0;
+}
+
+/**
+ * add_va_block() - wrapper for add_va_block_locked.
+ * @hdev: pointer to the habanalabs device structure.
+ * @va_range: pointer to the virtual addresses range object.
+ * @start: start virtual address.
+ * @end: end virtual address.
+ *
+ * This function does the following:
+ * - Takes the list lock and calls add_va_block_locked.
+ */
+static inline int add_va_block(struct hl_device *hdev,
+ struct hl_va_range *va_range, u64 start, u64 end)
+{
+ int rc;
+
+ mutex_lock(&va_range->lock);
+ rc = add_va_block_locked(hdev, &va_range->list, start, end);
+ mutex_unlock(&va_range->lock);
+
+ return rc;
+}
+
+/**
+ * is_hint_crossing_range() - check if hint address crossing specified reserved.
+ * @range_type: virtual space range type.
+ * @start_addr: start virtual address.
+ * @size: block size.
+ * @prop: asic properties structure to retrieve reserved ranges from.
+ */
+static inline bool is_hint_crossing_range(enum hl_va_range_type range_type,
+ u64 start_addr, u32 size, struct asic_fixed_properties *prop) {
+ bool range_cross;
+
+ if (range_type == HL_VA_RANGE_TYPE_DRAM)
+ range_cross =
+ hl_mem_area_crosses_range(start_addr, size,
+ prop->hints_dram_reserved_va_range.start_addr,
+ prop->hints_dram_reserved_va_range.end_addr);
+ else if (range_type == HL_VA_RANGE_TYPE_HOST)
+ range_cross =
+ hl_mem_area_crosses_range(start_addr, size,
+ prop->hints_host_reserved_va_range.start_addr,
+ prop->hints_host_reserved_va_range.end_addr);
+ else
+ range_cross =
+ hl_mem_area_crosses_range(start_addr, size,
+ prop->hints_host_hpage_reserved_va_range.start_addr,
+ prop->hints_host_hpage_reserved_va_range.end_addr);
+
+ return range_cross;
+}
+
+/**
+ * get_va_block() - get a virtual block for the given size and alignment.
+ *
+ * @hdev: pointer to the habanalabs device structure.
+ * @va_range: pointer to the virtual addresses range.
+ * @size: requested block size.
+ * @hint_addr: hint for requested address by the user.
+ * @va_block_align: required alignment of the virtual block start address.
+ * @range_type: va range type (host, dram)
+ * @flags: additional memory flags, currently only uses HL_MEM_FORCE_HINT
+ *
+ * This function does the following:
+ * - Iterate on the virtual block list to find a suitable virtual block for the
+ * given size, hint address and alignment.
+ * - Reserve the requested block and update the list.
+ * - Return the start address of the virtual block.
+ */
+static u64 get_va_block(struct hl_device *hdev,
+ struct hl_va_range *va_range,
+ u64 size, u64 hint_addr, u32 va_block_align,
+ enum hl_va_range_type range_type,
+ u32 flags)
+{
+ struct hl_vm_va_block *va_block, *new_va_block = NULL;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 tmp_hint_addr, valid_start, valid_size, prev_start, prev_end,
+ align_mask, reserved_valid_start = 0, reserved_valid_size = 0,
+ dram_hint_mask = prop->dram_hints_align_mask;
+ bool add_prev = false;
+ bool is_align_pow_2 = is_power_of_2(va_range->page_size);
+ bool is_hint_dram_addr = hl_is_dram_va(hdev, hint_addr);
+ bool force_hint = flags & HL_MEM_FORCE_HINT;
+
+ if (is_align_pow_2)
+ align_mask = ~((u64)va_block_align - 1);
+ else
+ /*
+ * with non-power-of-2 range we work only with page granularity
+ * and the start address is page aligned,
+ * so no need for alignment checking.
+ */
+ size = DIV_ROUND_UP_ULL(size, va_range->page_size) *
+ va_range->page_size;
+
+ tmp_hint_addr = hint_addr & ~dram_hint_mask;
+
+ /* Check if we need to ignore hint address */
+ if ((is_align_pow_2 && (hint_addr & (va_block_align - 1))) ||
+ (!is_align_pow_2 && is_hint_dram_addr &&
+ do_div(tmp_hint_addr, va_range->page_size))) {
+
+ if (force_hint) {
+ /* Hint must be respected, so here we just fail */
+ dev_err(hdev->dev,
+ "Hint address 0x%llx is not page aligned - cannot be respected\n",
+ hint_addr);
+ return 0;
+ }
+
+ dev_dbg(hdev->dev,
+ "Hint address 0x%llx will be ignored because it is not aligned\n",
+ hint_addr);
+ hint_addr = 0;
+ }
+
+ mutex_lock(&va_range->lock);
+
+ print_va_list_locked(hdev, &va_range->list);
+
+ list_for_each_entry(va_block, &va_range->list, node) {
+ /* Calc the first possible aligned addr */
+ valid_start = va_block->start;
+
+ if (is_align_pow_2 && (valid_start & (va_block_align - 1))) {
+ valid_start &= align_mask;
+ valid_start += va_block_align;
+ if (valid_start > va_block->end)
+ continue;
+ }
+
+ valid_size = va_block->end - valid_start + 1;
+ if (valid_size < size)
+ continue;
+
+ /*
+ * In case hint address is 0, and hints_range_reservation
+ * property enabled, then avoid allocating va blocks from the
+ * range reserved for hint addresses
+ */
+ if (prop->hints_range_reservation && !hint_addr)
+ if (is_hint_crossing_range(range_type, valid_start,
+ size, prop))
+ continue;
+
+ /* Pick the minimal length block which has the required size */
+ if (!new_va_block || (valid_size < reserved_valid_size)) {
+ new_va_block = va_block;
+ reserved_valid_start = valid_start;
+ reserved_valid_size = valid_size;
+ }
+
+ if (hint_addr && hint_addr >= valid_start &&
+ (hint_addr + size) <= va_block->end) {
+ new_va_block = va_block;
+ reserved_valid_start = hint_addr;
+ reserved_valid_size = valid_size;
+ break;
+ }
+ }
+
+ if (!new_va_block) {
+ dev_err(hdev->dev, "no available va block for size %llu\n",
+ size);
+ goto out;
+ }
+
+ if (force_hint && reserved_valid_start != hint_addr) {
+ /* Hint address must be respected. If we are here - this means
+ * we could not respect it.
+ */
+ dev_err(hdev->dev,
+ "Hint address 0x%llx could not be respected\n",
+ hint_addr);
+ reserved_valid_start = 0;
+ goto out;
+ }
+
+ /*
+ * Check if there is some leftover range due to reserving the new
+ * va block, then return it to the main virtual addresses list.
+ */
+ if (reserved_valid_start > new_va_block->start) {
+ prev_start = new_va_block->start;
+ prev_end = reserved_valid_start - 1;
+
+ new_va_block->start = reserved_valid_start;
+ new_va_block->size = reserved_valid_size;
+
+ add_prev = true;
+ }
+
+ if (new_va_block->size > size) {
+ new_va_block->start += size;
+ new_va_block->size = new_va_block->end - new_va_block->start + 1;
+ } else {
+ list_del(&new_va_block->node);
+ kfree(new_va_block);
+ }
+
+ if (add_prev)
+ add_va_block_locked(hdev, &va_range->list, prev_start,
+ prev_end);
+
+ print_va_list_locked(hdev, &va_range->list);
+out:
+ mutex_unlock(&va_range->lock);
+
+ return reserved_valid_start;
+}
+
+/*
+ * hl_reserve_va_block() - reserve a virtual block of a given size.
+ * @hdev: pointer to the habanalabs device structure.
+ * @ctx: current context
+ * @type: virtual addresses range type.
+ * @size: requested block size.
+ * @alignment: required alignment in bytes of the virtual block start address,
+ * 0 means no alignment.
+ *
+ * This function does the following:
+ * - Iterate on the virtual block list to find a suitable virtual block for the
+ * given size and alignment.
+ * - Reserve the requested block and update the list.
+ * - Return the start address of the virtual block.
+ */
+u64 hl_reserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
+ enum hl_va_range_type type, u64 size, u32 alignment)
+{
+ return get_va_block(hdev, ctx->va_range[type], size, 0,
+ max(alignment, ctx->va_range[type]->page_size),
+ type, 0);
+}
+
+/**
+ * hl_get_va_range_type() - get va_range type for the given address and size.
+ * @ctx: context to fetch va_range from.
+ * @address: the start address of the area we want to validate.
+ * @size: the size in bytes of the area we want to validate.
+ * @type: returned va_range type.
+ *
+ * Return: true if the area is inside a valid range, false otherwise.
+ */
+static int hl_get_va_range_type(struct hl_ctx *ctx, u64 address, u64 size,
+ enum hl_va_range_type *type)
+{
+ int i;
+
+ for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX; i++) {
+ if (hl_mem_area_inside_range(address, size,
+ ctx->va_range[i]->start_addr,
+ ctx->va_range[i]->end_addr)) {
+ *type = i;
+ return 0;
+ }
+ }
+
+ return -EINVAL;
+}
+
+/**
+ * hl_unreserve_va_block() - wrapper for add_va_block to unreserve a va block.
+ * @hdev: pointer to the habanalabs device structure
+ * @ctx: pointer to the context structure.
+ * @start_addr: start virtual address.
+ * @size: number of bytes to unreserve.
+ *
+ * This function does the following:
+ * - Takes the list lock and calls add_va_block_locked.
+ */
+int hl_unreserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
+ u64 start_addr, u64 size)
+{
+ enum hl_va_range_type type;
+ int rc;
+
+ rc = hl_get_va_range_type(ctx, start_addr, size, &type);
+ if (rc) {
+ dev_err(hdev->dev,
+ "cannot find va_range for va %#llx size %llu",
+ start_addr, size);
+ return rc;
+ }
+
+ rc = add_va_block(hdev, ctx->va_range[type], start_addr,
+ start_addr + size - 1);
+ if (rc)
+ dev_warn(hdev->dev,
+ "add va block failed for vaddr: 0x%llx\n", start_addr);
+
+ return rc;
+}
+
+/**
+ * init_phys_pg_pack_from_userptr() - initialize physical page pack from host
+ * memory
+ * @ctx: pointer to the context structure.
+ * @userptr: userptr to initialize from.
+ * @pphys_pg_pack: result pointer.
+ * @force_regular_page: tell the function to ignore huge page optimization,
+ * even if possible. Needed for cases where the device VA
+ * is allocated before we know the composition of the
+ * physical pages
+ *
+ * This function does the following:
+ * - Pin the physical pages related to the given virtual block.
+ * - Create a physical page pack from the physical pages related to the given
+ * virtual block.
+ */
+static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
+ struct hl_userptr *userptr,
+ struct hl_vm_phys_pg_pack **pphys_pg_pack,
+ bool force_regular_page)
+{
+ u32 npages, page_size = PAGE_SIZE,
+ huge_page_size = ctx->hdev->asic_prop.pmmu_huge.page_size;
+ u32 pgs_in_huge_page = huge_page_size >> __ffs(page_size);
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ bool first = true, is_huge_page_opt;
+ u64 page_mask, total_npages;
+ struct scatterlist *sg;
+ dma_addr_t dma_addr;
+ int rc, i, j;
+
+ phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
+ if (!phys_pg_pack)
+ return -ENOMEM;
+
+ phys_pg_pack->vm_type = userptr->vm_type;
+ phys_pg_pack->created_from_userptr = true;
+ phys_pg_pack->asid = ctx->asid;
+ atomic_set(&phys_pg_pack->mapping_cnt, 1);
+
+ is_huge_page_opt = (force_regular_page ? false : true);
+
+ /* Only if all dma_addrs are aligned to 2MB and their
+ * sizes is at least 2MB, we can use huge page mapping.
+ * We limit the 2MB optimization to this condition,
+ * since later on we acquire the related VA range as one
+ * consecutive block.
+ */
+ total_npages = 0;
+ for_each_sgtable_dma_sg(userptr->sgt, sg, i) {
+ npages = hl_get_sg_info(sg, &dma_addr);
+
+ total_npages += npages;
+
+ if ((npages % pgs_in_huge_page) ||
+ (dma_addr & (huge_page_size - 1)))
+ is_huge_page_opt = false;
+ }
+
+ if (is_huge_page_opt) {
+ page_size = huge_page_size;
+ do_div(total_npages, pgs_in_huge_page);
+ }
+
+ page_mask = ~(((u64) page_size) - 1);
+
+ phys_pg_pack->pages = kvmalloc_array(total_npages, sizeof(u64),
+ GFP_KERNEL);
+ if (ZERO_OR_NULL_PTR(phys_pg_pack->pages)) {
+ rc = -ENOMEM;
+ goto page_pack_arr_mem_err;
+ }
+
+ phys_pg_pack->npages = total_npages;
+ phys_pg_pack->page_size = page_size;
+ phys_pg_pack->total_size = total_npages * page_size;
+
+ j = 0;
+ for_each_sgtable_dma_sg(userptr->sgt, sg, i) {
+ npages = hl_get_sg_info(sg, &dma_addr);
+
+ /* align down to physical page size and save the offset */
+ if (first) {
+ first = false;
+ phys_pg_pack->offset = dma_addr & (page_size - 1);
+ dma_addr &= page_mask;
+ }
+
+ while (npages) {
+ phys_pg_pack->pages[j++] = dma_addr;
+ dma_addr += page_size;
+
+ if (is_huge_page_opt)
+ npages -= pgs_in_huge_page;
+ else
+ npages--;
+ }
+ }
+
+ *pphys_pg_pack = phys_pg_pack;
+
+ return 0;
+
+page_pack_arr_mem_err:
+ kfree(phys_pg_pack);
+
+ return rc;
+}
+
+/**
+ * map_phys_pg_pack() - maps the physical page pack..
+ * @ctx: pointer to the context structure.
+ * @vaddr: start address of the virtual area to map from.
+ * @phys_pg_pack: the pack of physical pages to map to.
+ *
+ * This function does the following:
+ * - Maps each chunk of virtual memory to matching physical chunk.
+ * - Stores number of successful mappings in the given argument.
+ * - Returns 0 on success, error code otherwise.
+ */
+static int map_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
+ struct hl_vm_phys_pg_pack *phys_pg_pack)
+{
+ struct hl_device *hdev = ctx->hdev;
+ u64 next_vaddr = vaddr, paddr, mapped_pg_cnt = 0, i;
+ u32 page_size = phys_pg_pack->page_size;
+ int rc = 0;
+ bool is_host_addr;
+
+ for (i = 0 ; i < phys_pg_pack->npages ; i++) {
+ paddr = phys_pg_pack->pages[i];
+
+ rc = hl_mmu_map_page(ctx, next_vaddr, paddr, page_size,
+ (i + 1) == phys_pg_pack->npages);
+ if (rc) {
+ dev_err(hdev->dev,
+ "map failed for handle %u, npages: %llu, mapped: %llu",
+ phys_pg_pack->handle, phys_pg_pack->npages,
+ mapped_pg_cnt);
+ goto err;
+ }
+
+ mapped_pg_cnt++;
+ next_vaddr += page_size;
+ }
+
+ return 0;
+
+err:
+ is_host_addr = !hl_is_dram_va(hdev, vaddr);
+
+ next_vaddr = vaddr;
+ for (i = 0 ; i < mapped_pg_cnt ; i++) {
+ if (hl_mmu_unmap_page(ctx, next_vaddr, page_size,
+ (i + 1) == mapped_pg_cnt))
+ dev_warn_ratelimited(hdev->dev,
+ "failed to unmap handle %u, va: 0x%llx, pa: 0x%llx, page size: %u\n",
+ phys_pg_pack->handle, next_vaddr,
+ phys_pg_pack->pages[i], page_size);
+
+ next_vaddr += page_size;
+
+ /*
+ * unmapping on Palladium can be really long, so avoid a CPU
+ * soft lockup bug by sleeping a little between unmapping pages
+ *
+ * In addition, on host num of pages could be huge,
+ * because page size could be 4KB, so when unmapping host
+ * pages sleep every 32K pages to avoid soft lockup
+ */
+ if (hdev->pldm || (is_host_addr && (i & 0x7FFF) == 0))
+ usleep_range(50, 200);
+ }
+
+ return rc;
+}
+
+/**
+ * unmap_phys_pg_pack() - unmaps the physical page pack.
+ * @ctx: pointer to the context structure.
+ * @vaddr: start address of the virtual area to unmap.
+ * @phys_pg_pack: the pack of physical pages to unmap.
+ */
+static void unmap_phys_pg_pack(struct hl_ctx *ctx, u64 vaddr,
+ struct hl_vm_phys_pg_pack *phys_pg_pack)
+{
+ struct hl_device *hdev = ctx->hdev;
+ u64 next_vaddr, i;
+ bool is_host_addr;
+ u32 page_size;
+
+ is_host_addr = !hl_is_dram_va(hdev, vaddr);
+ page_size = phys_pg_pack->page_size;
+ next_vaddr = vaddr;
+
+ for (i = 0 ; i < phys_pg_pack->npages ; i++, next_vaddr += page_size) {
+ if (hl_mmu_unmap_page(ctx, next_vaddr, page_size,
+ (i + 1) == phys_pg_pack->npages))
+ dev_warn_ratelimited(hdev->dev,
+ "unmap failed for vaddr: 0x%llx\n", next_vaddr);
+
+ /*
+ * unmapping on Palladium can be really long, so avoid a CPU
+ * soft lockup bug by sleeping a little between unmapping pages
+ *
+ * In addition, on host num of pages could be huge,
+ * because page size could be 4KB, so when unmapping host
+ * pages sleep every 32K pages to avoid soft lockup
+ */
+ if (hdev->pldm || (is_host_addr && (i & 0x7FFF) == 0))
+ usleep_range(50, 200);
+ }
+}
+
+static int get_paddr_from_handle(struct hl_ctx *ctx, struct hl_mem_in *args,
+ u64 *paddr)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_vm *vm = &hdev->vm;
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ u32 handle;
+
+ handle = lower_32_bits(args->map_device.handle);
+ spin_lock(&vm->idr_lock);
+ phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
+ if (!phys_pg_pack) {
+ spin_unlock(&vm->idr_lock);
+ dev_err(hdev->dev, "no match for handle %u\n", handle);
+ return -EINVAL;
+ }
+
+ *paddr = phys_pg_pack->pages[0];
+
+ spin_unlock(&vm->idr_lock);
+
+ return 0;
+}
+
+/**
+ * map_device_va() - map the given memory.
+ * @ctx: pointer to the context structure.
+ * @args: host parameters with handle/host virtual address.
+ * @device_addr: pointer to result device virtual address.
+ *
+ * This function does the following:
+ * - If given a physical device memory handle, map to a device virtual block
+ * and return the start address of this block.
+ * - If given a host virtual address and size, find the related physical pages,
+ * map a device virtual block to this pages and return the start address of
+ * this block.
+ */
+static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device_addr)
+{
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ enum hl_va_range_type va_range_type = 0;
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_userptr *userptr = NULL;
+ u32 handle = 0, va_block_align;
+ struct hl_vm_hash_node *hnode;
+ struct hl_vm *vm = &hdev->vm;
+ struct hl_va_range *va_range;
+ bool is_userptr, do_prefetch;
+ u64 ret_vaddr, hint_addr;
+ enum vm_type *vm_type;
+ int rc;
+
+ /* set map flags */
+ is_userptr = args->flags & HL_MEM_USERPTR;
+ do_prefetch = hdev->supports_mmu_prefetch && (args->flags & HL_MEM_PREFETCH);
+
+ /* Assume failure */
+ *device_addr = 0;
+
+ if (is_userptr) {
+ u64 addr = args->map_host.host_virt_addr,
+ size = args->map_host.mem_size;
+ u32 page_size = hdev->asic_prop.pmmu.page_size,
+ huge_page_size = hdev->asic_prop.pmmu_huge.page_size;
+
+ rc = dma_map_host_va(hdev, addr, size, &userptr);
+ if (rc) {
+ dev_err(hdev->dev, "failed to get userptr from va\n");
+ return rc;
+ }
+
+ rc = init_phys_pg_pack_from_userptr(ctx, userptr,
+ &phys_pg_pack, false);
+ if (rc) {
+ dev_err(hdev->dev,
+ "unable to init page pack for vaddr 0x%llx\n",
+ addr);
+ goto init_page_pack_err;
+ }
+
+ vm_type = (enum vm_type *) userptr;
+ hint_addr = args->map_host.hint_addr;
+ handle = phys_pg_pack->handle;
+
+ /* get required alignment */
+ if (phys_pg_pack->page_size == page_size) {
+ va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
+ va_range_type = HL_VA_RANGE_TYPE_HOST;
+ /*
+ * huge page alignment may be needed in case of regular
+ * page mapping, depending on the host VA alignment
+ */
+ if (addr & (huge_page_size - 1))
+ va_block_align = page_size;
+ else
+ va_block_align = huge_page_size;
+ } else {
+ /*
+ * huge page alignment is needed in case of huge page
+ * mapping
+ */
+ va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
+ va_range_type = HL_VA_RANGE_TYPE_HOST_HUGE;
+ va_block_align = huge_page_size;
+ }
+ } else {
+ handle = lower_32_bits(args->map_device.handle);
+
+ spin_lock(&vm->idr_lock);
+ phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, handle);
+ if (!phys_pg_pack) {
+ spin_unlock(&vm->idr_lock);
+ dev_err(hdev->dev,
+ "no match for handle %u\n", handle);
+ return -EINVAL;
+ }
+
+ /* increment now to avoid freeing device memory while mapping */
+ atomic_inc(&phys_pg_pack->mapping_cnt);
+
+ spin_unlock(&vm->idr_lock);
+
+ vm_type = (enum vm_type *) phys_pg_pack;
+
+ hint_addr = args->map_device.hint_addr;
+
+ /* DRAM VA alignment is the same as the MMU page size */
+ va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
+ va_range_type = HL_VA_RANGE_TYPE_DRAM;
+ va_block_align = hdev->asic_prop.dmmu.page_size;
+ }
+
+ /*
+ * relevant for mapping device physical memory only, as host memory is
+ * implicitly shared
+ */
+ if (!is_userptr && !(phys_pg_pack->flags & HL_MEM_SHARED) &&
+ phys_pg_pack->asid != ctx->asid) {
+ dev_err(hdev->dev,
+ "Failed to map memory, handle %u is not shared\n",
+ handle);
+ rc = -EPERM;
+ goto shared_err;
+ }
+
+ hnode = kzalloc(sizeof(*hnode), GFP_KERNEL);
+ if (!hnode) {
+ rc = -ENOMEM;
+ goto hnode_err;
+ }
+
+ if (hint_addr && phys_pg_pack->offset) {
+ if (args->flags & HL_MEM_FORCE_HINT) {
+ /* Fail if hint must be respected but it can't be */
+ dev_err(hdev->dev,
+ "Hint address 0x%llx cannot be respected because source memory is not aligned 0x%x\n",
+ hint_addr, phys_pg_pack->offset);
+ rc = -EINVAL;
+ goto va_block_err;
+ }
+ dev_dbg(hdev->dev,
+ "Hint address 0x%llx will be ignored because source memory is not aligned 0x%x\n",
+ hint_addr, phys_pg_pack->offset);
+ }
+
+ ret_vaddr = get_va_block(hdev, va_range, phys_pg_pack->total_size,
+ hint_addr, va_block_align,
+ va_range_type, args->flags);
+ if (!ret_vaddr) {
+ dev_err(hdev->dev, "no available va block for handle %u\n",
+ handle);
+ rc = -ENOMEM;
+ goto va_block_err;
+ }
+
+ mutex_lock(&hdev->mmu_lock);
+
+ rc = map_phys_pg_pack(ctx, ret_vaddr, phys_pg_pack);
+ if (rc) {
+ dev_err(hdev->dev, "mapping page pack failed for handle %u\n", handle);
+ mutex_unlock(&hdev->mmu_lock);
+ goto map_err;
+ }
+
+ rc = hl_mmu_invalidate_cache_range(hdev, false, *vm_type | MMU_OP_SKIP_LOW_CACHE_INV,
+ ctx->asid, ret_vaddr, phys_pg_pack->total_size);
+ mutex_unlock(&hdev->mmu_lock);
+ if (rc)
+ goto map_err;
+
+ /*
+ * prefetch is done upon user's request. it is performed in WQ as and so can
+ * be outside the MMU lock. the operation itself is already protected by the mmu lock
+ */
+ if (do_prefetch) {
+ rc = hl_mmu_prefetch_cache_range(ctx, *vm_type, ctx->asid, ret_vaddr,
+ phys_pg_pack->total_size);
+ if (rc)
+ goto map_err;
+ }
+
+ ret_vaddr += phys_pg_pack->offset;
+
+ hnode->ptr = vm_type;
+ hnode->vaddr = ret_vaddr;
+ hnode->handle = is_userptr ? MEM_HANDLE_INVALID : handle;
+
+ mutex_lock(&ctx->mem_hash_lock);
+ hash_add(ctx->mem_hash, &hnode->node, ret_vaddr);
+ mutex_unlock(&ctx->mem_hash_lock);
+
+ *device_addr = ret_vaddr;
+
+ if (is_userptr)
+ free_phys_pg_pack(hdev, phys_pg_pack);
+
+ return rc;
+
+map_err:
+ if (add_va_block(hdev, va_range, ret_vaddr,
+ ret_vaddr + phys_pg_pack->total_size - 1))
+ dev_warn(hdev->dev,
+ "release va block failed for handle 0x%x, vaddr: 0x%llx\n",
+ handle, ret_vaddr);
+
+va_block_err:
+ kfree(hnode);
+hnode_err:
+shared_err:
+ atomic_dec(&phys_pg_pack->mapping_cnt);
+ if (is_userptr)
+ free_phys_pg_pack(hdev, phys_pg_pack);
+init_page_pack_err:
+ if (is_userptr)
+ dma_unmap_host_va(hdev, userptr);
+
+ return rc;
+}
+
+/**
+ * unmap_device_va() - unmap the given device virtual address.
+ * @ctx: pointer to the context structure.
+ * @args: host parameters with device virtual address to unmap.
+ * @ctx_free: true if in context free flow, false otherwise.
+ *
+ * This function does the following:
+ * - unmap the physical pages related to the given virtual address.
+ * - return the device virtual block to the virtual block list.
+ */
+static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
+ bool ctx_free)
+{
+ struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
+ u64 vaddr = args->unmap.device_virt_addr;
+ struct hl_vm_hash_node *hnode = NULL;
+ struct asic_fixed_properties *prop;
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_userptr *userptr = NULL;
+ struct hl_va_range *va_range;
+ enum vm_type *vm_type;
+ bool is_userptr;
+ int rc = 0;
+
+ prop = &hdev->asic_prop;
+
+ /* protect from double entrance */
+ mutex_lock(&ctx->mem_hash_lock);
+ hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
+ if (vaddr == hnode->vaddr)
+ break;
+
+ if (!hnode) {
+ mutex_unlock(&ctx->mem_hash_lock);
+ dev_err(hdev->dev,
+ "unmap failed, no mem hnode for vaddr 0x%llx\n",
+ vaddr);
+ return -EINVAL;
+ }
+
+ if (hnode->export_cnt) {
+ mutex_unlock(&ctx->mem_hash_lock);
+ dev_err(hdev->dev, "failed to unmap %#llx, memory is exported\n", vaddr);
+ return -EINVAL;
+ }
+
+ hash_del(&hnode->node);
+ mutex_unlock(&ctx->mem_hash_lock);
+
+ vm_type = hnode->ptr;
+
+ if (*vm_type == VM_TYPE_USERPTR) {
+ is_userptr = true;
+ userptr = hnode->ptr;
+
+ rc = init_phys_pg_pack_from_userptr(ctx, userptr, &phys_pg_pack,
+ false);
+ if (rc) {
+ dev_err(hdev->dev,
+ "unable to init page pack for vaddr 0x%llx\n",
+ vaddr);
+ goto vm_type_err;
+ }
+
+ if (phys_pg_pack->page_size ==
+ hdev->asic_prop.pmmu.page_size)
+ va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
+ else
+ va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
+ } else if (*vm_type == VM_TYPE_PHYS_PACK) {
+ is_userptr = false;
+ va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
+ phys_pg_pack = hnode->ptr;
+ } else {
+ dev_warn(hdev->dev,
+ "unmap failed, unknown vm desc for vaddr 0x%llx\n",
+ vaddr);
+ rc = -EFAULT;
+ goto vm_type_err;
+ }
+
+ if (atomic_read(&phys_pg_pack->mapping_cnt) == 0) {
+ dev_err(hdev->dev, "vaddr 0x%llx is not mapped\n", vaddr);
+ rc = -EINVAL;
+ goto mapping_cnt_err;
+ }
+
+ if (!is_userptr && !is_power_of_2(phys_pg_pack->page_size))
+ vaddr = prop->dram_base_address +
+ DIV_ROUND_DOWN_ULL(vaddr - prop->dram_base_address,
+ phys_pg_pack->page_size) *
+ phys_pg_pack->page_size;
+ else
+ vaddr &= ~(((u64) phys_pg_pack->page_size) - 1);
+
+ mutex_lock(&hdev->mmu_lock);
+
+ unmap_phys_pg_pack(ctx, vaddr, phys_pg_pack);
+
+ /*
+ * During context free this function is called in a loop to clean all
+ * the context mappings. Hence the cache invalidation can be called once
+ * at the loop end rather than for each iteration
+ */
+ if (!ctx_free)
+ rc = hl_mmu_invalidate_cache_range(hdev, true, *vm_type, ctx->asid, vaddr,
+ phys_pg_pack->total_size);
+
+ mutex_unlock(&hdev->mmu_lock);
+
+ /*
+ * If the context is closing we don't need to check for the MMU cache
+ * invalidation return code and update the VA free list as in this flow
+ * we invalidate the MMU cache outside of this unmap function and the VA
+ * free list will be freed anyway.
+ */
+ if (!ctx_free) {
+ int tmp_rc;
+
+ tmp_rc = add_va_block(hdev, va_range, vaddr,
+ vaddr + phys_pg_pack->total_size - 1);
+ if (tmp_rc) {
+ dev_warn(hdev->dev,
+ "add va block failed for vaddr: 0x%llx\n",
+ vaddr);
+ if (!rc)
+ rc = tmp_rc;
+ }
+ }
+
+ atomic_dec(&phys_pg_pack->mapping_cnt);
+ kfree(hnode);
+
+ if (is_userptr) {
+ free_phys_pg_pack(hdev, phys_pg_pack);
+ dma_unmap_host_va(hdev, userptr);
+ }
+
+ return rc;
+
+mapping_cnt_err:
+ if (is_userptr)
+ free_phys_pg_pack(hdev, phys_pg_pack);
+vm_type_err:
+ mutex_lock(&ctx->mem_hash_lock);
+ hash_add(ctx->mem_hash, &hnode->node, vaddr);
+ mutex_unlock(&ctx->mem_hash_lock);
+
+ return rc;
+}
+
+static int map_block(struct hl_device *hdev, u64 address, u64 *handle, u32 *size)
+{
+ u32 block_id;
+ int rc;
+
+ *handle = 0;
+ if (size)
+ *size = 0;
+
+ rc = hdev->asic_funcs->get_hw_block_id(hdev, address, size, &block_id);
+ if (rc)
+ return rc;
+
+ *handle = block_id | HL_MMAP_TYPE_BLOCK;
+ *handle <<= PAGE_SHIFT;
+
+ return 0;
+}
+
+static void hw_block_vm_close(struct vm_area_struct *vma)
+{
+ struct hl_vm_hw_block_list_node *lnode =
+ (struct hl_vm_hw_block_list_node *) vma->vm_private_data;
+ struct hl_ctx *ctx = lnode->ctx;
+ long new_mmap_size;
+
+ new_mmap_size = lnode->mapped_size - (vma->vm_end - vma->vm_start);
+ if (new_mmap_size > 0) {
+ lnode->mapped_size = new_mmap_size;
+ return;
+ }
+
+ mutex_lock(&ctx->hw_block_list_lock);
+ list_del(&lnode->node);
+ mutex_unlock(&ctx->hw_block_list_lock);
+ hl_ctx_put(ctx);
+ kfree(lnode);
+ vma->vm_private_data = NULL;
+}
+
+static const struct vm_operations_struct hw_block_vm_ops = {
+ .close = hw_block_vm_close
+};
+
+/**
+ * hl_hw_block_mmap() - mmap a hw block to user.
+ * @hpriv: pointer to the private data of the fd
+ * @vma: pointer to vm_area_struct of the process
+ *
+ * Driver increments context reference for every HW block mapped in order
+ * to prevent user from closing FD without unmapping first
+ */
+int hl_hw_block_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)
+{
+ struct hl_vm_hw_block_list_node *lnode;
+ struct hl_device *hdev = hpriv->hdev;
+ struct hl_ctx *ctx = hpriv->ctx;
+ u32 block_id, block_size;
+ int rc;
+
+ /* We use the page offset to hold the block id and thus we need to clear
+ * it before doing the mmap itself
+ */
+ block_id = vma->vm_pgoff;
+ vma->vm_pgoff = 0;
+
+ /* Driver only allows mapping of a complete HW block */
+ block_size = vma->vm_end - vma->vm_start;
+
+ if (!access_ok((void __user *) (uintptr_t) vma->vm_start, block_size)) {
+ dev_err(hdev->dev,
+ "user pointer is invalid - 0x%lx\n",
+ vma->vm_start);
+
+ return -EINVAL;
+ }
+
+ lnode = kzalloc(sizeof(*lnode), GFP_KERNEL);
+ if (!lnode)
+ return -ENOMEM;
+
+ rc = hdev->asic_funcs->hw_block_mmap(hdev, vma, block_id, block_size);
+ if (rc) {
+ kfree(lnode);
+ return rc;
+ }
+
+ hl_ctx_get(ctx);
+
+ lnode->ctx = ctx;
+ lnode->vaddr = vma->vm_start;
+ lnode->block_size = block_size;
+ lnode->mapped_size = lnode->block_size;
+ lnode->id = block_id;
+
+ vma->vm_private_data = lnode;
+ vma->vm_ops = &hw_block_vm_ops;
+
+ mutex_lock(&ctx->hw_block_list_lock);
+ list_add_tail(&lnode->node, &ctx->hw_block_mem_list);
+ mutex_unlock(&ctx->hw_block_list_lock);
+
+ vma->vm_pgoff = block_id;
+
+ return 0;
+}
+
+static int set_dma_sg(struct scatterlist *sg, u64 bar_address, u64 chunk_size,
+ struct device *dev, enum dma_data_direction dir)
+{
+ dma_addr_t addr;
+ int rc;
+
+ addr = dma_map_resource(dev, bar_address, chunk_size, dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ rc = dma_mapping_error(dev, addr);
+ if (rc)
+ return rc;
+
+ sg_set_page(sg, NULL, chunk_size, 0);
+ sg_dma_address(sg) = addr;
+ sg_dma_len(sg) = chunk_size;
+
+ return 0;
+}
+
+static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 *pages, u64 npages,
+ u64 page_size, u64 exported_size,
+ struct device *dev, enum dma_data_direction dir)
+{
+ u64 chunk_size, bar_address, dma_max_seg_size, cur_size_to_export, cur_npages;
+ struct asic_fixed_properties *prop;
+ int rc, i, j, nents, cur_page;
+ struct scatterlist *sg;
+ struct sg_table *sgt;
+
+ prop = &hdev->asic_prop;
+
+ dma_max_seg_size = dma_get_max_seg_size(dev);
+
+ /* We would like to align the max segment size to PAGE_SIZE, so the
+ * SGL will contain aligned addresses that can be easily mapped to
+ * an MMU
+ */
+ dma_max_seg_size = ALIGN_DOWN(dma_max_seg_size, PAGE_SIZE);
+ if (dma_max_seg_size < PAGE_SIZE) {
+ dev_err_ratelimited(hdev->dev,
+ "dma_max_seg_size %llu can't be smaller than PAGE_SIZE\n",
+ dma_max_seg_size);
+ return ERR_PTR(-EINVAL);
+ }
+
+ sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+ if (!sgt)
+ return ERR_PTR(-ENOMEM);
+
+ /* remove export size restrictions in case not explicitly defined */
+ cur_size_to_export = exported_size ? exported_size : (npages * page_size);
+
+ /* If the size of each page is larger than the dma max segment size,
+ * then we can't combine pages and the number of entries in the SGL
+ * will just be the
+ * <number of pages> * <chunks of max segment size in each page>
+ */
+ if (page_size > dma_max_seg_size) {
+ /* we should limit number of pages according to the exported size */
+ cur_npages = DIV_ROUND_UP_SECTOR_T(cur_size_to_export, page_size);
+ nents = cur_npages * DIV_ROUND_UP_SECTOR_T(page_size, dma_max_seg_size);
+ } else {
+ cur_npages = npages;
+
+ /* Get number of non-contiguous chunks */
+ for (i = 1, nents = 1, chunk_size = page_size ; i < cur_npages ; i++) {
+ if (pages[i - 1] + page_size != pages[i] ||
+ chunk_size + page_size > dma_max_seg_size) {
+ nents++;
+ chunk_size = page_size;
+ continue;
+ }
+
+ chunk_size += page_size;
+ }
+ }
+
+ rc = sg_alloc_table(sgt, nents, GFP_KERNEL | __GFP_ZERO);
+ if (rc)
+ goto error_free;
+
+ cur_page = 0;
+
+ if (page_size > dma_max_seg_size) {
+ u64 size_left, cur_device_address = 0;
+
+ size_left = page_size;
+
+ /* Need to split each page into the number of chunks of
+ * dma_max_seg_size
+ */
+ for_each_sgtable_dma_sg(sgt, sg, i) {
+ if (size_left == page_size)
+ cur_device_address =
+ pages[cur_page] - prop->dram_base_address;
+ else
+ cur_device_address += dma_max_seg_size;
+
+ /* make sure not to export over exported size */
+ chunk_size = min3(size_left, dma_max_seg_size, cur_size_to_export);
+
+ bar_address = hdev->dram_pci_bar_start + cur_device_address;
+
+ rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
+ if (rc)
+ goto error_unmap;
+
+ cur_size_to_export -= chunk_size;
+
+ if (size_left > dma_max_seg_size) {
+ size_left -= dma_max_seg_size;
+ } else {
+ cur_page++;
+ size_left = page_size;
+ }
+ }
+ } else {
+ /* Merge pages and put them into the scatterlist */
+ for_each_sgtable_dma_sg(sgt, sg, i) {
+ chunk_size = page_size;
+ for (j = cur_page + 1 ; j < cur_npages ; j++) {
+ if (pages[j - 1] + page_size != pages[j] ||
+ chunk_size + page_size > dma_max_seg_size)
+ break;
+
+ chunk_size += page_size;
+ }
+
+ bar_address = hdev->dram_pci_bar_start +
+ (pages[cur_page] - prop->dram_base_address);
+
+ /* make sure not to export over exported size */
+ chunk_size = min(chunk_size, cur_size_to_export);
+ rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir);
+ if (rc)
+ goto error_unmap;
+
+ cur_size_to_export -= chunk_size;
+ cur_page = j;
+ }
+ }
+
+ /* Because we are not going to include a CPU list we want to have some
+ * chance that other users will detect this by setting the orig_nents
+ * to 0 and using only nents (length of DMA list) when going over the
+ * sgl
+ */
+ sgt->orig_nents = 0;
+
+ return sgt;
+
+error_unmap:
+ for_each_sgtable_dma_sg(sgt, sg, i) {
+ if (!sg_dma_len(sg))
+ continue;
+
+ dma_unmap_resource(dev, sg_dma_address(sg),
+ sg_dma_len(sg), dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ }
+
+ sg_free_table(sgt);
+
+error_free:
+ kfree(sgt);
+ return ERR_PTR(rc);
+}
+
+static int hl_dmabuf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attachment)
+{
+ struct hl_dmabuf_priv *hl_dmabuf;
+ struct hl_device *hdev;
+ int rc;
+
+ hl_dmabuf = dmabuf->priv;
+ hdev = hl_dmabuf->ctx->hdev;
+
+ rc = pci_p2pdma_distance(hdev->pdev, attachment->dev, true);
+
+ if (rc < 0)
+ attachment->peer2peer = false;
+ return 0;
+}
+
+static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
+ enum dma_data_direction dir)
+{
+ struct dma_buf *dma_buf = attachment->dmabuf;
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ struct hl_dmabuf_priv *hl_dmabuf;
+ struct hl_device *hdev;
+ struct sg_table *sgt;
+
+ hl_dmabuf = dma_buf->priv;
+ hdev = hl_dmabuf->ctx->hdev;
+ phys_pg_pack = hl_dmabuf->phys_pg_pack;
+
+ if (!attachment->peer2peer) {
+ dev_dbg(hdev->dev, "Failed to map dmabuf because p2p is disabled\n");
+ return ERR_PTR(-EPERM);
+ }
+
+ if (phys_pg_pack)
+ sgt = alloc_sgt_from_device_pages(hdev,
+ phys_pg_pack->pages,
+ phys_pg_pack->npages,
+ phys_pg_pack->page_size,
+ phys_pg_pack->exported_size,
+ attachment->dev,
+ dir);
+ else
+ sgt = alloc_sgt_from_device_pages(hdev,
+ &hl_dmabuf->device_address,
+ 1,
+ hl_dmabuf->dmabuf->size,
+ 0,
+ attachment->dev,
+ dir);
+
+ if (IS_ERR(sgt))
+ dev_err(hdev->dev, "failed (%ld) to initialize sgt for dmabuf\n", PTR_ERR(sgt));
+
+ return sgt;
+}
+
+static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment,
+ struct sg_table *sgt,
+ enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ /* The memory behind the dma-buf has *always* resided on the device itself, i.e. it lives
+ * only in the 'device' domain (after all, it maps a PCI bar address which points to the
+ * device memory).
+ *
+ * Therefore, it was never in the 'CPU' domain and hence, there is no need to perform
+ * a sync of the memory to the CPU's cache, as it never resided inside that cache.
+ */
+ for_each_sgtable_dma_sg(sgt, sg, i)
+ dma_unmap_resource(attachment->dev, sg_dma_address(sg),
+ sg_dma_len(sg), dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+
+ /* Need to restore orig_nents because sg_free_table use that field */
+ sgt->orig_nents = sgt->nents;
+ sg_free_table(sgt);
+ kfree(sgt);
+}
+
+static void hl_release_dmabuf(struct dma_buf *dmabuf)
+{
+ struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv;
+ struct hl_ctx *ctx;
+
+ if (!hl_dmabuf)
+ return;
+
+ ctx = hl_dmabuf->ctx;
+
+ if (hl_dmabuf->memhash_hnode) {
+ mutex_lock(&ctx->mem_hash_lock);
+ hl_dmabuf->memhash_hnode->export_cnt--;
+ mutex_unlock(&ctx->mem_hash_lock);
+ }
+
+ hl_ctx_put(ctx);
+ kfree(hl_dmabuf);
+}
+
+static const struct dma_buf_ops habanalabs_dmabuf_ops = {
+ .attach = hl_dmabuf_attach,
+ .map_dma_buf = hl_map_dmabuf,
+ .unmap_dma_buf = hl_unmap_dmabuf,
+ .release = hl_release_dmabuf,
+};
+
+static int export_dmabuf(struct hl_ctx *ctx,
+ struct hl_dmabuf_priv *hl_dmabuf,
+ u64 total_size, int flags, int *dmabuf_fd)
+{
+ DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+ struct hl_device *hdev = ctx->hdev;
+ int rc, fd;
+
+ exp_info.ops = &habanalabs_dmabuf_ops;
+ exp_info.size = total_size;
+ exp_info.flags = flags;
+ exp_info.priv = hl_dmabuf;
+
+ hl_dmabuf->dmabuf = dma_buf_export(&exp_info);
+ if (IS_ERR(hl_dmabuf->dmabuf)) {
+ dev_err(hdev->dev, "failed to export dma-buf\n");
+ return PTR_ERR(hl_dmabuf->dmabuf);
+ }
+
+ fd = dma_buf_fd(hl_dmabuf->dmabuf, flags);
+ if (fd < 0) {
+ dev_err(hdev->dev, "failed to get a file descriptor for a dma-buf, %d\n", fd);
+ rc = fd;
+ goto err_dma_buf_put;
+ }
+
+ hl_dmabuf->ctx = ctx;
+ hl_ctx_get(hl_dmabuf->ctx);
+
+ *dmabuf_fd = fd;
+
+ return 0;
+
+err_dma_buf_put:
+ hl_dmabuf->dmabuf->priv = NULL;
+ dma_buf_put(hl_dmabuf->dmabuf);
+ return rc;
+}
+
+static int validate_export_params_common(struct hl_device *hdev, u64 device_addr, u64 size)
+{
+ if (!IS_ALIGNED(device_addr, PAGE_SIZE)) {
+ dev_dbg(hdev->dev,
+ "exported device memory address 0x%llx should be aligned to 0x%lx\n",
+ device_addr, PAGE_SIZE);
+ return -EINVAL;
+ }
+
+ if (size < PAGE_SIZE) {
+ dev_dbg(hdev->dev,
+ "exported device memory size %llu should be equal to or greater than %lu\n",
+ size, PAGE_SIZE);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int validate_export_params_no_mmu(struct hl_device *hdev, u64 device_addr, u64 size)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 bar_address;
+ int rc;
+
+ rc = validate_export_params_common(hdev, device_addr, size);
+ if (rc)
+ return rc;
+
+ if (device_addr < prop->dram_user_base_address ||
+ (device_addr + size) > prop->dram_end_address ||
+ (device_addr + size) < device_addr) {
+ dev_dbg(hdev->dev,
+ "DRAM memory range 0x%llx (+0x%llx) is outside of DRAM boundaries\n",
+ device_addr, size);
+ return -EINVAL;
+ }
+
+ bar_address = hdev->dram_pci_bar_start + (device_addr - prop->dram_base_address);
+
+ if ((bar_address + size) > (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
+ (bar_address + size) < bar_address) {
+ dev_dbg(hdev->dev,
+ "DRAM memory range 0x%llx (+0x%llx) is outside of PCI BAR boundaries\n",
+ device_addr, size);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size, u64 offset,
+ struct hl_vm_phys_pg_pack *phys_pg_pack)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 bar_address;
+ int i, rc;
+
+ rc = validate_export_params_common(hdev, device_addr, size);
+ if (rc)
+ return rc;
+
+ if ((offset + size) > phys_pg_pack->total_size) {
+ dev_dbg(hdev->dev, "offset %#llx and size %#llx exceed total map size %#llx\n",
+ offset, size, phys_pg_pack->total_size);
+ return -EINVAL;
+ }
+
+ for (i = 0 ; i < phys_pg_pack->npages ; i++) {
+
+ bar_address = hdev->dram_pci_bar_start +
+ (phys_pg_pack->pages[i] - prop->dram_base_address);
+
+ if ((bar_address + phys_pg_pack->page_size) >
+ (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) ||
+ (bar_address + phys_pg_pack->page_size) < bar_address) {
+ dev_dbg(hdev->dev,
+ "DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n",
+ phys_pg_pack->pages[i],
+ phys_pg_pack->page_size);
+
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_vm_hash_node *hnode;
+
+ /* get the memory handle */
+ mutex_lock(&ctx->mem_hash_lock);
+ hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr)
+ if (addr == hnode->vaddr)
+ break;
+
+ if (!hnode) {
+ mutex_unlock(&ctx->mem_hash_lock);
+ dev_dbg(hdev->dev, "map address %#llx not found\n", addr);
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (upper_32_bits(hnode->handle)) {
+ mutex_unlock(&ctx->mem_hash_lock);
+ dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n",
+ hnode->handle, addr);
+ return ERR_PTR(-EINVAL);
+ }
+
+ /*
+ * node found, increase export count so this memory cannot be unmapped
+ * and the hash node cannot be deleted.
+ */
+ hnode->export_cnt++;
+ mutex_unlock(&ctx->mem_hash_lock);
+
+ return hnode;
+}
+
+static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode)
+{
+ mutex_lock(&ctx->mem_hash_lock);
+ hnode->export_cnt--;
+ mutex_unlock(&ctx->mem_hash_lock);
+}
+
+static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev,
+ struct hl_vm_hash_node *hnode)
+{
+ struct hl_vm_phys_pg_pack *phys_pg_pack;
+ struct hl_vm *vm = &hdev->vm;
+
+ spin_lock(&vm->idr_lock);
+ phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) hnode->handle);
+ if (!phys_pg_pack) {
+ spin_unlock(&vm->idr_lock);
+ dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) hnode->handle);
+ return ERR_PTR(-EINVAL);
+ }
+
+ spin_unlock(&vm->idr_lock);
+
+ if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) {
+ dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", hnode->handle);
+ return ERR_PTR(-EINVAL);
+ }
+
+ return phys_pg_pack;
+}
+
+/**
+ * export_dmabuf_from_addr() - export a dma-buf object for the given memory
+ * address and size.
+ * @ctx: pointer to the context structure.
+ * @addr: device address.
+ * @size: size of device memory to export.
+ * @offset: the offset into the buffer from which to start exporting
+ * @flags: DMA-BUF file/FD flags.
+ * @dmabuf_fd: pointer to result FD that represents the dma-buf object.
+ *
+ * Create and export a dma-buf object for an existing memory allocation inside
+ * the device memory, and return a FD which is associated with the dma-buf
+ * object.
+ *
+ * Return: 0 on success, non-zero for failure.
+ */
+static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 addr, u64 size, u64 offset,
+ int flags, int *dmabuf_fd)
+{
+ struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
+ struct hl_vm_hash_node *hnode = NULL;
+ struct asic_fixed_properties *prop;
+ struct hl_dmabuf_priv *hl_dmabuf;
+ struct hl_device *hdev;
+ u64 export_addr;
+ int rc;
+
+ hdev = ctx->hdev;
+ prop = &hdev->asic_prop;
+
+ /* offset must be 0 in devices without virtual memory support */
+ if (!prop->dram_supports_virtual_memory && offset) {
+ dev_dbg(hdev->dev, "offset is not allowed in device without virtual memory\n");
+ return -EINVAL;
+ }
+
+ export_addr = addr + offset;
+
+ hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL);
+ if (!hl_dmabuf)
+ return -ENOMEM;
+
+ if (prop->dram_supports_virtual_memory) {
+ hnode = memhash_node_export_get(ctx, addr);
+ if (IS_ERR(hnode)) {
+ rc = PTR_ERR(hnode);
+ goto err_free_dmabuf_wrapper;
+ }
+ phys_pg_pack = get_phys_pg_pack_from_hash_node(hdev, hnode);
+ if (IS_ERR(phys_pg_pack)) {
+ rc = PTR_ERR(phys_pg_pack);
+ goto dec_memhash_export_cnt;
+ }
+ rc = validate_export_params(hdev, export_addr, size, offset, phys_pg_pack);
+ if (rc)
+ goto dec_memhash_export_cnt;
+
+ phys_pg_pack->exported_size = size;
+ hl_dmabuf->phys_pg_pack = phys_pg_pack;
+ hl_dmabuf->memhash_hnode = hnode;
+ } else {
+ rc = validate_export_params_no_mmu(hdev, export_addr, size);
+ if (rc)
+ goto err_free_dmabuf_wrapper;
+ }
+
+ hl_dmabuf->device_address = export_addr;
+
+ rc = export_dmabuf(ctx, hl_dmabuf, size, flags, dmabuf_fd);
+ if (rc)
+ goto dec_memhash_export_cnt;
+
+ return 0;
+
+dec_memhash_export_cnt:
+ if (prop->dram_supports_virtual_memory)
+ memhash_node_export_put(ctx, hnode);
+err_free_dmabuf_wrapper:
+ kfree(hl_dmabuf);
+ return rc;
+}
+
+static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args)
+{
+ struct hl_device *hdev = hpriv->hdev;
+ u64 block_handle, device_addr = 0;
+ struct hl_ctx *ctx = hpriv->ctx;
+ u32 handle = 0, block_size;
+ int rc;
+
+ switch (args->in.op) {
+ case HL_MEM_OP_ALLOC:
+ if (args->in.alloc.mem_size == 0) {
+ dev_err(hdev->dev, "alloc size must be larger than 0\n");
+ rc = -EINVAL;
+ goto out;
+ }
+
+ /* Force contiguous as there are no real MMU
+ * translations to overcome physical memory gaps
+ */
+ args->in.flags |= HL_MEM_CONTIGUOUS;
+ rc = alloc_device_memory(ctx, &args->in, &handle);
+
+ memset(args, 0, sizeof(*args));
+ args->out.handle = (__u64) handle;
+ break;
+
+ case HL_MEM_OP_FREE:
+ rc = free_device_memory(ctx, &args->in);
+ break;
+
+ case HL_MEM_OP_MAP:
+ if (args->in.flags & HL_MEM_USERPTR) {
+ dev_err(hdev->dev, "Failed to map host memory when MMU is disabled\n");
+ rc = -EPERM;
+ } else {
+ rc = get_paddr_from_handle(ctx, &args->in, &device_addr);
+ memset(args, 0, sizeof(*args));
+ args->out.device_virt_addr = device_addr;
+ }
+
+ break;
+
+ case HL_MEM_OP_UNMAP:
+ rc = 0;
+ break;
+
+ case HL_MEM_OP_MAP_BLOCK:
+ rc = map_block(hdev, args->in.map_block.block_addr, &block_handle, &block_size);
+ args->out.block_handle = block_handle;
+ args->out.block_size = block_size;
+ break;
+
+ case HL_MEM_OP_EXPORT_DMABUF_FD:
+ dev_err(hdev->dev, "Failed to export dma-buf object when MMU is disabled\n");
+ rc = -EPERM;
+ break;
+
+ case HL_MEM_OP_TS_ALLOC:
+ rc = allocate_timestamps_buffers(hpriv, &args->in, &args->out.handle);
+ break;
+ default:
+ dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
+ rc = -EINVAL;
+ break;
+ }
+
+out:
+ return rc;
+}
+
+static void ts_buff_release(struct hl_mmap_mem_buf *buf)
+{
+ struct hl_ts_buff *ts_buff = buf->private;
+
+ vfree(ts_buff->kernel_buff_address);
+ vfree(ts_buff->user_buff_address);
+ kfree(ts_buff);
+}
+
+static int hl_ts_mmap(struct hl_mmap_mem_buf *buf, struct vm_area_struct *vma, void *args)
+{
+ struct hl_ts_buff *ts_buff = buf->private;
+
++ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE);
+ return remap_vmalloc_range(vma, ts_buff->user_buff_address, 0);
+}
+
+static int hl_ts_alloc_buf(struct hl_mmap_mem_buf *buf, gfp_t gfp, void *args)
+{
+ struct hl_ts_buff *ts_buff = NULL;
+ u32 num_elements;
+ size_t size;
+ void *p;
+
+ num_elements = *(u32 *)args;
+
+ ts_buff = kzalloc(sizeof(*ts_buff), gfp);
+ if (!ts_buff)
+ return -ENOMEM;
+
+ /* Allocate the user buffer */
+ size = num_elements * sizeof(u64);
+ p = vmalloc_user(size);
+ if (!p)
+ goto free_mem;
+
+ ts_buff->user_buff_address = p;
+ buf->mappable_size = size;
+
+ /* Allocate the internal kernel buffer */
+ size = num_elements * sizeof(struct hl_user_pending_interrupt);
+ p = vzalloc(size);
+ if (!p)
+ goto free_user_buff;
+
+ ts_buff->kernel_buff_address = p;
+ ts_buff->kernel_buff_size = size;
+
+ buf->private = ts_buff;
+
+ return 0;
+
+free_user_buff:
+ vfree(ts_buff->user_buff_address);
+free_mem:
+ kfree(ts_buff);
+ return -ENOMEM;
+}
+
+static struct hl_mmap_mem_buf_behavior hl_ts_behavior = {
+ .topic = "TS",
+ .mem_id = HL_MMAP_TYPE_TS_BUFF,
+ .mmap = hl_ts_mmap,
+ .alloc = hl_ts_alloc_buf,
+ .release = ts_buff_release,
+};
+
+/**
+ * allocate_timestamps_buffers() - allocate timestamps buffers
+ * This function will allocate ts buffer that will later on be mapped to the user
+ * in order to be able to read the timestamp.
+ * in additon it'll allocate an extra buffer for registration management.
+ * since we cannot fail during registration for out-of-memory situation, so
+ * we'll prepare a pool which will be used as user interrupt nodes and instead
+ * of dynamically allocating nodes while registration we'll pick the node from
+ * this pool. in addtion it'll add node to the mapping hash which will be used
+ * to map user ts buffer to the internal kernel ts buffer.
+ * @hpriv: pointer to the private data of the fd
+ * @args: ioctl input
+ * @handle: user timestamp buffer handle as an output
+ */
+static int allocate_timestamps_buffers(struct hl_fpriv *hpriv, struct hl_mem_in *args, u64 *handle)
+{
+ struct hl_mem_mgr *mmg = &hpriv->mem_mgr;
+ struct hl_mmap_mem_buf *buf;
+
+ if (args->num_of_elements > TS_MAX_ELEMENTS_NUM) {
+ dev_err(mmg->dev, "Num of elements exceeds Max allowed number (0x%x > 0x%x)\n",
+ args->num_of_elements, TS_MAX_ELEMENTS_NUM);
+ return -EINVAL;
+ }
+
+ buf = hl_mmap_mem_buf_alloc(mmg, &hl_ts_behavior, GFP_KERNEL, &args->num_of_elements);
+ if (!buf)
+ return -ENOMEM;
+
+ *handle = buf->handle;
+
+ return 0;
+}
+
+int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data)
+{
+ enum hl_device_status status;
+ union hl_mem_args *args = data;
+ struct hl_device *hdev = hpriv->hdev;
+ struct hl_ctx *ctx = hpriv->ctx;
+ u64 block_handle, device_addr = 0;
+ u32 handle = 0, block_size;
+ int rc, dmabuf_fd = -EBADF;
+
+ if (!hl_device_operational(hdev, &status)) {
+ dev_dbg_ratelimited(hdev->dev,
+ "Device is %s. Can't execute MEMORY IOCTL\n",
+ hdev->status[status]);
+ return -EBUSY;
+ }
+
+ if (!hdev->mmu_enable)
+ return mem_ioctl_no_mmu(hpriv, args);
+
+ switch (args->in.op) {
+ case HL_MEM_OP_ALLOC:
+ if (args->in.alloc.mem_size == 0) {
+ dev_err(hdev->dev,
+ "alloc size must be larger than 0\n");
+ rc = -EINVAL;
+ goto out;
+ }
+
+ /* If DRAM does not support virtual memory the driver won't
+ * handle the allocation/freeing of that memory. However, for
+ * system administration/monitoring purposes, the driver will
+ * keep track of the amount of DRAM memory that is allocated
+ * and freed by the user. Because this code totally relies on
+ * the user's input, the driver can't ensure the validity
+ * of this accounting.
+ */
+ if (!hdev->asic_prop.dram_supports_virtual_memory) {
+ atomic64_add(args->in.alloc.mem_size,
+ &ctx->dram_phys_mem);
+ atomic64_add(args->in.alloc.mem_size,
+ &hdev->dram_used_mem);
+
+ dev_dbg(hdev->dev, "DRAM alloc is not supported\n");
+ rc = 0;
+
+ memset(args, 0, sizeof(*args));
+ args->out.handle = 0;
+ goto out;
+ }
+
+ rc = alloc_device_memory(ctx, &args->in, &handle);
+
+ memset(args, 0, sizeof(*args));
+ args->out.handle = (__u64) handle;
+ break;
+
+ case HL_MEM_OP_FREE:
+ /* If DRAM does not support virtual memory the driver won't
+ * handle the allocation/freeing of that memory. However, for
+ * system administration/monitoring purposes, the driver will
+ * keep track of the amount of DRAM memory that is allocated
+ * and freed by the user. Because this code totally relies on
+ * the user's input, the driver can't ensure the validity
+ * of this accounting.
+ */
+ if (!hdev->asic_prop.dram_supports_virtual_memory) {
+ atomic64_sub(args->in.alloc.mem_size,
+ &ctx->dram_phys_mem);
+ atomic64_sub(args->in.alloc.mem_size,
+ &hdev->dram_used_mem);
+
+ dev_dbg(hdev->dev, "DRAM alloc is not supported\n");
+ rc = 0;
+
+ goto out;
+ }
+
+ rc = free_device_memory(ctx, &args->in);
+ break;
+
+ case HL_MEM_OP_MAP:
+ rc = map_device_va(ctx, &args->in, &device_addr);
+
+ memset(args, 0, sizeof(*args));
+ args->out.device_virt_addr = device_addr;
+ break;
+
+ case HL_MEM_OP_UNMAP:
+ rc = unmap_device_va(ctx, &args->in, false);
+ break;
+
+ case HL_MEM_OP_MAP_BLOCK:
+ rc = map_block(hdev, args->in.map_block.block_addr,
+ &block_handle, &block_size);
+ args->out.block_handle = block_handle;
+ args->out.block_size = block_size;
+ break;
+
+ case HL_MEM_OP_EXPORT_DMABUF_FD:
+ rc = export_dmabuf_from_addr(ctx,
+ args->in.export_dmabuf_fd.addr,
+ args->in.export_dmabuf_fd.mem_size,
+ args->in.export_dmabuf_fd.offset,
+ args->in.flags,
+ &dmabuf_fd);
+ memset(args, 0, sizeof(*args));
+ args->out.fd = dmabuf_fd;
+ break;
+
+ case HL_MEM_OP_TS_ALLOC:
+ rc = allocate_timestamps_buffers(hpriv, &args->in, &args->out.handle);
+ break;
+ default:
+ dev_err(hdev->dev, "Unknown opcode for memory IOCTL\n");
+ rc = -EINVAL;
+ break;
+ }
+
+out:
+ return rc;
+}
+
+static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
+ u32 npages, u64 start, u32 offset,
+ struct hl_userptr *userptr)
+{
+ int rc;
+
+ if (!access_ok((void __user *) (uintptr_t) addr, size)) {
+ dev_err(hdev->dev, "user pointer is invalid - 0x%llx\n", addr);
+ return -EFAULT;
+ }
+
+ userptr->pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
+ if (!userptr->pages)
+ return -ENOMEM;
+
+ rc = pin_user_pages_fast(start, npages, FOLL_WRITE | FOLL_LONGTERM,
+ userptr->pages);
+
+ if (rc != npages) {
+ dev_err(hdev->dev,
+ "Failed (%d) to pin host memory with user ptr 0x%llx, size 0x%llx, npages %d\n",
+ rc, addr, size, npages);
+ if (rc < 0)
+ goto destroy_pages;
+ npages = rc;
+ rc = -EFAULT;
+ goto put_pages;
+ }
+ userptr->npages = npages;
+
+ rc = sg_alloc_table_from_pages(userptr->sgt,
+ userptr->pages,
+ npages, offset, size, GFP_KERNEL);
+ if (rc < 0) {
+ dev_err(hdev->dev, "failed to create SG table from pages\n");
+ goto put_pages;
+ }
+
+ return 0;
+
+put_pages:
+ unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+ kvfree(userptr->pages);
+ return rc;
+}
+
+/**
+ * hl_pin_host_memory() - pins a chunk of host memory.
+ * @hdev: pointer to the habanalabs device structure.
+ * @addr: the host virtual address of the memory area.
+ * @size: the size of the memory area.
+ * @userptr: pointer to hl_userptr structure.
+ *
+ * This function does the following:
+ * - Pins the physical pages.
+ * - Create an SG list from those pages.
+ */
+int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
+ struct hl_userptr *userptr)
+{
+ u64 start, end;
+ u32 npages, offset;
+ int rc;
+
+ if (!size) {
+ dev_err(hdev->dev, "size to pin is invalid - %llu\n", size);
+ return -EINVAL;
+ }
+
+ /*
+ * If the combination of the address and size requested for this memory
+ * region causes an integer overflow, return error.
+ */
+ if (((addr + size) < addr) ||
+ PAGE_ALIGN(addr + size) < (addr + size)) {
+ dev_err(hdev->dev,
+ "user pointer 0x%llx + %llu causes integer overflow\n",
+ addr, size);
+ return -EINVAL;
+ }
+
+ userptr->pid = current->pid;
+ userptr->sgt = kzalloc(sizeof(*userptr->sgt), GFP_KERNEL);
+ if (!userptr->sgt)
+ return -ENOMEM;
+
+ start = addr & PAGE_MASK;
+ offset = addr & ~PAGE_MASK;
+ end = PAGE_ALIGN(addr + size);
+ npages = (end - start) >> PAGE_SHIFT;
+
+ userptr->size = size;
+ userptr->addr = addr;
+ userptr->dma_mapped = false;
+ INIT_LIST_HEAD(&userptr->job_node);
+
+ rc = get_user_memory(hdev, addr, size, npages, start, offset,
+ userptr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to get user memory for address 0x%llx\n",
+ addr);
+ goto free_sgt;
+ }
+
+ hl_debugfs_add_userptr(hdev, userptr);
+
+ return 0;
+
+free_sgt:
+ kfree(userptr->sgt);
+ return rc;
+}
+
+/*
+ * hl_unpin_host_memory - unpins a chunk of host memory.
+ * @hdev: pointer to the habanalabs device structure
+ * @userptr: pointer to hl_userptr structure
+ *
+ * This function does the following:
+ * - Unpins the physical pages related to the host memory
+ * - Free the SG list
+ */
+void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
+{
+ hl_debugfs_remove_userptr(hdev, userptr);
+
+ if (userptr->dma_mapped)
+ hdev->asic_funcs->hl_dma_unmap_sgtable(hdev, userptr->sgt, userptr->dir);
+
+ unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
+ kvfree(userptr->pages);
+
+ list_del(&userptr->job_node);
+
+ sg_free_table(userptr->sgt);
+ kfree(userptr->sgt);
+}
+
+/**
+ * hl_userptr_delete_list() - clear userptr list.
+ * @hdev: pointer to the habanalabs device structure.
+ * @userptr_list: pointer to the list to clear.
+ *
+ * This function does the following:
+ * - Iterates over the list and unpins the host memory and frees the userptr
+ * structure.
+ */
+void hl_userptr_delete_list(struct hl_device *hdev,
+ struct list_head *userptr_list)
+{
+ struct hl_userptr *userptr, *tmp;
+
+ list_for_each_entry_safe(userptr, tmp, userptr_list, job_node) {
+ hl_unpin_host_memory(hdev, userptr);
+ kfree(userptr);
+ }
+
+ INIT_LIST_HEAD(userptr_list);
+}
+
+/**
+ * hl_userptr_is_pinned() - returns whether the given userptr is pinned.
+ * @hdev: pointer to the habanalabs device structure.
+ * @addr: user address to check.
+ * @size: user block size to check.
+ * @userptr_list: pointer to the list to clear.
+ * @userptr: pointer to userptr to check.
+ *
+ * This function does the following:
+ * - Iterates over the list and checks if the given userptr is in it, means is
+ * pinned. If so, returns true, otherwise returns false.
+ */
+bool hl_userptr_is_pinned(struct hl_device *hdev, u64 addr,
+ u32 size, struct list_head *userptr_list,
+ struct hl_userptr **userptr)
+{
+ list_for_each_entry((*userptr), userptr_list, job_node) {
+ if ((addr == (*userptr)->addr) && (size == (*userptr)->size))
+ return true;
+ }
+
+ return false;
+}
+
+/**
+ * va_range_init() - initialize virtual addresses range.
+ * @hdev: pointer to the habanalabs device structure.
+ * @va_ranges: pointer to va_ranges array.
+ * @range_type: virtual address range type.
+ * @start: range start address, inclusive.
+ * @end: range end address, inclusive.
+ * @page_size: page size for this va_range.
+ *
+ * This function does the following:
+ * - Initializes the virtual addresses list of the given range with the given
+ * addresses.
+ */
+static int va_range_init(struct hl_device *hdev, struct hl_va_range **va_ranges,
+ enum hl_va_range_type range_type, u64 start,
+ u64 end, u32 page_size)
+{
+ struct hl_va_range *va_range = va_ranges[range_type];
+ int rc;
+
+ INIT_LIST_HEAD(&va_range->list);
+
+ /*
+ * PAGE_SIZE alignment
+ * it is the caller's responsibility to align the addresses if the
+ * page size is not a power of 2
+ */
+
+ if (is_power_of_2(page_size)) {
+ start = round_up(start, page_size);
+
+ /*
+ * The end of the range is inclusive, hence we need to align it
+ * to the end of the last full page in the range. For example if
+ * end = 0x3ff5 with page size 0x1000, we need to align it to
+ * 0x2fff. The remaining 0xff5 bytes do not form a full page.
+ */
+ end = round_down(end + 1, page_size) - 1;
+ }
+
+ if (start >= end) {
+ dev_err(hdev->dev, "too small vm range for va list\n");
+ return -EFAULT;
+ }
+
+ rc = add_va_block(hdev, va_range, start, end);
+
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init host va list\n");
+ return rc;
+ }
+
+ va_range->start_addr = start;
+ va_range->end_addr = end;
+ va_range->page_size = page_size;
+
+ return 0;
+}
+
+/**
+ * va_range_fini() - clear a virtual addresses range.
+ * @hdev: pointer to the habanalabs structure.
+ * @va_range: pointer to virtual addresses range.
+ *
+ * This function does the following:
+ * - Frees the virtual addresses block list and its lock.
+ */
+static void va_range_fini(struct hl_device *hdev, struct hl_va_range *va_range)
+{
+ mutex_lock(&va_range->lock);
+ clear_va_list_locked(hdev, &va_range->list);
+ mutex_unlock(&va_range->lock);
+
+ mutex_destroy(&va_range->lock);
+ kfree(va_range);
+}
+
+/**
+ * vm_ctx_init_with_ranges() - initialize virtual memory for context.
+ * @ctx: pointer to the habanalabs context structure.
+ * @host_range_start: host virtual addresses range start.
+ * @host_range_end: host virtual addresses range end.
+ * @host_page_size: host page size.
+ * @host_huge_range_start: host virtual addresses range start for memory
+ * allocated with huge pages.
+ * @host_huge_range_end: host virtual addresses range end for memory allocated
+ * with huge pages.
+ * @host_huge_page_size: host huge page size.
+ * @dram_range_start: dram virtual addresses range start.
+ * @dram_range_end: dram virtual addresses range end.
+ * @dram_page_size: dram page size.
+ *
+ * This function initializes the following:
+ * - MMU for context.
+ * - Virtual address to area descriptor hashtable.
+ * - Virtual block list of available virtual memory.
+ */
+static int vm_ctx_init_with_ranges(struct hl_ctx *ctx,
+ u64 host_range_start,
+ u64 host_range_end,
+ u32 host_page_size,
+ u64 host_huge_range_start,
+ u64 host_huge_range_end,
+ u32 host_huge_page_size,
+ u64 dram_range_start,
+ u64 dram_range_end,
+ u32 dram_page_size)
+{
+ struct hl_device *hdev = ctx->hdev;
+ int i, rc;
+
+ for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX ; i++) {
+ ctx->va_range[i] =
+ kzalloc(sizeof(struct hl_va_range), GFP_KERNEL);
+ if (!ctx->va_range[i]) {
+ rc = -ENOMEM;
+ goto free_va_range;
+ }
+ }
+
+ rc = hl_mmu_ctx_init(ctx);
+ if (rc) {
+ dev_err(hdev->dev, "failed to init context %d\n", ctx->asid);
+ goto free_va_range;
+ }
+
+ mutex_init(&ctx->mem_hash_lock);
+ hash_init(ctx->mem_hash);
+
+ mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
+
+ rc = va_range_init(hdev, ctx->va_range, HL_VA_RANGE_TYPE_HOST,
+ host_range_start, host_range_end, host_page_size);
+ if (rc) {
+ dev_err(hdev->dev, "failed to init host vm range\n");
+ goto mmu_ctx_fini;
+ }
+
+ if (hdev->pmmu_huge_range) {
+ mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
+
+ rc = va_range_init(hdev,
+ ctx->va_range, HL_VA_RANGE_TYPE_HOST_HUGE,
+ host_huge_range_start, host_huge_range_end,
+ host_huge_page_size);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to init host huge vm range\n");
+ goto clear_host_va_range;
+ }
+ } else {
+ kfree(ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]);
+ ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE] =
+ ctx->va_range[HL_VA_RANGE_TYPE_HOST];
+ }
+
+ mutex_init(&ctx->va_range[HL_VA_RANGE_TYPE_DRAM]->lock);
+
+ rc = va_range_init(hdev, ctx->va_range, HL_VA_RANGE_TYPE_DRAM,
+ dram_range_start, dram_range_end, dram_page_size);
+ if (rc) {
+ dev_err(hdev->dev, "failed to init dram vm range\n");
+ goto clear_host_huge_va_range;
+ }
+
+ hl_debugfs_add_ctx_mem_hash(hdev, ctx);
+
+ return 0;
+
+clear_host_huge_va_range:
+ mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_DRAM]->lock);
+
+ if (hdev->pmmu_huge_range) {
+ mutex_lock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
+ clear_va_list_locked(hdev,
+ &ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->list);
+ mutex_unlock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
+ }
+clear_host_va_range:
+ if (hdev->pmmu_huge_range)
+ mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]->lock);
+ mutex_lock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
+ clear_va_list_locked(hdev, &ctx->va_range[HL_VA_RANGE_TYPE_HOST]->list);
+ mutex_unlock(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
+mmu_ctx_fini:
+ mutex_destroy(&ctx->va_range[HL_VA_RANGE_TYPE_HOST]->lock);
+ mutex_destroy(&ctx->mem_hash_lock);
+ hl_mmu_ctx_fini(ctx);
+free_va_range:
+ for (i = 0 ; i < HL_VA_RANGE_TYPE_MAX ; i++)
+ kfree(ctx->va_range[i]);
+
+ return rc;
+}
+
+int hl_vm_ctx_init(struct hl_ctx *ctx)
+{
+ struct asic_fixed_properties *prop = &ctx->hdev->asic_prop;
+ u64 host_range_start, host_range_end, host_huge_range_start,
+ host_huge_range_end, dram_range_start, dram_range_end;
+ u32 host_page_size, host_huge_page_size, dram_page_size;
+
+ atomic64_set(&ctx->dram_phys_mem, 0);
+
+ /*
+ * - If MMU is enabled, init the ranges as usual.
+ * - If MMU is disabled, in case of host mapping, the returned address
+ * is the given one.
+ * In case of DRAM mapping, the returned address is the physical
+ * address of the memory related to the given handle.
+ */
+ if (!ctx->hdev->mmu_enable)
+ return 0;
+
+ dram_range_start = prop->dmmu.start_addr;
+ dram_range_end = prop->dmmu.end_addr - 1;
+ dram_page_size = prop->dram_page_size ?
+ prop->dram_page_size : prop->dmmu.page_size;
+ host_range_start = prop->pmmu.start_addr;
+ host_range_end = prop->pmmu.end_addr - 1;
+ host_page_size = prop->pmmu.page_size;
+ host_huge_range_start = prop->pmmu_huge.start_addr;
+ host_huge_range_end = prop->pmmu_huge.end_addr - 1;
+ host_huge_page_size = prop->pmmu_huge.page_size;
+
+ return vm_ctx_init_with_ranges(ctx, host_range_start, host_range_end,
+ host_page_size, host_huge_range_start,
+ host_huge_range_end, host_huge_page_size,
+ dram_range_start, dram_range_end, dram_page_size);
+}
+
+/**
+ * hl_vm_ctx_fini() - virtual memory teardown of context.
+ * @ctx: pointer to the habanalabs context structure.
+ *
+ * This function perform teardown the following:
+ * - Virtual block list of available virtual memory.
+ * - Virtual address to area descriptor hashtable.
+ * - MMU for context.
+ *
+ * In addition this function does the following:
+ * - Unmaps the existing hashtable nodes if the hashtable is not empty. The
+ * hashtable should be empty as no valid mappings should exist at this
+ * point.
+ * - Frees any existing physical page list from the idr which relates to the
+ * current context asid.
+ * - This function checks the virtual block list for correctness. At this point
+ * the list should contain one element which describes the whole virtual
+ * memory range of the context. Otherwise, a warning is printed.
+ */
+void hl_vm_ctx_fini(struct hl_ctx *ctx)
+{
+ struct hl_vm_phys_pg_pack *phys_pg_list, *tmp_phys_node;
+ struct hl_device *hdev = ctx->hdev;
+ struct hl_vm_hash_node *hnode;
+ struct hl_vm *vm = &hdev->vm;
+ struct hlist_node *tmp_node;
+ struct list_head free_list;
+ struct hl_mem_in args;
+ int i;
+
+ if (!hdev->mmu_enable)
+ return;
+
+ hl_debugfs_remove_ctx_mem_hash(hdev, ctx);
+
+ /*
+ * Clearly something went wrong on hard reset so no point in printing
+ * another side effect error
+ */
+ if (!hdev->reset_info.hard_reset_pending && !hash_empty(ctx->mem_hash))
+ dev_dbg(hdev->dev,
+ "user released device without removing its memory mappings\n");
+
+ hash_for_each_safe(ctx->mem_hash, i, tmp_node, hnode, node) {
+ dev_dbg(hdev->dev,
+ "hl_mem_hash_node of vaddr 0x%llx of asid %d is still alive\n",
+ hnode->vaddr, ctx->asid);
+ args.unmap.device_virt_addr = hnode->vaddr;
+ unmap_device_va(ctx, &args, true);
+ }
+
+ mutex_lock(&hdev->mmu_lock);
+
+ /* invalidate the cache once after the unmapping loop */
+ hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
+ hl_mmu_invalidate_cache(hdev, true, MMU_OP_PHYS_PACK);
+
+ mutex_unlock(&hdev->mmu_lock);
+
+ INIT_LIST_HEAD(&free_list);
+
+ spin_lock(&vm->idr_lock);
+ idr_for_each_entry(&vm->phys_pg_pack_handles, phys_pg_list, i)
+ if (phys_pg_list->asid == ctx->asid) {
+ dev_dbg(hdev->dev,
+ "page list 0x%px of asid %d is still alive\n",
+ phys_pg_list, ctx->asid);
+
+ atomic64_sub(phys_pg_list->total_size, &hdev->dram_used_mem);
+ idr_remove(&vm->phys_pg_pack_handles, i);
+ list_add(&phys_pg_list->node, &free_list);
+ }
+ spin_unlock(&vm->idr_lock);
+
+ list_for_each_entry_safe(phys_pg_list, tmp_phys_node, &free_list, node)
+ free_phys_pg_pack(hdev, phys_pg_list);
+
+ va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_DRAM]);
+ va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_HOST]);
+
+ if (hdev->pmmu_huge_range)
+ va_range_fini(hdev, ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE]);
+
+ mutex_destroy(&ctx->mem_hash_lock);
+ hl_mmu_ctx_fini(ctx);
+
+ /* In this case we need to clear the global accounting of DRAM usage
+ * because the user notifies us on allocations. If the user is no more,
+ * all DRAM is available
+ */
+ if (ctx->asid != HL_KERNEL_ASID_ID &&
+ !hdev->asic_prop.dram_supports_virtual_memory)
+ atomic64_set(&hdev->dram_used_mem, 0);
+}
+
+/**
+ * hl_vm_init() - initialize virtual memory module.
+ * @hdev: pointer to the habanalabs device structure.
+ *
+ * This function initializes the following:
+ * - MMU module.
+ * - DRAM physical pages pool of 2MB.
+ * - Idr for device memory allocation handles.
+ */
+int hl_vm_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hl_vm *vm = &hdev->vm;
+ int rc;
+
+ if (is_power_of_2(prop->dram_page_size))
+ vm->dram_pg_pool =
+ gen_pool_create(__ffs(prop->dram_page_size), -1);
+ else
+ vm->dram_pg_pool =
+ gen_pool_create(__ffs(DRAM_POOL_PAGE_SIZE), -1);
+
+ if (!vm->dram_pg_pool) {
+ dev_err(hdev->dev, "Failed to create dram page pool\n");
+ return -ENOMEM;
+ }
+
+ kref_init(&vm->dram_pg_pool_refcount);
+
+ rc = gen_pool_add(vm->dram_pg_pool, prop->dram_user_base_address,
+ prop->dram_end_address - prop->dram_user_base_address,
+ -1);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to add memory to dram page pool %d\n", rc);
+ goto pool_add_err;
+ }
+
+ spin_lock_init(&vm->idr_lock);
+ idr_init(&vm->phys_pg_pack_handles);
+
+ atomic64_set(&hdev->dram_used_mem, 0);
+
+ vm->init_done = true;
+
+ return 0;
+
+pool_add_err:
+ gen_pool_destroy(vm->dram_pg_pool);
+
+ return rc;
+}
+
+/**
+ * hl_vm_fini() - virtual memory module teardown.
+ * @hdev: pointer to the habanalabs device structure.
+ *
+ * This function perform teardown to the following:
+ * - Idr for device memory allocation handles.
+ * - DRAM physical pages pool of 2MB.
+ * - MMU module.
+ */
+void hl_vm_fini(struct hl_device *hdev)
+{
+ struct hl_vm *vm = &hdev->vm;
+
+ if (!vm->init_done)
+ return;
+
+ /*
+ * At this point all the contexts should be freed and hence no DRAM
+ * memory should be in use. Hence the DRAM pool should be freed here.
+ */
+ if (kref_put(&vm->dram_pg_pool_refcount, dram_pg_pool_do_release) != 1)
+ dev_warn(hdev->dev, "dram_pg_pool was not destroyed on %s\n",
+ __func__);
+
+ vm->init_done = false;
+}
+
+/**
+ * hl_hw_block_mem_init() - HW block memory initialization.
+ * @ctx: pointer to the habanalabs context structure.
+ *
+ * This function initializes the HW block virtual mapped addresses list and
+ * it's lock.
+ */
+void hl_hw_block_mem_init(struct hl_ctx *ctx)
+{
+ mutex_init(&ctx->hw_block_list_lock);
+ INIT_LIST_HEAD(&ctx->hw_block_mem_list);
+}
+
+/**
+ * hl_hw_block_mem_fini() - HW block memory teardown.
+ * @ctx: pointer to the habanalabs context structure.
+ *
+ * This function clears the HW block virtual mapped addresses list and destroys
+ * it's lock.
+ */
+void hl_hw_block_mem_fini(struct hl_ctx *ctx)
+{
+ struct hl_vm_hw_block_list_node *lnode, *tmp;
+
+ if (!list_empty(&ctx->hw_block_mem_list))
+ dev_crit(ctx->hdev->dev, "HW block mem list isn't empty\n");
+
+ list_for_each_entry_safe(lnode, tmp, &ctx->hw_block_mem_list, node) {
+ list_del(&lnode->node);
+ kfree(lnode);
+ }
+
+ mutex_destroy(&ctx->hw_block_list_lock);
+}
--- /dev/null
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2016-2022 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudiP.h"
+#include "../include/hw_ip/mmu/mmu_general.h"
+#include "../include/hw_ip/mmu/mmu_v1_1.h"
+#include "../include/gaudi/gaudi_masks.h"
+#include "../include/gaudi/gaudi_fw_if.h"
+#include "../include/gaudi/gaudi_reg_map.h"
+#include "../include/gaudi/gaudi_async_ids_map_extended.h"
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/firmware.h>
+#include <linux/hwmon.h>
+#include <linux/iommu.h>
+#include <linux/seq_file.h>
+
+/*
+ * Gaudi security scheme:
+ *
+ * 1. Host is protected by:
+ * - Range registers
+ * - MMU
+ *
+ * 2. DDR is protected by:
+ * - Range registers (protect the first 512MB)
+ *
+ * 3. Configuration is protected by:
+ * - Range registers
+ * - Protection bits
+ *
+ * MMU is always enabled.
+ *
+ * QMAN DMA channels 0,1 (PCI DMAN):
+ * - DMA is not secured.
+ * - PQ and CQ are secured.
+ * - CP is secured: The driver needs to parse CB but WREG should be allowed
+ * because of TDMA (tensor DMA). Hence, WREG is always not
+ * secured.
+ *
+ * When the driver needs to use DMA it will check that Gaudi is idle, set DMA
+ * channel 0 to be secured, execute the DMA and change it back to not secured.
+ * Currently, the driver doesn't use the DMA while there are compute jobs
+ * running.
+ *
+ * The current use cases for the driver to use the DMA are:
+ * - Clear SRAM on context switch (happens on context switch when device is
+ * idle)
+ * - MMU page tables area clear (happens on init)
+ *
+ * QMAN DMA 2-7, TPC, MME, NIC:
+ * PQ is secured and is located on the Host (HBM CON TPC3 bug)
+ * CQ, CP and the engine are not secured
+ *
+ */
+
+#define GAUDI_BOOT_FIT_FILE "habanalabs/gaudi/gaudi-boot-fit.itb"
+#define GAUDI_LINUX_FW_FILE "habanalabs/gaudi/gaudi-fit.itb"
+#define GAUDI_TPC_FW_FILE "habanalabs/gaudi/gaudi_tpc.bin"
+
+#define GAUDI_DMA_POOL_BLK_SIZE 0x100 /* 256 bytes */
+
+#define GAUDI_RESET_TIMEOUT_MSEC 2000 /* 2000ms */
+#define GAUDI_RESET_WAIT_MSEC 1 /* 1ms */
+#define GAUDI_CPU_RESET_WAIT_MSEC 200 /* 200ms */
+#define GAUDI_TEST_QUEUE_WAIT_USEC 100000 /* 100ms */
+
+#define GAUDI_PLDM_RESET_WAIT_MSEC 1000 /* 1s */
+#define GAUDI_PLDM_HRESET_TIMEOUT_MSEC 20000 /* 20s */
+#define GAUDI_PLDM_TEST_QUEUE_WAIT_USEC 1000000 /* 1s */
+#define GAUDI_PLDM_MMU_TIMEOUT_USEC (MMU_CONFIG_TIMEOUT_USEC * 100)
+#define GAUDI_PLDM_QMAN0_TIMEOUT_USEC (HL_DEVICE_TIMEOUT_USEC * 30)
+#define GAUDI_PLDM_TPC_KERNEL_WAIT_USEC (HL_DEVICE_TIMEOUT_USEC * 30)
+#define GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC 4000000 /* 4s */
+#define GAUDI_MSG_TO_CPU_TIMEOUT_USEC 4000000 /* 4s */
+#define GAUDI_WAIT_FOR_BL_TIMEOUT_USEC 15000000 /* 15s */
+
+#define GAUDI_QMAN0_FENCE_VAL 0x72E91AB9
+
+#define GAUDI_MAX_STRING_LEN 20
+
+#define GAUDI_CB_POOL_CB_CNT 512
+#define GAUDI_CB_POOL_CB_SIZE 0x20000 /* 128KB */
+
+#define GAUDI_ALLOC_CPU_MEM_RETRY_CNT 3
+
+#define GAUDI_NUM_OF_TPC_INTR_CAUSE 20
+
+#define GAUDI_NUM_OF_QM_ERR_CAUSE 16
+
+#define GAUDI_NUM_OF_QM_ARB_ERR_CAUSE 3
+
+#define GAUDI_ARB_WDT_TIMEOUT 0xEE6b27FF /* 8 seconds */
+
+#define HBM_SCRUBBING_TIMEOUT_US 1000000 /* 1s */
+
+#define BIN_REG_STRING_SIZE sizeof("0b10101010101010101010101010101010")
+
+#define MONITOR_SOB_STRING_SIZE 256
+
+static u32 gaudi_stream_master[GAUDI_STREAM_MASTER_ARR_SIZE] = {
+ GAUDI_QUEUE_ID_DMA_0_0,
+ GAUDI_QUEUE_ID_DMA_0_1,
+ GAUDI_QUEUE_ID_DMA_0_2,
+ GAUDI_QUEUE_ID_DMA_0_3,
+ GAUDI_QUEUE_ID_DMA_1_0,
+ GAUDI_QUEUE_ID_DMA_1_1,
+ GAUDI_QUEUE_ID_DMA_1_2,
+ GAUDI_QUEUE_ID_DMA_1_3
+};
+
+static const char gaudi_irq_name[GAUDI_MSI_ENTRIES][GAUDI_MAX_STRING_LEN] = {
+ "gaudi cq 0_0", "gaudi cq 0_1", "gaudi cq 0_2", "gaudi cq 0_3",
+ "gaudi cq 1_0", "gaudi cq 1_1", "gaudi cq 1_2", "gaudi cq 1_3",
+ "gaudi cq 5_0", "gaudi cq 5_1", "gaudi cq 5_2", "gaudi cq 5_3",
+ "gaudi cpu eq"
+};
+
+static const u8 gaudi_dma_assignment[GAUDI_DMA_MAX] = {
+ [GAUDI_PCI_DMA_1] = GAUDI_ENGINE_ID_DMA_0,
+ [GAUDI_PCI_DMA_2] = GAUDI_ENGINE_ID_DMA_1,
+ [GAUDI_HBM_DMA_1] = GAUDI_ENGINE_ID_DMA_2,
+ [GAUDI_HBM_DMA_2] = GAUDI_ENGINE_ID_DMA_3,
+ [GAUDI_HBM_DMA_3] = GAUDI_ENGINE_ID_DMA_4,
+ [GAUDI_HBM_DMA_4] = GAUDI_ENGINE_ID_DMA_5,
+ [GAUDI_HBM_DMA_5] = GAUDI_ENGINE_ID_DMA_6,
+ [GAUDI_HBM_DMA_6] = GAUDI_ENGINE_ID_DMA_7
+};
+
+static const u8 gaudi_cq_assignment[NUMBER_OF_CMPLT_QUEUES] = {
+ [0] = GAUDI_QUEUE_ID_DMA_0_0,
+ [1] = GAUDI_QUEUE_ID_DMA_0_1,
+ [2] = GAUDI_QUEUE_ID_DMA_0_2,
+ [3] = GAUDI_QUEUE_ID_DMA_0_3,
+ [4] = GAUDI_QUEUE_ID_DMA_1_0,
+ [5] = GAUDI_QUEUE_ID_DMA_1_1,
+ [6] = GAUDI_QUEUE_ID_DMA_1_2,
+ [7] = GAUDI_QUEUE_ID_DMA_1_3,
+};
+
+static const u16 gaudi_packet_sizes[MAX_PACKET_ID] = {
+ [PACKET_WREG_32] = sizeof(struct packet_wreg32),
+ [PACKET_WREG_BULK] = sizeof(struct packet_wreg_bulk),
+ [PACKET_MSG_LONG] = sizeof(struct packet_msg_long),
+ [PACKET_MSG_SHORT] = sizeof(struct packet_msg_short),
+ [PACKET_CP_DMA] = sizeof(struct packet_cp_dma),
+ [PACKET_REPEAT] = sizeof(struct packet_repeat),
+ [PACKET_MSG_PROT] = sizeof(struct packet_msg_prot),
+ [PACKET_FENCE] = sizeof(struct packet_fence),
+ [PACKET_LIN_DMA] = sizeof(struct packet_lin_dma),
+ [PACKET_NOP] = sizeof(struct packet_nop),
+ [PACKET_STOP] = sizeof(struct packet_stop),
+ [PACKET_ARB_POINT] = sizeof(struct packet_arb_point),
+ [PACKET_WAIT] = sizeof(struct packet_wait),
+ [PACKET_LOAD_AND_EXE] = sizeof(struct packet_load_and_exe)
+};
+
+static inline bool validate_packet_id(enum packet_id id)
+{
+ switch (id) {
+ case PACKET_WREG_32:
+ case PACKET_WREG_BULK:
+ case PACKET_MSG_LONG:
+ case PACKET_MSG_SHORT:
+ case PACKET_CP_DMA:
+ case PACKET_REPEAT:
+ case PACKET_MSG_PROT:
+ case PACKET_FENCE:
+ case PACKET_LIN_DMA:
+ case PACKET_NOP:
+ case PACKET_STOP:
+ case PACKET_ARB_POINT:
+ case PACKET_WAIT:
+ case PACKET_LOAD_AND_EXE:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static const char * const
+gaudi_tpc_interrupts_cause[GAUDI_NUM_OF_TPC_INTR_CAUSE] = {
+ "tpc_address_exceed_slm",
+ "tpc_div_by_0",
+ "tpc_spu_mac_overflow",
+ "tpc_spu_addsub_overflow",
+ "tpc_spu_abs_overflow",
+ "tpc_spu_fp_dst_nan_inf",
+ "tpc_spu_fp_dst_denorm",
+ "tpc_vpu_mac_overflow",
+ "tpc_vpu_addsub_overflow",
+ "tpc_vpu_abs_overflow",
+ "tpc_vpu_fp_dst_nan_inf",
+ "tpc_vpu_fp_dst_denorm",
+ "tpc_assertions",
+ "tpc_illegal_instruction",
+ "tpc_pc_wrap_around",
+ "tpc_qm_sw_err",
+ "tpc_hbw_rresp_err",
+ "tpc_hbw_bresp_err",
+ "tpc_lbw_rresp_err",
+ "tpc_lbw_bresp_err"
+};
+
+static const char * const
+gaudi_qman_error_cause[GAUDI_NUM_OF_QM_ERR_CAUSE] = {
+ "PQ AXI HBW error",
+ "CQ AXI HBW error",
+ "CP AXI HBW error",
+ "CP error due to undefined OPCODE",
+ "CP encountered STOP OPCODE",
+ "CP AXI LBW error",
+ "CP WRREG32 or WRBULK returned error",
+ "N/A",
+ "FENCE 0 inc over max value and clipped",
+ "FENCE 1 inc over max value and clipped",
+ "FENCE 2 inc over max value and clipped",
+ "FENCE 3 inc over max value and clipped",
+ "FENCE 0 dec under min value and clipped",
+ "FENCE 1 dec under min value and clipped",
+ "FENCE 2 dec under min value and clipped",
+ "FENCE 3 dec under min value and clipped"
+};
+
+static const char * const
+gaudi_qman_arb_error_cause[GAUDI_NUM_OF_QM_ARB_ERR_CAUSE] = {
+ "Choice push while full error",
+ "Choice Q watchdog error",
+ "MSG AXI LBW returned with error"
+};
+
+static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_0 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_1 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_2 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_0_3 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_0 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_1 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_2 */
+ QUEUE_TYPE_EXT, /* GAUDI_QUEUE_ID_DMA_1_3 */
+ QUEUE_TYPE_CPU, /* GAUDI_QUEUE_ID_CPU_PQ */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_2_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_3_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_4_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_5_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_6_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_DMA_7_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_0_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_MME_1_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_0_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_1_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_2_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_3_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_4_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_5_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_6_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_TPC_7_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_0_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_1_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_2_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_3_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_4_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_5_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_6_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_7_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_8_3 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_0 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_1 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_2 */
+ QUEUE_TYPE_INT, /* GAUDI_QUEUE_ID_NIC_9_3 */
+};
+
+static struct hl_hw_obj_name_entry gaudi_so_id_to_str[] = {
+ { .id = 0, .name = "SYNC_OBJ_DMA_DOWN_FEEDBACK" },
+ { .id = 1, .name = "SYNC_OBJ_DMA_UP_FEEDBACK" },
+ { .id = 2, .name = "SYNC_OBJ_DMA_STATIC_DRAM_SRAM_FEEDBACK" },
+ { .id = 3, .name = "SYNC_OBJ_DMA_SRAM_DRAM_FEEDBACK" },
+ { .id = 4, .name = "SYNC_OBJ_FIRST_COMPUTE_FINISH" },
+ { .id = 5, .name = "SYNC_OBJ_HOST_DRAM_DONE" },
+ { .id = 6, .name = "SYNC_OBJ_DBG_CTR_DEPRECATED" },
+ { .id = 7, .name = "SYNC_OBJ_DMA_ACTIVATIONS_DRAM_SRAM_FEEDBACK" },
+ { .id = 8, .name = "SYNC_OBJ_ENGINE_SEM_MME_0" },
+ { .id = 9, .name = "SYNC_OBJ_ENGINE_SEM_MME_1" },
+ { .id = 10, .name = "SYNC_OBJ_ENGINE_SEM_TPC_0" },
+ { .id = 11, .name = "SYNC_OBJ_ENGINE_SEM_TPC_1" },
+ { .id = 12, .name = "SYNC_OBJ_ENGINE_SEM_TPC_2" },
+ { .id = 13, .name = "SYNC_OBJ_ENGINE_SEM_TPC_3" },
+ { .id = 14, .name = "SYNC_OBJ_ENGINE_SEM_TPC_4" },
+ { .id = 15, .name = "SYNC_OBJ_ENGINE_SEM_TPC_5" },
+ { .id = 16, .name = "SYNC_OBJ_ENGINE_SEM_TPC_6" },
+ { .id = 17, .name = "SYNC_OBJ_ENGINE_SEM_TPC_7" },
+ { .id = 18, .name = "SYNC_OBJ_ENGINE_SEM_DMA_1" },
+ { .id = 19, .name = "SYNC_OBJ_ENGINE_SEM_DMA_2" },
+ { .id = 20, .name = "SYNC_OBJ_ENGINE_SEM_DMA_3" },
+ { .id = 21, .name = "SYNC_OBJ_ENGINE_SEM_DMA_4" },
+ { .id = 22, .name = "SYNC_OBJ_ENGINE_SEM_DMA_5" },
+ { .id = 23, .name = "SYNC_OBJ_ENGINE_SEM_DMA_6" },
+ { .id = 24, .name = "SYNC_OBJ_ENGINE_SEM_DMA_7" },
+ { .id = 25, .name = "SYNC_OBJ_DBG_CTR_0" },
+ { .id = 26, .name = "SYNC_OBJ_DBG_CTR_1" },
+};
+
+static struct hl_hw_obj_name_entry gaudi_monitor_id_to_str[] = {
+ { .id = 200, .name = "MON_OBJ_DMA_DOWN_FEEDBACK_RESET" },
+ { .id = 201, .name = "MON_OBJ_DMA_UP_FEEDBACK_RESET" },
+ { .id = 203, .name = "MON_OBJ_DRAM_TO_SRAM_QUEUE_FENCE" },
+ { .id = 204, .name = "MON_OBJ_TPC_0_CLK_GATE" },
+ { .id = 205, .name = "MON_OBJ_TPC_1_CLK_GATE" },
+ { .id = 206, .name = "MON_OBJ_TPC_2_CLK_GATE" },
+ { .id = 207, .name = "MON_OBJ_TPC_3_CLK_GATE" },
+ { .id = 208, .name = "MON_OBJ_TPC_4_CLK_GATE" },
+ { .id = 209, .name = "MON_OBJ_TPC_5_CLK_GATE" },
+ { .id = 210, .name = "MON_OBJ_TPC_6_CLK_GATE" },
+ { .id = 211, .name = "MON_OBJ_TPC_7_CLK_GATE" },
+};
+
+static s64 gaudi_state_dump_specs_props[] = {
+ [SP_SYNC_OBJ_BASE_ADDR] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0,
+ [SP_NEXT_SYNC_OBJ_ADDR] = NEXT_SYNC_OBJ_ADDR_INTERVAL,
+ [SP_SYNC_OBJ_AMOUNT] = NUM_OF_SOB_IN_BLOCK,
+ [SP_MON_OBJ_WR_ADDR_LOW] =
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0,
+ [SP_MON_OBJ_WR_ADDR_HIGH] =
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0,
+ [SP_MON_OBJ_WR_DATA] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_DATA_0,
+ [SP_MON_OBJ_ARM_DATA] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_ARM_0,
+ [SP_MON_OBJ_STATUS] = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0,
+ [SP_MONITORS_AMOUNT] = NUM_OF_MONITORS_IN_BLOCK,
+ [SP_TPC0_CMDQ] = mmTPC0_QM_GLBL_CFG0,
+ [SP_TPC0_CFG_SO] = mmTPC0_CFG_QM_SYNC_OBJECT_ADDR,
+ [SP_NEXT_TPC] = mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0,
+ [SP_MME_CMDQ] = mmMME0_QM_GLBL_CFG0,
+ [SP_MME_CFG_SO] = mmMME0_CTRL_ARCH_DESC_SYNC_OBJECT_ADDR_LOW_LOCAL,
+ [SP_NEXT_MME] = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0,
+ [SP_DMA_CMDQ] = mmDMA0_QM_GLBL_CFG0,
+ [SP_DMA_CFG_SO] = mmDMA0_CORE_WR_COMP_ADDR_LO,
+ [SP_DMA_QUEUES_OFFSET] = mmDMA1_QM_GLBL_CFG0 - mmDMA0_QM_GLBL_CFG0,
+ [SP_NUM_OF_MME_ENGINES] = NUM_OF_MME_ENGINES,
+ [SP_SUB_MME_ENG_NUM] = NUM_OF_MME_SUB_ENGINES,
+ [SP_NUM_OF_DMA_ENGINES] = NUM_OF_DMA_ENGINES,
+ [SP_NUM_OF_TPC_ENGINES] = NUM_OF_TPC_ENGINES,
+ [SP_ENGINE_NUM_OF_QUEUES] = NUM_OF_QUEUES,
+ [SP_ENGINE_NUM_OF_STREAMS] = NUM_OF_STREAMS,
+ [SP_ENGINE_NUM_OF_FENCES] = NUM_OF_FENCES,
+ [SP_FENCE0_CNT_OFFSET] =
+ mmDMA0_QM_CP_FENCE0_CNT_0 - mmDMA0_QM_GLBL_CFG0,
+ [SP_FENCE0_RDATA_OFFSET] =
+ mmDMA0_QM_CP_FENCE0_RDATA_0 - mmDMA0_QM_GLBL_CFG0,
+ [SP_CP_STS_OFFSET] = mmDMA0_QM_CP_STS_0 - mmDMA0_QM_GLBL_CFG0,
+ [SP_NUM_CORES] = 1,
+};
+
+static const int gaudi_queue_id_to_engine_id[] = {
+ [GAUDI_QUEUE_ID_DMA_0_0...GAUDI_QUEUE_ID_DMA_0_3] = GAUDI_ENGINE_ID_DMA_0,
+ [GAUDI_QUEUE_ID_DMA_1_0...GAUDI_QUEUE_ID_DMA_1_3] = GAUDI_ENGINE_ID_DMA_1,
+ [GAUDI_QUEUE_ID_CPU_PQ] = GAUDI_ENGINE_ID_SIZE,
+ [GAUDI_QUEUE_ID_DMA_2_0...GAUDI_QUEUE_ID_DMA_2_3] = GAUDI_ENGINE_ID_DMA_2,
+ [GAUDI_QUEUE_ID_DMA_3_0...GAUDI_QUEUE_ID_DMA_3_3] = GAUDI_ENGINE_ID_DMA_3,
+ [GAUDI_QUEUE_ID_DMA_4_0...GAUDI_QUEUE_ID_DMA_4_3] = GAUDI_ENGINE_ID_DMA_4,
+ [GAUDI_QUEUE_ID_DMA_5_0...GAUDI_QUEUE_ID_DMA_5_3] = GAUDI_ENGINE_ID_DMA_5,
+ [GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3] = GAUDI_ENGINE_ID_DMA_6,
+ [GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3] = GAUDI_ENGINE_ID_DMA_7,
+ [GAUDI_QUEUE_ID_MME_0_0...GAUDI_QUEUE_ID_MME_0_3] = GAUDI_ENGINE_ID_MME_0,
+ [GAUDI_QUEUE_ID_MME_1_0...GAUDI_QUEUE_ID_MME_1_3] = GAUDI_ENGINE_ID_MME_2,
+ [GAUDI_QUEUE_ID_TPC_0_0...GAUDI_QUEUE_ID_TPC_0_3] = GAUDI_ENGINE_ID_TPC_0,
+ [GAUDI_QUEUE_ID_TPC_1_0...GAUDI_QUEUE_ID_TPC_1_3] = GAUDI_ENGINE_ID_TPC_1,
+ [GAUDI_QUEUE_ID_TPC_2_0...GAUDI_QUEUE_ID_TPC_2_3] = GAUDI_ENGINE_ID_TPC_2,
+ [GAUDI_QUEUE_ID_TPC_3_0...GAUDI_QUEUE_ID_TPC_3_3] = GAUDI_ENGINE_ID_TPC_3,
+ [GAUDI_QUEUE_ID_TPC_4_0...GAUDI_QUEUE_ID_TPC_4_3] = GAUDI_ENGINE_ID_TPC_4,
+ [GAUDI_QUEUE_ID_TPC_5_0...GAUDI_QUEUE_ID_TPC_5_3] = GAUDI_ENGINE_ID_TPC_5,
+ [GAUDI_QUEUE_ID_TPC_6_0...GAUDI_QUEUE_ID_TPC_6_3] = GAUDI_ENGINE_ID_TPC_6,
+ [GAUDI_QUEUE_ID_TPC_7_0...GAUDI_QUEUE_ID_TPC_7_3] = GAUDI_ENGINE_ID_TPC_7,
+ [GAUDI_QUEUE_ID_NIC_0_0...GAUDI_QUEUE_ID_NIC_0_3] = GAUDI_ENGINE_ID_NIC_0,
+ [GAUDI_QUEUE_ID_NIC_1_0...GAUDI_QUEUE_ID_NIC_1_3] = GAUDI_ENGINE_ID_NIC_1,
+ [GAUDI_QUEUE_ID_NIC_2_0...GAUDI_QUEUE_ID_NIC_2_3] = GAUDI_ENGINE_ID_NIC_2,
+ [GAUDI_QUEUE_ID_NIC_3_0...GAUDI_QUEUE_ID_NIC_3_3] = GAUDI_ENGINE_ID_NIC_3,
+ [GAUDI_QUEUE_ID_NIC_4_0...GAUDI_QUEUE_ID_NIC_4_3] = GAUDI_ENGINE_ID_NIC_4,
+ [GAUDI_QUEUE_ID_NIC_5_0...GAUDI_QUEUE_ID_NIC_5_3] = GAUDI_ENGINE_ID_NIC_5,
+ [GAUDI_QUEUE_ID_NIC_6_0...GAUDI_QUEUE_ID_NIC_6_3] = GAUDI_ENGINE_ID_NIC_6,
+ [GAUDI_QUEUE_ID_NIC_7_0...GAUDI_QUEUE_ID_NIC_7_3] = GAUDI_ENGINE_ID_NIC_7,
+ [GAUDI_QUEUE_ID_NIC_8_0...GAUDI_QUEUE_ID_NIC_8_3] = GAUDI_ENGINE_ID_NIC_8,
+ [GAUDI_QUEUE_ID_NIC_9_0...GAUDI_QUEUE_ID_NIC_9_3] = GAUDI_ENGINE_ID_NIC_9,
+};
+
+/* The order here is opposite to the order of the indexing in the h/w.
+ * i.e. SYNC_MGR_W_S is actually 0, SYNC_MGR_E_S is 1, etc.
+ */
+static const char * const gaudi_sync_manager_names[] = {
+ "SYNC_MGR_E_N",
+ "SYNC_MGR_W_N",
+ "SYNC_MGR_E_S",
+ "SYNC_MGR_W_S",
+ NULL
+};
+
+struct ecc_info_extract_params {
+ u64 block_address;
+ u32 num_memories;
+ bool derr;
+};
+
+static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
+ u64 phys_addr);
+static int gaudi_send_job_on_qman0(struct hl_device *hdev,
+ struct hl_cs_job *job);
+static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
+ u32 size, u64 val);
+static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
+ u32 num_regs, u32 val);
+static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel,
+ u32 tpc_id);
+static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev);
+static int gaudi_cpucp_info_get(struct hl_device *hdev);
+static void gaudi_disable_clock_gating(struct hl_device *hdev);
+static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid);
+static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+ u32 size, bool eb);
+static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop);
+static inline enum hl_collective_mode
+get_collective_mode(struct hl_device *hdev, u32 queue_id)
+{
+ if (gaudi_queue_type[queue_id] == QUEUE_TYPE_EXT)
+ return HL_COLLECTIVE_MASTER;
+
+ if (queue_id >= GAUDI_QUEUE_ID_DMA_5_0 &&
+ queue_id <= GAUDI_QUEUE_ID_DMA_5_3)
+ return HL_COLLECTIVE_SLAVE;
+
+ if (queue_id >= GAUDI_QUEUE_ID_TPC_7_0 &&
+ queue_id <= GAUDI_QUEUE_ID_TPC_7_3)
+ return HL_COLLECTIVE_SLAVE;
+
+ if (queue_id >= GAUDI_QUEUE_ID_NIC_0_0 &&
+ queue_id <= GAUDI_QUEUE_ID_NIC_9_3)
+ return HL_COLLECTIVE_SLAVE;
+
+ return HL_COLLECTIVE_NOT_SUPPORTED;
+}
+
+static inline void set_default_power_values(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+
+ if (hdev->card_type == cpucp_card_type_pmc) {
+ prop->max_power_default = MAX_POWER_DEFAULT_PMC;
+
+ if (prop->fw_security_enabled)
+ prop->dc_power_default = DC_POWER_DEFAULT_PMC_SEC;
+ else
+ prop->dc_power_default = DC_POWER_DEFAULT_PMC;
+ } else {
+ prop->max_power_default = MAX_POWER_DEFAULT_PCI;
+ prop->dc_power_default = DC_POWER_DEFAULT_PCI;
+ }
+}
+
+static int gaudi_set_fixed_properties(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 num_sync_stream_queues = 0;
+ int i;
+
+ prop->max_queues = GAUDI_QUEUE_ID_SIZE;
+ prop->hw_queues_props = kcalloc(prop->max_queues,
+ sizeof(struct hw_queue_properties),
+ GFP_KERNEL);
+
+ if (!prop->hw_queues_props)
+ return -ENOMEM;
+
+ for (i = 0 ; i < prop->max_queues ; i++) {
+ if (gaudi_queue_type[i] == QUEUE_TYPE_EXT) {
+ prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
+ prop->hw_queues_props[i].driver_only = 0;
+ prop->hw_queues_props[i].supports_sync_stream = 1;
+ prop->hw_queues_props[i].cb_alloc_flags =
+ CB_ALLOC_KERNEL;
+ num_sync_stream_queues++;
+ } else if (gaudi_queue_type[i] == QUEUE_TYPE_CPU) {
+ prop->hw_queues_props[i].type = QUEUE_TYPE_CPU;
+ prop->hw_queues_props[i].driver_only = 1;
+ prop->hw_queues_props[i].supports_sync_stream = 0;
+ prop->hw_queues_props[i].cb_alloc_flags =
+ CB_ALLOC_KERNEL;
+ } else if (gaudi_queue_type[i] == QUEUE_TYPE_INT) {
+ prop->hw_queues_props[i].type = QUEUE_TYPE_INT;
+ prop->hw_queues_props[i].driver_only = 0;
+ prop->hw_queues_props[i].supports_sync_stream = 0;
+ prop->hw_queues_props[i].cb_alloc_flags =
+ CB_ALLOC_USER;
+
+ }
+ prop->hw_queues_props[i].collective_mode =
+ get_collective_mode(hdev, i);
+ }
+
+ prop->cache_line_size = DEVICE_CACHE_LINE_SIZE;
+ prop->cfg_base_address = CFG_BASE;
+ prop->device_dma_offset_for_host_access = HOST_PHYS_BASE;
+ prop->host_base_address = HOST_PHYS_BASE;
+ prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE;
+ prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
+ prop->completion_mode = HL_COMPLETION_MODE_JOB;
+ prop->collective_first_sob = 0;
+ prop->collective_first_mon = 0;
+
+ /* 2 SOBs per internal queue stream are reserved for collective */
+ prop->sync_stream_first_sob =
+ ALIGN(NUMBER_OF_SOBS_IN_GRP, HL_MAX_SOBS_PER_MONITOR)
+ * QMAN_STREAMS * HL_RSVD_SOBS;
+
+ /* 1 monitor per internal queue stream are reserved for collective
+ * 2 monitors per external queue stream are reserved for collective
+ */
+ prop->sync_stream_first_mon =
+ (NUMBER_OF_COLLECTIVE_QUEUES * QMAN_STREAMS) +
+ (NUMBER_OF_EXT_HW_QUEUES * 2);
+
+ prop->dram_base_address = DRAM_PHYS_BASE;
+ prop->dram_size = GAUDI_HBM_SIZE_32GB;
+ prop->dram_end_address = prop->dram_base_address + prop->dram_size;
+ prop->dram_user_base_address = DRAM_BASE_ADDR_USER;
+
+ prop->sram_base_address = SRAM_BASE_ADDR;
+ prop->sram_size = SRAM_SIZE;
+ prop->sram_end_address = prop->sram_base_address + prop->sram_size;
+ prop->sram_user_base_address =
+ prop->sram_base_address + SRAM_USER_BASE_OFFSET;
+
+ prop->mmu_cache_mng_addr = MMU_CACHE_MNG_ADDR;
+ prop->mmu_cache_mng_size = MMU_CACHE_MNG_SIZE;
+
+ prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
+ if (hdev->pldm)
+ prop->mmu_pgt_size = 0x800000; /* 8MB */
+ else
+ prop->mmu_pgt_size = MMU_PAGE_TABLES_SIZE;
+ prop->mmu_pte_size = HL_PTE_SIZE;
+ prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
+ prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
+ prop->dram_page_size = PAGE_SIZE_2MB;
+ prop->device_mem_alloc_default_page_size = prop->dram_page_size;
+ prop->dram_supports_virtual_memory = false;
+
+ prop->pmmu.hop_shifts[MMU_HOP0] = MMU_V1_1_HOP0_SHIFT;
+ prop->pmmu.hop_shifts[MMU_HOP1] = MMU_V1_1_HOP1_SHIFT;
+ prop->pmmu.hop_shifts[MMU_HOP2] = MMU_V1_1_HOP2_SHIFT;
+ prop->pmmu.hop_shifts[MMU_HOP3] = MMU_V1_1_HOP3_SHIFT;
+ prop->pmmu.hop_shifts[MMU_HOP4] = MMU_V1_1_HOP4_SHIFT;
+ prop->pmmu.hop_masks[MMU_HOP0] = MMU_V1_1_HOP0_MASK;
+ prop->pmmu.hop_masks[MMU_HOP1] = MMU_V1_1_HOP1_MASK;
+ prop->pmmu.hop_masks[MMU_HOP2] = MMU_V1_1_HOP2_MASK;
+ prop->pmmu.hop_masks[MMU_HOP3] = MMU_V1_1_HOP3_MASK;
+ prop->pmmu.hop_masks[MMU_HOP4] = MMU_V1_1_HOP4_MASK;
+ prop->pmmu.start_addr = VA_HOST_SPACE_START;
+ prop->pmmu.end_addr =
+ (VA_HOST_SPACE_START + VA_HOST_SPACE_SIZE / 2) - 1;
+ prop->pmmu.page_size = PAGE_SIZE_4KB;
+ prop->pmmu.num_hops = MMU_ARCH_5_HOPS;
+ prop->pmmu.last_mask = LAST_MASK;
+ /* TODO: will be duplicated until implementing per-MMU props */
+ prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
+ prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
+
+ /* PMMU and HPMMU are the same except of page size */
+ memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
+ prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
+
+ /* shifts and masks are the same in PMMU and DMMU */
+ memcpy(&prop->dmmu, &prop->pmmu, sizeof(prop->pmmu));
+ prop->dmmu.start_addr = (VA_HOST_SPACE_START + VA_HOST_SPACE_SIZE / 2);
+ prop->dmmu.end_addr = VA_HOST_SPACE_END;
+ prop->dmmu.page_size = PAGE_SIZE_2MB;
+
+ prop->cfg_size = CFG_SIZE;
+ prop->max_asid = MAX_ASID;
+ prop->num_of_events = GAUDI_EVENT_SIZE;
+ prop->tpc_enabled_mask = TPC_ENABLED_MASK;
+
+ set_default_power_values(hdev);
+
+ prop->cb_pool_cb_cnt = GAUDI_CB_POOL_CB_CNT;
+ prop->cb_pool_cb_size = GAUDI_CB_POOL_CB_SIZE;
+
+ prop->pcie_dbi_base_address = mmPCIE_DBI_BASE;
+ prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
+
+ strncpy(prop->cpucp_info.card_name, GAUDI_DEFAULT_CARD_NAME,
+ CARD_NAME_MAX_LEN);
+
+ prop->max_pending_cs = GAUDI_MAX_PENDING_CS;
+
+ prop->first_available_user_sob[HL_GAUDI_WS_DCORE] =
+ prop->sync_stream_first_sob +
+ (num_sync_stream_queues * HL_RSVD_SOBS);
+ prop->first_available_user_mon[HL_GAUDI_WS_DCORE] =
+ prop->sync_stream_first_mon +
+ (num_sync_stream_queues * HL_RSVD_MONS);
+
+ prop->first_available_user_interrupt = USHRT_MAX;
+
+ for (i = 0 ; i < HL_MAX_DCORES ; i++)
+ prop->first_available_cq[i] = USHRT_MAX;
+
+ prop->fw_cpu_boot_dev_sts0_valid = false;
+ prop->fw_cpu_boot_dev_sts1_valid = false;
+ prop->hard_reset_done_by_fw = false;
+ prop->gic_interrupts_enable = true;
+
+ prop->server_type = HL_SERVER_TYPE_UNKNOWN;
+
+ prop->clk_pll_index = HL_GAUDI_MME_PLL;
+ prop->max_freq_value = GAUDI_MAX_CLK_FREQ;
+
+ prop->use_get_power_for_reset_history = true;
+
+ prop->configurable_stop_on_err = true;
+
+ prop->set_max_power_on_device_init = true;
+
+ prop->dma_mask = 48;
+
+ prop->hbw_flush_reg = mmPCIE_WRAP_RR_ELBI_RD_SEC_REG_CTRL;
+
+ return 0;
+}
+
+static int gaudi_pci_bars_map(struct hl_device *hdev)
+{
+ static const char * const name[] = {"SRAM", "CFG", "HBM"};
+ bool is_wc[3] = {false, false, true};
+ int rc;
+
+ rc = hl_pci_bars_map(hdev, name, is_wc);
+ if (rc)
+ return rc;
+
+ hdev->rmmio = hdev->pcie_bar[CFG_BAR_ID] +
+ (CFG_BASE - SPI_FLASH_BASE_ADDR);
+
+ return 0;
+}
+
+static u64 gaudi_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct hl_inbound_pci_region pci_region;
+ u64 old_addr = addr;
+ int rc;
+
+ if ((gaudi) && (gaudi->hbm_bar_cur_addr == addr))
+ return old_addr;
+
+ if (hdev->asic_prop.iatu_done_by_fw)
+ return U64_MAX;
+
+ /* Inbound Region 2 - Bar 4 - Point to HBM */
+ pci_region.mode = PCI_BAR_MATCH_MODE;
+ pci_region.bar = HBM_BAR_ID;
+ pci_region.addr = addr;
+ rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
+ if (rc)
+ return U64_MAX;
+
+ if (gaudi) {
+ old_addr = gaudi->hbm_bar_cur_addr;
+ gaudi->hbm_bar_cur_addr = addr;
+ }
+
+ return old_addr;
+}
+
+static int gaudi_init_iatu(struct hl_device *hdev)
+{
+ struct hl_inbound_pci_region inbound_region;
+ struct hl_outbound_pci_region outbound_region;
+ int rc;
+
+ if (hdev->asic_prop.iatu_done_by_fw)
+ return 0;
+
+ /* Inbound Region 0 - Bar 0 - Point to SRAM + CFG */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = SRAM_BAR_ID;
+ inbound_region.addr = SRAM_BASE_ADDR;
+ rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
+ if (rc)
+ goto done;
+
+ /* Inbound Region 1 - Bar 2 - Point to SPI FLASH */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = CFG_BAR_ID;
+ inbound_region.addr = SPI_FLASH_BASE_ADDR;
+ rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
+ if (rc)
+ goto done;
+
+ /* Inbound Region 2 - Bar 4 - Point to HBM */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = HBM_BAR_ID;
+ inbound_region.addr = DRAM_PHYS_BASE;
+ rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
+ if (rc)
+ goto done;
+
+ /* Outbound Region 0 - Point to Host */
+ outbound_region.addr = HOST_PHYS_BASE;
+ outbound_region.size = HOST_PHYS_SIZE;
+ rc = hl_pci_set_outbound_region(hdev, &outbound_region);
+
+done:
+ return rc;
+}
+
+static enum hl_device_hw_state gaudi_get_hw_state(struct hl_device *hdev)
+{
+ return RREG32(mmHW_STATE);
+}
+
+static int gaudi_early_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct pci_dev *pdev = hdev->pdev;
+ resource_size_t pci_bar_size;
+ u32 fw_boot_status;
+ int rc;
+
+ rc = gaudi_set_fixed_properties(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed setting fixed properties\n");
+ return rc;
+ }
+
+ /* Check BAR sizes */
+ pci_bar_size = pci_resource_len(pdev, SRAM_BAR_ID);
+
+ if (pci_bar_size != SRAM_BAR_SIZE) {
+ dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
+ SRAM_BAR_ID, &pci_bar_size, SRAM_BAR_SIZE);
+ rc = -ENODEV;
+ goto free_queue_props;
+ }
+
+ pci_bar_size = pci_resource_len(pdev, CFG_BAR_ID);
+
+ if (pci_bar_size != CFG_BAR_SIZE) {
+ dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
+ CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
+ rc = -ENODEV;
+ goto free_queue_props;
+ }
+
+ prop->dram_pci_bar_size = pci_resource_len(pdev, HBM_BAR_ID);
+ hdev->dram_pci_bar_start = pci_resource_start(pdev, HBM_BAR_ID);
+
+ /* If FW security is enabled at this point it means no access to ELBI */
+ if (hdev->asic_prop.fw_security_enabled) {
+ hdev->asic_prop.iatu_done_by_fw = true;
+
+ /*
+ * GIC-security-bit can ONLY be set by CPUCP, so in this stage
+ * decision can only be taken based on PCI ID security.
+ */
+ hdev->asic_prop.gic_interrupts_enable = false;
+ goto pci_init;
+ }
+
+ rc = hl_pci_elbi_read(hdev, CFG_BASE + mmCPU_BOOT_DEV_STS0,
+ &fw_boot_status);
+ if (rc)
+ goto free_queue_props;
+
+ /* Check whether FW is configuring iATU */
+ if ((fw_boot_status & CPU_BOOT_DEV_STS0_ENABLED) &&
+ (fw_boot_status & CPU_BOOT_DEV_STS0_FW_IATU_CONF_EN))
+ hdev->asic_prop.iatu_done_by_fw = true;
+
+pci_init:
+ rc = hl_pci_init(hdev);
+ if (rc)
+ goto free_queue_props;
+
+ /* Before continuing in the initialization, we need to read the preboot
+ * version to determine whether we run with a security-enabled firmware
+ */
+ rc = hl_fw_read_preboot_status(hdev);
+ if (rc) {
+ if (hdev->reset_on_preboot_fail)
+ hdev->asic_funcs->hw_fini(hdev, true, false);
+ goto pci_fini;
+ }
+
+ if (gaudi_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
+ dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
+ hdev->asic_funcs->hw_fini(hdev, true, false);
+ }
+
+ return 0;
+
+pci_fini:
+ hl_pci_fini(hdev);
+free_queue_props:
+ kfree(hdev->asic_prop.hw_queues_props);
+ return rc;
+}
+
+static int gaudi_early_fini(struct hl_device *hdev)
+{
+ kfree(hdev->asic_prop.hw_queues_props);
+ hl_pci_fini(hdev);
+
+ return 0;
+}
+
+/**
+ * gaudi_fetch_psoc_frequency - Fetch PSOC frequency values
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+static int gaudi_fetch_psoc_frequency(struct hl_device *hdev)
+{
+ u32 nr = 0, nf = 0, od = 0, div_fctr = 0, pll_clk, div_sel;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS], freq;
+ int rc;
+
+ if ((hdev->fw_components & FW_TYPE_LINUX) &&
+ (prop->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_PLL_INFO_EN)) {
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI_CPU_PLL, pll_freq_arr);
+
+ if (rc)
+ return rc;
+
+ freq = pll_freq_arr[2];
+ } else {
+ /* Backward compatibility */
+ div_fctr = RREG32(mmPSOC_CPU_PLL_DIV_FACTOR_2);
+ div_sel = RREG32(mmPSOC_CPU_PLL_DIV_SEL_2);
+ nr = RREG32(mmPSOC_CPU_PLL_NR);
+ nf = RREG32(mmPSOC_CPU_PLL_NF);
+ od = RREG32(mmPSOC_CPU_PLL_OD);
+
+ if (div_sel == DIV_SEL_REF_CLK ||
+ div_sel == DIV_SEL_DIVIDED_REF) {
+ if (div_sel == DIV_SEL_REF_CLK)
+ freq = PLL_REF_CLK;
+ else
+ freq = PLL_REF_CLK / (div_fctr + 1);
+ } else if (div_sel == DIV_SEL_PLL_CLK ||
+ div_sel == DIV_SEL_DIVIDED_PLL) {
+ pll_clk = PLL_REF_CLK * (nf + 1) /
+ ((nr + 1) * (od + 1));
+ if (div_sel == DIV_SEL_PLL_CLK)
+ freq = pll_clk;
+ else
+ freq = pll_clk / (div_fctr + 1);
+ } else {
+ dev_warn(hdev->dev, "Received invalid div select value: %#x", div_sel);
+ freq = 0;
+ }
+ }
+
+ prop->psoc_timestamp_frequency = freq;
+ prop->psoc_pci_pll_nr = nr;
+ prop->psoc_pci_pll_nf = nf;
+ prop->psoc_pci_pll_od = od;
+ prop->psoc_pci_pll_div_factor = div_fctr;
+
+ return 0;
+}
+
+static int _gaudi_init_tpc_mem(struct hl_device *hdev,
+ dma_addr_t tpc_kernel_src_addr, u32 tpc_kernel_size)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct packet_lin_dma *init_tpc_mem_pkt;
+ struct hl_cs_job *job;
+ struct hl_cb *cb;
+ u64 dst_addr;
+ u32 cb_size, ctl;
+ u8 tpc_id;
+ int rc;
+
+ cb = hl_cb_kernel_create(hdev, PAGE_SIZE, false);
+ if (!cb)
+ return -EFAULT;
+
+ init_tpc_mem_pkt = cb->kernel_address;
+ cb_size = sizeof(*init_tpc_mem_pkt);
+ memset(init_tpc_mem_pkt, 0, cb_size);
+
+ init_tpc_mem_pkt->tsize = cpu_to_le32(tpc_kernel_size);
+
+ ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
+ ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_LIN_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ init_tpc_mem_pkt->ctl = cpu_to_le32(ctl);
+
+ init_tpc_mem_pkt->src_addr = cpu_to_le64(tpc_kernel_src_addr);
+
+ /* TPC_CMD is configured with I$ prefetch enabled, so address should be aligned to 8KB */
+ dst_addr = FIELD_PREP(GAUDI_PKT_LIN_DMA_DST_ADDR_MASK,
+ round_up(prop->sram_user_base_address, SZ_8K));
+ init_tpc_mem_pkt->dst_addr |= cpu_to_le64(dst_addr);
+
+ job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
+ if (!job) {
+ dev_err(hdev->dev, "Failed to allocate a new job\n");
+ rc = -ENOMEM;
+ goto release_cb;
+ }
+
+ job->id = 0;
+ job->user_cb = cb;
+ atomic_inc(&job->user_cb->cs_cnt);
+ job->user_cb_size = cb_size;
+ job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
+ job->patched_cb = job->user_cb;
+ job->job_cb_size = job->user_cb_size + sizeof(struct packet_msg_prot);
+
+ hl_debugfs_add_job(hdev, job);
+
+ rc = gaudi_send_job_on_qman0(hdev, job);
+
+ if (rc)
+ goto free_job;
+
+ for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
+ rc = gaudi_run_tpc_kernel(hdev, dst_addr, tpc_id);
+ if (rc)
+ break;
+ }
+
+free_job:
+ hl_userptr_delete_list(hdev, &job->userptr_list);
+ hl_debugfs_remove_job(hdev, job);
+ kfree(job);
+ atomic_dec(&cb->cs_cnt);
+
+release_cb:
+ hl_cb_put(cb);
+ hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
+
+ return rc;
+}
+
+/*
+ * gaudi_init_tpc_mem() - Initialize TPC memories.
+ * @hdev: Pointer to hl_device structure.
+ *
+ * Copy TPC kernel fw from firmware file and run it to initialize TPC memories.
+ *
+ * Return: 0 for success, negative value for error.
+ */
+static int gaudi_init_tpc_mem(struct hl_device *hdev)
+{
+ const struct firmware *fw;
+ size_t fw_size;
+ void *cpu_addr;
+ dma_addr_t dma_handle;
+ int rc, count = 5;
+
+again:
+ rc = request_firmware(&fw, GAUDI_TPC_FW_FILE, hdev->dev);
+ if (rc == -EINTR && count-- > 0) {
+ msleep(50);
+ goto again;
+ }
+
+ if (rc) {
+ dev_err(hdev->dev, "Failed to load firmware file %s\n",
+ GAUDI_TPC_FW_FILE);
+ goto out;
+ }
+
+ fw_size = fw->size;
+ cpu_addr = hl_asic_dma_alloc_coherent(hdev, fw_size, &dma_handle, GFP_KERNEL | __GFP_ZERO);
+ if (!cpu_addr) {
+ dev_err(hdev->dev,
+ "Failed to allocate %zu of dma memory for TPC kernel\n",
+ fw_size);
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ memcpy(cpu_addr, fw->data, fw_size);
+
+ rc = _gaudi_init_tpc_mem(hdev, dma_handle, fw_size);
+
+ hl_asic_dma_free_coherent(hdev, fw->size, cpu_addr, dma_handle);
+
+out:
+ release_firmware(fw);
+ return rc;
+}
+
+static void gaudi_collective_map_sobs(struct hl_device *hdev, u32 stream)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_collective_properties *prop = &gaudi->collective_props;
+ struct hl_hw_queue *q;
+ u32 i, sob_id, sob_group_id, queue_id;
+
+ /* Iterate through SOB groups and assign a SOB for each slave queue */
+ sob_group_id =
+ stream * HL_RSVD_SOBS + prop->curr_sob_group_idx[stream];
+ sob_id = prop->hw_sob_group[sob_group_id].base_sob_id;
+
+ queue_id = GAUDI_QUEUE_ID_NIC_0_0 + stream;
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
+ q = &hdev->kernel_queues[queue_id + (4 * i)];
+ q->sync_stream_prop.collective_sob_id = sob_id + i;
+ }
+
+ /* Both DMA5 and TPC7 use the same resources since only a single
+ * engine need to participate in the reduction process
+ */
+ queue_id = GAUDI_QUEUE_ID_DMA_5_0 + stream;
+ q = &hdev->kernel_queues[queue_id];
+ q->sync_stream_prop.collective_sob_id =
+ sob_id + NIC_NUMBER_OF_ENGINES;
+
+ queue_id = GAUDI_QUEUE_ID_TPC_7_0 + stream;
+ q = &hdev->kernel_queues[queue_id];
+ q->sync_stream_prop.collective_sob_id =
+ sob_id + NIC_NUMBER_OF_ENGINES;
+}
+
+static void gaudi_sob_group_hw_reset(struct kref *ref)
+{
+ struct gaudi_hw_sob_group *hw_sob_group =
+ container_of(ref, struct gaudi_hw_sob_group, kref);
+ struct hl_device *hdev = hw_sob_group->hdev;
+ int i;
+
+ for (i = 0 ; i < NUMBER_OF_SOBS_IN_GRP ; i++)
+ WREG32((mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
+ (hw_sob_group->base_sob_id * 4) + (i * 4)), 0);
+
+ kref_init(&hw_sob_group->kref);
+}
+
+static void gaudi_sob_group_reset_error(struct kref *ref)
+{
+ struct gaudi_hw_sob_group *hw_sob_group =
+ container_of(ref, struct gaudi_hw_sob_group, kref);
+ struct hl_device *hdev = hw_sob_group->hdev;
+
+ dev_crit(hdev->dev,
+ "SOB release shouldn't be called here, base_sob_id: %d\n",
+ hw_sob_group->base_sob_id);
+}
+
+static void gaudi_collective_mstr_sob_mask_set(struct gaudi_device *gaudi)
+{
+ struct gaudi_collective_properties *prop;
+ int i;
+
+ prop = &gaudi->collective_props;
+
+ memset(prop->mstr_sob_mask, 0, sizeof(prop->mstr_sob_mask));
+
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++)
+ if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + i))
+ prop->mstr_sob_mask[i / HL_MAX_SOBS_PER_MONITOR] |=
+ BIT(i % HL_MAX_SOBS_PER_MONITOR);
+ /* Set collective engine bit */
+ prop->mstr_sob_mask[i / HL_MAX_SOBS_PER_MONITOR] |=
+ BIT(i % HL_MAX_SOBS_PER_MONITOR);
+}
+
+static int gaudi_collective_init(struct hl_device *hdev)
+{
+ u32 i, sob_id, reserved_sobs_per_group;
+ struct gaudi_collective_properties *prop;
+ struct gaudi_device *gaudi;
+
+ gaudi = hdev->asic_specific;
+ prop = &gaudi->collective_props;
+ sob_id = hdev->asic_prop.collective_first_sob;
+
+ /* First sob in group must be aligned to HL_MAX_SOBS_PER_MONITOR */
+ reserved_sobs_per_group =
+ ALIGN(NUMBER_OF_SOBS_IN_GRP, HL_MAX_SOBS_PER_MONITOR);
+
+ /* Init SOB groups */
+ for (i = 0 ; i < NUM_SOB_GROUPS; i++) {
+ prop->hw_sob_group[i].hdev = hdev;
+ prop->hw_sob_group[i].base_sob_id = sob_id;
+ sob_id += reserved_sobs_per_group;
+ gaudi_sob_group_hw_reset(&prop->hw_sob_group[i].kref);
+ }
+
+ for (i = 0 ; i < QMAN_STREAMS; i++) {
+ prop->next_sob_group_val[i] = 1;
+ prop->curr_sob_group_idx[i] = 0;
+ gaudi_collective_map_sobs(hdev, i);
+ }
+
+ gaudi_collective_mstr_sob_mask_set(gaudi);
+
+ return 0;
+}
+
+static void gaudi_reset_sob_group(struct hl_device *hdev, u16 sob_group)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_collective_properties *cprop = &gaudi->collective_props;
+
+ kref_put(&cprop->hw_sob_group[sob_group].kref,
+ gaudi_sob_group_hw_reset);
+}
+
+static void gaudi_collective_master_init_job(struct hl_device *hdev,
+ struct hl_cs_job *job, u32 stream, u32 sob_group_offset)
+{
+ u32 master_sob_base, master_monitor, queue_id, cb_size = 0;
+ struct gaudi_collective_properties *cprop;
+ struct hl_gen_wait_properties wait_prop;
+ struct hl_sync_stream_properties *prop;
+ struct gaudi_device *gaudi;
+
+ gaudi = hdev->asic_specific;
+ cprop = &gaudi->collective_props;
+ queue_id = job->hw_queue_id;
+ prop = &hdev->kernel_queues[queue_id].sync_stream_prop;
+
+ master_sob_base =
+ cprop->hw_sob_group[sob_group_offset].base_sob_id;
+ master_monitor = prop->collective_mstr_mon_id[0];
+
+ cprop->hw_sob_group[sob_group_offset].queue_id = queue_id;
+
+ dev_dbg(hdev->dev,
+ "Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
+ master_sob_base, cprop->mstr_sob_mask[0],
+ cprop->next_sob_group_val[stream],
+ master_monitor, queue_id);
+
+ wait_prop.data = (void *) job->patched_cb;
+ wait_prop.sob_base = master_sob_base;
+ wait_prop.sob_mask = cprop->mstr_sob_mask[0];
+ wait_prop.sob_val = cprop->next_sob_group_val[stream];
+ wait_prop.mon_id = master_monitor;
+ wait_prop.q_idx = queue_id;
+ wait_prop.size = cb_size;
+ cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
+
+ master_sob_base += HL_MAX_SOBS_PER_MONITOR;
+ master_monitor = prop->collective_mstr_mon_id[1];
+
+ dev_dbg(hdev->dev,
+ "Generate master wait CBs, sob %d (mask %#x), val:0x%x, mon %u, q %d\n",
+ master_sob_base, cprop->mstr_sob_mask[1],
+ cprop->next_sob_group_val[stream],
+ master_monitor, queue_id);
+
+ wait_prop.sob_base = master_sob_base;
+ wait_prop.sob_mask = cprop->mstr_sob_mask[1];
+ wait_prop.mon_id = master_monitor;
+ wait_prop.size = cb_size;
+ cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
+}
+
+static void gaudi_collective_slave_init_job(struct hl_device *hdev,
+ struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
+{
+ struct hl_gen_wait_properties wait_prop;
+ struct hl_sync_stream_properties *prop;
+ u32 queue_id, cb_size = 0;
+
+ queue_id = job->hw_queue_id;
+ prop = &hdev->kernel_queues[queue_id].sync_stream_prop;
+
+ if (job->cs->encaps_signals) {
+ /* use the encaps signal handle store earlier in the flow
+ * and set the SOB information from the encaps
+ * signals handle
+ */
+ hl_hw_queue_encaps_sig_set_sob_info(hdev, job->cs, job,
+ cs_cmpl);
+
+ dev_dbg(hdev->dev, "collective wait: Sequence %llu found, sob_id: %u, wait for sob_val: %u\n",
+ job->cs->sequence,
+ cs_cmpl->hw_sob->sob_id,
+ cs_cmpl->sob_val);
+ }
+
+ /* Add to wait CBs using slave monitor */
+ wait_prop.data = (void *) job->user_cb;
+ wait_prop.sob_base = cs_cmpl->hw_sob->sob_id;
+ wait_prop.sob_mask = 0x1;
+ wait_prop.sob_val = cs_cmpl->sob_val;
+ wait_prop.mon_id = prop->collective_slave_mon_id;
+ wait_prop.q_idx = queue_id;
+ wait_prop.size = cb_size;
+
+ dev_dbg(hdev->dev,
+ "Generate slave wait CB, sob %d, val:%x, mon %d, q %d\n",
+ cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
+ prop->collective_slave_mon_id, queue_id);
+
+ cb_size += gaudi_gen_wait_cb(hdev, &wait_prop);
+
+ dev_dbg(hdev->dev,
+ "generate signal CB, sob_id: %d, sob val: 1, q_idx: %d\n",
+ prop->collective_sob_id, queue_id);
+
+ cb_size += gaudi_gen_signal_cb(hdev, job->user_cb,
+ prop->collective_sob_id, cb_size, false);
+}
+
+static int gaudi_collective_wait_init_cs(struct hl_cs *cs)
+{
+ struct hl_cs_compl *signal_cs_cmpl =
+ container_of(cs->signal_fence, struct hl_cs_compl, base_fence);
+ struct hl_cs_compl *cs_cmpl =
+ container_of(cs->fence, struct hl_cs_compl, base_fence);
+ struct hl_cs_encaps_sig_handle *handle = cs->encaps_sig_hdl;
+ struct gaudi_collective_properties *cprop;
+ u32 stream, queue_id, sob_group_offset;
+ struct gaudi_device *gaudi;
+ struct hl_device *hdev;
+ struct hl_cs_job *job;
+ struct hl_ctx *ctx;
+
+ ctx = cs->ctx;
+ hdev = ctx->hdev;
+ gaudi = hdev->asic_specific;
+ cprop = &gaudi->collective_props;
+
+ if (cs->encaps_signals) {
+ cs_cmpl->hw_sob = handle->hw_sob;
+ /* at this checkpoint we only need the hw_sob pointer
+ * for the completion check before start going over the jobs
+ * of the master/slaves, the sob_value will be taken later on
+ * in gaudi_collective_slave_init_job depends on each
+ * job wait offset value.
+ */
+ cs_cmpl->sob_val = 0;
+ } else {
+ /* copy the SOB id and value of the signal CS */
+ cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
+ cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
+ }
+
+ /* check again if the signal cs already completed.
+ * if yes then don't send any wait cs since the hw_sob
+ * could be in reset already. if signal is not completed
+ * then get refcount to hw_sob to prevent resetting the sob
+ * while wait cs is not submitted.
+ * note that this check is protected by two locks,
+ * hw queue lock and completion object lock,
+ * and the same completion object lock also protects
+ * the hw_sob reset handler function.
+ * The hw_queue lock prevent out of sync of hw_sob
+ * refcount value, changed by signal/wait flows.
+ */
+ spin_lock(&signal_cs_cmpl->lock);
+
+ if (completion_done(&cs->signal_fence->completion)) {
+ spin_unlock(&signal_cs_cmpl->lock);
+ return -EINVAL;
+ }
+ /* Increment kref since all slave queues are now waiting on it */
+ kref_get(&cs_cmpl->hw_sob->kref);
+
+ spin_unlock(&signal_cs_cmpl->lock);
+
+ /* Calculate the stream from collective master queue (1st job) */
+ job = list_first_entry(&cs->job_list, struct hl_cs_job, cs_node);
+ stream = job->hw_queue_id % 4;
+ sob_group_offset =
+ stream * HL_RSVD_SOBS + cprop->curr_sob_group_idx[stream];
+
+ list_for_each_entry(job, &cs->job_list, cs_node) {
+ queue_id = job->hw_queue_id;
+
+ if (hdev->kernel_queues[queue_id].collective_mode ==
+ HL_COLLECTIVE_MASTER)
+ gaudi_collective_master_init_job(hdev, job, stream,
+ sob_group_offset);
+ else
+ gaudi_collective_slave_init_job(hdev, job, cs_cmpl);
+ }
+
+ cs_cmpl->sob_group = sob_group_offset;
+
+ /* Handle sob group kref and wraparound */
+ kref_get(&cprop->hw_sob_group[sob_group_offset].kref);
+ cprop->next_sob_group_val[stream]++;
+
+ if (cprop->next_sob_group_val[stream] == HL_MAX_SOB_VAL) {
+ /*
+ * Decrement as we reached the max value.
+ * The release function won't be called here as we've
+ * just incremented the refcount.
+ */
+ kref_put(&cprop->hw_sob_group[sob_group_offset].kref,
+ gaudi_sob_group_reset_error);
+ cprop->next_sob_group_val[stream] = 1;
+ /* only two SOBs are currently in use */
+ cprop->curr_sob_group_idx[stream] =
+ (cprop->curr_sob_group_idx[stream] + 1) &
+ (HL_RSVD_SOBS - 1);
+
+ gaudi_collective_map_sobs(hdev, stream);
+
+ dev_dbg(hdev->dev, "switched to SOB group %d, stream: %d\n",
+ cprop->curr_sob_group_idx[stream], stream);
+ }
+
+ mb();
+ hl_fence_put(cs->signal_fence);
+ cs->signal_fence = NULL;
+
+ return 0;
+}
+
+static u32 gaudi_get_patched_cb_extra_size(u32 user_cb_size)
+{
+ u32 cacheline_end, additional_commands;
+
+ cacheline_end = round_up(user_cb_size, DEVICE_CACHE_LINE_SIZE);
+ additional_commands = sizeof(struct packet_msg_prot) * 2;
+
+ if (user_cb_size + additional_commands > cacheline_end)
+ return cacheline_end - user_cb_size + additional_commands;
+ else
+ return additional_commands;
+}
+
+static int gaudi_collective_wait_create_job(struct hl_device *hdev,
+ struct hl_ctx *ctx, struct hl_cs *cs,
+ enum hl_collective_mode mode, u32 queue_id, u32 wait_queue_id,
+ u32 encaps_signal_offset)
+{
+ struct hw_queue_properties *hw_queue_prop;
+ struct hl_cs_counters_atomic *cntr;
+ struct hl_cs_job *job;
+ struct hl_cb *cb;
+ u32 cb_size;
+ bool patched_cb;
+
+ cntr = &hdev->aggregated_cs_counters;
+
+ if (mode == HL_COLLECTIVE_MASTER) {
+ /* CB size of collective master queue contains
+ * 4 msg short packets for monitor 1 configuration
+ * 1 fence packet
+ * 4 msg short packets for monitor 2 configuration
+ * 1 fence packet
+ * 2 msg prot packets for completion and MSI
+ */
+ cb_size = sizeof(struct packet_msg_short) * 8 +
+ sizeof(struct packet_fence) * 2 +
+ sizeof(struct packet_msg_prot) * 2;
+ patched_cb = true;
+ } else {
+ /* CB size of collective slave queues contains
+ * 4 msg short packets for monitor configuration
+ * 1 fence packet
+ * 1 additional msg short packet for sob signal
+ */
+ cb_size = sizeof(struct packet_msg_short) * 5 +
+ sizeof(struct packet_fence);
+ patched_cb = false;
+ }
+
+ hw_queue_prop = &hdev->asic_prop.hw_queues_props[queue_id];
+ job = hl_cs_allocate_job(hdev, hw_queue_prop->type, true);
+ if (!job) {
+ atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
+ atomic64_inc(&cntr->out_of_mem_drop_cnt);
+ dev_err(hdev->dev, "Failed to allocate a new job\n");
+ return -ENOMEM;
+ }
+
+ /* Allocate internal mapped CB for non patched CBs */
+ cb = hl_cb_kernel_create(hdev, cb_size,
+ hdev->mmu_enable && !patched_cb);
+ if (!cb) {
+ atomic64_inc(&ctx->cs_counters.out_of_mem_drop_cnt);
+ atomic64_inc(&cntr->out_of_mem_drop_cnt);
+ kfree(job);
+ return -EFAULT;
+ }
+
+ job->id = 0;
+ job->cs = cs;
+ job->user_cb = cb;
+ atomic_inc(&job->user_cb->cs_cnt);
+ job->user_cb_size = cb_size;
+ job->hw_queue_id = queue_id;
+
+ /* since its guaranteed to have only one chunk in the collective wait
+ * cs, we can use this chunk to set the encapsulated signal offset
+ * in the jobs.
+ */
+ if (cs->encaps_signals)
+ job->encaps_sig_wait_offset = encaps_signal_offset;
+
+ /*
+ * No need in parsing, user CB is the patched CB.
+ * We call hl_cb_destroy() out of two reasons - we don't need
+ * the CB in the CB idr anymore and to decrement its refcount as
+ * it was incremented inside hl_cb_kernel_create().
+ */
+ if (patched_cb)
+ job->patched_cb = job->user_cb;
+ else
+ job->patched_cb = NULL;
+
+ job->job_cb_size = job->user_cb_size;
+ hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
+
+ /* increment refcount as for external queues we get completion */
+ if (hw_queue_prop->type == QUEUE_TYPE_EXT)
+ cs_get(cs);
+
+ cs->jobs_in_queue_cnt[job->hw_queue_id]++;
+
+ list_add_tail(&job->cs_node, &cs->job_list);
+
+ hl_debugfs_add_job(hdev, job);
+
+ return 0;
+}
+
+static int gaudi_collective_wait_create_jobs(struct hl_device *hdev,
+ struct hl_ctx *ctx, struct hl_cs *cs,
+ u32 wait_queue_id, u32 collective_engine_id,
+ u32 encaps_signal_offset)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct hw_queue_properties *hw_queue_prop;
+ u32 queue_id, collective_queue, num_jobs;
+ u32 stream, nic_queue, nic_idx = 0;
+ bool skip;
+ int i, rc = 0;
+
+ /* Verify wait queue id is configured as master */
+ hw_queue_prop = &hdev->asic_prop.hw_queues_props[wait_queue_id];
+ if (!(hw_queue_prop->collective_mode == HL_COLLECTIVE_MASTER)) {
+ dev_err(hdev->dev,
+ "Queue %d is not configured as collective master\n",
+ wait_queue_id);
+ return -EINVAL;
+ }
+
+ /* Verify engine id is supported */
+ if (collective_engine_id != GAUDI_ENGINE_ID_DMA_5 &&
+ collective_engine_id != GAUDI_ENGINE_ID_TPC_7) {
+ dev_err(hdev->dev,
+ "Collective wait does not support engine %u\n",
+ collective_engine_id);
+ return -EINVAL;
+ }
+
+ stream = wait_queue_id % 4;
+
+ if (collective_engine_id == GAUDI_ENGINE_ID_DMA_5)
+ collective_queue = GAUDI_QUEUE_ID_DMA_5_0 + stream;
+ else
+ collective_queue = GAUDI_QUEUE_ID_TPC_7_0 + stream;
+
+ num_jobs = NUMBER_OF_SOBS_IN_GRP + 1;
+ nic_queue = GAUDI_QUEUE_ID_NIC_0_0 + stream;
+
+ /* First job goes to the collective master queue, it will wait for
+ * the collective slave queues to finish execution.
+ * The synchronization is done using two monitors:
+ * First monitor for NICs 0-7, second monitor for NICs 8-9 and the
+ * reduction engine (DMA5/TPC7).
+ *
+ * Rest of the jobs goes to the collective slave queues which will
+ * all wait for the user to signal sob 'cs_cmpl->sob_val'.
+ */
+ for (i = 0 ; i < num_jobs ; i++) {
+ if (i == 0) {
+ queue_id = wait_queue_id;
+ rc = gaudi_collective_wait_create_job(hdev, ctx, cs,
+ HL_COLLECTIVE_MASTER, queue_id,
+ wait_queue_id, encaps_signal_offset);
+ } else {
+ if (nic_idx < NIC_NUMBER_OF_ENGINES) {
+ if (gaudi->hw_cap_initialized &
+ BIT(HW_CAP_NIC_SHIFT + nic_idx))
+ skip = false;
+ else
+ skip = true;
+
+ queue_id = nic_queue;
+ nic_queue += 4;
+ nic_idx++;
+
+ if (skip)
+ continue;
+ } else {
+ queue_id = collective_queue;
+ }
+
+ rc = gaudi_collective_wait_create_job(hdev, ctx, cs,
+ HL_COLLECTIVE_SLAVE, queue_id,
+ wait_queue_id, encaps_signal_offset);
+ }
+
+ if (rc)
+ return rc;
+ }
+
+ return rc;
+}
+
+static int gaudi_late_init(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int rc;
+
+ rc = gaudi->cpucp_info_get(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get cpucp info\n");
+ return rc;
+ }
+
+ if ((hdev->card_type == cpucp_card_type_pci) &&
+ (hdev->nic_ports_mask & 0x3)) {
+ dev_info(hdev->dev,
+ "PCI card detected, only 8 ports are enabled\n");
+ hdev->nic_ports_mask &= ~0x3;
+
+ /* Stop and disable unused NIC QMANs */
+ WREG32(mmNIC0_QM0_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ WREG32(mmNIC0_QM1_GLBL_CFG1, NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ WREG32(mmNIC0_QM0_GLBL_CFG0, 0);
+ WREG32(mmNIC0_QM1_GLBL_CFG0, 0);
+
+ gaudi->hw_cap_initialized &= ~(HW_CAP_NIC0 | HW_CAP_NIC1);
+ }
+
+ rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, 0x0);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
+ return rc;
+ }
+
+ /* Scrub both SRAM and DRAM */
+ rc = hdev->asic_funcs->scrub_device_mem(hdev);
+ if (rc)
+ goto disable_pci_access;
+
+ rc = gaudi_fetch_psoc_frequency(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to fetch psoc frequency\n");
+ goto disable_pci_access;
+ }
+
+ rc = gaudi_mmu_clear_pgt_range(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to clear MMU page tables range\n");
+ goto disable_pci_access;
+ }
+
+ rc = gaudi_init_tpc_mem(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to initialize TPC memories\n");
+ goto disable_pci_access;
+ }
+
+ rc = gaudi_collective_init(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to init collective\n");
+ goto disable_pci_access;
+ }
+
+ /* We only support a single ASID for the user, so for the sake of optimization, just
+ * initialize the ASID one time during device initialization with the fixed value of 1
+ */
+ gaudi_mmu_prepare(hdev, 1);
+
+ hl_fw_set_pll_profile(hdev);
+
+ return 0;
+
+disable_pci_access:
+ hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
+
+ return rc;
+}
+
+static void gaudi_late_fini(struct hl_device *hdev)
+{
+ hl_hwmon_release_resources(hdev);
+}
+
+static int gaudi_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
+{
+ dma_addr_t dma_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
+ void *virt_addr_arr[GAUDI_ALLOC_CPU_MEM_RETRY_CNT] = {};
+ int i, j, rc = 0;
+
+ /*
+ * The device CPU works with 40-bits addresses, while bit 39 must be set
+ * to '1' when accessing the host.
+ * Bits 49:39 of the full host address are saved for a later
+ * configuration of the HW to perform extension to 50 bits.
+ * Because there is a single HW register that holds the extension bits,
+ * these bits must be identical in all allocated range.
+ */
+
+ for (i = 0 ; i < GAUDI_ALLOC_CPU_MEM_RETRY_CNT ; i++) {
+ virt_addr_arr[i] = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
+ &dma_addr_arr[i],
+ GFP_KERNEL | __GFP_ZERO);
+ if (!virt_addr_arr[i]) {
+ rc = -ENOMEM;
+ goto free_dma_mem_arr;
+ }
+
+ end_addr = dma_addr_arr[i] + HL_CPU_ACCESSIBLE_MEM_SIZE - 1;
+ if (GAUDI_CPU_PCI_MSB_ADDR(dma_addr_arr[i]) ==
+ GAUDI_CPU_PCI_MSB_ADDR(end_addr))
+ break;
+ }
+
+ if (i == GAUDI_ALLOC_CPU_MEM_RETRY_CNT) {
+ dev_err(hdev->dev,
+ "MSB of CPU accessible DMA memory are not identical in all range\n");
+ rc = -EFAULT;
+ goto free_dma_mem_arr;
+ }
+
+ hdev->cpu_accessible_dma_mem = virt_addr_arr[i];
+ hdev->cpu_accessible_dma_address = dma_addr_arr[i];
+ hdev->cpu_pci_msb_addr =
+ GAUDI_CPU_PCI_MSB_ADDR(hdev->cpu_accessible_dma_address);
+
+ if (!hdev->asic_prop.fw_security_enabled)
+ GAUDI_PCI_TO_CPU_ADDR(hdev->cpu_accessible_dma_address);
+
+free_dma_mem_arr:
+ for (j = 0 ; j < i ; j++)
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, virt_addr_arr[j],
+ dma_addr_arr[j]);
+
+ return rc;
+}
+
+static void gaudi_free_internal_qmans_pq_mem(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+ u32 i;
+
+ for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
+ q = &gaudi->internal_qmans[i];
+ if (!q->pq_kernel_addr)
+ continue;
+ hl_asic_dma_free_coherent(hdev, q->pq_size, q->pq_kernel_addr, q->pq_dma_addr);
+ }
+}
+
+static int gaudi_alloc_internal_qmans_pq_mem(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+ int rc, i;
+
+ for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
+ if (gaudi_queue_type[i] != QUEUE_TYPE_INT)
+ continue;
+
+ q = &gaudi->internal_qmans[i];
+
+ switch (i) {
+ case GAUDI_QUEUE_ID_DMA_2_0 ... GAUDI_QUEUE_ID_DMA_7_3:
+ q->pq_size = HBM_DMA_QMAN_SIZE_IN_BYTES;
+ break;
+ case GAUDI_QUEUE_ID_MME_0_0 ... GAUDI_QUEUE_ID_MME_1_3:
+ q->pq_size = MME_QMAN_SIZE_IN_BYTES;
+ break;
+ case GAUDI_QUEUE_ID_TPC_0_0 ... GAUDI_QUEUE_ID_TPC_7_3:
+ q->pq_size = TPC_QMAN_SIZE_IN_BYTES;
+ break;
+ case GAUDI_QUEUE_ID_NIC_0_0 ... GAUDI_QUEUE_ID_NIC_9_3:
+ q->pq_size = NIC_QMAN_SIZE_IN_BYTES;
+ break;
+ default:
+ dev_err(hdev->dev, "Bad internal queue index %d", i);
+ rc = -EINVAL;
+ goto free_internal_qmans_pq_mem;
+ }
+
+ q->pq_kernel_addr = hl_asic_dma_alloc_coherent(hdev, q->pq_size, &q->pq_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!q->pq_kernel_addr) {
+ rc = -ENOMEM;
+ goto free_internal_qmans_pq_mem;
+ }
+ }
+
+ return 0;
+
+free_internal_qmans_pq_mem:
+ gaudi_free_internal_qmans_pq_mem(hdev);
+ return rc;
+}
+
+static void gaudi_set_pci_memory_regions(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct pci_mem_region *region;
+
+ /* CFG */
+ region = &hdev->pci_mem_region[PCI_REGION_CFG];
+ region->region_base = CFG_BASE;
+ region->region_size = CFG_SIZE;
+ region->offset_in_bar = CFG_BASE - SPI_FLASH_BASE_ADDR;
+ region->bar_size = CFG_BAR_SIZE;
+ region->bar_id = CFG_BAR_ID;
+ region->used = 1;
+
+ /* SRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_SRAM];
+ region->region_base = SRAM_BASE_ADDR;
+ region->region_size = SRAM_SIZE;
+ region->offset_in_bar = 0;
+ region->bar_size = SRAM_BAR_SIZE;
+ region->bar_id = SRAM_BAR_ID;
+ region->used = 1;
+
+ /* DRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_DRAM];
+ region->region_base = DRAM_PHYS_BASE;
+ region->region_size = hdev->asic_prop.dram_size;
+ region->offset_in_bar = 0;
+ region->bar_size = prop->dram_pci_bar_size;
+ region->bar_id = HBM_BAR_ID;
+ region->used = 1;
+
+ /* SP SRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_SP_SRAM];
+ region->region_base = PSOC_SCRATCHPAD_ADDR;
+ region->region_size = PSOC_SCRATCHPAD_SIZE;
+ region->offset_in_bar = PSOC_SCRATCHPAD_ADDR - SPI_FLASH_BASE_ADDR;
+ region->bar_size = CFG_BAR_SIZE;
+ region->bar_id = CFG_BAR_ID;
+ region->used = 1;
+}
+
+static int gaudi_sw_init(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi;
+ u32 i, event_id = 0;
+ int rc;
+
+ /* Allocate device structure */
+ gaudi = kzalloc(sizeof(*gaudi), GFP_KERNEL);
+ if (!gaudi)
+ return -ENOMEM;
+
+ for (i = 0 ; i < ARRAY_SIZE(gaudi_irq_map_table) ; i++) {
+ if (gaudi_irq_map_table[i].valid) {
+ if (event_id == GAUDI_EVENT_SIZE) {
+ dev_err(hdev->dev,
+ "Event array exceeds the limit of %u events\n",
+ GAUDI_EVENT_SIZE);
+ rc = -EINVAL;
+ goto free_gaudi_device;
+ }
+
+ gaudi->events[event_id++] =
+ gaudi_irq_map_table[i].fc_id;
+ }
+ }
+
+ gaudi->cpucp_info_get = gaudi_cpucp_info_get;
+
+ hdev->asic_specific = gaudi;
+
+ /* Create DMA pool for small allocations */
+ hdev->dma_pool = dma_pool_create(dev_name(hdev->dev),
+ &hdev->pdev->dev, GAUDI_DMA_POOL_BLK_SIZE, 8, 0);
+ if (!hdev->dma_pool) {
+ dev_err(hdev->dev, "failed to create DMA pool\n");
+ rc = -ENOMEM;
+ goto free_gaudi_device;
+ }
+
+ rc = gaudi_alloc_cpu_accessible_dma_mem(hdev);
+ if (rc)
+ goto free_dma_pool;
+
+ hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
+ if (!hdev->cpu_accessible_dma_pool) {
+ dev_err(hdev->dev,
+ "Failed to create CPU accessible DMA pool\n");
+ rc = -ENOMEM;
+ goto free_cpu_dma_mem;
+ }
+
+ rc = gen_pool_add(hdev->cpu_accessible_dma_pool,
+ (uintptr_t) hdev->cpu_accessible_dma_mem,
+ HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to add memory to CPU accessible DMA pool\n");
+ rc = -EFAULT;
+ goto free_cpu_accessible_dma_pool;
+ }
+
+ rc = gaudi_alloc_internal_qmans_pq_mem(hdev);
+ if (rc)
+ goto free_cpu_accessible_dma_pool;
+
+ spin_lock_init(&gaudi->hw_queues_lock);
+
+ hdev->supports_sync_stream = true;
+ hdev->supports_coresight = true;
+ hdev->supports_staged_submission = true;
+ hdev->supports_wait_for_multi_cs = true;
+
+ hdev->asic_funcs->set_pci_memory_regions(hdev);
+ hdev->stream_master_qid_arr =
+ hdev->asic_funcs->get_stream_master_qid_arr();
+ hdev->stream_master_qid_arr_size = GAUDI_STREAM_MASTER_ARR_SIZE;
+
+ return 0;
+
+free_cpu_accessible_dma_pool:
+ gen_pool_destroy(hdev->cpu_accessible_dma_pool);
+free_cpu_dma_mem:
+ if (!hdev->asic_prop.fw_security_enabled)
+ GAUDI_CPU_TO_PCI_ADDR(hdev->cpu_accessible_dma_address,
+ hdev->cpu_pci_msb_addr);
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
+ hdev->cpu_accessible_dma_address);
+free_dma_pool:
+ dma_pool_destroy(hdev->dma_pool);
+free_gaudi_device:
+ kfree(gaudi);
+ return rc;
+}
+
+static int gaudi_sw_fini(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ gaudi_free_internal_qmans_pq_mem(hdev);
+
+ gen_pool_destroy(hdev->cpu_accessible_dma_pool);
+
+ if (!hdev->asic_prop.fw_security_enabled)
+ GAUDI_CPU_TO_PCI_ADDR(hdev->cpu_accessible_dma_address,
+ hdev->cpu_pci_msb_addr);
+
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
+ hdev->cpu_accessible_dma_address);
+
+ dma_pool_destroy(hdev->dma_pool);
+
+ kfree(gaudi);
+
+ return 0;
+}
+
+static irqreturn_t gaudi_irq_handler_single(int irq, void *arg)
+{
+ struct hl_device *hdev = arg;
+ int i;
+
+ if (hdev->disabled)
+ return IRQ_HANDLED;
+
+ for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
+ hl_irq_handler_cq(irq, &hdev->completion_queue[i]);
+
+ hl_irq_handler_eq(irq, &hdev->event_queue);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * For backward compatibility, new MSI interrupts should be set after the
+ * existing CPU and NIC interrupts.
+ */
+static int gaudi_pci_irq_vector(struct hl_device *hdev, unsigned int nr,
+ bool cpu_eq)
+{
+ int msi_vec;
+
+ if ((nr != GAUDI_EVENT_QUEUE_MSI_IDX) && (cpu_eq))
+ dev_crit(hdev->dev, "CPU EQ must use IRQ %d\n",
+ GAUDI_EVENT_QUEUE_MSI_IDX);
+
+ msi_vec = ((nr < GAUDI_EVENT_QUEUE_MSI_IDX) || (cpu_eq)) ? nr :
+ (nr + NIC_NUMBER_OF_ENGINES + 1);
+
+ return pci_irq_vector(hdev->pdev, msi_vec);
+}
+
+static int gaudi_enable_msi_single(struct hl_device *hdev)
+{
+ int rc, irq;
+
+ dev_dbg(hdev->dev, "Working in single MSI IRQ mode\n");
+
+ irq = gaudi_pci_irq_vector(hdev, 0, false);
+ rc = request_irq(irq, gaudi_irq_handler_single, 0,
+ "gaudi single msi", hdev);
+ if (rc)
+ dev_err(hdev->dev,
+ "Failed to request single MSI IRQ\n");
+
+ return rc;
+}
+
+static int gaudi_enable_msi_multi(struct hl_device *hdev)
+{
+ int cq_cnt = hdev->asic_prop.completion_queues_count;
+ int rc, i, irq_cnt_init, irq;
+
+ for (i = 0, irq_cnt_init = 0 ; i < cq_cnt ; i++, irq_cnt_init++) {
+ irq = gaudi_pci_irq_vector(hdev, i, false);
+ rc = request_irq(irq, hl_irq_handler_cq, 0, gaudi_irq_name[i],
+ &hdev->completion_queue[i]);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_irqs;
+ }
+ }
+
+ irq = gaudi_pci_irq_vector(hdev, GAUDI_EVENT_QUEUE_MSI_IDX, true);
+ rc = request_irq(irq, hl_irq_handler_eq, 0, gaudi_irq_name[cq_cnt],
+ &hdev->event_queue);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_irqs;
+ }
+
+ return 0;
+
+free_irqs:
+ for (i = 0 ; i < irq_cnt_init ; i++)
+ free_irq(gaudi_pci_irq_vector(hdev, i, false),
+ &hdev->completion_queue[i]);
+ return rc;
+}
+
+static int gaudi_enable_msi(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int rc;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_MSI)
+ return 0;
+
+ rc = pci_alloc_irq_vectors(hdev->pdev, 1, 1, PCI_IRQ_MSI);
+ if (rc < 0) {
+ dev_err(hdev->dev, "MSI: Failed to enable support %d\n", rc);
+ return rc;
+ }
+
+ if (rc < NUMBER_OF_INTERRUPTS) {
+ gaudi->multi_msi_mode = false;
+ rc = gaudi_enable_msi_single(hdev);
+ } else {
+ gaudi->multi_msi_mode = true;
+ rc = gaudi_enable_msi_multi(hdev);
+ }
+
+ if (rc)
+ goto free_pci_irq_vectors;
+
+ gaudi->hw_cap_initialized |= HW_CAP_MSI;
+
+ return 0;
+
+free_pci_irq_vectors:
+ pci_free_irq_vectors(hdev->pdev);
+ return rc;
+}
+
+static void gaudi_sync_irqs(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int i, cq_cnt = hdev->asic_prop.completion_queues_count;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MSI))
+ return;
+
+ /* Wait for all pending IRQs to be finished */
+ if (gaudi->multi_msi_mode) {
+ for (i = 0 ; i < cq_cnt ; i++)
+ synchronize_irq(gaudi_pci_irq_vector(hdev, i, false));
+
+ synchronize_irq(gaudi_pci_irq_vector(hdev,
+ GAUDI_EVENT_QUEUE_MSI_IDX,
+ true));
+ } else {
+ synchronize_irq(gaudi_pci_irq_vector(hdev, 0, false));
+ }
+}
+
+static void gaudi_disable_msi(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int i, irq, cq_cnt = hdev->asic_prop.completion_queues_count;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MSI))
+ return;
+
+ gaudi_sync_irqs(hdev);
+
+ if (gaudi->multi_msi_mode) {
+ irq = gaudi_pci_irq_vector(hdev, GAUDI_EVENT_QUEUE_MSI_IDX,
+ true);
+ free_irq(irq, &hdev->event_queue);
+
+ for (i = 0 ; i < cq_cnt ; i++) {
+ irq = gaudi_pci_irq_vector(hdev, i, false);
+ free_irq(irq, &hdev->completion_queue[i]);
+ }
+ } else {
+ free_irq(gaudi_pci_irq_vector(hdev, 0, false), hdev);
+ }
+
+ pci_free_irq_vectors(hdev->pdev);
+
+ gaudi->hw_cap_initialized &= ~HW_CAP_MSI;
+}
+
+static void gaudi_init_scrambler_sram(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (hdev->asic_prop.fw_security_enabled)
+ return;
+
+ if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
+ CPU_BOOT_DEV_STS0_SRAM_SCR_EN)
+ return;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_SRAM_SCRAMBLER)
+ return;
+
+ WREG32(mmNIF_RTR_CTRL_0_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_1_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_2_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_3_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_4_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_5_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_6_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_7_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_0_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_1_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_2_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_3_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_4_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_5_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_6_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_7_SCRAM_SRAM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_SRAM_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_SCRAM_SRAM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_SRAM_EN_VAL_SHIFT);
+
+ gaudi->hw_cap_initialized |= HW_CAP_SRAM_SCRAMBLER;
+}
+
+static void gaudi_init_scrambler_hbm(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (hdev->asic_prop.fw_security_enabled)
+ return;
+
+ if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
+ CPU_BOOT_DEV_STS0_DRAM_SCR_EN)
+ return;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_HBM_SCRAMBLER)
+ return;
+
+ WREG32(mmNIF_RTR_CTRL_0_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_1_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_2_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_3_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_4_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_5_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_6_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_7_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_0_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_1_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_2_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_3_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_4_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_5_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_6_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_7_SCRAM_HBM_EN,
+ 1 << IF_RTR_CTRL_SCRAM_HBM_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_SCRAM_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_SCRAM_HBM_EN_VAL_SHIFT);
+
+ gaudi->hw_cap_initialized |= HW_CAP_HBM_SCRAMBLER;
+}
+
+static void gaudi_init_e2e(struct hl_device *hdev)
+{
+ if (hdev->asic_prop.fw_security_enabled)
+ return;
+
+ if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
+ CPU_BOOT_DEV_STS0_E2E_CRED_EN)
+ return;
+
+ WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_WR_SIZE, 247 >> 3);
+ WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_RD_SIZE, 785 >> 3);
+ WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_WR_SIZE, 49);
+ WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_RD_SIZE, 101);
+
+ WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_WR_SIZE, 275 >> 3);
+ WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_RD_SIZE, 614 >> 3);
+ WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_RD_SIZE, 39);
+
+ WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_WR_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_RD_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_WR_SIZE, 176 >> 3);
+ WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_RD_SIZE, 32 >> 3);
+ WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_WR_SIZE, 19);
+ WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_WR_SIZE, 176 >> 3);
+ WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_RD_SIZE, 32 >> 3);
+ WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_WR_SIZE, 19);
+ WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_WR_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_RD_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_WR_SIZE, 275 >> 3);
+ WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_RD_SIZE, 614 >> 3);
+ WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_RD_SIZE, 39);
+
+ WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_WR_SIZE, 297 >> 3);
+ WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_RD_SIZE, 908 >> 3);
+ WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_WR_SIZE, 19);
+ WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_RD_SIZE, 19);
+
+ WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_WR_SIZE, 318 >> 3);
+ WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_RD_SIZE, 956 >> 3);
+ WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_WR_SIZE, 79);
+ WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_RD_SIZE, 163);
+
+ WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_WR_SIZE, 275 >> 3);
+ WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_RD_SIZE, 614 >> 3);
+ WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_RD_SIZE, 39);
+
+ WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_WR_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_RD_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_WR_SIZE, 176 >> 3);
+ WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_RD_SIZE, 32 >> 3);
+ WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_WR_SIZE, 19);
+ WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_WR_SIZE, 176 >> 3);
+ WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_RD_SIZE, 32 >> 3);
+ WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_WR_SIZE, 19);
+ WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_WR_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_RD_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_RD_SIZE, 32);
+
+ WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_WR_SIZE, 275 >> 3);
+ WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_RD_SIZE, 614 >> 3);
+ WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_WR_SIZE, 1);
+ WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_RD_SIZE, 39);
+
+ WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_WR_SIZE, 318 >> 3);
+ WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_RD_SIZE, 956 >> 3);
+ WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_WR_SIZE, 79);
+ WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_RD_SIZE, 79);
+
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_WR_SIZE, 344 >> 3);
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_RD_SIZE, 1000 >> 3);
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_WR_SIZE, 162);
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_RD_SIZE, 338);
+
+ WREG32(mmSIF_RTR_CTRL_0_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_0_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_1_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_1_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_2_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_2_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_3_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_3_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_4_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_4_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_5_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_5_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_6_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_6_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmSIF_RTR_CTRL_7_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmSIF_RTR_CTRL_7_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_0_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_0_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_1_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_1_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_2_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_2_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_3_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_3_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_4_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_4_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_5_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_5_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_6_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_6_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmNIF_RTR_CTRL_7_E2E_HBM_EN,
+ 1 << IF_RTR_CTRL_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmNIF_RTR_CTRL_7_E2E_PCI_EN,
+ 1 << IF_RTR_CTRL_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_N_DOWN_CH0_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_N_DOWN_CH1_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_S_DOWN_CH0_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_E_S_DOWN_CH1_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_N_DOWN_CH0_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_N_DOWN_CH1_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_S_DOWN_CH0_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_HBM_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_HBM_EN_VAL_SHIFT);
+ WREG32(mmDMA_IF_W_S_DOWN_CH1_E2E_PCI_EN,
+ 1 << DMA_IF_DOWN_CHX_E2E_PCI_EN_VAL_SHIFT);
+}
+
+static void gaudi_init_hbm_cred(struct hl_device *hdev)
+{
+ u32 hbm0_wr, hbm1_wr, hbm0_rd, hbm1_rd;
+
+ if (hdev->asic_prop.fw_security_enabled)
+ return;
+
+ if (hdev->asic_prop.fw_bootfit_cpu_boot_dev_sts0 &
+ CPU_BOOT_DEV_STS0_HBM_CRED_EN)
+ return;
+
+ hbm0_wr = 0x33333333;
+ hbm0_rd = 0x77777777;
+ hbm1_wr = 0x55555555;
+ hbm1_rd = 0xDDDDDDDD;
+
+ WREG32(mmDMA_IF_E_N_HBM0_WR_CRED_CNT, hbm0_wr);
+ WREG32(mmDMA_IF_E_N_HBM1_WR_CRED_CNT, hbm1_wr);
+ WREG32(mmDMA_IF_E_N_HBM0_RD_CRED_CNT, hbm0_rd);
+ WREG32(mmDMA_IF_E_N_HBM1_RD_CRED_CNT, hbm1_rd);
+
+ WREG32(mmDMA_IF_E_S_HBM0_WR_CRED_CNT, hbm0_wr);
+ WREG32(mmDMA_IF_E_S_HBM1_WR_CRED_CNT, hbm1_wr);
+ WREG32(mmDMA_IF_E_S_HBM0_RD_CRED_CNT, hbm0_rd);
+ WREG32(mmDMA_IF_E_S_HBM1_RD_CRED_CNT, hbm1_rd);
+
+ WREG32(mmDMA_IF_W_N_HBM0_WR_CRED_CNT, hbm0_wr);
+ WREG32(mmDMA_IF_W_N_HBM1_WR_CRED_CNT, hbm1_wr);
+ WREG32(mmDMA_IF_W_N_HBM0_RD_CRED_CNT, hbm0_rd);
+ WREG32(mmDMA_IF_W_N_HBM1_RD_CRED_CNT, hbm1_rd);
+
+ WREG32(mmDMA_IF_W_S_HBM0_WR_CRED_CNT, hbm0_wr);
+ WREG32(mmDMA_IF_W_S_HBM1_WR_CRED_CNT, hbm1_wr);
+ WREG32(mmDMA_IF_W_S_HBM0_RD_CRED_CNT, hbm0_rd);
+ WREG32(mmDMA_IF_W_S_HBM1_RD_CRED_CNT, hbm1_rd);
+
+ WREG32(mmDMA_IF_E_N_HBM_CRED_EN_0,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+ WREG32(mmDMA_IF_E_S_HBM_CRED_EN_0,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+ WREG32(mmDMA_IF_W_N_HBM_CRED_EN_0,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+ WREG32(mmDMA_IF_W_S_HBM_CRED_EN_0,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+
+ WREG32(mmDMA_IF_E_N_HBM_CRED_EN_1,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+ WREG32(mmDMA_IF_E_S_HBM_CRED_EN_1,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+ WREG32(mmDMA_IF_W_N_HBM_CRED_EN_1,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+ WREG32(mmDMA_IF_W_S_HBM_CRED_EN_1,
+ (1 << DMA_IF_HBM_CRED_EN_READ_CREDIT_EN_SHIFT) |
+ (1 << DMA_IF_HBM_CRED_EN_WRITE_CREDIT_EN_SHIFT));
+}
+
+static void gaudi_init_golden_registers(struct hl_device *hdev)
+{
+ u32 tpc_offset;
+ int tpc_id, i;
+
+ gaudi_init_e2e(hdev);
+ gaudi_init_hbm_cred(hdev);
+
+ for (tpc_id = 0, tpc_offset = 0;
+ tpc_id < TPC_NUMBER_OF_ENGINES;
+ tpc_id++, tpc_offset += TPC_CFG_OFFSET) {
+ /* Mask all arithmetic interrupts from TPC */
+ WREG32(mmTPC0_CFG_TPC_INTR_MASK + tpc_offset, 0x8FFE);
+ /* Set 16 cache lines */
+ WREG32_FIELD(TPC0_CFG_MSS_CONFIG, tpc_offset,
+ ICACHE_FETCH_LINE_NUM, 2);
+ }
+
+ /* Make sure 1st 128 bytes in SRAM are 0 for Tensor DMA */
+ for (i = 0 ; i < 128 ; i += 8)
+ writeq(0, hdev->pcie_bar[SRAM_BAR_ID] + i);
+
+ WREG32(mmMME0_CTRL_EUS_ROLLUP_CNT_ADD, 3);
+ WREG32(mmMME1_CTRL_EUS_ROLLUP_CNT_ADD, 3);
+ WREG32(mmMME2_CTRL_EUS_ROLLUP_CNT_ADD, 3);
+ WREG32(mmMME3_CTRL_EUS_ROLLUP_CNT_ADD, 3);
+}
+
+static void gaudi_init_pci_dma_qman(struct hl_device *hdev, int dma_id,
+ int qman_id, dma_addr_t qman_pq_addr)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
+ u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
+ u32 q_off, dma_qm_offset;
+ u32 dma_qm_err_cfg, irq_handler_offset;
+
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+
+ mtr_base_en_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_en_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ mtr_base_ws_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_ws_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ q_off = dma_qm_offset + qman_id * 4;
+
+ WREG32(mmDMA0_QM_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_pq_addr));
+ WREG32(mmDMA0_QM_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_pq_addr));
+
+ WREG32(mmDMA0_QM_PQ_SIZE_0 + q_off, ilog2(HL_QUEUE_LENGTH));
+ WREG32(mmDMA0_QM_PQ_PI_0 + q_off, 0);
+ WREG32(mmDMA0_QM_PQ_CI_0 + q_off, 0);
+
+ WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off, QMAN_LDMA_SIZE_OFFSET);
+ WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_SRC_OFFSET);
+ WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_DST_OFFSET);
+
+ WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
+ WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
+ WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off, mtr_base_ws_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off, mtr_base_ws_hi);
+ WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off, so_base_ws_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off, so_base_ws_hi);
+
+ WREG32(mmDMA0_QM_CP_BARRIER_CFG_0 + q_off, 0x100);
+
+ /* The following configuration is needed only once per QMAN */
+ if (qman_id == 0) {
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
+
+ /* Configure RAZWI IRQ */
+ dma_qm_err_cfg = PCI_DMA_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+ if (hdev->stop_on_err)
+ dma_qm_err_cfg |=
+ PCI_DMA_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+
+ WREG32(mmDMA0_QM_GLBL_ERR_CFG + dma_qm_offset, dma_qm_err_cfg);
+
+ WREG32(mmDMA0_QM_GLBL_ERR_ADDR_LO + dma_qm_offset,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(mmDMA0_QM_GLBL_ERR_ADDR_HI + dma_qm_offset,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(mmDMA0_QM_GLBL_ERR_WDATA + dma_qm_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_DMA0_QM].cpu_id +
+ dma_id);
+
+ WREG32(mmDMA0_QM_ARB_ERR_MSG_EN + dma_qm_offset,
+ QM_ARB_ERR_MSG_EN_MASK);
+
+ /* Set timeout to maximum */
+ WREG32(mmDMA0_QM_ARB_SLV_CHOISE_WDT + dma_qm_offset, GAUDI_ARB_WDT_TIMEOUT);
+
+ WREG32(mmDMA0_QM_GLBL_PROT + dma_qm_offset,
+ QMAN_EXTERNAL_MAKE_TRUSTED);
+
+ WREG32(mmDMA0_QM_GLBL_CFG1 + dma_qm_offset, 0);
+ }
+}
+
+static void gaudi_init_dma_core(struct hl_device *hdev, int dma_id)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 dma_err_cfg = 1 << DMA0_CORE_ERR_CFG_ERR_MSG_EN_SHIFT;
+ u32 dma_offset = dma_id * DMA_CORE_OFFSET;
+ u32 irq_handler_offset;
+
+ /* Set to maximum possible according to physical size */
+ WREG32(mmDMA0_CORE_RD_MAX_OUTSTAND + dma_offset, 0);
+ WREG32(mmDMA0_CORE_RD_MAX_SIZE + dma_offset, 0);
+
+ /* WA for H/W bug H3-2116 */
+ WREG32(mmDMA0_CORE_LBW_MAX_OUTSTAND + dma_offset, 15);
+
+ /* STOP_ON bit implies no completion to operation in case of RAZWI */
+ if (hdev->stop_on_err)
+ dma_err_cfg |= 1 << DMA0_CORE_ERR_CFG_STOP_ON_ERR_SHIFT;
+
+ WREG32(mmDMA0_CORE_ERR_CFG + dma_offset, dma_err_cfg);
+
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_dma_core_irq_ctrl);
+
+ WREG32(mmDMA0_CORE_ERRMSG_ADDR_LO + dma_offset,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(mmDMA0_CORE_ERRMSG_ADDR_HI + dma_offset,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(mmDMA0_CORE_ERRMSG_WDATA + dma_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_DMA0_CORE].cpu_id + dma_id);
+ WREG32(mmDMA0_CORE_PROT + dma_offset,
+ 1 << DMA0_CORE_PROT_ERR_VAL_SHIFT);
+ /* If the channel is secured, it should be in MMU bypass mode */
+ WREG32(mmDMA0_CORE_SECURE_PROPS + dma_offset,
+ 1 << DMA0_CORE_SECURE_PROPS_MMBP_SHIFT);
+ WREG32(mmDMA0_CORE_CFG_0 + dma_offset, 1 << DMA0_CORE_CFG_0_EN_SHIFT);
+}
+
+static void gaudi_enable_qman(struct hl_device *hdev, int dma_id,
+ u32 enable_mask)
+{
+ u32 dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+
+ WREG32(mmDMA0_QM_GLBL_CFG0 + dma_qm_offset, enable_mask);
+}
+
+static void gaudi_init_pci_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct hl_hw_queue *q;
+ int i, j, dma_id, cpu_skip, nic_skip, cq_id = 0, q_idx, msi_vec = 0;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_PCI_DMA)
+ return;
+
+ for (i = 0 ; i < PCI_DMA_NUMBER_OF_CHNLS ; i++) {
+ dma_id = gaudi_dma_assignment[i];
+ /*
+ * For queues after the CPU Q need to add 1 to get the correct
+ * queue. In addition, need to add the CPU EQ and NIC IRQs in
+ * order to get the correct MSI register.
+ */
+ if (dma_id > 1) {
+ cpu_skip = 1;
+ nic_skip = NIC_NUMBER_OF_ENGINES;
+ } else {
+ cpu_skip = 0;
+ nic_skip = 0;
+ }
+
+ for (j = 0 ; j < QMAN_STREAMS ; j++) {
+ q_idx = 4 * dma_id + j + cpu_skip;
+ q = &hdev->kernel_queues[q_idx];
+ q->cq_id = cq_id++;
+ q->msi_vec = nic_skip + cpu_skip + msi_vec++;
+ gaudi_init_pci_dma_qman(hdev, dma_id, j,
+ q->bus_address);
+ }
+
+ gaudi_init_dma_core(hdev, dma_id);
+
+ gaudi_enable_qman(hdev, dma_id, PCI_DMA_QMAN_ENABLE);
+ }
+
+ gaudi->hw_cap_initialized |= HW_CAP_PCI_DMA;
+}
+
+static void gaudi_init_hbm_dma_qman(struct hl_device *hdev, int dma_id,
+ int qman_id, u64 qman_base_addr)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
+ u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
+ u32 dma_qm_err_cfg, irq_handler_offset;
+ u32 q_off, dma_qm_offset;
+
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+
+ mtr_base_en_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_en_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ mtr_base_ws_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_ws_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ q_off = dma_qm_offset + qman_id * 4;
+
+ if (qman_id < 4) {
+ WREG32(mmDMA0_QM_PQ_BASE_LO_0 + q_off,
+ lower_32_bits(qman_base_addr));
+ WREG32(mmDMA0_QM_PQ_BASE_HI_0 + q_off,
+ upper_32_bits(qman_base_addr));
+
+ WREG32(mmDMA0_QM_PQ_SIZE_0 + q_off, ilog2(HBM_DMA_QMAN_LENGTH));
+ WREG32(mmDMA0_QM_PQ_PI_0 + q_off, 0);
+ WREG32(mmDMA0_QM_PQ_CI_0 + q_off, 0);
+
+ WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_CPDMA_SIZE_OFFSET);
+ WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_CPDMA_SRC_OFFSET);
+ WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_CPDMA_DST_OFFSET);
+ } else {
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
+
+ WREG32(mmDMA0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_LDMA_SIZE_OFFSET);
+ WREG32(mmDMA0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_SRC_OFFSET);
+ WREG32(mmDMA0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_DST_OFFSET);
+
+ /* Configure RAZWI IRQ */
+ dma_qm_err_cfg = HBM_DMA_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+ if (hdev->stop_on_err)
+ dma_qm_err_cfg |=
+ HBM_DMA_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+
+ WREG32(mmDMA0_QM_GLBL_ERR_CFG + dma_qm_offset, dma_qm_err_cfg);
+
+ WREG32(mmDMA0_QM_GLBL_ERR_ADDR_LO + dma_qm_offset,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(mmDMA0_QM_GLBL_ERR_ADDR_HI + dma_qm_offset,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(mmDMA0_QM_GLBL_ERR_WDATA + dma_qm_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_DMA0_QM].cpu_id +
+ dma_id);
+
+ WREG32(mmDMA0_QM_ARB_ERR_MSG_EN + dma_qm_offset,
+ QM_ARB_ERR_MSG_EN_MASK);
+
+ /* Set timeout to maximum */
+ WREG32(mmDMA0_QM_ARB_SLV_CHOISE_WDT + dma_qm_offset, GAUDI_ARB_WDT_TIMEOUT);
+
+ WREG32(mmDMA0_QM_GLBL_CFG1 + dma_qm_offset, 0);
+ WREG32(mmDMA0_QM_GLBL_PROT + dma_qm_offset,
+ QMAN_INTERNAL_MAKE_TRUSTED);
+ }
+
+ WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
+ WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
+
+ /* Configure DMA5 CP_MSG_BASE 2/3 for sync stream collective */
+ if (gaudi_dma_assignment[dma_id] == GAUDI_ENGINE_ID_DMA_5) {
+ WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off,
+ mtr_base_ws_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off,
+ mtr_base_ws_hi);
+ WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off,
+ so_base_ws_lo);
+ WREG32(mmDMA0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off,
+ so_base_ws_hi);
+ }
+}
+
+static void gaudi_init_hbm_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+ u64 qman_base_addr;
+ int i, j, dma_id, internal_q_index;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_HBM_DMA)
+ return;
+
+ for (i = 0 ; i < HBM_DMA_NUMBER_OF_CHNLS ; i++) {
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_1 + i];
+
+ for (j = 0 ; j < QMAN_STREAMS ; j++) {
+ /*
+ * Add the CPU queue in order to get the correct queue
+ * number as all internal queue are placed after it
+ */
+ internal_q_index = dma_id * QMAN_STREAMS + j + 1;
+
+ q = &gaudi->internal_qmans[internal_q_index];
+ qman_base_addr = (u64) q->pq_dma_addr;
+ gaudi_init_hbm_dma_qman(hdev, dma_id, j,
+ qman_base_addr);
+ }
+
+ /* Initializing lower CP for HBM DMA QMAN */
+ gaudi_init_hbm_dma_qman(hdev, dma_id, 4, 0);
+
+ gaudi_init_dma_core(hdev, dma_id);
+
+ gaudi_enable_qman(hdev, dma_id, HBM_DMA_QMAN_ENABLE);
+ }
+
+ gaudi->hw_cap_initialized |= HW_CAP_HBM_DMA;
+}
+
+static void gaudi_init_mme_qman(struct hl_device *hdev, u32 mme_offset,
+ int qman_id, u64 qman_base_addr)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 mtr_base_lo, mtr_base_hi;
+ u32 so_base_lo, so_base_hi;
+ u32 irq_handler_offset;
+ u32 q_off, mme_id;
+ u32 mme_qm_err_cfg;
+
+ mtr_base_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ q_off = mme_offset + qman_id * 4;
+
+ if (qman_id < 4) {
+ WREG32(mmMME0_QM_PQ_BASE_LO_0 + q_off,
+ lower_32_bits(qman_base_addr));
+ WREG32(mmMME0_QM_PQ_BASE_HI_0 + q_off,
+ upper_32_bits(qman_base_addr));
+
+ WREG32(mmMME0_QM_PQ_SIZE_0 + q_off, ilog2(MME_QMAN_LENGTH));
+ WREG32(mmMME0_QM_PQ_PI_0 + q_off, 0);
+ WREG32(mmMME0_QM_PQ_CI_0 + q_off, 0);
+
+ WREG32(mmMME0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_CPDMA_SIZE_OFFSET);
+ WREG32(mmMME0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_CPDMA_SRC_OFFSET);
+ WREG32(mmMME0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_CPDMA_DST_OFFSET);
+ } else {
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_mme_qm_irq_ctrl);
+
+ WREG32(mmMME0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_LDMA_SIZE_OFFSET);
+ WREG32(mmMME0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_SRC_OFFSET);
+ WREG32(mmMME0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_DST_OFFSET);
+
+ /* Configure RAZWI IRQ */
+ mme_id = mme_offset /
+ (mmMME1_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0) / 2;
+
+ mme_qm_err_cfg = MME_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+ if (hdev->stop_on_err)
+ mme_qm_err_cfg |=
+ MME_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+
+ WREG32(mmMME0_QM_GLBL_ERR_CFG + mme_offset, mme_qm_err_cfg);
+
+ WREG32(mmMME0_QM_GLBL_ERR_ADDR_LO + mme_offset,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(mmMME0_QM_GLBL_ERR_ADDR_HI + mme_offset,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(mmMME0_QM_GLBL_ERR_WDATA + mme_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_MME0_QM].cpu_id +
+ mme_id);
+
+ WREG32(mmMME0_QM_ARB_ERR_MSG_EN + mme_offset,
+ QM_ARB_ERR_MSG_EN_MASK);
+
+ /* Set timeout to maximum */
+ WREG32(mmMME0_QM_ARB_SLV_CHOISE_WDT + mme_offset, GAUDI_ARB_WDT_TIMEOUT);
+
+ WREG32(mmMME0_QM_GLBL_CFG1 + mme_offset, 0);
+ WREG32(mmMME0_QM_GLBL_PROT + mme_offset,
+ QMAN_INTERNAL_MAKE_TRUSTED);
+ }
+
+ WREG32(mmMME0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_lo);
+ WREG32(mmMME0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_hi);
+ WREG32(mmMME0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_lo);
+ WREG32(mmMME0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_hi);
+}
+
+static void gaudi_init_mme_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+ u64 qman_base_addr;
+ u32 mme_offset;
+ int i, internal_q_index;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_MME)
+ return;
+
+ /*
+ * map GAUDI_QUEUE_ID_MME_0_X to the N_W_MME (mmMME2_QM_BASE)
+ * and GAUDI_QUEUE_ID_MME_1_X to the S_W_MME (mmMME0_QM_BASE)
+ */
+
+ mme_offset = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0;
+
+ for (i = 0 ; i < MME_NUMBER_OF_QMANS ; i++) {
+ internal_q_index = GAUDI_QUEUE_ID_MME_0_0 + i;
+ q = &gaudi->internal_qmans[internal_q_index];
+ qman_base_addr = (u64) q->pq_dma_addr;
+ gaudi_init_mme_qman(hdev, mme_offset, (i & 0x3),
+ qman_base_addr);
+ if (i == 3)
+ mme_offset = 0;
+ }
+
+ /* Initializing lower CP for MME QMANs */
+ mme_offset = mmMME2_QM_GLBL_CFG0 - mmMME0_QM_GLBL_CFG0;
+ gaudi_init_mme_qman(hdev, mme_offset, 4, 0);
+ gaudi_init_mme_qman(hdev, 0, 4, 0);
+
+ WREG32(mmMME2_QM_GLBL_CFG0, QMAN_MME_ENABLE);
+ WREG32(mmMME0_QM_GLBL_CFG0, QMAN_MME_ENABLE);
+
+ gaudi->hw_cap_initialized |= HW_CAP_MME;
+}
+
+static void gaudi_init_tpc_qman(struct hl_device *hdev, u32 tpc_offset,
+ int qman_id, u64 qman_base_addr)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
+ u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
+ u32 tpc_qm_err_cfg, irq_handler_offset;
+ u32 q_off, tpc_id;
+
+ mtr_base_en_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_en_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ mtr_base_ws_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_ws_lo = lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ q_off = tpc_offset + qman_id * 4;
+
+ tpc_id = tpc_offset /
+ (mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0);
+
+ if (qman_id < 4) {
+ WREG32(mmTPC0_QM_PQ_BASE_LO_0 + q_off,
+ lower_32_bits(qman_base_addr));
+ WREG32(mmTPC0_QM_PQ_BASE_HI_0 + q_off,
+ upper_32_bits(qman_base_addr));
+
+ WREG32(mmTPC0_QM_PQ_SIZE_0 + q_off, ilog2(TPC_QMAN_LENGTH));
+ WREG32(mmTPC0_QM_PQ_PI_0 + q_off, 0);
+ WREG32(mmTPC0_QM_PQ_CI_0 + q_off, 0);
+
+ WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_CPDMA_SIZE_OFFSET);
+ WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_CPDMA_SRC_OFFSET);
+ WREG32(mmTPC0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_CPDMA_DST_OFFSET);
+ } else {
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_tpc_qm_irq_ctrl);
+
+ WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_LDMA_SIZE_OFFSET);
+ WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_SRC_OFFSET);
+ WREG32(mmTPC0_QM_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_DST_OFFSET);
+
+ /* Configure RAZWI IRQ */
+ tpc_qm_err_cfg = TPC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+ if (hdev->stop_on_err)
+ tpc_qm_err_cfg |=
+ TPC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+
+ WREG32(mmTPC0_QM_GLBL_ERR_CFG + tpc_offset, tpc_qm_err_cfg);
+
+ WREG32(mmTPC0_QM_GLBL_ERR_ADDR_LO + tpc_offset,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(mmTPC0_QM_GLBL_ERR_ADDR_HI + tpc_offset,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(mmTPC0_QM_GLBL_ERR_WDATA + tpc_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_TPC0_QM].cpu_id +
+ tpc_id);
+
+ WREG32(mmTPC0_QM_ARB_ERR_MSG_EN + tpc_offset,
+ QM_ARB_ERR_MSG_EN_MASK);
+
+ /* Set timeout to maximum */
+ WREG32(mmTPC0_QM_ARB_SLV_CHOISE_WDT + tpc_offset, GAUDI_ARB_WDT_TIMEOUT);
+
+ WREG32(mmTPC0_QM_GLBL_CFG1 + tpc_offset, 0);
+ WREG32(mmTPC0_QM_GLBL_PROT + tpc_offset,
+ QMAN_INTERNAL_MAKE_TRUSTED);
+ }
+
+ WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
+ WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
+ WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
+ WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
+
+ /* Configure TPC7 CP_MSG_BASE 2/3 for sync stream collective */
+ if (tpc_id == 6) {
+ WREG32(mmTPC0_QM_CP_MSG_BASE2_ADDR_LO_0 + q_off,
+ mtr_base_ws_lo);
+ WREG32(mmTPC0_QM_CP_MSG_BASE2_ADDR_HI_0 + q_off,
+ mtr_base_ws_hi);
+ WREG32(mmTPC0_QM_CP_MSG_BASE3_ADDR_LO_0 + q_off,
+ so_base_ws_lo);
+ WREG32(mmTPC0_QM_CP_MSG_BASE3_ADDR_HI_0 + q_off,
+ so_base_ws_hi);
+ }
+}
+
+static void gaudi_init_tpc_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+ u64 qman_base_addr;
+ u32 so_base_hi, tpc_offset = 0;
+ u32 tpc_delta = mmTPC1_CFG_SM_BASE_ADDRESS_HIGH -
+ mmTPC0_CFG_SM_BASE_ADDRESS_HIGH;
+ int i, tpc_id, internal_q_index;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_TPC_MASK)
+ return;
+
+ so_base_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
+ for (i = 0 ; i < QMAN_STREAMS ; i++) {
+ internal_q_index = GAUDI_QUEUE_ID_TPC_0_0 +
+ tpc_id * QMAN_STREAMS + i;
+ q = &gaudi->internal_qmans[internal_q_index];
+ qman_base_addr = (u64) q->pq_dma_addr;
+ gaudi_init_tpc_qman(hdev, tpc_offset, i,
+ qman_base_addr);
+
+ if (i == 3) {
+ /* Initializing lower CP for TPC QMAN */
+ gaudi_init_tpc_qman(hdev, tpc_offset, 4, 0);
+
+ /* Enable the QMAN and TPC channel */
+ WREG32(mmTPC0_QM_GLBL_CFG0 + tpc_offset,
+ QMAN_TPC_ENABLE);
+ }
+ }
+
+ WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_HIGH + tpc_id * tpc_delta,
+ so_base_hi);
+
+ tpc_offset += mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0;
+
+ gaudi->hw_cap_initialized |=
+ FIELD_PREP(HW_CAP_TPC_MASK, 1 << tpc_id);
+ }
+}
+
+static void gaudi_init_nic_qman(struct hl_device *hdev, u32 nic_offset,
+ int qman_id, u64 qman_base_addr, int nic_id)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 mtr_base_en_lo, mtr_base_en_hi, mtr_base_ws_lo, mtr_base_ws_hi;
+ u32 so_base_en_lo, so_base_en_hi, so_base_ws_lo, so_base_ws_hi;
+ u32 nic_qm_err_cfg, irq_handler_offset;
+ u32 q_off;
+
+ mtr_base_en_lo = lower_32_bits((CFG_BASE & U32_MAX) +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_en_lo = lower_32_bits((CFG_BASE & U32_MAX) +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_en_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ mtr_base_ws_lo = lower_32_bits((CFG_BASE & U32_MAX) +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_ws_lo = lower_32_bits((CFG_BASE & U32_MAX) +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_ws_hi = upper_32_bits(CFG_BASE +
+ mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ q_off = nic_offset + qman_id * 4;
+
+ WREG32(mmNIC0_QM0_PQ_BASE_LO_0 + q_off, lower_32_bits(qman_base_addr));
+ WREG32(mmNIC0_QM0_PQ_BASE_HI_0 + q_off, upper_32_bits(qman_base_addr));
+
+ WREG32(mmNIC0_QM0_PQ_SIZE_0 + q_off, ilog2(NIC_QMAN_LENGTH));
+ WREG32(mmNIC0_QM0_PQ_PI_0 + q_off, 0);
+ WREG32(mmNIC0_QM0_PQ_CI_0 + q_off, 0);
+
+ WREG32(mmNIC0_QM0_CP_LDMA_TSIZE_OFFSET_0 + q_off,
+ QMAN_LDMA_SIZE_OFFSET);
+ WREG32(mmNIC0_QM0_CP_LDMA_SRC_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_SRC_OFFSET);
+ WREG32(mmNIC0_QM0_CP_LDMA_DST_BASE_LO_OFFSET_0 + q_off,
+ QMAN_LDMA_DST_OFFSET);
+
+ WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_LO_0 + q_off, mtr_base_en_lo);
+ WREG32(mmNIC0_QM0_CP_MSG_BASE0_ADDR_HI_0 + q_off, mtr_base_en_hi);
+ WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_LO_0 + q_off, so_base_en_lo);
+ WREG32(mmNIC0_QM0_CP_MSG_BASE1_ADDR_HI_0 + q_off, so_base_en_hi);
+
+ /* Configure NIC CP_MSG_BASE 2/3 for sync stream collective */
+ WREG32(mmNIC0_QM0_CP_MSG_BASE2_ADDR_LO_0 + q_off, mtr_base_ws_lo);
+ WREG32(mmNIC0_QM0_CP_MSG_BASE2_ADDR_HI_0 + q_off, mtr_base_ws_hi);
+ WREG32(mmNIC0_QM0_CP_MSG_BASE3_ADDR_LO_0 + q_off, so_base_ws_lo);
+ WREG32(mmNIC0_QM0_CP_MSG_BASE3_ADDR_HI_0 + q_off, so_base_ws_hi);
+
+ if (qman_id == 0) {
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_nic_qm_irq_ctrl);
+
+ /* Configure RAZWI IRQ */
+ nic_qm_err_cfg = NIC_QMAN_GLBL_ERR_CFG_MSG_EN_MASK;
+ if (hdev->stop_on_err)
+ nic_qm_err_cfg |=
+ NIC_QMAN_GLBL_ERR_CFG_STOP_ON_ERR_EN_MASK;
+
+ WREG32(mmNIC0_QM0_GLBL_ERR_CFG + nic_offset, nic_qm_err_cfg);
+
+ WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_LO + nic_offset,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(mmNIC0_QM0_GLBL_ERR_ADDR_HI + nic_offset,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(mmNIC0_QM0_GLBL_ERR_WDATA + nic_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_NIC0_QM0].cpu_id +
+ nic_id);
+
+ WREG32(mmNIC0_QM0_ARB_ERR_MSG_EN + nic_offset,
+ QM_ARB_ERR_MSG_EN_MASK);
+
+ /* Set timeout to maximum */
+ WREG32(mmNIC0_QM0_ARB_SLV_CHOISE_WDT + nic_offset, GAUDI_ARB_WDT_TIMEOUT);
+
+ WREG32(mmNIC0_QM0_GLBL_CFG1 + nic_offset, 0);
+ WREG32(mmNIC0_QM0_GLBL_PROT + nic_offset,
+ QMAN_INTERNAL_MAKE_TRUSTED);
+ }
+}
+
+static void gaudi_init_nic_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+ u64 qman_base_addr;
+ u32 nic_offset = 0;
+ u32 nic_delta_between_qmans =
+ mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+ u32 nic_delta_between_nics =
+ mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+ int i, nic_id, internal_q_index;
+
+ if (!hdev->nic_ports_mask)
+ return;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC_MASK)
+ return;
+
+ dev_dbg(hdev->dev, "Initializing NIC QMANs\n");
+
+ for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
+ if (!(hdev->nic_ports_mask & (1 << nic_id))) {
+ nic_offset += nic_delta_between_qmans;
+ if (nic_id & 1) {
+ nic_offset -= (nic_delta_between_qmans * 2);
+ nic_offset += nic_delta_between_nics;
+ }
+ continue;
+ }
+
+ for (i = 0 ; i < QMAN_STREAMS ; i++) {
+ internal_q_index = GAUDI_QUEUE_ID_NIC_0_0 +
+ nic_id * QMAN_STREAMS + i;
+ q = &gaudi->internal_qmans[internal_q_index];
+ qman_base_addr = (u64) q->pq_dma_addr;
+ gaudi_init_nic_qman(hdev, nic_offset, (i & 0x3),
+ qman_base_addr, nic_id);
+ }
+
+ /* Enable the QMAN */
+ WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, NIC_QMAN_ENABLE);
+
+ nic_offset += nic_delta_between_qmans;
+ if (nic_id & 1) {
+ nic_offset -= (nic_delta_between_qmans * 2);
+ nic_offset += nic_delta_between_nics;
+ }
+
+ gaudi->hw_cap_initialized |= 1 << (HW_CAP_NIC_SHIFT + nic_id);
+ }
+}
+
+static void gaudi_disable_pci_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
+ return;
+
+ WREG32(mmDMA0_QM_GLBL_CFG0, 0);
+ WREG32(mmDMA1_QM_GLBL_CFG0, 0);
+ WREG32(mmDMA5_QM_GLBL_CFG0, 0);
+}
+
+static void gaudi_disable_hbm_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
+ return;
+
+ WREG32(mmDMA2_QM_GLBL_CFG0, 0);
+ WREG32(mmDMA3_QM_GLBL_CFG0, 0);
+ WREG32(mmDMA4_QM_GLBL_CFG0, 0);
+ WREG32(mmDMA6_QM_GLBL_CFG0, 0);
+ WREG32(mmDMA7_QM_GLBL_CFG0, 0);
+}
+
+static void gaudi_disable_mme_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
+ return;
+
+ WREG32(mmMME2_QM_GLBL_CFG0, 0);
+ WREG32(mmMME0_QM_GLBL_CFG0, 0);
+}
+
+static void gaudi_disable_tpc_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u32 tpc_offset = 0;
+ int tpc_id;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
+ return;
+
+ for (tpc_id = 0 ; tpc_id < TPC_NUMBER_OF_ENGINES ; tpc_id++) {
+ WREG32(mmTPC0_QM_GLBL_CFG0 + tpc_offset, 0);
+ tpc_offset += mmTPC1_QM_GLBL_CFG0 - mmTPC0_QM_GLBL_CFG0;
+ }
+}
+
+static void gaudi_disable_nic_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u32 nic_mask, nic_offset = 0;
+ u32 nic_delta_between_qmans =
+ mmNIC0_QM1_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+ u32 nic_delta_between_nics =
+ mmNIC1_QM0_GLBL_CFG0 - mmNIC0_QM0_GLBL_CFG0;
+ int nic_id;
+
+ for (nic_id = 0 ; nic_id < NIC_NUMBER_OF_ENGINES ; nic_id++) {
+ nic_mask = 1 << (HW_CAP_NIC_SHIFT + nic_id);
+
+ if (gaudi->hw_cap_initialized & nic_mask)
+ WREG32(mmNIC0_QM0_GLBL_CFG0 + nic_offset, 0);
+
+ nic_offset += nic_delta_between_qmans;
+ if (nic_id & 1) {
+ nic_offset -= (nic_delta_between_qmans * 2);
+ nic_offset += nic_delta_between_nics;
+ }
+ }
+}
+
+static void gaudi_stop_pci_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
+ return;
+
+ /* Stop upper CPs of QMANs 0.0 to 1.3 and 5.0 to 5.3 */
+ WREG32(mmDMA0_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmDMA1_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmDMA5_QM_GLBL_CFG1, 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+}
+
+static void gaudi_stop_hbm_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
+ return;
+
+ /* Stop CPs of HBM DMA QMANs */
+
+ WREG32(mmDMA2_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmDMA3_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmDMA4_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmDMA6_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmDMA7_QM_GLBL_CFG1, 0x1F << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+}
+
+static void gaudi_stop_mme_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
+ return;
+
+ /* Stop CPs of MME QMANs */
+ WREG32(mmMME2_QM_GLBL_CFG1, 0x1F << MME0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmMME0_QM_GLBL_CFG1, 0x1F << MME0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+}
+
+static void gaudi_stop_tpc_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
+ return;
+
+ WREG32(mmTPC0_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC1_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC2_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC3_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC4_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC5_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC6_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+ WREG32(mmTPC7_QM_GLBL_CFG1, 0x1F << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+}
+
+static void gaudi_stop_nic_qmans(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ /* Stop upper CPs of QMANs */
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC0)
+ WREG32(mmNIC0_QM0_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC1)
+ WREG32(mmNIC0_QM1_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC2)
+ WREG32(mmNIC1_QM0_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC3)
+ WREG32(mmNIC1_QM1_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC4)
+ WREG32(mmNIC2_QM0_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC5)
+ WREG32(mmNIC2_QM1_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC6)
+ WREG32(mmNIC3_QM0_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC7)
+ WREG32(mmNIC3_QM1_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC8)
+ WREG32(mmNIC4_QM0_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC9)
+ WREG32(mmNIC4_QM1_GLBL_CFG1,
+ NIC0_QM0_GLBL_CFG1_PQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CQF_STOP_MASK |
+ NIC0_QM0_GLBL_CFG1_CP_STOP_MASK);
+}
+
+static void gaudi_pci_dma_stall(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_PCI_DMA))
+ return;
+
+ WREG32(mmDMA0_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+ WREG32(mmDMA1_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+ WREG32(mmDMA5_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+}
+
+static void gaudi_hbm_dma_stall(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_HBM_DMA))
+ return;
+
+ WREG32(mmDMA2_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+ WREG32(mmDMA3_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+ WREG32(mmDMA4_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+ WREG32(mmDMA6_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+ WREG32(mmDMA7_CORE_CFG_1, 1 << DMA0_CORE_CFG_1_HALT_SHIFT);
+}
+
+static void gaudi_mme_stall(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MME))
+ return;
+
+ /* WA for H3-1800 bug: do ACC and SBAB writes twice */
+ WREG32(mmMME0_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME0_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME0_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME0_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME1_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME1_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME1_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME1_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME2_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME2_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME2_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME2_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME3_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME3_ACC_ACC_STALL, 1 << MME_ACC_ACC_STALL_R_SHIFT);
+ WREG32(mmMME3_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+ WREG32(mmMME3_SBAB_SB_STALL, 1 << MME_SBAB_SB_STALL_R_SHIFT);
+}
+
+static void gaudi_tpc_stall(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_TPC_MASK))
+ return;
+
+ WREG32(mmTPC0_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC1_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC2_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC3_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC4_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC5_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC6_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC7_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+}
+
+static void gaudi_disable_clock_gating(struct hl_device *hdev)
+{
+ u32 qman_offset;
+ int i;
+
+ if (hdev->asic_prop.fw_security_enabled)
+ return;
+
+ for (i = 0, qman_offset = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
+ WREG32(mmDMA0_QM_CGM_CFG + qman_offset, 0);
+ WREG32(mmDMA0_QM_CGM_CFG1 + qman_offset, 0);
+
+ qman_offset += (mmDMA1_QM_CGM_CFG - mmDMA0_QM_CGM_CFG);
+ }
+
+ WREG32(mmMME0_QM_CGM_CFG, 0);
+ WREG32(mmMME0_QM_CGM_CFG1, 0);
+ WREG32(mmMME2_QM_CGM_CFG, 0);
+ WREG32(mmMME2_QM_CGM_CFG1, 0);
+
+ for (i = 0, qman_offset = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
+ WREG32(mmTPC0_QM_CGM_CFG + qman_offset, 0);
+ WREG32(mmTPC0_QM_CGM_CFG1 + qman_offset, 0);
+
+ qman_offset += (mmTPC1_QM_CGM_CFG - mmTPC0_QM_CGM_CFG);
+ }
+}
+
+static void gaudi_enable_timestamp(struct hl_device *hdev)
+{
+ /* Disable the timestamp counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
+
+ /* Zero the lower/upper parts of the 64-bit counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
+
+ /* Enable the counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
+}
+
+static void gaudi_disable_timestamp(struct hl_device *hdev)
+{
+ /* Disable the timestamp counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
+}
+
+static void gaudi_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
+{
+ u32 wait_timeout_ms;
+
+ if (hdev->pldm)
+ wait_timeout_ms = GAUDI_PLDM_RESET_WAIT_MSEC;
+ else
+ wait_timeout_ms = GAUDI_RESET_WAIT_MSEC;
+
+ if (fw_reset)
+ goto skip_engines;
+
+ gaudi_stop_nic_qmans(hdev);
+ gaudi_stop_mme_qmans(hdev);
+ gaudi_stop_tpc_qmans(hdev);
+ gaudi_stop_hbm_dma_qmans(hdev);
+ gaudi_stop_pci_dma_qmans(hdev);
+
+ msleep(wait_timeout_ms);
+
+ gaudi_pci_dma_stall(hdev);
+ gaudi_hbm_dma_stall(hdev);
+ gaudi_tpc_stall(hdev);
+ gaudi_mme_stall(hdev);
+
+ msleep(wait_timeout_ms);
+
+ gaudi_disable_nic_qmans(hdev);
+ gaudi_disable_mme_qmans(hdev);
+ gaudi_disable_tpc_qmans(hdev);
+ gaudi_disable_hbm_dma_qmans(hdev);
+ gaudi_disable_pci_dma_qmans(hdev);
+
+ gaudi_disable_timestamp(hdev);
+
+skip_engines:
+ gaudi_disable_msi(hdev);
+}
+
+static int gaudi_mmu_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u64 hop0_addr;
+ int rc, i;
+
+ if (!hdev->mmu_enable)
+ return 0;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_MMU)
+ return 0;
+
+ for (i = 0 ; i < prop->max_asid ; i++) {
+ hop0_addr = prop->mmu_pgt_addr +
+ (i * prop->mmu_hop_table_size);
+
+ rc = gaudi_mmu_update_asid_hop0_addr(hdev, i, hop0_addr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to set hop0 addr for asid %d\n", i);
+ goto err;
+ }
+ }
+
+ /* init MMU cache manage page */
+ WREG32(mmSTLB_CACHE_INV_BASE_39_8, prop->mmu_cache_mng_addr >> 8);
+ WREG32(mmSTLB_CACHE_INV_BASE_49_40, prop->mmu_cache_mng_addr >> 40);
+
+ /* mem cache invalidation */
+ WREG32(mmSTLB_MEM_CACHE_INVALIDATION, 1);
+
+ hl_mmu_invalidate_cache(hdev, true, 0);
+
+ WREG32(mmMMU_UP_MMU_ENABLE, 1);
+ WREG32(mmMMU_UP_SPI_MASK, 0xF);
+
+ WREG32(mmSTLB_HOP_CONFIGURATION, 0x30440);
+
+ /*
+ * The H/W expects the first PI after init to be 1. After wraparound
+ * we'll write 0.
+ */
+ gaudi->mmu_cache_inv_pi = 1;
+
+ gaudi->hw_cap_initialized |= HW_CAP_MMU;
+
+ return 0;
+
+err:
+ return rc;
+}
+
+static int gaudi_load_firmware_to_device(struct hl_device *hdev)
+{
+ void __iomem *dst;
+
+ dst = hdev->pcie_bar[HBM_BAR_ID] + LINUX_FW_OFFSET;
+
+ return hl_fw_load_fw_to_device(hdev, GAUDI_LINUX_FW_FILE, dst, 0, 0);
+}
+
+static int gaudi_load_boot_fit_to_device(struct hl_device *hdev)
+{
+ void __iomem *dst;
+
+ dst = hdev->pcie_bar[SRAM_BAR_ID] + BOOT_FIT_SRAM_OFFSET;
+
+ return hl_fw_load_fw_to_device(hdev, GAUDI_BOOT_FIT_FILE, dst, 0, 0);
+}
+
+static void gaudi_init_dynamic_firmware_loader(struct hl_device *hdev)
+{
+ struct dynamic_fw_load_mgr *dynamic_loader;
+ struct cpu_dyn_regs *dyn_regs;
+
+ dynamic_loader = &hdev->fw_loader.dynamic_loader;
+
+ /*
+ * here we update initial values for few specific dynamic regs (as
+ * before reading the first descriptor from FW those value has to be
+ * hard-coded) in later stages of the protocol those values will be
+ * updated automatically by reading the FW descriptor so data there
+ * will always be up-to-date
+ */
+ dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
+ dyn_regs->kmd_msg_to_cpu =
+ cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
+ dyn_regs->cpu_cmd_status_to_host =
+ cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
+
+ dynamic_loader->wait_for_bl_timeout = GAUDI_WAIT_FOR_BL_TIMEOUT_USEC;
+}
+
+static void gaudi_init_static_firmware_loader(struct hl_device *hdev)
+{
+ struct static_fw_load_mgr *static_loader;
+
+ static_loader = &hdev->fw_loader.static_loader;
+
+ static_loader->preboot_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
+ static_loader->boot_fit_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
+ static_loader->kmd_msg_to_cpu_reg = mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU;
+ static_loader->cpu_cmd_status_to_host_reg = mmCPU_CMD_STATUS_TO_HOST;
+ static_loader->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
+ static_loader->cpu_boot_dev_status0_reg = mmCPU_BOOT_DEV_STS0;
+ static_loader->cpu_boot_dev_status1_reg = mmCPU_BOOT_DEV_STS1;
+ static_loader->boot_err0_reg = mmCPU_BOOT_ERR0;
+ static_loader->boot_err1_reg = mmCPU_BOOT_ERR1;
+ static_loader->preboot_version_offset_reg = mmPREBOOT_VER_OFFSET;
+ static_loader->boot_fit_version_offset_reg = mmUBOOT_VER_OFFSET;
+ static_loader->sram_offset_mask = ~(lower_32_bits(SRAM_BASE_ADDR));
+ static_loader->cpu_reset_wait_msec = hdev->pldm ?
+ GAUDI_PLDM_RESET_WAIT_MSEC :
+ GAUDI_CPU_RESET_WAIT_MSEC;
+}
+
+static void gaudi_init_firmware_preload_params(struct hl_device *hdev)
+{
+ struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
+
+ pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
+ pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
+ pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
+ pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
+ pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
+ pre_fw_load->wait_for_preboot_timeout = GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC;
+}
+
+static void gaudi_init_firmware_loader(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct fw_load_mgr *fw_loader = &hdev->fw_loader;
+
+ /* fill common fields */
+ fw_loader->fw_comp_loaded = FW_TYPE_NONE;
+ fw_loader->boot_fit_img.image_name = GAUDI_BOOT_FIT_FILE;
+ fw_loader->linux_img.image_name = GAUDI_LINUX_FW_FILE;
+ fw_loader->cpu_timeout = GAUDI_CPU_TIMEOUT_USEC;
+ fw_loader->boot_fit_timeout = GAUDI_BOOT_FIT_REQ_TIMEOUT_USEC;
+ fw_loader->skip_bmc = !hdev->bmc_enable;
+ fw_loader->sram_bar_id = SRAM_BAR_ID;
+ fw_loader->dram_bar_id = HBM_BAR_ID;
+
+ if (prop->dynamic_fw_load)
+ gaudi_init_dynamic_firmware_loader(hdev);
+ else
+ gaudi_init_static_firmware_loader(hdev);
+}
+
+static int gaudi_init_cpu(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int rc;
+
+ if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
+ return 0;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_CPU)
+ return 0;
+
+ /*
+ * The device CPU works with 40 bits addresses.
+ * This register sets the extension to 50 bits.
+ */
+ if (!hdev->asic_prop.fw_security_enabled)
+ WREG32(mmCPU_IF_CPU_MSB_ADDR, hdev->cpu_pci_msb_addr);
+
+ rc = hl_fw_init_cpu(hdev);
+
+ if (rc)
+ return rc;
+
+ gaudi->hw_cap_initialized |= HW_CAP_CPU;
+
+ return 0;
+}
+
+static int gaudi_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u32 status, irq_handler_offset;
+ struct hl_eq *eq;
+ struct hl_hw_queue *cpu_pq =
+ &hdev->kernel_queues[GAUDI_QUEUE_ID_CPU_PQ];
+ int err;
+
+ if (!hdev->cpu_queues_enable)
+ return 0;
+
+ if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q)
+ return 0;
+
+ eq = &hdev->event_queue;
+
+ WREG32(mmCPU_IF_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
+ WREG32(mmCPU_IF_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
+
+ WREG32(mmCPU_IF_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
+ WREG32(mmCPU_IF_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
+
+ WREG32(mmCPU_IF_CQ_BASE_ADDR_LOW,
+ lower_32_bits(hdev->cpu_accessible_dma_address));
+ WREG32(mmCPU_IF_CQ_BASE_ADDR_HIGH,
+ upper_32_bits(hdev->cpu_accessible_dma_address));
+
+ WREG32(mmCPU_IF_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
+ WREG32(mmCPU_IF_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
+ WREG32(mmCPU_IF_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
+
+ /* Used for EQ CI */
+ WREG32(mmCPU_IF_EQ_RD_OFFS, 0);
+
+ WREG32(mmCPU_IF_PF_PQ_PI, 0);
+
+ if (gaudi->multi_msi_mode)
+ WREG32(mmCPU_IF_QUEUE_INIT, PQ_INIT_STATUS_READY_FOR_CP);
+ else
+ WREG32(mmCPU_IF_QUEUE_INIT,
+ PQ_INIT_STATUS_READY_FOR_CP_SINGLE_MSI);
+
+ irq_handler_offset = prop->gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_host_pi_upd_irq);
+
+ WREG32(irq_handler_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_PI_UPDATE].cpu_id);
+
+ err = hl_poll_timeout(
+ hdev,
+ mmCPU_IF_QUEUE_INIT,
+ status,
+ (status == PQ_INIT_STATUS_READY_FOR_HOST),
+ 1000,
+ cpu_timeout);
+
+ if (err) {
+ dev_err(hdev->dev,
+ "Failed to communicate with Device CPU (CPU-CP timeout)\n");
+ return -EIO;
+ }
+
+ /* update FW application security bits */
+ if (prop->fw_cpu_boot_dev_sts0_valid)
+ prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
+ if (prop->fw_cpu_boot_dev_sts1_valid)
+ prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
+
+ gaudi->hw_cap_initialized |= HW_CAP_CPU_Q;
+ return 0;
+}
+
+static void gaudi_pre_hw_init(struct hl_device *hdev)
+{
+ /* Perform read from the device to make sure device is up */
+ RREG32(mmHW_STATE);
+
+ if (!hdev->asic_prop.fw_security_enabled) {
+ /* Set the access through PCI bars (Linux driver only) as
+ * secured
+ */
+ WREG32(mmPCIE_WRAP_LBW_PROT_OVR,
+ (PCIE_WRAP_LBW_PROT_OVR_RD_EN_MASK |
+ PCIE_WRAP_LBW_PROT_OVR_WR_EN_MASK));
+
+ /* Perform read to flush the waiting writes to ensure
+ * configuration was set in the device
+ */
+ RREG32(mmPCIE_WRAP_LBW_PROT_OVR);
+ }
+
+ /*
+ * Let's mark in the H/W that we have reached this point. We check
+ * this value in the reset_before_init function to understand whether
+ * we need to reset the chip before doing H/W init. This register is
+ * cleared by the H/W upon H/W reset
+ */
+ WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
+}
+
+static int gaudi_hw_init(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int rc;
+
+ gaudi_pre_hw_init(hdev);
+
+ /* If iATU is done by FW, the HBM bar ALWAYS points to DRAM_PHYS_BASE.
+ * So we set it here and if anyone tries to move it later to
+ * a different address, there will be an error
+ */
+ if (hdev->asic_prop.iatu_done_by_fw)
+ gaudi->hbm_bar_cur_addr = DRAM_PHYS_BASE;
+
+ /*
+ * Before pushing u-boot/linux to device, need to set the hbm bar to
+ * base address of dram
+ */
+ if (gaudi_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
+ dev_err(hdev->dev,
+ "failed to map HBM bar to DRAM base address\n");
+ return -EIO;
+ }
+
+ rc = gaudi_init_cpu(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "failed to initialize CPU\n");
+ return rc;
+ }
+
+ /* In case the clock gating was enabled in preboot we need to disable
+ * it here before touching the MME/TPC registers.
+ */
+ gaudi_disable_clock_gating(hdev);
+
+ /* SRAM scrambler must be initialized after CPU is running from HBM */
+ gaudi_init_scrambler_sram(hdev);
+
+ /* This is here just in case we are working without CPU */
+ gaudi_init_scrambler_hbm(hdev);
+
+ gaudi_init_golden_registers(hdev);
+
+ rc = gaudi_mmu_init(hdev);
+ if (rc)
+ return rc;
+
+ gaudi_init_security(hdev);
+
+ gaudi_init_pci_dma_qmans(hdev);
+
+ gaudi_init_hbm_dma_qmans(hdev);
+
+ gaudi_init_mme_qmans(hdev);
+
+ gaudi_init_tpc_qmans(hdev);
+
+ gaudi_init_nic_qmans(hdev);
+
+ gaudi_enable_timestamp(hdev);
+
+ /* MSI must be enabled before CPU queues and NIC are initialized */
+ rc = gaudi_enable_msi(hdev);
+ if (rc)
+ goto disable_queues;
+
+ /* must be called after MSI was enabled */
+ rc = gaudi_init_cpu_queues(hdev, GAUDI_CPU_TIMEOUT_USEC);
+ if (rc) {
+ dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n",
+ rc);
+ goto disable_msi;
+ }
+
+ /* Perform read from the device to flush all configuration */
+ RREG32(mmHW_STATE);
+
+ return 0;
+
+disable_msi:
+ gaudi_disable_msi(hdev);
+disable_queues:
+ gaudi_disable_mme_qmans(hdev);
+ gaudi_disable_pci_dma_qmans(hdev);
+
+ return rc;
+}
+
+static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 status, reset_timeout_ms, cpu_timeout_ms, irq_handler_offset;
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ bool driver_performs_reset;
+
+ if (!hard_reset) {
+ dev_err(hdev->dev, "GAUDI doesn't support soft-reset\n");
+ return;
+ }
+
+ if (hdev->pldm) {
+ reset_timeout_ms = GAUDI_PLDM_HRESET_TIMEOUT_MSEC;
+ cpu_timeout_ms = GAUDI_PLDM_RESET_WAIT_MSEC;
+ } else {
+ reset_timeout_ms = GAUDI_RESET_TIMEOUT_MSEC;
+ cpu_timeout_ms = GAUDI_CPU_RESET_WAIT_MSEC;
+ }
+
+ if (fw_reset) {
+ dev_dbg(hdev->dev,
+ "Firmware performs HARD reset, going to wait %dms\n",
+ reset_timeout_ms);
+
+ goto skip_reset;
+ }
+
+ driver_performs_reset = !!(!hdev->asic_prop.fw_security_enabled &&
+ !hdev->asic_prop.hard_reset_done_by_fw);
+
+ /* Set device to handle FLR by H/W as we will put the device CPU to
+ * halt mode
+ */
+ if (driver_performs_reset)
+ WREG32(mmPCIE_AUX_FLR_CTRL, (PCIE_AUX_FLR_CTRL_HW_CTRL_MASK |
+ PCIE_AUX_FLR_CTRL_INT_MASK_MASK));
+
+ /* If linux is loaded in the device CPU we need to communicate with it
+ * via the GIC. Otherwise, we need to use COMMS or the MSG_TO_CPU
+ * registers in case of old F/Ws
+ */
+ if (hdev->fw_loader.fw_comp_loaded & FW_TYPE_LINUX) {
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_host_halt_irq);
+
+ WREG32(irq_handler_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_HALT_MACHINE].cpu_id);
+
+ /* This is a hail-mary attempt to revive the card in the small chance that the
+ * f/w has experienced a watchdog event, which caused it to return back to preboot.
+ * In that case, triggering reset through GIC won't help. We need to trigger the
+ * reset as if Linux wasn't loaded.
+ *
+ * We do it only if the reset cause was HB, because that would be the indication
+ * of such an event.
+ *
+ * In case watchdog hasn't expired but we still got HB, then this won't do any
+ * damage.
+ */
+ if (hdev->reset_info.curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT) {
+ if (hdev->asic_prop.hard_reset_done_by_fw)
+ hl_fw_ask_hard_reset_without_linux(hdev);
+ else
+ hl_fw_ask_halt_machine_without_linux(hdev);
+ }
+ } else {
+ if (hdev->asic_prop.hard_reset_done_by_fw)
+ hl_fw_ask_hard_reset_without_linux(hdev);
+ else
+ hl_fw_ask_halt_machine_without_linux(hdev);
+ }
+
+ if (driver_performs_reset) {
+
+ /* Configure the reset registers. Must be done as early as
+ * possible in case we fail during H/W initialization
+ */
+ WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST_CFG_H,
+ (CFG_RST_H_DMA_MASK |
+ CFG_RST_H_MME_MASK |
+ CFG_RST_H_SM_MASK |
+ CFG_RST_H_TPC_7_MASK));
+
+ WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST_CFG_L, CFG_RST_L_TPC_MASK);
+
+ WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG_H,
+ (CFG_RST_H_HBM_MASK |
+ CFG_RST_H_TPC_7_MASK |
+ CFG_RST_H_NIC_MASK |
+ CFG_RST_H_SM_MASK |
+ CFG_RST_H_DMA_MASK |
+ CFG_RST_H_MME_MASK |
+ CFG_RST_H_CPU_MASK |
+ CFG_RST_H_MMU_MASK));
+
+ WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG_L,
+ (CFG_RST_L_IF_MASK |
+ CFG_RST_L_PSOC_MASK |
+ CFG_RST_L_TPC_MASK));
+
+ msleep(cpu_timeout_ms);
+
+ /* Tell ASIC not to re-initialize PCIe */
+ WREG32(mmPREBOOT_PCIE_EN, LKD_HARD_RESET_MAGIC);
+
+ /* Restart BTL/BLR upon hard-reset */
+ WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START, 1);
+
+ WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST,
+ 1 << PSOC_GLOBAL_CONF_SW_ALL_RST_IND_SHIFT);
+
+ dev_dbg(hdev->dev,
+ "Issued HARD reset command, going to wait %dms\n",
+ reset_timeout_ms);
+ } else {
+ dev_dbg(hdev->dev,
+ "Firmware performs HARD reset, going to wait %dms\n",
+ reset_timeout_ms);
+ }
+
+skip_reset:
+ /*
+ * After hard reset, we can't poll the BTM_FSM register because the PSOC
+ * itself is in reset. Need to wait until the reset is deasserted
+ */
+ msleep(reset_timeout_ms);
+
+ status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM);
+ if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK)
+ dev_err(hdev->dev,
+ "Timeout while waiting for device to reset 0x%x\n",
+ status);
+
+ if (gaudi) {
+ gaudi->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q | HW_CAP_HBM |
+ HW_CAP_PCI_DMA | HW_CAP_MME | HW_CAP_TPC_MASK |
+ HW_CAP_HBM_DMA | HW_CAP_PLL | HW_CAP_NIC_MASK |
+ HW_CAP_MMU | HW_CAP_SRAM_SCRAMBLER |
+ HW_CAP_HBM_SCRAMBLER);
+
+ memset(gaudi->events_stat, 0, sizeof(gaudi->events_stat));
+
+ hdev->device_cpu_is_halted = false;
+ }
+}
+
+static int gaudi_suspend(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
+ if (rc)
+ dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
+
+ return rc;
+}
+
+static int gaudi_resume(struct hl_device *hdev)
+{
+ return gaudi_init_iatu(hdev);
+}
+
+static int gaudi_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+ int rc;
+
++ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++ VM_DONTCOPY | VM_NORESERVE);
+
+ rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
+ (dma_addr - HOST_PHYS_BASE), size);
+ if (rc)
+ dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
+
+ return rc;
+}
+
+static void gaudi_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 db_reg_offset, db_value, dma_qm_offset, q_off, irq_handler_offset;
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ bool invalid_queue = false;
+ int dma_id;
+
+ switch (hw_queue_id) {
+ case GAUDI_QUEUE_ID_DMA_0_0...GAUDI_QUEUE_ID_DMA_0_3:
+ dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_1];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + (hw_queue_id & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_1_0...GAUDI_QUEUE_ID_DMA_1_3:
+ dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_2];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + (hw_queue_id & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_2_0...GAUDI_QUEUE_ID_DMA_2_3:
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_1];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_3_0...GAUDI_QUEUE_ID_DMA_3_3:
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_2];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_4_0...GAUDI_QUEUE_ID_DMA_4_3:
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_3];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_5_0...GAUDI_QUEUE_ID_DMA_5_3:
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_4];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_6_0...GAUDI_QUEUE_ID_DMA_6_3:
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_5];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_DMA_7_0...GAUDI_QUEUE_ID_DMA_7_3:
+ dma_id = gaudi_dma_assignment[GAUDI_HBM_DMA_6];
+ dma_qm_offset = dma_id * DMA_QMAN_OFFSET;
+ q_off = dma_qm_offset + ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmDMA0_QM_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_CPU_PQ:
+ if (gaudi->hw_cap_initialized & HW_CAP_CPU_Q)
+ db_reg_offset = mmCPU_IF_PF_PQ_PI;
+ else
+ invalid_queue = true;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_0_0:
+ db_reg_offset = mmMME2_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_0_1:
+ db_reg_offset = mmMME2_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_0_2:
+ db_reg_offset = mmMME2_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_0_3:
+ db_reg_offset = mmMME2_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_1_0:
+ db_reg_offset = mmMME0_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_1_1:
+ db_reg_offset = mmMME0_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_1_2:
+ db_reg_offset = mmMME0_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_MME_1_3:
+ db_reg_offset = mmMME0_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_0_0:
+ db_reg_offset = mmTPC0_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_0_1:
+ db_reg_offset = mmTPC0_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_0_2:
+ db_reg_offset = mmTPC0_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_0_3:
+ db_reg_offset = mmTPC0_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_1_0:
+ db_reg_offset = mmTPC1_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_1_1:
+ db_reg_offset = mmTPC1_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_1_2:
+ db_reg_offset = mmTPC1_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_1_3:
+ db_reg_offset = mmTPC1_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_2_0:
+ db_reg_offset = mmTPC2_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_2_1:
+ db_reg_offset = mmTPC2_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_2_2:
+ db_reg_offset = mmTPC2_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_2_3:
+ db_reg_offset = mmTPC2_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_3_0:
+ db_reg_offset = mmTPC3_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_3_1:
+ db_reg_offset = mmTPC3_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_3_2:
+ db_reg_offset = mmTPC3_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_3_3:
+ db_reg_offset = mmTPC3_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_4_0:
+ db_reg_offset = mmTPC4_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_4_1:
+ db_reg_offset = mmTPC4_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_4_2:
+ db_reg_offset = mmTPC4_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_4_3:
+ db_reg_offset = mmTPC4_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_5_0:
+ db_reg_offset = mmTPC5_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_5_1:
+ db_reg_offset = mmTPC5_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_5_2:
+ db_reg_offset = mmTPC5_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_5_3:
+ db_reg_offset = mmTPC5_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_6_0:
+ db_reg_offset = mmTPC6_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_6_1:
+ db_reg_offset = mmTPC6_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_6_2:
+ db_reg_offset = mmTPC6_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_6_3:
+ db_reg_offset = mmTPC6_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_7_0:
+ db_reg_offset = mmTPC7_QM_PQ_PI_0;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_7_1:
+ db_reg_offset = mmTPC7_QM_PQ_PI_1;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_7_2:
+ db_reg_offset = mmTPC7_QM_PQ_PI_2;
+ break;
+
+ case GAUDI_QUEUE_ID_TPC_7_3:
+ db_reg_offset = mmTPC7_QM_PQ_PI_3;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_0_0...GAUDI_QUEUE_ID_NIC_0_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC0))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC0_QM0_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_1_0...GAUDI_QUEUE_ID_NIC_1_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC1))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC0_QM1_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_2_0...GAUDI_QUEUE_ID_NIC_2_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC2))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC1_QM0_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_3_0...GAUDI_QUEUE_ID_NIC_3_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC3))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC1_QM1_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_4_0...GAUDI_QUEUE_ID_NIC_4_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC4))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC2_QM0_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_5_0...GAUDI_QUEUE_ID_NIC_5_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC5))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC2_QM1_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_6_0...GAUDI_QUEUE_ID_NIC_6_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC6))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC3_QM0_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_7_0...GAUDI_QUEUE_ID_NIC_7_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC7))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC3_QM1_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_8_0...GAUDI_QUEUE_ID_NIC_8_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC8))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC4_QM0_PQ_PI_0 + q_off;
+ break;
+
+ case GAUDI_QUEUE_ID_NIC_9_0...GAUDI_QUEUE_ID_NIC_9_3:
+ if (!(gaudi->hw_cap_initialized & HW_CAP_NIC9))
+ invalid_queue = true;
+
+ q_off = ((hw_queue_id - 1) & 0x3) * 4;
+ db_reg_offset = mmNIC4_QM1_PQ_PI_0 + q_off;
+ break;
+
+ default:
+ invalid_queue = true;
+ }
+
+ if (invalid_queue) {
+ /* Should never get here */
+ dev_err(hdev->dev, "h/w queue %d is invalid. Can't set pi\n",
+ hw_queue_id);
+ return;
+ }
+
+ db_value = pi;
+
+ /* ring the doorbell */
+ WREG32(db_reg_offset, db_value);
+
+ if (hw_queue_id == GAUDI_QUEUE_ID_CPU_PQ) {
+ /* make sure device CPU will read latest data from host */
+ mb();
+
+ irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_host_pi_upd_irq);
+
+ WREG32(irq_handler_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_PI_UPDATE].cpu_id);
+ }
+}
+
+static void gaudi_pqe_write(struct hl_device *hdev, __le64 *pqe,
+ struct hl_bd *bd)
+{
+ __le64 *pbd = (__le64 *) bd;
+
+ /* The QMANs are on the host memory so a simple copy suffice */
+ pqe[0] = pbd[0];
+ pqe[1] = pbd[1];
+}
+
+static void *gaudi_dma_alloc_coherent(struct hl_device *hdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags)
+{
+ void *kernel_addr = dma_alloc_coherent(&hdev->pdev->dev, size,
+ dma_handle, flags);
+
+ /* Shift to the device's base physical address of host memory */
+ if (kernel_addr)
+ *dma_handle += HOST_PHYS_BASE;
+
+ return kernel_addr;
+}
+
+static void gaudi_dma_free_coherent(struct hl_device *hdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle)
+{
+ /* Cancel the device's base physical address of host memory */
+ dma_addr_t fixed_dma_handle = dma_handle - HOST_PHYS_BASE;
+
+ dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, fixed_dma_handle);
+}
+
+static int gaudi_scrub_device_dram(struct hl_device *hdev, u64 val)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 cur_addr = prop->dram_user_base_address;
+ u32 chunk_size, busy;
+ int rc, dma_id;
+
+ while (cur_addr < prop->dram_end_address) {
+ for (dma_id = 0 ; dma_id < DMA_NUMBER_OF_CHANNELS ; dma_id++) {
+ u32 dma_offset = dma_id * DMA_CORE_OFFSET;
+
+ chunk_size =
+ min((u64)SZ_2G, prop->dram_end_address - cur_addr);
+
+ dev_dbg(hdev->dev,
+ "Doing HBM scrubbing for 0x%09llx - 0x%09llx\n",
+ cur_addr, cur_addr + chunk_size);
+
+ WREG32(mmDMA0_CORE_SRC_BASE_LO + dma_offset,
+ lower_32_bits(val));
+ WREG32(mmDMA0_CORE_SRC_BASE_HI + dma_offset,
+ upper_32_bits(val));
+ WREG32(mmDMA0_CORE_DST_BASE_LO + dma_offset,
+ lower_32_bits(cur_addr));
+ WREG32(mmDMA0_CORE_DST_BASE_HI + dma_offset,
+ upper_32_bits(cur_addr));
+ WREG32(mmDMA0_CORE_DST_TSIZE_0 + dma_offset,
+ chunk_size);
+ WREG32(mmDMA0_CORE_COMMIT + dma_offset,
+ ((1 << DMA0_CORE_COMMIT_LIN_SHIFT) |
+ (1 << DMA0_CORE_COMMIT_MEM_SET_SHIFT)));
+
+ cur_addr += chunk_size;
+
+ if (cur_addr == prop->dram_end_address)
+ break;
+ }
+
+ for (dma_id = 0 ; dma_id < DMA_NUMBER_OF_CHANNELS ; dma_id++) {
+ u32 dma_offset = dma_id * DMA_CORE_OFFSET;
+
+ rc = hl_poll_timeout(
+ hdev,
+ mmDMA0_CORE_STS0 + dma_offset,
+ busy,
+ ((busy & DMA0_CORE_STS0_BUSY_MASK) == 0),
+ 1000,
+ HBM_SCRUBBING_TIMEOUT_US);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "DMA Timeout during HBM scrubbing of DMA #%d\n",
+ dma_id);
+ return -EIO;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static int gaudi_scrub_device_mem(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 wait_to_idle_time = hdev->pdev ? HBM_SCRUBBING_TIMEOUT_US :
+ min_t(u64, HBM_SCRUBBING_TIMEOUT_US * 10, HL_SIM_MAX_TIMEOUT_US);
+ u64 addr, size, val = hdev->memory_scrub_val;
+ ktime_t timeout;
+ int rc = 0;
+
+ if (!hdev->memory_scrub)
+ return 0;
+
+ timeout = ktime_add_us(ktime_get(), wait_to_idle_time);
+ while (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
+ if (ktime_compare(ktime_get(), timeout) > 0) {
+ dev_err(hdev->dev, "waiting for idle timeout\n");
+ return -ETIMEDOUT;
+ }
+ usleep_range((1000 >> 2) + 1, 1000);
+ }
+
+ /* Scrub SRAM */
+ addr = prop->sram_user_base_address;
+ size = hdev->pldm ? 0x10000 : prop->sram_size - SRAM_USER_BASE_OFFSET;
+
+ dev_dbg(hdev->dev, "Scrubbing SRAM: 0x%09llx - 0x%09llx val: 0x%llx\n",
+ addr, addr + size, val);
+ rc = gaudi_memset_device_memory(hdev, addr, size, val);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to clear SRAM (%d)\n", rc);
+ return rc;
+ }
+
+ /* Scrub HBM using all DMA channels in parallel */
+ rc = gaudi_scrub_device_dram(hdev, val);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to clear HBM (%d)\n", rc);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void *gaudi_get_int_queue_base(struct hl_device *hdev,
+ u32 queue_id, dma_addr_t *dma_handle,
+ u16 *queue_len)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct gaudi_internal_qman_info *q;
+
+ if (queue_id >= GAUDI_QUEUE_ID_SIZE ||
+ gaudi_queue_type[queue_id] != QUEUE_TYPE_INT) {
+ dev_err(hdev->dev, "Got invalid queue id %d\n", queue_id);
+ return NULL;
+ }
+
+ q = &gaudi->internal_qmans[queue_id];
+ *dma_handle = q->pq_dma_addr;
+ *queue_len = q->pq_size / QMAN_PQ_ENTRY_SIZE;
+
+ return q->pq_kernel_addr;
+}
+
+static int gaudi_send_cpu_message(struct hl_device *hdev, u32 *msg,
+ u16 len, u32 timeout, u64 *result)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q)) {
+ if (result)
+ *result = 0;
+ return 0;
+ }
+
+ if (!timeout)
+ timeout = GAUDI_MSG_TO_CPU_TIMEOUT_USEC;
+
+ return hl_fw_send_cpu_message(hdev, GAUDI_QUEUE_ID_CPU_PQ, msg, len,
+ timeout, result);
+}
+
+static int gaudi_test_queue(struct hl_device *hdev, u32 hw_queue_id)
+{
+ struct packet_msg_prot *fence_pkt;
+ dma_addr_t pkt_dma_addr;
+ u32 fence_val, tmp, timeout_usec;
+ dma_addr_t fence_dma_addr;
+ u32 *fence_ptr;
+ int rc;
+
+ if (hdev->pldm)
+ timeout_usec = GAUDI_PLDM_TEST_QUEUE_WAIT_USEC;
+ else
+ timeout_usec = GAUDI_TEST_QUEUE_WAIT_USEC;
+
+ fence_val = GAUDI_QMAN0_FENCE_VAL;
+
+ fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
+ if (!fence_ptr) {
+ dev_err(hdev->dev,
+ "Failed to allocate memory for H/W queue %d testing\n",
+ hw_queue_id);
+ return -ENOMEM;
+ }
+
+ *fence_ptr = 0;
+
+ fence_pkt = hl_asic_dma_pool_zalloc(hdev, sizeof(struct packet_msg_prot), GFP_KERNEL,
+ &pkt_dma_addr);
+ if (!fence_pkt) {
+ dev_err(hdev->dev,
+ "Failed to allocate packet for H/W queue %d testing\n",
+ hw_queue_id);
+ rc = -ENOMEM;
+ goto free_fence_ptr;
+ }
+
+ tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ fence_pkt->ctl = cpu_to_le32(tmp);
+ fence_pkt->value = cpu_to_le32(fence_val);
+ fence_pkt->addr = cpu_to_le64(fence_dma_addr);
+
+ rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
+ sizeof(struct packet_msg_prot),
+ pkt_dma_addr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to send fence packet to H/W queue %d\n",
+ hw_queue_id);
+ goto free_pkt;
+ }
+
+ rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp, (tmp == fence_val),
+ 1000, timeout_usec, true);
+
+ hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
+
+ if (rc == -ETIMEDOUT) {
+ dev_err(hdev->dev,
+ "H/W queue %d test failed (scratch(0x%08llX) == 0x%08X)\n",
+ hw_queue_id, (unsigned long long) fence_dma_addr, tmp);
+ rc = -EIO;
+ }
+
+free_pkt:
+ hl_asic_dma_pool_free(hdev, (void *) fence_pkt, pkt_dma_addr);
+free_fence_ptr:
+ hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
+ return rc;
+}
+
+static int gaudi_test_cpu_queue(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ /*
+ * check capability here as send_cpu_message() won't update the result
+ * value if no capability
+ */
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_test_cpu_queue(hdev);
+}
+
+static int gaudi_test_queues(struct hl_device *hdev)
+{
+ int i, rc, ret_val = 0;
+
+ for (i = 0 ; i < hdev->asic_prop.max_queues ; i++) {
+ if (hdev->asic_prop.hw_queues_props[i].type == QUEUE_TYPE_EXT) {
+ rc = gaudi_test_queue(hdev, i);
+ if (rc)
+ ret_val = -EINVAL;
+ }
+ }
+
+ rc = gaudi_test_cpu_queue(hdev);
+ if (rc)
+ ret_val = -EINVAL;
+
+ return ret_val;
+}
+
+static void *gaudi_dma_pool_zalloc(struct hl_device *hdev, size_t size,
+ gfp_t mem_flags, dma_addr_t *dma_handle)
+{
+ void *kernel_addr;
+
+ if (size > GAUDI_DMA_POOL_BLK_SIZE)
+ return NULL;
+
+ kernel_addr = dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
+
+ /* Shift to the device's base physical address of host memory */
+ if (kernel_addr)
+ *dma_handle += HOST_PHYS_BASE;
+
+ return kernel_addr;
+}
+
+static void gaudi_dma_pool_free(struct hl_device *hdev, void *vaddr,
+ dma_addr_t dma_addr)
+{
+ /* Cancel the device's base physical address of host memory */
+ dma_addr_t fixed_dma_addr = dma_addr - HOST_PHYS_BASE;
+
+ dma_pool_free(hdev->dma_pool, vaddr, fixed_dma_addr);
+}
+
+static void *gaudi_cpu_accessible_dma_pool_alloc(struct hl_device *hdev,
+ size_t size, dma_addr_t *dma_handle)
+{
+ return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
+}
+
+static void gaudi_cpu_accessible_dma_pool_free(struct hl_device *hdev,
+ size_t size, void *vaddr)
+{
+ hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
+}
+
+static u32 gaudi_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt)
+{
+ struct scatterlist *sg, *sg_next_iter;
+ u32 count, dma_desc_cnt;
+ u64 len, len_next;
+ dma_addr_t addr, addr_next;
+
+ dma_desc_cnt = 0;
+
+ for_each_sgtable_dma_sg(sgt, sg, count) {
+ len = sg_dma_len(sg);
+ addr = sg_dma_address(sg);
+
+ if (len == 0)
+ break;
+
+ while ((count + 1) < sgt->nents) {
+ sg_next_iter = sg_next(sg);
+ len_next = sg_dma_len(sg_next_iter);
+ addr_next = sg_dma_address(sg_next_iter);
+
+ if (len_next == 0)
+ break;
+
+ if ((addr + len == addr_next) &&
+ (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
+ len += len_next;
+ count++;
+ sg = sg_next_iter;
+ } else {
+ break;
+ }
+ }
+
+ dma_desc_cnt++;
+ }
+
+ return dma_desc_cnt * sizeof(struct packet_lin_dma);
+}
+
+static int gaudi_pin_memory_before_cs(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt,
+ u64 addr, enum dma_data_direction dir)
+{
+ struct hl_userptr *userptr;
+ int rc;
+
+ if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
+ parser->job_userptr_list, &userptr))
+ goto already_pinned;
+
+ userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
+ if (!userptr)
+ return -ENOMEM;
+
+ rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
+ userptr);
+ if (rc)
+ goto free_userptr;
+
+ list_add_tail(&userptr->job_node, parser->job_userptr_list);
+
+ rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, dir);
+ if (rc) {
+ dev_err(hdev->dev, "failed to map sgt with DMA region\n");
+ goto unpin_memory;
+ }
+
+ userptr->dma_mapped = true;
+ userptr->dir = dir;
+
+already_pinned:
+ parser->patched_cb_size +=
+ gaudi_get_dma_desc_list_size(hdev, userptr->sgt);
+
+ return 0;
+
+unpin_memory:
+ list_del(&userptr->job_node);
+ hl_unpin_host_memory(hdev, userptr);
+free_userptr:
+ kfree(userptr);
+ return rc;
+}
+
+static int gaudi_validate_dma_pkt_host(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt,
+ bool src_in_host)
+{
+ enum dma_data_direction dir;
+ bool skip_host_mem_pin = false, user_memset;
+ u64 addr;
+ int rc = 0;
+
+ user_memset = (le32_to_cpu(user_dma_pkt->ctl) &
+ GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
+ GAUDI_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
+
+ if (src_in_host) {
+ if (user_memset)
+ skip_host_mem_pin = true;
+
+ dev_dbg(hdev->dev, "DMA direction is HOST --> DEVICE\n");
+ dir = DMA_TO_DEVICE;
+ addr = le64_to_cpu(user_dma_pkt->src_addr);
+ } else {
+ dev_dbg(hdev->dev, "DMA direction is DEVICE --> HOST\n");
+ dir = DMA_FROM_DEVICE;
+ addr = (le64_to_cpu(user_dma_pkt->dst_addr) &
+ GAUDI_PKT_LIN_DMA_DST_ADDR_MASK) >>
+ GAUDI_PKT_LIN_DMA_DST_ADDR_SHIFT;
+ }
+
+ if (skip_host_mem_pin)
+ parser->patched_cb_size += sizeof(*user_dma_pkt);
+ else
+ rc = gaudi_pin_memory_before_cs(hdev, parser, user_dma_pkt,
+ addr, dir);
+
+ return rc;
+}
+
+static int gaudi_validate_dma_pkt_no_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt)
+{
+ bool src_in_host = false;
+ u64 dst_addr = (le64_to_cpu(user_dma_pkt->dst_addr) &
+ GAUDI_PKT_LIN_DMA_DST_ADDR_MASK) >>
+ GAUDI_PKT_LIN_DMA_DST_ADDR_SHIFT;
+
+ dev_dbg(hdev->dev, "DMA packet details:\n");
+ dev_dbg(hdev->dev, "source == 0x%llx\n",
+ le64_to_cpu(user_dma_pkt->src_addr));
+ dev_dbg(hdev->dev, "destination == 0x%llx\n", dst_addr);
+ dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
+
+ /*
+ * Special handling for DMA with size 0. Bypass all validations
+ * because no transactions will be done except for WR_COMP, which
+ * is not a security issue
+ */
+ if (!le32_to_cpu(user_dma_pkt->tsize)) {
+ parser->patched_cb_size += sizeof(*user_dma_pkt);
+ return 0;
+ }
+
+ if (parser->hw_queue_id <= GAUDI_QUEUE_ID_DMA_0_3)
+ src_in_host = true;
+
+ return gaudi_validate_dma_pkt_host(hdev, parser, user_dma_pkt,
+ src_in_host);
+}
+
+static int gaudi_validate_load_and_exe_pkt(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_load_and_exe *user_pkt)
+{
+ u32 cfg;
+
+ cfg = le32_to_cpu(user_pkt->cfg);
+
+ if (cfg & GAUDI_PKT_LOAD_AND_EXE_CFG_DST_MASK) {
+ dev_err(hdev->dev,
+ "User not allowed to use Load and Execute\n");
+ return -EPERM;
+ }
+
+ parser->patched_cb_size += sizeof(struct packet_load_and_exe);
+
+ return 0;
+}
+
+static int gaudi_validate_cb(struct hl_device *hdev,
+ struct hl_cs_parser *parser, bool is_mmu)
+{
+ u32 cb_parsed_length = 0;
+ int rc = 0;
+
+ parser->patched_cb_size = 0;
+
+ /* cb_user_size is more than 0 so loop will always be executed */
+ while (cb_parsed_length < parser->user_cb_size) {
+ enum packet_id pkt_id;
+ u16 pkt_size;
+ struct gaudi_packet *user_pkt;
+
+ user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
+
+ pkt_id = (enum packet_id) (
+ (le64_to_cpu(user_pkt->header) &
+ PACKET_HEADER_PACKET_ID_MASK) >>
+ PACKET_HEADER_PACKET_ID_SHIFT);
+
+ if (!validate_packet_id(pkt_id)) {
+ dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ pkt_size = gaudi_packet_sizes[pkt_id];
+ cb_parsed_length += pkt_size;
+ if (cb_parsed_length > parser->user_cb_size) {
+ dev_err(hdev->dev,
+ "packet 0x%x is out of CB boundary\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ switch (pkt_id) {
+ case PACKET_MSG_PROT:
+ dev_err(hdev->dev,
+ "User not allowed to use MSG_PROT\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_CP_DMA:
+ dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_STOP:
+ dev_err(hdev->dev, "User not allowed to use STOP\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_WREG_BULK:
+ dev_err(hdev->dev,
+ "User not allowed to use WREG_BULK\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_LOAD_AND_EXE:
+ rc = gaudi_validate_load_and_exe_pkt(hdev, parser,
+ (struct packet_load_and_exe *) user_pkt);
+ break;
+
+ case PACKET_LIN_DMA:
+ parser->contains_dma_pkt = true;
+ if (is_mmu)
+ parser->patched_cb_size += pkt_size;
+ else
+ rc = gaudi_validate_dma_pkt_no_mmu(hdev, parser,
+ (struct packet_lin_dma *) user_pkt);
+ break;
+
+ case PACKET_WREG_32:
+ case PACKET_MSG_LONG:
+ case PACKET_MSG_SHORT:
+ case PACKET_REPEAT:
+ case PACKET_FENCE:
+ case PACKET_NOP:
+ case PACKET_ARB_POINT:
+ parser->patched_cb_size += pkt_size;
+ break;
+
+ default:
+ dev_err(hdev->dev, "Invalid packet header 0x%x\n",
+ pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ if (rc)
+ break;
+ }
+
+ /*
+ * The new CB should have space at the end for two MSG_PROT packets:
+ * 1. Optional NOP padding for cacheline alignment
+ * 2. A packet that will act as a completion packet
+ * 3. A packet that will generate MSI interrupt
+ */
+ if (parser->completion)
+ parser->patched_cb_size += gaudi_get_patched_cb_extra_size(
+ parser->patched_cb_size);
+
+ return rc;
+}
+
+static int gaudi_patch_dma_packet(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt,
+ struct packet_lin_dma *new_dma_pkt,
+ u32 *new_dma_pkt_size)
+{
+ struct hl_userptr *userptr;
+ struct scatterlist *sg, *sg_next_iter;
+ u32 count, dma_desc_cnt, user_wrcomp_en_mask, ctl;
+ u64 len, len_next;
+ dma_addr_t dma_addr, dma_addr_next;
+ u64 device_memory_addr, addr;
+ enum dma_data_direction dir;
+ struct sg_table *sgt;
+ bool src_in_host = false;
+ bool skip_host_mem_pin = false;
+ bool user_memset;
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+
+ if (parser->hw_queue_id <= GAUDI_QUEUE_ID_DMA_0_3)
+ src_in_host = true;
+
+ user_memset = (ctl & GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
+ GAUDI_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
+
+ if (src_in_host) {
+ addr = le64_to_cpu(user_dma_pkt->src_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ dir = DMA_TO_DEVICE;
+ if (user_memset)
+ skip_host_mem_pin = true;
+ } else {
+ addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+ dir = DMA_FROM_DEVICE;
+ }
+
+ if ((!skip_host_mem_pin) &&
+ (!hl_userptr_is_pinned(hdev, addr,
+ le32_to_cpu(user_dma_pkt->tsize),
+ parser->job_userptr_list, &userptr))) {
+ dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
+ addr, user_dma_pkt->tsize);
+ return -EFAULT;
+ }
+
+ if ((user_memset) && (dir == DMA_TO_DEVICE)) {
+ memcpy(new_dma_pkt, user_dma_pkt, sizeof(*user_dma_pkt));
+ *new_dma_pkt_size = sizeof(*user_dma_pkt);
+ return 0;
+ }
+
+ user_wrcomp_en_mask = ctl & GAUDI_PKT_LIN_DMA_CTL_WRCOMP_EN_MASK;
+
+ sgt = userptr->sgt;
+ dma_desc_cnt = 0;
+
+ for_each_sgtable_dma_sg(sgt, sg, count) {
+ len = sg_dma_len(sg);
+ dma_addr = sg_dma_address(sg);
+
+ if (len == 0)
+ break;
+
+ while ((count + 1) < sgt->nents) {
+ sg_next_iter = sg_next(sg);
+ len_next = sg_dma_len(sg_next_iter);
+ dma_addr_next = sg_dma_address(sg_next_iter);
+
+ if (len_next == 0)
+ break;
+
+ if ((dma_addr + len == dma_addr_next) &&
+ (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
+ len += len_next;
+ count++;
+ sg = sg_next_iter;
+ } else {
+ break;
+ }
+ }
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+ if (likely(dma_desc_cnt))
+ ctl &= ~GAUDI_PKT_CTL_EB_MASK;
+ ctl &= ~GAUDI_PKT_LIN_DMA_CTL_WRCOMP_EN_MASK;
+ new_dma_pkt->ctl = cpu_to_le32(ctl);
+ new_dma_pkt->tsize = cpu_to_le32(len);
+
+ if (dir == DMA_TO_DEVICE) {
+ new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
+ new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
+ } else {
+ new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
+ new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
+ }
+
+ if (!user_memset)
+ device_memory_addr += len;
+ dma_desc_cnt++;
+ new_dma_pkt++;
+ }
+
+ if (!dma_desc_cnt) {
+ dev_err(hdev->dev,
+ "Error of 0 SG entries when patching DMA packet\n");
+ return -EFAULT;
+ }
+
+ /* Fix the last dma packet - wrcomp must be as user set it */
+ new_dma_pkt--;
+ new_dma_pkt->ctl |= cpu_to_le32(user_wrcomp_en_mask);
+
+ *new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
+
+ return 0;
+}
+
+static int gaudi_patch_cb(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ u32 cb_parsed_length = 0;
+ u32 cb_patched_cur_length = 0;
+ int rc = 0;
+
+ /* cb_user_size is more than 0 so loop will always be executed */
+ while (cb_parsed_length < parser->user_cb_size) {
+ enum packet_id pkt_id;
+ u16 pkt_size;
+ u32 new_pkt_size = 0;
+ struct gaudi_packet *user_pkt, *kernel_pkt;
+
+ user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
+ kernel_pkt = parser->patched_cb->kernel_address +
+ cb_patched_cur_length;
+
+ pkt_id = (enum packet_id) (
+ (le64_to_cpu(user_pkt->header) &
+ PACKET_HEADER_PACKET_ID_MASK) >>
+ PACKET_HEADER_PACKET_ID_SHIFT);
+
+ if (!validate_packet_id(pkt_id)) {
+ dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ pkt_size = gaudi_packet_sizes[pkt_id];
+ cb_parsed_length += pkt_size;
+ if (cb_parsed_length > parser->user_cb_size) {
+ dev_err(hdev->dev,
+ "packet 0x%x is out of CB boundary\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ switch (pkt_id) {
+ case PACKET_LIN_DMA:
+ rc = gaudi_patch_dma_packet(hdev, parser,
+ (struct packet_lin_dma *) user_pkt,
+ (struct packet_lin_dma *) kernel_pkt,
+ &new_pkt_size);
+ cb_patched_cur_length += new_pkt_size;
+ break;
+
+ case PACKET_MSG_PROT:
+ dev_err(hdev->dev,
+ "User not allowed to use MSG_PROT\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_CP_DMA:
+ dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_STOP:
+ dev_err(hdev->dev, "User not allowed to use STOP\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_WREG_32:
+ case PACKET_WREG_BULK:
+ case PACKET_MSG_LONG:
+ case PACKET_MSG_SHORT:
+ case PACKET_REPEAT:
+ case PACKET_FENCE:
+ case PACKET_NOP:
+ case PACKET_ARB_POINT:
+ case PACKET_LOAD_AND_EXE:
+ memcpy(kernel_pkt, user_pkt, pkt_size);
+ cb_patched_cur_length += pkt_size;
+ break;
+
+ default:
+ dev_err(hdev->dev, "Invalid packet header 0x%x\n",
+ pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ if (rc)
+ break;
+ }
+
+ return rc;
+}
+
+static int gaudi_parse_cb_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ u64 handle;
+ u32 patched_cb_size;
+ struct hl_cb *user_cb;
+ int rc;
+
+ /*
+ * The new CB should have space at the end for two MSG_PROT packets:
+ * 1. Optional NOP padding for cacheline alignment
+ * 2. A packet that will act as a completion packet
+ * 3. A packet that will generate MSI interrupt
+ */
+ if (parser->completion)
+ parser->patched_cb_size = parser->user_cb_size +
+ gaudi_get_patched_cb_extra_size(parser->user_cb_size);
+ else
+ parser->patched_cb_size = parser->user_cb_size;
+
+ rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
+ parser->patched_cb_size, false, false,
+ &handle);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to allocate patched CB for DMA CS %d\n",
+ rc);
+ return rc;
+ }
+
+ parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
+ /* hl_cb_get should never fail */
+ if (!parser->patched_cb) {
+ dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
+ rc = -EFAULT;
+ goto out;
+ }
+
+ /*
+ * We are protected from overflow because the check
+ * "parser->user_cb_size <= parser->user_cb->size" was done in get_cb_from_cs_chunk()
+ * in the common code. That check is done only if is_kernel_allocated_cb is true.
+ *
+ * There is no option to reach here without going through that check because:
+ * 1. validate_queue_index() assigns true to is_kernel_allocated_cb for any submission to
+ * an external queue.
+ * 2. For Gaudi, we only parse CBs that were submitted to the external queues.
+ */
+ memcpy(parser->patched_cb->kernel_address,
+ parser->user_cb->kernel_address,
+ parser->user_cb_size);
+
+ patched_cb_size = parser->patched_cb_size;
+
+ /* Validate patched CB instead of user CB */
+ user_cb = parser->user_cb;
+ parser->user_cb = parser->patched_cb;
+ rc = gaudi_validate_cb(hdev, parser, true);
+ parser->user_cb = user_cb;
+
+ if (rc) {
+ hl_cb_put(parser->patched_cb);
+ goto out;
+ }
+
+ if (patched_cb_size != parser->patched_cb_size) {
+ dev_err(hdev->dev, "user CB size mismatch\n");
+ hl_cb_put(parser->patched_cb);
+ rc = -EINVAL;
+ goto out;
+ }
+
+out:
+ /*
+ * Always call cb destroy here because we still have 1 reference
+ * to it by calling cb_get earlier. After the job will be completed,
+ * cb_put will release it, but here we want to remove it from the
+ * idr
+ */
+ hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
+
+ return rc;
+}
+
+static int gaudi_parse_cb_no_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ u64 handle;
+ int rc;
+
+ rc = gaudi_validate_cb(hdev, parser, false);
+
+ if (rc)
+ goto free_userptr;
+
+ rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
+ parser->patched_cb_size, false, false,
+ &handle);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to allocate patched CB for DMA CS %d\n", rc);
+ goto free_userptr;
+ }
+
+ parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
+ /* hl_cb_get should never fail here */
+ if (!parser->patched_cb) {
+ dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = gaudi_patch_cb(hdev, parser);
+
+ if (rc)
+ hl_cb_put(parser->patched_cb);
+
+out:
+ /*
+ * Always call cb destroy here because we still have 1 reference
+ * to it by calling cb_get earlier. After the job will be completed,
+ * cb_put will release it, but here we want to remove it from the
+ * idr
+ */
+ hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
+
+free_userptr:
+ if (rc)
+ hl_userptr_delete_list(hdev, parser->job_userptr_list);
+ return rc;
+}
+
+static int gaudi_parse_cb_no_ext_queue(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u32 nic_queue_offset, nic_mask_q_id;
+
+ if ((parser->hw_queue_id >= GAUDI_QUEUE_ID_NIC_0_0) &&
+ (parser->hw_queue_id <= GAUDI_QUEUE_ID_NIC_9_3)) {
+ nic_queue_offset = parser->hw_queue_id - GAUDI_QUEUE_ID_NIC_0_0;
+ nic_mask_q_id = 1 << (HW_CAP_NIC_SHIFT + (nic_queue_offset >> 2));
+
+ if (!(gaudi->hw_cap_initialized & nic_mask_q_id)) {
+ dev_err(hdev->dev, "h/w queue %d is disabled\n", parser->hw_queue_id);
+ return -EINVAL;
+ }
+ }
+
+ /* For internal queue jobs just check if CB address is valid */
+ if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->sram_user_base_address,
+ asic_prop->sram_end_address))
+ return 0;
+
+ if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->dram_user_base_address,
+ asic_prop->dram_end_address))
+ return 0;
+
+ /* PMMU and HPMMU addresses are equal, check only one of them */
+ if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->pmmu.start_addr,
+ asic_prop->pmmu.end_addr))
+ return 0;
+
+ dev_err(hdev->dev,
+ "CB address 0x%px + 0x%x for internal QMAN is not valid\n",
+ parser->user_cb, parser->user_cb_size);
+
+ return -EFAULT;
+}
+
+static int gaudi_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (parser->queue_type == QUEUE_TYPE_INT)
+ return gaudi_parse_cb_no_ext_queue(hdev, parser);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_MMU)
+ return gaudi_parse_cb_mmu(hdev, parser);
+ else
+ return gaudi_parse_cb_no_mmu(hdev, parser);
+}
+
+static void gaudi_add_end_of_cb_packets(struct hl_device *hdev, void *kernel_address,
+ u32 len, u32 original_len, u64 cq_addr, u32 cq_val,
+ u32 msi_vec, bool eb)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct packet_msg_prot *cq_pkt;
+ struct packet_nop *cq_padding;
+ u64 msi_addr;
+ u32 tmp;
+
+ cq_padding = kernel_address + original_len;
+ cq_pkt = kernel_address + len - (sizeof(struct packet_msg_prot) * 2);
+
+ while ((void *)cq_padding < (void *)cq_pkt) {
+ cq_padding->ctl = cpu_to_le32(FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_NOP));
+ cq_padding++;
+ }
+
+ tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ if (eb)
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
+
+ cq_pkt->ctl = cpu_to_le32(tmp);
+ cq_pkt->value = cpu_to_le32(cq_val);
+ cq_pkt->addr = cpu_to_le64(cq_addr);
+
+ cq_pkt++;
+
+ tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+ cq_pkt->ctl = cpu_to_le32(tmp);
+ cq_pkt->value = cpu_to_le32(1);
+
+ if (gaudi->multi_msi_mode)
+ msi_addr = mmPCIE_MSI_INTR_0 + msi_vec * 4;
+ else
+ msi_addr = mmPCIE_CORE_MSI_REQ;
+
+ cq_pkt->addr = cpu_to_le64(CFG_BASE + msi_addr);
+}
+
+static void gaudi_update_eq_ci(struct hl_device *hdev, u32 val)
+{
+ WREG32(mmCPU_IF_EQ_RD_OFFS, val);
+}
+
+static int gaudi_memset_device_memory(struct hl_device *hdev, u64 addr,
+ u32 size, u64 val)
+{
+ struct packet_lin_dma *lin_dma_pkt;
+ struct hl_cs_job *job;
+ u32 cb_size, ctl, err_cause;
+ struct hl_cb *cb;
+ int rc;
+
+ cb = hl_cb_kernel_create(hdev, PAGE_SIZE, false);
+ if (!cb)
+ return -EFAULT;
+
+ lin_dma_pkt = cb->kernel_address;
+ memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
+ cb_size = sizeof(*lin_dma_pkt);
+
+ ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
+ ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_MEMSET_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_LIN_DMA_CTL_LIN_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+
+ lin_dma_pkt->ctl = cpu_to_le32(ctl);
+ lin_dma_pkt->src_addr = cpu_to_le64(val);
+ lin_dma_pkt->dst_addr |= cpu_to_le64(addr);
+ lin_dma_pkt->tsize = cpu_to_le32(size);
+
+ job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
+ if (!job) {
+ dev_err(hdev->dev, "Failed to allocate a new job\n");
+ rc = -ENOMEM;
+ goto release_cb;
+ }
+
+ /* Verify DMA is OK */
+ err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE);
+ if (err_cause && !hdev->init_done) {
+ dev_dbg(hdev->dev,
+ "Clearing DMA0 engine from errors (cause 0x%x)\n",
+ err_cause);
+ WREG32(mmDMA0_CORE_ERR_CAUSE, err_cause);
+ }
+
+ job->id = 0;
+ job->user_cb = cb;
+ atomic_inc(&job->user_cb->cs_cnt);
+ job->user_cb_size = cb_size;
+ job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
+ job->patched_cb = job->user_cb;
+ job->job_cb_size = job->user_cb_size + sizeof(struct packet_msg_prot);
+
+ hl_debugfs_add_job(hdev, job);
+
+ rc = gaudi_send_job_on_qman0(hdev, job);
+ hl_debugfs_remove_job(hdev, job);
+ kfree(job);
+ atomic_dec(&cb->cs_cnt);
+
+ /* Verify DMA is OK */
+ err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE);
+ if (err_cause) {
+ dev_err(hdev->dev, "DMA Failed, cause 0x%x\n", err_cause);
+ rc = -EIO;
+ if (!hdev->init_done) {
+ dev_dbg(hdev->dev,
+ "Clearing DMA0 engine from errors (cause 0x%x)\n",
+ err_cause);
+ WREG32(mmDMA0_CORE_ERR_CAUSE, err_cause);
+ }
+ }
+
+release_cb:
+ hl_cb_put(cb);
+ hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
+
+ return rc;
+}
+
+static int gaudi_memset_registers(struct hl_device *hdev, u64 reg_base,
+ u32 num_regs, u32 val)
+{
+ struct packet_msg_long *pkt;
+ struct hl_cs_job *job;
+ u32 cb_size, ctl;
+ struct hl_cb *cb;
+ int i, rc;
+
+ cb_size = (sizeof(*pkt) * num_regs) + sizeof(struct packet_msg_prot);
+
+ if (cb_size > SZ_2M) {
+ dev_err(hdev->dev, "CB size must be smaller than %uMB", SZ_2M);
+ return -ENOMEM;
+ }
+
+ cb = hl_cb_kernel_create(hdev, cb_size, false);
+ if (!cb)
+ return -EFAULT;
+
+ pkt = cb->kernel_address;
+
+ ctl = FIELD_PREP(GAUDI_PKT_LONG_CTL_OP_MASK, 0); /* write the value */
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_LONG);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ for (i = 0; i < num_regs ; i++, pkt++) {
+ pkt->ctl = cpu_to_le32(ctl);
+ pkt->value = cpu_to_le32(val);
+ pkt->addr = cpu_to_le64(reg_base + (i * 4));
+ }
+
+ job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
+ if (!job) {
+ dev_err(hdev->dev, "Failed to allocate a new job\n");
+ rc = -ENOMEM;
+ goto release_cb;
+ }
+
+ job->id = 0;
+ job->user_cb = cb;
+ atomic_inc(&job->user_cb->cs_cnt);
+ job->user_cb_size = cb_size;
+ job->hw_queue_id = GAUDI_QUEUE_ID_DMA_0_0;
+ job->patched_cb = job->user_cb;
+ job->job_cb_size = cb_size;
+
+ hl_debugfs_add_job(hdev, job);
+
+ rc = gaudi_send_job_on_qman0(hdev, job);
+ hl_debugfs_remove_job(hdev, job);
+ kfree(job);
+ atomic_dec(&cb->cs_cnt);
+
+release_cb:
+ hl_cb_put(cb);
+ hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
+
+ return rc;
+}
+
+static int gaudi_restore_sm_registers(struct hl_device *hdev)
+{
+ u64 base_addr;
+ u32 num_regs;
+ int rc;
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
+ num_regs = NUM_OF_SOB_IN_BLOCK;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_SOB_OBJ_0;
+ num_regs = NUM_OF_SOB_IN_BLOCK;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
+ num_regs = NUM_OF_SOB_IN_BLOCK;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_MON_STATUS_0;
+ num_regs = NUM_OF_MONITORS_IN_BLOCK;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_E_S_SYNC_MNGR_OBJS_MON_STATUS_0;
+ num_regs = NUM_OF_MONITORS_IN_BLOCK;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_W_N_SYNC_MNGR_OBJS_MON_STATUS_0;
+ num_regs = NUM_OF_MONITORS_IN_BLOCK;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
+ (GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT * 4);
+ num_regs = NUM_OF_SOB_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_SYNC_OBJECT;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ base_addr = CFG_BASE + mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0 +
+ (GAUDI_FIRST_AVAILABLE_W_S_MONITOR * 4);
+ num_regs = NUM_OF_MONITORS_IN_BLOCK - GAUDI_FIRST_AVAILABLE_W_S_MONITOR;
+ rc = gaudi_memset_registers(hdev, base_addr, num_regs, 0);
+ if (rc) {
+ dev_err(hdev->dev, "failed resetting SM registers");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void gaudi_restore_dma_registers(struct hl_device *hdev)
+{
+ u32 sob_delta = mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_1 -
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0;
+ int i;
+
+ for (i = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
+ u64 sob_addr = CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0 +
+ (i * sob_delta);
+ u32 dma_offset = i * DMA_CORE_OFFSET;
+
+ WREG32(mmDMA0_CORE_WR_COMP_ADDR_LO + dma_offset,
+ lower_32_bits(sob_addr));
+ WREG32(mmDMA0_CORE_WR_COMP_ADDR_HI + dma_offset,
+ upper_32_bits(sob_addr));
+ WREG32(mmDMA0_CORE_WR_COMP_WDATA + dma_offset, 0x80000001);
+
+ /* For DMAs 2-7, need to restore WR_AWUSER_31_11 as it can be
+ * modified by the user for SRAM reduction
+ */
+ if (i > 1)
+ WREG32(mmDMA0_CORE_WR_AWUSER_31_11 + dma_offset,
+ 0x00000001);
+ }
+}
+
+static void gaudi_restore_qm_registers(struct hl_device *hdev)
+{
+ u32 qman_offset;
+ int i;
+
+ for (i = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
+ qman_offset = i * DMA_QMAN_OFFSET;
+ WREG32(mmDMA0_QM_ARB_CFG_0 + qman_offset, 0);
+ }
+
+ for (i = 0 ; i < MME_NUMBER_OF_MASTER_ENGINES ; i++) {
+ qman_offset = i * (mmMME2_QM_BASE - mmMME0_QM_BASE);
+ WREG32(mmMME0_QM_ARB_CFG_0 + qman_offset, 0);
+ }
+
+ for (i = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
+ qman_offset = i * TPC_QMAN_OFFSET;
+ WREG32(mmTPC0_QM_ARB_CFG_0 + qman_offset, 0);
+ }
+
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
+ qman_offset = (i >> 1) * NIC_MACRO_QMAN_OFFSET +
+ (i & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+ WREG32(mmNIC0_QM0_ARB_CFG_0 + qman_offset, 0);
+ }
+}
+
+static int gaudi_restore_user_registers(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = gaudi_restore_sm_registers(hdev);
+ if (rc)
+ return rc;
+
+ gaudi_restore_dma_registers(hdev);
+ gaudi_restore_qm_registers(hdev);
+
+ return 0;
+}
+
+static int gaudi_context_switch(struct hl_device *hdev, u32 asid)
+{
+ return 0;
+}
+
+static int gaudi_mmu_clear_pgt_range(struct hl_device *hdev)
+{
+ u32 size = hdev->asic_prop.mmu_pgt_size +
+ hdev->asic_prop.mmu_cache_mng_size;
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u64 addr = hdev->asic_prop.mmu_pgt_addr;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+ return 0;
+
+ return gaudi_memset_device_memory(hdev, addr, size, 0);
+}
+
+static void gaudi_restore_phase_topology(struct hl_device *hdev)
+{
+
+}
+
+static int gaudi_dma_core_transfer(struct hl_device *hdev, int dma_id, u64 addr,
+ u32 size_to_dma, dma_addr_t dma_addr)
+{
+ u32 err_cause, val;
+ u64 dma_offset;
+ int rc;
+
+ dma_offset = dma_id * DMA_CORE_OFFSET;
+
+ WREG32(mmDMA0_CORE_SRC_BASE_LO + dma_offset, lower_32_bits(addr));
+ WREG32(mmDMA0_CORE_SRC_BASE_HI + dma_offset, upper_32_bits(addr));
+ WREG32(mmDMA0_CORE_DST_BASE_LO + dma_offset, lower_32_bits(dma_addr));
+ WREG32(mmDMA0_CORE_DST_BASE_HI + dma_offset, upper_32_bits(dma_addr));
+ WREG32(mmDMA0_CORE_DST_TSIZE_0 + dma_offset, size_to_dma);
+ WREG32(mmDMA0_CORE_COMMIT + dma_offset,
+ (1 << DMA0_CORE_COMMIT_LIN_SHIFT));
+
+ rc = hl_poll_timeout(
+ hdev,
+ mmDMA0_CORE_STS0 + dma_offset,
+ val,
+ ((val & DMA0_CORE_STS0_BUSY_MASK) == 0),
+ 0,
+ 1000000);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "DMA %d timed-out during reading of 0x%llx\n",
+ dma_id, addr);
+ return -EIO;
+ }
+
+ /* Verify DMA is OK */
+ err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
+ if (err_cause) {
+ dev_err(hdev->dev, "DMA Failed, cause 0x%x\n", err_cause);
+ dev_dbg(hdev->dev,
+ "Clearing DMA0 engine from errors (cause 0x%x)\n",
+ err_cause);
+ WREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset, err_cause);
+
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int gaudi_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size,
+ void *blob_addr)
+{
+ u32 dma_core_sts0, err_cause, cfg1, size_left, pos, size_to_dma;
+ u32 qm_glbl_sts0, qm_cgm_sts;
+ u64 dma_offset, qm_offset;
+ dma_addr_t dma_addr;
+ void *kernel_addr;
+ bool is_eng_idle;
+ int rc = 0, dma_id;
+
+ kernel_addr = hl_asic_dma_alloc_coherent(hdev, SZ_2M, &dma_addr, GFP_KERNEL | __GFP_ZERO);
+
+ if (!kernel_addr)
+ return -ENOMEM;
+
+ hdev->asic_funcs->hw_queues_lock(hdev);
+
+ dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_1];
+ dma_offset = dma_id * DMA_CORE_OFFSET;
+ qm_offset = dma_id * DMA_QMAN_OFFSET;
+ dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + dma_offset);
+ qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + qm_offset);
+ qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + qm_offset);
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
+ IS_DMA_IDLE(dma_core_sts0);
+
+ if (!is_eng_idle) {
+ dma_id = gaudi_dma_assignment[GAUDI_PCI_DMA_2];
+ dma_offset = dma_id * DMA_CORE_OFFSET;
+ qm_offset = dma_id * DMA_QMAN_OFFSET;
+ dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + dma_offset);
+ qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + qm_offset);
+ qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + qm_offset);
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
+ IS_DMA_IDLE(dma_core_sts0);
+
+ if (!is_eng_idle) {
+ dev_err_ratelimited(hdev->dev,
+ "Can't read via DMA because it is BUSY\n");
+ rc = -EAGAIN;
+ goto out;
+ }
+ }
+
+ cfg1 = RREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset);
+ WREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset,
+ 0xF << DMA0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+
+ /* TODO: remove this by mapping the DMA temporary buffer to the MMU
+ * using the compute ctx ASID, if exists. If not, use the kernel ctx
+ * ASID
+ */
+ WREG32_OR(mmDMA0_CORE_PROT + dma_offset, BIT(DMA0_CORE_PROT_VAL_SHIFT));
+
+ /* Verify DMA is OK */
+ err_cause = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
+ if (err_cause) {
+ dev_dbg(hdev->dev,
+ "Clearing DMA0 engine from errors (cause 0x%x)\n",
+ err_cause);
+ WREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset, err_cause);
+ }
+
+ pos = 0;
+ size_left = size;
+ size_to_dma = SZ_2M;
+
+ while (size_left > 0) {
+
+ if (size_left < SZ_2M)
+ size_to_dma = size_left;
+
+ rc = gaudi_dma_core_transfer(hdev, dma_id, addr, size_to_dma,
+ dma_addr);
+ if (rc)
+ break;
+
+ memcpy(blob_addr + pos, kernel_addr, size_to_dma);
+
+ if (size_left <= SZ_2M)
+ break;
+
+ pos += SZ_2M;
+ addr += SZ_2M;
+ size_left -= SZ_2M;
+ }
+
+ /* TODO: remove this by mapping the DMA temporary buffer to the MMU
+ * using the compute ctx ASID, if exists. If not, use the kernel ctx
+ * ASID
+ */
+ WREG32_AND(mmDMA0_CORE_PROT + dma_offset,
+ ~BIT(DMA0_CORE_PROT_VAL_SHIFT));
+
+ WREG32(mmDMA0_QM_GLBL_CFG1 + qm_offset, cfg1);
+
+out:
+ hdev->asic_funcs->hw_queues_unlock(hdev);
+
+ hl_asic_dma_free_coherent(hdev, SZ_2M, kernel_addr, dma_addr);
+
+ return rc;
+}
+
+static u64 gaudi_read_pte(struct hl_device *hdev, u64 addr)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (hdev->reset_info.hard_reset_pending)
+ return U64_MAX;
+
+ return readq(hdev->pcie_bar[HBM_BAR_ID] +
+ (addr - gaudi->hbm_bar_cur_addr));
+}
+
+static void gaudi_write_pte(struct hl_device *hdev, u64 addr, u64 val)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (hdev->reset_info.hard_reset_pending)
+ return;
+
+ writeq(val, hdev->pcie_bar[HBM_BAR_ID] +
+ (addr - gaudi->hbm_bar_cur_addr));
+}
+
+void gaudi_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid)
+{
+ /* mask to zero the MMBP and ASID bits */
+ WREG32_AND(reg, ~0x7FF);
+ WREG32_OR(reg, asid);
+}
+
+static void gaudi_mmu_prepare(struct hl_device *hdev, u32 asid)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ if (asid & ~DMA0_QM_GLBL_NON_SECURE_PROPS_0_ASID_MASK) {
+ dev_crit(hdev->dev, "asid %u is too big\n", asid);
+ return;
+ }
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA0_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA1_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA2_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA3_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA4_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA5_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA6_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA7_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmDMA0_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA1_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA2_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA3_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA4_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA5_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA6_CORE_NON_SECURE_PROPS, asid);
+ gaudi_mmu_prepare_reg(hdev, mmDMA7_CORE_NON_SECURE_PROPS, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC0_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC1_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC2_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC3_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC4_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC5_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC6_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_CFG_ARUSER_LO, asid);
+ gaudi_mmu_prepare_reg(hdev, mmTPC7_CFG_AWUSER_LO, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME0_QM_GLBL_NON_SECURE_PROPS_4, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_2, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_3, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_QM_GLBL_NON_SECURE_PROPS_4, asid);
+
+ gaudi_mmu_prepare_reg(hdev, mmMME0_SBAB_ARUSER0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME0_SBAB_ARUSER1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME1_SBAB_ARUSER0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME1_SBAB_ARUSER1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_SBAB_ARUSER0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_SBAB_ARUSER1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME3_SBAB_ARUSER0, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME3_SBAB_ARUSER1, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME0_ACC_WBC, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME1_ACC_WBC, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME2_ACC_WBC, asid);
+ gaudi_mmu_prepare_reg(hdev, mmMME3_ACC_WBC, asid);
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC0) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM0_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC1) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC0_QM1_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC2) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM0_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC3) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC1_QM1_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC4) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM0_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC5) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC2_QM1_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC6) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM0_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC7) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC3_QM1_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC8) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM0_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ if (gaudi->hw_cap_initialized & HW_CAP_NIC9) {
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_0,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_1,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_2,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_3,
+ asid);
+ gaudi_mmu_prepare_reg(hdev, mmNIC4_QM1_GLBL_NON_SECURE_PROPS_4,
+ asid);
+ }
+
+ gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_ARUSER, asid);
+ gaudi_mmu_prepare_reg(hdev, mmPSOC_GLOBAL_CONF_TRACE_AWUSER, asid);
+}
+
+static int gaudi_send_job_on_qman0(struct hl_device *hdev,
+ struct hl_cs_job *job)
+{
+ struct packet_msg_prot *fence_pkt;
+ u32 *fence_ptr;
+ dma_addr_t fence_dma_addr;
+ struct hl_cb *cb;
+ u32 tmp, timeout, dma_offset;
+ int rc;
+
+ if (hdev->pldm)
+ timeout = GAUDI_PLDM_QMAN0_TIMEOUT_USEC;
+ else
+ timeout = HL_DEVICE_TIMEOUT_USEC;
+
+ fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
+ if (!fence_ptr) {
+ dev_err(hdev->dev,
+ "Failed to allocate fence memory for QMAN0\n");
+ return -ENOMEM;
+ }
+
+ cb = job->patched_cb;
+
+ fence_pkt = cb->kernel_address +
+ job->job_cb_size - sizeof(struct packet_msg_prot);
+
+ tmp = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_PROT);
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 1);
+ tmp |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ fence_pkt->ctl = cpu_to_le32(tmp);
+ fence_pkt->value = cpu_to_le32(GAUDI_QMAN0_FENCE_VAL);
+ fence_pkt->addr = cpu_to_le64(fence_dma_addr);
+
+ dma_offset = gaudi_dma_assignment[GAUDI_PCI_DMA_1] * DMA_CORE_OFFSET;
+
+ WREG32(mmDMA0_CORE_PROT + dma_offset,
+ BIT(DMA0_CORE_PROT_ERR_VAL_SHIFT) | BIT(DMA0_CORE_PROT_VAL_SHIFT));
+
+ rc = hl_hw_queue_send_cb_no_cmpl(hdev, GAUDI_QUEUE_ID_DMA_0_0,
+ job->job_cb_size, cb->bus_address);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to send CB on QMAN0, %d\n", rc);
+ goto free_fence_ptr;
+ }
+
+ rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp,
+ (tmp == GAUDI_QMAN0_FENCE_VAL), 1000,
+ timeout, true);
+
+ hl_hw_queue_inc_ci_kernel(hdev, GAUDI_QUEUE_ID_DMA_0_0);
+
+ if (rc == -ETIMEDOUT) {
+ dev_err(hdev->dev, "QMAN0 Job timeout (0x%x)\n", tmp);
+ goto free_fence_ptr;
+ }
+
+free_fence_ptr:
+ WREG32(mmDMA0_CORE_PROT + dma_offset, BIT(DMA0_CORE_PROT_ERR_VAL_SHIFT));
+
+ hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
+ return rc;
+}
+
+static void gaudi_get_event_desc(u16 event_type, char *desc, size_t size)
+{
+ if (event_type >= GAUDI_EVENT_SIZE)
+ goto event_not_supported;
+
+ if (!gaudi_irq_map_table[event_type].valid)
+ goto event_not_supported;
+
+ snprintf(desc, size, gaudi_irq_map_table[event_type].name);
+
+ return;
+
+event_not_supported:
+ snprintf(desc, size, "N/A");
+}
+
+static const char *gaudi_get_razwi_initiator_dma_name(struct hl_device *hdev, u32 x_y,
+ bool is_write, u16 *engine_id_1,
+ u16 *engine_id_2)
+{
+ u32 dma_id[2], dma_offset, err_cause[2], mask, i;
+
+ mask = is_write ? DMA0_CORE_ERR_CAUSE_HBW_WR_ERR_MASK :
+ DMA0_CORE_ERR_CAUSE_HBW_RD_ERR_MASK;
+
+ switch (x_y) {
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
+ dma_id[0] = 0;
+ dma_id[1] = 2;
+ break;
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
+ dma_id[0] = 1;
+ dma_id[1] = 3;
+ break;
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
+ dma_id[0] = 4;
+ dma_id[1] = 6;
+ break;
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
+ dma_id[0] = 5;
+ dma_id[1] = 7;
+ break;
+ default:
+ goto unknown_initiator;
+ }
+
+ for (i = 0 ; i < 2 ; i++) {
+ dma_offset = dma_id[i] * DMA_CORE_OFFSET;
+ err_cause[i] = RREG32(mmDMA0_CORE_ERR_CAUSE + dma_offset);
+ }
+
+ switch (x_y) {
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
+ if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_0;
+ return "DMA0";
+ } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_2;
+ return "DMA2";
+ } else {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_0;
+ *engine_id_2 = GAUDI_ENGINE_ID_DMA_2;
+ return "DMA0 or DMA2";
+ }
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
+ if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_1;
+ return "DMA1";
+ } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_3;
+ return "DMA3";
+ } else {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_1;
+ *engine_id_2 = GAUDI_ENGINE_ID_DMA_3;
+ return "DMA1 or DMA3";
+ }
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
+ if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_4;
+ return "DMA4";
+ } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_6;
+ return "DMA6";
+ } else {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_4;
+ *engine_id_2 = GAUDI_ENGINE_ID_DMA_6;
+ return "DMA4 or DMA6";
+ }
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
+ if ((err_cause[0] & mask) && !(err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_5;
+ return "DMA5";
+ } else if (!(err_cause[0] & mask) && (err_cause[1] & mask)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_7;
+ return "DMA7";
+ } else {
+ *engine_id_1 = GAUDI_ENGINE_ID_DMA_5;
+ *engine_id_2 = GAUDI_ENGINE_ID_DMA_7;
+ return "DMA5 or DMA7";
+ }
+ }
+
+unknown_initiator:
+ return "unknown initiator";
+}
+
+static const char *gaudi_get_razwi_initiator_name(struct hl_device *hdev, bool is_write,
+ u16 *engine_id_1, u16 *engine_id_2)
+{
+ u32 val, x_y, axi_id;
+
+ val = is_write ? RREG32(mmMMU_UP_RAZWI_WRITE_ID) :
+ RREG32(mmMMU_UP_RAZWI_READ_ID);
+ x_y = val & ((RAZWI_INITIATOR_Y_MASK << RAZWI_INITIATOR_Y_SHIFT) |
+ (RAZWI_INITIATOR_X_MASK << RAZWI_INITIATOR_X_SHIFT));
+ axi_id = val & (RAZWI_INITIATOR_AXI_ID_MASK <<
+ RAZWI_INITIATOR_AXI_ID_SHIFT);
+
+ switch (x_y) {
+ case RAZWI_INITIATOR_ID_X_Y_TPC0_NIC0:
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_0;
+ return "TPC0";
+ }
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_NIC_0;
+ return "NIC0";
+ }
+ break;
+ case RAZWI_INITIATOR_ID_X_Y_TPC1:
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_1;
+ return "TPC1";
+ case RAZWI_INITIATOR_ID_X_Y_MME0_0:
+ case RAZWI_INITIATOR_ID_X_Y_MME0_1:
+ *engine_id_1 = GAUDI_ENGINE_ID_MME_0;
+ return "MME0";
+ case RAZWI_INITIATOR_ID_X_Y_MME1_0:
+ case RAZWI_INITIATOR_ID_X_Y_MME1_1:
+ *engine_id_1 = GAUDI_ENGINE_ID_MME_1;
+ return "MME1";
+ case RAZWI_INITIATOR_ID_X_Y_TPC2:
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_2;
+ return "TPC2";
+ case RAZWI_INITIATOR_ID_X_Y_TPC3_PCI_CPU_PSOC:
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_3;
+ return "TPC3";
+ }
+ /* PCI, CPU or PSOC does not have engine id*/
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_PCI))
+ return "PCI";
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_CPU))
+ return "CPU";
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_PSOC))
+ return "PSOC";
+ break;
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_S_1:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_S_1:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_W_N_1:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_0:
+ case RAZWI_INITIATOR_ID_X_Y_DMA_IF_E_N_1:
+ return gaudi_get_razwi_initiator_dma_name(hdev, x_y, is_write,
+ engine_id_1, engine_id_2);
+ case RAZWI_INITIATOR_ID_X_Y_TPC4_NIC1_NIC2:
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_4;
+ return "TPC4";
+ }
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_NIC_1;
+ return "NIC1";
+ }
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC_FT)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_NIC_2;
+ return "NIC2";
+ }
+ break;
+ case RAZWI_INITIATOR_ID_X_Y_TPC5:
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_5;
+ return "TPC5";
+ case RAZWI_INITIATOR_ID_X_Y_MME2_0:
+ case RAZWI_INITIATOR_ID_X_Y_MME2_1:
+ *engine_id_1 = GAUDI_ENGINE_ID_MME_2;
+ return "MME2";
+ case RAZWI_INITIATOR_ID_X_Y_MME3_0:
+ case RAZWI_INITIATOR_ID_X_Y_MME3_1:
+ *engine_id_1 = GAUDI_ENGINE_ID_MME_3;
+ return "MME3";
+ case RAZWI_INITIATOR_ID_X_Y_TPC6:
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_6;
+ return "TPC6";
+ case RAZWI_INITIATOR_ID_X_Y_TPC7_NIC4_NIC5:
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_TPC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_TPC_7;
+ return "TPC7";
+ }
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_NIC_4;
+ return "NIC4";
+ }
+ if (axi_id == RAZWI_INITIATOR_ID_AXI_ID(AXI_ID_NIC_FT)) {
+ *engine_id_1 = GAUDI_ENGINE_ID_NIC_5;
+ return "NIC5";
+ }
+ break;
+ default:
+ break;
+ }
+
+ dev_err(hdev->dev,
+ "Unknown RAZWI initiator ID 0x%x [Y=%d, X=%d, AXI_ID=%d]\n",
+ val,
+ (val >> RAZWI_INITIATOR_Y_SHIFT) & RAZWI_INITIATOR_Y_MASK,
+ (val >> RAZWI_INITIATOR_X_SHIFT) & RAZWI_INITIATOR_X_MASK,
+ (val >> RAZWI_INITIATOR_AXI_ID_SHIFT) &
+ RAZWI_INITIATOR_AXI_ID_MASK);
+
+ return "unknown initiator";
+}
+
+static void gaudi_print_and_get_razwi_info(struct hl_device *hdev, u16 *engine_id_1,
+ u16 *engine_id_2, bool *is_read, bool *is_write)
+{
+
+ if (RREG32(mmMMU_UP_RAZWI_WRITE_VLD)) {
+ dev_err_ratelimited(hdev->dev,
+ "RAZWI event caused by illegal write of %s\n",
+ gaudi_get_razwi_initiator_name(hdev, true, engine_id_1, engine_id_2));
+ WREG32(mmMMU_UP_RAZWI_WRITE_VLD, 0);
+ *is_write = true;
+ }
+
+ if (RREG32(mmMMU_UP_RAZWI_READ_VLD)) {
+ dev_err_ratelimited(hdev->dev,
+ "RAZWI event caused by illegal read of %s\n",
+ gaudi_get_razwi_initiator_name(hdev, false, engine_id_1, engine_id_2));
+ WREG32(mmMMU_UP_RAZWI_READ_VLD, 0);
+ *is_read = true;
+ }
+}
+
+static void gaudi_print_and_get_mmu_error_info(struct hl_device *hdev, u64 *addr, u64 *event_mask)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u32 val;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ val = RREG32(mmMMU_UP_PAGE_ERROR_CAPTURE);
+ if (val & MMU_UP_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
+ *addr = val & MMU_UP_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
+ *addr <<= 32;
+ *addr |= RREG32(mmMMU_UP_PAGE_ERROR_CAPTURE_VA);
+
+ dev_err_ratelimited(hdev->dev, "MMU page fault on va 0x%llx\n", *addr);
+ hl_handle_page_fault(hdev, *addr, 0, true, event_mask);
+
+ WREG32(mmMMU_UP_PAGE_ERROR_CAPTURE, 0);
+ }
+
+ val = RREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE);
+ if (val & MMU_UP_ACCESS_ERROR_CAPTURE_ENTRY_VALID_MASK) {
+ *addr = val & MMU_UP_ACCESS_ERROR_CAPTURE_VA_49_32_MASK;
+ *addr <<= 32;
+ *addr |= RREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE_VA);
+
+ dev_err_ratelimited(hdev->dev, "MMU access error on va 0x%llx\n", *addr);
+
+ WREG32(mmMMU_UP_ACCESS_ERROR_CAPTURE, 0);
+ }
+}
+
+/*
+ * +-------------------+------------------------------------------------------+
+ * | Configuration Reg | Description |
+ * | Address | |
+ * +-------------------+------------------------------------------------------+
+ * | 0xF30 - 0xF3F |ECC single error indication (1 bit per memory wrapper)|
+ * | |0xF30 memory wrappers 31:0 (MSB to LSB) |
+ * | |0xF34 memory wrappers 63:32 |
+ * | |0xF38 memory wrappers 95:64 |
+ * | |0xF3C memory wrappers 127:96 |
+ * +-------------------+------------------------------------------------------+
+ * | 0xF40 - 0xF4F |ECC double error indication (1 bit per memory wrapper)|
+ * | |0xF40 memory wrappers 31:0 (MSB to LSB) |
+ * | |0xF44 memory wrappers 63:32 |
+ * | |0xF48 memory wrappers 95:64 |
+ * | |0xF4C memory wrappers 127:96 |
+ * +-------------------+------------------------------------------------------+
+ */
+static int gaudi_extract_ecc_info(struct hl_device *hdev,
+ struct ecc_info_extract_params *params, u64 *ecc_address,
+ u64 *ecc_syndrom, u8 *memory_wrapper_idx)
+{
+ u32 i, num_mem_regs, reg, err_bit;
+ u64 err_addr, err_word = 0;
+
+ num_mem_regs = params->num_memories / 32 +
+ ((params->num_memories % 32) ? 1 : 0);
+
+ if (params->block_address >= CFG_BASE)
+ params->block_address -= CFG_BASE;
+
+ if (params->derr)
+ err_addr = params->block_address + GAUDI_ECC_DERR0_OFFSET;
+ else
+ err_addr = params->block_address + GAUDI_ECC_SERR0_OFFSET;
+
+ /* Set invalid wrapper index */
+ *memory_wrapper_idx = 0xFF;
+
+ /* Iterate through memory wrappers, a single bit must be set */
+ for (i = 0 ; i < num_mem_regs ; i++) {
+ err_addr += i * 4;
+ err_word = RREG32(err_addr);
+ if (err_word) {
+ err_bit = __ffs(err_word);
+ *memory_wrapper_idx = err_bit + (32 * i);
+ break;
+ }
+ }
+
+ if (*memory_wrapper_idx == 0xFF) {
+ dev_err(hdev->dev, "ECC error information cannot be found\n");
+ return -EINVAL;
+ }
+
+ WREG32(params->block_address + GAUDI_ECC_MEM_SEL_OFFSET,
+ *memory_wrapper_idx);
+
+ *ecc_address =
+ RREG32(params->block_address + GAUDI_ECC_ADDRESS_OFFSET);
+ *ecc_syndrom =
+ RREG32(params->block_address + GAUDI_ECC_SYNDROME_OFFSET);
+
+ /* Clear error indication */
+ reg = RREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET);
+ if (params->derr)
+ reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_DERR_MASK, 1);
+ else
+ reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_SERR_MASK, 1);
+
+ WREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET, reg);
+
+ return 0;
+}
+
+/*
+ * gaudi_queue_idx_dec - decrement queue index (pi/ci) and handle wrap
+ *
+ * @idx: the current pi/ci value
+ * @q_len: the queue length (power of 2)
+ *
+ * @return the cyclically decremented index
+ */
+static inline u32 gaudi_queue_idx_dec(u32 idx, u32 q_len)
+{
+ u32 mask = q_len - 1;
+
+ /*
+ * modular decrement is equivalent to adding (queue_size -1)
+ * later we take LSBs to make sure the value is in the
+ * range [0, queue_len - 1]
+ */
+ return (idx + q_len - 1) & mask;
+}
+
+/**
+ * gaudi_handle_sw_config_stream_data - print SW config stream data
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @stream: the QMAN's stream
+ * @qman_base: base address of QMAN registers block
+ * @event_mask: mask of the last events occurred
+ */
+static void gaudi_handle_sw_config_stream_data(struct hl_device *hdev, u32 stream,
+ u64 qman_base, u64 event_mask)
+{
+ u64 cq_ptr_lo, cq_ptr_hi, cq_tsize, cq_ptr;
+ u32 cq_ptr_lo_off, size;
+
+ cq_ptr_lo_off = mmTPC0_QM_CQ_PTR_LO_1 - mmTPC0_QM_CQ_PTR_LO_0;
+
+ cq_ptr_lo = qman_base + (mmTPC0_QM_CQ_PTR_LO_0 - mmTPC0_QM_BASE) +
+ stream * cq_ptr_lo_off;
+ cq_ptr_hi = cq_ptr_lo +
+ (mmTPC0_QM_CQ_PTR_HI_0 - mmTPC0_QM_CQ_PTR_LO_0);
+ cq_tsize = cq_ptr_lo +
+ (mmTPC0_QM_CQ_TSIZE_0 - mmTPC0_QM_CQ_PTR_LO_0);
+
+ cq_ptr = (((u64) RREG32(cq_ptr_hi)) << 32) | RREG32(cq_ptr_lo);
+ size = RREG32(cq_tsize);
+ dev_info(hdev->dev, "stop on err: stream: %u, addr: %#llx, size: %u\n",
+ stream, cq_ptr, size);
+
+ if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
+ hdev->captured_err_info.undef_opcode.cq_addr = cq_ptr;
+ hdev->captured_err_info.undef_opcode.cq_size = size;
+ hdev->captured_err_info.undef_opcode.stream_id = stream;
+ }
+}
+
+/**
+ * gaudi_handle_last_pqes_on_err - print last PQEs on error
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @qid_base: first QID of the QMAN (out of 4 streams)
+ * @stream: the QMAN's stream
+ * @qman_base: base address of QMAN registers block
+ * @event_mask: mask of the last events occurred
+ * @pr_sw_conf: if true print the SW config stream data (CQ PTR and SIZE)
+ */
+static void gaudi_handle_last_pqes_on_err(struct hl_device *hdev, u32 qid_base,
+ u32 stream, u64 qman_base,
+ u64 event_mask,
+ bool pr_sw_conf)
+{
+ u32 ci, qm_ci_stream_off, queue_len;
+ struct hl_hw_queue *q;
+ u64 pq_ci, addr[PQ_FETCHER_CACHE_SIZE];
+ int i;
+
+ q = &hdev->kernel_queues[qid_base + stream];
+
+ qm_ci_stream_off = mmTPC0_QM_PQ_CI_1 - mmTPC0_QM_PQ_CI_0;
+ pq_ci = qman_base + (mmTPC0_QM_PQ_CI_0 - mmTPC0_QM_BASE) +
+ stream * qm_ci_stream_off;
+
+ queue_len = (q->queue_type == QUEUE_TYPE_INT) ?
+ q->int_queue_len : HL_QUEUE_LENGTH;
+
+ hdev->asic_funcs->hw_queues_lock(hdev);
+
+ if (pr_sw_conf)
+ gaudi_handle_sw_config_stream_data(hdev, stream, qman_base, event_mask);
+
+ ci = RREG32(pq_ci);
+
+ /* we should start printing form ci -1 */
+ ci = gaudi_queue_idx_dec(ci, queue_len);
+ memset(addr, 0, sizeof(addr));
+
+ for (i = 0; i < PQ_FETCHER_CACHE_SIZE; i++) {
+ struct hl_bd *bd;
+ u32 len;
+
+ bd = q->kernel_address;
+ bd += ci;
+
+ len = le32_to_cpu(bd->len);
+ /* len 0 means uninitialized entry- break */
+ if (!len)
+ break;
+
+ addr[i] = le64_to_cpu(bd->ptr);
+
+ dev_info(hdev->dev, "stop on err PQE(stream %u): ci: %u, addr: %#llx, size: %u\n",
+ stream, ci, addr[i], len);
+
+ /* get previous ci, wrap if needed */
+ ci = gaudi_queue_idx_dec(ci, queue_len);
+ }
+
+ if (event_mask & HL_NOTIFIER_EVENT_UNDEFINED_OPCODE) {
+ struct undefined_opcode_info *undef_opcode = &hdev->captured_err_info.undef_opcode;
+ u32 arr_idx = undef_opcode->cb_addr_streams_len;
+
+ if (arr_idx == 0) {
+ undef_opcode->timestamp = ktime_get();
+ undef_opcode->engine_id = gaudi_queue_id_to_engine_id[qid_base];
+ }
+
+ memcpy(undef_opcode->cb_addr_streams[arr_idx], addr, sizeof(addr));
+ undef_opcode->cb_addr_streams_len++;
+ }
+
+ hdev->asic_funcs->hw_queues_unlock(hdev);
+}
+
+/**
+ * handle_qman_data_on_err - extract QMAN data on error
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @qid_base: first QID of the QMAN (out of 4 streams)
+ * @stream: the QMAN's stream
+ * @qman_base: base address of QMAN registers block
+ * @event_mask: mask of the last events occurred
+ *
+ * This function attempt to exatract as much data as possible on QMAN error.
+ * On upper CP print the SW config stream data and last 8 PQEs.
+ * On lower CP print SW config data and last PQEs of ALL 4 upper CPs
+ */
+static void handle_qman_data_on_err(struct hl_device *hdev, u32 qid_base,
+ u32 stream, u64 qman_base, u64 event_mask)
+{
+ u32 i;
+
+ if (stream != QMAN_STREAMS) {
+ gaudi_handle_last_pqes_on_err(hdev, qid_base, stream,
+ qman_base, event_mask, true);
+ return;
+ }
+
+ /* handle Lower-CP */
+ gaudi_handle_sw_config_stream_data(hdev, stream, qman_base, event_mask);
+
+ for (i = 0; i < QMAN_STREAMS; i++)
+ gaudi_handle_last_pqes_on_err(hdev, qid_base, i,
+ qman_base, event_mask, false);
+}
+
+static void gaudi_handle_qman_err_generic(struct hl_device *hdev,
+ const char *qm_name,
+ u64 qman_base,
+ u32 qid_base,
+ u64 *event_mask)
+{
+ u32 i, j, glbl_sts_val, arb_err_val, glbl_sts_clr_val;
+ u64 glbl_sts_addr, arb_err_addr;
+ char reg_desc[32];
+
+ glbl_sts_addr = qman_base + (mmTPC0_QM_GLBL_STS1_0 - mmTPC0_QM_BASE);
+ arb_err_addr = qman_base + (mmTPC0_QM_ARB_ERR_CAUSE - mmTPC0_QM_BASE);
+
+ /* Iterate through all stream GLBL_STS1 registers + Lower CP */
+ for (i = 0 ; i < QMAN_STREAMS + 1 ; i++) {
+ glbl_sts_clr_val = 0;
+ glbl_sts_val = RREG32(glbl_sts_addr + 4 * i);
+
+ if (!glbl_sts_val)
+ continue;
+
+ if (i == QMAN_STREAMS)
+ snprintf(reg_desc, ARRAY_SIZE(reg_desc), "LowerCP");
+ else
+ snprintf(reg_desc, ARRAY_SIZE(reg_desc), "stream%u", i);
+
+ for (j = 0 ; j < GAUDI_NUM_OF_QM_ERR_CAUSE ; j++) {
+ if (glbl_sts_val & BIT(j)) {
+ dev_err_ratelimited(hdev->dev,
+ "%s %s. err cause: %s\n",
+ qm_name, reg_desc,
+ gaudi_qman_error_cause[j]);
+ glbl_sts_clr_val |= BIT(j);
+ }
+ }
+ /* check for undefined opcode */
+ if (glbl_sts_val & TPC0_QM_GLBL_STS1_CP_UNDEF_CMD_ERR_MASK &&
+ hdev->captured_err_info.undef_opcode.write_enable) {
+ memset(&hdev->captured_err_info.undef_opcode, 0,
+ sizeof(hdev->captured_err_info.undef_opcode));
+
+ hdev->captured_err_info.undef_opcode.write_enable = false;
+ *event_mask |= HL_NOTIFIER_EVENT_UNDEFINED_OPCODE;
+ }
+
+ /* Write 1 clear errors */
+ if (!hdev->stop_on_err)
+ WREG32(glbl_sts_addr + 4 * i, glbl_sts_clr_val);
+ else
+ handle_qman_data_on_err(hdev, qid_base, i, qman_base, *event_mask);
+ }
+
+ arb_err_val = RREG32(arb_err_addr);
+
+ if (!arb_err_val)
+ return;
+
+ for (j = 0 ; j < GAUDI_NUM_OF_QM_ARB_ERR_CAUSE ; j++) {
+ if (arb_err_val & BIT(j)) {
+ dev_err_ratelimited(hdev->dev,
+ "%s ARB_ERR. err cause: %s\n",
+ qm_name,
+ gaudi_qman_arb_error_cause[j]);
+ }
+ }
+}
+
+static void gaudi_print_sm_sei_info(struct hl_device *hdev, u16 event_type,
+ struct hl_eq_sm_sei_data *sei_data)
+{
+ u32 index = event_type - GAUDI_EVENT_DMA_IF_SEI_0;
+
+ /* Flip the bits as the enum is ordered in the opposite way */
+ index = (index ^ 0x3) & 0x3;
+
+ switch (sei_data->sei_cause) {
+ case SM_SEI_SO_OVERFLOW:
+ dev_err_ratelimited(hdev->dev,
+ "%s SEI Error: SOB Group %u overflow/underflow",
+ gaudi_sync_manager_names[index],
+ le32_to_cpu(sei_data->sei_log));
+ break;
+ case SM_SEI_LBW_4B_UNALIGNED:
+ dev_err_ratelimited(hdev->dev,
+ "%s SEI Error: Unaligned 4B LBW access, monitor agent address low - %#x",
+ gaudi_sync_manager_names[index],
+ le32_to_cpu(sei_data->sei_log));
+ break;
+ case SM_SEI_AXI_RESPONSE_ERR:
+ dev_err_ratelimited(hdev->dev,
+ "%s SEI Error: AXI ID %u response error",
+ gaudi_sync_manager_names[index],
+ le32_to_cpu(sei_data->sei_log));
+ break;
+ default:
+ dev_err_ratelimited(hdev->dev, "Unknown SM SEI cause %u",
+ le32_to_cpu(sei_data->sei_log));
+ break;
+ }
+}
+
+static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
+ struct hl_eq_ecc_data *ecc_data)
+{
+ struct ecc_info_extract_params params;
+ u64 ecc_address = 0, ecc_syndrom = 0;
+ u8 index, memory_wrapper_idx = 0;
+ bool extract_info_from_fw;
+ int rc;
+
+ if (hdev->asic_prop.fw_security_enabled) {
+ extract_info_from_fw = true;
+ goto extract_ecc_info;
+ }
+
+ switch (event_type) {
+ case GAUDI_EVENT_PCIE_CORE_SERR ... GAUDI_EVENT_PCIE_PHY_DERR:
+ case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_MMU_DERR:
+ extract_info_from_fw = true;
+ break;
+ case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
+ index = event_type - GAUDI_EVENT_TPC0_SERR;
+ params.block_address = mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
+ params.num_memories = 90;
+ params.derr = false;
+ extract_info_from_fw = false;
+ break;
+ case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
+ index = event_type - GAUDI_EVENT_TPC0_DERR;
+ params.block_address =
+ mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
+ params.num_memories = 90;
+ params.derr = true;
+ extract_info_from_fw = false;
+ break;
+ case GAUDI_EVENT_MME0_ACC_SERR:
+ case GAUDI_EVENT_MME1_ACC_SERR:
+ case GAUDI_EVENT_MME2_ACC_SERR:
+ case GAUDI_EVENT_MME3_ACC_SERR:
+ index = (event_type - GAUDI_EVENT_MME0_ACC_SERR) / 4;
+ params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
+ params.num_memories = 128;
+ params.derr = false;
+ extract_info_from_fw = false;
+ break;
+ case GAUDI_EVENT_MME0_ACC_DERR:
+ case GAUDI_EVENT_MME1_ACC_DERR:
+ case GAUDI_EVENT_MME2_ACC_DERR:
+ case GAUDI_EVENT_MME3_ACC_DERR:
+ index = (event_type - GAUDI_EVENT_MME0_ACC_DERR) / 4;
+ params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
+ params.num_memories = 128;
+ params.derr = true;
+ extract_info_from_fw = false;
+ break;
+ case GAUDI_EVENT_MME0_SBAB_SERR:
+ case GAUDI_EVENT_MME1_SBAB_SERR:
+ case GAUDI_EVENT_MME2_SBAB_SERR:
+ case GAUDI_EVENT_MME3_SBAB_SERR:
+ index = (event_type - GAUDI_EVENT_MME0_SBAB_SERR) / 4;
+ params.block_address =
+ mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
+ params.num_memories = 33;
+ params.derr = false;
+ extract_info_from_fw = false;
+ break;
+ case GAUDI_EVENT_MME0_SBAB_DERR:
+ case GAUDI_EVENT_MME1_SBAB_DERR:
+ case GAUDI_EVENT_MME2_SBAB_DERR:
+ case GAUDI_EVENT_MME3_SBAB_DERR:
+ index = (event_type - GAUDI_EVENT_MME0_SBAB_DERR) / 4;
+ params.block_address =
+ mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
+ params.num_memories = 33;
+ params.derr = true;
+ extract_info_from_fw = false;
+ break;
+ default:
+ return;
+ }
+
+extract_ecc_info:
+ if (extract_info_from_fw) {
+ ecc_address = le64_to_cpu(ecc_data->ecc_address);
+ ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
+ memory_wrapper_idx = ecc_data->memory_wrapper_idx;
+ } else {
+ rc = gaudi_extract_ecc_info(hdev, ¶ms, &ecc_address,
+ &ecc_syndrom, &memory_wrapper_idx);
+ if (rc)
+ return;
+ }
+
+ dev_err(hdev->dev,
+ "ECC error detected. address: %#llx. Syndrom: %#llx. block id %u\n",
+ ecc_address, ecc_syndrom, memory_wrapper_idx);
+}
+
+static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
+{
+ u64 qman_base;
+ char desc[32];
+ u32 qid_base;
+ u8 index;
+
+ switch (event_type) {
+ case GAUDI_EVENT_TPC0_QM ... GAUDI_EVENT_TPC7_QM:
+ index = event_type - GAUDI_EVENT_TPC0_QM;
+ qid_base = GAUDI_QUEUE_ID_TPC_0_0 + index * QMAN_STREAMS;
+ qman_base = mmTPC0_QM_BASE + index * TPC_QMAN_OFFSET;
+ snprintf(desc, ARRAY_SIZE(desc), "%s%d", "TPC_QM", index);
+ break;
+ case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
+ if (event_type == GAUDI_EVENT_MME0_QM) {
+ index = 0;
+ qid_base = GAUDI_QUEUE_ID_MME_0_0;
+ } else { /* event_type == GAUDI_EVENT_MME2_QM */
+ index = 2;
+ qid_base = GAUDI_QUEUE_ID_MME_1_0;
+ }
+ qman_base = mmMME0_QM_BASE + index * MME_QMAN_OFFSET;
+ snprintf(desc, ARRAY_SIZE(desc), "%s%d", "MME_QM", index);
+ break;
+ case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
+ index = event_type - GAUDI_EVENT_DMA0_QM;
+ qid_base = GAUDI_QUEUE_ID_DMA_0_0 + index * QMAN_STREAMS;
+ /* skip GAUDI_QUEUE_ID_CPU_PQ if necessary */
+ if (index > 1)
+ qid_base++;
+ qman_base = mmDMA0_QM_BASE + index * DMA_QMAN_OFFSET;
+ snprintf(desc, ARRAY_SIZE(desc), "%s%d", "DMA_QM", index);
+ break;
+ case GAUDI_EVENT_NIC0_QM0:
+ qid_base = GAUDI_QUEUE_ID_NIC_0_0;
+ qman_base = mmNIC0_QM0_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM0");
+ break;
+ case GAUDI_EVENT_NIC0_QM1:
+ qid_base = GAUDI_QUEUE_ID_NIC_1_0;
+ qman_base = mmNIC0_QM1_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC0_QM1");
+ break;
+ case GAUDI_EVENT_NIC1_QM0:
+ qid_base = GAUDI_QUEUE_ID_NIC_2_0;
+ qman_base = mmNIC1_QM0_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM0");
+ break;
+ case GAUDI_EVENT_NIC1_QM1:
+ qid_base = GAUDI_QUEUE_ID_NIC_3_0;
+ qman_base = mmNIC1_QM1_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC1_QM1");
+ break;
+ case GAUDI_EVENT_NIC2_QM0:
+ qid_base = GAUDI_QUEUE_ID_NIC_4_0;
+ qman_base = mmNIC2_QM0_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM0");
+ break;
+ case GAUDI_EVENT_NIC2_QM1:
+ qid_base = GAUDI_QUEUE_ID_NIC_5_0;
+ qman_base = mmNIC2_QM1_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC2_QM1");
+ break;
+ case GAUDI_EVENT_NIC3_QM0:
+ qid_base = GAUDI_QUEUE_ID_NIC_6_0;
+ qman_base = mmNIC3_QM0_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM0");
+ break;
+ case GAUDI_EVENT_NIC3_QM1:
+ qid_base = GAUDI_QUEUE_ID_NIC_7_0;
+ qman_base = mmNIC3_QM1_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC3_QM1");
+ break;
+ case GAUDI_EVENT_NIC4_QM0:
+ qid_base = GAUDI_QUEUE_ID_NIC_8_0;
+ qman_base = mmNIC4_QM0_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM0");
+ break;
+ case GAUDI_EVENT_NIC4_QM1:
+ qid_base = GAUDI_QUEUE_ID_NIC_9_0;
+ qman_base = mmNIC4_QM1_BASE;
+ snprintf(desc, ARRAY_SIZE(desc), "NIC4_QM1");
+ break;
+ default:
+ return;
+ }
+
+ gaudi_handle_qman_err_generic(hdev, desc, qman_base, qid_base, event_mask);
+}
+
+static void gaudi_print_irq_info(struct hl_device *hdev, u16 event_type,
+ bool razwi, u64 *event_mask)
+{
+ bool is_read = false, is_write = false;
+ u16 engine_id[2], num_of_razwi_eng = 0;
+ char desc[64] = "";
+ u64 razwi_addr = 0;
+ u8 razwi_flags = 0;
+
+ /*
+ * Init engine id by default as not valid and only if razwi initiated from engine with
+ * engine id it will get valid value.
+ */
+ engine_id[0] = HL_RAZWI_NA_ENG_ID;
+ engine_id[1] = HL_RAZWI_NA_ENG_ID;
+
+ gaudi_get_event_desc(event_type, desc, sizeof(desc));
+ dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
+ event_type, desc);
+
+ if (razwi) {
+ gaudi_print_and_get_razwi_info(hdev, &engine_id[0], &engine_id[1], &is_read,
+ &is_write);
+ gaudi_print_and_get_mmu_error_info(hdev, &razwi_addr, event_mask);
+
+ if (is_read)
+ razwi_flags |= HL_RAZWI_READ;
+ if (is_write)
+ razwi_flags |= HL_RAZWI_WRITE;
+
+ if (engine_id[0] != HL_RAZWI_NA_ENG_ID) {
+ if (engine_id[1] != HL_RAZWI_NA_ENG_ID)
+ num_of_razwi_eng = 2;
+ else
+ num_of_razwi_eng = 1;
+ }
+
+ hl_handle_razwi(hdev, razwi_addr, engine_id, num_of_razwi_eng, razwi_flags,
+ event_mask);
+ }
+}
+
+static void gaudi_print_out_of_sync_info(struct hl_device *hdev,
+ struct cpucp_pkt_sync_err *sync_err)
+{
+ struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI_QUEUE_ID_CPU_PQ];
+
+ dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
+ le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
+}
+
+static void gaudi_print_fw_alive_info(struct hl_device *hdev,
+ struct hl_eq_fw_alive *fw_alive)
+{
+ dev_err(hdev->dev,
+ "FW alive report: severity=%s, process_id=%u, thread_id=%u, uptime=%llu seconds\n",
+ (fw_alive->severity == FW_ALIVE_SEVERITY_MINOR) ? "Minor" : "Critical",
+ le32_to_cpu(fw_alive->process_id),
+ le32_to_cpu(fw_alive->thread_id),
+ le64_to_cpu(fw_alive->uptime_seconds));
+}
+
+static void gaudi_print_nic_axi_irq_info(struct hl_device *hdev, u16 event_type,
+ void *data)
+{
+ char desc[64] = "", *type;
+ struct eq_nic_sei_event *eq_nic_sei = data;
+ u16 nic_id = event_type - GAUDI_EVENT_NIC_SEI_0;
+
+ switch (eq_nic_sei->axi_error_cause) {
+ case RXB:
+ type = "RXB";
+ break;
+ case RXE:
+ type = "RXE";
+ break;
+ case TXS:
+ type = "TXS";
+ break;
+ case TXE:
+ type = "TXE";
+ break;
+ case QPC_RESP:
+ type = "QPC_RESP";
+ break;
+ case NON_AXI_ERR:
+ type = "NON_AXI_ERR";
+ break;
+ case TMR:
+ type = "TMR";
+ break;
+ default:
+ dev_err(hdev->dev, "unknown NIC AXI cause %d\n",
+ eq_nic_sei->axi_error_cause);
+ type = "N/A";
+ break;
+ }
+
+ snprintf(desc, sizeof(desc), "NIC%d_%s%d", nic_id, type,
+ eq_nic_sei->id);
+ dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
+ event_type, desc);
+}
+
+static int gaudi_compute_reset_late_init(struct hl_device *hdev)
+{
+ /* GAUDI doesn't support any reset except hard-reset */
+ return -EPERM;
+}
+
+static int gaudi_hbm_read_interrupts(struct hl_device *hdev, int device,
+ struct hl_eq_hbm_ecc_data *hbm_ecc_data)
+{
+ u32 base, val, val2, wr_par, rd_par, ca_par, derr, serr, type, ch;
+ int rc = 0;
+
+ if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
+ CPU_BOOT_DEV_STS0_HBM_ECC_EN) {
+ if (!hbm_ecc_data) {
+ dev_err(hdev->dev, "No FW ECC data");
+ return 0;
+ }
+
+ wr_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_WR_PAR_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+ rd_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_RD_PAR_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+ ca_par = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_CA_PAR_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+ derr = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_DERR_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+ serr = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_SERR_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+ type = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_TYPE_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+ ch = FIELD_GET(CPUCP_PKT_HBM_ECC_INFO_HBM_CH_MASK,
+ le32_to_cpu(hbm_ecc_data->hbm_ecc_info));
+
+ dev_err(hdev->dev,
+ "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
+ device, ch, wr_par, rd_par, ca_par, serr, derr);
+ dev_err(hdev->dev,
+ "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%u, SEC_CNT=%d, DEC_CNT=%d\n",
+ device, ch, hbm_ecc_data->first_addr, type,
+ hbm_ecc_data->sec_cont_cnt, hbm_ecc_data->sec_cnt,
+ hbm_ecc_data->dec_cnt);
+ return 0;
+ }
+
+ if (hdev->asic_prop.fw_security_enabled) {
+ dev_info(hdev->dev, "Cannot access MC regs for ECC data while security is enabled\n");
+ return 0;
+ }
+
+ base = GAUDI_HBM_CFG_BASE + device * GAUDI_HBM_CFG_OFFSET;
+ for (ch = 0 ; ch < GAUDI_HBM_CHANNELS ; ch++) {
+ val = RREG32_MASK(base + ch * 0x1000 + 0x06C, 0x0000FFFF);
+ val = (val & 0xFF) | ((val >> 8) & 0xFF);
+ if (val) {
+ rc = -EIO;
+ dev_err(hdev->dev,
+ "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
+ device, ch * 2, val & 0x1, (val >> 1) & 0x1,
+ (val >> 2) & 0x1, (val >> 3) & 0x1,
+ (val >> 4) & 0x1);
+
+ val2 = RREG32(base + ch * 0x1000 + 0x060);
+ dev_err(hdev->dev,
+ "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%d, SEC_CNT=%d, DEC_CNT=%d\n",
+ device, ch * 2,
+ RREG32(base + ch * 0x1000 + 0x064),
+ (val2 & 0x200) >> 9, (val2 & 0xFC00) >> 10,
+ (val2 & 0xFF0000) >> 16,
+ (val2 & 0xFF000000) >> 24);
+ }
+
+ val = RREG32_MASK(base + ch * 0x1000 + 0x07C, 0x0000FFFF);
+ val = (val & 0xFF) | ((val >> 8) & 0xFF);
+ if (val) {
+ rc = -EIO;
+ dev_err(hdev->dev,
+ "HBM%d pc%d interrupts info: WR_PAR=%d, RD_PAR=%d, CA_PAR=%d, SERR=%d, DERR=%d\n",
+ device, ch * 2 + 1, val & 0x1, (val >> 1) & 0x1,
+ (val >> 2) & 0x1, (val >> 3) & 0x1,
+ (val >> 4) & 0x1);
+
+ val2 = RREG32(base + ch * 0x1000 + 0x070);
+ dev_err(hdev->dev,
+ "HBM%d pc%d ECC info: 1ST_ERR_ADDR=0x%x, 1ST_ERR_TYPE=%d, SEC_CONT_CNT=%d, SEC_CNT=%d, DEC_CNT=%d\n",
+ device, ch * 2 + 1,
+ RREG32(base + ch * 0x1000 + 0x074),
+ (val2 & 0x200) >> 9, (val2 & 0xFC00) >> 10,
+ (val2 & 0xFF0000) >> 16,
+ (val2 & 0xFF000000) >> 24);
+ }
+
+ /* Clear interrupts */
+ RMWREG32(base + (ch * 0x1000) + 0x060, 0x1C8, 0x1FF);
+ RMWREG32(base + (ch * 0x1000) + 0x070, 0x1C8, 0x1FF);
+ WREG32(base + (ch * 0x1000) + 0x06C, 0x1F1F);
+ WREG32(base + (ch * 0x1000) + 0x07C, 0x1F1F);
+ RMWREG32(base + (ch * 0x1000) + 0x060, 0x0, 0xF);
+ RMWREG32(base + (ch * 0x1000) + 0x070, 0x0, 0xF);
+ }
+
+ val = RREG32(base + 0x8F30);
+ val2 = RREG32(base + 0x8F34);
+ if (val | val2) {
+ rc = -EIO;
+ dev_err(hdev->dev,
+ "HBM %d MC SRAM SERR info: Reg 0x8F30=0x%x, Reg 0x8F34=0x%x\n",
+ device, val, val2);
+ }
+ val = RREG32(base + 0x8F40);
+ val2 = RREG32(base + 0x8F44);
+ if (val | val2) {
+ rc = -EIO;
+ dev_err(hdev->dev,
+ "HBM %d MC SRAM DERR info: Reg 0x8F40=0x%x, Reg 0x8F44=0x%x\n",
+ device, val, val2);
+ }
+
+ return rc;
+}
+
+static int gaudi_hbm_event_to_dev(u16 hbm_event_type)
+{
+ switch (hbm_event_type) {
+ case GAUDI_EVENT_HBM0_SPI_0:
+ case GAUDI_EVENT_HBM0_SPI_1:
+ return 0;
+ case GAUDI_EVENT_HBM1_SPI_0:
+ case GAUDI_EVENT_HBM1_SPI_1:
+ return 1;
+ case GAUDI_EVENT_HBM2_SPI_0:
+ case GAUDI_EVENT_HBM2_SPI_1:
+ return 2;
+ case GAUDI_EVENT_HBM3_SPI_0:
+ case GAUDI_EVENT_HBM3_SPI_1:
+ return 3;
+ default:
+ break;
+ }
+
+ /* Should never happen */
+ return 0;
+}
+
+static bool gaudi_tpc_read_interrupts(struct hl_device *hdev, u8 tpc_id,
+ char *interrupt_name)
+{
+ u32 tpc_offset = tpc_id * TPC_CFG_OFFSET, tpc_interrupts_cause, i;
+ bool soft_reset_required = false;
+
+ tpc_interrupts_cause = RREG32(mmTPC0_CFG_TPC_INTR_CAUSE + tpc_offset) &
+ TPC0_CFG_TPC_INTR_CAUSE_CAUSE_MASK;
+
+ for (i = 0 ; i < GAUDI_NUM_OF_TPC_INTR_CAUSE ; i++)
+ if (tpc_interrupts_cause & BIT(i)) {
+ dev_err_ratelimited(hdev->dev,
+ "TPC%d_%s interrupt cause: %s\n",
+ tpc_id, interrupt_name,
+ gaudi_tpc_interrupts_cause[i]);
+ /* If this is QM error, we need to soft-reset */
+ if (i == 15)
+ soft_reset_required = true;
+ }
+
+ /* Clear interrupts */
+ WREG32(mmTPC0_CFG_TPC_INTR_CAUSE + tpc_offset, 0);
+
+ return soft_reset_required;
+}
+
+static int tpc_dec_event_to_tpc_id(u16 tpc_dec_event_type)
+{
+ return (tpc_dec_event_type - GAUDI_EVENT_TPC0_DEC) >> 1;
+}
+
+static int tpc_krn_event_to_tpc_id(u16 tpc_dec_event_type)
+{
+ return (tpc_dec_event_type - GAUDI_EVENT_TPC0_KRN_ERR) / 6;
+}
+
+static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
+{
+ ktime_t zero_time = ktime_set(0, 0);
+
+ mutex_lock(&hdev->clk_throttling.lock);
+
+ switch (event_type) {
+ case GAUDI_EVENT_FIX_POWER_ENV_S:
+ hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
+ dev_info_ratelimited(hdev->dev,
+ "Clock throttling due to power consumption\n");
+ break;
+
+ case GAUDI_EVENT_FIX_POWER_ENV_E:
+ hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
+ dev_info_ratelimited(hdev->dev,
+ "Power envelop is safe, back to optimal clock\n");
+ break;
+
+ case GAUDI_EVENT_FIX_THERMAL_ENV_S:
+ hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
+ *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ dev_info_ratelimited(hdev->dev,
+ "Clock throttling due to overheating\n");
+ break;
+
+ case GAUDI_EVENT_FIX_THERMAL_ENV_E:
+ hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
+ *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ dev_info_ratelimited(hdev->dev,
+ "Thermal envelop is safe, back to optimal clock\n");
+ break;
+
+ default:
+ dev_err(hdev->dev, "Received invalid clock change event %d\n",
+ event_type);
+ break;
+ }
+
+ mutex_unlock(&hdev->clk_throttling.lock);
+}
+
+static void gaudi_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u64 data = le64_to_cpu(eq_entry->data[0]), event_mask = 0;
+ u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
+ u32 fw_fatal_err_flag = 0, flags = 0;
+ u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
+ >> EQ_CTL_EVENT_TYPE_SHIFT);
+ bool reset_required, reset_direct = false;
+ u8 cause;
+ int rc;
+
+ if (event_type >= GAUDI_EVENT_SIZE) {
+ dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
+ event_type, GAUDI_EVENT_SIZE - 1);
+ return;
+ }
+
+ gaudi->events_stat[event_type]++;
+ gaudi->events_stat_aggregate[event_type]++;
+
+ switch (event_type) {
+ case GAUDI_EVENT_PCIE_CORE_DERR:
+ case GAUDI_EVENT_PCIE_IF_DERR:
+ case GAUDI_EVENT_PCIE_PHY_DERR:
+ case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
+ case GAUDI_EVENT_MME0_ACC_DERR:
+ case GAUDI_EVENT_MME0_SBAB_DERR:
+ case GAUDI_EVENT_MME1_ACC_DERR:
+ case GAUDI_EVENT_MME1_SBAB_DERR:
+ case GAUDI_EVENT_MME2_ACC_DERR:
+ case GAUDI_EVENT_MME2_SBAB_DERR:
+ case GAUDI_EVENT_MME3_ACC_DERR:
+ case GAUDI_EVENT_MME3_SBAB_DERR:
+ case GAUDI_EVENT_DMA0_DERR_ECC ... GAUDI_EVENT_DMA7_DERR_ECC:
+ fallthrough;
+ case GAUDI_EVENT_CPU_IF_ECC_DERR:
+ case GAUDI_EVENT_PSOC_MEM_DERR:
+ case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
+ case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
+ case GAUDI_EVENT_NIC0_DERR ... GAUDI_EVENT_NIC4_DERR:
+ case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
+ case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
+ case GAUDI_EVENT_MMU_DERR:
+ case GAUDI_EVENT_NIC0_CS_DBG_DERR ... GAUDI_EVENT_NIC4_CS_DBG_DERR:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
+ goto reset_device;
+
+ case GAUDI_EVENT_GIC500:
+ case GAUDI_EVENT_AXI_ECC:
+ case GAUDI_EVENT_L2_RAM_ECC:
+ case GAUDI_EVENT_PLL0 ... GAUDI_EVENT_PLL17:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ goto reset_device;
+
+ case GAUDI_EVENT_HBM0_SPI_0:
+ case GAUDI_EVENT_HBM1_SPI_0:
+ case GAUDI_EVENT_HBM2_SPI_0:
+ case GAUDI_EVENT_HBM3_SPI_0:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ gaudi_hbm_read_interrupts(hdev,
+ gaudi_hbm_event_to_dev(event_type),
+ &eq_entry->hbm_ecc_data);
+ fw_fatal_err_flag = HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ goto reset_device;
+
+ case GAUDI_EVENT_HBM0_SPI_1:
+ case GAUDI_EVENT_HBM1_SPI_1:
+ case GAUDI_EVENT_HBM2_SPI_1:
+ case GAUDI_EVENT_HBM3_SPI_1:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ gaudi_hbm_read_interrupts(hdev,
+ gaudi_hbm_event_to_dev(event_type),
+ &eq_entry->hbm_ecc_data);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI_EVENT_TPC0_DEC:
+ case GAUDI_EVENT_TPC1_DEC:
+ case GAUDI_EVENT_TPC2_DEC:
+ case GAUDI_EVENT_TPC3_DEC:
+ case GAUDI_EVENT_TPC4_DEC:
+ case GAUDI_EVENT_TPC5_DEC:
+ case GAUDI_EVENT_TPC6_DEC:
+ case GAUDI_EVENT_TPC7_DEC:
+ /* In TPC DEC event, notify on TPC assertion. While there isn't
+ * a specific event for assertion yet, the FW generates TPC DEC event.
+ * The SW upper layer will inspect an internal mapped area to indicate
+ * if the event is a TPC Assertion or a "real" TPC DEC.
+ */
+ event_mask |= HL_NOTIFIER_EVENT_TPC_ASSERT;
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ reset_required = gaudi_tpc_read_interrupts(hdev,
+ tpc_dec_event_to_tpc_id(event_type),
+ "AXI_SLV_DEC_Error");
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ if (reset_required) {
+ dev_err(hdev->dev, "reset required due to %s\n",
+ gaudi_irq_map_table[event_type].name);
+
+ reset_direct = true;
+ goto reset_device;
+ } else {
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+ }
+ break;
+
+ case GAUDI_EVENT_TPC0_KRN_ERR:
+ case GAUDI_EVENT_TPC1_KRN_ERR:
+ case GAUDI_EVENT_TPC2_KRN_ERR:
+ case GAUDI_EVENT_TPC3_KRN_ERR:
+ case GAUDI_EVENT_TPC4_KRN_ERR:
+ case GAUDI_EVENT_TPC5_KRN_ERR:
+ case GAUDI_EVENT_TPC6_KRN_ERR:
+ case GAUDI_EVENT_TPC7_KRN_ERR:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ reset_required = gaudi_tpc_read_interrupts(hdev,
+ tpc_krn_event_to_tpc_id(event_type),
+ "KRN_ERR");
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ if (reset_required) {
+ dev_err(hdev->dev, "reset required due to %s\n",
+ gaudi_irq_map_table[event_type].name);
+
+ reset_direct = true;
+ goto reset_device;
+ } else {
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+ }
+ break;
+
+ case GAUDI_EVENT_PCIE_CORE_SERR:
+ case GAUDI_EVENT_PCIE_IF_SERR:
+ case GAUDI_EVENT_PCIE_PHY_SERR:
+ case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
+ case GAUDI_EVENT_MME0_ACC_SERR:
+ case GAUDI_EVENT_MME0_SBAB_SERR:
+ case GAUDI_EVENT_MME1_ACC_SERR:
+ case GAUDI_EVENT_MME1_SBAB_SERR:
+ case GAUDI_EVENT_MME2_ACC_SERR:
+ case GAUDI_EVENT_MME2_SBAB_SERR:
+ case GAUDI_EVENT_MME3_ACC_SERR:
+ case GAUDI_EVENT_MME3_SBAB_SERR:
+ case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_DMA7_SERR_ECC:
+ case GAUDI_EVENT_CPU_IF_ECC_SERR:
+ case GAUDI_EVENT_PSOC_MEM_SERR:
+ case GAUDI_EVENT_PSOC_CORESIGHT_SERR:
+ case GAUDI_EVENT_SRAM0_SERR ... GAUDI_EVENT_SRAM28_SERR:
+ case GAUDI_EVENT_NIC0_SERR ... GAUDI_EVENT_NIC4_SERR:
+ case GAUDI_EVENT_DMA_IF0_SERR ... GAUDI_EVENT_DMA_IF3_SERR:
+ case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
+ fallthrough;
+ case GAUDI_EVENT_MMU_SERR:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI_EVENT_PCIE_DEC:
+ case GAUDI_EVENT_CPU_AXI_SPLITTER:
+ case GAUDI_EVENT_PSOC_AXI_DEC:
+ case GAUDI_EVENT_PSOC_PRSTN_FALL:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI_EVENT_MMU_PAGE_FAULT:
+ case GAUDI_EVENT_MMU_WR_PERM:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI_EVENT_MME0_WBC_RSP:
+ case GAUDI_EVENT_MME0_SBAB0_RSP:
+ case GAUDI_EVENT_MME1_WBC_RSP:
+ case GAUDI_EVENT_MME1_SBAB0_RSP:
+ case GAUDI_EVENT_MME2_WBC_RSP:
+ case GAUDI_EVENT_MME2_SBAB0_RSP:
+ case GAUDI_EVENT_MME3_WBC_RSP:
+ case GAUDI_EVENT_MME3_SBAB0_RSP:
+ case GAUDI_EVENT_RAZWI_OR_ADC:
+ case GAUDI_EVENT_MME0_QM ... GAUDI_EVENT_MME2_QM:
+ case GAUDI_EVENT_DMA0_QM ... GAUDI_EVENT_DMA7_QM:
+ fallthrough;
+ case GAUDI_EVENT_NIC0_QM0:
+ case GAUDI_EVENT_NIC0_QM1:
+ case GAUDI_EVENT_NIC1_QM0:
+ case GAUDI_EVENT_NIC1_QM1:
+ case GAUDI_EVENT_NIC2_QM0:
+ case GAUDI_EVENT_NIC2_QM1:
+ case GAUDI_EVENT_NIC3_QM0:
+ case GAUDI_EVENT_NIC3_QM1:
+ case GAUDI_EVENT_NIC4_QM0:
+ case GAUDI_EVENT_NIC4_QM1:
+ case GAUDI_EVENT_DMA0_CORE ... GAUDI_EVENT_DMA7_CORE:
+ case GAUDI_EVENT_TPC0_QM ... GAUDI_EVENT_TPC7_QM:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ gaudi_handle_qman_err(hdev, event_type, &event_mask);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= (HL_NOTIFIER_EVENT_USER_ENGINE_ERR | HL_NOTIFIER_EVENT_DEVICE_RESET);
+ break;
+
+ case GAUDI_EVENT_RAZWI_OR_ADC_SW:
+ gaudi_print_irq_info(hdev, event_type, true, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ goto reset_device;
+
+ case GAUDI_EVENT_TPC0_BMON_SPMU:
+ case GAUDI_EVENT_TPC1_BMON_SPMU:
+ case GAUDI_EVENT_TPC2_BMON_SPMU:
+ case GAUDI_EVENT_TPC3_BMON_SPMU:
+ case GAUDI_EVENT_TPC4_BMON_SPMU:
+ case GAUDI_EVENT_TPC5_BMON_SPMU:
+ case GAUDI_EVENT_TPC6_BMON_SPMU:
+ case GAUDI_EVENT_TPC7_BMON_SPMU:
+ case GAUDI_EVENT_DMA_BM_CH0 ... GAUDI_EVENT_DMA_BM_CH7:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI_EVENT_NIC_SEI_0 ... GAUDI_EVENT_NIC_SEI_4:
+ gaudi_print_nic_axi_irq_info(hdev, event_type, &data);
+ hl_fw_unmask_irq(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI_EVENT_DMA_IF_SEI_0 ... GAUDI_EVENT_DMA_IF_SEI_3:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ gaudi_print_sm_sei_info(hdev, event_type,
+ &eq_entry->sm_sei_data);
+ rc = hl_state_dump(hdev);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ if (rc)
+ dev_err(hdev->dev,
+ "Error during system state dump %d\n", rc);
+ hl_fw_unmask_irq(hdev, event_type);
+ break;
+
+ case GAUDI_EVENT_STATUS_NIC0_ENG0 ... GAUDI_EVENT_STATUS_NIC4_ENG1:
+ break;
+
+ case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
+ gaudi_print_clk_change_info(hdev, event_type, &event_mask);
+ hl_fw_unmask_irq(hdev, event_type);
+ break;
+
+ case GAUDI_EVENT_PSOC_GPIO_U16_0:
+ cause = le64_to_cpu(eq_entry->data[0]) & 0xFF;
+ dev_err(hdev->dev,
+ "Received high temp H/W interrupt %d (cause %d)\n",
+ event_type, cause);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI_EVENT_DEV_RESET_REQ:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ goto reset_device;
+
+ case GAUDI_EVENT_PKT_QUEUE_OUT_SYNC:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ gaudi_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ goto reset_device;
+
+ case GAUDI_EVENT_FW_ALIVE_S:
+ gaudi_print_irq_info(hdev, event_type, false, &event_mask);
+ gaudi_print_fw_alive_info(hdev, &eq_entry->fw_alive);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ goto reset_device;
+
+ default:
+ dev_err(hdev->dev, "Received invalid H/W interrupt %d\n",
+ event_type);
+ break;
+ }
+
+ if (event_mask)
+ hl_notifier_event_send_all(hdev, event_mask);
+
+ return;
+
+reset_device:
+ reset_required = true;
+
+ if (hdev->asic_prop.fw_security_enabled && !reset_direct) {
+ flags = HL_DRV_RESET_HARD | HL_DRV_RESET_BYPASS_REQ_TO_FW | fw_fatal_err_flag;
+
+ /* notify on device unavailable while the reset triggered by fw */
+ event_mask |= (HL_NOTIFIER_EVENT_DEVICE_RESET |
+ HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE);
+ } else if (hdev->hard_reset_on_fw_events) {
+ flags = HL_DRV_RESET_HARD | HL_DRV_RESET_DELAY | fw_fatal_err_flag;
+ event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+ } else {
+ reset_required = false;
+ }
+
+ if (reset_required) {
+ hl_device_cond_reset(hdev, flags, event_mask);
+ } else {
+ hl_fw_unmask_irq(hdev, event_type);
+ /* Notification on occurred event needs to be sent although reset is not executed */
+ if (event_mask)
+ hl_notifier_event_send_all(hdev, event_mask);
+ }
+}
+
+static void *gaudi_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (aggregate) {
+ *size = (u32) sizeof(gaudi->events_stat_aggregate);
+ return gaudi->events_stat_aggregate;
+ }
+
+ *size = (u32) sizeof(gaudi->events_stat);
+ return gaudi->events_stat;
+}
+
+static int gaudi_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard, u32 flags)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ u32 status, timeout_usec;
+ int rc;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MMU) ||
+ hdev->reset_info.hard_reset_pending)
+ return 0;
+
+ if (hdev->pldm)
+ timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
+ else
+ timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
+
+ /* L0 & L1 invalidation */
+ WREG32(mmSTLB_INV_PS, 3);
+ WREG32(mmSTLB_CACHE_INV, gaudi->mmu_cache_inv_pi++);
+ WREG32(mmSTLB_INV_PS, 2);
+
+ rc = hl_poll_timeout(
+ hdev,
+ mmSTLB_INV_PS,
+ status,
+ !status,
+ 1000,
+ timeout_usec);
+
+ WREG32(mmSTLB_INV_SET, 0);
+
+ return rc;
+}
+
+static int gaudi_mmu_invalidate_cache_range(struct hl_device *hdev,
+ bool is_hard, u32 flags,
+ u32 asid, u64 va, u64 size)
+{
+ /* Treat as invalidate all because there is no range invalidation
+ * in Gaudi
+ */
+ return hdev->asic_funcs->mmu_invalidate_cache(hdev, is_hard, flags);
+}
+
+static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid, u64 phys_addr)
+{
+ u32 status, timeout_usec;
+ int rc;
+
+ if (hdev->pldm)
+ timeout_usec = GAUDI_PLDM_MMU_TIMEOUT_USEC;
+ else
+ timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
+
+ WREG32(MMU_ASID, asid);
+ WREG32(MMU_HOP0_PA43_12, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
+ WREG32(MMU_HOP0_PA49_44, phys_addr >> MMU_HOP0_PA49_44_SHIFT);
+ WREG32(MMU_BUSY, 0x80000000);
+
+ rc = hl_poll_timeout(
+ hdev,
+ MMU_BUSY,
+ status,
+ !(status & 0x80000000),
+ 1000,
+ timeout_usec);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Timeout during MMU hop0 config of asid %d\n", asid);
+ return rc;
+ }
+
+ return 0;
+}
+
+static int gaudi_send_heartbeat(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_send_heartbeat(hdev);
+}
+
+static int gaudi_cpucp_info_get(struct hl_device *hdev)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int rc;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0,
+ mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
+ mmCPU_BOOT_ERR1);
+ if (rc)
+ return rc;
+
+ if (!strlen(prop->cpucp_info.card_name))
+ strncpy(prop->cpucp_info.card_name, GAUDI_DEFAULT_CARD_NAME,
+ CARD_NAME_MAX_LEN);
+
+ hdev->card_type = le32_to_cpu(hdev->asic_prop.cpucp_info.card_type);
+
+ set_default_power_values(hdev);
+
+ return 0;
+}
+
+static bool gaudi_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+ struct engines_data *e)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ const char *fmt = "%-5d%-9s%#-14x%#-12x%#x\n";
+ const char *mme_slave_fmt = "%-5d%-9s%-14s%-12s%#x\n";
+ const char *nic_fmt = "%-5d%-9s%#-14x%#x\n";
+ unsigned long *mask = (unsigned long *)mask_arr;
+ u32 qm_glbl_sts0, qm_cgm_sts, dma_core_sts0, tpc_cfg_sts, mme_arch_sts;
+ bool is_idle = true, is_eng_idle, is_slave;
+ u64 offset;
+ int i, dma_id, port;
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nDMA is_idle QM_GLBL_STS0 QM_CGM_STS DMA_CORE_STS0\n"
+ "--- ------- ------------ ---------- -------------\n");
+
+ for (i = 0 ; i < DMA_NUMBER_OF_CHNLS ; i++) {
+ dma_id = gaudi_dma_assignment[i];
+ offset = dma_id * DMA_QMAN_OFFSET;
+
+ qm_glbl_sts0 = RREG32(mmDMA0_QM_GLBL_STS0 + offset);
+ qm_cgm_sts = RREG32(mmDMA0_QM_CGM_STS + offset);
+ dma_core_sts0 = RREG32(mmDMA0_CORE_STS0 + offset);
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
+ IS_DMA_IDLE(dma_core_sts0);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GAUDI_ENGINE_ID_DMA_0 + dma_id, mask);
+ if (e)
+ hl_engine_data_sprintf(e, fmt, dma_id,
+ is_eng_idle ? "Y" : "N", qm_glbl_sts0,
+ qm_cgm_sts, dma_core_sts0);
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nTPC is_idle QM_GLBL_STS0 QM_CGM_STS CFG_STATUS\n"
+ "--- ------- ------------ ---------- ----------\n");
+
+ for (i = 0 ; i < TPC_NUMBER_OF_ENGINES ; i++) {
+ offset = i * TPC_QMAN_OFFSET;
+ qm_glbl_sts0 = RREG32(mmTPC0_QM_GLBL_STS0 + offset);
+ qm_cgm_sts = RREG32(mmTPC0_QM_CGM_STS + offset);
+ tpc_cfg_sts = RREG32(mmTPC0_CFG_STATUS + offset);
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts) &&
+ IS_TPC_IDLE(tpc_cfg_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GAUDI_ENGINE_ID_TPC_0 + i, mask);
+ if (e)
+ hl_engine_data_sprintf(e, fmt, i,
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nMME is_idle QM_GLBL_STS0 QM_CGM_STS ARCH_STATUS\n"
+ "--- ------- ------------ ---------- -----------\n");
+
+ for (i = 0 ; i < MME_NUMBER_OF_ENGINES ; i++) {
+ offset = i * MME_QMAN_OFFSET;
+ mme_arch_sts = RREG32(mmMME0_CTRL_ARCH_STATUS + offset);
+ is_eng_idle = IS_MME_IDLE(mme_arch_sts);
+
+ /* MME 1 & 3 are slaves, no need to check their QMANs */
+ is_slave = i % 2;
+ if (!is_slave) {
+ qm_glbl_sts0 = RREG32(mmMME0_QM_GLBL_STS0 + offset);
+ qm_cgm_sts = RREG32(mmMME0_QM_CGM_STS + offset);
+ is_eng_idle &= IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+ }
+
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GAUDI_ENGINE_ID_MME_0 + i, mask);
+ if (e) {
+ if (!is_slave)
+ hl_engine_data_sprintf(e, fmt, i,
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts, mme_arch_sts);
+ else
+ hl_engine_data_sprintf(e, mme_slave_fmt, i,
+ is_eng_idle ? "Y" : "N", "-",
+ "-", mme_arch_sts);
+ }
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nNIC is_idle QM_GLBL_STS0 QM_CGM_STS\n"
+ "--- ------- ------------ ----------\n");
+
+ for (i = 0 ; i < (NIC_NUMBER_OF_ENGINES / 2) ; i++) {
+ offset = i * NIC_MACRO_QMAN_OFFSET;
+ port = 2 * i;
+ if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + port)) {
+ qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
+ qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
+ if (e)
+ hl_engine_data_sprintf(e, nic_fmt, port,
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts);
+ }
+
+ port = 2 * i + 1;
+ if (gaudi->hw_cap_initialized & BIT(HW_CAP_NIC_SHIFT + port)) {
+ qm_glbl_sts0 = RREG32(mmNIC0_QM1_GLBL_STS0 + offset);
+ qm_cgm_sts = RREG32(mmNIC0_QM1_CGM_STS + offset);
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_cgm_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GAUDI_ENGINE_ID_NIC_0 + port, mask);
+ if (e)
+ hl_engine_data_sprintf(e, nic_fmt, port,
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts);
+ }
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e, "\n");
+
+ return is_idle;
+}
+
+static void gaudi_hw_queues_lock(struct hl_device *hdev)
+ __acquires(&gaudi->hw_queues_lock)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ spin_lock(&gaudi->hw_queues_lock);
+}
+
+static void gaudi_hw_queues_unlock(struct hl_device *hdev)
+ __releases(&gaudi->hw_queues_lock)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ spin_unlock(&gaudi->hw_queues_lock);
+}
+
+static u32 gaudi_get_pci_id(struct hl_device *hdev)
+{
+ return hdev->pdev->device;
+}
+
+static int gaudi_get_eeprom_data(struct hl_device *hdev, void *data,
+ size_t max_size)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_get_eeprom_data(hdev, data, max_size);
+}
+
+static int gaudi_get_monitor_dump(struct hl_device *hdev, void *data)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_get_monitor_dump(hdev, data);
+}
+
+/*
+ * this function should be used only during initialization and/or after reset,
+ * when there are no active users.
+ */
+static int gaudi_run_tpc_kernel(struct hl_device *hdev, u64 tpc_kernel, u32 tpc_id)
+{
+ u64 kernel_timeout;
+ u32 status, offset;
+ int rc;
+
+ offset = tpc_id * (mmTPC1_CFG_STATUS - mmTPC0_CFG_STATUS);
+
+ if (hdev->pldm)
+ kernel_timeout = GAUDI_PLDM_TPC_KERNEL_WAIT_USEC;
+ else
+ kernel_timeout = HL_DEVICE_TIMEOUT_USEC;
+
+ WREG32(mmTPC0_CFG_QM_KERNEL_BASE_ADDRESS_LOW + offset,
+ lower_32_bits(tpc_kernel));
+ WREG32(mmTPC0_CFG_QM_KERNEL_BASE_ADDRESS_HIGH + offset,
+ upper_32_bits(tpc_kernel));
+
+ WREG32(mmTPC0_CFG_ICACHE_BASE_ADDERESS_LOW + offset,
+ lower_32_bits(tpc_kernel));
+ WREG32(mmTPC0_CFG_ICACHE_BASE_ADDERESS_HIGH + offset,
+ upper_32_bits(tpc_kernel));
+ /* set a valid LUT pointer, content is of no significance */
+ WREG32(mmTPC0_CFG_LUT_FUNC256_BASE_ADDR_LO + offset,
+ lower_32_bits(tpc_kernel));
+ WREG32(mmTPC0_CFG_LUT_FUNC256_BASE_ADDR_HI + offset,
+ upper_32_bits(tpc_kernel));
+
+ WREG32(mmTPC0_CFG_QM_SYNC_OBJECT_ADDR + offset,
+ lower_32_bits(CFG_BASE +
+ mmSYNC_MNGR_E_N_SYNC_MNGR_OBJS_SOB_OBJ_0));
+
+ WREG32(mmTPC0_CFG_TPC_CMD + offset,
+ (1 << TPC0_CFG_TPC_CMD_ICACHE_INVALIDATE_SHIFT |
+ 1 << TPC0_CFG_TPC_CMD_ICACHE_PREFETCH_64KB_SHIFT));
+ /* wait a bit for the engine to start executing */
+ usleep_range(1000, 1500);
+
+ /* wait until engine has finished executing */
+ rc = hl_poll_timeout(
+ hdev,
+ mmTPC0_CFG_STATUS + offset,
+ status,
+ (status & TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK) ==
+ TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK,
+ 1000,
+ kernel_timeout);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Timeout while waiting for TPC%d icache prefetch\n",
+ tpc_id);
+ return -EIO;
+ }
+
+ WREG32(mmTPC0_CFG_TPC_EXECUTE + offset,
+ 1 << TPC0_CFG_TPC_EXECUTE_V_SHIFT);
+
+ /* wait a bit for the engine to start executing */
+ usleep_range(1000, 1500);
+
+ /* wait until engine has finished executing */
+ rc = hl_poll_timeout(
+ hdev,
+ mmTPC0_CFG_STATUS + offset,
+ status,
+ (status & TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK) ==
+ TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK,
+ 1000,
+ kernel_timeout);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Timeout while waiting for TPC%d vector pipe\n",
+ tpc_id);
+ return -EIO;
+ }
+
+ rc = hl_poll_timeout(
+ hdev,
+ mmTPC0_CFG_WQ_INFLIGHT_CNTR + offset,
+ status,
+ (status == 0),
+ 1000,
+ kernel_timeout);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Timeout while waiting for TPC%d kernel to execute\n",
+ tpc_id);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int gaudi_internal_cb_pool_init(struct hl_device *hdev,
+ struct hl_ctx *ctx)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+ int min_alloc_order, rc, collective_cb_size;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+ return 0;
+
+ hdev->internal_cb_pool_virt_addr = hl_asic_dma_alloc_coherent(hdev,
+ HOST_SPACE_INTERNAL_CB_SZ,
+ &hdev->internal_cb_pool_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+
+ if (!hdev->internal_cb_pool_virt_addr)
+ return -ENOMEM;
+
+ collective_cb_size = sizeof(struct packet_msg_short) * 5 +
+ sizeof(struct packet_fence);
+ min_alloc_order = ilog2(collective_cb_size);
+
+ hdev->internal_cb_pool = gen_pool_create(min_alloc_order, -1);
+ if (!hdev->internal_cb_pool) {
+ dev_err(hdev->dev,
+ "Failed to create internal CB pool\n");
+ rc = -ENOMEM;
+ goto free_internal_cb_pool;
+ }
+
+ rc = gen_pool_add(hdev->internal_cb_pool,
+ (uintptr_t) hdev->internal_cb_pool_virt_addr,
+ HOST_SPACE_INTERNAL_CB_SZ, -1);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to add memory to internal CB pool\n");
+ rc = -EFAULT;
+ goto destroy_internal_cb_pool;
+ }
+
+ hdev->internal_cb_va_base = hl_reserve_va_block(hdev, ctx,
+ HL_VA_RANGE_TYPE_HOST, HOST_SPACE_INTERNAL_CB_SZ,
+ HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
+
+ if (!hdev->internal_cb_va_base) {
+ rc = -ENOMEM;
+ goto destroy_internal_cb_pool;
+ }
+
+ mutex_lock(&hdev->mmu_lock);
+ rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base,
+ hdev->internal_cb_pool_dma_addr,
+ HOST_SPACE_INTERNAL_CB_SZ);
+
+ hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR);
+ mutex_unlock(&hdev->mmu_lock);
+
+ if (rc)
+ goto unreserve_internal_cb_pool;
+
+ return 0;
+
+unreserve_internal_cb_pool:
+ hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base,
+ HOST_SPACE_INTERNAL_CB_SZ);
+destroy_internal_cb_pool:
+ gen_pool_destroy(hdev->internal_cb_pool);
+free_internal_cb_pool:
+ hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
+ hdev->internal_cb_pool_dma_addr);
+
+ return rc;
+}
+
+static void gaudi_internal_cb_pool_fini(struct hl_device *hdev,
+ struct hl_ctx *ctx)
+{
+ struct gaudi_device *gaudi = hdev->asic_specific;
+
+ if (!(gaudi->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ mutex_lock(&hdev->mmu_lock);
+ hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base,
+ HOST_SPACE_INTERNAL_CB_SZ);
+ hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base,
+ HOST_SPACE_INTERNAL_CB_SZ);
+ hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
+ mutex_unlock(&hdev->mmu_lock);
+
+ gen_pool_destroy(hdev->internal_cb_pool);
+
+ hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
+ hdev->internal_cb_pool_dma_addr);
+}
+
+static int gaudi_ctx_init(struct hl_ctx *ctx)
+{
+ int rc;
+
+ if (ctx->asid == HL_KERNEL_ASID_ID)
+ return 0;
+
+ rc = gaudi_internal_cb_pool_init(ctx->hdev, ctx);
+ if (rc)
+ return rc;
+
+ rc = gaudi_restore_user_registers(ctx->hdev);
+ if (rc)
+ gaudi_internal_cb_pool_fini(ctx->hdev, ctx);
+
+ return rc;
+}
+
+static void gaudi_ctx_fini(struct hl_ctx *ctx)
+{
+ if (ctx->asid == HL_KERNEL_ASID_ID)
+ return;
+
+ gaudi_internal_cb_pool_fini(ctx->hdev, ctx);
+}
+
+static int gaudi_pre_schedule_cs(struct hl_cs *cs)
+{
+ return 0;
+}
+
+static u32 gaudi_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
+{
+ return gaudi_cq_assignment[cq_idx];
+}
+
+static u32 gaudi_get_signal_cb_size(struct hl_device *hdev)
+{
+ return sizeof(struct packet_msg_short) +
+ sizeof(struct packet_msg_prot) * 2;
+}
+
+static u32 gaudi_get_wait_cb_size(struct hl_device *hdev)
+{
+ return sizeof(struct packet_msg_short) * 4 +
+ sizeof(struct packet_fence) +
+ sizeof(struct packet_msg_prot) * 2;
+}
+
+static u32 gaudi_get_sob_addr(struct hl_device *hdev, u32 sob_id)
+{
+ return mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 + (sob_id * 4);
+}
+
+static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+ u32 size, bool eb)
+{
+ struct hl_cb *cb = (struct hl_cb *) data;
+ struct packet_msg_short *pkt;
+ u32 value, ctl, pkt_size = sizeof(*pkt);
+
+ pkt = cb->kernel_address + size;
+ memset(pkt, 0, pkt_size);
+
+ /* Inc by 1, Mode ADD */
+ value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
+ value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_MOD_MASK, 1);
+
+ ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
+ ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
+ ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 3); /* W_S SOB base */
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, eb);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ pkt->value = cpu_to_le32(value);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return size + pkt_size;
+}
+
+static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
+ u16 addr)
+{
+ u32 ctl, pkt_size = sizeof(*pkt);
+
+ memset(pkt, 0, pkt_size);
+
+ ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, addr);
+ ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 0); /* last pkt MB */
+
+ pkt->value = cpu_to_le32(value);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return pkt_size;
+}
+
+static u32 gaudi_add_arm_monitor_pkt(struct hl_device *hdev,
+ struct packet_msg_short *pkt, u16 sob_base, u8 sob_mask,
+ u16 sob_val, u16 mon_id)
+{
+ u64 monitor_base;
+ u32 ctl, value, pkt_size = sizeof(*pkt);
+ u16 msg_addr_offset;
+ u8 mask;
+
+ if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
+ dev_err(hdev->dev,
+ "sob_base %u (mask %#x) is not valid\n",
+ sob_base, sob_mask);
+ return 0;
+ }
+
+ /*
+ * monitor_base should be the content of the base0 address registers,
+ * so it will be added to the msg short offsets
+ */
+ monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
+
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
+ monitor_base;
+
+ memset(pkt, 0, pkt_size);
+
+ /* Monitor config packet: bind the monitor to a sync object */
+ value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
+ value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
+ value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MODE_MASK,
+ 0); /* GREATER OR EQUAL*/
+ value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MASK_MASK, mask);
+
+ ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, msg_addr_offset);
+ ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
+ ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ pkt->value = cpu_to_le32(value);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return pkt_size;
+}
+
+static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
+{
+ u32 ctl, cfg, pkt_size = sizeof(*pkt);
+
+ memset(pkt, 0, pkt_size);
+
+ cfg = FIELD_PREP(GAUDI_PKT_FENCE_CFG_DEC_VAL_MASK, 1);
+ cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
+ cfg |= FIELD_PREP(GAUDI_PKT_FENCE_CFG_ID_MASK, 2);
+
+ ctl = FIELD_PREP(GAUDI_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_EB_MASK, 0);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_RB_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI_PKT_CTL_MB_MASK, 1);
+
+ pkt->cfg = cpu_to_le32(cfg);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return pkt_size;
+}
+
+static int gaudi_get_fence_addr(struct hl_device *hdev, u32 queue_id, u64 *addr)
+{
+ u32 offset, nic_index;
+
+ switch (queue_id) {
+ case GAUDI_QUEUE_ID_DMA_0_0:
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_0;
+ break;
+ case GAUDI_QUEUE_ID_DMA_0_1:
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_1;
+ break;
+ case GAUDI_QUEUE_ID_DMA_0_2:
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_2;
+ break;
+ case GAUDI_QUEUE_ID_DMA_0_3:
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_3;
+ break;
+ case GAUDI_QUEUE_ID_DMA_1_0:
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_0;
+ break;
+ case GAUDI_QUEUE_ID_DMA_1_1:
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_1;
+ break;
+ case GAUDI_QUEUE_ID_DMA_1_2:
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_2;
+ break;
+ case GAUDI_QUEUE_ID_DMA_1_3:
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_3;
+ break;
+ case GAUDI_QUEUE_ID_DMA_5_0:
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_0;
+ break;
+ case GAUDI_QUEUE_ID_DMA_5_1:
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_1;
+ break;
+ case GAUDI_QUEUE_ID_DMA_5_2:
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_2;
+ break;
+ case GAUDI_QUEUE_ID_DMA_5_3:
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_3;
+ break;
+ case GAUDI_QUEUE_ID_TPC_7_0:
+ offset = mmTPC7_QM_CP_FENCE2_RDATA_0;
+ break;
+ case GAUDI_QUEUE_ID_TPC_7_1:
+ offset = mmTPC7_QM_CP_FENCE2_RDATA_1;
+ break;
+ case GAUDI_QUEUE_ID_TPC_7_2:
+ offset = mmTPC7_QM_CP_FENCE2_RDATA_2;
+ break;
+ case GAUDI_QUEUE_ID_TPC_7_3:
+ offset = mmTPC7_QM_CP_FENCE2_RDATA_3;
+ break;
+ case GAUDI_QUEUE_ID_NIC_0_0:
+ case GAUDI_QUEUE_ID_NIC_1_0:
+ case GAUDI_QUEUE_ID_NIC_2_0:
+ case GAUDI_QUEUE_ID_NIC_3_0:
+ case GAUDI_QUEUE_ID_NIC_4_0:
+ case GAUDI_QUEUE_ID_NIC_5_0:
+ case GAUDI_QUEUE_ID_NIC_6_0:
+ case GAUDI_QUEUE_ID_NIC_7_0:
+ case GAUDI_QUEUE_ID_NIC_8_0:
+ case GAUDI_QUEUE_ID_NIC_9_0:
+ nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_0) >> 2;
+ offset = mmNIC0_QM0_CP_FENCE2_RDATA_0 +
+ (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
+ (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+ break;
+ case GAUDI_QUEUE_ID_NIC_0_1:
+ case GAUDI_QUEUE_ID_NIC_1_1:
+ case GAUDI_QUEUE_ID_NIC_2_1:
+ case GAUDI_QUEUE_ID_NIC_3_1:
+ case GAUDI_QUEUE_ID_NIC_4_1:
+ case GAUDI_QUEUE_ID_NIC_5_1:
+ case GAUDI_QUEUE_ID_NIC_6_1:
+ case GAUDI_QUEUE_ID_NIC_7_1:
+ case GAUDI_QUEUE_ID_NIC_8_1:
+ case GAUDI_QUEUE_ID_NIC_9_1:
+ nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_1) >> 2;
+ offset = mmNIC0_QM0_CP_FENCE2_RDATA_1 +
+ (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
+ (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+ break;
+ case GAUDI_QUEUE_ID_NIC_0_2:
+ case GAUDI_QUEUE_ID_NIC_1_2:
+ case GAUDI_QUEUE_ID_NIC_2_2:
+ case GAUDI_QUEUE_ID_NIC_3_2:
+ case GAUDI_QUEUE_ID_NIC_4_2:
+ case GAUDI_QUEUE_ID_NIC_5_2:
+ case GAUDI_QUEUE_ID_NIC_6_2:
+ case GAUDI_QUEUE_ID_NIC_7_2:
+ case GAUDI_QUEUE_ID_NIC_8_2:
+ case GAUDI_QUEUE_ID_NIC_9_2:
+ nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_2) >> 2;
+ offset = mmNIC0_QM0_CP_FENCE2_RDATA_2 +
+ (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
+ (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+ break;
+ case GAUDI_QUEUE_ID_NIC_0_3:
+ case GAUDI_QUEUE_ID_NIC_1_3:
+ case GAUDI_QUEUE_ID_NIC_2_3:
+ case GAUDI_QUEUE_ID_NIC_3_3:
+ case GAUDI_QUEUE_ID_NIC_4_3:
+ case GAUDI_QUEUE_ID_NIC_5_3:
+ case GAUDI_QUEUE_ID_NIC_6_3:
+ case GAUDI_QUEUE_ID_NIC_7_3:
+ case GAUDI_QUEUE_ID_NIC_8_3:
+ case GAUDI_QUEUE_ID_NIC_9_3:
+ nic_index = (queue_id - GAUDI_QUEUE_ID_NIC_0_3) >> 2;
+ offset = mmNIC0_QM0_CP_FENCE2_RDATA_3 +
+ (nic_index >> 1) * NIC_MACRO_QMAN_OFFSET +
+ (nic_index & 0x1) * NIC_ENGINE_QMAN_OFFSET;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ *addr = CFG_BASE + offset;
+
+ return 0;
+}
+
+static u32 gaudi_add_mon_pkts(void *buf, u16 mon_id, u64 fence_addr)
+{
+ u64 monitor_base;
+ u32 size = 0;
+ u16 msg_addr_offset;
+
+ /*
+ * monitor_base should be the content of the base0 address registers,
+ * so it will be added to the msg short offsets
+ */
+ monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
+
+ /* First monitor config packet: low address of the sync */
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
+ monitor_base;
+
+ size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
+ msg_addr_offset);
+
+ /* Second monitor config packet: high address of the sync */
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
+ monitor_base;
+
+ size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
+ msg_addr_offset);
+
+ /*
+ * Third monitor config packet: the payload, i.e. what to write when the
+ * sync triggers
+ */
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
+ monitor_base;
+
+ size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
+
+ return size;
+}
+
+static u32 gaudi_gen_wait_cb(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop)
+{
+ struct hl_cb *cb = (struct hl_cb *) prop->data;
+ void *buf = cb->kernel_address;
+ u64 fence_addr = 0;
+ u32 size = prop->size;
+
+ if (gaudi_get_fence_addr(hdev, prop->q_idx, &fence_addr)) {
+ dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
+ prop->q_idx);
+ return 0;
+ }
+
+ size += gaudi_add_mon_pkts(buf + size, prop->mon_id, fence_addr);
+ size += gaudi_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base,
+ prop->sob_mask, prop->sob_val, prop->mon_id);
+ size += gaudi_add_fence_pkt(buf + size);
+
+ return size;
+}
+
+static void gaudi_reset_sob(struct hl_device *hdev, void *data)
+{
+ struct hl_hw_sob *hw_sob = (struct hl_hw_sob *) data;
+
+ dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx,
+ hw_sob->sob_id);
+
+ WREG32(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0 +
+ hw_sob->sob_id * 4, 0);
+
+ kref_init(&hw_sob->kref);
+}
+
+static u64 gaudi_get_device_time(struct hl_device *hdev)
+{
+ u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
+
+ return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
+}
+
+static int gaudi_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
+ u32 *block_size, u32 *block_id)
+{
+ return -EPERM;
+}
+
+static int gaudi_block_mmap(struct hl_device *hdev,
+ struct vm_area_struct *vma,
+ u32 block_id, u32 block_size)
+{
+ return -EPERM;
+}
+
+static void gaudi_enable_events_from_fw(struct hl_device *hdev)
+{
+ struct cpu_dyn_regs *dyn_regs =
+ &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 irq_handler_offset = hdev->asic_prop.gic_interrupts_enable ?
+ mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR :
+ le32_to_cpu(dyn_regs->gic_host_ints_irq);
+
+ WREG32(irq_handler_offset,
+ gaudi_irq_map_table[GAUDI_EVENT_INTS_REGISTER].cpu_id);
+}
+
+static int gaudi_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
+{
+ return -EINVAL;
+}
+
+static int gaudi_map_pll_idx_to_fw_idx(u32 pll_idx)
+{
+ switch (pll_idx) {
+ case HL_GAUDI_CPU_PLL: return CPU_PLL;
+ case HL_GAUDI_PCI_PLL: return PCI_PLL;
+ case HL_GAUDI_NIC_PLL: return NIC_PLL;
+ case HL_GAUDI_DMA_PLL: return DMA_PLL;
+ case HL_GAUDI_MESH_PLL: return MESH_PLL;
+ case HL_GAUDI_MME_PLL: return MME_PLL;
+ case HL_GAUDI_TPC_PLL: return TPC_PLL;
+ case HL_GAUDI_IF_PLL: return IF_PLL;
+ case HL_GAUDI_SRAM_PLL: return SRAM_PLL;
+ case HL_GAUDI_HBM_PLL: return HBM_PLL;
+ default: return -EINVAL;
+ }
+}
+
+static int gaudi_add_sync_to_engine_map_entry(
+ struct hl_sync_to_engine_map *map, u32 reg_value,
+ enum hl_sync_engine_type engine_type, u32 engine_id)
+{
+ struct hl_sync_to_engine_map_entry *entry;
+
+ /* Reg value represents a partial address of sync object,
+ * it is used as unique identifier. For this we need to
+ * clear the cutoff cfg base bits from the value.
+ */
+ if (reg_value == 0 || reg_value == 0xffffffff)
+ return 0;
+ reg_value -= lower_32_bits(CFG_BASE);
+
+ /* create a new hash entry */
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+ entry->engine_type = engine_type;
+ entry->engine_id = engine_id;
+ entry->sync_id = reg_value;
+ hash_add(map->tb, &entry->node, reg_value);
+
+ return 0;
+}
+
+static int gaudi_gen_sync_to_engine_map(struct hl_device *hdev,
+ struct hl_sync_to_engine_map *map)
+{
+ struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
+ int i, j, rc;
+ u32 reg_value;
+
+ /* Iterate over TPC engines */
+ for (i = 0; i < sds->props[SP_NUM_OF_TPC_ENGINES]; ++i) {
+
+ reg_value = RREG32(sds->props[SP_TPC0_CFG_SO] +
+ sds->props[SP_NEXT_TPC] * i);
+
+ rc = gaudi_add_sync_to_engine_map_entry(map, reg_value,
+ ENGINE_TPC, i);
+ if (rc)
+ goto free_sync_to_engine_map;
+ }
+
+ /* Iterate over MME engines */
+ for (i = 0; i < sds->props[SP_NUM_OF_MME_ENGINES]; ++i) {
+ for (j = 0; j < sds->props[SP_SUB_MME_ENG_NUM]; ++j) {
+
+ reg_value = RREG32(sds->props[SP_MME_CFG_SO] +
+ sds->props[SP_NEXT_MME] * i +
+ j * sizeof(u32));
+
+ rc = gaudi_add_sync_to_engine_map_entry(
+ map, reg_value, ENGINE_MME,
+ i * sds->props[SP_SUB_MME_ENG_NUM] + j);
+ if (rc)
+ goto free_sync_to_engine_map;
+ }
+ }
+
+ /* Iterate over DMA engines */
+ for (i = 0; i < sds->props[SP_NUM_OF_DMA_ENGINES]; ++i) {
+ reg_value = RREG32(sds->props[SP_DMA_CFG_SO] +
+ sds->props[SP_DMA_QUEUES_OFFSET] * i);
+ rc = gaudi_add_sync_to_engine_map_entry(map, reg_value,
+ ENGINE_DMA, i);
+ if (rc)
+ goto free_sync_to_engine_map;
+ }
+
+ return 0;
+
+free_sync_to_engine_map:
+ hl_state_dump_free_sync_to_engine_map(map);
+
+ return rc;
+}
+
+static int gaudi_monitor_valid(struct hl_mon_state_dump *mon)
+{
+ return FIELD_GET(
+ SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0_VALID_MASK,
+ mon->status);
+}
+
+static void gaudi_fill_sobs_from_mon(char *sobs, struct hl_mon_state_dump *mon)
+{
+ const size_t max_write = 10;
+ u32 gid, mask, sob;
+ int i, offset;
+
+ /* Sync object ID is calculated as follows:
+ * (8 * group_id + cleared bits in mask)
+ */
+ gid = FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SID_MASK,
+ mon->arm_data);
+ mask = FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_MASK_MASK,
+ mon->arm_data);
+
+ for (i = 0, offset = 0; mask && offset < MONITOR_SOB_STRING_SIZE -
+ max_write; mask >>= 1, i++) {
+ if (!(mask & 1)) {
+ sob = gid * MONITOR_MAX_SOBS + i;
+
+ if (offset > 0)
+ offset += snprintf(sobs + offset, max_write,
+ ", ");
+
+ offset += snprintf(sobs + offset, max_write, "%u", sob);
+ }
+ }
+}
+
+static int gaudi_print_single_monitor(char **buf, size_t *size, size_t *offset,
+ struct hl_device *hdev,
+ struct hl_mon_state_dump *mon)
+{
+ const char *name;
+ char scratch_buf1[BIN_REG_STRING_SIZE],
+ scratch_buf2[BIN_REG_STRING_SIZE];
+ char monitored_sobs[MONITOR_SOB_STRING_SIZE] = {0};
+
+ name = hl_state_dump_get_monitor_name(hdev, mon);
+ if (!name)
+ name = "";
+
+ gaudi_fill_sobs_from_mon(monitored_sobs, mon);
+
+ return hl_snprintf_resize(
+ buf, size, offset,
+ "Mon id: %u%s, wait for group id: %u mask %s to reach val: %u and write %u to address 0x%llx. Pending: %s. Means sync objects [%s] are being monitored.",
+ mon->id, name,
+ FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SID_MASK,
+ mon->arm_data),
+ hl_format_as_binary(
+ scratch_buf1, sizeof(scratch_buf1),
+ FIELD_GET(
+ SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_MASK_MASK,
+ mon->arm_data)),
+ FIELD_GET(SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0_SOD_MASK,
+ mon->arm_data),
+ mon->wr_data,
+ (((u64)mon->wr_addr_high) << 32) | mon->wr_addr_low,
+ hl_format_as_binary(
+ scratch_buf2, sizeof(scratch_buf2),
+ FIELD_GET(
+ SYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_STATUS_0_PENDING_MASK,
+ mon->status)),
+ monitored_sobs);
+}
+
+
+static int gaudi_print_fences_single_engine(
+ struct hl_device *hdev, u64 base_offset, u64 status_base_offset,
+ enum hl_sync_engine_type engine_type, u32 engine_id, char **buf,
+ size_t *size, size_t *offset)
+{
+ struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
+ int rc = -ENOMEM, i;
+ u32 *statuses, *fences;
+
+ statuses = kcalloc(sds->props[SP_ENGINE_NUM_OF_QUEUES],
+ sizeof(*statuses), GFP_KERNEL);
+ if (!statuses)
+ goto out;
+
+ fences = kcalloc(sds->props[SP_ENGINE_NUM_OF_FENCES] *
+ sds->props[SP_ENGINE_NUM_OF_QUEUES],
+ sizeof(*fences), GFP_KERNEL);
+ if (!fences)
+ goto free_status;
+
+ for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_FENCES]; ++i)
+ statuses[i] = RREG32(status_base_offset + i * sizeof(u32));
+
+ for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_FENCES] *
+ sds->props[SP_ENGINE_NUM_OF_QUEUES]; ++i)
+ fences[i] = RREG32(base_offset + i * sizeof(u32));
+
+ /* The actual print */
+ for (i = 0; i < sds->props[SP_ENGINE_NUM_OF_QUEUES]; ++i) {
+ u32 fence_id;
+ u64 fence_cnt, fence_rdata;
+ const char *engine_name;
+
+ if (!FIELD_GET(TPC0_QM_CP_STS_0_FENCE_IN_PROGRESS_MASK,
+ statuses[i]))
+ continue;
+
+ fence_id =
+ FIELD_GET(TPC0_QM_CP_STS_0_FENCE_ID_MASK, statuses[i]);
+ fence_cnt = base_offset + CFG_BASE +
+ sizeof(u32) *
+ (i + fence_id * sds->props[SP_ENGINE_NUM_OF_QUEUES]);
+ fence_rdata = fence_cnt - sds->props[SP_FENCE0_CNT_OFFSET] +
+ sds->props[SP_FENCE0_RDATA_OFFSET];
+ engine_name = hl_sync_engine_to_string(engine_type);
+
+ rc = hl_snprintf_resize(
+ buf, size, offset,
+ "%s%u, stream %u: fence id %u cnt = 0x%llx (%s%u_QM.CP_FENCE%u_CNT_%u) rdata = 0x%llx (%s%u_QM.CP_FENCE%u_RDATA_%u) value = %u, cp_status = %u\n",
+ engine_name, engine_id,
+ i, fence_id,
+ fence_cnt, engine_name, engine_id, fence_id, i,
+ fence_rdata, engine_name, engine_id, fence_id, i,
+ fences[fence_id],
+ statuses[i]);
+ if (rc)
+ goto free_fences;
+ }
+
+ rc = 0;
+
+free_fences:
+ kfree(fences);
+free_status:
+ kfree(statuses);
+out:
+ return rc;
+}
+
+
+static struct hl_state_dump_specs_funcs gaudi_state_dump_funcs = {
+ .monitor_valid = gaudi_monitor_valid,
+ .print_single_monitor = gaudi_print_single_monitor,
+ .gen_sync_to_engine_map = gaudi_gen_sync_to_engine_map,
+ .print_fences_single_engine = gaudi_print_fences_single_engine,
+};
+
+static void gaudi_state_dump_init(struct hl_device *hdev)
+{
+ struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(gaudi_so_id_to_str); ++i)
+ hash_add(sds->so_id_to_str_tb,
+ &gaudi_so_id_to_str[i].node,
+ gaudi_so_id_to_str[i].id);
+
+ for (i = 0; i < ARRAY_SIZE(gaudi_monitor_id_to_str); ++i)
+ hash_add(sds->monitor_id_to_str_tb,
+ &gaudi_monitor_id_to_str[i].node,
+ gaudi_monitor_id_to_str[i].id);
+
+ sds->props = gaudi_state_dump_specs_props;
+
+ sds->sync_namager_names = gaudi_sync_manager_names;
+
+ sds->funcs = gaudi_state_dump_funcs;
+}
+
+static u32 *gaudi_get_stream_master_qid_arr(void)
+{
+ return gaudi_stream_master;
+}
+
+static int gaudi_set_dram_properties(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static int gaudi_set_binning_masks(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static void gaudi_check_if_razwi_happened(struct hl_device *hdev)
+{
+}
+
+static ssize_t infineon_ver_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct hl_device *hdev = dev_get_drvdata(dev);
+ struct cpucp_info *cpucp_info;
+
+ cpucp_info = &hdev->asic_prop.cpucp_info;
+
+ return sprintf(buf, "%#04x\n", le32_to_cpu(cpucp_info->infineon_version));
+}
+
+static DEVICE_ATTR_RO(infineon_ver);
+
+static struct attribute *gaudi_vrm_dev_attrs[] = {
+ &dev_attr_infineon_ver.attr,
+ NULL,
+};
+
+static void gaudi_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
+ struct attribute_group *dev_vrm_attr_grp)
+{
+ hl_sysfs_add_dev_clk_attr(hdev, dev_clk_attr_grp);
+ dev_vrm_attr_grp->attrs = gaudi_vrm_dev_attrs;
+}
+
+static int gaudi_send_device_activity(struct hl_device *hdev, bool open)
+{
+ return 0;
+}
+
+static const struct hl_asic_funcs gaudi_funcs = {
+ .early_init = gaudi_early_init,
+ .early_fini = gaudi_early_fini,
+ .late_init = gaudi_late_init,
+ .late_fini = gaudi_late_fini,
+ .sw_init = gaudi_sw_init,
+ .sw_fini = gaudi_sw_fini,
+ .hw_init = gaudi_hw_init,
+ .hw_fini = gaudi_hw_fini,
+ .halt_engines = gaudi_halt_engines,
+ .suspend = gaudi_suspend,
+ .resume = gaudi_resume,
+ .mmap = gaudi_mmap,
+ .ring_doorbell = gaudi_ring_doorbell,
+ .pqe_write = gaudi_pqe_write,
+ .asic_dma_alloc_coherent = gaudi_dma_alloc_coherent,
+ .asic_dma_free_coherent = gaudi_dma_free_coherent,
+ .scrub_device_mem = gaudi_scrub_device_mem,
+ .scrub_device_dram = gaudi_scrub_device_dram,
+ .get_int_queue_base = gaudi_get_int_queue_base,
+ .test_queues = gaudi_test_queues,
+ .asic_dma_pool_zalloc = gaudi_dma_pool_zalloc,
+ .asic_dma_pool_free = gaudi_dma_pool_free,
+ .cpu_accessible_dma_pool_alloc = gaudi_cpu_accessible_dma_pool_alloc,
+ .cpu_accessible_dma_pool_free = gaudi_cpu_accessible_dma_pool_free,
+ .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
+ .cs_parser = gaudi_cs_parser,
+ .asic_dma_map_sgtable = hl_dma_map_sgtable,
+ .add_end_of_cb_packets = gaudi_add_end_of_cb_packets,
+ .update_eq_ci = gaudi_update_eq_ci,
+ .context_switch = gaudi_context_switch,
+ .restore_phase_topology = gaudi_restore_phase_topology,
+ .debugfs_read_dma = gaudi_debugfs_read_dma,
+ .add_device_attr = gaudi_add_device_attr,
+ .handle_eqe = gaudi_handle_eqe,
+ .get_events_stat = gaudi_get_events_stat,
+ .read_pte = gaudi_read_pte,
+ .write_pte = gaudi_write_pte,
+ .mmu_invalidate_cache = gaudi_mmu_invalidate_cache,
+ .mmu_invalidate_cache_range = gaudi_mmu_invalidate_cache_range,
+ .mmu_prefetch_cache_range = NULL,
+ .send_heartbeat = gaudi_send_heartbeat,
+ .debug_coresight = gaudi_debug_coresight,
+ .is_device_idle = gaudi_is_device_idle,
+ .compute_reset_late_init = gaudi_compute_reset_late_init,
+ .hw_queues_lock = gaudi_hw_queues_lock,
+ .hw_queues_unlock = gaudi_hw_queues_unlock,
+ .get_pci_id = gaudi_get_pci_id,
+ .get_eeprom_data = gaudi_get_eeprom_data,
+ .get_monitor_dump = gaudi_get_monitor_dump,
+ .send_cpu_message = gaudi_send_cpu_message,
+ .pci_bars_map = gaudi_pci_bars_map,
+ .init_iatu = gaudi_init_iatu,
+ .rreg = hl_rreg,
+ .wreg = hl_wreg,
+ .halt_coresight = gaudi_halt_coresight,
+ .ctx_init = gaudi_ctx_init,
+ .ctx_fini = gaudi_ctx_fini,
+ .pre_schedule_cs = gaudi_pre_schedule_cs,
+ .get_queue_id_for_cq = gaudi_get_queue_id_for_cq,
+ .load_firmware_to_device = gaudi_load_firmware_to_device,
+ .load_boot_fit_to_device = gaudi_load_boot_fit_to_device,
+ .get_signal_cb_size = gaudi_get_signal_cb_size,
+ .get_wait_cb_size = gaudi_get_wait_cb_size,
+ .gen_signal_cb = gaudi_gen_signal_cb,
+ .gen_wait_cb = gaudi_gen_wait_cb,
+ .reset_sob = gaudi_reset_sob,
+ .reset_sob_group = gaudi_reset_sob_group,
+ .get_device_time = gaudi_get_device_time,
+ .pb_print_security_errors = NULL,
+ .collective_wait_init_cs = gaudi_collective_wait_init_cs,
+ .collective_wait_create_jobs = gaudi_collective_wait_create_jobs,
+ .get_dec_base_addr = NULL,
+ .scramble_addr = hl_mmu_scramble_addr,
+ .descramble_addr = hl_mmu_descramble_addr,
+ .ack_protection_bits_errors = gaudi_ack_protection_bits_errors,
+ .get_hw_block_id = gaudi_get_hw_block_id,
+ .hw_block_mmap = gaudi_block_mmap,
+ .enable_events_from_fw = gaudi_enable_events_from_fw,
+ .ack_mmu_errors = gaudi_ack_mmu_page_fault_or_access_error,
+ .map_pll_idx_to_fw_idx = gaudi_map_pll_idx_to_fw_idx,
+ .init_firmware_preload_params = gaudi_init_firmware_preload_params,
+ .init_firmware_loader = gaudi_init_firmware_loader,
+ .init_cpu_scrambler_dram = gaudi_init_scrambler_hbm,
+ .state_dump_init = gaudi_state_dump_init,
+ .get_sob_addr = gaudi_get_sob_addr,
+ .set_pci_memory_regions = gaudi_set_pci_memory_regions,
+ .get_stream_master_qid_arr = gaudi_get_stream_master_qid_arr,
+ .check_if_razwi_happened = gaudi_check_if_razwi_happened,
+ .mmu_get_real_page_size = hl_mmu_get_real_page_size,
+ .access_dev_mem = hl_access_dev_mem,
+ .set_dram_bar_base = gaudi_set_hbm_bar_base,
+ .send_device_activity = gaudi_send_device_activity,
+ .set_dram_properties = gaudi_set_dram_properties,
+ .set_binning_masks = gaudi_set_binning_masks,
+};
+
+/**
+ * gaudi_set_asic_funcs - set GAUDI function pointers
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+void gaudi_set_asic_funcs(struct hl_device *hdev)
+{
+ hdev->asic_funcs = &gaudi_funcs;
+}
--- /dev/null
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2020-2022 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "gaudi2P.h"
+#include "gaudi2_masks.h"
+#include "../include/gaudi2/gaudi2_special_blocks.h"
+#include "../include/hw_ip/mmu/mmu_general.h"
+#include "../include/hw_ip/mmu/mmu_v2_0.h"
+#include "../include/gaudi2/gaudi2_packets.h"
+#include "../include/gaudi2/gaudi2_reg_map.h"
+#include "../include/gaudi2/gaudi2_async_ids_map_extended.h"
+#include "../include/gaudi2/arc/gaudi2_arc_common_packets.h"
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/hwmon.h>
+#include <linux/iommu.h>
+
+#define GAUDI2_DMA_POOL_BLK_SIZE SZ_256 /* 256 bytes */
+
+#define GAUDI2_RESET_TIMEOUT_MSEC 2000 /* 2000ms */
+#define GAUDI2_RESET_POLL_TIMEOUT_USEC 50000 /* 50ms */
+#define GAUDI2_PLDM_HRESET_TIMEOUT_MSEC 25000 /* 25s */
+#define GAUDI2_PLDM_SRESET_TIMEOUT_MSEC 25000 /* 25s */
+#define GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC 3000000 /* 3s */
+#define GAUDI2_RESET_POLL_CNT 3
+#define GAUDI2_RESET_WAIT_MSEC 1 /* 1ms */
+#define GAUDI2_CPU_RESET_WAIT_MSEC 100 /* 100ms */
+#define GAUDI2_PLDM_RESET_WAIT_MSEC 1000 /* 1s */
+#define GAUDI2_CB_POOL_CB_CNT 512
+#define GAUDI2_CB_POOL_CB_SIZE SZ_128K /* 128KB */
+#define GAUDI2_MSG_TO_CPU_TIMEOUT_USEC 4000000 /* 4s */
+#define GAUDI2_WAIT_FOR_BL_TIMEOUT_USEC 25000000 /* 25s */
+#define GAUDI2_TEST_QUEUE_WAIT_USEC 100000 /* 100ms */
+#define GAUDI2_PLDM_TEST_QUEUE_WAIT_USEC 1000000 /* 1s */
+
+#define GAUDI2_ALLOC_CPU_MEM_RETRY_CNT 3
+
+/*
+ * since the code already has built-in support for binning of up to MAX_FAULTY_TPCS TPCs
+ * and the code relies on that value (for array size etc..) we define another value
+ * for MAX faulty TPCs which reflects the cluster binning requirements
+ */
+#define MAX_CLUSTER_BINNING_FAULTY_TPCS 1
+#define MAX_FAULTY_XBARS 1
+#define MAX_FAULTY_EDMAS 1
+#define MAX_FAULTY_DECODERS 1
+
+#define GAUDI2_TPC_FULL_MASK 0x1FFFFFF
+#define GAUDI2_HIF_HMMU_FULL_MASK 0xFFFF
+#define GAUDI2_DECODER_FULL_MASK 0x3FF
+
+#define GAUDI2_NA_EVENT_CAUSE 0xFF
+#define GAUDI2_NUM_OF_QM_ERR_CAUSE 18
+#define GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE 25
+#define GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE 3
+#define GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE 14
+#define GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE 3
+#define GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE 2
+#define GAUDI2_NUM_OF_ROT_ERR_CAUSE 22
+#define GAUDI2_NUM_OF_TPC_INTR_CAUSE 30
+#define GAUDI2_NUM_OF_DEC_ERR_CAUSE 25
+#define GAUDI2_NUM_OF_MME_ERR_CAUSE 16
+#define GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE 5
+#define GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE 7
+#define GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE 8
+#define GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE 19
+#define GAUDI2_NUM_OF_HBM_SEI_CAUSE 9
+#define GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE 3
+#define GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE 3
+#define GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE 2
+#define GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE 2
+#define GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE 2
+#define GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE 5
+
+#define GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC (MMU_CONFIG_TIMEOUT_USEC * 10)
+#define GAUDI2_PLDM_MMU_TIMEOUT_USEC (MMU_CONFIG_TIMEOUT_USEC * 200)
+#define GAUDI2_ARB_WDT_TIMEOUT (0x1000000)
+
+#define GAUDI2_VDEC_TIMEOUT_USEC 10000 /* 10ms */
+#define GAUDI2_PLDM_VDEC_TIMEOUT_USEC (GAUDI2_VDEC_TIMEOUT_USEC * 100)
+
+#define KDMA_TIMEOUT_USEC USEC_PER_SEC
+
+#define IS_DMA_IDLE(dma_core_idle_ind_mask) \
+ (!((dma_core_idle_ind_mask) & \
+ ((DCORE0_EDMA0_CORE_IDLE_IND_MASK_DESC_CNT_STS_MASK) | \
+ (DCORE0_EDMA0_CORE_IDLE_IND_MASK_COMP_MASK))))
+
+#define IS_MME_IDLE(mme_arch_sts) (((mme_arch_sts) & MME_ARCH_IDLE_MASK) == MME_ARCH_IDLE_MASK)
+
+#define IS_TPC_IDLE(tpc_cfg_sts) (((tpc_cfg_sts) & (TPC_IDLE_MASK)) == (TPC_IDLE_MASK))
+
+#define IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) \
+ ((((qm_glbl_sts0) & (QM_IDLE_MASK)) == (QM_IDLE_MASK)) && \
+ (((qm_glbl_sts1) & (QM_ARC_IDLE_MASK)) == (QM_ARC_IDLE_MASK)) && \
+ (((qm_cgm_sts) & (CGM_IDLE_MASK)) == (CGM_IDLE_MASK)))
+
+#define PCIE_DEC_EN_MASK 0x300
+#define DEC_WORK_STATE_IDLE 0
+#define DEC_WORK_STATE_PEND 3
+#define IS_DEC_IDLE(dec_swreg15) \
+ (((dec_swreg15) & DCORE0_DEC0_CMD_SWREG15_SW_WORK_STATE_MASK) == DEC_WORK_STATE_IDLE || \
+ ((dec_swreg15) & DCORE0_DEC0_CMD_SWREG15_SW_WORK_STATE_MASK) == DEC_WORK_STATE_PEND)
+
+/* HBM MMU address scrambling parameters */
+#define GAUDI2_HBM_MMU_SCRM_MEM_SIZE SZ_8M
+#define GAUDI2_HBM_MMU_SCRM_DIV_SHIFT 26
+#define GAUDI2_HBM_MMU_SCRM_MOD_SHIFT 0
+#define GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK DRAM_VA_HINT_MASK
+#define GAUDI2_COMPENSATE_TLB_PAGE_SIZE_FACTOR 16
+#define MMU_RANGE_INV_VA_LSB_SHIFT 12
+#define MMU_RANGE_INV_VA_MSB_SHIFT 44
+#define MMU_RANGE_INV_EN_SHIFT 0
+#define MMU_RANGE_INV_ASID_EN_SHIFT 1
+#define MMU_RANGE_INV_ASID_SHIFT 2
+
+/* The last SPI_SEI cause bit, "burst_fifo_full", is expected to be triggered in PMMU because it has
+ * a 2 entries FIFO, and hence it is not enabled for it.
+ */
+#define GAUDI2_PMMU_SPI_SEI_ENABLE_MASK GENMASK(GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE - 2, 0)
+#define GAUDI2_HMMU_SPI_SEI_ENABLE_MASK GENMASK(GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE - 1, 0)
+
+#define GAUDI2_MAX_STRING_LEN 64
+
+#define GAUDI2_VDEC_MSIX_ENTRIES (GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM - \
+ GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + 1)
+
+#define ENGINE_ID_DCORE_OFFSET (GAUDI2_DCORE1_ENGINE_ID_EDMA_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0)
+
+enum hl_pmmu_fatal_cause {
+ LATENCY_RD_OUT_FIFO_OVERRUN,
+ LATENCY_WR_OUT_FIFO_OVERRUN,
+};
+
+enum hl_pcie_drain_ind_cause {
+ LBW_AXI_DRAIN_IND,
+ HBW_AXI_DRAIN_IND
+};
+
+static const u32 cluster_hmmu_hif_enabled_mask[GAUDI2_HBM_NUM] = {
+ [HBM_ID0] = 0xFFFC,
+ [HBM_ID1] = 0xFFCF,
+ [HBM_ID2] = 0xF7F7,
+ [HBM_ID3] = 0x7F7F,
+ [HBM_ID4] = 0xFCFF,
+ [HBM_ID5] = 0xCFFF,
+};
+
+static const u8 xbar_edge_to_hbm_cluster[EDMA_ID_SIZE] = {
+ [0] = HBM_ID0,
+ [1] = HBM_ID1,
+ [2] = HBM_ID4,
+ [3] = HBM_ID5,
+};
+
+static const u8 edma_to_hbm_cluster[EDMA_ID_SIZE] = {
+ [EDMA_ID_DCORE0_INSTANCE0] = HBM_ID0,
+ [EDMA_ID_DCORE0_INSTANCE1] = HBM_ID2,
+ [EDMA_ID_DCORE1_INSTANCE0] = HBM_ID1,
+ [EDMA_ID_DCORE1_INSTANCE1] = HBM_ID3,
+ [EDMA_ID_DCORE2_INSTANCE0] = HBM_ID2,
+ [EDMA_ID_DCORE2_INSTANCE1] = HBM_ID4,
+ [EDMA_ID_DCORE3_INSTANCE0] = HBM_ID3,
+ [EDMA_ID_DCORE3_INSTANCE1] = HBM_ID5,
+};
+
+static const int gaudi2_qman_async_event_id[] = {
+ [GAUDI2_QUEUE_ID_PDMA_0_0] = GAUDI2_EVENT_PDMA0_QM,
+ [GAUDI2_QUEUE_ID_PDMA_0_1] = GAUDI2_EVENT_PDMA0_QM,
+ [GAUDI2_QUEUE_ID_PDMA_0_2] = GAUDI2_EVENT_PDMA0_QM,
+ [GAUDI2_QUEUE_ID_PDMA_0_3] = GAUDI2_EVENT_PDMA0_QM,
+ [GAUDI2_QUEUE_ID_PDMA_1_0] = GAUDI2_EVENT_PDMA1_QM,
+ [GAUDI2_QUEUE_ID_PDMA_1_1] = GAUDI2_EVENT_PDMA1_QM,
+ [GAUDI2_QUEUE_ID_PDMA_1_2] = GAUDI2_EVENT_PDMA1_QM,
+ [GAUDI2_QUEUE_ID_PDMA_1_3] = GAUDI2_EVENT_PDMA1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = GAUDI2_EVENT_HDMA0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = GAUDI2_EVENT_HDMA0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = GAUDI2_EVENT_HDMA0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = GAUDI2_EVENT_HDMA0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = GAUDI2_EVENT_HDMA1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = GAUDI2_EVENT_HDMA1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = GAUDI2_EVENT_HDMA1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = GAUDI2_EVENT_HDMA1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = GAUDI2_EVENT_MME0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = GAUDI2_EVENT_MME0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = GAUDI2_EVENT_MME0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = GAUDI2_EVENT_MME0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = GAUDI2_EVENT_TPC0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = GAUDI2_EVENT_TPC0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = GAUDI2_EVENT_TPC0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = GAUDI2_EVENT_TPC0_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = GAUDI2_EVENT_TPC1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = GAUDI2_EVENT_TPC1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = GAUDI2_EVENT_TPC1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = GAUDI2_EVENT_TPC1_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = GAUDI2_EVENT_TPC2_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = GAUDI2_EVENT_TPC2_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = GAUDI2_EVENT_TPC2_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = GAUDI2_EVENT_TPC2_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = GAUDI2_EVENT_TPC3_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = GAUDI2_EVENT_TPC3_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = GAUDI2_EVENT_TPC3_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = GAUDI2_EVENT_TPC3_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = GAUDI2_EVENT_TPC4_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = GAUDI2_EVENT_TPC4_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = GAUDI2_EVENT_TPC4_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = GAUDI2_EVENT_TPC4_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = GAUDI2_EVENT_TPC5_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = GAUDI2_EVENT_TPC5_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = GAUDI2_EVENT_TPC5_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = GAUDI2_EVENT_TPC5_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = GAUDI2_EVENT_TPC24_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = GAUDI2_EVENT_TPC24_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = GAUDI2_EVENT_TPC24_QM,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = GAUDI2_EVENT_TPC24_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = GAUDI2_EVENT_HDMA2_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = GAUDI2_EVENT_HDMA2_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = GAUDI2_EVENT_HDMA2_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = GAUDI2_EVENT_HDMA2_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = GAUDI2_EVENT_HDMA3_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = GAUDI2_EVENT_HDMA3_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = GAUDI2_EVENT_HDMA3_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = GAUDI2_EVENT_HDMA3_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = GAUDI2_EVENT_MME1_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = GAUDI2_EVENT_MME1_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = GAUDI2_EVENT_MME1_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = GAUDI2_EVENT_MME1_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = GAUDI2_EVENT_TPC6_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = GAUDI2_EVENT_TPC6_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = GAUDI2_EVENT_TPC6_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = GAUDI2_EVENT_TPC6_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = GAUDI2_EVENT_TPC7_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = GAUDI2_EVENT_TPC7_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = GAUDI2_EVENT_TPC7_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = GAUDI2_EVENT_TPC7_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = GAUDI2_EVENT_TPC8_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = GAUDI2_EVENT_TPC8_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = GAUDI2_EVENT_TPC8_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = GAUDI2_EVENT_TPC8_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = GAUDI2_EVENT_TPC9_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = GAUDI2_EVENT_TPC9_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = GAUDI2_EVENT_TPC9_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = GAUDI2_EVENT_TPC9_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = GAUDI2_EVENT_TPC10_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = GAUDI2_EVENT_TPC10_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = GAUDI2_EVENT_TPC10_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = GAUDI2_EVENT_TPC10_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = GAUDI2_EVENT_TPC11_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = GAUDI2_EVENT_TPC11_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = GAUDI2_EVENT_TPC11_QM,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = GAUDI2_EVENT_TPC11_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = GAUDI2_EVENT_HDMA4_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = GAUDI2_EVENT_HDMA4_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = GAUDI2_EVENT_HDMA4_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = GAUDI2_EVENT_HDMA4_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = GAUDI2_EVENT_HDMA5_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = GAUDI2_EVENT_HDMA5_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = GAUDI2_EVENT_HDMA5_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = GAUDI2_EVENT_HDMA5_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = GAUDI2_EVENT_MME2_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = GAUDI2_EVENT_MME2_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = GAUDI2_EVENT_MME2_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = GAUDI2_EVENT_MME2_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = GAUDI2_EVENT_TPC12_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = GAUDI2_EVENT_TPC12_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = GAUDI2_EVENT_TPC12_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = GAUDI2_EVENT_TPC12_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = GAUDI2_EVENT_TPC13_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = GAUDI2_EVENT_TPC13_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = GAUDI2_EVENT_TPC13_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = GAUDI2_EVENT_TPC13_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = GAUDI2_EVENT_TPC14_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = GAUDI2_EVENT_TPC14_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = GAUDI2_EVENT_TPC14_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = GAUDI2_EVENT_TPC14_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = GAUDI2_EVENT_TPC15_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = GAUDI2_EVENT_TPC15_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = GAUDI2_EVENT_TPC15_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = GAUDI2_EVENT_TPC15_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = GAUDI2_EVENT_TPC16_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = GAUDI2_EVENT_TPC16_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = GAUDI2_EVENT_TPC16_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = GAUDI2_EVENT_TPC16_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = GAUDI2_EVENT_TPC17_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = GAUDI2_EVENT_TPC17_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = GAUDI2_EVENT_TPC17_QM,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = GAUDI2_EVENT_TPC17_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = GAUDI2_EVENT_HDMA6_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = GAUDI2_EVENT_HDMA6_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = GAUDI2_EVENT_HDMA6_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = GAUDI2_EVENT_HDMA6_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = GAUDI2_EVENT_HDMA7_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = GAUDI2_EVENT_HDMA7_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = GAUDI2_EVENT_HDMA7_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = GAUDI2_EVENT_HDMA7_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = GAUDI2_EVENT_MME3_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = GAUDI2_EVENT_MME3_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = GAUDI2_EVENT_MME3_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = GAUDI2_EVENT_MME3_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = GAUDI2_EVENT_TPC18_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = GAUDI2_EVENT_TPC18_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = GAUDI2_EVENT_TPC18_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = GAUDI2_EVENT_TPC18_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = GAUDI2_EVENT_TPC19_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = GAUDI2_EVENT_TPC19_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = GAUDI2_EVENT_TPC19_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = GAUDI2_EVENT_TPC19_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = GAUDI2_EVENT_TPC20_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = GAUDI2_EVENT_TPC20_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = GAUDI2_EVENT_TPC20_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = GAUDI2_EVENT_TPC20_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = GAUDI2_EVENT_TPC21_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = GAUDI2_EVENT_TPC21_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = GAUDI2_EVENT_TPC21_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = GAUDI2_EVENT_TPC21_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = GAUDI2_EVENT_TPC22_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = GAUDI2_EVENT_TPC22_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = GAUDI2_EVENT_TPC22_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = GAUDI2_EVENT_TPC22_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = GAUDI2_EVENT_TPC23_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = GAUDI2_EVENT_TPC23_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = GAUDI2_EVENT_TPC23_QM,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = GAUDI2_EVENT_TPC23_QM,
+ [GAUDI2_QUEUE_ID_NIC_0_0] = GAUDI2_EVENT_NIC0_QM0,
+ [GAUDI2_QUEUE_ID_NIC_0_1] = GAUDI2_EVENT_NIC0_QM0,
+ [GAUDI2_QUEUE_ID_NIC_0_2] = GAUDI2_EVENT_NIC0_QM0,
+ [GAUDI2_QUEUE_ID_NIC_0_3] = GAUDI2_EVENT_NIC0_QM0,
+ [GAUDI2_QUEUE_ID_NIC_1_0] = GAUDI2_EVENT_NIC0_QM1,
+ [GAUDI2_QUEUE_ID_NIC_1_1] = GAUDI2_EVENT_NIC0_QM1,
+ [GAUDI2_QUEUE_ID_NIC_1_2] = GAUDI2_EVENT_NIC0_QM1,
+ [GAUDI2_QUEUE_ID_NIC_1_3] = GAUDI2_EVENT_NIC0_QM1,
+ [GAUDI2_QUEUE_ID_NIC_2_0] = GAUDI2_EVENT_NIC1_QM0,
+ [GAUDI2_QUEUE_ID_NIC_2_1] = GAUDI2_EVENT_NIC1_QM0,
+ [GAUDI2_QUEUE_ID_NIC_2_2] = GAUDI2_EVENT_NIC1_QM0,
+ [GAUDI2_QUEUE_ID_NIC_2_3] = GAUDI2_EVENT_NIC1_QM0,
+ [GAUDI2_QUEUE_ID_NIC_3_0] = GAUDI2_EVENT_NIC1_QM1,
+ [GAUDI2_QUEUE_ID_NIC_3_1] = GAUDI2_EVENT_NIC1_QM1,
+ [GAUDI2_QUEUE_ID_NIC_3_2] = GAUDI2_EVENT_NIC1_QM1,
+ [GAUDI2_QUEUE_ID_NIC_3_3] = GAUDI2_EVENT_NIC1_QM1,
+ [GAUDI2_QUEUE_ID_NIC_4_0] = GAUDI2_EVENT_NIC2_QM0,
+ [GAUDI2_QUEUE_ID_NIC_4_1] = GAUDI2_EVENT_NIC2_QM0,
+ [GAUDI2_QUEUE_ID_NIC_4_2] = GAUDI2_EVENT_NIC2_QM0,
+ [GAUDI2_QUEUE_ID_NIC_4_3] = GAUDI2_EVENT_NIC2_QM0,
+ [GAUDI2_QUEUE_ID_NIC_5_0] = GAUDI2_EVENT_NIC2_QM1,
+ [GAUDI2_QUEUE_ID_NIC_5_1] = GAUDI2_EVENT_NIC2_QM1,
+ [GAUDI2_QUEUE_ID_NIC_5_2] = GAUDI2_EVENT_NIC2_QM1,
+ [GAUDI2_QUEUE_ID_NIC_5_3] = GAUDI2_EVENT_NIC2_QM1,
+ [GAUDI2_QUEUE_ID_NIC_6_0] = GAUDI2_EVENT_NIC3_QM0,
+ [GAUDI2_QUEUE_ID_NIC_6_1] = GAUDI2_EVENT_NIC3_QM0,
+ [GAUDI2_QUEUE_ID_NIC_6_2] = GAUDI2_EVENT_NIC3_QM0,
+ [GAUDI2_QUEUE_ID_NIC_6_3] = GAUDI2_EVENT_NIC3_QM0,
+ [GAUDI2_QUEUE_ID_NIC_7_0] = GAUDI2_EVENT_NIC3_QM1,
+ [GAUDI2_QUEUE_ID_NIC_7_1] = GAUDI2_EVENT_NIC3_QM1,
+ [GAUDI2_QUEUE_ID_NIC_7_2] = GAUDI2_EVENT_NIC3_QM1,
+ [GAUDI2_QUEUE_ID_NIC_7_3] = GAUDI2_EVENT_NIC3_QM1,
+ [GAUDI2_QUEUE_ID_NIC_8_0] = GAUDI2_EVENT_NIC4_QM0,
+ [GAUDI2_QUEUE_ID_NIC_8_1] = GAUDI2_EVENT_NIC4_QM0,
+ [GAUDI2_QUEUE_ID_NIC_8_2] = GAUDI2_EVENT_NIC4_QM0,
+ [GAUDI2_QUEUE_ID_NIC_8_3] = GAUDI2_EVENT_NIC4_QM0,
+ [GAUDI2_QUEUE_ID_NIC_9_0] = GAUDI2_EVENT_NIC4_QM1,
+ [GAUDI2_QUEUE_ID_NIC_9_1] = GAUDI2_EVENT_NIC4_QM1,
+ [GAUDI2_QUEUE_ID_NIC_9_2] = GAUDI2_EVENT_NIC4_QM1,
+ [GAUDI2_QUEUE_ID_NIC_9_3] = GAUDI2_EVENT_NIC4_QM1,
+ [GAUDI2_QUEUE_ID_NIC_10_0] = GAUDI2_EVENT_NIC5_QM0,
+ [GAUDI2_QUEUE_ID_NIC_10_1] = GAUDI2_EVENT_NIC5_QM0,
+ [GAUDI2_QUEUE_ID_NIC_10_2] = GAUDI2_EVENT_NIC5_QM0,
+ [GAUDI2_QUEUE_ID_NIC_10_3] = GAUDI2_EVENT_NIC5_QM0,
+ [GAUDI2_QUEUE_ID_NIC_11_0] = GAUDI2_EVENT_NIC5_QM1,
+ [GAUDI2_QUEUE_ID_NIC_11_1] = GAUDI2_EVENT_NIC5_QM1,
+ [GAUDI2_QUEUE_ID_NIC_11_2] = GAUDI2_EVENT_NIC5_QM1,
+ [GAUDI2_QUEUE_ID_NIC_11_3] = GAUDI2_EVENT_NIC5_QM1,
+ [GAUDI2_QUEUE_ID_NIC_12_0] = GAUDI2_EVENT_NIC6_QM0,
+ [GAUDI2_QUEUE_ID_NIC_12_1] = GAUDI2_EVENT_NIC6_QM0,
+ [GAUDI2_QUEUE_ID_NIC_12_2] = GAUDI2_EVENT_NIC6_QM0,
+ [GAUDI2_QUEUE_ID_NIC_12_3] = GAUDI2_EVENT_NIC6_QM0,
+ [GAUDI2_QUEUE_ID_NIC_13_0] = GAUDI2_EVENT_NIC6_QM1,
+ [GAUDI2_QUEUE_ID_NIC_13_1] = GAUDI2_EVENT_NIC6_QM1,
+ [GAUDI2_QUEUE_ID_NIC_13_2] = GAUDI2_EVENT_NIC6_QM1,
+ [GAUDI2_QUEUE_ID_NIC_13_3] = GAUDI2_EVENT_NIC6_QM1,
+ [GAUDI2_QUEUE_ID_NIC_14_0] = GAUDI2_EVENT_NIC7_QM0,
+ [GAUDI2_QUEUE_ID_NIC_14_1] = GAUDI2_EVENT_NIC7_QM0,
+ [GAUDI2_QUEUE_ID_NIC_14_2] = GAUDI2_EVENT_NIC7_QM0,
+ [GAUDI2_QUEUE_ID_NIC_14_3] = GAUDI2_EVENT_NIC7_QM0,
+ [GAUDI2_QUEUE_ID_NIC_15_0] = GAUDI2_EVENT_NIC7_QM1,
+ [GAUDI2_QUEUE_ID_NIC_15_1] = GAUDI2_EVENT_NIC7_QM1,
+ [GAUDI2_QUEUE_ID_NIC_15_2] = GAUDI2_EVENT_NIC7_QM1,
+ [GAUDI2_QUEUE_ID_NIC_15_3] = GAUDI2_EVENT_NIC7_QM1,
+ [GAUDI2_QUEUE_ID_NIC_16_0] = GAUDI2_EVENT_NIC8_QM0,
+ [GAUDI2_QUEUE_ID_NIC_16_1] = GAUDI2_EVENT_NIC8_QM0,
+ [GAUDI2_QUEUE_ID_NIC_16_2] = GAUDI2_EVENT_NIC8_QM0,
+ [GAUDI2_QUEUE_ID_NIC_16_3] = GAUDI2_EVENT_NIC8_QM0,
+ [GAUDI2_QUEUE_ID_NIC_17_0] = GAUDI2_EVENT_NIC8_QM1,
+ [GAUDI2_QUEUE_ID_NIC_17_1] = GAUDI2_EVENT_NIC8_QM1,
+ [GAUDI2_QUEUE_ID_NIC_17_2] = GAUDI2_EVENT_NIC8_QM1,
+ [GAUDI2_QUEUE_ID_NIC_17_3] = GAUDI2_EVENT_NIC8_QM1,
+ [GAUDI2_QUEUE_ID_NIC_18_0] = GAUDI2_EVENT_NIC9_QM0,
+ [GAUDI2_QUEUE_ID_NIC_18_1] = GAUDI2_EVENT_NIC9_QM0,
+ [GAUDI2_QUEUE_ID_NIC_18_2] = GAUDI2_EVENT_NIC9_QM0,
+ [GAUDI2_QUEUE_ID_NIC_18_3] = GAUDI2_EVENT_NIC9_QM0,
+ [GAUDI2_QUEUE_ID_NIC_19_0] = GAUDI2_EVENT_NIC9_QM1,
+ [GAUDI2_QUEUE_ID_NIC_19_1] = GAUDI2_EVENT_NIC9_QM1,
+ [GAUDI2_QUEUE_ID_NIC_19_2] = GAUDI2_EVENT_NIC9_QM1,
+ [GAUDI2_QUEUE_ID_NIC_19_3] = GAUDI2_EVENT_NIC9_QM1,
+ [GAUDI2_QUEUE_ID_NIC_20_0] = GAUDI2_EVENT_NIC10_QM0,
+ [GAUDI2_QUEUE_ID_NIC_20_1] = GAUDI2_EVENT_NIC10_QM0,
+ [GAUDI2_QUEUE_ID_NIC_20_2] = GAUDI2_EVENT_NIC10_QM0,
+ [GAUDI2_QUEUE_ID_NIC_20_3] = GAUDI2_EVENT_NIC10_QM0,
+ [GAUDI2_QUEUE_ID_NIC_21_0] = GAUDI2_EVENT_NIC10_QM1,
+ [GAUDI2_QUEUE_ID_NIC_21_1] = GAUDI2_EVENT_NIC10_QM1,
+ [GAUDI2_QUEUE_ID_NIC_21_2] = GAUDI2_EVENT_NIC10_QM1,
+ [GAUDI2_QUEUE_ID_NIC_21_3] = GAUDI2_EVENT_NIC10_QM1,
+ [GAUDI2_QUEUE_ID_NIC_22_0] = GAUDI2_EVENT_NIC11_QM0,
+ [GAUDI2_QUEUE_ID_NIC_22_1] = GAUDI2_EVENT_NIC11_QM0,
+ [GAUDI2_QUEUE_ID_NIC_22_2] = GAUDI2_EVENT_NIC11_QM0,
+ [GAUDI2_QUEUE_ID_NIC_22_3] = GAUDI2_EVENT_NIC11_QM0,
+ [GAUDI2_QUEUE_ID_NIC_23_0] = GAUDI2_EVENT_NIC11_QM1,
+ [GAUDI2_QUEUE_ID_NIC_23_1] = GAUDI2_EVENT_NIC11_QM1,
+ [GAUDI2_QUEUE_ID_NIC_23_2] = GAUDI2_EVENT_NIC11_QM1,
+ [GAUDI2_QUEUE_ID_NIC_23_3] = GAUDI2_EVENT_NIC11_QM1,
+ [GAUDI2_QUEUE_ID_ROT_0_0] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
+ [GAUDI2_QUEUE_ID_ROT_0_1] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
+ [GAUDI2_QUEUE_ID_ROT_0_2] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
+ [GAUDI2_QUEUE_ID_ROT_0_3] = GAUDI2_EVENT_ROTATOR0_ROT0_QM,
+ [GAUDI2_QUEUE_ID_ROT_1_0] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
+ [GAUDI2_QUEUE_ID_ROT_1_1] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
+ [GAUDI2_QUEUE_ID_ROT_1_2] = GAUDI2_EVENT_ROTATOR1_ROT1_QM,
+ [GAUDI2_QUEUE_ID_ROT_1_3] = GAUDI2_EVENT_ROTATOR1_ROT1_QM
+};
+
+static const int gaudi2_dma_core_async_event_id[] = {
+ [DMA_CORE_ID_EDMA0] = GAUDI2_EVENT_HDMA0_CORE,
+ [DMA_CORE_ID_EDMA1] = GAUDI2_EVENT_HDMA1_CORE,
+ [DMA_CORE_ID_EDMA2] = GAUDI2_EVENT_HDMA2_CORE,
+ [DMA_CORE_ID_EDMA3] = GAUDI2_EVENT_HDMA3_CORE,
+ [DMA_CORE_ID_EDMA4] = GAUDI2_EVENT_HDMA4_CORE,
+ [DMA_CORE_ID_EDMA5] = GAUDI2_EVENT_HDMA5_CORE,
+ [DMA_CORE_ID_EDMA6] = GAUDI2_EVENT_HDMA6_CORE,
+ [DMA_CORE_ID_EDMA7] = GAUDI2_EVENT_HDMA7_CORE,
+ [DMA_CORE_ID_PDMA0] = GAUDI2_EVENT_PDMA0_CORE,
+ [DMA_CORE_ID_PDMA1] = GAUDI2_EVENT_PDMA1_CORE,
+ [DMA_CORE_ID_KDMA] = GAUDI2_EVENT_KDMA0_CORE,
+};
+
+static const char * const gaudi2_qm_sei_error_cause[GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE] = {
+ "qman sei intr",
+ "arc sei intr"
+};
+
+static const char * const gaudi2_cpu_sei_error_cause[GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE] = {
+ "AXI_TERMINATOR WR",
+ "AXI_TERMINATOR RD",
+ "AXI SPLIT SEI Status"
+};
+
+static const char * const gaudi2_arc_sei_error_cause[GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE] = {
+ "cbu_bresp_sei_intr_cause",
+ "cbu_rresp_sei_intr_cause",
+ "lbu_bresp_sei_intr_cause",
+ "lbu_rresp_sei_intr_cause",
+ "cbu_axi_split_intr_cause",
+ "lbu_axi_split_intr_cause",
+ "arc_ip_excptn_sei_intr_cause",
+ "dmi_bresp_sei_intr_cause",
+ "aux2apb_err_sei_intr_cause",
+ "cfg_lbw_wr_terminated_intr_cause",
+ "cfg_lbw_rd_terminated_intr_cause",
+ "cfg_dccm_wr_terminated_intr_cause",
+ "cfg_dccm_rd_terminated_intr_cause",
+ "cfg_hbw_rd_terminated_intr_cause"
+};
+
+static const char * const gaudi2_dec_error_cause[GAUDI2_NUM_OF_DEC_ERR_CAUSE] = {
+ "msix_vcd_hbw_sei",
+ "msix_l2c_hbw_sei",
+ "msix_nrm_hbw_sei",
+ "msix_abnrm_hbw_sei",
+ "msix_vcd_lbw_sei",
+ "msix_l2c_lbw_sei",
+ "msix_nrm_lbw_sei",
+ "msix_abnrm_lbw_sei",
+ "apb_vcd_lbw_sei",
+ "apb_l2c_lbw_sei",
+ "apb_nrm_lbw_sei",
+ "apb_abnrm_lbw_sei",
+ "dec_sei",
+ "dec_apb_sei",
+ "trc_apb_sei",
+ "lbw_mstr_if_sei",
+ "axi_split_bresp_err_sei",
+ "hbw_axi_wr_viol_sei",
+ "hbw_axi_rd_viol_sei",
+ "lbw_axi_wr_viol_sei",
+ "lbw_axi_rd_viol_sei",
+ "vcd_spi",
+ "l2c_spi",
+ "nrm_spi",
+ "abnrm_spi",
+};
+
+static const char * const gaudi2_qman_error_cause[GAUDI2_NUM_OF_QM_ERR_CAUSE] = {
+ "PQ AXI HBW error",
+ "CQ AXI HBW error",
+ "CP AXI HBW error",
+ "CP error due to undefined OPCODE",
+ "CP encountered STOP OPCODE",
+ "CP AXI LBW error",
+ "CP WRREG32 or WRBULK returned error",
+ "N/A",
+ "FENCE 0 inc over max value and clipped",
+ "FENCE 1 inc over max value and clipped",
+ "FENCE 2 inc over max value and clipped",
+ "FENCE 3 inc over max value and clipped",
+ "FENCE 0 dec under min value and clipped",
+ "FENCE 1 dec under min value and clipped",
+ "FENCE 2 dec under min value and clipped",
+ "FENCE 3 dec under min value and clipped",
+ "CPDMA Up overflow",
+ "PQC L2H error"
+};
+
+static const char * const gaudi2_qman_lower_cp_error_cause[GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE] = {
+ "RSVD0",
+ "CQ AXI HBW error",
+ "CP AXI HBW error",
+ "CP error due to undefined OPCODE",
+ "CP encountered STOP OPCODE",
+ "CP AXI LBW error",
+ "CP WRREG32 or WRBULK returned error",
+ "N/A",
+ "FENCE 0 inc over max value and clipped",
+ "FENCE 1 inc over max value and clipped",
+ "FENCE 2 inc over max value and clipped",
+ "FENCE 3 inc over max value and clipped",
+ "FENCE 0 dec under min value and clipped",
+ "FENCE 1 dec under min value and clipped",
+ "FENCE 2 dec under min value and clipped",
+ "FENCE 3 dec under min value and clipped",
+ "CPDMA Up overflow",
+ "RSVD17",
+ "CQ_WR_IFIFO_CI_ERR",
+ "CQ_WR_CTL_CI_ERR",
+ "ARC_CQF_RD_ERR",
+ "ARC_CQ_WR_IFIFO_CI_ERR",
+ "ARC_CQ_WR_CTL_CI_ERR",
+ "ARC_AXI_ERR",
+ "CP_SWITCH_WDT_ERR"
+};
+
+static const char * const gaudi2_qman_arb_error_cause[GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE] = {
+ "Choice push while full error",
+ "Choice Q watchdog error",
+ "MSG AXI LBW returned with error"
+};
+
+static const char * const guadi2_rot_error_cause[GAUDI2_NUM_OF_ROT_ERR_CAUSE] = {
+ "qm_axi_err",
+ "qm_trace_fence_events",
+ "qm_sw_err",
+ "qm_cp_sw_stop",
+ "lbw_mstr_rresp_err",
+ "lbw_mstr_bresp_err",
+ "lbw_msg_slverr",
+ "hbw_msg_slverr",
+ "wbc_slverr",
+ "hbw_mstr_rresp_err",
+ "hbw_mstr_bresp_err",
+ "sb_resp_intr",
+ "mrsb_resp_intr",
+ "core_dw_status_0",
+ "core_dw_status_1",
+ "core_dw_status_2",
+ "core_dw_status_3",
+ "core_dw_status_4",
+ "core_dw_status_5",
+ "core_dw_status_6",
+ "core_dw_status_7",
+ "async_arc2cpu_sei_intr",
+};
+
+static const char * const gaudi2_tpc_interrupts_cause[GAUDI2_NUM_OF_TPC_INTR_CAUSE] = {
+ "tpc_address_exceed_slm",
+ "tpc_div_by_0",
+ "tpc_spu_mac_overflow",
+ "tpc_spu_addsub_overflow",
+ "tpc_spu_abs_overflow",
+ "tpc_spu_fma_fp_dst_nan",
+ "tpc_spu_fma_fp_dst_inf",
+ "tpc_spu_convert_fp_dst_nan",
+ "tpc_spu_convert_fp_dst_inf",
+ "tpc_spu_fp_dst_denorm",
+ "tpc_vpu_mac_overflow",
+ "tpc_vpu_addsub_overflow",
+ "tpc_vpu_abs_overflow",
+ "tpc_vpu_convert_fp_dst_nan",
+ "tpc_vpu_convert_fp_dst_inf",
+ "tpc_vpu_fma_fp_dst_nan",
+ "tpc_vpu_fma_fp_dst_inf",
+ "tpc_vpu_fp_dst_denorm",
+ "tpc_assertions",
+ "tpc_illegal_instruction",
+ "tpc_pc_wrap_around",
+ "tpc_qm_sw_err",
+ "tpc_hbw_rresp_err",
+ "tpc_hbw_bresp_err",
+ "tpc_lbw_rresp_err",
+ "tpc_lbw_bresp_err",
+ "st_unlock_already_locked",
+ "invalid_lock_access",
+ "LD_L protection violation",
+ "ST_L protection violation",
+};
+
+static const char * const guadi2_mme_error_cause[GAUDI2_NUM_OF_MME_ERR_CAUSE] = {
+ "agu_resp_intr",
+ "qman_axi_err",
+ "wap sei (wbc axi err)",
+ "arc sei",
+ "cfg access error",
+ "qm_sw_err",
+ "sbte_dbg_intr_0",
+ "sbte_dbg_intr_1",
+ "sbte_dbg_intr_2",
+ "sbte_dbg_intr_3",
+ "sbte_dbg_intr_4",
+ "sbte_prtn_intr_0",
+ "sbte_prtn_intr_1",
+ "sbte_prtn_intr_2",
+ "sbte_prtn_intr_3",
+ "sbte_prtn_intr_4",
+};
+
+static const char * const guadi2_mme_sbte_error_cause[GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE] = {
+ "i0",
+ "i1",
+ "i2",
+ "i3",
+ "i4",
+};
+
+static const char * const guadi2_mme_wap_error_cause[GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE] = {
+ "WBC ERR RESP_0",
+ "WBC ERR RESP_1",
+ "AP SOURCE POS INF",
+ "AP SOURCE NEG INF",
+ "AP SOURCE NAN",
+ "AP RESULT POS INF",
+ "AP RESULT NEG INF",
+};
+
+static const char * const gaudi2_dma_core_interrupts_cause[GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE] = {
+ "HBW Read returned with error RRESP",
+ "HBW write returned with error BRESP",
+ "LBW write returned with error BRESP",
+ "descriptor_fifo_overflow",
+ "KDMA SB LBW Read returned with error",
+ "KDMA WBC LBW Write returned with error",
+ "TRANSPOSE ENGINE DESC FIFO OVERFLOW",
+ "WRONG CFG FOR COMMIT IN LIN DMA"
+};
+
+static const char * const gaudi2_kdma_core_interrupts_cause[GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE] = {
+ "HBW/LBW Read returned with error RRESP",
+ "HBW/LBW write returned with error BRESP",
+ "LBW write returned with error BRESP",
+ "descriptor_fifo_overflow",
+ "KDMA SB LBW Read returned with error",
+ "KDMA WBC LBW Write returned with error",
+ "TRANSPOSE ENGINE DESC FIFO OVERFLOW",
+ "WRONG CFG FOR COMMIT IN LIN DMA"
+};
+
+struct gaudi2_sm_sei_cause_data {
+ const char *cause_name;
+ const char *log_name;
+};
+
+static const struct gaudi2_sm_sei_cause_data
+gaudi2_sm_sei_cause[GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE] = {
+ {"calculated SO value overflow/underflow", "SOB ID"},
+ {"payload address of monitor is not aligned to 4B", "monitor addr"},
+ {"armed monitor write got BRESP (SLVERR or DECERR)", "AXI id"},
+};
+
+static const char * const
+gaudi2_pmmu_fatal_interrupts_cause[GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE] = {
+ "LATENCY_RD_OUT_FIFO_OVERRUN",
+ "LATENCY_WR_OUT_FIFO_OVERRUN",
+};
+
+static const char * const
+gaudi2_hif_fatal_interrupts_cause[GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE] = {
+ "LATENCY_RD_OUT_FIFO_OVERRUN",
+ "LATENCY_WR_OUT_FIFO_OVERRUN",
+};
+
+static const char * const
+gaudi2_psoc_axi_drain_interrupts_cause[GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE] = {
+ "AXI drain HBW",
+ "AXI drain LBW",
+};
+
+static const char * const
+gaudi2_pcie_addr_dec_error_cause[GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE] = {
+ "HBW error response",
+ "LBW error response",
+ "TLP is blocked by RR"
+};
+
+const u32 gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_SIZE] = {
+ [GAUDI2_QUEUE_ID_PDMA_0_0] = mmPDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_0_1] = mmPDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_0_2] = mmPDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_0_3] = mmPDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_1_0] = mmPDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_1_1] = mmPDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_1_2] = mmPDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_PDMA_1_3] = mmPDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = mmDCORE0_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = mmDCORE0_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = mmDCORE0_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = mmDCORE0_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = mmDCORE0_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = mmDCORE0_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = mmDCORE0_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = mmDCORE0_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = mmDCORE0_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = mmDCORE0_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = mmDCORE0_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = mmDCORE0_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = mmDCORE0_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = mmDCORE0_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = mmDCORE0_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = mmDCORE0_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = mmDCORE0_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = mmDCORE0_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = mmDCORE0_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = mmDCORE0_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = mmDCORE0_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = mmDCORE0_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = mmDCORE0_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = mmDCORE0_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = mmDCORE0_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = mmDCORE0_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = mmDCORE0_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = mmDCORE0_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = mmDCORE0_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = mmDCORE0_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = mmDCORE0_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = mmDCORE0_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = mmDCORE0_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = mmDCORE0_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = mmDCORE0_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = mmDCORE0_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = mmDCORE0_TPC6_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = mmDCORE0_TPC6_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = mmDCORE0_TPC6_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = mmDCORE0_TPC6_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = mmDCORE1_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = mmDCORE1_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = mmDCORE1_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = mmDCORE1_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = mmDCORE1_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = mmDCORE1_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = mmDCORE1_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = mmDCORE1_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = mmDCORE1_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = mmDCORE1_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = mmDCORE1_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = mmDCORE1_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = mmDCORE1_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = mmDCORE1_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = mmDCORE1_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = mmDCORE1_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = mmDCORE1_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = mmDCORE1_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = mmDCORE1_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = mmDCORE1_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = mmDCORE1_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = mmDCORE1_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = mmDCORE1_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = mmDCORE1_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = mmDCORE1_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = mmDCORE1_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = mmDCORE1_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = mmDCORE1_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = mmDCORE1_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = mmDCORE1_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = mmDCORE1_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = mmDCORE1_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = mmDCORE1_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = mmDCORE1_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = mmDCORE1_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = mmDCORE1_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = mmDCORE2_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = mmDCORE2_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = mmDCORE2_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = mmDCORE2_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = mmDCORE2_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = mmDCORE2_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = mmDCORE2_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = mmDCORE2_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = mmDCORE2_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = mmDCORE2_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = mmDCORE2_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = mmDCORE2_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = mmDCORE2_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = mmDCORE2_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = mmDCORE2_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = mmDCORE2_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = mmDCORE2_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = mmDCORE2_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = mmDCORE2_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = mmDCORE2_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = mmDCORE2_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = mmDCORE2_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = mmDCORE2_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = mmDCORE2_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = mmDCORE2_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = mmDCORE2_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = mmDCORE2_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = mmDCORE2_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = mmDCORE2_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = mmDCORE2_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = mmDCORE2_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = mmDCORE2_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = mmDCORE2_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = mmDCORE2_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = mmDCORE2_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = mmDCORE2_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = mmDCORE3_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = mmDCORE3_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = mmDCORE3_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = mmDCORE3_EDMA0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = mmDCORE3_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = mmDCORE3_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = mmDCORE3_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = mmDCORE3_EDMA1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = mmDCORE3_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = mmDCORE3_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = mmDCORE3_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = mmDCORE3_MME_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = mmDCORE3_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = mmDCORE3_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = mmDCORE3_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = mmDCORE3_TPC0_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = mmDCORE3_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = mmDCORE3_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = mmDCORE3_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = mmDCORE3_TPC1_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = mmDCORE3_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = mmDCORE3_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = mmDCORE3_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = mmDCORE3_TPC2_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = mmDCORE3_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = mmDCORE3_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = mmDCORE3_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = mmDCORE3_TPC3_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = mmDCORE3_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = mmDCORE3_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = mmDCORE3_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = mmDCORE3_TPC4_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = mmDCORE3_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = mmDCORE3_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = mmDCORE3_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = mmDCORE3_TPC5_QM_BASE,
+ [GAUDI2_QUEUE_ID_NIC_0_0] = mmNIC0_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_0_1] = mmNIC0_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_0_2] = mmNIC0_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_0_3] = mmNIC0_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_1_0] = mmNIC0_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_1_1] = mmNIC0_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_1_2] = mmNIC0_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_1_3] = mmNIC0_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_2_0] = mmNIC1_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_2_1] = mmNIC1_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_2_2] = mmNIC1_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_2_3] = mmNIC1_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_3_0] = mmNIC1_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_3_1] = mmNIC1_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_3_2] = mmNIC1_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_3_3] = mmNIC1_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_4_0] = mmNIC2_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_4_1] = mmNIC2_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_4_2] = mmNIC2_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_4_3] = mmNIC2_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_5_0] = mmNIC2_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_5_1] = mmNIC2_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_5_2] = mmNIC2_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_5_3] = mmNIC2_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_6_0] = mmNIC3_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_6_1] = mmNIC3_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_6_2] = mmNIC3_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_6_3] = mmNIC3_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_7_0] = mmNIC3_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_7_1] = mmNIC3_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_7_2] = mmNIC3_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_7_3] = mmNIC3_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_8_0] = mmNIC4_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_8_1] = mmNIC4_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_8_2] = mmNIC4_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_8_3] = mmNIC4_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_9_0] = mmNIC4_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_9_1] = mmNIC4_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_9_2] = mmNIC4_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_9_3] = mmNIC4_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_10_0] = mmNIC5_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_10_1] = mmNIC5_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_10_2] = mmNIC5_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_10_3] = mmNIC5_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_11_0] = mmNIC5_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_11_1] = mmNIC5_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_11_2] = mmNIC5_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_11_3] = mmNIC5_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_12_0] = mmNIC6_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_12_1] = mmNIC6_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_12_2] = mmNIC6_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_12_3] = mmNIC6_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_13_0] = mmNIC6_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_13_1] = mmNIC6_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_13_2] = mmNIC6_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_13_3] = mmNIC6_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_14_0] = mmNIC7_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_14_1] = mmNIC7_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_14_2] = mmNIC7_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_14_3] = mmNIC7_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_15_0] = mmNIC7_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_15_1] = mmNIC7_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_15_2] = mmNIC7_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_15_3] = mmNIC7_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_16_0] = mmNIC8_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_16_1] = mmNIC8_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_16_2] = mmNIC8_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_16_3] = mmNIC8_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_17_0] = mmNIC8_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_17_1] = mmNIC8_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_17_2] = mmNIC8_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_17_3] = mmNIC8_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_18_0] = mmNIC9_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_18_1] = mmNIC9_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_18_2] = mmNIC9_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_18_3] = mmNIC9_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_19_0] = mmNIC9_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_19_1] = mmNIC9_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_19_2] = mmNIC9_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_19_3] = mmNIC9_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_20_0] = mmNIC10_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_20_1] = mmNIC10_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_20_2] = mmNIC10_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_20_3] = mmNIC10_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_21_0] = mmNIC10_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_21_1] = mmNIC10_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_21_2] = mmNIC10_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_21_3] = mmNIC10_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_22_0] = mmNIC11_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_22_1] = mmNIC11_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_22_2] = mmNIC11_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_22_3] = mmNIC11_QM0_BASE,
+ [GAUDI2_QUEUE_ID_NIC_23_0] = mmNIC11_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_23_1] = mmNIC11_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_23_2] = mmNIC11_QM1_BASE,
+ [GAUDI2_QUEUE_ID_NIC_23_3] = mmNIC11_QM1_BASE,
+ [GAUDI2_QUEUE_ID_ROT_0_0] = mmROT0_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_0_1] = mmROT0_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_0_2] = mmROT0_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_0_3] = mmROT0_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_1_0] = mmROT1_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_1_1] = mmROT1_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_1_2] = mmROT1_QM_BASE,
+ [GAUDI2_QUEUE_ID_ROT_1_3] = mmROT1_QM_BASE
+};
+
+static const u32 gaudi2_arc_blocks_bases[NUM_ARC_CPUS] = {
+ [CPU_ID_SCHED_ARC0] = mmARC_FARM_ARC0_AUX_BASE,
+ [CPU_ID_SCHED_ARC1] = mmARC_FARM_ARC1_AUX_BASE,
+ [CPU_ID_SCHED_ARC2] = mmARC_FARM_ARC2_AUX_BASE,
+ [CPU_ID_SCHED_ARC3] = mmARC_FARM_ARC3_AUX_BASE,
+ [CPU_ID_SCHED_ARC4] = mmDCORE1_MME_QM_ARC_AUX_BASE,
+ [CPU_ID_SCHED_ARC5] = mmDCORE3_MME_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC0] = mmDCORE0_TPC0_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC1] = mmDCORE0_TPC1_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC2] = mmDCORE0_TPC2_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC3] = mmDCORE0_TPC3_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC4] = mmDCORE0_TPC4_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC5] = mmDCORE0_TPC5_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC6] = mmDCORE1_TPC0_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC7] = mmDCORE1_TPC1_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC8] = mmDCORE1_TPC2_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC9] = mmDCORE1_TPC3_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC10] = mmDCORE1_TPC4_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC11] = mmDCORE1_TPC5_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC12] = mmDCORE2_TPC0_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC13] = mmDCORE2_TPC1_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC14] = mmDCORE2_TPC2_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC15] = mmDCORE2_TPC3_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC16] = mmDCORE2_TPC4_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC17] = mmDCORE2_TPC5_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC18] = mmDCORE3_TPC0_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC19] = mmDCORE3_TPC1_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC20] = mmDCORE3_TPC2_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC21] = mmDCORE3_TPC3_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC22] = mmDCORE3_TPC4_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC23] = mmDCORE3_TPC5_QM_ARC_AUX_BASE,
+ [CPU_ID_TPC_QMAN_ARC24] = mmDCORE0_TPC6_QM_ARC_AUX_BASE,
+ [CPU_ID_MME_QMAN_ARC0] = mmDCORE0_MME_QM_ARC_AUX_BASE,
+ [CPU_ID_MME_QMAN_ARC1] = mmDCORE2_MME_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC0] = mmDCORE0_EDMA0_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC1] = mmDCORE0_EDMA1_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC2] = mmDCORE1_EDMA0_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC3] = mmDCORE1_EDMA1_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC4] = mmDCORE2_EDMA0_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC5] = mmDCORE2_EDMA1_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC6] = mmDCORE3_EDMA0_QM_ARC_AUX_BASE,
+ [CPU_ID_EDMA_QMAN_ARC7] = mmDCORE3_EDMA1_QM_ARC_AUX_BASE,
+ [CPU_ID_PDMA_QMAN_ARC0] = mmPDMA0_QM_ARC_AUX_BASE,
+ [CPU_ID_PDMA_QMAN_ARC1] = mmPDMA1_QM_ARC_AUX_BASE,
+ [CPU_ID_ROT_QMAN_ARC0] = mmROT0_QM_ARC_AUX_BASE,
+ [CPU_ID_ROT_QMAN_ARC1] = mmROT1_QM_ARC_AUX_BASE,
+ [CPU_ID_NIC_QMAN_ARC0] = mmNIC0_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC1] = mmNIC0_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC2] = mmNIC1_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC3] = mmNIC1_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC4] = mmNIC2_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC5] = mmNIC2_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC6] = mmNIC3_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC7] = mmNIC3_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC8] = mmNIC4_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC9] = mmNIC4_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC10] = mmNIC5_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC11] = mmNIC5_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC12] = mmNIC6_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC13] = mmNIC6_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC14] = mmNIC7_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC15] = mmNIC7_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC16] = mmNIC8_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC17] = mmNIC8_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC18] = mmNIC9_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC19] = mmNIC9_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC20] = mmNIC10_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC21] = mmNIC10_QM_ARC_AUX1_BASE,
+ [CPU_ID_NIC_QMAN_ARC22] = mmNIC11_QM_ARC_AUX0_BASE,
+ [CPU_ID_NIC_QMAN_ARC23] = mmNIC11_QM_ARC_AUX1_BASE,
+};
+
+static const u32 gaudi2_arc_dccm_bases[NUM_ARC_CPUS] = {
+ [CPU_ID_SCHED_ARC0] = mmARC_FARM_ARC0_DCCM0_BASE,
+ [CPU_ID_SCHED_ARC1] = mmARC_FARM_ARC1_DCCM0_BASE,
+ [CPU_ID_SCHED_ARC2] = mmARC_FARM_ARC2_DCCM0_BASE,
+ [CPU_ID_SCHED_ARC3] = mmARC_FARM_ARC3_DCCM0_BASE,
+ [CPU_ID_SCHED_ARC4] = mmDCORE1_MME_QM_ARC_DCCM_BASE,
+ [CPU_ID_SCHED_ARC5] = mmDCORE3_MME_QM_ARC_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC0] = mmDCORE0_TPC0_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC1] = mmDCORE0_TPC1_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC2] = mmDCORE0_TPC2_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC3] = mmDCORE0_TPC3_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC4] = mmDCORE0_TPC4_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC5] = mmDCORE0_TPC5_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC6] = mmDCORE1_TPC0_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC7] = mmDCORE1_TPC1_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC8] = mmDCORE1_TPC2_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC9] = mmDCORE1_TPC3_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC10] = mmDCORE1_TPC4_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC11] = mmDCORE1_TPC5_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC12] = mmDCORE2_TPC0_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC13] = mmDCORE2_TPC1_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC14] = mmDCORE2_TPC2_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC15] = mmDCORE2_TPC3_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC16] = mmDCORE2_TPC4_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC17] = mmDCORE2_TPC5_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC18] = mmDCORE3_TPC0_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC19] = mmDCORE3_TPC1_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC20] = mmDCORE3_TPC2_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC21] = mmDCORE3_TPC3_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC22] = mmDCORE3_TPC4_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC23] = mmDCORE3_TPC5_QM_DCCM_BASE,
+ [CPU_ID_TPC_QMAN_ARC24] = mmDCORE0_TPC6_QM_DCCM_BASE,
+ [CPU_ID_MME_QMAN_ARC0] = mmDCORE0_MME_QM_ARC_DCCM_BASE,
+ [CPU_ID_MME_QMAN_ARC1] = mmDCORE2_MME_QM_ARC_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC0] = mmDCORE0_EDMA0_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC1] = mmDCORE0_EDMA1_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC2] = mmDCORE1_EDMA0_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC3] = mmDCORE1_EDMA1_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC4] = mmDCORE2_EDMA0_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC5] = mmDCORE2_EDMA1_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC6] = mmDCORE3_EDMA0_QM_DCCM_BASE,
+ [CPU_ID_EDMA_QMAN_ARC7] = mmDCORE3_EDMA1_QM_DCCM_BASE,
+ [CPU_ID_PDMA_QMAN_ARC0] = mmPDMA0_QM_ARC_DCCM_BASE,
+ [CPU_ID_PDMA_QMAN_ARC1] = mmPDMA1_QM_ARC_DCCM_BASE,
+ [CPU_ID_ROT_QMAN_ARC0] = mmROT0_QM_ARC_DCCM_BASE,
+ [CPU_ID_ROT_QMAN_ARC1] = mmROT1_QM_ARC_DCCM_BASE,
+ [CPU_ID_NIC_QMAN_ARC0] = mmNIC0_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC1] = mmNIC0_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC2] = mmNIC1_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC3] = mmNIC1_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC4] = mmNIC2_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC5] = mmNIC2_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC6] = mmNIC3_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC7] = mmNIC3_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC8] = mmNIC4_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC9] = mmNIC4_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC10] = mmNIC5_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC11] = mmNIC5_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC12] = mmNIC6_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC13] = mmNIC6_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC14] = mmNIC7_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC15] = mmNIC7_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC16] = mmNIC8_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC17] = mmNIC8_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC18] = mmNIC9_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC19] = mmNIC9_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC20] = mmNIC10_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC21] = mmNIC10_QM_DCCM1_BASE,
+ [CPU_ID_NIC_QMAN_ARC22] = mmNIC11_QM_DCCM0_BASE,
+ [CPU_ID_NIC_QMAN_ARC23] = mmNIC11_QM_DCCM1_BASE,
+};
+
+const u32 gaudi2_mme_ctrl_lo_blocks_bases[MME_ID_SIZE] = {
+ [MME_ID_DCORE0] = mmDCORE0_MME_CTRL_LO_BASE,
+ [MME_ID_DCORE1] = mmDCORE1_MME_CTRL_LO_BASE,
+ [MME_ID_DCORE2] = mmDCORE2_MME_CTRL_LO_BASE,
+ [MME_ID_DCORE3] = mmDCORE3_MME_CTRL_LO_BASE,
+};
+
+static const u32 gaudi2_queue_id_to_arc_id[GAUDI2_QUEUE_ID_SIZE] = {
+ [GAUDI2_QUEUE_ID_PDMA_0_0] = CPU_ID_PDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_PDMA_0_1] = CPU_ID_PDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_PDMA_0_2] = CPU_ID_PDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_PDMA_0_3] = CPU_ID_PDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_PDMA_1_0] = CPU_ID_PDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_PDMA_1_1] = CPU_ID_PDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_PDMA_1_2] = CPU_ID_PDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_PDMA_1_3] = CPU_ID_PDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_0] = CPU_ID_MME_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_1] = CPU_ID_MME_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_2] = CPU_ID_MME_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_MME_0_3] = CPU_ID_MME_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_0] = CPU_ID_TPC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_1] = CPU_ID_TPC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_2] = CPU_ID_TPC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_0_3] = CPU_ID_TPC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_0] = CPU_ID_TPC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_1] = CPU_ID_TPC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_2] = CPU_ID_TPC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_1_3] = CPU_ID_TPC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_0] = CPU_ID_TPC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_1] = CPU_ID_TPC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_2] = CPU_ID_TPC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_2_3] = CPU_ID_TPC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_0] = CPU_ID_TPC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_1] = CPU_ID_TPC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_2] = CPU_ID_TPC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_3_3] = CPU_ID_TPC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_0] = CPU_ID_TPC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_1] = CPU_ID_TPC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_2] = CPU_ID_TPC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_4_3] = CPU_ID_TPC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_0] = CPU_ID_TPC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_1] = CPU_ID_TPC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_2] = CPU_ID_TPC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_5_3] = CPU_ID_TPC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_0] = CPU_ID_TPC_QMAN_ARC24,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_1] = CPU_ID_TPC_QMAN_ARC24,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_2] = CPU_ID_TPC_QMAN_ARC24,
+ [GAUDI2_QUEUE_ID_DCORE0_TPC_6_3] = CPU_ID_TPC_QMAN_ARC24,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_0] = CPU_ID_SCHED_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_1] = CPU_ID_SCHED_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_2] = CPU_ID_SCHED_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE1_MME_0_3] = CPU_ID_SCHED_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_0] = CPU_ID_TPC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_1] = CPU_ID_TPC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_2] = CPU_ID_TPC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_0_3] = CPU_ID_TPC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_0] = CPU_ID_TPC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_1] = CPU_ID_TPC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_2] = CPU_ID_TPC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_1_3] = CPU_ID_TPC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_0] = CPU_ID_TPC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_1] = CPU_ID_TPC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_2] = CPU_ID_TPC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_2_3] = CPU_ID_TPC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_0] = CPU_ID_TPC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_1] = CPU_ID_TPC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_2] = CPU_ID_TPC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_3_3] = CPU_ID_TPC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_0] = CPU_ID_TPC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_1] = CPU_ID_TPC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_2] = CPU_ID_TPC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_4_3] = CPU_ID_TPC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_0] = CPU_ID_TPC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_1] = CPU_ID_TPC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_2] = CPU_ID_TPC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_DCORE1_TPC_5_3] = CPU_ID_TPC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_0] = CPU_ID_MME_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_1] = CPU_ID_MME_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_2] = CPU_ID_MME_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE2_MME_0_3] = CPU_ID_MME_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_0] = CPU_ID_TPC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_1] = CPU_ID_TPC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_2] = CPU_ID_TPC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_0_3] = CPU_ID_TPC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_0] = CPU_ID_TPC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_1] = CPU_ID_TPC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_2] = CPU_ID_TPC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_1_3] = CPU_ID_TPC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_0] = CPU_ID_TPC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_1] = CPU_ID_TPC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_2] = CPU_ID_TPC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_2_3] = CPU_ID_TPC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_0] = CPU_ID_TPC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_1] = CPU_ID_TPC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_2] = CPU_ID_TPC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_3_3] = CPU_ID_TPC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_0] = CPU_ID_TPC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_1] = CPU_ID_TPC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_2] = CPU_ID_TPC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_4_3] = CPU_ID_TPC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_0] = CPU_ID_TPC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_1] = CPU_ID_TPC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_2] = CPU_ID_TPC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_DCORE2_TPC_5_3] = CPU_ID_TPC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0] = CPU_ID_EDMA_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_1] = CPU_ID_EDMA_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_2] = CPU_ID_EDMA_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_0_3] = CPU_ID_EDMA_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0] = CPU_ID_EDMA_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1] = CPU_ID_EDMA_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2] = CPU_ID_EDMA_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3] = CPU_ID_EDMA_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_0] = CPU_ID_SCHED_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_1] = CPU_ID_SCHED_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_2] = CPU_ID_SCHED_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE3_MME_0_3] = CPU_ID_SCHED_ARC5,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_0] = CPU_ID_TPC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_1] = CPU_ID_TPC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_2] = CPU_ID_TPC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_0_3] = CPU_ID_TPC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_0] = CPU_ID_TPC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_1] = CPU_ID_TPC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_2] = CPU_ID_TPC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_1_3] = CPU_ID_TPC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_0] = CPU_ID_TPC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_1] = CPU_ID_TPC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_2] = CPU_ID_TPC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_2_3] = CPU_ID_TPC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_0] = CPU_ID_TPC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_1] = CPU_ID_TPC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_2] = CPU_ID_TPC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_3_3] = CPU_ID_TPC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_0] = CPU_ID_TPC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_1] = CPU_ID_TPC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_2] = CPU_ID_TPC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_4_3] = CPU_ID_TPC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_0] = CPU_ID_TPC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_1] = CPU_ID_TPC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_2] = CPU_ID_TPC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_DCORE3_TPC_5_3] = CPU_ID_TPC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_NIC_0_0] = CPU_ID_NIC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_NIC_0_1] = CPU_ID_NIC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_NIC_0_2] = CPU_ID_NIC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_NIC_0_3] = CPU_ID_NIC_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_NIC_1_0] = CPU_ID_NIC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_NIC_1_1] = CPU_ID_NIC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_NIC_1_2] = CPU_ID_NIC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_NIC_1_3] = CPU_ID_NIC_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_NIC_2_0] = CPU_ID_NIC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_NIC_2_1] = CPU_ID_NIC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_NIC_2_2] = CPU_ID_NIC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_NIC_2_3] = CPU_ID_NIC_QMAN_ARC2,
+ [GAUDI2_QUEUE_ID_NIC_3_0] = CPU_ID_NIC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_NIC_3_1] = CPU_ID_NIC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_NIC_3_2] = CPU_ID_NIC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_NIC_3_3] = CPU_ID_NIC_QMAN_ARC3,
+ [GAUDI2_QUEUE_ID_NIC_4_0] = CPU_ID_NIC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_NIC_4_1] = CPU_ID_NIC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_NIC_4_2] = CPU_ID_NIC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_NIC_4_3] = CPU_ID_NIC_QMAN_ARC4,
+ [GAUDI2_QUEUE_ID_NIC_5_0] = CPU_ID_NIC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_NIC_5_1] = CPU_ID_NIC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_NIC_5_2] = CPU_ID_NIC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_NIC_5_3] = CPU_ID_NIC_QMAN_ARC5,
+ [GAUDI2_QUEUE_ID_NIC_6_0] = CPU_ID_NIC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_NIC_6_1] = CPU_ID_NIC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_NIC_6_2] = CPU_ID_NIC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_NIC_6_3] = CPU_ID_NIC_QMAN_ARC6,
+ [GAUDI2_QUEUE_ID_NIC_7_0] = CPU_ID_NIC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_NIC_7_1] = CPU_ID_NIC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_NIC_7_2] = CPU_ID_NIC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_NIC_7_3] = CPU_ID_NIC_QMAN_ARC7,
+ [GAUDI2_QUEUE_ID_NIC_8_0] = CPU_ID_NIC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_NIC_8_1] = CPU_ID_NIC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_NIC_8_2] = CPU_ID_NIC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_NIC_8_3] = CPU_ID_NIC_QMAN_ARC8,
+ [GAUDI2_QUEUE_ID_NIC_9_0] = CPU_ID_NIC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_NIC_9_1] = CPU_ID_NIC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_NIC_9_2] = CPU_ID_NIC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_NIC_9_3] = CPU_ID_NIC_QMAN_ARC9,
+ [GAUDI2_QUEUE_ID_NIC_10_0] = CPU_ID_NIC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_NIC_10_1] = CPU_ID_NIC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_NIC_10_2] = CPU_ID_NIC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_NIC_10_3] = CPU_ID_NIC_QMAN_ARC10,
+ [GAUDI2_QUEUE_ID_NIC_11_0] = CPU_ID_NIC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_NIC_11_1] = CPU_ID_NIC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_NIC_11_2] = CPU_ID_NIC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_NIC_11_3] = CPU_ID_NIC_QMAN_ARC11,
+ [GAUDI2_QUEUE_ID_NIC_12_0] = CPU_ID_NIC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_NIC_12_1] = CPU_ID_NIC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_NIC_12_2] = CPU_ID_NIC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_NIC_12_3] = CPU_ID_NIC_QMAN_ARC12,
+ [GAUDI2_QUEUE_ID_NIC_13_0] = CPU_ID_NIC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_NIC_13_1] = CPU_ID_NIC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_NIC_13_2] = CPU_ID_NIC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_NIC_13_3] = CPU_ID_NIC_QMAN_ARC13,
+ [GAUDI2_QUEUE_ID_NIC_14_0] = CPU_ID_NIC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_NIC_14_1] = CPU_ID_NIC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_NIC_14_2] = CPU_ID_NIC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_NIC_14_3] = CPU_ID_NIC_QMAN_ARC14,
+ [GAUDI2_QUEUE_ID_NIC_15_0] = CPU_ID_NIC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_NIC_15_1] = CPU_ID_NIC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_NIC_15_2] = CPU_ID_NIC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_NIC_15_3] = CPU_ID_NIC_QMAN_ARC15,
+ [GAUDI2_QUEUE_ID_NIC_16_0] = CPU_ID_NIC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_NIC_16_1] = CPU_ID_NIC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_NIC_16_2] = CPU_ID_NIC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_NIC_16_3] = CPU_ID_NIC_QMAN_ARC16,
+ [GAUDI2_QUEUE_ID_NIC_17_0] = CPU_ID_NIC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_NIC_17_1] = CPU_ID_NIC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_NIC_17_2] = CPU_ID_NIC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_NIC_17_3] = CPU_ID_NIC_QMAN_ARC17,
+ [GAUDI2_QUEUE_ID_NIC_18_0] = CPU_ID_NIC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_NIC_18_1] = CPU_ID_NIC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_NIC_18_2] = CPU_ID_NIC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_NIC_18_3] = CPU_ID_NIC_QMAN_ARC18,
+ [GAUDI2_QUEUE_ID_NIC_19_0] = CPU_ID_NIC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_NIC_19_1] = CPU_ID_NIC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_NIC_19_2] = CPU_ID_NIC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_NIC_19_3] = CPU_ID_NIC_QMAN_ARC19,
+ [GAUDI2_QUEUE_ID_NIC_20_0] = CPU_ID_NIC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_NIC_20_1] = CPU_ID_NIC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_NIC_20_2] = CPU_ID_NIC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_NIC_20_3] = CPU_ID_NIC_QMAN_ARC20,
+ [GAUDI2_QUEUE_ID_NIC_21_0] = CPU_ID_NIC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_NIC_21_1] = CPU_ID_NIC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_NIC_21_2] = CPU_ID_NIC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_NIC_21_3] = CPU_ID_NIC_QMAN_ARC21,
+ [GAUDI2_QUEUE_ID_NIC_22_0] = CPU_ID_NIC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_NIC_22_1] = CPU_ID_NIC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_NIC_22_2] = CPU_ID_NIC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_NIC_22_3] = CPU_ID_NIC_QMAN_ARC22,
+ [GAUDI2_QUEUE_ID_NIC_23_0] = CPU_ID_NIC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_NIC_23_1] = CPU_ID_NIC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_NIC_23_2] = CPU_ID_NIC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_NIC_23_3] = CPU_ID_NIC_QMAN_ARC23,
+ [GAUDI2_QUEUE_ID_ROT_0_0] = CPU_ID_ROT_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_ROT_0_1] = CPU_ID_ROT_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_ROT_0_2] = CPU_ID_ROT_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_ROT_0_3] = CPU_ID_ROT_QMAN_ARC0,
+ [GAUDI2_QUEUE_ID_ROT_1_0] = CPU_ID_ROT_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_ROT_1_1] = CPU_ID_ROT_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_ROT_1_2] = CPU_ID_ROT_QMAN_ARC1,
+ [GAUDI2_QUEUE_ID_ROT_1_3] = CPU_ID_ROT_QMAN_ARC1
+};
+
+const u32 gaudi2_dma_core_blocks_bases[DMA_CORE_ID_SIZE] = {
+ [DMA_CORE_ID_PDMA0] = mmPDMA0_CORE_BASE,
+ [DMA_CORE_ID_PDMA1] = mmPDMA1_CORE_BASE,
+ [DMA_CORE_ID_EDMA0] = mmDCORE0_EDMA0_CORE_BASE,
+ [DMA_CORE_ID_EDMA1] = mmDCORE0_EDMA1_CORE_BASE,
+ [DMA_CORE_ID_EDMA2] = mmDCORE1_EDMA0_CORE_BASE,
+ [DMA_CORE_ID_EDMA3] = mmDCORE1_EDMA1_CORE_BASE,
+ [DMA_CORE_ID_EDMA4] = mmDCORE2_EDMA0_CORE_BASE,
+ [DMA_CORE_ID_EDMA5] = mmDCORE2_EDMA1_CORE_BASE,
+ [DMA_CORE_ID_EDMA6] = mmDCORE3_EDMA0_CORE_BASE,
+ [DMA_CORE_ID_EDMA7] = mmDCORE3_EDMA1_CORE_BASE,
+ [DMA_CORE_ID_KDMA] = mmARC_FARM_KDMA_BASE
+};
+
+const u32 gaudi2_mme_acc_blocks_bases[MME_ID_SIZE] = {
+ [MME_ID_DCORE0] = mmDCORE0_MME_ACC_BASE,
+ [MME_ID_DCORE1] = mmDCORE1_MME_ACC_BASE,
+ [MME_ID_DCORE2] = mmDCORE2_MME_ACC_BASE,
+ [MME_ID_DCORE3] = mmDCORE3_MME_ACC_BASE
+};
+
+static const u32 gaudi2_tpc_cfg_blocks_bases[TPC_ID_SIZE] = {
+ [TPC_ID_DCORE0_TPC0] = mmDCORE0_TPC0_CFG_BASE,
+ [TPC_ID_DCORE0_TPC1] = mmDCORE0_TPC1_CFG_BASE,
+ [TPC_ID_DCORE0_TPC2] = mmDCORE0_TPC2_CFG_BASE,
+ [TPC_ID_DCORE0_TPC3] = mmDCORE0_TPC3_CFG_BASE,
+ [TPC_ID_DCORE0_TPC4] = mmDCORE0_TPC4_CFG_BASE,
+ [TPC_ID_DCORE0_TPC5] = mmDCORE0_TPC5_CFG_BASE,
+ [TPC_ID_DCORE1_TPC0] = mmDCORE1_TPC0_CFG_BASE,
+ [TPC_ID_DCORE1_TPC1] = mmDCORE1_TPC1_CFG_BASE,
+ [TPC_ID_DCORE1_TPC2] = mmDCORE1_TPC2_CFG_BASE,
+ [TPC_ID_DCORE1_TPC3] = mmDCORE1_TPC3_CFG_BASE,
+ [TPC_ID_DCORE1_TPC4] = mmDCORE1_TPC4_CFG_BASE,
+ [TPC_ID_DCORE1_TPC5] = mmDCORE1_TPC5_CFG_BASE,
+ [TPC_ID_DCORE2_TPC0] = mmDCORE2_TPC0_CFG_BASE,
+ [TPC_ID_DCORE2_TPC1] = mmDCORE2_TPC1_CFG_BASE,
+ [TPC_ID_DCORE2_TPC2] = mmDCORE2_TPC2_CFG_BASE,
+ [TPC_ID_DCORE2_TPC3] = mmDCORE2_TPC3_CFG_BASE,
+ [TPC_ID_DCORE2_TPC4] = mmDCORE2_TPC4_CFG_BASE,
+ [TPC_ID_DCORE2_TPC5] = mmDCORE2_TPC5_CFG_BASE,
+ [TPC_ID_DCORE3_TPC0] = mmDCORE3_TPC0_CFG_BASE,
+ [TPC_ID_DCORE3_TPC1] = mmDCORE3_TPC1_CFG_BASE,
+ [TPC_ID_DCORE3_TPC2] = mmDCORE3_TPC2_CFG_BASE,
+ [TPC_ID_DCORE3_TPC3] = mmDCORE3_TPC3_CFG_BASE,
+ [TPC_ID_DCORE3_TPC4] = mmDCORE3_TPC4_CFG_BASE,
+ [TPC_ID_DCORE3_TPC5] = mmDCORE3_TPC5_CFG_BASE,
+ [TPC_ID_DCORE0_TPC6] = mmDCORE0_TPC6_CFG_BASE,
+};
+
+const u32 gaudi2_rot_blocks_bases[ROTATOR_ID_SIZE] = {
+ [ROTATOR_ID_0] = mmROT0_BASE,
+ [ROTATOR_ID_1] = mmROT1_BASE
+};
+
+static const u32 gaudi2_tpc_id_to_queue_id[TPC_ID_SIZE] = {
+ [TPC_ID_DCORE0_TPC0] = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0,
+ [TPC_ID_DCORE0_TPC1] = GAUDI2_QUEUE_ID_DCORE0_TPC_1_0,
+ [TPC_ID_DCORE0_TPC2] = GAUDI2_QUEUE_ID_DCORE0_TPC_2_0,
+ [TPC_ID_DCORE0_TPC3] = GAUDI2_QUEUE_ID_DCORE0_TPC_3_0,
+ [TPC_ID_DCORE0_TPC4] = GAUDI2_QUEUE_ID_DCORE0_TPC_4_0,
+ [TPC_ID_DCORE0_TPC5] = GAUDI2_QUEUE_ID_DCORE0_TPC_5_0,
+ [TPC_ID_DCORE1_TPC0] = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0,
+ [TPC_ID_DCORE1_TPC1] = GAUDI2_QUEUE_ID_DCORE1_TPC_1_0,
+ [TPC_ID_DCORE1_TPC2] = GAUDI2_QUEUE_ID_DCORE1_TPC_2_0,
+ [TPC_ID_DCORE1_TPC3] = GAUDI2_QUEUE_ID_DCORE1_TPC_3_0,
+ [TPC_ID_DCORE1_TPC4] = GAUDI2_QUEUE_ID_DCORE1_TPC_4_0,
+ [TPC_ID_DCORE1_TPC5] = GAUDI2_QUEUE_ID_DCORE1_TPC_5_0,
+ [TPC_ID_DCORE2_TPC0] = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0,
+ [TPC_ID_DCORE2_TPC1] = GAUDI2_QUEUE_ID_DCORE2_TPC_1_0,
+ [TPC_ID_DCORE2_TPC2] = GAUDI2_QUEUE_ID_DCORE2_TPC_2_0,
+ [TPC_ID_DCORE2_TPC3] = GAUDI2_QUEUE_ID_DCORE2_TPC_3_0,
+ [TPC_ID_DCORE2_TPC4] = GAUDI2_QUEUE_ID_DCORE2_TPC_4_0,
+ [TPC_ID_DCORE2_TPC5] = GAUDI2_QUEUE_ID_DCORE2_TPC_5_0,
+ [TPC_ID_DCORE3_TPC0] = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0,
+ [TPC_ID_DCORE3_TPC1] = GAUDI2_QUEUE_ID_DCORE3_TPC_1_0,
+ [TPC_ID_DCORE3_TPC2] = GAUDI2_QUEUE_ID_DCORE3_TPC_2_0,
+ [TPC_ID_DCORE3_TPC3] = GAUDI2_QUEUE_ID_DCORE3_TPC_3_0,
+ [TPC_ID_DCORE3_TPC4] = GAUDI2_QUEUE_ID_DCORE3_TPC_4_0,
+ [TPC_ID_DCORE3_TPC5] = GAUDI2_QUEUE_ID_DCORE3_TPC_5_0,
+ [TPC_ID_DCORE0_TPC6] = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0,
+};
+
+static const u32 gaudi2_rot_id_to_queue_id[ROTATOR_ID_SIZE] = {
+ [ROTATOR_ID_0] = GAUDI2_QUEUE_ID_ROT_0_0,
+ [ROTATOR_ID_1] = GAUDI2_QUEUE_ID_ROT_1_0,
+};
+
+const u32 edma_stream_base[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
+ GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0,
+ GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0,
+ GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0,
+ GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0,
+};
+
+static const char gaudi2_vdec_irq_name[GAUDI2_VDEC_MSIX_ENTRIES][GAUDI2_MAX_STRING_LEN] = {
+ "gaudi2 vdec 0_0", "gaudi2 vdec 0_0 abnormal",
+ "gaudi2 vdec 0_1", "gaudi2 vdec 0_1 abnormal",
+ "gaudi2 vdec 1_0", "gaudi2 vdec 1_0 abnormal",
+ "gaudi2 vdec 1_1", "gaudi2 vdec 1_1 abnormal",
+ "gaudi2 vdec 2_0", "gaudi2 vdec 2_0 abnormal",
+ "gaudi2 vdec 2_1", "gaudi2 vdec 2_1 abnormal",
+ "gaudi2 vdec 3_0", "gaudi2 vdec 3_0 abnormal",
+ "gaudi2 vdec 3_1", "gaudi2 vdec 3_1 abnormal",
+ "gaudi2 vdec s_0", "gaudi2 vdec s_0 abnormal",
+ "gaudi2 vdec s_1", "gaudi2 vdec s_1 abnormal"
+};
+
+static const u32 rtr_coordinates_to_rtr_id[NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES] = {
+ RTR_ID_X_Y(2, 4),
+ RTR_ID_X_Y(3, 4),
+ RTR_ID_X_Y(4, 4),
+ RTR_ID_X_Y(5, 4),
+ RTR_ID_X_Y(6, 4),
+ RTR_ID_X_Y(7, 4),
+ RTR_ID_X_Y(8, 4),
+ RTR_ID_X_Y(9, 4),
+ RTR_ID_X_Y(10, 4),
+ RTR_ID_X_Y(11, 4),
+ RTR_ID_X_Y(12, 4),
+ RTR_ID_X_Y(13, 4),
+ RTR_ID_X_Y(14, 4),
+ RTR_ID_X_Y(15, 4),
+ RTR_ID_X_Y(16, 4),
+ RTR_ID_X_Y(17, 4),
+ RTR_ID_X_Y(2, 11),
+ RTR_ID_X_Y(3, 11),
+ RTR_ID_X_Y(4, 11),
+ RTR_ID_X_Y(5, 11),
+ RTR_ID_X_Y(6, 11),
+ RTR_ID_X_Y(7, 11),
+ RTR_ID_X_Y(8, 11),
+ RTR_ID_X_Y(9, 11),
+ RTR_ID_X_Y(0, 0),/* 24 no id */
+ RTR_ID_X_Y(0, 0),/* 25 no id */
+ RTR_ID_X_Y(0, 0),/* 26 no id */
+ RTR_ID_X_Y(0, 0),/* 27 no id */
+ RTR_ID_X_Y(14, 11),
+ RTR_ID_X_Y(15, 11),
+ RTR_ID_X_Y(16, 11),
+ RTR_ID_X_Y(17, 11)
+};
+
+enum rtr_id {
+ DCORE0_RTR0,
+ DCORE0_RTR1,
+ DCORE0_RTR2,
+ DCORE0_RTR3,
+ DCORE0_RTR4,
+ DCORE0_RTR5,
+ DCORE0_RTR6,
+ DCORE0_RTR7,
+ DCORE1_RTR0,
+ DCORE1_RTR1,
+ DCORE1_RTR2,
+ DCORE1_RTR3,
+ DCORE1_RTR4,
+ DCORE1_RTR5,
+ DCORE1_RTR6,
+ DCORE1_RTR7,
+ DCORE2_RTR0,
+ DCORE2_RTR1,
+ DCORE2_RTR2,
+ DCORE2_RTR3,
+ DCORE2_RTR4,
+ DCORE2_RTR5,
+ DCORE2_RTR6,
+ DCORE2_RTR7,
+ DCORE3_RTR0,
+ DCORE3_RTR1,
+ DCORE3_RTR2,
+ DCORE3_RTR3,
+ DCORE3_RTR4,
+ DCORE3_RTR5,
+ DCORE3_RTR6,
+ DCORE3_RTR7,
+};
+
+static const u32 gaudi2_tpc_initiator_hbw_rtr_id[NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1] = {
+ DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR2, DCORE0_RTR2, DCORE0_RTR3, DCORE0_RTR3,
+ DCORE1_RTR6, DCORE1_RTR6, DCORE1_RTR5, DCORE1_RTR5, DCORE1_RTR4, DCORE1_RTR4,
+ DCORE2_RTR3, DCORE2_RTR3, DCORE2_RTR2, DCORE2_RTR2, DCORE2_RTR1, DCORE2_RTR1,
+ DCORE3_RTR4, DCORE3_RTR4, DCORE3_RTR5, DCORE3_RTR5, DCORE3_RTR6, DCORE3_RTR6,
+ DCORE0_RTR0
+};
+
+static const u32 gaudi2_tpc_initiator_lbw_rtr_id[NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1] = {
+ DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR1, DCORE0_RTR2, DCORE0_RTR2,
+ DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR6, DCORE1_RTR6, DCORE1_RTR5, DCORE1_RTR5,
+ DCORE2_RTR2, DCORE2_RTR2, DCORE2_RTR1, DCORE2_RTR1, DCORE2_RTR0, DCORE2_RTR0,
+ DCORE3_RTR5, DCORE3_RTR5, DCORE3_RTR6, DCORE3_RTR6, DCORE3_RTR7, DCORE3_RTR7,
+ DCORE0_RTR0
+};
+
+static const u32 gaudi2_dec_initiator_hbw_rtr_id[NUMBER_OF_DEC] = {
+ DCORE0_RTR0, DCORE0_RTR0, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0, DCORE2_RTR0,
+ DCORE3_RTR7, DCORE3_RTR7, DCORE0_RTR0, DCORE0_RTR0
+};
+
+static const u32 gaudi2_dec_initiator_lbw_rtr_id[NUMBER_OF_DEC] = {
+ DCORE0_RTR1, DCORE0_RTR1, DCORE1_RTR6, DCORE1_RTR6, DCORE2_RTR1, DCORE2_RTR1,
+ DCORE3_RTR6, DCORE3_RTR6, DCORE0_RTR0, DCORE0_RTR0
+};
+
+static const u32 gaudi2_nic_initiator_hbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
+ DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0,
+ DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
+};
+
+static const u32 gaudi2_nic_initiator_lbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
+ DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE1_RTR7, DCORE2_RTR0,
+ DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
+};
+
+static const u32 gaudi2_edma_initiator_hbw_sft[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
+ mmSFT0_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT0_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT1_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT1_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT2_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT2_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT3_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+ mmSFT3_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE
+};
+
+static const u32 gaudi2_pdma_initiator_hbw_rtr_id[NUM_OF_PDMA] = {
+ DCORE0_RTR0, DCORE0_RTR0
+};
+
+static const u32 gaudi2_pdma_initiator_lbw_rtr_id[NUM_OF_PDMA] = {
+ DCORE0_RTR2, DCORE0_RTR2
+};
+
+static const u32 gaudi2_rot_initiator_hbw_rtr_id[NUM_OF_ROT] = {
+ DCORE2_RTR0, DCORE3_RTR7
+};
+
+static const u32 gaudi2_rot_initiator_lbw_rtr_id[NUM_OF_ROT] = {
+ DCORE2_RTR2, DCORE3_RTR5
+};
+
+struct mme_initiators_rtr_id {
+ u32 wap0;
+ u32 wap1;
+ u32 write;
+ u32 read;
+ u32 sbte0;
+ u32 sbte1;
+ u32 sbte2;
+ u32 sbte3;
+ u32 sbte4;
+};
+
+enum mme_initiators {
+ MME_WAP0 = 0,
+ MME_WAP1,
+ MME_WRITE,
+ MME_READ,
+ MME_SBTE0,
+ MME_SBTE1,
+ MME_SBTE2,
+ MME_SBTE3,
+ MME_SBTE4,
+ MME_INITIATORS_MAX
+};
+
+static const struct mme_initiators_rtr_id
+gaudi2_mme_initiator_rtr_id[NUM_OF_MME_PER_DCORE * NUM_OF_DCORES] = {
+ { .wap0 = 5, .wap1 = 7, .write = 6, .read = 7,
+ .sbte0 = 7, .sbte1 = 4, .sbte2 = 4, .sbte3 = 5, .sbte4 = 6},
+ { .wap0 = 10, .wap1 = 8, .write = 9, .read = 8,
+ .sbte0 = 11, .sbte1 = 11, .sbte2 = 10, .sbte3 = 9, .sbte4 = 8},
+ { .wap0 = 21, .wap1 = 23, .write = 22, .read = 23,
+ .sbte0 = 20, .sbte1 = 20, .sbte2 = 21, .sbte3 = 22, .sbte4 = 23},
+ { .wap0 = 30, .wap1 = 28, .write = 29, .read = 30,
+ .sbte0 = 31, .sbte1 = 31, .sbte2 = 30, .sbte3 = 29, .sbte4 = 28},
+};
+
+enum razwi_event_sources {
+ RAZWI_TPC,
+ RAZWI_MME,
+ RAZWI_EDMA,
+ RAZWI_PDMA,
+ RAZWI_NIC,
+ RAZWI_DEC,
+ RAZWI_ROT
+};
+
+struct hbm_mc_error_causes {
+ u32 mask;
+ char cause[50];
+};
+
+static struct hl_special_block_info gaudi2_special_blocks[] = GAUDI2_SPECIAL_BLOCKS;
+
+/* Special blocks iterator is currently used to configure security protection bits,
+ * and read global errors. Most HW blocks are addressable and those who aren't (N/A)-
+ * must be skipped. Following configurations are commonly used for both PB config
+ * and global error reading, since currently they both share the same settings.
+ * Once it changes, we must remember to use separate configurations for either one.
+ */
+static int gaudi2_iterator_skip_block_types[] = {
+ GAUDI2_BLOCK_TYPE_PLL,
+ GAUDI2_BLOCK_TYPE_EU_BIST,
+ GAUDI2_BLOCK_TYPE_HBM,
+ GAUDI2_BLOCK_TYPE_XFT
+};
+
+static struct range gaudi2_iterator_skip_block_ranges[] = {
+ /* Skip all PSOC blocks except for PSOC_GLOBAL_CONF */
+ {mmPSOC_I2C_M0_BASE, mmPSOC_EFUSE_BASE},
+ {mmPSOC_BTL_BASE, mmPSOC_MSTR_IF_RR_SHRD_HBW_BASE},
+ /* Skip all CPU blocks except for CPU_IF */
+ {mmCPU_CA53_CFG_BASE, mmCPU_CA53_CFG_BASE},
+ {mmCPU_TIMESTAMP_BASE, mmCPU_MSTR_IF_RR_SHRD_HBW_BASE}
+};
+
+static struct hbm_mc_error_causes hbm_mc_spi[GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE] = {
+ {HBM_MC_SPI_TEMP_PIN_CHG_MASK, "temperature pins changed"},
+ {HBM_MC_SPI_THR_ENG_MASK, "temperature-based throttling engaged"},
+ {HBM_MC_SPI_THR_DIS_ENG_MASK, "temperature-based throttling disengaged"},
+ {HBM_MC_SPI_IEEE1500_COMP_MASK, "IEEE1500 op comp"},
+ {HBM_MC_SPI_IEEE1500_PAUSED_MASK, "IEEE1500 op paused"},
+};
+
+static const char * const hbm_mc_sei_cause[GAUDI2_NUM_OF_HBM_SEI_CAUSE] = {
+ [HBM_SEI_CMD_PARITY_EVEN] = "SEI C/A parity even",
+ [HBM_SEI_CMD_PARITY_ODD] = "SEI C/A parity odd",
+ [HBM_SEI_READ_ERR] = "SEI read data error",
+ [HBM_SEI_WRITE_DATA_PARITY_ERR] = "SEI write data parity error",
+ [HBM_SEI_CATTRIP] = "SEI CATTRIP asserted",
+ [HBM_SEI_MEM_BIST_FAIL] = "SEI memory BIST fail",
+ [HBM_SEI_DFI] = "SEI DFI error",
+ [HBM_SEI_INV_TEMP_READ_OUT] = "SEI invalid temp read",
+ [HBM_SEI_BIST_FAIL] = "SEI BIST fail"
+};
+
+struct mmu_spi_sei_cause {
+ char cause[50];
+ int clear_bit;
+};
+
+static const struct mmu_spi_sei_cause gaudi2_mmu_spi_sei[GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE] = {
+ {"page fault", 1}, /* INTERRUPT_CLR[1] */
+ {"page access", 1}, /* INTERRUPT_CLR[1] */
+ {"bypass ddr", 2}, /* INTERRUPT_CLR[2] */
+ {"multi hit", 2}, /* INTERRUPT_CLR[2] */
+ {"mmu rei0", -1}, /* no clear register bit */
+ {"mmu rei1", -1}, /* no clear register bit */
+ {"stlb rei0", -1}, /* no clear register bit */
+ {"stlb rei1", -1}, /* no clear register bit */
+ {"rr privileged write hit", 2}, /* INTERRUPT_CLR[2] */
+ {"rr privileged read hit", 2}, /* INTERRUPT_CLR[2] */
+ {"rr secure write hit", 2}, /* INTERRUPT_CLR[2] */
+ {"rr secure read hit", 2}, /* INTERRUPT_CLR[2] */
+ {"bist_fail no use", 2}, /* INTERRUPT_CLR[2] */
+ {"bist_fail no use", 2}, /* INTERRUPT_CLR[2] */
+ {"bist_fail no use", 2}, /* INTERRUPT_CLR[2] */
+ {"bist_fail no use", 2}, /* INTERRUPT_CLR[2] */
+ {"slave error", 16}, /* INTERRUPT_CLR[16] */
+ {"dec error", 17}, /* INTERRUPT_CLR[17] */
+ {"burst fifo full", 2} /* INTERRUPT_CLR[2] */
+};
+
+struct gaudi2_cache_invld_params {
+ u64 start_va;
+ u64 end_va;
+ u32 inv_start_val;
+ u32 flags;
+ bool range_invalidation;
+};
+
+struct gaudi2_tpc_idle_data {
+ struct engines_data *e;
+ unsigned long *mask;
+ bool *is_idle;
+ const char *tpc_fmt;
+};
+
+struct gaudi2_tpc_mmu_data {
+ u32 rw_asid;
+};
+
+static s64 gaudi2_state_dump_specs_props[SP_MAX] = {0};
+
+static int gaudi2_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size, u64 val);
+static bool gaudi2_is_queue_enabled(struct hl_device *hdev, u32 hw_queue_id);
+static bool gaudi2_is_arc_enabled(struct hl_device *hdev, u64 arc_id);
+static void gaudi2_clr_arc_id_cap(struct hl_device *hdev, u64 arc_id);
+static void gaudi2_set_arc_id_cap(struct hl_device *hdev, u64 arc_id);
+static void gaudi2_memset_device_lbw(struct hl_device *hdev, u32 addr, u32 size, u32 val);
+static int gaudi2_send_job_to_kdma(struct hl_device *hdev, u64 src_addr, u64 dst_addr, u32 size,
+ bool is_memset);
+static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr);
+
+static void gaudi2_init_scrambler_hbm(struct hl_device *hdev)
+{
+
+}
+
+static u32 gaudi2_get_signal_cb_size(struct hl_device *hdev)
+{
+ return sizeof(struct packet_msg_short);
+}
+
+static u32 gaudi2_get_wait_cb_size(struct hl_device *hdev)
+{
+ return sizeof(struct packet_msg_short) * 4 + sizeof(struct packet_fence);
+}
+
+void gaudi2_iterate_tpcs(struct hl_device *hdev, struct iterate_module_ctx *ctx)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int dcore, inst, tpc_seq;
+ u32 offset;
+
+ /* init the return code */
+ ctx->rc = 0;
+
+ for (dcore = 0; dcore < NUM_OF_DCORES; dcore++) {
+ for (inst = 0; inst < NUM_OF_TPC_PER_DCORE; inst++) {
+ tpc_seq = dcore * NUM_OF_TPC_PER_DCORE + inst;
+
+ if (!(prop->tpc_enabled_mask & BIT(tpc_seq)))
+ continue;
+
+ offset = (DCORE_OFFSET * dcore) + (DCORE_TPC_OFFSET * inst);
+
+ ctx->fn(hdev, dcore, inst, offset, ctx);
+ if (ctx->rc) {
+ dev_err(hdev->dev, "TPC iterator failed for DCORE%d TPC%d\n",
+ dcore, inst);
+ return;
+ }
+ }
+ }
+
+ if (!(prop->tpc_enabled_mask & BIT(TPC_ID_DCORE0_TPC6)))
+ return;
+
+ /* special check for PCI TPC (DCORE0_TPC6) */
+ offset = DCORE_TPC_OFFSET * (NUM_DCORE0_TPC - 1);
+ ctx->fn(hdev, 0, NUM_DCORE0_TPC - 1, offset, ctx);
+ if (ctx->rc)
+ dev_err(hdev->dev, "TPC iterator failed for DCORE0 TPC6\n");
+}
+
+static bool gaudi2_host_phys_addr_valid(u64 addr)
+{
+ if ((addr < HOST_PHYS_BASE_0 + HOST_PHYS_SIZE_0) || (addr >= HOST_PHYS_BASE_1))
+ return true;
+
+ return false;
+}
+
+static int set_number_of_functional_hbms(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u8 faulty_hbms = hweight64(hdev->dram_binning);
+
+ /* check if all HBMs should be used */
+ if (!faulty_hbms) {
+ dev_dbg(hdev->dev, "All HBM are in use (no binning)\n");
+ prop->num_functional_hbms = GAUDI2_HBM_NUM;
+ return 0;
+ }
+
+ /*
+ * check for error condition in which number of binning
+ * candidates is higher than the maximum supported by the
+ * driver (in which case binning mask shall be ignored and driver will
+ * set the default)
+ */
+ if (faulty_hbms > MAX_FAULTY_HBMS) {
+ dev_err(hdev->dev,
+ "HBM binning supports max of %d faulty HBMs, supplied mask 0x%llx.\n",
+ MAX_FAULTY_HBMS, hdev->dram_binning);
+ return -EINVAL;
+ }
+
+ /*
+ * by default, number of functional HBMs in Gaudi2 is always
+ * GAUDI2_HBM_NUM - 1.
+ */
+ prop->num_functional_hbms = GAUDI2_HBM_NUM - faulty_hbms;
+ return 0;
+}
+
+static int gaudi2_set_dram_properties(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 basic_hbm_page_size;
+ int rc;
+
+ rc = set_number_of_functional_hbms(hdev);
+ if (rc)
+ return -EINVAL;
+
+ /*
+ * Due to HW bug in which TLB size is x16 smaller than expected we use a workaround
+ * in which we are using x16 bigger page size to be able to populate the entire
+ * HBM mappings in the TLB
+ */
+ basic_hbm_page_size = prop->num_functional_hbms * SZ_8M;
+ prop->dram_page_size = GAUDI2_COMPENSATE_TLB_PAGE_SIZE_FACTOR * basic_hbm_page_size;
+ prop->device_mem_alloc_default_page_size = prop->dram_page_size;
+ prop->dram_size = prop->num_functional_hbms * SZ_16G;
+ prop->dram_base_address = DRAM_PHYS_BASE;
+ prop->dram_end_address = prop->dram_base_address + prop->dram_size;
+ prop->dram_supports_virtual_memory = true;
+
+ prop->dram_user_base_address = DRAM_PHYS_BASE + prop->dram_page_size;
+ prop->dram_hints_align_mask = ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK;
+ prop->hints_dram_reserved_va_range.start_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HBM_START;
+ prop->hints_dram_reserved_va_range.end_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HBM_END;
+
+ /* since DRAM page size differs from DMMU page size we need to allocate
+ * DRAM memory in units of dram_page size and mapping this memory in
+ * units of DMMU page size. we overcome this size mismatch using a
+ * scrambling routine which takes a DRAM page and converts it to a DMMU
+ * page.
+ * We therefore:
+ * 1. partition the virtual address space to DRAM-page (whole) pages.
+ * (suppose we get n such pages)
+ * 2. limit the amount of virtual address space we got from 1 above to
+ * a multiple of 64M as we don't want the scrambled address to cross
+ * the DRAM virtual address space.
+ * ( m = (n * DRAM_page_size) / DMMU_page_size).
+ * 3. determine the and address accordingly
+ * end_addr = start_addr + m * 48M
+ *
+ * the DRAM address MSBs (63:48) are not part of the roundup calculation
+ */
+ prop->dmmu.start_addr = prop->dram_base_address +
+ (prop->dram_page_size *
+ DIV_ROUND_UP_SECTOR_T(prop->dram_size, prop->dram_page_size));
+
+ prop->dmmu.end_addr = prop->dmmu.start_addr + prop->dram_page_size *
+ div_u64((VA_HBM_SPACE_END - prop->dmmu.start_addr), prop->dmmu.page_size);
+
+ return 0;
+}
+
+static int gaudi2_set_fixed_properties(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hw_queue_properties *q_props;
+ u32 num_sync_stream_queues = 0;
+ int i;
+
+ prop->max_queues = GAUDI2_QUEUE_ID_SIZE;
+ prop->hw_queues_props = kcalloc(prop->max_queues, sizeof(struct hw_queue_properties),
+ GFP_KERNEL);
+
+ if (!prop->hw_queues_props)
+ return -ENOMEM;
+
+ q_props = prop->hw_queues_props;
+
+ for (i = 0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i++) {
+ q_props[i].type = QUEUE_TYPE_HW;
+ q_props[i].driver_only = 0;
+
+ if (i >= GAUDI2_QUEUE_ID_NIC_0_0 && i <= GAUDI2_QUEUE_ID_NIC_23_3) {
+ q_props[i].supports_sync_stream = 0;
+ } else {
+ q_props[i].supports_sync_stream = 1;
+ num_sync_stream_queues++;
+ }
+
+ q_props[i].cb_alloc_flags = CB_ALLOC_USER;
+ }
+
+ q_props[GAUDI2_QUEUE_ID_CPU_PQ].type = QUEUE_TYPE_CPU;
+ q_props[GAUDI2_QUEUE_ID_CPU_PQ].driver_only = 1;
+ q_props[GAUDI2_QUEUE_ID_CPU_PQ].cb_alloc_flags = CB_ALLOC_KERNEL;
+
+ prop->cache_line_size = DEVICE_CACHE_LINE_SIZE;
+ prop->cfg_base_address = CFG_BASE;
+ prop->device_dma_offset_for_host_access = HOST_PHYS_BASE_0;
+ prop->host_base_address = HOST_PHYS_BASE_0;
+ prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE_0;
+ prop->max_pending_cs = GAUDI2_MAX_PENDING_CS;
+ prop->completion_queues_count = GAUDI2_RESERVED_CQ_NUMBER;
+ prop->user_dec_intr_count = NUMBER_OF_DEC;
+ prop->user_interrupt_count = GAUDI2_IRQ_NUM_USER_LAST - GAUDI2_IRQ_NUM_USER_FIRST + 1;
+ prop->completion_mode = HL_COMPLETION_MODE_CS;
+ prop->sync_stream_first_sob = GAUDI2_RESERVED_SOB_NUMBER;
+ prop->sync_stream_first_mon = GAUDI2_RESERVED_MON_NUMBER;
+
+ prop->sram_base_address = SRAM_BASE_ADDR;
+ prop->sram_size = SRAM_SIZE;
+ prop->sram_end_address = prop->sram_base_address + prop->sram_size;
+ prop->sram_user_base_address = prop->sram_base_address + SRAM_USER_BASE_OFFSET;
+
+ prop->hints_range_reservation = true;
+
+ if (hdev->pldm)
+ prop->mmu_pgt_size = 0x800000; /* 8MB */
+ else
+ prop->mmu_pgt_size = MMU_PAGE_TABLES_INITIAL_SIZE;
+
+ prop->mmu_pte_size = HL_PTE_SIZE;
+ prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
+ prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
+
+ prop->dmmu.hop_shifts[MMU_HOP0] = DHOP0_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP1] = DHOP1_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP2] = DHOP2_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP3] = DHOP3_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP4] = DHOP4_SHIFT;
+ prop->dmmu.hop_masks[MMU_HOP0] = DHOP0_MASK;
+ prop->dmmu.hop_masks[MMU_HOP1] = DHOP1_MASK;
+ prop->dmmu.hop_masks[MMU_HOP2] = DHOP2_MASK;
+ prop->dmmu.hop_masks[MMU_HOP3] = DHOP3_MASK;
+ prop->dmmu.hop_masks[MMU_HOP4] = DHOP4_MASK;
+ prop->dmmu.page_size = PAGE_SIZE_1GB;
+ prop->dmmu.num_hops = MMU_ARCH_6_HOPS;
+ prop->dmmu.last_mask = LAST_MASK;
+ prop->dmmu.host_resident = 1;
+ /* TODO: will be duplicated until implementing per-MMU props */
+ prop->dmmu.hop_table_size = prop->mmu_hop_table_size;
+ prop->dmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
+
+ /*
+ * this is done in order to be able to validate FW descriptor (i.e. validating that
+ * the addresses and allocated space for FW image does not cross memory bounds).
+ * for this reason we set the DRAM size to the minimum possible and later it will
+ * be modified according to what reported in the cpucp info packet
+ */
+ prop->dram_size = (GAUDI2_HBM_NUM - 1) * SZ_16G;
+
+ hdev->pmmu_huge_range = true;
+ prop->pmmu.host_resident = 1;
+ prop->pmmu.num_hops = MMU_ARCH_6_HOPS;
+ prop->pmmu.last_mask = LAST_MASK;
+ /* TODO: will be duplicated until implementing per-MMU props */
+ prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
+ prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
+
+ prop->hints_host_reserved_va_range.start_addr = RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START;
+ prop->hints_host_reserved_va_range.end_addr = RESERVED_VA_RANGE_FOR_ARC_ON_HOST_END;
+ prop->hints_host_hpage_reserved_va_range.start_addr =
+ RESERVED_VA_RANGE_FOR_ARC_ON_HOST_HPAGE_START;
+ prop->hints_host_hpage_reserved_va_range.end_addr =
+ RESERVED_VA_RANGE_FOR_ARC_ON_HOST_HPAGE_END;
+
+ if (PAGE_SIZE == SZ_64K) {
+ prop->pmmu.hop_shifts[MMU_HOP0] = HOP0_SHIFT_64K;
+ prop->pmmu.hop_shifts[MMU_HOP1] = HOP1_SHIFT_64K;
+ prop->pmmu.hop_shifts[MMU_HOP2] = HOP2_SHIFT_64K;
+ prop->pmmu.hop_shifts[MMU_HOP3] = HOP3_SHIFT_64K;
+ prop->pmmu.hop_shifts[MMU_HOP4] = HOP4_SHIFT_64K;
+ prop->pmmu.hop_shifts[MMU_HOP5] = HOP5_SHIFT_64K;
+ prop->pmmu.hop_masks[MMU_HOP0] = HOP0_MASK_64K;
+ prop->pmmu.hop_masks[MMU_HOP1] = HOP1_MASK_64K;
+ prop->pmmu.hop_masks[MMU_HOP2] = HOP2_MASK_64K;
+ prop->pmmu.hop_masks[MMU_HOP3] = HOP3_MASK_64K;
+ prop->pmmu.hop_masks[MMU_HOP4] = HOP4_MASK_64K;
+ prop->pmmu.hop_masks[MMU_HOP5] = HOP5_MASK_64K;
+ prop->pmmu.start_addr = VA_HOST_SPACE_PAGE_START;
+ prop->pmmu.end_addr = VA_HOST_SPACE_PAGE_END;
+ prop->pmmu.page_size = PAGE_SIZE_64KB;
+
+ /* shifts and masks are the same in PMMU and HPMMU */
+ memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
+ prop->pmmu_huge.page_size = PAGE_SIZE_16MB;
+ prop->pmmu_huge.start_addr = VA_HOST_SPACE_HPAGE_START;
+ prop->pmmu_huge.end_addr = VA_HOST_SPACE_HPAGE_END;
+ } else {
+ prop->pmmu.hop_shifts[MMU_HOP0] = HOP0_SHIFT_4K;
+ prop->pmmu.hop_shifts[MMU_HOP1] = HOP1_SHIFT_4K;
+ prop->pmmu.hop_shifts[MMU_HOP2] = HOP2_SHIFT_4K;
+ prop->pmmu.hop_shifts[MMU_HOP3] = HOP3_SHIFT_4K;
+ prop->pmmu.hop_shifts[MMU_HOP4] = HOP4_SHIFT_4K;
+ prop->pmmu.hop_shifts[MMU_HOP5] = HOP5_SHIFT_4K;
+ prop->pmmu.hop_masks[MMU_HOP0] = HOP0_MASK_4K;
+ prop->pmmu.hop_masks[MMU_HOP1] = HOP1_MASK_4K;
+ prop->pmmu.hop_masks[MMU_HOP2] = HOP2_MASK_4K;
+ prop->pmmu.hop_masks[MMU_HOP3] = HOP3_MASK_4K;
+ prop->pmmu.hop_masks[MMU_HOP4] = HOP4_MASK_4K;
+ prop->pmmu.hop_masks[MMU_HOP5] = HOP5_MASK_4K;
+ prop->pmmu.start_addr = VA_HOST_SPACE_PAGE_START;
+ prop->pmmu.end_addr = VA_HOST_SPACE_PAGE_END;
+ prop->pmmu.page_size = PAGE_SIZE_4KB;
+
+ /* shifts and masks are the same in PMMU and HPMMU */
+ memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
+ prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
+ prop->pmmu_huge.start_addr = VA_HOST_SPACE_HPAGE_START;
+ prop->pmmu_huge.end_addr = VA_HOST_SPACE_HPAGE_END;
+ }
+
+ prop->num_engine_cores = CPU_ID_MAX;
+ prop->cfg_size = CFG_SIZE;
+ prop->max_asid = MAX_ASID;
+ prop->num_of_events = GAUDI2_EVENT_SIZE;
+
+ prop->dc_power_default = DC_POWER_DEFAULT;
+
+ prop->cb_pool_cb_cnt = GAUDI2_CB_POOL_CB_CNT;
+ prop->cb_pool_cb_size = GAUDI2_CB_POOL_CB_SIZE;
+ prop->pcie_dbi_base_address = CFG_BASE + mmPCIE_DBI_BASE;
+ prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
+
+ strncpy(prop->cpucp_info.card_name, GAUDI2_DEFAULT_CARD_NAME, CARD_NAME_MAX_LEN);
+
+ prop->mme_master_slave_mode = 1;
+
+ prop->first_available_user_sob[0] = GAUDI2_RESERVED_SOB_NUMBER +
+ (num_sync_stream_queues * HL_RSVD_SOBS);
+
+ prop->first_available_user_mon[0] = GAUDI2_RESERVED_MON_NUMBER +
+ (num_sync_stream_queues * HL_RSVD_MONS);
+
+ prop->first_available_user_interrupt = GAUDI2_IRQ_NUM_USER_FIRST;
+
+ prop->first_available_cq[0] = GAUDI2_RESERVED_CQ_NUMBER;
+
+ prop->fw_cpu_boot_dev_sts0_valid = false;
+ prop->fw_cpu_boot_dev_sts1_valid = false;
+ prop->hard_reset_done_by_fw = false;
+ prop->gic_interrupts_enable = true;
+
+ prop->server_type = HL_SERVER_TYPE_UNKNOWN;
+
+ prop->max_dec = NUMBER_OF_DEC;
+
+ prop->clk_pll_index = HL_GAUDI2_MME_PLL;
+
+ prop->dma_mask = 64;
+
+ prop->hbw_flush_reg = mmPCIE_WRAP_SPECIAL_GLBL_SPARE_0;
+
+ return 0;
+}
+
+static int gaudi2_pci_bars_map(struct hl_device *hdev)
+{
+ static const char * const name[] = {"CFG_SRAM", "MSIX", "DRAM"};
+ bool is_wc[3] = {false, false, true};
+ int rc;
+
+ rc = hl_pci_bars_map(hdev, name, is_wc);
+ if (rc)
+ return rc;
+
+ hdev->rmmio = hdev->pcie_bar[SRAM_CFG_BAR_ID] + (CFG_BASE - STM_FLASH_BASE_ADDR);
+
+ return 0;
+}
+
+static u64 gaudi2_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct hl_inbound_pci_region pci_region;
+ u64 old_addr = addr;
+ int rc;
+
+ if ((gaudi2) && (gaudi2->dram_bar_cur_addr == addr))
+ return old_addr;
+
+ if (hdev->asic_prop.iatu_done_by_fw)
+ return U64_MAX;
+
+ /* Inbound Region 2 - Bar 4 - Point to DRAM */
+ pci_region.mode = PCI_BAR_MATCH_MODE;
+ pci_region.bar = DRAM_BAR_ID;
+ pci_region.addr = addr;
+ rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
+ if (rc)
+ return U64_MAX;
+
+ if (gaudi2) {
+ old_addr = gaudi2->dram_bar_cur_addr;
+ gaudi2->dram_bar_cur_addr = addr;
+ }
+
+ return old_addr;
+}
+
+static int gaudi2_init_iatu(struct hl_device *hdev)
+{
+ struct hl_inbound_pci_region inbound_region;
+ struct hl_outbound_pci_region outbound_region;
+ u32 bar_addr_low, bar_addr_high;
+ int rc;
+
+ if (hdev->asic_prop.iatu_done_by_fw)
+ return 0;
+
+ /* Temporary inbound Region 0 - Bar 0 - Point to CFG
+ * We must map this region in BAR match mode in order to
+ * fetch BAR physical base address
+ */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = SRAM_CFG_BAR_ID;
+ /* Base address must be aligned to Bar size which is 256 MB */
+ inbound_region.addr = STM_FLASH_BASE_ADDR - STM_FLASH_ALIGNED_OFF;
+ rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
+ if (rc)
+ return rc;
+
+ /* Fetch physical BAR address */
+ bar_addr_high = RREG32(mmPCIE_DBI_BAR1_REG + STM_FLASH_ALIGNED_OFF);
+ bar_addr_low = RREG32(mmPCIE_DBI_BAR0_REG + STM_FLASH_ALIGNED_OFF) & ~0xF;
+
+ hdev->pcie_bar_phys[SRAM_CFG_BAR_ID] = (u64)bar_addr_high << 32 | bar_addr_low;
+
+ /* Inbound Region 0 - Bar 0 - Point to CFG */
+ inbound_region.mode = PCI_ADDRESS_MATCH_MODE;
+ inbound_region.bar = SRAM_CFG_BAR_ID;
+ inbound_region.offset_in_bar = 0;
+ inbound_region.addr = STM_FLASH_BASE_ADDR;
+ inbound_region.size = CFG_REGION_SIZE;
+ rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
+ if (rc)
+ return rc;
+
+ /* Inbound Region 1 - Bar 0 - Point to BAR0_RESERVED + SRAM */
+ inbound_region.mode = PCI_ADDRESS_MATCH_MODE;
+ inbound_region.bar = SRAM_CFG_BAR_ID;
+ inbound_region.offset_in_bar = CFG_REGION_SIZE;
+ inbound_region.addr = BAR0_RSRVD_BASE_ADDR;
+ inbound_region.size = BAR0_RSRVD_SIZE + SRAM_SIZE;
+ rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
+ if (rc)
+ return rc;
+
+ /* Inbound Region 2 - Bar 4 - Point to DRAM */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = DRAM_BAR_ID;
+ inbound_region.addr = DRAM_PHYS_BASE;
+ rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
+ if (rc)
+ return rc;
+
+ /* Outbound Region 0 - Point to Host */
+ outbound_region.addr = HOST_PHYS_BASE_0;
+ outbound_region.size = HOST_PHYS_SIZE_0;
+ rc = hl_pci_set_outbound_region(hdev, &outbound_region);
+
+ return rc;
+}
+
+static enum hl_device_hw_state gaudi2_get_hw_state(struct hl_device *hdev)
+{
+ return RREG32(mmHW_STATE);
+}
+
+static int gaudi2_tpc_binning_init_prop(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+
+ /*
+ * check for error condition in which number of binning candidates
+ * is higher than the maximum supported by the driver
+ */
+ if (hweight64(hdev->tpc_binning) > MAX_CLUSTER_BINNING_FAULTY_TPCS) {
+ dev_err(hdev->dev, "TPC binning is supported for max of %d faulty TPCs, provided mask 0x%llx\n",
+ MAX_CLUSTER_BINNING_FAULTY_TPCS,
+ hdev->tpc_binning);
+ return -EINVAL;
+ }
+
+ prop->tpc_binning_mask = hdev->tpc_binning;
+ prop->tpc_enabled_mask = GAUDI2_TPC_FULL_MASK;
+
+ return 0;
+}
+
+static int gaudi2_set_tpc_binning_masks(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hw_queue_properties *q_props = prop->hw_queues_props;
+ u64 tpc_binning_mask;
+ u8 subst_idx = 0;
+ int i, rc;
+
+ rc = gaudi2_tpc_binning_init_prop(hdev);
+ if (rc)
+ return rc;
+
+ tpc_binning_mask = prop->tpc_binning_mask;
+
+ for (i = 0 ; i < MAX_FAULTY_TPCS ; i++) {
+ u8 subst_seq, binned, qid_base;
+
+ if (tpc_binning_mask == 0)
+ break;
+
+ if (subst_idx == 0) {
+ subst_seq = TPC_ID_DCORE0_TPC6;
+ qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0;
+ } else {
+ subst_seq = TPC_ID_DCORE3_TPC5;
+ qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_5_0;
+ }
+
+
+ /* clear bit from mask */
+ binned = __ffs(tpc_binning_mask);
+ /*
+ * Coverity complains about possible out-of-bound access in
+ * clear_bit
+ */
+ if (binned >= TPC_ID_SIZE) {
+ dev_err(hdev->dev,
+ "Invalid binned TPC (binning mask: %llx)\n",
+ tpc_binning_mask);
+ return -EINVAL;
+ }
+ clear_bit(binned, (unsigned long *)&tpc_binning_mask);
+
+ /* also clear replacing TPC bit from enabled mask */
+ clear_bit(subst_seq, (unsigned long *)&prop->tpc_enabled_mask);
+
+ /* bin substite TPC's Qs */
+ q_props[qid_base].binned = 1;
+ q_props[qid_base + 1].binned = 1;
+ q_props[qid_base + 2].binned = 1;
+ q_props[qid_base + 3].binned = 1;
+
+ subst_idx++;
+ }
+
+ return 0;
+}
+
+static int gaudi2_set_dec_binning_masks(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u8 num_faulty;
+
+ num_faulty = hweight32(hdev->decoder_binning);
+
+ /*
+ * check for error condition in which number of binning candidates
+ * is higher than the maximum supported by the driver
+ */
+ if (num_faulty > MAX_FAULTY_DECODERS) {
+ dev_err(hdev->dev, "decoder binning is supported for max of single faulty decoder, provided mask 0x%x\n",
+ hdev->decoder_binning);
+ return -EINVAL;
+ }
+
+ prop->decoder_binning_mask = (hdev->decoder_binning & GAUDI2_DECODER_FULL_MASK);
+
+ if (prop->decoder_binning_mask)
+ prop->decoder_enabled_mask = (GAUDI2_DECODER_FULL_MASK & ~BIT(DEC_ID_PCIE_VDEC1));
+ else
+ prop->decoder_enabled_mask = GAUDI2_DECODER_FULL_MASK;
+
+ return 0;
+}
+
+static void gaudi2_set_dram_binning_masks(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+
+ /* check if we should override default binning */
+ if (!hdev->dram_binning) {
+ prop->dram_binning_mask = 0;
+ prop->dram_enabled_mask = GAUDI2_DRAM_FULL_MASK;
+ return;
+ }
+
+ /* set DRAM binning constraints */
+ prop->faulty_dram_cluster_map |= hdev->dram_binning;
+ prop->dram_binning_mask = hdev->dram_binning;
+ prop->dram_enabled_mask = GAUDI2_DRAM_FULL_MASK & ~BIT(HBM_ID5);
+}
+
+static int gaudi2_set_edma_binning_masks(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hw_queue_properties *q_props;
+ u8 seq, num_faulty;
+
+ num_faulty = hweight32(hdev->edma_binning);
+
+ /*
+ * check for error condition in which number of binning candidates
+ * is higher than the maximum supported by the driver
+ */
+ if (num_faulty > MAX_FAULTY_EDMAS) {
+ dev_err(hdev->dev,
+ "EDMA binning is supported for max of single faulty EDMA, provided mask 0x%x\n",
+ hdev->edma_binning);
+ return -EINVAL;
+ }
+
+ if (!hdev->edma_binning) {
+ prop->edma_binning_mask = 0;
+ prop->edma_enabled_mask = GAUDI2_EDMA_FULL_MASK;
+ return 0;
+ }
+
+ seq = __ffs((unsigned long)hdev->edma_binning);
+
+ /* set binning constraints */
+ prop->faulty_dram_cluster_map |= BIT(edma_to_hbm_cluster[seq]);
+ prop->edma_binning_mask = hdev->edma_binning;
+ prop->edma_enabled_mask = GAUDI2_EDMA_FULL_MASK & ~BIT(EDMA_ID_DCORE3_INSTANCE1);
+
+ /* bin substitute EDMA's queue */
+ q_props = prop->hw_queues_props;
+ q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0].binned = 1;
+ q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_1].binned = 1;
+ q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_2].binned = 1;
+ q_props[GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3].binned = 1;
+
+ return 0;
+}
+
+static int gaudi2_set_xbar_edge_enable_mask(struct hl_device *hdev, u32 xbar_edge_iso_mask)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u8 num_faulty, seq;
+
+ /* check if we should override default binning */
+ if (!xbar_edge_iso_mask) {
+ prop->xbar_edge_enabled_mask = GAUDI2_XBAR_EDGE_FULL_MASK;
+ return 0;
+ }
+
+ /*
+ * note that it can be set to value other than 0 only after cpucp packet (i.e.
+ * only the FW can set a redundancy value). for user it'll always be 0.
+ */
+ num_faulty = hweight32(xbar_edge_iso_mask);
+
+ /*
+ * check for error condition in which number of binning candidates
+ * is higher than the maximum supported by the driver
+ */
+ if (num_faulty > MAX_FAULTY_XBARS) {
+ dev_err(hdev->dev, "we cannot have more than %d faulty XBAR EDGE\n",
+ MAX_FAULTY_XBARS);
+ return -EINVAL;
+ }
+
+ seq = __ffs((unsigned long)xbar_edge_iso_mask);
+
+ /* set binning constraints */
+ prop->faulty_dram_cluster_map |= BIT(xbar_edge_to_hbm_cluster[seq]);
+ prop->xbar_edge_enabled_mask = (~xbar_edge_iso_mask) & GAUDI2_XBAR_EDGE_FULL_MASK;
+
+ return 0;
+}
+
+static int gaudi2_set_cluster_binning_masks_common(struct hl_device *hdev, u8 xbar_edge_iso_mask)
+{
+ int rc;
+
+ /*
+ * mark all clusters as good, each component will "fail" cluster
+ * based on eFuse/user values.
+ * If more than single cluster is faulty- the chip is unusable
+ */
+ hdev->asic_prop.faulty_dram_cluster_map = 0;
+
+ gaudi2_set_dram_binning_masks(hdev);
+
+ rc = gaudi2_set_edma_binning_masks(hdev);
+ if (rc)
+ return rc;
+
+ rc = gaudi2_set_xbar_edge_enable_mask(hdev, xbar_edge_iso_mask);
+ if (rc)
+ return rc;
+
+
+ /* always initially set to full mask */
+ hdev->asic_prop.hmmu_hif_enabled_mask = GAUDI2_HIF_HMMU_FULL_MASK;
+
+ return 0;
+}
+
+static int gaudi2_set_cluster_binning_masks(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int rc;
+
+ rc = gaudi2_set_cluster_binning_masks_common(hdev, prop->cpucp_info.xbar_binning_mask);
+ if (rc)
+ return rc;
+
+ /* if we have DRAM binning reported by FW we should perform cluster config */
+ if (prop->faulty_dram_cluster_map) {
+ u8 cluster_seq = __ffs((unsigned long)prop->faulty_dram_cluster_map);
+
+ prop->hmmu_hif_enabled_mask = cluster_hmmu_hif_enabled_mask[cluster_seq];
+ }
+
+ return 0;
+}
+
+static int gaudi2_set_binning_masks(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = gaudi2_set_cluster_binning_masks(hdev);
+ if (rc)
+ return rc;
+
+ rc = gaudi2_set_tpc_binning_masks(hdev);
+ if (rc)
+ return rc;
+
+ rc = gaudi2_set_dec_binning_masks(hdev);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int gaudi2_cpucp_info_get(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ long max_power;
+ u64 dram_size;
+ int rc;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ /* No point of asking this information again when not doing hard reset, as the device
+ * CPU hasn't been reset
+ */
+ if (hdev->reset_info.in_compute_reset)
+ return 0;
+
+ rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0, mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
+ mmCPU_BOOT_ERR1);
+ if (rc)
+ return rc;
+
+ dram_size = le64_to_cpu(prop->cpucp_info.dram_size);
+ if (dram_size) {
+ /* we can have wither 5 or 6 HBMs. other values are invalid */
+
+ if ((dram_size != ((GAUDI2_HBM_NUM - 1) * SZ_16G)) &&
+ (dram_size != (GAUDI2_HBM_NUM * SZ_16G))) {
+ dev_err(hdev->dev,
+ "F/W reported invalid DRAM size %llu. Trying to use default size %llu\n",
+ dram_size, prop->dram_size);
+ dram_size = prop->dram_size;
+ }
+
+ prop->dram_size = dram_size;
+ prop->dram_end_address = prop->dram_base_address + dram_size;
+ }
+
+ if (!strlen(prop->cpucp_info.card_name))
+ strncpy(prop->cpucp_info.card_name, GAUDI2_DEFAULT_CARD_NAME, CARD_NAME_MAX_LEN);
+
+ /* Overwrite binning masks with the actual binning values from F/W */
+ hdev->dram_binning = prop->cpucp_info.dram_binning_mask;
+ hdev->edma_binning = prop->cpucp_info.edma_binning_mask;
+ hdev->tpc_binning = le64_to_cpu(prop->cpucp_info.tpc_binning_mask);
+ hdev->decoder_binning = lower_32_bits(le64_to_cpu(prop->cpucp_info.decoder_binning_mask));
+
+ /*
+ * at this point the DRAM parameters need to be updated according to data obtained
+ * from the FW
+ */
+ rc = hdev->asic_funcs->set_dram_properties(hdev);
+ if (rc)
+ return rc;
+
+ rc = hdev->asic_funcs->set_binning_masks(hdev);
+ if (rc)
+ return rc;
+
+ max_power = hl_fw_get_max_power(hdev);
+ if (max_power < 0)
+ return max_power;
+
+ prop->max_power_default = (u64) max_power;
+
+ return 0;
+}
+
+static int gaudi2_fetch_psoc_frequency(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS];
+ int rc;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ rc = hl_fw_cpucp_pll_info_get(hdev, HL_GAUDI2_CPU_PLL, pll_freq_arr);
+ if (rc)
+ return rc;
+
+ hdev->asic_prop.psoc_timestamp_frequency = pll_freq_arr[3];
+
+ return 0;
+}
+
+static int gaudi2_early_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct pci_dev *pdev = hdev->pdev;
+ resource_size_t pci_bar_size;
+ int rc;
+
+ rc = gaudi2_set_fixed_properties(hdev);
+ if (rc)
+ return rc;
+
+ /* Check BAR sizes */
+ pci_bar_size = pci_resource_len(pdev, SRAM_CFG_BAR_ID);
+
+ if (pci_bar_size != CFG_BAR_SIZE) {
+ dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
+ SRAM_CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
+ rc = -ENODEV;
+ goto free_queue_props;
+ }
+
+ pci_bar_size = pci_resource_len(pdev, MSIX_BAR_ID);
+ if (pci_bar_size != MSIX_BAR_SIZE) {
+ dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
+ MSIX_BAR_ID, &pci_bar_size, MSIX_BAR_SIZE);
+ rc = -ENODEV;
+ goto free_queue_props;
+ }
+
+ prop->dram_pci_bar_size = pci_resource_len(pdev, DRAM_BAR_ID);
+ hdev->dram_pci_bar_start = pci_resource_start(pdev, DRAM_BAR_ID);
+
+ /*
+ * Only in pldm driver config iATU
+ */
+ if (hdev->pldm)
+ hdev->asic_prop.iatu_done_by_fw = false;
+ else
+ hdev->asic_prop.iatu_done_by_fw = true;
+
+ rc = hl_pci_init(hdev);
+ if (rc)
+ goto free_queue_props;
+
+ /* Before continuing in the initialization, we need to read the preboot
+ * version to determine whether we run with a security-enabled firmware
+ */
+ rc = hl_fw_read_preboot_status(hdev);
+ if (rc) {
+ if (hdev->reset_on_preboot_fail)
+ hdev->asic_funcs->hw_fini(hdev, true, false);
+ goto pci_fini;
+ }
+
+ if (gaudi2_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
+ dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
+ hdev->asic_funcs->hw_fini(hdev, true, false);
+ }
+
+ return 0;
+
+pci_fini:
+ hl_pci_fini(hdev);
+free_queue_props:
+ kfree(hdev->asic_prop.hw_queues_props);
+ return rc;
+}
+
+static int gaudi2_early_fini(struct hl_device *hdev)
+{
+ kfree(hdev->asic_prop.hw_queues_props);
+ hl_pci_fini(hdev);
+
+ return 0;
+}
+
+static bool gaudi2_is_arc_nic_owned(u64 arc_id)
+{
+ switch (arc_id) {
+ case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static bool gaudi2_is_arc_tpc_owned(u64 arc_id)
+{
+ switch (arc_id) {
+ case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static void gaudi2_init_arcs(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u64 arc_id;
+ u32 i;
+
+ for (i = CPU_ID_SCHED_ARC0 ; i <= CPU_ID_SCHED_ARC3 ; i++) {
+ if (gaudi2_is_arc_enabled(hdev, i))
+ continue;
+
+ gaudi2_set_arc_id_cap(hdev, i);
+ }
+
+ for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i += 4) {
+ if (!gaudi2_is_queue_enabled(hdev, i))
+ continue;
+
+ arc_id = gaudi2_queue_id_to_arc_id[i];
+ if (gaudi2_is_arc_enabled(hdev, arc_id))
+ continue;
+
+ if (gaudi2_is_arc_nic_owned(arc_id) &&
+ !(hdev->nic_ports_mask & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0)))
+ continue;
+
+ if (gaudi2_is_arc_tpc_owned(arc_id) && !(gaudi2->tpc_hw_cap_initialized &
+ BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0)))
+ continue;
+
+ gaudi2_set_arc_id_cap(hdev, arc_id);
+ }
+}
+
+static int gaudi2_scrub_arc_dccm(struct hl_device *hdev, u32 cpu_id)
+{
+ u32 reg_base, reg_val;
+ int rc;
+
+ switch (cpu_id) {
+ case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC3:
+ /* Each ARC scheduler has 2 consecutive DCCM blocks */
+ rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
+ ARC_DCCM_BLOCK_SIZE * 2, true);
+ if (rc)
+ return rc;
+ break;
+ case CPU_ID_SCHED_ARC4:
+ case CPU_ID_SCHED_ARC5:
+ case CPU_ID_MME_QMAN_ARC0:
+ case CPU_ID_MME_QMAN_ARC1:
+ reg_base = gaudi2_arc_blocks_bases[cpu_id];
+
+ /* Scrub lower DCCM block */
+ rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
+ ARC_DCCM_BLOCK_SIZE, true);
+ if (rc)
+ return rc;
+
+ /* Switch to upper DCCM block */
+ reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_MME_ARC_UPPER_DCCM_EN_VAL_MASK, 1);
+ WREG32(reg_base + ARC_DCCM_UPPER_EN_OFFSET, reg_val);
+
+ /* Scrub upper DCCM block */
+ rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
+ ARC_DCCM_BLOCK_SIZE, true);
+ if (rc)
+ return rc;
+
+ /* Switch to lower DCCM block */
+ reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_MME_ARC_UPPER_DCCM_EN_VAL_MASK, 0);
+ WREG32(reg_base + ARC_DCCM_UPPER_EN_OFFSET, reg_val);
+ break;
+ default:
+ rc = gaudi2_send_job_to_kdma(hdev, 0, CFG_BASE + gaudi2_arc_dccm_bases[cpu_id],
+ ARC_DCCM_BLOCK_SIZE, true);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static void gaudi2_scrub_arcs_dccm(struct hl_device *hdev)
+{
+ u16 arc_id;
+
+ for (arc_id = CPU_ID_SCHED_ARC0 ; arc_id < CPU_ID_MAX ; arc_id++) {
+ if (!gaudi2_is_arc_enabled(hdev, arc_id))
+ continue;
+
+ gaudi2_scrub_arc_dccm(hdev, arc_id);
+ }
+}
+
+static int gaudi2_late_init(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int rc;
+
+ hdev->asic_prop.supports_advanced_cpucp_rc = true;
+
+ rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS,
+ gaudi2->virt_msix_db_dma_addr);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to enable PCI access from CPU\n");
+ return rc;
+ }
+
+ rc = gaudi2_fetch_psoc_frequency(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to fetch psoc frequency\n");
+ goto disable_pci_access;
+ }
+
+ gaudi2_init_arcs(hdev);
+ gaudi2_scrub_arcs_dccm(hdev);
+ gaudi2_init_security(hdev);
+
+ return 0;
+
+disable_pci_access:
+ hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
+
+ return rc;
+}
+
+static void gaudi2_late_fini(struct hl_device *hdev)
+{
+ hl_hwmon_release_resources(hdev);
+}
+
+static void gaudi2_user_mapped_dec_init(struct gaudi2_device *gaudi2, u32 start_idx)
+{
+ struct user_mapped_block *blocks = gaudi2->mapped_blocks;
+
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE0_DEC0_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE0_DEC1_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE1_DEC0_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE1_DEC1_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE2_DEC0_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE2_DEC1_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE3_DEC0_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmDCORE3_DEC1_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx++], mmPCIE_DEC0_CMD_BASE, HL_BLOCK_SIZE);
+ HL_USR_MAPPED_BLK_INIT(&blocks[start_idx], mmPCIE_DEC1_CMD_BASE, HL_BLOCK_SIZE);
+}
+
+static void gaudi2_user_mapped_blocks_init(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct user_mapped_block *blocks = gaudi2->mapped_blocks;
+ u32 block_size, umr_start_idx, num_umr_blocks;
+ int i;
+
+ for (i = 0 ; i < NUM_ARC_CPUS ; i++) {
+ if (i >= CPU_ID_SCHED_ARC0 && i <= CPU_ID_SCHED_ARC3)
+ block_size = ARC_DCCM_BLOCK_SIZE * 2;
+ else
+ block_size = ARC_DCCM_BLOCK_SIZE;
+
+ blocks[i].address = gaudi2_arc_dccm_bases[i];
+ blocks[i].size = block_size;
+ }
+
+ blocks[NUM_ARC_CPUS].address = mmARC_FARM_ARC0_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 1].address = mmARC_FARM_ARC1_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 1].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 2].address = mmARC_FARM_ARC2_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 2].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 3].address = mmARC_FARM_ARC3_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 3].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 4].address = mmDCORE0_MME_QM_ARC_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 4].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 5].address = mmDCORE1_MME_QM_ARC_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 5].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 6].address = mmDCORE2_MME_QM_ARC_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 6].size = HL_BLOCK_SIZE;
+
+ blocks[NUM_ARC_CPUS + 7].address = mmDCORE3_MME_QM_ARC_ACP_ENG_BASE;
+ blocks[NUM_ARC_CPUS + 7].size = HL_BLOCK_SIZE;
+
+ umr_start_idx = NUM_ARC_CPUS + NUM_OF_USER_ACP_BLOCKS;
+ num_umr_blocks = NIC_NUMBER_OF_ENGINES * NUM_OF_USER_NIC_UMR_BLOCKS;
+ for (i = 0 ; i < num_umr_blocks ; i++) {
+ u8 nic_id, umr_block_id;
+
+ nic_id = i / NUM_OF_USER_NIC_UMR_BLOCKS;
+ umr_block_id = i % NUM_OF_USER_NIC_UMR_BLOCKS;
+
+ blocks[umr_start_idx + i].address =
+ mmNIC0_UMR0_0_UNSECURE_DOORBELL0_BASE +
+ (nic_id / NIC_NUMBER_OF_QM_PER_MACRO) * NIC_OFFSET +
+ (nic_id % NIC_NUMBER_OF_QM_PER_MACRO) * NIC_QM_OFFSET +
+ umr_block_id * NIC_UMR_OFFSET;
+ blocks[umr_start_idx + i].size = HL_BLOCK_SIZE;
+ }
+
+ /* Expose decoder HW configuration block to user */
+ gaudi2_user_mapped_dec_init(gaudi2, USR_MAPPED_BLK_DEC_START_IDX);
+
+ for (i = 1; i < NUM_OF_DCORES; ++i) {
+ blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1)].size = SM_OBJS_BLOCK_SIZE;
+ blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1) + 1].size = HL_BLOCK_SIZE;
+
+ blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1)].address =
+ mmDCORE0_SYNC_MNGR_OBJS_BASE + i * DCORE_OFFSET;
+
+ blocks[USR_MAPPED_BLK_SM_START_IDX + 2 * (i - 1) + 1].address =
+ mmDCORE0_SYNC_MNGR_GLBL_BASE + i * DCORE_OFFSET;
+ }
+}
+
+static int gaudi2_alloc_cpu_accessible_dma_mem(struct hl_device *hdev)
+{
+ dma_addr_t dma_addr_arr[GAUDI2_ALLOC_CPU_MEM_RETRY_CNT] = {}, end_addr;
+ void *virt_addr_arr[GAUDI2_ALLOC_CPU_MEM_RETRY_CNT] = {};
+ int i, j, rc = 0;
+
+ /* The device ARC works with 32-bits addresses, and because there is a single HW register
+ * that holds the extension bits (49..28), these bits must be identical in all the allocated
+ * range.
+ */
+
+ for (i = 0 ; i < GAUDI2_ALLOC_CPU_MEM_RETRY_CNT ; i++) {
+ virt_addr_arr[i] = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
+ &dma_addr_arr[i], GFP_KERNEL | __GFP_ZERO);
+ if (!virt_addr_arr[i]) {
+ rc = -ENOMEM;
+ goto free_dma_mem_arr;
+ }
+
+ end_addr = dma_addr_arr[i] + HL_CPU_ACCESSIBLE_MEM_SIZE - 1;
+ if (GAUDI2_ARC_PCI_MSB_ADDR(dma_addr_arr[i]) == GAUDI2_ARC_PCI_MSB_ADDR(end_addr))
+ break;
+ }
+
+ if (i == GAUDI2_ALLOC_CPU_MEM_RETRY_CNT) {
+ dev_err(hdev->dev,
+ "MSB of ARC accessible DMA memory are not identical in all range\n");
+ rc = -EFAULT;
+ goto free_dma_mem_arr;
+ }
+
+ hdev->cpu_accessible_dma_mem = virt_addr_arr[i];
+ hdev->cpu_accessible_dma_address = dma_addr_arr[i];
+
+free_dma_mem_arr:
+ for (j = 0 ; j < i ; j++)
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, virt_addr_arr[j],
+ dma_addr_arr[j]);
+
+ return rc;
+}
+
+static void gaudi2_set_pci_memory_regions(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct pci_mem_region *region;
+
+ /* CFG */
+ region = &hdev->pci_mem_region[PCI_REGION_CFG];
+ region->region_base = CFG_BASE;
+ region->region_size = CFG_SIZE;
+ region->offset_in_bar = CFG_BASE - STM_FLASH_BASE_ADDR;
+ region->bar_size = CFG_BAR_SIZE;
+ region->bar_id = SRAM_CFG_BAR_ID;
+ region->used = 1;
+
+ /* SRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_SRAM];
+ region->region_base = SRAM_BASE_ADDR;
+ region->region_size = SRAM_SIZE;
+ region->offset_in_bar = CFG_REGION_SIZE + BAR0_RSRVD_SIZE;
+ region->bar_size = CFG_BAR_SIZE;
+ region->bar_id = SRAM_CFG_BAR_ID;
+ region->used = 1;
+
+ /* DRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_DRAM];
+ region->region_base = DRAM_PHYS_BASE;
+ region->region_size = hdev->asic_prop.dram_size;
+ region->offset_in_bar = 0;
+ region->bar_size = prop->dram_pci_bar_size;
+ region->bar_id = DRAM_BAR_ID;
+ region->used = 1;
+}
+
+static void gaudi2_user_interrupt_setup(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int i, j, k;
+
+ /* Initialize common user CQ interrupt */
+ HL_USR_INTR_STRUCT_INIT(hdev->common_user_cq_interrupt, hdev,
+ HL_COMMON_USER_CQ_INTERRUPT_ID, HL_USR_INTERRUPT_CQ);
+
+ /* Initialize common decoder interrupt */
+ HL_USR_INTR_STRUCT_INIT(hdev->common_decoder_interrupt, hdev,
+ HL_COMMON_DEC_INTERRUPT_ID, HL_USR_INTERRUPT_DECODER);
+
+ /* User interrupts structure holds both decoder and user interrupts from various engines.
+ * We first initialize the decoder interrupts and then we add the user interrupts.
+ * The only limitation is that the last decoder interrupt id must be smaller
+ * then GAUDI2_IRQ_NUM_USER_FIRST. This is checked at compilation time.
+ */
+
+ /* Initialize decoder interrupts, expose only normal interrupts,
+ * error interrupts to be handled by driver
+ */
+ for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, j = 0 ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_NRM;
+ i += 2, j++)
+ HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i,
+ HL_USR_INTERRUPT_DECODER);
+
+ for (i = GAUDI2_IRQ_NUM_USER_FIRST, k = 0 ; k < prop->user_interrupt_count; i++, j++, k++)
+ HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i, HL_USR_INTERRUPT_CQ);
+}
+
+static inline int gaudi2_get_non_zero_random_int(void)
+{
+ int rand = get_random_u32();
+
+ return rand ? rand : 1;
+}
+
+static void gaudi2_special_blocks_free(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hl_skip_blocks_cfg *skip_special_blocks_cfg =
+ &prop->skip_special_blocks_cfg;
+
+ kfree(prop->special_blocks);
+ kfree(skip_special_blocks_cfg->block_types);
+ kfree(skip_special_blocks_cfg->block_ranges);
+}
+
+static void gaudi2_special_blocks_iterator_free(struct hl_device *hdev)
+{
+ gaudi2_special_blocks_free(hdev);
+}
+
+static bool gaudi2_special_block_skip(struct hl_device *hdev,
+ struct hl_special_blocks_cfg *special_blocks_cfg,
+ u32 blk_idx, u32 major, u32 minor, u32 sub_minor)
+{
+ return false;
+}
+
+static int gaudi2_special_blocks_config(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int i, rc;
+
+ /* Configure Special blocks */
+ prop->glbl_err_cause_num = GAUDI2_NUM_OF_GLBL_ERR_CAUSE;
+ prop->num_of_special_blocks = ARRAY_SIZE(gaudi2_special_blocks);
+ prop->special_blocks = kmalloc_array(prop->num_of_special_blocks,
+ sizeof(*prop->special_blocks), GFP_KERNEL);
+ if (!prop->special_blocks)
+ return -ENOMEM;
+
+ for (i = 0 ; i < prop->num_of_special_blocks ; i++)
+ memcpy(&prop->special_blocks[i], &gaudi2_special_blocks[i],
+ sizeof(*prop->special_blocks));
+
+ /* Configure when to skip Special blocks */
+ memset(&prop->skip_special_blocks_cfg, 0, sizeof(prop->skip_special_blocks_cfg));
+ prop->skip_special_blocks_cfg.skip_block_hook = gaudi2_special_block_skip;
+
+ if (ARRAY_SIZE(gaudi2_iterator_skip_block_types)) {
+ prop->skip_special_blocks_cfg.block_types =
+ kmalloc_array(ARRAY_SIZE(gaudi2_iterator_skip_block_types),
+ sizeof(gaudi2_iterator_skip_block_types[0]), GFP_KERNEL);
+ if (!prop->skip_special_blocks_cfg.block_types) {
+ rc = -ENOMEM;
+ goto free_special_blocks;
+ }
+
+ memcpy(prop->skip_special_blocks_cfg.block_types, gaudi2_iterator_skip_block_types,
+ sizeof(gaudi2_iterator_skip_block_types));
+
+ prop->skip_special_blocks_cfg.block_types_len =
+ ARRAY_SIZE(gaudi2_iterator_skip_block_types);
+ }
+
+ if (ARRAY_SIZE(gaudi2_iterator_skip_block_ranges)) {
+ prop->skip_special_blocks_cfg.block_ranges =
+ kmalloc_array(ARRAY_SIZE(gaudi2_iterator_skip_block_ranges),
+ sizeof(gaudi2_iterator_skip_block_ranges[0]), GFP_KERNEL);
+ if (!prop->skip_special_blocks_cfg.block_ranges) {
+ rc = -ENOMEM;
+ goto free_skip_special_blocks_types;
+ }
+
+ for (i = 0 ; i < ARRAY_SIZE(gaudi2_iterator_skip_block_ranges) ; i++)
+ memcpy(&prop->skip_special_blocks_cfg.block_ranges[i],
+ &gaudi2_iterator_skip_block_ranges[i],
+ sizeof(struct range));
+
+ prop->skip_special_blocks_cfg.block_ranges_len =
+ ARRAY_SIZE(gaudi2_iterator_skip_block_ranges);
+ }
+
+ return 0;
+
+free_skip_special_blocks_types:
+ kfree(prop->skip_special_blocks_cfg.block_types);
+free_special_blocks:
+ kfree(prop->special_blocks);
+
+ return rc;
+}
+
+static int gaudi2_special_blocks_iterator_config(struct hl_device *hdev)
+{
+ return gaudi2_special_blocks_config(hdev);
+}
+
+static int gaudi2_sw_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2;
+ int i, rc;
+
+ /* Allocate device structure */
+ gaudi2 = kzalloc(sizeof(*gaudi2), GFP_KERNEL);
+ if (!gaudi2)
+ return -ENOMEM;
+
+ for (i = 0 ; i < ARRAY_SIZE(gaudi2_irq_map_table) ; i++) {
+ if (gaudi2_irq_map_table[i].msg || !gaudi2_irq_map_table[i].valid)
+ continue;
+
+ if (gaudi2->num_of_valid_hw_events == GAUDI2_EVENT_SIZE) {
+ dev_err(hdev->dev, "H/W events array exceeds the limit of %u events\n",
+ GAUDI2_EVENT_SIZE);
+ rc = -EINVAL;
+ goto free_gaudi2_device;
+ }
+
+ gaudi2->hw_events[gaudi2->num_of_valid_hw_events++] = gaudi2_irq_map_table[i].fc_id;
+ }
+
+ for (i = 0 ; i < MME_NUM_OF_LFSR_SEEDS ; i++)
+ gaudi2->lfsr_rand_seeds[i] = gaudi2_get_non_zero_random_int();
+
+ gaudi2->cpucp_info_get = gaudi2_cpucp_info_get;
+
+ hdev->asic_specific = gaudi2;
+
+ /* Create DMA pool for small allocations.
+ * Use DEVICE_CACHE_LINE_SIZE for alignment since the NIC memory-mapped
+ * PI/CI registers allocated from this pool have this restriction
+ */
+ hdev->dma_pool = dma_pool_create(dev_name(hdev->dev), &hdev->pdev->dev,
+ GAUDI2_DMA_POOL_BLK_SIZE, DEVICE_CACHE_LINE_SIZE, 0);
+ if (!hdev->dma_pool) {
+ dev_err(hdev->dev, "failed to create DMA pool\n");
+ rc = -ENOMEM;
+ goto free_gaudi2_device;
+ }
+
+ rc = gaudi2_alloc_cpu_accessible_dma_mem(hdev);
+ if (rc)
+ goto free_dma_pool;
+
+ hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
+ if (!hdev->cpu_accessible_dma_pool) {
+ dev_err(hdev->dev, "Failed to create CPU accessible DMA pool\n");
+ rc = -ENOMEM;
+ goto free_cpu_dma_mem;
+ }
+
+ rc = gen_pool_add(hdev->cpu_accessible_dma_pool, (uintptr_t) hdev->cpu_accessible_dma_mem,
+ HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to add memory to CPU accessible DMA pool\n");
+ rc = -EFAULT;
+ goto free_cpu_accessible_dma_pool;
+ }
+
+ gaudi2->virt_msix_db_cpu_addr = hl_cpu_accessible_dma_pool_alloc(hdev, prop->pmmu.page_size,
+ &gaudi2->virt_msix_db_dma_addr);
+ if (!gaudi2->virt_msix_db_cpu_addr) {
+ dev_err(hdev->dev, "Failed to allocate DMA memory for virtual MSI-X doorbell\n");
+ rc = -ENOMEM;
+ goto free_cpu_accessible_dma_pool;
+ }
+
+ spin_lock_init(&gaudi2->hw_queues_lock);
+
+ gaudi2->scratchpad_kernel_address = hl_asic_dma_alloc_coherent(hdev, PAGE_SIZE,
+ &gaudi2->scratchpad_bus_address,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!gaudi2->scratchpad_kernel_address) {
+ rc = -ENOMEM;
+ goto free_virt_msix_db_mem;
+ }
+
+ gaudi2_user_mapped_blocks_init(hdev);
+
+ /* Initialize user interrupts */
+ gaudi2_user_interrupt_setup(hdev);
+
+ hdev->supports_coresight = true;
+ hdev->supports_sync_stream = true;
+ hdev->supports_cb_mapping = true;
+ hdev->supports_wait_for_multi_cs = false;
+
+ prop->supports_compute_reset = true;
+
+ hdev->asic_funcs->set_pci_memory_regions(hdev);
+
+ rc = gaudi2_special_blocks_iterator_config(hdev);
+ if (rc)
+ goto free_scratchpad_mem;
+
+ return 0;
+
+free_scratchpad_mem:
+ hl_asic_dma_pool_free(hdev, gaudi2->scratchpad_kernel_address,
+ gaudi2->scratchpad_bus_address);
+free_virt_msix_db_mem:
+ hl_cpu_accessible_dma_pool_free(hdev, prop->pmmu.page_size, gaudi2->virt_msix_db_cpu_addr);
+free_cpu_accessible_dma_pool:
+ gen_pool_destroy(hdev->cpu_accessible_dma_pool);
+free_cpu_dma_mem:
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
+ hdev->cpu_accessible_dma_address);
+free_dma_pool:
+ dma_pool_destroy(hdev->dma_pool);
+free_gaudi2_device:
+ kfree(gaudi2);
+ return rc;
+}
+
+static int gaudi2_sw_fini(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ gaudi2_special_blocks_iterator_free(hdev);
+
+ hl_cpu_accessible_dma_pool_free(hdev, prop->pmmu.page_size, gaudi2->virt_msix_db_cpu_addr);
+
+ gen_pool_destroy(hdev->cpu_accessible_dma_pool);
+
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
+ hdev->cpu_accessible_dma_address);
+
+ hl_asic_dma_free_coherent(hdev, PAGE_SIZE, gaudi2->scratchpad_kernel_address,
+ gaudi2->scratchpad_bus_address);
+
+ dma_pool_destroy(hdev->dma_pool);
+
+ kfree(gaudi2);
+
+ return 0;
+}
+
+static void gaudi2_stop_qman_common(struct hl_device *hdev, u32 reg_base)
+{
+ WREG32(reg_base + QM_GLBL_CFG1_OFFSET, QM_GLBL_CFG1_PQF_STOP |
+ QM_GLBL_CFG1_CQF_STOP |
+ QM_GLBL_CFG1_CP_STOP);
+
+ /* stop also the ARC */
+ WREG32(reg_base + QM_GLBL_CFG2_OFFSET, QM_GLBL_CFG2_ARC_CQF_STOP);
+}
+
+static void gaudi2_flush_qman_common(struct hl_device *hdev, u32 reg_base)
+{
+ WREG32(reg_base + QM_GLBL_CFG1_OFFSET, QM_GLBL_CFG1_PQF_FLUSH |
+ QM_GLBL_CFG1_CQF_FLUSH |
+ QM_GLBL_CFG1_CP_FLUSH);
+}
+
+static void gaudi2_flush_qman_arc_common(struct hl_device *hdev, u32 reg_base)
+{
+ WREG32(reg_base + QM_GLBL_CFG2_OFFSET, QM_GLBL_CFG2_ARC_CQF_FLUSH);
+}
+
+/**
+ * gaudi2_clear_qm_fence_counters_common - clear QM's fence counters
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @queue_id: queue to clear fence counters to
+ * @skip_fence: if true set maximum fence value to all fence counters to avoid
+ * getting stuck on any fence value. otherwise set all fence
+ * counters to 0 (standard clear of fence counters)
+ */
+static void gaudi2_clear_qm_fence_counters_common(struct hl_device *hdev, u32 queue_id,
+ bool skip_fence)
+{
+ u32 size, reg_base;
+ u32 addr, val;
+
+ reg_base = gaudi2_qm_blocks_bases[queue_id];
+
+ addr = reg_base + QM_CP_FENCE0_CNT_0_OFFSET;
+ size = mmPDMA0_QM_CP_BARRIER_CFG - mmPDMA0_QM_CP_FENCE0_CNT_0;
+
+ /*
+ * in case we want to make sure that QM that is stuck on a fence will
+ * be released we should set the fence counter to a higher value that
+ * the value the QM waiting for. to comply with any fence counter of
+ * any value we set maximum fence value to all counters
+ */
+ val = skip_fence ? U32_MAX : 0;
+ gaudi2_memset_device_lbw(hdev, addr, size, val);
+}
+
+static void gaudi2_qman_manual_flush_common(struct hl_device *hdev, u32 queue_id)
+{
+ u32 reg_base = gaudi2_qm_blocks_bases[queue_id];
+
+ gaudi2_clear_qm_fence_counters_common(hdev, queue_id, true);
+ gaudi2_flush_qman_common(hdev, reg_base);
+ gaudi2_flush_qman_arc_common(hdev, reg_base);
+}
+
+static void gaudi2_stop_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int dcore, inst;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
+ goto stop_edma_qmans;
+
+ /* Stop CPs of PDMA QMANs */
+ gaudi2_stop_qman_common(hdev, mmPDMA0_QM_BASE);
+ gaudi2_stop_qman_common(hdev, mmPDMA1_QM_BASE);
+
+stop_edma_qmans:
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
+ return;
+
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
+ u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
+ u32 qm_base;
+
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
+ continue;
+
+ qm_base = mmDCORE0_EDMA0_QM_BASE + dcore * DCORE_OFFSET +
+ inst * DCORE_EDMA_OFFSET;
+
+ /* Stop CPs of EDMA QMANs */
+ gaudi2_stop_qman_common(hdev, qm_base);
+ }
+ }
+}
+
+static void gaudi2_stop_mme_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 offset, i;
+
+ offset = mmDCORE1_MME_QM_BASE - mmDCORE0_MME_QM_BASE;
+
+ for (i = 0 ; i < NUM_OF_DCORES ; i++) {
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i)))
+ continue;
+
+ gaudi2_stop_qman_common(hdev, mmDCORE0_MME_QM_BASE + (i * offset));
+ }
+}
+
+static void gaudi2_stop_tpc_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+ int i;
+
+ if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
+ return;
+
+ for (i = 0 ; i < TPC_ID_SIZE ; i++) {
+ if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[gaudi2_tpc_id_to_queue_id[i]];
+ gaudi2_stop_qman_common(hdev, reg_base);
+ }
+}
+
+static void gaudi2_stop_rot_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+ int i;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
+ return;
+
+ for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[gaudi2_rot_id_to_queue_id[i]];
+ gaudi2_stop_qman_common(hdev, reg_base);
+ }
+}
+
+static void gaudi2_stop_nic_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base, queue_id;
+ int i;
+
+ if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
+ return;
+
+ queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
+
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
+ if (!(hdev->nic_ports_mask & BIT(i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[queue_id];
+ gaudi2_stop_qman_common(hdev, reg_base);
+ }
+}
+
+static void gaudi2_stall_dma_common(struct hl_device *hdev, u32 reg_base)
+{
+ u32 reg_val;
+
+ reg_val = FIELD_PREP(PDMA0_CORE_CFG_1_HALT_MASK, 0x1);
+ WREG32(reg_base + DMA_CORE_CFG_1_OFFSET, reg_val);
+}
+
+static void gaudi2_dma_stall(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int dcore, inst;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
+ goto stall_edma;
+
+ gaudi2_stall_dma_common(hdev, mmPDMA0_CORE_BASE);
+ gaudi2_stall_dma_common(hdev, mmPDMA1_CORE_BASE);
+
+stall_edma:
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
+ return;
+
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
+ u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
+ u32 core_base;
+
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
+ continue;
+
+ core_base = mmDCORE0_EDMA0_CORE_BASE + dcore * DCORE_OFFSET +
+ inst * DCORE_EDMA_OFFSET;
+
+ /* Stall CPs of EDMA QMANs */
+ gaudi2_stall_dma_common(hdev, core_base);
+ }
+ }
+}
+
+static void gaudi2_mme_stall(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 offset, i;
+
+ offset = mmDCORE1_MME_CTRL_LO_QM_STALL - mmDCORE0_MME_CTRL_LO_QM_STALL;
+
+ for (i = 0 ; i < NUM_OF_DCORES ; i++)
+ if (gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i))
+ WREG32(mmDCORE0_MME_CTRL_LO_QM_STALL + (i * offset), 1);
+}
+
+static void gaudi2_tpc_stall(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+ int i;
+
+ if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
+ return;
+
+ for (i = 0 ; i < TPC_ID_SIZE ; i++) {
+ if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
+ continue;
+
+ reg_base = gaudi2_tpc_cfg_blocks_bases[i];
+ WREG32(reg_base + TPC_CFG_STALL_OFFSET, 1);
+ }
+}
+
+static void gaudi2_rotator_stall(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_val;
+ int i;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
+ return;
+
+ reg_val = FIELD_PREP(ROT_MSS_HALT_WBC_MASK, 0x1) |
+ FIELD_PREP(ROT_MSS_HALT_RSB_MASK, 0x1) |
+ FIELD_PREP(ROT_MSS_HALT_MRSB_MASK, 0x1);
+
+ for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
+ continue;
+
+ WREG32(mmROT0_MSS_HALT + i * ROT_OFFSET, reg_val);
+ }
+}
+
+static void gaudi2_disable_qman_common(struct hl_device *hdev, u32 reg_base)
+{
+ WREG32(reg_base + QM_GLBL_CFG0_OFFSET, 0);
+}
+
+static void gaudi2_disable_dma_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int dcore, inst;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK))
+ goto stop_edma_qmans;
+
+ gaudi2_disable_qman_common(hdev, mmPDMA0_QM_BASE);
+ gaudi2_disable_qman_common(hdev, mmPDMA1_QM_BASE);
+
+stop_edma_qmans:
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK))
+ return;
+
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
+ u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
+ u32 qm_base;
+
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_EDMA_SHIFT + seq)))
+ continue;
+
+ qm_base = mmDCORE0_EDMA0_QM_BASE + dcore * DCORE_OFFSET +
+ inst * DCORE_EDMA_OFFSET;
+
+ /* Disable CPs of EDMA QMANs */
+ gaudi2_disable_qman_common(hdev, qm_base);
+ }
+ }
+}
+
+static void gaudi2_disable_mme_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 offset, i;
+
+ offset = mmDCORE1_MME_QM_BASE - mmDCORE0_MME_QM_BASE;
+
+ for (i = 0 ; i < NUM_OF_DCORES ; i++)
+ if (gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_MME_SHIFT + i))
+ gaudi2_disable_qman_common(hdev, mmDCORE0_MME_QM_BASE + (i * offset));
+}
+
+static void gaudi2_disable_tpc_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+ int i;
+
+ if (!(gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK))
+ return;
+
+ for (i = 0 ; i < TPC_ID_SIZE ; i++) {
+ if (!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(HW_CAP_TPC_SHIFT + i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[gaudi2_tpc_id_to_queue_id[i]];
+ gaudi2_disable_qman_common(hdev, reg_base);
+ }
+}
+
+static void gaudi2_disable_rot_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+ int i;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_ROT_MASK))
+ return;
+
+ for (i = 0 ; i < ROTATOR_ID_SIZE ; i++) {
+ if (!(gaudi2->hw_cap_initialized & BIT_ULL(HW_CAP_ROT_SHIFT + i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[gaudi2_rot_id_to_queue_id[i]];
+ gaudi2_disable_qman_common(hdev, reg_base);
+ }
+}
+
+static void gaudi2_disable_nic_qmans(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base, queue_id;
+ int i;
+
+ if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
+ return;
+
+ queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
+
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
+ if (!(hdev->nic_ports_mask & BIT(i)))
+ continue;
+
+ reg_base = gaudi2_qm_blocks_bases[queue_id];
+ gaudi2_disable_qman_common(hdev, reg_base);
+ }
+}
+
+static void gaudi2_enable_timestamp(struct hl_device *hdev)
+{
+ /* Disable the timestamp counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE, 0);
+
+ /* Zero the lower/upper parts of the 64-bit counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE + 0xC, 0);
+ WREG32(mmPSOC_TIMESTAMP_BASE + 0x8, 0);
+
+ /* Enable the counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE, 1);
+}
+
+static void gaudi2_disable_timestamp(struct hl_device *hdev)
+{
+ /* Disable the timestamp counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE, 0);
+}
+
+static const char *gaudi2_irq_name(u16 irq_number)
+{
+ switch (irq_number) {
+ case GAUDI2_IRQ_NUM_EVENT_QUEUE:
+ return "gaudi2 cpu eq";
+ case GAUDI2_IRQ_NUM_COMPLETION:
+ return "gaudi2 completion";
+ case GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ... GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM:
+ return gaudi2_vdec_irq_name[irq_number - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM];
+ case GAUDI2_IRQ_NUM_USER_FIRST ... GAUDI2_IRQ_NUM_USER_LAST:
+ return "gaudi2 user completion";
+ default:
+ return "invalid";
+ }
+}
+
+static void gaudi2_dec_disable_msix(struct hl_device *hdev, u32 max_irq_num)
+{
+ int i, irq, relative_idx;
+ struct hl_dec *dec;
+
+ for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ; i < max_irq_num ; i++) {
+ irq = pci_irq_vector(hdev->pdev, i);
+ relative_idx = i - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM;
+
+ dec = hdev->dec + relative_idx / 2;
+
+ /* We pass different structures depending on the irq handler. For the abnormal
+ * interrupt we pass hl_dec and for the regular interrupt we pass the relevant
+ * user_interrupt entry
+ */
+ free_irq(irq, ((relative_idx % 2) ?
+ (void *) dec :
+ (void *) &hdev->user_interrupt[dec->core_id]));
+ }
+}
+
+static int gaudi2_dec_enable_msix(struct hl_device *hdev)
+{
+ int rc, i, irq_init_cnt, irq, relative_idx;
+ irq_handler_t irq_handler;
+ struct hl_dec *dec;
+
+ for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, irq_init_cnt = 0;
+ i <= GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM;
+ i++, irq_init_cnt++) {
+
+ irq = pci_irq_vector(hdev->pdev, i);
+ relative_idx = i - GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM;
+
+ irq_handler = (relative_idx % 2) ?
+ hl_irq_handler_dec_abnrm :
+ hl_irq_handler_user_interrupt;
+
+ dec = hdev->dec + relative_idx / 2;
+
+ /* We pass different structures depending on the irq handler. For the abnormal
+ * interrupt we pass hl_dec and for the regular interrupt we pass the relevant
+ * user_interrupt entry
+ */
+ rc = request_irq(irq, irq_handler, 0, gaudi2_irq_name(i),
+ ((relative_idx % 2) ?
+ (void *) dec :
+ (void *) &hdev->user_interrupt[dec->core_id]));
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_dec_irqs;
+ }
+ }
+
+ return 0;
+
+free_dec_irqs:
+ gaudi2_dec_disable_msix(hdev, (GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + irq_init_cnt));
+ return rc;
+}
+
+static int gaudi2_enable_msix(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int rc, irq, i, j, user_irq_init_cnt;
+ irq_handler_t irq_handler;
+ struct hl_cq *cq;
+
+ if (gaudi2->hw_cap_initialized & HW_CAP_MSIX)
+ return 0;
+
+ rc = pci_alloc_irq_vectors(hdev->pdev, GAUDI2_MSIX_ENTRIES, GAUDI2_MSIX_ENTRIES,
+ PCI_IRQ_MSIX);
+ if (rc < 0) {
+ dev_err(hdev->dev, "MSI-X: Failed to enable support -- %d/%d\n",
+ GAUDI2_MSIX_ENTRIES, rc);
+ return rc;
+ }
+
+ irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
+ cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_CS_COMPLETION];
+ rc = request_irq(irq, hl_irq_handler_cq, 0, gaudi2_irq_name(GAUDI2_IRQ_NUM_COMPLETION), cq);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_irq_vectors;
+ }
+
+ irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
+ rc = request_irq(irq, hl_irq_handler_eq, 0, gaudi2_irq_name(GAUDI2_IRQ_NUM_EVENT_QUEUE),
+ &hdev->event_queue);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_completion_irq;
+ }
+
+ rc = gaudi2_dec_enable_msix(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to enable decoder IRQ");
+ goto free_event_irq;
+ }
+
+ for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count, user_irq_init_cnt = 0;
+ user_irq_init_cnt < prop->user_interrupt_count;
+ i++, j++, user_irq_init_cnt++) {
+
+ irq = pci_irq_vector(hdev->pdev, i);
+ irq_handler = hl_irq_handler_user_interrupt;
+
+ rc = request_irq(irq, irq_handler, 0, gaudi2_irq_name(i), &hdev->user_interrupt[j]);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_user_irq;
+ }
+ }
+
+ gaudi2->hw_cap_initialized |= HW_CAP_MSIX;
+
+ return 0;
+
+free_user_irq:
+ for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count;
+ i < GAUDI2_IRQ_NUM_USER_FIRST + user_irq_init_cnt ; i++, j++) {
+
+ irq = pci_irq_vector(hdev->pdev, i);
+ free_irq(irq, &hdev->user_interrupt[j]);
+ }
+
+ gaudi2_dec_disable_msix(hdev, GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM + 1);
+
+free_event_irq:
+ irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
+ free_irq(irq, cq);
+
+free_completion_irq:
+ irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
+ free_irq(irq, cq);
+
+free_irq_vectors:
+ pci_free_irq_vectors(hdev->pdev);
+
+ return rc;
+}
+
+static void gaudi2_sync_irqs(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int i, j;
+ int irq;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_MSIX))
+ return;
+
+ /* Wait for all pending IRQs to be finished */
+ synchronize_irq(pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION));
+
+ for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM ; i++) {
+ irq = pci_irq_vector(hdev->pdev, i);
+ synchronize_irq(irq);
+ }
+
+ for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = 0 ; j < hdev->asic_prop.user_interrupt_count;
+ i++, j++) {
+ irq = pci_irq_vector(hdev->pdev, i);
+ synchronize_irq(irq);
+ }
+
+ synchronize_irq(pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE));
+}
+
+static void gaudi2_disable_msix(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct hl_cq *cq;
+ int irq, i, j, k;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_MSIX))
+ return;
+
+ gaudi2_sync_irqs(hdev);
+
+ irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_EVENT_QUEUE);
+ free_irq(irq, &hdev->event_queue);
+
+ gaudi2_dec_disable_msix(hdev, GAUDI2_IRQ_NUM_SHARED_DEC1_ABNRM + 1);
+
+ for (i = GAUDI2_IRQ_NUM_USER_FIRST, j = prop->user_dec_intr_count, k = 0;
+ k < hdev->asic_prop.user_interrupt_count ; i++, j++, k++) {
+
+ irq = pci_irq_vector(hdev->pdev, i);
+ free_irq(irq, &hdev->user_interrupt[j]);
+ }
+
+ irq = pci_irq_vector(hdev->pdev, GAUDI2_IRQ_NUM_COMPLETION);
+ cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_CS_COMPLETION];
+ free_irq(irq, cq);
+
+ pci_free_irq_vectors(hdev->pdev);
+
+ gaudi2->hw_cap_initialized &= ~HW_CAP_MSIX;
+}
+
+static void gaudi2_stop_dcore_dec(struct hl_device *hdev, int dcore_id)
+{
+ u32 reg_val = FIELD_PREP(DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_STOP_MASK, 0x1);
+ u32 graceful_pend_mask = DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_PEND_MASK;
+ u32 timeout_usec, dec_id, dec_bit, offset, graceful;
+ int rc;
+
+ if (hdev->pldm)
+ timeout_usec = GAUDI2_PLDM_VDEC_TIMEOUT_USEC;
+ else
+ timeout_usec = GAUDI2_VDEC_TIMEOUT_USEC;
+
+ for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
+ dec_bit = dcore_id * NUM_OF_DEC_PER_DCORE + dec_id;
+ if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
+ continue;
+
+ offset = dcore_id * DCORE_OFFSET + dec_id * DCORE_VDEC_OFFSET;
+
+ WREG32(mmDCORE0_DEC0_CMD_SWREG16 + offset, 0);
+
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_GRACEFUL + offset, reg_val);
+
+ /* Wait till all traffic from decoder stops
+ * before apply core reset.
+ */
+ rc = hl_poll_timeout(
+ hdev,
+ mmDCORE0_VDEC0_BRDG_CTRL_GRACEFUL + offset,
+ graceful,
+ (graceful & graceful_pend_mask),
+ 100,
+ timeout_usec);
+ if (rc)
+ dev_err(hdev->dev,
+ "Failed to stop traffic from DCORE%d Decoder %d\n",
+ dcore_id, dec_id);
+ }
+}
+
+static void gaudi2_stop_pcie_dec(struct hl_device *hdev)
+{
+ u32 reg_val = FIELD_PREP(DCORE0_VDEC0_BRDG_CTRL_GRACEFUL_STOP_MASK, 0x1);
+ u32 graceful_pend_mask = PCIE_VDEC0_BRDG_CTRL_GRACEFUL_PEND_MASK;
+ u32 timeout_usec, dec_id, dec_bit, offset, graceful;
+ int rc;
+
+ if (hdev->pldm)
+ timeout_usec = GAUDI2_PLDM_VDEC_TIMEOUT_USEC;
+ else
+ timeout_usec = GAUDI2_VDEC_TIMEOUT_USEC;
+
+ for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
+ dec_bit = PCIE_DEC_SHIFT + dec_id;
+ if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
+ continue;
+
+ offset = dec_id * PCIE_VDEC_OFFSET;
+
+ WREG32(mmPCIE_DEC0_CMD_SWREG16 + offset, 0);
+
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_GRACEFUL + offset, reg_val);
+
+ /* Wait till all traffic from decoder stops
+ * before apply core reset.
+ */
+ rc = hl_poll_timeout(
+ hdev,
+ mmPCIE_VDEC0_BRDG_CTRL_GRACEFUL + offset,
+ graceful,
+ (graceful & graceful_pend_mask),
+ 100,
+ timeout_usec);
+ if (rc)
+ dev_err(hdev->dev,
+ "Failed to stop traffic from PCIe Decoder %d\n",
+ dec_id);
+ }
+}
+
+static void gaudi2_stop_dec(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int dcore_id;
+
+ if ((gaudi2->dec_hw_cap_initialized & HW_CAP_DEC_MASK) == 0)
+ return;
+
+ for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
+ gaudi2_stop_dcore_dec(hdev, dcore_id);
+
+ gaudi2_stop_pcie_dec(hdev);
+}
+
+static void gaudi2_set_arc_running_mode(struct hl_device *hdev, u32 cpu_id, u32 run_mode)
+{
+ u32 reg_base, reg_val;
+
+ reg_base = gaudi2_arc_blocks_bases[cpu_id];
+ if (run_mode == HL_ENGINE_CORE_RUN)
+ reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_RUN_REQ_MASK, 1);
+ else
+ reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_HALT_REQ_MASK, 1);
+
+ WREG32(reg_base + ARC_HALT_REQ_OFFSET, reg_val);
+}
+
+static void gaudi2_halt_arcs(struct hl_device *hdev)
+{
+ u16 arc_id;
+
+ for (arc_id = CPU_ID_SCHED_ARC0; arc_id < CPU_ID_MAX; arc_id++) {
+ if (gaudi2_is_arc_enabled(hdev, arc_id))
+ gaudi2_set_arc_running_mode(hdev, arc_id, HL_ENGINE_CORE_HALT);
+ }
+}
+
+static int gaudi2_verify_arc_running_mode(struct hl_device *hdev, u32 cpu_id, u32 run_mode)
+{
+ int rc;
+ u32 reg_base, val, ack_mask, timeout_usec = 100000;
+
+ if (hdev->pldm)
+ timeout_usec *= 100;
+
+ reg_base = gaudi2_arc_blocks_bases[cpu_id];
+ if (run_mode == HL_ENGINE_CORE_RUN)
+ ack_mask = ARC_FARM_ARC0_AUX_RUN_HALT_ACK_RUN_ACK_MASK;
+ else
+ ack_mask = ARC_FARM_ARC0_AUX_RUN_HALT_ACK_HALT_ACK_MASK;
+
+ rc = hl_poll_timeout(hdev, reg_base + ARC_HALT_ACK_OFFSET,
+ val, ((val & ack_mask) == ack_mask),
+ 1000, timeout_usec);
+
+ if (!rc) {
+ /* Clear */
+ val = FIELD_PREP(ARC_FARM_ARC0_AUX_RUN_HALT_REQ_RUN_REQ_MASK, 0);
+ WREG32(reg_base + ARC_HALT_REQ_OFFSET, val);
+ }
+
+ return rc;
+}
+
+static void gaudi2_reset_arcs(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u16 arc_id;
+
+ if (!gaudi2)
+ return;
+
+ for (arc_id = CPU_ID_SCHED_ARC0; arc_id < CPU_ID_MAX; arc_id++)
+ if (gaudi2_is_arc_enabled(hdev, arc_id))
+ gaudi2_clr_arc_id_cap(hdev, arc_id);
+}
+
+static void gaudi2_nic_qmans_manual_flush(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 queue_id;
+ int i;
+
+ if (!(gaudi2->nic_hw_cap_initialized & HW_CAP_NIC_MASK))
+ return;
+
+ queue_id = GAUDI2_QUEUE_ID_NIC_0_0;
+
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
+ if (!(hdev->nic_ports_mask & BIT(i)))
+ continue;
+
+ gaudi2_qman_manual_flush_common(hdev, queue_id);
+ }
+}
+
+static int gaudi2_set_engine_cores(struct hl_device *hdev, u32 *core_ids,
+ u32 num_cores, u32 core_command)
+{
+ int i, rc;
+
+
+ for (i = 0 ; i < num_cores ; i++) {
+ if (gaudi2_is_arc_enabled(hdev, core_ids[i]))
+ gaudi2_set_arc_running_mode(hdev, core_ids[i], core_command);
+ }
+
+ for (i = 0 ; i < num_cores ; i++) {
+ if (gaudi2_is_arc_enabled(hdev, core_ids[i])) {
+ rc = gaudi2_verify_arc_running_mode(hdev, core_ids[i], core_command);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to %s arc: %d\n",
+ (core_command == HL_ENGINE_CORE_HALT) ?
+ "HALT" : "RUN", core_ids[i]);
+ return -1;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static void gaudi2_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
+{
+ u32 wait_timeout_ms;
+
+ if (hdev->pldm)
+ wait_timeout_ms = GAUDI2_PLDM_RESET_WAIT_MSEC;
+ else
+ wait_timeout_ms = GAUDI2_RESET_WAIT_MSEC;
+
+ if (fw_reset)
+ goto skip_engines;
+
+ gaudi2_stop_dma_qmans(hdev);
+ gaudi2_stop_mme_qmans(hdev);
+ gaudi2_stop_tpc_qmans(hdev);
+ gaudi2_stop_rot_qmans(hdev);
+ gaudi2_stop_nic_qmans(hdev);
+ msleep(wait_timeout_ms);
+
+ gaudi2_halt_arcs(hdev);
+ gaudi2_dma_stall(hdev);
+ gaudi2_mme_stall(hdev);
+ gaudi2_tpc_stall(hdev);
+ gaudi2_rotator_stall(hdev);
+
+ msleep(wait_timeout_ms);
+
+ gaudi2_stop_dec(hdev);
+
+ /*
+ * in case of soft reset do a manual flush for QMANs (currently called
+ * only for NIC QMANs
+ */
+ if (!hard_reset)
+ gaudi2_nic_qmans_manual_flush(hdev);
+
+ gaudi2_disable_dma_qmans(hdev);
+ gaudi2_disable_mme_qmans(hdev);
+ gaudi2_disable_tpc_qmans(hdev);
+ gaudi2_disable_rot_qmans(hdev);
+ gaudi2_disable_nic_qmans(hdev);
+ gaudi2_disable_timestamp(hdev);
+
+skip_engines:
+ if (hard_reset) {
+ gaudi2_disable_msix(hdev);
+ return;
+ }
+
+ gaudi2_sync_irqs(hdev);
+}
+
+static void gaudi2_init_firmware_preload_params(struct hl_device *hdev)
+{
+ struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
+
+ pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
+ pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
+ pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
+ pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
+ pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
+ pre_fw_load->wait_for_preboot_timeout = GAUDI2_PREBOOT_REQ_TIMEOUT_USEC;
+}
+
+static void gaudi2_init_firmware_loader(struct hl_device *hdev)
+{
+ struct fw_load_mgr *fw_loader = &hdev->fw_loader;
+ struct dynamic_fw_load_mgr *dynamic_loader;
+ struct cpu_dyn_regs *dyn_regs;
+
+ /* fill common fields */
+ fw_loader->fw_comp_loaded = FW_TYPE_NONE;
+ fw_loader->boot_fit_img.image_name = GAUDI2_BOOT_FIT_FILE;
+ fw_loader->linux_img.image_name = GAUDI2_LINUX_FW_FILE;
+ fw_loader->boot_fit_timeout = GAUDI2_BOOT_FIT_REQ_TIMEOUT_USEC;
+ fw_loader->skip_bmc = false;
+ fw_loader->sram_bar_id = SRAM_CFG_BAR_ID;
+ fw_loader->dram_bar_id = DRAM_BAR_ID;
+ fw_loader->cpu_timeout = GAUDI2_CPU_TIMEOUT_USEC;
+
+ /* here we update initial values for few specific dynamic regs (as
+ * before reading the first descriptor from FW those value has to be
+ * hard-coded). in later stages of the protocol those values will be
+ * updated automatically by reading the FW descriptor so data there
+ * will always be up-to-date
+ */
+ dynamic_loader = &hdev->fw_loader.dynamic_loader;
+ dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
+ dyn_regs->kmd_msg_to_cpu = cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
+ dyn_regs->cpu_cmd_status_to_host = cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
+ dynamic_loader->wait_for_bl_timeout = GAUDI2_WAIT_FOR_BL_TIMEOUT_USEC;
+}
+
+static int gaudi2_init_cpu(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int rc;
+
+ if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
+ return 0;
+
+ if (gaudi2->hw_cap_initialized & HW_CAP_CPU)
+ return 0;
+
+ rc = hl_fw_init_cpu(hdev);
+ if (rc)
+ return rc;
+
+ gaudi2->hw_cap_initialized |= HW_CAP_CPU;
+
+ return 0;
+}
+
+static int gaudi2_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)
+{
+ struct hl_hw_queue *cpu_pq = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct cpu_dyn_regs *dyn_regs;
+ struct hl_eq *eq;
+ u32 status;
+ int err;
+
+ if (!hdev->cpu_queues_enable)
+ return 0;
+
+ if (gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)
+ return 0;
+
+ eq = &hdev->event_queue;
+
+ dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+
+ WREG32(mmCPU_IF_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
+ WREG32(mmCPU_IF_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
+
+ WREG32(mmCPU_IF_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
+ WREG32(mmCPU_IF_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
+
+ WREG32(mmCPU_IF_CQ_BASE_ADDR_LOW, lower_32_bits(hdev->cpu_accessible_dma_address));
+ WREG32(mmCPU_IF_CQ_BASE_ADDR_HIGH, upper_32_bits(hdev->cpu_accessible_dma_address));
+
+ WREG32(mmCPU_IF_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
+ WREG32(mmCPU_IF_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
+ WREG32(mmCPU_IF_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
+
+ /* Used for EQ CI */
+ WREG32(mmCPU_IF_EQ_RD_OFFS, 0);
+
+ WREG32(mmCPU_IF_PF_PQ_PI, 0);
+
+ WREG32(mmCPU_IF_QUEUE_INIT, PQ_INIT_STATUS_READY_FOR_CP);
+
+ /* Let the ARC know we are ready as it is now handling those queues */
+
+ WREG32(le32_to_cpu(dyn_regs->gic_host_pi_upd_irq),
+ gaudi2_irq_map_table[GAUDI2_EVENT_CPU_PI_UPDATE].cpu_id);
+
+ err = hl_poll_timeout(
+ hdev,
+ mmCPU_IF_QUEUE_INIT,
+ status,
+ (status == PQ_INIT_STATUS_READY_FOR_HOST),
+ 1000,
+ cpu_timeout);
+
+ if (err) {
+ dev_err(hdev->dev, "Failed to communicate with device CPU (timeout)\n");
+ return -EIO;
+ }
+
+ /* update FW application security bits */
+ if (prop->fw_cpu_boot_dev_sts0_valid)
+ prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
+
+ if (prop->fw_cpu_boot_dev_sts1_valid)
+ prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
+
+ gaudi2->hw_cap_initialized |= HW_CAP_CPU_Q;
+ return 0;
+}
+
+static void gaudi2_init_qman_pq(struct hl_device *hdev, u32 reg_base,
+ u32 queue_id_base)
+{
+ struct hl_hw_queue *q;
+ u32 pq_id, pq_offset;
+
+ for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++) {
+ q = &hdev->kernel_queues[queue_id_base + pq_id];
+ pq_offset = pq_id * 4;
+
+ WREG32(reg_base + QM_PQ_BASE_LO_0_OFFSET + pq_offset,
+ lower_32_bits(q->bus_address));
+ WREG32(reg_base + QM_PQ_BASE_HI_0_OFFSET + pq_offset,
+ upper_32_bits(q->bus_address));
+ WREG32(reg_base + QM_PQ_SIZE_0_OFFSET + pq_offset, ilog2(HL_QUEUE_LENGTH));
+ WREG32(reg_base + QM_PQ_PI_0_OFFSET + pq_offset, 0);
+ WREG32(reg_base + QM_PQ_CI_0_OFFSET + pq_offset, 0);
+ }
+}
+
+static void gaudi2_init_qman_cp(struct hl_device *hdev, u32 reg_base)
+{
+ u32 cp_id, cp_offset, mtr_base_lo, mtr_base_hi, so_base_lo, so_base_hi;
+
+ mtr_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ for (cp_id = 0 ; cp_id < NUM_OF_CP_PER_QMAN; cp_id++) {
+ cp_offset = cp_id * 4;
+
+ WREG32(reg_base + QM_CP_MSG_BASE0_ADDR_LO_0_OFFSET + cp_offset, mtr_base_lo);
+ WREG32(reg_base + QM_CP_MSG_BASE0_ADDR_HI_0_OFFSET + cp_offset, mtr_base_hi);
+ WREG32(reg_base + QM_CP_MSG_BASE1_ADDR_LO_0_OFFSET + cp_offset, so_base_lo);
+ WREG32(reg_base + QM_CP_MSG_BASE1_ADDR_HI_0_OFFSET + cp_offset, so_base_hi);
+ }
+
+ /* allow QMANs to accept work from ARC CQF */
+ WREG32(reg_base + QM_CP_CFG_OFFSET, FIELD_PREP(PDMA0_QM_CP_CFG_SWITCH_EN_MASK, 0x1));
+}
+
+static void gaudi2_init_qman_pqc(struct hl_device *hdev, u32 reg_base,
+ u32 queue_id_base)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 pq_id, pq_offset, so_base_lo, so_base_hi;
+
+ so_base_lo = lower_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0);
+
+ for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++) {
+ pq_offset = pq_id * 4;
+
+ /* Configure QMAN HBW to scratchpad as it is not needed */
+ WREG32(reg_base + QM_PQC_HBW_BASE_LO_0_OFFSET + pq_offset,
+ lower_32_bits(gaudi2->scratchpad_bus_address));
+ WREG32(reg_base + QM_PQC_HBW_BASE_HI_0_OFFSET + pq_offset,
+ upper_32_bits(gaudi2->scratchpad_bus_address));
+ WREG32(reg_base + QM_PQC_SIZE_0_OFFSET + pq_offset,
+ ilog2(PAGE_SIZE / sizeof(struct hl_cq_entry)));
+
+ WREG32(reg_base + QM_PQC_PI_0_OFFSET + pq_offset, 0);
+ WREG32(reg_base + QM_PQC_LBW_WDATA_0_OFFSET + pq_offset, QM_PQC_LBW_WDATA);
+ WREG32(reg_base + QM_PQC_LBW_BASE_LO_0_OFFSET + pq_offset, so_base_lo);
+ WREG32(reg_base + QM_PQC_LBW_BASE_HI_0_OFFSET + pq_offset, so_base_hi);
+ }
+
+ /* Enable QMAN H/W completion */
+ WREG32(reg_base + QM_PQC_CFG_OFFSET, 1 << PDMA0_QM_PQC_CFG_EN_SHIFT);
+}
+
+static u32 gaudi2_get_dyn_sp_reg(struct hl_device *hdev, u32 queue_id_base)
+{
+ struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 sp_reg_addr;
+
+ switch (queue_id_base) {
+ case GAUDI2_QUEUE_ID_PDMA_0_0...GAUDI2_QUEUE_ID_PDMA_1_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3:
+ sp_reg_addr = le32_to_cpu(dyn_regs->gic_dma_qm_irq_ctrl);
+ break;
+ case GAUDI2_QUEUE_ID_DCORE0_MME_0_0...GAUDI2_QUEUE_ID_DCORE0_MME_0_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE1_MME_0_0...GAUDI2_QUEUE_ID_DCORE1_MME_0_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE2_MME_0_0...GAUDI2_QUEUE_ID_DCORE2_MME_0_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE3_MME_0_0...GAUDI2_QUEUE_ID_DCORE3_MME_0_3:
+ sp_reg_addr = le32_to_cpu(dyn_regs->gic_mme_qm_irq_ctrl);
+ break;
+ case GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_6_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE1_TPC_5_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE2_TPC_5_3:
+ fallthrough;
+ case GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE3_TPC_5_3:
+ sp_reg_addr = le32_to_cpu(dyn_regs->gic_tpc_qm_irq_ctrl);
+ break;
+ case GAUDI2_QUEUE_ID_ROT_0_0...GAUDI2_QUEUE_ID_ROT_1_3:
+ sp_reg_addr = le32_to_cpu(dyn_regs->gic_rot_qm_irq_ctrl);
+ break;
+ case GAUDI2_QUEUE_ID_NIC_0_0...GAUDI2_QUEUE_ID_NIC_23_3:
+ sp_reg_addr = le32_to_cpu(dyn_regs->gic_nic_qm_irq_ctrl);
+ break;
+ default:
+ dev_err(hdev->dev, "Unexpected h/w queue %d\n", queue_id_base);
+ return 0;
+ }
+
+ return sp_reg_addr;
+}
+
+static void gaudi2_init_qman_common(struct hl_device *hdev, u32 reg_base,
+ u32 queue_id_base)
+{
+ u32 glbl_prot = QMAN_MAKE_TRUSTED, irq_handler_offset;
+ int map_table_entry;
+
+ WREG32(reg_base + QM_GLBL_PROT_OFFSET, glbl_prot);
+
+ irq_handler_offset = gaudi2_get_dyn_sp_reg(hdev, queue_id_base);
+ WREG32(reg_base + QM_GLBL_ERR_ADDR_LO_OFFSET, lower_32_bits(CFG_BASE + irq_handler_offset));
+ WREG32(reg_base + QM_GLBL_ERR_ADDR_HI_OFFSET, upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ map_table_entry = gaudi2_qman_async_event_id[queue_id_base];
+ WREG32(reg_base + QM_GLBL_ERR_WDATA_OFFSET,
+ gaudi2_irq_map_table[map_table_entry].cpu_id);
+
+ WREG32(reg_base + QM_ARB_ERR_MSG_EN_OFFSET, QM_ARB_ERR_MSG_EN_MASK);
+
+ WREG32(reg_base + QM_ARB_SLV_CHOISE_WDT_OFFSET, GAUDI2_ARB_WDT_TIMEOUT);
+ WREG32(reg_base + QM_GLBL_CFG1_OFFSET, 0);
+ WREG32(reg_base + QM_GLBL_CFG2_OFFSET, 0);
+
+ /* Enable the QMAN channel.
+ * PDMA QMAN configuration is different, as we do not allow user to
+ * access some of the CPs.
+ * PDMA0: CP2/3 are reserved for the ARC usage.
+ * PDMA1: CP1/2/3 are reserved for the ARC usage.
+ */
+ if (reg_base == gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_1_0])
+ WREG32(reg_base + QM_GLBL_CFG0_OFFSET, PDMA1_QMAN_ENABLE);
+ else if (reg_base == gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_0_0])
+ WREG32(reg_base + QM_GLBL_CFG0_OFFSET, PDMA0_QMAN_ENABLE);
+ else
+ WREG32(reg_base + QM_GLBL_CFG0_OFFSET, QMAN_ENABLE);
+}
+
+static void gaudi2_init_qman(struct hl_device *hdev, u32 reg_base,
+ u32 queue_id_base)
+{
+ u32 pq_id;
+
+ for (pq_id = 0 ; pq_id < NUM_OF_PQ_PER_QMAN ; pq_id++)
+ hdev->kernel_queues[queue_id_base + pq_id].cq_id = GAUDI2_RESERVED_CQ_CS_COMPLETION;
+
+ gaudi2_init_qman_pq(hdev, reg_base, queue_id_base);
+ gaudi2_init_qman_cp(hdev, reg_base);
+ gaudi2_init_qman_pqc(hdev, reg_base, queue_id_base);
+ gaudi2_init_qman_common(hdev, reg_base, queue_id_base);
+}
+
+static void gaudi2_init_dma_core(struct hl_device *hdev, u32 reg_base,
+ u32 dma_core_id, bool is_secure)
+{
+ u32 prot, irq_handler_offset;
+ struct cpu_dyn_regs *dyn_regs;
+ int map_table_entry;
+
+ prot = 1 << ARC_FARM_KDMA_PROT_ERR_VAL_SHIFT;
+ if (is_secure)
+ prot |= 1 << ARC_FARM_KDMA_PROT_VAL_SHIFT;
+
+ WREG32(reg_base + DMA_CORE_PROT_OFFSET, prot);
+
+ dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ irq_handler_offset = le32_to_cpu(dyn_regs->gic_dma_core_irq_ctrl);
+
+ WREG32(reg_base + DMA_CORE_ERRMSG_ADDR_LO_OFFSET,
+ lower_32_bits(CFG_BASE + irq_handler_offset));
+
+ WREG32(reg_base + DMA_CORE_ERRMSG_ADDR_HI_OFFSET,
+ upper_32_bits(CFG_BASE + irq_handler_offset));
+
+ map_table_entry = gaudi2_dma_core_async_event_id[dma_core_id];
+ WREG32(reg_base + DMA_CORE_ERRMSG_WDATA_OFFSET,
+ gaudi2_irq_map_table[map_table_entry].cpu_id);
+
+ /* Enable the DMA channel */
+ WREG32(reg_base + DMA_CORE_CFG_0_OFFSET, 1 << ARC_FARM_KDMA_CFG_0_EN_SHIFT);
+}
+
+static void gaudi2_init_kdma(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+
+ if ((gaudi2->hw_cap_initialized & HW_CAP_KDMA) == HW_CAP_KDMA)
+ return;
+
+ reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_KDMA];
+
+ gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_KDMA, true);
+
+ gaudi2->hw_cap_initialized |= HW_CAP_KDMA;
+}
+
+static void gaudi2_init_pdma(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_base;
+
+ if ((gaudi2->hw_cap_initialized & HW_CAP_PDMA_MASK) == HW_CAP_PDMA_MASK)
+ return;
+
+ reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_PDMA0];
+ gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_PDMA0, false);
+
+ reg_base = gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_0_0];
+ gaudi2_init_qman(hdev, reg_base, GAUDI2_QUEUE_ID_PDMA_0_0);
+
+ reg_base = gaudi2_dma_core_blocks_bases[DMA_CORE_ID_PDMA1];
+ gaudi2_init_dma_core(hdev, reg_base, DMA_CORE_ID_PDMA1, false);
+
+ reg_base = gaudi2_qm_blocks_bases[GAUDI2_QUEUE_ID_PDMA_1_0];
+ gaudi2_init_qman(hdev, reg_base, GAUDI2_QUEUE_ID_PDMA_1_0);
+
+ gaudi2->hw_cap_initialized |= HW_CAP_PDMA_MASK;
+}
+
+static void gaudi2_init_edma_instance(struct hl_device *hdev, u8 seq)
+{
+ u32 reg_base, base_edma_core_id, base_edma_qman_id;
+
+ base_edma_core_id = DMA_CORE_ID_EDMA0 + seq;
+ base_edma_qman_id = edma_stream_base[seq];
+
+ reg_base = gaudi2_dma_core_blocks_bases[base_edma_core_id];
+ gaudi2_init_dma_core(hdev, reg_base, base_edma_core_id, false);
+
+ reg_base = gaudi2_qm_blocks_bases[base_edma_qman_id];
+ gaudi2_init_qman(hdev, reg_base, base_edma_qman_id);
+}
+
+static void gaudi2_init_edma(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int dcore, inst;
+
+ if ((gaudi2->hw_cap_initialized & HW_CAP_EDMA_MASK) == HW_CAP_EDMA_MASK)
+ return;
+
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (inst = 0 ; inst < NUM_OF_EDMA_PER_DCORE ; inst++) {
+ u8 seq = dcore * NUM_OF_EDMA_PER_DCORE + inst;
+
+ if (!(prop->edma_enabled_mask & BIT(seq)))
+ continue;
+
+ gaudi2_init_edma_instance(hdev, seq);
+
+ gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_EDMA_SHIFT + seq);
+ }
+ }
+}
+
+/*
+ * gaudi2_arm_monitors_for_virt_msix_db() - Arm monitors for writing to the virtual MSI-X doorbell.
+ * @hdev: pointer to habanalabs device structure.
+ * @sob_id: sync object ID.
+ * @first_mon_id: ID of first monitor out of 3 consecutive monitors.
+ * @interrupt_id: interrupt ID.
+ *
+ * Some initiators cannot have HBW address in their completion address registers, and thus cannot
+ * write directly to the HBW host memory of the virtual MSI-X doorbell.
+ * Instead, they are configured to LBW write to a sync object, and a monitor will do the HBW write.
+ *
+ * The mechanism in the sync manager block is composed of a master monitor with 3 messages.
+ * In addition to the HBW write, the other 2 messages are for preparing the monitor to next
+ * completion, by decrementing the sync object value and re-arming the monitor.
+ */
+static void gaudi2_arm_monitors_for_virt_msix_db(struct hl_device *hdev, u32 sob_id,
+ u32 first_mon_id, u32 interrupt_id)
+{
+ u32 sob_offset, first_mon_offset, mon_offset, payload, sob_group, mode, arm, config;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u64 addr;
+ u8 mask;
+
+ /* Reset the SOB value */
+ sob_offset = sob_id * sizeof(u32);
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset, 0);
+
+ /* Configure 3 monitors:
+ * 1. Write interrupt ID to the virtual MSI-X doorbell (master monitor)
+ * 2. Decrement SOB value by 1.
+ * 3. Re-arm the master monitor.
+ */
+
+ first_mon_offset = first_mon_id * sizeof(u32);
+
+ /* 2nd monitor: Decrement SOB value by 1 */
+ mon_offset = first_mon_offset + sizeof(u32);
+
+ addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
+
+ payload = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 0x7FFF) | /* "-1" */
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_SIGN_MASK, 1) |
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1);
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
+
+ /* 3rd monitor: Re-arm the master monitor */
+ mon_offset = first_mon_offset + 2 * sizeof(u32);
+
+ addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + first_mon_offset;
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
+
+ sob_group = sob_id / 8;
+ mask = ~BIT(sob_id & 0x7);
+ mode = 0; /* comparison mode is "greater than or equal to" */
+ arm = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SID_MASK, sob_group) |
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_MASK_MASK, mask) |
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOP_MASK, mode) |
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOD_MASK, 1);
+
+ payload = arm;
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
+
+ /* 1st monitor (master): Write interrupt ID to the virtual MSI-X doorbell */
+ mon_offset = first_mon_offset;
+
+ config = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_WR_NUM_MASK, 2); /* "2": 3 writes */
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + mon_offset, config);
+
+ addr = gaudi2->virt_msix_db_dma_addr;
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, lower_32_bits(addr));
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_offset, upper_32_bits(addr));
+
+ payload = interrupt_id;
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, payload);
+
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + mon_offset, arm);
+}
+
+static void gaudi2_prepare_sm_for_virt_msix_db(struct hl_device *hdev)
+{
+ u32 decoder_id, sob_id, first_mon_id, interrupt_id;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+
+ /* Decoder normal/abnormal interrupts */
+ for (decoder_id = 0 ; decoder_id < NUMBER_OF_DEC ; ++decoder_id) {
+ if (!(prop->decoder_enabled_mask & BIT(decoder_id)))
+ continue;
+
+ sob_id = GAUDI2_RESERVED_SOB_DEC_NRM_FIRST + decoder_id;
+ first_mon_id = GAUDI2_RESERVED_MON_DEC_NRM_FIRST + 3 * decoder_id;
+ interrupt_id = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM + 2 * decoder_id;
+ gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
+
+ sob_id = GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + decoder_id;
+ first_mon_id = GAUDI2_RESERVED_MON_DEC_ABNRM_FIRST + 3 * decoder_id;
+ interrupt_id += 1;
+ gaudi2_arm_monitors_for_virt_msix_db(hdev, sob_id, first_mon_id, interrupt_id);
+ }
+}
+
+static void gaudi2_init_sm(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u64 cq_address;
+ u32 reg_val;
+ int i;
+
+ /* Enable HBW/LBW CQ for completion monitors */
+ reg_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_CQ_EN_MASK, 1);
+ reg_val |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_LBW_EN_MASK, 1);
+
+ for (i = 0 ; i < GAUDI2_MAX_PENDING_CS ; i++)
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + (4 * i), reg_val);
+
+ /* Enable only HBW CQ for KDMA completion monitor */
+ reg_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_CONFIG_CQ_EN_MASK, 1);
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + (4 * i), reg_val);
+
+ /* Init CQ0 DB - configure the monitor to trigger MSI-X interrupt */
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0, lower_32_bits(gaudi2->virt_msix_db_dma_addr));
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0, upper_32_bits(gaudi2->virt_msix_db_dma_addr));
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0, GAUDI2_IRQ_NUM_COMPLETION);
+
+ for (i = 0 ; i < GAUDI2_RESERVED_CQ_NUMBER ; i++) {
+ cq_address =
+ hdev->completion_queue[i].bus_address;
+
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + (4 * i),
+ lower_32_bits(cq_address));
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + (4 * i),
+ upper_32_bits(cq_address));
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + (4 * i),
+ ilog2(HL_CQ_SIZE_IN_BYTES));
+ }
+
+ /* Configure kernel ASID and MMU BP*/
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_SEC, 0x10000);
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_NONE_SEC_PRIV, 0);
+
+ /* Initialize sync objects and monitors which are used for the virtual MSI-X doorbell */
+ gaudi2_prepare_sm_for_virt_msix_db(hdev);
+}
+
+static void gaudi2_init_mme_acc(struct hl_device *hdev, u32 reg_base)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 reg_val;
+ int i;
+
+ reg_val = FIELD_PREP(MME_ACC_INTR_MASK_WBC_ERR_RESP_MASK, 0);
+ reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_POS_INF_MASK, 1);
+ reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_NEG_INF_MASK, 1);
+ reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_SRC_NAN_MASK, 1);
+ reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_RESULT_POS_INF_MASK, 1);
+ reg_val |= FIELD_PREP(MME_ACC_INTR_MASK_AP_RESULT_NEG_INF_MASK, 1);
+
+ WREG32(reg_base + MME_ACC_INTR_MASK_OFFSET, reg_val);
+ WREG32(reg_base + MME_ACC_AP_LFSR_POLY_OFFSET, 0x80DEADAF);
+
+ for (i = 0 ; i < MME_NUM_OF_LFSR_SEEDS ; i++) {
+ WREG32(reg_base + MME_ACC_AP_LFSR_SEED_SEL_OFFSET, i);
+ WREG32(reg_base + MME_ACC_AP_LFSR_SEED_WDATA_OFFSET, gaudi2->lfsr_rand_seeds[i]);
+ }
+}
+
+static void gaudi2_init_dcore_mme(struct hl_device *hdev, int dcore_id,
+ bool config_qman_only)
+{
+ u32 queue_id_base, reg_base;
+
+ switch (dcore_id) {
+ case 0:
+ queue_id_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0;
+ break;
+ case 1:
+ queue_id_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0;
+ break;
+ case 2:
+ queue_id_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0;
+ break;
+ case 3:
+ queue_id_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0;
+ break;
+ default:
+ dev_err(hdev->dev, "Invalid dcore id %u\n", dcore_id);
+ return;
+ }
+
+ if (!config_qman_only) {
+ reg_base = gaudi2_mme_acc_blocks_bases[dcore_id];
+ gaudi2_init_mme_acc(hdev, reg_base);
+ }
+
+ reg_base = gaudi2_qm_blocks_bases[queue_id_base];
+ gaudi2_init_qman(hdev, reg_base, queue_id_base);
+}
+
+static void gaudi2_init_mme(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int i;
+
+ if ((gaudi2->hw_cap_initialized & HW_CAP_MME_MASK) == HW_CAP_MME_MASK)
+ return;
+
+ for (i = 0 ; i < NUM_OF_DCORES ; i++) {
+ gaudi2_init_dcore_mme(hdev, i, false);
+
+ gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_MME_SHIFT + i);
+ }
+}
+
+static void gaudi2_init_tpc_cfg(struct hl_device *hdev, u32 reg_base)
+{
+ /* Mask arithmetic and QM interrupts in TPC */
+ WREG32(reg_base + TPC_CFG_TPC_INTR_MASK_OFFSET, 0x23FFFE);
+
+ /* Set 16 cache lines */
+ WREG32(reg_base + TPC_CFG_MSS_CONFIG_OFFSET,
+ 2 << DCORE0_TPC0_CFG_MSS_CONFIG_ICACHE_FETCH_LINE_NUM_SHIFT);
+}
+
+struct gaudi2_tpc_init_cfg_data {
+ enum gaudi2_queue_id dcore_tpc_qid_base[NUM_OF_DCORES];
+};
+
+static void gaudi2_init_tpc_config(struct hl_device *hdev, int dcore, int inst,
+ u32 offset, struct iterate_module_ctx *ctx)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_tpc_init_cfg_data *cfg_data = ctx->data;
+ u32 queue_id_base;
+ u8 seq;
+
+ queue_id_base = cfg_data->dcore_tpc_qid_base[dcore] + (inst * NUM_OF_PQ_PER_QMAN);
+
+ if (dcore == 0 && inst == (NUM_DCORE0_TPC - 1))
+ /* gets last sequence number */
+ seq = NUM_OF_DCORES * NUM_OF_TPC_PER_DCORE;
+ else
+ seq = dcore * NUM_OF_TPC_PER_DCORE + inst;
+
+ gaudi2_init_tpc_cfg(hdev, mmDCORE0_TPC0_CFG_BASE + offset);
+ gaudi2_init_qman(hdev, mmDCORE0_TPC0_QM_BASE + offset, queue_id_base);
+
+ gaudi2->tpc_hw_cap_initialized |= BIT_ULL(HW_CAP_TPC_SHIFT + seq);
+}
+
+static void gaudi2_init_tpc(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_tpc_init_cfg_data init_cfg_data;
+ struct iterate_module_ctx tpc_iter;
+
+ if (!hdev->asic_prop.tpc_enabled_mask)
+ return;
+
+ if ((gaudi2->tpc_hw_cap_initialized & HW_CAP_TPC_MASK) == HW_CAP_TPC_MASK)
+ return;
+
+ init_cfg_data.dcore_tpc_qid_base[0] = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0;
+ init_cfg_data.dcore_tpc_qid_base[1] = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0;
+ init_cfg_data.dcore_tpc_qid_base[2] = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0;
+ init_cfg_data.dcore_tpc_qid_base[3] = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0;
+ tpc_iter.fn = &gaudi2_init_tpc_config;
+ tpc_iter.data = &init_cfg_data;
+ gaudi2_iterate_tpcs(hdev, &tpc_iter);
+}
+
+static void gaudi2_init_rotator(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 i, reg_base, queue_id;
+
+ queue_id = GAUDI2_QUEUE_ID_ROT_0_0;
+
+ for (i = 0 ; i < NUM_OF_ROT ; i++, queue_id += NUM_OF_PQ_PER_QMAN) {
+ reg_base = gaudi2_qm_blocks_bases[queue_id];
+ gaudi2_init_qman(hdev, reg_base, queue_id);
+
+ gaudi2->hw_cap_initialized |= BIT_ULL(HW_CAP_ROT_SHIFT + i);
+ }
+}
+
+static void gaudi2_init_vdec_brdg_ctrl(struct hl_device *hdev, u64 base_addr, u32 decoder_id)
+{
+ u32 sob_id;
+
+ /* VCMD normal interrupt */
+ sob_id = GAUDI2_RESERVED_SOB_DEC_NRM_FIRST + decoder_id;
+ WREG32(base_addr + BRDG_CTRL_NRM_MSIX_LBW_AWADDR,
+ mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
+ WREG32(base_addr + BRDG_CTRL_NRM_MSIX_LBW_WDATA, GAUDI2_SOB_INCREMENT_BY_ONE);
+
+ /* VCMD abnormal interrupt */
+ sob_id = GAUDI2_RESERVED_SOB_DEC_ABNRM_FIRST + decoder_id;
+ WREG32(base_addr + BRDG_CTRL_ABNRM_MSIX_LBW_AWADDR,
+ mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_id * sizeof(u32));
+ WREG32(base_addr + BRDG_CTRL_ABNRM_MSIX_LBW_WDATA, GAUDI2_SOB_INCREMENT_BY_ONE);
+}
+
+static void gaudi2_init_dec(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 dcore_id, dec_id, dec_bit;
+ u64 base_addr;
+
+ if (!hdev->asic_prop.decoder_enabled_mask)
+ return;
+
+ if ((gaudi2->dec_hw_cap_initialized & HW_CAP_DEC_MASK) == HW_CAP_DEC_MASK)
+ return;
+
+ for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
+ for (dec_id = 0 ; dec_id < NUM_OF_DEC_PER_DCORE ; dec_id++) {
+ dec_bit = dcore_id * NUM_OF_DEC_PER_DCORE + dec_id;
+
+ if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
+ continue;
+
+ base_addr = mmDCORE0_DEC0_CMD_BASE +
+ BRDG_CTRL_BLOCK_OFFSET +
+ dcore_id * DCORE_OFFSET +
+ dec_id * DCORE_VDEC_OFFSET;
+
+ gaudi2_init_vdec_brdg_ctrl(hdev, base_addr, dec_bit);
+
+ gaudi2->dec_hw_cap_initialized |= BIT_ULL(HW_CAP_DEC_SHIFT + dec_bit);
+ }
+
+ for (dec_id = 0 ; dec_id < NUM_OF_PCIE_VDEC ; dec_id++) {
+ dec_bit = PCIE_DEC_SHIFT + dec_id;
+ if (!(hdev->asic_prop.decoder_enabled_mask & BIT(dec_bit)))
+ continue;
+
+ base_addr = mmPCIE_DEC0_CMD_BASE + BRDG_CTRL_BLOCK_OFFSET +
+ dec_id * DCORE_VDEC_OFFSET;
+
+ gaudi2_init_vdec_brdg_ctrl(hdev, base_addr, dec_bit);
+
+ gaudi2->dec_hw_cap_initialized |= BIT_ULL(HW_CAP_DEC_SHIFT + dec_bit);
+ }
+}
+
+static int gaudi2_mmu_update_asid_hop0_addr(struct hl_device *hdev,
+ u32 stlb_base, u32 asid, u64 phys_addr)
+{
+ u32 status, timeout_usec;
+ int rc;
+
+ if (hdev->pldm || !hdev->pdev)
+ timeout_usec = GAUDI2_PLDM_MMU_TIMEOUT_USEC;
+ else
+ timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
+
+ WREG32(stlb_base + STLB_ASID_OFFSET, asid);
+ WREG32(stlb_base + STLB_HOP0_PA43_12_OFFSET, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
+ WREG32(stlb_base + STLB_HOP0_PA63_44_OFFSET, phys_addr >> MMU_HOP0_PA63_44_SHIFT);
+ WREG32(stlb_base + STLB_BUSY_OFFSET, 0x80000000);
+
+ rc = hl_poll_timeout(
+ hdev,
+ stlb_base + STLB_BUSY_OFFSET,
+ status,
+ !(status & 0x80000000),
+ 1000,
+ timeout_usec);
+
+ if (rc) {
+ dev_err(hdev->dev, "Timeout during MMU hop0 config of asid %d\n", asid);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void gaudi2_mmu_send_invalidate_cache_cmd(struct hl_device *hdev, u32 stlb_base,
+ u32 start_offset, u32 inv_start_val,
+ u32 flags)
+{
+ /* clear PMMU mem line cache (only needed in mmu range invalidation) */
+ if (flags & MMU_OP_CLEAR_MEMCACHE)
+ WREG32(mmPMMU_HBW_STLB_MEM_CACHE_INVALIDATION, 0x1);
+
+ if (flags & MMU_OP_SKIP_LOW_CACHE_INV)
+ return;
+
+ WREG32(stlb_base + start_offset, inv_start_val);
+}
+
+static int gaudi2_mmu_invalidate_cache_status_poll(struct hl_device *hdev, u32 stlb_base,
+ struct gaudi2_cache_invld_params *inv_params)
+{
+ u32 status, timeout_usec, start_offset;
+ int rc;
+
+ timeout_usec = (hdev->pldm) ? GAUDI2_PLDM_MMU_TIMEOUT_USEC :
+ GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC;
+
+ /* poll PMMU mem line cache (only needed in mmu range invalidation) */
+ if (inv_params->flags & MMU_OP_CLEAR_MEMCACHE) {
+ rc = hl_poll_timeout(
+ hdev,
+ mmPMMU_HBW_STLB_MEM_CACHE_INV_STATUS,
+ status,
+ status & 0x1,
+ 1000,
+ timeout_usec);
+
+ if (rc)
+ return rc;
+
+ /* Need to manually reset the status to 0 */
+ WREG32(mmPMMU_HBW_STLB_MEM_CACHE_INV_STATUS, 0x0);
+ }
+
+ /* Lower cache does not work with cache lines, hence we can skip its
+ * invalidation upon map and invalidate only upon unmap
+ */
+ if (inv_params->flags & MMU_OP_SKIP_LOW_CACHE_INV)
+ return 0;
+
+ start_offset = inv_params->range_invalidation ?
+ STLB_RANGE_CACHE_INVALIDATION_OFFSET : STLB_INV_ALL_START_OFFSET;
+
+ rc = hl_poll_timeout(
+ hdev,
+ stlb_base + start_offset,
+ status,
+ !(status & 0x1),
+ 1000,
+ timeout_usec);
+
+ return rc;
+}
+
+bool gaudi2_is_hmmu_enabled(struct hl_device *hdev, int dcore_id, int hmmu_id)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 hw_cap;
+
+ hw_cap = HW_CAP_DCORE0_DMMU0 << (NUM_OF_HMMU_PER_DCORE * dcore_id + hmmu_id);
+
+ if (gaudi2->hw_cap_initialized & hw_cap)
+ return true;
+
+ return false;
+}
+
+/* this function shall be called only for HMMUs for which capability bit is set */
+static inline u32 get_hmmu_stlb_base(int dcore_id, int hmmu_id)
+{
+ u32 offset;
+
+ offset = (u32) (dcore_id * DCORE_OFFSET + hmmu_id * DCORE_HMMU_OFFSET);
+ return (u32)(mmDCORE0_HMMU0_STLB_BASE + offset);
+}
+
+static void gaudi2_mmu_invalidate_cache_trigger(struct hl_device *hdev, u32 stlb_base,
+ struct gaudi2_cache_invld_params *inv_params)
+{
+ u32 start_offset;
+
+ if (inv_params->range_invalidation) {
+ /* Set the addresses range
+ * Note: that the start address we set in register, is not included in
+ * the range of the invalidation, by design.
+ * that's why we need to set lower address than the one we actually
+ * want to be included in the range invalidation.
+ */
+ u64 start = inv_params->start_va - 1;
+
+ start_offset = STLB_RANGE_CACHE_INVALIDATION_OFFSET;
+
+ WREG32(stlb_base + STLB_RANGE_INV_START_LSB_OFFSET,
+ start >> MMU_RANGE_INV_VA_LSB_SHIFT);
+
+ WREG32(stlb_base + STLB_RANGE_INV_START_MSB_OFFSET,
+ start >> MMU_RANGE_INV_VA_MSB_SHIFT);
+
+ WREG32(stlb_base + STLB_RANGE_INV_END_LSB_OFFSET,
+ inv_params->end_va >> MMU_RANGE_INV_VA_LSB_SHIFT);
+
+ WREG32(stlb_base + STLB_RANGE_INV_END_MSB_OFFSET,
+ inv_params->end_va >> MMU_RANGE_INV_VA_MSB_SHIFT);
+ } else {
+ start_offset = STLB_INV_ALL_START_OFFSET;
+ }
+
+ gaudi2_mmu_send_invalidate_cache_cmd(hdev, stlb_base, start_offset,
+ inv_params->inv_start_val, inv_params->flags);
+}
+
+static inline void gaudi2_hmmu_invalidate_cache_trigger(struct hl_device *hdev,
+ int dcore_id, int hmmu_id,
+ struct gaudi2_cache_invld_params *inv_params)
+{
+ u32 stlb_base = get_hmmu_stlb_base(dcore_id, hmmu_id);
+
+ gaudi2_mmu_invalidate_cache_trigger(hdev, stlb_base, inv_params);
+}
+
+static inline int gaudi2_hmmu_invalidate_cache_status_poll(struct hl_device *hdev,
+ int dcore_id, int hmmu_id,
+ struct gaudi2_cache_invld_params *inv_params)
+{
+ u32 stlb_base = get_hmmu_stlb_base(dcore_id, hmmu_id);
+
+ return gaudi2_mmu_invalidate_cache_status_poll(hdev, stlb_base, inv_params);
+}
+
+static int gaudi2_hmmus_invalidate_cache(struct hl_device *hdev,
+ struct gaudi2_cache_invld_params *inv_params)
+{
+ int dcore_id, hmmu_id;
+
+ /* first send all invalidation commands */
+ for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
+ for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE ; hmmu_id++) {
+ if (!gaudi2_is_hmmu_enabled(hdev, dcore_id, hmmu_id))
+ continue;
+
+ gaudi2_hmmu_invalidate_cache_trigger(hdev, dcore_id, hmmu_id, inv_params);
+ }
+ }
+
+ /* next, poll all invalidations status */
+ for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
+ for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE ; hmmu_id++) {
+ int rc;
+
+ if (!gaudi2_is_hmmu_enabled(hdev, dcore_id, hmmu_id))
+ continue;
+
+ rc = gaudi2_hmmu_invalidate_cache_status_poll(hdev, dcore_id, hmmu_id,
+ inv_params);
+ if (rc)
+ return rc;
+ }
+ }
+
+ return 0;
+}
+
+static int gaudi2_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard, u32 flags)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_cache_invld_params invld_params;
+ int rc = 0;
+
+ if (hdev->reset_info.hard_reset_pending)
+ return rc;
+
+ invld_params.range_invalidation = false;
+ invld_params.inv_start_val = 1;
+
+ if ((flags & MMU_OP_USERPTR) && (gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
+ invld_params.flags = flags;
+ gaudi2_mmu_invalidate_cache_trigger(hdev, mmPMMU_HBW_STLB_BASE, &invld_params);
+ rc = gaudi2_mmu_invalidate_cache_status_poll(hdev, mmPMMU_HBW_STLB_BASE,
+ &invld_params);
+ } else if (flags & MMU_OP_PHYS_PACK) {
+ invld_params.flags = 0;
+ rc = gaudi2_hmmus_invalidate_cache(hdev, &invld_params);
+ }
+
+ return rc;
+}
+
+static int gaudi2_mmu_invalidate_cache_range(struct hl_device *hdev, bool is_hard,
+ u32 flags, u32 asid, u64 va, u64 size)
+{
+ struct gaudi2_cache_invld_params invld_params = {0};
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u64 start_va, end_va;
+ u32 inv_start_val;
+ int rc = 0;
+
+ if (hdev->reset_info.hard_reset_pending)
+ return 0;
+
+ inv_start_val = (1 << MMU_RANGE_INV_EN_SHIFT |
+ 1 << MMU_RANGE_INV_ASID_EN_SHIFT |
+ asid << MMU_RANGE_INV_ASID_SHIFT);
+ start_va = va;
+ end_va = start_va + size;
+
+ if ((flags & MMU_OP_USERPTR) && (gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
+ /* As range invalidation does not support zero address we will
+ * do full invalidation in this case
+ */
+ if (start_va) {
+ invld_params.range_invalidation = true;
+ invld_params.start_va = start_va;
+ invld_params.end_va = end_va;
+ invld_params.inv_start_val = inv_start_val;
+ invld_params.flags = flags | MMU_OP_CLEAR_MEMCACHE;
+ } else {
+ invld_params.range_invalidation = false;
+ invld_params.inv_start_val = 1;
+ invld_params.flags = flags;
+ }
+
+
+ gaudi2_mmu_invalidate_cache_trigger(hdev, mmPMMU_HBW_STLB_BASE, &invld_params);
+ rc = gaudi2_mmu_invalidate_cache_status_poll(hdev, mmPMMU_HBW_STLB_BASE,
+ &invld_params);
+ if (rc)
+ return rc;
+
+ } else if (flags & MMU_OP_PHYS_PACK) {
+ invld_params.start_va = gaudi2_mmu_scramble_addr(hdev, start_va);
+ invld_params.end_va = gaudi2_mmu_scramble_addr(hdev, end_va);
+ invld_params.inv_start_val = inv_start_val;
+ invld_params.flags = flags;
+ rc = gaudi2_hmmus_invalidate_cache(hdev, &invld_params);
+ }
+
+ return rc;
+}
+
+static int gaudi2_mmu_update_hop0_addr(struct hl_device *hdev, u32 stlb_base)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 hop0_addr;
+ u32 asid, max_asid = prop->max_asid;
+ int rc;
+
+ /* it takes too much time to init all of the ASIDs on palladium */
+ if (hdev->pldm)
+ max_asid = min((u32) 8, max_asid);
+
+ for (asid = 0 ; asid < max_asid ; asid++) {
+ hop0_addr = hdev->mmu_priv.hr.mmu_asid_hop0[asid].phys_addr;
+ rc = gaudi2_mmu_update_asid_hop0_addr(hdev, stlb_base, asid, hop0_addr);
+ if (rc) {
+ dev_err(hdev->dev, "failed to set hop0 addr for asid %d\n", asid);
+ return rc;
+ }
+ }
+
+ return 0;
+}
+
+static int gaudi2_mmu_init_common(struct hl_device *hdev, u32 mmu_base, u32 stlb_base)
+{
+ u32 status, timeout_usec;
+ int rc;
+
+ if (hdev->pldm || !hdev->pdev)
+ timeout_usec = GAUDI2_PLDM_MMU_TIMEOUT_USEC;
+ else
+ timeout_usec = GAUDI2_MMU_CACHE_INV_TIMEOUT_USEC;
+
+ WREG32(stlb_base + STLB_INV_ALL_START_OFFSET, 1);
+
+ rc = hl_poll_timeout(
+ hdev,
+ stlb_base + STLB_SRAM_INIT_OFFSET,
+ status,
+ !status,
+ 1000,
+ timeout_usec);
+
+ if (rc)
+ dev_notice_ratelimited(hdev->dev, "Timeout when waiting for MMU SRAM init\n");
+
+ rc = gaudi2_mmu_update_hop0_addr(hdev, stlb_base);
+ if (rc)
+ return rc;
+
+ WREG32(mmu_base + MMU_BYPASS_OFFSET, 0);
+
+ rc = hl_poll_timeout(
+ hdev,
+ stlb_base + STLB_INV_ALL_START_OFFSET,
+ status,
+ !status,
+ 1000,
+ timeout_usec);
+
+ if (rc)
+ dev_notice_ratelimited(hdev->dev, "Timeout when waiting for MMU invalidate all\n");
+
+ WREG32(mmu_base + MMU_ENABLE_OFFSET, 1);
+
+ return rc;
+}
+
+static int gaudi2_pci_mmu_init(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 mmu_base, stlb_base;
+ int rc;
+
+ if (gaudi2->hw_cap_initialized & HW_CAP_PMMU)
+ return 0;
+
+ mmu_base = mmPMMU_HBW_MMU_BASE;
+ stlb_base = mmPMMU_HBW_STLB_BASE;
+
+ RMWREG32_SHIFTED(stlb_base + STLB_HOP_CONFIGURATION_OFFSET,
+ (0 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_HOP_SHIFT) |
+ (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_SHIFT) |
+ (4 << PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_SHIFT) |
+ (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_LAST_HOP_SHIFT) |
+ (5 << PMMU_HBW_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_SHIFT),
+ PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK |
+ PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK |
+ PMMU_HBW_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK |
+ PMMU_HBW_STLB_HOP_CONFIGURATION_LAST_HOP_MASK |
+ PMMU_HBW_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK);
+
+ WREG32(stlb_base + STLB_LL_LOOKUP_MASK_63_32_OFFSET, 0);
+
+ if (PAGE_SIZE == SZ_64K) {
+ /* Set page sizes to 64K on hop5 and 16M on hop4 + enable 8 bit hops */
+ RMWREG32_SHIFTED(mmu_base + MMU_STATIC_MULTI_PAGE_SIZE_OFFSET,
+ FIELD_PREP(DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP5_PAGE_SIZE_MASK, 4) |
+ FIELD_PREP(DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK, 3) |
+ FIELD_PREP(
+ DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_CFG_8_BITS_HOP_MODE_EN_MASK,
+ 1),
+ DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP5_PAGE_SIZE_MASK |
+ DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK |
+ DCORE0_HMMU0_MMU_STATIC_MULTI_PAGE_SIZE_CFG_8_BITS_HOP_MODE_EN_MASK);
+ }
+
+ WREG32(mmu_base + MMU_SPI_SEI_MASK_OFFSET, GAUDI2_PMMU_SPI_SEI_ENABLE_MASK);
+
+ rc = gaudi2_mmu_init_common(hdev, mmu_base, stlb_base);
+ if (rc)
+ return rc;
+
+ gaudi2->hw_cap_initialized |= HW_CAP_PMMU;
+
+ return 0;
+}
+
+static int gaudi2_dcore_hmmu_init(struct hl_device *hdev, int dcore_id,
+ int hmmu_id)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 offset, mmu_base, stlb_base, hw_cap;
+ u8 dmmu_seq;
+ int rc;
+
+ dmmu_seq = NUM_OF_HMMU_PER_DCORE * dcore_id + hmmu_id;
+ hw_cap = HW_CAP_DCORE0_DMMU0 << dmmu_seq;
+
+ /*
+ * return if DMMU is already initialized or if it's not out of
+ * isolation (due to cluster binning)
+ */
+ if ((gaudi2->hw_cap_initialized & hw_cap) || !(prop->hmmu_hif_enabled_mask & BIT(dmmu_seq)))
+ return 0;
+
+ offset = (u32) (dcore_id * DCORE_OFFSET + hmmu_id * DCORE_HMMU_OFFSET);
+ mmu_base = mmDCORE0_HMMU0_MMU_BASE + offset;
+ stlb_base = mmDCORE0_HMMU0_STLB_BASE + offset;
+
+ RMWREG32(mmu_base + MMU_STATIC_MULTI_PAGE_SIZE_OFFSET, 5 /* 64MB */,
+ MMU_STATIC_MULTI_PAGE_SIZE_HOP4_PAGE_SIZE_MASK);
+
+ RMWREG32_SHIFTED(stlb_base + STLB_HOP_CONFIGURATION_OFFSET,
+ FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK, 0) |
+ FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK, 3) |
+ FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK, 3) |
+ FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_LAST_HOP_MASK, 3) |
+ FIELD_PREP(DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK, 3),
+ DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_HOP_MASK |
+ DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_SMALL_P_MASK |
+ DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FIRST_LOOKUP_HOP_LARGE_P_MASK |
+ DCORE0_HMMU0_STLB_HOP_CONFIGURATION_LAST_HOP_MASK |
+ DCORE0_HMMU0_STLB_HOP_CONFIGURATION_FOLLOWER_HOP_MASK);
+
+ RMWREG32(stlb_base + STLB_HOP_CONFIGURATION_OFFSET, 1,
+ STLB_HOP_CONFIGURATION_ONLY_LARGE_PAGE_MASK);
+
+ WREG32(mmu_base + MMU_SPI_SEI_MASK_OFFSET, GAUDI2_HMMU_SPI_SEI_ENABLE_MASK);
+
+ rc = gaudi2_mmu_init_common(hdev, mmu_base, stlb_base);
+ if (rc)
+ return rc;
+
+ gaudi2->hw_cap_initialized |= hw_cap;
+
+ return 0;
+}
+
+static int gaudi2_hbm_mmu_init(struct hl_device *hdev)
+{
+ int rc, dcore_id, hmmu_id;
+
+ for (dcore_id = 0 ; dcore_id < NUM_OF_DCORES ; dcore_id++)
+ for (hmmu_id = 0 ; hmmu_id < NUM_OF_HMMU_PER_DCORE; hmmu_id++) {
+ rc = gaudi2_dcore_hmmu_init(hdev, dcore_id, hmmu_id);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int gaudi2_mmu_init(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = gaudi2_pci_mmu_init(hdev);
+ if (rc)
+ return rc;
+
+ rc = gaudi2_hbm_mmu_init(hdev);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int gaudi2_hw_init(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int rc;
+
+ /* Let's mark in the H/W that we have reached this point. We check
+ * this value in the reset_before_init function to understand whether
+ * we need to reset the chip before doing H/W init. This register is
+ * cleared by the H/W upon H/W reset
+ */
+ WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
+
+ /* Perform read from the device to make sure device is up */
+ RREG32(mmHW_STATE);
+
+ /* If iATU is done by FW, the HBM bar ALWAYS points to DRAM_PHYS_BASE.
+ * So we set it here and if anyone tries to move it later to
+ * a different address, there will be an error
+ */
+ if (hdev->asic_prop.iatu_done_by_fw)
+ gaudi2->dram_bar_cur_addr = DRAM_PHYS_BASE;
+
+ /*
+ * Before pushing u-boot/linux to device, need to set the hbm bar to
+ * base address of dram
+ */
+ if (gaudi2_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
+ dev_err(hdev->dev, "failed to map HBM bar to DRAM base address\n");
+ return -EIO;
+ }
+
+ rc = gaudi2_init_cpu(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "failed to initialize CPU\n");
+ return rc;
+ }
+
+ gaudi2_init_scrambler_hbm(hdev);
+ gaudi2_init_kdma(hdev);
+
+ rc = gaudi2_init_cpu_queues(hdev, GAUDI2_CPU_TIMEOUT_USEC);
+ if (rc) {
+ dev_err(hdev->dev, "failed to initialize CPU H/W queues %d\n", rc);
+ return rc;
+ }
+
+ rc = gaudi2->cpucp_info_get(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get cpucp info\n");
+ return rc;
+ }
+
+ rc = gaudi2_mmu_init(hdev);
+ if (rc)
+ return rc;
+
+ gaudi2_init_pdma(hdev);
+ gaudi2_init_edma(hdev);
+ gaudi2_init_sm(hdev);
+ gaudi2_init_tpc(hdev);
+ gaudi2_init_mme(hdev);
+ gaudi2_init_rotator(hdev);
+ gaudi2_init_dec(hdev);
+ gaudi2_enable_timestamp(hdev);
+
+ rc = gaudi2_coresight_init(hdev);
+ if (rc)
+ goto disable_queues;
+
+ rc = gaudi2_enable_msix(hdev);
+ if (rc)
+ goto disable_queues;
+
+ /* Perform read from the device to flush all configuration */
+ RREG32(mmHW_STATE);
+
+ return 0;
+
+disable_queues:
+ gaudi2_disable_dma_qmans(hdev);
+ gaudi2_disable_mme_qmans(hdev);
+ gaudi2_disable_tpc_qmans(hdev);
+ gaudi2_disable_rot_qmans(hdev);
+ gaudi2_disable_nic_qmans(hdev);
+
+ gaudi2_disable_timestamp(hdev);
+
+ return rc;
+}
+
+/**
+ * gaudi2_send_hard_reset_cmd - common function to handle reset
+ *
+ * @hdev: pointer to the habanalabs device structure
+ *
+ * This function handles the various possible scenarios for reset.
+ * It considers if reset is handled by driver\FW and what FW components are loaded
+ */
+static void gaudi2_send_hard_reset_cmd(struct hl_device *hdev)
+{
+ struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ bool heartbeat_reset, preboot_only, cpu_initialized = false;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 cpu_boot_status;
+
+ preboot_only = (hdev->fw_loader.fw_comp_loaded == FW_TYPE_PREBOOT_CPU);
+ heartbeat_reset = (hdev->reset_info.curr_reset_cause == HL_RESET_CAUSE_HEARTBEAT);
+
+ /*
+ * Handle corner case where failure was at cpu management app load,
+ * and driver didn't detect any failure while loading the FW,
+ * then at such scenario driver will send only HALT_MACHINE
+ * and no one will respond to this request since FW already back to preboot
+ * and it cannot handle such cmd.
+ * In this case next time the management app loads it'll check on events register
+ * which will still have the halt indication, and will reboot the device.
+ * The solution is to let preboot clear all relevant registers before next boot
+ * once driver send COMMS_RST_DEV.
+ */
+ cpu_boot_status = RREG32(mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS);
+
+ if (gaudi2 && (gaudi2->hw_cap_initialized & HW_CAP_CPU) &&
+ (cpu_boot_status == CPU_BOOT_STATUS_SRAM_AVAIL))
+ cpu_initialized = true;
+
+ /*
+ * when Linux/Bootfit exist this write to the SP can be interpreted in 2 ways:
+ * 1. FW reset: FW initiate the reset sequence
+ * 2. driver reset: FW will start HALT sequence (the preparations for the
+ * reset but not the reset itself as it is not implemented
+ * on their part) and LKD will wait to let FW complete the
+ * sequence before issuing the reset
+ */
+ if (!preboot_only && cpu_initialized) {
+ WREG32(le32_to_cpu(dyn_regs->gic_host_halt_irq),
+ gaudi2_irq_map_table[GAUDI2_EVENT_CPU_HALT_MACHINE].cpu_id);
+
+ msleep(GAUDI2_CPU_RESET_WAIT_MSEC);
+ }
+
+ /*
+ * When working with preboot (without Linux/Boot fit) we can
+ * communicate only using the COMMS commands to issue halt/reset.
+ *
+ * For the case in which we are working with Linux/Bootfit this is a hail-mary
+ * attempt to revive the card in the small chance that the f/w has
+ * experienced a watchdog event, which caused it to return back to preboot.
+ * In that case, triggering reset through GIC won't help. We need to
+ * trigger the reset as if Linux wasn't loaded.
+ *
+ * We do it only if the reset cause was HB, because that would be the
+ * indication of such an event.
+ *
+ * In case watchdog hasn't expired but we still got HB, then this won't
+ * do any damage.
+ */
+
+ if (heartbeat_reset || preboot_only || !cpu_initialized) {
+ if (hdev->asic_prop.hard_reset_done_by_fw)
+ hl_fw_ask_hard_reset_without_linux(hdev);
+ else
+ hl_fw_ask_halt_machine_without_linux(hdev);
+ }
+}
+
+/**
+ * gaudi2_execute_hard_reset - execute hard reset by driver/FW
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @reset_sleep_ms: sleep time in msec after reset
+ *
+ * This function executes hard reset based on if driver/FW should do the reset
+ */
+static void gaudi2_execute_hard_reset(struct hl_device *hdev, u32 reset_sleep_ms)
+{
+ if (hdev->asic_prop.hard_reset_done_by_fw) {
+ gaudi2_send_hard_reset_cmd(hdev);
+ return;
+ }
+
+ /* Set device to handle FLR by H/W as we will put the device
+ * CPU to halt mode
+ */
+ WREG32(mmPCIE_AUX_FLR_CTRL,
+ (PCIE_AUX_FLR_CTRL_HW_CTRL_MASK | PCIE_AUX_FLR_CTRL_INT_MASK_MASK));
+
+ gaudi2_send_hard_reset_cmd(hdev);
+
+ WREG32(mmPSOC_RESET_CONF_SW_ALL_RST, 1);
+}
+
+/**
+ * gaudi2_execute_soft_reset - execute soft reset by driver/FW
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @reset_sleep_ms: sleep time in msec after reset
+ * @driver_performs_reset: true if driver should perform reset instead of f/w.
+ *
+ * This function executes soft reset based on if driver/FW should do the reset
+ */
+static void gaudi2_execute_soft_reset(struct hl_device *hdev, u32 reset_sleep_ms,
+ bool driver_performs_reset)
+{
+ struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+
+ if (!driver_performs_reset) {
+ /* set SP to indicate reset request sent to FW */
+ if (dyn_regs->cpu_rst_status)
+ WREG32(le32_to_cpu(dyn_regs->cpu_rst_status), CPU_RST_STATUS_NA);
+ else
+ WREG32(mmCPU_RST_STATUS_TO_HOST, CPU_RST_STATUS_NA);
+
+ WREG32(le32_to_cpu(dyn_regs->gic_host_soft_rst_irq),
+ gaudi2_irq_map_table[GAUDI2_EVENT_CPU_SOFT_RESET].cpu_id);
+ return;
+ }
+
+ /* Block access to engines, QMANs and SM during reset, these
+ * RRs will be reconfigured after soft reset.
+ * PCIE_MSIX is left unsecured to allow NIC packets processing during the reset.
+ */
+ gaudi2_write_rr_to_all_lbw_rtrs(hdev, RR_TYPE_LONG, NUM_LONG_LBW_RR - 1,
+ mmDCORE0_TPC0_QM_DCCM_BASE, mmPCIE_MSIX_BASE);
+
+ gaudi2_write_rr_to_all_lbw_rtrs(hdev, RR_TYPE_LONG, NUM_LONG_LBW_RR - 2,
+ mmPCIE_MSIX_BASE + HL_BLOCK_SIZE,
+ mmPCIE_VDEC1_MSTR_IF_RR_SHRD_HBW_BASE + HL_BLOCK_SIZE);
+
+ WREG32(mmPSOC_RESET_CONF_SOFT_RST, 1);
+}
+
+static void gaudi2_poll_btm_indication(struct hl_device *hdev, u32 reset_sleep_ms,
+ u32 poll_timeout_us)
+{
+ int i, rc = 0;
+ u32 reg_val;
+
+ /* without this sleep reset will not work */
+ msleep(reset_sleep_ms);
+
+ /* We poll the BTM done indication multiple times after reset due to
+ * a HW errata 'GAUDI2_0300'
+ */
+ for (i = 0 ; i < GAUDI2_RESET_POLL_CNT ; i++)
+ rc = hl_poll_timeout(
+ hdev,
+ mmPSOC_GLOBAL_CONF_BTM_FSM,
+ reg_val,
+ reg_val == 0,
+ 1000,
+ poll_timeout_us);
+
+ if (rc)
+ dev_err(hdev->dev, "Timeout while waiting for device to reset 0x%x\n", reg_val);
+}
+
+static void gaudi2_get_soft_rst_done_indication(struct hl_device *hdev, u32 poll_timeout_us)
+{
+ int i, rc = 0;
+ u32 reg_val;
+
+ for (i = 0 ; i < GAUDI2_RESET_POLL_CNT ; i++)
+ rc = hl_poll_timeout(
+ hdev,
+ mmCPU_RST_STATUS_TO_HOST,
+ reg_val,
+ reg_val == CPU_RST_STATUS_SOFT_RST_DONE,
+ 1000,
+ poll_timeout_us);
+
+ if (rc)
+ dev_err(hdev->dev, "Timeout while waiting for FW to complete soft reset (0x%x)\n",
+ reg_val);
+}
+
+static void gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 poll_timeout_us, reset_sleep_ms;
+ bool driver_performs_reset = false;
+
+ if (hdev->pldm) {
+ reset_sleep_ms = hard_reset ? GAUDI2_PLDM_HRESET_TIMEOUT_MSEC :
+ GAUDI2_PLDM_SRESET_TIMEOUT_MSEC;
+ poll_timeout_us = GAUDI2_PLDM_RESET_POLL_TIMEOUT_USEC;
+ } else {
+ reset_sleep_ms = GAUDI2_RESET_TIMEOUT_MSEC;
+ poll_timeout_us = GAUDI2_RESET_POLL_TIMEOUT_USEC;
+ }
+
+ if (fw_reset)
+ goto skip_reset;
+
+ gaudi2_reset_arcs(hdev);
+
+ if (hard_reset) {
+ driver_performs_reset = !hdev->asic_prop.hard_reset_done_by_fw;
+ gaudi2_execute_hard_reset(hdev, reset_sleep_ms);
+ } else {
+ /*
+ * As we have to support also work with preboot only (which does not supports
+ * soft reset) we have to make sure that security is disabled before letting driver
+ * do the reset. user shall control the BFE flags to avoid asking soft reset in
+ * secured device with preboot only.
+ */
+ driver_performs_reset = (hdev->fw_components == FW_TYPE_PREBOOT_CPU &&
+ !hdev->asic_prop.fw_security_enabled);
+ gaudi2_execute_soft_reset(hdev, reset_sleep_ms, driver_performs_reset);
+ }
+
+skip_reset:
+ if (driver_performs_reset || hard_reset)
+ /*
+ * Instead of waiting for BTM indication we should wait for preboot ready:
+ * Consider the below scenario:
+ * 1. FW update is being triggered
+ * - setting the dirty bit
+ * 2. hard reset will be triggered due to the dirty bit
+ * 3. FW initiates the reset:
+ * - dirty bit cleared
+ * - BTM indication cleared
+ * - preboot ready indication cleared
+ * 4. during hard reset:
+ * - BTM indication will be set
+ * - BIST test performed and another reset triggered
+ * 5. only after this reset the preboot will set the preboot ready
+ *
+ * when polling on BTM indication alone we can lose sync with FW while trying to
+ * communicate with FW that is during reset.
+ * to overcome this we will always wait to preboot ready indication
+ */
+ if ((hdev->fw_components & FW_TYPE_PREBOOT_CPU)) {
+ msleep(reset_sleep_ms);
+ hl_fw_wait_preboot_ready(hdev);
+ } else {
+ gaudi2_poll_btm_indication(hdev, reset_sleep_ms, poll_timeout_us);
+ }
+ else
+ gaudi2_get_soft_rst_done_indication(hdev, poll_timeout_us);
+
+ if (!gaudi2)
+ return;
+
+ gaudi2->dec_hw_cap_initialized &= ~(HW_CAP_DEC_MASK);
+ gaudi2->tpc_hw_cap_initialized &= ~(HW_CAP_TPC_MASK);
+
+ /*
+ * Clear NIC capability mask in order for driver to re-configure
+ * NIC QMANs. NIC ports will not be re-configured during soft
+ * reset as we call gaudi2_nic_init only during hard reset
+ */
+ gaudi2->nic_hw_cap_initialized &= ~(HW_CAP_NIC_MASK);
+
+ if (hard_reset) {
+ gaudi2->hw_cap_initialized &=
+ ~(HW_CAP_DRAM | HW_CAP_CLK_GATE | HW_CAP_HBM_SCRAMBLER_MASK |
+ HW_CAP_PMMU | HW_CAP_CPU | HW_CAP_CPU_Q |
+ HW_CAP_SRAM_SCRAMBLER | HW_CAP_DMMU_MASK |
+ HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_KDMA |
+ HW_CAP_MME_MASK | HW_CAP_ROT_MASK);
+
+ memset(gaudi2->events_stat, 0, sizeof(gaudi2->events_stat));
+ } else {
+ gaudi2->hw_cap_initialized &=
+ ~(HW_CAP_CLK_GATE | HW_CAP_HBM_SCRAMBLER_SW_RESET |
+ HW_CAP_PDMA_MASK | HW_CAP_EDMA_MASK | HW_CAP_MME_MASK |
+ HW_CAP_ROT_MASK);
+ }
+}
+
+static int gaudi2_suspend(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
+ if (rc)
+ dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
+
+ return rc;
+}
+
+static int gaudi2_resume(struct hl_device *hdev)
+{
+ return gaudi2_init_iatu(hdev);
+}
+
+static int gaudi2_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+ int rc;
+
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
++ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++ VM_DONTCOPY | VM_NORESERVE);
+
+#ifdef _HAS_DMA_MMAP_COHERENT
+
+ rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr, dma_addr, size);
+ if (rc)
+ dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
+
+#else
+
+ rc = remap_pfn_range(vma, vma->vm_start,
+ virt_to_phys(cpu_addr) >> PAGE_SHIFT,
+ size, vma->vm_page_prot);
+ if (rc)
+ dev_err(hdev->dev, "remap_pfn_range error %d", rc);
+
+#endif
+
+ return rc;
+}
+
+static bool gaudi2_is_queue_enabled(struct hl_device *hdev, u32 hw_queue_id)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u64 hw_cap_mask = 0;
+ u64 hw_tpc_cap_bit = 0;
+ u64 hw_nic_cap_bit = 0;
+ u64 hw_test_cap_bit = 0;
+
+ switch (hw_queue_id) {
+ case GAUDI2_QUEUE_ID_PDMA_0_0:
+ case GAUDI2_QUEUE_ID_PDMA_0_1:
+ case GAUDI2_QUEUE_ID_PDMA_1_0:
+ hw_cap_mask = HW_CAP_PDMA_MASK;
+ break;
+ case GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE0_EDMA_1_3:
+ hw_test_cap_bit = HW_CAP_EDMA_SHIFT +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0) >> 2);
+ break;
+ case GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE1_EDMA_1_3:
+ hw_test_cap_bit = HW_CAP_EDMA_SHIFT + NUM_OF_EDMA_PER_DCORE +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0) >> 2);
+ break;
+ case GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE2_EDMA_1_3:
+ hw_test_cap_bit = HW_CAP_EDMA_SHIFT + 2 * NUM_OF_EDMA_PER_DCORE +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0) >> 2);
+ break;
+ case GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0...GAUDI2_QUEUE_ID_DCORE3_EDMA_1_3:
+ hw_test_cap_bit = HW_CAP_EDMA_SHIFT + 3 * NUM_OF_EDMA_PER_DCORE +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0) >> 2);
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE0_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE0_MME_0_3:
+ hw_test_cap_bit = HW_CAP_MME_SHIFT;
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE1_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE1_MME_0_3:
+ hw_test_cap_bit = HW_CAP_MME_SHIFT + 1;
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE2_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE2_MME_0_3:
+ hw_test_cap_bit = HW_CAP_MME_SHIFT + 2;
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE3_MME_0_0 ... GAUDI2_QUEUE_ID_DCORE3_MME_0_3:
+ hw_test_cap_bit = HW_CAP_MME_SHIFT + 3;
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_5_3:
+ hw_tpc_cap_bit = HW_CAP_TPC_SHIFT +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE0_TPC_0_0) >> 2);
+
+ /* special case where cap bit refers to the first queue id */
+ if (!hw_tpc_cap_bit)
+ return !!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(0));
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE1_TPC_5_3:
+ hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + NUM_OF_TPC_PER_DCORE +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE1_TPC_0_0) >> 2);
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE2_TPC_5_3:
+ hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (2 * NUM_OF_TPC_PER_DCORE) +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE2_TPC_0_0) >> 2);
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 ... GAUDI2_QUEUE_ID_DCORE3_TPC_5_3:
+ hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (3 * NUM_OF_TPC_PER_DCORE) +
+ ((hw_queue_id - GAUDI2_QUEUE_ID_DCORE3_TPC_0_0) >> 2);
+ break;
+
+ case GAUDI2_QUEUE_ID_DCORE0_TPC_6_0 ... GAUDI2_QUEUE_ID_DCORE0_TPC_6_3:
+ hw_tpc_cap_bit = HW_CAP_TPC_SHIFT + (4 * NUM_OF_TPC_PER_DCORE);
+ break;
+
+ case GAUDI2_QUEUE_ID_ROT_0_0 ... GAUDI2_QUEUE_ID_ROT_1_3:
+ hw_test_cap_bit = HW_CAP_ROT_SHIFT + ((hw_queue_id - GAUDI2_QUEUE_ID_ROT_0_0) >> 2);
+ break;
+
+ case GAUDI2_QUEUE_ID_NIC_0_0 ... GAUDI2_QUEUE_ID_NIC_23_3:
+ hw_nic_cap_bit = HW_CAP_NIC_SHIFT + ((hw_queue_id - GAUDI2_QUEUE_ID_NIC_0_0) >> 2);
+
+ /* special case where cap bit refers to the first queue id */
+ if (!hw_nic_cap_bit)
+ return !!(gaudi2->nic_hw_cap_initialized & BIT_ULL(0));
+ break;
+
+ case GAUDI2_QUEUE_ID_CPU_PQ:
+ return !!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q);
+
+ default:
+ return false;
+ }
+
+ if (hw_tpc_cap_bit)
+ return !!(gaudi2->tpc_hw_cap_initialized & BIT_ULL(hw_tpc_cap_bit));
+
+ if (hw_nic_cap_bit)
+ return !!(gaudi2->nic_hw_cap_initialized & BIT_ULL(hw_nic_cap_bit));
+
+ if (hw_test_cap_bit)
+ hw_cap_mask = BIT_ULL(hw_test_cap_bit);
+
+ return !!(gaudi2->hw_cap_initialized & hw_cap_mask);
+}
+
+static bool gaudi2_is_arc_enabled(struct hl_device *hdev, u64 arc_id)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ switch (arc_id) {
+ case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
+ case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
+ return !!(gaudi2->active_hw_arc & BIT_ULL(arc_id));
+
+ case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
+ return !!(gaudi2->active_tpc_arc & BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0));
+
+ case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
+ return !!(gaudi2->active_nic_arc & BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0));
+
+ default:
+ return false;
+ }
+}
+
+static void gaudi2_clr_arc_id_cap(struct hl_device *hdev, u64 arc_id)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ switch (arc_id) {
+ case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
+ case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
+ gaudi2->active_hw_arc &= ~(BIT_ULL(arc_id));
+ break;
+
+ case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
+ gaudi2->active_tpc_arc &= ~(BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0));
+ break;
+
+ case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
+ gaudi2->active_nic_arc &= ~(BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0));
+ break;
+
+ default:
+ return;
+ }
+}
+
+static void gaudi2_set_arc_id_cap(struct hl_device *hdev, u64 arc_id)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ switch (arc_id) {
+ case CPU_ID_SCHED_ARC0 ... CPU_ID_SCHED_ARC5:
+ case CPU_ID_MME_QMAN_ARC0...CPU_ID_ROT_QMAN_ARC1:
+ gaudi2->active_hw_arc |= BIT_ULL(arc_id);
+ break;
+
+ case CPU_ID_TPC_QMAN_ARC0...CPU_ID_TPC_QMAN_ARC24:
+ gaudi2->active_tpc_arc |= BIT_ULL(arc_id - CPU_ID_TPC_QMAN_ARC0);
+ break;
+
+ case CPU_ID_NIC_QMAN_ARC0...CPU_ID_NIC_QMAN_ARC23:
+ gaudi2->active_nic_arc |= BIT_ULL(arc_id - CPU_ID_NIC_QMAN_ARC0);
+ break;
+
+ default:
+ return;
+ }
+}
+
+static void gaudi2_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
+{
+ struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 pq_offset, reg_base, db_reg_offset, db_value;
+
+ if (hw_queue_id != GAUDI2_QUEUE_ID_CPU_PQ) {
+ /*
+ * QMAN has 4 successive PQ_PI registers, 1 for each of the QMAN PQs.
+ * Masking the H/W queue ID with 0x3 extracts the QMAN internal PQ
+ * number.
+ */
+ pq_offset = (hw_queue_id & 0x3) * 4;
+ reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
+ db_reg_offset = reg_base + QM_PQ_PI_0_OFFSET + pq_offset;
+ } else {
+ db_reg_offset = mmCPU_IF_PF_PQ_PI;
+ }
+
+ db_value = pi;
+
+ /* ring the doorbell */
+ WREG32(db_reg_offset, db_value);
+
+ if (hw_queue_id == GAUDI2_QUEUE_ID_CPU_PQ) {
+ /* make sure device CPU will read latest data from host */
+ mb();
+ WREG32(le32_to_cpu(dyn_regs->gic_host_pi_upd_irq),
+ gaudi2_irq_map_table[GAUDI2_EVENT_CPU_PI_UPDATE].cpu_id);
+ }
+}
+
+static void gaudi2_pqe_write(struct hl_device *hdev, __le64 *pqe, struct hl_bd *bd)
+{
+ __le64 *pbd = (__le64 *) bd;
+
+ /* The QMANs are on the host memory so a simple copy suffice */
+ pqe[0] = pbd[0];
+ pqe[1] = pbd[1];
+}
+
+static void *gaudi2_dma_alloc_coherent(struct hl_device *hdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags)
+{
+ return dma_alloc_coherent(&hdev->pdev->dev, size, dma_handle, flags);
+}
+
+static void gaudi2_dma_free_coherent(struct hl_device *hdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle)
+{
+ dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, dma_handle);
+}
+
+static int gaudi2_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
+ u32 timeout, u64 *result)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)) {
+ if (result)
+ *result = 0;
+ return 0;
+ }
+
+ if (!timeout)
+ timeout = GAUDI2_MSG_TO_CPU_TIMEOUT_USEC;
+
+ return hl_fw_send_cpu_message(hdev, GAUDI2_QUEUE_ID_CPU_PQ, msg, len, timeout, result);
+}
+
+static void *gaudi2_dma_pool_zalloc(struct hl_device *hdev, size_t size,
+ gfp_t mem_flags, dma_addr_t *dma_handle)
+{
+ if (size > GAUDI2_DMA_POOL_BLK_SIZE)
+ return NULL;
+
+ return dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
+}
+
+static void gaudi2_dma_pool_free(struct hl_device *hdev, void *vaddr, dma_addr_t dma_addr)
+{
+ dma_pool_free(hdev->dma_pool, vaddr, dma_addr);
+}
+
+static void *gaudi2_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
+ dma_addr_t *dma_handle)
+{
+ return hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
+}
+
+static void gaudi2_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size, void *vaddr)
+{
+ hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
+}
+
+static dma_addr_t gaudi2_dma_map_single(struct hl_device *hdev, void *addr, int len,
+ enum dma_data_direction dir)
+{
+ dma_addr_t dma_addr;
+
+ dma_addr = dma_map_single(&hdev->pdev->dev, addr, len, dir);
+ if (unlikely(dma_mapping_error(&hdev->pdev->dev, dma_addr)))
+ return 0;
+
+ return dma_addr;
+}
+
+static void gaudi2_dma_unmap_single(struct hl_device *hdev, dma_addr_t addr, int len,
+ enum dma_data_direction dir)
+{
+ dma_unmap_single(&hdev->pdev->dev, addr, len, dir);
+}
+
+static int gaudi2_validate_cb_address(struct hl_device *hdev, struct hl_cs_parser *parser)
+{
+ struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!gaudi2_is_queue_enabled(hdev, parser->hw_queue_id)) {
+ dev_err(hdev->dev, "h/w queue %d is disabled\n", parser->hw_queue_id);
+ return -EINVAL;
+ }
+
+ /* Just check if CB address is valid */
+
+ if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->sram_user_base_address,
+ asic_prop->sram_end_address))
+ return 0;
+
+ if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->dram_user_base_address,
+ asic_prop->dram_end_address))
+ return 0;
+
+ if ((gaudi2->hw_cap_initialized & HW_CAP_DMMU_MASK) &&
+ hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->dmmu.start_addr,
+ asic_prop->dmmu.end_addr))
+ return 0;
+
+ if (gaudi2->hw_cap_initialized & HW_CAP_PMMU) {
+ if (hl_mem_area_inside_range((u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->pmmu.start_addr,
+ asic_prop->pmmu.end_addr) ||
+ hl_mem_area_inside_range(
+ (u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->pmmu_huge.start_addr,
+ asic_prop->pmmu_huge.end_addr))
+ return 0;
+
+ } else if (gaudi2_host_phys_addr_valid((u64) (uintptr_t) parser->user_cb)) {
+ if (!hdev->pdev)
+ return 0;
+
+ if (!device_iommu_mapped(&hdev->pdev->dev))
+ return 0;
+ }
+
+ dev_err(hdev->dev, "CB address %p + 0x%x for internal QMAN is not valid\n",
+ parser->user_cb, parser->user_cb_size);
+
+ return -EFAULT;
+}
+
+static int gaudi2_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!parser->is_kernel_allocated_cb)
+ return gaudi2_validate_cb_address(hdev, parser);
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU)) {
+ dev_err(hdev->dev, "PMMU not initialized - Unsupported mode in Gaudi2\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int gaudi2_send_heartbeat(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_send_heartbeat(hdev);
+}
+
+/* This is an internal helper function, used to update the KDMA mmu props.
+ * Should be called with a proper kdma lock.
+ */
+static void gaudi2_kdma_set_mmbp_asid(struct hl_device *hdev,
+ bool mmu_bypass, u32 asid)
+{
+ u32 rw_asid, rw_mmu_bp;
+
+ rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
+ (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
+
+ rw_mmu_bp = (!!mmu_bypass << ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_RD_SHIFT) |
+ (!!mmu_bypass << ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_WR_SHIFT);
+
+ WREG32(mmARC_FARM_KDMA_CTX_AXUSER_HB_ASID, rw_asid);
+ WREG32(mmARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP, rw_mmu_bp);
+}
+
+static void gaudi2_arm_cq_monitor(struct hl_device *hdev, u32 sob_id, u32 mon_id, u32 cq_id,
+ u32 mon_payload, u32 sync_value)
+{
+ u32 sob_offset, mon_offset, sync_group_id, mode, mon_arm;
+ u8 mask;
+
+ sob_offset = sob_id * 4;
+ mon_offset = mon_id * 4;
+
+ /* Reset the SOB value */
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset, 0);
+
+ /* Configure this address with CQ_ID 0 because CQ_EN is set */
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_offset, cq_id);
+
+ /* Configure this address with CS index because CQ_EN is set */
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_offset, mon_payload);
+
+ sync_group_id = sob_id / 8;
+ mask = ~(1 << (sob_id & 0x7));
+ mode = 1; /* comparison mode is "equal to" */
+
+ mon_arm = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOD_MASK, sync_value);
+ mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SOP_MASK, mode);
+ mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_MASK_MASK, mask);
+ mon_arm |= FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_MON_ARM_SID_MASK, sync_group_id);
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + mon_offset, mon_arm);
+}
+
+/* This is an internal helper function used by gaudi2_send_job_to_kdma only */
+static int gaudi2_send_job_to_kdma(struct hl_device *hdev,
+ u64 src_addr, u64 dst_addr,
+ u32 size, bool is_memset)
+{
+ u32 comp_val, commit_mask, *polling_addr, timeout, status = 0;
+ struct hl_cq_entry *cq_base;
+ struct hl_cq *cq;
+ u64 comp_addr;
+ int rc;
+
+ gaudi2_arm_cq_monitor(hdev, GAUDI2_RESERVED_SOB_KDMA_COMPLETION,
+ GAUDI2_RESERVED_MON_KDMA_COMPLETION,
+ GAUDI2_RESERVED_CQ_KDMA_COMPLETION, 1, 1);
+
+ comp_addr = CFG_BASE + mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 +
+ (GAUDI2_RESERVED_SOB_KDMA_COMPLETION * sizeof(u32));
+
+ comp_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1) |
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1);
+
+ WREG32(mmARC_FARM_KDMA_CTX_SRC_BASE_LO, lower_32_bits(src_addr));
+ WREG32(mmARC_FARM_KDMA_CTX_SRC_BASE_HI, upper_32_bits(src_addr));
+ WREG32(mmARC_FARM_KDMA_CTX_DST_BASE_LO, lower_32_bits(dst_addr));
+ WREG32(mmARC_FARM_KDMA_CTX_DST_BASE_HI, upper_32_bits(dst_addr));
+ WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_ADDR_LO, lower_32_bits(comp_addr));
+ WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_ADDR_HI, upper_32_bits(comp_addr));
+ WREG32(mmARC_FARM_KDMA_CTX_WR_COMP_WDATA, comp_val);
+ WREG32(mmARC_FARM_KDMA_CTX_DST_TSIZE_0, size);
+
+ commit_mask = FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_LIN_MASK, 1) |
+ FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_WR_COMP_EN_MASK, 1);
+
+ if (is_memset)
+ commit_mask |= FIELD_PREP(ARC_FARM_KDMA_CTX_COMMIT_MEM_SET_MASK, 1);
+
+ WREG32(mmARC_FARM_KDMA_CTX_COMMIT, commit_mask);
+
+ /* Wait for completion */
+ cq = &hdev->completion_queue[GAUDI2_RESERVED_CQ_KDMA_COMPLETION];
+ cq_base = cq->kernel_address;
+ polling_addr = (u32 *)&cq_base[cq->ci];
+
+ if (hdev->pldm)
+ /* for each 1MB 20 second of timeout */
+ timeout = ((size / SZ_1M) + 1) * USEC_PER_SEC * 20;
+ else
+ timeout = KDMA_TIMEOUT_USEC;
+
+ /* Polling */
+ rc = hl_poll_timeout_memory(
+ hdev,
+ polling_addr,
+ status,
+ (status == 1),
+ 1000,
+ timeout,
+ true);
+
+ *polling_addr = 0;
+
+ if (rc) {
+ dev_err(hdev->dev, "Timeout while waiting for KDMA to be idle\n");
+ WREG32(mmARC_FARM_KDMA_CFG_1, 1 << ARC_FARM_KDMA_CFG_1_HALT_SHIFT);
+ return rc;
+ }
+
+ cq->ci = hl_cq_inc_ptr(cq->ci);
+
+ return 0;
+}
+
+static void gaudi2_memset_device_lbw(struct hl_device *hdev, u32 addr, u32 size, u32 val)
+{
+ u32 i;
+
+ for (i = 0 ; i < size ; i += sizeof(u32))
+ WREG32(addr + i, val);
+}
+
+static void gaudi2_qman_set_test_mode(struct hl_device *hdev, u32 hw_queue_id, bool enable)
+{
+ u32 reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
+
+ if (enable) {
+ WREG32(reg_base + QM_GLBL_PROT_OFFSET, QMAN_MAKE_TRUSTED_TEST_MODE);
+ WREG32(reg_base + QM_PQC_CFG_OFFSET, 0);
+ } else {
+ WREG32(reg_base + QM_GLBL_PROT_OFFSET, QMAN_MAKE_TRUSTED);
+ WREG32(reg_base + QM_PQC_CFG_OFFSET, 1 << PDMA0_QM_PQC_CFG_EN_SHIFT);
+ }
+}
+
+static int gaudi2_test_queue(struct hl_device *hdev, u32 hw_queue_id)
+{
+ u32 sob_offset = hdev->asic_prop.first_available_user_sob[0] * 4;
+ u32 sob_addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
+ u32 timeout_usec, tmp, sob_base = 1, sob_val = 0x5a5a;
+ struct packet_msg_short *msg_short_pkt;
+ dma_addr_t pkt_dma_addr;
+ size_t pkt_size;
+ int rc;
+
+ if (hdev->pldm)
+ timeout_usec = GAUDI2_PLDM_TEST_QUEUE_WAIT_USEC;
+ else
+ timeout_usec = GAUDI2_TEST_QUEUE_WAIT_USEC;
+
+ pkt_size = sizeof(*msg_short_pkt);
+ msg_short_pkt = hl_asic_dma_pool_zalloc(hdev, pkt_size, GFP_KERNEL, &pkt_dma_addr);
+ if (!msg_short_pkt) {
+ dev_err(hdev->dev, "Failed to allocate packet for H/W queue %d testing\n",
+ hw_queue_id);
+ return -ENOMEM;
+ }
+
+ tmp = (PACKET_MSG_SHORT << GAUDI2_PKT_CTL_OPCODE_SHIFT) |
+ (1 << GAUDI2_PKT_CTL_EB_SHIFT) |
+ (1 << GAUDI2_PKT_CTL_MB_SHIFT) |
+ (sob_base << GAUDI2_PKT_SHORT_CTL_BASE_SHIFT) |
+ (sob_offset << GAUDI2_PKT_SHORT_CTL_ADDR_SHIFT);
+
+ msg_short_pkt->value = cpu_to_le32(sob_val);
+ msg_short_pkt->ctl = cpu_to_le32(tmp);
+
+ /* Reset the SOB value */
+ WREG32(sob_addr, 0);
+
+ rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, pkt_size, pkt_dma_addr);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to send msg_short packet to H/W queue %d\n",
+ hw_queue_id);
+ goto free_pkt;
+ }
+
+ rc = hl_poll_timeout(
+ hdev,
+ sob_addr,
+ tmp,
+ (tmp == sob_val),
+ 1000,
+ timeout_usec);
+
+ if (rc == -ETIMEDOUT) {
+ dev_err(hdev->dev, "H/W queue %d test failed (SOB_OBJ_0 == 0x%x)\n",
+ hw_queue_id, tmp);
+ rc = -EIO;
+ }
+
+ /* Reset the SOB value */
+ WREG32(sob_addr, 0);
+
+free_pkt:
+ hl_asic_dma_pool_free(hdev, (void *) msg_short_pkt, pkt_dma_addr);
+ return rc;
+}
+
+static int gaudi2_test_cpu_queue(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ /*
+ * check capability here as send_cpu_message() won't update the result
+ * value if no capability
+ */
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_test_cpu_queue(hdev);
+}
+
+static int gaudi2_test_queues(struct hl_device *hdev)
+{
+ int i, rc, ret_val = 0;
+
+ for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ; i++) {
+ if (!gaudi2_is_queue_enabled(hdev, i))
+ continue;
+
+ gaudi2_qman_set_test_mode(hdev, i, true);
+ rc = gaudi2_test_queue(hdev, i);
+ gaudi2_qman_set_test_mode(hdev, i, false);
+
+ if (rc) {
+ ret_val = -EINVAL;
+ goto done;
+ }
+ }
+
+ rc = gaudi2_test_cpu_queue(hdev);
+ if (rc) {
+ ret_val = -EINVAL;
+ goto done;
+ }
+
+done:
+ return ret_val;
+}
+
+static int gaudi2_compute_reset_late_init(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ size_t irq_arr_size;
+
+ /* TODO: missing gaudi2_nic_resume.
+ * Until implemented nic_hw_cap_initialized will remain zeroed
+ */
+ gaudi2_init_arcs(hdev);
+ gaudi2_scrub_arcs_dccm(hdev);
+ gaudi2_init_security(hdev);
+
+ /* Unmask all IRQs since some could have been received during the soft reset */
+ irq_arr_size = gaudi2->num_of_valid_hw_events * sizeof(gaudi2->hw_events[0]);
+ return hl_fw_unmask_irq_arr(hdev, gaudi2->hw_events, irq_arr_size);
+}
+
+static void gaudi2_is_tpc_engine_idle(struct hl_device *hdev, int dcore, int inst, u32 offset,
+ struct iterate_module_ctx *ctx)
+{
+ struct gaudi2_tpc_idle_data *idle_data = ctx->data;
+ u32 tpc_cfg_sts, qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts;
+ bool is_eng_idle;
+ int engine_idx;
+
+ if ((dcore == 0) && (inst == (NUM_DCORE0_TPC - 1)))
+ engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
+ else
+ engine_idx = GAUDI2_DCORE0_ENGINE_ID_TPC_0 +
+ dcore * GAUDI2_ENGINE_ID_DCORE_OFFSET + inst;
+
+ tpc_cfg_sts = RREG32(mmDCORE0_TPC0_CFG_STATUS + offset);
+ qm_glbl_sts0 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS0 + offset);
+ qm_glbl_sts1 = RREG32(mmDCORE0_TPC0_QM_GLBL_STS1 + offset);
+ qm_cgm_sts = RREG32(mmDCORE0_TPC0_QM_CGM_STS + offset);
+
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
+ IS_TPC_IDLE(tpc_cfg_sts);
+ *(idle_data->is_idle) &= is_eng_idle;
+
+ if (idle_data->mask && !is_eng_idle)
+ set_bit(engine_idx, idle_data->mask);
+
+ if (idle_data->e)
+ hl_engine_data_sprintf(idle_data->e,
+ idle_data->tpc_fmt, dcore, inst,
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts, tpc_cfg_sts);
+}
+
+static bool gaudi2_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+ struct engines_data *e)
+{
+ u32 qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts, dma_core_idle_ind_mask,
+ mme_arch_sts, dec_swreg15, dec_enabled_bit;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ const char *rot_fmt = "%-6d%-5d%-9s%#-14x%#-12x%s\n";
+ unsigned long *mask = (unsigned long *) mask_arr;
+ const char *edma_fmt = "%-6d%-6d%-9s%#-14x%#x\n";
+ const char *mme_fmt = "%-5d%-6s%-9s%#-14x%#x\n";
+ const char *nic_fmt = "%-5d%-9s%#-14x%#-12x\n";
+ const char *pdma_fmt = "%-6d%-9s%#-14x%#x\n";
+ const char *pcie_dec_fmt = "%-10d%-9s%#x\n";
+ const char *dec_fmt = "%-6d%-5d%-9s%#x\n";
+ bool is_idle = true, is_eng_idle;
+ u64 offset;
+
+ struct gaudi2_tpc_idle_data tpc_idle_data = {
+ .tpc_fmt = "%-6d%-5d%-9s%#-14x%#-12x%#x\n",
+ .e = e,
+ .mask = mask,
+ .is_idle = &is_idle,
+ };
+ struct iterate_module_ctx tpc_iter = {
+ .fn = &gaudi2_is_tpc_engine_idle,
+ .data = &tpc_idle_data,
+ };
+
+ int engine_idx, i, j;
+
+ /* EDMA, Two engines per Dcore */
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nCORE EDMA is_idle QM_GLBL_STS0 DMA_CORE_IDLE_IND_MASK\n"
+ "---- ---- ------- ------------ ----------------------\n");
+
+ for (i = 0; i < NUM_OF_DCORES; i++) {
+ for (j = 0 ; j < NUM_OF_EDMA_PER_DCORE ; j++) {
+ int seq = i * NUM_OF_EDMA_PER_DCORE + j;
+
+ if (!(prop->edma_enabled_mask & BIT(seq)))
+ continue;
+
+ engine_idx = GAUDI2_DCORE0_ENGINE_ID_EDMA_0 +
+ i * GAUDI2_ENGINE_ID_DCORE_OFFSET + j;
+ offset = i * DCORE_OFFSET + j * DCORE_EDMA_OFFSET;
+
+ dma_core_idle_ind_mask =
+ RREG32(mmDCORE0_EDMA0_CORE_IDLE_IND_MASK + offset);
+
+ qm_glbl_sts0 = RREG32(mmDCORE0_EDMA0_QM_GLBL_STS0 + offset);
+ qm_glbl_sts1 = RREG32(mmDCORE0_EDMA0_QM_GLBL_STS1 + offset);
+ qm_cgm_sts = RREG32(mmDCORE0_EDMA0_QM_CGM_STS + offset);
+
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
+ IS_DMA_IDLE(dma_core_idle_ind_mask);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+
+ if (e)
+ hl_engine_data_sprintf(e, edma_fmt, i, j,
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0,
+ dma_core_idle_ind_mask);
+ }
+ }
+
+ /* PDMA, Two engines in Full chip */
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nPDMA is_idle QM_GLBL_STS0 DMA_CORE_IDLE_IND_MASK\n"
+ "---- ------- ------------ ----------------------\n");
+
+ for (i = 0 ; i < NUM_OF_PDMA ; i++) {
+ engine_idx = GAUDI2_ENGINE_ID_PDMA_0 + i;
+ offset = i * PDMA_OFFSET;
+ dma_core_idle_ind_mask = RREG32(mmPDMA0_CORE_IDLE_IND_MASK + offset);
+
+ qm_glbl_sts0 = RREG32(mmPDMA0_QM_GLBL_STS0 + offset);
+ qm_glbl_sts1 = RREG32(mmPDMA0_QM_GLBL_STS1 + offset);
+ qm_cgm_sts = RREG32(mmPDMA0_QM_CGM_STS + offset);
+
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts) &&
+ IS_DMA_IDLE(dma_core_idle_ind_mask);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+
+ if (e)
+ hl_engine_data_sprintf(e, pdma_fmt, i, is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, dma_core_idle_ind_mask);
+ }
+
+ /* NIC, twelve macros in Full chip */
+ if (e && hdev->nic_ports_mask)
+ hl_engine_data_sprintf(e,
+ "\nNIC is_idle QM_GLBL_STS0 QM_CGM_STS\n"
+ "--- ------- ------------ ----------\n");
+
+ for (i = 0 ; i < NIC_NUMBER_OF_ENGINES ; i++) {
+ if (!(i & 1))
+ offset = i / 2 * NIC_OFFSET;
+ else
+ offset += NIC_QM_OFFSET;
+
+ if (!(hdev->nic_ports_mask & BIT(i)))
+ continue;
+
+ engine_idx = GAUDI2_ENGINE_ID_NIC0_0 + i;
+
+
+ qm_glbl_sts0 = RREG32(mmNIC0_QM0_GLBL_STS0 + offset);
+ qm_glbl_sts1 = RREG32(mmNIC0_QM0_GLBL_STS1 + offset);
+ qm_cgm_sts = RREG32(mmNIC0_QM0_CGM_STS + offset);
+
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+
+ if (e)
+ hl_engine_data_sprintf(e, nic_fmt, i, is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts);
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nMME Stub is_idle QM_GLBL_STS0 MME_ARCH_STATUS\n"
+ "--- ---- ------- ------------ ---------------\n");
+ /* MME, one per Dcore */
+ for (i = 0 ; i < NUM_OF_DCORES ; i++) {
+ engine_idx = GAUDI2_DCORE0_ENGINE_ID_MME + i * GAUDI2_ENGINE_ID_DCORE_OFFSET;
+ offset = i * DCORE_OFFSET;
+
+ qm_glbl_sts0 = RREG32(mmDCORE0_MME_QM_GLBL_STS0 + offset);
+ qm_glbl_sts1 = RREG32(mmDCORE0_MME_QM_GLBL_STS1 + offset);
+ qm_cgm_sts = RREG32(mmDCORE0_MME_QM_CGM_STS + offset);
+
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
+ is_idle &= is_eng_idle;
+
+ mme_arch_sts = RREG32(mmDCORE0_MME_CTRL_LO_ARCH_STATUS + offset);
+ is_eng_idle &= IS_MME_IDLE(mme_arch_sts);
+ is_idle &= is_eng_idle;
+
+ if (e)
+ hl_engine_data_sprintf(e, mme_fmt, i, "N",
+ is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0,
+ mme_arch_sts);
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+ }
+
+ /*
+ * TPC
+ */
+ if (e && prop->tpc_enabled_mask)
+ hl_engine_data_sprintf(e,
+ "\nCORE TPC is_idle QM_GLBL_STS0 QM_CGM_STS DMA_CORE_IDLE_IND_MASK\n"
+ "---- --- -------- ------------ ---------- ----------------------\n");
+
+ gaudi2_iterate_tpcs(hdev, &tpc_iter);
+
+ /* Decoders, two each Dcore and two shared PCIe decoders */
+ if (e && (prop->decoder_enabled_mask & (~PCIE_DEC_EN_MASK)))
+ hl_engine_data_sprintf(e,
+ "\nCORE DEC is_idle VSI_CMD_SWREG15\n"
+ "---- --- ------- ---------------\n");
+
+ for (i = 0 ; i < NUM_OF_DCORES ; i++) {
+ for (j = 0 ; j < NUM_OF_DEC_PER_DCORE ; j++) {
+ dec_enabled_bit = 1 << (i * NUM_OF_DEC_PER_DCORE + j);
+ if (!(prop->decoder_enabled_mask & dec_enabled_bit))
+ continue;
+
+ engine_idx = GAUDI2_DCORE0_ENGINE_ID_DEC_0 +
+ i * GAUDI2_ENGINE_ID_DCORE_OFFSET + j;
+ offset = i * DCORE_OFFSET + j * DCORE_DEC_OFFSET;
+
+ dec_swreg15 = RREG32(mmDCORE0_DEC0_CMD_SWREG15 + offset);
+ is_eng_idle = IS_DEC_IDLE(dec_swreg15);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+
+ if (e)
+ hl_engine_data_sprintf(e, dec_fmt, i, j,
+ is_eng_idle ? "Y" : "N", dec_swreg15);
+ }
+ }
+
+ if (e && (prop->decoder_enabled_mask & PCIE_DEC_EN_MASK))
+ hl_engine_data_sprintf(e,
+ "\nPCIe DEC is_idle VSI_CMD_SWREG15\n"
+ "-------- ------- ---------------\n");
+
+ /* Check shared(PCIe) decoders */
+ for (i = 0 ; i < NUM_OF_DEC_PER_DCORE ; i++) {
+ dec_enabled_bit = PCIE_DEC_SHIFT + i;
+ if (!(prop->decoder_enabled_mask & BIT(dec_enabled_bit)))
+ continue;
+
+ engine_idx = GAUDI2_PCIE_ENGINE_ID_DEC_0 + i;
+ offset = i * DCORE_DEC_OFFSET;
+ dec_swreg15 = RREG32(mmPCIE_DEC0_CMD_SWREG15 + offset);
+ is_eng_idle = IS_DEC_IDLE(dec_swreg15);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+
+ if (e)
+ hl_engine_data_sprintf(e, pcie_dec_fmt, i,
+ is_eng_idle ? "Y" : "N", dec_swreg15);
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nCORE ROT is_idle QM_GLBL_STS0 QM_CGM_STS DMA_CORE_STS0\n"
+ "---- ---- ------- ------------ ---------- -------------\n");
+
+ for (i = 0 ; i < NUM_OF_ROT ; i++) {
+ engine_idx = GAUDI2_ENGINE_ID_ROT_0 + i;
+
+ offset = i * ROT_OFFSET;
+
+ qm_glbl_sts0 = RREG32(mmROT0_QM_GLBL_STS0 + offset);
+ qm_glbl_sts1 = RREG32(mmROT0_QM_GLBL_STS1 + offset);
+ qm_cgm_sts = RREG32(mmROT0_QM_CGM_STS + offset);
+
+ is_eng_idle = IS_QM_IDLE(qm_glbl_sts0, qm_glbl_sts1, qm_cgm_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(engine_idx, mask);
+
+ if (e)
+ hl_engine_data_sprintf(e, rot_fmt, i, 0, is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, qm_cgm_sts, "-");
+ }
+
+ return is_idle;
+}
+
+static void gaudi2_hw_queues_lock(struct hl_device *hdev)
+ __acquires(&gaudi2->hw_queues_lock)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ spin_lock(&gaudi2->hw_queues_lock);
+}
+
+static void gaudi2_hw_queues_unlock(struct hl_device *hdev)
+ __releases(&gaudi2->hw_queues_lock)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ spin_unlock(&gaudi2->hw_queues_lock);
+}
+
+static u32 gaudi2_get_pci_id(struct hl_device *hdev)
+{
+ return hdev->pdev->device;
+}
+
+static int gaudi2_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_get_eeprom_data(hdev, data, max_size);
+}
+
+static void gaudi2_update_eq_ci(struct hl_device *hdev, u32 val)
+{
+ WREG32(mmCPU_IF_EQ_RD_OFFS, val);
+}
+
+static void *gaudi2_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (aggregate) {
+ *size = (u32) sizeof(gaudi2->events_stat_aggregate);
+ return gaudi2->events_stat_aggregate;
+ }
+
+ *size = (u32) sizeof(gaudi2->events_stat);
+ return gaudi2->events_stat;
+}
+
+static void gaudi2_mmu_vdec_dcore_prepare(struct hl_device *hdev, int dcore_id,
+ int dcore_vdec_id, u32 rw_asid, u32 rw_mmu_bp)
+{
+ u32 offset = (mmDCORE0_VDEC1_BRDG_CTRL_BASE - mmDCORE0_VDEC0_BRDG_CTRL_BASE) *
+ dcore_vdec_id + DCORE_OFFSET * dcore_id;
+
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_ASID + offset, rw_asid);
+
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_ASID + offset, rw_asid);
+
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_ASID + offset, rw_asid);
+
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_ASID + offset, rw_asid);
+
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmDCORE0_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_ASID + offset, rw_asid);
+}
+
+static void gaudi2_mmu_dcore_prepare(struct hl_device *hdev, int dcore_id, u32 asid)
+{
+ u32 rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
+ (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 dcore_offset = dcore_id * DCORE_OFFSET;
+ u32 vdec_id, i, ports_offset, reg_val;
+ u8 edma_seq_base;
+
+ /* EDMA */
+ edma_seq_base = dcore_id * NUM_OF_EDMA_PER_DCORE;
+ if (prop->edma_enabled_mask & BIT(edma_seq_base)) {
+ WREG32(mmDCORE0_EDMA0_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
+ WREG32(mmDCORE0_EDMA0_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP + dcore_offset, 0);
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_ASID + dcore_offset, rw_asid);
+ }
+
+ if (prop->edma_enabled_mask & BIT(edma_seq_base + 1)) {
+ WREG32(mmDCORE0_EDMA1_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
+ WREG32(mmDCORE0_EDMA1_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
+ WREG32(mmDCORE0_EDMA1_CORE_CTX_AXUSER_HB_ASID + dcore_offset, rw_asid);
+ WREG32(mmDCORE0_EDMA1_CORE_CTX_AXUSER_HB_MMU_BP + dcore_offset, 0);
+ }
+
+ /* Sync Mngr */
+ WREG32(mmDCORE0_SYNC_MNGR_GLBL_ASID_NONE_SEC_PRIV + dcore_offset, asid);
+ /*
+ * Sync Mngrs on dcores 1 - 3 are exposed to user, so must use user ASID
+ * for any access type
+ */
+ if (dcore_id > 0) {
+ reg_val = (asid << DCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID_RD_SHIFT) |
+ (asid << DCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID_WR_SHIFT);
+ WREG32(mmDCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_ASID + dcore_offset, reg_val);
+ WREG32(mmDCORE0_SYNC_MNGR_MSTR_IF_AXUSER_HB_MMU_BP + dcore_offset, 0);
+ }
+
+ WREG32(mmDCORE0_MME_CTRL_LO_MME_AXUSER_HB_MMU_BP + dcore_offset, 0);
+ WREG32(mmDCORE0_MME_CTRL_LO_MME_AXUSER_HB_ASID + dcore_offset, rw_asid);
+
+ for (i = 0 ; i < NUM_OF_MME_SBTE_PORTS ; i++) {
+ ports_offset = i * DCORE_MME_SBTE_OFFSET;
+ WREG32(mmDCORE0_MME_SBTE0_MSTR_IF_AXUSER_HB_MMU_BP +
+ dcore_offset + ports_offset, 0);
+ WREG32(mmDCORE0_MME_SBTE0_MSTR_IF_AXUSER_HB_ASID +
+ dcore_offset + ports_offset, rw_asid);
+ }
+
+ for (i = 0 ; i < NUM_OF_MME_WB_PORTS ; i++) {
+ ports_offset = i * DCORE_MME_WB_OFFSET;
+ WREG32(mmDCORE0_MME_WB0_MSTR_IF_AXUSER_HB_MMU_BP +
+ dcore_offset + ports_offset, 0);
+ WREG32(mmDCORE0_MME_WB0_MSTR_IF_AXUSER_HB_ASID +
+ dcore_offset + ports_offset, rw_asid);
+ }
+
+ WREG32(mmDCORE0_MME_QM_AXUSER_NONSECURED_HB_MMU_BP + dcore_offset, 0);
+ WREG32(mmDCORE0_MME_QM_AXUSER_NONSECURED_HB_ASID + dcore_offset, rw_asid);
+
+ /*
+ * Decoders
+ */
+ for (vdec_id = 0 ; vdec_id < NUM_OF_DEC_PER_DCORE ; vdec_id++) {
+ if (prop->decoder_enabled_mask & BIT(dcore_id * NUM_OF_DEC_PER_DCORE + vdec_id))
+ gaudi2_mmu_vdec_dcore_prepare(hdev, dcore_id, vdec_id, rw_asid, 0);
+ }
+}
+
+static void gudi2_mmu_vdec_shared_prepare(struct hl_device *hdev,
+ int shared_vdec_id, u32 rw_asid, u32 rw_mmu_bp)
+{
+ u32 offset = (mmPCIE_VDEC1_BRDG_CTRL_BASE - mmPCIE_VDEC0_BRDG_CTRL_BASE) * shared_vdec_id;
+
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_DEC_HB_ASID + offset, rw_asid);
+
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_ABNRM_HB_ASID + offset, rw_asid);
+
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_L2C_HB_ASID + offset, rw_asid);
+
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_NRM_HB_ASID + offset, rw_asid);
+
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmPCIE_VDEC0_BRDG_CTRL_AXUSER_MSIX_VCD_HB_ASID + offset, rw_asid);
+}
+
+static void gudi2_mmu_arc_farm_arc_dup_eng_prepare(struct hl_device *hdev, int arc_farm_id,
+ u32 rw_asid, u32 rw_mmu_bp)
+{
+ u32 offset = (mmARC_FARM_ARC1_DUP_ENG_BASE - mmARC_FARM_ARC0_DUP_ENG_BASE) * arc_farm_id;
+
+ WREG32(mmARC_FARM_ARC0_DUP_ENG_AXUSER_HB_MMU_BP + offset, rw_mmu_bp);
+ WREG32(mmARC_FARM_ARC0_DUP_ENG_AXUSER_HB_ASID + offset, rw_asid);
+}
+
+static void gaudi2_arc_mmu_prepare(struct hl_device *hdev, u32 cpu_id, u32 asid)
+{
+ u32 reg_base, reg_offset, reg_val = 0;
+
+ reg_base = gaudi2_arc_blocks_bases[cpu_id];
+
+ /* Enable MMU and configure asid for all relevant ARC regions */
+ reg_val = FIELD_PREP(ARC_FARM_ARC0_AUX_ARC_REGION_CFG_MMU_BP_MASK, 0);
+ reg_val |= FIELD_PREP(ARC_FARM_ARC0_AUX_ARC_REGION_CFG_0_ASID_MASK, asid);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION3_GENERAL);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION4_HBM0_FW);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION5_HBM1_GC_DATA);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION6_HBM2_GC_DATA);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION7_HBM3_GC_DATA);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION9_PCIE);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION10_GENERAL);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION11_GENERAL);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION12_GENERAL);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION13_GENERAL);
+ WREG32(reg_base + reg_offset, reg_val);
+
+ reg_offset = ARC_REGION_CFG_OFFSET(ARC_REGION14_GENERAL);
+ WREG32(reg_base + reg_offset, reg_val);
+}
+
+static int gaudi2_arc_mmu_prepare_all(struct hl_device *hdev, u32 asid)
+{
+ int i;
+
+ if (hdev->fw_components & FW_TYPE_BOOT_CPU)
+ return hl_fw_cpucp_engine_core_asid_set(hdev, asid);
+
+ for (i = CPU_ID_SCHED_ARC0 ; i < NUM_OF_ARC_FARMS_ARC ; i++)
+ gaudi2_arc_mmu_prepare(hdev, i, asid);
+
+ for (i = GAUDI2_QUEUE_ID_PDMA_0_0 ; i < GAUDI2_QUEUE_ID_CPU_PQ ; i += 4) {
+ if (!gaudi2_is_queue_enabled(hdev, i))
+ continue;
+
+ gaudi2_arc_mmu_prepare(hdev, gaudi2_queue_id_to_arc_id[i], asid);
+ }
+
+ return 0;
+}
+
+static int gaudi2_mmu_shared_prepare(struct hl_device *hdev, u32 asid)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 rw_asid, offset;
+ int rc, i;
+
+ rw_asid = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_MASK, asid) |
+ FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_MASK, asid);
+
+ WREG32(mmPDMA0_QM_AXUSER_NONSECURED_HB_ASID, rw_asid);
+ WREG32(mmPDMA0_QM_AXUSER_NONSECURED_HB_MMU_BP, 0);
+ WREG32(mmPDMA0_CORE_CTX_AXUSER_HB_ASID, rw_asid);
+ WREG32(mmPDMA0_CORE_CTX_AXUSER_HB_MMU_BP, 0);
+
+ WREG32(mmPDMA1_QM_AXUSER_NONSECURED_HB_ASID, rw_asid);
+ WREG32(mmPDMA1_QM_AXUSER_NONSECURED_HB_MMU_BP, 0);
+ WREG32(mmPDMA1_CORE_CTX_AXUSER_HB_ASID, rw_asid);
+ WREG32(mmPDMA1_CORE_CTX_AXUSER_HB_MMU_BP, 0);
+
+ /* ROT */
+ for (i = 0 ; i < NUM_OF_ROT ; i++) {
+ offset = i * ROT_OFFSET;
+ WREG32(mmROT0_QM_AXUSER_NONSECURED_HB_ASID + offset, rw_asid);
+ WREG32(mmROT0_QM_AXUSER_NONSECURED_HB_MMU_BP + offset, 0);
+ RMWREG32(mmROT0_CPL_QUEUE_AWUSER + offset, asid, MMUBP_ASID_MASK);
+ RMWREG32(mmROT0_DESC_HBW_ARUSER_LO + offset, asid, MMUBP_ASID_MASK);
+ RMWREG32(mmROT0_DESC_HBW_AWUSER_LO + offset, asid, MMUBP_ASID_MASK);
+ }
+
+ /* Shared Decoders are the last bits in the decoders mask */
+ if (prop->decoder_enabled_mask & BIT(NUM_OF_DCORES * NUM_OF_DEC_PER_DCORE + 0))
+ gudi2_mmu_vdec_shared_prepare(hdev, 0, rw_asid, 0);
+
+ if (prop->decoder_enabled_mask & BIT(NUM_OF_DCORES * NUM_OF_DEC_PER_DCORE + 1))
+ gudi2_mmu_vdec_shared_prepare(hdev, 1, rw_asid, 0);
+
+ /* arc farm arc dup eng */
+ for (i = 0 ; i < NUM_OF_ARC_FARMS_ARC ; i++)
+ gudi2_mmu_arc_farm_arc_dup_eng_prepare(hdev, i, rw_asid, 0);
+
+ rc = gaudi2_arc_mmu_prepare_all(hdev, asid);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static void gaudi2_tpc_mmu_prepare(struct hl_device *hdev, int dcore, int inst, u32 offset,
+ struct iterate_module_ctx *ctx)
+{
+ struct gaudi2_tpc_mmu_data *mmu_data = ctx->data;
+
+ WREG32(mmDCORE0_TPC0_CFG_AXUSER_HB_MMU_BP + offset, 0);
+ WREG32(mmDCORE0_TPC0_CFG_AXUSER_HB_ASID + offset, mmu_data->rw_asid);
+ WREG32(mmDCORE0_TPC0_QM_AXUSER_NONSECURED_HB_MMU_BP + offset, 0);
+ WREG32(mmDCORE0_TPC0_QM_AXUSER_NONSECURED_HB_ASID + offset, mmu_data->rw_asid);
+}
+
+/* zero the MMUBP and set the ASID */
+static int gaudi2_mmu_prepare(struct hl_device *hdev, u32 asid)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ struct gaudi2_tpc_mmu_data tpc_mmu_data;
+ struct iterate_module_ctx tpc_iter = {
+ .fn = &gaudi2_tpc_mmu_prepare,
+ .data = &tpc_mmu_data,
+ };
+ int rc, i;
+
+ if (asid & ~DCORE0_HMMU0_STLB_ASID_ASID_MASK) {
+ dev_crit(hdev->dev, "asid %u is too big\n", asid);
+ return -EINVAL;
+ }
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_MMU_MASK))
+ return 0;
+
+ rc = gaudi2_mmu_shared_prepare(hdev, asid);
+ if (rc)
+ return rc;
+
+ /* configure DCORE MMUs */
+ tpc_mmu_data.rw_asid = (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_RD_SHIFT) |
+ (asid << ARC_FARM_KDMA_CTX_AXUSER_HB_ASID_WR_SHIFT);
+ gaudi2_iterate_tpcs(hdev, &tpc_iter);
+ for (i = 0 ; i < NUM_OF_DCORES ; i++)
+ gaudi2_mmu_dcore_prepare(hdev, i, asid);
+
+ return 0;
+}
+
+static inline bool is_info_event(u32 event)
+{
+ switch (event) {
+ case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
+ case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S ... GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
+
+ /* return in case of NIC status event - these events are received periodically and not as
+ * an indication to an error.
+ */
+ case GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 ... GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static void gaudi2_print_event(struct hl_device *hdev, u16 event_type,
+ bool ratelimited, const char *fmt, ...)
+{
+ struct va_format vaf;
+ va_list args;
+
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+
+ if (ratelimited)
+ dev_err_ratelimited(hdev->dev, "%s: %pV\n",
+ gaudi2_irq_map_table[event_type].valid ?
+ gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf);
+ else
+ dev_err(hdev->dev, "%s: %pV\n",
+ gaudi2_irq_map_table[event_type].valid ?
+ gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf);
+
+ va_end(args);
+}
+
+static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type,
+ struct hl_eq_ecc_data *ecc_data)
+{
+ u64 ecc_address = 0, ecc_syndrom = 0;
+ u8 memory_wrapper_idx = 0;
+
+ ecc_address = le64_to_cpu(ecc_data->ecc_address);
+ ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
+ memory_wrapper_idx = ecc_data->memory_wrapper_idx;
+
+ gaudi2_print_event(hdev, event_type, !ecc_data->is_critical,
+ "ECC error detected. address: %#llx. Syndrom: %#llx. block id %u. critical %u.\n",
+ ecc_address, ecc_syndrom, memory_wrapper_idx, ecc_data->is_critical);
+
+ return !!ecc_data->is_critical;
+}
+
+/*
+ * gaudi2_queue_idx_dec - decrement queue index (pi/ci) and handle wrap
+ *
+ * @idx: the current pi/ci value
+ * @q_len: the queue length (power of 2)
+ *
+ * @return the cyclically decremented index
+ */
+static inline u32 gaudi2_queue_idx_dec(u32 idx, u32 q_len)
+{
+ u32 mask = q_len - 1;
+
+ /*
+ * modular decrement is equivalent to adding (queue_size -1)
+ * later we take LSBs to make sure the value is in the
+ * range [0, queue_len - 1]
+ */
+ return (idx + q_len - 1) & mask;
+}
+
+/**
+ * gaudi2_print_sw_config_stream_data - print SW config stream data
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @stream: the QMAN's stream
+ * @qman_base: base address of QMAN registers block
+ */
+static void gaudi2_print_sw_config_stream_data(struct hl_device *hdev,
+ u32 stream, u64 qman_base)
+{
+ u64 cq_ptr_lo, cq_ptr_hi, cq_tsize, cq_ptr;
+ u32 cq_ptr_lo_off, size;
+
+ cq_ptr_lo_off = mmDCORE0_TPC0_QM_CQ_PTR_LO_1 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0;
+
+ cq_ptr_lo = qman_base + (mmDCORE0_TPC0_QM_CQ_PTR_LO_0 - mmDCORE0_TPC0_QM_BASE) +
+ stream * cq_ptr_lo_off;
+
+ cq_ptr_hi = cq_ptr_lo + (mmDCORE0_TPC0_QM_CQ_PTR_HI_0 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0);
+
+ cq_tsize = cq_ptr_lo + (mmDCORE0_TPC0_QM_CQ_TSIZE_0 - mmDCORE0_TPC0_QM_CQ_PTR_LO_0);
+
+ cq_ptr = (((u64) RREG32(cq_ptr_hi)) << 32) | RREG32(cq_ptr_lo);
+ size = RREG32(cq_tsize);
+ dev_info(hdev->dev, "stop on err: stream: %u, addr: %#llx, size: %x\n",
+ stream, cq_ptr, size);
+}
+
+/**
+ * gaudi2_print_last_pqes_on_err - print last PQEs on error
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @qid_base: first QID of the QMAN (out of 4 streams)
+ * @stream: the QMAN's stream
+ * @qman_base: base address of QMAN registers block
+ * @pr_sw_conf: if true print the SW config stream data (CQ PTR and SIZE)
+ */
+static void gaudi2_print_last_pqes_on_err(struct hl_device *hdev, u32 qid_base, u32 stream,
+ u64 qman_base, bool pr_sw_conf)
+{
+ u32 ci, qm_ci_stream_off;
+ struct hl_hw_queue *q;
+ u64 pq_ci;
+ int i;
+
+ q = &hdev->kernel_queues[qid_base + stream];
+
+ qm_ci_stream_off = mmDCORE0_TPC0_QM_PQ_CI_1 - mmDCORE0_TPC0_QM_PQ_CI_0;
+ pq_ci = qman_base + (mmDCORE0_TPC0_QM_PQ_CI_0 - mmDCORE0_TPC0_QM_BASE) +
+ stream * qm_ci_stream_off;
+
+ hdev->asic_funcs->hw_queues_lock(hdev);
+
+ if (pr_sw_conf)
+ gaudi2_print_sw_config_stream_data(hdev, stream, qman_base);
+
+ ci = RREG32(pq_ci);
+
+ /* we should start printing form ci -1 */
+ ci = gaudi2_queue_idx_dec(ci, HL_QUEUE_LENGTH);
+
+ for (i = 0; i < PQ_FETCHER_CACHE_SIZE; i++) {
+ struct hl_bd *bd;
+ u64 addr;
+ u32 len;
+
+ bd = q->kernel_address;
+ bd += ci;
+
+ len = le32_to_cpu(bd->len);
+ /* len 0 means uninitialized entry- break */
+ if (!len)
+ break;
+
+ addr = le64_to_cpu(bd->ptr);
+
+ dev_info(hdev->dev, "stop on err PQE(stream %u): ci: %u, addr: %#llx, size: %x\n",
+ stream, ci, addr, len);
+
+ /* get previous ci, wrap if needed */
+ ci = gaudi2_queue_idx_dec(ci, HL_QUEUE_LENGTH);
+ }
+
+ hdev->asic_funcs->hw_queues_unlock(hdev);
+}
+
+/**
+ * print_qman_data_on_err - extract QMAN data on error
+ *
+ * @hdev: pointer to the habanalabs device structure
+ * @qid_base: first QID of the QMAN (out of 4 streams)
+ * @stream: the QMAN's stream
+ * @qman_base: base address of QMAN registers block
+ *
+ * This function attempt to extract as much data as possible on QMAN error.
+ * On upper CP print the SW config stream data and last 8 PQEs.
+ * On lower CP print SW config data and last PQEs of ALL 4 upper CPs
+ */
+static void print_qman_data_on_err(struct hl_device *hdev, u32 qid_base, u32 stream, u64 qman_base)
+{
+ u32 i;
+
+ if (stream != QMAN_STREAMS) {
+ gaudi2_print_last_pqes_on_err(hdev, qid_base, stream, qman_base, true);
+ return;
+ }
+
+ gaudi2_print_sw_config_stream_data(hdev, stream, qman_base);
+
+ for (i = 0 ; i < QMAN_STREAMS ; i++)
+ gaudi2_print_last_pqes_on_err(hdev, qid_base, i, qman_base, false);
+}
+
+static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, u16 event_type,
+ u64 qman_base, u32 qid_base)
+{
+ u32 i, j, glbl_sts_val, arb_err_val, num_error_causes, error_count = 0;
+ u64 glbl_sts_addr, arb_err_addr;
+ char reg_desc[32];
+
+ glbl_sts_addr = qman_base + (mmDCORE0_TPC0_QM_GLBL_ERR_STS_0 - mmDCORE0_TPC0_QM_BASE);
+ arb_err_addr = qman_base + (mmDCORE0_TPC0_QM_ARB_ERR_CAUSE - mmDCORE0_TPC0_QM_BASE);
+
+ /* Iterate through all stream GLBL_ERR_STS registers + Lower CP */
+ for (i = 0 ; i < QMAN_STREAMS + 1 ; i++) {
+ glbl_sts_val = RREG32(glbl_sts_addr + 4 * i);
+
+ if (!glbl_sts_val)
+ continue;
+
+ if (i == QMAN_STREAMS) {
+ snprintf(reg_desc, ARRAY_SIZE(reg_desc), "LowerCP");
+ num_error_causes = GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE;
+ } else {
+ snprintf(reg_desc, ARRAY_SIZE(reg_desc), "stream%u", i);
+ num_error_causes = GAUDI2_NUM_OF_QM_ERR_CAUSE;
+ }
+
+ for (j = 0 ; j < num_error_causes ; j++)
+ if (glbl_sts_val & BIT(j)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "%s. err cause: %s", reg_desc,
+ i == QMAN_STREAMS ?
+ gaudi2_qman_lower_cp_error_cause[j] :
+ gaudi2_qman_error_cause[j]);
+ error_count++;
+ }
+
+ print_qman_data_on_err(hdev, qid_base, i, qman_base);
+ }
+
+ arb_err_val = RREG32(arb_err_addr);
+
+ if (!arb_err_val)
+ goto out;
+
+ for (j = 0 ; j < GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE ; j++) {
+ if (arb_err_val & BIT(j)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "ARB_ERR. err cause: %s",
+ gaudi2_qman_arb_error_cause[j]);
+ error_count++;
+ }
+ }
+
+out:
+ return error_count;
+}
+
+static void gaudi2_razwi_rr_hbw_shared_printf_info(struct hl_device *hdev,
+ u64 rtr_mstr_if_base_addr, bool is_write, char *name,
+ enum gaudi2_engine_id id, u64 *event_mask)
+{
+ u32 razwi_hi, razwi_lo, razwi_xy;
+ u16 eng_id = id;
+ u8 rd_wr_flag;
+
+ if (is_write) {
+ razwi_hi = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HI);
+ razwi_lo = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_LO);
+ razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_XY);
+ rd_wr_flag = HL_RAZWI_WRITE;
+ } else {
+ razwi_hi = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HI);
+ razwi_lo = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_LO);
+ razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_XY);
+ rd_wr_flag = HL_RAZWI_READ;
+ }
+
+ hl_handle_razwi(hdev, (u64)razwi_hi << 32 | razwi_lo, &eng_id, 1,
+ rd_wr_flag | HL_RAZWI_HBW, event_mask);
+
+ dev_err_ratelimited(hdev->dev,
+ "%s-RAZWI SHARED RR HBW %s error, address %#llx, Initiator coordinates 0x%x\n",
+ name, is_write ? "WR" : "RD", (u64)razwi_hi << 32 | razwi_lo, razwi_xy);
+}
+
+static void gaudi2_razwi_rr_lbw_shared_printf_info(struct hl_device *hdev,
+ u64 rtr_mstr_if_base_addr, bool is_write, char *name,
+ enum gaudi2_engine_id id, u64 *event_mask)
+{
+ u64 razwi_addr = CFG_BASE;
+ u32 razwi_xy;
+ u16 eng_id = id;
+ u8 rd_wr_flag;
+
+ if (is_write) {
+ razwi_addr += RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI);
+ razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_XY);
+ rd_wr_flag = HL_RAZWI_WRITE;
+ } else {
+ razwi_addr += RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI);
+ razwi_xy = RREG32(rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_XY);
+ rd_wr_flag = HL_RAZWI_READ;
+ }
+
+ hl_handle_razwi(hdev, razwi_addr, &eng_id, 1, rd_wr_flag | HL_RAZWI_LBW, event_mask);
+ dev_err_ratelimited(hdev->dev,
+ "%s-RAZWI SHARED RR LBW %s error, mstr_if 0x%llx, captured address 0x%llX Initiator coordinates 0x%x\n",
+ name, is_write ? "WR" : "RD", rtr_mstr_if_base_addr, razwi_addr,
+ razwi_xy);
+}
+
+static enum gaudi2_engine_id gaudi2_razwi_calc_engine_id(struct hl_device *hdev,
+ enum razwi_event_sources module, u8 module_idx)
+{
+ switch (module) {
+ case RAZWI_TPC:
+ if (module_idx == (NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES))
+ return GAUDI2_DCORE0_ENGINE_ID_TPC_6;
+ return (((module_idx / NUM_OF_TPC_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
+ (module_idx % NUM_OF_TPC_PER_DCORE) +
+ (GAUDI2_DCORE0_ENGINE_ID_TPC_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0));
+
+ case RAZWI_MME:
+ return ((GAUDI2_DCORE0_ENGINE_ID_MME - GAUDI2_DCORE0_ENGINE_ID_EDMA_0) +
+ (module_idx * ENGINE_ID_DCORE_OFFSET));
+
+ case RAZWI_EDMA:
+ return (((module_idx / NUM_OF_EDMA_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
+ (module_idx % NUM_OF_EDMA_PER_DCORE));
+
+ case RAZWI_PDMA:
+ return (GAUDI2_ENGINE_ID_PDMA_0 + module_idx);
+
+ case RAZWI_NIC:
+ return (GAUDI2_ENGINE_ID_NIC0_0 + (NIC_NUMBER_OF_QM_PER_MACRO * module_idx));
+
+ case RAZWI_DEC:
+ if (module_idx == 8)
+ return GAUDI2_PCIE_ENGINE_ID_DEC_0;
+
+ if (module_idx == 9)
+ return GAUDI2_PCIE_ENGINE_ID_DEC_1;
+ ;
+ return (((module_idx / NUM_OF_DEC_PER_DCORE) * ENGINE_ID_DCORE_OFFSET) +
+ (module_idx % NUM_OF_DEC_PER_DCORE) +
+ (GAUDI2_DCORE0_ENGINE_ID_DEC_0 - GAUDI2_DCORE0_ENGINE_ID_EDMA_0));
+
+ case RAZWI_ROT:
+ return GAUDI2_ENGINE_ID_ROT_0 + module_idx;
+
+ default:
+ return GAUDI2_ENGINE_ID_SIZE;
+ }
+}
+
+/*
+ * This function handles RR(Range register) hit events.
+ * raised be initiators not PSOC RAZWI.
+ */
+static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
+ enum razwi_event_sources module, u8 module_idx,
+ u8 module_sub_idx, u64 *event_mask)
+{
+ bool via_sft = false;
+ u32 hbw_rtr_id, lbw_rtr_id, dcore_id, dcore_rtr_id, eng_id;
+ u64 hbw_rtr_mstr_if_base_addr, lbw_rtr_mstr_if_base_addr;
+ u32 hbw_shrd_aw = 0, hbw_shrd_ar = 0;
+ u32 lbw_shrd_aw = 0, lbw_shrd_ar = 0;
+ char initiator_name[64];
+
+ switch (module) {
+ case RAZWI_TPC:
+ hbw_rtr_id = gaudi2_tpc_initiator_hbw_rtr_id[module_idx];
+
+ /* TODO : remove this check and depend only on tpc routers table
+ * when SW-118828 is resolved
+ */
+ if (!hdev->asic_prop.fw_security_enabled &&
+ ((module_idx == 0) || (module_idx == 1)))
+ lbw_rtr_id = DCORE0_RTR0;
+ else
+ lbw_rtr_id = gaudi2_tpc_initiator_lbw_rtr_id[module_idx];
+ sprintf(initiator_name, "TPC_%u", module_idx);
+ break;
+ case RAZWI_MME:
+ sprintf(initiator_name, "MME_%u", module_idx);
+ switch (module_sub_idx) {
+ case MME_WAP0:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].wap0;
+ break;
+ case MME_WAP1:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].wap1;
+ break;
+ case MME_WRITE:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].write;
+ break;
+ case MME_READ:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].read;
+ break;
+ case MME_SBTE0:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte0;
+ break;
+ case MME_SBTE1:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte1;
+ break;
+ case MME_SBTE2:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte2;
+ break;
+ case MME_SBTE3:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte3;
+ break;
+ case MME_SBTE4:
+ hbw_rtr_id = gaudi2_mme_initiator_rtr_id[module_idx].sbte4;
+ break;
+ default:
+ return;
+ }
+ lbw_rtr_id = hbw_rtr_id;
+ break;
+ case RAZWI_EDMA:
+ hbw_rtr_mstr_if_base_addr = gaudi2_edma_initiator_hbw_sft[module_idx];
+ dcore_id = module_idx / NUM_OF_EDMA_PER_DCORE;
+ /* SFT has separate MSTR_IF for LBW, only there we can
+ * read the LBW razwi related registers
+ */
+ lbw_rtr_mstr_if_base_addr = mmSFT0_LBW_RTR_IF_MSTR_IF_RR_SHRD_HBW_BASE +
+ dcore_id * SFT_DCORE_OFFSET;
+ via_sft = true;
+ sprintf(initiator_name, "EDMA_%u", module_idx);
+ break;
+ case RAZWI_PDMA:
+ hbw_rtr_id = gaudi2_pdma_initiator_hbw_rtr_id[module_idx];
+ lbw_rtr_id = gaudi2_pdma_initiator_lbw_rtr_id[module_idx];
+ sprintf(initiator_name, "PDMA_%u", module_idx);
+ break;
+ case RAZWI_NIC:
+ hbw_rtr_id = gaudi2_nic_initiator_hbw_rtr_id[module_idx];
+ lbw_rtr_id = gaudi2_nic_initiator_lbw_rtr_id[module_idx];
+ sprintf(initiator_name, "NIC_%u", module_idx);
+ break;
+ case RAZWI_DEC:
+ hbw_rtr_id = gaudi2_dec_initiator_hbw_rtr_id[module_idx];
+ lbw_rtr_id = gaudi2_dec_initiator_lbw_rtr_id[module_idx];
+ sprintf(initiator_name, "DEC_%u", module_idx);
+ break;
+ case RAZWI_ROT:
+ hbw_rtr_id = gaudi2_rot_initiator_hbw_rtr_id[module_idx];
+ lbw_rtr_id = gaudi2_rot_initiator_lbw_rtr_id[module_idx];
+ sprintf(initiator_name, "ROT_%u", module_idx);
+ break;
+ default:
+ return;
+ }
+
+ /* Find router mstr_if register base */
+ if (!via_sft) {
+ dcore_id = hbw_rtr_id / NUM_OF_RTR_PER_DCORE;
+ dcore_rtr_id = hbw_rtr_id % NUM_OF_RTR_PER_DCORE;
+ hbw_rtr_mstr_if_base_addr = mmDCORE0_RTR0_CTRL_BASE +
+ dcore_id * DCORE_OFFSET +
+ dcore_rtr_id * DCORE_RTR_OFFSET +
+ RTR_MSTR_IF_OFFSET;
+ lbw_rtr_mstr_if_base_addr = hbw_rtr_mstr_if_base_addr +
+ (((s32)lbw_rtr_id - hbw_rtr_id) * DCORE_RTR_OFFSET);
+ }
+
+ /* Find out event cause by reading "RAZWI_HAPPENED" registers */
+ hbw_shrd_aw = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED);
+ hbw_shrd_ar = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED);
+ lbw_shrd_aw = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED);
+ lbw_shrd_ar = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED);
+
+ eng_id = gaudi2_razwi_calc_engine_id(hdev, module, module_idx);
+ if (hbw_shrd_aw) {
+ gaudi2_razwi_rr_hbw_shared_printf_info(hdev, hbw_rtr_mstr_if_base_addr, true,
+ initiator_name, eng_id, event_mask);
+
+ /* Clear event indication */
+ WREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED, hbw_shrd_aw);
+ }
+
+ if (hbw_shrd_ar) {
+ gaudi2_razwi_rr_hbw_shared_printf_info(hdev, hbw_rtr_mstr_if_base_addr, false,
+ initiator_name, eng_id, event_mask);
+
+ /* Clear event indication */
+ WREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED, hbw_shrd_ar);
+ }
+
+ if (lbw_shrd_aw) {
+ gaudi2_razwi_rr_lbw_shared_printf_info(hdev, lbw_rtr_mstr_if_base_addr, true,
+ initiator_name, eng_id, event_mask);
+
+ /* Clear event indication */
+ WREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED, lbw_shrd_aw);
+ }
+
+ if (lbw_shrd_ar) {
+ gaudi2_razwi_rr_lbw_shared_printf_info(hdev, lbw_rtr_mstr_if_base_addr, false,
+ initiator_name, eng_id, event_mask);
+
+ /* Clear event indication */
+ WREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED, lbw_shrd_ar);
+ }
+}
+
+static void gaudi2_check_if_razwi_happened(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u8 mod_idx, sub_mod;
+
+ /* check all TPCs */
+ for (mod_idx = 0 ; mod_idx < (NUM_OF_TPC_PER_DCORE * NUM_OF_DCORES + 1) ; mod_idx++) {
+ if (prop->tpc_enabled_mask & BIT(mod_idx))
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, mod_idx, 0, NULL);
+ }
+
+ /* check all MMEs */
+ for (mod_idx = 0 ; mod_idx < (NUM_OF_MME_PER_DCORE * NUM_OF_DCORES) ; mod_idx++)
+ for (sub_mod = MME_WAP0 ; sub_mod < MME_INITIATORS_MAX ; sub_mod++)
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mod_idx,
+ sub_mod, NULL);
+
+ /* check all EDMAs */
+ for (mod_idx = 0 ; mod_idx < (NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES) ; mod_idx++)
+ if (prop->edma_enabled_mask & BIT(mod_idx))
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, mod_idx, 0, NULL);
+
+ /* check all PDMAs */
+ for (mod_idx = 0 ; mod_idx < NUM_OF_PDMA ; mod_idx++)
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_PDMA, mod_idx, 0, NULL);
+
+ /* check all NICs */
+ for (mod_idx = 0 ; mod_idx < NIC_NUMBER_OF_PORTS ; mod_idx++)
+ if (hdev->nic_ports_mask & BIT(mod_idx))
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_NIC, mod_idx >> 1, 0,
+ NULL);
+
+ /* check all DECs */
+ for (mod_idx = 0 ; mod_idx < NUMBER_OF_DEC ; mod_idx++)
+ if (prop->decoder_enabled_mask & BIT(mod_idx))
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_DEC, mod_idx, 0, NULL);
+
+ /* check all ROTs */
+ for (mod_idx = 0 ; mod_idx < NUM_OF_ROT ; mod_idx++)
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, mod_idx, 0, NULL);
+}
+
+static const char *gaudi2_get_initiators_name(u32 rtr_id)
+{
+ switch (rtr_id) {
+ case DCORE0_RTR0:
+ return "DEC0/1/8/9, TPC24, PDMA0/1, PMMU, PCIE_IF, EDMA0/2, HMMU0/2/4/6, CPU";
+ case DCORE0_RTR1:
+ return "TPC0/1";
+ case DCORE0_RTR2:
+ return "TPC2/3";
+ case DCORE0_RTR3:
+ return "TPC4/5";
+ case DCORE0_RTR4:
+ return "MME0_SBTE0/1";
+ case DCORE0_RTR5:
+ return "MME0_WAP0/SBTE2";
+ case DCORE0_RTR6:
+ return "MME0_CTRL_WR/SBTE3";
+ case DCORE0_RTR7:
+ return "MME0_WAP1/CTRL_RD/SBTE4";
+ case DCORE1_RTR0:
+ return "MME1_WAP1/CTRL_RD/SBTE4";
+ case DCORE1_RTR1:
+ return "MME1_CTRL_WR/SBTE3";
+ case DCORE1_RTR2:
+ return "MME1_WAP0/SBTE2";
+ case DCORE1_RTR3:
+ return "MME1_SBTE0/1";
+ case DCORE1_RTR4:
+ return "TPC10/11";
+ case DCORE1_RTR5:
+ return "TPC8/9";
+ case DCORE1_RTR6:
+ return "TPC6/7";
+ case DCORE1_RTR7:
+ return "DEC2/3, NIC0/1/2/3/4, ARC_FARM, KDMA, EDMA1/3, HMMU1/3/5/7";
+ case DCORE2_RTR0:
+ return "DEC4/5, NIC5/6/7/8, EDMA4/6, HMMU8/10/12/14, ROT0";
+ case DCORE2_RTR1:
+ return "TPC16/17";
+ case DCORE2_RTR2:
+ return "TPC14/15";
+ case DCORE2_RTR3:
+ return "TPC12/13";
+ case DCORE2_RTR4:
+ return "MME2_SBTE0/1";
+ case DCORE2_RTR5:
+ return "MME2_WAP0/SBTE2";
+ case DCORE2_RTR6:
+ return "MME2_CTRL_WR/SBTE3";
+ case DCORE2_RTR7:
+ return "MME2_WAP1/CTRL_RD/SBTE4";
+ case DCORE3_RTR0:
+ return "MME3_WAP1/CTRL_RD/SBTE4";
+ case DCORE3_RTR1:
+ return "MME3_CTRL_WR/SBTE3";
+ case DCORE3_RTR2:
+ return "MME3_WAP0/SBTE2";
+ case DCORE3_RTR3:
+ return "MME3_SBTE0/1";
+ case DCORE3_RTR4:
+ return "TPC18/19";
+ case DCORE3_RTR5:
+ return "TPC20/21";
+ case DCORE3_RTR6:
+ return "TPC22/23";
+ case DCORE3_RTR7:
+ return "DEC6/7, NIC9/10/11, EDMA5/7, HMMU9/11/13/15, ROT1, PSOC";
+ default:
+ return "N/A";
+ }
+}
+
+static u16 gaudi2_get_razwi_initiators(u32 rtr_id, u16 *engines)
+{
+ switch (rtr_id) {
+ case DCORE0_RTR0:
+ engines[0] = GAUDI2_DCORE0_ENGINE_ID_DEC_0;
+ engines[1] = GAUDI2_DCORE0_ENGINE_ID_DEC_1;
+ engines[2] = GAUDI2_PCIE_ENGINE_ID_DEC_0;
+ engines[3] = GAUDI2_PCIE_ENGINE_ID_DEC_1;
+ engines[4] = GAUDI2_DCORE0_ENGINE_ID_TPC_6;
+ engines[5] = GAUDI2_ENGINE_ID_PDMA_0;
+ engines[6] = GAUDI2_ENGINE_ID_PDMA_1;
+ engines[7] = GAUDI2_ENGINE_ID_PCIE;
+ engines[8] = GAUDI2_DCORE0_ENGINE_ID_EDMA_0;
+ engines[9] = GAUDI2_DCORE1_ENGINE_ID_EDMA_0;
+ engines[10] = GAUDI2_ENGINE_ID_PSOC;
+ return 11;
+
+ case DCORE0_RTR1:
+ engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_0;
+ engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_1;
+ return 2;
+
+ case DCORE0_RTR2:
+ engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_2;
+ engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_3;
+ return 2;
+
+ case DCORE0_RTR3:
+ engines[0] = GAUDI2_DCORE0_ENGINE_ID_TPC_4;
+ engines[1] = GAUDI2_DCORE0_ENGINE_ID_TPC_5;
+ return 2;
+
+ case DCORE0_RTR4:
+ case DCORE0_RTR5:
+ case DCORE0_RTR6:
+ case DCORE0_RTR7:
+ engines[0] = GAUDI2_DCORE0_ENGINE_ID_MME;
+ return 1;
+
+ case DCORE1_RTR0:
+ case DCORE1_RTR1:
+ case DCORE1_RTR2:
+ case DCORE1_RTR3:
+ engines[0] = GAUDI2_DCORE1_ENGINE_ID_MME;
+ return 1;
+
+ case DCORE1_RTR4:
+ engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_4;
+ engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_5;
+ return 2;
+
+ case DCORE1_RTR5:
+ engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_2;
+ engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_3;
+ return 2;
+
+ case DCORE1_RTR6:
+ engines[0] = GAUDI2_DCORE1_ENGINE_ID_TPC_0;
+ engines[1] = GAUDI2_DCORE1_ENGINE_ID_TPC_1;
+ return 2;
+
+ case DCORE1_RTR7:
+ engines[0] = GAUDI2_DCORE1_ENGINE_ID_DEC_0;
+ engines[1] = GAUDI2_DCORE1_ENGINE_ID_DEC_1;
+ engines[2] = GAUDI2_ENGINE_ID_NIC0_0;
+ engines[3] = GAUDI2_ENGINE_ID_NIC1_0;
+ engines[4] = GAUDI2_ENGINE_ID_NIC2_0;
+ engines[5] = GAUDI2_ENGINE_ID_NIC3_0;
+ engines[6] = GAUDI2_ENGINE_ID_NIC4_0;
+ engines[7] = GAUDI2_ENGINE_ID_ARC_FARM;
+ engines[8] = GAUDI2_ENGINE_ID_KDMA;
+ engines[9] = GAUDI2_DCORE0_ENGINE_ID_EDMA_1;
+ engines[10] = GAUDI2_DCORE1_ENGINE_ID_EDMA_1;
+ return 11;
+
+ case DCORE2_RTR0:
+ engines[0] = GAUDI2_DCORE2_ENGINE_ID_DEC_0;
+ engines[1] = GAUDI2_DCORE2_ENGINE_ID_DEC_1;
+ engines[2] = GAUDI2_ENGINE_ID_NIC5_0;
+ engines[3] = GAUDI2_ENGINE_ID_NIC6_0;
+ engines[4] = GAUDI2_ENGINE_ID_NIC7_0;
+ engines[5] = GAUDI2_ENGINE_ID_NIC8_0;
+ engines[6] = GAUDI2_DCORE2_ENGINE_ID_EDMA_0;
+ engines[7] = GAUDI2_DCORE3_ENGINE_ID_EDMA_0;
+ engines[8] = GAUDI2_ENGINE_ID_ROT_0;
+ return 9;
+
+ case DCORE2_RTR1:
+ engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_4;
+ engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_5;
+ return 2;
+
+ case DCORE2_RTR2:
+ engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_2;
+ engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_3;
+ return 2;
+
+ case DCORE2_RTR3:
+ engines[0] = GAUDI2_DCORE2_ENGINE_ID_TPC_0;
+ engines[1] = GAUDI2_DCORE2_ENGINE_ID_TPC_1;
+ return 2;
+
+ case DCORE2_RTR4:
+ case DCORE2_RTR5:
+ case DCORE2_RTR6:
+ case DCORE2_RTR7:
+ engines[0] = GAUDI2_DCORE2_ENGINE_ID_MME;
+ return 1;
+ case DCORE3_RTR0:
+ case DCORE3_RTR1:
+ case DCORE3_RTR2:
+ case DCORE3_RTR3:
+ engines[0] = GAUDI2_DCORE3_ENGINE_ID_MME;
+ return 1;
+ case DCORE3_RTR4:
+ engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_0;
+ engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_1;
+ return 2;
+ case DCORE3_RTR5:
+ engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_2;
+ engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_3;
+ return 2;
+ case DCORE3_RTR6:
+ engines[0] = GAUDI2_DCORE3_ENGINE_ID_TPC_4;
+ engines[1] = GAUDI2_DCORE3_ENGINE_ID_TPC_5;
+ return 2;
+ case DCORE3_RTR7:
+ engines[0] = GAUDI2_DCORE3_ENGINE_ID_DEC_0;
+ engines[1] = GAUDI2_DCORE3_ENGINE_ID_DEC_1;
+ engines[2] = GAUDI2_ENGINE_ID_NIC9_0;
+ engines[3] = GAUDI2_ENGINE_ID_NIC10_0;
+ engines[4] = GAUDI2_ENGINE_ID_NIC11_0;
+ engines[5] = GAUDI2_DCORE2_ENGINE_ID_EDMA_1;
+ engines[6] = GAUDI2_DCORE3_ENGINE_ID_EDMA_1;
+ engines[7] = GAUDI2_ENGINE_ID_ROT_1;
+ engines[8] = GAUDI2_ENGINE_ID_ROT_0;
+ return 9;
+ default:
+ return 0;
+ }
+}
+
+static void gaudi2_razwi_unmapped_addr_hbw_printf_info(struct hl_device *hdev, u32 rtr_id,
+ u64 rtr_ctrl_base_addr, bool is_write,
+ u64 *event_mask)
+{
+ u16 engines[HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR], num_of_eng;
+ u32 razwi_hi, razwi_lo;
+ u8 rd_wr_flag;
+
+ num_of_eng = gaudi2_get_razwi_initiators(rtr_id, &engines[0]);
+
+ if (is_write) {
+ razwi_hi = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_ADDR_HI);
+ razwi_lo = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_ADDR_LO);
+ rd_wr_flag = HL_RAZWI_WRITE;
+
+ /* Clear set indication */
+ WREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_SET, 0x1);
+ } else {
+ razwi_hi = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_ADDR_HI);
+ razwi_lo = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_ADDR_LO);
+ rd_wr_flag = HL_RAZWI_READ;
+
+ /* Clear set indication */
+ WREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_SET, 0x1);
+ }
+
+ hl_handle_razwi(hdev, (u64)razwi_hi << 32 | razwi_lo, &engines[0], num_of_eng,
+ rd_wr_flag | HL_RAZWI_HBW, event_mask);
+ dev_err_ratelimited(hdev->dev,
+ "RAZWI PSOC unmapped HBW %s error, rtr id %u, address %#llx\n",
+ is_write ? "WR" : "RD", rtr_id, (u64)razwi_hi << 32 | razwi_lo);
+
+ dev_err_ratelimited(hdev->dev,
+ "Initiators: %s\n", gaudi2_get_initiators_name(rtr_id));
+}
+
+static void gaudi2_razwi_unmapped_addr_lbw_printf_info(struct hl_device *hdev, u32 rtr_id,
+ u64 rtr_ctrl_base_addr, bool is_write,
+ u64 *event_mask)
+{
+ u16 engines[HL_RAZWI_MAX_NUM_OF_ENGINES_PER_RTR], num_of_eng;
+ u64 razwi_addr = CFG_BASE;
+ u8 rd_wr_flag;
+
+ num_of_eng = gaudi2_get_razwi_initiators(rtr_id, &engines[0]);
+
+ if (is_write) {
+ razwi_addr += RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_ADDR);
+ rd_wr_flag = HL_RAZWI_WRITE;
+
+ /* Clear set indication */
+ WREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_SET, 0x1);
+ } else {
+ razwi_addr += RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_ADDR);
+ rd_wr_flag = HL_RAZWI_READ;
+
+ /* Clear set indication */
+ WREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_SET, 0x1);
+ }
+
+ hl_handle_razwi(hdev, razwi_addr, &engines[0], num_of_eng, rd_wr_flag | HL_RAZWI_LBW,
+ event_mask);
+ dev_err_ratelimited(hdev->dev,
+ "RAZWI PSOC unmapped LBW %s error, rtr id %u, address 0x%llX\n",
+ is_write ? "WR" : "RD", rtr_id, razwi_addr);
+
+ dev_err_ratelimited(hdev->dev,
+ "Initiators: %s\n", gaudi2_get_initiators_name(rtr_id));
+}
+
+/* PSOC RAZWI interrupt occurs only when trying to access a bad address */
+static int gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *event_mask)
+{
+ u32 hbw_aw_set, hbw_ar_set, lbw_aw_set, lbw_ar_set, rtr_id, dcore_id, dcore_rtr_id, xy,
+ razwi_mask_info, razwi_intr = 0, error_count = 0;
+ int rtr_map_arr_len = NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES;
+ u64 rtr_ctrl_base_addr;
+
+ if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX)) {
+ razwi_intr = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT);
+ if (!razwi_intr)
+ return 0;
+ }
+
+ razwi_mask_info = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_MASK_INFO);
+ xy = FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_AXUSER_L_MASK, razwi_mask_info);
+
+ dev_err_ratelimited(hdev->dev,
+ "PSOC RAZWI interrupt: Mask %d, AR %d, AW %d, AXUSER_L 0x%x AXUSER_H 0x%x\n",
+ FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_MASK_MASK, razwi_mask_info),
+ FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_WAS_AR_MASK, razwi_mask_info),
+ FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_WAS_AW_MASK, razwi_mask_info),
+ xy,
+ FIELD_GET(PSOC_GLOBAL_CONF_RAZWI_MASK_INFO_AXUSER_H_MASK, razwi_mask_info));
+
+ if (xy == 0) {
+ dev_err_ratelimited(hdev->dev,
+ "PSOC RAZWI interrupt: received event from 0 rtr coordinates\n");
+ goto clear;
+ }
+
+ /* Find router id by router coordinates */
+ for (rtr_id = 0 ; rtr_id < rtr_map_arr_len ; rtr_id++)
+ if (rtr_coordinates_to_rtr_id[rtr_id] == xy)
+ break;
+
+ if (rtr_id == rtr_map_arr_len) {
+ dev_err_ratelimited(hdev->dev,
+ "PSOC RAZWI interrupt: invalid rtr coordinates (0x%x)\n", xy);
+ goto clear;
+ }
+
+ /* Find router mstr_if register base */
+ dcore_id = rtr_id / NUM_OF_RTR_PER_DCORE;
+ dcore_rtr_id = rtr_id % NUM_OF_RTR_PER_DCORE;
+ rtr_ctrl_base_addr = mmDCORE0_RTR0_CTRL_BASE + dcore_id * DCORE_OFFSET +
+ dcore_rtr_id * DCORE_RTR_OFFSET;
+
+ hbw_aw_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AW_SET);
+ hbw_ar_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_HBW_AR_SET);
+ lbw_aw_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AW_SET);
+ lbw_ar_set = RREG32(rtr_ctrl_base_addr + DEC_RAZWI_LBW_AR_SET);
+
+ if (hbw_aw_set)
+ gaudi2_razwi_unmapped_addr_hbw_printf_info(hdev, rtr_id,
+ rtr_ctrl_base_addr, true, event_mask);
+
+ if (hbw_ar_set)
+ gaudi2_razwi_unmapped_addr_hbw_printf_info(hdev, rtr_id,
+ rtr_ctrl_base_addr, false, event_mask);
+
+ if (lbw_aw_set)
+ gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id,
+ rtr_ctrl_base_addr, true, event_mask);
+
+ if (lbw_ar_set)
+ gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id,
+ rtr_ctrl_base_addr, false, event_mask);
+
+ error_count++;
+
+clear:
+ /* Clear Interrupts only on pldm or if f/w doesn't handle interrupts */
+ if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX))
+ WREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT, razwi_intr);
+
+ return error_count;
+}
+
+static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base, u16 event_type)
+{
+ u32 i, sts_val, sts_clr_val = 0, error_count = 0;
+
+ sts_val = RREG32(qman_base + QM_SEI_STATUS_OFFSET);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE ; i++) {
+ if (sts_val & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_qm_sei_error_cause[i]);
+ sts_clr_val |= BIT(i);
+ error_count++;
+ }
+ }
+
+ WREG32(qman_base + QM_SEI_STATUS_OFFSET, sts_clr_val);
+
+ return error_count;
+}
+
+static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type,
+ bool extended_err_check, u64 *event_mask)
+{
+ enum razwi_event_sources module;
+ u32 error_count = 0;
+ u64 qman_base;
+ u8 index;
+
+ switch (event_type) {
+ case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC23_AXI_ERR_RSP:
+ index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP;
+ qman_base = mmDCORE0_TPC0_QM_BASE +
+ (index / NUM_OF_TPC_PER_DCORE) * DCORE_OFFSET +
+ (index % NUM_OF_TPC_PER_DCORE) * DCORE_TPC_OFFSET;
+ module = RAZWI_TPC;
+ break;
+ case GAUDI2_EVENT_TPC24_AXI_ERR_RSP:
+ qman_base = mmDCORE0_TPC6_QM_BASE;
+ module = RAZWI_TPC;
+ break;
+ case GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_MME2_CTRL_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_MME3_CTRL_AXI_ERROR_RESPONSE:
+ index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) /
+ (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE -
+ GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE);
+ qman_base = mmDCORE0_MME_QM_BASE + index * DCORE_OFFSET;
+ module = RAZWI_MME;
+ break;
+ case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP:
+ case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP:
+ index = event_type - GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP;
+ qman_base = mmPDMA0_QM_BASE + index * PDMA_OFFSET;
+ module = RAZWI_PDMA;
+ break;
+ case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE:
+ index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE;
+ qman_base = mmROT0_QM_BASE + index * ROT_OFFSET;
+ module = RAZWI_ROT;
+ break;
+ default:
+ return 0;
+ }
+
+ error_count = _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
+
+ /* There is a single event per NIC macro, so should check its both QMAN blocks */
+ if (event_type >= GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE &&
+ event_type <= GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE)
+ error_count += _gaudi2_handle_qm_sei_err(hdev,
+ qman_base + NIC_QM_OFFSET, event_type);
+
+ if (extended_err_check) {
+ /* check if RAZWI happened */
+ gaudi2_ack_module_razwi_event_handler(hdev, module, 0, 0, event_mask);
+ hl_check_for_glbl_errors(hdev);
+ }
+
+ return error_count;
+}
+
+static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
+{
+ u32 qid_base, error_count = 0;
+ u64 qman_base;
+ u8 index;
+
+ switch (event_type) {
+ case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_TPC5_QM:
+ index = event_type - GAUDI2_EVENT_TPC0_QM;
+ qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 + index * QMAN_STREAMS;
+ qman_base = mmDCORE0_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
+ break;
+ case GAUDI2_EVENT_TPC6_QM ... GAUDI2_EVENT_TPC11_QM:
+ index = event_type - GAUDI2_EVENT_TPC6_QM;
+ qid_base = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 + index * QMAN_STREAMS;
+ qman_base = mmDCORE1_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
+ break;
+ case GAUDI2_EVENT_TPC12_QM ... GAUDI2_EVENT_TPC17_QM:
+ index = event_type - GAUDI2_EVENT_TPC12_QM;
+ qid_base = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 + index * QMAN_STREAMS;
+ qman_base = mmDCORE2_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
+ break;
+ case GAUDI2_EVENT_TPC18_QM ... GAUDI2_EVENT_TPC23_QM:
+ index = event_type - GAUDI2_EVENT_TPC18_QM;
+ qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 + index * QMAN_STREAMS;
+ qman_base = mmDCORE3_TPC0_QM_BASE + index * DCORE_TPC_OFFSET;
+ break;
+ case GAUDI2_EVENT_TPC24_QM:
+ qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0;
+ qman_base = mmDCORE0_TPC6_QM_BASE;
+ break;
+ case GAUDI2_EVENT_MME0_QM:
+ qid_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0;
+ qman_base = mmDCORE0_MME_QM_BASE;
+ break;
+ case GAUDI2_EVENT_MME1_QM:
+ qid_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0;
+ qman_base = mmDCORE1_MME_QM_BASE;
+ break;
+ case GAUDI2_EVENT_MME2_QM:
+ qid_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0;
+ qman_base = mmDCORE2_MME_QM_BASE;
+ break;
+ case GAUDI2_EVENT_MME3_QM:
+ qid_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0;
+ qman_base = mmDCORE3_MME_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA0_QM:
+ index = 0;
+ qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0;
+ qman_base = mmDCORE0_EDMA0_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA1_QM:
+ index = 1;
+ qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0;
+ qman_base = mmDCORE0_EDMA1_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA2_QM:
+ index = 2;
+ qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0;
+ qman_base = mmDCORE1_EDMA0_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA3_QM:
+ index = 3;
+ qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0;
+ qman_base = mmDCORE1_EDMA1_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA4_QM:
+ index = 4;
+ qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0;
+ qman_base = mmDCORE2_EDMA0_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA5_QM:
+ index = 5;
+ qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0;
+ qman_base = mmDCORE2_EDMA1_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA6_QM:
+ index = 6;
+ qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0;
+ qman_base = mmDCORE3_EDMA0_QM_BASE;
+ break;
+ case GAUDI2_EVENT_HDMA7_QM:
+ index = 7;
+ qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0;
+ qman_base = mmDCORE3_EDMA1_QM_BASE;
+ break;
+ case GAUDI2_EVENT_PDMA0_QM:
+ qid_base = GAUDI2_QUEUE_ID_PDMA_0_0;
+ qman_base = mmPDMA0_QM_BASE;
+ break;
+ case GAUDI2_EVENT_PDMA1_QM:
+ qid_base = GAUDI2_QUEUE_ID_PDMA_1_0;
+ qman_base = mmPDMA1_QM_BASE;
+ break;
+ case GAUDI2_EVENT_ROTATOR0_ROT0_QM:
+ qid_base = GAUDI2_QUEUE_ID_ROT_0_0;
+ qman_base = mmROT0_QM_BASE;
+ break;
+ case GAUDI2_EVENT_ROTATOR1_ROT1_QM:
+ qid_base = GAUDI2_QUEUE_ID_ROT_1_0;
+ qman_base = mmROT1_QM_BASE;
+ break;
+ default:
+ return 0;
+ }
+
+ error_count = gaudi2_handle_qman_err_generic(hdev, event_type, qman_base, qid_base);
+
+ /* Handle EDMA QM SEI here because there is no AXI error response event for EDMA */
+ if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM) {
+ error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, index, 0, event_mask);
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev, u16 event_type)
+{
+ u32 i, sts_val, sts_clr_val = 0, error_count = 0;
+
+ sts_val = RREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_STS);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE ; i++) {
+ if (sts_val & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_arc_sei_error_cause[i]);
+ sts_clr_val |= BIT(i);
+ error_count++;
+ }
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ WREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_CLR, sts_clr_val);
+
+ return error_count;
+}
+
+static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev, u16 event_type)
+{
+ u32 i, sts_val, sts_clr_val = 0, error_count = 0;
+
+ sts_val = RREG32(mmCPU_IF_CPU_SEI_INTR_STS);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE ; i++) {
+ if (sts_val & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_cpu_sei_error_cause[i]);
+ sts_clr_val |= BIT(i);
+ error_count++;
+ }
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ WREG32(mmCPU_IF_CPU_SEI_INTR_CLR, sts_clr_val);
+
+ return error_count;
+}
+
+static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, u16 event_type,
+ struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause,
+ u64 *event_mask)
+{
+ u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data);
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_ROT_ERR_CAUSE ; i++)
+ if (intr_cause_data & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", guadi2_rot_error_cause[i]);
+ error_count++;
+ }
+
+ /* check if RAZWI happened */
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, rot_index, 0, event_mask);
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, u16 event_type,
+ struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause,
+ u64 *event_mask)
+{
+ u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data);
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_TPC_INTR_CAUSE ; i++)
+ if (intr_cause_data & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "interrupt cause: %s", gaudi2_tpc_interrupts_cause[i]);
+ error_count++;
+ }
+
+ /* check if RAZWI happened */
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, tpc_index, 0, event_mask);
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, u16 event_type,
+ u64 *event_mask)
+{
+ u32 sts_addr, sts_val, sts_clr_val = 0, error_count = 0;
+ int i;
+
+ if (dec_index < NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES)
+ /* DCORE DEC */
+ sts_addr = mmDCORE0_VDEC0_BRDG_CTRL_CAUSE_INTR +
+ DCORE_OFFSET * (dec_index / NUM_OF_DEC_PER_DCORE) +
+ DCORE_VDEC_OFFSET * (dec_index % NUM_OF_DEC_PER_DCORE);
+ else
+ /* PCIE DEC */
+ sts_addr = mmPCIE_VDEC0_BRDG_CTRL_CAUSE_INTR + PCIE_VDEC_OFFSET *
+ (dec_index - NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES);
+
+ sts_val = RREG32(sts_addr);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_DEC_ERR_CAUSE ; i++) {
+ if (sts_val & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_dec_error_cause[i]);
+ sts_clr_val |= BIT(i);
+ error_count++;
+ }
+ }
+
+ /* check if RAZWI happened */
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_DEC, dec_index, 0, event_mask);
+ hl_check_for_glbl_errors(hdev);
+
+ /* Write 1 clear errors */
+ WREG32(sts_addr, sts_clr_val);
+
+ return error_count;
+}
+
+static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, u16 event_type,
+ u64 *event_mask)
+{
+ u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0;
+ int i;
+
+ sts_addr = mmDCORE0_MME_CTRL_LO_INTR_CAUSE + DCORE_OFFSET * mme_index;
+ sts_clr_addr = mmDCORE0_MME_CTRL_LO_INTR_CLEAR + DCORE_OFFSET * mme_index;
+
+ sts_val = RREG32(sts_addr);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_MME_ERR_CAUSE ; i++) {
+ if (sts_val & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", guadi2_mme_error_cause[i]);
+ sts_clr_val |= BIT(i);
+ error_count++;
+ }
+ }
+
+ /* check if RAZWI happened */
+ for (i = MME_WRITE ; i < MME_INITIATORS_MAX ; i++)
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, i, event_mask);
+
+ hl_check_for_glbl_errors(hdev);
+
+ WREG32(sts_clr_addr, sts_clr_val);
+
+ return error_count;
+}
+
+static int gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u16 event_type,
+ u64 intr_cause_data)
+{
+ int i, error_count = 0;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE ; i++)
+ if (intr_cause_data & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", guadi2_mme_sbte_error_cause[i]);
+ error_count++;
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, u16 event_type,
+ u64 *event_mask)
+{
+ u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0;
+ int i;
+
+ sts_addr = mmDCORE0_MME_ACC_INTR_CAUSE + DCORE_OFFSET * mme_index;
+ sts_clr_addr = mmDCORE0_MME_ACC_INTR_CLEAR + DCORE_OFFSET * mme_index;
+
+ sts_val = RREG32(sts_addr);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE ; i++) {
+ if (sts_val & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", guadi2_mme_wap_error_cause[i]);
+ sts_clr_val |= BIT(i);
+ error_count++;
+ }
+ }
+
+ /* check if RAZWI happened on WAP0/1 */
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, MME_WAP0, event_mask);
+ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_MME, mme_index, MME_WAP1, event_mask);
+ hl_check_for_glbl_errors(hdev);
+
+ WREG32(sts_clr_addr, sts_clr_val);
+
+ return error_count;
+}
+
+static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u16 event_type,
+ u64 intr_cause_data)
+{
+ u32 error_count = 0;
+ int i;
+
+ /* If an AXI read or write error is received, an error is reported and
+ * interrupt message is sent. Due to an HW errata, when reading the cause
+ * register of the KDMA engine, the reported error is always HBW even if
+ * the actual error caused by a LBW KDMA transaction.
+ */
+ for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
+ if (intr_cause_data & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_kdma_core_interrupts_cause[i]);
+ error_count++;
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type,
+ u64 intr_cause_data)
+{
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++)
+ if (intr_cause_data & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_dma_core_interrupts_cause[i]);
+ error_count++;
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, u64 *event_mask)
+{
+ u32 mstr_if_base_addr = mmPCIE_MSTR_RR_MSTR_IF_RR_SHRD_HBW_BASE, razwi_happened_addr;
+
+ razwi_happened_addr = mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED;
+ if (RREG32(razwi_happened_addr)) {
+ gaudi2_razwi_rr_hbw_shared_printf_info(hdev, mstr_if_base_addr, true, "PCIE",
+ GAUDI2_ENGINE_ID_PCIE, event_mask);
+ WREG32(razwi_happened_addr, 0x1);
+ }
+
+ razwi_happened_addr = mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED;
+ if (RREG32(razwi_happened_addr)) {
+ gaudi2_razwi_rr_hbw_shared_printf_info(hdev, mstr_if_base_addr, false, "PCIE",
+ GAUDI2_ENGINE_ID_PCIE, event_mask);
+ WREG32(razwi_happened_addr, 0x1);
+ }
+
+ razwi_happened_addr = mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED;
+ if (RREG32(razwi_happened_addr)) {
+ gaudi2_razwi_rr_lbw_shared_printf_info(hdev, mstr_if_base_addr, true, "PCIE",
+ GAUDI2_ENGINE_ID_PCIE, event_mask);
+ WREG32(razwi_happened_addr, 0x1);
+ }
+
+ razwi_happened_addr = mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED;
+ if (RREG32(razwi_happened_addr)) {
+ gaudi2_razwi_rr_lbw_shared_printf_info(hdev, mstr_if_base_addr, false, "PCIE",
+ GAUDI2_ENGINE_ID_PCIE, event_mask);
+ WREG32(razwi_happened_addr, 0x1);
+ }
+}
+
+static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u16 event_type,
+ u64 intr_cause_data, u64 *event_mask)
+{
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE ; i++) {
+ if (!(intr_cause_data & BIT_ULL(i)))
+ continue;
+
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_pcie_addr_dec_error_cause[i]);
+ error_count++;
+
+ switch (intr_cause_data & BIT_ULL(i)) {
+ case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_AXI_LBW_ERR_INTR_MASK:
+ hl_check_for_glbl_errors(hdev);
+ break;
+ case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_BAD_ACCESS_INTR_MASK:
+ gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(hdev, event_mask);
+ break;
+ }
+ }
+
+ return error_count;
+}
+
+static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u16 event_type,
+ u64 intr_cause_data)
+
+{
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE ; i++) {
+ if (intr_cause_data & BIT_ULL(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_pmmu_fatal_interrupts_cause[i]);
+ error_count++;
+ }
+ }
+
+ return error_count;
+}
+
+static int gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 intr_cause_data)
+{
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE ; i++) {
+ if (intr_cause_data & BIT_ULL(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_hif_fatal_interrupts_cause[i]);
+ error_count++;
+ }
+ }
+
+ return error_count;
+}
+
+static void gaudi2_handle_page_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu,
+ u64 *event_mask)
+{
+ u32 valid, val, axid_l, axid_h;
+ u64 addr;
+
+ valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
+
+ if (!(valid & DCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID_PAGE_ERR_VALID_ENTRY_MASK))
+ return;
+
+ val = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE));
+ addr = val & DCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA_63_32_MASK;
+ addr <<= 32;
+ addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA));
+
+ axid_l = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_LSB));
+ axid_h = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_MSB));
+
+ dev_err_ratelimited(hdev->dev, "%s page fault on va 0x%llx, transaction id 0x%llX\n",
+ is_pmmu ? "PMMU" : "HMMU", addr, ((u64)axid_h << 32) + axid_l);
+ hl_handle_page_fault(hdev, addr, 0, is_pmmu, event_mask);
+
+ WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE), 0);
+}
+
+static void gaudi2_handle_access_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu)
+{
+ u32 valid, val;
+ u64 addr;
+
+ valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
+
+ if (!(valid & DCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID_ACCESS_ERR_VALID_ENTRY_MASK))
+ return;
+
+ val = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE));
+ addr = val & DCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE_VA_63_32_MASK;
+ addr <<= 32;
+ addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE_VA));
+
+ dev_err_ratelimited(hdev->dev, "%s access error on va 0x%llx\n",
+ is_pmmu ? "PMMU" : "HMMU", addr);
+ WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE), 0);
+}
+
+static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, u16 event_type,
+ u64 mmu_base, bool is_pmmu, u64 *event_mask)
+{
+ u32 spi_sei_cause, interrupt_clr = 0x0, error_count = 0;
+ int i;
+
+ spi_sei_cause = RREG32(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE ; i++) {
+ if (spi_sei_cause & BIT(i)) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s", gaudi2_mmu_spi_sei[i].cause);
+
+ if (i == 0)
+ gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, event_mask);
+ else if (i == 1)
+ gaudi2_handle_access_error(hdev, mmu_base, is_pmmu);
+
+ if (gaudi2_mmu_spi_sei[i].clear_bit >= 0)
+ interrupt_clr |= BIT(gaudi2_mmu_spi_sei[i].clear_bit);
+
+ error_count++;
+ }
+ }
+
+ /* Clear cause */
+ WREG32_AND(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET, ~spi_sei_cause);
+
+ /* Clear interrupt */
+ WREG32(mmu_base + MMU_INTERRUPT_CLR_OFFSET, interrupt_clr);
+
+ return error_count;
+}
+
+static int gaudi2_handle_sm_err(struct hl_device *hdev, u16 event_type, u8 sm_index)
+{
+ u32 sei_cause_addr, sei_cause_val, sei_cause_cause, sei_cause_log,
+ cq_intr_addr, cq_intr_val, cq_intr_queue_index, error_count = 0;
+ int i;
+
+ sei_cause_addr = mmDCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE + DCORE_OFFSET * sm_index;
+ cq_intr_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_INTR + DCORE_OFFSET * sm_index;
+
+ sei_cause_val = RREG32(sei_cause_addr);
+ sei_cause_cause = FIELD_GET(DCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE_CAUSE_MASK, sei_cause_val);
+ cq_intr_val = RREG32(cq_intr_addr);
+
+ /* SEI interrupt */
+ if (sei_cause_cause) {
+ /* There are corresponding SEI_CAUSE_log bits for every SEI_CAUSE_cause bit */
+ sei_cause_log = FIELD_GET(DCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE_LOG_MASK,
+ sei_cause_val);
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_SM_SEI_ERR_CAUSE ; i++) {
+ if (!(sei_cause_cause & BIT(i)))
+ continue;
+
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s. %s: 0x%X\n",
+ gaudi2_sm_sei_cause[i].cause_name,
+ gaudi2_sm_sei_cause[i].log_name,
+ sei_cause_log);
+ error_count++;
+ break;
+ }
+
+ /* Clear SM_SEI_CAUSE */
+ WREG32(sei_cause_addr, 0);
+ }
+
+ /* CQ interrupt */
+ if (cq_intr_val & DCORE0_SYNC_MNGR_GLBL_CQ_INTR_CQ_SEC_INTR_MASK) {
+ cq_intr_queue_index =
+ FIELD_GET(DCORE0_SYNC_MNGR_GLBL_CQ_INTR_CQ_INTR_QUEUE_INDEX_MASK,
+ cq_intr_val);
+
+ dev_err_ratelimited(hdev->dev, "SM%u err. err cause: CQ_INTR. queue index: %u\n",
+ sm_index, cq_intr_queue_index);
+ error_count++;
+
+ /* Clear CQ_INTR */
+ WREG32(cq_intr_addr, 0);
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
+{
+ bool is_pmmu = false;
+ u32 error_count = 0;
+ u64 mmu_base;
+ u8 index;
+
+ switch (event_type) {
+ case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU3_SECURITY_ERROR:
+ index = (event_type - GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM) / 3;
+ mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_3_AXI_ERR_RSP:
+ index = (event_type - GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP);
+ mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU11_SECURITY_ERROR:
+ index = (event_type - GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM) / 3;
+ mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_11_AXI_ERR_RSP:
+ index = (event_type - GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP);
+ mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU4_SECURITY_ERROR:
+ index = (event_type - GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM) / 3;
+ mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_4_AXI_ERR_RSP:
+ index = (event_type - GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP);
+ mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
+ index = (event_type - GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM) / 3;
+ mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP:
+ index = (event_type - GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP);
+ mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET;
+ break;
+ case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR:
+ case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0:
+ is_pmmu = true;
+ mmu_base = mmPMMU_HBW_MMU_BASE;
+ break;
+ default:
+ return 0;
+ }
+
+ error_count = gaudi2_handle_mmu_spi_sei_generic(hdev, event_type, mmu_base,
+ is_pmmu, event_mask);
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+
+/* returns true if hard reset is required (ECC DERR or Read parity), false otherwise (ECC SERR) */
+static bool gaudi2_hbm_sei_handle_read_err(struct hl_device *hdev,
+ struct hl_eq_hbm_sei_read_err_intr_info *rd_err_data, u32 err_cnt)
+{
+ u32 addr, beat, beat_shift;
+ bool rc = false;
+
+ dev_err_ratelimited(hdev->dev,
+ "READ ERROR count: ECC SERR: %d, ECC DERR: %d, RD_PARITY: %d\n",
+ FIELD_GET(HBM_ECC_SERR_CNTR_MASK, err_cnt),
+ FIELD_GET(HBM_ECC_DERR_CNTR_MASK, err_cnt),
+ FIELD_GET(HBM_RD_PARITY_CNTR_MASK, err_cnt));
+
+ addr = le32_to_cpu(rd_err_data->dbg_rd_err_addr.rd_addr_val);
+ dev_err_ratelimited(hdev->dev,
+ "READ ERROR address: sid(%u), bg(%u), ba(%u), col(%u), row(%u)\n",
+ FIELD_GET(HBM_RD_ADDR_SID_MASK, addr),
+ FIELD_GET(HBM_RD_ADDR_BG_MASK, addr),
+ FIELD_GET(HBM_RD_ADDR_BA_MASK, addr),
+ FIELD_GET(HBM_RD_ADDR_COL_MASK, addr),
+ FIELD_GET(HBM_RD_ADDR_ROW_MASK, addr));
+
+ /* For each beat (RDQS edge), look for possible errors and print relevant info */
+ for (beat = 0 ; beat < 4 ; beat++) {
+ if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
+ (HBM_RD_ERR_SERR_BEAT0_MASK << beat))
+ dev_err_ratelimited(hdev->dev, "Beat%d ECC SERR: DM: %#x, Syndrome: %#x\n",
+ beat,
+ le32_to_cpu(rd_err_data->dbg_rd_err_dm),
+ le32_to_cpu(rd_err_data->dbg_rd_err_syndrome));
+
+ if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
+ (HBM_RD_ERR_DERR_BEAT0_MASK << beat)) {
+ dev_err_ratelimited(hdev->dev, "Beat%d ECC DERR: DM: %#x, Syndrome: %#x\n",
+ beat,
+ le32_to_cpu(rd_err_data->dbg_rd_err_dm),
+ le32_to_cpu(rd_err_data->dbg_rd_err_syndrome));
+ rc |= true;
+ }
+
+ beat_shift = beat * HBM_RD_ERR_BEAT_SHIFT;
+ if (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
+ (HBM_RD_ERR_PAR_ERR_BEAT0_MASK << beat_shift)) {
+ dev_err_ratelimited(hdev->dev,
+ "Beat%d read PARITY: DM: %#x, PAR data: %#x\n",
+ beat,
+ le32_to_cpu(rd_err_data->dbg_rd_err_dm),
+ (le32_to_cpu(rd_err_data->dbg_rd_err_misc) &
+ (HBM_RD_ERR_PAR_DATA_BEAT0_MASK << beat_shift)) >>
+ (HBM_RD_ERR_PAR_DATA_BEAT0_SHIFT + beat_shift));
+ rc |= true;
+ }
+
+ dev_err_ratelimited(hdev->dev, "Beat%d DQ data:\n", beat);
+ dev_err_ratelimited(hdev->dev, "\t0x%08x\n",
+ le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2]));
+ dev_err_ratelimited(hdev->dev, "\t0x%08x\n",
+ le32_to_cpu(rd_err_data->dbg_rd_err_data[beat * 2 + 1]));
+ }
+
+ return rc;
+}
+
+static void gaudi2_hbm_sei_print_wr_par_info(struct hl_device *hdev,
+ struct hl_eq_hbm_sei_wr_par_intr_info *wr_par_err_data, u32 err_cnt)
+{
+ struct hbm_sei_wr_cmd_address *wr_cmd_addr = wr_par_err_data->dbg_last_wr_cmds;
+ u32 i, curr_addr, derr = wr_par_err_data->dbg_derr;
+
+ dev_err_ratelimited(hdev->dev, "WRITE PARITY ERROR count: %d\n", err_cnt);
+
+ dev_err_ratelimited(hdev->dev, "CK-0 DERR: 0x%02x, CK-1 DERR: 0x%02x\n",
+ derr & 0x3, derr & 0xc);
+
+ /* JIRA H6-3286 - the following prints may not be valid */
+ dev_err_ratelimited(hdev->dev, "Last latched write commands addresses:\n");
+ for (i = 0 ; i < HBM_WR_PAR_CMD_LIFO_LEN ; i++) {
+ curr_addr = le32_to_cpu(wr_cmd_addr[i].dbg_wr_cmd_addr);
+ dev_err_ratelimited(hdev->dev,
+ "\twrite cmd[%u]: Address: SID(%u) BG(%u) BA(%u) COL(%u).\n",
+ i,
+ FIELD_GET(WR_PAR_LAST_CMD_SID_MASK, curr_addr),
+ FIELD_GET(WR_PAR_LAST_CMD_BG_MASK, curr_addr),
+ FIELD_GET(WR_PAR_LAST_CMD_BA_MASK, curr_addr),
+ FIELD_GET(WR_PAR_LAST_CMD_COL_MASK, curr_addr));
+ }
+}
+
+static void gaudi2_hbm_sei_print_ca_par_info(struct hl_device *hdev,
+ struct hl_eq_hbm_sei_ca_par_intr_info *ca_par_err_data, u32 err_cnt)
+{
+ __le32 *col_cmd = ca_par_err_data->dbg_col;
+ __le16 *row_cmd = ca_par_err_data->dbg_row;
+ u32 i;
+
+ dev_err_ratelimited(hdev->dev, "CA ERROR count: %d\n", err_cnt);
+
+ dev_err_ratelimited(hdev->dev, "Last latched C&R bus commands:\n");
+ for (i = 0 ; i < HBM_CA_ERR_CMD_LIFO_LEN ; i++)
+ dev_err_ratelimited(hdev->dev, "cmd%u: ROW(0x%04x) COL(0x%05x)\n", i,
+ le16_to_cpu(row_cmd[i]) & (u16)GENMASK(13, 0),
+ le32_to_cpu(col_cmd[i]) & (u32)GENMASK(17, 0));
+}
+
+/* Returns true if hard reset is needed or false otherwise */
+static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type,
+ struct hl_eq_hbm_sei_data *sei_data)
+{
+ bool require_hard_reset = false;
+ u32 hbm_id, mc_id, cause_idx;
+
+ hbm_id = (event_type - GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE) / 4;
+ mc_id = ((event_type - GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE) / 2) % 2;
+
+ cause_idx = sei_data->hdr.sei_cause;
+ if (cause_idx > GAUDI2_NUM_OF_HBM_SEI_CAUSE - 1) {
+ gaudi2_print_event(hdev, event_type, true,
+ "err cause: %s",
+ "Invalid HBM SEI event cause (%d) provided by FW\n", cause_idx);
+ return true;
+ }
+
+ gaudi2_print_event(hdev, event_type, !sei_data->hdr.is_critical,
+ "System %s Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n",
+ sei_data->hdr.is_critical ? "Critical" : "Non-critical",
+ hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel,
+ hbm_mc_sei_cause[cause_idx]);
+
+ /* Print error-specific info */
+ switch (cause_idx) {
+ case HBM_SEI_CATTRIP:
+ require_hard_reset = true;
+ break;
+
+ case HBM_SEI_CMD_PARITY_EVEN:
+ gaudi2_hbm_sei_print_ca_par_info(hdev, &sei_data->ca_parity_even_info,
+ le32_to_cpu(sei_data->hdr.cnt));
+ require_hard_reset = true;
+ break;
+
+ case HBM_SEI_CMD_PARITY_ODD:
+ gaudi2_hbm_sei_print_ca_par_info(hdev, &sei_data->ca_parity_odd_info,
+ le32_to_cpu(sei_data->hdr.cnt));
+ require_hard_reset = true;
+ break;
+
+ case HBM_SEI_WRITE_DATA_PARITY_ERR:
+ gaudi2_hbm_sei_print_wr_par_info(hdev, &sei_data->wr_parity_info,
+ le32_to_cpu(sei_data->hdr.cnt));
+ require_hard_reset = true;
+ break;
+
+ case HBM_SEI_READ_ERR:
+ /* Unlike other SEI events, read error requires further processing of the
+ * raw data in order to determine the root cause.
+ */
+ require_hard_reset = gaudi2_hbm_sei_handle_read_err(hdev,
+ &sei_data->read_err_info,
+ le32_to_cpu(sei_data->hdr.cnt));
+ break;
+
+ default:
+ break;
+ }
+
+ require_hard_reset |= !!sei_data->hdr.is_critical;
+
+ return require_hard_reset;
+}
+
+static int gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u16 event_type,
+ u64 intr_cause_data)
+{
+ if (intr_cause_data) {
+ gaudi2_print_event(hdev, event_type, true,
+ "temperature error cause: %#llx", intr_cause_data);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int gaudi2_handle_hbm_mc_spi(struct hl_device *hdev, u64 intr_cause_data)
+{
+ u32 i, error_count = 0;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE ; i++)
+ if (intr_cause_data & hbm_mc_spi[i].mask) {
+ dev_dbg(hdev->dev, "HBM spi event: notification cause(%s)\n",
+ hbm_mc_spi[i].cause);
+ error_count++;
+ }
+
+ return error_count;
+}
+
+static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
+{
+ ktime_t zero_time = ktime_set(0, 0);
+
+ mutex_lock(&hdev->clk_throttling.lock);
+
+ switch (event_type) {
+ case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
+ hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
+ dev_dbg_ratelimited(hdev->dev, "Clock throttling due to power consumption\n");
+ break;
+
+ case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
+ hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
+ dev_dbg_ratelimited(hdev->dev, "Power envelop is safe, back to optimal clock\n");
+ break;
+
+ case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
+ hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
+ *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ dev_info_ratelimited(hdev->dev, "Clock throttling due to overheating\n");
+ break;
+
+ case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
+ hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
+ *event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ dev_info_ratelimited(hdev->dev, "Thermal envelop is safe, back to optimal clock\n");
+ break;
+
+ default:
+ dev_err(hdev->dev, "Received invalid clock change event %d\n", event_type);
+ break;
+ }
+
+ mutex_unlock(&hdev->clk_throttling.lock);
+}
+
+static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, u16 event_type,
+ struct cpucp_pkt_sync_err *sync_err)
+{
+ struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
+
+ gaudi2_print_event(hdev, event_type, false,
+ "FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
+ le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci),
+ q->pi, atomic_read(&q->ci));
+}
+
+static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev, u16 event_type)
+{
+ u32 p2p_intr, msix_gw_intr, error_count = 0;
+
+ p2p_intr = RREG32(mmPCIE_WRAP_P2P_INTR);
+ msix_gw_intr = RREG32(mmPCIE_WRAP_MSIX_GW_INTR);
+
+ if (p2p_intr) {
+ gaudi2_print_event(hdev, event_type, true,
+ "pcie p2p transaction terminated due to security, req_id(0x%x)\n",
+ RREG32(mmPCIE_WRAP_P2P_REQ_ID));
+
+ WREG32(mmPCIE_WRAP_P2P_INTR, 0x1);
+ error_count++;
+ }
+
+ if (msix_gw_intr) {
+ gaudi2_print_event(hdev, event_type, true,
+ "pcie msi-x gen denied due to vector num check failure, vec(0x%X)\n",
+ RREG32(mmPCIE_WRAP_MSIX_GW_VEC));
+
+ WREG32(mmPCIE_WRAP_MSIX_GW_INTR, 0x1);
+ error_count++;
+ }
+
+ return error_count;
+}
+
+static int gaudi2_handle_pcie_drain(struct hl_device *hdev,
+ struct hl_eq_pcie_drain_ind_data *drain_data)
+{
+ u64 lbw_rd, lbw_wr, hbw_rd, hbw_wr, cause, error_count = 0;
+
+ cause = le64_to_cpu(drain_data->intr_cause.intr_cause_data);
+ lbw_rd = le64_to_cpu(drain_data->drain_rd_addr_lbw);
+ lbw_wr = le64_to_cpu(drain_data->drain_wr_addr_lbw);
+ hbw_rd = le64_to_cpu(drain_data->drain_rd_addr_hbw);
+ hbw_wr = le64_to_cpu(drain_data->drain_wr_addr_hbw);
+
+ if (cause & BIT_ULL(0)) {
+ dev_err_ratelimited(hdev->dev,
+ "PCIE AXI drain LBW completed, read_err %u, write_err %u\n",
+ !!lbw_rd, !!lbw_wr);
+ error_count++;
+ }
+
+ if (cause & BIT_ULL(1)) {
+ dev_err_ratelimited(hdev->dev,
+ "PCIE AXI drain HBW completed, raddr %#llx, waddr %#llx\n",
+ hbw_rd, hbw_wr);
+ error_count++;
+ }
+
+ return error_count;
+}
+
+static int gaudi2_handle_psoc_drain(struct hl_device *hdev, u64 intr_cause_data)
+{
+ u32 error_count = 0;
+ int i;
+
+ for (i = 0 ; i < GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE ; i++) {
+ if (intr_cause_data & BIT_ULL(i)) {
+ dev_err_ratelimited(hdev->dev, "PSOC %s completed\n",
+ gaudi2_psoc_axi_drain_interrupts_cause[i]);
+ error_count++;
+ }
+ }
+
+ hl_check_for_glbl_errors(hdev);
+
+ return error_count;
+}
+
+static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, u16 event_type,
+ struct cpucp_pkt_sync_err *sync_err)
+{
+ struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ];
+
+ gaudi2_print_event(hdev, event_type, false,
+ "FW reported sanity check failure, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
+ le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
+}
+
+static int hl_arc_event_handle(struct hl_device *hdev, u16 event_type,
+ struct hl_eq_engine_arc_intr_data *data)
+{
+ struct hl_engine_arc_dccm_queue_full_irq *q;
+ u32 intr_type, engine_id;
+ u64 payload;
+
+ intr_type = le32_to_cpu(data->intr_type);
+ engine_id = le32_to_cpu(data->engine_id);
+ payload = le64_to_cpu(data->payload);
+
+ switch (intr_type) {
+ case ENGINE_ARC_DCCM_QUEUE_FULL_IRQ:
+ q = (struct hl_engine_arc_dccm_queue_full_irq *) &payload;
+
+ gaudi2_print_event(hdev, event_type, true,
+ "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u\n",
+ engine_id, intr_type, q->queue_index);
+ return 1;
+ default:
+ gaudi2_print_event(hdev, event_type, true, "Unknown ARC event type\n");
+ return 0;
+ }
+}
+
+static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ bool reset_required = false, is_critical = false;
+ u32 index, ctl, reset_flags = HL_DRV_RESET_HARD, error_count = 0;
+ u64 event_mask = 0;
+ u16 event_type;
+
+ ctl = le32_to_cpu(eq_entry->hdr.ctl);
+ event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK) >> EQ_CTL_EVENT_TYPE_SHIFT);
+
+ if (event_type >= GAUDI2_EVENT_SIZE) {
+ dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
+ event_type, GAUDI2_EVENT_SIZE - 1);
+ return;
+ }
+
+ gaudi2->events_stat[event_type]++;
+ gaudi2->events_stat_aggregate[event_type]++;
+
+ switch (event_type) {
+ case GAUDI2_EVENT_PCIE_CORE_SERR ... GAUDI2_EVENT_ARC0_ECC_DERR:
+ fallthrough;
+ case GAUDI2_EVENT_ROTATOR0_SERR ... GAUDI2_EVENT_ROTATOR1_DERR:
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ reset_required = gaudi2_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
+ is_critical = eq_entry->ecc_data.is_critical;
+ error_count++;
+ break;
+
+ case GAUDI2_EVENT_TPC0_QM ... GAUDI2_EVENT_PDMA1_QM:
+ fallthrough;
+ case GAUDI2_EVENT_ROTATOR0_ROT0_QM ... GAUDI2_EVENT_ROTATOR1_ROT1_QM:
+ fallthrough;
+ case GAUDI2_EVENT_NIC0_QM0 ... GAUDI2_EVENT_NIC11_QM1:
+ error_count = gaudi2_handle_qman_err(hdev, event_type, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_ARC_AXI_ERROR_RESPONSE_0:
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ error_count = gaudi2_handle_arc_farm_sei_err(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_CPU_AXI_ERR_RSP:
+ error_count = gaudi2_handle_cpu_sei_err(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP:
+ case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP:
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ error_count = gaudi2_handle_qm_sei_err(hdev, event_type, true, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE:
+ index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE;
+ error_count = gaudi2_handle_rot_err(hdev, index, event_type,
+ &eq_entry->razwi_with_intr_cause, &event_mask);
+ error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC24_AXI_ERR_RSP:
+ index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP;
+ error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type,
+ &eq_entry->razwi_with_intr_cause, &event_mask);
+ error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE ... GAUDI2_EVENT_DEC9_AXI_ERR_RSPONSE:
+ index = event_type - GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE;
+ error_count = gaudi2_handle_dec_err(hdev, index, event_type, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_TPC0_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC1_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC2_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC3_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC4_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC5_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC6_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC7_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC8_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC9_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC10_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC11_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC12_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC13_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC14_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC15_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC16_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC17_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC18_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC19_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC20_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC21_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC22_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC23_KERNEL_ERR:
+ case GAUDI2_EVENT_TPC24_KERNEL_ERR:
+ index = (event_type - GAUDI2_EVENT_TPC0_KERNEL_ERR) /
+ (GAUDI2_EVENT_TPC1_KERNEL_ERR - GAUDI2_EVENT_TPC0_KERNEL_ERR);
+ error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type,
+ &eq_entry->razwi_with_intr_cause, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_DEC0_SPI:
+ case GAUDI2_EVENT_DEC1_SPI:
+ case GAUDI2_EVENT_DEC2_SPI:
+ case GAUDI2_EVENT_DEC3_SPI:
+ case GAUDI2_EVENT_DEC4_SPI:
+ case GAUDI2_EVENT_DEC5_SPI:
+ case GAUDI2_EVENT_DEC6_SPI:
+ case GAUDI2_EVENT_DEC7_SPI:
+ case GAUDI2_EVENT_DEC8_SPI:
+ case GAUDI2_EVENT_DEC9_SPI:
+ index = (event_type - GAUDI2_EVENT_DEC0_SPI) /
+ (GAUDI2_EVENT_DEC1_SPI - GAUDI2_EVENT_DEC0_SPI);
+ error_count = gaudi2_handle_dec_err(hdev, index, event_type, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_MME2_CTRL_AXI_ERROR_RESPONSE:
+ case GAUDI2_EVENT_MME3_CTRL_AXI_ERROR_RESPONSE:
+ index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) /
+ (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE -
+ GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE);
+ error_count = gaudi2_handle_mme_err(hdev, index, event_type, &event_mask);
+ error_count += gaudi2_handle_qm_sei_err(hdev, event_type, false, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_MME0_QMAN_SW_ERROR:
+ case GAUDI2_EVENT_MME1_QMAN_SW_ERROR:
+ case GAUDI2_EVENT_MME2_QMAN_SW_ERROR:
+ case GAUDI2_EVENT_MME3_QMAN_SW_ERROR:
+ index = (event_type - GAUDI2_EVENT_MME0_QMAN_SW_ERROR) /
+ (GAUDI2_EVENT_MME1_QMAN_SW_ERROR -
+ GAUDI2_EVENT_MME0_QMAN_SW_ERROR);
+ error_count = gaudi2_handle_mme_err(hdev, index, event_type, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID:
+ case GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID:
+ case GAUDI2_EVENT_MME2_WAP_SOURCE_RESULT_INVALID:
+ case GAUDI2_EVENT_MME3_WAP_SOURCE_RESULT_INVALID:
+ index = (event_type - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID) /
+ (GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID -
+ GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID);
+ error_count = gaudi2_handle_mme_wap_err(hdev, index, event_type, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_KDMA_CH0_AXI_ERR_RSP:
+ case GAUDI2_EVENT_KDMA0_CORE:
+ error_count = gaudi2_handle_kdma_core_event(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_PDMA1_CORE:
+ error_count = gaudi2_handle_dma_core_event(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR:
+ error_count = gaudi2_print_pcie_addr_dec_info(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data), &event_mask);
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
+ case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP:
+ case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR:
+ case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0:
+ error_count = gaudi2_handle_mmu_spi_sei_err(hdev, event_type, &event_mask);
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_HIF0_FATAL ... GAUDI2_EVENT_HIF12_FATAL:
+ error_count = gaudi2_handle_hif_fatal(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_PMMU_FATAL_0:
+ error_count = gaudi2_handle_pif_fatal(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_PSOC63_RAZWI_OR_PID_MIN_MAX_INTERRUPT:
+ error_count = gaudi2_ack_psoc_razwi_event_handler(hdev, &event_mask);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_HBM0_MC0_SEI_SEVERE ... GAUDI2_EVENT_HBM5_MC1_SEI_NON_SEVERE:
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ if (gaudi2_handle_hbm_mc_sei_err(hdev, event_type, &eq_entry->sei_data)) {
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ reset_required = true;
+ }
+ error_count++;
+ break;
+
+ case GAUDI2_EVENT_HBM_CATTRIP_0 ... GAUDI2_EVENT_HBM_CATTRIP_5:
+ error_count = gaudi2_handle_hbm_cattrip(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_HBM0_MC0_SPI ... GAUDI2_EVENT_HBM5_MC1_SPI:
+ error_count = gaudi2_handle_hbm_mc_spi(hdev,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_PCIE_DRAIN_COMPLETE:
+ error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data);
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN:
+ error_count = gaudi2_handle_psoc_drain(hdev,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_CPU_AXI_ECC:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_CPU_L2_RAM_ECC:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME0_SBTE4_AXI_ERR_RSP:
+ case GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME1_SBTE4_AXI_ERR_RSP:
+ case GAUDI2_EVENT_MME2_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME2_SBTE4_AXI_ERR_RSP:
+ case GAUDI2_EVENT_MME3_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME3_SBTE4_AXI_ERR_RSP:
+ error_count = gaudi2_handle_mme_sbte_err(hdev, event_type,
+ le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+ case GAUDI2_EVENT_VM0_ALARM_A ... GAUDI2_EVENT_VM3_ALARM_B:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_PSOC_AXI_ERR_RSP:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_PSOC_PRSTN_FALL:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_PCIE_APB_TIMEOUT:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_PCIE_FATAL_ERR:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_TPC0_BMON_SPMU:
+ case GAUDI2_EVENT_TPC1_BMON_SPMU:
+ case GAUDI2_EVENT_TPC2_BMON_SPMU:
+ case GAUDI2_EVENT_TPC3_BMON_SPMU:
+ case GAUDI2_EVENT_TPC4_BMON_SPMU:
+ case GAUDI2_EVENT_TPC5_BMON_SPMU:
+ case GAUDI2_EVENT_TPC6_BMON_SPMU:
+ case GAUDI2_EVENT_TPC7_BMON_SPMU:
+ case GAUDI2_EVENT_TPC8_BMON_SPMU:
+ case GAUDI2_EVENT_TPC9_BMON_SPMU:
+ case GAUDI2_EVENT_TPC10_BMON_SPMU:
+ case GAUDI2_EVENT_TPC11_BMON_SPMU:
+ case GAUDI2_EVENT_TPC12_BMON_SPMU:
+ case GAUDI2_EVENT_TPC13_BMON_SPMU:
+ case GAUDI2_EVENT_TPC14_BMON_SPMU:
+ case GAUDI2_EVENT_TPC15_BMON_SPMU:
+ case GAUDI2_EVENT_TPC16_BMON_SPMU:
+ case GAUDI2_EVENT_TPC17_BMON_SPMU:
+ case GAUDI2_EVENT_TPC18_BMON_SPMU:
+ case GAUDI2_EVENT_TPC19_BMON_SPMU:
+ case GAUDI2_EVENT_TPC20_BMON_SPMU:
+ case GAUDI2_EVENT_TPC21_BMON_SPMU:
+ case GAUDI2_EVENT_TPC22_BMON_SPMU:
+ case GAUDI2_EVENT_TPC23_BMON_SPMU:
+ case GAUDI2_EVENT_TPC24_BMON_SPMU:
+ case GAUDI2_EVENT_MME0_CTRL_BMON_SPMU:
+ case GAUDI2_EVENT_MME0_SBTE_BMON_SPMU:
+ case GAUDI2_EVENT_MME0_WAP_BMON_SPMU:
+ case GAUDI2_EVENT_MME1_CTRL_BMON_SPMU:
+ case GAUDI2_EVENT_MME1_SBTE_BMON_SPMU:
+ case GAUDI2_EVENT_MME1_WAP_BMON_SPMU:
+ case GAUDI2_EVENT_MME2_CTRL_BMON_SPMU:
+ case GAUDI2_EVENT_MME2_SBTE_BMON_SPMU:
+ case GAUDI2_EVENT_MME2_WAP_BMON_SPMU:
+ case GAUDI2_EVENT_MME3_CTRL_BMON_SPMU:
+ case GAUDI2_EVENT_MME3_SBTE_BMON_SPMU:
+ case GAUDI2_EVENT_MME3_WAP_BMON_SPMU:
+ case GAUDI2_EVENT_HDMA2_BM_SPMU ... GAUDI2_EVENT_PDMA1_BM_SPMU:
+ fallthrough;
+ case GAUDI2_EVENT_DEC0_BMON_SPMU:
+ case GAUDI2_EVENT_DEC1_BMON_SPMU:
+ case GAUDI2_EVENT_DEC2_BMON_SPMU:
+ case GAUDI2_EVENT_DEC3_BMON_SPMU:
+ case GAUDI2_EVENT_DEC4_BMON_SPMU:
+ case GAUDI2_EVENT_DEC5_BMON_SPMU:
+ case GAUDI2_EVENT_DEC6_BMON_SPMU:
+ case GAUDI2_EVENT_DEC7_BMON_SPMU:
+ case GAUDI2_EVENT_DEC8_BMON_SPMU:
+ case GAUDI2_EVENT_DEC9_BMON_SPMU:
+ case GAUDI2_EVENT_ROTATOR0_BMON_SPMU ... GAUDI2_EVENT_SM3_BMON_SPMU:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S:
+ case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
+ case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
+ case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
+ gaudi2_print_clk_change_info(hdev, event_type, &event_mask);
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ break;
+
+ case GAUDI2_EVENT_CPU_PKT_QUEUE_OUT_SYNC:
+ gaudi2_print_out_of_sync_info(hdev, event_type, &eq_entry->pkt_sync_err);
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_PCIE_FLR_REQUESTED:
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ /* Do nothing- FW will handle it */
+ break;
+
+ case GAUDI2_EVENT_PCIE_P2P_MSIX:
+ error_count = gaudi2_handle_pcie_p2p_msix(hdev, event_type);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE ... GAUDI2_EVENT_SM3_AXI_ERROR_RESPONSE:
+ index = event_type - GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE;
+ error_count = gaudi2_handle_sm_err(hdev, event_type, index);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_PSOC_MME_PLL_LOCK_ERR ... GAUDI2_EVENT_DCORE2_HBM_PLL_LOCK_ERR:
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE:
+ dev_info(hdev->dev, "CPLD shutdown cause, reset reason: 0x%llx\n",
+ le64_to_cpu(eq_entry->data[0]));
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+ case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_EVENT:
+ dev_err(hdev->dev, "CPLD shutdown event, reset reason: 0x%llx\n",
+ le64_to_cpu(eq_entry->data[0]));
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_CPU_PKT_SANITY_FAILED:
+ gaudi2_print_cpu_pkt_failure_info(hdev, event_type, &eq_entry->pkt_sync_err);
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ break;
+
+ case GAUDI2_EVENT_ARC_DCCM_FULL:
+ error_count = hl_arc_event_handle(hdev, event_type, &eq_entry->arc_data);
+ event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+ break;
+
+ case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED:
+ case GAUDI2_EVENT_DEV_RESET_REQ:
+ event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ is_critical = true;
+ break;
+
+ default:
+ if (gaudi2_irq_map_table[event_type].valid) {
+ dev_err_ratelimited(hdev->dev, "Cannot find handler for event %d\n",
+ event_type);
+ error_count = GAUDI2_NA_EVENT_CAUSE;
+ }
+ }
+
+ /* Make sure to dump an error in case no error cause was printed so far.
+ * Note that although we have counted the errors, we use this number as
+ * a boolean.
+ */
+ if (error_count == GAUDI2_NA_EVENT_CAUSE && !is_info_event(event_type))
+ gaudi2_print_event(hdev, event_type, true, "%d", event_type);
+ else if (error_count == 0)
+ gaudi2_print_event(hdev, event_type, true,
+ "No error cause for H/W event %u\n", event_type);
+
+ if ((gaudi2_irq_map_table[event_type].reset || reset_required) &&
+ (hdev->hard_reset_on_fw_events ||
+ (hdev->asic_prop.fw_security_enabled && is_critical)))
+ goto reset_device;
+
+ /* Send unmask irq only for interrupts not classified as MSG */
+ if (!gaudi2_irq_map_table[event_type].msg)
+ hl_fw_unmask_irq(hdev, event_type);
+
+ if (event_mask)
+ hl_notifier_event_send_all(hdev, event_mask);
+
+ return;
+
+reset_device:
+ if (hdev->asic_prop.fw_security_enabled && is_critical) {
+ reset_flags |= HL_DRV_RESET_BYPASS_REQ_TO_FW;
+ event_mask |= HL_NOTIFIER_EVENT_DEVICE_UNAVAILABLE;
+ } else {
+ reset_flags |= HL_DRV_RESET_DELAY;
+ }
+ event_mask |= HL_NOTIFIER_EVENT_DEVICE_RESET;
+ hl_device_cond_reset(hdev, reset_flags, event_mask);
+}
+
+static int gaudi2_memset_memory_chunk_using_edma_qm(struct hl_device *hdev,
+ struct packet_lin_dma *lin_dma_pkt, dma_addr_t pkt_dma_addr,
+ u32 hw_queue_id, u32 size, u64 addr, u32 val)
+{
+ u32 ctl, pkt_size;
+ int rc = 0;
+
+ ctl = FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_LIN_DMA);
+ ctl |= FIELD_PREP(GAUDI2_PKT_LIN_DMA_CTL_MEMSET_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI2_PKT_LIN_DMA_CTL_WRCOMP_MASK, 1);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 1);
+
+ lin_dma_pkt->ctl = cpu_to_le32(ctl);
+ lin_dma_pkt->src_addr = cpu_to_le64(val);
+ lin_dma_pkt->dst_addr = cpu_to_le64(addr);
+ lin_dma_pkt->tsize = cpu_to_le32(size);
+
+ pkt_size = sizeof(struct packet_lin_dma);
+
+ rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, pkt_size, pkt_dma_addr);
+ if (rc)
+ dev_err(hdev->dev, "Failed to send lin dma packet to H/W queue %d\n",
+ hw_queue_id);
+
+ return rc;
+}
+
+static int gaudi2_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size, u64 val)
+{
+ u32 edma_queues_id[] = {GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0,
+ GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0};
+ u32 chunk_size, dcore, edma_idx, sob_offset, sob_addr, comp_val,
+ old_mmubp, mmubp, num_of_pkts, busy, pkt_size;
+ u64 comp_addr, cur_addr = addr, end_addr = addr + size;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ void *lin_dma_pkts_arr;
+ dma_addr_t pkt_dma_addr;
+ int rc = 0, dma_num = 0;
+
+ if (prop->edma_enabled_mask == 0) {
+ dev_info(hdev->dev, "non of the EDMA engines is enabled - skip dram scrubbing\n");
+ return -EIO;
+ }
+
+ sob_offset = hdev->asic_prop.first_available_user_sob[0] * 4;
+ sob_addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + sob_offset;
+ comp_addr = CFG_BASE + sob_addr;
+ comp_val = FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_INC_MASK, 1) |
+ FIELD_PREP(DCORE0_SYNC_MNGR_OBJS_SOB_OBJ_VAL_MASK, 1);
+ mmubp = FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_WR_MASK, 1) |
+ FIELD_PREP(ARC_FARM_KDMA_CTX_AXUSER_HB_MMU_BP_RD_MASK, 1);
+
+ /* Calculate how many lin dma pkts we'll need */
+ num_of_pkts = div64_u64(round_up(size, SZ_2G), SZ_2G);
+ pkt_size = sizeof(struct packet_lin_dma);
+
+ lin_dma_pkts_arr = hl_asic_dma_alloc_coherent(hdev, pkt_size * num_of_pkts,
+ &pkt_dma_addr, GFP_KERNEL);
+ if (!lin_dma_pkts_arr)
+ return -ENOMEM;
+
+ /*
+ * set mmu bypass for the scrubbing - all ddmas are configured the same so save
+ * only the first one to restore later
+ * also set the sob addr for all edma cores for completion.
+ * set QM as trusted to allow it to access physical address with MMU bp.
+ */
+ old_mmubp = RREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP);
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
+ u32 edma_offset = dcore * DCORE_OFFSET + edma_idx * DCORE_EDMA_OFFSET;
+ u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
+
+ if (!(prop->edma_enabled_mask & BIT(edma_bit)))
+ continue;
+
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP +
+ edma_offset, mmubp);
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_LO + edma_offset,
+ lower_32_bits(comp_addr));
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_HI + edma_offset,
+ upper_32_bits(comp_addr));
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_WDATA + edma_offset,
+ comp_val);
+ gaudi2_qman_set_test_mode(hdev,
+ edma_queues_id[dcore] + 4 * edma_idx, true);
+ }
+ }
+
+ WREG32(sob_addr, 0);
+
+ while (cur_addr < end_addr) {
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
+ u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
+
+ if (!(prop->edma_enabled_mask & BIT(edma_bit)))
+ continue;
+
+ chunk_size = min_t(u64, SZ_2G, end_addr - cur_addr);
+
+ rc = gaudi2_memset_memory_chunk_using_edma_qm(hdev,
+ (struct packet_lin_dma *)lin_dma_pkts_arr + dma_num,
+ pkt_dma_addr + dma_num * pkt_size,
+ edma_queues_id[dcore] + edma_idx * 4,
+ chunk_size, cur_addr, val);
+ if (rc)
+ goto end;
+
+ dma_num++;
+ cur_addr += chunk_size;
+ if (cur_addr == end_addr)
+ break;
+ }
+ }
+ }
+
+ rc = hl_poll_timeout(hdev, sob_addr, busy, (busy == dma_num), 1000, 1000000);
+ if (rc) {
+ dev_err(hdev->dev, "DMA Timeout during HBM scrubbing\n");
+ goto end;
+ }
+end:
+ for (dcore = 0 ; dcore < NUM_OF_DCORES ; dcore++) {
+ for (edma_idx = 0 ; edma_idx < NUM_OF_EDMA_PER_DCORE ; edma_idx++) {
+ u32 edma_offset = dcore * DCORE_OFFSET + edma_idx * DCORE_EDMA_OFFSET;
+ u32 edma_bit = dcore * NUM_OF_EDMA_PER_DCORE + edma_idx;
+
+ if (!(prop->edma_enabled_mask & BIT(edma_bit)))
+ continue;
+
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_AXUSER_HB_MMU_BP + edma_offset, old_mmubp);
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_LO + edma_offset, 0);
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_ADDR_HI + edma_offset, 0);
+ WREG32(mmDCORE0_EDMA0_CORE_CTX_WR_COMP_WDATA + edma_offset, 0);
+ gaudi2_qman_set_test_mode(hdev,
+ edma_queues_id[dcore] + 4 * edma_idx, false);
+ }
+ }
+
+ WREG32(sob_addr, 0);
+ hl_asic_dma_free_coherent(hdev, pkt_size * num_of_pkts, lin_dma_pkts_arr, pkt_dma_addr);
+
+ return rc;
+}
+
+static int gaudi2_scrub_device_dram(struct hl_device *hdev, u64 val)
+{
+ int rc;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 size = prop->dram_end_address - prop->dram_user_base_address;
+
+ rc = gaudi2_memset_device_memory(hdev, prop->dram_user_base_address, size, val);
+
+ if (rc)
+ dev_err(hdev->dev, "Failed to scrub dram, address: 0x%llx size: %llu\n",
+ prop->dram_user_base_address, size);
+ return rc;
+}
+
+static int gaudi2_scrub_device_mem(struct hl_device *hdev)
+{
+ int rc;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 val = hdev->memory_scrub_val;
+ u64 addr, size;
+
+ if (!hdev->memory_scrub)
+ return 0;
+
+ /* scrub SRAM */
+ addr = prop->sram_user_base_address;
+ size = hdev->pldm ? 0x10000 : (prop->sram_size - SRAM_USER_BASE_OFFSET);
+ dev_dbg(hdev->dev, "Scrubbing SRAM: 0x%09llx - 0x%09llx, val: 0x%llx\n",
+ addr, addr + size, val);
+ rc = gaudi2_memset_device_memory(hdev, addr, size, val);
+ if (rc) {
+ dev_err(hdev->dev, "scrubbing SRAM failed (%d)\n", rc);
+ return rc;
+ }
+
+ /* scrub DRAM */
+ rc = gaudi2_scrub_device_dram(hdev, val);
+ if (rc) {
+ dev_err(hdev->dev, "scrubbing DRAM failed (%d)\n", rc);
+ return rc;
+ }
+ return 0;
+}
+
+static void gaudi2_restore_user_sm_registers(struct hl_device *hdev)
+{
+ u64 addr, mon_sts_addr, mon_cfg_addr, cq_lbw_l_addr, cq_lbw_h_addr,
+ cq_lbw_data_addr, cq_base_l_addr, cq_base_h_addr, cq_size_addr;
+ u32 val, size, offset;
+ int dcore_id;
+
+ offset = hdev->asic_prop.first_available_cq[0] * 4;
+ cq_lbw_l_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + offset;
+ cq_lbw_h_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 + offset;
+ cq_lbw_data_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0 + offset;
+ cq_base_l_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + offset;
+ cq_base_h_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + offset;
+ cq_size_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + offset;
+ size = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 -
+ (mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + offset);
+
+ /* memset dcore0 CQ registers */
+ gaudi2_memset_device_lbw(hdev, cq_lbw_l_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_lbw_h_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_lbw_data_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_base_l_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_base_h_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_size_addr, size, 0);
+
+ cq_lbw_l_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0 + DCORE_OFFSET;
+ cq_lbw_h_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 + DCORE_OFFSET;
+ cq_lbw_data_addr = mmDCORE0_SYNC_MNGR_GLBL_LBW_DATA_0 + DCORE_OFFSET;
+ cq_base_l_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_L_0 + DCORE_OFFSET;
+ cq_base_h_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_BASE_ADDR_H_0 + DCORE_OFFSET;
+ cq_size_addr = mmDCORE0_SYNC_MNGR_GLBL_CQ_SIZE_LOG2_0 + DCORE_OFFSET;
+ size = mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_H_0 - mmDCORE0_SYNC_MNGR_GLBL_LBW_ADDR_L_0;
+
+ for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
+ gaudi2_memset_device_lbw(hdev, cq_lbw_l_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_lbw_h_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_lbw_data_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_base_l_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_base_h_addr, size, 0);
+ gaudi2_memset_device_lbw(hdev, cq_size_addr, size, 0);
+
+ cq_lbw_l_addr += DCORE_OFFSET;
+ cq_lbw_h_addr += DCORE_OFFSET;
+ cq_lbw_data_addr += DCORE_OFFSET;
+ cq_base_l_addr += DCORE_OFFSET;
+ cq_base_h_addr += DCORE_OFFSET;
+ cq_size_addr += DCORE_OFFSET;
+ }
+
+ offset = hdev->asic_prop.first_available_user_mon[0] * 4;
+ addr = mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + offset;
+ val = 1 << DCORE0_SYNC_MNGR_OBJS_MON_STATUS_PROT_SHIFT;
+ size = mmDCORE0_SYNC_MNGR_OBJS_SM_SEC_0 - (mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + offset);
+
+ /* memset dcore0 monitors */
+ gaudi2_memset_device_lbw(hdev, addr, size, val);
+
+ addr = mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + offset;
+ gaudi2_memset_device_lbw(hdev, addr, size, 0);
+
+ mon_sts_addr = mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0 + DCORE_OFFSET;
+ mon_cfg_addr = mmDCORE0_SYNC_MNGR_OBJS_MON_CONFIG_0 + DCORE_OFFSET;
+ size = mmDCORE0_SYNC_MNGR_OBJS_SM_SEC_0 - mmDCORE0_SYNC_MNGR_OBJS_MON_STATUS_0;
+
+ for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
+ gaudi2_memset_device_lbw(hdev, mon_sts_addr, size, val);
+ gaudi2_memset_device_lbw(hdev, mon_cfg_addr, size, 0);
+ mon_sts_addr += DCORE_OFFSET;
+ mon_cfg_addr += DCORE_OFFSET;
+ }
+
+ offset = hdev->asic_prop.first_available_user_sob[0] * 4;
+ addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset;
+ val = 0;
+ size = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 -
+ (mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset);
+
+ /* memset dcore0 sobs */
+ gaudi2_memset_device_lbw(hdev, addr, size, val);
+
+ addr = mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + DCORE_OFFSET;
+ size = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 - mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0;
+
+ for (dcore_id = 1 ; dcore_id < NUM_OF_DCORES ; dcore_id++) {
+ gaudi2_memset_device_lbw(hdev, addr, size, val);
+ addr += DCORE_OFFSET;
+ }
+
+ /* Flush all WREG to prevent race */
+ val = RREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + offset);
+}
+
+static void gaudi2_restore_user_qm_registers(struct hl_device *hdev)
+{
+ u32 reg_base, hw_queue_id;
+
+ for (hw_queue_id = GAUDI2_QUEUE_ID_PDMA_0_0 ; hw_queue_id <= GAUDI2_QUEUE_ID_ROT_1_0;
+ hw_queue_id += NUM_OF_PQ_PER_QMAN) {
+ if (!gaudi2_is_queue_enabled(hdev, hw_queue_id))
+ continue;
+
+ gaudi2_clear_qm_fence_counters_common(hdev, hw_queue_id, false);
+
+ reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
+ WREG32(reg_base + QM_ARB_CFG_0_OFFSET, 0);
+ }
+
+ /* Flush all WREG to prevent race */
+ RREG32(mmPDMA0_QM_ARB_CFG_0);
+}
+
+static void gaudi2_restore_nic_qm_registers(struct hl_device *hdev)
+{
+ u32 reg_base, hw_queue_id;
+
+ for (hw_queue_id = GAUDI2_QUEUE_ID_NIC_0_0 ; hw_queue_id <= GAUDI2_QUEUE_ID_NIC_23_3;
+ hw_queue_id += NUM_OF_PQ_PER_QMAN) {
+ if (!gaudi2_is_queue_enabled(hdev, hw_queue_id))
+ continue;
+
+ gaudi2_clear_qm_fence_counters_common(hdev, hw_queue_id, false);
+
+ reg_base = gaudi2_qm_blocks_bases[hw_queue_id];
+ WREG32(reg_base + QM_ARB_CFG_0_OFFSET, 0);
+ }
+
+ /* Flush all WREG to prevent race */
+ RREG32(mmPDMA0_QM_ARB_CFG_0);
+}
+
+static int gaudi2_context_switch(struct hl_device *hdev, u32 asid)
+{
+ return 0;
+}
+
+static void gaudi2_restore_phase_topology(struct hl_device *hdev)
+{
+}
+
+static void gaudi2_init_block_instances(struct hl_device *hdev, u32 block_idx,
+ struct dup_block_ctx *cfg_ctx)
+{
+ u64 block_base = cfg_ctx->base + block_idx * cfg_ctx->block_off;
+ u8 seq;
+ int i;
+
+ for (i = 0 ; i < cfg_ctx->instances ; i++) {
+ seq = block_idx * cfg_ctx->instances + i;
+
+ /* skip disabled instance */
+ if (!(cfg_ctx->enabled_mask & BIT_ULL(seq)))
+ continue;
+
+ cfg_ctx->instance_cfg_fn(hdev, block_base + i * cfg_ctx->instance_off,
+ cfg_ctx->data);
+ }
+}
+
+static void gaudi2_init_blocks_with_mask(struct hl_device *hdev, struct dup_block_ctx *cfg_ctx,
+ u64 mask)
+{
+ int i;
+
+ cfg_ctx->enabled_mask = mask;
+
+ for (i = 0 ; i < cfg_ctx->blocks ; i++)
+ gaudi2_init_block_instances(hdev, i, cfg_ctx);
+}
+
+void gaudi2_init_blocks(struct hl_device *hdev, struct dup_block_ctx *cfg_ctx)
+{
+ gaudi2_init_blocks_with_mask(hdev, cfg_ctx, U64_MAX);
+}
+
+static int gaudi2_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, void *blob_addr)
+{
+ void *host_mem_virtual_addr;
+ dma_addr_t host_mem_dma_addr;
+ u64 reserved_va_base;
+ u32 pos, size_left, size_to_dma;
+ struct hl_ctx *ctx;
+ int rc = 0;
+
+ /* Fetch the ctx */
+ ctx = hl_get_compute_ctx(hdev);
+ if (!ctx) {
+ dev_err(hdev->dev, "No ctx available\n");
+ return -EINVAL;
+ }
+
+ /* Allocate buffers for read and for poll */
+ host_mem_virtual_addr = hl_asic_dma_alloc_coherent(hdev, SZ_2M, &host_mem_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+ if (host_mem_virtual_addr == NULL) {
+ dev_err(hdev->dev, "Failed to allocate memory for KDMA read\n");
+ rc = -ENOMEM;
+ goto put_ctx;
+ }
+
+ /* Reserve VM region on asic side */
+ reserved_va_base = hl_reserve_va_block(hdev, ctx, HL_VA_RANGE_TYPE_HOST, SZ_2M,
+ HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
+ if (!reserved_va_base) {
+ dev_err(hdev->dev, "Failed to reserve vmem on asic\n");
+ rc = -ENOMEM;
+ goto free_data_buffer;
+ }
+
+ /* Create mapping on asic side */
+ mutex_lock(&hdev->mmu_lock);
+ rc = hl_mmu_map_contiguous(ctx, reserved_va_base, host_mem_dma_addr, SZ_2M);
+ hl_mmu_invalidate_cache_range(hdev, false,
+ MMU_OP_USERPTR | MMU_OP_SKIP_LOW_CACHE_INV,
+ ctx->asid, reserved_va_base, SZ_2M);
+ mutex_unlock(&hdev->mmu_lock);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to create mapping on asic mmu\n");
+ goto unreserve_va;
+ }
+
+ /* Enable MMU on KDMA */
+ gaudi2_kdma_set_mmbp_asid(hdev, false, ctx->asid);
+
+ pos = 0;
+ size_left = size;
+ size_to_dma = SZ_2M;
+
+ while (size_left > 0) {
+ if (size_left < SZ_2M)
+ size_to_dma = size_left;
+
+ rc = gaudi2_send_job_to_kdma(hdev, addr, reserved_va_base, size_to_dma, false);
+ if (rc)
+ break;
+
+ memcpy(blob_addr + pos, host_mem_virtual_addr, size_to_dma);
+
+ if (size_left <= SZ_2M)
+ break;
+
+ pos += SZ_2M;
+ addr += SZ_2M;
+ size_left -= SZ_2M;
+ }
+
+ gaudi2_kdma_set_mmbp_asid(hdev, true, HL_KERNEL_ASID_ID);
+
+ mutex_lock(&hdev->mmu_lock);
+ hl_mmu_unmap_contiguous(ctx, reserved_va_base, SZ_2M);
+ hl_mmu_invalidate_cache_range(hdev, false, MMU_OP_USERPTR,
+ ctx->asid, reserved_va_base, SZ_2M);
+ mutex_unlock(&hdev->mmu_lock);
+unreserve_va:
+ hl_unreserve_va_block(hdev, ctx, reserved_va_base, SZ_2M);
+free_data_buffer:
+ hl_asic_dma_free_coherent(hdev, SZ_2M, host_mem_virtual_addr, host_mem_dma_addr);
+put_ctx:
+ hl_ctx_put(ctx);
+
+ return rc;
+}
+
+static int gaudi2_internal_cb_pool_init(struct hl_device *hdev, struct hl_ctx *ctx)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int min_alloc_order, rc;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU))
+ return 0;
+
+ hdev->internal_cb_pool_virt_addr = hl_asic_dma_alloc_coherent(hdev,
+ HOST_SPACE_INTERNAL_CB_SZ,
+ &hdev->internal_cb_pool_dma_addr,
+ GFP_KERNEL | __GFP_ZERO);
+
+ if (!hdev->internal_cb_pool_virt_addr)
+ return -ENOMEM;
+
+ min_alloc_order = ilog2(min(gaudi2_get_signal_cb_size(hdev),
+ gaudi2_get_wait_cb_size(hdev)));
+
+ hdev->internal_cb_pool = gen_pool_create(min_alloc_order, -1);
+ if (!hdev->internal_cb_pool) {
+ dev_err(hdev->dev, "Failed to create internal CB pool\n");
+ rc = -ENOMEM;
+ goto free_internal_cb_pool;
+ }
+
+ rc = gen_pool_add(hdev->internal_cb_pool, (uintptr_t) hdev->internal_cb_pool_virt_addr,
+ HOST_SPACE_INTERNAL_CB_SZ, -1);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to add memory to internal CB pool\n");
+ rc = -EFAULT;
+ goto destroy_internal_cb_pool;
+ }
+
+ hdev->internal_cb_va_base = hl_reserve_va_block(hdev, ctx, HL_VA_RANGE_TYPE_HOST,
+ HOST_SPACE_INTERNAL_CB_SZ, HL_MMU_VA_ALIGNMENT_NOT_NEEDED);
+
+ if (!hdev->internal_cb_va_base) {
+ rc = -ENOMEM;
+ goto destroy_internal_cb_pool;
+ }
+
+ mutex_lock(&hdev->mmu_lock);
+ rc = hl_mmu_map_contiguous(ctx, hdev->internal_cb_va_base, hdev->internal_cb_pool_dma_addr,
+ HOST_SPACE_INTERNAL_CB_SZ);
+ hl_mmu_invalidate_cache(hdev, false, MMU_OP_USERPTR);
+ mutex_unlock(&hdev->mmu_lock);
+
+ if (rc)
+ goto unreserve_internal_cb_pool;
+
+ return 0;
+
+unreserve_internal_cb_pool:
+ hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
+destroy_internal_cb_pool:
+ gen_pool_destroy(hdev->internal_cb_pool);
+free_internal_cb_pool:
+ hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
+ hdev->internal_cb_pool_dma_addr);
+
+ return rc;
+}
+
+static void gaudi2_internal_cb_pool_fini(struct hl_device *hdev, struct hl_ctx *ctx)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_PMMU))
+ return;
+
+ mutex_lock(&hdev->mmu_lock);
+ hl_mmu_unmap_contiguous(ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
+ hl_unreserve_va_block(hdev, ctx, hdev->internal_cb_va_base, HOST_SPACE_INTERNAL_CB_SZ);
+ hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR);
+ mutex_unlock(&hdev->mmu_lock);
+
+ gen_pool_destroy(hdev->internal_cb_pool);
+
+ hl_asic_dma_free_coherent(hdev, HOST_SPACE_INTERNAL_CB_SZ, hdev->internal_cb_pool_virt_addr,
+ hdev->internal_cb_pool_dma_addr);
+}
+
+static void gaudi2_restore_user_registers(struct hl_device *hdev)
+{
+ gaudi2_restore_user_sm_registers(hdev);
+ gaudi2_restore_user_qm_registers(hdev);
+}
+
+static int gaudi2_map_virtual_msix_doorbell_memory(struct hl_ctx *ctx)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int rc;
+
+ rc = hl_mmu_map_page(ctx, RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START,
+ gaudi2->virt_msix_db_dma_addr, prop->pmmu.page_size, true);
+ if (rc)
+ dev_err(hdev->dev, "Failed to map VA %#llx for virtual MSI-X doorbell memory\n",
+ RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START);
+
+ return rc;
+}
+
+static void gaudi2_unmap_virtual_msix_doorbell_memory(struct hl_ctx *ctx)
+{
+ struct hl_device *hdev = ctx->hdev;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int rc;
+
+ rc = hl_mmu_unmap_page(ctx, RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START,
+ prop->pmmu.page_size, true);
+ if (rc)
+ dev_err(hdev->dev, "Failed to unmap VA %#llx of virtual MSI-X doorbell memory\n",
+ RESERVED_VA_FOR_VIRTUAL_MSIX_DOORBELL_START);
+}
+
+static int gaudi2_ctx_init(struct hl_ctx *ctx)
+{
+ int rc;
+
+ rc = gaudi2_mmu_prepare(ctx->hdev, ctx->asid);
+ if (rc)
+ return rc;
+
+ /* No need to clear user registers if the device has just
+ * performed reset, we restore only nic qm registers
+ */
+ if (ctx->hdev->reset_upon_device_release)
+ gaudi2_restore_nic_qm_registers(ctx->hdev);
+ else
+ gaudi2_restore_user_registers(ctx->hdev);
+
+ rc = gaudi2_internal_cb_pool_init(ctx->hdev, ctx);
+ if (rc)
+ return rc;
+
+ rc = gaudi2_map_virtual_msix_doorbell_memory(ctx);
+ if (rc)
+ gaudi2_internal_cb_pool_fini(ctx->hdev, ctx);
+
+ return rc;
+}
+
+static void gaudi2_ctx_fini(struct hl_ctx *ctx)
+{
+ if (ctx->asid == HL_KERNEL_ASID_ID)
+ return;
+
+ gaudi2_internal_cb_pool_fini(ctx->hdev, ctx);
+
+ gaudi2_unmap_virtual_msix_doorbell_memory(ctx);
+}
+
+static int gaudi2_pre_schedule_cs(struct hl_cs *cs)
+{
+ struct hl_device *hdev = cs->ctx->hdev;
+ int index = cs->sequence & (hdev->asic_prop.max_pending_cs - 1);
+ u32 mon_payload, sob_id, mon_id;
+
+ if (!cs_needs_completion(cs))
+ return 0;
+
+ /*
+ * First 64 SOB/MON are reserved for driver for QMAN auto completion
+ * mechanism. Each SOB/MON pair are used for a pending CS with the same
+ * cyclic index. The SOB value is increased when each of the CS jobs is
+ * completed. When the SOB reaches the number of CS jobs, the monitor
+ * generates MSI-X interrupt.
+ */
+
+ sob_id = mon_id = index;
+ mon_payload = (1 << CQ_ENTRY_SHADOW_INDEX_VALID_SHIFT) |
+ (1 << CQ_ENTRY_READY_SHIFT) | index;
+
+ gaudi2_arm_cq_monitor(hdev, sob_id, mon_id, GAUDI2_RESERVED_CQ_CS_COMPLETION, mon_payload,
+ cs->jobs_cnt);
+
+ return 0;
+}
+
+static u32 gaudi2_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
+{
+ return HL_INVALID_QUEUE;
+}
+
+static u32 gaudi2_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id, u32 size, bool eb)
+{
+ struct hl_cb *cb = data;
+ struct packet_msg_short *pkt;
+ u32 value, ctl, pkt_size = sizeof(*pkt);
+
+ pkt = (struct packet_msg_short *) (uintptr_t) (cb->kernel_address + size);
+ memset(pkt, 0, pkt_size);
+
+ /* Inc by 1, Mode ADD */
+ value = FIELD_PREP(GAUDI2_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
+ value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_SOB_MOD_MASK, 1);
+
+ ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, sob_id * 4);
+ ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 1); /* SOB base */
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, eb);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
+
+ pkt->value = cpu_to_le32(value);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return size + pkt_size;
+}
+
+static u32 gaudi2_add_mon_msg_short(struct packet_msg_short *pkt, u32 value, u16 addr)
+{
+ u32 ctl, pkt_size = sizeof(*pkt);
+
+ memset(pkt, 0, pkt_size);
+
+ ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, addr);
+ ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 0); /* MON base */
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 0);
+
+ pkt->value = cpu_to_le32(value);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return pkt_size;
+}
+
+static u32 gaudi2_add_arm_monitor_pkt(struct hl_device *hdev, struct packet_msg_short *pkt,
+ u16 sob_base, u8 sob_mask, u16 sob_val, u16 addr)
+{
+ u32 ctl, value, pkt_size = sizeof(*pkt);
+ u8 mask;
+
+ if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
+ dev_err(hdev->dev, "sob_base %u (mask %#x) is not valid\n", sob_base, sob_mask);
+ return 0;
+ }
+
+ memset(pkt, 0, pkt_size);
+
+ value = FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
+ value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
+ value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_MODE_MASK, 0); /* GREATER OR EQUAL*/
+ value |= FIELD_PREP(GAUDI2_PKT_SHORT_VAL_MON_MASK_MASK, mask);
+
+ ctl = FIELD_PREP(GAUDI2_PKT_SHORT_CTL_ADDR_MASK, addr);
+ ctl |= FIELD_PREP(GAUDI2_PKT_SHORT_CTL_BASE_MASK, 0); /* MON base */
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
+
+ pkt->value = cpu_to_le32(value);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return pkt_size;
+}
+
+static u32 gaudi2_add_fence_pkt(struct packet_fence *pkt)
+{
+ u32 ctl, cfg, pkt_size = sizeof(*pkt);
+
+ memset(pkt, 0, pkt_size);
+
+ cfg = FIELD_PREP(GAUDI2_PKT_FENCE_CFG_DEC_VAL_MASK, 1);
+ cfg |= FIELD_PREP(GAUDI2_PKT_FENCE_CFG_TARGET_VAL_MASK, 1);
+ cfg |= FIELD_PREP(GAUDI2_PKT_FENCE_CFG_ID_MASK, 2);
+
+ ctl = FIELD_PREP(GAUDI2_PKT_CTL_OPCODE_MASK, PACKET_FENCE);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_EB_MASK, 0);
+ ctl |= FIELD_PREP(GAUDI2_PKT_CTL_MB_MASK, 1);
+
+ pkt->cfg = cpu_to_le32(cfg);
+ pkt->ctl = cpu_to_le32(ctl);
+
+ return pkt_size;
+}
+
+static u32 gaudi2_gen_wait_cb(struct hl_device *hdev, struct hl_gen_wait_properties *prop)
+{
+ struct hl_cb *cb = prop->data;
+ void *buf = (void *) (uintptr_t) (cb->kernel_address);
+
+ u64 monitor_base, fence_addr = 0;
+ u32 stream_index, size = prop->size;
+ u16 msg_addr_offset;
+
+ stream_index = prop->q_idx % 4;
+ fence_addr = CFG_BASE + gaudi2_qm_blocks_bases[prop->q_idx] +
+ QM_FENCE2_OFFSET + stream_index * 4;
+
+ /*
+ * monitor_base should be the content of the base0 address registers,
+ * so it will be added to the msg short offsets
+ */
+ monitor_base = mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
+
+ /* First monitor config packet: low address of the sync */
+ msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + prop->mon_id * 4) -
+ monitor_base;
+
+ size += gaudi2_add_mon_msg_short(buf + size, (u32) fence_addr, msg_addr_offset);
+
+ /* Second monitor config packet: high address of the sync */
+ msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + prop->mon_id * 4) -
+ monitor_base;
+
+ size += gaudi2_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32), msg_addr_offset);
+
+ /*
+ * Third monitor config packet: the payload, i.e. what to write when the
+ * sync triggers
+ */
+ msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + prop->mon_id * 4) -
+ monitor_base;
+
+ size += gaudi2_add_mon_msg_short(buf + size, 1, msg_addr_offset);
+
+ /* Fourth monitor config packet: bind the monitor to a sync object */
+ msg_addr_offset = (mmDCORE0_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) - monitor_base;
+
+ size += gaudi2_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base, prop->sob_mask,
+ prop->sob_val, msg_addr_offset);
+
+ /* Fence packet */
+ size += gaudi2_add_fence_pkt(buf + size);
+
+ return size;
+}
+
+static void gaudi2_reset_sob(struct hl_device *hdev, void *data)
+{
+ struct hl_hw_sob *hw_sob = data;
+
+ dev_dbg(hdev->dev, "reset SOB, q_idx: %d, sob_id: %d\n", hw_sob->q_idx, hw_sob->sob_id);
+
+ WREG32(mmDCORE0_SYNC_MNGR_OBJS_SOB_OBJ_0 + hw_sob->sob_id * 4, 0);
+
+ kref_init(&hw_sob->kref);
+}
+
+static void gaudi2_reset_sob_group(struct hl_device *hdev, u16 sob_group)
+{
+}
+
+static u64 gaudi2_get_device_time(struct hl_device *hdev)
+{
+ u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
+
+ return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
+}
+
+static int gaudi2_collective_wait_init_cs(struct hl_cs *cs)
+{
+ return 0;
+}
+
+static int gaudi2_collective_wait_create_jobs(struct hl_device *hdev, struct hl_ctx *ctx,
+ struct hl_cs *cs, u32 wait_queue_id,
+ u32 collective_engine_id, u32 encaps_signal_offset)
+{
+ return -EINVAL;
+}
+
+/*
+ * hl_mmu_scramble - converts a dram (non power of 2) page-size aligned address
+ * to DMMU page-size address (64MB) before mapping it in
+ * the MMU.
+ * The operation is performed on both the virtual and physical addresses.
+ * for device with 6 HBMs the scramble is:
+ * (addr[47:0] / 48M) * 64M + addr % 48M + addr[63:48]
+ *
+ * Example:
+ * =============================================================================
+ * Allocated DRAM Reserved VA scrambled VA for MMU mapping Scrambled PA
+ * Phys address in MMU last
+ * HOP
+ * =============================================================================
+ * PA1 0x3000000 VA1 0x9C000000 SVA1= (VA1/48M)*64M 0xD0000000 <- PA1/48M 0x1
+ * PA2 0x9000000 VA2 0x9F000000 SVA2= (VA2/48M)*64M 0xD4000000 <- PA2/48M 0x3
+ * =============================================================================
+ */
+static u64 gaudi2_mmu_scramble_addr(struct hl_device *hdev, u64 raw_addr)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 divisor, mod_va;
+ u64 div_va;
+
+ /* accept any address in the DRAM address space */
+ if (hl_mem_area_inside_range(raw_addr, sizeof(raw_addr), DRAM_PHYS_BASE,
+ VA_HBM_SPACE_END)) {
+
+ divisor = prop->num_functional_hbms * GAUDI2_HBM_MMU_SCRM_MEM_SIZE;
+ div_va = div_u64_rem(raw_addr & GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK, divisor, &mod_va);
+ return (raw_addr & ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK) |
+ (div_va << GAUDI2_HBM_MMU_SCRM_DIV_SHIFT) |
+ (mod_va << GAUDI2_HBM_MMU_SCRM_MOD_SHIFT);
+ }
+
+ return raw_addr;
+}
+
+static u64 gaudi2_mmu_descramble_addr(struct hl_device *hdev, u64 scrambled_addr)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 divisor, mod_va;
+ u64 div_va;
+
+ /* accept any address in the DRAM address space */
+ if (hl_mem_area_inside_range(scrambled_addr, sizeof(scrambled_addr), DRAM_PHYS_BASE,
+ VA_HBM_SPACE_END)) {
+
+ divisor = prop->num_functional_hbms * GAUDI2_HBM_MMU_SCRM_MEM_SIZE;
+ div_va = div_u64_rem(scrambled_addr & GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK,
+ PAGE_SIZE_64MB, &mod_va);
+
+ return ((scrambled_addr & ~GAUDI2_HBM_MMU_SCRM_ADDRESS_MASK) +
+ (div_va * divisor + mod_va));
+ }
+
+ return scrambled_addr;
+}
+
+static u32 gaudi2_get_dec_base_addr(struct hl_device *hdev, u32 core_id)
+{
+ u32 base = 0, dcore_id, dec_id;
+
+ if (core_id >= NUMBER_OF_DEC) {
+ dev_err(hdev->dev, "Unexpected core number %d for DEC\n", core_id);
+ goto out;
+ }
+
+ if (core_id < 8) {
+ dcore_id = core_id / NUM_OF_DEC_PER_DCORE;
+ dec_id = core_id % NUM_OF_DEC_PER_DCORE;
+
+ base = mmDCORE0_DEC0_CMD_BASE + dcore_id * DCORE_OFFSET +
+ dec_id * DCORE_VDEC_OFFSET;
+ } else {
+ /* PCIe Shared Decoder */
+ base = mmPCIE_DEC0_CMD_BASE + ((core_id % 8) * PCIE_VDEC_OFFSET);
+ }
+out:
+ return base;
+}
+
+static int gaudi2_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
+ u32 *block_size, u32 *block_id)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ int i;
+
+ for (i = 0 ; i < NUM_USER_MAPPED_BLOCKS ; i++) {
+ if (block_addr == CFG_BASE + gaudi2->mapped_blocks[i].address) {
+ *block_id = i;
+ if (block_size)
+ *block_size = gaudi2->mapped_blocks[i].size;
+ return 0;
+ }
+ }
+
+ dev_err(hdev->dev, "Invalid block address %#llx", block_addr);
+
+ return -EINVAL;
+}
+
+static int gaudi2_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
+ u32 block_id, u32 block_size)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u64 offset_in_bar;
+ u64 address;
+ int rc;
+
+ if (block_id >= NUM_USER_MAPPED_BLOCKS) {
+ dev_err(hdev->dev, "Invalid block id %u", block_id);
+ return -EINVAL;
+ }
+
+ /* we allow mapping only an entire block */
+ if (block_size != gaudi2->mapped_blocks[block_id].size) {
+ dev_err(hdev->dev, "Invalid block size %u", block_size);
+ return -EINVAL;
+ }
+
+ offset_in_bar = CFG_BASE + gaudi2->mapped_blocks[block_id].address - STM_FLASH_BASE_ADDR;
+
+ address = pci_resource_start(hdev->pdev, SRAM_CFG_BAR_ID) + offset_in_bar;
+
++ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++ VM_DONTCOPY | VM_NORESERVE);
+
+ rc = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
+ block_size, vma->vm_page_prot);
+ if (rc)
+ dev_err(hdev->dev, "remap_pfn_range error %d", rc);
+
+ return rc;
+}
+
+static void gaudi2_enable_events_from_fw(struct hl_device *hdev)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ struct cpu_dyn_regs *dyn_regs = &hdev->fw_loader.dynamic_loader.comm_desc.cpu_dyn_regs;
+ u32 irq_handler_offset = le32_to_cpu(dyn_regs->gic_host_ints_irq);
+
+ if (gaudi2->hw_cap_initialized & HW_CAP_CPU_Q)
+ WREG32(irq_handler_offset,
+ gaudi2_irq_map_table[GAUDI2_EVENT_CPU_INTS_REGISTER].cpu_id);
+}
+
+static int gaudi2_get_mmu_base(struct hl_device *hdev, u64 mmu_id, u32 *mmu_base)
+{
+ switch (mmu_id) {
+ case HW_CAP_DCORE0_DMMU0:
+ *mmu_base = mmDCORE0_HMMU0_MMU_BASE;
+ break;
+ case HW_CAP_DCORE0_DMMU1:
+ *mmu_base = mmDCORE0_HMMU1_MMU_BASE;
+ break;
+ case HW_CAP_DCORE0_DMMU2:
+ *mmu_base = mmDCORE0_HMMU2_MMU_BASE;
+ break;
+ case HW_CAP_DCORE0_DMMU3:
+ *mmu_base = mmDCORE0_HMMU3_MMU_BASE;
+ break;
+ case HW_CAP_DCORE1_DMMU0:
+ *mmu_base = mmDCORE1_HMMU0_MMU_BASE;
+ break;
+ case HW_CAP_DCORE1_DMMU1:
+ *mmu_base = mmDCORE1_HMMU1_MMU_BASE;
+ break;
+ case HW_CAP_DCORE1_DMMU2:
+ *mmu_base = mmDCORE1_HMMU2_MMU_BASE;
+ break;
+ case HW_CAP_DCORE1_DMMU3:
+ *mmu_base = mmDCORE1_HMMU3_MMU_BASE;
+ break;
+ case HW_CAP_DCORE2_DMMU0:
+ *mmu_base = mmDCORE2_HMMU0_MMU_BASE;
+ break;
+ case HW_CAP_DCORE2_DMMU1:
+ *mmu_base = mmDCORE2_HMMU1_MMU_BASE;
+ break;
+ case HW_CAP_DCORE2_DMMU2:
+ *mmu_base = mmDCORE2_HMMU2_MMU_BASE;
+ break;
+ case HW_CAP_DCORE2_DMMU3:
+ *mmu_base = mmDCORE2_HMMU3_MMU_BASE;
+ break;
+ case HW_CAP_DCORE3_DMMU0:
+ *mmu_base = mmDCORE3_HMMU0_MMU_BASE;
+ break;
+ case HW_CAP_DCORE3_DMMU1:
+ *mmu_base = mmDCORE3_HMMU1_MMU_BASE;
+ break;
+ case HW_CAP_DCORE3_DMMU2:
+ *mmu_base = mmDCORE3_HMMU2_MMU_BASE;
+ break;
+ case HW_CAP_DCORE3_DMMU3:
+ *mmu_base = mmDCORE3_HMMU3_MMU_BASE;
+ break;
+ case HW_CAP_PMMU:
+ *mmu_base = mmPMMU_HBW_MMU_BASE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void gaudi2_ack_mmu_error(struct hl_device *hdev, u64 mmu_id)
+{
+ bool is_pmmu = (mmu_id == HW_CAP_PMMU);
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+ u32 mmu_base;
+
+ if (!(gaudi2->hw_cap_initialized & mmu_id))
+ return;
+
+ if (gaudi2_get_mmu_base(hdev, mmu_id, &mmu_base))
+ return;
+
+ gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, NULL);
+ gaudi2_handle_access_error(hdev, mmu_base, is_pmmu);
+}
+
+static int gaudi2_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
+{
+ u32 i, mmu_id, num_of_hmmus = NUM_OF_HMMU_PER_DCORE * NUM_OF_DCORES;
+
+ /* check all HMMUs */
+ for (i = 0 ; i < num_of_hmmus ; i++) {
+ mmu_id = HW_CAP_DCORE0_DMMU0 << i;
+
+ if (mmu_cap_mask & mmu_id)
+ gaudi2_ack_mmu_error(hdev, mmu_id);
+ }
+
+ /* check PMMU */
+ if (mmu_cap_mask & HW_CAP_PMMU)
+ gaudi2_ack_mmu_error(hdev, HW_CAP_PMMU);
+
+ return 0;
+}
+
+static void gaudi2_get_msi_info(__le32 *table)
+{
+ table[CPUCP_EVENT_QUEUE_MSI_TYPE] = cpu_to_le32(GAUDI2_EVENT_QUEUE_MSIX_IDX);
+}
+
+static int gaudi2_map_pll_idx_to_fw_idx(u32 pll_idx)
+{
+ switch (pll_idx) {
+ case HL_GAUDI2_CPU_PLL: return CPU_PLL;
+ case HL_GAUDI2_PCI_PLL: return PCI_PLL;
+ case HL_GAUDI2_NIC_PLL: return NIC_PLL;
+ case HL_GAUDI2_DMA_PLL: return DMA_PLL;
+ case HL_GAUDI2_MESH_PLL: return MESH_PLL;
+ case HL_GAUDI2_MME_PLL: return MME_PLL;
+ case HL_GAUDI2_TPC_PLL: return TPC_PLL;
+ case HL_GAUDI2_IF_PLL: return IF_PLL;
+ case HL_GAUDI2_SRAM_PLL: return SRAM_PLL;
+ case HL_GAUDI2_HBM_PLL: return HBM_PLL;
+ case HL_GAUDI2_VID_PLL: return VID_PLL;
+ case HL_GAUDI2_MSS_PLL: return MSS_PLL;
+ default: return -EINVAL;
+ }
+}
+
+static int gaudi2_gen_sync_to_engine_map(struct hl_device *hdev, struct hl_sync_to_engine_map *map)
+{
+ /* Not implemented */
+ return 0;
+}
+
+static int gaudi2_monitor_valid(struct hl_mon_state_dump *mon)
+{
+ /* Not implemented */
+ return 0;
+}
+
+static int gaudi2_print_single_monitor(char **buf, size_t *size, size_t *offset,
+ struct hl_device *hdev, struct hl_mon_state_dump *mon)
+{
+ /* Not implemented */
+ return 0;
+}
+
+
+static int gaudi2_print_fences_single_engine(struct hl_device *hdev, u64 base_offset,
+ u64 status_base_offset, enum hl_sync_engine_type engine_type,
+ u32 engine_id, char **buf, size_t *size, size_t *offset)
+{
+ /* Not implemented */
+ return 0;
+}
+
+
+static struct hl_state_dump_specs_funcs gaudi2_state_dump_funcs = {
+ .monitor_valid = gaudi2_monitor_valid,
+ .print_single_monitor = gaudi2_print_single_monitor,
+ .gen_sync_to_engine_map = gaudi2_gen_sync_to_engine_map,
+ .print_fences_single_engine = gaudi2_print_fences_single_engine,
+};
+
+static void gaudi2_state_dump_init(struct hl_device *hdev)
+{
+ /* Not implemented */
+ hdev->state_dump_specs.props = gaudi2_state_dump_specs_props;
+ hdev->state_dump_specs.funcs = gaudi2_state_dump_funcs;
+}
+
+static u32 gaudi2_get_sob_addr(struct hl_device *hdev, u32 sob_id)
+{
+ return 0;
+}
+
+static u32 *gaudi2_get_stream_master_qid_arr(void)
+{
+ return NULL;
+}
+
+static void gaudi2_add_device_attr(struct hl_device *hdev, struct attribute_group *dev_clk_attr_grp,
+ struct attribute_group *dev_vrm_attr_grp)
+{
+ hl_sysfs_add_dev_clk_attr(hdev, dev_clk_attr_grp);
+ hl_sysfs_add_dev_vrm_attr(hdev, dev_vrm_attr_grp);
+}
+
+static int gaudi2_mmu_get_real_page_size(struct hl_device *hdev, struct hl_mmu_properties *mmu_prop,
+ u32 page_size, u32 *real_page_size, bool is_dram_addr)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+
+ /* for host pages the page size must be */
+ if (!is_dram_addr) {
+ if (page_size % mmu_prop->page_size)
+ goto page_size_err;
+
+ *real_page_size = mmu_prop->page_size;
+ return 0;
+ }
+
+ if ((page_size % prop->dram_page_size) || (prop->dram_page_size > mmu_prop->page_size))
+ goto page_size_err;
+
+ /*
+ * MMU page size is different from DRAM page size (more precisely, DMMU page is greater
+ * than DRAM page size).
+ * for this reason work with the DRAM page size and let the MMU scrambling routine handle
+ * this mismatch when calculating the address to place in the MMU page table.
+ * (in that case also make sure that the dram_page_size is not greater than the
+ * mmu page size)
+ */
+ *real_page_size = prop->dram_page_size;
+
+ return 0;
+
+page_size_err:
+ dev_err(hdev->dev, "page size of %u is not %uKB aligned, can't map\n",
+ page_size, mmu_prop->page_size >> 10);
+ return -EFAULT;
+}
+
+static int gaudi2_get_monitor_dump(struct hl_device *hdev, void *data)
+{
+ return -EOPNOTSUPP;
+}
+
+int gaudi2_send_device_activity(struct hl_device *hdev, bool open)
+{
+ struct gaudi2_device *gaudi2 = hdev->asic_specific;
+
+ if (!(gaudi2->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_send_device_activity(hdev, open);
+}
+
+static const struct hl_asic_funcs gaudi2_funcs = {
+ .early_init = gaudi2_early_init,
+ .early_fini = gaudi2_early_fini,
+ .late_init = gaudi2_late_init,
+ .late_fini = gaudi2_late_fini,
+ .sw_init = gaudi2_sw_init,
+ .sw_fini = gaudi2_sw_fini,
+ .hw_init = gaudi2_hw_init,
+ .hw_fini = gaudi2_hw_fini,
+ .halt_engines = gaudi2_halt_engines,
+ .suspend = gaudi2_suspend,
+ .resume = gaudi2_resume,
+ .mmap = gaudi2_mmap,
+ .ring_doorbell = gaudi2_ring_doorbell,
+ .pqe_write = gaudi2_pqe_write,
+ .asic_dma_alloc_coherent = gaudi2_dma_alloc_coherent,
+ .asic_dma_free_coherent = gaudi2_dma_free_coherent,
+ .scrub_device_mem = gaudi2_scrub_device_mem,
+ .scrub_device_dram = gaudi2_scrub_device_dram,
+ .get_int_queue_base = NULL,
+ .test_queues = gaudi2_test_queues,
+ .asic_dma_pool_zalloc = gaudi2_dma_pool_zalloc,
+ .asic_dma_pool_free = gaudi2_dma_pool_free,
+ .cpu_accessible_dma_pool_alloc = gaudi2_cpu_accessible_dma_pool_alloc,
+ .cpu_accessible_dma_pool_free = gaudi2_cpu_accessible_dma_pool_free,
+ .asic_dma_unmap_single = gaudi2_dma_unmap_single,
+ .asic_dma_map_single = gaudi2_dma_map_single,
+ .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
+ .cs_parser = gaudi2_cs_parser,
+ .asic_dma_map_sgtable = hl_dma_map_sgtable,
+ .add_end_of_cb_packets = NULL,
+ .update_eq_ci = gaudi2_update_eq_ci,
+ .context_switch = gaudi2_context_switch,
+ .restore_phase_topology = gaudi2_restore_phase_topology,
+ .debugfs_read_dma = gaudi2_debugfs_read_dma,
+ .add_device_attr = gaudi2_add_device_attr,
+ .handle_eqe = gaudi2_handle_eqe,
+ .get_events_stat = gaudi2_get_events_stat,
+ .read_pte = NULL,
+ .write_pte = NULL,
+ .mmu_invalidate_cache = gaudi2_mmu_invalidate_cache,
+ .mmu_invalidate_cache_range = gaudi2_mmu_invalidate_cache_range,
+ .mmu_prefetch_cache_range = NULL,
+ .send_heartbeat = gaudi2_send_heartbeat,
+ .debug_coresight = gaudi2_debug_coresight,
+ .is_device_idle = gaudi2_is_device_idle,
+ .compute_reset_late_init = gaudi2_compute_reset_late_init,
+ .hw_queues_lock = gaudi2_hw_queues_lock,
+ .hw_queues_unlock = gaudi2_hw_queues_unlock,
+ .get_pci_id = gaudi2_get_pci_id,
+ .get_eeprom_data = gaudi2_get_eeprom_data,
+ .get_monitor_dump = gaudi2_get_monitor_dump,
+ .send_cpu_message = gaudi2_send_cpu_message,
+ .pci_bars_map = gaudi2_pci_bars_map,
+ .init_iatu = gaudi2_init_iatu,
+ .rreg = hl_rreg,
+ .wreg = hl_wreg,
+ .halt_coresight = gaudi2_halt_coresight,
+ .ctx_init = gaudi2_ctx_init,
+ .ctx_fini = gaudi2_ctx_fini,
+ .pre_schedule_cs = gaudi2_pre_schedule_cs,
+ .get_queue_id_for_cq = gaudi2_get_queue_id_for_cq,
+ .load_firmware_to_device = NULL,
+ .load_boot_fit_to_device = NULL,
+ .get_signal_cb_size = gaudi2_get_signal_cb_size,
+ .get_wait_cb_size = gaudi2_get_wait_cb_size,
+ .gen_signal_cb = gaudi2_gen_signal_cb,
+ .gen_wait_cb = gaudi2_gen_wait_cb,
+ .reset_sob = gaudi2_reset_sob,
+ .reset_sob_group = gaudi2_reset_sob_group,
+ .get_device_time = gaudi2_get_device_time,
+ .pb_print_security_errors = gaudi2_pb_print_security_errors,
+ .collective_wait_init_cs = gaudi2_collective_wait_init_cs,
+ .collective_wait_create_jobs = gaudi2_collective_wait_create_jobs,
+ .get_dec_base_addr = gaudi2_get_dec_base_addr,
+ .scramble_addr = gaudi2_mmu_scramble_addr,
+ .descramble_addr = gaudi2_mmu_descramble_addr,
+ .ack_protection_bits_errors = gaudi2_ack_protection_bits_errors,
+ .get_hw_block_id = gaudi2_get_hw_block_id,
+ .hw_block_mmap = gaudi2_block_mmap,
+ .enable_events_from_fw = gaudi2_enable_events_from_fw,
+ .ack_mmu_errors = gaudi2_ack_mmu_page_fault_or_access_error,
+ .get_msi_info = gaudi2_get_msi_info,
+ .map_pll_idx_to_fw_idx = gaudi2_map_pll_idx_to_fw_idx,
+ .init_firmware_preload_params = gaudi2_init_firmware_preload_params,
+ .init_firmware_loader = gaudi2_init_firmware_loader,
+ .init_cpu_scrambler_dram = gaudi2_init_scrambler_hbm,
+ .state_dump_init = gaudi2_state_dump_init,
+ .get_sob_addr = &gaudi2_get_sob_addr,
+ .set_pci_memory_regions = gaudi2_set_pci_memory_regions,
+ .get_stream_master_qid_arr = gaudi2_get_stream_master_qid_arr,
+ .check_if_razwi_happened = gaudi2_check_if_razwi_happened,
+ .mmu_get_real_page_size = gaudi2_mmu_get_real_page_size,
+ .access_dev_mem = hl_access_dev_mem,
+ .set_dram_bar_base = gaudi2_set_hbm_bar_base,
+ .set_engine_cores = gaudi2_set_engine_cores,
+ .send_device_activity = gaudi2_send_device_activity,
+ .set_dram_properties = gaudi2_set_dram_properties,
+ .set_binning_masks = gaudi2_set_binning_masks,
+};
+
+void gaudi2_set_asic_funcs(struct hl_device *hdev)
+{
+ hdev->asic_funcs = &gaudi2_funcs;
+}
--- /dev/null
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2016-2022 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ */
+
+#include "goyaP.h"
+#include "../include/hw_ip/mmu/mmu_general.h"
+#include "../include/hw_ip/mmu/mmu_v1_0.h"
+#include "../include/goya/asic_reg/goya_masks.h"
+#include "../include/goya/goya_reg_map.h"
+
+#include <linux/pci.h>
+#include <linux/hwmon.h>
+#include <linux/iommu.h>
+#include <linux/seq_file.h>
+
+/*
+ * GOYA security scheme:
+ *
+ * 1. Host is protected by:
+ * - Range registers (When MMU is enabled, DMA RR does NOT protect host)
+ * - MMU
+ *
+ * 2. DRAM is protected by:
+ * - Range registers (protect the first 512MB)
+ * - MMU (isolation between users)
+ *
+ * 3. Configuration is protected by:
+ * - Range registers
+ * - Protection bits
+ *
+ * When MMU is disabled:
+ *
+ * QMAN DMA: PQ, CQ, CP, DMA are secured.
+ * PQ, CB and the data are on the host.
+ *
+ * QMAN TPC/MME:
+ * PQ, CQ and CP are not secured.
+ * PQ, CB and the data are on the SRAM/DRAM.
+ *
+ * Since QMAN DMA is secured, the driver is parsing the DMA CB:
+ * - checks DMA pointer
+ * - WREG, MSG_PROT are not allowed.
+ * - MSG_LONG/SHORT are allowed.
+ *
+ * A read/write transaction by the QMAN to a protected area will succeed if
+ * and only if the QMAN's CP is secured and MSG_PROT is used
+ *
+ *
+ * When MMU is enabled:
+ *
+ * QMAN DMA: PQ, CQ and CP are secured.
+ * MMU is set to bypass on the Secure props register of the QMAN.
+ * The reasons we don't enable MMU for PQ, CQ and CP are:
+ * - PQ entry is in kernel address space and the driver doesn't map it.
+ * - CP writes to MSIX register and to kernel address space (completion
+ * queue).
+ *
+ * DMA is not secured but because CP is secured, the driver still needs to parse
+ * the CB, but doesn't need to check the DMA addresses.
+ *
+ * For QMAN DMA 0, DMA is also secured because only the driver uses this DMA and
+ * the driver doesn't map memory in MMU.
+ *
+ * QMAN TPC/MME: PQ, CQ and CP aren't secured (no change from MMU disabled mode)
+ *
+ * DMA RR does NOT protect host because DMA is not secured
+ *
+ */
+
+#define GOYA_BOOT_FIT_FILE "habanalabs/goya/goya-boot-fit.itb"
+#define GOYA_LINUX_FW_FILE "habanalabs/goya/goya-fit.itb"
+
+#define GOYA_MMU_REGS_NUM 63
+
+#define GOYA_DMA_POOL_BLK_SIZE 0x100 /* 256 bytes */
+
+#define GOYA_RESET_TIMEOUT_MSEC 500 /* 500ms */
+#define GOYA_PLDM_RESET_TIMEOUT_MSEC 20000 /* 20s */
+#define GOYA_RESET_WAIT_MSEC 1 /* 1ms */
+#define GOYA_CPU_RESET_WAIT_MSEC 100 /* 100ms */
+#define GOYA_PLDM_RESET_WAIT_MSEC 1000 /* 1s */
+#define GOYA_TEST_QUEUE_WAIT_USEC 100000 /* 100ms */
+#define GOYA_PLDM_MMU_TIMEOUT_USEC (MMU_CONFIG_TIMEOUT_USEC * 100)
+#define GOYA_PLDM_QMAN0_TIMEOUT_USEC (HL_DEVICE_TIMEOUT_USEC * 30)
+#define GOYA_BOOT_FIT_REQ_TIMEOUT_USEC 1000000 /* 1s */
+#define GOYA_MSG_TO_CPU_TIMEOUT_USEC 4000000 /* 4s */
+#define GOYA_WAIT_FOR_BL_TIMEOUT_USEC 15000000 /* 15s */
+
+#define GOYA_QMAN0_FENCE_VAL 0xD169B243
+
+#define GOYA_MAX_STRING_LEN 20
+
+#define GOYA_CB_POOL_CB_CNT 512
+#define GOYA_CB_POOL_CB_SIZE 0x20000 /* 128KB */
+
+#define IS_QM_IDLE(engine, qm_glbl_sts0) \
+ (((qm_glbl_sts0) & engine##_QM_IDLE_MASK) == engine##_QM_IDLE_MASK)
+#define IS_DMA_QM_IDLE(qm_glbl_sts0) IS_QM_IDLE(DMA, qm_glbl_sts0)
+#define IS_TPC_QM_IDLE(qm_glbl_sts0) IS_QM_IDLE(TPC, qm_glbl_sts0)
+#define IS_MME_QM_IDLE(qm_glbl_sts0) IS_QM_IDLE(MME, qm_glbl_sts0)
+
+#define IS_CMDQ_IDLE(engine, cmdq_glbl_sts0) \
+ (((cmdq_glbl_sts0) & engine##_CMDQ_IDLE_MASK) == \
+ engine##_CMDQ_IDLE_MASK)
+#define IS_TPC_CMDQ_IDLE(cmdq_glbl_sts0) \
+ IS_CMDQ_IDLE(TPC, cmdq_glbl_sts0)
+#define IS_MME_CMDQ_IDLE(cmdq_glbl_sts0) \
+ IS_CMDQ_IDLE(MME, cmdq_glbl_sts0)
+
+#define IS_DMA_IDLE(dma_core_sts0) \
+ !((dma_core_sts0) & DMA_CH_0_STS0_DMA_BUSY_MASK)
+
+#define IS_TPC_IDLE(tpc_cfg_sts) \
+ (((tpc_cfg_sts) & TPC_CFG_IDLE_MASK) == TPC_CFG_IDLE_MASK)
+
+#define IS_MME_IDLE(mme_arch_sts) \
+ (((mme_arch_sts) & MME_ARCH_IDLE_MASK) == MME_ARCH_IDLE_MASK)
+
+static const char goya_irq_name[GOYA_MSIX_ENTRIES][GOYA_MAX_STRING_LEN] = {
+ "goya cq 0", "goya cq 1", "goya cq 2", "goya cq 3",
+ "goya cq 4", "goya cpu eq"
+};
+
+static u16 goya_packet_sizes[MAX_PACKET_ID] = {
+ [PACKET_WREG_32] = sizeof(struct packet_wreg32),
+ [PACKET_WREG_BULK] = sizeof(struct packet_wreg_bulk),
+ [PACKET_MSG_LONG] = sizeof(struct packet_msg_long),
+ [PACKET_MSG_SHORT] = sizeof(struct packet_msg_short),
+ [PACKET_CP_DMA] = sizeof(struct packet_cp_dma),
+ [PACKET_MSG_PROT] = sizeof(struct packet_msg_prot),
+ [PACKET_FENCE] = sizeof(struct packet_fence),
+ [PACKET_LIN_DMA] = sizeof(struct packet_lin_dma),
+ [PACKET_NOP] = sizeof(struct packet_nop),
+ [PACKET_STOP] = sizeof(struct packet_stop)
+};
+
+static inline bool validate_packet_id(enum packet_id id)
+{
+ switch (id) {
+ case PACKET_WREG_32:
+ case PACKET_WREG_BULK:
+ case PACKET_MSG_LONG:
+ case PACKET_MSG_SHORT:
+ case PACKET_CP_DMA:
+ case PACKET_MSG_PROT:
+ case PACKET_FENCE:
+ case PACKET_LIN_DMA:
+ case PACKET_NOP:
+ case PACKET_STOP:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static u64 goya_mmu_regs[GOYA_MMU_REGS_NUM] = {
+ mmDMA_QM_0_GLBL_NON_SECURE_PROPS,
+ mmDMA_QM_1_GLBL_NON_SECURE_PROPS,
+ mmDMA_QM_2_GLBL_NON_SECURE_PROPS,
+ mmDMA_QM_3_GLBL_NON_SECURE_PROPS,
+ mmDMA_QM_4_GLBL_NON_SECURE_PROPS,
+ mmTPC0_QM_GLBL_SECURE_PROPS,
+ mmTPC0_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC0_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC0_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC0_CFG_ARUSER,
+ mmTPC0_CFG_AWUSER,
+ mmTPC1_QM_GLBL_SECURE_PROPS,
+ mmTPC1_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC1_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC1_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC1_CFG_ARUSER,
+ mmTPC1_CFG_AWUSER,
+ mmTPC2_QM_GLBL_SECURE_PROPS,
+ mmTPC2_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC2_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC2_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC2_CFG_ARUSER,
+ mmTPC2_CFG_AWUSER,
+ mmTPC3_QM_GLBL_SECURE_PROPS,
+ mmTPC3_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC3_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC3_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC3_CFG_ARUSER,
+ mmTPC3_CFG_AWUSER,
+ mmTPC4_QM_GLBL_SECURE_PROPS,
+ mmTPC4_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC4_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC4_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC4_CFG_ARUSER,
+ mmTPC4_CFG_AWUSER,
+ mmTPC5_QM_GLBL_SECURE_PROPS,
+ mmTPC5_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC5_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC5_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC5_CFG_ARUSER,
+ mmTPC5_CFG_AWUSER,
+ mmTPC6_QM_GLBL_SECURE_PROPS,
+ mmTPC6_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC6_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC6_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC6_CFG_ARUSER,
+ mmTPC6_CFG_AWUSER,
+ mmTPC7_QM_GLBL_SECURE_PROPS,
+ mmTPC7_QM_GLBL_NON_SECURE_PROPS,
+ mmTPC7_CMDQ_GLBL_SECURE_PROPS,
+ mmTPC7_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmTPC7_CFG_ARUSER,
+ mmTPC7_CFG_AWUSER,
+ mmMME_QM_GLBL_SECURE_PROPS,
+ mmMME_QM_GLBL_NON_SECURE_PROPS,
+ mmMME_CMDQ_GLBL_SECURE_PROPS,
+ mmMME_CMDQ_GLBL_NON_SECURE_PROPS,
+ mmMME_SBA_CONTROL_DATA,
+ mmMME_SBB_CONTROL_DATA,
+ mmMME_SBC_CONTROL_DATA,
+ mmMME_WBC_CONTROL_DATA,
+ mmPCIE_WRAP_PSOC_ARUSER,
+ mmPCIE_WRAP_PSOC_AWUSER
+};
+
+static u32 goya_all_events[] = {
+ GOYA_ASYNC_EVENT_ID_PCIE_IF,
+ GOYA_ASYNC_EVENT_ID_TPC0_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC1_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC2_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC3_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC4_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC5_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC6_ECC,
+ GOYA_ASYNC_EVENT_ID_TPC7_ECC,
+ GOYA_ASYNC_EVENT_ID_MME_ECC,
+ GOYA_ASYNC_EVENT_ID_MME_ECC_EXT,
+ GOYA_ASYNC_EVENT_ID_MMU_ECC,
+ GOYA_ASYNC_EVENT_ID_DMA_MACRO,
+ GOYA_ASYNC_EVENT_ID_DMA_ECC,
+ GOYA_ASYNC_EVENT_ID_CPU_IF_ECC,
+ GOYA_ASYNC_EVENT_ID_PSOC_MEM,
+ GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT,
+ GOYA_ASYNC_EVENT_ID_SRAM0,
+ GOYA_ASYNC_EVENT_ID_SRAM1,
+ GOYA_ASYNC_EVENT_ID_SRAM2,
+ GOYA_ASYNC_EVENT_ID_SRAM3,
+ GOYA_ASYNC_EVENT_ID_SRAM4,
+ GOYA_ASYNC_EVENT_ID_SRAM5,
+ GOYA_ASYNC_EVENT_ID_SRAM6,
+ GOYA_ASYNC_EVENT_ID_SRAM7,
+ GOYA_ASYNC_EVENT_ID_SRAM8,
+ GOYA_ASYNC_EVENT_ID_SRAM9,
+ GOYA_ASYNC_EVENT_ID_SRAM10,
+ GOYA_ASYNC_EVENT_ID_SRAM11,
+ GOYA_ASYNC_EVENT_ID_SRAM12,
+ GOYA_ASYNC_EVENT_ID_SRAM13,
+ GOYA_ASYNC_EVENT_ID_SRAM14,
+ GOYA_ASYNC_EVENT_ID_SRAM15,
+ GOYA_ASYNC_EVENT_ID_SRAM16,
+ GOYA_ASYNC_EVENT_ID_SRAM17,
+ GOYA_ASYNC_EVENT_ID_SRAM18,
+ GOYA_ASYNC_EVENT_ID_SRAM19,
+ GOYA_ASYNC_EVENT_ID_SRAM20,
+ GOYA_ASYNC_EVENT_ID_SRAM21,
+ GOYA_ASYNC_EVENT_ID_SRAM22,
+ GOYA_ASYNC_EVENT_ID_SRAM23,
+ GOYA_ASYNC_EVENT_ID_SRAM24,
+ GOYA_ASYNC_EVENT_ID_SRAM25,
+ GOYA_ASYNC_EVENT_ID_SRAM26,
+ GOYA_ASYNC_EVENT_ID_SRAM27,
+ GOYA_ASYNC_EVENT_ID_SRAM28,
+ GOYA_ASYNC_EVENT_ID_SRAM29,
+ GOYA_ASYNC_EVENT_ID_GIC500,
+ GOYA_ASYNC_EVENT_ID_PLL0,
+ GOYA_ASYNC_EVENT_ID_PLL1,
+ GOYA_ASYNC_EVENT_ID_PLL3,
+ GOYA_ASYNC_EVENT_ID_PLL4,
+ GOYA_ASYNC_EVENT_ID_PLL5,
+ GOYA_ASYNC_EVENT_ID_PLL6,
+ GOYA_ASYNC_EVENT_ID_AXI_ECC,
+ GOYA_ASYNC_EVENT_ID_L2_RAM_ECC,
+ GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET,
+ GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT,
+ GOYA_ASYNC_EVENT_ID_PCIE_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC0_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC1_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC2_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC3_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC4_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC5_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC6_DEC,
+ GOYA_ASYNC_EVENT_ID_TPC7_DEC,
+ GOYA_ASYNC_EVENT_ID_MME_WACS,
+ GOYA_ASYNC_EVENT_ID_MME_WACSD,
+ GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER,
+ GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC,
+ GOYA_ASYNC_EVENT_ID_PSOC,
+ GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR,
+ GOYA_ASYNC_EVENT_ID_TPC0_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC1_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC2_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC3_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC4_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC5_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC6_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC7_CMDQ,
+ GOYA_ASYNC_EVENT_ID_TPC0_QM,
+ GOYA_ASYNC_EVENT_ID_TPC1_QM,
+ GOYA_ASYNC_EVENT_ID_TPC2_QM,
+ GOYA_ASYNC_EVENT_ID_TPC3_QM,
+ GOYA_ASYNC_EVENT_ID_TPC4_QM,
+ GOYA_ASYNC_EVENT_ID_TPC5_QM,
+ GOYA_ASYNC_EVENT_ID_TPC6_QM,
+ GOYA_ASYNC_EVENT_ID_TPC7_QM,
+ GOYA_ASYNC_EVENT_ID_MME_QM,
+ GOYA_ASYNC_EVENT_ID_MME_CMDQ,
+ GOYA_ASYNC_EVENT_ID_DMA0_QM,
+ GOYA_ASYNC_EVENT_ID_DMA1_QM,
+ GOYA_ASYNC_EVENT_ID_DMA2_QM,
+ GOYA_ASYNC_EVENT_ID_DMA3_QM,
+ GOYA_ASYNC_EVENT_ID_DMA4_QM,
+ GOYA_ASYNC_EVENT_ID_DMA0_CH,
+ GOYA_ASYNC_EVENT_ID_DMA1_CH,
+ GOYA_ASYNC_EVENT_ID_DMA2_CH,
+ GOYA_ASYNC_EVENT_ID_DMA3_CH,
+ GOYA_ASYNC_EVENT_ID_DMA4_CH,
+ GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU,
+ GOYA_ASYNC_EVENT_ID_DMA_BM_CH0,
+ GOYA_ASYNC_EVENT_ID_DMA_BM_CH1,
+ GOYA_ASYNC_EVENT_ID_DMA_BM_CH2,
+ GOYA_ASYNC_EVENT_ID_DMA_BM_CH3,
+ GOYA_ASYNC_EVENT_ID_DMA_BM_CH4,
+ GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S,
+ GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E,
+ GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S,
+ GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E
+};
+
+static s64 goya_state_dump_specs_props[SP_MAX] = {0};
+
+static int goya_mmu_clear_pgt_range(struct hl_device *hdev);
+static int goya_mmu_set_dram_default_page(struct hl_device *hdev);
+static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev);
+static void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
+
+int goya_set_fixed_properties(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int i;
+
+ prop->max_queues = GOYA_QUEUE_ID_SIZE;
+ prop->hw_queues_props = kcalloc(prop->max_queues,
+ sizeof(struct hw_queue_properties),
+ GFP_KERNEL);
+
+ if (!prop->hw_queues_props)
+ return -ENOMEM;
+
+ for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
+ prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
+ prop->hw_queues_props[i].driver_only = 0;
+ prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_KERNEL;
+ }
+
+ for (; i < NUMBER_OF_EXT_HW_QUEUES + NUMBER_OF_CPU_HW_QUEUES ; i++) {
+ prop->hw_queues_props[i].type = QUEUE_TYPE_CPU;
+ prop->hw_queues_props[i].driver_only = 1;
+ prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_KERNEL;
+ }
+
+ for (; i < NUMBER_OF_EXT_HW_QUEUES + NUMBER_OF_CPU_HW_QUEUES +
+ NUMBER_OF_INT_HW_QUEUES; i++) {
+ prop->hw_queues_props[i].type = QUEUE_TYPE_INT;
+ prop->hw_queues_props[i].driver_only = 0;
+ prop->hw_queues_props[i].cb_alloc_flags = CB_ALLOC_USER;
+ }
+
+ prop->cfg_base_address = CFG_BASE;
+ prop->device_dma_offset_for_host_access = HOST_PHYS_BASE;
+ prop->host_base_address = HOST_PHYS_BASE;
+ prop->host_end_address = prop->host_base_address + HOST_PHYS_SIZE;
+ prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
+ prop->completion_mode = HL_COMPLETION_MODE_JOB;
+ prop->dram_base_address = DRAM_PHYS_BASE;
+ prop->dram_size = DRAM_PHYS_DEFAULT_SIZE;
+ prop->dram_end_address = prop->dram_base_address + prop->dram_size;
+ prop->dram_user_base_address = DRAM_BASE_ADDR_USER;
+
+ prop->sram_base_address = SRAM_BASE_ADDR;
+ prop->sram_size = SRAM_SIZE;
+ prop->sram_end_address = prop->sram_base_address + prop->sram_size;
+ prop->sram_user_base_address = prop->sram_base_address +
+ SRAM_USER_BASE_OFFSET;
+
+ prop->mmu_pgt_addr = MMU_PAGE_TABLES_ADDR;
+ prop->mmu_dram_default_page_addr = MMU_DRAM_DEFAULT_PAGE_ADDR;
+ if (hdev->pldm)
+ prop->mmu_pgt_size = 0x800000; /* 8MB */
+ else
+ prop->mmu_pgt_size = MMU_PAGE_TABLES_SIZE;
+ prop->mmu_pte_size = HL_PTE_SIZE;
+ prop->mmu_hop_table_size = HOP_TABLE_SIZE_512_PTE;
+ prop->mmu_hop0_tables_total_size = HOP0_512_PTE_TABLES_TOTAL_SIZE;
+ prop->dram_page_size = PAGE_SIZE_2MB;
+ prop->device_mem_alloc_default_page_size = prop->dram_page_size;
+ prop->dram_supports_virtual_memory = true;
+
+ prop->dmmu.hop_shifts[MMU_HOP0] = MMU_V1_0_HOP0_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP1] = MMU_V1_0_HOP1_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP2] = MMU_V1_0_HOP2_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP3] = MMU_V1_0_HOP3_SHIFT;
+ prop->dmmu.hop_shifts[MMU_HOP4] = MMU_V1_0_HOP4_SHIFT;
+ prop->dmmu.hop_masks[MMU_HOP0] = MMU_V1_0_HOP0_MASK;
+ prop->dmmu.hop_masks[MMU_HOP1] = MMU_V1_0_HOP1_MASK;
+ prop->dmmu.hop_masks[MMU_HOP2] = MMU_V1_0_HOP2_MASK;
+ prop->dmmu.hop_masks[MMU_HOP3] = MMU_V1_0_HOP3_MASK;
+ prop->dmmu.hop_masks[MMU_HOP4] = MMU_V1_0_HOP4_MASK;
+ prop->dmmu.start_addr = VA_DDR_SPACE_START;
+ prop->dmmu.end_addr = VA_DDR_SPACE_END;
+ prop->dmmu.page_size = PAGE_SIZE_2MB;
+ prop->dmmu.num_hops = MMU_ARCH_5_HOPS;
+ prop->dmmu.last_mask = LAST_MASK;
+ /* TODO: will be duplicated until implementing per-MMU props */
+ prop->dmmu.hop_table_size = prop->mmu_hop_table_size;
+ prop->dmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
+
+ /* shifts and masks are the same in PMMU and DMMU */
+ memcpy(&prop->pmmu, &prop->dmmu, sizeof(prop->dmmu));
+ prop->pmmu.start_addr = VA_HOST_SPACE_START;
+ prop->pmmu.end_addr = VA_HOST_SPACE_END;
+ prop->pmmu.page_size = PAGE_SIZE_4KB;
+ prop->pmmu.num_hops = MMU_ARCH_5_HOPS;
+ prop->pmmu.last_mask = LAST_MASK;
+ /* TODO: will be duplicated until implementing per-MMU props */
+ prop->pmmu.hop_table_size = prop->mmu_hop_table_size;
+ prop->pmmu.hop0_tables_total_size = prop->mmu_hop0_tables_total_size;
+
+ /* PMMU and HPMMU are the same except of page size */
+ memcpy(&prop->pmmu_huge, &prop->pmmu, sizeof(prop->pmmu));
+ prop->pmmu_huge.page_size = PAGE_SIZE_2MB;
+
+ prop->dram_size_for_default_page_mapping = VA_DDR_SPACE_END;
+ prop->cfg_size = CFG_SIZE;
+ prop->max_asid = MAX_ASID;
+ prop->num_of_events = GOYA_ASYNC_EVENT_ID_SIZE;
+ prop->high_pll = PLL_HIGH_DEFAULT;
+ prop->cb_pool_cb_cnt = GOYA_CB_POOL_CB_CNT;
+ prop->cb_pool_cb_size = GOYA_CB_POOL_CB_SIZE;
+ prop->max_power_default = MAX_POWER_DEFAULT;
+ prop->dc_power_default = DC_POWER_DEFAULT;
+ prop->tpc_enabled_mask = TPC_ENABLED_MASK;
+ prop->pcie_dbi_base_address = mmPCIE_DBI_BASE;
+ prop->pcie_aux_dbi_reg_addr = CFG_BASE + mmPCIE_AUX_DBI;
+
+ strncpy(prop->cpucp_info.card_name, GOYA_DEFAULT_CARD_NAME,
+ CARD_NAME_MAX_LEN);
+
+ prop->max_pending_cs = GOYA_MAX_PENDING_CS;
+
+ prop->first_available_user_interrupt = USHRT_MAX;
+
+ for (i = 0 ; i < HL_MAX_DCORES ; i++)
+ prop->first_available_cq[i] = USHRT_MAX;
+
+ prop->fw_cpu_boot_dev_sts0_valid = false;
+ prop->fw_cpu_boot_dev_sts1_valid = false;
+ prop->hard_reset_done_by_fw = false;
+ prop->gic_interrupts_enable = true;
+
+ prop->server_type = HL_SERVER_TYPE_UNKNOWN;
+
+ prop->clk_pll_index = HL_GOYA_MME_PLL;
+
+ prop->use_get_power_for_reset_history = true;
+
+ prop->configurable_stop_on_err = true;
+
+ prop->set_max_power_on_device_init = true;
+
+ prop->dma_mask = 48;
+
+ return 0;
+}
+
+/*
+ * goya_pci_bars_map - Map PCI BARS of Goya device
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Request PCI regions and map them to kernel virtual addresses.
+ * Returns 0 on success
+ *
+ */
+static int goya_pci_bars_map(struct hl_device *hdev)
+{
+ static const char * const name[] = {"SRAM_CFG", "MSIX", "DDR"};
+ bool is_wc[3] = {false, false, true};
+ int rc;
+
+ rc = hl_pci_bars_map(hdev, name, is_wc);
+ if (rc)
+ return rc;
+
+ hdev->rmmio = hdev->pcie_bar[SRAM_CFG_BAR_ID] +
+ (CFG_BASE - SRAM_BASE_ADDR);
+
+ return 0;
+}
+
+static u64 goya_set_ddr_bar_base(struct hl_device *hdev, u64 addr)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ struct hl_inbound_pci_region pci_region;
+ u64 old_addr = addr;
+ int rc;
+
+ if ((goya) && (goya->ddr_bar_cur_addr == addr))
+ return old_addr;
+
+ /* Inbound Region 1 - Bar 4 - Point to DDR */
+ pci_region.mode = PCI_BAR_MATCH_MODE;
+ pci_region.bar = DDR_BAR_ID;
+ pci_region.addr = addr;
+ rc = hl_pci_set_inbound_region(hdev, 1, &pci_region);
+ if (rc)
+ return U64_MAX;
+
+ if (goya) {
+ old_addr = goya->ddr_bar_cur_addr;
+ goya->ddr_bar_cur_addr = addr;
+ }
+
+ return old_addr;
+}
+
+/*
+ * goya_init_iatu - Initialize the iATU unit inside the PCI controller
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * This is needed in case the firmware doesn't initialize the iATU
+ *
+ */
+static int goya_init_iatu(struct hl_device *hdev)
+{
+ struct hl_inbound_pci_region inbound_region;
+ struct hl_outbound_pci_region outbound_region;
+ int rc;
+
+ if (hdev->asic_prop.iatu_done_by_fw)
+ return 0;
+
+ /* Inbound Region 0 - Bar 0 - Point to SRAM and CFG */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = SRAM_CFG_BAR_ID;
+ inbound_region.addr = SRAM_BASE_ADDR;
+ rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
+ if (rc)
+ goto done;
+
+ /* Inbound Region 1 - Bar 4 - Point to DDR */
+ inbound_region.mode = PCI_BAR_MATCH_MODE;
+ inbound_region.bar = DDR_BAR_ID;
+ inbound_region.addr = DRAM_PHYS_BASE;
+ rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
+ if (rc)
+ goto done;
+
+ /* Outbound Region 0 - Point to Host */
+ outbound_region.addr = HOST_PHYS_BASE;
+ outbound_region.size = HOST_PHYS_SIZE;
+ rc = hl_pci_set_outbound_region(hdev, &outbound_region);
+
+done:
+ return rc;
+}
+
+static enum hl_device_hw_state goya_get_hw_state(struct hl_device *hdev)
+{
+ return RREG32(mmHW_STATE);
+}
+
+/*
+ * goya_early_init - GOYA early initialization code
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Verify PCI bars
+ * Set DMA masks
+ * PCI controller initialization
+ * Map PCI bars
+ *
+ */
+static int goya_early_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct pci_dev *pdev = hdev->pdev;
+ resource_size_t pci_bar_size;
+ u32 fw_boot_status, val;
+ int rc;
+
+ rc = goya_set_fixed_properties(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get fixed properties\n");
+ return rc;
+ }
+
+ /* Check BAR sizes */
+ pci_bar_size = pci_resource_len(pdev, SRAM_CFG_BAR_ID);
+
+ if (pci_bar_size != CFG_BAR_SIZE) {
+ dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
+ SRAM_CFG_BAR_ID, &pci_bar_size, CFG_BAR_SIZE);
+ rc = -ENODEV;
+ goto free_queue_props;
+ }
+
+ pci_bar_size = pci_resource_len(pdev, MSIX_BAR_ID);
+
+ if (pci_bar_size != MSIX_BAR_SIZE) {
+ dev_err(hdev->dev, "Not " HL_NAME "? BAR %d size %pa, expecting %llu\n",
+ MSIX_BAR_ID, &pci_bar_size, MSIX_BAR_SIZE);
+ rc = -ENODEV;
+ goto free_queue_props;
+ }
+
+ prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
+ hdev->dram_pci_bar_start = pci_resource_start(pdev, DDR_BAR_ID);
+
+ /* If FW security is enabled at this point it means no access to ELBI */
+ if (hdev->asic_prop.fw_security_enabled) {
+ hdev->asic_prop.iatu_done_by_fw = true;
+ goto pci_init;
+ }
+
+ rc = hl_pci_elbi_read(hdev, CFG_BASE + mmCPU_BOOT_DEV_STS0,
+ &fw_boot_status);
+ if (rc)
+ goto free_queue_props;
+
+ /* Check whether FW is configuring iATU */
+ if ((fw_boot_status & CPU_BOOT_DEV_STS0_ENABLED) &&
+ (fw_boot_status & CPU_BOOT_DEV_STS0_FW_IATU_CONF_EN))
+ hdev->asic_prop.iatu_done_by_fw = true;
+
+pci_init:
+ rc = hl_pci_init(hdev);
+ if (rc)
+ goto free_queue_props;
+
+ /* Before continuing in the initialization, we need to read the preboot
+ * version to determine whether we run with a security-enabled firmware
+ */
+ rc = hl_fw_read_preboot_status(hdev);
+ if (rc) {
+ if (hdev->reset_on_preboot_fail)
+ hdev->asic_funcs->hw_fini(hdev, true, false);
+ goto pci_fini;
+ }
+
+ if (goya_get_hw_state(hdev) == HL_DEVICE_HW_STATE_DIRTY) {
+ dev_dbg(hdev->dev, "H/W state is dirty, must reset before initializing\n");
+ hdev->asic_funcs->hw_fini(hdev, true, false);
+ }
+
+ if (!hdev->pldm) {
+ val = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
+ if (val & PSOC_GLOBAL_CONF_BOOT_STRAP_PINS_SRIOV_EN_MASK)
+ dev_warn(hdev->dev,
+ "PCI strap is not configured correctly, PCI bus errors may occur\n");
+ }
+
+ return 0;
+
+pci_fini:
+ hl_pci_fini(hdev);
+free_queue_props:
+ kfree(hdev->asic_prop.hw_queues_props);
+ return rc;
+}
+
+/*
+ * goya_early_fini - GOYA early finalization code
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Unmap PCI bars
+ *
+ */
+static int goya_early_fini(struct hl_device *hdev)
+{
+ kfree(hdev->asic_prop.hw_queues_props);
+ hl_pci_fini(hdev);
+
+ return 0;
+}
+
+static void goya_mmu_prepare_reg(struct hl_device *hdev, u64 reg, u32 asid)
+{
+ /* mask to zero the MMBP and ASID bits */
+ WREG32_AND(reg, ~0x7FF);
+ WREG32_OR(reg, asid);
+}
+
+static void goya_qman0_set_security(struct hl_device *hdev, bool secure)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ if (secure)
+ WREG32(mmDMA_QM_0_GLBL_PROT, QMAN_DMA_FULLY_TRUSTED);
+ else
+ WREG32(mmDMA_QM_0_GLBL_PROT, QMAN_DMA_PARTLY_TRUSTED);
+
+ RREG32(mmDMA_QM_0_GLBL_PROT);
+}
+
+/*
+ * goya_fetch_psoc_frequency - Fetch PSOC frequency values
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+static void goya_fetch_psoc_frequency(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u32 nr = 0, nf = 0, od = 0, div_fctr = 0, pll_clk, div_sel;
+ u16 pll_freq_arr[HL_PLL_NUM_OUTPUTS], freq;
+ int rc;
+
+ if (hdev->asic_prop.fw_security_enabled) {
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
+ return;
+
+ rc = hl_fw_cpucp_pll_info_get(hdev, HL_GOYA_PCI_PLL,
+ pll_freq_arr);
+
+ if (rc)
+ return;
+
+ freq = pll_freq_arr[1];
+ } else {
+ div_fctr = RREG32(mmPSOC_PCI_PLL_DIV_FACTOR_1);
+ div_sel = RREG32(mmPSOC_PCI_PLL_DIV_SEL_1);
+ nr = RREG32(mmPSOC_PCI_PLL_NR);
+ nf = RREG32(mmPSOC_PCI_PLL_NF);
+ od = RREG32(mmPSOC_PCI_PLL_OD);
+
+ if (div_sel == DIV_SEL_REF_CLK ||
+ div_sel == DIV_SEL_DIVIDED_REF) {
+ if (div_sel == DIV_SEL_REF_CLK)
+ freq = PLL_REF_CLK;
+ else
+ freq = PLL_REF_CLK / (div_fctr + 1);
+ } else if (div_sel == DIV_SEL_PLL_CLK ||
+ div_sel == DIV_SEL_DIVIDED_PLL) {
+ pll_clk = PLL_REF_CLK * (nf + 1) /
+ ((nr + 1) * (od + 1));
+ if (div_sel == DIV_SEL_PLL_CLK)
+ freq = pll_clk;
+ else
+ freq = pll_clk / (div_fctr + 1);
+ } else {
+ dev_warn(hdev->dev,
+ "Received invalid div select value: %d",
+ div_sel);
+ freq = 0;
+ }
+ }
+
+ prop->psoc_timestamp_frequency = freq;
+ prop->psoc_pci_pll_nr = nr;
+ prop->psoc_pci_pll_nf = nf;
+ prop->psoc_pci_pll_od = od;
+ prop->psoc_pci_pll_div_factor = div_fctr;
+}
+
+/*
+ * goya_set_frequency - set the frequency of the device
+ *
+ * @hdev: pointer to habanalabs device structure
+ * @freq: the new frequency value
+ *
+ * Change the frequency if needed. This function has no protection against
+ * concurrency, therefore it is assumed that the calling function has protected
+ * itself against the case of calling this function from multiple threads with
+ * different values
+ *
+ * Returns 0 if no change was done, otherwise returns 1
+ */
+int goya_set_frequency(struct hl_device *hdev, enum hl_pll_frequency freq)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if ((goya->pm_mng_profile == PM_MANUAL) ||
+ (goya->curr_pll_profile == freq))
+ return 0;
+
+ dev_dbg(hdev->dev, "Changing device frequency to %s\n",
+ freq == PLL_HIGH ? "high" : "low");
+
+ goya_set_pll_profile(hdev, freq);
+
+ goya->curr_pll_profile = freq;
+
+ return 1;
+}
+
+static void goya_set_freq_to_low_job(struct work_struct *work)
+{
+ struct goya_work_freq *goya_work = container_of(work,
+ struct goya_work_freq,
+ work_freq.work);
+ struct hl_device *hdev = goya_work->hdev;
+
+ mutex_lock(&hdev->fpriv_list_lock);
+
+ if (!hdev->is_compute_ctx_active)
+ goya_set_frequency(hdev, PLL_LOW);
+
+ mutex_unlock(&hdev->fpriv_list_lock);
+
+ schedule_delayed_work(&goya_work->work_freq,
+ usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
+}
+
+int goya_late_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct goya_device *goya = hdev->asic_specific;
+ int rc;
+
+ goya_fetch_psoc_frequency(hdev);
+
+ rc = goya_mmu_clear_pgt_range(hdev);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to clear MMU page tables range %d\n", rc);
+ return rc;
+ }
+
+ rc = goya_mmu_set_dram_default_page(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to set DRAM default page %d\n", rc);
+ return rc;
+ }
+
+ rc = goya_mmu_add_mappings_for_device_cpu(hdev);
+ if (rc)
+ return rc;
+
+ rc = goya_init_cpu_queues(hdev);
+ if (rc)
+ return rc;
+
+ rc = goya_test_cpu_queue(hdev);
+ if (rc)
+ return rc;
+
+ rc = goya_cpucp_info_get(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to get cpucp info %d\n", rc);
+ return rc;
+ }
+
+ /* Now that we have the DRAM size in ASIC prop, we need to check
+ * its size and configure the DMA_IF DDR wrap protection (which is in
+ * the MMU block) accordingly. The value is the log2 of the DRAM size
+ */
+ WREG32(mmMMU_LOG2_DDR_SIZE, ilog2(prop->dram_size));
+
+ rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_ENABLE_PCI_ACCESS, 0x0);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to enable PCI access from CPU %d\n", rc);
+ return rc;
+ }
+
+ /* force setting to low frequency */
+ goya->curr_pll_profile = PLL_LOW;
+
+ goya->pm_mng_profile = PM_AUTO;
+
+ goya_set_pll_profile(hdev, PLL_LOW);
+
+ schedule_delayed_work(&goya->goya_work->work_freq,
+ usecs_to_jiffies(HL_PLL_LOW_JOB_FREQ_USEC));
+
+ return 0;
+}
+
+/*
+ * goya_late_fini - GOYA late tear-down code
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Free sensors allocated structures
+ */
+void goya_late_fini(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ cancel_delayed_work_sync(&goya->goya_work->work_freq);
+
+ hl_hwmon_release_resources(hdev);
+}
+
+static void goya_set_pci_memory_regions(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct pci_mem_region *region;
+
+ /* CFG */
+ region = &hdev->pci_mem_region[PCI_REGION_CFG];
+ region->region_base = CFG_BASE;
+ region->region_size = CFG_SIZE;
+ region->offset_in_bar = CFG_BASE - SRAM_BASE_ADDR;
+ region->bar_size = CFG_BAR_SIZE;
+ region->bar_id = SRAM_CFG_BAR_ID;
+ region->used = 1;
+
+ /* SRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_SRAM];
+ region->region_base = SRAM_BASE_ADDR;
+ region->region_size = SRAM_SIZE;
+ region->offset_in_bar = 0;
+ region->bar_size = CFG_BAR_SIZE;
+ region->bar_id = SRAM_CFG_BAR_ID;
+ region->used = 1;
+
+ /* DRAM */
+ region = &hdev->pci_mem_region[PCI_REGION_DRAM];
+ region->region_base = DRAM_PHYS_BASE;
+ region->region_size = hdev->asic_prop.dram_size;
+ region->offset_in_bar = 0;
+ region->bar_size = prop->dram_pci_bar_size;
+ region->bar_id = DDR_BAR_ID;
+ region->used = 1;
+}
+
+/*
+ * goya_sw_init - Goya software initialization code
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+static int goya_sw_init(struct hl_device *hdev)
+{
+ struct goya_device *goya;
+ int rc;
+
+ /* Allocate device structure */
+ goya = kzalloc(sizeof(*goya), GFP_KERNEL);
+ if (!goya)
+ return -ENOMEM;
+
+ /* according to goya_init_iatu */
+ goya->ddr_bar_cur_addr = DRAM_PHYS_BASE;
+
+ goya->mme_clk = GOYA_PLL_FREQ_LOW;
+ goya->tpc_clk = GOYA_PLL_FREQ_LOW;
+ goya->ic_clk = GOYA_PLL_FREQ_LOW;
+
+ hdev->asic_specific = goya;
+
+ /* Create DMA pool for small allocations */
+ hdev->dma_pool = dma_pool_create(dev_name(hdev->dev),
+ &hdev->pdev->dev, GOYA_DMA_POOL_BLK_SIZE, 8, 0);
+ if (!hdev->dma_pool) {
+ dev_err(hdev->dev, "failed to create DMA pool\n");
+ rc = -ENOMEM;
+ goto free_goya_device;
+ }
+
+ hdev->cpu_accessible_dma_mem = hl_asic_dma_alloc_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE,
+ &hdev->cpu_accessible_dma_address,
+ GFP_KERNEL | __GFP_ZERO);
+
+ if (!hdev->cpu_accessible_dma_mem) {
+ rc = -ENOMEM;
+ goto free_dma_pool;
+ }
+
+ dev_dbg(hdev->dev, "cpu accessible memory at bus address %pad\n",
+ &hdev->cpu_accessible_dma_address);
+
+ hdev->cpu_accessible_dma_pool = gen_pool_create(ilog2(32), -1);
+ if (!hdev->cpu_accessible_dma_pool) {
+ dev_err(hdev->dev,
+ "Failed to create CPU accessible DMA pool\n");
+ rc = -ENOMEM;
+ goto free_cpu_dma_mem;
+ }
+
+ rc = gen_pool_add(hdev->cpu_accessible_dma_pool,
+ (uintptr_t) hdev->cpu_accessible_dma_mem,
+ HL_CPU_ACCESSIBLE_MEM_SIZE, -1);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to add memory to CPU accessible DMA pool\n");
+ rc = -EFAULT;
+ goto free_cpu_accessible_dma_pool;
+ }
+
+ spin_lock_init(&goya->hw_queues_lock);
+ hdev->supports_coresight = true;
+ hdev->asic_prop.supports_compute_reset = true;
+ hdev->asic_prop.allow_inference_soft_reset = true;
+ hdev->supports_wait_for_multi_cs = false;
+ hdev->supports_ctx_switch = true;
+
+ hdev->asic_funcs->set_pci_memory_regions(hdev);
+
+ goya->goya_work = kmalloc(sizeof(struct goya_work_freq), GFP_KERNEL);
+ if (!goya->goya_work) {
+ rc = -ENOMEM;
+ goto free_cpu_accessible_dma_pool;
+ }
+
+ goya->goya_work->hdev = hdev;
+ INIT_DELAYED_WORK(&goya->goya_work->work_freq, goya_set_freq_to_low_job);
+
+ return 0;
+
+free_cpu_accessible_dma_pool:
+ gen_pool_destroy(hdev->cpu_accessible_dma_pool);
+free_cpu_dma_mem:
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
+ hdev->cpu_accessible_dma_address);
+free_dma_pool:
+ dma_pool_destroy(hdev->dma_pool);
+free_goya_device:
+ kfree(goya);
+
+ return rc;
+}
+
+/*
+ * goya_sw_fini - Goya software tear-down code
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+static int goya_sw_fini(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ gen_pool_destroy(hdev->cpu_accessible_dma_pool);
+
+ hl_asic_dma_free_coherent(hdev, HL_CPU_ACCESSIBLE_MEM_SIZE, hdev->cpu_accessible_dma_mem,
+ hdev->cpu_accessible_dma_address);
+
+ dma_pool_destroy(hdev->dma_pool);
+
+ kfree(goya->goya_work);
+ kfree(goya);
+
+ return 0;
+}
+
+static void goya_init_dma_qman(struct hl_device *hdev, int dma_id,
+ dma_addr_t bus_address)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 mtr_base_lo, mtr_base_hi;
+ u32 so_base_lo, so_base_hi;
+ u32 gic_base_lo, gic_base_hi;
+ u32 reg_off = dma_id * (mmDMA_QM_1_PQ_PI - mmDMA_QM_0_PQ_PI);
+ u32 dma_err_cfg = QMAN_DMA_ERR_MSG_EN;
+
+ mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ gic_base_lo =
+ lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+ gic_base_hi =
+ upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+
+ WREG32(mmDMA_QM_0_PQ_BASE_LO + reg_off, lower_32_bits(bus_address));
+ WREG32(mmDMA_QM_0_PQ_BASE_HI + reg_off, upper_32_bits(bus_address));
+
+ WREG32(mmDMA_QM_0_PQ_SIZE + reg_off, ilog2(HL_QUEUE_LENGTH));
+ WREG32(mmDMA_QM_0_PQ_PI + reg_off, 0);
+ WREG32(mmDMA_QM_0_PQ_CI + reg_off, 0);
+
+ WREG32(mmDMA_QM_0_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
+ WREG32(mmDMA_QM_0_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
+ WREG32(mmDMA_QM_0_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
+ WREG32(mmDMA_QM_0_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
+ WREG32(mmDMA_QM_0_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
+ WREG32(mmDMA_QM_0_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
+ WREG32(mmDMA_QM_0_GLBL_ERR_WDATA + reg_off,
+ GOYA_ASYNC_EVENT_ID_DMA0_QM + dma_id);
+
+ /* PQ has buffer of 2 cache lines, while CQ has 8 lines */
+ WREG32(mmDMA_QM_0_PQ_CFG1 + reg_off, 0x00020002);
+ WREG32(mmDMA_QM_0_CQ_CFG1 + reg_off, 0x00080008);
+
+ if (goya->hw_cap_initialized & HW_CAP_MMU)
+ WREG32(mmDMA_QM_0_GLBL_PROT + reg_off, QMAN_DMA_PARTLY_TRUSTED);
+ else
+ WREG32(mmDMA_QM_0_GLBL_PROT + reg_off, QMAN_DMA_FULLY_TRUSTED);
+
+ if (hdev->stop_on_err)
+ dma_err_cfg |= 1 << DMA_QM_0_GLBL_ERR_CFG_DMA_STOP_ON_ERR_SHIFT;
+
+ WREG32(mmDMA_QM_0_GLBL_ERR_CFG + reg_off, dma_err_cfg);
+ WREG32(mmDMA_QM_0_GLBL_CFG0 + reg_off, QMAN_DMA_ENABLE);
+}
+
+static void goya_init_dma_ch(struct hl_device *hdev, int dma_id)
+{
+ u32 gic_base_lo, gic_base_hi;
+ u64 sob_addr;
+ u32 reg_off = dma_id * (mmDMA_CH_1_CFG1 - mmDMA_CH_0_CFG1);
+
+ gic_base_lo =
+ lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+ gic_base_hi =
+ upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+
+ WREG32(mmDMA_CH_0_ERRMSG_ADDR_LO + reg_off, gic_base_lo);
+ WREG32(mmDMA_CH_0_ERRMSG_ADDR_HI + reg_off, gic_base_hi);
+ WREG32(mmDMA_CH_0_ERRMSG_WDATA + reg_off,
+ GOYA_ASYNC_EVENT_ID_DMA0_CH + dma_id);
+
+ if (dma_id)
+ sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
+ (dma_id - 1) * 4;
+ else
+ sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
+
+ WREG32(mmDMA_CH_0_WR_COMP_ADDR_HI + reg_off, upper_32_bits(sob_addr));
+ WREG32(mmDMA_CH_0_WR_COMP_WDATA + reg_off, 0x80000001);
+}
+
+/*
+ * goya_init_dma_qmans - Initialize QMAN DMA registers
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Initialize the H/W registers of the QMAN DMA channels
+ *
+ */
+void goya_init_dma_qmans(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ struct hl_hw_queue *q;
+ int i;
+
+ if (goya->hw_cap_initialized & HW_CAP_DMA)
+ return;
+
+ q = &hdev->kernel_queues[0];
+
+ for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++, q++) {
+ q->cq_id = q->msi_vec = i;
+ goya_init_dma_qman(hdev, i, q->bus_address);
+ goya_init_dma_ch(hdev, i);
+ }
+
+ goya->hw_cap_initialized |= HW_CAP_DMA;
+}
+
+/*
+ * goya_disable_external_queues - Disable external queues
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+static void goya_disable_external_queues(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_DMA))
+ return;
+
+ WREG32(mmDMA_QM_0_GLBL_CFG0, 0);
+ WREG32(mmDMA_QM_1_GLBL_CFG0, 0);
+ WREG32(mmDMA_QM_2_GLBL_CFG0, 0);
+ WREG32(mmDMA_QM_3_GLBL_CFG0, 0);
+ WREG32(mmDMA_QM_4_GLBL_CFG0, 0);
+}
+
+static int goya_stop_queue(struct hl_device *hdev, u32 cfg_reg,
+ u32 cp_sts_reg, u32 glbl_sts0_reg)
+{
+ int rc;
+ u32 status;
+
+ /* use the values of TPC0 as they are all the same*/
+
+ WREG32(cfg_reg, 1 << TPC0_QM_GLBL_CFG1_CP_STOP_SHIFT);
+
+ status = RREG32(cp_sts_reg);
+ if (status & TPC0_QM_CP_STS_FENCE_IN_PROGRESS_MASK) {
+ rc = hl_poll_timeout(
+ hdev,
+ cp_sts_reg,
+ status,
+ !(status & TPC0_QM_CP_STS_FENCE_IN_PROGRESS_MASK),
+ 1000,
+ QMAN_FENCE_TIMEOUT_USEC);
+
+ /* if QMAN is stuck in fence no need to check for stop */
+ if (rc)
+ return 0;
+ }
+
+ rc = hl_poll_timeout(
+ hdev,
+ glbl_sts0_reg,
+ status,
+ (status & TPC0_QM_GLBL_STS0_CP_IS_STOP_MASK),
+ 1000,
+ QMAN_STOP_TIMEOUT_USEC);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Timeout while waiting for QMAN to stop\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * goya_stop_external_queues - Stop external queues
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Returns 0 on success
+ *
+ */
+static int goya_stop_external_queues(struct hl_device *hdev)
+{
+ int rc, retval = 0;
+
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_DMA))
+ return retval;
+
+ rc = goya_stop_queue(hdev,
+ mmDMA_QM_0_GLBL_CFG1,
+ mmDMA_QM_0_CP_STS,
+ mmDMA_QM_0_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop DMA QMAN 0\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmDMA_QM_1_GLBL_CFG1,
+ mmDMA_QM_1_CP_STS,
+ mmDMA_QM_1_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop DMA QMAN 1\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmDMA_QM_2_GLBL_CFG1,
+ mmDMA_QM_2_CP_STS,
+ mmDMA_QM_2_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop DMA QMAN 2\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmDMA_QM_3_GLBL_CFG1,
+ mmDMA_QM_3_CP_STS,
+ mmDMA_QM_3_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop DMA QMAN 3\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmDMA_QM_4_GLBL_CFG1,
+ mmDMA_QM_4_CP_STS,
+ mmDMA_QM_4_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop DMA QMAN 4\n");
+ retval = -EIO;
+ }
+
+ return retval;
+}
+
+/*
+ * goya_init_cpu_queues - Initialize PQ/CQ/EQ of CPU
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Returns 0 on success
+ *
+ */
+int goya_init_cpu_queues(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct hl_eq *eq;
+ u32 status;
+ struct hl_hw_queue *cpu_pq = &hdev->kernel_queues[GOYA_QUEUE_ID_CPU_PQ];
+ int err;
+
+ if (!hdev->cpu_queues_enable)
+ return 0;
+
+ if (goya->hw_cap_initialized & HW_CAP_CPU_Q)
+ return 0;
+
+ eq = &hdev->event_queue;
+
+ WREG32(mmCPU_PQ_BASE_ADDR_LOW, lower_32_bits(cpu_pq->bus_address));
+ WREG32(mmCPU_PQ_BASE_ADDR_HIGH, upper_32_bits(cpu_pq->bus_address));
+
+ WREG32(mmCPU_EQ_BASE_ADDR_LOW, lower_32_bits(eq->bus_address));
+ WREG32(mmCPU_EQ_BASE_ADDR_HIGH, upper_32_bits(eq->bus_address));
+
+ WREG32(mmCPU_CQ_BASE_ADDR_LOW,
+ lower_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
+ WREG32(mmCPU_CQ_BASE_ADDR_HIGH,
+ upper_32_bits(VA_CPU_ACCESSIBLE_MEM_ADDR));
+
+ WREG32(mmCPU_PQ_LENGTH, HL_QUEUE_SIZE_IN_BYTES);
+ WREG32(mmCPU_EQ_LENGTH, HL_EQ_SIZE_IN_BYTES);
+ WREG32(mmCPU_CQ_LENGTH, HL_CPU_ACCESSIBLE_MEM_SIZE);
+
+ /* Used for EQ CI */
+ WREG32(mmCPU_EQ_CI, 0);
+
+ WREG32(mmCPU_IF_PF_PQ_PI, 0);
+
+ WREG32(mmCPU_PQ_INIT_STATUS, PQ_INIT_STATUS_READY_FOR_CP);
+
+ WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
+ GOYA_ASYNC_EVENT_ID_PI_UPDATE);
+
+ err = hl_poll_timeout(
+ hdev,
+ mmCPU_PQ_INIT_STATUS,
+ status,
+ (status == PQ_INIT_STATUS_READY_FOR_HOST),
+ 1000,
+ GOYA_CPU_TIMEOUT_USEC);
+
+ if (err) {
+ dev_err(hdev->dev,
+ "Failed to setup communication with device CPU\n");
+ return -EIO;
+ }
+
+ /* update FW application security bits */
+ if (prop->fw_cpu_boot_dev_sts0_valid)
+ prop->fw_app_cpu_boot_dev_sts0 = RREG32(mmCPU_BOOT_DEV_STS0);
+
+ if (prop->fw_cpu_boot_dev_sts1_valid)
+ prop->fw_app_cpu_boot_dev_sts1 = RREG32(mmCPU_BOOT_DEV_STS1);
+
+ goya->hw_cap_initialized |= HW_CAP_CPU_Q;
+ return 0;
+}
+
+static void goya_set_pll_refclk(struct hl_device *hdev)
+{
+ WREG32(mmCPU_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmCPU_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmCPU_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmCPU_PLL_DIV_SEL_3, 0x0);
+
+ WREG32(mmIC_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmIC_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmIC_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmIC_PLL_DIV_SEL_3, 0x0);
+
+ WREG32(mmMC_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmMC_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmMC_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmMC_PLL_DIV_SEL_3, 0x0);
+
+ WREG32(mmPSOC_MME_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmPSOC_MME_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmPSOC_MME_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmPSOC_MME_PLL_DIV_SEL_3, 0x0);
+
+ WREG32(mmPSOC_PCI_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmPSOC_PCI_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmPSOC_PCI_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmPSOC_PCI_PLL_DIV_SEL_3, 0x0);
+
+ WREG32(mmPSOC_EMMC_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmPSOC_EMMC_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmPSOC_EMMC_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmPSOC_EMMC_PLL_DIV_SEL_3, 0x0);
+
+ WREG32(mmTPC_PLL_DIV_SEL_0, 0x0);
+ WREG32(mmTPC_PLL_DIV_SEL_1, 0x0);
+ WREG32(mmTPC_PLL_DIV_SEL_2, 0x0);
+ WREG32(mmTPC_PLL_DIV_SEL_3, 0x0);
+}
+
+static void goya_disable_clk_rlx(struct hl_device *hdev)
+{
+ WREG32(mmPSOC_MME_PLL_CLK_RLX_0, 0x100010);
+ WREG32(mmIC_PLL_CLK_RLX_0, 0x100010);
+}
+
+static void _goya_tpc_mbist_workaround(struct hl_device *hdev, u8 tpc_id)
+{
+ u64 tpc_eml_address;
+ u32 val, tpc_offset, tpc_eml_offset, tpc_slm_offset;
+ int err, slm_index;
+
+ tpc_offset = tpc_id * 0x40000;
+ tpc_eml_offset = tpc_id * 0x200000;
+ tpc_eml_address = (mmTPC0_EML_CFG_BASE + tpc_eml_offset - CFG_BASE);
+ tpc_slm_offset = tpc_eml_address + 0x100000;
+
+ /*
+ * Workaround for Bug H2 #2443 :
+ * "TPC SB is not initialized on chip reset"
+ */
+
+ val = RREG32(mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset);
+ if (val & TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_ACTIVE_MASK)
+ dev_warn(hdev->dev, "TPC%d MBIST ACTIVE is not cleared\n",
+ tpc_id);
+
+ WREG32(mmTPC0_CFG_FUNC_MBIST_PAT + tpc_offset, val & 0xFFFFF000);
+
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_0 + tpc_offset, 0x37FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_1 + tpc_offset, 0x303F);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_2 + tpc_offset, 0x71FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_3 + tpc_offset, 0x71FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_4 + tpc_offset, 0x70FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_5 + tpc_offset, 0x70FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_6 + tpc_offset, 0x70FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_7 + tpc_offset, 0x70FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_8 + tpc_offset, 0x70FF);
+ WREG32(mmTPC0_CFG_FUNC_MBIST_MEM_9 + tpc_offset, 0x70FF);
+
+ WREG32_OR(mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset,
+ 1 << TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_START_SHIFT);
+
+ err = hl_poll_timeout(
+ hdev,
+ mmTPC0_CFG_FUNC_MBIST_CNTRL + tpc_offset,
+ val,
+ (val & TPC0_CFG_FUNC_MBIST_CNTRL_MBIST_DONE_MASK),
+ 1000,
+ HL_DEVICE_TIMEOUT_USEC);
+
+ if (err)
+ dev_err(hdev->dev,
+ "Timeout while waiting for TPC%d MBIST DONE\n", tpc_id);
+
+ WREG32_OR(mmTPC0_EML_CFG_DBG_CNT + tpc_eml_offset,
+ 1 << TPC0_EML_CFG_DBG_CNT_CORE_RST_SHIFT);
+
+ msleep(GOYA_RESET_WAIT_MSEC);
+
+ WREG32_AND(mmTPC0_EML_CFG_DBG_CNT + tpc_eml_offset,
+ ~(1 << TPC0_EML_CFG_DBG_CNT_CORE_RST_SHIFT));
+
+ msleep(GOYA_RESET_WAIT_MSEC);
+
+ for (slm_index = 0 ; slm_index < 256 ; slm_index++)
+ WREG32(tpc_slm_offset + (slm_index << 2), 0);
+
+ val = RREG32(tpc_slm_offset);
+}
+
+static void goya_tpc_mbist_workaround(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int i;
+
+ if (hdev->pldm)
+ return;
+
+ if (goya->hw_cap_initialized & HW_CAP_TPC_MBIST)
+ return;
+
+ /* Workaround for H2 #2443 */
+
+ for (i = 0 ; i < TPC_MAX_NUM ; i++)
+ _goya_tpc_mbist_workaround(hdev, i);
+
+ goya->hw_cap_initialized |= HW_CAP_TPC_MBIST;
+}
+
+/*
+ * goya_init_golden_registers - Initialize golden registers
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Initialize the H/W registers of the device
+ *
+ */
+static void goya_init_golden_registers(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 polynom[10], tpc_intr_mask, offset;
+ int i;
+
+ if (goya->hw_cap_initialized & HW_CAP_GOLDEN)
+ return;
+
+ polynom[0] = 0x00020080;
+ polynom[1] = 0x00401000;
+ polynom[2] = 0x00200800;
+ polynom[3] = 0x00002000;
+ polynom[4] = 0x00080200;
+ polynom[5] = 0x00040100;
+ polynom[6] = 0x00100400;
+ polynom[7] = 0x00004000;
+ polynom[8] = 0x00010000;
+ polynom[9] = 0x00008000;
+
+ /* Mask all arithmetic interrupts from TPC */
+ tpc_intr_mask = 0x7FFF;
+
+ for (i = 0, offset = 0 ; i < 6 ; i++, offset += 0x20000) {
+ WREG32(mmSRAM_Y0_X0_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
+ WREG32(mmSRAM_Y0_X1_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
+ WREG32(mmSRAM_Y0_X2_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
+ WREG32(mmSRAM_Y0_X3_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
+ WREG32(mmSRAM_Y0_X4_RTR_HBW_RD_RQ_L_ARB + offset, 0x302);
+
+ WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_L_ARB + offset, 0x204);
+ WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_L_ARB + offset, 0x204);
+ WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_L_ARB + offset, 0x204);
+ WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_L_ARB + offset, 0x204);
+ WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_L_ARB + offset, 0x204);
+
+
+ WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_E_ARB + offset, 0x206);
+ WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_E_ARB + offset, 0x206);
+ WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_E_ARB + offset, 0x206);
+ WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_E_ARB + offset, 0x207);
+ WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_E_ARB + offset, 0x207);
+
+ WREG32(mmSRAM_Y0_X0_RTR_HBW_DATA_W_ARB + offset, 0x207);
+ WREG32(mmSRAM_Y0_X1_RTR_HBW_DATA_W_ARB + offset, 0x207);
+ WREG32(mmSRAM_Y0_X2_RTR_HBW_DATA_W_ARB + offset, 0x206);
+ WREG32(mmSRAM_Y0_X3_RTR_HBW_DATA_W_ARB + offset, 0x206);
+ WREG32(mmSRAM_Y0_X4_RTR_HBW_DATA_W_ARB + offset, 0x206);
+
+ WREG32(mmSRAM_Y0_X0_RTR_HBW_WR_RS_E_ARB + offset, 0x101);
+ WREG32(mmSRAM_Y0_X1_RTR_HBW_WR_RS_E_ARB + offset, 0x102);
+ WREG32(mmSRAM_Y0_X2_RTR_HBW_WR_RS_E_ARB + offset, 0x103);
+ WREG32(mmSRAM_Y0_X3_RTR_HBW_WR_RS_E_ARB + offset, 0x104);
+ WREG32(mmSRAM_Y0_X4_RTR_HBW_WR_RS_E_ARB + offset, 0x105);
+
+ WREG32(mmSRAM_Y0_X0_RTR_HBW_WR_RS_W_ARB + offset, 0x105);
+ WREG32(mmSRAM_Y0_X1_RTR_HBW_WR_RS_W_ARB + offset, 0x104);
+ WREG32(mmSRAM_Y0_X2_RTR_HBW_WR_RS_W_ARB + offset, 0x103);
+ WREG32(mmSRAM_Y0_X3_RTR_HBW_WR_RS_W_ARB + offset, 0x102);
+ WREG32(mmSRAM_Y0_X4_RTR_HBW_WR_RS_W_ARB + offset, 0x101);
+ }
+
+ WREG32(mmMME_STORE_MAX_CREDIT, 0x21);
+ WREG32(mmMME_AGU, 0x0f0f0f10);
+ WREG32(mmMME_SEI_MASK, ~0x0);
+
+ WREG32(mmMME6_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
+ WREG32(mmMME5_RTR_HBW_RD_RQ_N_ARB, 0x01040101);
+ WREG32(mmMME4_RTR_HBW_RD_RQ_N_ARB, 0x01030101);
+ WREG32(mmMME3_RTR_HBW_RD_RQ_N_ARB, 0x01020101);
+ WREG32(mmMME2_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
+ WREG32(mmMME1_RTR_HBW_RD_RQ_N_ARB, 0x07010701);
+ WREG32(mmMME6_RTR_HBW_RD_RQ_S_ARB, 0x04010401);
+ WREG32(mmMME5_RTR_HBW_RD_RQ_S_ARB, 0x04050401);
+ WREG32(mmMME4_RTR_HBW_RD_RQ_S_ARB, 0x03070301);
+ WREG32(mmMME3_RTR_HBW_RD_RQ_S_ARB, 0x01030101);
+ WREG32(mmMME2_RTR_HBW_RD_RQ_S_ARB, 0x01040101);
+ WREG32(mmMME1_RTR_HBW_RD_RQ_S_ARB, 0x01050105);
+ WREG32(mmMME6_RTR_HBW_RD_RQ_W_ARB, 0x01010501);
+ WREG32(mmMME5_RTR_HBW_RD_RQ_W_ARB, 0x01010501);
+ WREG32(mmMME4_RTR_HBW_RD_RQ_W_ARB, 0x01040301);
+ WREG32(mmMME3_RTR_HBW_RD_RQ_W_ARB, 0x01030401);
+ WREG32(mmMME2_RTR_HBW_RD_RQ_W_ARB, 0x01040101);
+ WREG32(mmMME1_RTR_HBW_RD_RQ_W_ARB, 0x01050101);
+ WREG32(mmMME6_RTR_HBW_WR_RQ_N_ARB, 0x02020202);
+ WREG32(mmMME5_RTR_HBW_WR_RQ_N_ARB, 0x01070101);
+ WREG32(mmMME4_RTR_HBW_WR_RQ_N_ARB, 0x02020201);
+ WREG32(mmMME3_RTR_HBW_WR_RQ_N_ARB, 0x07020701);
+ WREG32(mmMME2_RTR_HBW_WR_RQ_N_ARB, 0x01020101);
+ WREG32(mmMME1_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
+ WREG32(mmMME6_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
+ WREG32(mmMME5_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
+ WREG32(mmMME4_RTR_HBW_WR_RQ_S_ARB, 0x07020701);
+ WREG32(mmMME3_RTR_HBW_WR_RQ_S_ARB, 0x02020201);
+ WREG32(mmMME2_RTR_HBW_WR_RQ_S_ARB, 0x01070101);
+ WREG32(mmMME1_RTR_HBW_WR_RQ_S_ARB, 0x01020102);
+ WREG32(mmMME6_RTR_HBW_WR_RQ_W_ARB, 0x01020701);
+ WREG32(mmMME5_RTR_HBW_WR_RQ_W_ARB, 0x01020701);
+ WREG32(mmMME4_RTR_HBW_WR_RQ_W_ARB, 0x07020707);
+ WREG32(mmMME3_RTR_HBW_WR_RQ_W_ARB, 0x01020201);
+ WREG32(mmMME2_RTR_HBW_WR_RQ_W_ARB, 0x01070201);
+ WREG32(mmMME1_RTR_HBW_WR_RQ_W_ARB, 0x01070201);
+ WREG32(mmMME6_RTR_HBW_RD_RS_N_ARB, 0x01070102);
+ WREG32(mmMME5_RTR_HBW_RD_RS_N_ARB, 0x01070102);
+ WREG32(mmMME4_RTR_HBW_RD_RS_N_ARB, 0x01060102);
+ WREG32(mmMME3_RTR_HBW_RD_RS_N_ARB, 0x01040102);
+ WREG32(mmMME2_RTR_HBW_RD_RS_N_ARB, 0x01020102);
+ WREG32(mmMME1_RTR_HBW_RD_RS_N_ARB, 0x01020107);
+ WREG32(mmMME6_RTR_HBW_RD_RS_S_ARB, 0x01020106);
+ WREG32(mmMME5_RTR_HBW_RD_RS_S_ARB, 0x01020102);
+ WREG32(mmMME4_RTR_HBW_RD_RS_S_ARB, 0x01040102);
+ WREG32(mmMME3_RTR_HBW_RD_RS_S_ARB, 0x01060102);
+ WREG32(mmMME2_RTR_HBW_RD_RS_S_ARB, 0x01070102);
+ WREG32(mmMME1_RTR_HBW_RD_RS_S_ARB, 0x01070102);
+ WREG32(mmMME6_RTR_HBW_RD_RS_E_ARB, 0x01020702);
+ WREG32(mmMME5_RTR_HBW_RD_RS_E_ARB, 0x01020702);
+ WREG32(mmMME4_RTR_HBW_RD_RS_E_ARB, 0x01040602);
+ WREG32(mmMME3_RTR_HBW_RD_RS_E_ARB, 0x01060402);
+ WREG32(mmMME2_RTR_HBW_RD_RS_E_ARB, 0x01070202);
+ WREG32(mmMME1_RTR_HBW_RD_RS_E_ARB, 0x01070102);
+ WREG32(mmMME6_RTR_HBW_RD_RS_W_ARB, 0x01060401);
+ WREG32(mmMME5_RTR_HBW_RD_RS_W_ARB, 0x01060401);
+ WREG32(mmMME4_RTR_HBW_RD_RS_W_ARB, 0x01060401);
+ WREG32(mmMME3_RTR_HBW_RD_RS_W_ARB, 0x01060401);
+ WREG32(mmMME2_RTR_HBW_RD_RS_W_ARB, 0x01060401);
+ WREG32(mmMME1_RTR_HBW_RD_RS_W_ARB, 0x01060401);
+ WREG32(mmMME6_RTR_HBW_WR_RS_N_ARB, 0x01050101);
+ WREG32(mmMME5_RTR_HBW_WR_RS_N_ARB, 0x01040101);
+ WREG32(mmMME4_RTR_HBW_WR_RS_N_ARB, 0x01030101);
+ WREG32(mmMME3_RTR_HBW_WR_RS_N_ARB, 0x01020101);
+ WREG32(mmMME2_RTR_HBW_WR_RS_N_ARB, 0x01010101);
+ WREG32(mmMME1_RTR_HBW_WR_RS_N_ARB, 0x01010107);
+ WREG32(mmMME6_RTR_HBW_WR_RS_S_ARB, 0x01010107);
+ WREG32(mmMME5_RTR_HBW_WR_RS_S_ARB, 0x01010101);
+ WREG32(mmMME4_RTR_HBW_WR_RS_S_ARB, 0x01020101);
+ WREG32(mmMME3_RTR_HBW_WR_RS_S_ARB, 0x01030101);
+ WREG32(mmMME2_RTR_HBW_WR_RS_S_ARB, 0x01040101);
+ WREG32(mmMME1_RTR_HBW_WR_RS_S_ARB, 0x01050101);
+ WREG32(mmMME6_RTR_HBW_WR_RS_E_ARB, 0x01010501);
+ WREG32(mmMME5_RTR_HBW_WR_RS_E_ARB, 0x01010501);
+ WREG32(mmMME4_RTR_HBW_WR_RS_E_ARB, 0x01040301);
+ WREG32(mmMME3_RTR_HBW_WR_RS_E_ARB, 0x01030401);
+ WREG32(mmMME2_RTR_HBW_WR_RS_E_ARB, 0x01040101);
+ WREG32(mmMME1_RTR_HBW_WR_RS_E_ARB, 0x01050101);
+ WREG32(mmMME6_RTR_HBW_WR_RS_W_ARB, 0x01010101);
+ WREG32(mmMME5_RTR_HBW_WR_RS_W_ARB, 0x01010101);
+ WREG32(mmMME4_RTR_HBW_WR_RS_W_ARB, 0x01010101);
+ WREG32(mmMME3_RTR_HBW_WR_RS_W_ARB, 0x01010101);
+ WREG32(mmMME2_RTR_HBW_WR_RS_W_ARB, 0x01010101);
+ WREG32(mmMME1_RTR_HBW_WR_RS_W_ARB, 0x01010101);
+
+ WREG32(mmTPC1_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
+ WREG32(mmTPC1_RTR_HBW_RD_RQ_S_ARB, 0x01010101);
+ WREG32(mmTPC1_RTR_HBW_RD_RQ_E_ARB, 0x01060101);
+ WREG32(mmTPC1_RTR_HBW_WR_RQ_N_ARB, 0x02020102);
+ WREG32(mmTPC1_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
+ WREG32(mmTPC1_RTR_HBW_WR_RQ_E_ARB, 0x02070202);
+ WREG32(mmTPC1_RTR_HBW_RD_RS_N_ARB, 0x01020201);
+ WREG32(mmTPC1_RTR_HBW_RD_RS_S_ARB, 0x01070201);
+ WREG32(mmTPC1_RTR_HBW_RD_RS_W_ARB, 0x01070202);
+ WREG32(mmTPC1_RTR_HBW_WR_RS_N_ARB, 0x01010101);
+ WREG32(mmTPC1_RTR_HBW_WR_RS_S_ARB, 0x01050101);
+ WREG32(mmTPC1_RTR_HBW_WR_RS_W_ARB, 0x01050101);
+
+ WREG32(mmTPC2_RTR_HBW_RD_RQ_N_ARB, 0x01020101);
+ WREG32(mmTPC2_RTR_HBW_RD_RQ_S_ARB, 0x01050101);
+ WREG32(mmTPC2_RTR_HBW_RD_RQ_E_ARB, 0x01010201);
+ WREG32(mmTPC2_RTR_HBW_WR_RQ_N_ARB, 0x02040102);
+ WREG32(mmTPC2_RTR_HBW_WR_RQ_S_ARB, 0x01050101);
+ WREG32(mmTPC2_RTR_HBW_WR_RQ_E_ARB, 0x02060202);
+ WREG32(mmTPC2_RTR_HBW_RD_RS_N_ARB, 0x01020201);
+ WREG32(mmTPC2_RTR_HBW_RD_RS_S_ARB, 0x01070201);
+ WREG32(mmTPC2_RTR_HBW_RD_RS_W_ARB, 0x01070202);
+ WREG32(mmTPC2_RTR_HBW_WR_RS_N_ARB, 0x01010101);
+ WREG32(mmTPC2_RTR_HBW_WR_RS_S_ARB, 0x01040101);
+ WREG32(mmTPC2_RTR_HBW_WR_RS_W_ARB, 0x01040101);
+
+ WREG32(mmTPC3_RTR_HBW_RD_RQ_N_ARB, 0x01030101);
+ WREG32(mmTPC3_RTR_HBW_RD_RQ_S_ARB, 0x01040101);
+ WREG32(mmTPC3_RTR_HBW_RD_RQ_E_ARB, 0x01040301);
+ WREG32(mmTPC3_RTR_HBW_WR_RQ_N_ARB, 0x02060102);
+ WREG32(mmTPC3_RTR_HBW_WR_RQ_S_ARB, 0x01040101);
+ WREG32(mmTPC3_RTR_HBW_WR_RQ_E_ARB, 0x01040301);
+ WREG32(mmTPC3_RTR_HBW_RD_RS_N_ARB, 0x01040201);
+ WREG32(mmTPC3_RTR_HBW_RD_RS_S_ARB, 0x01060201);
+ WREG32(mmTPC3_RTR_HBW_RD_RS_W_ARB, 0x01060402);
+ WREG32(mmTPC3_RTR_HBW_WR_RS_N_ARB, 0x01020101);
+ WREG32(mmTPC3_RTR_HBW_WR_RS_S_ARB, 0x01030101);
+ WREG32(mmTPC3_RTR_HBW_WR_RS_W_ARB, 0x01030401);
+
+ WREG32(mmTPC4_RTR_HBW_RD_RQ_N_ARB, 0x01040101);
+ WREG32(mmTPC4_RTR_HBW_RD_RQ_S_ARB, 0x01030101);
+ WREG32(mmTPC4_RTR_HBW_RD_RQ_E_ARB, 0x01030401);
+ WREG32(mmTPC4_RTR_HBW_WR_RQ_N_ARB, 0x02070102);
+ WREG32(mmTPC4_RTR_HBW_WR_RQ_S_ARB, 0x01030101);
+ WREG32(mmTPC4_RTR_HBW_WR_RQ_E_ARB, 0x02060702);
+ WREG32(mmTPC4_RTR_HBW_RD_RS_N_ARB, 0x01060201);
+ WREG32(mmTPC4_RTR_HBW_RD_RS_S_ARB, 0x01040201);
+ WREG32(mmTPC4_RTR_HBW_RD_RS_W_ARB, 0x01040602);
+ WREG32(mmTPC4_RTR_HBW_WR_RS_N_ARB, 0x01030101);
+ WREG32(mmTPC4_RTR_HBW_WR_RS_S_ARB, 0x01020101);
+ WREG32(mmTPC4_RTR_HBW_WR_RS_W_ARB, 0x01040301);
+
+ WREG32(mmTPC5_RTR_HBW_RD_RQ_N_ARB, 0x01050101);
+ WREG32(mmTPC5_RTR_HBW_RD_RQ_S_ARB, 0x01020101);
+ WREG32(mmTPC5_RTR_HBW_RD_RQ_E_ARB, 0x01200501);
+ WREG32(mmTPC5_RTR_HBW_WR_RQ_N_ARB, 0x02070102);
+ WREG32(mmTPC5_RTR_HBW_WR_RQ_S_ARB, 0x01020101);
+ WREG32(mmTPC5_RTR_HBW_WR_RQ_E_ARB, 0x02020602);
+ WREG32(mmTPC5_RTR_HBW_RD_RS_N_ARB, 0x01070201);
+ WREG32(mmTPC5_RTR_HBW_RD_RS_S_ARB, 0x01020201);
+ WREG32(mmTPC5_RTR_HBW_RD_RS_W_ARB, 0x01020702);
+ WREG32(mmTPC5_RTR_HBW_WR_RS_N_ARB, 0x01040101);
+ WREG32(mmTPC5_RTR_HBW_WR_RS_S_ARB, 0x01010101);
+ WREG32(mmTPC5_RTR_HBW_WR_RS_W_ARB, 0x01010501);
+
+ WREG32(mmTPC6_RTR_HBW_RD_RQ_N_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_RD_RQ_S_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_RD_RQ_E_ARB, 0x01010601);
+ WREG32(mmTPC6_RTR_HBW_WR_RQ_N_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_WR_RQ_S_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_WR_RQ_E_ARB, 0x02020702);
+ WREG32(mmTPC6_RTR_HBW_RD_RS_N_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_RD_RS_S_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_RD_RS_W_ARB, 0x01020702);
+ WREG32(mmTPC6_RTR_HBW_WR_RS_N_ARB, 0x01050101);
+ WREG32(mmTPC6_RTR_HBW_WR_RS_S_ARB, 0x01010101);
+ WREG32(mmTPC6_RTR_HBW_WR_RS_W_ARB, 0x01010501);
+
+ for (i = 0, offset = 0 ; i < 10 ; i++, offset += 4) {
+ WREG32(mmMME1_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmMME2_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmMME3_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmMME4_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmMME5_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmMME6_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+
+ WREG32(mmTPC0_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC1_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC2_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC3_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC4_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC5_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC6_RTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmTPC7_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+
+ WREG32(mmPCI_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ WREG32(mmDMA_NRTR_SPLIT_COEF_0 + offset, polynom[i] >> 7);
+ }
+
+ for (i = 0, offset = 0 ; i < 6 ; i++, offset += 0x40000) {
+ WREG32(mmMME1_RTR_SCRAMB_EN + offset,
+ 1 << MME1_RTR_SCRAMB_EN_VAL_SHIFT);
+ WREG32(mmMME1_RTR_NON_LIN_SCRAMB + offset,
+ 1 << MME1_RTR_NON_LIN_SCRAMB_EN_SHIFT);
+ }
+
+ for (i = 0, offset = 0 ; i < 8 ; i++, offset += 0x40000) {
+ /*
+ * Workaround for Bug H2 #2441 :
+ * "ST.NOP set trace event illegal opcode"
+ */
+ WREG32(mmTPC0_CFG_TPC_INTR_MASK + offset, tpc_intr_mask);
+
+ WREG32(mmTPC0_NRTR_SCRAMB_EN + offset,
+ 1 << TPC0_NRTR_SCRAMB_EN_VAL_SHIFT);
+ WREG32(mmTPC0_NRTR_NON_LIN_SCRAMB + offset,
+ 1 << TPC0_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
+
+ WREG32_FIELD(TPC0_CFG_MSS_CONFIG, offset,
+ ICACHE_FETCH_LINE_NUM, 2);
+ }
+
+ WREG32(mmDMA_NRTR_SCRAMB_EN, 1 << DMA_NRTR_SCRAMB_EN_VAL_SHIFT);
+ WREG32(mmDMA_NRTR_NON_LIN_SCRAMB,
+ 1 << DMA_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
+
+ WREG32(mmPCI_NRTR_SCRAMB_EN, 1 << PCI_NRTR_SCRAMB_EN_VAL_SHIFT);
+ WREG32(mmPCI_NRTR_NON_LIN_SCRAMB,
+ 1 << PCI_NRTR_NON_LIN_SCRAMB_EN_SHIFT);
+
+ /*
+ * Workaround for H2 #HW-23 bug
+ * Set DMA max outstanding read requests to 240 on DMA CH 1.
+ * This limitation is still large enough to not affect Gen4 bandwidth.
+ * We need to only limit that DMA channel because the user can only read
+ * from Host using DMA CH 1
+ */
+ WREG32(mmDMA_CH_1_CFG0, 0x0fff00F0);
+
+ WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
+
+ goya->hw_cap_initialized |= HW_CAP_GOLDEN;
+}
+
+static void goya_init_mme_qman(struct hl_device *hdev)
+{
+ u32 mtr_base_lo, mtr_base_hi;
+ u32 so_base_lo, so_base_hi;
+ u32 gic_base_lo, gic_base_hi;
+ u64 qman_base_addr;
+
+ mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ gic_base_lo =
+ lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+ gic_base_hi =
+ upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+
+ qman_base_addr = hdev->asic_prop.sram_base_address +
+ MME_QMAN_BASE_OFFSET;
+
+ WREG32(mmMME_QM_PQ_BASE_LO, lower_32_bits(qman_base_addr));
+ WREG32(mmMME_QM_PQ_BASE_HI, upper_32_bits(qman_base_addr));
+ WREG32(mmMME_QM_PQ_SIZE, ilog2(MME_QMAN_LENGTH));
+ WREG32(mmMME_QM_PQ_PI, 0);
+ WREG32(mmMME_QM_PQ_CI, 0);
+ WREG32(mmMME_QM_CP_LDMA_SRC_BASE_LO_OFFSET, 0x10C0);
+ WREG32(mmMME_QM_CP_LDMA_SRC_BASE_HI_OFFSET, 0x10C4);
+ WREG32(mmMME_QM_CP_LDMA_TSIZE_OFFSET, 0x10C8);
+ WREG32(mmMME_QM_CP_LDMA_COMMIT_OFFSET, 0x10CC);
+
+ WREG32(mmMME_QM_CP_MSG_BASE0_ADDR_LO, mtr_base_lo);
+ WREG32(mmMME_QM_CP_MSG_BASE0_ADDR_HI, mtr_base_hi);
+ WREG32(mmMME_QM_CP_MSG_BASE1_ADDR_LO, so_base_lo);
+ WREG32(mmMME_QM_CP_MSG_BASE1_ADDR_HI, so_base_hi);
+
+ /* QMAN CQ has 8 cache lines */
+ WREG32(mmMME_QM_CQ_CFG1, 0x00080008);
+
+ WREG32(mmMME_QM_GLBL_ERR_ADDR_LO, gic_base_lo);
+ WREG32(mmMME_QM_GLBL_ERR_ADDR_HI, gic_base_hi);
+
+ WREG32(mmMME_QM_GLBL_ERR_WDATA, GOYA_ASYNC_EVENT_ID_MME_QM);
+
+ WREG32(mmMME_QM_GLBL_ERR_CFG, QMAN_MME_ERR_MSG_EN);
+
+ WREG32(mmMME_QM_GLBL_PROT, QMAN_MME_ERR_PROT);
+
+ WREG32(mmMME_QM_GLBL_CFG0, QMAN_MME_ENABLE);
+}
+
+static void goya_init_mme_cmdq(struct hl_device *hdev)
+{
+ u32 mtr_base_lo, mtr_base_hi;
+ u32 so_base_lo, so_base_hi;
+ u32 gic_base_lo, gic_base_hi;
+
+ mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ gic_base_lo =
+ lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+ gic_base_hi =
+ upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+
+ WREG32(mmMME_CMDQ_CP_MSG_BASE0_ADDR_LO, mtr_base_lo);
+ WREG32(mmMME_CMDQ_CP_MSG_BASE0_ADDR_HI, mtr_base_hi);
+ WREG32(mmMME_CMDQ_CP_MSG_BASE1_ADDR_LO, so_base_lo);
+ WREG32(mmMME_CMDQ_CP_MSG_BASE1_ADDR_HI, so_base_hi);
+
+ /* CMDQ CQ has 20 cache lines */
+ WREG32(mmMME_CMDQ_CQ_CFG1, 0x00140014);
+
+ WREG32(mmMME_CMDQ_GLBL_ERR_ADDR_LO, gic_base_lo);
+ WREG32(mmMME_CMDQ_GLBL_ERR_ADDR_HI, gic_base_hi);
+
+ WREG32(mmMME_CMDQ_GLBL_ERR_WDATA, GOYA_ASYNC_EVENT_ID_MME_CMDQ);
+
+ WREG32(mmMME_CMDQ_GLBL_ERR_CFG, CMDQ_MME_ERR_MSG_EN);
+
+ WREG32(mmMME_CMDQ_GLBL_PROT, CMDQ_MME_ERR_PROT);
+
+ WREG32(mmMME_CMDQ_GLBL_CFG0, CMDQ_MME_ENABLE);
+}
+
+void goya_init_mme_qmans(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 so_base_lo, so_base_hi;
+
+ if (goya->hw_cap_initialized & HW_CAP_MME)
+ return;
+
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ WREG32(mmMME_SM_BASE_ADDRESS_LOW, so_base_lo);
+ WREG32(mmMME_SM_BASE_ADDRESS_HIGH, so_base_hi);
+
+ goya_init_mme_qman(hdev);
+ goya_init_mme_cmdq(hdev);
+
+ goya->hw_cap_initialized |= HW_CAP_MME;
+}
+
+static void goya_init_tpc_qman(struct hl_device *hdev, u32 base_off, int tpc_id)
+{
+ u32 mtr_base_lo, mtr_base_hi;
+ u32 so_base_lo, so_base_hi;
+ u32 gic_base_lo, gic_base_hi;
+ u64 qman_base_addr;
+ u32 reg_off = tpc_id * (mmTPC1_QM_PQ_PI - mmTPC0_QM_PQ_PI);
+
+ mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ gic_base_lo =
+ lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+ gic_base_hi =
+ upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+
+ qman_base_addr = hdev->asic_prop.sram_base_address + base_off;
+
+ WREG32(mmTPC0_QM_PQ_BASE_LO + reg_off, lower_32_bits(qman_base_addr));
+ WREG32(mmTPC0_QM_PQ_BASE_HI + reg_off, upper_32_bits(qman_base_addr));
+ WREG32(mmTPC0_QM_PQ_SIZE + reg_off, ilog2(TPC_QMAN_LENGTH));
+ WREG32(mmTPC0_QM_PQ_PI + reg_off, 0);
+ WREG32(mmTPC0_QM_PQ_CI + reg_off, 0);
+ WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_LO_OFFSET + reg_off, 0x10C0);
+ WREG32(mmTPC0_QM_CP_LDMA_SRC_BASE_HI_OFFSET + reg_off, 0x10C4);
+ WREG32(mmTPC0_QM_CP_LDMA_TSIZE_OFFSET + reg_off, 0x10C8);
+ WREG32(mmTPC0_QM_CP_LDMA_COMMIT_OFFSET + reg_off, 0x10CC);
+
+ WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
+ WREG32(mmTPC0_QM_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
+ WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
+ WREG32(mmTPC0_QM_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
+
+ WREG32(mmTPC0_QM_CQ_CFG1 + reg_off, 0x00080008);
+
+ WREG32(mmTPC0_QM_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
+ WREG32(mmTPC0_QM_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
+
+ WREG32(mmTPC0_QM_GLBL_ERR_WDATA + reg_off,
+ GOYA_ASYNC_EVENT_ID_TPC0_QM + tpc_id);
+
+ WREG32(mmTPC0_QM_GLBL_ERR_CFG + reg_off, QMAN_TPC_ERR_MSG_EN);
+
+ WREG32(mmTPC0_QM_GLBL_PROT + reg_off, QMAN_TPC_ERR_PROT);
+
+ WREG32(mmTPC0_QM_GLBL_CFG0 + reg_off, QMAN_TPC_ENABLE);
+}
+
+static void goya_init_tpc_cmdq(struct hl_device *hdev, int tpc_id)
+{
+ u32 mtr_base_lo, mtr_base_hi;
+ u32 so_base_lo, so_base_hi;
+ u32 gic_base_lo, gic_base_hi;
+ u32 reg_off = tpc_id * (mmTPC1_CMDQ_CQ_CFG1 - mmTPC0_CMDQ_CQ_CFG1);
+
+ mtr_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ mtr_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_MON_PAY_ADDRL_0);
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ gic_base_lo =
+ lower_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+ gic_base_hi =
+ upper_32_bits(CFG_BASE + mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR);
+
+ WREG32(mmTPC0_CMDQ_CP_MSG_BASE0_ADDR_LO + reg_off, mtr_base_lo);
+ WREG32(mmTPC0_CMDQ_CP_MSG_BASE0_ADDR_HI + reg_off, mtr_base_hi);
+ WREG32(mmTPC0_CMDQ_CP_MSG_BASE1_ADDR_LO + reg_off, so_base_lo);
+ WREG32(mmTPC0_CMDQ_CP_MSG_BASE1_ADDR_HI + reg_off, so_base_hi);
+
+ WREG32(mmTPC0_CMDQ_CQ_CFG1 + reg_off, 0x00140014);
+
+ WREG32(mmTPC0_CMDQ_GLBL_ERR_ADDR_LO + reg_off, gic_base_lo);
+ WREG32(mmTPC0_CMDQ_GLBL_ERR_ADDR_HI + reg_off, gic_base_hi);
+
+ WREG32(mmTPC0_CMDQ_GLBL_ERR_WDATA + reg_off,
+ GOYA_ASYNC_EVENT_ID_TPC0_CMDQ + tpc_id);
+
+ WREG32(mmTPC0_CMDQ_GLBL_ERR_CFG + reg_off, CMDQ_TPC_ERR_MSG_EN);
+
+ WREG32(mmTPC0_CMDQ_GLBL_PROT + reg_off, CMDQ_TPC_ERR_PROT);
+
+ WREG32(mmTPC0_CMDQ_GLBL_CFG0 + reg_off, CMDQ_TPC_ENABLE);
+}
+
+void goya_init_tpc_qmans(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 so_base_lo, so_base_hi;
+ u32 cfg_off = mmTPC1_CFG_SM_BASE_ADDRESS_LOW -
+ mmTPC0_CFG_SM_BASE_ADDRESS_LOW;
+ int i;
+
+ if (goya->hw_cap_initialized & HW_CAP_TPC)
+ return;
+
+ so_base_lo = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ so_base_hi = upper_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+
+ for (i = 0 ; i < TPC_MAX_NUM ; i++) {
+ WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_LOW + i * cfg_off,
+ so_base_lo);
+ WREG32(mmTPC0_CFG_SM_BASE_ADDRESS_HIGH + i * cfg_off,
+ so_base_hi);
+ }
+
+ goya_init_tpc_qman(hdev, TPC0_QMAN_BASE_OFFSET, 0);
+ goya_init_tpc_qman(hdev, TPC1_QMAN_BASE_OFFSET, 1);
+ goya_init_tpc_qman(hdev, TPC2_QMAN_BASE_OFFSET, 2);
+ goya_init_tpc_qman(hdev, TPC3_QMAN_BASE_OFFSET, 3);
+ goya_init_tpc_qman(hdev, TPC4_QMAN_BASE_OFFSET, 4);
+ goya_init_tpc_qman(hdev, TPC5_QMAN_BASE_OFFSET, 5);
+ goya_init_tpc_qman(hdev, TPC6_QMAN_BASE_OFFSET, 6);
+ goya_init_tpc_qman(hdev, TPC7_QMAN_BASE_OFFSET, 7);
+
+ for (i = 0 ; i < TPC_MAX_NUM ; i++)
+ goya_init_tpc_cmdq(hdev, i);
+
+ goya->hw_cap_initialized |= HW_CAP_TPC;
+}
+
+/*
+ * goya_disable_internal_queues - Disable internal queues
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ */
+static void goya_disable_internal_queues(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MME))
+ goto disable_tpc;
+
+ WREG32(mmMME_QM_GLBL_CFG0, 0);
+ WREG32(mmMME_CMDQ_GLBL_CFG0, 0);
+
+disable_tpc:
+ if (!(goya->hw_cap_initialized & HW_CAP_TPC))
+ return;
+
+ WREG32(mmTPC0_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC0_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC1_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC1_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC2_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC2_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC3_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC3_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC4_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC4_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC5_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC5_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC6_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC6_CMDQ_GLBL_CFG0, 0);
+
+ WREG32(mmTPC7_QM_GLBL_CFG0, 0);
+ WREG32(mmTPC7_CMDQ_GLBL_CFG0, 0);
+}
+
+/*
+ * goya_stop_internal_queues - Stop internal queues
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Returns 0 on success
+ *
+ */
+static int goya_stop_internal_queues(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int rc, retval = 0;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MME))
+ goto stop_tpc;
+
+ /*
+ * Each queue (QMAN) is a separate H/W logic. That means that each
+ * QMAN can be stopped independently and failure to stop one does NOT
+ * mandate we should not try to stop other QMANs
+ */
+
+ rc = goya_stop_queue(hdev,
+ mmMME_QM_GLBL_CFG1,
+ mmMME_QM_CP_STS,
+ mmMME_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop MME QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmMME_CMDQ_GLBL_CFG1,
+ mmMME_CMDQ_CP_STS,
+ mmMME_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop MME CMDQ\n");
+ retval = -EIO;
+ }
+
+stop_tpc:
+ if (!(goya->hw_cap_initialized & HW_CAP_TPC))
+ return retval;
+
+ rc = goya_stop_queue(hdev,
+ mmTPC0_QM_GLBL_CFG1,
+ mmTPC0_QM_CP_STS,
+ mmTPC0_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 0 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC0_CMDQ_GLBL_CFG1,
+ mmTPC0_CMDQ_CP_STS,
+ mmTPC0_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 0 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC1_QM_GLBL_CFG1,
+ mmTPC1_QM_CP_STS,
+ mmTPC1_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 1 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC1_CMDQ_GLBL_CFG1,
+ mmTPC1_CMDQ_CP_STS,
+ mmTPC1_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 1 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC2_QM_GLBL_CFG1,
+ mmTPC2_QM_CP_STS,
+ mmTPC2_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 2 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC2_CMDQ_GLBL_CFG1,
+ mmTPC2_CMDQ_CP_STS,
+ mmTPC2_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 2 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC3_QM_GLBL_CFG1,
+ mmTPC3_QM_CP_STS,
+ mmTPC3_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 3 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC3_CMDQ_GLBL_CFG1,
+ mmTPC3_CMDQ_CP_STS,
+ mmTPC3_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 3 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC4_QM_GLBL_CFG1,
+ mmTPC4_QM_CP_STS,
+ mmTPC4_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 4 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC4_CMDQ_GLBL_CFG1,
+ mmTPC4_CMDQ_CP_STS,
+ mmTPC4_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 4 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC5_QM_GLBL_CFG1,
+ mmTPC5_QM_CP_STS,
+ mmTPC5_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 5 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC5_CMDQ_GLBL_CFG1,
+ mmTPC5_CMDQ_CP_STS,
+ mmTPC5_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 5 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC6_QM_GLBL_CFG1,
+ mmTPC6_QM_CP_STS,
+ mmTPC6_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 6 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC6_CMDQ_GLBL_CFG1,
+ mmTPC6_CMDQ_CP_STS,
+ mmTPC6_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 6 CMDQ\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC7_QM_GLBL_CFG1,
+ mmTPC7_QM_CP_STS,
+ mmTPC7_QM_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 7 QMAN\n");
+ retval = -EIO;
+ }
+
+ rc = goya_stop_queue(hdev,
+ mmTPC7_CMDQ_GLBL_CFG1,
+ mmTPC7_CMDQ_CP_STS,
+ mmTPC7_CMDQ_GLBL_STS0);
+
+ if (rc) {
+ dev_err(hdev->dev, "failed to stop TPC 7 CMDQ\n");
+ retval = -EIO;
+ }
+
+ return retval;
+}
+
+static void goya_dma_stall(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_DMA))
+ return;
+
+ WREG32(mmDMA_QM_0_GLBL_CFG1, 1 << DMA_QM_0_GLBL_CFG1_DMA_STOP_SHIFT);
+ WREG32(mmDMA_QM_1_GLBL_CFG1, 1 << DMA_QM_1_GLBL_CFG1_DMA_STOP_SHIFT);
+ WREG32(mmDMA_QM_2_GLBL_CFG1, 1 << DMA_QM_2_GLBL_CFG1_DMA_STOP_SHIFT);
+ WREG32(mmDMA_QM_3_GLBL_CFG1, 1 << DMA_QM_3_GLBL_CFG1_DMA_STOP_SHIFT);
+ WREG32(mmDMA_QM_4_GLBL_CFG1, 1 << DMA_QM_4_GLBL_CFG1_DMA_STOP_SHIFT);
+}
+
+static void goya_tpc_stall(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_TPC))
+ return;
+
+ WREG32(mmTPC0_CFG_TPC_STALL, 1 << TPC0_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC1_CFG_TPC_STALL, 1 << TPC1_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC2_CFG_TPC_STALL, 1 << TPC2_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC3_CFG_TPC_STALL, 1 << TPC3_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC4_CFG_TPC_STALL, 1 << TPC4_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC5_CFG_TPC_STALL, 1 << TPC5_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC6_CFG_TPC_STALL, 1 << TPC6_CFG_TPC_STALL_V_SHIFT);
+ WREG32(mmTPC7_CFG_TPC_STALL, 1 << TPC7_CFG_TPC_STALL_V_SHIFT);
+}
+
+static void goya_mme_stall(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MME))
+ return;
+
+ WREG32(mmMME_STALL, 0xFFFFFFFF);
+}
+
+static int goya_enable_msix(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int cq_cnt = hdev->asic_prop.completion_queues_count;
+ int rc, i, irq_cnt_init, irq;
+
+ if (goya->hw_cap_initialized & HW_CAP_MSIX)
+ return 0;
+
+ rc = pci_alloc_irq_vectors(hdev->pdev, GOYA_MSIX_ENTRIES,
+ GOYA_MSIX_ENTRIES, PCI_IRQ_MSIX);
+ if (rc < 0) {
+ dev_err(hdev->dev,
+ "MSI-X: Failed to enable support -- %d/%d\n",
+ GOYA_MSIX_ENTRIES, rc);
+ return rc;
+ }
+
+ for (i = 0, irq_cnt_init = 0 ; i < cq_cnt ; i++, irq_cnt_init++) {
+ irq = pci_irq_vector(hdev->pdev, i);
+ rc = request_irq(irq, hl_irq_handler_cq, 0, goya_irq_name[i],
+ &hdev->completion_queue[i]);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_irqs;
+ }
+ }
+
+ irq = pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX);
+
+ rc = request_irq(irq, hl_irq_handler_eq, 0,
+ goya_irq_name[GOYA_EVENT_QUEUE_MSIX_IDX],
+ &hdev->event_queue);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to request IRQ %d", irq);
+ goto free_irqs;
+ }
+
+ goya->hw_cap_initialized |= HW_CAP_MSIX;
+ return 0;
+
+free_irqs:
+ for (i = 0 ; i < irq_cnt_init ; i++)
+ free_irq(pci_irq_vector(hdev->pdev, i),
+ &hdev->completion_queue[i]);
+
+ pci_free_irq_vectors(hdev->pdev);
+ return rc;
+}
+
+static void goya_sync_irqs(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int i;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MSIX))
+ return;
+
+ /* Wait for all pending IRQs to be finished */
+ for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
+ synchronize_irq(pci_irq_vector(hdev->pdev, i));
+
+ synchronize_irq(pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX));
+}
+
+static void goya_disable_msix(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int i, irq;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MSIX))
+ return;
+
+ goya_sync_irqs(hdev);
+
+ irq = pci_irq_vector(hdev->pdev, GOYA_EVENT_QUEUE_MSIX_IDX);
+ free_irq(irq, &hdev->event_queue);
+
+ for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++) {
+ irq = pci_irq_vector(hdev->pdev, i);
+ free_irq(irq, &hdev->completion_queue[i]);
+ }
+
+ pci_free_irq_vectors(hdev->pdev);
+
+ goya->hw_cap_initialized &= ~HW_CAP_MSIX;
+}
+
+static void goya_enable_timestamp(struct hl_device *hdev)
+{
+ /* Disable the timestamp counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
+
+ /* Zero the lower/upper parts of the 64-bit counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
+
+ /* Enable the counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
+}
+
+static void goya_disable_timestamp(struct hl_device *hdev)
+{
+ /* Disable the timestamp counter */
+ WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
+}
+
+static void goya_halt_engines(struct hl_device *hdev, bool hard_reset, bool fw_reset)
+{
+ u32 wait_timeout_ms;
+
+ if (hdev->pldm)
+ wait_timeout_ms = GOYA_PLDM_RESET_WAIT_MSEC;
+ else
+ wait_timeout_ms = GOYA_RESET_WAIT_MSEC;
+
+ goya_stop_external_queues(hdev);
+ goya_stop_internal_queues(hdev);
+
+ msleep(wait_timeout_ms);
+
+ goya_dma_stall(hdev);
+ goya_tpc_stall(hdev);
+ goya_mme_stall(hdev);
+
+ msleep(wait_timeout_ms);
+
+ goya_disable_external_queues(hdev);
+ goya_disable_internal_queues(hdev);
+
+ goya_disable_timestamp(hdev);
+
+ if (hard_reset) {
+ goya_disable_msix(hdev);
+ goya_mmu_remove_device_cpu_mappings(hdev);
+ } else {
+ goya_sync_irqs(hdev);
+ }
+}
+
+/*
+ * goya_load_firmware_to_device() - Load LINUX FW code to device.
+ * @hdev: Pointer to hl_device structure.
+ *
+ * Copy LINUX fw code from firmware file to HBM BAR.
+ *
+ * Return: 0 on success, non-zero for failure.
+ */
+static int goya_load_firmware_to_device(struct hl_device *hdev)
+{
+ void __iomem *dst;
+
+ dst = hdev->pcie_bar[DDR_BAR_ID] + LINUX_FW_OFFSET;
+
+ return hl_fw_load_fw_to_device(hdev, GOYA_LINUX_FW_FILE, dst, 0, 0);
+}
+
+/*
+ * goya_load_boot_fit_to_device() - Load boot fit to device.
+ * @hdev: Pointer to hl_device structure.
+ *
+ * Copy boot fit file to SRAM BAR.
+ *
+ * Return: 0 on success, non-zero for failure.
+ */
+static int goya_load_boot_fit_to_device(struct hl_device *hdev)
+{
+ void __iomem *dst;
+
+ dst = hdev->pcie_bar[SRAM_CFG_BAR_ID] + BOOT_FIT_SRAM_OFFSET;
+
+ return hl_fw_load_fw_to_device(hdev, GOYA_BOOT_FIT_FILE, dst, 0, 0);
+}
+
+static void goya_init_dynamic_firmware_loader(struct hl_device *hdev)
+{
+ struct dynamic_fw_load_mgr *dynamic_loader;
+ struct cpu_dyn_regs *dyn_regs;
+
+ dynamic_loader = &hdev->fw_loader.dynamic_loader;
+
+ /*
+ * here we update initial values for few specific dynamic regs (as
+ * before reading the first descriptor from FW those value has to be
+ * hard-coded) in later stages of the protocol those values will be
+ * updated automatically by reading the FW descriptor so data there
+ * will always be up-to-date
+ */
+ dyn_regs = &dynamic_loader->comm_desc.cpu_dyn_regs;
+ dyn_regs->kmd_msg_to_cpu =
+ cpu_to_le32(mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU);
+ dyn_regs->cpu_cmd_status_to_host =
+ cpu_to_le32(mmCPU_CMD_STATUS_TO_HOST);
+
+ dynamic_loader->wait_for_bl_timeout = GOYA_WAIT_FOR_BL_TIMEOUT_USEC;
+}
+
+static void goya_init_static_firmware_loader(struct hl_device *hdev)
+{
+ struct static_fw_load_mgr *static_loader;
+
+ static_loader = &hdev->fw_loader.static_loader;
+
+ static_loader->preboot_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
+ static_loader->boot_fit_version_max_off = SRAM_SIZE - VERSION_MAX_LEN;
+ static_loader->kmd_msg_to_cpu_reg = mmPSOC_GLOBAL_CONF_KMD_MSG_TO_CPU;
+ static_loader->cpu_cmd_status_to_host_reg = mmCPU_CMD_STATUS_TO_HOST;
+ static_loader->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
+ static_loader->cpu_boot_dev_status0_reg = mmCPU_BOOT_DEV_STS0;
+ static_loader->cpu_boot_dev_status1_reg = mmCPU_BOOT_DEV_STS1;
+ static_loader->boot_err0_reg = mmCPU_BOOT_ERR0;
+ static_loader->boot_err1_reg = mmCPU_BOOT_ERR1;
+ static_loader->preboot_version_offset_reg = mmPREBOOT_VER_OFFSET;
+ static_loader->boot_fit_version_offset_reg = mmUBOOT_VER_OFFSET;
+ static_loader->sram_offset_mask = ~(lower_32_bits(SRAM_BASE_ADDR));
+}
+
+static void goya_init_firmware_preload_params(struct hl_device *hdev)
+{
+ struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load;
+
+ pre_fw_load->cpu_boot_status_reg = mmPSOC_GLOBAL_CONF_CPU_BOOT_STATUS;
+ pre_fw_load->sts_boot_dev_sts0_reg = mmCPU_BOOT_DEV_STS0;
+ pre_fw_load->sts_boot_dev_sts1_reg = mmCPU_BOOT_DEV_STS1;
+ pre_fw_load->boot_err0_reg = mmCPU_BOOT_ERR0;
+ pre_fw_load->boot_err1_reg = mmCPU_BOOT_ERR1;
+ pre_fw_load->wait_for_preboot_timeout = GOYA_BOOT_FIT_REQ_TIMEOUT_USEC;
+}
+
+static void goya_init_firmware_loader(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct fw_load_mgr *fw_loader = &hdev->fw_loader;
+
+ /* fill common fields */
+ fw_loader->fw_comp_loaded = FW_TYPE_NONE;
+ fw_loader->boot_fit_img.image_name = GOYA_BOOT_FIT_FILE;
+ fw_loader->linux_img.image_name = GOYA_LINUX_FW_FILE;
+ fw_loader->cpu_timeout = GOYA_CPU_TIMEOUT_USEC;
+ fw_loader->boot_fit_timeout = GOYA_BOOT_FIT_REQ_TIMEOUT_USEC;
+ fw_loader->skip_bmc = false;
+ fw_loader->sram_bar_id = SRAM_CFG_BAR_ID;
+ fw_loader->dram_bar_id = DDR_BAR_ID;
+
+ if (prop->dynamic_fw_load)
+ goya_init_dynamic_firmware_loader(hdev);
+ else
+ goya_init_static_firmware_loader(hdev);
+}
+
+static int goya_init_cpu(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int rc;
+
+ if (!(hdev->fw_components & FW_TYPE_PREBOOT_CPU))
+ return 0;
+
+ if (goya->hw_cap_initialized & HW_CAP_CPU)
+ return 0;
+
+ /*
+ * Before pushing u-boot/linux to device, need to set the ddr bar to
+ * base address of dram
+ */
+ if (goya_set_ddr_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
+ dev_err(hdev->dev,
+ "failed to map DDR bar to DRAM base address\n");
+ return -EIO;
+ }
+
+ rc = hl_fw_init_cpu(hdev);
+
+ if (rc)
+ return rc;
+
+ goya->hw_cap_initialized |= HW_CAP_CPU;
+
+ return 0;
+}
+
+static int goya_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
+ u64 phys_addr)
+{
+ u32 status, timeout_usec;
+ int rc;
+
+ if (hdev->pldm)
+ timeout_usec = GOYA_PLDM_MMU_TIMEOUT_USEC;
+ else
+ timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
+
+ WREG32(MMU_HOP0_PA43_12, phys_addr >> MMU_HOP0_PA43_12_SHIFT);
+ WREG32(MMU_HOP0_PA49_44, phys_addr >> MMU_HOP0_PA49_44_SHIFT);
+ WREG32(MMU_ASID_BUSY, 0x80000000 | asid);
+
+ rc = hl_poll_timeout(
+ hdev,
+ MMU_ASID_BUSY,
+ status,
+ !(status & 0x80000000),
+ 1000,
+ timeout_usec);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Timeout during MMU hop0 config of asid %d\n", asid);
+ return rc;
+ }
+
+ return 0;
+}
+
+int goya_mmu_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct goya_device *goya = hdev->asic_specific;
+ u64 hop0_addr;
+ int rc, i;
+
+ if (!hdev->mmu_enable)
+ return 0;
+
+ if (goya->hw_cap_initialized & HW_CAP_MMU)
+ return 0;
+
+ hdev->dram_default_page_mapping = true;
+
+ for (i = 0 ; i < prop->max_asid ; i++) {
+ hop0_addr = prop->mmu_pgt_addr +
+ (i * prop->mmu_hop_table_size);
+
+ rc = goya_mmu_update_asid_hop0_addr(hdev, i, hop0_addr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "failed to set hop0 addr for asid %d\n", i);
+ goto err;
+ }
+ }
+
+ goya->hw_cap_initialized |= HW_CAP_MMU;
+
+ /* init MMU cache manage page */
+ WREG32(mmSTLB_CACHE_INV_BASE_39_8,
+ lower_32_bits(MMU_CACHE_MNG_ADDR >> 8));
+ WREG32(mmSTLB_CACHE_INV_BASE_49_40, MMU_CACHE_MNG_ADDR >> 40);
+
+ /* Remove follower feature due to performance bug */
+ WREG32_AND(mmSTLB_STLB_FEATURE_EN,
+ (~STLB_STLB_FEATURE_EN_FOLLOWER_EN_MASK));
+
+ hl_mmu_invalidate_cache(hdev, true, MMU_OP_USERPTR | MMU_OP_PHYS_PACK);
+
+ WREG32(mmMMU_MMU_ENABLE, 1);
+ WREG32(mmMMU_SPI_MASK, 0xF);
+
+ return 0;
+
+err:
+ return rc;
+}
+
+/*
+ * goya_hw_init - Goya hardware initialization code
+ *
+ * @hdev: pointer to hl_device structure
+ *
+ * Returns 0 on success
+ *
+ */
+static int goya_hw_init(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ int rc;
+
+ /* Perform read from the device to make sure device is up */
+ RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
+
+ /*
+ * Let's mark in the H/W that we have reached this point. We check
+ * this value in the reset_before_init function to understand whether
+ * we need to reset the chip before doing H/W init. This register is
+ * cleared by the H/W upon H/W reset
+ */
+ WREG32(mmHW_STATE, HL_DEVICE_HW_STATE_DIRTY);
+
+ rc = goya_init_cpu(hdev);
+ if (rc) {
+ dev_err(hdev->dev, "failed to initialize CPU\n");
+ return rc;
+ }
+
+ goya_tpc_mbist_workaround(hdev);
+
+ goya_init_golden_registers(hdev);
+
+ /*
+ * After CPU initialization is finished, change DDR bar mapping inside
+ * iATU to point to the start address of the MMU page tables
+ */
+ if (goya_set_ddr_bar_base(hdev, (MMU_PAGE_TABLES_ADDR &
+ ~(prop->dram_pci_bar_size - 0x1ull))) == U64_MAX) {
+ dev_err(hdev->dev,
+ "failed to map DDR bar to MMU page tables\n");
+ return -EIO;
+ }
+
+ rc = goya_mmu_init(hdev);
+ if (rc)
+ return rc;
+
+ goya_init_security(hdev);
+
+ goya_init_dma_qmans(hdev);
+
+ goya_init_mme_qmans(hdev);
+
+ goya_init_tpc_qmans(hdev);
+
+ goya_enable_timestamp(hdev);
+
+ /* MSI-X must be enabled before CPU queues are initialized */
+ rc = goya_enable_msix(hdev);
+ if (rc)
+ goto disable_queues;
+
+ /* Perform read from the device to flush all MSI-X configuration */
+ RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
+
+ return 0;
+
+disable_queues:
+ goya_disable_internal_queues(hdev);
+ goya_disable_external_queues(hdev);
+
+ return rc;
+}
+
+static void goya_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_reset)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 reset_timeout_ms, cpu_timeout_ms, status;
+
+ if (hdev->pldm) {
+ reset_timeout_ms = GOYA_PLDM_RESET_TIMEOUT_MSEC;
+ cpu_timeout_ms = GOYA_PLDM_RESET_WAIT_MSEC;
+ } else {
+ reset_timeout_ms = GOYA_RESET_TIMEOUT_MSEC;
+ cpu_timeout_ms = GOYA_CPU_RESET_WAIT_MSEC;
+ }
+
+ if (hard_reset) {
+ /* I don't know what is the state of the CPU so make sure it is
+ * stopped in any means necessary
+ */
+ WREG32(mmPSOC_GLOBAL_CONF_UBOOT_MAGIC, KMD_MSG_GOTO_WFE);
+ WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
+ GOYA_ASYNC_EVENT_ID_HALT_MACHINE);
+
+ msleep(cpu_timeout_ms);
+
+ goya_set_ddr_bar_base(hdev, DRAM_PHYS_BASE);
+ goya_disable_clk_rlx(hdev);
+ goya_set_pll_refclk(hdev);
+
+ WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG, RESET_ALL);
+ dev_dbg(hdev->dev,
+ "Issued HARD reset command, going to wait %dms\n",
+ reset_timeout_ms);
+ } else {
+ WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST_CFG, DMA_MME_TPC_RESET);
+ dev_dbg(hdev->dev,
+ "Issued SOFT reset command, going to wait %dms\n",
+ reset_timeout_ms);
+ }
+
+ /*
+ * After hard reset, we can't poll the BTM_FSM register because the PSOC
+ * itself is in reset. In either reset we need to wait until the reset
+ * is deasserted
+ */
+ msleep(reset_timeout_ms);
+
+ status = RREG32(mmPSOC_GLOBAL_CONF_BTM_FSM);
+ if (status & PSOC_GLOBAL_CONF_BTM_FSM_STATE_MASK)
+ dev_err(hdev->dev,
+ "Timeout while waiting for device to reset 0x%x\n",
+ status);
+
+ if (!hard_reset && goya) {
+ goya->hw_cap_initialized &= ~(HW_CAP_DMA | HW_CAP_MME |
+ HW_CAP_GOLDEN | HW_CAP_TPC);
+ WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
+ GOYA_ASYNC_EVENT_ID_SOFT_RESET);
+ return;
+ }
+
+ /* Chicken bit to re-initiate boot sequencer flow */
+ WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START,
+ 1 << PSOC_GLOBAL_CONF_BOOT_SEQ_RE_START_IND_SHIFT);
+ /* Move boot manager FSM to pre boot sequencer init state */
+ WREG32(mmPSOC_GLOBAL_CONF_SW_BTM_FSM,
+ 0xA << PSOC_GLOBAL_CONF_SW_BTM_FSM_CTRL_SHIFT);
+
+ if (goya) {
+ goya->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q |
+ HW_CAP_DDR_0 | HW_CAP_DDR_1 |
+ HW_CAP_DMA | HW_CAP_MME |
+ HW_CAP_MMU | HW_CAP_TPC_MBIST |
+ HW_CAP_GOLDEN | HW_CAP_TPC);
+
+ memset(goya->events_stat, 0, sizeof(goya->events_stat));
+ }
+}
+
+int goya_suspend(struct hl_device *hdev)
+{
+ int rc;
+
+ rc = hl_fw_send_pci_access_msg(hdev, CPUCP_PACKET_DISABLE_PCI_ACCESS, 0x0);
+ if (rc)
+ dev_err(hdev->dev, "Failed to disable PCI access from CPU\n");
+
+ return rc;
+}
+
+int goya_resume(struct hl_device *hdev)
+{
+ return goya_init_iatu(hdev);
+}
+
+static int goya_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size)
+{
+ int rc;
+
++ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
++ VM_DONTCOPY | VM_NORESERVE);
+
+ rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
+ (dma_addr - HOST_PHYS_BASE), size);
+ if (rc)
+ dev_err(hdev->dev, "dma_mmap_coherent error %d", rc);
+
+ return rc;
+}
+
+void goya_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi)
+{
+ u32 db_reg_offset, db_value;
+
+ switch (hw_queue_id) {
+ case GOYA_QUEUE_ID_DMA_0:
+ db_reg_offset = mmDMA_QM_0_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_DMA_1:
+ db_reg_offset = mmDMA_QM_1_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_DMA_2:
+ db_reg_offset = mmDMA_QM_2_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_DMA_3:
+ db_reg_offset = mmDMA_QM_3_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_DMA_4:
+ db_reg_offset = mmDMA_QM_4_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_CPU_PQ:
+ db_reg_offset = mmCPU_IF_PF_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_MME:
+ db_reg_offset = mmMME_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC0:
+ db_reg_offset = mmTPC0_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC1:
+ db_reg_offset = mmTPC1_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC2:
+ db_reg_offset = mmTPC2_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC3:
+ db_reg_offset = mmTPC3_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC4:
+ db_reg_offset = mmTPC4_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC5:
+ db_reg_offset = mmTPC5_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC6:
+ db_reg_offset = mmTPC6_QM_PQ_PI;
+ break;
+
+ case GOYA_QUEUE_ID_TPC7:
+ db_reg_offset = mmTPC7_QM_PQ_PI;
+ break;
+
+ default:
+ /* Should never get here */
+ dev_err(hdev->dev, "H/W queue %d is invalid. Can't set pi\n",
+ hw_queue_id);
+ return;
+ }
+
+ db_value = pi;
+
+ /* ring the doorbell */
+ WREG32(db_reg_offset, db_value);
+
+ if (hw_queue_id == GOYA_QUEUE_ID_CPU_PQ) {
+ /* make sure device CPU will read latest data from host */
+ mb();
+ WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
+ GOYA_ASYNC_EVENT_ID_PI_UPDATE);
+ }
+}
+
+void goya_pqe_write(struct hl_device *hdev, __le64 *pqe, struct hl_bd *bd)
+{
+ /* The QMANs are on the SRAM so need to copy to IO space */
+ memcpy_toio((void __iomem *) pqe, bd, sizeof(struct hl_bd));
+}
+
+static void *goya_dma_alloc_coherent(struct hl_device *hdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags)
+{
+ void *kernel_addr = dma_alloc_coherent(&hdev->pdev->dev, size,
+ dma_handle, flags);
+
+ /* Shift to the device's base physical address of host memory */
+ if (kernel_addr)
+ *dma_handle += HOST_PHYS_BASE;
+
+ return kernel_addr;
+}
+
+static void goya_dma_free_coherent(struct hl_device *hdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle)
+{
+ /* Cancel the device's base physical address of host memory */
+ dma_addr_t fixed_dma_handle = dma_handle - HOST_PHYS_BASE;
+
+ dma_free_coherent(&hdev->pdev->dev, size, cpu_addr, fixed_dma_handle);
+}
+
+int goya_scrub_device_mem(struct hl_device *hdev)
+{
+ return 0;
+}
+
+void *goya_get_int_queue_base(struct hl_device *hdev, u32 queue_id,
+ dma_addr_t *dma_handle, u16 *queue_len)
+{
+ void *base;
+ u32 offset;
+
+ *dma_handle = hdev->asic_prop.sram_base_address;
+
+ base = (__force void *) hdev->pcie_bar[SRAM_CFG_BAR_ID];
+
+ switch (queue_id) {
+ case GOYA_QUEUE_ID_MME:
+ offset = MME_QMAN_BASE_OFFSET;
+ *queue_len = MME_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC0:
+ offset = TPC0_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC1:
+ offset = TPC1_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC2:
+ offset = TPC2_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC3:
+ offset = TPC3_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC4:
+ offset = TPC4_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC5:
+ offset = TPC5_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC6:
+ offset = TPC6_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ case GOYA_QUEUE_ID_TPC7:
+ offset = TPC7_QMAN_BASE_OFFSET;
+ *queue_len = TPC_QMAN_LENGTH;
+ break;
+ default:
+ dev_err(hdev->dev, "Got invalid queue id %d\n", queue_id);
+ return NULL;
+ }
+
+ base += offset;
+ *dma_handle += offset;
+
+ return base;
+}
+
+static int goya_send_job_on_qman0(struct hl_device *hdev, struct hl_cs_job *job)
+{
+ struct packet_msg_prot *fence_pkt;
+ u32 *fence_ptr;
+ dma_addr_t fence_dma_addr;
+ struct hl_cb *cb;
+ u32 tmp, timeout;
+ int rc;
+
+ if (hdev->pldm)
+ timeout = GOYA_PLDM_QMAN0_TIMEOUT_USEC;
+ else
+ timeout = HL_DEVICE_TIMEOUT_USEC;
+
+ if (!hdev->asic_funcs->is_device_idle(hdev, NULL, 0, NULL)) {
+ dev_err_ratelimited(hdev->dev,
+ "Can't send driver job on QMAN0 because the device is not idle\n");
+ return -EBUSY;
+ }
+
+ fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
+ if (!fence_ptr) {
+ dev_err(hdev->dev,
+ "Failed to allocate fence memory for QMAN0\n");
+ return -ENOMEM;
+ }
+
+ goya_qman0_set_security(hdev, true);
+
+ cb = job->patched_cb;
+
+ fence_pkt = cb->kernel_address +
+ job->job_cb_size - sizeof(struct packet_msg_prot);
+
+ tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+ (1 << GOYA_PKT_CTL_EB_SHIFT) |
+ (1 << GOYA_PKT_CTL_MB_SHIFT);
+ fence_pkt->ctl = cpu_to_le32(tmp);
+ fence_pkt->value = cpu_to_le32(GOYA_QMAN0_FENCE_VAL);
+ fence_pkt->addr = cpu_to_le64(fence_dma_addr);
+
+ rc = hl_hw_queue_send_cb_no_cmpl(hdev, GOYA_QUEUE_ID_DMA_0,
+ job->job_cb_size, cb->bus_address);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to send CB on QMAN0, %d\n", rc);
+ goto free_fence_ptr;
+ }
+
+ rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp,
+ (tmp == GOYA_QMAN0_FENCE_VAL), 1000,
+ timeout, true);
+
+ hl_hw_queue_inc_ci_kernel(hdev, GOYA_QUEUE_ID_DMA_0);
+
+ if (rc == -ETIMEDOUT) {
+ dev_err(hdev->dev, "QMAN0 Job timeout (0x%x)\n", tmp);
+ goto free_fence_ptr;
+ }
+
+free_fence_ptr:
+ hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
+
+ goya_qman0_set_security(hdev, false);
+
+ return rc;
+}
+
+int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
+ u32 timeout, u64 *result)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q)) {
+ if (result)
+ *result = 0;
+ return 0;
+ }
+
+ if (!timeout)
+ timeout = GOYA_MSG_TO_CPU_TIMEOUT_USEC;
+
+ return hl_fw_send_cpu_message(hdev, GOYA_QUEUE_ID_CPU_PQ, msg, len,
+ timeout, result);
+}
+
+int goya_test_queue(struct hl_device *hdev, u32 hw_queue_id)
+{
+ struct packet_msg_prot *fence_pkt;
+ dma_addr_t pkt_dma_addr;
+ u32 fence_val, tmp;
+ dma_addr_t fence_dma_addr;
+ u32 *fence_ptr;
+ int rc;
+
+ fence_val = GOYA_QMAN0_FENCE_VAL;
+
+ fence_ptr = hl_asic_dma_pool_zalloc(hdev, 4, GFP_KERNEL, &fence_dma_addr);
+ if (!fence_ptr) {
+ dev_err(hdev->dev,
+ "Failed to allocate memory for H/W queue %d testing\n",
+ hw_queue_id);
+ return -ENOMEM;
+ }
+
+ *fence_ptr = 0;
+
+ fence_pkt = hl_asic_dma_pool_zalloc(hdev, sizeof(struct packet_msg_prot), GFP_KERNEL,
+ &pkt_dma_addr);
+ if (!fence_pkt) {
+ dev_err(hdev->dev,
+ "Failed to allocate packet for H/W queue %d testing\n",
+ hw_queue_id);
+ rc = -ENOMEM;
+ goto free_fence_ptr;
+ }
+
+ tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+ (1 << GOYA_PKT_CTL_EB_SHIFT) |
+ (1 << GOYA_PKT_CTL_MB_SHIFT);
+ fence_pkt->ctl = cpu_to_le32(tmp);
+ fence_pkt->value = cpu_to_le32(fence_val);
+ fence_pkt->addr = cpu_to_le64(fence_dma_addr);
+
+ rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id,
+ sizeof(struct packet_msg_prot),
+ pkt_dma_addr);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to send fence packet to H/W queue %d\n",
+ hw_queue_id);
+ goto free_pkt;
+ }
+
+ rc = hl_poll_timeout_memory(hdev, fence_ptr, tmp, (tmp == fence_val),
+ 1000, GOYA_TEST_QUEUE_WAIT_USEC, true);
+
+ hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
+
+ if (rc == -ETIMEDOUT) {
+ dev_err(hdev->dev,
+ "H/W queue %d test failed (scratch(0x%08llX) == 0x%08X)\n",
+ hw_queue_id, (unsigned long long) fence_dma_addr, tmp);
+ rc = -EIO;
+ }
+
+free_pkt:
+ hl_asic_dma_pool_free(hdev, (void *) fence_pkt, pkt_dma_addr);
+free_fence_ptr:
+ hl_asic_dma_pool_free(hdev, (void *) fence_ptr, fence_dma_addr);
+ return rc;
+}
+
+int goya_test_cpu_queue(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ /*
+ * check capability here as send_cpu_message() won't update the result
+ * value if no capability
+ */
+ if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_test_cpu_queue(hdev);
+}
+
+int goya_test_queues(struct hl_device *hdev)
+{
+ int i, rc, ret_val = 0;
+
+ for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
+ rc = goya_test_queue(hdev, i);
+ if (rc)
+ ret_val = -EINVAL;
+ }
+
+ return ret_val;
+}
+
+static void *goya_dma_pool_zalloc(struct hl_device *hdev, size_t size,
+ gfp_t mem_flags, dma_addr_t *dma_handle)
+{
+ void *kernel_addr;
+
+ if (size > GOYA_DMA_POOL_BLK_SIZE)
+ return NULL;
+
+ kernel_addr = dma_pool_zalloc(hdev->dma_pool, mem_flags, dma_handle);
+
+ /* Shift to the device's base physical address of host memory */
+ if (kernel_addr)
+ *dma_handle += HOST_PHYS_BASE;
+
+ return kernel_addr;
+}
+
+static void goya_dma_pool_free(struct hl_device *hdev, void *vaddr,
+ dma_addr_t dma_addr)
+{
+ /* Cancel the device's base physical address of host memory */
+ dma_addr_t fixed_dma_addr = dma_addr - HOST_PHYS_BASE;
+
+ dma_pool_free(hdev->dma_pool, vaddr, fixed_dma_addr);
+}
+
+void *goya_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
+ dma_addr_t *dma_handle)
+{
+ void *vaddr;
+
+ vaddr = hl_fw_cpu_accessible_dma_pool_alloc(hdev, size, dma_handle);
+ *dma_handle = (*dma_handle) - hdev->cpu_accessible_dma_address +
+ VA_CPU_ACCESSIBLE_MEM_ADDR;
+
+ return vaddr;
+}
+
+void goya_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
+ void *vaddr)
+{
+ hl_fw_cpu_accessible_dma_pool_free(hdev, size, vaddr);
+}
+
+u32 goya_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt)
+{
+ struct scatterlist *sg, *sg_next_iter;
+ u32 count, dma_desc_cnt;
+ u64 len, len_next;
+ dma_addr_t addr, addr_next;
+
+ dma_desc_cnt = 0;
+
+ for_each_sgtable_dma_sg(sgt, sg, count) {
+ len = sg_dma_len(sg);
+ addr = sg_dma_address(sg);
+
+ if (len == 0)
+ break;
+
+ while ((count + 1) < sgt->nents) {
+ sg_next_iter = sg_next(sg);
+ len_next = sg_dma_len(sg_next_iter);
+ addr_next = sg_dma_address(sg_next_iter);
+
+ if (len_next == 0)
+ break;
+
+ if ((addr + len == addr_next) &&
+ (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
+ len += len_next;
+ count++;
+ sg = sg_next_iter;
+ } else {
+ break;
+ }
+ }
+
+ dma_desc_cnt++;
+ }
+
+ return dma_desc_cnt * sizeof(struct packet_lin_dma);
+}
+
+static int goya_pin_memory_before_cs(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt,
+ u64 addr, enum dma_data_direction dir)
+{
+ struct hl_userptr *userptr;
+ int rc;
+
+ if (hl_userptr_is_pinned(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
+ parser->job_userptr_list, &userptr))
+ goto already_pinned;
+
+ userptr = kzalloc(sizeof(*userptr), GFP_KERNEL);
+ if (!userptr)
+ return -ENOMEM;
+
+ rc = hl_pin_host_memory(hdev, addr, le32_to_cpu(user_dma_pkt->tsize),
+ userptr);
+ if (rc)
+ goto free_userptr;
+
+ list_add_tail(&userptr->job_node, parser->job_userptr_list);
+
+ rc = hdev->asic_funcs->asic_dma_map_sgtable(hdev, userptr->sgt, dir);
+ if (rc) {
+ dev_err(hdev->dev, "failed to map sgt with DMA region\n");
+ goto unpin_memory;
+ }
+
+ userptr->dma_mapped = true;
+ userptr->dir = dir;
+
+already_pinned:
+ parser->patched_cb_size +=
+ goya_get_dma_desc_list_size(hdev, userptr->sgt);
+
+ return 0;
+
+unpin_memory:
+ list_del(&userptr->job_node);
+ hl_unpin_host_memory(hdev, userptr);
+free_userptr:
+ kfree(userptr);
+ return rc;
+}
+
+static int goya_validate_dma_pkt_host(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt)
+{
+ u64 device_memory_addr, addr;
+ enum dma_data_direction dir;
+ enum hl_goya_dma_direction user_dir;
+ bool sram_addr = true;
+ bool skip_host_mem_pin = false;
+ bool user_memset;
+ u32 ctl;
+ int rc = 0;
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+
+ user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+ GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+
+ user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
+ GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
+
+ switch (user_dir) {
+ case HL_DMA_HOST_TO_DRAM:
+ dev_dbg(hdev->dev, "DMA direction is HOST --> DRAM\n");
+ dir = DMA_TO_DEVICE;
+ sram_addr = false;
+ addr = le64_to_cpu(user_dma_pkt->src_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ if (user_memset)
+ skip_host_mem_pin = true;
+ break;
+
+ case HL_DMA_DRAM_TO_HOST:
+ dev_dbg(hdev->dev, "DMA direction is DRAM --> HOST\n");
+ dir = DMA_FROM_DEVICE;
+ sram_addr = false;
+ addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+ break;
+
+ case HL_DMA_HOST_TO_SRAM:
+ dev_dbg(hdev->dev, "DMA direction is HOST --> SRAM\n");
+ dir = DMA_TO_DEVICE;
+ addr = le64_to_cpu(user_dma_pkt->src_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ if (user_memset)
+ skip_host_mem_pin = true;
+ break;
+
+ case HL_DMA_SRAM_TO_HOST:
+ dev_dbg(hdev->dev, "DMA direction is SRAM --> HOST\n");
+ dir = DMA_FROM_DEVICE;
+ addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+ break;
+ default:
+ dev_err(hdev->dev, "DMA direction %d is unsupported/undefined\n", user_dir);
+ return -EFAULT;
+ }
+
+ if (sram_addr) {
+ if (!hl_mem_area_inside_range(device_memory_addr,
+ le32_to_cpu(user_dma_pkt->tsize),
+ hdev->asic_prop.sram_user_base_address,
+ hdev->asic_prop.sram_end_address)) {
+
+ dev_err(hdev->dev,
+ "SRAM address 0x%llx + 0x%x is invalid\n",
+ device_memory_addr,
+ user_dma_pkt->tsize);
+ return -EFAULT;
+ }
+ } else {
+ if (!hl_mem_area_inside_range(device_memory_addr,
+ le32_to_cpu(user_dma_pkt->tsize),
+ hdev->asic_prop.dram_user_base_address,
+ hdev->asic_prop.dram_end_address)) {
+
+ dev_err(hdev->dev,
+ "DRAM address 0x%llx + 0x%x is invalid\n",
+ device_memory_addr,
+ user_dma_pkt->tsize);
+ return -EFAULT;
+ }
+ }
+
+ if (skip_host_mem_pin)
+ parser->patched_cb_size += sizeof(*user_dma_pkt);
+ else {
+ if ((dir == DMA_TO_DEVICE) &&
+ (parser->hw_queue_id > GOYA_QUEUE_ID_DMA_1)) {
+ dev_err(hdev->dev,
+ "Can't DMA from host on queue other then 1\n");
+ return -EFAULT;
+ }
+
+ rc = goya_pin_memory_before_cs(hdev, parser, user_dma_pkt,
+ addr, dir);
+ }
+
+ return rc;
+}
+
+static int goya_validate_dma_pkt_no_host(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt)
+{
+ u64 sram_memory_addr, dram_memory_addr;
+ enum hl_goya_dma_direction user_dir;
+ u32 ctl;
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+ user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+ GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+
+ if (user_dir == HL_DMA_DRAM_TO_SRAM) {
+ dev_dbg(hdev->dev, "DMA direction is DRAM --> SRAM\n");
+ dram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+ sram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ } else {
+ dev_dbg(hdev->dev, "DMA direction is SRAM --> DRAM\n");
+ sram_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+ dram_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ }
+
+ if (!hl_mem_area_inside_range(sram_memory_addr,
+ le32_to_cpu(user_dma_pkt->tsize),
+ hdev->asic_prop.sram_user_base_address,
+ hdev->asic_prop.sram_end_address)) {
+ dev_err(hdev->dev, "SRAM address 0x%llx + 0x%x is invalid\n",
+ sram_memory_addr, user_dma_pkt->tsize);
+ return -EFAULT;
+ }
+
+ if (!hl_mem_area_inside_range(dram_memory_addr,
+ le32_to_cpu(user_dma_pkt->tsize),
+ hdev->asic_prop.dram_user_base_address,
+ hdev->asic_prop.dram_end_address)) {
+ dev_err(hdev->dev, "DRAM address 0x%llx + 0x%x is invalid\n",
+ dram_memory_addr, user_dma_pkt->tsize);
+ return -EFAULT;
+ }
+
+ parser->patched_cb_size += sizeof(*user_dma_pkt);
+
+ return 0;
+}
+
+static int goya_validate_dma_pkt_no_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt)
+{
+ enum hl_goya_dma_direction user_dir;
+ u32 ctl;
+ int rc;
+
+ dev_dbg(hdev->dev, "DMA packet details:\n");
+ dev_dbg(hdev->dev, "source == 0x%llx\n",
+ le64_to_cpu(user_dma_pkt->src_addr));
+ dev_dbg(hdev->dev, "destination == 0x%llx\n",
+ le64_to_cpu(user_dma_pkt->dst_addr));
+ dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+ user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+ GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+
+ /*
+ * Special handling for DMA with size 0. The H/W has a bug where
+ * this can cause the QMAN DMA to get stuck, so block it here.
+ */
+ if (user_dma_pkt->tsize == 0) {
+ dev_err(hdev->dev,
+ "Got DMA with size 0, might reset the device\n");
+ return -EINVAL;
+ }
+
+ if ((user_dir == HL_DMA_DRAM_TO_SRAM) || (user_dir == HL_DMA_SRAM_TO_DRAM))
+ rc = goya_validate_dma_pkt_no_host(hdev, parser, user_dma_pkt);
+ else
+ rc = goya_validate_dma_pkt_host(hdev, parser, user_dma_pkt);
+
+ return rc;
+}
+
+static int goya_validate_dma_pkt_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt)
+{
+ dev_dbg(hdev->dev, "DMA packet details:\n");
+ dev_dbg(hdev->dev, "source == 0x%llx\n",
+ le64_to_cpu(user_dma_pkt->src_addr));
+ dev_dbg(hdev->dev, "destination == 0x%llx\n",
+ le64_to_cpu(user_dma_pkt->dst_addr));
+ dev_dbg(hdev->dev, "size == %u\n", le32_to_cpu(user_dma_pkt->tsize));
+
+ /*
+ * WA for HW-23.
+ * We can't allow user to read from Host using QMANs other than 1.
+ * PMMU and HPMMU addresses are equal, check only one of them.
+ */
+ if (parser->hw_queue_id != GOYA_QUEUE_ID_DMA_1 &&
+ hl_mem_area_inside_range(le64_to_cpu(user_dma_pkt->src_addr),
+ le32_to_cpu(user_dma_pkt->tsize),
+ hdev->asic_prop.pmmu.start_addr,
+ hdev->asic_prop.pmmu.end_addr)) {
+ dev_err(hdev->dev,
+ "Can't DMA from host on queue other then 1\n");
+ return -EFAULT;
+ }
+
+ if (user_dma_pkt->tsize == 0) {
+ dev_err(hdev->dev,
+ "Got DMA with size 0, might reset the device\n");
+ return -EINVAL;
+ }
+
+ parser->patched_cb_size += sizeof(*user_dma_pkt);
+
+ return 0;
+}
+
+static int goya_validate_wreg32(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_wreg32 *wreg_pkt)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 sob_start_addr, sob_end_addr;
+ u16 reg_offset;
+
+ reg_offset = le32_to_cpu(wreg_pkt->ctl) &
+ GOYA_PKT_WREG32_CTL_REG_OFFSET_MASK;
+
+ dev_dbg(hdev->dev, "WREG32 packet details:\n");
+ dev_dbg(hdev->dev, "reg_offset == 0x%x\n", reg_offset);
+ dev_dbg(hdev->dev, "value == 0x%x\n",
+ le32_to_cpu(wreg_pkt->value));
+
+ if (reg_offset != (mmDMA_CH_0_WR_COMP_ADDR_LO & 0x1FFF)) {
+ dev_err(hdev->dev, "WREG32 packet with illegal address 0x%x\n",
+ reg_offset);
+ return -EPERM;
+ }
+
+ /*
+ * With MMU, DMA channels are not secured, so it doesn't matter where
+ * the WR COMP will be written to because it will go out with
+ * non-secured property
+ */
+ if (goya->hw_cap_initialized & HW_CAP_MMU)
+ return 0;
+
+ sob_start_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_0);
+ sob_end_addr = lower_32_bits(CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1023);
+
+ if ((le32_to_cpu(wreg_pkt->value) < sob_start_addr) ||
+ (le32_to_cpu(wreg_pkt->value) > sob_end_addr)) {
+
+ dev_err(hdev->dev, "WREG32 packet with illegal value 0x%x\n",
+ wreg_pkt->value);
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+static int goya_validate_cb(struct hl_device *hdev,
+ struct hl_cs_parser *parser, bool is_mmu)
+{
+ u32 cb_parsed_length = 0;
+ int rc = 0;
+
+ parser->patched_cb_size = 0;
+
+ /* cb_user_size is more than 0 so loop will always be executed */
+ while (cb_parsed_length < parser->user_cb_size) {
+ enum packet_id pkt_id;
+ u16 pkt_size;
+ struct goya_packet *user_pkt;
+
+ user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
+
+ pkt_id = (enum packet_id) (
+ (le64_to_cpu(user_pkt->header) &
+ PACKET_HEADER_PACKET_ID_MASK) >>
+ PACKET_HEADER_PACKET_ID_SHIFT);
+
+ if (!validate_packet_id(pkt_id)) {
+ dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ pkt_size = goya_packet_sizes[pkt_id];
+ cb_parsed_length += pkt_size;
+ if (cb_parsed_length > parser->user_cb_size) {
+ dev_err(hdev->dev,
+ "packet 0x%x is out of CB boundary\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ switch (pkt_id) {
+ case PACKET_WREG_32:
+ /*
+ * Although it is validated after copy in patch_cb(),
+ * need to validate here as well because patch_cb() is
+ * not called in MMU path while this function is called
+ */
+ rc = goya_validate_wreg32(hdev,
+ parser, (struct packet_wreg32 *) user_pkt);
+ parser->patched_cb_size += pkt_size;
+ break;
+
+ case PACKET_WREG_BULK:
+ dev_err(hdev->dev,
+ "User not allowed to use WREG_BULK\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_MSG_PROT:
+ dev_err(hdev->dev,
+ "User not allowed to use MSG_PROT\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_CP_DMA:
+ dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_STOP:
+ dev_err(hdev->dev, "User not allowed to use STOP\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_LIN_DMA:
+ if (is_mmu)
+ rc = goya_validate_dma_pkt_mmu(hdev, parser,
+ (struct packet_lin_dma *) user_pkt);
+ else
+ rc = goya_validate_dma_pkt_no_mmu(hdev, parser,
+ (struct packet_lin_dma *) user_pkt);
+ break;
+
+ case PACKET_MSG_LONG:
+ case PACKET_MSG_SHORT:
+ case PACKET_FENCE:
+ case PACKET_NOP:
+ parser->patched_cb_size += pkt_size;
+ break;
+
+ default:
+ dev_err(hdev->dev, "Invalid packet header 0x%x\n",
+ pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ if (rc)
+ break;
+ }
+
+ /*
+ * The new CB should have space at the end for two MSG_PROT packets:
+ * 1. A packet that will act as a completion packet
+ * 2. A packet that will generate MSI-X interrupt
+ */
+ parser->patched_cb_size += sizeof(struct packet_msg_prot) * 2;
+
+ return rc;
+}
+
+static int goya_patch_dma_packet(struct hl_device *hdev,
+ struct hl_cs_parser *parser,
+ struct packet_lin_dma *user_dma_pkt,
+ struct packet_lin_dma *new_dma_pkt,
+ u32 *new_dma_pkt_size)
+{
+ struct hl_userptr *userptr;
+ struct scatterlist *sg, *sg_next_iter;
+ u32 count, dma_desc_cnt;
+ u64 len, len_next;
+ dma_addr_t dma_addr, dma_addr_next;
+ enum hl_goya_dma_direction user_dir;
+ u64 device_memory_addr, addr;
+ enum dma_data_direction dir;
+ struct sg_table *sgt;
+ bool skip_host_mem_pin = false;
+ bool user_memset;
+ u32 user_rdcomp_mask, user_wrcomp_mask, ctl;
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+
+ user_dir = (ctl & GOYA_PKT_LIN_DMA_CTL_DMA_DIR_MASK) >>
+ GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+
+ user_memset = (ctl & GOYA_PKT_LIN_DMA_CTL_MEMSET_MASK) >>
+ GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT;
+
+ if ((user_dir == HL_DMA_DRAM_TO_SRAM) || (user_dir == HL_DMA_SRAM_TO_DRAM) ||
+ (user_dma_pkt->tsize == 0)) {
+ memcpy(new_dma_pkt, user_dma_pkt, sizeof(*new_dma_pkt));
+ *new_dma_pkt_size = sizeof(*new_dma_pkt);
+ return 0;
+ }
+
+ if ((user_dir == HL_DMA_HOST_TO_DRAM) || (user_dir == HL_DMA_HOST_TO_SRAM)) {
+ addr = le64_to_cpu(user_dma_pkt->src_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ dir = DMA_TO_DEVICE;
+ if (user_memset)
+ skip_host_mem_pin = true;
+ } else {
+ addr = le64_to_cpu(user_dma_pkt->dst_addr);
+ device_memory_addr = le64_to_cpu(user_dma_pkt->src_addr);
+ dir = DMA_FROM_DEVICE;
+ }
+
+ if ((!skip_host_mem_pin) &&
+ (hl_userptr_is_pinned(hdev, addr,
+ le32_to_cpu(user_dma_pkt->tsize),
+ parser->job_userptr_list, &userptr) == false)) {
+ dev_err(hdev->dev, "Userptr 0x%llx + 0x%x NOT mapped\n",
+ addr, user_dma_pkt->tsize);
+ return -EFAULT;
+ }
+
+ if ((user_memset) && (dir == DMA_TO_DEVICE)) {
+ memcpy(new_dma_pkt, user_dma_pkt, sizeof(*user_dma_pkt));
+ *new_dma_pkt_size = sizeof(*user_dma_pkt);
+ return 0;
+ }
+
+ user_rdcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK;
+
+ user_wrcomp_mask = ctl & GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK;
+
+ sgt = userptr->sgt;
+ dma_desc_cnt = 0;
+
+ for_each_sgtable_dma_sg(sgt, sg, count) {
+ len = sg_dma_len(sg);
+ dma_addr = sg_dma_address(sg);
+
+ if (len == 0)
+ break;
+
+ while ((count + 1) < sgt->nents) {
+ sg_next_iter = sg_next(sg);
+ len_next = sg_dma_len(sg_next_iter);
+ dma_addr_next = sg_dma_address(sg_next_iter);
+
+ if (len_next == 0)
+ break;
+
+ if ((dma_addr + len == dma_addr_next) &&
+ (len + len_next <= DMA_MAX_TRANSFER_SIZE)) {
+ len += len_next;
+ count++;
+ sg = sg_next_iter;
+ } else {
+ break;
+ }
+ }
+
+ ctl = le32_to_cpu(user_dma_pkt->ctl);
+ if (likely(dma_desc_cnt))
+ ctl &= ~GOYA_PKT_CTL_EB_MASK;
+ ctl &= ~(GOYA_PKT_LIN_DMA_CTL_RDCOMP_MASK |
+ GOYA_PKT_LIN_DMA_CTL_WRCOMP_MASK);
+ new_dma_pkt->ctl = cpu_to_le32(ctl);
+ new_dma_pkt->tsize = cpu_to_le32((u32) len);
+
+ if (dir == DMA_TO_DEVICE) {
+ new_dma_pkt->src_addr = cpu_to_le64(dma_addr);
+ new_dma_pkt->dst_addr = cpu_to_le64(device_memory_addr);
+ } else {
+ new_dma_pkt->src_addr = cpu_to_le64(device_memory_addr);
+ new_dma_pkt->dst_addr = cpu_to_le64(dma_addr);
+ }
+
+ if (!user_memset)
+ device_memory_addr += len;
+ dma_desc_cnt++;
+ new_dma_pkt++;
+ }
+
+ if (!dma_desc_cnt) {
+ dev_err(hdev->dev,
+ "Error of 0 SG entries when patching DMA packet\n");
+ return -EFAULT;
+ }
+
+ /* Fix the last dma packet - rdcomp/wrcomp must be as user set them */
+ new_dma_pkt--;
+ new_dma_pkt->ctl |= cpu_to_le32(user_rdcomp_mask | user_wrcomp_mask);
+
+ *new_dma_pkt_size = dma_desc_cnt * sizeof(struct packet_lin_dma);
+
+ return 0;
+}
+
+static int goya_patch_cb(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ u32 cb_parsed_length = 0;
+ u32 cb_patched_cur_length = 0;
+ int rc = 0;
+
+ /* cb_user_size is more than 0 so loop will always be executed */
+ while (cb_parsed_length < parser->user_cb_size) {
+ enum packet_id pkt_id;
+ u16 pkt_size;
+ u32 new_pkt_size = 0;
+ struct goya_packet *user_pkt, *kernel_pkt;
+
+ user_pkt = parser->user_cb->kernel_address + cb_parsed_length;
+ kernel_pkt = parser->patched_cb->kernel_address +
+ cb_patched_cur_length;
+
+ pkt_id = (enum packet_id) (
+ (le64_to_cpu(user_pkt->header) &
+ PACKET_HEADER_PACKET_ID_MASK) >>
+ PACKET_HEADER_PACKET_ID_SHIFT);
+
+ if (!validate_packet_id(pkt_id)) {
+ dev_err(hdev->dev, "Invalid packet id %u\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ pkt_size = goya_packet_sizes[pkt_id];
+ cb_parsed_length += pkt_size;
+ if (cb_parsed_length > parser->user_cb_size) {
+ dev_err(hdev->dev,
+ "packet 0x%x is out of CB boundary\n", pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ switch (pkt_id) {
+ case PACKET_LIN_DMA:
+ rc = goya_patch_dma_packet(hdev, parser,
+ (struct packet_lin_dma *) user_pkt,
+ (struct packet_lin_dma *) kernel_pkt,
+ &new_pkt_size);
+ cb_patched_cur_length += new_pkt_size;
+ break;
+
+ case PACKET_WREG_32:
+ memcpy(kernel_pkt, user_pkt, pkt_size);
+ cb_patched_cur_length += pkt_size;
+ rc = goya_validate_wreg32(hdev, parser,
+ (struct packet_wreg32 *) kernel_pkt);
+ break;
+
+ case PACKET_WREG_BULK:
+ dev_err(hdev->dev,
+ "User not allowed to use WREG_BULK\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_MSG_PROT:
+ dev_err(hdev->dev,
+ "User not allowed to use MSG_PROT\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_CP_DMA:
+ dev_err(hdev->dev, "User not allowed to use CP_DMA\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_STOP:
+ dev_err(hdev->dev, "User not allowed to use STOP\n");
+ rc = -EPERM;
+ break;
+
+ case PACKET_MSG_LONG:
+ case PACKET_MSG_SHORT:
+ case PACKET_FENCE:
+ case PACKET_NOP:
+ memcpy(kernel_pkt, user_pkt, pkt_size);
+ cb_patched_cur_length += pkt_size;
+ break;
+
+ default:
+ dev_err(hdev->dev, "Invalid packet header 0x%x\n",
+ pkt_id);
+ rc = -EINVAL;
+ break;
+ }
+
+ if (rc)
+ break;
+ }
+
+ return rc;
+}
+
+static int goya_parse_cb_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ u64 handle;
+ u32 patched_cb_size;
+ struct hl_cb *user_cb;
+ int rc;
+
+ /*
+ * The new CB should have space at the end for two MSG_PROT pkt:
+ * 1. A packet that will act as a completion packet
+ * 2. A packet that will generate MSI-X interrupt
+ */
+ parser->patched_cb_size = parser->user_cb_size +
+ sizeof(struct packet_msg_prot) * 2;
+
+ rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
+ parser->patched_cb_size, false, false,
+ &handle);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to allocate patched CB for DMA CS %d\n",
+ rc);
+ return rc;
+ }
+
+ parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
+ /* hl_cb_get should never fail here */
+ if (!parser->patched_cb) {
+ dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
+ rc = -EFAULT;
+ goto out;
+ }
+
+ /*
+ * The check that parser->user_cb_size <= parser->user_cb->size was done
+ * in validate_queue_index().
+ */
+ memcpy(parser->patched_cb->kernel_address,
+ parser->user_cb->kernel_address,
+ parser->user_cb_size);
+
+ patched_cb_size = parser->patched_cb_size;
+
+ /* validate patched CB instead of user CB */
+ user_cb = parser->user_cb;
+ parser->user_cb = parser->patched_cb;
+ rc = goya_validate_cb(hdev, parser, true);
+ parser->user_cb = user_cb;
+
+ if (rc) {
+ hl_cb_put(parser->patched_cb);
+ goto out;
+ }
+
+ if (patched_cb_size != parser->patched_cb_size) {
+ dev_err(hdev->dev, "user CB size mismatch\n");
+ hl_cb_put(parser->patched_cb);
+ rc = -EINVAL;
+ goto out;
+ }
+
+out:
+ /*
+ * Always call cb destroy here because we still have 1 reference
+ * to it by calling cb_get earlier. After the job will be completed,
+ * cb_put will release it, but here we want to remove it from the
+ * idr
+ */
+ hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
+
+ return rc;
+}
+
+static int goya_parse_cb_no_mmu(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ u64 handle;
+ int rc;
+
+ rc = goya_validate_cb(hdev, parser, false);
+
+ if (rc)
+ goto free_userptr;
+
+ rc = hl_cb_create(hdev, &hdev->kernel_mem_mgr, hdev->kernel_ctx,
+ parser->patched_cb_size, false, false,
+ &handle);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Failed to allocate patched CB for DMA CS %d\n", rc);
+ goto free_userptr;
+ }
+
+ parser->patched_cb = hl_cb_get(&hdev->kernel_mem_mgr, handle);
+ /* hl_cb_get should never fail here */
+ if (!parser->patched_cb) {
+ dev_crit(hdev->dev, "DMA CB handle invalid 0x%llx\n", handle);
+ rc = -EFAULT;
+ goto out;
+ }
+
+ rc = goya_patch_cb(hdev, parser);
+
+ if (rc)
+ hl_cb_put(parser->patched_cb);
+
+out:
+ /*
+ * Always call cb destroy here because we still have 1 reference
+ * to it by calling cb_get earlier. After the job will be completed,
+ * cb_put will release it, but here we want to remove it from the
+ * idr
+ */
+ hl_cb_destroy(&hdev->kernel_mem_mgr, handle);
+
+free_userptr:
+ if (rc)
+ hl_userptr_delete_list(hdev, parser->job_userptr_list);
+ return rc;
+}
+
+static int goya_parse_cb_no_ext_queue(struct hl_device *hdev,
+ struct hl_cs_parser *parser)
+{
+ struct asic_fixed_properties *asic_prop = &hdev->asic_prop;
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (goya->hw_cap_initialized & HW_CAP_MMU)
+ return 0;
+
+ /* For internal queue jobs, just check if CB address is valid */
+ if (hl_mem_area_inside_range(
+ (u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->sram_user_base_address,
+ asic_prop->sram_end_address))
+ return 0;
+
+ if (hl_mem_area_inside_range(
+ (u64) (uintptr_t) parser->user_cb,
+ parser->user_cb_size,
+ asic_prop->dram_user_base_address,
+ asic_prop->dram_end_address))
+ return 0;
+
+ dev_err(hdev->dev,
+ "Internal CB address 0x%px + 0x%x is not in SRAM nor in DRAM\n",
+ parser->user_cb, parser->user_cb_size);
+
+ return -EFAULT;
+}
+
+int goya_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (parser->queue_type == QUEUE_TYPE_INT)
+ return goya_parse_cb_no_ext_queue(hdev, parser);
+
+ if (goya->hw_cap_initialized & HW_CAP_MMU)
+ return goya_parse_cb_mmu(hdev, parser);
+ else
+ return goya_parse_cb_no_mmu(hdev, parser);
+}
+
+void goya_add_end_of_cb_packets(struct hl_device *hdev, void *kernel_address,
+ u32 len, u32 original_len, u64 cq_addr, u32 cq_val,
+ u32 msix_vec, bool eb)
+{
+ struct packet_msg_prot *cq_pkt;
+ u32 tmp;
+
+ cq_pkt = kernel_address + len - (sizeof(struct packet_msg_prot) * 2);
+
+ tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+ (1 << GOYA_PKT_CTL_EB_SHIFT) |
+ (1 << GOYA_PKT_CTL_MB_SHIFT);
+ cq_pkt->ctl = cpu_to_le32(tmp);
+ cq_pkt->value = cpu_to_le32(cq_val);
+ cq_pkt->addr = cpu_to_le64(cq_addr);
+
+ cq_pkt++;
+
+ tmp = (PACKET_MSG_PROT << GOYA_PKT_CTL_OPCODE_SHIFT) |
+ (1 << GOYA_PKT_CTL_MB_SHIFT);
+ cq_pkt->ctl = cpu_to_le32(tmp);
+ cq_pkt->value = cpu_to_le32(msix_vec & 0x7FF);
+ cq_pkt->addr = cpu_to_le64(CFG_BASE + mmPCIE_DBI_MSIX_DOORBELL_OFF);
+}
+
+void goya_update_eq_ci(struct hl_device *hdev, u32 val)
+{
+ WREG32(mmCPU_EQ_CI, val);
+}
+
+void goya_restore_phase_topology(struct hl_device *hdev)
+{
+
+}
+
+static void goya_clear_sm_regs(struct hl_device *hdev)
+{
+ int i, num_of_sob_in_longs, num_of_mon_in_longs;
+
+ num_of_sob_in_longs =
+ ((mmSYNC_MNGR_SOB_OBJ_1023 - mmSYNC_MNGR_SOB_OBJ_0) + 4);
+
+ num_of_mon_in_longs =
+ ((mmSYNC_MNGR_MON_STATUS_255 - mmSYNC_MNGR_MON_STATUS_0) + 4);
+
+ for (i = 0 ; i < num_of_sob_in_longs ; i += 4)
+ WREG32(mmSYNC_MNGR_SOB_OBJ_0 + i, 0);
+
+ for (i = 0 ; i < num_of_mon_in_longs ; i += 4)
+ WREG32(mmSYNC_MNGR_MON_STATUS_0 + i, 0);
+
+ /* Flush all WREG to prevent race */
+ i = RREG32(mmSYNC_MNGR_SOB_OBJ_0);
+}
+
+static int goya_debugfs_read_dma(struct hl_device *hdev, u64 addr, u32 size, void *blob_addr)
+{
+ dev_err(hdev->dev, "Reading via DMA is unimplemented yet\n");
+ return -EPERM;
+}
+
+static u64 goya_read_pte(struct hl_device *hdev, u64 addr)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (hdev->reset_info.hard_reset_pending)
+ return U64_MAX;
+
+ return readq(hdev->pcie_bar[DDR_BAR_ID] +
+ (addr - goya->ddr_bar_cur_addr));
+}
+
+static void goya_write_pte(struct hl_device *hdev, u64 addr, u64 val)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (hdev->reset_info.hard_reset_pending)
+ return;
+
+ writeq(val, hdev->pcie_bar[DDR_BAR_ID] +
+ (addr - goya->ddr_bar_cur_addr));
+}
+
+static const char *_goya_get_event_desc(u16 event_type)
+{
+ switch (event_type) {
+ case GOYA_ASYNC_EVENT_ID_PCIE_IF:
+ return "PCIe_if";
+ case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
+ return "TPC%d_ecc";
+ case GOYA_ASYNC_EVENT_ID_MME_ECC:
+ return "MME_ecc";
+ case GOYA_ASYNC_EVENT_ID_MME_ECC_EXT:
+ return "MME_ecc_ext";
+ case GOYA_ASYNC_EVENT_ID_MMU_ECC:
+ return "MMU_ecc";
+ case GOYA_ASYNC_EVENT_ID_DMA_MACRO:
+ return "DMA_macro";
+ case GOYA_ASYNC_EVENT_ID_DMA_ECC:
+ return "DMA_ecc";
+ case GOYA_ASYNC_EVENT_ID_CPU_IF_ECC:
+ return "CPU_if_ecc";
+ case GOYA_ASYNC_EVENT_ID_PSOC_MEM:
+ return "PSOC_mem";
+ case GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT:
+ return "PSOC_coresight";
+ case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
+ return "SRAM%d";
+ case GOYA_ASYNC_EVENT_ID_GIC500:
+ return "GIC500";
+ case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
+ return "PLL%d";
+ case GOYA_ASYNC_EVENT_ID_AXI_ECC:
+ return "AXI_ecc";
+ case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
+ return "L2_ram_ecc";
+ case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
+ return "PSOC_gpio_05_sw_reset";
+ case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT:
+ return "PSOC_gpio_10_vrhot_icrit";
+ case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
+ return "PCIe_dec";
+ case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
+ return "TPC%d_dec";
+ case GOYA_ASYNC_EVENT_ID_MME_WACS:
+ return "MME_wacs";
+ case GOYA_ASYNC_EVENT_ID_MME_WACSD:
+ return "MME_wacsd";
+ case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
+ return "CPU_axi_splitter";
+ case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
+ return "PSOC_axi_dec";
+ case GOYA_ASYNC_EVENT_ID_PSOC:
+ return "PSOC";
+ case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
+ return "TPC%d_krn_err";
+ case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
+ return "TPC%d_cq";
+ case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
+ return "TPC%d_qm";
+ case GOYA_ASYNC_EVENT_ID_MME_QM:
+ return "MME_qm";
+ case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
+ return "MME_cq";
+ case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
+ return "DMA%d_qm";
+ case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
+ return "DMA%d_ch";
+ case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
+ return "TPC%d_bmon_spmu";
+ case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
+ return "DMA_bm_ch%d";
+ case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
+ return "POWER_ENV_S";
+ case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
+ return "POWER_ENV_E";
+ case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
+ return "THERMAL_ENV_S";
+ case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
+ return "THERMAL_ENV_E";
+ case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
+ return "QUEUE_OUT_OF_SYNC";
+ default:
+ return "N/A";
+ }
+}
+
+static void goya_get_event_desc(u16 event_type, char *desc, size_t size)
+{
+ u8 index;
+
+ switch (event_type) {
+ case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
+ index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_ECC) / 3;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
+ index = event_type - GOYA_ASYNC_EVENT_ID_SRAM0;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
+ index = event_type - GOYA_ASYNC_EVENT_ID_PLL0;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
+ index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_DEC) / 3;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
+ index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR) / 10;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_CMDQ:
+ index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_CMDQ;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_TPC0_QM ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
+ index = event_type - GOYA_ASYNC_EVENT_ID_TPC0_QM;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
+ index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_QM;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
+ index = event_type - GOYA_ASYNC_EVENT_ID_DMA0_CH;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
+ index = (event_type - GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU) / 10;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
+ index = event_type - GOYA_ASYNC_EVENT_ID_DMA_BM_CH0;
+ snprintf(desc, size, _goya_get_event_desc(event_type), index);
+ break;
+ case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
+ snprintf(desc, size, _goya_get_event_desc(event_type));
+ break;
+ default:
+ snprintf(desc, size, _goya_get_event_desc(event_type));
+ break;
+ }
+}
+
+static void goya_print_razwi_info(struct hl_device *hdev)
+{
+ if (RREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD)) {
+ dev_err_ratelimited(hdev->dev, "Illegal write to LBW\n");
+ WREG32(mmDMA_MACRO_RAZWI_LBW_WT_VLD, 0);
+ }
+
+ if (RREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD)) {
+ dev_err_ratelimited(hdev->dev, "Illegal read from LBW\n");
+ WREG32(mmDMA_MACRO_RAZWI_LBW_RD_VLD, 0);
+ }
+
+ if (RREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD)) {
+ dev_err_ratelimited(hdev->dev, "Illegal write to HBW\n");
+ WREG32(mmDMA_MACRO_RAZWI_HBW_WT_VLD, 0);
+ }
+
+ if (RREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD)) {
+ dev_err_ratelimited(hdev->dev, "Illegal read from HBW\n");
+ WREG32(mmDMA_MACRO_RAZWI_HBW_RD_VLD, 0);
+ }
+}
+
+static void goya_print_mmu_error_info(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u64 addr;
+ u32 val;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ val = RREG32(mmMMU_PAGE_ERROR_CAPTURE);
+ if (val & MMU_PAGE_ERROR_CAPTURE_ENTRY_VALID_MASK) {
+ addr = val & MMU_PAGE_ERROR_CAPTURE_VA_49_32_MASK;
+ addr <<= 32;
+ addr |= RREG32(mmMMU_PAGE_ERROR_CAPTURE_VA);
+
+ dev_err_ratelimited(hdev->dev, "MMU page fault on va 0x%llx\n",
+ addr);
+
+ WREG32(mmMMU_PAGE_ERROR_CAPTURE, 0);
+ }
+}
+
+static void goya_print_out_of_sync_info(struct hl_device *hdev,
+ struct cpucp_pkt_sync_err *sync_err)
+{
+ struct hl_hw_queue *q = &hdev->kernel_queues[GOYA_QUEUE_ID_CPU_PQ];
+
+ dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n",
+ le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci));
+}
+
+static void goya_print_irq_info(struct hl_device *hdev, u16 event_type,
+ bool razwi)
+{
+ char desc[20] = "";
+
+ goya_get_event_desc(event_type, desc, sizeof(desc));
+ dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
+ event_type, desc);
+
+ if (razwi) {
+ goya_print_razwi_info(hdev);
+ goya_print_mmu_error_info(hdev);
+ }
+}
+
+static int goya_unmask_irq_arr(struct hl_device *hdev, u32 *irq_arr,
+ size_t irq_arr_size)
+{
+ struct cpucp_unmask_irq_arr_packet *pkt;
+ size_t total_pkt_size;
+ u64 result;
+ int rc;
+ int irq_num_entries, irq_arr_index;
+ __le32 *goya_irq_arr;
+
+ total_pkt_size = sizeof(struct cpucp_unmask_irq_arr_packet) +
+ irq_arr_size;
+
+ /* data should be aligned to 8 bytes in order to CPU-CP to copy it */
+ total_pkt_size = (total_pkt_size + 0x7) & ~0x7;
+
+ /* total_pkt_size is casted to u16 later on */
+ if (total_pkt_size > USHRT_MAX) {
+ dev_err(hdev->dev, "too many elements in IRQ array\n");
+ return -EINVAL;
+ }
+
+ pkt = kzalloc(total_pkt_size, GFP_KERNEL);
+ if (!pkt)
+ return -ENOMEM;
+
+ irq_num_entries = irq_arr_size / sizeof(irq_arr[0]);
+ pkt->length = cpu_to_le32(irq_num_entries);
+
+ /* We must perform any necessary endianness conversation on the irq
+ * array being passed to the goya hardware
+ */
+ for (irq_arr_index = 0, goya_irq_arr = (__le32 *) &pkt->irqs;
+ irq_arr_index < irq_num_entries ; irq_arr_index++)
+ goya_irq_arr[irq_arr_index] =
+ cpu_to_le32(irq_arr[irq_arr_index]);
+
+ pkt->cpucp_pkt.ctl = cpu_to_le32(CPUCP_PACKET_UNMASK_RAZWI_IRQ_ARRAY <<
+ CPUCP_PKT_CTL_OPCODE_SHIFT);
+
+ rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) pkt,
+ total_pkt_size, 0, &result);
+
+ if (rc)
+ dev_err(hdev->dev, "failed to unmask IRQ array\n");
+
+ kfree(pkt);
+
+ return rc;
+}
+
+static int goya_compute_reset_late_init(struct hl_device *hdev)
+{
+ /*
+ * Unmask all IRQs since some could have been received
+ * during the soft reset
+ */
+ return goya_unmask_irq_arr(hdev, goya_all_events,
+ sizeof(goya_all_events));
+}
+
+static int goya_unmask_irq(struct hl_device *hdev, u16 event_type)
+{
+ struct cpucp_packet pkt;
+ u64 result;
+ int rc;
+
+ memset(&pkt, 0, sizeof(pkt));
+
+ pkt.ctl = cpu_to_le32(CPUCP_PACKET_UNMASK_RAZWI_IRQ <<
+ CPUCP_PKT_CTL_OPCODE_SHIFT);
+ pkt.value = cpu_to_le64(event_type);
+
+ rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
+ 0, &result);
+
+ if (rc)
+ dev_err(hdev->dev, "failed to unmask RAZWI IRQ %d", event_type);
+
+ return rc;
+}
+
+static void goya_print_clk_change_info(struct hl_device *hdev, u16 event_type)
+{
+ ktime_t zero_time = ktime_set(0, 0);
+
+ mutex_lock(&hdev->clk_throttling.lock);
+
+ switch (event_type) {
+ case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
+ hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get();
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time;
+ dev_info_ratelimited(hdev->dev,
+ "Clock throttling due to power consumption\n");
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
+ hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get();
+ dev_info_ratelimited(hdev->dev,
+ "Power envelop is safe, back to optimal clock\n");
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
+ hdev->clk_throttling.current_reason |= HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
+ dev_info_ratelimited(hdev->dev,
+ "Clock throttling due to overheating\n");
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
+ hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
+ hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
+ dev_info_ratelimited(hdev->dev,
+ "Thermal envelop is safe, back to optimal clock\n");
+ break;
+
+ default:
+ dev_err(hdev->dev, "Received invalid clock change event %d\n",
+ event_type);
+ break;
+ }
+
+ mutex_unlock(&hdev->clk_throttling.lock);
+}
+
+void goya_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
+{
+ u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
+ u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
+ >> EQ_CTL_EVENT_TYPE_SHIFT);
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (event_type >= GOYA_ASYNC_EVENT_ID_SIZE) {
+ dev_err(hdev->dev, "Event type %u exceeds maximum of %u",
+ event_type, GOYA_ASYNC_EVENT_ID_SIZE - 1);
+ return;
+ }
+
+ goya->events_stat[event_type]++;
+ goya->events_stat_aggregate[event_type]++;
+
+ switch (event_type) {
+ case GOYA_ASYNC_EVENT_ID_PCIE_IF:
+ case GOYA_ASYNC_EVENT_ID_TPC0_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC1_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC2_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC3_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC4_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC5_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC6_ECC:
+ case GOYA_ASYNC_EVENT_ID_TPC7_ECC:
+ case GOYA_ASYNC_EVENT_ID_MME_ECC:
+ case GOYA_ASYNC_EVENT_ID_MME_ECC_EXT:
+ case GOYA_ASYNC_EVENT_ID_MMU_ECC:
+ case GOYA_ASYNC_EVENT_ID_DMA_MACRO:
+ case GOYA_ASYNC_EVENT_ID_DMA_ECC:
+ case GOYA_ASYNC_EVENT_ID_CPU_IF_ECC:
+ case GOYA_ASYNC_EVENT_ID_PSOC_MEM:
+ case GOYA_ASYNC_EVENT_ID_PSOC_CORESIGHT:
+ case GOYA_ASYNC_EVENT_ID_SRAM0 ... GOYA_ASYNC_EVENT_ID_SRAM29:
+ case GOYA_ASYNC_EVENT_ID_GIC500:
+ case GOYA_ASYNC_EVENT_ID_PLL0 ... GOYA_ASYNC_EVENT_ID_PLL6:
+ case GOYA_ASYNC_EVENT_ID_AXI_ECC:
+ case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
+ goya_print_irq_info(hdev, event_type, false);
+ if (hdev->hard_reset_on_fw_events)
+ hl_device_reset(hdev, (HL_DRV_RESET_HARD |
+ HL_DRV_RESET_FW_FATAL_ERR));
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
+ goya_print_irq_info(hdev, event_type, false);
+ if (hdev->hard_reset_on_fw_events)
+ hl_device_reset(hdev, HL_DRV_RESET_HARD);
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC0_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC1_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC2_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC3_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC4_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC5_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC6_DEC:
+ case GOYA_ASYNC_EVENT_ID_TPC7_DEC:
+ case GOYA_ASYNC_EVENT_ID_MME_WACS:
+ case GOYA_ASYNC_EVENT_ID_MME_WACSD:
+ case GOYA_ASYNC_EVENT_ID_CPU_AXI_SPLITTER:
+ case GOYA_ASYNC_EVENT_ID_PSOC_AXI_DEC:
+ case GOYA_ASYNC_EVENT_ID_PSOC:
+ case GOYA_ASYNC_EVENT_ID_TPC0_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC1_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC2_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC3_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC4_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC5_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC6_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC7_KRN_ERR:
+ case GOYA_ASYNC_EVENT_ID_TPC0_CMDQ ... GOYA_ASYNC_EVENT_ID_TPC7_QM:
+ case GOYA_ASYNC_EVENT_ID_MME_QM:
+ case GOYA_ASYNC_EVENT_ID_MME_CMDQ:
+ case GOYA_ASYNC_EVENT_ID_DMA0_QM ... GOYA_ASYNC_EVENT_ID_DMA4_QM:
+ case GOYA_ASYNC_EVENT_ID_DMA0_CH ... GOYA_ASYNC_EVENT_ID_DMA4_CH:
+ goya_print_irq_info(hdev, event_type, true);
+ goya_unmask_irq(hdev, event_type);
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_10_VRHOT_ICRIT:
+ case GOYA_ASYNC_EVENT_ID_TPC0_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC1_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC2_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC3_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC4_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC5_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC6_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_TPC7_BMON_SPMU:
+ case GOYA_ASYNC_EVENT_ID_DMA_BM_CH0 ... GOYA_ASYNC_EVENT_ID_DMA_BM_CH4:
+ goya_print_irq_info(hdev, event_type, false);
+ goya_unmask_irq(hdev, event_type);
+ break;
+
+ case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_S:
+ case GOYA_ASYNC_EVENT_ID_FIX_POWER_ENV_E:
+ case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_S:
+ case GOYA_ASYNC_EVENT_ID_FIX_THERMAL_ENV_E:
+ goya_print_clk_change_info(hdev, event_type);
+ goya_unmask_irq(hdev, event_type);
+ break;
+
+ case GOYA_ASYNC_EVENT_PKT_QUEUE_OUT_SYNC:
+ goya_print_irq_info(hdev, event_type, false);
+ goya_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
+ if (hdev->hard_reset_on_fw_events)
+ hl_device_reset(hdev, HL_DRV_RESET_HARD);
+ else
+ hl_fw_unmask_irq(hdev, event_type);
+ break;
+
+ default:
+ dev_err(hdev->dev, "Received invalid H/W interrupt %d\n",
+ event_type);
+ break;
+ }
+}
+
+void *goya_get_events_stat(struct hl_device *hdev, bool aggregate, u32 *size)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (aggregate) {
+ *size = (u32) sizeof(goya->events_stat_aggregate);
+ return goya->events_stat_aggregate;
+ }
+
+ *size = (u32) sizeof(goya->events_stat);
+ return goya->events_stat;
+}
+
+static int goya_memset_device_memory(struct hl_device *hdev, u64 addr, u64 size,
+ u64 val, bool is_dram)
+{
+ struct packet_lin_dma *lin_dma_pkt;
+ struct hl_cs_job *job;
+ u32 cb_size, ctl;
+ struct hl_cb *cb;
+ int rc, lin_dma_pkts_cnt;
+
+ lin_dma_pkts_cnt = DIV_ROUND_UP_ULL(size, SZ_2G);
+ cb_size = lin_dma_pkts_cnt * sizeof(struct packet_lin_dma) +
+ sizeof(struct packet_msg_prot);
+ cb = hl_cb_kernel_create(hdev, cb_size, false);
+ if (!cb)
+ return -ENOMEM;
+
+ lin_dma_pkt = cb->kernel_address;
+
+ do {
+ memset(lin_dma_pkt, 0, sizeof(*lin_dma_pkt));
+
+ ctl = ((PACKET_LIN_DMA << GOYA_PKT_CTL_OPCODE_SHIFT) |
+ (1 << GOYA_PKT_LIN_DMA_CTL_MEMSET_SHIFT) |
+ (1 << GOYA_PKT_LIN_DMA_CTL_WO_SHIFT) |
+ (1 << GOYA_PKT_CTL_RB_SHIFT) |
+ (1 << GOYA_PKT_CTL_MB_SHIFT));
+ ctl |= (is_dram ? HL_DMA_HOST_TO_DRAM : HL_DMA_HOST_TO_SRAM) <<
+ GOYA_PKT_LIN_DMA_CTL_DMA_DIR_SHIFT;
+ lin_dma_pkt->ctl = cpu_to_le32(ctl);
+
+ lin_dma_pkt->src_addr = cpu_to_le64(val);
+ lin_dma_pkt->dst_addr = cpu_to_le64(addr);
+ if (lin_dma_pkts_cnt > 1)
+ lin_dma_pkt->tsize = cpu_to_le32(SZ_2G);
+ else
+ lin_dma_pkt->tsize = cpu_to_le32(size);
+
+ size -= SZ_2G;
+ addr += SZ_2G;
+ lin_dma_pkt++;
+ } while (--lin_dma_pkts_cnt);
+
+ job = hl_cs_allocate_job(hdev, QUEUE_TYPE_EXT, true);
+ if (!job) {
+ dev_err(hdev->dev, "Failed to allocate a new job\n");
+ rc = -ENOMEM;
+ goto release_cb;
+ }
+
+ job->id = 0;
+ job->user_cb = cb;
+ atomic_inc(&job->user_cb->cs_cnt);
+ job->user_cb_size = cb_size;
+ job->hw_queue_id = GOYA_QUEUE_ID_DMA_0;
+ job->patched_cb = job->user_cb;
+ job->job_cb_size = job->user_cb_size;
+
+ hl_debugfs_add_job(hdev, job);
+
+ rc = goya_send_job_on_qman0(hdev, job);
+
+ hl_debugfs_remove_job(hdev, job);
+ kfree(job);
+ atomic_dec(&cb->cs_cnt);
+
+release_cb:
+ hl_cb_put(cb);
+ hl_cb_destroy(&hdev->kernel_mem_mgr, cb->buf->handle);
+
+ return rc;
+}
+
+int goya_context_switch(struct hl_device *hdev, u32 asid)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 addr = prop->sram_base_address, sob_addr;
+ u32 size = hdev->pldm ? 0x10000 : prop->sram_size;
+ u64 val = 0x7777777777777777ull;
+ int rc, dma_id;
+ u32 channel_off = mmDMA_CH_1_WR_COMP_ADDR_LO -
+ mmDMA_CH_0_WR_COMP_ADDR_LO;
+
+ rc = goya_memset_device_memory(hdev, addr, size, val, false);
+ if (rc) {
+ dev_err(hdev->dev, "Failed to clear SRAM in context switch\n");
+ return rc;
+ }
+
+ /* we need to reset registers that the user is allowed to change */
+ sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1007;
+ WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO, lower_32_bits(sob_addr));
+
+ for (dma_id = 1 ; dma_id < NUMBER_OF_EXT_HW_QUEUES ; dma_id++) {
+ sob_addr = CFG_BASE + mmSYNC_MNGR_SOB_OBJ_1000 +
+ (dma_id - 1) * 4;
+ WREG32(mmDMA_CH_0_WR_COMP_ADDR_LO + channel_off * dma_id,
+ lower_32_bits(sob_addr));
+ }
+
+ WREG32(mmTPC_PLL_CLK_RLX_0, 0x200020);
+
+ goya_clear_sm_regs(hdev);
+
+ return 0;
+}
+
+static int goya_mmu_clear_pgt_range(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct goya_device *goya = hdev->asic_specific;
+ u64 addr = prop->mmu_pgt_addr;
+ u32 size = prop->mmu_pgt_size + MMU_DRAM_DEFAULT_PAGE_SIZE +
+ MMU_CACHE_MNG_SIZE;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return 0;
+
+ return goya_memset_device_memory(hdev, addr, size, 0, true);
+}
+
+static int goya_mmu_set_dram_default_page(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u64 addr = hdev->asic_prop.mmu_dram_default_page_addr;
+ u32 size = MMU_DRAM_DEFAULT_PAGE_SIZE;
+ u64 val = 0x9999999999999999ull;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return 0;
+
+ return goya_memset_device_memory(hdev, addr, size, val, true);
+}
+
+static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct goya_device *goya = hdev->asic_specific;
+ s64 off, cpu_off;
+ int rc;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return 0;
+
+ for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB) {
+ rc = hl_mmu_map_page(hdev->kernel_ctx,
+ prop->dram_base_address + off,
+ prop->dram_base_address + off, PAGE_SIZE_2MB,
+ (off + PAGE_SIZE_2MB) == CPU_FW_IMAGE_SIZE);
+ if (rc) {
+ dev_err(hdev->dev, "Map failed for address 0x%llx\n",
+ prop->dram_base_address + off);
+ goto unmap;
+ }
+ }
+
+ if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
+ rc = hl_mmu_map_page(hdev->kernel_ctx,
+ VA_CPU_ACCESSIBLE_MEM_ADDR,
+ hdev->cpu_accessible_dma_address,
+ PAGE_SIZE_2MB, true);
+
+ if (rc) {
+ dev_err(hdev->dev,
+ "Map failed for CPU accessible memory\n");
+ off -= PAGE_SIZE_2MB;
+ goto unmap;
+ }
+ } else {
+ for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB) {
+ rc = hl_mmu_map_page(hdev->kernel_ctx,
+ VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
+ hdev->cpu_accessible_dma_address + cpu_off,
+ PAGE_SIZE_4KB, true);
+ if (rc) {
+ dev_err(hdev->dev,
+ "Map failed for CPU accessible memory\n");
+ cpu_off -= PAGE_SIZE_4KB;
+ goto unmap_cpu;
+ }
+ }
+ }
+
+ goya_mmu_prepare_reg(hdev, mmCPU_IF_ARUSER_OVR, HL_KERNEL_ASID_ID);
+ goya_mmu_prepare_reg(hdev, mmCPU_IF_AWUSER_OVR, HL_KERNEL_ASID_ID);
+ WREG32(mmCPU_IF_ARUSER_OVR_EN, 0x7FF);
+ WREG32(mmCPU_IF_AWUSER_OVR_EN, 0x7FF);
+
+ /* Make sure configuration is flushed to device */
+ RREG32(mmCPU_IF_AWUSER_OVR_EN);
+
+ goya->device_cpu_mmu_mappings_done = true;
+
+ return 0;
+
+unmap_cpu:
+ for (; cpu_off >= 0 ; cpu_off -= PAGE_SIZE_4KB)
+ if (hl_mmu_unmap_page(hdev->kernel_ctx,
+ VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
+ PAGE_SIZE_4KB, true))
+ dev_warn_ratelimited(hdev->dev,
+ "failed to unmap address 0x%llx\n",
+ VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
+unmap:
+ for (; off >= 0 ; off -= PAGE_SIZE_2MB)
+ if (hl_mmu_unmap_page(hdev->kernel_ctx,
+ prop->dram_base_address + off, PAGE_SIZE_2MB,
+ true))
+ dev_warn_ratelimited(hdev->dev,
+ "failed to unmap address 0x%llx\n",
+ prop->dram_base_address + off);
+
+ return rc;
+}
+
+void goya_mmu_remove_device_cpu_mappings(struct hl_device *hdev)
+{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ struct goya_device *goya = hdev->asic_specific;
+ u32 off, cpu_off;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ if (!goya->device_cpu_mmu_mappings_done)
+ return;
+
+ WREG32(mmCPU_IF_ARUSER_OVR_EN, 0);
+ WREG32(mmCPU_IF_AWUSER_OVR_EN, 0);
+
+ if (!(hdev->cpu_accessible_dma_address & (PAGE_SIZE_2MB - 1))) {
+ if (hl_mmu_unmap_page(hdev->kernel_ctx,
+ VA_CPU_ACCESSIBLE_MEM_ADDR,
+ PAGE_SIZE_2MB, true))
+ dev_warn(hdev->dev,
+ "Failed to unmap CPU accessible memory\n");
+ } else {
+ for (cpu_off = 0 ; cpu_off < SZ_2M ; cpu_off += PAGE_SIZE_4KB)
+ if (hl_mmu_unmap_page(hdev->kernel_ctx,
+ VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off,
+ PAGE_SIZE_4KB,
+ (cpu_off + PAGE_SIZE_4KB) >= SZ_2M))
+ dev_warn_ratelimited(hdev->dev,
+ "failed to unmap address 0x%llx\n",
+ VA_CPU_ACCESSIBLE_MEM_ADDR + cpu_off);
+ }
+
+ for (off = 0 ; off < CPU_FW_IMAGE_SIZE ; off += PAGE_SIZE_2MB)
+ if (hl_mmu_unmap_page(hdev->kernel_ctx,
+ prop->dram_base_address + off, PAGE_SIZE_2MB,
+ (off + PAGE_SIZE_2MB) >= CPU_FW_IMAGE_SIZE))
+ dev_warn_ratelimited(hdev->dev,
+ "Failed to unmap address 0x%llx\n",
+ prop->dram_base_address + off);
+
+ goya->device_cpu_mmu_mappings_done = false;
+}
+
+static void goya_mmu_prepare(struct hl_device *hdev, u32 asid)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ int i;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU))
+ return;
+
+ if (asid & ~MME_QM_GLBL_SECURE_PROPS_ASID_MASK) {
+ dev_crit(hdev->dev, "asid %u is too big\n", asid);
+ return;
+ }
+
+ /* zero the MMBP and ASID bits and then set the ASID */
+ for (i = 0 ; i < GOYA_MMU_REGS_NUM ; i++)
+ goya_mmu_prepare_reg(hdev, goya_mmu_regs[i], asid);
+}
+
+static int goya_mmu_invalidate_cache(struct hl_device *hdev, bool is_hard,
+ u32 flags)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ u32 status, timeout_usec;
+ int rc;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_MMU) ||
+ hdev->reset_info.hard_reset_pending)
+ return 0;
+
+ /* no need in L1 only invalidation in Goya */
+ if (!is_hard)
+ return 0;
+
+ if (hdev->pldm)
+ timeout_usec = GOYA_PLDM_MMU_TIMEOUT_USEC;
+ else
+ timeout_usec = MMU_CONFIG_TIMEOUT_USEC;
+
+ /* L0 & L1 invalidation */
+ WREG32(mmSTLB_INV_ALL_START, 1);
+
+ rc = hl_poll_timeout(
+ hdev,
+ mmSTLB_INV_ALL_START,
+ status,
+ !status,
+ 1000,
+ timeout_usec);
+
+ return rc;
+}
+
+static int goya_mmu_invalidate_cache_range(struct hl_device *hdev,
+ bool is_hard, u32 flags,
+ u32 asid, u64 va, u64 size)
+{
+ /* Treat as invalidate all because there is no range invalidation
+ * in Goya
+ */
+ return hl_mmu_invalidate_cache(hdev, is_hard, flags);
+}
+
+int goya_send_heartbeat(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_send_heartbeat(hdev);
+}
+
+int goya_cpucp_info_get(struct hl_device *hdev)
+{
+ struct goya_device *goya = hdev->asic_specific;
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
+ u64 dram_size;
+ int rc;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ rc = hl_fw_cpucp_handshake(hdev, mmCPU_BOOT_DEV_STS0,
+ mmCPU_BOOT_DEV_STS1, mmCPU_BOOT_ERR0,
+ mmCPU_BOOT_ERR1);
+ if (rc)
+ return rc;
+
+ dram_size = le64_to_cpu(prop->cpucp_info.dram_size);
+ if (dram_size) {
+ if ((!is_power_of_2(dram_size)) ||
+ (dram_size < DRAM_PHYS_DEFAULT_SIZE)) {
+ dev_err(hdev->dev,
+ "F/W reported invalid DRAM size %llu. Trying to use default size\n",
+ dram_size);
+ dram_size = DRAM_PHYS_DEFAULT_SIZE;
+ }
+
+ prop->dram_size = dram_size;
+ prop->dram_end_address = prop->dram_base_address + dram_size;
+ }
+
+ if (!strlen(prop->cpucp_info.card_name))
+ strncpy(prop->cpucp_info.card_name, GOYA_DEFAULT_CARD_NAME,
+ CARD_NAME_MAX_LEN);
+
+ return 0;
+}
+
+static bool goya_is_device_idle(struct hl_device *hdev, u64 *mask_arr, u8 mask_len,
+ struct engines_data *e)
+{
+ const char *fmt = "%-5d%-9s%#-14x%#-16x%#x\n";
+ const char *dma_fmt = "%-5d%-9s%#-14x%#x\n";
+ unsigned long *mask = (unsigned long *)mask_arr;
+ u32 qm_glbl_sts0, cmdq_glbl_sts0, dma_core_sts0, tpc_cfg_sts,
+ mme_arch_sts;
+ bool is_idle = true, is_eng_idle;
+ u64 offset;
+ int i;
+
+ if (e)
+ hl_engine_data_sprintf(e, "\nDMA is_idle QM_GLBL_STS0 DMA_CORE_STS0\n"
+ "--- ------- ------------ -------------\n");
+
+ offset = mmDMA_QM_1_GLBL_STS0 - mmDMA_QM_0_GLBL_STS0;
+
+ for (i = 0 ; i < DMA_MAX_NUM ; i++) {
+ qm_glbl_sts0 = RREG32(mmDMA_QM_0_GLBL_STS0 + i * offset);
+ dma_core_sts0 = RREG32(mmDMA_CH_0_STS0 + i * offset);
+ is_eng_idle = IS_DMA_QM_IDLE(qm_glbl_sts0) &&
+ IS_DMA_IDLE(dma_core_sts0);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GOYA_ENGINE_ID_DMA_0 + i, mask);
+ if (e)
+ hl_engine_data_sprintf(e, dma_fmt, i, is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, dma_core_sts0);
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nTPC is_idle QM_GLBL_STS0 CMDQ_GLBL_STS0 CFG_STATUS\n"
+ "--- ------- ------------ -------------- ----------\n");
+
+ offset = mmTPC1_QM_GLBL_STS0 - mmTPC0_QM_GLBL_STS0;
+
+ for (i = 0 ; i < TPC_MAX_NUM ; i++) {
+ qm_glbl_sts0 = RREG32(mmTPC0_QM_GLBL_STS0 + i * offset);
+ cmdq_glbl_sts0 = RREG32(mmTPC0_CMDQ_GLBL_STS0 + i * offset);
+ tpc_cfg_sts = RREG32(mmTPC0_CFG_STATUS + i * offset);
+ is_eng_idle = IS_TPC_QM_IDLE(qm_glbl_sts0) &&
+ IS_TPC_CMDQ_IDLE(cmdq_glbl_sts0) &&
+ IS_TPC_IDLE(tpc_cfg_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GOYA_ENGINE_ID_TPC_0 + i, mask);
+ if (e)
+ hl_engine_data_sprintf(e, fmt, i, is_eng_idle ? "Y" : "N",
+ qm_glbl_sts0, cmdq_glbl_sts0, tpc_cfg_sts);
+ }
+
+ if (e)
+ hl_engine_data_sprintf(e,
+ "\nMME is_idle QM_GLBL_STS0 CMDQ_GLBL_STS0 ARCH_STATUS\n"
+ "--- ------- ------------ -------------- -----------\n");
+
+ qm_glbl_sts0 = RREG32(mmMME_QM_GLBL_STS0);
+ cmdq_glbl_sts0 = RREG32(mmMME_CMDQ_GLBL_STS0);
+ mme_arch_sts = RREG32(mmMME_ARCH_STATUS);
+ is_eng_idle = IS_MME_QM_IDLE(qm_glbl_sts0) &&
+ IS_MME_CMDQ_IDLE(cmdq_glbl_sts0) &&
+ IS_MME_IDLE(mme_arch_sts);
+ is_idle &= is_eng_idle;
+
+ if (mask && !is_eng_idle)
+ set_bit(GOYA_ENGINE_ID_MME_0, mask);
+ if (e) {
+ hl_engine_data_sprintf(e, fmt, 0, is_eng_idle ? "Y" : "N", qm_glbl_sts0,
+ cmdq_glbl_sts0, mme_arch_sts);
+ hl_engine_data_sprintf(e, "\n");
+ }
+
+ return is_idle;
+}
+
+static void goya_hw_queues_lock(struct hl_device *hdev)
+ __acquires(&goya->hw_queues_lock)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ spin_lock(&goya->hw_queues_lock);
+}
+
+static void goya_hw_queues_unlock(struct hl_device *hdev)
+ __releases(&goya->hw_queues_lock)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ spin_unlock(&goya->hw_queues_lock);
+}
+
+static u32 goya_get_pci_id(struct hl_device *hdev)
+{
+ return hdev->pdev->device;
+}
+
+static int goya_get_eeprom_data(struct hl_device *hdev, void *data,
+ size_t max_size)
+{
+ struct goya_device *goya = hdev->asic_specific;
+
+ if (!(goya->hw_cap_initialized & HW_CAP_CPU_Q))
+ return 0;
+
+ return hl_fw_get_eeprom_data(hdev, data, max_size);
+}
+
+static void goya_cpu_init_scrambler_dram(struct hl_device *hdev)
+{
+
+}
+
+static int goya_ctx_init(struct hl_ctx *ctx)
+{
+ if (ctx->asid != HL_KERNEL_ASID_ID)
+ goya_mmu_prepare(ctx->hdev, ctx->asid);
+
+ return 0;
+}
+
+static int goya_pre_schedule_cs(struct hl_cs *cs)
+{
+ return 0;
+}
+
+u32 goya_get_queue_id_for_cq(struct hl_device *hdev, u32 cq_idx)
+{
+ return cq_idx;
+}
+
+static u32 goya_get_signal_cb_size(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static u32 goya_get_wait_cb_size(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static u32 goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+ u32 size, bool eb)
+{
+ return 0;
+}
+
+static u32 goya_gen_wait_cb(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop)
+{
+ return 0;
+}
+
+static void goya_reset_sob(struct hl_device *hdev, void *data)
+{
+
+}
+
+static void goya_reset_sob_group(struct hl_device *hdev, u16 sob_group)
+{
+
+}
+
+u64 goya_get_device_time(struct hl_device *hdev)
+{
+ u64 device_time = ((u64) RREG32(mmPSOC_TIMESTAMP_CNTCVU)) << 32;
+
+ return device_time | RREG32(mmPSOC_TIMESTAMP_CNTCVL);
+}
+
+static int goya_collective_wait_init_cs(struct hl_cs *cs)
+{
+ return 0;
+}
+
+static int goya_collective_wait_create_jobs(struct hl_device *hdev,
+ struct hl_ctx *ctx, struct hl_cs *cs, u32 wait_queue_id,
+ u32 collective_engine_id, u32 encaps_signal_offset)
+{
+ return -EINVAL;
+}
+
+static void goya_ctx_fini(struct hl_ctx *ctx)
+{
+
+}
+
+static int goya_get_hw_block_id(struct hl_device *hdev, u64 block_addr,
+ u32 *block_size, u32 *block_id)
+{
+ return -EPERM;
+}
+
+static int goya_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
+ u32 block_id, u32 block_size)
+{
+ return -EPERM;
+}
+
+static void goya_enable_events_from_fw(struct hl_device *hdev)
+{
+ WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
+ GOYA_ASYNC_EVENT_ID_INTS_REGISTER);
+}
+
+static int goya_ack_mmu_page_fault_or_access_error(struct hl_device *hdev, u64 mmu_cap_mask)
+{
+ return -EINVAL;
+}
+
+static int goya_map_pll_idx_to_fw_idx(u32 pll_idx)
+{
+ switch (pll_idx) {
+ case HL_GOYA_CPU_PLL: return CPU_PLL;
+ case HL_GOYA_PCI_PLL: return PCI_PLL;
+ case HL_GOYA_MME_PLL: return MME_PLL;
+ case HL_GOYA_TPC_PLL: return TPC_PLL;
+ case HL_GOYA_IC_PLL: return IC_PLL;
+ case HL_GOYA_MC_PLL: return MC_PLL;
+ case HL_GOYA_EMMC_PLL: return EMMC_PLL;
+ default: return -EINVAL;
+ }
+}
+
+static int goya_gen_sync_to_engine_map(struct hl_device *hdev,
+ struct hl_sync_to_engine_map *map)
+{
+ /* Not implemented */
+ return 0;
+}
+
+static int goya_monitor_valid(struct hl_mon_state_dump *mon)
+{
+ /* Not implemented */
+ return 0;
+}
+
+static int goya_print_single_monitor(char **buf, size_t *size, size_t *offset,
+ struct hl_device *hdev,
+ struct hl_mon_state_dump *mon)
+{
+ /* Not implemented */
+ return 0;
+}
+
+
+static int goya_print_fences_single_engine(
+ struct hl_device *hdev, u64 base_offset, u64 status_base_offset,
+ enum hl_sync_engine_type engine_type, u32 engine_id, char **buf,
+ size_t *size, size_t *offset)
+{
+ /* Not implemented */
+ return 0;
+}
+
+
+static struct hl_state_dump_specs_funcs goya_state_dump_funcs = {
+ .monitor_valid = goya_monitor_valid,
+ .print_single_monitor = goya_print_single_monitor,
+ .gen_sync_to_engine_map = goya_gen_sync_to_engine_map,
+ .print_fences_single_engine = goya_print_fences_single_engine,
+};
+
+static void goya_state_dump_init(struct hl_device *hdev)
+{
+ /* Not implemented */
+ hdev->state_dump_specs.props = goya_state_dump_specs_props;
+ hdev->state_dump_specs.funcs = goya_state_dump_funcs;
+}
+
+static u32 goya_get_sob_addr(struct hl_device *hdev, u32 sob_id)
+{
+ return 0;
+}
+
+static u32 *goya_get_stream_master_qid_arr(void)
+{
+ return NULL;
+}
+
+static int goya_get_monitor_dump(struct hl_device *hdev, void *data)
+{
+ return -EOPNOTSUPP;
+}
+
+static void goya_check_if_razwi_happened(struct hl_device *hdev)
+{
+}
+
+static int goya_scrub_device_dram(struct hl_device *hdev, u64 val)
+{
+ return -EOPNOTSUPP;
+}
+
+static int goya_set_dram_properties(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static int goya_set_binning_masks(struct hl_device *hdev)
+{
+ return 0;
+}
+
+static int goya_send_device_activity(struct hl_device *hdev, bool open)
+{
+ return 0;
+}
+
+static const struct hl_asic_funcs goya_funcs = {
+ .early_init = goya_early_init,
+ .early_fini = goya_early_fini,
+ .late_init = goya_late_init,
+ .late_fini = goya_late_fini,
+ .sw_init = goya_sw_init,
+ .sw_fini = goya_sw_fini,
+ .hw_init = goya_hw_init,
+ .hw_fini = goya_hw_fini,
+ .halt_engines = goya_halt_engines,
+ .suspend = goya_suspend,
+ .resume = goya_resume,
+ .mmap = goya_mmap,
+ .ring_doorbell = goya_ring_doorbell,
+ .pqe_write = goya_pqe_write,
+ .asic_dma_alloc_coherent = goya_dma_alloc_coherent,
+ .asic_dma_free_coherent = goya_dma_free_coherent,
+ .scrub_device_mem = goya_scrub_device_mem,
+ .scrub_device_dram = goya_scrub_device_dram,
+ .get_int_queue_base = goya_get_int_queue_base,
+ .test_queues = goya_test_queues,
+ .asic_dma_pool_zalloc = goya_dma_pool_zalloc,
+ .asic_dma_pool_free = goya_dma_pool_free,
+ .cpu_accessible_dma_pool_alloc = goya_cpu_accessible_dma_pool_alloc,
+ .cpu_accessible_dma_pool_free = goya_cpu_accessible_dma_pool_free,
+ .hl_dma_unmap_sgtable = hl_dma_unmap_sgtable,
+ .cs_parser = goya_cs_parser,
+ .asic_dma_map_sgtable = hl_dma_map_sgtable,
+ .add_end_of_cb_packets = goya_add_end_of_cb_packets,
+ .update_eq_ci = goya_update_eq_ci,
+ .context_switch = goya_context_switch,
+ .restore_phase_topology = goya_restore_phase_topology,
+ .debugfs_read_dma = goya_debugfs_read_dma,
+ .add_device_attr = goya_add_device_attr,
+ .handle_eqe = goya_handle_eqe,
+ .get_events_stat = goya_get_events_stat,
+ .read_pte = goya_read_pte,
+ .write_pte = goya_write_pte,
+ .mmu_invalidate_cache = goya_mmu_invalidate_cache,
+ .mmu_invalidate_cache_range = goya_mmu_invalidate_cache_range,
+ .mmu_prefetch_cache_range = NULL,
+ .send_heartbeat = goya_send_heartbeat,
+ .debug_coresight = goya_debug_coresight,
+ .is_device_idle = goya_is_device_idle,
+ .compute_reset_late_init = goya_compute_reset_late_init,
+ .hw_queues_lock = goya_hw_queues_lock,
+ .hw_queues_unlock = goya_hw_queues_unlock,
+ .get_pci_id = goya_get_pci_id,
+ .get_eeprom_data = goya_get_eeprom_data,
+ .get_monitor_dump = goya_get_monitor_dump,
+ .send_cpu_message = goya_send_cpu_message,
+ .pci_bars_map = goya_pci_bars_map,
+ .init_iatu = goya_init_iatu,
+ .rreg = hl_rreg,
+ .wreg = hl_wreg,
+ .halt_coresight = goya_halt_coresight,
+ .ctx_init = goya_ctx_init,
+ .ctx_fini = goya_ctx_fini,
+ .pre_schedule_cs = goya_pre_schedule_cs,
+ .get_queue_id_for_cq = goya_get_queue_id_for_cq,
+ .load_firmware_to_device = goya_load_firmware_to_device,
+ .load_boot_fit_to_device = goya_load_boot_fit_to_device,
+ .get_signal_cb_size = goya_get_signal_cb_size,
+ .get_wait_cb_size = goya_get_wait_cb_size,
+ .gen_signal_cb = goya_gen_signal_cb,
+ .gen_wait_cb = goya_gen_wait_cb,
+ .reset_sob = goya_reset_sob,
+ .reset_sob_group = goya_reset_sob_group,
+ .get_device_time = goya_get_device_time,
+ .pb_print_security_errors = NULL,
+ .collective_wait_init_cs = goya_collective_wait_init_cs,
+ .collective_wait_create_jobs = goya_collective_wait_create_jobs,
+ .get_dec_base_addr = NULL,
+ .scramble_addr = hl_mmu_scramble_addr,
+ .descramble_addr = hl_mmu_descramble_addr,
+ .ack_protection_bits_errors = goya_ack_protection_bits_errors,
+ .get_hw_block_id = goya_get_hw_block_id,
+ .hw_block_mmap = goya_block_mmap,
+ .enable_events_from_fw = goya_enable_events_from_fw,
+ .ack_mmu_errors = goya_ack_mmu_page_fault_or_access_error,
+ .map_pll_idx_to_fw_idx = goya_map_pll_idx_to_fw_idx,
+ .init_firmware_preload_params = goya_init_firmware_preload_params,
+ .init_firmware_loader = goya_init_firmware_loader,
+ .init_cpu_scrambler_dram = goya_cpu_init_scrambler_dram,
+ .state_dump_init = goya_state_dump_init,
+ .get_sob_addr = &goya_get_sob_addr,
+ .set_pci_memory_regions = goya_set_pci_memory_regions,
+ .get_stream_master_qid_arr = goya_get_stream_master_qid_arr,
+ .check_if_razwi_happened = goya_check_if_razwi_happened,
+ .mmu_get_real_page_size = hl_mmu_get_real_page_size,
+ .access_dev_mem = hl_access_dev_mem,
+ .set_dram_bar_base = goya_set_ddr_bar_base,
+ .send_device_activity = goya_send_device_activity,
+ .set_dram_properties = goya_set_dram_properties,
+ .set_binning_masks = goya_set_binning_masks,
+};
+
+/*
+ * goya_set_asic_funcs - set Goya function pointers
+ *
+ * @*hdev: pointer to hl_device structure
+ *
+ */
+void goya_set_asic_funcs(struct hl_device *hdev)
+{
+ hdev->asic_funcs = &goya_funcs;
+}
--- /dev/null
- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND;
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020-2023 Intel Corporation
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/set_memory.h>
+#include <linux/xarray.h>
+
+#include <drm/drm_cache.h>
+#include <drm/drm_debugfs.h>
+#include <drm/drm_file.h>
+#include <drm/drm_utils.h>
+
+#include "ivpu_drv.h"
+#include "ivpu_gem.h"
+#include "ivpu_hw.h"
+#include "ivpu_mmu.h"
+#include "ivpu_mmu_context.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+static const struct drm_gem_object_funcs ivpu_gem_funcs;
+
+static struct lock_class_key prime_bo_lock_class_key;
+
+static int __must_check prime_alloc_pages_locked(struct ivpu_bo *bo)
+{
+ /* Pages are managed by the underlying dma-buf */
+ return 0;
+}
+
+static void prime_free_pages_locked(struct ivpu_bo *bo)
+{
+ /* Pages are managed by the underlying dma-buf */
+}
+
+static int prime_map_pages_locked(struct ivpu_bo *bo)
+{
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+ struct sg_table *sgt;
+
+ sgt = dma_buf_map_attachment_unlocked(bo->base.import_attach, DMA_BIDIRECTIONAL);
+ if (IS_ERR(sgt)) {
+ ivpu_err(vdev, "Failed to map attachment: %ld\n", PTR_ERR(sgt));
+ return PTR_ERR(sgt);
+ }
+
+ bo->sgt = sgt;
+ return 0;
+}
+
+static void prime_unmap_pages_locked(struct ivpu_bo *bo)
+{
+ dma_buf_unmap_attachment_unlocked(bo->base.import_attach, bo->sgt, DMA_BIDIRECTIONAL);
+ bo->sgt = NULL;
+}
+
+static const struct ivpu_bo_ops prime_ops = {
+ .type = IVPU_BO_TYPE_PRIME,
+ .name = "prime",
+ .alloc_pages = prime_alloc_pages_locked,
+ .free_pages = prime_free_pages_locked,
+ .map_pages = prime_map_pages_locked,
+ .unmap_pages = prime_unmap_pages_locked,
+};
+
+static int __must_check shmem_alloc_pages_locked(struct ivpu_bo *bo)
+{
+ int npages = bo->base.size >> PAGE_SHIFT;
+ struct page **pages;
+
+ pages = drm_gem_get_pages(&bo->base);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
+
+ if (bo->flags & DRM_IVPU_BO_WC)
+ set_pages_array_wc(pages, npages);
+ else if (bo->flags & DRM_IVPU_BO_UNCACHED)
+ set_pages_array_uc(pages, npages);
+
+ bo->pages = pages;
+ return 0;
+}
+
+static void shmem_free_pages_locked(struct ivpu_bo *bo)
+{
+ if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
+ set_pages_array_wb(bo->pages, bo->base.size >> PAGE_SHIFT);
+
+ drm_gem_put_pages(&bo->base, bo->pages, true, false);
+ bo->pages = NULL;
+}
+
+static int ivpu_bo_map_pages_locked(struct ivpu_bo *bo)
+{
+ int npages = bo->base.size >> PAGE_SHIFT;
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+ struct sg_table *sgt;
+ int ret;
+
+ sgt = drm_prime_pages_to_sg(&vdev->drm, bo->pages, npages);
+ if (IS_ERR(sgt)) {
+ ivpu_err(vdev, "Failed to allocate sgtable\n");
+ return PTR_ERR(sgt);
+ }
+
+ ret = dma_map_sgtable(vdev->drm.dev, sgt, DMA_BIDIRECTIONAL, 0);
+ if (ret) {
+ ivpu_err(vdev, "Failed to map BO in IOMMU: %d\n", ret);
+ goto err_free_sgt;
+ }
+
+ bo->sgt = sgt;
+ return 0;
+
+err_free_sgt:
+ kfree(sgt);
+ return ret;
+}
+
+static void ivpu_bo_unmap_pages_locked(struct ivpu_bo *bo)
+{
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+
+ dma_unmap_sgtable(vdev->drm.dev, bo->sgt, DMA_BIDIRECTIONAL, 0);
+ sg_free_table(bo->sgt);
+ kfree(bo->sgt);
+ bo->sgt = NULL;
+}
+
+static const struct ivpu_bo_ops shmem_ops = {
+ .type = IVPU_BO_TYPE_SHMEM,
+ .name = "shmem",
+ .alloc_pages = shmem_alloc_pages_locked,
+ .free_pages = shmem_free_pages_locked,
+ .map_pages = ivpu_bo_map_pages_locked,
+ .unmap_pages = ivpu_bo_unmap_pages_locked,
+};
+
+static int __must_check internal_alloc_pages_locked(struct ivpu_bo *bo)
+{
+ unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
+ struct page **pages;
+ int ret;
+
+ pages = kvmalloc_array(npages, sizeof(*bo->pages), GFP_KERNEL);
+ if (!pages)
+ return -ENOMEM;
+
+ for (i = 0; i < npages; i++) {
+ pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+ if (!pages[i]) {
+ ret = -ENOMEM;
+ goto err_free_pages;
+ }
+ cond_resched();
+ }
+
+ bo->pages = pages;
+ return 0;
+
+err_free_pages:
+ while (i--)
+ put_page(pages[i]);
+ kvfree(pages);
+ return ret;
+}
+
+static void internal_free_pages_locked(struct ivpu_bo *bo)
+{
+ unsigned int i, npages = bo->base.size >> PAGE_SHIFT;
+
+ for (i = 0; i < npages; i++)
+ put_page(bo->pages[i]);
+
+ kvfree(bo->pages);
+ bo->pages = NULL;
+}
+
+static const struct ivpu_bo_ops internal_ops = {
+ .type = IVPU_BO_TYPE_INTERNAL,
+ .name = "internal",
+ .alloc_pages = internal_alloc_pages_locked,
+ .free_pages = internal_free_pages_locked,
+ .map_pages = ivpu_bo_map_pages_locked,
+ .unmap_pages = ivpu_bo_unmap_pages_locked,
+};
+
+static int __must_check ivpu_bo_alloc_and_map_pages_locked(struct ivpu_bo *bo)
+{
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+ int ret;
+
+ lockdep_assert_held(&bo->lock);
+ drm_WARN_ON(&vdev->drm, bo->sgt);
+
+ ret = bo->ops->alloc_pages(bo);
+ if (ret) {
+ ivpu_err(vdev, "Failed to allocate pages for BO: %d", ret);
+ return ret;
+ }
+
+ ret = bo->ops->map_pages(bo);
+ if (ret) {
+ ivpu_err(vdev, "Failed to map pages for BO: %d", ret);
+ goto err_free_pages;
+ }
+ return ret;
+
+err_free_pages:
+ bo->ops->free_pages(bo);
+ return ret;
+}
+
+static void ivpu_bo_unmap_and_free_pages(struct ivpu_bo *bo)
+{
+ mutex_lock(&bo->lock);
+
+ WARN_ON(!bo->sgt);
+ bo->ops->unmap_pages(bo);
+ WARN_ON(bo->sgt);
+ bo->ops->free_pages(bo);
+ WARN_ON(bo->pages);
+
+ mutex_unlock(&bo->lock);
+}
+
+/*
+ * ivpu_bo_pin() - pin the backing physical pages and map them to VPU.
+ *
+ * This function pins physical memory pages, then maps the physical pages
+ * to IOMMU address space and finally updates the VPU MMU page tables
+ * to allow the VPU to translate VPU address to IOMMU address.
+ */
+int __must_check ivpu_bo_pin(struct ivpu_bo *bo)
+{
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+ int ret = 0;
+
+ mutex_lock(&bo->lock);
+
+ if (!bo->vpu_addr) {
+ ivpu_err(vdev, "vpu_addr not set for BO ctx_id: %d handle: %d\n",
+ bo->ctx->id, bo->handle);
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (!bo->sgt) {
+ ret = ivpu_bo_alloc_and_map_pages_locked(bo);
+ if (ret)
+ goto unlock;
+ }
+
+ if (!bo->mmu_mapped) {
+ ret = ivpu_mmu_context_map_sgt(vdev, bo->ctx, bo->vpu_addr, bo->sgt,
+ ivpu_bo_is_snooped(bo));
+ if (ret) {
+ ivpu_err(vdev, "Failed to map BO in MMU: %d\n", ret);
+ goto unlock;
+ }
+ bo->mmu_mapped = true;
+ }
+
+unlock:
+ mutex_unlock(&bo->lock);
+
+ return ret;
+}
+
+static int
+ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
+ const struct ivpu_addr_range *range)
+{
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+ int ret;
+
+ if (!range) {
+ if (bo->flags & DRM_IVPU_BO_HIGH_MEM)
+ range = &vdev->hw->ranges.user_high;
+ else
+ range = &vdev->hw->ranges.user_low;
+ }
+
+ mutex_lock(&ctx->lock);
+ ret = ivpu_mmu_context_insert_node_locked(ctx, range, bo->base.size, &bo->mm_node);
+ if (!ret) {
+ bo->ctx = ctx;
+ bo->vpu_addr = bo->mm_node.start;
+ list_add_tail(&bo->ctx_node, &ctx->bo_list);
+ }
+ mutex_unlock(&ctx->lock);
+
+ return ret;
+}
+
+static void ivpu_bo_free_vpu_addr(struct ivpu_bo *bo)
+{
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+ struct ivpu_mmu_context *ctx = bo->ctx;
+
+ ivpu_dbg(vdev, BO, "remove from ctx: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
+ ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
+
+ mutex_lock(&bo->lock);
+
+ if (bo->mmu_mapped) {
+ drm_WARN_ON(&vdev->drm, !bo->sgt);
+ ivpu_mmu_context_unmap_sgt(vdev, ctx, bo->vpu_addr, bo->sgt);
+ bo->mmu_mapped = false;
+ }
+
+ mutex_lock(&ctx->lock);
+ list_del(&bo->ctx_node);
+ bo->vpu_addr = 0;
+ bo->ctx = NULL;
+ ivpu_mmu_context_remove_node_locked(ctx, &bo->mm_node);
+ mutex_unlock(&ctx->lock);
+
+ mutex_unlock(&bo->lock);
+}
+
+void ivpu_bo_remove_all_bos_from_context(struct ivpu_mmu_context *ctx)
+{
+ struct ivpu_bo *bo, *tmp;
+
+ list_for_each_entry_safe(bo, tmp, &ctx->bo_list, ctx_node)
+ ivpu_bo_free_vpu_addr(bo);
+}
+
+static struct ivpu_bo *
+ivpu_bo_alloc(struct ivpu_device *vdev, struct ivpu_mmu_context *mmu_context,
+ u64 size, u32 flags, const struct ivpu_bo_ops *ops,
+ const struct ivpu_addr_range *range, u64 user_ptr)
+{
+ struct ivpu_bo *bo;
+ int ret = 0;
+
+ if (drm_WARN_ON(&vdev->drm, size == 0 || !PAGE_ALIGNED(size)))
+ return ERR_PTR(-EINVAL);
+
+ switch (flags & DRM_IVPU_BO_CACHE_MASK) {
+ case DRM_IVPU_BO_CACHED:
+ case DRM_IVPU_BO_UNCACHED:
+ case DRM_IVPU_BO_WC:
+ break;
+ default:
+ return ERR_PTR(-EINVAL);
+ }
+
+ bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+ if (!bo)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_init(&bo->lock);
+ bo->base.funcs = &ivpu_gem_funcs;
+ bo->flags = flags;
+ bo->ops = ops;
+ bo->user_ptr = user_ptr;
+
+ if (ops->type == IVPU_BO_TYPE_SHMEM)
+ ret = drm_gem_object_init(&vdev->drm, &bo->base, size);
+ else
+ drm_gem_private_object_init(&vdev->drm, &bo->base, size);
+
+ if (ret) {
+ ivpu_err(vdev, "Failed to initialize drm object\n");
+ goto err_free;
+ }
+
+ if (flags & DRM_IVPU_BO_MAPPABLE) {
+ ret = drm_gem_create_mmap_offset(&bo->base);
+ if (ret) {
+ ivpu_err(vdev, "Failed to allocate mmap offset\n");
+ goto err_release;
+ }
+ }
+
+ if (mmu_context) {
+ ret = ivpu_bo_alloc_vpu_addr(bo, mmu_context, range);
+ if (ret) {
+ ivpu_err(vdev, "Failed to add BO to context: %d\n", ret);
+ goto err_release;
+ }
+ }
+
+ return bo;
+
+err_release:
+ drm_gem_object_release(&bo->base);
+err_free:
+ kfree(bo);
+ return ERR_PTR(ret);
+}
+
+static void ivpu_bo_free(struct drm_gem_object *obj)
+{
+ struct ivpu_bo *bo = to_ivpu_bo(obj);
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+
+ if (bo->ctx)
+ ivpu_dbg(vdev, BO, "free: ctx %d vpu_addr 0x%llx allocated %d mmu_mapped %d\n",
+ bo->ctx->id, bo->vpu_addr, (bool)bo->sgt, bo->mmu_mapped);
+ else
+ ivpu_dbg(vdev, BO, "free: ctx (released) allocated %d mmu_mapped %d\n",
+ (bool)bo->sgt, bo->mmu_mapped);
+
+ drm_WARN_ON(&vdev->drm, !dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ));
+
+ vunmap(bo->kvaddr);
+
+ if (bo->ctx)
+ ivpu_bo_free_vpu_addr(bo);
+
+ if (bo->sgt)
+ ivpu_bo_unmap_and_free_pages(bo);
+
+ if (bo->base.import_attach)
+ drm_prime_gem_destroy(&bo->base, bo->sgt);
+
+ drm_gem_object_release(&bo->base);
+
+ mutex_destroy(&bo->lock);
+ kfree(bo);
+}
+
+static int ivpu_bo_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+ struct ivpu_bo *bo = to_ivpu_bo(obj);
+ struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
+
+ ivpu_dbg(vdev, BO, "mmap: ctx %u handle %u vpu_addr 0x%llx size %zu type %s",
+ bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.size, bo->ops->name);
+
+ if (obj->import_attach) {
+ /* Drop the reference drm_gem_mmap_obj() acquired.*/
+ drm_gem_object_put(obj);
+ vma->vm_private_data = NULL;
+ return dma_buf_mmap(obj->dma_buf, vma, 0);
+ }
+
++ vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND);
+ vma->vm_page_prot = ivpu_bo_pgprot(bo, vm_get_page_prot(vma->vm_flags));
+
+ return 0;
+}
+
+static struct sg_table *ivpu_bo_get_sg_table(struct drm_gem_object *obj)
+{
+ struct ivpu_bo *bo = to_ivpu_bo(obj);
+ loff_t npages = obj->size >> PAGE_SHIFT;
+ int ret = 0;
+
+ mutex_lock(&bo->lock);
+
+ if (!bo->sgt)
+ ret = ivpu_bo_alloc_and_map_pages_locked(bo);
+
+ mutex_unlock(&bo->lock);
+
+ if (ret)
+ return ERR_PTR(ret);
+
+ return drm_prime_pages_to_sg(obj->dev, bo->pages, npages);
+}
+
+static vm_fault_t ivpu_vm_fault(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ struct drm_gem_object *obj = vma->vm_private_data;
+ struct ivpu_bo *bo = to_ivpu_bo(obj);
+ loff_t npages = obj->size >> PAGE_SHIFT;
+ pgoff_t page_offset;
+ struct page *page;
+ vm_fault_t ret;
+ int err;
+
+ mutex_lock(&bo->lock);
+
+ if (!bo->sgt) {
+ err = ivpu_bo_alloc_and_map_pages_locked(bo);
+ if (err) {
+ ret = vmf_error(err);
+ goto unlock;
+ }
+ }
+
+ /* We don't use vmf->pgoff since that has the fake offset */
+ page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ if (page_offset >= npages) {
+ ret = VM_FAULT_SIGBUS;
+ } else {
+ page = bo->pages[page_offset];
+ ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
+ }
+
+unlock:
+ mutex_unlock(&bo->lock);
+
+ return ret;
+}
+
+static const struct vm_operations_struct ivpu_vm_ops = {
+ .fault = ivpu_vm_fault,
+ .open = drm_gem_vm_open,
+ .close = drm_gem_vm_close,
+};
+
+static const struct drm_gem_object_funcs ivpu_gem_funcs = {
+ .free = ivpu_bo_free,
+ .mmap = ivpu_bo_mmap,
+ .vm_ops = &ivpu_vm_ops,
+ .get_sg_table = ivpu_bo_get_sg_table,
+};
+
+int
+ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct ivpu_file_priv *file_priv = file->driver_priv;
+ struct ivpu_device *vdev = file_priv->vdev;
+ struct drm_ivpu_bo_create *args = data;
+ u64 size = PAGE_ALIGN(args->size);
+ struct ivpu_bo *bo;
+ int ret;
+
+ if (args->flags & ~DRM_IVPU_BO_FLAGS)
+ return -EINVAL;
+
+ if (size == 0)
+ return -EINVAL;
+
+ bo = ivpu_bo_alloc(vdev, &file_priv->ctx, size, args->flags, &shmem_ops, NULL, 0);
+ if (IS_ERR(bo)) {
+ ivpu_err(vdev, "Failed to create BO: %pe (ctx %u size %llu flags 0x%x)",
+ bo, file_priv->ctx.id, args->size, args->flags);
+ return PTR_ERR(bo);
+ }
+
+ ret = drm_gem_handle_create(file, &bo->base, &bo->handle);
+ if (!ret) {
+ args->vpu_addr = bo->vpu_addr;
+ args->handle = bo->handle;
+ }
+
+ drm_gem_object_put(&bo->base);
+
+ ivpu_dbg(vdev, BO, "alloc shmem: ctx %u vpu_addr 0x%llx size %zu flags 0x%x\n",
+ file_priv->ctx.id, bo->vpu_addr, bo->base.size, bo->flags);
+
+ return ret;
+}
+
+struct ivpu_bo *
+ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags)
+{
+ const struct ivpu_addr_range *range;
+ struct ivpu_addr_range fixed_range;
+ struct ivpu_bo *bo;
+ pgprot_t prot;
+ int ret;
+
+ drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(vpu_addr));
+ drm_WARN_ON(&vdev->drm, !PAGE_ALIGNED(size));
+
+ if (vpu_addr) {
+ fixed_range.start = vpu_addr;
+ fixed_range.end = vpu_addr + size;
+ range = &fixed_range;
+ } else {
+ range = &vdev->hw->ranges.global_low;
+ }
+
+ bo = ivpu_bo_alloc(vdev, &vdev->gctx, size, flags, &internal_ops, range, 0);
+ if (IS_ERR(bo)) {
+ ivpu_err(vdev, "Failed to create BO: %pe (vpu_addr 0x%llx size %llu flags 0x%x)",
+ bo, vpu_addr, size, flags);
+ return NULL;
+ }
+
+ ret = ivpu_bo_pin(bo);
+ if (ret)
+ goto err_put;
+
+ if (ivpu_bo_cache_mode(bo) != DRM_IVPU_BO_CACHED)
+ drm_clflush_pages(bo->pages, bo->base.size >> PAGE_SHIFT);
+
+ prot = ivpu_bo_pgprot(bo, PAGE_KERNEL);
+ bo->kvaddr = vmap(bo->pages, bo->base.size >> PAGE_SHIFT, VM_MAP, prot);
+ if (!bo->kvaddr) {
+ ivpu_err(vdev, "Failed to map BO into kernel virtual memory\n");
+ goto err_put;
+ }
+
+ ivpu_dbg(vdev, BO, "alloc internal: ctx 0 vpu_addr 0x%llx size %zu flags 0x%x\n",
+ bo->vpu_addr, bo->base.size, flags);
+
+ return bo;
+
+err_put:
+ drm_gem_object_put(&bo->base);
+ return NULL;
+}
+
+void ivpu_bo_free_internal(struct ivpu_bo *bo)
+{
+ drm_gem_object_put(&bo->base);
+}
+
+struct drm_gem_object *ivpu_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
+{
+ struct ivpu_device *vdev = to_ivpu_device(dev);
+ struct dma_buf_attachment *attach;
+ struct ivpu_bo *bo;
+
+ attach = dma_buf_attach(buf, dev->dev);
+ if (IS_ERR(attach))
+ return ERR_CAST(attach);
+
+ get_dma_buf(buf);
+
+ bo = ivpu_bo_alloc(vdev, NULL, buf->size, DRM_IVPU_BO_MAPPABLE, &prime_ops, NULL, 0);
+ if (IS_ERR(bo)) {
+ ivpu_err(vdev, "Failed to import BO: %pe (size %lu)", bo, buf->size);
+ goto err_detach;
+ }
+
+ lockdep_set_class(&bo->lock, &prime_bo_lock_class_key);
+
+ bo->base.import_attach = attach;
+
+ return &bo->base;
+
+err_detach:
+ dma_buf_detach(buf, attach);
+ dma_buf_put(buf);
+ return ERR_CAST(bo);
+}
+
+int ivpu_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct ivpu_file_priv *file_priv = file->driver_priv;
+ struct ivpu_device *vdev = to_ivpu_device(dev);
+ struct drm_ivpu_bo_info *args = data;
+ struct drm_gem_object *obj;
+ struct ivpu_bo *bo;
+ int ret = 0;
+
+ obj = drm_gem_object_lookup(file, args->handle);
+ if (!obj)
+ return -ENOENT;
+
+ bo = to_ivpu_bo(obj);
+
+ mutex_lock(&bo->lock);
+
+ if (!bo->ctx) {
+ ret = ivpu_bo_alloc_vpu_addr(bo, &file_priv->ctx, NULL);
+ if (ret) {
+ ivpu_err(vdev, "Failed to allocate vpu_addr: %d\n", ret);
+ goto unlock;
+ }
+ }
+
+ args->flags = bo->flags;
+ args->mmap_offset = drm_vma_node_offset_addr(&obj->vma_node);
+ args->vpu_addr = bo->vpu_addr;
+ args->size = obj->size;
+unlock:
+ mutex_unlock(&bo->lock);
+ drm_gem_object_put(obj);
+ return ret;
+}
+
+int ivpu_bo_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_ivpu_bo_wait *args = data;
+ struct drm_gem_object *obj;
+ unsigned long timeout;
+ long ret;
+
+ timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
+
+ obj = drm_gem_object_lookup(file, args->handle);
+ if (!obj)
+ return -EINVAL;
+
+ ret = dma_resv_wait_timeout(obj->resv, DMA_RESV_USAGE_READ, true, timeout);
+ if (ret == 0) {
+ ret = -ETIMEDOUT;
+ } else if (ret > 0) {
+ ret = 0;
+ args->job_status = to_ivpu_bo(obj)->job_status;
+ }
+
+ drm_gem_object_put(obj);
+
+ return ret;
+}
+
+static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
+{
+ unsigned long dma_refcount = 0;
+
+ if (bo->base.dma_buf && bo->base.dma_buf->file)
+ dma_refcount = atomic_long_read(&bo->base.dma_buf->file->f_count);
+
+ drm_printf(p, "%5u %6d %16llx %10lu %10u %12lu %14s\n",
+ bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.size,
+ kref_read(&bo->base.refcount), dma_refcount, bo->ops->name);
+}
+
+void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p)
+{
+ struct ivpu_device *vdev = to_ivpu_device(dev);
+ struct ivpu_file_priv *file_priv;
+ unsigned long ctx_id;
+ struct ivpu_bo *bo;
+
+ drm_printf(p, "%5s %6s %16s %10s %10s %12s %14s\n",
+ "ctx", "handle", "vpu_addr", "size", "refcount", "dma_refcount", "type");
+
+ mutex_lock(&vdev->gctx.lock);
+ list_for_each_entry(bo, &vdev->gctx.bo_list, ctx_node)
+ ivpu_bo_print_info(bo, p);
+ mutex_unlock(&vdev->gctx.lock);
+
+ xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
+ file_priv = ivpu_file_priv_get_by_ctx_id(vdev, ctx_id);
+ if (!file_priv)
+ continue;
+
+ mutex_lock(&file_priv->ctx.lock);
+ list_for_each_entry(bo, &file_priv->ctx.bo_list, ctx_node)
+ ivpu_bo_print_info(bo, p);
+ mutex_unlock(&file_priv->ctx.lock);
+
+ ivpu_file_priv_put(&file_priv);
+ }
+}
+
+void ivpu_bo_list_print(struct drm_device *dev)
+{
+ struct drm_printer p = drm_info_printer(dev->dev);
+
+ ivpu_bo_list(dev, &p);
+}
}
/*
- * Look up and return a brd's page for a given sector.
- * If one does not exist, allocate an empty page, and insert that. Then
- * return it.
+ * Insert a new page for a given sector, if one does not already exist.
*/
-static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
+static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
{
pgoff_t idx;
struct page *page;
- gfp_t gfp_flags;
+ int ret = 0;
page = brd_lookup_page(brd, sector);
if (page)
- return page;
+ return 0;
- /*
- * Must use NOIO because we don't want to recurse back into the
- * block or filesystem layers from page reclaim.
- */
- gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
- page = alloc_page(gfp_flags);
+ page = alloc_page(gfp | __GFP_ZERO | __GFP_HIGHMEM);
if (!page)
- return NULL;
+ return -ENOMEM;
- if (radix_tree_preload(GFP_NOIO)) {
+ if (radix_tree_maybe_preload(gfp)) {
__free_page(page);
- return NULL;
+ return -ENOMEM;
}
spin_lock(&brd->brd_lock);
if (radix_tree_insert(&brd->brd_pages, idx, page)) {
__free_page(page);
page = radix_tree_lookup(&brd->brd_pages, idx);
- BUG_ON(!page);
- BUG_ON(page->index != idx);
+ if (!page)
+ ret = -ENOMEM;
+ else if (page->index != idx)
+ ret = -EIO;
} else {
brd->brd_nr_pages++;
}
spin_unlock(&brd->brd_lock);
radix_tree_preload_end();
-
- return page;
+ return ret;
}
/*
/*
* copy_to_brd_setup must be called before copy_to_brd. It may sleep.
*/
-static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
+static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
+ gfp_t gfp)
{
unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
size_t copy;
+ int ret;
copy = min_t(size_t, n, PAGE_SIZE - offset);
- if (!brd_insert_page(brd, sector))
- return -ENOSPC;
+ ret = brd_insert_page(brd, sector, gfp);
+ if (ret)
+ return ret;
if (copy < n) {
sector += copy >> SECTOR_SHIFT;
- if (!brd_insert_page(brd, sector))
- return -ENOSPC;
+ ret = brd_insert_page(brd, sector, gfp);
}
- return 0;
+ return ret;
}
/*
* Process a single bvec of a bio.
*/
static int brd_do_bvec(struct brd_device *brd, struct page *page,
- unsigned int len, unsigned int off, enum req_op op,
+ unsigned int len, unsigned int off, blk_opf_t opf,
sector_t sector)
{
void *mem;
int err = 0;
- if (op_is_write(op)) {
- err = copy_to_brd_setup(brd, sector, len);
+ if (op_is_write(opf)) {
+ /*
+ * Must use NOIO because we don't want to recurse back into the
+ * block or filesystem layers from page reclaim.
+ */
+ gfp_t gfp = opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO;
+
+ err = copy_to_brd_setup(brd, sector, len, gfp);
if (err)
goto out;
}
mem = kmap_atomic(page);
- if (!op_is_write(op)) {
+ if (!op_is_write(opf)) {
copy_from_brd(mem + off, brd, sector, len);
flush_dcache_page(page);
} else {
(len & (SECTOR_SIZE - 1)));
err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
- bio_op(bio), sector);
+ bio->bi_opf, sector);
if (err) {
+ if (err == -ENOMEM && bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
bio_io_error(bio);
return;
}
bio_endio(bio);
}
- static int brd_rw_page(struct block_device *bdev, sector_t sector,
- struct page *page, enum req_op op)
- {
- struct brd_device *brd = bdev->bd_disk->private_data;
- int err;
-
- if (PageTransHuge(page))
- return -ENOTSUPP;
- err = brd_do_bvec(brd, page, PAGE_SIZE, 0, op, sector);
- page_endio(page, op_is_write(op), err);
- return err;
- }
-
static const struct block_device_operations brd_fops = {
.owner = THIS_MODULE,
.submit_bio = brd_submit_bio,
- .rw_page = brd_rw_page,
};
/*
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
+ blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, disk->queue);
+ blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
err = add_disk(disk);
if (err)
goto out_cleanup_disk;
end = start + (size >> SECTOR_SHIFT);
bound = zram->disksize >> SECTOR_SHIFT;
- /* out of range range */
+ /* out of range */
if (unlikely(start >= bound || end > bound || start > end))
return false;
for (; nr_pages != 0; index++, nr_pages--) {
struct bio_vec bvec;
- bvec.bv_page = page;
- bvec.bv_len = PAGE_SIZE;
- bvec.bv_offset = 0;
+ bvec_set_page(&bvec, page, PAGE_SIZE, 0);
spin_lock(&zram->wb_limit_lock);
if (zram->wb_limit_enable && !zram->bd_wb_limit) {
while (*args) {
args = next_arg(args, ¶m, &val);
- if (!*val)
+ if (!val || !*val)
return -EINVAL;
if (!strcmp(param, "algo")) {
static int zram_bvec_read_from_bdev(struct zram *zram, struct page *page,
u32 index, struct bio *bio, bool partial_io)
{
- struct bio_vec bvec = {
- .bv_page = page,
- .bv_len = PAGE_SIZE,
- .bv_offset = 0,
- };
+ struct bio_vec bvec;
+ bvec_set_page(&bvec, page, PAGE_SIZE, 0);
return read_from_bdev(zram, &bvec, zram_get_element(zram, index), bio,
partial_io);
}
/* Slot should be unlocked before the function call */
zram_slot_unlock(zram, index);
- /* A null bio means rw_page was used, we must fallback to bio */
- if (!bio)
- return -EOPNOTSUPP;
-
ret = zram_bvec_read_from_bdev(zram, page, index, bio,
partial_io);
}
memcpy_from_bvec(dst + offset, bvec);
kunmap_atomic(dst);
- vec.bv_page = page;
- vec.bv_len = PAGE_SIZE;
- vec.bv_offset = 0;
+ bvec_set_page(&vec, page, PAGE_SIZE, 0);
}
ret = __zram_bvec_write(zram, &vec, index, bio);
while (*args) {
args = next_arg(args, ¶m, &val);
- if (!*val)
+ if (!val || !*val)
return -EINVAL;
if (!strcmp(param, "type")) {
zram_slot_unlock(zram, index);
}
- static int zram_rw_page(struct block_device *bdev, sector_t sector,
- struct page *page, enum req_op op)
- {
- int offset, ret;
- u32 index;
- struct zram *zram;
- struct bio_vec bv;
- unsigned long start_time;
-
- if (PageTransHuge(page))
- return -ENOTSUPP;
- zram = bdev->bd_disk->private_data;
-
- if (!valid_io_request(zram, sector, PAGE_SIZE)) {
- atomic64_inc(&zram->stats.invalid_io);
- ret = -EINVAL;
- goto out;
- }
-
- index = sector >> SECTORS_PER_PAGE_SHIFT;
- offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
-
- bv.bv_page = page;
- bv.bv_len = PAGE_SIZE;
- bv.bv_offset = 0;
-
- start_time = bdev_start_io_acct(bdev->bd_disk->part0,
- SECTORS_PER_PAGE, op, jiffies);
- ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
- bdev_end_io_acct(bdev->bd_disk->part0, op, start_time);
- out:
- /*
- * If I/O fails, just return error(ie, non-zero) without
- * calling page_endio.
- * It causes resubmit the I/O with bio request by upper functions
- * of rw_page(e.g., swap_readpage, __swap_writepage) and
- * bio->bi_end_io does things to handle the error
- * (e.g., SetPageError, set_page_dirty and extra works).
- */
- if (unlikely(ret < 0))
- return ret;
-
- switch (ret) {
- case 0:
- page_endio(page, op_is_write(op), 0);
- break;
- case 1:
- ret = 0;
- break;
- default:
- WARN_ON(1);
- }
- return ret;
- }
-
static void zram_destroy_comps(struct zram *zram)
{
u32 prio;
.open = zram_open,
.submit_bio = zram_submit_bio,
.swap_slot_free_notify = zram_slot_free_notify,
- .rw_page = zram_rw_page,
.owner = THIS_MODULE
};
zram->disk->private_data = zram;
snprintf(zram->disk->disk_name, 16, "zram%d", device_id);
- /* Actual capacity set using syfs (/sys/block/zram<id>/disksize */
+ /* Actual capacity set using sysfs (/sys/block/zram<id>/disksize */
set_capacity(zram->disk, 0);
/* zram devices sort of resembles non-rotational disks */
blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
+ blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, zram->disk->queue);
/*
#define QM_VFT_CFG_RDY 0x10006c
#define QM_VFT_CFG_OP_WR 0x100058
#define QM_VFT_CFG_TYPE 0x10005c
-#define QM_SQC_VFT 0x0
-#define QM_CQC_VFT 0x1
#define QM_VFT_CFG 0x100060
#define QM_VFT_CFG_OP_ENABLE 0x100054
#define QM_PM_CTRL 0x100148
#define QM_SQC_VFT_BASE_SHIFT_V2 28
#define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
#define QM_SQC_VFT_NUM_SHIFT_V2 45
-#define QM_SQC_VFT_NUM_MASK_v2 GENMASK(9, 0)
+#define QM_SQC_VFT_NUM_MASK_V2 GENMASK(9, 0)
#define QM_ABNORMAL_INT_SOURCE 0x100000
#define QM_ABNORMAL_INT_MASK 0x100004
/* interfunction communication */
#define QM_IFC_READY_STATUS 0x100128
-#define QM_IFC_C_STS_M 0x10012C
#define QM_IFC_INT_SET_P 0x100130
#define QM_IFC_INT_CFG 0x100134
#define QM_IFC_INT_SOURCE_P 0x100138
#define PCI_BAR_2 2
#define PCI_BAR_4 4
-#define QM_SQE_DATA_ALIGN_MASK GENMASK(6, 0)
#define QMC_ALIGN(sz) ALIGN(sz, 32)
#define QM_DBG_READ_LEN 256
#define QM_DRIVER_REMOVING 0
#define QM_RST_SCHED 1
#define QM_QOS_PARAM_NUM 2
-#define QM_QOS_VAL_NUM 1
-#define QM_QOS_BDF_PARAM_NUM 4
#define QM_QOS_MAX_VAL 1000
#define QM_QOS_RATE 100
#define QM_QOS_EXPAND_RATE 1000
#define QM_SHAPER_FACTOR_CBS_B_SHIFT 15
#define QM_SHAPER_FACTOR_CBS_S_SHIFT 19
#define QM_SHAPER_CBS_B 1
-#define QM_SHAPER_CBS_S 16
#define QM_SHAPER_VFT_OFFSET 6
-#define WAIT_FOR_QOS_VF 100
#define QM_QOS_MIN_ERROR_RATE 5
-#define QM_QOS_TYPICAL_NUM 8
#define QM_SHAPER_MIN_CBS_S 8
#define QM_QOS_TICK 0x300U
#define QM_QOS_DIVISOR_CLK 0x1f40U
#define QM_QOS_MAX_CIR_B 200
#define QM_QOS_MIN_CIR_B 100
#define QM_QOS_MAX_CIR_U 6
-#define QM_QOS_MAX_CIR_S 11
#define QM_AUTOSUSPEND_DELAY 3000
#define QM_MK_CQC_DW3_V1(hop_num, pg_sz, buf_sz, cqe_sz) \
- (((hop_num) << QM_CQ_HOP_NUM_SHIFT) | \
- ((pg_sz) << QM_CQ_PAGE_SIZE_SHIFT) | \
- ((buf_sz) << QM_CQ_BUF_SIZE_SHIFT) | \
+ (((hop_num) << QM_CQ_HOP_NUM_SHIFT) | \
+ ((pg_sz) << QM_CQ_PAGE_SIZE_SHIFT) | \
+ ((buf_sz) << QM_CQ_BUF_SIZE_SHIFT) | \
((cqe_sz) << QM_CQ_CQE_SIZE_SHIFT))
#define QM_MK_CQC_DW3_V2(cqe_sz, cq_depth) \
((((u32)cq_depth) - 1) | ((cqe_sz) << QM_CQ_CQE_SIZE_SHIFT))
#define QM_MK_SQC_W13(priority, orders, alg_type) \
- (((priority) << QM_SQ_PRIORITY_SHIFT) | \
- ((orders) << QM_SQ_ORDERS_SHIFT) | \
+ (((priority) << QM_SQ_PRIORITY_SHIFT) | \
+ ((orders) << QM_SQ_ORDERS_SHIFT) | \
(((alg_type) & QM_SQ_TYPE_MASK) << QM_SQ_TYPE_SHIFT))
#define QM_MK_SQC_DW3_V1(hop_num, pg_sz, buf_sz, sqe_sz) \
- (((hop_num) << QM_SQ_HOP_NUM_SHIFT) | \
- ((pg_sz) << QM_SQ_PAGE_SIZE_SHIFT) | \
- ((buf_sz) << QM_SQ_BUF_SIZE_SHIFT) | \
+ (((hop_num) << QM_SQ_HOP_NUM_SHIFT) | \
+ ((pg_sz) << QM_SQ_PAGE_SIZE_SHIFT) | \
+ ((buf_sz) << QM_SQ_BUF_SIZE_SHIFT) | \
((u32)ilog2(sqe_sz) << QM_SQ_SQE_SIZE_SHIFT))
#define QM_MK_SQC_DW3_V2(sqe_sz, sq_depth) \
doorbell = qn | ((u64)cmd << QM_DB_CMD_SHIFT_V2) |
((u64)randata << QM_DB_RAND_SHIFT_V2) |
- ((u64)index << QM_DB_INDEX_SHIFT_V2) |
+ ((u64)index << QM_DB_INDEX_SHIFT_V2) |
((u64)priority << QM_DB_PRIORITY_SHIFT_V2);
writeq(doorbell, io_base);
}
}
-static bool do_qm_irq(struct hisi_qm *qm)
+static bool do_qm_eq_irq(struct hisi_qm *qm)
{
struct qm_eqe *eqe = qm->eqe + qm->status.eq_head;
struct hisi_qm_poll_data *poll_data;
return false;
}
-static irqreturn_t qm_irq(int irq, void *data)
+static irqreturn_t qm_eq_irq(int irq, void *data)
{
struct hisi_qm *qm = data;
bool ret;
- ret = do_qm_irq(qm);
+ ret = do_qm_eq_irq(qm);
if (ret)
return IRQ_HANDLED;
sqc_vft = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) << 32);
*base = QM_SQC_VFT_BASE_MASK_V2 & (sqc_vft >> QM_SQC_VFT_BASE_SHIFT_V2);
- *number = (QM_SQC_VFT_NUM_MASK_v2 &
+ *number = (QM_SQC_VFT_NUM_MASK_V2 &
(sqc_vft >> QM_SQC_VFT_NUM_SHIFT_V2)) + 1;
return 0;
* @qm: The qm we create a qp from.
* @alg_type: Accelerator specific algorithm type in sqc.
*
- * return created qp, -EBUSY if all qps in qm allocated, -ENOMEM if allocating
- * qp memory fails.
+ * Return created qp, negative error code if failed.
*/
static struct hisi_qp *hisi_qm_create_qp(struct hisi_qm *qm, u8 alg_type)
{
* @arg: Accelerator specific argument.
*
* After this function, qp can receive request from user. Return 0 if
- * successful, Return -EBUSY if failed.
+ * successful, negative error code if failed.
*/
int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
{
return -EINVAL;
}
- vma->vm_flags |= VM_IO;
+ vm_flags_set(vma, VM_IO);
return remap_pfn_range(vma, vma->vm_start,
phys_base >> PAGE_SHIFT,
return 0;
}
-
/**
* qm_clear_queues() - Clear all queues memory in a qm.
* @qm: The qm in which the queues will be cleared.
act_q_num = q_num;
}
- act_q_num = min_t(int, act_q_num, max_qp_num);
+ act_q_num = min(act_q_num, max_qp_num);
ret = hisi_qm_set_vft(qm, i, q_base, act_q_num);
if (ret) {
for (j = num_vfs; j > i; j--)
qos_val = ir / QM_QOS_RATE;
ret = scnprintf(tbuf, QM_DBG_READ_LEN, "%u\n", qos_val);
- ret = simple_read_from_buffer(buf, count, pos, tbuf, ret);
+ ret = simple_read_from_buffer(buf, count, pos, tbuf, ret);
err_get_status:
clear_bit(QM_RESETTING, &qm->misc_ctl);
if (!qm->err_status.is_dev_ecc_mbit &&
qm->err_status.is_qm_ecc_mbit &&
qm->err_ini->close_axi_master_ooo) {
-
qm->err_ini->close_axi_master_ooo(qm);
-
} else if (qm->err_status.is_dev_ecc_mbit &&
!qm->err_status.is_qm_ecc_mbit &&
!qm->err_ini->close_axi_master_ooo) {
-
nfe_enb = readl(qm->io_base + QM_RAS_NFE_ENABLE);
writel(nfe_enb & QM_RAS_NFE_MBIT_DISABLE,
qm->io_base + QM_RAS_NFE_ENABLE);
return IRQ_HANDLED;
}
-
/**
* hisi_qm_dev_shutdown() - Shutdown device.
* @pdev: The device will be shutdown.
return 0;
irq_vector = val & QM_IRQ_VECTOR_MASK;
- ret = request_irq(pci_irq_vector(pdev, irq_vector), qm_irq, 0, qm->dev_name, qm);
+ ret = request_irq(pci_irq_vector(pdev, irq_vector), qm_eq_irq, 0, qm->dev_name, qm);
if (ret)
dev_err(&pdev->dev, "failed to request eq irq, ret = %d", ret);
#include <drm/amdgpu_drm.h>
#include <drm/drm_drv.h>
#include <drm/drm_gem_ttm_helper.h>
+#include <drm/ttm/ttm_tt.h>
#include "amdgpu.h"
#include "amdgpu_display.h"
goto unlock;
}
- ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
- TTM_BO_VM_NUM_PREFAULT);
+ ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+ TTM_BO_VM_NUM_PREFAULT);
- drm_dev_exit(idx);
+ drm_dev_exit(idx);
} else {
ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
}
*/
if (is_cow_mapping(vma->vm_flags) &&
!(vma->vm_flags & VM_ACCESS_FLAGS))
- vma->vm_flags &= ~VM_MAYWRITE;
+ vm_flags_clear(vma, VM_MAYWRITE);
return drm_gem_ttm_mmap(obj, vma);
}
mutex_unlock(&p->svms.lock);
return -EADDRINUSE;
}
+
+ /* When register user buffer check if it has been registered by svm by
+ * buffer cpu virtual address.
+ */
+ if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) &&
+ interval_tree_iter_first(&p->svms.objects,
+ args->mmap_offset >> PAGE_SHIFT,
+ (args->mmap_offset + args->size - 1) >> PAGE_SHIFT)) {
+ pr_err("User Buffer Address: 0x%llx already allocated by SVM\n",
+ args->mmap_offset);
+ mutex_unlock(&p->svms.lock);
+ return -EADDRINUSE;
+ }
+
mutex_unlock(&p->svms.lock);
#endif
mutex_lock(&p->mutex);
}
/* Update the VRAM usage count */
- if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
- WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
+ if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+ uint64_t size = args->size;
+
+ if (flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM)
+ size >>= 1;
+ WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + PAGE_ALIGN(size));
+ }
mutex_unlock(&p->mutex);
address = dev->adev->rmmio_remap.bus_addr;
- vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
- VM_DONTDUMP | VM_PFNMAP;
+ vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
+ VM_DONTDUMP | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
* per-process XNACK mode selection. But let the dev->noretry
* setting still influence the default XNACK mode.
*/
- if (supported && KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2))
+ if (supported && KFD_SUPPORT_XNACK_PER_PROCESS(dev))
continue;
/* GFXv10 and later GPUs do not support shader preemption
int kfd_process_device_init_vm(struct kfd_process_device *pdd,
struct file *drm_file)
{
+ struct amdgpu_fpriv *drv_priv;
+ struct amdgpu_vm *avm;
struct kfd_process *p;
struct kfd_dev *dev;
int ret;
if (pdd->drm_priv)
return -EBUSY;
+ ret = amdgpu_file_to_fpriv(drm_file, &drv_priv);
+ if (ret)
+ return ret;
+ avm = &drv_priv->vm;
+
p = pdd->process;
dev = pdd->dev;
- ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, drm_file,
+ ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, avm,
&p->kgd_process_info,
&p->ef);
if (ret) {
if (ret)
goto err_init_cwsr;
- ret = amdgpu_amdkfd_gpuvm_set_vm_pasid(dev->adev, drm_file, p->pasid);
+ ret = amdgpu_amdkfd_gpuvm_set_vm_pasid(dev->adev, avm, p->pasid);
if (ret)
goto err_set_pasid;
kfd_process_device_destroy_ib_mem(pdd);
err_reserve_ib_mem:
pdd->drm_priv = NULL;
+ amdgpu_amdkfd_gpuvm_destroy_cb(dev->adev, avm);
return ret;
}
return -ENOMEM;
}
- vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND
- | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
+ vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND
+ | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP);
/* Mapping pages to user process */
return remap_pfn_range(vma, vma->vm_start,
PFN_DOWN(__pa(qpd->cwsr_kaddr)),
}
EXPORT_SYMBOL(drm_gem_private_object_init);
+/**
+ * drm_gem_private_object_fini - Finalize a failed drm_gem_object
+ * @obj: drm_gem_object
+ *
+ * Uninitialize an already allocated GEM object when it initialized failed
+ */
+void drm_gem_private_object_fini(struct drm_gem_object *obj)
+{
+ WARN_ON(obj->dma_buf);
+
+ dma_resv_fini(&obj->_resv);
+}
+EXPORT_SYMBOL(drm_gem_private_object_fini);
+
/**
* drm_gem_object_handle_free - release resources bound to userspace handles
* @obj: GEM object to clean up.
void
drm_gem_object_release(struct drm_gem_object *obj)
{
- WARN_ON(obj->dma_buf);
-
if (obj->filp)
fput(obj->filp);
- dma_resv_fini(&obj->_resv);
+ drm_gem_private_object_fini(obj);
+
drm_gem_free_mmap_offset(obj);
drm_gem_lru_remove(obj);
}
goto err_drm_gem_object_put;
}
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
}
dma_obj->dma_addr = sg_dma_address(sgt->sgl);
dma_obj->sgt = sgt;
- DRM_DEBUG_PRIME("dma_addr = %pad, size = %zu\n", &dma_obj->dma_addr,
- attach->dmabuf->size);
+ drm_dbg_prime(dev, "dma_addr = %pad, size = %zu\n", &dma_obj->dma_addr,
+ attach->dmabuf->size);
return &dma_obj->base;
}
* the whole buffer.
*/
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
- vma->vm_flags &= ~VM_PFNMAP;
- vma->vm_flags |= VM_DONTEXPAND;
+ vm_flags_mod(vma, VM_DONTEXPAND, VM_PFNMAP);
if (dma_obj->map_noncoherent) {
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
} else {
ret = drm_gem_object_init(dev, obj, size);
}
- if (ret)
+ if (ret) {
+ drm_gem_private_object_fini(obj);
goto err_free;
+ }
ret = drm_gem_create_mmap_offset(obj);
if (ret)
}
EXPORT_SYMBOL(drm_gem_shmem_vunmap);
-static struct drm_gem_shmem_object *
+static int
drm_gem_shmem_create_with_handle(struct drm_file *file_priv,
struct drm_device *dev, size_t size,
uint32_t *handle)
shmem = drm_gem_shmem_create(dev, size);
if (IS_ERR(shmem))
- return shmem;
+ return PTR_ERR(shmem);
/*
* Allocate an id of idr table where the obj is registered
ret = drm_gem_handle_create(file_priv, &shmem->base, handle);
/* drop reference from allocate - handle holds it now. */
drm_gem_object_put(&shmem->base);
- if (ret)
- return ERR_PTR(ret);
- return shmem;
+ return ret;
}
/* Update madvise status, returns true if not purged, else
struct drm_mode_create_dumb *args)
{
u32 min_pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
- struct drm_gem_shmem_object *shmem;
if (!args->pitch || !args->size) {
args->pitch = min_pitch;
args->size = PAGE_ALIGN(args->pitch * args->height);
}
- shmem = drm_gem_shmem_create_with_handle(file, dev, args->size, &args->handle);
-
- return PTR_ERR_OR_ZERO(shmem);
+ return drm_gem_shmem_create_with_handle(file, dev, args->size, &args->handle);
}
EXPORT_SYMBOL_GPL(drm_gem_shmem_dumb_create);
if (ret)
return ret;
- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
if (shmem->map_wc)
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
}
EXPORT_SYMBOL_GPL(drm_gem_shmem_get_sg_table);
-/**
- * drm_gem_shmem_get_pages_sgt - Pin pages, dma map them, and return a
- * scatter/gather table for a shmem GEM object.
- * @shmem: shmem GEM object
- *
- * This function returns a scatter/gather table suitable for driver usage. If
- * the sg table doesn't exist, the pages are pinned, dma-mapped, and a sg
- * table created.
- *
- * This is the main function for drivers to get at backing storage, and it hides
- * and difference between dma-buf imported and natively allocated objects.
- * drm_gem_shmem_get_sg_table() should not be directly called by drivers.
- *
- * Returns:
- * A pointer to the scatter/gather table of pinned pages or errno on failure.
- */
-struct sg_table *drm_gem_shmem_get_pages_sgt(struct drm_gem_shmem_object *shmem)
+static struct sg_table *drm_gem_shmem_get_pages_sgt_locked(struct drm_gem_shmem_object *shmem)
{
struct drm_gem_object *obj = &shmem->base;
int ret;
WARN_ON(obj->import_attach);
- ret = drm_gem_shmem_get_pages(shmem);
+ ret = drm_gem_shmem_get_pages_locked(shmem);
if (ret)
return ERR_PTR(ret);
sg_free_table(sgt);
kfree(sgt);
err_put_pages:
- drm_gem_shmem_put_pages(shmem);
+ drm_gem_shmem_put_pages_locked(shmem);
return ERR_PTR(ret);
}
-EXPORT_SYMBOL_GPL(drm_gem_shmem_get_pages_sgt);
+
+/**
+ * drm_gem_shmem_get_pages_sgt - Pin pages, dma map them, and return a
+ * scatter/gather table for a shmem GEM object.
+ * @shmem: shmem GEM object
+ *
+ * This function returns a scatter/gather table suitable for driver usage. If
+ * the sg table doesn't exist, the pages are pinned, dma-mapped, and a sg
+ * table created.
+ *
+ * This is the main function for drivers to get at backing storage, and it hides
+ * and difference between dma-buf imported and natively allocated objects.
+ * drm_gem_shmem_get_sg_table() should not be directly called by drivers.
+ *
+ * Returns:
+ * A pointer to the scatter/gather table of pinned pages or errno on failure.
+ */
+struct sg_table *drm_gem_shmem_get_pages_sgt(struct drm_gem_shmem_object *shmem)
+{
+ int ret;
+ struct sg_table *sgt;
+
+ ret = mutex_lock_interruptible(&shmem->pages_lock);
+ if (ret)
+ return ERR_PTR(ret);
+ sgt = drm_gem_shmem_get_pages_sgt_locked(shmem);
+ mutex_unlock(&shmem->pages_lock);
+
+ return sgt;
+}
+EXPORT_SYMBOL(drm_gem_shmem_get_pages_sgt);
/**
* drm_gem_shmem_prime_import_sg_table - Produce a shmem GEM object from
shmem->sgt = sgt;
- DRM_DEBUG_PRIME("size = %zu\n", size);
+ drm_dbg_prime(dev, "size = %zu\n", size);
return &shmem->base;
}
#include <drm/drm.h>
#include <drm/drm_crtc.h>
+#include <drm/drm_crtc_helper.h>
#include <drm/drm_fb_helper.h>
#include <drm/drm_fourcc.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_framebuffer_helper.h>
+#include <drm/drm_modeset_helper.h>
#include "framebuffer.h"
#include "gem.h"
*/
vma->vm_ops = &psbfb_vm_ops;
vma->vm_private_data = (void *)fb;
- vma->vm_flags |= VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}
info->screen_base = dev_priv->vram_addr + backing->offset;
info->screen_size = size;
- if (dev_priv->gtt.stolen_size) {
- info->apertures->ranges[0].base = dev_priv->fb_base;
- info->apertures->ranges[0].size = dev_priv->gtt.stolen_size;
- }
-
drm_fb_helper_fill_info(info, fb_helper, sizes);
info->fix.mmio_start = pci_resource_start(pdev, 0);
dev_priv->fb_helper = fb_helper;
- drm_fb_helper_prepare(dev, fb_helper, &psb_fb_helper_funcs);
+ drm_fb_helper_prepare(dev, fb_helper, 32, &psb_fb_helper_funcs);
ret = drm_fb_helper_init(dev, fb_helper);
if (ret)
/* disable all the possible outputs/crtcs before entering KMS mode */
drm_helper_disable_unused_functions(dev);
- ret = drm_fb_helper_initial_config(fb_helper, 32);
+ ret = drm_fb_helper_initial_config(fb_helper);
if (ret)
goto fini;
fini:
drm_fb_helper_fini(fb_helper);
free:
+ drm_fb_helper_unprepare(fb_helper);
kfree(fb_helper);
return ret;
}
return;
psb_fbdev_destroy(dev, dev_priv->fb_helper);
+ drm_fb_helper_unprepare(dev_priv->fb_helper);
kfree(dev_priv->fb_helper);
dev_priv->fb_helper = NULL;
}
/* Finally, remap it using the new GTT offset */
ret = remap_io_mapping(area,
area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT),
- (ggtt->gmadr.start + vma->node.start) >> PAGE_SHIFT,
+ (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT,
min_t(u64, vma->size, area->vm_end - area->vm_start),
&ggtt->iomap);
if (ret)
GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
out:
if (file)
- drm_vma_node_allow(&mmo->vma_node, file);
+ drm_vma_node_allow_once(&mmo->vma_node, file);
return mmo;
err:
i915_gem_object_put(obj);
return -EINVAL;
}
- vma->vm_flags &= ~VM_MAYWRITE;
+ vm_flags_clear(vma, VM_MAYWRITE);
}
anon = mmap_singleton(to_i915(dev));
return PTR_ERR(anon);
}
- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+ vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
/*
* We keep the ref on mmo->obj, not vm_file, but we require
static int mtk_drm_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
+static const struct vm_operations_struct vm_ops = {
+ .open = drm_gem_vm_open,
+ .close = drm_gem_vm_close,
+};
+
static const struct drm_gem_object_funcs mtk_drm_gem_object_funcs = {
.free = mtk_drm_gem_free_object,
.get_sg_table = mtk_gem_prime_get_sg_table,
.vmap = mtk_drm_gem_prime_vmap,
.vunmap = mtk_drm_gem_prime_vunmap,
.mmap = mtk_drm_gem_object_mmap,
- .vm_ops = &drm_gem_dma_vm_ops,
+ .vm_ops = &vm_ops,
};
static struct mtk_drm_gem_obj *mtk_drm_gem_init(struct drm_device *dev,
* dma_alloc_attrs() allocated a struct page table for mtk_gem, so clear
* VM_PFNMAP flag that was set by drm_gem_mmap_obj()/drm_gem_mmap().
*/
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
ret = dma_mmap_attrs(priv->dma_dev, vma, mtk_gem->cookie,
mtk_gem->dma_addr, obj->size, mtk_gem->dma_attrs);
- if (ret)
- drm_gem_vm_close(vma);
return ret;
}
return;
vunmap(vaddr);
- mtk_gem->kvaddr = 0;
+ mtk_gem->kvaddr = NULL;
kfree(mtk_gem->pages);
}
{
struct omap_gem_object *omap_obj = to_omap_bo(obj);
- vma->vm_flags &= ~VM_PFNMAP;
- vma->vm_flags |= VM_MIXEDMAP;
+ vm_flags_mod(vma, VM_MIXEDMAP, VM_PFNMAP);
if (omap_obj->flags & OMAP_BO_WC) {
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
}
/**
- * omap_gem_dumb_map - buffer mapping for dumb interface
+ * omap_gem_dumb_map_offset - create an offset for a dumb buffer
* @file: our drm client file
* @dev: drm device
* @handle: GEM handle to the object (from dumb_create)
#define pr_fmt(fmt) "[TTM] " fmt
-#include <drm/ttm/ttm_bo_driver.h>
+#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_placement.h>
-#include <drm/drm_vma_manager.h>
+#include <drm/ttm/ttm_tt.h>
+
#include <drm/drm_drv.h>
#include <drm/drm_managed.h>
-#include <linux/mm.h>
-#include <linux/pfn_t.h>
-#include <linux/rbtree.h>
-#include <linux/module.h>
-#include <linux/uaccess.h>
-#include <linux/mem_encrypt.h>
static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
struct vm_fault *vmf)
.access = ttm_bo_vm_access,
};
+/**
+ * ttm_bo_mmap_obj - mmap memory backed by a ttm buffer object.
+ *
+ * @vma: vma as input from the fbdev mmap method.
+ * @bo: The bo backing the address space.
+ *
+ * Maps a buffer object.
+ */
int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo)
{
/* Enforce no COW since would have really strange behavior with it. */
vma->vm_private_data = bo;
- vma->vm_flags |= VM_PFNMAP;
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}
EXPORT_SYMBOL(ttm_bo_mmap_obj);
ret = -EPERM;
goto done;
}
- vma->vm_flags &= ~VM_MAYWRITE;
+ vm_flags_clear(vma, VM_MAYWRITE);
addr = vma->vm_start;
for (i = 0 ; i < uctxt->egrbufs.numbufs; i++) {
memlen = uctxt->egrbufs.buffers[i].len;
goto done;
}
- vma->vm_flags = flags;
+ vm_flags_reset(vma, flags);
hfi1_cdbg(PROC,
"%u:%u type:%u io/vf:%d/%d, addr:0x%llx, len:%lu(%lu), flags:0x%lx\n",
ctxt, subctxt, type, mapio, vmf, memaddr, memlen,
addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
sizeof(tinfo.tidcnt)))
- return -EFAULT;
+ ret = -EFAULT;
addr = arg + offsetof(struct hfi1_tid_info, length);
- if (copy_to_user((void __user *)addr, &tinfo.length,
+ if (!ret && copy_to_user((void __user *)addr, &tinfo.length,
sizeof(tinfo.length)))
ret = -EFAULT;
+
+ if (ret)
+ hfi1_user_exp_rcv_invalid(fd, &tinfo);
}
return ret;
if (vma->vm_flags & (VM_WRITE | VM_EXEC))
return -EPERM;
- vma->vm_flags &= ~VM_MAYWRITE;
+ vm_flags_clear(vma, VM_MAYWRITE);
if (!dev->mdev->clock_info)
return -EOPNOTSUPP;
if (vma->vm_flags & VM_WRITE)
return -EPERM;
- vma->vm_flags &= ~VM_MAYWRITE;
+ vm_flags_clear(vma, VM_MAYWRITE);
/* Don't expose to user-space information it shouldn't have */
if (PAGE_SIZE > 4096)
}
}
-static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u32 port_num)
+static void mlx5_netdev_notifier_register(struct mlx5_roce *roce,
+ struct net_device *netdev)
{
int err;
- dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
- err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
- if (err) {
- dev->port[port_num].roce.nb.notifier_call = NULL;
- return err;
- }
+ if (roce->tracking_netdev)
+ return;
+ roce->tracking_netdev = netdev;
+ roce->nb.notifier_call = mlx5_netdev_event;
+ err = register_netdevice_notifier_dev_net(netdev, &roce->nb, &roce->nn);
+ WARN_ON(err);
+}
- return 0;
+static void mlx5_netdev_notifier_unregister(struct mlx5_roce *roce)
+{
+ if (!roce->tracking_netdev)
+ return;
+ unregister_netdevice_notifier_dev_net(roce->tracking_netdev, &roce->nb,
+ &roce->nn);
+ roce->tracking_netdev = NULL;
}
-static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u32 port_num)
+static int mlx5e_mdev_notifier_event(struct notifier_block *nb,
+ unsigned long event, void *data)
{
- if (dev->port[port_num].roce.nb.notifier_call) {
- unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
- dev->port[port_num].roce.nb.notifier_call = NULL;
+ struct mlx5_roce *roce = container_of(nb, struct mlx5_roce, mdev_nb);
+ struct net_device *netdev = data;
+
+ switch (event) {
+ case MLX5_DRIVER_EVENT_UPLINK_NETDEV:
+ if (netdev)
+ mlx5_netdev_notifier_register(roce, netdev);
+ else
+ mlx5_netdev_notifier_unregister(roce);
+ break;
+ default:
+ return NOTIFY_DONE;
}
+
+ return NOTIFY_OK;
+}
+
+static void mlx5_mdev_netdev_track(struct mlx5_ib_dev *dev, u32 port_num)
+{
+ struct mlx5_roce *roce = &dev->port[port_num].roce;
+
+ roce->mdev_nb.notifier_call = mlx5e_mdev_notifier_event;
+ mlx5_blocking_notifier_register(dev->mdev, &roce->mdev_nb);
+ mlx5_core_uplink_netdev_event_replay(dev->mdev);
+}
+
+static void mlx5_mdev_netdev_untrack(struct mlx5_ib_dev *dev, u32 port_num)
+{
+ struct mlx5_roce *roce = &dev->port[port_num].roce;
+
+ mlx5_blocking_notifier_unregister(dev->mdev, &roce->mdev_nb);
+ mlx5_netdev_notifier_unregister(roce);
}
static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
if (mpi->mdev_events.notifier_call)
mlx5_notifier_unregister(mpi->mdev, &mpi->mdev_events);
mpi->mdev_events.notifier_call = NULL;
- mlx5_remove_netdev_notifier(ibdev, port_num);
+ mlx5_mdev_netdev_untrack(ibdev, port_num);
spin_lock(&port->mp.mpi_lock);
comps = mpi->mdev_refcnt;
if (err)
goto unbind;
- err = mlx5_add_netdev_notifier(ibdev, port_num);
- if (err) {
- mlx5_ib_err(ibdev, "failed adding netdev notifier for port %u\n",
- port_num + 1);
- goto unbind;
- }
+ mlx5_mdev_netdev_track(ibdev, port_num);
mpi->mdev_events.notifier_call = mlx5_ib_event_slave_port;
mlx5_notifier_register(mpi->mdev, &mpi->mdev_events);
port_num = mlx5_core_native_port_num(dev->mdev) - 1;
/* Register only for native ports */
- err = mlx5_add_netdev_notifier(dev, port_num);
- if (err)
- return err;
+ mlx5_mdev_netdev_track(dev, port_num);
err = mlx5_enable_eth(dev);
if (err)
return 0;
cleanup:
- mlx5_remove_netdev_notifier(dev, port_num);
+ mlx5_mdev_netdev_untrack(dev, port_num);
return err;
}
mlx5_disable_eth(dev);
port_num = mlx5_core_native_port_num(dev->mdev) - 1;
- mlx5_remove_netdev_notifier(dev, port_num);
+ mlx5_mdev_netdev_untrack(dev, port_num);
}
}
/* protect against the workqueue changing the page list */
mutex_lock(&fbdefio->lock);
- /* first write in this cycle, notify the driver */
- if (fbdefio->first_io && list_empty(&fbdefio->pagereflist))
- fbdefio->first_io(info);
-
pageref = fb_deferred_io_pageref_get(info, offset, page);
if (WARN_ON_ONCE(!pageref)) {
ret = VM_FAULT_OOM;
int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
vma->vm_ops = &fb_deferred_io_vm_ops;
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
if (!(info->flags & FBINFO_VIRTFB))
- vma->vm_flags |= VM_IO;
+ vm_flags_set(vma, VM_IO);
vma->vm_private_data = info;
return 0;
}
}
EXPORT_SYMBOL_GPL(fb_deferred_io_open);
-void fb_deferred_io_cleanup(struct fb_info *info)
+void fb_deferred_io_release(struct fb_info *info)
{
struct fb_deferred_io *fbdefio = info->fbdefio;
struct page *page;
page = fb_deferred_io_page(info, i);
page->mapping = NULL;
}
+}
+EXPORT_SYMBOL_GPL(fb_deferred_io_release);
+
+void fb_deferred_io_cleanup(struct fb_info *info)
+{
+ struct fb_deferred_io *fbdefio = info->fbdefio;
+
+ fb_deferred_io_release(info);
kvfree(info->pagerefs);
mutex_destroy(&fbdefio->lock);
bool max_one_loop)
{
struct folio *folio;
- struct page *head_page;
+ struct folio_batch fbatch;
ssize_t ret;
+ unsigned int i;
int n, skips = 0;
_enter("%llx,%llx,", start, end);
+ folio_batch_init(&fbatch);
do {
pgoff_t index = start / PAGE_SIZE;
- n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
- PAGECACHE_TAG_DIRTY, 1, &head_page);
+ n = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
+ PAGECACHE_TAG_DIRTY, &fbatch);
+
if (!n)
break;
+ for (i = 0; i < n; i++) {
+ folio = fbatch.folios[i];
+ start = folio_pos(folio); /* May regress with THPs */
- folio = page_folio(head_page);
- start = folio_pos(folio); /* May regress with THPs */
-
- _debug("wback %lx", folio_index(folio));
+ _debug("wback %lx", folio_index(folio));
- /* At this point we hold neither the i_pages lock nor the
- * page lock: the page may be truncated or invalidated
- * (changing page->mapping to NULL), or even swizzled
- * back from swapper_space to tmpfs file mapping
- */
- if (wbc->sync_mode != WB_SYNC_NONE) {
- ret = folio_lock_killable(folio);
- if (ret < 0) {
- folio_put(folio);
- return ret;
- }
- } else {
- if (!folio_trylock(folio)) {
- folio_put(folio);
- return 0;
+ /* At this point we hold neither the i_pages lock nor the
+ * page lock: the page may be truncated or invalidated
+ * (changing page->mapping to NULL), or even swizzled
+ * back from swapper_space to tmpfs file mapping
+ */
+ if (wbc->sync_mode != WB_SYNC_NONE) {
+ ret = folio_lock_killable(folio);
+ if (ret < 0) {
+ folio_batch_release(&fbatch);
+ return ret;
+ }
+ } else {
+ if (!folio_trylock(folio))
+ continue;
}
- }
- if (folio_mapping(folio) != mapping ||
- !folio_test_dirty(folio)) {
- start += folio_size(folio);
- folio_unlock(folio);
- folio_put(folio);
- continue;
- }
+ if (folio->mapping != mapping ||
+ !folio_test_dirty(folio)) {
+ start += folio_size(folio);
+ folio_unlock(folio);
+ continue;
+ }
- if (folio_test_writeback(folio) ||
- folio_test_fscache(folio)) {
- folio_unlock(folio);
- if (wbc->sync_mode != WB_SYNC_NONE) {
- folio_wait_writeback(folio);
+ if (folio_test_writeback(folio) ||
+ folio_test_fscache(folio)) {
+ folio_unlock(folio);
+ if (wbc->sync_mode != WB_SYNC_NONE) {
+ folio_wait_writeback(folio);
#ifdef CONFIG_AFS_FSCACHE
- folio_wait_fscache(folio);
+ folio_wait_fscache(folio);
#endif
- } else {
- start += folio_size(folio);
+ } else {
+ start += folio_size(folio);
+ }
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched()) {
+ *_next = start;
+ _leave(" = 0 [%llx]", *_next);
+ return 0;
+ }
+ skips++;
+ }
+ continue;
}
- folio_put(folio);
- if (wbc->sync_mode == WB_SYNC_NONE) {
- if (skips >= 5 || need_resched())
- break;
- skips++;
+
+ if (!folio_clear_dirty_for_io(folio))
+ BUG();
+ ret = afs_write_back_from_locked_folio(mapping, wbc,
+ folio, start, end);
+ if (ret < 0) {
+ _leave(" = %zd", ret);
+ folio_batch_release(&fbatch);
+ return ret;
}
- continue;
- }
- if (!folio_clear_dirty_for_io(folio))
- BUG();
- ret = afs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
- folio_put(folio);
- if (ret < 0) {
- _leave(" = %zd", ret);
- return ret;
+ start += ret;
}
- start += ret;
-
- if (max_one_loop)
- break;
-
+ folio_batch_release(&fbatch);
cond_resched();
} while (wbc->nr_to_write > 0);
{
struct afs_vnode *vnode = AFS_FS_I(folio_inode(folio));
struct iov_iter iter;
- struct bio_vec bv[1];
+ struct bio_vec bv;
unsigned long priv;
unsigned int f, t;
int ret = 0;
t = afs_folio_dirty_to(folio, priv);
}
- bv[0].bv_page = &folio->page;
- bv[0].bv_offset = f;
- bv[0].bv_len = t - f;
- iov_iter_bvec(&iter, ITER_SOURCE, bv, 1, bv[0].bv_len);
+ bvec_set_folio(&bv, folio, t - f, f);
+ iov_iter_bvec(&iter, ITER_SOURCE, &bv, 1, bv.bv_len);
trace_afs_folio_dirty(vnode, tracepoint_string("launder"), folio);
ret = afs_store_data(vnode, &iter, folio_pos(folio) + f, true);
#include "file.h"
#include "dev-replace.h"
#include "super.h"
+#include "transaction.h"
static struct kmem_cache *extent_buffer_cache;
struct bio *bio;
int mirror_num;
enum btrfs_compression_type compress_type;
- u32 len_to_stripe_boundary;
u32 len_to_oe_boundary;
btrfs_bio_end_io_t end_io_func;
{
struct bio *bio;
struct bio_vec *bv;
- struct btrfs_inode *inode;
+ struct inode *inode;
int mirror_num;
if (!bio_ctrl->bio)
bio = bio_ctrl->bio;
bv = bio_first_bvec_all(bio);
- inode = BTRFS_I(bv->bv_page->mapping->host);
+ inode = bv->bv_page->mapping->host;
mirror_num = bio_ctrl->mirror_num;
/* Caller should ensure the bio has at least some range added */
ASSERT(bio->bi_iter.bi_size);
- btrfs_bio(bio)->file_offset = page_offset(bv->bv_page) + bv->bv_offset;
-
- if (!is_data_inode(&inode->vfs_inode)) {
+ if (!is_data_inode(inode)) {
if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
/*
* For metadata read, we should have the parent_check,
bio_ctrl->parent_check,
sizeof(struct btrfs_tree_parent_check));
}
- btrfs_submit_metadata_bio(inode, bio, mirror_num);
- } else if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
- btrfs_submit_data_write_bio(inode, bio, mirror_num);
- } else {
- btrfs_submit_data_read_bio(inode, bio, mirror_num,
- bio_ctrl->compress_type);
+ bio->bi_opf |= REQ_META;
}
+ if (btrfs_op(bio) == BTRFS_MAP_READ &&
+ bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
+ btrfs_submit_compressed_read(inode, bio, mirror_num);
+ else
+ btrfs_submit_bio(bio, mirror_num);
+
/* The bio is owned by the end_io handler now */
bio_ctrl->bio = NULL;
}
start, end, page_ops, NULL);
}
-static int insert_failrec(struct btrfs_inode *inode,
- struct io_failure_record *failrec)
-{
- struct rb_node *exist;
-
- spin_lock(&inode->io_failure_lock);
- exist = rb_simple_insert(&inode->io_failure_tree, failrec->bytenr,
- &failrec->rb_node);
- spin_unlock(&inode->io_failure_lock);
-
- return (exist == NULL) ? 0 : -EEXIST;
-}
-
-static struct io_failure_record *get_failrec(struct btrfs_inode *inode, u64 start)
-{
- struct rb_node *node;
- struct io_failure_record *failrec = ERR_PTR(-ENOENT);
-
- spin_lock(&inode->io_failure_lock);
- node = rb_simple_search(&inode->io_failure_tree, start);
- if (node)
- failrec = rb_entry(node, struct io_failure_record, rb_node);
- spin_unlock(&inode->io_failure_lock);
- return failrec;
-}
-
-static void free_io_failure(struct btrfs_inode *inode,
- struct io_failure_record *rec)
-{
- spin_lock(&inode->io_failure_lock);
- rb_erase(&rec->rb_node, &inode->io_failure_tree);
- spin_unlock(&inode->io_failure_lock);
-
- kfree(rec);
-}
-
-static int next_mirror(const struct io_failure_record *failrec, int cur_mirror)
-{
- if (cur_mirror == failrec->num_copies)
- return cur_mirror + 1 - failrec->num_copies;
- return cur_mirror + 1;
-}
-
-static int prev_mirror(const struct io_failure_record *failrec, int cur_mirror)
-{
- if (cur_mirror == 1)
- return failrec->num_copies;
- return cur_mirror - 1;
-}
-
-/*
- * each time an IO finishes, we do a fast check in the IO failure tree
- * to see if we need to process or clean up an io_failure_record
- */
-int btrfs_clean_io_failure(struct btrfs_inode *inode, u64 start,
- struct page *page, unsigned int pg_offset)
-{
- struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct extent_io_tree *io_tree = &inode->io_tree;
- u64 ino = btrfs_ino(inode);
- u64 locked_start, locked_end;
- struct io_failure_record *failrec;
- int mirror;
- int ret;
-
- failrec = get_failrec(inode, start);
- if (IS_ERR(failrec))
- return 0;
-
- BUG_ON(!failrec->this_mirror);
-
- if (sb_rdonly(fs_info->sb))
- goto out;
-
- ret = find_first_extent_bit(io_tree, failrec->bytenr, &locked_start,
- &locked_end, EXTENT_LOCKED, NULL);
- if (ret || locked_start > failrec->bytenr ||
- locked_end < failrec->bytenr + failrec->len - 1)
- goto out;
-
- mirror = failrec->this_mirror;
- do {
- mirror = prev_mirror(failrec, mirror);
- btrfs_repair_io_failure(fs_info, ino, start, failrec->len,
- failrec->logical, page, pg_offset, mirror);
- } while (mirror != failrec->failed_mirror);
-
-out:
- free_io_failure(inode, failrec);
- return 0;
-}
-
-/*
- * Can be called when
- * - hold extent lock
- * - under ordered extent
- * - the inode is freeing
- */
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
-{
- struct io_failure_record *failrec;
- struct rb_node *node, *next;
-
- if (RB_EMPTY_ROOT(&inode->io_failure_tree))
- return;
-
- spin_lock(&inode->io_failure_lock);
- node = rb_simple_search_first(&inode->io_failure_tree, start);
- while (node) {
- failrec = rb_entry(node, struct io_failure_record, rb_node);
- if (failrec->bytenr > end)
- break;
-
- next = rb_next(node);
- rb_erase(&failrec->rb_node, &inode->io_failure_tree);
- kfree(failrec);
-
- node = next;
- }
- spin_unlock(&inode->io_failure_lock);
-}
-
-static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode,
- struct btrfs_bio *bbio,
- unsigned int bio_offset)
-{
- struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
- u64 start = bbio->file_offset + bio_offset;
- struct io_failure_record *failrec;
- const u32 sectorsize = fs_info->sectorsize;
- int ret;
-
- failrec = get_failrec(BTRFS_I(inode), start);
- if (!IS_ERR(failrec)) {
- btrfs_debug(fs_info,
- "Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu",
- failrec->logical, failrec->bytenr, failrec->len);
- /*
- * when data can be on disk more than twice, add to failrec here
- * (e.g. with a list for failed_mirror) to make
- * clean_io_failure() clean all those errors at once.
- */
- ASSERT(failrec->this_mirror == bbio->mirror_num);
- ASSERT(failrec->len == fs_info->sectorsize);
- return failrec;
- }
-
- failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
- if (!failrec)
- return ERR_PTR(-ENOMEM);
-
- RB_CLEAR_NODE(&failrec->rb_node);
- failrec->bytenr = start;
- failrec->len = sectorsize;
- failrec->failed_mirror = bbio->mirror_num;
- failrec->this_mirror = bbio->mirror_num;
- failrec->logical = (bbio->iter.bi_sector << SECTOR_SHIFT) + bio_offset;
-
- btrfs_debug(fs_info,
- "new io failure record logical %llu start %llu",
- failrec->logical, start);
-
- failrec->num_copies = btrfs_num_copies(fs_info, failrec->logical, sectorsize);
- if (failrec->num_copies == 1) {
- /*
- * We only have a single copy of the data, so don't bother with
- * all the retry and error correction code that follows. No
- * matter what the error is, it is very likely to persist.
- */
- btrfs_debug(fs_info,
- "cannot repair logical %llu num_copies %d",
- failrec->logical, failrec->num_copies);
- kfree(failrec);
- return ERR_PTR(-EIO);
- }
-
- /* Set the bits in the private failure tree */
- ret = insert_failrec(BTRFS_I(inode), failrec);
- if (ret) {
- kfree(failrec);
- return ERR_PTR(ret);
- }
-
- return failrec;
-}
-
-int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_bbio,
- u32 bio_offset, struct page *page, unsigned int pgoff,
- bool submit_buffered)
-{
- u64 start = failed_bbio->file_offset + bio_offset;
- struct io_failure_record *failrec;
- struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct bio *failed_bio = &failed_bbio->bio;
- const int icsum = bio_offset >> fs_info->sectorsize_bits;
- struct bio *repair_bio;
- struct btrfs_bio *repair_bbio;
-
- btrfs_debug(fs_info,
- "repair read error: read error at %llu", start);
-
- BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-
- failrec = btrfs_get_io_failure_record(&inode->vfs_inode, failed_bbio, bio_offset);
- if (IS_ERR(failrec))
- return PTR_ERR(failrec);
-
- /*
- * There are two premises:
- * a) deliver good data to the caller
- * b) correct the bad sectors on disk
- *
- * Since we're only doing repair for one sector, we only need to get
- * a good copy of the failed sector and if we succeed, we have setup
- * everything for btrfs_repair_io_failure to do the rest for us.
- */
- failrec->this_mirror = next_mirror(failrec, failrec->this_mirror);
- if (failrec->this_mirror == failrec->failed_mirror) {
- btrfs_debug(fs_info,
- "failed to repair num_copies %d this_mirror %d failed_mirror %d",
- failrec->num_copies, failrec->this_mirror, failrec->failed_mirror);
- free_io_failure(inode, failrec);
- return -EIO;
- }
-
- repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->end_io,
- failed_bbio->private);
- repair_bbio = btrfs_bio(repair_bio);
- repair_bbio->file_offset = start;
- repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-
- if (failed_bbio->csum) {
- const u32 csum_size = fs_info->csum_size;
-
- repair_bbio->csum = repair_bbio->csum_inline;
- memcpy(repair_bbio->csum,
- failed_bbio->csum + csum_size * icsum, csum_size);
- }
-
- bio_add_page(repair_bio, page, failrec->len, pgoff);
- repair_bbio->iter = repair_bio->bi_iter;
-
- btrfs_debug(fs_info,
- "repair read error: submitting new read to mirror %d",
- failrec->this_mirror);
-
- /*
- * At this point we have a bio, so any errors from bio submission will
- * be handled by the endio on the repair_bio, so we can't return an
- * error here.
- */
- if (submit_buffered)
- btrfs_submit_data_read_bio(inode, repair_bio,
- failrec->this_mirror, 0);
- else
- btrfs_submit_dio_repair_bio(inode, repair_bio, failrec->this_mirror);
-
- return BLK_STS_OK;
-}
-
static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
{
struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
btrfs_subpage_end_reader(fs_info, page, start, len);
}
-static void end_sector_io(struct page *page, u64 offset, bool uptodate)
-{
- struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
- const u32 sectorsize = inode->root->fs_info->sectorsize;
-
- end_page_read(page, uptodate, offset, sectorsize);
- unlock_extent(&inode->io_tree, offset, offset + sectorsize - 1, NULL);
-}
-
-static void submit_data_read_repair(struct inode *inode,
- struct btrfs_bio *failed_bbio,
- u32 bio_offset, const struct bio_vec *bvec,
- unsigned int error_bitmap)
-{
- const unsigned int pgoff = bvec->bv_offset;
- struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
- struct page *page = bvec->bv_page;
- const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
- const u64 end = start + bvec->bv_len - 1;
- const u32 sectorsize = fs_info->sectorsize;
- const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
- int i;
-
- BUG_ON(bio_op(&failed_bbio->bio) == REQ_OP_WRITE);
-
- /* This repair is only for data */
- ASSERT(is_data_inode(inode));
-
- /* We're here because we had some read errors or csum mismatch */
- ASSERT(error_bitmap);
-
- /*
- * We only get called on buffered IO, thus page must be mapped and bio
- * must not be cloned.
- */
- ASSERT(page->mapping && !bio_flagged(&failed_bbio->bio, BIO_CLONED));
-
- /* Iterate through all the sectors in the range */
- for (i = 0; i < nr_bits; i++) {
- const unsigned int offset = i * sectorsize;
- bool uptodate = false;
- int ret;
-
- if (!(error_bitmap & (1U << i))) {
- /*
- * This sector has no error, just end the page read
- * and unlock the range.
- */
- uptodate = true;
- goto next;
- }
-
- ret = btrfs_repair_one_sector(BTRFS_I(inode), failed_bbio,
- bio_offset + offset, page, pgoff + offset,
- true);
- if (!ret) {
- /*
- * We have submitted the read repair, the page release
- * will be handled by the endio function of the
- * submitted repair bio.
- * Thus we don't need to do any thing here.
- */
- continue;
- }
- /*
- * Continue on failed repair, otherwise the remaining sectors
- * will not be properly unlocked.
- */
-next:
- end_sector_io(page, start + offset, uptodate);
- }
-}
-
/* lots and lots of room for performance fixes in the end_bio funcs */
void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
u64 start;
u64 end;
struct bvec_iter_all iter_all;
- bool first_bvec = true;
ASSERT(!bio_flagged(bio, BIO_CLONED));
bio_for_each_segment_all(bvec, bio, iter_all) {
start = page_offset(page) + bvec->bv_offset;
end = start + bvec->bv_len - 1;
- if (first_bvec) {
- btrfs_record_physical_zoned(inode, start, bio);
- first_bvec = false;
- }
-
end_extent_writepage(page, error, start, end);
btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
const u32 sectorsize = fs_info->sectorsize;
- unsigned int error_bitmap = (unsigned int)-1;
- bool repair = false;
u64 start;
u64 end;
u32 len;
len = bvec->bv_len;
mirror = bbio->mirror_num;
- if (likely(uptodate)) {
- if (is_data_inode(inode)) {
- error_bitmap = btrfs_verify_data_csum(bbio,
- bio_offset, page, start, end);
- if (error_bitmap)
- uptodate = false;
- } else {
- if (btrfs_validate_metadata_buffer(bbio,
- page, start, end, mirror))
- uptodate = false;
- }
- }
+ if (uptodate && !is_data_inode(inode) &&
+ btrfs_validate_metadata_buffer(bbio, page, start, end, mirror))
+ uptodate = false;
if (likely(uptodate)) {
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_SHIFT;
- btrfs_clean_io_failure(BTRFS_I(inode), start, page, 0);
-
/*
* Zero out the remaining part if this range straddles
* i_size.
zero_user_segment(page, zero_start,
offset_in_page(end) + 1);
}
- } else if (is_data_inode(inode)) {
- /*
- * Only try to repair bios that actually made it to a
- * device. If the bio failed to be submitted mirror
- * is 0 and we need to fail it without retrying.
- *
- * This also includes the high level bios for compressed
- * extents - these never make it to a device and repair
- * is already handled on the lower compressed bio.
- */
- if (mirror > 0)
- repair = true;
- } else {
+ } else if (!is_data_inode(inode)) {
struct extent_buffer *eb;
eb = find_extent_buffer_readpage(fs_info, page, start);
atomic_dec(&eb->io_pages);
}
- if (repair) {
- /*
- * submit_data_read_repair() will handle all the good
- * and bad sectors, we just continue to the next bvec.
- */
- submit_data_read_repair(inode, bbio, bio_offset, bvec,
- error_bitmap);
- } else {
- /* Update page status and unlock */
- end_page_read(page, uptodate, start, len);
- endio_readpage_release_extent(&processed, BTRFS_I(inode),
- start, end, PageUptodate(page));
- }
+ /* Update page status and unlock. */
+ end_page_read(page, uptodate, start, len);
+ endio_readpage_release_extent(&processed, BTRFS_I(inode),
+ start, end, PageUptodate(page));
ASSERT(bio_offset + len > bio_offset);
bio_offset += len;
}
/* Release the last extent */
endio_readpage_release_extent(&processed, NULL, 0, 0, false);
- btrfs_bio_free_csum(bbio);
bio_put(bio);
}
u32 real_size;
const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
bool contig = false;
- int ret;
ASSERT(bio);
/* The limit should be calculated when bio_ctrl->bio is allocated */
- ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary);
+ ASSERT(bio_ctrl->len_to_oe_boundary);
if (bio_ctrl->compress_type != compress_type)
return 0;
if (!contig)
return 0;
- real_size = min(bio_ctrl->len_to_oe_boundary,
- bio_ctrl->len_to_stripe_boundary) - bio_size;
- real_size = min(real_size, size);
+ real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size);
/*
* If real_size is 0, never call bio_add_*_page(), as even size is 0,
if (real_size == 0)
return 0;
- if (bio_op(bio) == REQ_OP_ZONE_APPEND)
- ret = bio_add_zone_append_page(bio, page, real_size, pg_offset);
- else
- ret = bio_add_page(bio, page, real_size, pg_offset);
-
- return ret;
+ return bio_add_page(bio, page, real_size, pg_offset);
}
-static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
- struct btrfs_inode *inode, u64 file_offset)
+static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
+ struct btrfs_inode *inode, u64 file_offset)
{
- struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct btrfs_io_geometry geom;
struct btrfs_ordered_extent *ordered;
- struct extent_map *em;
- u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
- int ret;
/*
- * Pages for compressed extent are never submitted to disk directly,
- * thus it has no real boundary, just set them to U32_MAX.
- *
- * The split happens for real compressed bio, which happens in
- * btrfs_submit_compressed_read/write().
+ * Limit the extent to the ordered boundary for Zone Append.
+ * Compressed bios aren't submitted directly, so it doesn't apply to
+ * them.
*/
- if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) {
- bio_ctrl->len_to_oe_boundary = U32_MAX;
- bio_ctrl->len_to_stripe_boundary = U32_MAX;
- return 0;
- }
- em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
- if (IS_ERR(em))
- return PTR_ERR(em);
- ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
- logical, &geom);
- free_extent_map(em);
- if (ret < 0) {
- return ret;
- }
- if (geom.len > U32_MAX)
- bio_ctrl->len_to_stripe_boundary = U32_MAX;
- else
- bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
-
- if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
- bio_ctrl->len_to_oe_boundary = U32_MAX;
- return 0;
- }
-
- /* Ordered extent not yet created, so we're good */
- ordered = btrfs_lookup_ordered_extent(inode, file_offset);
- if (!ordered) {
- bio_ctrl->len_to_oe_boundary = U32_MAX;
- return 0;
+ if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE &&
+ btrfs_use_zone_append(btrfs_bio(bio_ctrl->bio))) {
+ ordered = btrfs_lookup_ordered_extent(inode, file_offset);
+ if (ordered) {
+ bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
+ ordered->file_offset +
+ ordered->disk_num_bytes - file_offset);
+ btrfs_put_ordered_extent(ordered);
+ return;
+ }
}
- bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
- ordered->disk_bytenr + ordered->disk_num_bytes - logical);
- btrfs_put_ordered_extent(ordered);
- return 0;
+ bio_ctrl->len_to_oe_boundary = U32_MAX;
}
-static int alloc_new_bio(struct btrfs_inode *inode,
- struct btrfs_bio_ctrl *bio_ctrl,
- struct writeback_control *wbc,
- blk_opf_t opf,
- u64 disk_bytenr, u32 offset, u64 file_offset,
- enum btrfs_compression_type compress_type)
+static void alloc_new_bio(struct btrfs_inode *inode,
+ struct btrfs_bio_ctrl *bio_ctrl,
+ struct writeback_control *wbc, blk_opf_t opf,
+ u64 disk_bytenr, u32 offset, u64 file_offset,
+ enum btrfs_compression_type compress_type)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct bio *bio;
- int ret;
- ASSERT(bio_ctrl->end_io_func);
-
- bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, bio_ctrl->end_io_func, NULL);
+ bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, inode, bio_ctrl->end_io_func,
+ NULL);
/*
* For compressed page range, its disk_bytenr is always @disk_bytenr
* passed in, no matter if we have added any range into previous bio.
bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
else
bio->bi_iter.bi_sector = (disk_bytenr + offset) >> SECTOR_SHIFT;
+ btrfs_bio(bio)->file_offset = file_offset;
bio_ctrl->bio = bio;
bio_ctrl->compress_type = compress_type;
- ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
- if (ret < 0)
- goto error;
+ calc_bio_boundaries(bio_ctrl, inode, file_offset);
if (wbc) {
/*
- * For Zone append we need the correct block_device that we are
- * going to write to set in the bio to be able to respect the
- * hardware limitation. Look it up here:
+ * Pick the last added device to support cgroup writeback. For
+ * multi-device file systems this means blk-cgroup policies have
+ * to always be set on the last added/replaced device.
+ * This is a bit odd but has been like that for a long time.
*/
- if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
- struct btrfs_device *dev;
-
- dev = btrfs_zoned_get_device(fs_info, disk_bytenr,
- fs_info->sectorsize);
- if (IS_ERR(dev)) {
- ret = PTR_ERR(dev);
- goto error;
- }
-
- bio_set_dev(bio, dev->bdev);
- } else {
- /*
- * Otherwise pick the last added device to support
- * cgroup writeback. For multi-device file systems this
- * means blk-cgroup policies have to always be set on the
- * last added/replaced device. This is a bit odd but has
- * been like that for a long time.
- */
- bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
- }
+ bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
wbc_init_bio(wbc, bio);
- } else {
- ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND);
}
- return 0;
-error:
- bio_ctrl->bio = NULL;
- btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
- return ret;
}
/*
enum btrfs_compression_type compress_type,
bool force_bio_submit)
{
- int ret = 0;
struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
unsigned int cur = pg_offset;
/* Allocate new bio if needed */
if (!bio_ctrl->bio) {
- ret = alloc_new_bio(inode, bio_ctrl, wbc, opf,
- disk_bytenr, offset,
- page_offset(page) + cur,
- compress_type);
- if (ret < 0)
- return ret;
+ alloc_new_bio(inode, bio_ctrl, wbc, opf, disk_bytenr,
+ offset, page_offset(page) + cur,
+ compress_type);
}
/*
* We must go through btrfs_bio_add_page() to ensure each
* find_next_dirty_byte() are all exclusive
*/
iosize = min(min(em_end, end + 1), dirty_range_end) - cur;
-
- if (btrfs_use_zone_append(inode, em->block_start))
- op = REQ_OP_ZONE_APPEND;
-
free_extent_map(em);
em = NULL;
*/
mapping_set_error(page->mapping, -EIO);
- /*
- * If we error out, we should add back the dirty_metadata_bytes
- * to make it consistent.
- */
- percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
- eb->len, fs_info->dirty_metadata_batch);
-
/*
* If writeback for a btree extent that doesn't belong to a log tree
* failed, increment the counter transaction->eb_write_errors.
int ret = 0;
int done = 0;
int nr_to_write_done = 0;
- struct pagevec pvec;
- int nr_pages;
+ struct folio_batch fbatch;
+ unsigned int nr_folios;
pgoff_t index;
pgoff_t end; /* Inclusive */
int scanned = 0;
xa_mark_t tag;
- pagevec_init(&pvec);
+ folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
index = mapping->writeback_index; /* Start from prev offset */
end = -1;
if (wbc->sync_mode == WB_SYNC_ALL)
tag_pages_for_writeback(mapping, index, end);
while (!done && !nr_to_write_done && (index <= end) &&
- (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
- tag))) {
+ (nr_folios = filemap_get_folios_tag(mapping, &index, end,
+ tag, &fbatch))) {
unsigned i;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
+ for (i = 0; i < nr_folios; i++) {
+ struct folio *folio = fbatch.folios[i];
- ret = submit_eb_page(page, wbc, &bio_ctrl, &eb_context);
+ ret = submit_eb_page(&folio->page, wbc, &bio_ctrl,
+ &eb_context);
if (ret == 0)
continue;
if (ret < 0) {
*/
nr_to_write_done = wbc->nr_to_write <= 0;
}
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
cond_resched();
}
if (!scanned && !done) {
int ret = 0;
int done = 0;
int nr_to_write_done = 0;
- struct pagevec pvec;
- int nr_pages;
+ struct folio_batch fbatch;
+ unsigned int nr_folios;
pgoff_t index;
pgoff_t end; /* Inclusive */
pgoff_t done_index;
if (!igrab(inode))
return 0;
- pagevec_init(&pvec);
+ folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
index = mapping->writeback_index; /* Start from prev offset */
end = -1;
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !nr_to_write_done && (index <= end) &&
- (nr_pages = pagevec_lookup_range_tag(&pvec, mapping,
- &index, end, tag))) {
+ (nr_folios = filemap_get_folios_tag(mapping, &index,
+ end, tag, &fbatch))) {
unsigned i;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
+ for (i = 0; i < nr_folios; i++) {
+ struct folio *folio = fbatch.folios[i];
- done_index = page->index + 1;
+ done_index = folio->index + folio_nr_pages(folio);
/*
* At this point we hold neither the i_pages lock nor
* the page lock: the page may be truncated or
* or even swizzled back from swapper_space to
* tmpfs file mapping
*/
- if (!trylock_page(page)) {
+ if (!folio_trylock(folio)) {
submit_write_bio(bio_ctrl, 0);
- lock_page(page);
+ folio_lock(folio);
}
- if (unlikely(page->mapping != mapping)) {
- unlock_page(page);
+ if (unlikely(folio->mapping != mapping)) {
+ folio_unlock(folio);
continue;
}
if (wbc->sync_mode != WB_SYNC_NONE) {
- if (PageWriteback(page))
+ if (folio_test_writeback(folio))
submit_write_bio(bio_ctrl, 0);
- wait_on_page_writeback(page);
+ folio_wait_writeback(folio);
}
- if (PageWriteback(page) ||
- !clear_page_dirty_for_io(page)) {
- unlock_page(page);
+ if (folio_test_writeback(folio) ||
+ !folio_clear_dirty_for_io(folio)) {
+ folio_unlock(folio);
continue;
}
- ret = __extent_writepage(page, wbc, bio_ctrl);
+ ret = __extent_writepage(&folio->page, wbc, bio_ctrl);
if (ret < 0) {
done = 1;
break;
*/
nr_to_write_done = wbc->nr_to_write <= 0;
}
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
cond_resched();
}
if (!scanned && !done) {
lockend = round_up(start + len, inode->root->fs_info->sectorsize);
prev_extent_end = lockstart;
+ btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end);
out_unlock:
unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
+ btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
out:
free_extent_state(delalloc_cached_state);
btrfs_free_backref_share_ctx(backref_ctx);
WARN_ON(atomic_read(&eb->refs) == 0);
}
-void clear_extent_buffer_dirty(const struct extent_buffer *eb)
+void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
+ struct extent_buffer *eb)
{
+ struct btrfs_fs_info *fs_info = eb->fs_info;
int i;
int num_pages;
struct page *page;
+ btrfs_assert_tree_write_locked(eb);
+
+ if (trans && btrfs_header_generation(eb) != trans->transid)
+ return;
+
+ if (!test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags))
+ return;
+
+ percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, -eb->len,
+ fs_info->dirty_metadata_batch);
+
if (eb->fs_info->nodesize < PAGE_SIZE)
return clear_subpage_extent_buffer_dirty(eb);
#include <linux/sched/mm.h>
#include <trace/events/block.h>
#include <linux/fscrypt.h>
+#include <linux/fsverity.h>
#include "internal.h"
inline void touch_buffer(struct buffer_head *bh)
{
trace_block_touch_buffer(bh);
- mark_page_accessed(bh->b_page);
+ folio_mark_accessed(bh->b_folio);
}
EXPORT_SYMBOL(touch_buffer);
unsigned long flags;
struct buffer_head *first;
struct buffer_head *tmp;
- struct page *page;
- int page_uptodate = 1;
+ struct folio *folio;
+ int folio_uptodate = 1;
BUG_ON(!buffer_async_read(bh));
- page = bh->b_page;
+ folio = bh->b_folio;
if (uptodate) {
set_buffer_uptodate(bh);
} else {
clear_buffer_uptodate(bh);
buffer_io_error(bh, ", async page read");
- SetPageError(page);
+ folio_set_error(folio);
}
/*
* two buffer heads end IO at almost the same time and both
* decide that the page is now completely done.
*/
- first = page_buffers(page);
+ first = folio_buffers(folio);
spin_lock_irqsave(&first->b_uptodate_lock, flags);
clear_buffer_async_read(bh);
unlock_buffer(bh);
tmp = bh;
do {
if (!buffer_uptodate(tmp))
- page_uptodate = 0;
+ folio_uptodate = 0;
if (buffer_async_read(tmp)) {
BUG_ON(!buffer_locked(tmp));
goto still_busy;
* If all of the buffers are uptodate then we can set the page
* uptodate.
*/
- if (page_uptodate)
- SetPageUptodate(page);
- unlock_page(page);
+ if (folio_uptodate)
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
return;
still_busy:
return;
}
-struct decrypt_bh_ctx {
+struct postprocess_bh_ctx {
struct work_struct work;
struct buffer_head *bh;
};
+static void verify_bh(struct work_struct *work)
+{
+ struct postprocess_bh_ctx *ctx =
+ container_of(work, struct postprocess_bh_ctx, work);
+ struct buffer_head *bh = ctx->bh;
+ bool valid;
+
+ valid = fsverity_verify_blocks(page_folio(bh->b_page), bh->b_size,
+ bh_offset(bh));
+ end_buffer_async_read(bh, valid);
+ kfree(ctx);
+}
+
+static bool need_fsverity(struct buffer_head *bh)
+{
+ struct page *page = bh->b_page;
+ struct inode *inode = page->mapping->host;
+
+ return fsverity_active(inode) &&
+ /* needed by ext4 */
+ page->index < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
+}
+
static void decrypt_bh(struct work_struct *work)
{
- struct decrypt_bh_ctx *ctx =
- container_of(work, struct decrypt_bh_ctx, work);
+ struct postprocess_bh_ctx *ctx =
+ container_of(work, struct postprocess_bh_ctx, work);
struct buffer_head *bh = ctx->bh;
int err;
- err = fscrypt_decrypt_pagecache_blocks(bh->b_page, bh->b_size,
- bh_offset(bh));
+ err = fscrypt_decrypt_pagecache_blocks(page_folio(bh->b_page),
+ bh->b_size, bh_offset(bh));
+ if (err == 0 && need_fsverity(bh)) {
+ /*
+ * We use different work queues for decryption and for verity
+ * because verity may require reading metadata pages that need
+ * decryption, and we shouldn't recurse to the same workqueue.
+ */
+ INIT_WORK(&ctx->work, verify_bh);
+ fsverity_enqueue_verify_work(&ctx->work);
+ return;
+ }
end_buffer_async_read(bh, err == 0);
kfree(ctx);
}
*/
static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
{
- struct inode *inode = bh->b_page->mapping->host;
- /* Decrypt if needed */
- if (uptodate &&
- fscrypt_inode_uses_fs_layer_crypto(bh->b_folio->mapping->host)) {
- struct decrypt_bh_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
++ struct inode *inode = bh->b_folio->mapping->host;
+ bool decrypt = fscrypt_inode_uses_fs_layer_crypto(inode);
+ bool verify = need_fsverity(bh);
+
+ /* Decrypt (with fscrypt) and/or verify (with fsverity) if needed. */
+ if (uptodate && (decrypt || verify)) {
+ struct postprocess_bh_ctx *ctx =
+ kmalloc(sizeof(*ctx), GFP_ATOMIC);
if (ctx) {
- INIT_WORK(&ctx->work, decrypt_bh);
ctx->bh = bh;
- fscrypt_enqueue_decrypt_work(&ctx->work);
+ if (decrypt) {
+ INIT_WORK(&ctx->work, decrypt_bh);
+ fscrypt_enqueue_decrypt_work(&ctx->work);
+ } else {
+ INIT_WORK(&ctx->work, verify_bh);
+ fsverity_enqueue_verify_work(&ctx->work);
+ }
return;
}
uptodate = 0;
unsigned long flags;
struct buffer_head *first;
struct buffer_head *tmp;
- struct page *page;
+ struct folio *folio;
BUG_ON(!buffer_async_write(bh));
- page = bh->b_page;
+ folio = bh->b_folio;
if (uptodate) {
set_buffer_uptodate(bh);
} else {
buffer_io_error(bh, ", lost async page write");
mark_buffer_write_io_error(bh);
clear_buffer_uptodate(bh);
- SetPageError(page);
+ folio_set_error(folio);
}
- first = page_buffers(page);
+ first = folio_buffers(folio);
spin_lock_irqsave(&first->b_uptodate_lock, flags);
clear_buffer_async_write(bh);
tmp = tmp->b_this_page;
}
spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
- end_page_writeback(page);
+ folio_end_writeback(folio);
return;
still_busy:
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
{
struct address_space *mapping = inode->i_mapping;
- struct address_space *buffer_mapping = bh->b_page->mapping;
+ struct address_space *buffer_mapping = bh->b_folio->mapping;
mark_buffer_dirty(bh);
if (!mapping->private_data) {
* and then attach the address_space's inode to its superblock's dirty
* inode list.
*
- * mark_buffer_dirty() is atomic. It takes bh->b_page->mapping->private_lock,
+ * mark_buffer_dirty() is atomic. It takes bh->b_folio->mapping->private_lock,
* i_pages lock and mapping->host->i_lock.
*/
void mark_buffer_dirty(struct buffer_head *bh)
}
if (!test_set_buffer_dirty(bh)) {
- struct page *page = bh->b_page;
+ struct folio *folio = bh->b_folio;
struct address_space *mapping = NULL;
- lock_page_memcg(page);
- if (!TestSetPageDirty(page)) {
- mapping = page_mapping(page);
+ folio_memcg_lock(folio);
+ if (!folio_test_set_dirty(folio)) {
+ mapping = folio->mapping;
if (mapping)
- __set_page_dirty(page, mapping, 0);
+ __folio_mark_dirty(folio, mapping, 0);
}
- unlock_page_memcg(page);
+ folio_memcg_unlock(folio);
if (mapping)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
set_buffer_write_io_error(bh);
/* FIXME: do we need to set this in both places? */
- if (bh->b_page && bh->b_page->mapping)
- mapping_set_error(bh->b_page->mapping, -EIO);
+ if (bh->b_folio && bh->b_folio->mapping)
+ mapping_set_error(bh->b_folio->mapping, -EIO);
if (bh->b_assoc_map)
mapping_set_error(bh->b_assoc_map, -EIO);
rcu_read_lock();
{
clear_buffer_dirty(bh);
if (bh->b_assoc_map) {
- struct address_space *buffer_mapping = bh->b_page->mapping;
+ struct address_space *buffer_mapping = bh->b_folio->mapping;
spin_lock(&buffer_mapping->private_lock);
list_del_init(&bh->b_assoc_buffers);
int nr, i;
int fully_mapped = 1;
bool page_error = false;
+ loff_t limit = i_size_read(inode);
+
+ /* This is needed for ext4. */
+ if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
+ limit = inode->i_sb->s_maxbytes;
VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
bbits = block_size_bits(blocksize);
iblock = (sector_t)folio->index << (PAGE_SHIFT - bbits);
- lblock = (i_size_read(inode)+blocksize-1) >> bbits;
+ lblock = (limit+blocksize-1) >> bbits;
bh = head;
nr = 0;
i = 0;
struct inode *inode = rreq->inode;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
- struct ceph_osd_request *req;
+ struct ceph_osd_request *req = NULL;
struct ceph_vino vino = ceph_vino(inode);
struct iov_iter iter;
struct page **pages;
int err = 0;
u64 len = subreq->len;
+ if (ceph_inode_is_shutdown(inode)) {
+ err = -EIO;
+ goto out;
+ }
+
if (ceph_has_inline_data(ci) && ceph_netfs_issue_op_inline(subreq))
return;
dout("writepage %p idx %lu\n", page, page->index);
+ if (ceph_inode_is_shutdown(inode))
+ return -EIO;
+
/* verify this is a writeable snap context */
snapc = page_snap_context(page);
if (!snapc) {
struct ceph_vino vino = ceph_vino(inode);
pgoff_t index, start_index, end = -1;
struct ceph_snap_context *snapc = NULL, *last_snapc = NULL, *pgsnapc;
- struct pagevec pvec;
+ struct folio_batch fbatch;
int rc = 0;
unsigned int wsize = i_blocksize(inode);
struct ceph_osd_request *req = NULL;
if (fsc->mount_options->wsize < wsize)
wsize = fsc->mount_options->wsize;
- pagevec_init(&pvec);
+ folio_batch_init(&fbatch);
start_index = wbc->range_cyclic ? mapping->writeback_index : 0;
index = start_index;
while (!done && index <= end) {
int num_ops = 0, op_idx;
- unsigned i, pvec_pages, max_pages, locked_pages = 0;
+ unsigned i, nr_folios, max_pages, locked_pages = 0;
struct page **pages = NULL, **data_pages;
struct page *page;
pgoff_t strip_unit_end = 0;
max_pages = wsize >> PAGE_SHIFT;
get_more_pages:
- pvec_pages = pagevec_lookup_range_tag(&pvec, mapping, &index,
- end, PAGECACHE_TAG_DIRTY);
- dout("pagevec_lookup_range_tag got %d\n", pvec_pages);
- if (!pvec_pages && !locked_pages)
+ nr_folios = filemap_get_folios_tag(mapping, &index,
+ end, PAGECACHE_TAG_DIRTY, &fbatch);
+ dout("pagevec_lookup_range_tag got %d\n", nr_folios);
+ if (!nr_folios && !locked_pages)
break;
- for (i = 0; i < pvec_pages && locked_pages < max_pages; i++) {
- page = pvec.pages[i];
+ for (i = 0; i < nr_folios && locked_pages < max_pages; i++) {
+ page = &fbatch.folios[i]->page;
dout("? %p idx %lu\n", page, page->index);
if (locked_pages == 0)
lock_page(page); /* first page */
len = 0;
}
- /* note position of first page in pvec */
+ /* note position of first page in fbatch */
dout("%p will write page %p idx %lu\n",
inode, page, page->index);
fsc->write_congested = true;
pages[locked_pages++] = page;
- pvec.pages[i] = NULL;
+ fbatch.folios[i] = NULL;
len += thp_size(page);
}
/* did we get anything? */
if (!locked_pages)
- goto release_pvec_pages;
+ goto release_folios;
if (i) {
unsigned j, n = 0;
- /* shift unused page to beginning of pvec */
- for (j = 0; j < pvec_pages; j++) {
- if (!pvec.pages[j])
+ /* shift unused page to beginning of fbatch */
+ for (j = 0; j < nr_folios; j++) {
+ if (!fbatch.folios[j])
continue;
if (n < j)
- pvec.pages[n] = pvec.pages[j];
+ fbatch.folios[n] = fbatch.folios[j];
n++;
}
- pvec.nr = n;
+ fbatch.nr = n;
- if (pvec_pages && i == pvec_pages &&
+ if (nr_folios && i == nr_folios &&
locked_pages < max_pages) {
- dout("reached end pvec, trying for more\n");
- pagevec_release(&pvec);
+ dout("reached end fbatch, trying for more\n");
+ folio_batch_release(&fbatch);
goto get_more_pages;
}
}
if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
done = true;
- release_pvec_pages:
- dout("pagevec_release on %d pages (%p)\n", (int)pvec.nr,
- pvec.nr ? pvec.pages[0] : NULL);
- pagevec_release(&pvec);
+ release_folios:
+ dout("folio_batch release on %d folios (%p)\n", (int)fbatch.nr,
+ fbatch.nr ? fbatch.folios[0] : NULL);
+ folio_batch_release(&fbatch);
}
if (should_loop && !done) {
unsigned i, nr;
index = 0;
while ((index <= end) &&
- (nr = pagevec_lookup_tag(&pvec, mapping, &index,
- PAGECACHE_TAG_WRITEBACK))) {
+ (nr = filemap_get_folios_tag(mapping, &index,
+ (pgoff_t)-1,
+ PAGECACHE_TAG_WRITEBACK,
+ &fbatch))) {
for (i = 0; i < nr; i++) {
- page = pvec.pages[i];
+ page = &fbatch.folios[i]->page;
if (page_snap_context(page) != snapc)
continue;
wait_on_page_writeback(page);
}
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
cond_resched();
}
}
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_osd_request *req = NULL;
- struct ceph_cap_flush *prealloc_cf;
+ struct ceph_cap_flush *prealloc_cf = NULL;
struct folio *folio = NULL;
u64 inline_version = CEPH_INLINE_NONE;
struct page *pages[1];
dout("uninline_data %p %llx.%llx inline_version %llu\n",
inode, ceph_vinop(inode), inline_version);
+ if (ceph_inode_is_shutdown(inode)) {
+ err = -EIO;
+ goto out;
+ }
+
if (inline_version == CEPH_INLINE_NONE)
return 0;
*
*/
#include <linux/fs.h>
+#include <linux/filelock.h>
#include <linux/backing-dev.h>
#include <linux/stat.h>
#include <linux/fcntl.h>
#include "cifs_ioctl.h"
#include "cached_dir.h"
+/*
+ * Remove the dirty flags from a span of pages.
+ */
+static void cifs_undirty_folios(struct inode *inode, loff_t start, unsigned int len)
+{
+ struct address_space *mapping = inode->i_mapping;
+ struct folio *folio;
+ pgoff_t end;
+
+ XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+ rcu_read_lock();
+
+ end = (start + len - 1) / PAGE_SIZE;
+ xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) {
+ xas_pause(&xas);
+ rcu_read_unlock();
+ folio_lock(folio);
+ folio_clear_dirty_for_io(folio);
+ folio_unlock(folio);
+ rcu_read_lock();
+ }
+
+ rcu_read_unlock();
+}
+
+/*
+ * Completion of write to server.
+ */
+void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
+{
+ struct address_space *mapping = inode->i_mapping;
+ struct folio *folio;
+ pgoff_t end;
+
+ XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+ if (!len)
+ return;
+
+ rcu_read_lock();
+
+ end = (start + len - 1) / PAGE_SIZE;
+ xas_for_each(&xas, folio, end) {
+ if (!folio_test_writeback(folio)) {
+ WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
+ len, start, folio_index(folio), end);
+ continue;
+ }
+
+ folio_detach_private(folio);
+ folio_end_writeback(folio);
+ }
+
+ rcu_read_unlock();
+}
+
+/*
+ * Failure of write to server.
+ */
+void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len)
+{
+ struct address_space *mapping = inode->i_mapping;
+ struct folio *folio;
+ pgoff_t end;
+
+ XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+ if (!len)
+ return;
+
+ rcu_read_lock();
+
+ end = (start + len - 1) / PAGE_SIZE;
+ xas_for_each(&xas, folio, end) {
+ if (!folio_test_writeback(folio)) {
+ WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
+ len, start, folio_index(folio), end);
+ continue;
+ }
+
+ folio_set_error(folio);
+ folio_end_writeback(folio);
+ }
+
+ rcu_read_unlock();
+}
+
+/*
+ * Redirty pages after a temporary failure.
+ */
+void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int len)
+{
+ struct address_space *mapping = inode->i_mapping;
+ struct folio *folio;
+ pgoff_t end;
+
+ XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+ if (!len)
+ return;
+
+ rcu_read_lock();
+
+ end = (start + len - 1) / PAGE_SIZE;
+ xas_for_each(&xas, folio, end) {
+ if (!folio_test_writeback(folio)) {
+ WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
+ len, start, folio_index(folio), end);
+ continue;
+ }
+
+ filemap_dirty_folio(folio->mapping, folio);
+ folio_end_writeback(folio);
+ }
+
+ rcu_read_unlock();
+}
+
/*
* Mark as invalid, all open files on tree connections since they
* were closed when session to server was lost.
if (f_flags & O_DIRECT)
create_options |= CREATE_NO_BUFFER;
- oparms.tcon = tcon;
- oparms.cifs_sb = cifs_sb;
- oparms.desired_access = desired_access;
- oparms.create_options = cifs_create_options(cifs_sb, create_options);
- oparms.disposition = disposition;
- oparms.path = full_path;
- oparms.fid = fid;
- oparms.reconnect = false;
+ oparms = (struct cifs_open_parms) {
+ .tcon = tcon,
+ .cifs_sb = cifs_sb,
+ .desired_access = desired_access,
+ .create_options = cifs_create_options(cifs_sb, create_options),
+ .disposition = disposition,
+ .path = full_path,
+ .fid = fid,
+ };
rc = server->ops->open(xid, &oparms, oplock, buf);
if (rc)
if (server->ops->get_lease_key)
server->ops->get_lease_key(inode, &cfile->fid);
- oparms.tcon = tcon;
- oparms.cifs_sb = cifs_sb;
- oparms.desired_access = desired_access;
- oparms.create_options = cifs_create_options(cifs_sb, create_options);
- oparms.disposition = disposition;
- oparms.path = full_path;
- oparms.fid = &cfile->fid;
- oparms.reconnect = true;
+ oparms = (struct cifs_open_parms) {
+ .tcon = tcon,
+ .cifs_sb = cifs_sb,
+ .desired_access = desired_access,
+ .create_options = cifs_create_options(cifs_sb, create_options),
+ .disposition = disposition,
+ .path = full_path,
+ .fid = &cfile->fid,
+ .reconnect = true,
+ };
/*
* Can not refresh inode by passing in file_info buf to be returned by
if (wdata->cfile)
cifsFileInfo_put(wdata->cfile);
- kvfree(wdata->pages);
kfree(wdata);
}
static void
cifs_writev_requeue(struct cifs_writedata *wdata)
{
- int i, rc = 0;
+ int rc = 0;
struct inode *inode = d_inode(wdata->cfile->dentry);
struct TCP_Server_Info *server;
- unsigned int rest_len;
+ unsigned int rest_len = wdata->bytes;
+ loff_t fpos = wdata->offset;
server = tlink_tcon(wdata->cfile->tlink)->ses->server;
- i = 0;
- rest_len = wdata->bytes;
do {
struct cifs_writedata *wdata2;
- unsigned int j, nr_pages, wsize, tailsz, cur_len;
+ unsigned int wsize, cur_len;
wsize = server->ops->wp_retry_size(inode);
if (wsize < rest_len) {
- nr_pages = wsize / PAGE_SIZE;
- if (!nr_pages) {
+ if (wsize < PAGE_SIZE) {
rc = -EOPNOTSUPP;
break;
}
- cur_len = nr_pages * PAGE_SIZE;
- tailsz = PAGE_SIZE;
+ cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
} else {
- nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE);
cur_len = rest_len;
- tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
}
- wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
+ wdata2 = cifs_writedata_alloc(cifs_writev_complete);
if (!wdata2) {
rc = -ENOMEM;
break;
}
- for (j = 0; j < nr_pages; j++) {
- wdata2->pages[j] = wdata->pages[i + j];
- lock_page(wdata2->pages[j]);
- clear_page_dirty_for_io(wdata2->pages[j]);
- }
-
wdata2->sync_mode = wdata->sync_mode;
- wdata2->nr_pages = nr_pages;
- wdata2->offset = page_offset(wdata2->pages[0]);
- wdata2->pagesz = PAGE_SIZE;
- wdata2->tailsz = tailsz;
- wdata2->bytes = cur_len;
+ wdata2->offset = fpos;
+ wdata2->bytes = cur_len;
+ wdata2->iter = wdata->iter;
+
+ iov_iter_advance(&wdata2->iter, fpos - wdata->offset);
+ iov_iter_truncate(&wdata2->iter, wdata2->bytes);
+
+ if (iov_iter_is_xarray(&wdata2->iter))
+ /* Check for pages having been redirtied and clean
+ * them. We can do this by walking the xarray. If
+ * it's not an xarray, then it's a DIO and we shouldn't
+ * be mucking around with the page bits.
+ */
+ cifs_undirty_folios(inode, fpos, cur_len);
rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
&wdata2->cfile);
cifs_writedata_release);
}
- for (j = 0; j < nr_pages; j++) {
- unlock_page(wdata2->pages[j]);
- if (rc != 0 && !is_retryable_error(rc)) {
- SetPageError(wdata2->pages[j]);
- end_page_writeback(wdata2->pages[j]);
- put_page(wdata2->pages[j]);
- }
- }
-
kref_put(&wdata2->refcount, cifs_writedata_release);
if (rc) {
if (is_retryable_error(rc))
continue;
- i += nr_pages;
+ fpos += cur_len;
+ rest_len -= cur_len;
break;
}
+ fpos += cur_len;
rest_len -= cur_len;
- i += nr_pages;
- } while (i < wdata->nr_pages);
+ } while (rest_len > 0);
- /* cleanup remaining pages from the original wdata */
- for (; i < wdata->nr_pages; i++) {
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- put_page(wdata->pages[i]);
- }
+ /* Clean up remaining pages from the original wdata */
+ if (iov_iter_is_xarray(&wdata->iter))
+ cifs_pages_write_failed(inode, fpos, rest_len);
if (rc != 0 && !is_retryable_error(rc))
mapping_set_error(inode->i_mapping, rc);
struct cifs_writedata *wdata = container_of(work,
struct cifs_writedata, work);
struct inode *inode = d_inode(wdata->cfile->dentry);
- int i = 0;
if (wdata->result == 0) {
spin_lock(&inode->i_lock);
} else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
return cifs_writev_requeue(wdata);
- for (i = 0; i < wdata->nr_pages; i++) {
- struct page *page = wdata->pages[i];
+ if (wdata->result == -EAGAIN)
+ cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes);
+ else if (wdata->result < 0)
+ cifs_pages_write_failed(inode, wdata->offset, wdata->bytes);
+ else
+ cifs_pages_written_back(inode, wdata->offset, wdata->bytes);
- if (wdata->result == -EAGAIN)
- __set_page_dirty_nobuffers(page);
- else if (wdata->result < 0)
- SetPageError(page);
- end_page_writeback(page);
- cifs_readpage_to_fscache(inode, page);
- put_page(page);
- }
if (wdata->result != -EAGAIN)
mapping_set_error(inode->i_mapping, wdata->result);
kref_put(&wdata->refcount, cifs_writedata_release);
}
-struct cifs_writedata *
-cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
-{
- struct cifs_writedata *writedata = NULL;
- struct page **pages =
- kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
- if (pages) {
- writedata = cifs_writedata_direct_alloc(pages, complete);
- if (!writedata)
- kvfree(pages);
- }
-
- return writedata;
-}
-
-struct cifs_writedata *
-cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
+struct cifs_writedata *cifs_writedata_alloc(work_func_t complete)
{
struct cifs_writedata *wdata;
wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
if (wdata != NULL) {
- wdata->pages = pages;
kref_init(&wdata->refcount);
INIT_LIST_HEAD(&wdata->list);
init_completion(&wdata->done);
return wdata;
}
-
static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
{
struct address_space *mapping = page->mapping;
return rc;
}
-static struct cifs_writedata *
-wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
- pgoff_t end, pgoff_t *index,
- unsigned int *found_pages)
+/*
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ */
+static void cifs_extend_writeback(struct address_space *mapping,
+ long *_count,
+ loff_t start,
+ int max_pages,
+ size_t max_len,
+ unsigned int *_len)
{
- struct cifs_writedata *wdata;
- struct folio_batch fbatch;
- unsigned int i, idx, p, nr;
- wdata = cifs_writedata_alloc((unsigned int)tofind,
- cifs_writev_complete);
- if (!wdata)
- return NULL;
-
- folio_batch_init(&fbatch);
- *found_pages = 0;
-
-again:
- nr = filemap_get_folios_tag(mapping, index, end,
- PAGECACHE_TAG_DIRTY, &fbatch);
- if (!nr)
- goto out; /* No dirty pages left in the range */
-
- for (i = 0; i < nr; i++) {
- struct folio *folio = fbatch.folios[i];
-
- idx = 0;
- p = folio_nr_pages(folio);
-add_more:
- wdata->pages[*found_pages] = folio_page(folio, idx);
- folio_get(folio);
- if (++*found_pages == tofind) {
- folio_batch_release(&fbatch);
- goto out;
- }
- if (++idx < p)
- goto add_more;
- }
- folio_batch_release(&fbatch);
- goto again;
-out:
- return wdata;
-}
+ struct folio_batch batch;
+ struct folio *folio;
+ unsigned int psize, nr_pages;
+ size_t len = *_len;
+ pgoff_t index = (start + len) / PAGE_SIZE;
+ bool stop = true;
+ unsigned int i;
+ XA_STATE(xas, &mapping->i_pages, index);
-static unsigned int
-wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
- struct address_space *mapping,
- struct writeback_control *wbc,
- pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
-{
- unsigned int nr_pages = 0, i;
- struct page *page;
+ folio_batch_init(&batch);
- for (i = 0; i < found_pages; i++) {
- page = wdata->pages[i];
- /*
- * At this point we hold neither the i_pages lock nor the
- * page lock: the page may be truncated or invalidated
- * (changing page->mapping to NULL), or even swizzled
- * back from swapper_space to tmpfs file mapping
+ do {
+ /* Firstly, we gather up a batch of contiguous dirty pages
+ * under the RCU read lock - but we can't clear the dirty flags
+ * there if any of those pages are mapped.
*/
+ rcu_read_lock();
- if (nr_pages == 0)
- lock_page(page);
- else if (!trylock_page(page))
- break;
-
- if (unlikely(page->mapping != mapping)) {
- unlock_page(page);
- break;
- }
+ xas_for_each(&xas, folio, ULONG_MAX) {
+ stop = true;
+ if (xas_retry(&xas, folio))
+ continue;
+ if (xa_is_value(folio))
+ break;
+ if (folio_index(folio) != index)
+ break;
+ if (!folio_try_get_rcu(folio)) {
+ xas_reset(&xas);
+ continue;
+ }
+ nr_pages = folio_nr_pages(folio);
+ if (nr_pages > max_pages)
+ break;
- if (!wbc->range_cyclic && page->index > end) {
- *done = true;
- unlock_page(page);
- break;
- }
+ /* Has the page moved or been split? */
+ if (unlikely(folio != xas_reload(&xas))) {
+ folio_put(folio);
+ break;
+ }
- if (*next && (page->index != *next)) {
- /* Not next consecutive page */
- unlock_page(page);
- break;
- }
+ if (!folio_trylock(folio)) {
+ folio_put(folio);
+ break;
+ }
+ if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ break;
+ }
- if (wbc->sync_mode != WB_SYNC_NONE)
- wait_on_page_writeback(page);
+ max_pages -= nr_pages;
+ psize = folio_size(folio);
+ len += psize;
+ stop = false;
+ if (max_pages <= 0 || len >= max_len || *_count <= 0)
+ stop = true;
- if (PageWriteback(page) ||
- !clear_page_dirty_for_io(page)) {
- unlock_page(page);
- break;
+ index += nr_pages;
+ if (!folio_batch_add(&batch, folio))
+ break;
+ if (stop)
+ break;
}
- /*
- * This actually clears the dirty bit in the radix tree.
- * See cifs_writepage() for more commentary.
+ if (!stop)
+ xas_pause(&xas);
+ rcu_read_unlock();
+
+ /* Now, if we obtained any pages, we can shift them to being
+ * writable and mark them for caching.
*/
- set_page_writeback(page);
- if (page_offset(page) >= i_size_read(mapping->host)) {
- *done = true;
- unlock_page(page);
- end_page_writeback(page);
+ if (!folio_batch_count(&batch))
break;
- }
- wdata->pages[i] = page;
- *next = page->index + 1;
- ++nr_pages;
- }
+ for (i = 0; i < folio_batch_count(&batch); i++) {
+ folio = batch.folios[i];
+ /* The folio should be locked, dirty and not undergoing
+ * writeback from the loop above.
+ */
+ if (!folio_clear_dirty_for_io(folio))
+ WARN_ON(1);
+ if (folio_start_writeback(folio))
+ WARN_ON(1);
- /* reset index to refind any pages skipped */
- if (nr_pages == 0)
- *index = wdata->pages[0]->index + 1;
+ *_count -= folio_nr_pages(folio);
+ folio_unlock(folio);
+ }
- /* put any pages we aren't going to use */
- for (i = nr_pages; i < found_pages; i++) {
- put_page(wdata->pages[i]);
- wdata->pages[i] = NULL;
- }
+ folio_batch_release(&batch);
+ cond_resched();
+ } while (!stop);
- return nr_pages;
+ *_len = len;
}
-static int
-wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
- struct address_space *mapping, struct writeback_control *wbc)
+/*
+ * Write back the locked page and any subsequent non-locked dirty pages.
+ */
+static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
+ struct writeback_control *wbc,
+ struct folio *folio,
+ loff_t start, loff_t end)
{
+ struct inode *inode = mapping->host;
+ struct TCP_Server_Info *server;
+ struct cifs_writedata *wdata;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
+ struct cifs_credits credits_on_stack;
+ struct cifs_credits *credits = &credits_on_stack;
+ struct cifsFileInfo *cfile = NULL;
+ unsigned int xid, wsize, len;
+ loff_t i_size = i_size_read(inode);
+ size_t max_len;
+ long count = wbc->nr_to_write;
int rc;
- wdata->sync_mode = wbc->sync_mode;
- wdata->nr_pages = nr_pages;
- wdata->offset = page_offset(wdata->pages[0]);
- wdata->pagesz = PAGE_SIZE;
- wdata->tailsz = min(i_size_read(mapping->host) -
- page_offset(wdata->pages[nr_pages - 1]),
- (loff_t)PAGE_SIZE);
- wdata->bytes = ((nr_pages - 1) * PAGE_SIZE) + wdata->tailsz;
- wdata->pid = wdata->cfile->pid;
-
- rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
- if (rc)
- return rc;
-
- if (wdata->cfile->invalidHandle)
- rc = -EAGAIN;
- else
- rc = wdata->server->ops->async_writev(wdata,
- cifs_writedata_release);
+ /* The folio should be locked, dirty and not undergoing writeback. */
+ if (folio_start_writeback(folio))
+ WARN_ON(1);
- return rc;
-}
+ count -= folio_nr_pages(folio);
+ len = folio_size(folio);
-static int
-cifs_writepage_locked(struct page *page, struct writeback_control *wbc);
+ xid = get_xid();
+ server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
-static int cifs_write_one_page(struct folio *folio,
- struct writeback_control *wbc, void *data)
-{
- struct address_space *mapping = data;
- int ret;
+ rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
+ if (rc) {
+ cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
+ goto err_xid;
+ }
- ret = cifs_writepage_locked(&folio->page, wbc);
- folio_unlock(folio);
- mapping_set_error(mapping, ret);
- return ret;
-}
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
+ &wsize, credits);
+ if (rc != 0)
+ goto err_close;
-static int cifs_writepages(struct address_space *mapping,
- struct writeback_control *wbc)
-{
- struct inode *inode = mapping->host;
- struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
- struct TCP_Server_Info *server;
- bool done = false, scanned = false, range_whole = false;
- pgoff_t end, index;
- struct cifs_writedata *wdata;
- struct cifsFileInfo *cfile = NULL;
- int rc = 0;
- int saved_rc = 0;
- unsigned int xid;
+ wdata = cifs_writedata_alloc(cifs_writev_complete);
+ if (!wdata) {
+ rc = -ENOMEM;
+ goto err_uncredit;
+ }
- /*
- * If wsize is smaller than the page cache size, default to writing
- * one page at a time.
+ wdata->sync_mode = wbc->sync_mode;
+ wdata->offset = folio_pos(folio);
+ wdata->pid = cfile->pid;
+ wdata->credits = credits_on_stack;
+ wdata->cfile = cfile;
+ wdata->server = server;
+ cfile = NULL;
+
+ /* Find all consecutive lockable dirty pages, stopping when we find a
+ * page that is not immediately lockable, is not dirty or is missing,
+ * or we reach the end of the range.
*/
- if (cifs_sb->ctx->wsize < PAGE_SIZE)
- return write_cache_pages(mapping, wbc, cifs_write_one_page,
- mapping);
+ if (start < i_size) {
+ /* Trim the write to the EOF; the extra data is ignored. Also
+ * put an upper limit on the size of a single storedata op.
+ */
+ max_len = wsize;
+ max_len = min_t(unsigned long long, max_len, end - start + 1);
+ max_len = min_t(unsigned long long, max_len, i_size - start);
- xid = get_xid();
- if (wbc->range_cyclic) {
- index = mapping->writeback_index; /* Start from prev offset */
- end = -1;
- } else {
- index = wbc->range_start >> PAGE_SHIFT;
- end = wbc->range_end >> PAGE_SHIFT;
- if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
- range_whole = true;
- scanned = true;
+ if (len < max_len) {
+ int max_pages = INT_MAX;
+
+#ifdef CONFIG_CIFS_SMB_DIRECT
+ if (server->smbd_conn)
+ max_pages = server->smbd_conn->max_frmr_depth;
+#endif
+ max_pages -= folio_nr_pages(folio);
+
+ if (max_pages > 0)
+ cifs_extend_writeback(mapping, &count, start,
+ max_pages, max_len, &len);
+ }
+ len = min_t(loff_t, len, max_len);
}
- server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
-retry:
- while (!done && index <= end) {
- unsigned int i, nr_pages, found_pages, wsize;
- pgoff_t next = 0, tofind, saved_index = index;
- struct cifs_credits credits_on_stack;
- struct cifs_credits *credits = &credits_on_stack;
- int get_file_rc = 0;
+ wdata->bytes = len;
- if (cfile)
- cifsFileInfo_put(cfile);
+ /* We now have a contiguous set of dirty pages, each with writeback
+ * set; the first page is still locked at this point, but all the rest
+ * have been unlocked.
+ */
+ folio_unlock(folio);
- rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
+ if (start < i_size) {
+ iov_iter_xarray(&wdata->iter, ITER_SOURCE, &mapping->i_pages,
+ start, len);
- /* in case of an error store it to return later */
+ rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
if (rc)
- get_file_rc = rc;
+ goto err_wdata;
- rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
- &wsize, credits);
- if (rc != 0) {
- done = true;
- break;
+ if (wdata->cfile->invalidHandle)
+ rc = -EAGAIN;
+ else
+ rc = wdata->server->ops->async_writev(wdata,
+ cifs_writedata_release);
+ if (rc >= 0) {
+ kref_put(&wdata->refcount, cifs_writedata_release);
+ goto err_close;
}
+ } else {
+ /* The dirty region was entirely beyond the EOF. */
+ cifs_pages_written_back(inode, start, len);
+ rc = 0;
+ }
- tofind = min((wsize / PAGE_SIZE) - 1, end - index) + 1;
+err_wdata:
+ kref_put(&wdata->refcount, cifs_writedata_release);
+err_uncredit:
+ add_credits_and_wake_if(server, credits, 0);
+err_close:
+ if (cfile)
+ cifsFileInfo_put(cfile);
+err_xid:
+ free_xid(xid);
+ if (rc == 0) {
+ wbc->nr_to_write = count;
+ } else if (is_retryable_error(rc)) {
+ cifs_pages_write_redirty(inode, start, len);
+ } else {
+ cifs_pages_write_failed(inode, start, len);
+ mapping_set_error(mapping, rc);
+ }
+ /* Indication to update ctime and mtime as close is deferred */
+ set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
+ return rc;
+}
- wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
- &found_pages);
- if (!wdata) {
- rc = -ENOMEM;
- done = true;
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
+/*
+ * write a region of pages back to the server
+ */
+static int cifs_writepages_region(struct address_space *mapping,
+ struct writeback_control *wbc,
+ loff_t start, loff_t end, loff_t *_next)
+{
- struct folio *folio;
- struct page *head_page;
- ssize_t ret;
- int n, skips = 0;
++ struct folio_batch fbatch;
++ int skips = 0;
- if (found_pages == 0) {
- kref_put(&wdata->refcount, cifs_writedata_release);
- add_credits_and_wake_if(server, credits, 0);
++ folio_batch_init(&fbatch);
+ do {
++ int nr;
+ pgoff_t index = start / PAGE_SIZE;
+
- n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
- PAGECACHE_TAG_DIRTY, 1, &head_page);
- if (!n)
++ nr = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
++ PAGECACHE_TAG_DIRTY, &fbatch);
++ if (!nr)
break;
- }
- folio = page_folio(head_page);
- start = folio_pos(folio); /* May regress with THPs */
- nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc,
- end, &index, &next, &done);
++ for (int i = 0; i < nr; i++) {
++ ssize_t ret;
++ struct folio *folio = fbatch.folios[i];
- /* At this point we hold neither the i_pages lock nor the
- * page lock: the page may be truncated or invalidated
- * (changing page->mapping to NULL), or even swizzled
- * back from swapper_space to tmpfs file mapping
- */
- if (wbc->sync_mode != WB_SYNC_NONE) {
- ret = folio_lock_killable(folio);
- if (ret < 0) {
- folio_put(folio);
- return ret;
- /* nothing to write? */
- if (nr_pages == 0) {
- kref_put(&wdata->refcount, cifs_writedata_release);
- add_credits_and_wake_if(server, credits, 0);
- continue;
- }
++redo_folio:
++ start = folio_pos(folio); /* May regress with THPs */
+
- wdata->credits = credits_on_stack;
- wdata->cfile = cfile;
- wdata->server = server;
- cfile = NULL;
++ /* At this point we hold neither the i_pages lock nor the
++ * page lock: the page may be truncated or invalidated
++ * (changing page->mapping to NULL), or even swizzled
++ * back from swapper_space to tmpfs file mapping
++ */
++ if (wbc->sync_mode != WB_SYNC_NONE) {
++ ret = folio_lock_killable(folio);
++ if (ret < 0)
++ goto write_error;
++ } else {
++ if (!folio_trylock(folio))
++ goto skip_write;
+ }
- } else {
- if (!folio_trylock(folio)) {
- folio_put(folio);
- return 0;
+
- if (!wdata->cfile) {
- cifs_dbg(VFS, "No writable handle in writepages rc=%d\n",
- get_file_rc);
- if (is_retryable_error(get_file_rc))
- rc = get_file_rc;
- else
- rc = -EBADF;
- } else
- rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
++ if (folio_mapping(folio) != mapping ||
++ !folio_test_dirty(folio)) {
++ folio_unlock(folio);
++ goto skip_write;
+ }
- }
- if (folio_mapping(folio) != mapping ||
- !folio_test_dirty(folio)) {
- start += folio_size(folio);
- folio_unlock(folio);
- folio_put(folio);
- continue;
- }
- for (i = 0; i < nr_pages; ++i)
- unlock_page(wdata->pages[i]);
++ if (folio_test_writeback(folio) ||
++ folio_test_fscache(folio)) {
++ folio_unlock(folio);
++ if (wbc->sync_mode == WB_SYNC_NONE)
++ goto skip_write;
- if (folio_test_writeback(folio) ||
- folio_test_fscache(folio)) {
- folio_unlock(folio);
- if (wbc->sync_mode != WB_SYNC_NONE) {
- /* send failure -- clean up the mess */
- if (rc != 0) {
- add_credits_and_wake_if(server, &wdata->credits, 0);
- for (i = 0; i < nr_pages; ++i) {
- if (is_retryable_error(rc))
- redirty_page_for_writepage(wbc,
- wdata->pages[i]);
- else
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- put_page(wdata->pages[i]);
+ folio_wait_writeback(folio);
+#ifdef CONFIG_CIFS_FSCACHE
+ folio_wait_fscache(folio);
+#endif
- } else {
- start += folio_size(folio);
++ goto redo_folio;
}
- folio_put(folio);
- if (wbc->sync_mode == WB_SYNC_NONE) {
- if (skips >= 5 || need_resched())
- break;
- skips++;
- }
- continue;
- if (!is_retryable_error(rc))
- mapping_set_error(mapping, rc);
-- }
- kref_put(&wdata->refcount, cifs_writedata_release);
- if (!folio_clear_dirty_for_io(folio))
- /* We hold the page lock - it should've been dirty. */
- WARN_ON(1);
- if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
- index = saved_index;
- continue;
- }
++ if (!folio_clear_dirty_for_io(folio))
++ /* We hold the page lock - it should've been dirty. */
++ WARN_ON(1);
- ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
- folio_put(folio);
- if (ret < 0)
- /* Return immediately if we received a signal during writing */
- if (is_interrupt_error(rc)) {
- done = true;
- break;
- }
++ ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
++ if (ret < 0)
++ goto write_error;
+
- if (rc != 0 && saved_rc == 0)
- saved_rc = rc;
++ start += ret;
++ continue;
+
- wbc->nr_to_write -= nr_pages;
- if (wbc->nr_to_write <= 0)
- done = true;
++write_error:
++ folio_batch_release(&fbatch);
++ *_next = start;
+ return ret;
- start += ret;
- index = next;
- }
++skip_write:
++ /*
++ * Too many skipped writes, or need to reschedule?
++ * Treat it as a write error without an error code.
++ */
++ if (skips >= 5 || need_resched()) {
++ ret = 0;
++ goto write_error;
++ }
+
- if (!scanned && !done) {
- /*
- * We hit the last page and there is more work to be done: wrap
- * back to the start of the file
- */
- scanned = true;
- index = 0;
- goto retry;
- }
++ /* Otherwise, just skip that folio and go on to the next */
++ skips++;
++ start += folio_size(folio);
++ continue;
++ }
+
- if (saved_rc != 0)
- rc = saved_rc;
++ folio_batch_release(&fbatch);
+ cond_resched();
+ } while (wbc->nr_to_write > 0);
- if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
- mapping->writeback_index = index;
+ *_next = start;
+ return 0;
+}
- if (cfile)
- cifsFileInfo_put(cfile);
- free_xid(xid);
- /* Indication to update ctime and mtime as close is deferred */
- set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
- return rc;
+/*
+ * Write some of the pending data back to the server
+ */
+static int cifs_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ loff_t start, next;
+ int ret;
+
+ /* We have to be careful as we can end up racing with setattr()
+ * truncating the pagecache since the caller doesn't take a lock here
+ * to prevent it.
+ */
+
+ if (wbc->range_cyclic) {
+ start = mapping->writeback_index * PAGE_SIZE;
+ ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
+ if (ret == 0) {
+ mapping->writeback_index = next / PAGE_SIZE;
+ if (start > 0 && wbc->nr_to_write > 0) {
+ ret = cifs_writepages_region(mapping, wbc, 0,
+ start, &next);
+ if (ret == 0)
+ mapping->writeback_index =
+ next / PAGE_SIZE;
+ }
+ }
+ } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
+ ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
+ if (wbc->nr_to_write > 0 && ret == 0)
+ mapping->writeback_index = next / PAGE_SIZE;
+ } else {
+ ret = cifs_writepages_region(mapping, wbc,
+ wbc->range_start, wbc->range_end, &next);
+ }
+
+ return ret;
}
static int
struct inode *inode = mapping->host;
struct cifsFileInfo *cfile = file->private_data;
struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb);
+ struct folio *folio = page_folio(page);
__u32 pid;
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n",
page, pos, copied);
- if (PageChecked(page)) {
+ if (folio_test_checked(folio)) {
if (copied == len)
- SetPageUptodate(page);
- ClearPageChecked(page);
- } else if (!PageUptodate(page) && copied == PAGE_SIZE)
- SetPageUptodate(page);
+ folio_mark_uptodate(folio);
+ folio_clear_checked(folio);
+ } else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE)
+ folio_mark_uptodate(folio);
- if (!PageUptodate(page)) {
+ if (!folio_test_uptodate(folio)) {
char *page_data;
unsigned offset = pos & (PAGE_SIZE - 1);
unsigned int xid;
return rc;
}
-static int
-cifs_write_allocate_pages(struct page **pages, unsigned long num_pages)
-{
- int rc = 0;
- unsigned long i;
-
- for (i = 0; i < num_pages; i++) {
- pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
- if (!pages[i]) {
- /*
- * save number of pages we have already allocated and
- * return with ENOMEM error
- */
- num_pages = i;
- rc = -ENOMEM;
- break;
- }
- }
-
- if (rc) {
- for (i = 0; i < num_pages; i++)
- put_page(pages[i]);
- }
- return rc;
-}
-
-static inline
-size_t get_numpages(const size_t wsize, const size_t len, size_t *cur_len)
-{
- size_t num_pages;
- size_t clen;
-
- clen = min_t(const size_t, len, wsize);
- num_pages = DIV_ROUND_UP(clen, PAGE_SIZE);
-
- if (cur_len)
- *cur_len = clen;
-
- return num_pages;
-}
-
static void
cifs_uncached_writedata_release(struct kref *refcount)
{
- int i;
struct cifs_writedata *wdata = container_of(refcount,
struct cifs_writedata, refcount);
kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
- for (i = 0; i < wdata->nr_pages; i++)
- put_page(wdata->pages[i]);
cifs_writedata_release(refcount);
}
kref_put(&wdata->refcount, cifs_uncached_writedata_release);
}
-static int
-wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
- size_t *len, unsigned long *num_pages)
-{
- size_t save_len, copied, bytes, cur_len = *len;
- unsigned long i, nr_pages = *num_pages;
-
- save_len = cur_len;
- for (i = 0; i < nr_pages; i++) {
- bytes = min_t(const size_t, cur_len, PAGE_SIZE);
- copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from);
- cur_len -= copied;
- /*
- * If we didn't copy as much as we expected, then that
- * may mean we trod into an unmapped area. Stop copying
- * at that point. On the next pass through the big
- * loop, we'll likely end up getting a zero-length
- * write and bailing out of it.
- */
- if (copied < bytes)
- break;
- }
- cur_len = save_len - cur_len;
- *len = cur_len;
-
- /*
- * If we have no data to send, then that probably means that
- * the copy above failed altogether. That's most likely because
- * the address in the iovec was bogus. Return -EFAULT and let
- * the caller free anything we allocated and bail out.
- */
- if (!cur_len)
- return -EFAULT;
-
- /*
- * i + 1 now represents the number of pages we actually used in
- * the copy phase above.
- */
- *num_pages = i + 1;
- return 0;
-}
-
static int
cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list,
struct cifs_aio_ctx *ctx)
return rc;
}
+/*
+ * Select span of a bvec iterator we're going to use. Limit it by both maximum
+ * size and maximum number of segments.
+ */
+static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_size,
+ size_t max_segs, unsigned int *_nsegs)
+{
+ const struct bio_vec *bvecs = iter->bvec;
+ unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0;
+ size_t len, span = 0, n = iter->count;
+ size_t skip = iter->iov_offset;
+
+ if (WARN_ON(!iov_iter_is_bvec(iter)) || n == 0)
+ return 0;
+
+ while (n && ix < nbv && skip) {
+ len = bvecs[ix].bv_len;
+ if (skip < len)
+ break;
+ skip -= len;
+ n -= len;
+ ix++;
+ }
+
+ while (n && ix < nbv) {
+ len = min3(n, bvecs[ix].bv_len - skip, max_size);
+ span += len;
+ nsegs++;
+ ix++;
+ if (span >= max_size || nsegs >= max_segs)
+ break;
+ skip = 0;
+ n -= len;
+ }
+
+ *_nsegs = nsegs;
+ return span;
+}
+
static int
-cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
+cifs_write_from_iter(loff_t fpos, size_t len, struct iov_iter *from,
struct cifsFileInfo *open_file,
struct cifs_sb_info *cifs_sb, struct list_head *wdata_list,
struct cifs_aio_ctx *ctx)
{
int rc = 0;
- size_t cur_len;
- unsigned long nr_pages, num_pages, i;
+ size_t cur_len, max_len;
struct cifs_writedata *wdata;
- struct iov_iter saved_from = *from;
- loff_t saved_offset = offset;
pid_t pid;
struct TCP_Server_Info *server;
- struct page **pagevec;
- size_t start;
- unsigned int xid;
+ unsigned int xid, max_segs = INT_MAX;
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
xid = get_xid();
+#ifdef CONFIG_CIFS_SMB_DIRECT
+ if (server->smbd_conn)
+ max_segs = server->smbd_conn->max_frmr_depth;
+#endif
+
do {
- unsigned int wsize;
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
+ unsigned int wsize, nsegs = 0;
+
+ if (signal_pending(current)) {
+ rc = -EINTR;
+ break;
+ }
if (open_file->invalidHandle) {
rc = cifs_reopen_file(open_file, false);
if (rc)
break;
- cur_len = min_t(const size_t, len, wsize);
-
- if (ctx->direct_io) {
- ssize_t result;
-
- result = iov_iter_get_pages_alloc2(
- from, &pagevec, cur_len, &start);
- if (result < 0) {
- cifs_dbg(VFS,
- "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
- result, iov_iter_type(from),
- from->iov_offset, from->count);
- dump_stack();
-
- rc = result;
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
- cur_len = (size_t)result;
-
- nr_pages =
- (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
-
- wdata = cifs_writedata_direct_alloc(pagevec,
- cifs_uncached_writev_complete);
- if (!wdata) {
- rc = -ENOMEM;
- for (i = 0; i < nr_pages; i++)
- put_page(pagevec[i]);
- kvfree(pagevec);
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
-
-
- wdata->page_offset = start;
- wdata->tailsz =
- nr_pages > 1 ?
- cur_len - (PAGE_SIZE - start) -
- (nr_pages - 2) * PAGE_SIZE :
- cur_len;
- } else {
- nr_pages = get_numpages(wsize, len, &cur_len);
- wdata = cifs_writedata_alloc(nr_pages,
- cifs_uncached_writev_complete);
- if (!wdata) {
- rc = -ENOMEM;
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
-
- rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
- if (rc) {
- kvfree(wdata->pages);
- kfree(wdata);
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
-
- num_pages = nr_pages;
- rc = wdata_fill_from_iovec(
- wdata, from, &cur_len, &num_pages);
- if (rc) {
- for (i = 0; i < nr_pages; i++)
- put_page(wdata->pages[i]);
- kvfree(wdata->pages);
- kfree(wdata);
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
+ max_len = min_t(const size_t, len, wsize);
+ if (!max_len) {
+ rc = -EAGAIN;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }
- /*
- * Bring nr_pages down to the number of pages we
- * actually used, and free any pages that we didn't use.
- */
- for ( ; nr_pages > num_pages; nr_pages--)
- put_page(wdata->pages[nr_pages - 1]);
+ cur_len = cifs_limit_bvec_subset(from, max_len, max_segs, &nsegs);
+ cifs_dbg(FYI, "write_from_iter len=%zx/%zx nsegs=%u/%lu/%u\n",
+ cur_len, max_len, nsegs, from->nr_segs, max_segs);
+ if (cur_len == 0) {
+ rc = -EIO;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }
- wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
+ wdata = cifs_writedata_alloc(cifs_uncached_writev_complete);
+ if (!wdata) {
+ rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
}
wdata->sync_mode = WB_SYNC_ALL;
- wdata->nr_pages = nr_pages;
- wdata->offset = (__u64)offset;
- wdata->cfile = cifsFileInfo_get(open_file);
- wdata->server = server;
- wdata->pid = pid;
- wdata->bytes = cur_len;
- wdata->pagesz = PAGE_SIZE;
- wdata->credits = credits_on_stack;
- wdata->ctx = ctx;
+ wdata->offset = (__u64)fpos;
+ wdata->cfile = cifsFileInfo_get(open_file);
+ wdata->server = server;
+ wdata->pid = pid;
+ wdata->bytes = cur_len;
+ wdata->credits = credits_on_stack;
+ wdata->iter = *from;
+ wdata->ctx = ctx;
kref_get(&ctx->refcount);
+ iov_iter_truncate(&wdata->iter, cur_len);
+
rc = adjust_credits(server, &wdata->credits, wdata->bytes);
if (!rc) {
add_credits_and_wake_if(server, &wdata->credits, 0);
kref_put(&wdata->refcount,
cifs_uncached_writedata_release);
- if (rc == -EAGAIN) {
- *from = saved_from;
- iov_iter_advance(from, offset - saved_offset);
+ if (rc == -EAGAIN)
continue;
- }
break;
}
list_add_tail(&wdata->list, wdata_list);
- offset += cur_len;
+ iov_iter_advance(from, cur_len);
+ fpos += cur_len;
len -= cur_len;
} while (len > 0);
struct cifs_tcon *tcon;
struct cifs_sb_info *cifs_sb;
struct cifs_aio_ctx *ctx;
- struct iov_iter saved_from = *from;
- size_t len = iov_iter_count(from);
int rc;
- /*
- * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
- * In this case, fall back to non-direct write function.
- * this could be improved by getting pages directly in ITER_KVEC
- */
- if (direct && iov_iter_is_kvec(from)) {
- cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n");
- direct = false;
- }
-
rc = generic_write_checks(iocb, from);
if (rc <= 0)
return rc;
ctx->iocb = iocb;
ctx->pos = iocb->ki_pos;
+ ctx->direct_io = direct;
+ ctx->nr_pinned_pages = 0;
- if (direct) {
- ctx->direct_io = true;
- ctx->iter = *from;
- ctx->len = len;
- } else {
- rc = setup_aio_ctx_iter(ctx, from, ITER_SOURCE);
- if (rc) {
+ if (user_backed_iter(from)) {
+ /*
+ * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as
+ * they contain references to the calling process's virtual
+ * memory layout which won't be available in an async worker
+ * thread. This also takes a pin on every folio involved.
+ */
+ rc = netfs_extract_user_iter(from, iov_iter_count(from),
+ &ctx->iter, 0);
+ if (rc < 0) {
kref_put(&ctx->refcount, cifs_aio_ctx_release);
return rc;
}
+
+ ctx->nr_pinned_pages = rc;
+ ctx->bv = (void *)ctx->iter.bvec;
+ ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter);
+ } else if ((iov_iter_is_bvec(from) || iov_iter_is_kvec(from)) &&
+ !is_sync_kiocb(iocb)) {
+ /*
+ * If the op is asynchronous, we need to copy the list attached
+ * to a BVEC/KVEC-type iterator, but we assume that the storage
+ * will be pinned by the caller; in any case, we may or may not
+ * be able to pin the pages, so we don't try.
+ */
+ ctx->bv = (void *)dup_iter(&ctx->iter, from, GFP_KERNEL);
+ if (!ctx->bv) {
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+ return -ENOMEM;
+ }
+ } else {
+ /*
+ * Otherwise, we just pass the iterator down as-is and rely on
+ * the caller to make sure the pages referred to by the
+ * iterator don't evaporate.
+ */
+ ctx->iter = *from;
}
+ ctx->len = iov_iter_count(&ctx->iter);
+
/* grab a lock here due to read response handlers can access ctx */
mutex_lock(&ctx->aio_mutex);
- rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from,
+ rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &ctx->iter,
cfile, cifs_sb, &ctx->list, ctx);
/*
return written;
}
-static struct cifs_readdata *
-cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
+static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete)
{
struct cifs_readdata *rdata;
rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
- if (rdata != NULL) {
- rdata->pages = pages;
+ if (rdata) {
kref_init(&rdata->refcount);
INIT_LIST_HEAD(&rdata->list);
init_completion(&rdata->done);
return rdata;
}
-static struct cifs_readdata *
-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
-{
- struct page **pages =
- kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
- struct cifs_readdata *ret = NULL;
-
- if (pages) {
- ret = cifs_readdata_direct_alloc(pages, complete);
- if (!ret)
- kfree(pages);
- }
-
- return ret;
-}
-
void
cifs_readdata_release(struct kref *refcount)
{
struct cifs_readdata *rdata = container_of(refcount,
struct cifs_readdata, refcount);
+
+ if (rdata->ctx)
+ kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
#ifdef CONFIG_CIFS_SMB_DIRECT
if (rdata->mr) {
smbd_deregister_mr(rdata->mr);
if (rdata->cfile)
cifsFileInfo_put(rdata->cfile);
- kvfree(rdata->pages);
kfree(rdata);
}
-static int
-cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
-{
- int rc = 0;
- struct page *page;
- unsigned int i;
-
- for (i = 0; i < nr_pages; i++) {
- page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
- if (!page) {
- rc = -ENOMEM;
- break;
- }
- rdata->pages[i] = page;
- }
-
- if (rc) {
- unsigned int nr_page_failed = i;
-
- for (i = 0; i < nr_page_failed; i++) {
- put_page(rdata->pages[i]);
- rdata->pages[i] = NULL;
- }
- }
- return rc;
-}
-
-static void
-cifs_uncached_readdata_release(struct kref *refcount)
-{
- struct cifs_readdata *rdata = container_of(refcount,
- struct cifs_readdata, refcount);
- unsigned int i;
-
- kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
- for (i = 0; i < rdata->nr_pages; i++) {
- put_page(rdata->pages[i]);
- }
- cifs_readdata_release(refcount);
-}
-
-/**
- * cifs_readdata_to_iov - copy data from pages in response to an iovec
- * @rdata: the readdata response with list of pages holding data
- * @iter: destination for our data
- *
- * This function copies data from a list of pages in a readdata response into
- * an array of iovecs. It will first calculate where the data should go
- * based on the info in the readdata and then copy the data into that spot.
- */
-static int
-cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
-{
- size_t remaining = rdata->got_bytes;
- unsigned int i;
-
- for (i = 0; i < rdata->nr_pages; i++) {
- struct page *page = rdata->pages[i];
- size_t copy = min_t(size_t, remaining, PAGE_SIZE);
- size_t written;
-
- if (unlikely(iov_iter_is_pipe(iter))) {
- void *addr = kmap_atomic(page);
-
- written = copy_to_iter(addr, copy, iter);
- kunmap_atomic(addr);
- } else
- written = copy_page_to_iter(page, 0, copy, iter);
- remaining -= written;
- if (written < copy && iov_iter_count(iter) > 0)
- break;
- }
- return remaining ? -EFAULT : 0;
-}
-
static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
static void
complete(&rdata->done);
collect_uncached_read_data(rdata->ctx);
/* the below call can possibly free the last ref to aio ctx */
- kref_put(&rdata->refcount, cifs_uncached_readdata_release);
-}
-
-static int
-uncached_fill_pages(struct TCP_Server_Info *server,
- struct cifs_readdata *rdata, struct iov_iter *iter,
- unsigned int len)
-{
- int result = 0;
- unsigned int i;
- unsigned int nr_pages = rdata->nr_pages;
- unsigned int page_offset = rdata->page_offset;
-
- rdata->got_bytes = 0;
- rdata->tailsz = PAGE_SIZE;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = rdata->pages[i];
- size_t n;
- unsigned int segment_size = rdata->pagesz;
-
- if (i == 0)
- segment_size -= page_offset;
- else
- page_offset = 0;
-
-
- if (len <= 0) {
- /* no need to hold page hostage */
- rdata->pages[i] = NULL;
- rdata->nr_pages--;
- put_page(page);
- continue;
- }
-
- n = len;
- if (len >= segment_size)
- /* enough data to fill the page */
- n = segment_size;
- else
- rdata->tailsz = len;
- len -= n;
-
- if (iter)
- result = copy_page_from_iter(
- page, page_offset, n, iter);
-#ifdef CONFIG_CIFS_SMB_DIRECT
- else if (rdata->mr)
- result = n;
-#endif
- else
- result = cifs_read_page_from_socket(
- server, page, page_offset, n);
- if (result < 0)
- break;
-
- rdata->got_bytes += result;
- }
-
- return rdata->got_bytes > 0 && result != -ECONNABORTED ?
- rdata->got_bytes : result;
-}
-
-static int
-cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
- struct cifs_readdata *rdata, unsigned int len)
-{
- return uncached_fill_pages(server, rdata, NULL, len);
-}
-
-static int
-cifs_uncached_copy_into_pages(struct TCP_Server_Info *server,
- struct cifs_readdata *rdata,
- struct iov_iter *iter)
-{
- return uncached_fill_pages(server, rdata, iter, iter->count);
+ kref_put(&rdata->refcount, cifs_readdata_release);
}
static int cifs_resend_rdata(struct cifs_readdata *rdata,
} while (rc == -EAGAIN);
fail:
- kref_put(&rdata->refcount, cifs_uncached_readdata_release);
+ kref_put(&rdata->refcount, cifs_readdata_release);
return rc;
}
static int
-cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
+cifs_send_async_read(loff_t fpos, size_t len, struct cifsFileInfo *open_file,
struct cifs_sb_info *cifs_sb, struct list_head *rdata_list,
struct cifs_aio_ctx *ctx)
{
struct cifs_readdata *rdata;
- unsigned int npages, rsize;
+ unsigned int rsize, nsegs, max_segs = INT_MAX;
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
- size_t cur_len;
+ size_t cur_len, max_len;
int rc;
pid_t pid;
struct TCP_Server_Info *server;
- struct page **pagevec;
- size_t start;
- struct iov_iter direct_iov = ctx->iter;
server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
+#ifdef CONFIG_CIFS_SMB_DIRECT
+ if (server->smbd_conn)
+ max_segs = server->smbd_conn->max_frmr_depth;
+#endif
+
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
else
pid = current->tgid;
- if (ctx->direct_io)
- iov_iter_advance(&direct_iov, offset - ctx->pos);
-
do {
if (open_file->invalidHandle) {
rc = cifs_reopen_file(open_file, true);
if (rc)
break;
- cur_len = min_t(const size_t, len, rsize);
-
- if (ctx->direct_io) {
- ssize_t result;
-
- result = iov_iter_get_pages_alloc2(
- &direct_iov, &pagevec,
- cur_len, &start);
- if (result < 0) {
- cifs_dbg(VFS,
- "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n",
- result, iov_iter_type(&direct_iov),
- direct_iov.iov_offset,
- direct_iov.count);
- dump_stack();
-
- rc = result;
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
- cur_len = (size_t)result;
-
- rdata = cifs_readdata_direct_alloc(
- pagevec, cifs_uncached_readv_complete);
- if (!rdata) {
- add_credits_and_wake_if(server, credits, 0);
- rc = -ENOMEM;
- break;
- }
-
- npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
- rdata->page_offset = start;
- rdata->tailsz = npages > 1 ?
- cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
- cur_len;
+ max_len = min_t(size_t, len, rsize);
- } else {
-
- npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
- /* allocate a readdata struct */
- rdata = cifs_readdata_alloc(npages,
- cifs_uncached_readv_complete);
- if (!rdata) {
- add_credits_and_wake_if(server, credits, 0);
- rc = -ENOMEM;
- break;
- }
-
- rc = cifs_read_allocate_pages(rdata, npages);
- if (rc) {
- kvfree(rdata->pages);
- kfree(rdata);
- add_credits_and_wake_if(server, credits, 0);
- break;
- }
+ cur_len = cifs_limit_bvec_subset(&ctx->iter, max_len,
+ max_segs, &nsegs);
+ cifs_dbg(FYI, "read-to-iter len=%zx/%zx nsegs=%u/%lu/%u\n",
+ cur_len, max_len, nsegs, ctx->iter.nr_segs, max_segs);
+ if (cur_len == 0) {
+ rc = -EIO;
+ add_credits_and_wake_if(server, credits, 0);
+ break;
+ }
- rdata->tailsz = PAGE_SIZE;
+ rdata = cifs_readdata_alloc(cifs_uncached_readv_complete);
+ if (!rdata) {
+ add_credits_and_wake_if(server, credits, 0);
+ rc = -ENOMEM;
+ break;
}
- rdata->server = server;
- rdata->cfile = cifsFileInfo_get(open_file);
- rdata->nr_pages = npages;
- rdata->offset = offset;
- rdata->bytes = cur_len;
- rdata->pid = pid;
- rdata->pagesz = PAGE_SIZE;
- rdata->read_into_pages = cifs_uncached_read_into_pages;
- rdata->copy_into_pages = cifs_uncached_copy_into_pages;
- rdata->credits = credits_on_stack;
- rdata->ctx = ctx;
+ rdata->server = server;
+ rdata->cfile = cifsFileInfo_get(open_file);
+ rdata->offset = fpos;
+ rdata->bytes = cur_len;
+ rdata->pid = pid;
+ rdata->credits = credits_on_stack;
+ rdata->ctx = ctx;
kref_get(&ctx->refcount);
+ rdata->iter = ctx->iter;
+ iov_iter_truncate(&rdata->iter, cur_len);
+
rc = adjust_credits(server, &rdata->credits, rdata->bytes);
if (!rc) {
if (rc) {
add_credits_and_wake_if(server, &rdata->credits, 0);
- kref_put(&rdata->refcount,
- cifs_uncached_readdata_release);
- if (rc == -EAGAIN) {
- iov_iter_revert(&direct_iov, cur_len);
+ kref_put(&rdata->refcount, cifs_readdata_release);
+ if (rc == -EAGAIN)
continue;
- }
break;
}
list_add_tail(&rdata->list, rdata_list);
- offset += cur_len;
+ iov_iter_advance(&ctx->iter, cur_len);
+ fpos += cur_len;
len -= cur_len;
} while (len > 0);
list_del_init(&rdata->list);
INIT_LIST_HEAD(&tmp_list);
- /*
- * Got a part of data and then reconnect has
- * happened -- fill the buffer and continue
- * reading.
- */
- if (got_bytes && got_bytes < rdata->bytes) {
- rc = 0;
- if (!ctx->direct_io)
- rc = cifs_readdata_to_iov(rdata, to);
- if (rc) {
- kref_put(&rdata->refcount,
- cifs_uncached_readdata_release);
- continue;
- }
- }
-
if (ctx->direct_io) {
/*
* Re-use rdata as this is a
&tmp_list, ctx);
kref_put(&rdata->refcount,
- cifs_uncached_readdata_release);
+ cifs_readdata_release);
}
list_splice(&tmp_list, &ctx->list);
goto again;
} else if (rdata->result)
rc = rdata->result;
- else if (!ctx->direct_io)
- rc = cifs_readdata_to_iov(rdata, to);
/* if there was a short read -- discard anything left */
if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
ctx->total_len += rdata->got_bytes;
}
list_del_init(&rdata->list);
- kref_put(&rdata->refcount, cifs_uncached_readdata_release);
+ kref_put(&rdata->refcount, cifs_readdata_release);
}
if (!ctx->direct_io)
loff_t offset = iocb->ki_pos;
struct cifs_aio_ctx *ctx;
- /*
- * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
- * fall back to data copy read path
- * this could be improved by getting pages directly in ITER_KVEC
- */
- if (direct && iov_iter_is_kvec(to)) {
- cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
- direct = false;
- }
-
len = iov_iter_count(to);
if (!len)
return 0;
if (!ctx)
return -ENOMEM;
- ctx->cfile = cifsFileInfo_get(cfile);
+ ctx->pos = offset;
+ ctx->direct_io = direct;
+ ctx->len = len;
+ ctx->cfile = cifsFileInfo_get(cfile);
+ ctx->nr_pinned_pages = 0;
if (!is_sync_kiocb(iocb))
ctx->iocb = iocb;
- if (user_backed_iter(to))
- ctx->should_dirty = true;
-
- if (direct) {
- ctx->pos = offset;
- ctx->direct_io = true;
- ctx->iter = *to;
- ctx->len = len;
- } else {
- rc = setup_aio_ctx_iter(ctx, to, ITER_DEST);
- if (rc) {
+ if (user_backed_iter(to)) {
+ /*
+ * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as
+ * they contain references to the calling process's virtual
+ * memory layout which won't be available in an async worker
+ * thread. This also takes a pin on every folio involved.
+ */
+ rc = netfs_extract_user_iter(to, iov_iter_count(to),
+ &ctx->iter, 0);
+ if (rc < 0) {
kref_put(&ctx->refcount, cifs_aio_ctx_release);
return rc;
}
- len = ctx->len;
+
+ ctx->nr_pinned_pages = rc;
+ ctx->bv = (void *)ctx->iter.bvec;
+ ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter);
+ ctx->should_dirty = true;
+ } else if ((iov_iter_is_bvec(to) || iov_iter_is_kvec(to)) &&
+ !is_sync_kiocb(iocb)) {
+ /*
+ * If the op is asynchronous, we need to copy the list attached
+ * to a BVEC/KVEC-type iterator, but we assume that the storage
+ * will be retained by the caller; in any case, we may or may
+ * not be able to pin the pages, so we don't try.
+ */
+ ctx->bv = (void *)dup_iter(&ctx->iter, to, GFP_KERNEL);
+ if (!ctx->bv) {
+ kref_put(&ctx->refcount, cifs_aio_ctx_release);
+ return -ENOMEM;
+ }
+ } else {
+ /*
+ * Otherwise, we just pass the iterator down as-is and rely on
+ * the caller to make sure the pages referred to by the
+ * iterator don't evaporate.
+ */
+ ctx->iter = *to;
}
if (direct) {
* If the page is mmap'ed into a process' page tables, then we need to make
* sure that it doesn't change while being written back.
*/
-static vm_fault_t
-cifs_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t cifs_page_mkwrite(struct vm_fault *vmf)
{
- struct page *page = vmf->page;
+ struct folio *folio = page_folio(vmf->page);
- /* Wait for the page to be written to the cache before we allow it to
- * be modified. We then assume the entire page will need writing back.
+ /* Wait for the folio to be written to the cache before we allow it to
+ * be modified. We then assume the entire folio will need writing back.
*/
#ifdef CONFIG_CIFS_FSCACHE
- if (PageFsCache(page) &&
- wait_on_page_fscache_killable(page) < 0)
+ if (folio_test_fscache(folio) &&
+ folio_wait_fscache_killable(folio) < 0)
return VM_FAULT_RETRY;
#endif
- wait_on_page_writeback(page);
+ folio_wait_writeback(folio);
- if (lock_page_killable(page) < 0)
+ if (folio_lock_killable(folio) < 0)
return VM_FAULT_RETRY;
return VM_FAULT_LOCKED;
}
return rc;
}
-static void
-cifs_readv_complete(struct work_struct *work)
+/*
+ * Unlock a bunch of folios in the pagecache.
+ */
+static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last)
{
- unsigned int i, got_bytes;
- struct cifs_readdata *rdata = container_of(work,
- struct cifs_readdata, work);
-
- got_bytes = rdata->got_bytes;
- for (i = 0; i < rdata->nr_pages; i++) {
- struct page *page = rdata->pages[i];
-
- if (rdata->result == 0 ||
- (rdata->result == -EAGAIN && got_bytes)) {
- flush_dcache_page(page);
- SetPageUptodate(page);
- } else
- SetPageError(page);
-
- if (rdata->result == 0 ||
- (rdata->result == -EAGAIN && got_bytes))
- cifs_readpage_to_fscache(rdata->mapping->host, page);
-
- unlock_page(page);
+ struct folio *folio;
+ XA_STATE(xas, &mapping->i_pages, first);
- got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);
-
- put_page(page);
- rdata->pages[i] = NULL;
+ rcu_read_lock();
+ xas_for_each(&xas, folio, last) {
+ folio_unlock(folio);
}
- kref_put(&rdata->refcount, cifs_readdata_release);
+ rcu_read_unlock();
}
-static int
-readpages_fill_pages(struct TCP_Server_Info *server,
- struct cifs_readdata *rdata, struct iov_iter *iter,
- unsigned int len)
+static void cifs_readahead_complete(struct work_struct *work)
{
- int result = 0;
- unsigned int i;
- u64 eof;
- pgoff_t eof_index;
- unsigned int nr_pages = rdata->nr_pages;
- unsigned int page_offset = rdata->page_offset;
-
- /* determine the eof that the server (probably) has */
- eof = CIFS_I(rdata->mapping->host)->server_eof;
- eof_index = eof ? (eof - 1) >> PAGE_SHIFT : 0;
- cifs_dbg(FYI, "eof=%llu eof_index=%lu\n", eof, eof_index);
-
- rdata->got_bytes = 0;
- rdata->tailsz = PAGE_SIZE;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = rdata->pages[i];
- unsigned int to_read = rdata->pagesz;
- size_t n;
-
- if (i == 0)
- to_read -= page_offset;
- else
- page_offset = 0;
-
- n = to_read;
-
- if (len >= to_read) {
- len -= to_read;
- } else if (len > 0) {
- /* enough for partial page, fill and zero the rest */
- zero_user(page, len + page_offset, to_read - len);
- n = rdata->tailsz = len;
- len = 0;
- } else if (page->index > eof_index) {
- /*
- * The VFS will not try to do readahead past the
- * i_size, but it's possible that we have outstanding
- * writes with gaps in the middle and the i_size hasn't
- * caught up yet. Populate those with zeroed out pages
- * to prevent the VFS from repeatedly attempting to
- * fill them until the writes are flushed.
- */
- zero_user(page, 0, PAGE_SIZE);
- flush_dcache_page(page);
- SetPageUptodate(page);
- unlock_page(page);
- put_page(page);
- rdata->pages[i] = NULL;
- rdata->nr_pages--;
- continue;
- } else {
- /* no need to hold page hostage */
- unlock_page(page);
- put_page(page);
- rdata->pages[i] = NULL;
- rdata->nr_pages--;
- continue;
- }
+ struct cifs_readdata *rdata = container_of(work,
+ struct cifs_readdata, work);
+ struct folio *folio;
+ pgoff_t last;
+ bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes);
- if (iter)
- result = copy_page_from_iter(
- page, page_offset, n, iter);
-#ifdef CONFIG_CIFS_SMB_DIRECT
- else if (rdata->mr)
- result = n;
-#endif
- else
- result = cifs_read_page_from_socket(
- server, page, page_offset, n);
- if (result < 0)
- break;
+ XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE);
- rdata->got_bytes += result;
- }
+ if (good)
+ cifs_readahead_to_fscache(rdata->mapping->host,
+ rdata->offset, rdata->bytes);
- return rdata->got_bytes > 0 && result != -ECONNABORTED ?
- rdata->got_bytes : result;
-}
+ if (iov_iter_count(&rdata->iter) > 0)
+ iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter);
-static int
-cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
- struct cifs_readdata *rdata, unsigned int len)
-{
- return readpages_fill_pages(server, rdata, NULL, len);
-}
+ last = (rdata->offset + rdata->bytes - 1) / PAGE_SIZE;
-static int
-cifs_readpages_copy_into_pages(struct TCP_Server_Info *server,
- struct cifs_readdata *rdata,
- struct iov_iter *iter)
-{
- return readpages_fill_pages(server, rdata, iter, iter->count);
+ rcu_read_lock();
+ xas_for_each(&xas, folio, last) {
+ if (good) {
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ }
+ folio_unlock(folio);
+ }
+ rcu_read_unlock();
+
+ kref_put(&rdata->refcount, cifs_readdata_release);
}
static void cifs_readahead(struct readahead_control *ractl)
{
- int rc;
struct cifsFileInfo *open_file = ractl->file->private_data;
struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file);
struct TCP_Server_Info *server;
- pid_t pid;
- unsigned int xid, nr_pages, last_batch_size = 0, cache_nr_pages = 0;
- pgoff_t next_cached = ULONG_MAX;
+ unsigned int xid, nr_pages, cache_nr_pages = 0;
+ unsigned int ra_pages;
+ pgoff_t next_cached = ULONG_MAX, ra_index;
bool caching = fscache_cookie_enabled(cifs_inode_cookie(ractl->mapping->host)) &&
cifs_inode_cookie(ractl->mapping->host)->cache_priv;
bool check_cache = caching;
+ pid_t pid;
+ int rc = 0;
+
+ /* Note that readahead_count() lags behind our dequeuing of pages from
+ * the ractl, wo we have to keep track for ourselves.
+ */
+ ra_pages = readahead_count(ractl);
+ ra_index = readahead_index(ractl);
xid = get_xid();
else
pid = current->tgid;
- rc = 0;
server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
- __func__, ractl->file, ractl->mapping, readahead_count(ractl));
+ __func__, ractl->file, ractl->mapping, ra_pages);
/*
* Chop the readahead request up into rsize-sized read requests.
*/
- while ((nr_pages = readahead_count(ractl) - last_batch_size)) {
- unsigned int i, got, rsize;
- struct page *page;
+ while ((nr_pages = ra_pages)) {
+ unsigned int i, rsize;
struct cifs_readdata *rdata;
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
- pgoff_t index = readahead_index(ractl) + last_batch_size;
+ struct folio *folio;
+ pgoff_t fsize;
/*
* Find out if we have anything cached in the range of
if (caching) {
if (check_cache) {
rc = cifs_fscache_query_occupancy(
- ractl->mapping->host, index, nr_pages,
+ ractl->mapping->host, ra_index, nr_pages,
&next_cached, &cache_nr_pages);
if (rc < 0)
caching = false;
check_cache = false;
}
- if (index == next_cached) {
+ if (ra_index == next_cached) {
/*
* TODO: Send a whole batch of pages to be read
* by the cache.
*/
- struct folio *folio = readahead_folio(ractl);
-
- last_batch_size = folio_nr_pages(folio);
+ folio = readahead_folio(ractl);
+ fsize = folio_nr_pages(folio);
+ ra_pages -= fsize;
+ ra_index += fsize;
if (cifs_readpage_from_fscache(ractl->mapping->host,
&folio->page) < 0) {
/*
caching = false;
}
folio_unlock(folio);
- next_cached++;
- cache_nr_pages--;
+ next_cached += fsize;
+ cache_nr_pages -= fsize;
if (cache_nr_pages == 0)
check_cache = true;
continue;
&rsize, credits);
if (rc)
break;
- nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl));
- nr_pages = min_t(size_t, nr_pages, next_cached - index);
+ nr_pages = min_t(size_t, rsize / PAGE_SIZE, ra_pages);
+ if (next_cached != ULONG_MAX)
+ nr_pages = min_t(size_t, nr_pages, next_cached - ra_index);
/*
* Give up immediately if rsize is too small to read an entire
break;
}
- rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
+ rdata = cifs_readdata_alloc(cifs_readahead_complete);
if (!rdata) {
/* best to give up if we're out of mem */
add_credits_and_wake_if(server, credits, 0);
break;
}
- got = __readahead_batch(ractl, rdata->pages, nr_pages);
- if (got != nr_pages) {
- pr_warn("__readahead_batch() returned %u/%u\n",
- got, nr_pages);
- nr_pages = got;
- }
-
- rdata->nr_pages = nr_pages;
- rdata->bytes = readahead_batch_length(ractl);
+ rdata->offset = ra_index * PAGE_SIZE;
+ rdata->bytes = nr_pages * PAGE_SIZE;
rdata->cfile = cifsFileInfo_get(open_file);
rdata->server = server;
rdata->mapping = ractl->mapping;
- rdata->offset = readahead_pos(ractl);
rdata->pid = pid;
- rdata->pagesz = PAGE_SIZE;
- rdata->tailsz = PAGE_SIZE;
- rdata->read_into_pages = cifs_readpages_read_into_pages;
- rdata->copy_into_pages = cifs_readpages_copy_into_pages;
rdata->credits = credits_on_stack;
+ for (i = 0; i < nr_pages; i++) {
+ if (!readahead_folio(ractl))
+ WARN_ON(1);
+ }
+ ra_pages -= nr_pages;
+ ra_index += nr_pages;
+
+ iov_iter_xarray(&rdata->iter, ITER_DEST, &rdata->mapping->i_pages,
+ rdata->offset, rdata->bytes);
+
rc = adjust_credits(server, &rdata->credits, rdata->bytes);
if (!rc) {
if (rdata->cfile->invalidHandle)
if (rc) {
add_credits_and_wake_if(server, &rdata->credits, 0);
- for (i = 0; i < rdata->nr_pages; i++) {
- page = rdata->pages[i];
- unlock_page(page);
- put_page(page);
- }
+ cifs_unlock_folios(rdata->mapping,
+ rdata->offset / PAGE_SIZE,
+ (rdata->offset + rdata->bytes - 1) / PAGE_SIZE);
/* Fallback to the readpage in error/reconnect cases */
kref_put(&rdata->refcount, cifs_readdata_release);
break;
}
kref_put(&rdata->refcount, cifs_readdata_release);
- last_batch_size = nr_pages;
}
free_xid(xid);
flush_dcache_page(page);
SetPageUptodate(page);
-
- /* send this page to the cache */
- cifs_readpage_to_fscache(file_inode(file), page);
-
rc = 0;
io_error:
.launder_folio = cifs_launder_folio,
.migrate_folio = filemap_migrate_folio,
};
+
+/*
+ * Splice data from a file into a pipe.
+ */
+ssize_t cifs_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe, size_t len,
+ unsigned int flags)
+{
+ if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
+ return 0;
+ if (unlikely(!len))
+ return 0;
+ if (in->f_flags & O_DIRECT)
+ return direct_splice_read(in, ppos, pipe, len, flags);
+ return filemap_splice_read(in, ppos, pipe, len, flags);
+}
goto close_fail;
}
} else {
- struct user_namespace *mnt_userns;
+ struct mnt_idmap *idmap;
struct inode *inode;
int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW |
O_LARGEFILE | O_EXCL;
* a process dumps core while its cwd is e.g. on a vfat
* filesystem.
*/
- mnt_userns = file_mnt_user_ns(cprm.file);
- if (!vfsuid_eq_kuid(i_uid_into_vfsuid(mnt_userns, inode),
+ idmap = file_mnt_idmap(cprm.file);
+ if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
current_fsuid())) {
pr_info_ratelimited("Core dump to %s aborted: cannot preserve file owner\n",
cn.corename);
}
if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
goto close_fail;
- if (do_truncate(mnt_userns, cprm.file->f_path.dentry,
+ if (do_truncate(idmap, cprm.file->f_path.dentry,
0, 0, cprm.file))
goto close_fail;
}
}
}
+int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
+{
+ if (cprm->to_skip) {
+ if (!__dump_skip(cprm, cprm->to_skip))
+ return 0;
+ cprm->to_skip = 0;
+ }
+ return __dump_emit(cprm, addr, nr);
+}
+EXPORT_SYMBOL(dump_emit);
+
+void dump_skip_to(struct coredump_params *cprm, unsigned long pos)
+{
+ cprm->to_skip = pos - cprm->pos;
+}
+EXPORT_SYMBOL(dump_skip_to);
+
+void dump_skip(struct coredump_params *cprm, size_t nr)
+{
+ cprm->to_skip += nr;
+}
+EXPORT_SYMBOL(dump_skip);
+
+#ifdef CONFIG_ELF_CORE
static int dump_emit_page(struct coredump_params *cprm, struct page *page)
{
- struct bio_vec bvec = {
- .bv_page = page,
- .bv_offset = 0,
- .bv_len = PAGE_SIZE,
- };
+ struct bio_vec bvec;
struct iov_iter iter;
struct file *file = cprm->file;
loff_t pos;
if (dump_interrupted())
return 0;
pos = file->f_pos;
+ bvec_set_page(&bvec, page, PAGE_SIZE, 0);
iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
n = __kernel_write_iter(cprm->file, &iter, &pos);
if (n != PAGE_SIZE)
return 1;
}
-int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
-{
- if (cprm->to_skip) {
- if (!__dump_skip(cprm, cprm->to_skip))
- return 0;
- cprm->to_skip = 0;
- }
- return __dump_emit(cprm, addr, nr);
-}
-EXPORT_SYMBOL(dump_emit);
-
-void dump_skip_to(struct coredump_params *cprm, unsigned long pos)
-{
- cprm->to_skip = pos - cprm->pos;
-}
-EXPORT_SYMBOL(dump_skip_to);
-
-void dump_skip(struct coredump_params *cprm, size_t nr)
-{
- cprm->to_skip += nr;
-}
-EXPORT_SYMBOL(dump_skip);
-
-#ifdef CONFIG_ELF_CORE
int dump_user_range(struct coredump_params *cprm, unsigned long start,
unsigned long len)
{
* Helper function for iterating across a vma list. It ensures that the caller
* will visit `gate_vma' prior to terminating the search.
*/
- static struct vm_area_struct *coredump_next_vma(struct ma_state *mas,
+ static struct vm_area_struct *coredump_next_vma(struct vma_iterator *vmi,
struct vm_area_struct *vma,
struct vm_area_struct *gate_vma)
{
if (gate_vma && (vma == gate_vma))
return NULL;
- vma = mas_next(mas, ULONG_MAX);
+ vma = vma_next(vmi);
if (vma)
return vma;
return gate_vma;
{
struct vm_area_struct *gate_vma, *vma = NULL;
struct mm_struct *mm = current->mm;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(vmi, mm, 0);
int i = 0;
/*
return false;
}
- while ((vma = coredump_next_vma(&mas, vma, gate_vma)) != NULL) {
+ while ((vma = coredump_next_vma(&vmi, vma, gate_vma)) != NULL) {
struct core_vma_metadata *m = cprm->vma_meta + i;
m->start = vma->vm_start;
}
static int erofs_map_blocks_flatmode(struct inode *inode,
- struct erofs_map_blocks *map,
- int flags)
+ struct erofs_map_blocks *map)
{
erofs_blk_t nblocks, lastblk;
u64 offset = map->m_la;
map->m_pa = blknr_to_addr(vi->raw_blkaddr) + map->m_la;
map->m_plen = blknr_to_addr(lastblk) - offset;
} else if (tailendpacking) {
- /* 2 - inode inline B: inode, [xattrs], inline last blk... */
- struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
-
- map->m_pa = iloc(sbi, vi->nid) + vi->inode_isize +
- vi->xattr_isize + erofs_blkoff(map->m_la);
+ map->m_pa = erofs_iloc(inode) + vi->inode_isize +
+ vi->xattr_isize + erofs_blkoff(offset);
map->m_plen = inode->i_size - offset;
/* inline data should be located in the same meta block */
return 0;
}
-int erofs_map_blocks(struct inode *inode,
- struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map)
{
struct super_block *sb = inode->i_sb;
struct erofs_inode *vi = EROFS_I(inode);
void *kaddr;
int err = 0;
- trace_erofs_map_blocks_enter(inode, map, flags);
+ trace_erofs_map_blocks_enter(inode, map, 0);
map->m_deviceid = 0;
if (map->m_la >= inode->i_size) {
/* leave out-of-bound access unmapped */
}
if (vi->datalayout != EROFS_INODE_CHUNK_BASED) {
- err = erofs_map_blocks_flatmode(inode, map, flags);
+ err = erofs_map_blocks_flatmode(inode, map);
goto out;
}
unit = EROFS_BLOCK_MAP_ENTRY_SIZE; /* block map */
chunknr = map->m_la >> vi->chunkbits;
- pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize +
+ pos = ALIGN(erofs_iloc(inode) + vi->inode_isize +
vi->xattr_isize, unit) + unit * chunknr;
kaddr = erofs_read_metabuf(&buf, sb, erofs_blknr(pos), EROFS_KMAP);
out:
if (!err)
map->m_llen = map->m_plen;
- trace_erofs_map_blocks_exit(inode, map, flags, 0);
+ trace_erofs_map_blocks_exit(inode, map, 0, err);
return err;
}
map.m_la = offset;
map.m_llen = length;
- ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+ ret = erofs_map_blocks(inode, &map);
if (ret < 0)
return ret;
return -EINVAL;
vma->vm_ops = &erofs_dax_vm_ops;
- vma->vm_flags |= VM_HUGEPAGE;
+ vm_flags_set(vma, VM_HUGEPAGE);
return 0;
}
#else
BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
vma->vm_end = STACK_TOP_MAX;
vma->vm_start = vma->vm_end - PAGE_SIZE;
- vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
+ vm_flags_init(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
err = insert_vm_struct(mm, vma);
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
+ if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
return -ENOMEM;
/*
}
tlb_finish_mmu(&tlb);
- /*
- * Shrink the vma to just the new range. Always succeeds.
- */
- vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
-
- return 0;
+ vma_prev(&vmi);
+ /* Shrink the vma to just the new range */
+ return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
}
/*
unsigned long stack_expand;
unsigned long rlim_stack;
struct mmu_gather tlb;
+ struct vma_iterator vmi;
#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size */
vm_flags |= mm->def_flags;
vm_flags |= VM_STACK_INCOMPLETE_SETUP;
+ vma_iter_init(&vmi, mm, vma->vm_start);
+
tlb_gather_mmu(&tlb, mm);
- ret = mprotect_fixup(&tlb, vma, &prev, vma->vm_start, vma->vm_end,
+ ret = mprotect_fixup(&vmi, &tlb, vma, &prev, vma->vm_start, vma->vm_end,
vm_flags);
tlb_finish_mmu(&tlb);
}
/* mprotect_fixup is overkill to remove the temporary stack flags */
- vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
+ vm_flags_clear(vma, VM_STACK_INCOMPLETE_SETUP);
stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
stack_size = vma->vm_end - vma->vm_start;
active_mm = tsk->active_mm;
tsk->active_mm = mm;
tsk->mm = mm;
+ mm_init_cid(mm);
/*
* This prevents preemption while active_mm is being loaded and
* it and mm are being updated, which could cause problems for
void would_dump(struct linux_binprm *bprm, struct file *file)
{
struct inode *inode = file_inode(file);
- struct user_namespace *mnt_userns = file_mnt_user_ns(file);
- if (inode_permission(mnt_userns, inode, MAY_READ) < 0) {
+ struct mnt_idmap *idmap = file_mnt_idmap(file);
+ if (inode_permission(idmap, inode, MAY_READ) < 0) {
struct user_namespace *old, *user_ns;
bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
/* Ensure mm->user_ns contains the executable */
user_ns = old = bprm->mm->user_ns;
while ((user_ns != &init_user_ns) &&
- !privileged_wrt_inode_uidgid(user_ns, mnt_userns, inode))
+ !privileged_wrt_inode_uidgid(user_ns, idmap, inode))
user_ns = user_ns->parent;
if (old != user_ns) {
static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
{
/* Handle suid and sgid on files */
- struct user_namespace *mnt_userns;
+ struct mnt_idmap *idmap;
struct inode *inode = file_inode(file);
unsigned int mode;
vfsuid_t vfsuid;
if (!(mode & (S_ISUID|S_ISGID)))
return;
- mnt_userns = file_mnt_user_ns(file);
+ idmap = file_mnt_idmap(file);
/* Be careful if suid/sgid is set */
inode_lock(inode);
/* reload atomically mode/uid/gid now that lock held */
mode = inode->i_mode;
- vfsuid = i_uid_into_vfsuid(mnt_userns, inode);
- vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
+ vfsuid = i_uid_into_vfsuid(idmap, inode);
+ vfsgid = i_gid_into_vfsgid(idmap, inode);
inode_unlock(inode);
/* We ignore suid/sgid if there are no mappings for them in the ns */
*/
check_unsafe_exec(bprm);
current->in_execve = 1;
+ sched_mm_cid_before_execve(current);
file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (retval < 0)
goto out;
+ sched_mm_cid_after_execve(current);
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
force_fatal_sig(SIGSEGV);
out_unmark:
+ sched_mm_cid_after_execve(current);
current->fs->in_exec = 0;
current->in_execve = 0;
for (i = 0; i < nr_wait; i++) {
int err2;
- err2 = fscrypt_decrypt_pagecache_blocks(page, blocksize,
+ err2 = fscrypt_decrypt_pagecache_blocks(page_folio(page),
+ blocksize,
bh_offset(wait[i]));
if (err2) {
clear_buffer_uptodate(wait[i]);
static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
{
struct address_space *mapping = mpd->inode->i_mapping;
- struct pagevec pvec;
- unsigned int nr_pages;
+ struct folio_batch fbatch;
+ unsigned int nr_folios;
long left = mpd->wbc->nr_to_write;
pgoff_t index = mpd->first_page;
pgoff_t end = mpd->last_page;
tag = PAGECACHE_TAG_TOWRITE;
else
tag = PAGECACHE_TAG_DIRTY;
-
- pagevec_init(&pvec);
+ folio_batch_init(&fbatch);
mpd->map.m_len = 0;
mpd->next_page = index;
while (index <= end) {
- nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
- tag);
- if (nr_pages == 0)
+ nr_folios = filemap_get_folios_tag(mapping, &index, end,
+ tag, &fbatch);
+ if (nr_folios == 0)
break;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
+ for (i = 0; i < nr_folios; i++) {
+ struct folio *folio = fbatch.folios[i];
/*
* Accumulated enough dirty pages? This doesn't apply
goto out;
/* If we can't merge this page, we are done. */
- if (mpd->map.m_len > 0 && mpd->next_page != page->index)
+ if (mpd->map.m_len > 0 && mpd->next_page != folio->index)
goto out;
- lock_page(page);
+ folio_lock(folio);
/*
* If the page is no longer dirty, or its mapping no
* longer corresponds to inode we are writing (which
* page is already under writeback and we are not doing
* a data integrity writeback, skip the page
*/
- if (!PageDirty(page) ||
- (PageWriteback(page) &&
+ if (!folio_test_dirty(folio) ||
+ (folio_test_writeback(folio) &&
(mpd->wbc->sync_mode == WB_SYNC_NONE)) ||
- unlikely(page->mapping != mapping)) {
- unlock_page(page);
+ unlikely(folio->mapping != mapping)) {
+ folio_unlock(folio);
continue;
}
- wait_on_page_writeback(page);
- BUG_ON(PageWriteback(page));
+ folio_wait_writeback(folio);
+ BUG_ON(folio_test_writeback(folio));
/*
* Should never happen but for buggy code in
*
*/
- if (!page_has_buffers(page)) {
- ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", page->index);
- ClearPageDirty(page);
- unlock_page(page);
+ if (!folio_buffers(folio)) {
+ ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", folio->index);
+ folio_clear_dirty(folio);
+ folio_unlock(folio);
continue;
}
if (mpd->map.m_len == 0)
- mpd->first_page = page->index;
- mpd->next_page = page->index + 1;
+ mpd->first_page = folio->index;
+ mpd->next_page = folio->index + folio_nr_pages(folio);
/*
* Writeout for transaction commit where we cannot
* modify metadata is simple. Just submit the page.
*/
if (!mpd->can_map) {
- if (ext4_page_nomap_can_writeout(page)) {
- err = mpage_submit_page(mpd, page);
+ if (ext4_page_nomap_can_writeout(&folio->page)) {
+ err = mpage_submit_page(mpd, &folio->page);
if (err < 0)
goto out;
} else {
- unlock_page(page);
- mpd->first_page++;
+ folio_unlock(folio);
+ mpd->first_page += folio_nr_pages(folio);
}
} else {
/* Add all dirty buffers to mpd */
- lblk = ((ext4_lblk_t)page->index) <<
+ lblk = ((ext4_lblk_t)folio->index) <<
(PAGE_SHIFT - blkbits);
- head = page_buffers(page);
+ head = folio_buffers(folio);
err = mpage_process_page_bufs(mpd, head, head,
- lblk);
+ lblk);
if (err <= 0)
goto out;
err = 0;
}
- left--;
+ left -= folio_nr_pages(folio);
}
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
cond_resched();
}
mpd->scanned_until_end = 1;
return 0;
out:
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
return err;
}
- static int ext4_writepage_cb(struct page *page, struct writeback_control *wbc,
+ static int ext4_writepage_cb(struct folio *folio, struct writeback_control *wbc,
void *data)
{
- return ext4_writepage(page, wbc);
+ return ext4_writepage(&folio->page, wbc);
}
static int ext4_do_writepages(struct mpage_da_data *mpd)
if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
/* We expect the key to be set. */
BUG_ON(!fscrypt_has_encryption_key(inode));
- err = fscrypt_decrypt_pagecache_blocks(page, blocksize,
+ err = fscrypt_decrypt_pagecache_blocks(page_folio(page),
+ blocksize,
bh_offset(bh));
if (err) {
clear_buffer_uptodate(bh);
*
* Called with inode->i_rwsem down.
*/
-int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr)
{
struct inode *inode = d_inode(dentry);
ATTR_GID | ATTR_TIMES_SET))))
return -EPERM;
- error = setattr_prepare(mnt_userns, dentry, attr);
+ error = setattr_prepare(idmap, dentry, attr);
if (error)
return error;
if (error)
return error;
- if (is_quota_modification(mnt_userns, inode, attr)) {
+ if (is_quota_modification(idmap, inode, attr)) {
error = dquot_initialize(inode);
if (error)
return error;
}
- if (i_uid_needs_update(mnt_userns, attr, inode) ||
- i_gid_needs_update(mnt_userns, attr, inode)) {
+ if (i_uid_needs_update(idmap, attr, inode) ||
+ i_gid_needs_update(idmap, attr, inode)) {
handle_t *handle;
/* (user+group)*(old+new) structure, inode write (sb,
* counts xattr inode references.
*/
down_read(&EXT4_I(inode)->xattr_sem);
- error = dquot_transfer(mnt_userns, inode, attr);
+ error = dquot_transfer(idmap, inode, attr);
up_read(&EXT4_I(inode)->xattr_sem);
if (error) {
}
/* Update corresponding info in inode so that everything is in
* one transaction */
- i_uid_update(mnt_userns, attr, inode);
- i_gid_update(mnt_userns, attr, inode);
+ i_uid_update(idmap, attr, inode);
+ i_gid_update(idmap, attr, inode);
error = ext4_mark_inode_dirty(handle, inode);
ext4_journal_stop(handle);
if (unlikely(error)) {
if (!error) {
if (inc_ivers)
inode_inc_iversion(inode);
- setattr_copy(mnt_userns, inode, attr);
+ setattr_copy(idmap, inode, attr);
mark_inode_dirty(inode);
}
ext4_orphan_del(NULL, inode);
if (!error && (ia_valid & ATTR_MODE))
- rc = posix_acl_chmod(mnt_userns, dentry, inode->i_mode);
+ rc = posix_acl_chmod(idmap, dentry, inode->i_mode);
err_out:
if (error)
return 1; /* use the iomap defaults */
}
-int ext4_getattr(struct user_namespace *mnt_userns, const struct path *path,
+int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, u32 request_mask, unsigned int query_flags)
{
struct inode *inode = d_inode(path->dentry);
STATX_ATTR_NODUMP |
STATX_ATTR_VERITY);
- generic_fillattr(mnt_userns, inode, stat);
+ generic_fillattr(idmap, inode, stat);
return 0;
}
-int ext4_file_getattr(struct user_namespace *mnt_userns,
+int ext4_file_getattr(struct mnt_idmap *idmap,
const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
struct inode *inode = d_inode(path->dentry);
u64 delalloc_blocks;
- ext4_getattr(mnt_userns, path, stat, request_mask, query_flags);
+ ext4_getattr(idmap, path, stat, request_mask, query_flags);
/*
* If there is inline data in the inode, the inode will normally not
*
* However, we may have to redirty a page (see below.)
*/
- static int ext4_journalled_writepage_callback(struct page *page,
+ static int ext4_journalled_writepage_callback(struct folio *folio,
struct writeback_control *wbc,
void *data)
{
struct buffer_head *bh, *head;
struct journal_head *jh;
- bh = head = page_buffers(page);
+ bh = head = folio_buffers(folio);
do {
/*
* We have to redirty a page in these cases:
if (buffer_dirty(bh) ||
(jh && (jh->b_transaction != transaction ||
jh->b_next_transaction))) {
- redirty_page_for_writepage(wbc, page);
+ folio_redirty_for_writepage(wbc, folio);
goto out;
}
} while ((bh = bh->b_this_page) != head);
{
const struct ext4_fs_context *ctx = fc->fs_private;
const struct ext4_sb_info *sbi = EXT4_SB(sb);
- int err;
if (!fscrypt_is_dummy_policy_set(&ctx->dummy_enc_policy))
return 0;
"Conflicting test_dummy_encryption options");
return -EINVAL;
}
- /*
- * fscrypt_add_test_dummy_key() technically changes the super_block, so
- * technically it should be delayed until ext4_apply_options() like the
- * other changes. But since we never get here for remounts (see above),
- * and this is the last chance to report errors, we do it here.
- */
- err = fscrypt_add_test_dummy_key(sb, &ctx->dummy_enc_policy);
- if (err)
- ext4_msg(NULL, KERN_WARNING,
- "Error adding test dummy encryption key [%d]", err);
- return err;
+ return 0;
}
static void ext4_apply_test_dummy_encryption(struct ext4_fs_context *ctx,
}
}
- if (ext4_has_feature_verity(sb) && sb->s_blocksize != PAGE_SIZE) {
- ext4_msg(sb, KERN_ERR, "Unsupported blocksize for fs-verity");
- goto failed_mount_wq;
- }
-
/*
* Get the # of file system overhead blocks from the
* superblock if present.
static inline loff_t f2fs_readpage_limit(struct inode *inode)
{
- if (IS_ENABLED(CONFIG_FS_VERITY) &&
- (IS_VERITY(inode) || f2fs_verity_in_progress(inode)))
+ if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
return inode->i_sb->s_maxbytes;
return i_size_read(inode);
int ret = 0;
int done = 0, retry = 0;
struct page *pages[F2FS_ONSTACK_PAGES];
+ struct folio_batch fbatch;
struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
struct bio *bio = NULL;
sector_t last_block;
.private = NULL,
};
#endif
+ int nr_folios, p, idx;
int nr_pages;
pgoff_t index;
pgoff_t end; /* Inclusive */
int submitted = 0;
int i;
+ folio_batch_init(&fbatch);
+
if (get_dirty_pages(mapping->host) <=
SM_I(F2FS_M_SB(mapping))->min_hot_blocks)
set_inode_flag(mapping->host, FI_HOT_DATA);
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !retry && (index <= end)) {
- nr_pages = find_get_pages_range_tag(mapping, &index, end,
- tag, F2FS_ONSTACK_PAGES, pages);
- if (nr_pages == 0)
+ nr_pages = 0;
+ again:
+ nr_folios = filemap_get_folios_tag(mapping, &index, end,
+ tag, &fbatch);
+ if (nr_folios == 0) {
+ if (nr_pages)
+ goto write;
break;
+ }
+ for (i = 0; i < nr_folios; i++) {
+ struct folio *folio = fbatch.folios[i];
+
+ idx = 0;
+ p = folio_nr_pages(folio);
+ add_more:
+ pages[nr_pages] = folio_page(folio, idx);
+ folio_get(folio);
+ if (++nr_pages == F2FS_ONSTACK_PAGES) {
+ index = folio->index + idx + 1;
+ folio_batch_release(&fbatch);
+ goto write;
+ }
+ if (++idx < p)
+ goto add_more;
+ }
+ folio_batch_release(&fbatch);
+ goto again;
+ write:
for (i = 0; i < nr_pages; i++) {
struct page *page = pages[i];
+ struct folio *folio = page_folio(page);
bool need_readd;
readd:
need_readd = false;
}
if (!f2fs_cluster_can_merge_page(&cc,
- page->index)) {
+ folio->index)) {
ret = f2fs_write_multi_pages(&cc,
&submitted, wbc, io_type);
if (!ret)
}
if (unlikely(f2fs_cp_error(sbi)))
- goto lock_page;
+ goto lock_folio;
if (!f2fs_cluster_is_empty(&cc))
- goto lock_page;
+ goto lock_folio;
if (f2fs_all_cluster_page_ready(&cc,
pages, i, nr_pages, true))
- goto lock_page;
+ goto lock_folio;
ret2 = f2fs_prepare_compress_overwrite(
inode, &pagep,
- page->index, &fsdata);
+ folio->index, &fsdata);
if (ret2 < 0) {
ret = ret2;
done = 1;
break;
} else if (ret2 &&
(!f2fs_compress_write_end(inode,
- fsdata, page->index, 1) ||
+ fsdata, folio->index, 1) ||
!f2fs_all_cluster_page_ready(&cc,
- pages, i, nr_pages, false))) {
+ pages, i, nr_pages,
+ false))) {
retry = 1;
break;
}
break;
}
#ifdef CONFIG_F2FS_FS_COMPRESSION
- lock_page:
+ lock_folio:
#endif
- done_index = page->index;
+ done_index = folio->index;
retry_write:
- lock_page(page);
+ folio_lock(folio);
- if (unlikely(page->mapping != mapping)) {
+ if (unlikely(folio->mapping != mapping)) {
continue_unlock:
- unlock_page(page);
+ folio_unlock(folio);
continue;
}
- if (!PageDirty(page)) {
+ if (!folio_test_dirty(folio)) {
/* someone wrote it for us */
goto continue_unlock;
}
- if (PageWriteback(page)) {
+ if (folio_test_writeback(folio)) {
if (wbc->sync_mode != WB_SYNC_NONE)
- f2fs_wait_on_page_writeback(page,
+ f2fs_wait_on_page_writeback(
+ &folio->page,
DATA, true, true);
else
goto continue_unlock;
}
- if (!clear_page_dirty_for_io(page))
+ if (!folio_clear_dirty_for_io(folio))
goto continue_unlock;
#ifdef CONFIG_F2FS_FS_COMPRESSION
if (f2fs_compressed_file(inode)) {
- get_page(page);
- f2fs_compress_ctx_add_page(&cc, page);
+ folio_get(folio);
+ f2fs_compress_ctx_add_page(&cc, &folio->page);
continue;
}
#endif
- ret = f2fs_write_single_data_page(page, &submitted,
- &bio, &last_block, wbc, io_type,
- 0, true);
+ ret = f2fs_write_single_data_page(&folio->page,
+ &submitted, &bio, &last_block,
+ wbc, io_type, 0, true);
if (ret == AOP_WRITEPAGE_ACTIVATE)
- unlock_page(page);
+ folio_unlock(folio);
#ifdef CONFIG_F2FS_FS_COMPRESSION
result:
#endif
}
goto next;
}
- done_index = page->index + 1;
+ done_index = folio->index +
+ folio_nr_pages(folio);
done = 1;
break;
}
#include <linux/falloc.h>
#include <linux/uio.h>
#include <linux/fs.h>
+#include <linux/filelock.h>
static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, int opcode,
return err;
if (fc->handle_killpriv_v2 &&
- setattr_should_drop_suidgid(&init_user_ns, file_inode(file))) {
+ setattr_should_drop_suidgid(&nop_mnt_idmap,
+ file_inode(file))) {
goto writethrough;
}
return false;
}
- static int fuse_writepages_fill(struct page *page,
+ static int fuse_writepages_fill(struct folio *folio,
struct writeback_control *wbc, void *_data)
{
struct fuse_fill_wb_data *data = _data;
goto out_unlock;
}
- if (wpa && fuse_writepage_need_send(fc, page, ap, data)) {
+ if (wpa && fuse_writepage_need_send(fc, &folio->page, ap, data)) {
fuse_writepages_send(data);
data->wpa = NULL;
}
data->max_pages = 1;
ap = &wpa->ia.ap;
- fuse_write_args_fill(&wpa->ia, data->ff, page_offset(page), 0);
+ fuse_write_args_fill(&wpa->ia, data->ff, folio_pos(folio), 0);
wpa->ia.write.in.write_flags |= FUSE_WRITE_CACHE;
wpa->next = NULL;
ap->args.in_pages = true;
ap->num_pages = 0;
wpa->inode = inode;
}
- set_page_writeback(page);
+ folio_start_writeback(folio);
- copy_highpage(tmp_page, page);
+ copy_highpage(tmp_page, &folio->page);
ap->pages[ap->num_pages] = tmp_page;
ap->descs[ap->num_pages].offset = 0;
ap->descs[ap->num_pages].length = PAGE_SIZE;
- data->orig_pages[ap->num_pages] = page;
+ data->orig_pages[ap->num_pages] = &folio->page;
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
inc_node_page_state(tmp_page, NR_WRITEBACK_TEMP);
spin_lock(&fi->lock);
ap->num_pages++;
spin_unlock(&fi->lock);
- } else if (fuse_writepage_add(wpa, page)) {
+ } else if (fuse_writepage_add(wpa, &folio->page)) {
data->wpa = wpa;
} else {
- end_page_writeback(page);
+ folio_end_writeback(folio);
}
out_unlock:
- unlock_page(page);
+ folio_unlock(folio);
return err;
}
#include "aops.h"
-void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
- unsigned int from, unsigned int len)
+void gfs2_trans_add_databufs(struct gfs2_inode *ip, struct folio *folio,
+ unsigned int from, unsigned int len)
{
- struct buffer_head *head = page_buffers(page);
+ struct buffer_head *head = folio_buffers(folio);
unsigned int bsize = head->b_size;
struct buffer_head *bh;
unsigned int to = from + len;
{
struct inode *inode = page->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
- struct gfs2_sbd *sdp = GFS2_SB(inode);
if (PageChecked(page)) {
ClearPageChecked(page);
create_empty_buffers(page, inode->i_sb->s_blocksize,
BIT(BH_Dirty)|BIT(BH_Uptodate));
}
- gfs2_page_add_databufs(ip, page, 0, sdp->sd_vfs->s_blocksize);
+ gfs2_trans_add_databufs(ip, page_folio(page), 0, PAGE_SIZE);
}
return gfs2_write_jdata_page(page, wbc);
}
}
/**
- * gfs2_write_jdata_pagevec - Write back a pagevec's worth of pages
+ * gfs2_write_jdata_batch - Write back a folio batch's worth of folios
* @mapping: The mapping
* @wbc: The writeback control
- * @pvec: The vector of pages
- * @nr_pages: The number of pages to write
+ * @fbatch: The batch of folios
* @done_index: Page index
*
* Returns: non-zero if loop should terminate, zero otherwise
*/
- static int gfs2_write_jdata_pagevec(struct address_space *mapping,
+ static int gfs2_write_jdata_batch(struct address_space *mapping,
struct writeback_control *wbc,
- struct pagevec *pvec,
- int nr_pages,
+ struct folio_batch *fbatch,
pgoff_t *done_index)
{
struct inode *inode = mapping->host;
struct gfs2_sbd *sdp = GFS2_SB(inode);
- unsigned nrblocks = nr_pages * (PAGE_SIZE >> inode->i_blkbits);
+ unsigned nrblocks;
int i;
int ret;
+ int nr_pages = 0;
+ int nr_folios = folio_batch_count(fbatch);
+
+ for (i = 0; i < nr_folios; i++)
+ nr_pages += folio_nr_pages(fbatch->folios[i]);
+ nrblocks = nr_pages * (PAGE_SIZE >> inode->i_blkbits);
ret = gfs2_trans_begin(sdp, nrblocks, nrblocks);
if (ret < 0)
return ret;
- for(i = 0; i < nr_pages; i++) {
- struct page *page = pvec->pages[i];
+ for (i = 0; i < nr_folios; i++) {
+ struct folio *folio = fbatch->folios[i];
- *done_index = page->index;
+ *done_index = folio->index;
- lock_page(page);
+ folio_lock(folio);
- if (unlikely(page->mapping != mapping)) {
+ if (unlikely(folio->mapping != mapping)) {
continue_unlock:
- unlock_page(page);
+ folio_unlock(folio);
continue;
}
- if (!PageDirty(page)) {
+ if (!folio_test_dirty(folio)) {
/* someone wrote it for us */
goto continue_unlock;
}
- if (PageWriteback(page)) {
+ if (folio_test_writeback(folio)) {
if (wbc->sync_mode != WB_SYNC_NONE)
- wait_on_page_writeback(page);
+ folio_wait_writeback(folio);
else
goto continue_unlock;
}
- BUG_ON(PageWriteback(page));
- if (!clear_page_dirty_for_io(page))
+ BUG_ON(folio_test_writeback(folio));
+ if (!folio_clear_dirty_for_io(folio))
goto continue_unlock;
trace_wbc_writepage(wbc, inode_to_bdi(inode));
- ret = __gfs2_jdata_writepage(page, wbc);
+ ret = __gfs2_jdata_writepage(&folio->page, wbc);
if (unlikely(ret)) {
if (ret == AOP_WRITEPAGE_ACTIVATE) {
- unlock_page(page);
+ folio_unlock(folio);
ret = 0;
} else {
* not be suitable for data integrity
* writeout).
*/
- *done_index = page->index + 1;
+ *done_index = folio->index +
+ folio_nr_pages(folio);
ret = 1;
break;
}
{
int ret = 0;
int done = 0;
- struct pagevec pvec;
- int nr_pages;
+ struct folio_batch fbatch;
+ int nr_folios;
pgoff_t writeback_index;
pgoff_t index;
pgoff_t end;
int range_whole = 0;
xa_mark_t tag;
- pagevec_init(&pvec);
+ folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
writeback_index = mapping->writeback_index; /* prev offset */
index = writeback_index;
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && (index <= end)) {
- nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
- tag);
- if (nr_pages == 0)
+ nr_folios = filemap_get_folios_tag(mapping, &index, end,
+ tag, &fbatch);
+ if (nr_folios == 0)
break;
- ret = gfs2_write_jdata_pagevec(mapping, wbc, &pvec, nr_pages, &done_index);
+ ret = gfs2_write_jdata_batch(mapping, wbc, &fbatch,
+ &done_index);
if (ret)
done = 1;
if (ret > 0)
ret = 0;
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
cond_resched();
}
"AIL buffer %p: blocknr %llu state 0x%08lx mapping %p page "
"state 0x%lx\n",
bh, (unsigned long long)bh->b_blocknr, bh->b_state,
- bh->b_page->mapping, bh->b_page->flags);
+ bh->b_folio->mapping, bh->b_folio->flags);
fs_err(sdp, "AIL glock %u:%llu mapping %p\n",
gl->gl_name.ln_type, gl->gl_name.ln_number,
gfs2_glock2aspace(gl));
struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(gl);
int error;
- if (!test_and_clear_bit(GLF_DIRTY, &gl->gl_flags))
+ if (!rgd || !test_and_clear_bit(GLF_DIRTY, &gl->gl_flags))
return 0;
GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_EXCLUSIVE);
struct address_space *mapping = &sdp->sd_aspace;
struct gfs2_rgrpd *rgd = gfs2_glock2rgrp(gl);
const unsigned bsize = sdp->sd_sb.sb_bsize;
- loff_t start = (rgd->rd_addr * bsize) & PAGE_MASK;
- loff_t end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) - 1;
+ loff_t start, end;
+ if (!rgd)
+ return;
+ start = (rgd->rd_addr * bsize) & PAGE_MASK;
+ end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) - 1;
gfs2_rgrp_brelse(rgd);
WARN_ON_ONCE(!(flags & DIO_METADATA));
truncate_inode_pages_range(mapping, start, end);
struct gfs2_inode *ip = gl->gl_object;
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
- if (!remote || sb_rdonly(sdp->sd_vfs))
+ if (!remote || sb_rdonly(sdp->sd_vfs) ||
+ test_bit(SDF_DEACTIVATING, &sdp->sd_flags))
return;
if (gl->gl_demote_state == LM_ST_UNLOCKED &&
gl->gl_state == LM_ST_SHARED && ip) {
gl->gl_lockref.count++;
- if (!queue_delayed_work(gfs2_delete_workqueue,
- &gl->gl_delete, 0))
+ if (!gfs2_queue_try_to_evict(gl))
gl->gl_lockref.count--;
}
}
-static int iopen_go_demote_ok(const struct gfs2_glock *gl)
-{
- return !gfs2_delete_work_queued(gl);
-}
-
/**
* inode_go_free - wake up anyone waiting for dlm's unlock ast to free it
* @gl: glock being freed
.go_type = LM_TYPE_IOPEN,
.go_callback = iopen_go_callback,
.go_dump = inode_go_dump,
- .go_demote_ok = iopen_go_demote_ok,
.go_flags = GLOF_LRU | GLOF_NONDISK,
.go_subclass = 1,
};
brelse(bd->bd_bh);
}
- static int __gfs2_writepage(struct page *page, struct writeback_control *wbc,
++static int __gfs2_writepage(struct folio *folio, struct writeback_control *wbc,
+ void *data)
+{
+ struct address_space *mapping = data;
- int ret = mapping->a_ops->writepage(page, wbc);
++ int ret = mapping->a_ops->writepage(&folio->page, wbc);
+ mapping_set_error(mapping, ret);
+ return ret;
+}
+
/**
* gfs2_ail1_start_one - Start I/O on a transaction
* @sdp: The superblock
continue;
gl = bd->bd_gl;
list_move(&bd->bd_ail_st_list, &tr->tr_ail1_list);
- mapping = bh->b_page->mapping;
+ mapping = bh->b_folio->mapping;
if (!mapping)
continue;
spin_unlock(&sdp->sd_ail_lock);
- ret = filemap_fdatawrite_wbc(mapping, wbc);
+ ret = write_cache_pages(mapping, wbc, __gfs2_writepage, mapping);
if (need_resched()) {
blk_finish_plug(plug);
cond_resched();
* way when do_mmap unwinds (may be important on powerpc
* and ia64).
*/
- vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
+ vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
vma->vm_ops = &hugetlb_vm_ops;
ret = seal_check_future_write(info->seals, vma);
{
pte_t *ptep, pte;
- ptep = huge_pte_offset(vma->vm_mm, addr,
- huge_page_size(hstate_vma(vma)));
-
+ ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma)));
if (!ptep)
return false;
*/
static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start)
{
+ unsigned long offset = 0;
+
if (vma->vm_pgoff < start)
- return (start - vma->vm_pgoff) << PAGE_SHIFT;
- else
- return 0;
+ offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
+
+ return vma->vm_start + offset;
}
static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end)
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
- if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+ if (!hugetlb_vma_maps_page(vma, v_start, page))
continue;
if (!hugetlb_vma_trylock_write(vma)) {
break;
}
- unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
- NULL, ZAP_FLAG_DROP_MARKER);
+ unmap_hugepage_range(vma, v_start, v_end, NULL,
+ ZAP_FLAG_DROP_MARKER);
hugetlb_vma_unlock_write(vma);
}
*/
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
- if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
- unmap_hugepage_range(vma, vma->vm_start + v_start,
- v_end, NULL,
- ZAP_FLAG_DROP_MARKER);
+ if (hugetlb_vma_maps_page(vma, v_start, page))
+ unmap_hugepage_range(vma, v_start, v_end, NULL,
+ ZAP_FLAG_DROP_MARKER);
kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
hugetlb_vma_unlock_write(vma);
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
- unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
- NULL, zap_flags);
+ unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
/*
* Note that vma lock only exists for shared/non-private
* as input to create an allocation policy.
*/
vma_init(&pseudo_vma, mm);
- pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
+ vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
pseudo_vma.vm_file = file;
for (index = start; index < end; index++) {
* This is supposed to be the vaddr where the page is being
* faulted in, but we have no vaddr here.
*/
- struct page *page;
+ struct folio *folio;
unsigned long addr;
+ bool present;
cond_resched();
mutex_lock(&hugetlb_fault_mutex_table[hash]);
/* See if already present in mapping to avoid alloc/free */
- page = find_get_page(mapping, index);
- if (page) {
- put_page(page);
+ rcu_read_lock();
+ present = page_cache_next_miss(mapping, index, 1) != index;
+ rcu_read_unlock();
+ if (present) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
hugetlb_drop_vma_policy(&pseudo_vma);
continue;
}
/*
- * Allocate page without setting the avoid_reserve argument.
+ * Allocate folio without setting the avoid_reserve argument.
* There certainly are no reserves associated with the
* pseudo_vma. However, there could be shared mappings with
* reserves for the file at the inode level. If we fallocate
- * pages in these areas, we need to consume the reserves
+ * folios in these areas, we need to consume the reserves
* to keep reservation accounting consistent.
*/
- page = alloc_huge_page(&pseudo_vma, addr, 0);
+ folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
hugetlb_drop_vma_policy(&pseudo_vma);
- if (IS_ERR(page)) {
+ if (IS_ERR(folio)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
- error = PTR_ERR(page);
+ error = PTR_ERR(folio);
goto out;
}
- clear_huge_page(page, addr, pages_per_huge_page(h));
- __SetPageUptodate(page);
- error = hugetlb_add_to_page_cache(page, mapping, index);
+ clear_huge_page(&folio->page, addr, pages_per_huge_page(h));
+ __folio_mark_uptodate(folio);
+ error = hugetlb_add_to_page_cache(folio, mapping, index);
if (unlikely(error)) {
- restore_reserve_on_error(h, &pseudo_vma, addr, page);
- put_page(page);
+ restore_reserve_on_error(h, &pseudo_vma, addr, folio);
+ folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
goto out;
}
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
- SetHPageMigratable(page);
+ folio_set_hugetlb_migratable(folio);
/*
- * unlock_page because locked by hugetlb_add_to_page_cache()
- * put_page() due to reference from alloc_huge_page()
+ * folio_unlock because locked by hugetlb_add_to_page_cache()
+ * folio_put() due to reference from alloc_hugetlb_folio()
*/
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
}
if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
return error;
}
-static int hugetlbfs_setattr(struct user_namespace *mnt_userns,
+static int hugetlbfs_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = d_inode(dentry);
unsigned int ia_valid = attr->ia_valid;
struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
- error = setattr_prepare(&init_user_ns, dentry, attr);
+ error = setattr_prepare(&nop_mnt_idmap, dentry, attr);
if (error)
return error;
hugetlb_vmtruncate(inode, newsize);
}
- setattr_copy(&init_user_ns, inode, attr);
+ setattr_copy(&nop_mnt_idmap, inode, attr);
mark_inode_dirty(inode);
return 0;
}
struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
inode->i_ino = get_next_ino();
- inode_init_owner(&init_user_ns, inode, dir, mode);
+ inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
&hugetlbfs_i_mmap_rwsem_key);
inode->i_mapping->a_ops = &hugetlbfs_aops;
/*
* File creation. Allocate an inode, and we're done..
*/
-static int hugetlbfs_mknod(struct user_namespace *mnt_userns, struct inode *dir,
+static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode, dev_t dev)
{
struct inode *inode;
return 0;
}
-static int hugetlbfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
+static int hugetlbfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode)
{
- int retval = hugetlbfs_mknod(&init_user_ns, dir, dentry,
+ int retval = hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry,
mode | S_IFDIR, 0);
if (!retval)
inc_nlink(dir);
return retval;
}
-static int hugetlbfs_create(struct user_namespace *mnt_userns,
+static int hugetlbfs_create(struct mnt_idmap *idmap,
struct inode *dir, struct dentry *dentry,
umode_t mode, bool excl)
{
- return hugetlbfs_mknod(&init_user_ns, dir, dentry, mode | S_IFREG, 0);
+ return hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0);
}
-static int hugetlbfs_tmpfile(struct user_namespace *mnt_userns,
+static int hugetlbfs_tmpfile(struct mnt_idmap *idmap,
struct inode *dir, struct file *file,
umode_t mode)
{
return finish_open_simple(file, 0);
}
-static int hugetlbfs_symlink(struct user_namespace *mnt_userns,
+static int hugetlbfs_symlink(struct mnt_idmap *idmap,
struct inode *dir, struct dentry *dentry,
const char *symname)
{
}
EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
+/**
+ * iomap_get_folio - get a folio reference for writing
+ * @iter: iteration structure
+ * @pos: start offset of write
+ *
+ * Returns a locked reference to the folio at @pos, or an error pointer if the
+ * folio could not be obtained.
+ */
+struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
+{
+ unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
+ struct folio *folio;
+
+ if (iter->flags & IOMAP_NOWAIT)
+ fgp |= FGP_NOWAIT;
+
+ folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
+ fgp, mapping_gfp_mask(iter->inode->i_mapping));
+ if (folio)
+ return folio;
+
+ if (iter->flags & IOMAP_NOWAIT)
+ return ERR_PTR(-EAGAIN);
+ return ERR_PTR(-ENOMEM);
+}
+EXPORT_SYMBOL_GPL(iomap_get_folio);
+
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
{
trace_iomap_release_folio(folio->mapping->host, folio_pos(folio),
return 0;
}
+static struct folio *__iomap_get_folio(struct iomap_iter *iter, loff_t pos,
+ size_t len)
+{
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
+
+ if (folio_ops && folio_ops->get_folio)
+ return folio_ops->get_folio(iter, pos, len);
+ else
+ return iomap_get_folio(iter, pos);
+}
+
+static void __iomap_put_folio(struct iomap_iter *iter, loff_t pos, size_t ret,
+ struct folio *folio)
+{
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
+
+ if (folio_ops && folio_ops->put_folio) {
+ folio_ops->put_folio(iter->inode, pos, ret, folio);
+ } else {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+}
+
static int iomap_write_begin_inline(const struct iomap_iter *iter,
struct folio *folio)
{
static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
size_t len, struct folio **foliop)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
- unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
int status = 0;
- if (iter->flags & IOMAP_NOWAIT)
- fgp |= FGP_NOWAIT;
-
BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
if (srcmap != &iter->iomap)
BUG_ON(pos + len > srcmap->offset + srcmap->length);
if (!mapping_large_folio_support(iter->inode->i_mapping))
len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
- if (page_ops && page_ops->page_prepare) {
- status = page_ops->page_prepare(iter->inode, pos, len);
- if (status)
- return status;
- }
-
- folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
- fgp, mapping_gfp_mask(iter->inode->i_mapping));
- if (!folio) {
- status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
- goto out_no_page;
- }
+ folio = __iomap_get_folio(iter, pos, len);
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);
/*
* Now we have a locked folio, before we do anything with it we need to
* could do the wrong thing here (zero a page range incorrectly or fail
* to zero) and corrupt data.
*/
- if (page_ops && page_ops->iomap_valid) {
- bool iomap_valid = page_ops->iomap_valid(iter->inode,
- &iter->iomap);
+ if (folio_ops && folio_ops->iomap_valid) {
+ bool iomap_valid = folio_ops->iomap_valid(iter->inode,
+ &iter->iomap);
if (!iomap_valid) {
iter->iomap.flags |= IOMAP_F_STALE;
status = 0;
return 0;
out_unlock:
- folio_unlock(folio);
- folio_put(folio);
+ __iomap_put_folio(iter, pos, 0, folio);
iomap_write_failed(iter->inode, pos, len);
-out_no_page:
- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, 0, NULL);
return status;
}
static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
loff_t old_size = iter->inode->i_size;
size_t ret;
i_size_write(iter->inode, pos + ret);
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
}
- folio_unlock(folio);
+ __iomap_put_folio(iter, pos, ret, folio);
if (old_size < pos)
pagecache_isize_extended(iter->inode, old_size, pos);
- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, ret, &folio->page);
- folio_put(folio);
-
if (ret < len)
iomap_write_failed(iter->inode, pos + ret, len - ret);
return ret;
* For unwritten space on the page, we need to start the conversion to
* regular allocated space.
*/
- static int
- iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
+ static int iomap_do_writepage(struct folio *folio,
+ struct writeback_control *wbc, void *data)
{
- struct folio *folio = page_folio(page);
struct iomap_writepage_ctx *wpc = data;
struct inode *inode = folio->mapping->host;
u64 end_pos, isize;
/*
* Then do more get_blocks calls until we are done with this folio.
*/
- map_bh->b_page = &folio->page;
+ map_bh->b_folio = folio;
while (page_block < blocks_per_page) {
map_bh->b_state = 0;
map_bh->b_size = 0;
alloc_new:
if (args->bio == NULL) {
- if (first_hole == blocks_per_page) {
- if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9),
- &folio->page))
- goto out;
- }
args->bio = bio_alloc(bdev, bio_max_segs(args->nr_pages), opf,
gfp);
if (args->bio == NULL)
clean_buffers(page, ~0U);
}
- static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
+ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
void *data)
{
struct mpage_data *mpd = data;
struct bio *bio = mpd->bio;
- struct address_space *mapping = page->mapping;
- struct inode *inode = page->mapping->host;
+ struct address_space *mapping = folio->mapping;
+ struct inode *inode = mapping->host;
const unsigned blkbits = inode->i_blkbits;
- unsigned long end_index;
const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
sector_t last_block;
sector_t block_in_file;
int boundary = 0;
sector_t boundary_block = 0;
struct block_device *boundary_bdev = NULL;
- int length;
+ size_t length;
struct buffer_head map_bh;
loff_t i_size = i_size_read(inode);
int ret = 0;
+ struct buffer_head *head = folio_buffers(folio);
- if (page_has_buffers(page)) {
- struct buffer_head *head = page_buffers(page);
+ if (head) {
struct buffer_head *bh = head;
/* If they're all mapped and dirty, do it */
/*
* The page has no buffers: map it to disk
*/
- BUG_ON(!PageUptodate(page));
- block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
+ BUG_ON(!folio_test_uptodate(folio));
+ block_in_file = (sector_t)folio->index << (PAGE_SHIFT - blkbits);
+ /*
+ * Whole page beyond EOF? Skip allocating blocks to avoid leaking
+ * space.
+ */
+ if (block_in_file >= (i_size + (1 << blkbits) - 1) >> blkbits)
+ goto page_is_mapped;
last_block = (i_size - 1) >> blkbits;
- map_bh.b_page = page;
+ map_bh.b_folio = folio;
for (page_block = 0; page_block < blocks_per_page; ) {
map_bh.b_state = 0;
map_bh.b_size = 1 << blkbits;
if (mpd->get_block(inode, block_in_file, &map_bh, 1))
goto confused;
+ if (!buffer_mapped(&map_bh))
+ goto confused;
if (buffer_new(&map_bh))
clean_bdev_bh_alias(&map_bh);
if (buffer_boundary(&map_bh)) {
first_unmapped = page_block;
page_is_mapped:
- end_index = i_size >> PAGE_SHIFT;
- if (page->index >= end_index) {
+ /* Don't bother writing beyond EOF, truncate will discard the folio */
+ if (folio_pos(folio) >= i_size)
+ goto confused;
+ length = folio_size(folio);
+ if (folio_pos(folio) + length > i_size) {
/*
* The page straddles i_size. It must be zeroed out on each
* and every writepage invocation because it may be mmapped.
* is zeroed when mapped, and writes to that region are not
* written out to the file."
*/
- unsigned offset = i_size & (PAGE_SIZE - 1);
-
- if (page->index > end_index || !offset)
- goto confused;
- zero_user_segment(page, offset, PAGE_SIZE);
+ length = i_size - folio_pos(folio);
+ folio_zero_segment(folio, length, folio_size(folio));
}
/*
alloc_new:
if (bio == NULL) {
- if (first_unmapped == blocks_per_page) {
- if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
- page, wbc))
- goto out;
- }
bio = bio_alloc(bdev, BIO_MAX_VECS,
REQ_OP_WRITE | wbc_to_write_flags(wbc),
GFP_NOFS);
* the confused fail path above (OOM) will be very confused when
* it finds all bh marked clean (i.e. it will not write anything)
*/
- wbc_account_cgroup_owner(wbc, page, PAGE_SIZE);
+ wbc_account_cgroup_owner(wbc, &folio->page, folio_size(folio));
length = first_unmapped << blkbits;
- if (bio_add_page(bio, page, length, 0) < length) {
+ if (!bio_add_folio(bio, folio, length, 0)) {
bio = mpage_bio_submit(bio);
goto alloc_new;
}
- clean_buffers(page, first_unmapped);
+ clean_buffers(&folio->page, first_unmapped);
- BUG_ON(PageWriteback(page));
- set_page_writeback(page);
- unlock_page(page);
+ BUG_ON(folio_test_writeback(folio));
+ folio_start_writeback(folio);
+ folio_unlock(folio);
if (boundary || (first_unmapped != blocks_per_page)) {
bio = mpage_bio_submit(bio);
if (boundary_block) {
/*
* The caller has a ref on the inode, so *mapping is stable
*/
- ret = block_write_full_page(page, mpd->get_block, wbc);
+ ret = block_write_full_page(&folio->page, mpd->get_block, wbc);
mapping_set_error(mapping, ret);
out:
mpd->bio = bio;
*
* This is a library function, which implements the writepages()
* address_space_operation.
- *
- * If a page is already under I/O, generic_writepages() skips it, even
- * if it's dirty. This is desirable behaviour for memory-cleaning writeback,
- * but it is INCORRECT for data-integrity system calls such as fsync(). fsync()
- * and msync() need to guarantee that all the data which was dirty at the time
- * the call was made get new I/O started against them. If wbc->sync_mode is
- * WB_SYNC_ALL then we were called for data integrity and we must wait for
- * existing IO to complete.
*/
int
mpage_writepages(struct address_space *mapping,
#include <linux/freezer.h>
#include <linux/wait.h>
#include <linux/iversion.h>
+#include <linux/filelock.h>
#include <linux/uaccess.h>
#include <linux/sched/mm.h>
struct inode *inode);
static struct nfs_page *
nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
- struct page *page);
+ struct folio *folio);
static struct kmem_cache *nfs_wdata_cachep;
static mempool_t *nfs_wdata_mempool;
return 0;
}
-static struct nfs_page *
-nfs_page_private_request(struct page *page)
+static struct nfs_page *nfs_folio_private_request(struct folio *folio)
{
- if (!PagePrivate(page))
- return NULL;
- return (struct nfs_page *)page_private(page);
+ return folio_get_private(folio);
}
-/*
- * nfs_page_find_head_request_locked - find head request associated with @page
+/**
+ * nfs_folio_find_private_request - find head request associated with a folio
+ * @folio: pointer to folio
*
* must be called while holding the inode lock.
*
* returns matching head request with reference held, or NULL if not found.
*/
-static struct nfs_page *
-nfs_page_find_private_request(struct page *page)
+static struct nfs_page *nfs_folio_find_private_request(struct folio *folio)
{
- struct address_space *mapping = page_file_mapping(page);
+ struct address_space *mapping = folio_file_mapping(folio);
struct nfs_page *req;
- if (!PagePrivate(page))
+ if (!folio_test_private(folio))
return NULL;
spin_lock(&mapping->private_lock);
- req = nfs_page_private_request(page);
+ req = nfs_folio_private_request(folio);
if (req) {
WARN_ON_ONCE(req->wb_head != req);
kref_get(&req->wb_kref);
return req;
}
-static struct nfs_page *
-nfs_page_find_swap_request(struct page *page)
+static struct nfs_page *nfs_folio_find_swap_request(struct folio *folio)
{
- struct inode *inode = page_file_mapping(page)->host;
+ struct inode *inode = folio_file_mapping(folio)->host;
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_page *req = NULL;
- if (!PageSwapCache(page))
+ if (!folio_test_swapcache(folio))
return NULL;
mutex_lock(&nfsi->commit_mutex);
- if (PageSwapCache(page)) {
+ if (folio_test_swapcache(folio)) {
req = nfs_page_search_commits_for_head_request_locked(nfsi,
- page);
+ folio);
if (req) {
WARN_ON_ONCE(req->wb_head != req);
kref_get(&req->wb_kref);
return req;
}
-/*
- * nfs_page_find_head_request - find head request associated with @page
+/**
+ * nfs_folio_find_head_request - find head request associated with a folio
+ * @folio: pointer to folio
*
* returns matching head request with reference held, or NULL if not found.
*/
-static struct nfs_page *nfs_page_find_head_request(struct page *page)
+static struct nfs_page *nfs_folio_find_head_request(struct folio *folio)
{
struct nfs_page *req;
- req = nfs_page_find_private_request(page);
+ req = nfs_folio_find_private_request(folio);
if (!req)
- req = nfs_page_find_swap_request(page);
+ req = nfs_folio_find_swap_request(folio);
return req;
}
-static struct nfs_page *nfs_find_and_lock_page_request(struct page *page)
+static struct nfs_page *nfs_folio_find_and_lock_request(struct folio *folio)
{
- struct inode *inode = page_file_mapping(page)->host;
+ struct inode *inode = folio_file_mapping(folio)->host;
struct nfs_page *req, *head;
int ret;
for (;;) {
- req = nfs_page_find_head_request(page);
+ req = nfs_folio_find_head_request(folio);
if (!req)
return req;
head = nfs_page_group_lock_head(req);
return ERR_PTR(ret);
}
/* Ensure that nobody removed the request before we locked it */
- if (head == nfs_page_private_request(page))
+ if (head == nfs_folio_private_request(folio))
break;
- if (PageSwapCache(page))
+ if (folio_test_swapcache(folio))
break;
nfs_unlock_and_release_request(head);
}
}
/* Adjust the file length if we're writing beyond the end */
-static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
+static void nfs_grow_file(struct folio *folio, unsigned int offset,
+ unsigned int count)
{
- struct inode *inode = page_file_mapping(page)->host;
+ struct inode *inode = folio_file_mapping(folio)->host;
loff_t end, i_size;
pgoff_t end_index;
spin_lock(&inode->i_lock);
i_size = i_size_read(inode);
- end_index = (i_size - 1) >> PAGE_SHIFT;
- if (i_size > 0 && page_index(page) < end_index)
+ end_index = ((i_size - 1) >> folio_shift(folio)) << folio_order(folio);
+ if (i_size > 0 && folio_index(folio) < end_index)
goto out;
- end = page_file_offset(page) + ((loff_t)offset+count);
+ end = folio_file_pos(folio) + (loff_t)offset + (loff_t)count;
if (i_size >= end)
goto out;
trace_nfs_size_grow(inode, end);
spin_unlock(&inode->i_lock);
}
-static void nfs_mapping_set_error(struct page *page, int error)
+static void nfs_mapping_set_error(struct folio *folio, int error)
{
- struct address_space *mapping = page_file_mapping(page);
+ struct address_space *mapping = folio_file_mapping(folio);
- SetPageError(page);
+ folio_set_error(folio);
filemap_set_wb_err(mapping, error);
if (mapping->host)
errseq_set(&mapping->host->i_sb->s_wb_err,
*/
static bool nfs_page_group_covers_page(struct nfs_page *req)
{
+ unsigned int len = nfs_folio_length(nfs_page_to_folio(req));
struct nfs_page *tmp;
unsigned int pos = 0;
- unsigned int len = nfs_page_length(req->wb_page);
nfs_page_group_lock(req);
*/
static void nfs_mark_uptodate(struct nfs_page *req)
{
- if (PageUptodate(req->wb_page))
+ struct folio *folio = nfs_page_to_folio(req);
+
+ if (folio_test_uptodate(folio))
return;
if (!nfs_page_group_covers_page(req))
return;
- SetPageUptodate(req->wb_page);
+ folio_mark_uptodate(folio);
}
static int wb_priority(struct writeback_control *wbc)
#define NFS_CONGESTION_OFF_THRESH \
(NFS_CONGESTION_ON_THRESH - (NFS_CONGESTION_ON_THRESH >> 2))
-static void nfs_set_page_writeback(struct page *page)
+static void nfs_folio_set_writeback(struct folio *folio)
{
- struct inode *inode = page_file_mapping(page)->host;
- struct nfs_server *nfss = NFS_SERVER(inode);
- int ret = test_set_page_writeback(page);
-
- WARN_ON_ONCE(ret != 0);
+ struct nfs_server *nfss = NFS_SERVER(folio_file_mapping(folio)->host);
- if (atomic_long_inc_return(&nfss->writeback) >
- NFS_CONGESTION_ON_THRESH)
+ folio_start_writeback(folio);
+ if (atomic_long_inc_return(&nfss->writeback) > NFS_CONGESTION_ON_THRESH)
nfss->write_congested = 1;
}
-static void nfs_end_page_writeback(struct nfs_page *req)
+static void nfs_folio_end_writeback(struct folio *folio)
{
- struct inode *inode = page_file_mapping(req->wb_page)->host;
- struct nfs_server *nfss = NFS_SERVER(inode);
- bool is_done;
+ struct nfs_server *nfss = NFS_SERVER(folio_file_mapping(folio)->host);
- is_done = nfs_page_group_sync_on_bit(req, PG_WB_END);
- nfs_unlock_request(req);
- if (!is_done)
- return;
-
- end_page_writeback(req->wb_page);
- if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
+ folio_end_writeback(folio);
+ if (atomic_long_dec_return(&nfss->writeback) <
+ NFS_CONGESTION_OFF_THRESH)
nfss->write_congested = 0;
}
+static void nfs_page_end_writeback(struct nfs_page *req)
+{
+ if (nfs_page_group_sync_on_bit(req, PG_WB_END)) {
+ nfs_unlock_request(req);
+ nfs_folio_end_writeback(nfs_page_to_folio(req));
+ } else
+ nfs_unlock_request(req);
+}
+
/*
* nfs_destroy_unlinked_subrequests - destroy recently unlinked subrequests
*
/*
* nfs_lock_and_join_requests - join all subreqs to the head req
- * @page: the page used to lookup the "page group" of nfs_page structures
+ * @folio: the folio used to lookup the "page group" of nfs_page structures
*
* This function joins all sub requests to the head request by first
* locking all requests in the group, cancelling any pending operations
*
* Returns a locked, referenced pointer to the head request - which after
* this call is guaranteed to be the only request associated with the page.
- * Returns NULL if no requests are found for @page, or a ERR_PTR if an
+ * Returns NULL if no requests are found for @folio, or a ERR_PTR if an
* error was encountered.
*/
-static struct nfs_page *
-nfs_lock_and_join_requests(struct page *page)
+static struct nfs_page *nfs_lock_and_join_requests(struct folio *folio)
{
- struct inode *inode = page_file_mapping(page)->host;
+ struct inode *inode = folio_file_mapping(folio)->host;
struct nfs_page *head;
int ret;
* reference to the whole page group - the group will not be destroyed
* until the head reference is released.
*/
- head = nfs_find_and_lock_page_request(page);
+ head = nfs_folio_find_and_lock_request(folio);
if (IS_ERR_OR_NULL(head))
return head;
static void nfs_write_error(struct nfs_page *req, int error)
{
- trace_nfs_write_error(page_file_mapping(req->wb_page)->host, req,
- error);
- nfs_mapping_set_error(req->wb_page, error);
+ trace_nfs_write_error(nfs_page_to_inode(req), req, error);
+ nfs_mapping_set_error(nfs_page_to_folio(req), error);
nfs_inode_remove_request(req);
- nfs_end_page_writeback(req);
+ nfs_page_end_writeback(req);
nfs_release_request(req);
}
* Find an associated nfs write request, and prepare to flush it out
* May return an error if the user signalled nfs_wait_on_request().
*/
-static int nfs_page_async_flush(struct page *page,
+static int nfs_page_async_flush(struct folio *folio,
struct writeback_control *wbc,
struct nfs_pageio_descriptor *pgio)
{
struct nfs_page *req;
int ret = 0;
- req = nfs_lock_and_join_requests(page);
+ req = nfs_lock_and_join_requests(folio);
if (!req)
goto out;
ret = PTR_ERR(req);
if (IS_ERR(req))
goto out;
- nfs_set_page_writeback(page);
+ nfs_folio_set_writeback(folio);
WARN_ON_ONCE(test_bit(PG_CLEAN, &req->wb_flags));
/* If there is a fatal error that covers this write, just exit */
goto out_launder;
if (wbc->sync_mode == WB_SYNC_NONE)
ret = AOP_WRITEPAGE_ACTIVATE;
- redirty_page_for_writepage(wbc, page);
+ folio_redirty_for_writepage(wbc, folio);
nfs_redirty_request(req);
pgio->pg_error = 0;
} else
- nfs_add_stats(page_file_mapping(page)->host,
- NFSIOS_WRITEPAGES, 1);
+ nfs_add_stats(folio_file_mapping(folio)->host,
+ NFSIOS_WRITEPAGES, 1);
out:
return ret;
out_launder:
return 0;
}
-static int nfs_do_writepage(struct page *page, struct writeback_control *wbc,
+static int nfs_do_writepage(struct folio *folio, struct writeback_control *wbc,
struct nfs_pageio_descriptor *pgio)
{
- nfs_pageio_cond_complete(pgio, page_index(page));
- return nfs_page_async_flush(page, wbc, pgio);
+ nfs_pageio_cond_complete(pgio, folio_index(folio));
+ return nfs_page_async_flush(folio, wbc, pgio);
}
/*
* Write an mmapped page to the server.
*/
-static int nfs_writepage_locked(struct page *page,
+static int nfs_writepage_locked(struct folio *folio,
struct writeback_control *wbc)
{
struct nfs_pageio_descriptor pgio;
- struct inode *inode = page_file_mapping(page)->host;
+ struct inode *inode = folio_file_mapping(folio)->host;
int err;
if (wbc->sync_mode == WB_SYNC_NONE &&
return AOP_WRITEPAGE_ACTIVATE;
nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
- nfs_pageio_init_write(&pgio, inode, 0,
- false, &nfs_async_write_completion_ops);
- err = nfs_do_writepage(page, wbc, &pgio);
+ nfs_pageio_init_write(&pgio, inode, 0, false,
+ &nfs_async_write_completion_ops);
+ err = nfs_do_writepage(folio, wbc, &pgio);
pgio.pg_error = 0;
nfs_pageio_complete(&pgio);
return err;
int nfs_writepage(struct page *page, struct writeback_control *wbc)
{
+ struct folio *folio = page_folio(page);
int ret;
- ret = nfs_writepage_locked(page, wbc);
+ ret = nfs_writepage_locked(folio, wbc);
if (ret != AOP_WRITEPAGE_ACTIVATE)
unlock_page(page);
return ret;
}
- static int nfs_writepages_callback(struct page *page,
+ static int nfs_writepages_callback(struct folio *folio,
- struct writeback_control *wbc, void *data)
+ struct writeback_control *wbc, void *data)
{
- struct folio *folio = page_folio(page);
int ret;
- ret = nfs_do_writepage(&folio->page, wbc, data);
+ ret = nfs_do_writepage(folio, wbc, data);
if (ret != AOP_WRITEPAGE_ACTIVATE)
- unlock_page(page);
+ folio_unlock(folio);
return ret;
}
/*
* Insert a write request into an inode
*/
-static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
+static void nfs_inode_add_request(struct nfs_page *req)
{
- struct address_space *mapping = page_file_mapping(req->wb_page);
- struct nfs_inode *nfsi = NFS_I(inode);
+ struct folio *folio = nfs_page_to_folio(req);
+ struct address_space *mapping = folio_file_mapping(folio);
+ struct nfs_inode *nfsi = NFS_I(mapping->host);
WARN_ON_ONCE(req->wb_this_page != req);
* with invalidate/truncate.
*/
spin_lock(&mapping->private_lock);
- if (likely(!PageSwapCache(req->wb_page))) {
+ if (likely(!folio_test_swapcache(folio))) {
set_bit(PG_MAPPED, &req->wb_flags);
- SetPagePrivate(req->wb_page);
- set_page_private(req->wb_page, (unsigned long)req);
+ folio_set_private(folio);
+ folio->private = req;
}
spin_unlock(&mapping->private_lock);
atomic_long_inc(&nfsi->nrequests);
*/
static void nfs_inode_remove_request(struct nfs_page *req)
{
- struct address_space *mapping = page_file_mapping(req->wb_page);
- struct inode *inode = mapping->host;
- struct nfs_inode *nfsi = NFS_I(inode);
- struct nfs_page *head;
-
if (nfs_page_group_sync_on_bit(req, PG_REMOVE)) {
- head = req->wb_head;
+ struct folio *folio = nfs_page_to_folio(req->wb_head);
+ struct address_space *mapping = folio_file_mapping(folio);
spin_lock(&mapping->private_lock);
- if (likely(head->wb_page && !PageSwapCache(head->wb_page))) {
- set_page_private(head->wb_page, 0);
- ClearPagePrivate(head->wb_page);
- clear_bit(PG_MAPPED, &head->wb_flags);
+ if (likely(folio && !folio_test_swapcache(folio))) {
+ folio->private = NULL;
+ folio_clear_private(folio);
+ clear_bit(PG_MAPPED, &req->wb_head->wb_flags);
}
spin_unlock(&mapping->private_lock);
}
if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
nfs_release_request(req);
- atomic_long_dec(&nfsi->nrequests);
+ atomic_long_dec(&NFS_I(nfs_page_to_inode(req))->nrequests);
}
}
-static void
-nfs_mark_request_dirty(struct nfs_page *req)
+static void nfs_mark_request_dirty(struct nfs_page *req)
{
- if (req->wb_page)
- __set_page_dirty_nobuffers(req->wb_page);
+ struct folio *folio = nfs_page_to_folio(req);
+ if (folio)
+ filemap_dirty_folio(folio_mapping(folio), folio);
}
/*
* nfs_page_search_commits_for_head_request_locked
*
- * Search through commit lists on @inode for the head request for @page.
+ * Search through commit lists on @inode for the head request for @folio.
* Must be called while holding the inode (which is cinfo) lock.
*
* Returns the head request if found, or NULL if not found.
*/
static struct nfs_page *
nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
- struct page *page)
+ struct folio *folio)
{
struct nfs_page *freq, *t;
struct nfs_commit_info cinfo;
nfs_init_cinfo_from_inode(&cinfo, inode);
/* search through pnfs commit lists */
- freq = pnfs_search_commit_reqs(inode, &cinfo, page);
+ freq = pnfs_search_commit_reqs(inode, &cinfo, folio);
if (freq)
return freq->wb_head;
/* Linearly search the commit list for the correct request */
list_for_each_entry_safe(freq, t, &cinfo.mds->list, wb_list) {
- if (freq->wb_page == page)
+ if (nfs_page_to_folio(freq) == folio)
return freq->wb_head;
}
mutex_lock(&NFS_I(cinfo->inode)->commit_mutex);
nfs_request_add_commit_list_locked(req, &cinfo->mds->list, cinfo);
mutex_unlock(&NFS_I(cinfo->inode)->commit_mutex);
- if (req->wb_page)
- nfs_mark_page_unstable(req->wb_page, cinfo);
+ nfs_folio_mark_unstable(nfs_page_to_folio(req), cinfo);
}
EXPORT_SYMBOL_GPL(nfs_request_add_commit_list);
nfs_request_add_commit_list(req, cinfo);
}
-static void
-nfs_clear_page_commit(struct page *page)
+static void nfs_folio_clear_commit(struct folio *folio)
{
- dec_node_page_state(page, NR_WRITEBACK);
- dec_wb_stat(&inode_to_bdi(page_file_mapping(page)->host)->wb,
- WB_WRITEBACK);
+ if (folio) {
+ long nr = folio_nr_pages(folio);
+
+ node_stat_mod_folio(folio, NR_WRITEBACK, -nr);
+ wb_stat_mod(&inode_to_bdi(folio_file_mapping(folio)->host)->wb,
+ WB_WRITEBACK, -nr);
+ }
}
/* Called holding the request lock on @req */
nfs_request_remove_commit_list(req, &cinfo);
}
mutex_unlock(&NFS_I(inode)->commit_mutex);
- nfs_clear_page_commit(req->wb_page);
+ nfs_folio_clear_commit(nfs_page_to_folio(req));
}
}
if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) &&
(hdr->good_bytes < bytes)) {
trace_nfs_comp_error(hdr->inode, req, hdr->error);
- nfs_mapping_set_error(req->wb_page, hdr->error);
+ nfs_mapping_set_error(nfs_page_to_folio(req),
+ hdr->error);
goto remove_req;
}
if (nfs_write_need_commit(hdr)) {
remove_req:
nfs_inode_remove_request(req);
next:
- nfs_end_page_writeback(req);
+ nfs_page_end_writeback(req);
nfs_release_request(req);
}
out:
* If the attempt fails, then the existing request is flushed out
* to disk.
*/
-static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
- struct page *page,
- unsigned int offset,
- unsigned int bytes)
+static struct nfs_page *nfs_try_to_update_request(struct folio *folio,
+ unsigned int offset,
+ unsigned int bytes)
{
struct nfs_page *req;
unsigned int rqend;
end = offset + bytes;
- req = nfs_lock_and_join_requests(page);
+ req = nfs_lock_and_join_requests(folio);
if (IS_ERR_OR_NULL(req))
return req;
*/
nfs_mark_request_dirty(req);
nfs_unlock_and_release_request(req);
- error = nfs_wb_page(inode, page);
+ error = nfs_wb_folio(folio_file_mapping(folio)->host, folio);
return (error < 0) ? ERR_PTR(error) : NULL;
}
* if we have to add a new request. Also assumes that the caller has
* already called nfs_flush_incompatible() if necessary.
*/
-static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
- struct page *page, unsigned int offset, unsigned int bytes)
+static struct nfs_page *nfs_setup_write_request(struct nfs_open_context *ctx,
+ struct folio *folio,
+ unsigned int offset,
+ unsigned int bytes)
{
- struct inode *inode = page_file_mapping(page)->host;
- struct nfs_page *req;
+ struct nfs_page *req;
- req = nfs_try_to_update_request(inode, page, offset, bytes);
+ req = nfs_try_to_update_request(folio, offset, bytes);
if (req != NULL)
goto out;
- req = nfs_create_request(ctx, page, offset, bytes);
+ req = nfs_page_create_from_folio(ctx, folio, offset, bytes);
if (IS_ERR(req))
goto out;
- nfs_inode_add_request(inode, req);
+ nfs_inode_add_request(req);
out:
return req;
}
-static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
- unsigned int offset, unsigned int count)
+static int nfs_writepage_setup(struct nfs_open_context *ctx,
+ struct folio *folio, unsigned int offset,
+ unsigned int count)
{
- struct nfs_page *req;
+ struct nfs_page *req;
- req = nfs_setup_write_request(ctx, page, offset, count);
+ req = nfs_setup_write_request(ctx, folio, offset, count);
if (IS_ERR(req))
return PTR_ERR(req);
/* Update file length */
- nfs_grow_file(page, offset, count);
+ nfs_grow_file(folio, offset, count);
nfs_mark_uptodate(req);
nfs_mark_request_dirty(req);
nfs_unlock_and_release_request(req);
return 0;
}
-int nfs_flush_incompatible(struct file *file, struct page *page)
+int nfs_flush_incompatible(struct file *file, struct folio *folio)
{
struct nfs_open_context *ctx = nfs_file_open_context(file);
struct nfs_lock_context *l_ctx;
* dropped page.
*/
do {
- req = nfs_page_find_head_request(page);
+ req = nfs_folio_find_head_request(folio);
if (req == NULL)
return 0;
l_ctx = req->wb_lock_context;
- do_flush = req->wb_page != page ||
- !nfs_match_open_context(nfs_req_openctx(req), ctx);
+ do_flush = nfs_page_to_folio(req) != folio ||
+ !nfs_match_open_context(nfs_req_openctx(req), ctx);
if (l_ctx && flctx &&
!(list_empty_careful(&flctx->flc_posix) &&
list_empty_careful(&flctx->flc_flock))) {
nfs_release_request(req);
if (!do_flush)
return 0;
- status = nfs_wb_page(page_file_mapping(page)->host, page);
+ status = nfs_wb_folio(folio_file_mapping(folio)->host, folio);
} while (status == 0);
return status;
}
* the PageUptodate() flag. In this case, we will need to turn off
* write optimisations that depend on the page contents being correct.
*/
-static bool nfs_write_pageuptodate(struct page *page, struct inode *inode,
- unsigned int pagelen)
+static bool nfs_folio_write_uptodate(struct folio *folio, unsigned int pagelen)
{
+ struct inode *inode = folio_file_mapping(folio)->host;
struct nfs_inode *nfsi = NFS_I(inode);
if (nfs_have_delegated_attributes(inode))
out:
if (nfsi->cache_validity & NFS_INO_INVALID_DATA && pagelen != 0)
return false;
- return PageUptodate(page) != 0;
+ return folio_test_uptodate(folio) != 0;
}
static bool
* If the file is opened for synchronous writes then we can just skip the rest
* of the checks.
*/
-static int nfs_can_extend_write(struct file *file, struct page *page,
- struct inode *inode, unsigned int pagelen)
+static int nfs_can_extend_write(struct file *file, struct folio *folio,
+ unsigned int pagelen)
{
- int ret;
+ struct inode *inode = file_inode(file);
struct file_lock_context *flctx = locks_inode_context(inode);
struct file_lock *fl;
+ int ret;
if (file->f_flags & O_DSYNC)
return 0;
- if (!nfs_write_pageuptodate(page, inode, pagelen))
+ if (!nfs_folio_write_uptodate(folio, pagelen))
return 0;
if (NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE))
return 1;
* XXX: Keep an eye on generic_file_read to make sure it doesn't do bad
* things with a page scheduled for an RPC call (e.g. invalidate it).
*/
-int nfs_updatepage(struct file *file, struct page *page,
- unsigned int offset, unsigned int count)
+int nfs_update_folio(struct file *file, struct folio *folio,
+ unsigned int offset, unsigned int count)
{
struct nfs_open_context *ctx = nfs_file_open_context(file);
- struct address_space *mapping = page_file_mapping(page);
- struct inode *inode = mapping->host;
- unsigned int pagelen = nfs_page_length(page);
+ struct address_space *mapping = folio_file_mapping(folio);
+ struct inode *inode = mapping->host;
+ unsigned int pagelen = nfs_folio_length(folio);
int status = 0;
nfs_inc_stats(inode, NFSIOS_VFSUPDATEPAGE);
- dprintk("NFS: nfs_updatepage(%pD2 %d@%lld)\n",
- file, count, (long long)(page_file_offset(page) + offset));
+ dprintk("NFS: nfs_update_folio(%pD2 %d@%lld)\n", file, count,
+ (long long)(folio_file_pos(folio) + offset));
if (!count)
goto out;
- if (nfs_can_extend_write(file, page, inode, pagelen)) {
+ if (nfs_can_extend_write(file, folio, pagelen)) {
count = max(count + offset, pagelen);
offset = 0;
}
- status = nfs_writepage_setup(ctx, page, offset, count);
+ status = nfs_writepage_setup(ctx, folio, offset, count);
if (status < 0)
nfs_set_pageerror(mapping);
out:
- dprintk("NFS: nfs_updatepage returns %d (isize %lld)\n",
+ dprintk("NFS: nfs_update_folio returns %d (isize %lld)\n",
status, (long long)i_size_read(inode));
return status;
}
*/
static void nfs_redirty_request(struct nfs_page *req)
{
- struct nfs_inode *nfsi = NFS_I(page_file_mapping(req->wb_page)->host);
+ struct nfs_inode *nfsi = NFS_I(nfs_page_to_inode(req));
/* Bump the transmission count */
req->wb_nio++;
nfs_mark_request_dirty(req);
atomic_long_inc(&nfsi->redirtied_pages);
- nfs_end_page_writeback(req);
+ nfs_page_end_writeback(req);
nfs_release_request(req);
}
req = nfs_list_entry(page_list->next);
nfs_list_remove_request(req);
nfs_mark_request_commit(req, lseg, cinfo, ds_commit_idx);
- if (!cinfo->dreq)
- nfs_clear_page_commit(req->wb_page);
+ nfs_folio_clear_commit(nfs_page_to_folio(req));
nfs_unlock_and_release_request(req);
}
}
EXPORT_SYMBOL_GPL(nfs_retry_commit);
-static void
-nfs_commit_resched_write(struct nfs_commit_info *cinfo,
- struct nfs_page *req)
+static void nfs_commit_resched_write(struct nfs_commit_info *cinfo,
+ struct nfs_page *req)
{
- __set_page_dirty_nobuffers(req->wb_page);
+ struct folio *folio = nfs_page_to_folio(req);
+
+ filemap_dirty_folio(folio_mapping(folio), folio);
}
/*
int status = data->task.tk_status;
struct nfs_commit_info cinfo;
struct nfs_server *nfss;
+ struct folio *folio;
while (!list_empty(&data->pages)) {
req = nfs_list_entry(data->pages.next);
nfs_list_remove_request(req);
- if (req->wb_page)
- nfs_clear_page_commit(req->wb_page);
+ folio = nfs_page_to_folio(req);
+ nfs_folio_clear_commit(folio);
dprintk("NFS: commit (%s/%llu %d@%lld)",
nfs_req_openctx(req)->dentry->d_sb->s_id,
req->wb_bytes,
(long long)req_offset(req));
if (status < 0) {
- if (req->wb_page) {
+ if (folio) {
trace_nfs_commit_error(data->inode, req,
status);
- nfs_mapping_set_error(req->wb_page, status);
+ nfs_mapping_set_error(folio, status);
nfs_inode_remove_request(req);
}
dprintk_cont(", error = %d\n", status);
* returned by the server against all stored verfs. */
if (nfs_write_match_verf(verf, req)) {
/* We have a match */
- if (req->wb_page)
+ if (folio)
nfs_inode_remove_request(req);
dprintk_cont(" OK\n");
goto next;
/* blocking call to cancel all requests and join to a single (head)
* request */
- req = nfs_lock_and_join_requests(&folio->page);
+ req = nfs_lock_and_join_requests(folio);
if (IS_ERR(req)) {
ret = PTR_ERR(req);
return ret;
}
-/*
- * Write back all requests on one page - we do this before reading it.
+/**
+ * nfs_wb_folio - Write back all requests on one page
+ * @inode: pointer to page
+ * @folio: pointer to folio
+ *
+ * Assumes that the folio has been locked by the caller, and will
+ * not unlock it.
*/
-int nfs_wb_page(struct inode *inode, struct page *page)
+int nfs_wb_folio(struct inode *inode, struct folio *folio)
{
- loff_t range_start = page_file_offset(page);
- loff_t range_end = range_start + (loff_t)(PAGE_SIZE - 1);
+ loff_t range_start = folio_file_pos(folio);
+ loff_t range_end = range_start + (loff_t)folio_size(folio) - 1;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = 0,
};
int ret;
- trace_nfs_writeback_page_enter(inode);
+ trace_nfs_writeback_folio(inode, folio);
for (;;) {
- wait_on_page_writeback(page);
- if (clear_page_dirty_for_io(page)) {
- ret = nfs_writepage_locked(page, &wbc);
+ folio_wait_writeback(folio);
+ if (folio_clear_dirty_for_io(folio)) {
+ ret = nfs_writepage_locked(folio, &wbc);
if (ret < 0)
goto out_error;
continue;
}
ret = 0;
- if (!PagePrivate(page))
+ if (!folio_test_private(folio))
break;
ret = nfs_commit_inode(inode, FLUSH_SYNC);
if (ret < 0)
goto out_error;
}
out_error:
- trace_nfs_writeback_page_exit(inode, ret);
+ trace_nfs_writeback_folio_done(inode, folio, ret);
return ret;
}
return err;
}
- static int ntfs_writepage(struct page *page, struct writeback_control *wbc)
+ static int ntfs_resident_writepage(struct folio *folio,
+ struct writeback_control *wbc, void *data)
{
- struct address_space *mapping = page->mapping;
- struct inode *inode = mapping->host;
- struct ntfs_inode *ni = ntfs_i(inode);
- int err;
+ struct address_space *mapping = data;
+ struct ntfs_inode *ni = ntfs_i(mapping->host);
+ int ret;
- if (is_resident(ni)) {
- ni_lock(ni);
- err = attr_data_write_resident(ni, page);
- ni_unlock(ni);
- if (err != E_NTFS_NONRESIDENT) {
- unlock_page(page);
- return err;
- }
- }
+ ni_lock(ni);
+ ret = attr_data_write_resident(ni, &folio->page);
+ ni_unlock(ni);
- return block_write_full_page(page, ntfs_get_block, wbc);
+ if (ret != E_NTFS_NONRESIDENT)
+ folio_unlock(folio);
+ mapping_set_error(mapping, ret);
+ return ret;
}
static int ntfs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
- /* Redirect call to 'ntfs_writepage' for resident files. */
if (is_resident(ntfs_i(mapping->host)))
- return generic_writepages(mapping, wbc);
+ return write_cache_pages(mapping, wbc, ntfs_resident_writepage,
+ mapping);
return mpage_writepages(mapping, wbc, ntfs_get_block);
}
*
* NOTE: if fnd != NULL (ntfs_atomic_open) then @dir is locked
*/
-struct inode *ntfs_create_inode(struct user_namespace *mnt_userns,
+struct inode *ntfs_create_inode(struct mnt_idmap *idmap,
struct inode *dir, struct dentry *dentry,
const struct cpu_str *uni, umode_t mode,
dev_t dev, const char *symname, u32 size,
goto out3;
}
inode = &ni->vfs_inode;
- inode_init_owner(mnt_userns, inode, dir, mode);
+ inode_init_owner(idmap, inode, dir, mode);
mode = inode->i_mode;
inode->i_atime = inode->i_mtime = inode->i_ctime = ni->i_crtime =
#ifdef CONFIG_NTFS3_FS_POSIX_ACL
if (!S_ISLNK(mode) && (sb->s_flags & SB_POSIXACL)) {
- err = ntfs_init_acl(mnt_userns, inode, dir);
+ err = ntfs_init_acl(idmap, inode, dir);
if (err)
goto out7;
} else
const struct address_space_operations ntfs_aops = {
.read_folio = ntfs_read_folio,
.readahead = ntfs_readahead,
- .writepage = ntfs_writepage,
.writepages = ntfs_writepages,
.write_begin = ntfs_write_begin,
.write_end = ntfs_write_end,
.direct_IO = ntfs_direct_IO,
.bmap = ntfs_bmap,
.dirty_folio = block_dirty_folio,
+ .migrate_folio = buffer_migrate_folio,
.invalidate_folio = block_invalidate_folio,
};
#include "orangefs-kernel.h"
#include "orangefs-bufmap.h"
#include <linux/fs.h>
+#include <linux/filelock.h>
#include <linux/pagemap.h>
static int flush_racache(struct inode *inode)
"orangefs_file_mmap: called on %pD\n", file);
/* set the sequential readahead hint */
- vma->vm_flags |= VM_SEQ_READ;
- vma->vm_flags &= ~VM_RAND_READ;
+ vm_flags_mod(vma, VM_SEQ_READ, VM_RAND_READ);
file_accessed(file);
vma->vm_ops = &orangefs_file_vm_ops;
/* Should've been handled in orangefs_invalidate_folio. */
WARN_ON(off == len || off + wlen > len);
- bv.bv_page = page;
- bv.bv_len = wlen;
- bv.bv_offset = off % PAGE_SIZE;
WARN_ON(wlen == 0);
+ bvec_set_page(&bv, page, wlen, off % PAGE_SIZE);
iov_iter_bvec(&iter, ITER_SOURCE, &bv, 1, wlen);
ret = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter, wlen,
for (i = 0; i < ow->npages; i++) {
set_page_writeback(ow->pages[i]);
- ow->bv[i].bv_page = ow->pages[i];
- ow->bv[i].bv_len = min(page_offset(ow->pages[i]) + PAGE_SIZE,
- ow->off + ow->len) -
- max(ow->off, page_offset(ow->pages[i]));
- if (i == 0)
- ow->bv[i].bv_offset = ow->off -
- page_offset(ow->pages[i]);
- else
- ow->bv[i].bv_offset = 0;
+ bvec_set_page(&ow->bv[i], ow->pages[i],
+ min(page_offset(ow->pages[i]) + PAGE_SIZE,
+ ow->off + ow->len) -
+ max(ow->off, page_offset(ow->pages[i])),
+ i == 0 ? ow->off - page_offset(ow->pages[i]) : 0);
}
iov_iter_bvec(&iter, ITER_SOURCE, ow->bv, ow->npages, ow->len);
return ret;
}
- static int orangefs_writepages_callback(struct page *page,
- struct writeback_control *wbc, void *data)
+ static int orangefs_writepages_callback(struct folio *folio,
+ struct writeback_control *wbc, void *data)
{
struct orangefs_writepages *ow = data;
- struct orangefs_write_range *wr;
+ struct orangefs_write_range *wr = folio->private;
int ret;
- if (!PagePrivate(page)) {
- unlock_page(page);
+ if (!wr) {
+ folio_unlock(folio);
/* It's not private so there's nothing to write, right? */
printk("writepages_callback not private!\n");
BUG();
return 0;
}
- wr = (struct orangefs_write_range *)page_private(page);
ret = -1;
if (ow->npages == 0) {
ow->len = wr->len;
ow->uid = wr->uid;
ow->gid = wr->gid;
- ow->pages[ow->npages++] = page;
+ ow->pages[ow->npages++] = &folio->page;
ret = 0;
goto done;
}
}
if (ow->off + ow->len == wr->pos) {
ow->len += wr->len;
- ow->pages[ow->npages++] = page;
+ ow->pages[ow->npages++] = &folio->page;
ret = 0;
goto done;
}
orangefs_writepages_work(ow, wbc);
ow->npages = 0;
}
- ret = orangefs_writepage_locked(page, wbc);
- mapping_set_error(page->mapping, ret);
- unlock_page(page);
- end_page_writeback(page);
+ ret = orangefs_writepage_locked(&folio->page, wbc);
+ mapping_set_error(folio->mapping, ret);
+ folio_unlock(folio);
+ folio_end_writeback(folio);
} else {
if (ow->npages == ow->maxpages) {
orangefs_writepages_work(ow, wbc);
orangefs_launder_folio(folio);
off = folio_pos(folio);
- bv.bv_page = &folio->page;
- bv.bv_len = folio_size(folio);
- bv.bv_offset = 0;
+ bvec_set_folio(&bv, folio, folio_size(folio), 0);
iov_iter_bvec(&iter, ITER_DEST, &bv, 1, folio_size(folio));
ret = wait_for_direct_io(ORANGEFS_IO_READ, inode, &off, &iter,
ORANGEFS_I(inode)->attr_uid = current_fsuid();
ORANGEFS_I(inode)->attr_gid = current_fsgid();
}
- setattr_copy(&init_user_ns, inode, iattr);
+ setattr_copy(&nop_mnt_idmap, inode, iattr);
spin_unlock(&inode->i_lock);
mark_inode_dirty(inode);
ret = __orangefs_setattr(inode, iattr);
/* change mode on a file that has ACLs */
if (!ret && (iattr->ia_valid & ATTR_MODE))
- ret = posix_acl_chmod(&init_user_ns, dentry, inode->i_mode);
+ ret = posix_acl_chmod(&nop_mnt_idmap, dentry, inode->i_mode);
return ret;
}
/*
* Change attributes of an object referenced by dentry.
*/
-int orangefs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+int orangefs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *iattr)
{
int ret;
gossip_debug(GOSSIP_INODE_DEBUG, "__orangefs_setattr: called on %pd\n",
dentry);
- ret = setattr_prepare(&init_user_ns, dentry, iattr);
+ ret = setattr_prepare(&nop_mnt_idmap, dentry, iattr);
if (ret)
goto out;
ret = __orangefs_setattr_mode(dentry, iattr);
/*
* Obtain attributes of an object given a dentry
*/
-int orangefs_getattr(struct user_namespace *mnt_userns, const struct path *path,
+int orangefs_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, u32 request_mask, unsigned int flags)
{
int ret;
ret = orangefs_inode_getattr(inode,
request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
if (ret == 0) {
- generic_fillattr(&init_user_ns, inode, stat);
+ generic_fillattr(&nop_mnt_idmap, inode, stat);
/* override block size reported to stat */
if (!(request_mask & STATX_SIZE))
return ret;
}
-int orangefs_permission(struct user_namespace *mnt_userns,
+int orangefs_permission(struct mnt_idmap *idmap,
struct inode *inode, int mask)
{
int ret;
if (ret < 0)
return ret;
- return generic_permission(&init_user_ns, inode, mask);
+ return generic_permission(&nop_mnt_idmap, inode, mask);
}
int orangefs_update_time(struct inode *inode, struct timespec64 *time, int flags)
return 0;
}
-static int orangefs_fileattr_set(struct user_namespace *mnt_userns,
+static int orangefs_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa)
{
u64 val = 0;
#include <linux/uaccess.h>
#include "internal.h"
-static int ramfs_nommu_setattr(struct user_namespace *, struct dentry *, struct iattr *);
+static int ramfs_nommu_setattr(struct mnt_idmap *, struct dentry *, struct iattr *);
static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
unsigned long addr,
unsigned long len,
* handle a change of attributes
* - we're specifically interested in a change of size
*/
-static int ramfs_nommu_setattr(struct user_namespace *mnt_userns,
+static int ramfs_nommu_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *ia)
{
struct inode *inode = d_inode(dentry);
int ret = 0;
/* POSIX UID/GID verification for setting inode attributes */
- ret = setattr_prepare(&init_user_ns, dentry, ia);
+ ret = setattr_prepare(&nop_mnt_idmap, dentry, ia);
if (ret)
return ret;
}
}
- setattr_copy(&init_user_ns, inode, ia);
+ setattr_copy(&nop_mnt_idmap, inode, ia);
out:
ia->ia_valid = old_ia_valid;
return ret;
*/
static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma)
{
- if (!(vma->vm_flags & (VM_SHARED | VM_MAYSHARE)))
+ if (!is_nommu_shared_mapping(vma->vm_flags))
return -ENOSYS;
file_accessed(file);
#define FE_DELETE_PERMS (FE_PERM_U_DELETE | FE_PERM_G_DELETE | \
FE_PERM_O_DELETE)
+struct udf_map_rq;
+
static umode_t udf_convert_permissions(struct fileEntry *);
static int udf_update_inode(struct inode *, int);
static int udf_sync_inode(struct inode *inode);
static int udf_alloc_i_data(struct inode *inode, size_t size);
-static sector_t inode_getblk(struct inode *, sector_t, int *, int *);
-static int8_t udf_insert_aext(struct inode *, struct extent_position,
- struct kernel_lb_addr, uint32_t);
+static int inode_getblk(struct inode *inode, struct udf_map_rq *map);
+static int udf_insert_aext(struct inode *, struct extent_position,
+ struct kernel_lb_addr, uint32_t);
static void udf_split_extents(struct inode *, int *, int, udf_pblk_t,
struct kernel_long_ad *, int *);
static void udf_prealloc_extents(struct inode *, int, int,
struct kernel_long_ad *, int *);
static void udf_merge_extents(struct inode *, struct kernel_long_ad *, int *);
-static void udf_update_extents(struct inode *, struct kernel_long_ad *, int,
- int, struct extent_position *);
-static int udf_get_block(struct inode *, sector_t, struct buffer_head *, int);
+static int udf_update_extents(struct inode *, struct kernel_long_ad *, int,
+ int, struct extent_position *);
+static int udf_get_block_wb(struct inode *inode, sector_t block,
+ struct buffer_head *bh_result, int create);
static void __udf_clear_extent_cache(struct inode *inode)
{
}
}
- static int udf_adinicb_writepage(struct page *page,
++static int udf_adinicb_writepage(struct folio *folio,
+ struct writeback_control *wbc, void *data)
+{
++ struct page *page = &folio->page;
+ struct inode *inode = page->mapping->host;
+ struct udf_inode_info *iinfo = UDF_I(inode);
+
+ BUG_ON(!PageLocked(page));
+ memcpy_to_page(page, 0, iinfo->i_data + iinfo->i_lenEAttr,
+ i_size_read(inode));
+ unlock_page(page);
+ mark_inode_dirty(inode);
+
+ return 0;
+}
+
static int udf_writepages(struct address_space *mapping,
- struct writeback_control *wbc)
+ struct writeback_control *wbc)
{
- return mpage_writepages(mapping, wbc, udf_get_block);
+ struct inode *inode = mapping->host;
+ struct udf_inode_info *iinfo = UDF_I(inode);
+
+ if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB)
+ return mpage_writepages(mapping, wbc, udf_get_block_wb);
+ return write_cache_pages(mapping, wbc, udf_adinicb_writepage, NULL);
+}
+
+static void udf_adinicb_readpage(struct page *page)
+{
+ struct inode *inode = page->mapping->host;
+ char *kaddr;
+ struct udf_inode_info *iinfo = UDF_I(inode);
+ loff_t isize = i_size_read(inode);
+
+ kaddr = kmap_local_page(page);
+ memcpy(kaddr, iinfo->i_data + iinfo->i_lenEAttr, isize);
+ memset(kaddr + isize, 0, PAGE_SIZE - isize);
+ flush_dcache_page(page);
+ SetPageUptodate(page);
+ kunmap_local(kaddr);
}
static int udf_read_folio(struct file *file, struct folio *folio)
{
+ struct udf_inode_info *iinfo = UDF_I(file_inode(file));
+
+ if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
+ udf_adinicb_readpage(&folio->page);
+ folio_unlock(folio);
+ return 0;
+ }
return mpage_read_folio(folio, udf_get_block);
}
}
static int udf_write_begin(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len,
- struct page **pagep, void **fsdata)
+ loff_t pos, unsigned len,
+ struct page **pagep, void **fsdata)
{
+ struct udf_inode_info *iinfo = UDF_I(file_inode(file));
+ struct page *page;
int ret;
- ret = block_write_begin(mapping, pos, len, pagep, udf_get_block);
- if (unlikely(ret))
- udf_write_failed(mapping, pos + len);
- return ret;
+ if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB) {
+ ret = block_write_begin(mapping, pos, len, pagep,
+ udf_get_block);
+ if (unlikely(ret))
+ udf_write_failed(mapping, pos + len);
+ return ret;
+ }
+ if (WARN_ON_ONCE(pos >= PAGE_SIZE))
+ return -EIO;
+ page = grab_cache_page_write_begin(mapping, 0);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+ if (!PageUptodate(page))
+ udf_adinicb_readpage(page);
+ return 0;
+}
+
+static int udf_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = file_inode(file);
+ loff_t last_pos;
+
+ if (UDF_I(inode)->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB)
+ return generic_write_end(file, mapping, pos, len, copied, page,
+ fsdata);
+ last_pos = pos + copied;
+ if (last_pos > inode->i_size)
+ i_size_write(inode, last_pos);
+ set_page_dirty(page);
+ unlock_page(page);
+ put_page(page);
+
+ return copied;
}
static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
size_t count = iov_iter_count(iter);
ssize_t ret;
+ /* Fallback to buffered IO for in-ICB files */
+ if (UDF_I(inode)->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
+ return 0;
ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block);
if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE))
udf_write_failed(mapping, iocb->ki_pos + count);
static sector_t udf_bmap(struct address_space *mapping, sector_t block)
{
+ struct udf_inode_info *iinfo = UDF_I(mapping->host);
+
+ if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
+ return -EINVAL;
return generic_block_bmap(mapping, block, udf_get_block);
}
.readahead = udf_readahead,
.writepages = udf_writepages,
.write_begin = udf_write_begin,
- .write_end = generic_write_end,
+ .write_end = udf_write_end,
.direct_IO = udf_direct_IO,
.bmap = udf_bmap,
.migrate_folio = buffer_migrate_folio,
/*
* Expand file stored in ICB to a normal one-block-file
*
- * This function requires i_data_sem for writing and releases it.
* This function requires i_mutex held
*/
int udf_expand_file_adinicb(struct inode *inode)
{
struct page *page;
- char *kaddr;
struct udf_inode_info *iinfo = UDF_I(inode);
int err;
WARN_ON_ONCE(!inode_is_locked(inode));
if (!iinfo->i_lenAlloc) {
+ down_write(&iinfo->i_data_sem);
if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_USE_SHORT_AD))
iinfo->i_alloc_type = ICBTAG_FLAG_AD_SHORT;
else
mark_inode_dirty(inode);
return 0;
}
- /*
- * Release i_data_sem so that we can lock a page - page lock ranks
- * above i_data_sem. i_mutex still protects us against file changes.
- */
- up_write(&iinfo->i_data_sem);
page = find_or_create_page(inode->i_mapping, 0, GFP_NOFS);
if (!page)
return -ENOMEM;
- if (!PageUptodate(page)) {
- kaddr = kmap_atomic(page);
- memset(kaddr + iinfo->i_lenAlloc, 0x00,
- PAGE_SIZE - iinfo->i_lenAlloc);
- memcpy(kaddr, iinfo->i_data + iinfo->i_lenEAttr,
- iinfo->i_lenAlloc);
- flush_dcache_page(page);
- SetPageUptodate(page);
- kunmap_atomic(kaddr);
- }
+ if (!PageUptodate(page))
+ udf_adinicb_readpage(page);
down_write(&iinfo->i_data_sem);
memset(iinfo->i_data + iinfo->i_lenEAttr, 0x00,
iinfo->i_lenAlloc);
iinfo->i_alloc_type = ICBTAG_FLAG_AD_SHORT;
else
iinfo->i_alloc_type = ICBTAG_FLAG_AD_LONG;
- /* from now on we have normal address_space methods */
- inode->i_data.a_ops = &udf_aops;
set_page_dirty(page);
unlock_page(page);
up_write(&iinfo->i_data_sem);
/* Restore everything back so that we don't lose data... */
lock_page(page);
down_write(&iinfo->i_data_sem);
- kaddr = kmap_atomic(page);
- memcpy(iinfo->i_data + iinfo->i_lenEAttr, kaddr, inode->i_size);
- kunmap_atomic(kaddr);
+ memcpy_to_page(page, 0, iinfo->i_data + iinfo->i_lenEAttr,
+ inode->i_size);
unlock_page(page);
iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
- inode->i_data.a_ops = &udf_adinicb_aops;
iinfo->i_lenAlloc = inode->i_size;
up_write(&iinfo->i_data_sem);
}
return err;
}
-struct buffer_head *udf_expand_dir_adinicb(struct inode *inode,
- udf_pblk_t *block, int *err)
-{
- udf_pblk_t newblock;
- struct buffer_head *dbh = NULL;
- struct kernel_lb_addr eloc;
- uint8_t alloctype;
- struct extent_position epos;
+#define UDF_MAP_CREATE 0x01 /* Mapping can allocate new blocks */
+#define UDF_MAP_NOPREALLOC 0x02 /* Do not preallocate blocks */
- struct udf_fileident_bh sfibh, dfibh;
- loff_t f_pos = udf_ext0_offset(inode);
- int size = udf_ext0_offset(inode) + inode->i_size;
- struct fileIdentDesc cfi, *sfi, *dfi;
- struct udf_inode_info *iinfo = UDF_I(inode);
+#define UDF_BLK_MAPPED 0x01 /* Block was successfully mapped */
+#define UDF_BLK_NEW 0x02 /* Block was freshly allocated */
- if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_USE_SHORT_AD))
- alloctype = ICBTAG_FLAG_AD_SHORT;
- else
- alloctype = ICBTAG_FLAG_AD_LONG;
+struct udf_map_rq {
+ sector_t lblk;
+ udf_pblk_t pblk;
+ int iflags; /* UDF_MAP_ flags determining behavior */
+ int oflags; /* UDF_BLK_ flags reporting results */
+};
- if (!inode->i_size) {
- iinfo->i_alloc_type = alloctype;
- mark_inode_dirty(inode);
- return NULL;
- }
+static int udf_map_block(struct inode *inode, struct udf_map_rq *map)
+{
+ int err;
+ struct udf_inode_info *iinfo = UDF_I(inode);
- /* alloc block, and copy data to it */
- *block = udf_new_block(inode->i_sb, inode,
- iinfo->i_location.partitionReferenceNum,
- iinfo->i_location.logicalBlockNum, err);
- if (!(*block))
- return NULL;
- newblock = udf_get_pblock(inode->i_sb, *block,
- iinfo->i_location.partitionReferenceNum,
- 0);
- if (!newblock)
- return NULL;
- dbh = udf_tgetblk(inode->i_sb, newblock);
- if (!dbh)
- return NULL;
- lock_buffer(dbh);
- memset(dbh->b_data, 0x00, inode->i_sb->s_blocksize);
- set_buffer_uptodate(dbh);
- unlock_buffer(dbh);
- mark_buffer_dirty_inode(dbh, inode);
-
- sfibh.soffset = sfibh.eoffset =
- f_pos & (inode->i_sb->s_blocksize - 1);
- sfibh.sbh = sfibh.ebh = NULL;
- dfibh.soffset = dfibh.eoffset = 0;
- dfibh.sbh = dfibh.ebh = dbh;
- while (f_pos < size) {
- iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
- sfi = udf_fileident_read(inode, &f_pos, &sfibh, &cfi, NULL,
- NULL, NULL, NULL);
- if (!sfi) {
- brelse(dbh);
- return NULL;
- }
- iinfo->i_alloc_type = alloctype;
- sfi->descTag.tagLocation = cpu_to_le32(*block);
- dfibh.soffset = dfibh.eoffset;
- dfibh.eoffset += (sfibh.eoffset - sfibh.soffset);
- dfi = (struct fileIdentDesc *)(dbh->b_data + dfibh.soffset);
- if (udf_write_fi(inode, sfi, dfi, &dfibh, sfi->impUse,
- udf_get_fi_ident(sfi))) {
- iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
- brelse(dbh);
- return NULL;
+ map->oflags = 0;
+ if (!(map->iflags & UDF_MAP_CREATE)) {
+ struct kernel_lb_addr eloc;
+ uint32_t elen;
+ sector_t offset;
+ struct extent_position epos = {};
+
+ down_read(&iinfo->i_data_sem);
+ if (inode_bmap(inode, map->lblk, &epos, &eloc, &elen, &offset)
+ == (EXT_RECORDED_ALLOCATED >> 30)) {
+ map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc,
+ offset);
+ map->oflags |= UDF_BLK_MAPPED;
}
- }
- mark_buffer_dirty_inode(dbh, inode);
+ up_read(&iinfo->i_data_sem);
+ brelse(epos.bh);
- memset(iinfo->i_data + iinfo->i_lenEAttr, 0, iinfo->i_lenAlloc);
- iinfo->i_lenAlloc = 0;
- eloc.logicalBlockNum = *block;
- eloc.partitionReferenceNum =
- iinfo->i_location.partitionReferenceNum;
- iinfo->i_lenExtents = inode->i_size;
- epos.bh = NULL;
- epos.block = iinfo->i_location;
- epos.offset = udf_file_entry_alloc_offset(inode);
- udf_add_aext(inode, &epos, &eloc, inode->i_size, 0);
- /* UniqueID stuff */
-
- brelse(epos.bh);
- mark_inode_dirty(inode);
- return dbh;
-}
-
-static int udf_get_block(struct inode *inode, sector_t block,
- struct buffer_head *bh_result, int create)
-{
- int err, new;
- sector_t phys = 0;
- struct udf_inode_info *iinfo;
-
- if (!create) {
- phys = udf_block_map(inode, block);
- if (phys)
- map_bh(bh_result, inode->i_sb, phys);
return 0;
}
- err = -EIO;
- new = 0;
- iinfo = UDF_I(inode);
-
down_write(&iinfo->i_data_sem);
- if (block == iinfo->i_next_alloc_block + 1) {
- iinfo->i_next_alloc_block++;
- iinfo->i_next_alloc_goal++;
- }
-
/*
* Block beyond EOF and prealloc extents? Just discard preallocation
* as it is not useful and complicates things.
*/
- if (((loff_t)block) << inode->i_blkbits > iinfo->i_lenExtents)
+ if (((loff_t)map->lblk) << inode->i_blkbits >= iinfo->i_lenExtents)
udf_discard_prealloc(inode);
udf_clear_extent_cache(inode);
- phys = inode_getblk(inode, block, &err, &new);
- if (!phys)
- goto abort;
-
- if (new)
- set_buffer_new(bh_result);
- map_bh(bh_result, inode->i_sb, phys);
-
-abort:
+ err = inode_getblk(inode, map);
up_write(&iinfo->i_data_sem);
return err;
}
-static struct buffer_head *udf_getblk(struct inode *inode, udf_pblk_t block,
- int create, int *err)
+static int __udf_get_block(struct inode *inode, sector_t block,
+ struct buffer_head *bh_result, int flags)
{
- struct buffer_head *bh;
- struct buffer_head dummy;
-
- dummy.b_state = 0;
- dummy.b_blocknr = -1000;
- *err = udf_get_block(inode, block, &dummy, create);
- if (!*err && buffer_mapped(&dummy)) {
- bh = sb_getblk(inode->i_sb, dummy.b_blocknr);
- if (buffer_new(&dummy)) {
- lock_buffer(bh);
- memset(bh->b_data, 0x00, inode->i_sb->s_blocksize);
- set_buffer_uptodate(bh);
- unlock_buffer(bh);
- mark_buffer_dirty_inode(bh, inode);
- }
- return bh;
+ int err;
+ struct udf_map_rq map = {
+ .lblk = block,
+ .iflags = flags,
+ };
+
+ err = udf_map_block(inode, &map);
+ if (err < 0)
+ return err;
+ if (map.oflags & UDF_BLK_MAPPED) {
+ map_bh(bh_result, inode->i_sb, map.pblk);
+ if (map.oflags & UDF_BLK_NEW)
+ set_buffer_new(bh_result);
}
+ return 0;
+}
- return NULL;
+int udf_get_block(struct inode *inode, sector_t block,
+ struct buffer_head *bh_result, int create)
+{
+ int flags = create ? UDF_MAP_CREATE : 0;
+
+ /*
+ * We preallocate blocks only for regular files. It also makes sense
+ * for directories but there's a problem when to drop the
+ * preallocation. We might use some delayed work for that but I feel
+ * it's overengineering for a filesystem like UDF.
+ */
+ if (!S_ISREG(inode->i_mode))
+ flags |= UDF_MAP_NOPREALLOC;
+ return __udf_get_block(inode, block, bh_result, flags);
+}
+
+/*
+ * We shouldn't be allocating blocks on page writeback since we allocate them
+ * on page fault. We can spot dirty buffers without allocated blocks though
+ * when truncate expands file. These however don't have valid data so we can
+ * safely ignore them. So never allocate blocks from page writeback.
+ */
+static int udf_get_block_wb(struct inode *inode, sector_t block,
+ struct buffer_head *bh_result, int create)
+{
+ return __udf_get_block(inode, block, bh_result, 0);
}
/* Extend the file with new blocks totaling 'new_block_bytes',
~(sb->s_blocksize - 1);
}
+ add = 0;
/* Can we merge with the previous extent? */
if ((last_ext->extLength & UDF_EXTENT_FLAG_MASK) ==
EXT_NOT_RECORDED_NOT_ALLOCATED) {
}
if (fake) {
- udf_add_aext(inode, last_pos, &last_ext->extLocation,
- last_ext->extLength, 1);
+ err = udf_add_aext(inode, last_pos, &last_ext->extLocation,
+ last_ext->extLength, 1);
+ if (err < 0)
+ goto out_err;
count++;
} else {
struct kernel_lb_addr tmploc;
if (new_block_bytes)
udf_next_aext(inode, last_pos, &tmploc, &tmplen, 0);
}
+ iinfo->i_lenExtents += add;
/* Managed to do everything necessary? */
if (!new_block_bytes)
err = udf_add_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
if (err)
- return err;
+ goto out_err;
+ iinfo->i_lenExtents += add;
count++;
}
if (new_block_bytes) {
err = udf_add_aext(inode, last_pos, &last_ext->extLocation,
last_ext->extLength, 1);
if (err)
- return err;
+ goto out_err;
+ iinfo->i_lenExtents += new_block_bytes;
count++;
}
return -EIO;
return count;
+out_err:
+ /* Remove extents we've created so far */
+ udf_clear_extent_cache(inode);
+ udf_truncate_extents(inode);
+ return err;
}
/* Extend the final block of the file to final_block_len bytes */
else
BUG();
+ down_write(&iinfo->i_data_sem);
/*
* When creating hole in file, just don't bother with preserving
* preallocation. It likely won't be very useful anyway.
if (err < 0)
goto out;
err = 0;
- iinfo->i_lenExtents = newsize;
out:
brelse(epos.bh);
+ up_write(&iinfo->i_data_sem);
return err;
}
-static sector_t inode_getblk(struct inode *inode, sector_t block,
- int *err, int *new)
+static int inode_getblk(struct inode *inode, struct udf_map_rq *map)
{
struct kernel_long_ad laarr[EXTENT_MERGE_SIZE];
struct extent_position prev_epos, cur_epos, next_epos;
struct kernel_lb_addr eloc, tmpeloc;
int c = 1;
loff_t lbcount = 0, b_off = 0;
- udf_pblk_t newblocknum, newblock = 0;
+ udf_pblk_t newblocknum;
sector_t offset = 0;
int8_t etype;
struct udf_inode_info *iinfo = UDF_I(inode);
udf_pblk_t goal = 0, pgoal = iinfo->i_location.logicalBlockNum;
int lastblock = 0;
bool isBeyondEOF;
+ int ret = 0;
- *err = 0;
- *new = 0;
prev_epos.offset = udf_file_entry_alloc_offset(inode);
prev_epos.block = iinfo->i_location;
prev_epos.bh = NULL;
cur_epos = next_epos = prev_epos;
- b_off = (loff_t)block << inode->i_sb->s_blocksize_bits;
+ b_off = (loff_t)map->lblk << inode->i_sb->s_blocksize_bits;
/* find the extent which contains the block we are looking for.
alternate between laarr[0] and laarr[1] for locations of the
elen = EXT_RECORDED_ALLOCATED |
((elen + inode->i_sb->s_blocksize - 1) &
~(inode->i_sb->s_blocksize - 1));
+ iinfo->i_lenExtents =
+ ALIGN(iinfo->i_lenExtents,
+ inode->i_sb->s_blocksize);
udf_write_aext(inode, &cur_epos, &eloc, elen, 1);
}
- newblock = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
+ map->oflags = UDF_BLK_MAPPED;
+ map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
goto out_free;
}
/* Are we beyond EOF and preallocated extent? */
if (etype == -1) {
- int ret;
loff_t hole_len;
isBeyondEOF = true;
/* Create extents for the hole between EOF and offset */
hole_len = (loff_t)offset << inode->i_blkbits;
ret = udf_do_extend_file(inode, &prev_epos, laarr, hole_len);
- if (ret < 0) {
- *err = ret;
+ if (ret < 0)
goto out_free;
- }
c = 0;
offset = 0;
count += ret;
- /* We are not covered by a preallocated extent? */
- if ((laarr[0].extLength & UDF_EXTENT_FLAG_MASK) !=
- EXT_NOT_RECORDED_ALLOCATED) {
- /* Is there any real extent? - otherwise we overwrite
- * the fake one... */
- if (count)
- c = !c;
- laarr[c].extLength = EXT_NOT_RECORDED_NOT_ALLOCATED |
- inode->i_sb->s_blocksize;
- memset(&laarr[c].extLocation, 0x00,
- sizeof(struct kernel_lb_addr));
- count++;
- }
+ /*
+ * Is there any real extent? - otherwise we overwrite the fake
+ * one...
+ */
+ if (count)
+ c = !c;
+ laarr[c].extLength = EXT_NOT_RECORDED_NOT_ALLOCATED |
+ inode->i_sb->s_blocksize;
+ memset(&laarr[c].extLocation, 0x00,
+ sizeof(struct kernel_lb_addr));
+ count++;
endnum = c + 1;
lastblock = 1;
} else {
if ((laarr[c].extLength >> 30) == (EXT_NOT_RECORDED_ALLOCATED >> 30))
newblocknum = laarr[c].extLocation.logicalBlockNum + offset;
else { /* otherwise, allocate a new block */
- if (iinfo->i_next_alloc_block == block)
+ if (iinfo->i_next_alloc_block == map->lblk)
goal = iinfo->i_next_alloc_goal;
if (!goal) {
newblocknum = udf_new_block(inode->i_sb, inode,
iinfo->i_location.partitionReferenceNum,
- goal, err);
- if (!newblocknum) {
- *err = -ENOSPC;
+ goal, &ret);
+ if (!newblocknum)
goto out_free;
- }
if (isBeyondEOF)
iinfo->i_lenExtents += inode->i_sb->s_blocksize;
}
* block */
udf_split_extents(inode, &c, offset, newblocknum, laarr, &endnum);
- /* We preallocate blocks only for regular files. It also makes sense
- * for directories but there's a problem when to drop the
- * preallocation. We might use some delayed work for that but I feel
- * it's overengineering for a filesystem like UDF. */
- if (S_ISREG(inode->i_mode))
+ if (!(map->iflags & UDF_MAP_NOPREALLOC))
udf_prealloc_extents(inode, c, lastblock, laarr, &endnum);
/* merge any continuous blocks in laarr */
/* write back the new extents, inserting new extents if the new number
* of extents is greater than the old number, and deleting extents if
* the new number of extents is less than the old number */
- udf_update_extents(inode, laarr, startnum, endnum, &prev_epos);
+ ret = udf_update_extents(inode, laarr, startnum, endnum, &prev_epos);
+ if (ret < 0)
+ goto out_free;
- newblock = udf_get_pblock(inode->i_sb, newblocknum,
+ map->pblk = udf_get_pblock(inode->i_sb, newblocknum,
iinfo->i_location.partitionReferenceNum, 0);
- if (!newblock) {
- *err = -EIO;
+ if (!map->pblk) {
+ ret = -EFSCORRUPTED;
goto out_free;
}
- *new = 1;
- iinfo->i_next_alloc_block = block;
- iinfo->i_next_alloc_goal = newblocknum;
+ map->oflags = UDF_BLK_NEW | UDF_BLK_MAPPED;
+ iinfo->i_next_alloc_block = map->lblk + 1;
+ iinfo->i_next_alloc_goal = newblocknum + 1;
inode->i_ctime = current_time(inode);
if (IS_SYNC(inode))
udf_sync_inode(inode);
else
mark_inode_dirty(inode);
+ ret = 0;
out_free:
brelse(prev_epos.bh);
brelse(cur_epos.bh);
brelse(next_epos.bh);
- return newblock;
+ return ret;
}
static void udf_split_extents(struct inode *inode, int *c, int offset,
blocksize - 1) >> blocksize_bits)))) {
if (((li->extLength & UDF_EXTENT_LENGTH_MASK) +
- (lip1->extLength & UDF_EXTENT_LENGTH_MASK) +
- blocksize - 1) & ~UDF_EXTENT_LENGTH_MASK) {
- lip1->extLength = (lip1->extLength -
- (li->extLength &
- UDF_EXTENT_LENGTH_MASK) +
- UDF_EXTENT_LENGTH_MASK) &
- ~(blocksize - 1);
- li->extLength = (li->extLength &
- UDF_EXTENT_FLAG_MASK) +
- (UDF_EXTENT_LENGTH_MASK + 1) -
- blocksize;
- lip1->extLocation.logicalBlockNum =
- li->extLocation.logicalBlockNum +
- ((li->extLength &
- UDF_EXTENT_LENGTH_MASK) >>
- blocksize_bits);
- } else {
+ (lip1->extLength & UDF_EXTENT_LENGTH_MASK) +
+ blocksize - 1) <= UDF_EXTENT_LENGTH_MASK) {
li->extLength = lip1->extLength +
(((li->extLength &
UDF_EXTENT_LENGTH_MASK) +
}
}
-static void udf_update_extents(struct inode *inode, struct kernel_long_ad *laarr,
- int startnum, int endnum,
- struct extent_position *epos)
+static int udf_update_extents(struct inode *inode, struct kernel_long_ad *laarr,
+ int startnum, int endnum,
+ struct extent_position *epos)
{
int start = 0, i;
struct kernel_lb_addr tmploc;
uint32_t tmplen;
+ int err;
if (startnum > endnum) {
for (i = 0; i < (startnum - endnum); i++)
udf_delete_aext(inode, *epos);
} else if (startnum < endnum) {
for (i = 0; i < (endnum - startnum); i++) {
- udf_insert_aext(inode, *epos, laarr[i].extLocation,
- laarr[i].extLength);
+ err = udf_insert_aext(inode, *epos,
+ laarr[i].extLocation,
+ laarr[i].extLength);
+ /*
+ * If we fail here, we are likely corrupting the extent
+ * list and leaking blocks. At least stop early to
+ * limit the damage.
+ */
+ if (err < 0)
+ return err;
udf_next_aext(inode, epos, &laarr[i].extLocation,
&laarr[i].extLength, 1);
start++;
udf_write_aext(inode, epos, &laarr[i].extLocation,
laarr[i].extLength, 1);
}
+ return 0;
}
struct buffer_head *udf_bread(struct inode *inode, udf_pblk_t block,
int create, int *err)
{
struct buffer_head *bh = NULL;
+ struct udf_map_rq map = {
+ .lblk = block,
+ .iflags = UDF_MAP_NOPREALLOC | (create ? UDF_MAP_CREATE : 0),
+ };
- bh = udf_getblk(inode, block, create, err);
- if (!bh)
+ *err = udf_map_block(inode, &map);
+ if (*err || !(map.oflags & UDF_BLK_MAPPED))
return NULL;
+ bh = sb_getblk(inode->i_sb, map.pblk);
+ if (!bh) {
+ *err = -ENOMEM;
+ return NULL;
+ }
+ if (map.oflags & UDF_BLK_NEW) {
+ lock_buffer(bh);
+ memset(bh->b_data, 0x00, inode->i_sb->s_blocksize);
+ set_buffer_uptodate(bh);
+ unlock_buffer(bh);
+ mark_buffer_dirty_inode(bh, inode);
+ return bh;
+ }
+
if (bh_read(bh, 0) >= 0)
return bh;
int udf_setsize(struct inode *inode, loff_t newsize)
{
- int err;
+ int err = 0;
struct udf_inode_info *iinfo;
unsigned int bsize = i_blocksize(inode);
if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
return -EPERM;
+ filemap_invalidate_lock(inode->i_mapping);
iinfo = UDF_I(inode);
if (newsize > inode->i_size) {
- down_write(&iinfo->i_data_sem);
if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
- if (bsize <
+ if (bsize >=
(udf_file_entry_alloc_offset(inode) + newsize)) {
- err = udf_expand_file_adinicb(inode);
- if (err)
- return err;
down_write(&iinfo->i_data_sem);
- } else {
iinfo->i_lenAlloc = newsize;
+ up_write(&iinfo->i_data_sem);
goto set_size;
}
+ err = udf_expand_file_adinicb(inode);
+ if (err)
+ goto out_unlock;
}
err = udf_extend_file(inode, newsize);
- if (err) {
- up_write(&iinfo->i_data_sem);
- return err;
- }
+ if (err)
+ goto out_unlock;
set_size:
- up_write(&iinfo->i_data_sem);
truncate_setsize(inode, newsize);
} else {
if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
err = block_truncate_page(inode->i_mapping, newsize,
udf_get_block);
if (err)
- return err;
+ goto out_unlock;
truncate_setsize(inode, newsize);
down_write(&iinfo->i_data_sem);
udf_clear_extent_cache(inode);
err = udf_truncate_extents(inode);
up_write(&iinfo->i_data_sem);
if (err)
- return err;
+ goto out_unlock;
}
update_time:
inode->i_mtime = inode->i_ctime = current_time(inode);
udf_sync_inode(inode);
else
mark_inode_dirty(inode);
- return 0;
+out_unlock:
+ filemap_invalidate_unlock(inode->i_mapping);
+ return err;
}
/*
ret = -EIO;
goto out;
}
+ iinfo->i_hidden = hidden_inode;
iinfo->i_unique = 0;
iinfo->i_lenEAttr = 0;
iinfo->i_lenExtents = 0;
case ICBTAG_FILE_TYPE_REGULAR:
case ICBTAG_FILE_TYPE_UNDEF:
case ICBTAG_FILE_TYPE_VAT20:
- if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
- inode->i_data.a_ops = &udf_adinicb_aops;
- else
- inode->i_data.a_ops = &udf_aops;
+ inode->i_data.a_ops = &udf_aops;
inode->i_op = &udf_file_inode_operations;
inode->i_fop = &udf_file_operations;
inode->i_mode |= S_IFREG;
unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
struct udf_inode_info *iinfo = UDF_I(inode);
- bh = udf_tgetblk(inode->i_sb,
+ bh = sb_getblk(inode->i_sb,
udf_get_lb_pblock(inode->i_sb, &iinfo->i_location, 0));
if (!bh) {
udf_debug("getblk failure\n");
if (S_ISDIR(inode->i_mode) && inode->i_nlink > 0)
fe->fileLinkCount = cpu_to_le16(inode->i_nlink - 1);
- else
- fe->fileLinkCount = cpu_to_le16(inode->i_nlink);
+ else {
+ if (iinfo->i_hidden)
+ fe->fileLinkCount = cpu_to_le16(0);
+ else
+ fe->fileLinkCount = cpu_to_le16(inode->i_nlink);
+ }
fe->informationLength = cpu_to_le64(inode->i_size);
if (!inode)
return ERR_PTR(-ENOMEM);
- if (!(inode->i_state & I_NEW))
+ if (!(inode->i_state & I_NEW)) {
+ if (UDF_I(inode)->i_hidden != hidden_inode) {
+ iput(inode);
+ return ERR_PTR(-EFSCORRUPTED);
+ }
return inode;
+ }
memcpy(&UDF_I(inode)->i_location, ino, sizeof(struct kernel_lb_addr));
err = udf_read_inode(inode, hidden_inode);
neloc.logicalBlockNum = block;
neloc.partitionReferenceNum = epos->block.partitionReferenceNum;
- bh = udf_tgetblk(sb, udf_get_lb_pblock(sb, &neloc, 0));
+ bh = sb_getblk(sb, udf_get_lb_pblock(sb, &neloc, 0));
if (!bh)
return -EIO;
lock_buffer(bh);
epos->offset = sizeof(struct allocExtDesc);
brelse(epos->bh);
block = udf_get_lb_pblock(inode->i_sb, &epos->block, 0);
- epos->bh = udf_tread(inode->i_sb, block);
+ epos->bh = sb_bread(inode->i_sb, block);
if (!epos->bh) {
udf_debug("reading block %u failed!\n", block);
return -1;
return etype;
}
-static int8_t udf_insert_aext(struct inode *inode, struct extent_position epos,
- struct kernel_lb_addr neloc, uint32_t nelen)
+static int udf_insert_aext(struct inode *inode, struct extent_position epos,
+ struct kernel_lb_addr neloc, uint32_t nelen)
{
struct kernel_lb_addr oeloc;
uint32_t oelen;
int8_t etype;
+ int err;
if (epos.bh)
get_bh(epos.bh);
neloc = oeloc;
nelen = (etype << 30) | oelen;
}
- udf_add_aext(inode, &epos, &neloc, nelen, 1);
+ err = udf_add_aext(inode, &epos, &neloc, nelen, 1);
brelse(epos.bh);
- return (nelen >> 30);
+ return err;
}
int8_t udf_delete_aext(struct inode *inode, struct extent_position epos)
return etype;
}
-
-udf_pblk_t udf_block_map(struct inode *inode, sector_t block)
-{
- struct kernel_lb_addr eloc;
- uint32_t elen;
- sector_t offset;
- struct extent_position epos = {};
- udf_pblk_t ret;
-
- down_read(&UDF_I(inode)->i_data_sem);
-
- if (inode_bmap(inode, block, &epos, &eloc, &elen, &offset) ==
- (EXT_RECORDED_ALLOCATED >> 30))
- ret = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
- else
- ret = 0;
-
- up_read(&UDF_I(inode)->i_data_sem);
- brelse(epos.bh);
-
- if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_VARCONV))
- return udf_fixed_to_variable(ret);
- else
- return ret;
-}
iattr.ia_valid = ATTR_SIZE;
iattr.ia_size = new_size;
- error = xfs_vn_setattr_size(file_mnt_user_ns(file),
+ error = xfs_vn_setattr_size(file_mnt_idmap(file),
file_dentry(file), &iattr);
if (error)
goto out_unlock;
file_accessed(file);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(inode))
- vma->vm_flags |= VM_HUGEPAGE;
+ vm_flags_set(vma, VM_HUGEPAGE);
return 0;
}
unsigned int max_dev_sectors;
unsigned int chunk_sectors;
unsigned int max_sectors;
+ unsigned int max_user_sectors;
unsigned int max_segment_size;
unsigned int physical_block_size;
unsigned int logical_block_size;
DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS);
struct blkcg_gq *root_blkg;
struct list_head blkg_list;
+ struct mutex blkcg_mutex;
#endif
struct queue_limits limits;
#define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */
#define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */
#define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
+ #define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
#define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
enum blk_default_limits {
BLK_MAX_SEGMENTS = 128,
BLK_SAFE_MAX_SECTORS = 255,
- BLK_DEF_MAX_SECTORS = 2560,
BLK_MAX_SEGMENT_SIZE = 65536,
BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
};
+#define BLK_DEF_MAX_SECTORS 2560u
+
static inline unsigned long queue_segment_boundary(const struct request_queue *q)
{
return q->limits.seg_boundary_mask;
return blk_queue_nonrot(bdev_get_queue(bdev));
}
+ static inline bool bdev_synchronous(struct block_device *bdev)
+ {
+ return test_bit(QUEUE_FLAG_SYNCHRONOUS,
+ &bdev_get_queue(bdev)->queue_flags);
+ }
+
static inline bool bdev_stable_writes(struct block_device *bdev)
{
return test_bit(QUEUE_FLAG_STABLE_WRITES,
static inline bool bdev_is_zoned(struct block_device *bdev)
{
- struct request_queue *q = bdev_get_queue(bdev);
-
- if (q)
- return blk_queue_is_zoned(q);
+ return blk_queue_is_zoned(bdev_get_queue(bdev));
+}
- return false;
+static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
+{
+ return disk_zone_no(bdev->bd_disk, sec);
}
static inline bool bdev_op_is_zoned_write(struct block_device *bdev,
return q->limits.chunk_sectors;
}
+static inline sector_t bdev_offset_from_zone_start(struct block_device *bdev,
+ sector_t sector)
+{
+ return sector & (bdev_zone_sectors(bdev) - 1);
+}
+
+static inline bool bdev_is_zone_start(struct block_device *bdev,
+ sector_t sector)
+{
+ return bdev_offset_from_zone_start(bdev, sector) == 0;
+}
+
static inline int queue_dma_alignment(const struct request_queue *q)
{
return q ? q->limits.dma_alignment : 511;
unsigned int flags);
int (*open) (struct block_device *, fmode_t);
void (*release) (struct gendisk *, fmode_t);
- int (*rw_page)(struct block_device *, sector_t, struct page *, enum req_op);
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
unsigned int (*check_events) (struct gendisk *disk,
#define blkdev_compat_ptr_ioctl NULL
#endif
- extern int bdev_read_page(struct block_device *, sector_t, struct page *);
- extern int bdev_write_page(struct block_device *, sector_t, struct page *,
- struct writeback_control *);
-
static inline void blk_wake_io_task(struct task_struct *waiter)
{
/*
/* File supports DIRECT IO */
#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000)
+ #define FMODE_NOREUSE ((__force fmode_t)0x800000)
+
/* File was opened by fanotify and shouldn't generate fanotify events */
#define FMODE_NONOTIFY ((__force fmode_t)0x4000000)
#define MAX_LFS_FILESIZE ((loff_t)LLONG_MAX)
#endif
-#define FL_POSIX 1
-#define FL_FLOCK 2
-#define FL_DELEG 4 /* NFSv4 delegation */
-#define FL_ACCESS 8 /* not trying to lock, just looking */
-#define FL_EXISTS 16 /* when unlocking, test for existence */
-#define FL_LEASE 32 /* lease held on this file */
-#define FL_CLOSE 64 /* unlock on close */
-#define FL_SLEEP 128 /* A blocking lock */
-#define FL_DOWNGRADE_PENDING 256 /* Lease is being downgraded */
-#define FL_UNLOCK_PENDING 512 /* Lease is being broken */
-#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
-#define FL_LAYOUT 2048 /* outstanding pNFS layout */
-#define FL_RECLAIM 4096 /* reclaiming from a reboot server */
-
-#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
-
-/*
- * Special return value from posix_lock_file() and vfs_lock_file() for
- * asynchronous locking.
- */
-#define FILE_LOCK_DEFERRED 1
-
/* legacy typedef, should eventually be removed */
typedef void *fl_owner_t;
struct file_lock;
-struct file_lock_operations {
- void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
- void (*fl_release_private)(struct file_lock *);
-};
-
-struct lock_manager_operations {
- void *lm_mod_owner;
- fl_owner_t (*lm_get_owner)(fl_owner_t);
- void (*lm_put_owner)(fl_owner_t);
- void (*lm_notify)(struct file_lock *); /* unblock callback */
- int (*lm_grant)(struct file_lock *, int);
- bool (*lm_break)(struct file_lock *);
- int (*lm_change)(struct file_lock *, int, struct list_head *);
- void (*lm_setup)(struct file_lock *, void **);
- bool (*lm_breaker_owns_lease)(struct file_lock *);
- bool (*lm_lock_expirable)(struct file_lock *cfl);
- void (*lm_expire_lock)(void);
-};
-
-struct lock_manager {
- struct list_head list;
- /*
- * NFSv4 and up also want opens blocked during the grace period;
- * NLM doesn't care:
- */
- bool block_opens;
-};
-
-struct net;
-void locks_start_grace(struct net *, struct lock_manager *);
-void locks_end_grace(struct lock_manager *);
-bool locks_in_grace(struct net *);
-bool opens_in_grace(struct net *);
-
-/* that will die - we need it for nfs_lock_info */
-#include <linux/nfs_fs_i.h>
-
-/*
- * struct file_lock represents a generic "file lock". It's used to represent
- * POSIX byte range locks, BSD (flock) locks, and leases. It's important to
- * note that the same struct is used to represent both a request for a lock and
- * the lock itself, but the same object is never used for both.
- *
- * FIXME: should we create a separate "struct lock_request" to help distinguish
- * these two uses?
- *
- * The varous i_flctx lists are ordered by:
- *
- * 1) lock owner
- * 2) lock range start
- * 3) lock range end
- *
- * Obviously, the last two criteria only matter for POSIX locks.
- */
-struct file_lock {
- struct file_lock *fl_blocker; /* The lock, that is blocking us */
- struct list_head fl_list; /* link into file_lock_context */
- struct hlist_node fl_link; /* node in global lists */
- struct list_head fl_blocked_requests; /* list of requests with
- * ->fl_blocker pointing here
- */
- struct list_head fl_blocked_member; /* node in
- * ->fl_blocker->fl_blocked_requests
- */
- fl_owner_t fl_owner;
- unsigned int fl_flags;
- unsigned char fl_type;
- unsigned int fl_pid;
- int fl_link_cpu; /* what cpu's list is this on? */
- wait_queue_head_t fl_wait;
- struct file *fl_file;
- loff_t fl_start;
- loff_t fl_end;
-
- struct fasync_struct * fl_fasync; /* for lease break notifications */
- /* for lease breaks: */
- unsigned long fl_break_time;
- unsigned long fl_downgrade_time;
-
- const struct file_lock_operations *fl_ops; /* Callbacks for filesystems */
- const struct lock_manager_operations *fl_lmops; /* Callbacks for lockmanagers */
- union {
- struct nfs_lock_info nfs_fl;
- struct nfs4_lock_info nfs4_fl;
- struct {
- struct list_head link; /* link in AFS vnode's pending_locks list */
- int state; /* state of grant or error if -ve */
- unsigned int debug_id;
- } afs;
- struct {
- struct inode *inode;
- } ceph;
- } fl_u;
-} __randomize_layout;
-
-struct file_lock_context {
- spinlock_t flc_lock;
- struct list_head flc_flock;
- struct list_head flc_posix;
- struct list_head flc_lease;
-};
-
/* The following constant reflects the upper bound of the file/locking space */
#ifndef OFFSET_MAX
#define OFFSET_MAX type_max(loff_t)
extern void send_sigio(struct fown_struct *fown, int fd, int band);
-#define locks_inode(f) file_inode(f)
-
-#ifdef CONFIG_FILE_LOCKING
-extern int fcntl_getlk(struct file *, unsigned int, struct flock *);
-extern int fcntl_setlk(unsigned int, struct file *, unsigned int,
- struct flock *);
-
-#if BITS_PER_LONG == 32
-extern int fcntl_getlk64(struct file *, unsigned int, struct flock64 *);
-extern int fcntl_setlk64(unsigned int, struct file *, unsigned int,
- struct flock64 *);
-#endif
-
-extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
-extern int fcntl_getlease(struct file *filp);
-
-/* fs/locks.c */
-void locks_free_lock_context(struct inode *inode);
-void locks_free_lock(struct file_lock *fl);
-extern void locks_init_lock(struct file_lock *);
-extern struct file_lock * locks_alloc_lock(void);
-extern void locks_copy_lock(struct file_lock *, struct file_lock *);
-extern void locks_copy_conflock(struct file_lock *, struct file_lock *);
-extern void locks_remove_posix(struct file *, fl_owner_t);
-extern void locks_remove_file(struct file *);
-extern void locks_release_private(struct file_lock *);
-extern void posix_test_lock(struct file *, struct file_lock *);
-extern int posix_lock_file(struct file *, struct file_lock *, struct file_lock *);
-extern int locks_delete_block(struct file_lock *);
-extern int vfs_test_lock(struct file *, struct file_lock *);
-extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
-extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
-bool vfs_inode_has_locks(struct inode *inode);
-extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
-extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
-extern void lease_get_mtime(struct inode *, struct timespec64 *time);
-extern int generic_setlease(struct file *, long, struct file_lock **, void **priv);
-extern int vfs_setlease(struct file *, long, struct file_lock **, void **);
-extern int lease_modify(struct file_lock *, int, struct list_head *);
-
-struct notifier_block;
-extern int lease_register_notifier(struct notifier_block *);
-extern void lease_unregister_notifier(struct notifier_block *);
-
-struct files_struct;
-extern void show_fd_locks(struct seq_file *f,
- struct file *filp, struct files_struct *files);
-extern bool locks_owner_has_blockers(struct file_lock_context *flctx,
- fl_owner_t owner);
-
-static inline struct file_lock_context *
-locks_inode_context(const struct inode *inode)
-{
- return smp_load_acquire(&inode->i_flctx);
-}
-
-#else /* !CONFIG_FILE_LOCKING */
-static inline int fcntl_getlk(struct file *file, unsigned int cmd,
- struct flock __user *user)
-{
- return -EINVAL;
-}
-
-static inline int fcntl_setlk(unsigned int fd, struct file *file,
- unsigned int cmd, struct flock __user *user)
-{
- return -EACCES;
-}
-
-#if BITS_PER_LONG == 32
-static inline int fcntl_getlk64(struct file *file, unsigned int cmd,
- struct flock64 *user)
-{
- return -EINVAL;
-}
-
-static inline int fcntl_setlk64(unsigned int fd, struct file *file,
- unsigned int cmd, struct flock64 *user)
-{
- return -EACCES;
-}
-#endif
-static inline int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
-{
- return -EINVAL;
-}
-
-static inline int fcntl_getlease(struct file *filp)
-{
- return F_UNLCK;
-}
-
-static inline void
-locks_free_lock_context(struct inode *inode)
-{
-}
-
-static inline void locks_init_lock(struct file_lock *fl)
-{
- return;
-}
-
-static inline void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
-{
- return;
-}
-
-static inline void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
-{
- return;
-}
-
-static inline void locks_remove_posix(struct file *filp, fl_owner_t owner)
-{
- return;
-}
-
-static inline void locks_remove_file(struct file *filp)
-{
- return;
-}
-
-static inline void posix_test_lock(struct file *filp, struct file_lock *fl)
-{
- return;
-}
-
-static inline int posix_lock_file(struct file *filp, struct file_lock *fl,
- struct file_lock *conflock)
-{
- return -ENOLCK;
-}
-
-static inline int locks_delete_block(struct file_lock *waiter)
-{
- return -ENOENT;
-}
-
-static inline int vfs_test_lock(struct file *filp, struct file_lock *fl)
-{
- return 0;
-}
-
-static inline int vfs_lock_file(struct file *filp, unsigned int cmd,
- struct file_lock *fl, struct file_lock *conf)
-{
- return -ENOLCK;
-}
-
-static inline int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
-{
- return 0;
-}
-
-static inline bool vfs_inode_has_locks(struct inode *inode)
-{
- return false;
-}
-
-static inline int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl)
-{
- return -ENOLCK;
-}
-
-static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
-{
- return 0;
-}
-
-static inline void lease_get_mtime(struct inode *inode,
- struct timespec64 *time)
-{
- return;
-}
-
-static inline int generic_setlease(struct file *filp, long arg,
- struct file_lock **flp, void **priv)
-{
- return -EINVAL;
-}
-
-static inline int vfs_setlease(struct file *filp, long arg,
- struct file_lock **lease, void **priv)
-{
- return -EINVAL;
-}
-
-static inline int lease_modify(struct file_lock *fl, int arg,
- struct list_head *dispose)
-{
- return -EINVAL;
-}
-
-struct files_struct;
-static inline void show_fd_locks(struct seq_file *f,
- struct file *filp, struct files_struct *files) {}
-static inline bool locks_owner_has_blockers(struct file_lock_context *flctx,
- fl_owner_t owner)
-{
- return false;
-}
-
-static inline struct file_lock_context *
-locks_inode_context(const struct inode *inode)
-{
- return NULL;
-}
-
-#endif /* !CONFIG_FILE_LOCKING */
-
static inline struct inode *file_inode(const struct file *f)
{
return f->f_inode;
return d_real(file->f_path.dentry, file_inode(file));
}
-static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
-{
- return locks_lock_inode_wait(locks_inode(filp), fl);
-}
-
struct fasync_struct {
rwlock_t fa_lock;
int magic;
}
/**
- * i_uid_into_vfsuid - map an inode's i_uid down into a mnt_userns
- * @mnt_userns: user namespace of the mount the inode was found from
+ * i_uid_into_vfsuid - map an inode's i_uid down according to an idmapping
+ * @idmap: idmap of the mount the inode was found from
* @inode: inode to map
*
- * Return: whe inode's i_uid mapped down according to @mnt_userns.
+ * Return: whe inode's i_uid mapped down according to @idmap.
* If the inode's i_uid has no mapping INVALID_VFSUID is returned.
*/
-static inline vfsuid_t i_uid_into_vfsuid(struct user_namespace *mnt_userns,
+static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
const struct inode *inode)
{
- return make_vfsuid(mnt_userns, i_user_ns(inode), inode->i_uid);
+ return make_vfsuid(idmap, i_user_ns(inode), inode->i_uid);
}
/**
* i_uid_needs_update - check whether inode's i_uid needs to be updated
- * @mnt_userns: user namespace of the mount the inode was found from
+ * @idmap: idmap of the mount the inode was found from
* @attr: the new attributes of @inode
* @inode: the inode to update
*
*
* Return: true if @inode's i_uid field needs to be updated, false if not.
*/
-static inline bool i_uid_needs_update(struct user_namespace *mnt_userns,
+static inline bool i_uid_needs_update(struct mnt_idmap *idmap,
const struct iattr *attr,
const struct inode *inode)
{
return ((attr->ia_valid & ATTR_UID) &&
!vfsuid_eq(attr->ia_vfsuid,
- i_uid_into_vfsuid(mnt_userns, inode)));
+ i_uid_into_vfsuid(idmap, inode)));
}
/**
* i_uid_update - update @inode's i_uid field
- * @mnt_userns: user namespace of the mount the inode was found from
+ * @idmap: idmap of the mount the inode was found from
* @attr: the new attributes of @inode
* @inode: the inode to update
*
* Safely update @inode's i_uid field translating the vfsuid of any idmapped
* mount into the filesystem kuid.
*/
-static inline void i_uid_update(struct user_namespace *mnt_userns,
+static inline void i_uid_update(struct mnt_idmap *idmap,
const struct iattr *attr,
struct inode *inode)
{
if (attr->ia_valid & ATTR_UID)
- inode->i_uid = from_vfsuid(mnt_userns, i_user_ns(inode),
+ inode->i_uid = from_vfsuid(idmap, i_user_ns(inode),
attr->ia_vfsuid);
}
/**
- * i_gid_into_vfsgid - map an inode's i_gid down into a mnt_userns
- * @mnt_userns: user namespace of the mount the inode was found from
+ * i_gid_into_vfsgid - map an inode's i_gid down according to an idmapping
+ * @idmap: idmap of the mount the inode was found from
* @inode: inode to map
*
- * Return: the inode's i_gid mapped down according to @mnt_userns.
+ * Return: the inode's i_gid mapped down according to @idmap.
* If the inode's i_gid has no mapping INVALID_VFSGID is returned.
*/
-static inline vfsgid_t i_gid_into_vfsgid(struct user_namespace *mnt_userns,
+static inline vfsgid_t i_gid_into_vfsgid(struct mnt_idmap *idmap,
const struct inode *inode)
{
- return make_vfsgid(mnt_userns, i_user_ns(inode), inode->i_gid);
+ return make_vfsgid(idmap, i_user_ns(inode), inode->i_gid);
}
/**
* i_gid_needs_update - check whether inode's i_gid needs to be updated
- * @mnt_userns: user namespace of the mount the inode was found from
+ * @idmap: idmap of the mount the inode was found from
* @attr: the new attributes of @inode
* @inode: the inode to update
*
*
* Return: true if @inode's i_gid field needs to be updated, false if not.
*/
-static inline bool i_gid_needs_update(struct user_namespace *mnt_userns,
+static inline bool i_gid_needs_update(struct mnt_idmap *idmap,
const struct iattr *attr,
const struct inode *inode)
{
return ((attr->ia_valid & ATTR_GID) &&
!vfsgid_eq(attr->ia_vfsgid,
- i_gid_into_vfsgid(mnt_userns, inode)));
+ i_gid_into_vfsgid(idmap, inode)));
}
/**
* i_gid_update - update @inode's i_gid field
- * @mnt_userns: user namespace of the mount the inode was found from
+ * @idmap: idmap of the mount the inode was found from
* @attr: the new attributes of @inode
* @inode: the inode to update
*
* Safely update @inode's i_gid field translating the vfsgid of any idmapped
* mount into the filesystem kgid.
*/
-static inline void i_gid_update(struct user_namespace *mnt_userns,
+static inline void i_gid_update(struct mnt_idmap *idmap,
const struct iattr *attr,
struct inode *inode)
{
if (attr->ia_valid & ATTR_GID)
- inode->i_gid = from_vfsgid(mnt_userns, i_user_ns(inode),
+ inode->i_gid = from_vfsgid(idmap, i_user_ns(inode),
attr->ia_vfsgid);
}
/**
* inode_fsuid_set - initialize inode's i_uid field with callers fsuid
* @inode: inode to initialize
- * @mnt_userns: user namespace of the mount the inode was found from
+ * @idmap: idmap of the mount the inode was found from
*
* Initialize the i_uid field of @inode. If the inode was found/created via
- * an idmapped mount map the caller's fsuid according to @mnt_users.
+ * an idmapped mount map the caller's fsuid according to @idmap.
*/
static inline void inode_fsuid_set(struct inode *inode,
- struct user_namespace *mnt_userns)
+ struct mnt_idmap *idmap)
{
- inode->i_uid = mapped_fsuid(mnt_userns, i_user_ns(inode));
+ inode->i_uid = mapped_fsuid(idmap, i_user_ns(inode));
}
/**
* inode_fsgid_set - initialize inode's i_gid field with callers fsgid
* @inode: inode to initialize
- * @mnt_userns: user namespace of the mount the inode was found from
+ * @idmap: idmap of the mount the inode was found from
*
* Initialize the i_gid field of @inode. If the inode was found/created via
- * an idmapped mount map the caller's fsgid according to @mnt_users.
+ * an idmapped mount map the caller's fsgid according to @idmap.
*/
static inline void inode_fsgid_set(struct inode *inode,
- struct user_namespace *mnt_userns)
+ struct mnt_idmap *idmap)
{
- inode->i_gid = mapped_fsgid(mnt_userns, i_user_ns(inode));
+ inode->i_gid = mapped_fsgid(idmap, i_user_ns(inode));
}
/**
* fsuidgid_has_mapping() - check whether caller's fsuid/fsgid is mapped
* @sb: the superblock we want a mapping in
- * @mnt_userns: user namespace of the relevant mount
+ * @idmap: idmap of the relevant mount
*
* Check whether the caller's fsuid and fsgid have a valid mapping in the
* s_user_ns of the superblock @sb. If the caller is on an idmapped mount map
- * the caller's fsuid and fsgid according to the @mnt_userns first.
+ * the caller's fsuid and fsgid according to the @idmap first.
*
* Return: true if fsuid and fsgid is mapped, false if not.
*/
static inline bool fsuidgid_has_mapping(struct super_block *sb,
- struct user_namespace *mnt_userns)
+ struct mnt_idmap *idmap)
{
struct user_namespace *fs_userns = sb->s_user_ns;
kuid_t kuid;
kgid_t kgid;
- kuid = mapped_fsuid(mnt_userns, fs_userns);
+ kuid = mapped_fsuid(idmap, fs_userns);
if (!uid_valid(kuid))
return false;
- kgid = mapped_fsgid(mnt_userns, fs_userns);
+ kgid = mapped_fsgid(idmap, fs_userns);
if (!gid_valid(kgid))
return false;
return kuid_has_mapping(fs_userns, kuid) &&
return __sb_start_write_trylock(sb, SB_FREEZE_FS);
}
-bool inode_owner_or_capable(struct user_namespace *mnt_userns,
+bool inode_owner_or_capable(struct mnt_idmap *idmap,
const struct inode *inode);
/*
* VFS helper functions..
*/
-int vfs_create(struct user_namespace *, struct inode *,
+int vfs_create(struct mnt_idmap *, struct inode *,
struct dentry *, umode_t, bool);
-int vfs_mkdir(struct user_namespace *, struct inode *,
+int vfs_mkdir(struct mnt_idmap *, struct inode *,
struct dentry *, umode_t);
-int vfs_mknod(struct user_namespace *, struct inode *, struct dentry *,
+int vfs_mknod(struct mnt_idmap *, struct inode *, struct dentry *,
umode_t, dev_t);
-int vfs_symlink(struct user_namespace *, struct inode *,
+int vfs_symlink(struct mnt_idmap *, struct inode *,
struct dentry *, const char *);
-int vfs_link(struct dentry *, struct user_namespace *, struct inode *,
+int vfs_link(struct dentry *, struct mnt_idmap *, struct inode *,
struct dentry *, struct inode **);
-int vfs_rmdir(struct user_namespace *, struct inode *, struct dentry *);
-int vfs_unlink(struct user_namespace *, struct inode *, struct dentry *,
+int vfs_rmdir(struct mnt_idmap *, struct inode *, struct dentry *);
+int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
struct inode **);
/**
* struct renamedata - contains all information required for renaming
- * @old_mnt_userns: old user namespace of the mount the inode was found from
+ * @old_mnt_idmap: idmap of the old mount the inode was found from
* @old_dir: parent of source
* @old_dentry: source
- * @new_mnt_userns: new user namespace of the mount the inode was found from
+ * @new_mnt_idmap: idmap of the new mount the inode was found from
* @new_dir: parent of destination
* @new_dentry: destination
* @delegated_inode: returns an inode needing a delegation break
* @flags: rename flags
*/
struct renamedata {
- struct user_namespace *old_mnt_userns;
+ struct mnt_idmap *old_mnt_idmap;
struct inode *old_dir;
struct dentry *old_dentry;
- struct user_namespace *new_mnt_userns;
+ struct mnt_idmap *new_mnt_idmap;
struct inode *new_dir;
struct dentry *new_dentry;
struct inode **delegated_inode;
int vfs_rename(struct renamedata *);
-static inline int vfs_whiteout(struct user_namespace *mnt_userns,
+static inline int vfs_whiteout(struct mnt_idmap *idmap,
struct inode *dir, struct dentry *dentry)
{
- return vfs_mknod(mnt_userns, dir, dentry, S_IFCHR | WHITEOUT_MODE,
+ return vfs_mknod(idmap, dir, dentry, S_IFCHR | WHITEOUT_MODE,
WHITEOUT_DEV);
}
-struct file *vfs_tmpfile_open(struct user_namespace *mnt_userns,
+struct file *vfs_tmpfile_open(struct mnt_idmap *idmap,
const struct path *parentpath,
umode_t mode, int open_flag, const struct cred *cred);
/*
* VFS file helper functions.
*/
-void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
+void inode_init_owner(struct mnt_idmap *idmap, struct inode *inode,
const struct inode *dir, umode_t mode);
extern bool may_open_dev(const struct path *path);
-umode_t mode_strip_sgid(struct user_namespace *mnt_userns,
+umode_t mode_strip_sgid(struct mnt_idmap *idmap,
const struct inode *dir, umode_t mode);
/*
struct inode_operations {
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
- int (*permission) (struct user_namespace *, struct inode *, int);
+ int (*permission) (struct mnt_idmap *, struct inode *, int);
struct posix_acl * (*get_inode_acl)(struct inode *, int, bool);
int (*readlink) (struct dentry *, char __user *,int);
- int (*create) (struct user_namespace *, struct inode *,struct dentry *,
+ int (*create) (struct mnt_idmap *, struct inode *,struct dentry *,
umode_t, bool);
int (*link) (struct dentry *,struct inode *,struct dentry *);
int (*unlink) (struct inode *,struct dentry *);
- int (*symlink) (struct user_namespace *, struct inode *,struct dentry *,
+ int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,
const char *);
- int (*mkdir) (struct user_namespace *, struct inode *,struct dentry *,
+ int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,
umode_t);
int (*rmdir) (struct inode *,struct dentry *);
- int (*mknod) (struct user_namespace *, struct inode *,struct dentry *,
+ int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,
umode_t,dev_t);
- int (*rename) (struct user_namespace *, struct inode *, struct dentry *,
+ int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
- int (*setattr) (struct user_namespace *, struct dentry *,
- struct iattr *);
- int (*getattr) (struct user_namespace *, const struct path *,
+ int (*setattr) (struct mnt_idmap *, struct dentry *, struct iattr *);
+ int (*getattr) (struct mnt_idmap *, const struct path *,
struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
int (*atomic_open)(struct inode *, struct dentry *,
struct file *, unsigned open_flag,
umode_t create_mode);
- int (*tmpfile) (struct user_namespace *, struct inode *,
+ int (*tmpfile) (struct mnt_idmap *, struct inode *,
struct file *, umode_t);
- struct posix_acl *(*get_acl)(struct user_namespace *, struct dentry *,
+ struct posix_acl *(*get_acl)(struct mnt_idmap *, struct dentry *,
int);
- int (*set_acl)(struct user_namespace *, struct dentry *,
+ int (*set_acl)(struct mnt_idmap *, struct dentry *,
struct posix_acl *, int);
- int (*fileattr_set)(struct user_namespace *mnt_userns,
+ int (*fileattr_set)(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa);
int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
} ____cacheline_aligned;
#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
-static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns,
+static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
struct inode *inode)
{
- return !vfsuid_valid(i_uid_into_vfsuid(mnt_userns, inode)) ||
- !vfsgid_valid(i_gid_into_vfsgid(mnt_userns, inode));
+ return !vfsuid_valid(i_uid_into_vfsuid(idmap, inode)) ||
+ !vfsgid_valid(i_gid_into_vfsgid(idmap, inode));
}
static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
#define MAX_RW_COUNT (INT_MAX & PAGE_MASK)
-#ifdef CONFIG_FILE_LOCKING
-static inline int break_lease(struct inode *inode, unsigned int mode)
-{
- /*
- * Since this check is lockless, we must ensure that any refcounts
- * taken are done before checking i_flctx->flc_lease. Otherwise, we
- * could end up racing with tasks trying to set a new lease on this
- * file.
- */
- smp_mb();
- if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
- return __break_lease(inode, mode, FL_LEASE);
- return 0;
-}
-
-static inline int break_deleg(struct inode *inode, unsigned int mode)
-{
- /*
- * Since this check is lockless, we must ensure that any refcounts
- * taken are done before checking i_flctx->flc_lease. Otherwise, we
- * could end up racing with tasks trying to set a new lease on this
- * file.
- */
- smp_mb();
- if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
- return __break_lease(inode, mode, FL_DELEG);
- return 0;
-}
-
-static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
-{
- int ret;
-
- ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
- if (ret == -EWOULDBLOCK && delegated_inode) {
- *delegated_inode = inode;
- ihold(inode);
- }
- return ret;
-}
-
-static inline int break_deleg_wait(struct inode **delegated_inode)
-{
- int ret;
-
- ret = break_deleg(*delegated_inode, O_WRONLY);
- iput(*delegated_inode);
- *delegated_inode = NULL;
- return ret;
-}
-
-static inline int break_layout(struct inode *inode, bool wait)
-{
- smp_mb();
- if (inode->i_flctx && !list_empty_careful(&inode->i_flctx->flc_lease))
- return __break_lease(inode,
- wait ? O_WRONLY : O_WRONLY | O_NONBLOCK,
- FL_LAYOUT);
- return 0;
-}
-
-#else /* !CONFIG_FILE_LOCKING */
-static inline int break_lease(struct inode *inode, unsigned int mode)
-{
- return 0;
-}
-
-static inline int break_deleg(struct inode *inode, unsigned int mode)
-{
- return 0;
-}
-
-static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
-{
- return 0;
-}
-
-static inline int break_deleg_wait(struct inode **delegated_inode)
-{
- BUG();
- return 0;
-}
-
-static inline int break_layout(struct inode *inode, bool wait)
-{
- return 0;
-}
-
-#endif /* CONFIG_FILE_LOCKING */
-
/* fs/open.c */
struct audit_names;
struct filename {
};
static_assert(offsetof(struct filename, iname) % sizeof(long) == 0);
-static inline struct user_namespace *file_mnt_user_ns(struct file *file)
-{
- return mnt_user_ns(file->f_path.mnt);
-}
-
static inline struct mnt_idmap *file_mnt_idmap(struct file *file)
{
return mnt_idmap(file->f_path.mnt);
}
extern long vfs_truncate(const struct path *, loff_t);
-int do_truncate(struct user_namespace *, struct dentry *, loff_t start,
+int do_truncate(struct mnt_idmap *, struct dentry *, loff_t start,
unsigned int time_attrs, struct file *filp);
extern int vfs_fallocate(struct file *file, int mode, loff_t offset,
loff_t len);
}
#endif
-int notify_change(struct user_namespace *, struct dentry *,
+int notify_change(struct mnt_idmap *, struct dentry *,
struct iattr *, struct inode **);
-int inode_permission(struct user_namespace *, struct inode *, int);
-int generic_permission(struct user_namespace *, struct inode *, int);
+int inode_permission(struct mnt_idmap *, struct inode *, int);
+int generic_permission(struct mnt_idmap *, struct inode *, int);
static inline int file_permission(struct file *file, int mask)
{
- return inode_permission(file_mnt_user_ns(file),
+ return inode_permission(file_mnt_idmap(file),
file_inode(file), mask);
}
static inline int path_permission(const struct path *path, int mask)
{
- return inode_permission(mnt_user_ns(path->mnt),
+ return inode_permission(mnt_idmap(path->mnt),
d_inode(path->dentry), mask);
}
-int __check_sticky(struct user_namespace *mnt_userns, struct inode *dir,
+int __check_sticky(struct mnt_idmap *idmap, struct inode *dir,
struct inode *inode);
static inline bool execute_ok(struct inode *inode)
extern struct inode *new_inode_pseudo(struct super_block *sb);
extern struct inode *new_inode(struct super_block *sb);
extern void free_inode_nonrcu(struct inode *inode);
-extern int setattr_should_drop_suidgid(struct user_namespace *, struct inode *);
+extern int setattr_should_drop_suidgid(struct mnt_idmap *, struct inode *);
extern int file_remove_privs(struct file *);
/*
struct iov_iter *iter);
/* fs/splice.c */
+ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags);
+ssize_t direct_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags);
extern ssize_t generic_file_splice_read(struct file *, loff_t *,
struct pipe_inode_info *, size_t, unsigned int);
extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
extern int page_symlink(struct inode *inode, const char *symname, int len);
extern const struct inode_operations page_symlink_inode_operations;
extern void kfree_link(void *);
-void generic_fillattr(struct user_namespace *, struct inode *, struct kstat *);
+void generic_fillattr(struct mnt_idmap *, struct inode *, struct kstat *);
void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
extern int dcache_dir_close(struct inode *, struct file *);
extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
extern int dcache_readdir(struct file *, struct dir_context *);
-extern int simple_setattr(struct user_namespace *, struct dentry *,
+extern int simple_setattr(struct mnt_idmap *, struct dentry *,
struct iattr *);
-extern int simple_getattr(struct user_namespace *, const struct path *,
+extern int simple_getattr(struct mnt_idmap *, const struct path *,
struct kstat *, u32, unsigned int);
extern int simple_statfs(struct dentry *, struct kstatfs *);
extern int simple_open(struct inode *inode, struct file *file);
extern int simple_rmdir(struct inode *, struct dentry *);
extern int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
-extern int simple_rename(struct user_namespace *, struct inode *,
+extern int simple_rename(struct mnt_idmap *, struct inode *,
struct dentry *, struct inode *, struct dentry *,
unsigned int);
extern void simple_recursive_removal(struct dentry *,
extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
-int may_setattr(struct user_namespace *mnt_userns, struct inode *inode,
+int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
unsigned int ia_valid);
-int setattr_prepare(struct user_namespace *, struct dentry *, struct iattr *);
+int setattr_prepare(struct mnt_idmap *, struct dentry *, struct iattr *);
extern int inode_newsize_ok(const struct inode *, loff_t offset);
-void setattr_copy(struct user_namespace *, struct inode *inode,
+void setattr_copy(struct mnt_idmap *, struct inode *inode,
const struct iattr *attr);
extern int file_update_time(struct file *file);
return mode & (S_ISUID | S_ISGID);
}
-static inline int check_sticky(struct user_namespace *mnt_userns,
+static inline int check_sticky(struct mnt_idmap *idmap,
struct inode *dir, struct inode *inode)
{
if (!(dir->i_mode & S_ISVTX))
return 0;
- return __check_sticky(mnt_userns, dir, inode);
+ return __check_sticky(idmap, dir, inode);
}
static inline void inode_has_no_xattr(struct inode *inode)
#ifndef _LINUX_HUGETLB_H
#define _LINUX_HUGETLB_H
+ #include <linux/mm.h>
#include <linux/mm_types.h>
#include <linux/mmdebug.h>
#include <linux/fs.h>
vm_flags_t vm_flags);
long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
long freed);
- int isolate_hugetlb(struct page *page, struct list_head *list);
- int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison);
+ bool isolate_hugetlb(struct folio *folio, struct list_head *list);
+ int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
bool *migratable_cleared);
- void putback_active_hugepage(struct page *page);
+ void folio_putback_active_hugetlb(struct folio *folio);
void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
void free_huge_page(struct page *page);
void hugetlb_fix_reserve_counts(struct inode *inode);
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz);
+ /*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * IMPORTANT: we should normally not directly call this function, instead
+ * this is only a common interface to implement arch-specific
+ * walker. Please use hugetlb_walk() instead, because that will attempt to
+ * verify the locking for you.
+ *
+ * Since this function will walk all the pgtable pages (including not only
+ * high-level pgtable page, but also PUD entry that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible of its thread safety. One can follow this rule:
+ *
+ * (1) For private mappings: pmd unsharing is not possible, so holding the
+ * mmap_lock for either read or write is sufficient. Most callers
+ * already hold the mmap_lock, so normally, no special action is
+ * required.
+ *
+ * (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged
+ * pgtable page can go away from under us! It can be done by a pmd
+ * unshare with a follow up munmap() on the other process), then we
+ * need either:
+ *
+ * (2.1) hugetlb vma lock read or write held, to make sure pmd unshare
+ * won't happen upon the range (it also makes sure the pte_t we
+ * read is the right and stable one), or,
+ *
+ * (2.2) hugetlb mapping i_mmap_rwsem lock held read or write, to make
+ * sure even if unshare happened the racy unmap() will wait until
+ * i_mmap_rwsem is released.
+ *
+ * Option (2.1) is the safest, which guarantees pte stability from pmd
+ * sharing pov, until the vma lock released. Option (2.2) doesn't protect
+ * a concurrent pmd unshare, but it makes sure the pgtable page is safe to
+ * access.
+ */
pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz);
unsigned long hugetlb_mask_last_page(struct hstate *h);
int pmd_huge(pmd_t pmd);
int pud_huge(pud_t pud);
- unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
+ long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot,
unsigned long cp_flags);
return NULL;
}
- static inline int isolate_hugetlb(struct page *page, struct list_head *list)
+ static inline bool isolate_hugetlb(struct folio *folio, struct list_head *list)
{
- return -EBUSY;
+ return false;
}
- static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb, bool unpoison)
+ static inline int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison)
{
return 0;
}
return 0;
}
- static inline void putback_active_hugepage(struct page *page)
+ static inline void folio_putback_active_hugetlb(struct folio *folio)
{
}
{
}
- static inline unsigned long hugetlb_change_protection(
+ static inline long hugetlb_change_protection(
struct vm_area_struct *vma, unsigned long address,
unsigned long end, pgprot_t newprot,
unsigned long cp_flags)
};
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
- struct page *alloc_huge_page(struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
- struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
- struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
+ struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
- int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
+ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx);
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
- unsigned long address, struct page *page);
+ unsigned long address, struct folio *folio);
/* arch callback */
int __init __alloc_bootmem_huge_page(struct hstate *h, int nid);
if (!page_size_log)
return &default_hstate;
- return size_to_hstate(1UL << page_size_log);
+ if (page_size_log < BITS_PER_LONG)
+ return size_to_hstate(1UL << page_size_log);
+
+ return NULL;
}
static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
unsigned long end_pfn);
#ifdef CONFIG_MEMORY_FAILURE
- extern void hugetlb_clear_page_hwpoison(struct page *hpage);
+ extern void folio_clear_hugetlb_hwpoison(struct folio *folio);
#else
- static inline void hugetlb_clear_page_hwpoison(struct page *hpage)
+ static inline void folio_clear_hugetlb_hwpoison(struct folio *folio)
{
}
#endif
return -ENOMEM;
}
- static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
+ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
int avoid_reserve)
{
return NULL;
}
- static inline struct page *
- alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+ static inline struct folio *
+ alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask)
{
return NULL;
}
- static inline struct page *alloc_huge_page_vma(struct hstate *h,
+ static inline struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
unsigned long address)
{
#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end)
#endif
+ static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
+ {
+ return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
+ }
+
+ /*
+ * Safe version of huge_pte_offset() to check the locks. See comments
+ * above huge_pte_offset().
+ */
+ static inline pte_t *
+ hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
+ {
+ #if defined(CONFIG_HUGETLB_PAGE) && \
+ defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP)
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ /*
+ * If pmd sharing possible, locking needed to safely walk the
+ * hugetlb pgtables. More information can be found at the comment
+ * above huge_pte_offset() in the same file.
+ *
+ * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP.
+ */
+ if (__vma_shareable_lock(vma))
+ WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) &&
+ !lockdep_is_held(
+ &vma->vm_file->f_mapping->i_mmap_rwsem));
+ #endif
+ return huge_pte_offset(vma->vm_mm, addr, sz);
+ }
+
#endif /* _LINUX_HUGETLB_H */
}
/*
- * page_memcg_check - get the memory cgroup associated with a page
- * @page: a pointer to the page struct
+ * folio_memcg_check - Get the memory cgroup associated with a folio.
+ * @folio: Pointer to the folio.
*
- * Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any page
- * as an argument. It has to be used in cases when it's not known if a page
+ * Returns a pointer to the memory cgroup associated with the folio,
+ * or NULL. This function unlike folio_memcg() can take any folio
+ * as an argument. It has to be used in cases when it's not known if a folio
* has an associated memory cgroup pointer or an object cgroups vector or
* an object cgroup.
*
- * For a non-kmem page any of the following ensures page and memcg binding
+ * For a non-kmem folio any of the following ensures folio and memcg binding
* stability:
*
- * - the page lock
+ * - the folio lock
* - LRU isolation
- * - lock_page_memcg()
+ * - lock_folio_memcg()
* - exclusive reference
* - mem_cgroup_trylock_pages()
*
- * For a kmem page a caller should hold an rcu read lock to protect memcg
- * associated with a kmem page from being released.
+ * For a kmem folio a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem folio from being released.
*/
- static inline struct mem_cgroup *page_memcg_check(struct page *page)
+ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
{
/*
- * Because page->memcg_data might be changed asynchronously
- * for slab pages, READ_ONCE() should be used here.
+ * Because folio->memcg_data might be changed asynchronously
+ * for slabs, READ_ONCE() should be used here.
*/
- unsigned long memcg_data = READ_ONCE(page->memcg_data);
+ unsigned long memcg_data = READ_ONCE(folio->memcg_data);
if (memcg_data & MEMCG_DATA_OBJCGS)
return NULL;
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}
+ static inline struct mem_cgroup *page_memcg_check(struct page *page)
+ {
+ if (PageTail(page))
+ return NULL;
+ return folio_memcg_check((struct folio *)page);
+ }
+
static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
{
struct mem_cgroup *memcg;
percpu_ref_put(&objcg->refcnt);
}
+ static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+ {
+ return !memcg || css_tryget(&memcg->css);
+ }
+
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
if (memcg)
return match;
}
- struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page);
+ struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio);
ino_t page_cgroup_ino(struct page *page);
static inline bool mem_cgroup_online(struct mem_cgroup *memcg)
return NULL;
}
+ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
+ {
+ return NULL;
+ }
+
static inline struct mem_cgroup *page_memcg_check(struct page *page)
{
return NULL;
{
}
+ static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+ {
+ return true;
+ }
+
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
}
int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
- extern struct static_key_false memcg_kmem_enabled_key;
+extern struct static_key_false memcg_bpf_enabled_key;
+static inline bool memcg_bpf_enabled(void)
+{
+ return static_branch_likely(&memcg_bpf_enabled_key);
+}
+
+ extern struct static_key_false memcg_kmem_online_key;
- static inline bool memcg_kmem_enabled(void)
+ static inline bool memcg_kmem_online(void)
{
- return static_branch_likely(&memcg_kmem_enabled_key);
+ return static_branch_likely(&memcg_kmem_online_key);
}
static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp,
int order)
{
- if (memcg_kmem_enabled())
+ if (memcg_kmem_online())
return __memcg_kmem_charge_page(page, gfp, order);
return 0;
}
static inline void memcg_kmem_uncharge_page(struct page *page, int order)
{
- if (memcg_kmem_enabled())
+ if (memcg_kmem_online())
__memcg_kmem_uncharge_page(page, order);
}
{
struct mem_cgroup *memcg;
- if (!memcg_kmem_enabled())
+ if (!memcg_kmem_online())
return;
rcu_read_lock();
return NULL;
}
- static inline bool memcg_kmem_enabled(void)
+static inline bool memcg_bpf_enabled(void)
+{
+ return false;
+}
+
+ static inline bool memcg_kmem_online(void)
{
return false;
}
#define VM_MAYSHARE 0x00000080
#define VM_GROWSDOWN 0x00000100 /* general info on the segment */
+ #ifdef CONFIG_MMU
#define VM_UFFD_MISSING 0x00000200 /* missing pages tracking */
+ #else /* CONFIG_MMU */
+ #define VM_MAYOVERLAY 0x00000200 /* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
+ #define VM_UFFD_MISSING 0
+ #endif /* CONFIG_MMU */
#define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */
#define VM_UFFD_WP 0x00001000 /* wrprotect pages tracking */
/* This mask defines which mm->def_flags a process can inherit its parent */
#define VM_INIT_DEF_MASK VM_NOHUGEPAGE
- /* This mask is used to clear all the VMA flags used by mlock */
- #define VM_LOCKED_CLEAR_MASK (~(VM_LOCKED | VM_LOCKONFAULT))
+ /* This mask represents all the VMA flag bits used by mlock */
+ #define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)
/* Arch-specific flags to clear when updating VM flags on protection change */
#ifndef VM_ARCH_CLEAR
INIT_LIST_HEAD(&vma->anon_vma_chain);
}
+ /* Use when VMA is not part of the VMA tree and needs no locking */
+ static inline void vm_flags_init(struct vm_area_struct *vma,
+ vm_flags_t flags)
+ {
+ ACCESS_PRIVATE(vma, __vm_flags) = flags;
+ }
+
+ /* Use when VMA is part of the VMA tree and modifications need coordination */
+ static inline void vm_flags_reset(struct vm_area_struct *vma,
+ vm_flags_t flags)
+ {
+ mmap_assert_write_locked(vma->vm_mm);
+ vm_flags_init(vma, flags);
+ }
+
+ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
+ vm_flags_t flags)
+ {
+ mmap_assert_write_locked(vma->vm_mm);
+ WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
+ }
+
+ static inline void vm_flags_set(struct vm_area_struct *vma,
+ vm_flags_t flags)
+ {
+ mmap_assert_write_locked(vma->vm_mm);
+ ACCESS_PRIVATE(vma, __vm_flags) |= flags;
+ }
+
+ static inline void vm_flags_clear(struct vm_area_struct *vma,
+ vm_flags_t flags)
+ {
+ mmap_assert_write_locked(vma->vm_mm);
+ ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
+ }
+
+ /*
+ * Use only if VMA is not part of the VMA tree or has no other users and
+ * therefore needs no locking.
+ */
+ static inline void __vm_flags_mod(struct vm_area_struct *vma,
+ vm_flags_t set, vm_flags_t clear)
+ {
+ vm_flags_init(vma, (vma->vm_flags | set) & ~clear);
+ }
+
+ /*
+ * Use only when the order of set/clear operations is unimportant, otherwise
+ * use vm_flags_{set|clear} explicitly.
+ */
+ static inline void vm_flags_mod(struct vm_area_struct *vma,
+ vm_flags_t set, vm_flags_t clear)
+ {
+ mmap_assert_write_locked(vma->vm_mm);
+ __vm_flags_mod(vma, set, clear);
+ }
+
static inline void vma_set_anonymous(struct vm_area_struct *vma)
{
vma->vm_ops = NULL;
static inline
struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max)
{
- return mas_find(&vmi->mas, max);
+ return mas_find(&vmi->mas, max - 1);
}
static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi)
{
/*
- * Uses vma_find() to get the first VMA when the iterator starts.
+ * Uses mas_find() to get the first VMA when the iterator starts.
* Calling mas_next() could skip the first entry.
*/
- return vma_find(vmi, ULONG_MAX);
+ return mas_find(&vmi->mas, ULONG_MAX);
}
static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi)
return vmi->mas.index;
}
+ static inline unsigned long vma_iter_end(struct vma_iterator *vmi)
+ {
+ return vmi->mas.last + 1;
+ }
+ static inline int vma_iter_bulk_alloc(struct vma_iterator *vmi,
+ unsigned long count)
+ {
+ return mas_expected_entries(&vmi->mas, count);
+ }
+
+ /* Free any unused preallocations */
+ static inline void vma_iter_free(struct vma_iterator *vmi)
+ {
+ mas_destroy(&vmi->mas);
+ }
+
+ static inline int vma_iter_bulk_store(struct vma_iterator *vmi,
+ struct vm_area_struct *vma)
+ {
+ vmi->mas.index = vma->vm_start;
+ vmi->mas.last = vma->vm_end - 1;
+ mas_store(&vmi->mas, vma);
+ if (unlikely(mas_is_err(&vmi->mas)))
+ return -ENOMEM;
+
+ return 0;
+ }
+
+ static inline void vma_iter_invalidate(struct vma_iterator *vmi)
+ {
+ mas_pause(&vmi->mas);
+ }
+
+ static inline void vma_iter_set(struct vma_iterator *vmi, unsigned long addr)
+ {
+ mas_set(&vmi->mas, addr);
+ }
+
#define for_each_vma(__vmi, __vma) \
while (((__vma) = vma_next(&(__vmi))) != NULL)
/* The MM code likes to work with exclusive end addresses */
#define for_each_vma_range(__vmi, __vma, __end) \
- while (((__vma) = vma_find(&(__vmi), (__end) - 1)) != NULL)
+ while (((__vma) = vma_find(&(__vmi), (__end))) != NULL)
#ifdef CONFIG_SHMEM
/*
struct mmu_gather;
struct inode;
+ /*
+ * compound_order() can be called without holding a reference, which means
+ * that niceties like page_folio() don't work. These callers should be
+ * prepared to handle wild return values. For example, PG_head may be
+ * set before _folio_order is initialised, or this may be a tail page.
+ * See compaction.c for some good examples.
+ */
static inline unsigned int compound_order(struct page *page)
{
- if (!PageHead(page))
+ struct folio *folio = (struct folio *)page;
+
+ if (!test_bit(PG_head, &folio->flags))
return 0;
- return page[1].compound_order;
+ return folio->_folio_order;
}
/**
return page_ref_add_unless(page, 1, 0);
}
+ static inline struct folio *folio_get_nontail_page(struct page *page)
+ {
+ if (unlikely(!get_page_unless_zero(page)))
+ return NULL;
+ return (struct folio *)page;
+ }
+
extern int page_is_ram(unsigned long pfn);
enum {
static inline int folio_entire_mapcount(struct folio *folio)
{
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
- return atomic_read(folio_mapcount_ptr(folio)) + 1;
- }
-
- /*
- * Mapcount of compound page as a whole, does not include mapped sub-pages.
- * Must be called only on head of compound page.
- */
- static inline int head_compound_mapcount(struct page *head)
- {
- return atomic_read(compound_mapcount_ptr(head)) + 1;
- }
-
- /*
- * If a 16GB hugetlb page were mapped by PTEs of all of its 4kB sub-pages,
- * its subpages_mapcount would be 0x400000: choose the COMPOUND_MAPPED bit
- * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
- * leaves subpages_mapcount at 0, but avoid surprise if it participates later.
- */
- #define COMPOUND_MAPPED 0x800000
- #define SUBPAGES_MAPPED (COMPOUND_MAPPED - 1)
-
- /*
- * Number of sub-pages mapped by PTE, does not include compound mapcount.
- * Must be called only on head of compound page.
- */
- static inline int head_subpages_mapcount(struct page *head)
- {
- return atomic_read(subpages_mapcount_ptr(head)) & SUBPAGES_MAPPED;
+ return atomic_read(&folio->_entire_mapcount) + 1;
}
/*
atomic_set(&(page)->_mapcount, -1);
}
- /*
- * Mapcount of 0-order page; when compound sub-page, includes
- * compound_mapcount of compound_head of page.
+ /**
+ * page_mapcount() - Number of times this precise page is mapped.
+ * @page: The page.
+ *
+ * The number of times this page is mapped. If this page is part of
+ * a large folio, it includes the number of times this page is mapped
+ * as part of that folio.
*
- * Result is undefined for pages which cannot be mapped into userspace.
+ * The result is undefined for pages which cannot be mapped into userspace.
* For example SLAB or special types of pages. See function page_has_type().
- * They use this place in struct page differently.
+ * They use this field in struct page differently.
*/
static inline int page_mapcount(struct page *page)
{
int mapcount = atomic_read(&page->_mapcount) + 1;
- if (likely(!PageCompound(page)))
- return mapcount;
- page = compound_head(page);
- return head_compound_mapcount(page) + mapcount;
+ if (unlikely(PageCompound(page)))
+ mapcount += folio_entire_mapcount(page_folio(page));
+
+ return mapcount;
}
- int total_compound_mapcount(struct page *head);
+ int folio_total_mapcount(struct folio *folio);
/**
* folio_mapcount() - Calculate the number of mappings of this folio.
{
if (likely(!folio_test_large(folio)))
return atomic_read(&folio->_mapcount) + 1;
- return total_compound_mapcount(&folio->page);
+ return folio_total_mapcount(folio);
}
static inline int total_mapcount(struct page *page)
{
if (likely(!PageCompound(page)))
return atomic_read(&page->_mapcount) + 1;
- return total_compound_mapcount(compound_head(page));
+ return folio_total_mapcount(page_folio(page));
}
static inline bool folio_large_is_mapped(struct folio *folio)
{
/*
- * Reading folio_mapcount_ptr() below could be omitted if hugetlb
- * participated in incrementing subpages_mapcount when compound mapped.
+ * Reading _entire_mapcount below could be omitted if hugetlb
+ * participated in incrementing nr_pages_mapped when compound mapped.
*/
- return atomic_read(folio_subpages_mapcount_ptr(folio)) > 0 ||
- atomic_read(folio_mapcount_ptr(folio)) >= 0;
+ return atomic_read(&folio->_nr_pages_mapped) > 0 ||
+ atomic_read(&folio->_entire_mapcount) >= 0;
}
/**
static inline void set_compound_page_dtor(struct page *page,
enum compound_dtor_id compound_dtor)
{
+ struct folio *folio = (struct folio *)page;
+
VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
- page[1].compound_dtor = compound_dtor;
+ VM_BUG_ON_PAGE(!PageHead(page), page);
+ folio->_folio_dtor = compound_dtor;
}
static inline void folio_set_compound_dtor(struct folio *folio,
void destroy_large_folio(struct folio *folio);
- static inline int head_compound_pincount(struct page *head)
- {
- return atomic_read(compound_pincount_ptr(head));
- }
-
static inline void set_compound_order(struct page *page, unsigned int order)
{
- page[1].compound_order = order;
- #ifdef CONFIG_64BIT
- page[1].compound_nr = 1U << order;
- #endif
- }
-
- /*
- * folio_set_compound_order is generally passed a non-zero order to
- * initialize a large folio. However, hugetlb code abuses this by
- * passing in zero when 'dissolving' a large folio.
- */
- static inline void folio_set_compound_order(struct folio *folio,
- unsigned int order)
- {
- VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
+ struct folio *folio = (struct folio *)page;
folio->_folio_order = order;
#ifdef CONFIG_64BIT
- folio->_folio_nr_pages = order ? 1U << order : 0;
- #endif
- }
-
- /* Returns the number of pages in this potentially compound page. */
- static inline unsigned long compound_nr(struct page *page)
- {
- if (!PageHead(page))
- return 1;
- #ifdef CONFIG_64BIT
- return page[1].compound_nr;
- #else
- return 1UL << compound_order(page);
+ folio->_folio_nr_pages = 1U << order;
#endif
}
return compound_order(page);
}
- /**
- * thp_nr_pages - The number of regular pages in this huge page.
- * @page: The head page of a huge page.
- */
- static inline int thp_nr_pages(struct page *page)
- {
- VM_BUG_ON_PGFLAGS(PageTail(page), page);
- return compound_nr(page);
- }
-
/**
* thp_size - Size of a transparent huge page.
* @page: Head page of a transparent huge page.
folio_get(page_folio(page));
}
- int __must_check try_grab_page(struct page *page, unsigned int flags);
-
static inline __must_check bool try_get_page(struct page *page)
{
page = compound_head(page);
return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
+ #ifndef CONFIG_MMU
+ static inline bool is_nommu_shared_mapping(vm_flags_t flags)
+ {
+ /*
+ * NOMMU shared mappings are ordinary MAP_SHARED mappings and selected
+ * R/O MAP_PRIVATE file mappings that are an effective R/O overlay of
+ * a file mapping. R/O MAP_PRIVATE mappings might still modify
+ * underlying memory if ptrace is active, so this is only possible if
+ * ptrace does not apply. Note that there is no mprotect() to upgrade
+ * write permissions later.
+ */
+ return flags & (VM_MAYSHARE | VM_MAYOVERLAY);
+ }
+ #endif
+
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTION_IN_PAGE_FLAGS
#endif
return page_folio(pfn_to_page(pfn));
}
- static inline atomic_t *folio_pincount_ptr(struct folio *folio)
- {
- return &folio_page(folio, 1)->compound_pincount;
- }
-
/**
* folio_maybe_dma_pinned - Report if a folio may be pinned for DMA.
* @folio: The folio.
* expected to be able to deal gracefully with a false positive.
*
* For large folios, the result will be exactly correct. That's because
- * we have more tracking data available: the compound_pincount is used
+ * we have more tracking data available: the _pincount field is used
* instead of the GUP_PIN_COUNTING_BIAS scheme.
*
* For more information, please see Documentation/core-api/pin_user_pages.rst.
static inline bool folio_maybe_dma_pinned(struct folio *folio)
{
if (folio_test_large(folio))
- return atomic_read(folio_pincount_ptr(folio)) > 0;
+ return atomic_read(&folio->_pincount) > 0;
/*
* folio_ref_count() is signed. If that refcount overflows, then
#endif
}
+ /*
+ * compound_nr() returns the number of pages in this potentially compound
+ * page. compound_nr() can be called on a tail page, and is defined to
+ * return 1 in that case.
+ */
+ static inline unsigned long compound_nr(struct page *page)
+ {
+ struct folio *folio = (struct folio *)page;
+
+ if (!test_bit(PG_head, &folio->flags))
+ return 1;
+ #ifdef CONFIG_64BIT
+ return folio->_folio_nr_pages;
+ #else
+ return 1L << folio->_folio_order;
+ #endif
+ }
+
+ /**
+ * thp_nr_pages - The number of regular pages in this huge page.
+ * @page: The head page of a huge page.
+ */
+ static inline int thp_nr_pages(struct page *page)
+ {
+ return folio_nr_pages((struct folio *)page);
+ }
+
/**
* folio_next - Move to the next physical folio.
* @folio: The folio we're currently operating on.
return PAGE_SIZE << folio_order(folio);
}
+ /**
+ * folio_estimated_sharers - Estimate the number of sharers of a folio.
+ * @folio: The folio.
+ *
+ * folio_estimated_sharers() aims to serve as a function to efficiently
+ * estimate the number of processes sharing a folio. This is done by
+ * looking at the precise mapcount of the first subpage in the folio, and
+ * assuming the other subpages are the same. This may not be true for large
+ * folios. If you want exact mapcounts for exact calculations, look at
+ * page_mapcount() or folio_total_mapcount().
+ *
+ * Return: The estimated number of processes sharing a folio.
+ */
+ static inline int folio_estimated_sharers(struct folio *folio)
+ {
+ return page_mapcount(folio_page(folio, 0));
+ }
+
#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
return (uintptr_t)page->lru.next & BIT(1);
}
+ /*
+ * Return true only if the folio has been allocated with
+ * ALLOC_NO_WATERMARKS and the low watermark was not
+ * met implying that the system is under some pressure.
+ */
+ static inline bool folio_is_pfmemalloc(const struct folio *folio)
+ {
+ /*
+ * lru.next has bit 1 set if the page is allocated from the
+ * pfmemalloc reserves. Callers may simply overwrite it if
+ * they do not need to preserve that information.
+ */
+ return (uintptr_t)folio->lru.next & BIT(1);
+ }
+
/*
* Only to be called by the page allocator on a freshly allocated
* page.
/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
+#ifdef CONFIG_SCHED_MM_CID
+void sched_mm_cid_before_execve(struct task_struct *t);
+void sched_mm_cid_after_execve(struct task_struct *t);
+void sched_mm_cid_fork(struct task_struct *t);
+void sched_mm_cid_exit_signals(struct task_struct *t);
+static inline int task_mm_cid(struct task_struct *t)
+{
+ return t->mm_cid;
+}
+#else
+static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
+static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
+static inline void sched_mm_cid_fork(struct task_struct *t) { }
+static inline void sched_mm_cid_exit_signals(struct task_struct *t) { }
+static inline int task_mm_cid(struct task_struct *t)
+{
+ /*
+ * Use the processor id as a fall-back when the mm cid feature is
+ * disabled. This provides functional per-cpu data structure accesses
+ * in user-space, althrough it won't provide the memory usage benefits.
+ */
+ return raw_smp_processor_id();
+}
+#endif
+
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
#else
extern int user_shm_lock(size_t, struct ucounts *);
extern void user_shm_unlock(size_t, struct ucounts *);
+ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
+ pte_t pte);
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
pte_t pte);
struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
- void zap_page_range(struct vm_area_struct *vma, unsigned long address,
- unsigned long size);
void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details);
+ static inline void zap_vma_pages(struct vm_area_struct *vma)
+ {
+ zap_page_range_single(vma, vma->vm_start,
+ vma->vm_end - vma->vm_start, NULL);
+ }
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
- unsigned long end);
+ unsigned long end, bool mm_wr_locked);
struct mmu_notifier_range;
struct task_struct *task, bool bypass_rlim);
struct kvec;
-int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
- struct page **pages);
struct page *get_dump_page(unsigned long addr);
bool folio_mark_dirty(struct folio *folio);
}
bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
pte_t pte);
- extern unsigned long change_protection(struct mmu_gather *tlb,
+ extern long change_protection(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgprot_t newprot,
- unsigned long cp_flags);
- extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
- struct vm_area_struct **pprev, unsigned long start,
- unsigned long end, unsigned long newflags);
+ unsigned long end, unsigned long cp_flags);
+ extern int mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
+ struct vm_area_struct *vma, struct vm_area_struct **pprev,
+ unsigned long start, unsigned long end, unsigned long newflags);
/*
* doesn't attempt to fault and will return short.
*/
int get_user_pages_fast_only(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages);
- int pin_user_pages_fast_only(unsigned long start, int nr_pages,
- unsigned int gup_flags, struct page **pages);
static inline bool get_user_page_fast_only(unsigned long addr,
unsigned int gup_flags, struct page **pagep)
/* mmap.c */
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
- extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
- struct vm_area_struct *expand);
- static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
- {
- return __vma_adjust(vma, start, end, pgoff, insert, NULL);
- }
- extern struct vm_area_struct *vma_merge(struct mm_struct *,
- struct vm_area_struct *prev, unsigned long addr, unsigned long end,
- unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
+ extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff,
+ struct vm_area_struct *next);
+ extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end, pgoff_t pgoff);
+ extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi,
+ struct mm_struct *, struct vm_area_struct *prev, unsigned long addr,
+ unsigned long end, unsigned long vm_flags, struct anon_vma *,
+ struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx,
+ struct anon_vma_name *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
- extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
- extern int split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
+ extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+ unsigned long addr, int new_below);
+ extern int split_vma(struct vma_iterator *vmi, struct vm_area_struct *,
+ unsigned long addr, int new_below);
extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void unlink_file_vma(struct vm_area_struct *);
extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
bool *need_rmap_locks);
extern void exit_mmap(struct mm_struct *);
- void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas);
- void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas);
-
static inline int check_data_rlimit(unsigned long rlim,
unsigned long new,
unsigned long start,
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
unsigned long pgoff, unsigned long *populate, struct list_head *uf);
- extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm,
+ extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
unsigned long start, size_t len, struct list_head *uf,
bool downgrade);
extern int do_munmap(struct mm_struct *, unsigned long, size_t,
extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
#ifdef CONFIG_MMU
+ extern int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct list_head *uf, bool downgrade);
extern int __mm_populate(unsigned long addr, unsigned long len,
int ignore_errors);
static inline void mm_populate(unsigned long addr, unsigned long len)
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
unsigned int foll_flags);
- #define FOLL_WRITE 0x01 /* check pte is writable */
- #define FOLL_TOUCH 0x02 /* mark page accessed */
- #define FOLL_GET 0x04 /* do get_page on page */
- #define FOLL_DUMP 0x08 /* give error on hole if it would be zero */
- #define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */
- #define FOLL_NOWAIT 0x20 /* if a disk transfer is needed, start the IO
- * and return without waiting upon it */
- #define FOLL_NOFAULT 0x80 /* do not fault in pages */
- #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */
- #define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */
- #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
- #define FOLL_ANON 0x8000 /* don't do file mappings */
- #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */
- #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */
- #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */
- #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */
- #define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */
- #define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */
-
- /*
- * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
- * other. Here is what they mean, and how to use them:
- *
- * FOLL_LONGTERM indicates that the page will be held for an indefinite time
- * period _often_ under userspace control. This is in contrast to
- * iov_iter_get_pages(), whose usages are transient.
- *
- * FIXME: For pages which are part of a filesystem, mappings are subject to the
- * lifetime enforced by the filesystem and we need guarantees that longterm
- * users like RDMA and V4L2 only establish mappings which coordinate usage with
- * the filesystem. Ideas for this coordination include revoking the longterm
- * pin, delaying writeback, bounce buffer page writeback, etc. As FS DAX was
- * added after the problem with filesystems was found FS DAX VMAs are
- * specifically failed. Filesystem pages are still subject to bugs and use of
- * FOLL_LONGTERM should be avoided on those pages.
- *
- * FIXME: Also NOTE that FOLL_LONGTERM is not supported in every GUP call.
- * Currently only get_user_pages() and get_user_pages_fast() support this flag
- * and calls to get_user_pages_[un]locked are specifically not allowed. This
- * is due to an incompatibility with the FS DAX check and
- * FAULT_FLAG_ALLOW_RETRY.
- *
- * In the CMA case: long term pins in a CMA region would unnecessarily fragment
- * that region. And so, CMA attempts to migrate the page before pinning, when
- * FOLL_LONGTERM is specified.
- *
- * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
- * but an additional pin counting system) will be invoked. This is intended for
- * anything that gets a page reference and then touches page data (for example,
- * Direct IO). This lets the filesystem know that some non-file-system entity is
- * potentially changing the pages' data. In contrast to FOLL_GET (whose pages
- * are released via put_page()), FOLL_PIN pages must be released, ultimately, by
- * a call to unpin_user_page().
- *
- * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different
- * and separate refcounting mechanisms, however, and that means that each has
- * its own acquire and release mechanisms:
- *
- * FOLL_GET: get_user_pages*() to acquire, and put_page() to release.
- *
- * FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release.
- *
- * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call.
- * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based
- * calls applied to them, and that's perfectly OK. This is a constraint on the
- * callers, not on the pages.)
- *
- * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never
- * directly by the caller. That's in order to help avoid mismatches when
- * releasing pages: get_user_pages*() pages must be released via put_page(),
- * while pin_user_pages*() pages must be released via unpin_user_page().
- *
- * Please see Documentation/core-api/pin_user_pages.rst for more information.
- */
-
static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
{
if (vm_fault & VM_FAULT_OOM)
return 0;
}
- /*
- * Indicates for which pages that are write-protected in the page table,
- * whether GUP has to trigger unsharing via FAULT_FLAG_UNSHARE such that the
- * GUP pin will remain consistent with the pages mapped into the page tables
- * of the MM.
- *
- * Temporary unmapping of PageAnonExclusive() pages or clearing of
- * PageAnonExclusive() has to protect against concurrent GUP:
- * * Ordinary GUP: Using the PT lock
- * * GUP-fast and fork(): mm->write_protect_seq
- * * GUP-fast and KSM or temporary unmapping (swap, migration): see
- * page_try_share_anon_rmap()
- *
- * Must be called with the (sub)page that's actually referenced via the
- * page table entry, which might not necessarily be the head page for a
- * PTE-mapped THP.
- *
- * If the vma is NULL, we're coming from the GUP-fast path and might have
- * to fallback to the slow path just to lookup the vma.
- */
- static inline bool gup_must_unshare(struct vm_area_struct *vma,
- unsigned int flags, struct page *page)
- {
- /*
- * FOLL_WRITE is implicitly handled correctly as the page table entry
- * has to be writable -- and if it references (part of) an anonymous
- * folio, that part is required to be marked exclusive.
- */
- if ((flags & (FOLL_WRITE | FOLL_PIN)) != FOLL_PIN)
- return false;
- /*
- * Note: PageAnon(page) is stable until the page is actually getting
- * freed.
- */
- if (!PageAnon(page)) {
- /*
- * We only care about R/O long-term pining: R/O short-term
- * pinning does not have the semantics to observe successive
- * changes through the process page tables.
- */
- if (!(flags & FOLL_LONGTERM))
- return false;
-
- /* We really need the vma ... */
- if (!vma)
- return true;
-
- /*
- * ... because we only care about writable private ("COW")
- * mappings where we have to break COW early.
- */
- return is_cow_mapping(vma->vm_flags);
- }
-
- /* Paired with a memory barrier in page_try_share_anon_rmap(). */
- if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
- smp_rmb();
-
- /*
- * Note that PageKsm() pages cannot be exclusive, and consequently,
- * cannot get pinned.
- */
- return !PageAnonExclusive(page);
- }
-
/*
* Indicates whether GUP can follow a PROT_NONE mapped page, or whether
* a (NUMA hinting) fault is required.
MF_MSG_UNKNOWN,
};
+ /*
+ * Sysfs entries for memory failure handling statistics.
+ */
+ extern const struct attribute_group memory_failure_attr_group;
+
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
extern void clear_huge_page(struct page *page,
unsigned long addr_hint,
* VM_MAYWRITE as we still want them to be COW-writable.
*/
if (vma->vm_flags & VM_SHARED)
- vma->vm_flags &= ~(VM_MAYWRITE);
+ vm_flags_clear(vma, VM_MAYWRITE);
}
return 0;
};
struct { /* Tail pages of compound page */
unsigned long compound_head; /* Bit zero is set */
-
- /* First tail page only */
- unsigned char compound_dtor;
- unsigned char compound_order;
- atomic_t compound_mapcount;
- atomic_t subpages_mapcount;
- atomic_t compound_pincount;
- #ifdef CONFIG_64BIT
- unsigned int compound_nr; /* 1 << compound_order */
- #endif
- };
- struct { /* Second tail page of transparent huge page */
- unsigned long _compound_pad_1; /* compound_head */
- unsigned long _compound_pad_2;
- /* For both global and memcg */
- struct list_head deferred_list;
- };
- struct { /* Second tail page of hugetlb page */
- unsigned long _hugetlb_pad_1; /* compound_head */
- void *hugetlb_subpool;
- void *hugetlb_cgroup;
- void *hugetlb_cgroup_rsvd;
- void *hugetlb_hwpoison;
- /* No more space on 32-bit: use third tail if more */
};
struct { /* Page table pages */
unsigned long _pt_pad_1; /* compound_head */
* @_refcount: Do not access this member directly. Use folio_ref_count()
* to find how many references there are to this folio.
* @memcg_data: Memory Control Group data.
- * @_flags_1: For large folios, additional page flags.
- * @_head_1: Points to the folio. Do not use.
* @_folio_dtor: Which destructor to use for this folio.
* @_folio_order: Do not use directly, call folio_order().
- * @_compound_mapcount: Do not use directly, call folio_entire_mapcount().
- * @_subpages_mapcount: Do not use directly, call folio_mapcount().
+ * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
+ * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
* @_pincount: Do not use directly, call folio_maybe_dma_pinned().
* @_folio_nr_pages: Do not use directly, call folio_nr_pages().
- * @_flags_2: For alignment. Do not use.
- * @_head_2: Points to the folio. Do not use.
* @_hugetlb_subpool: Do not use directly, use accessor in hugetlb.h.
* @_hugetlb_cgroup: Do not use directly, use accessor in hugetlb_cgroup.h.
* @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
* @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
+ * @_deferred_list: Folios to be split under memory pressure.
*
* A folio is a physically, virtually and logically contiguous set
* of bytes. It is a power-of-two in size, and it is aligned to that
struct {
unsigned long _flags_1;
unsigned long _head_1;
+ /* public: */
unsigned char _folio_dtor;
unsigned char _folio_order;
- atomic_t _compound_mapcount;
- atomic_t _subpages_mapcount;
+ atomic_t _entire_mapcount;
+ atomic_t _nr_pages_mapped;
atomic_t _pincount;
#ifdef CONFIG_64BIT
unsigned int _folio_nr_pages;
#endif
+ /* private: the union with struct page is transitional */
};
struct page __page_1;
};
struct {
unsigned long _flags_2;
unsigned long _head_2;
+ /* public: */
void *_hugetlb_subpool;
void *_hugetlb_cgroup;
void *_hugetlb_cgroup_rsvd;
void *_hugetlb_hwpoison;
+ /* private: the union with struct page is transitional */
+ };
+ struct {
+ unsigned long _flags_2a;
+ unsigned long _head_2a;
+ /* public: */
+ struct list_head _deferred_list;
+ /* private: the union with struct page is transitional */
};
struct page __page_2;
};
offsetof(struct page, pg) + sizeof(struct page))
FOLIO_MATCH(flags, _flags_1);
FOLIO_MATCH(compound_head, _head_1);
- FOLIO_MATCH(compound_dtor, _folio_dtor);
- FOLIO_MATCH(compound_order, _folio_order);
- FOLIO_MATCH(compound_mapcount, _compound_mapcount);
- FOLIO_MATCH(subpages_mapcount, _subpages_mapcount);
- FOLIO_MATCH(compound_pincount, _pincount);
- #ifdef CONFIG_64BIT
- FOLIO_MATCH(compound_nr, _folio_nr_pages);
- #endif
#undef FOLIO_MATCH
#define FOLIO_MATCH(pg, fl) \
static_assert(offsetof(struct folio, fl) == \
offsetof(struct page, pg) + 2 * sizeof(struct page))
FOLIO_MATCH(flags, _flags_2);
FOLIO_MATCH(compound_head, _head_2);
- FOLIO_MATCH(hugetlb_subpool, _hugetlb_subpool);
- FOLIO_MATCH(hugetlb_cgroup, _hugetlb_cgroup);
- FOLIO_MATCH(hugetlb_cgroup_rsvd, _hugetlb_cgroup_rsvd);
- FOLIO_MATCH(hugetlb_hwpoison, _hugetlb_hwpoison);
#undef FOLIO_MATCH
- static inline atomic_t *folio_mapcount_ptr(struct folio *folio)
- {
- struct page *tail = &folio->page + 1;
- return &tail->compound_mapcount;
- }
-
- static inline atomic_t *folio_subpages_mapcount_ptr(struct folio *folio)
- {
- struct page *tail = &folio->page + 1;
- return &tail->subpages_mapcount;
- }
-
- static inline atomic_t *compound_mapcount_ptr(struct page *page)
- {
- return &page[1].compound_mapcount;
- }
-
- static inline atomic_t *subpages_mapcount_ptr(struct page *page)
- {
- return &page[1].subpages_mapcount;
- }
-
- static inline atomic_t *compound_pincount_ptr(struct page *page)
- {
- return &page[1].compound_pincount;
- }
-
/*
* Used for sizing the vmemmap region on some architectures
*/
* See vmf_insert_mixed_prot() for discussion.
*/
pgprot_t vm_page_prot;
- unsigned long vm_flags; /* Flags, see mm.h. */
+
+ /*
+ * Flags, see mm.h.
+ * To modify use vm_flags_{init|reset|set|clear|mod} functions.
+ */
+ union {
+ const vm_flags_t vm_flags;
+ vm_flags_t __private __vm_flags;
+ };
/*
* For areas with an address space and backing store,
* &struct mm_struct is freed.
*/
atomic_t mm_count;
-
+#ifdef CONFIG_SCHED_MM_CID
+ /**
+ * @cid_lock: Protect cid bitmap updates vs lookups.
+ *
+ * Prevent situations where updates to the cid bitmap happen
+ * concurrently with lookups. Those can lead to situations
+ * where a lookup cannot find a free bit simply because it was
+ * unlucky enough to load, non-atomically, bitmap words as they
+ * were being concurrently updated by the updaters.
+ */
+ raw_spinlock_t cid_lock;
+#endif
#ifdef CONFIG_MMU
- atomic_long_t pgtables_bytes; /* PTE page table pages */
+ atomic_long_t pgtables_bytes; /* size of all page tables */
#endif
int map_count; /* number of VMAs */
static inline void vma_iter_init(struct vma_iterator *vmi,
struct mm_struct *mm, unsigned long addr)
{
- vmi->mas.tree = &mm->mm_mt;
- vmi->mas.index = addr;
- vmi->mas.node = MAS_START;
+ mas_init(&vmi->mas, &mm->mm_mt, addr);
}
+#ifdef CONFIG_SCHED_MM_CID
+/* Accessor for struct mm_struct's cidmask. */
+static inline cpumask_t *mm_cidmask(struct mm_struct *mm)
+{
+ unsigned long cid_bitmap = (unsigned long)mm;
+
+ cid_bitmap += offsetof(struct mm_struct, cpu_bitmap);
+ /* Skip cpu_bitmap */
+ cid_bitmap += cpumask_size();
+ return (struct cpumask *)cid_bitmap;
+}
+
+static inline void mm_init_cid(struct mm_struct *mm)
+{
+ raw_spin_lock_init(&mm->cid_lock);
+ cpumask_clear(mm_cidmask(mm));
+}
+
+static inline unsigned int mm_cid_size(void)
+{
+ return cpumask_size();
+}
+#else /* CONFIG_SCHED_MM_CID */
+static inline void mm_init_cid(struct mm_struct *mm) { }
+static inline unsigned int mm_cid_size(void)
+{
+ return 0;
+}
+#endif /* CONFIG_SCHED_MM_CID */
+
struct mmu_gather;
extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
typedef unsigned int __bitwise zap_flags_t;
+ /*
+ * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
+ * other. Here is what they mean, and how to use them:
+ *
+ *
+ * FIXME: For pages which are part of a filesystem, mappings are subject to the
+ * lifetime enforced by the filesystem and we need guarantees that longterm
+ * users like RDMA and V4L2 only establish mappings which coordinate usage with
+ * the filesystem. Ideas for this coordination include revoking the longterm
+ * pin, delaying writeback, bounce buffer page writeback, etc. As FS DAX was
+ * added after the problem with filesystems was found FS DAX VMAs are
+ * specifically failed. Filesystem pages are still subject to bugs and use of
+ * FOLL_LONGTERM should be avoided on those pages.
+ *
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region. And so, CMA attempts to migrate the page before pinning, when
+ * FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity is
+ * potentially changing the pages' data. In contrast to FOLL_GET (whose pages
+ * are released via put_page()), FOLL_PIN pages must be released, ultimately, by
+ * a call to unpin_user_page().
+ *
+ * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different
+ * and separate refcounting mechanisms, however, and that means that each has
+ * its own acquire and release mechanisms:
+ *
+ * FOLL_GET: get_user_pages*() to acquire, and put_page() to release.
+ *
+ * FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release.
+ *
+ * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call.
+ * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based
+ * calls applied to them, and that's perfectly OK. This is a constraint on the
+ * callers, not on the pages.)
+ *
+ * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never
+ * directly by the caller. That's in order to help avoid mismatches when
+ * releasing pages: get_user_pages*() pages must be released via put_page(),
+ * while pin_user_pages*() pages must be released via unpin_user_page().
+ *
+ * Please see Documentation/core-api/pin_user_pages.rst for more information.
+ */
+
+ enum {
+ /* check pte is writable */
+ FOLL_WRITE = 1 << 0,
+ /* do get_page on page */
+ FOLL_GET = 1 << 1,
+ /* give error on hole if it would be zero */
+ FOLL_DUMP = 1 << 2,
+ /* get_user_pages read/write w/o permission */
+ FOLL_FORCE = 1 << 3,
+ /*
+ * if a disk transfer is needed, start the IO and return without waiting
+ * upon it
+ */
+ FOLL_NOWAIT = 1 << 4,
+ /* do not fault in pages */
+ FOLL_NOFAULT = 1 << 5,
+ /* check page is hwpoisoned */
+ FOLL_HWPOISON = 1 << 6,
+ /* don't do file mappings */
+ FOLL_ANON = 1 << 7,
+ /*
+ * FOLL_LONGTERM indicates that the page will be held for an indefinite
+ * time period _often_ under userspace control. This is in contrast to
+ * iov_iter_get_pages(), whose usages are transient.
+ */
+ FOLL_LONGTERM = 1 << 8,
+ /* split huge pmd before returning */
+ FOLL_SPLIT_PMD = 1 << 9,
+ /* allow returning PCI P2PDMA pages */
+ FOLL_PCI_P2PDMA = 1 << 10,
+ /* allow interrupts from generic signals */
+ FOLL_INTERRUPTIBLE = 1 << 11,
+
+ /* See also internal only FOLL flags in mm/internal.h */
+ };
+
#endif /* _LINUX_MM_TYPES_H */
return __filemap_get_folio(mapping, index, FGP_LOCK, 0);
}
+ /**
+ * filemap_grab_folio - grab a folio from the page cache
+ * @mapping: The address space to search
+ * @index: The page index
+ *
+ * Looks up the page cache entry at @mapping & @index. If no folio is found,
+ * a new folio is created. The folio is locked, marked as accessed, and
+ * returned.
+ *
+ * Return: A found or created folio. NULL if no folio is found and failed to
+ * create a folio.
+ */
+ static inline struct folio *filemap_grab_folio(struct address_space *mapping,
+ pgoff_t index)
+ {
+ return __filemap_get_folio(mapping, index,
+ FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+ mapping_gfp_mask(mapping));
+ }
+
/**
* find_get_page - find and get a page reference
* @mapping: the address_space to search
pgoff_t end, struct folio_batch *fbatch);
unsigned filemap_get_folios_contig(struct address_space *mapping,
pgoff_t *start, pgoff_t end, struct folio_batch *fbatch);
- unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
- pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
- struct page **pages);
- static inline unsigned find_get_pages_tag(struct address_space *mapping,
- pgoff_t *index, xa_mark_t tag, unsigned int nr_pages,
- struct page **pages)
- {
- return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag,
- nr_pages, pages);
- }
+ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
+ pgoff_t end, xa_mark_t tag, struct folio_batch *fbatch);
struct page *grab_cache_page_write_begin(struct address_space *mapping,
pgoff_t index);
struct folio *read_cache_folio(struct address_space *, pgoff_t index,
filler_t *filler, struct file *file);
+ struct folio *mapping_read_folio_gfp(struct address_space *, pgoff_t index,
+ gfp_t flags);
struct page *read_cache_page(struct address_space *, pgoff_t index,
filler_t *filler, struct file *file);
extern struct page * read_cache_page_gfp(struct address_space *mapping,
return 0;
}
-/*
- * lock_page_killable is like lock_page but can be interrupted by fatal
- * signals. It returns 0 if it locked the page and -EINTR if it was
- * killed while waiting.
- */
-static inline int lock_page_killable(struct page *page)
-{
- return folio_lock_killable(page_folio(page));
-}
-
/*
* folio_lock_or_retry - Lock the folio, unless this would block and the
* caller indicated that it can handle a retry.
#ifdef CONFIG_BOOT_CONFIG
/* Is bootconfig on command line? */
-static bool bootconfig_found;
+static bool bootconfig_found = IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE);
static size_t initargs_offs;
#else
# define bootconfig_found false
pgtable_init();
debug_objects_mem_init();
vmalloc_init();
- /* Should be run after vmap initialization */
- if (early_page_ext_enabled())
+ /* If no deferred init page_ext now, as vmap is fully initialized */
+ if (!deferred_struct_pages)
page_ext_init();
/* Should be run before the first non-init thread is created */
init_espfix_bsp();
padata_init();
page_alloc_init_late();
/* Initialize page ext after all struct pages are initialized. */
- if (!early_page_ext_enabled())
+ if (deferred_struct_pages)
page_ext_init();
do_basic_setup();
static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
static __cold void io_fallback_tw(struct io_uring_task *tctx);
-static struct kmem_cache *req_cachep;
+struct kmem_cache *req_cachep;
struct sock *io_uring_get_socket(struct file *file)
{
static inline void io_req_add_to_cache(struct io_kiocb *req, struct io_ring_ctx *ctx)
{
wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
+ kasan_poison_object_data(req_cachep, req);
}
static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
fallback_work.work);
struct llist_node *node = llist_del_all(&ctx->fallback_llist);
struct io_kiocb *req, *tmp;
- bool locked = false;
+ bool locked = true;
- percpu_ref_get(&ctx->refs);
+ mutex_lock(&ctx->uring_lock);
llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
req->io_task_work.func(req, &locked);
-
- if (locked) {
- io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
- }
- percpu_ref_put(&ctx->refs);
+ if (WARN_ON_ONCE(!locked))
+ return;
+ io_submit_flush_completions(ctx);
+ mutex_unlock(&ctx->uring_lock);
}
static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
mutex_init(&ctx->uring_lock);
init_waitqueue_head(&ctx->cq_wait);
+ init_waitqueue_head(&ctx->poll_wq);
spin_lock_init(&ctx->completion_lock);
spin_lock_init(&ctx->timeout_lock);
INIT_WQ_LIST(&ctx->iopoll_list);
static void io_prep_async_work(struct io_kiocb *req)
{
- const struct io_op_def *def = &io_op_defs[req->opcode];
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
struct io_ring_ctx *ctx = req->ctx;
if (!(req->flags & REQ_F_CREDS)) {
void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
{
+ if (ctx->poll_activated)
+ io_poll_wq_wake(ctx);
if (ctx->off_timeout_used)
io_flush_timeouts(ctx);
if (ctx->drain_active) {
io_cqring_wake(ctx);
}
+static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
+ __releases(ctx->completion_lock)
+{
+ io_commit_cqring(ctx);
+ __io_cq_unlock(ctx);
+ io_commit_cqring_flush(ctx);
+
+ /*
+ * As ->task_complete implies that the ring is single tasked, cq_wait
+ * may only be waited on by the current in io_cqring_wait(), but since
+ * it will re-check the wakeup conditions once we return we can safely
+ * skip waking it up.
+ */
+ if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
+ smp_mb();
+ __io_cqring_wake(ctx);
+ }
+}
+
void io_cq_unlock_post(struct io_ring_ctx *ctx)
__releases(ctx->completion_lock)
{
}
}
-/* Returns true if there are no backlogged entries after the flush */
static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx)
{
size_t cqe_size = sizeof(struct io_uring_cqe);
io_cqring_do_overflow_flush(ctx);
}
-void __io_put_task(struct task_struct *task, int nr)
+/* can be called by any task */
+static void io_put_task_remote(struct task_struct *task, int nr)
{
struct io_uring_task *tctx = task->io_uring;
put_task_struct_many(task, nr);
}
+/* used by a task to put its own references */
+static void io_put_task_local(struct task_struct *task, int nr)
+{
+ task->io_uring->cached_refs += nr;
+}
+
+/* must to be called somewhat shortly after putting a request */
+static inline void io_put_task(struct task_struct *task, int nr)
+{
+ if (likely(task == current))
+ io_put_task_local(task, nr);
+ else
+ io_put_task_remote(task, nr);
+}
+
void io_task_refs_refill(struct io_uring_task *tctx)
{
unsigned int refill = -tctx->cached_refs + IO_TCTX_REFS_CACHE_NR;
req->link = NULL;
}
}
+ io_put_kbuf_comp(req);
+ io_dismantle_req(req);
io_req_put_rsrc(req);
/*
* Selected buffer deallocation in io_clean_op() assumes that
* we don't hold ->completion_lock. Clean them here to avoid
* deadlocks.
*/
- io_put_kbuf_comp(req);
- io_dismantle_req(req);
- io_put_task(req->task, 1);
+ io_put_task_remote(req->task, 1);
wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
ctx->locked_free_nr++;
}
void io_req_defer_failed(struct io_kiocb *req, s32 res)
__must_hold(&ctx->uring_lock)
{
- const struct io_op_def *def = &io_op_defs[req->opcode];
+ const struct io_cold_def *def = &io_cold_defs[req->opcode];
lockdep_assert_held(&req->ctx->uring_lock);
io_req_put_rsrc(req);
io_dismantle_req(req);
- io_put_task(req->task, 1);
+ io_put_task_remote(req->task, 1);
spin_lock(&ctx->completion_lock);
wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
{
unsigned int count = 0;
- while (node != last) {
+ while (node && node != last) {
struct llist_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
io_task_work.node);
/* if not contended, grab and improve batching */
*locked = mutex_trylock(&(*ctx)->uring_lock);
percpu_ref_get(&(*ctx)->refs);
- }
+ } else if (!*locked)
+ *locked = mutex_trylock(&(*ctx)->uring_lock);
req->io_task_work.func(req, locked);
node = next;
count++;
+ if (unlikely(need_resched())) {
+ ctx_flush_and_put(*ctx, locked);
+ *ctx = NULL;
+ cond_resched();
+ }
}
return count;
task_work);
struct llist_node fake = {};
struct llist_node *node;
- unsigned int loops = 1;
- unsigned int count;
+ unsigned int loops = 0;
+ unsigned int count = 0;
if (unlikely(current->flags & PF_EXITING)) {
io_fallback_tw(tctx);
return;
}
- node = io_llist_xchg(&tctx->task_list, &fake);
- count = handle_tw_list(node, &ctx, &uring_locked, NULL);
- node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
- while (node != &fake) {
+ do {
loops++;
node = io_llist_xchg(&tctx->task_list, &fake);
count += handle_tw_list(node, &ctx, &uring_locked, &fake);
+
+ /* skip expensive cmpxchg if there are items in the list */
+ if (READ_ONCE(tctx->task_list.first) != &fake)
+ continue;
+ if (uring_locked && !wq_list_empty(&ctx->submit_state.compl_reqs)) {
+ io_submit_flush_completions(ctx);
+ if (READ_ONCE(tctx->task_list.first) != &fake)
+ continue;
+ }
node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
- }
+ } while (node != &fake);
ctx_flush_and_put(ctx, &uring_locked);
percpu_ref_put(&ctx->refs);
return;
}
- /* need it for the following io_cqring_wake() */
+ /* needed for the following wake up */
smp_mb__after_atomic();
if (unlikely(atomic_read(&req->task->io_uring->in_idle))) {
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
-
if (ctx->has_evfd)
io_eventfd_signal(ctx);
- __io_cqring_wake(ctx);
+
+ if (READ_ONCE(ctx->cq_waiting))
+ wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
percpu_ref_put(&ctx->refs);
}
}
}
-int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
+static int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
{
struct llist_node *node;
- struct llist_node fake;
- struct llist_node *current_final = NULL;
- int ret;
- unsigned int loops = 1;
+ unsigned int loops = 0;
+ int ret = 0;
- if (unlikely(ctx->submitter_task != current))
+ if (WARN_ON_ONCE(ctx->submitter_task != current))
return -EEXIST;
-
- node = io_llist_xchg(&ctx->work_llist, &fake);
- ret = 0;
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
again:
- while (node != current_final) {
+ node = io_llist_xchg(&ctx->work_llist, NULL);
+ while (node) {
struct llist_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
io_task_work.node);
ret++;
node = next;
}
+ loops++;
- if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
- atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
-
- node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL);
- if (node != &fake) {
- loops++;
- current_final = &fake;
- node = io_llist_xchg(&ctx->work_llist, &fake);
+ if (!llist_empty(&ctx->work_llist))
goto again;
- }
-
- if (*locked)
+ if (*locked) {
io_submit_flush_completions(ctx);
+ if (!llist_empty(&ctx->work_llist))
+ goto again;
+ }
trace_io_uring_local_work_run(ctx, ret, loops);
return ret;
-
}
-int io_run_local_work(struct io_ring_ctx *ctx)
+static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
{
bool locked;
int ret;
if (llist_empty(&ctx->work_llist))
return 0;
- __set_current_state(TASK_RUNNING);
- locked = mutex_trylock(&ctx->uring_lock);
+ locked = true;
+ ret = __io_run_local_work(ctx, &locked);
+ /* shouldn't happen! */
+ if (WARN_ON_ONCE(!locked))
+ mutex_lock(&ctx->uring_lock);
+ return ret;
+}
+
+static int io_run_local_work(struct io_ring_ctx *ctx)
+{
+ bool locked = mutex_trylock(&ctx->uring_lock);
+ int ret;
+
ret = __io_run_local_work(ctx, &locked);
if (locked)
mutex_unlock(&ctx->uring_lock);
{
io_tw_lock(req->ctx, locked);
/* req->task == current here, checking PF_EXITING is safe */
- if (likely(!(req->task->flags & PF_EXITING)))
- io_queue_sqe(req);
- else
+ if (unlikely(req->task->flags & PF_EXITING))
io_req_defer_failed(req, -EFAULT);
+ else if (req->flags & REQ_F_FORCE_ASYNC)
+ io_queue_iowq(req, locked);
+ else
+ io_queue_sqe(req);
}
void io_req_task_queue_fail(struct io_kiocb *req, int ret)
}
}
}
- __io_cq_unlock_post(ctx);
+ __io_cq_unlock_post_flush(ctx);
if (!wq_list_empty(&ctx->submit_state.compl_reqs)) {
io_free_batch_list(ctx, state->compl_reqs.first);
bool io_alloc_async_data(struct io_kiocb *req)
{
- WARN_ON_ONCE(!io_op_defs[req->opcode].async_size);
- req->async_data = kmalloc(io_op_defs[req->opcode].async_size, GFP_KERNEL);
+ WARN_ON_ONCE(!io_cold_defs[req->opcode].async_size);
+ req->async_data = kmalloc(io_cold_defs[req->opcode].async_size, GFP_KERNEL);
if (req->async_data) {
req->flags |= REQ_F_ASYNC_DATA;
return false;
int io_req_prep_async(struct io_kiocb *req)
{
- const struct io_op_def *def = &io_op_defs[req->opcode];
+ const struct io_cold_def *cdef = &io_cold_defs[req->opcode];
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
/* assign early for deferred execution for non-fixed file */
if (def->needs_file && !(req->flags & REQ_F_FIXED_FILE))
req->file = io_file_get_normal(req, req->cqe.fd);
- if (!def->prep_async)
+ if (!cdef->prep_async)
return 0;
if (WARN_ON_ONCE(req_has_async_data(req)))
return -EFAULT;
- if (!io_op_defs[req->opcode].manual_alloc) {
+ if (!def->manual_alloc) {
if (io_alloc_async_data(req))
return -EAGAIN;
}
- return def->prep_async(req);
+ return cdef->prep_async(req);
}
static u32 io_get_sequence(struct io_kiocb *req)
}
spin_unlock(&ctx->completion_lock);
- ret = io_req_prep_async(req);
- if (ret) {
-fail:
- io_req_defer_failed(req, ret);
- return;
- }
io_prep_async_link(req);
de = kmalloc(sizeof(*de), GFP_KERNEL);
if (!de) {
ret = -ENOMEM;
- goto fail;
+ io_req_defer_failed(req, ret);
+ return;
}
spin_lock(&ctx->completion_lock);
}
if (req->flags & REQ_F_NEED_CLEANUP) {
- const struct io_op_def *def = &io_op_defs[req->opcode];
+ const struct io_cold_def *def = &io_cold_defs[req->opcode];
if (def->cleanup)
def->cleanup(req);
req->flags &= ~IO_REQ_CLEAN_FLAGS;
}
-static bool io_assign_file(struct io_kiocb *req, unsigned int issue_flags)
+static bool io_assign_file(struct io_kiocb *req, const struct io_issue_def *def,
+ unsigned int issue_flags)
{
- if (req->file || !io_op_defs[req->opcode].needs_file)
+ if (req->file || !def->needs_file)
return true;
if (req->flags & REQ_F_FIXED_FILE)
static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
{
- const struct io_op_def *def = &io_op_defs[req->opcode];
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
const struct cred *creds = NULL;
int ret;
- if (unlikely(!io_assign_file(req, issue_flags)))
+ if (unlikely(!io_assign_file(req, def, issue_flags)))
return -EBADF;
if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
void io_wq_submit_work(struct io_wq_work *work)
{
struct io_kiocb *req = container_of(work, struct io_kiocb, work);
- const struct io_op_def *def = &io_op_defs[req->opcode];
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
unsigned int issue_flags = IO_URING_F_UNLOCKED | IO_URING_F_IOWQ;
bool needs_poll = false;
int ret = 0, err = -ECANCELED;
io_req_task_queue_fail(req, err);
return;
}
- if (!io_assign_file(req, issue_flags)) {
+ if (!io_assign_file(req, def, issue_flags)) {
err = -EBADF;
work->flags |= IO_WQ_WORK_CANCEL;
goto fail;
req->flags &= ~REQ_F_HARDLINK;
req->flags |= REQ_F_LINK;
io_req_defer_failed(req, req->cqe.res);
- } else if (unlikely(req->ctx->drain_active)) {
- io_drain_req(req);
} else {
int ret = io_req_prep_async(req);
- if (unlikely(ret))
+ if (unlikely(ret)) {
io_req_defer_failed(req, ret);
+ return;
+ }
+
+ if (unlikely(req->ctx->drain_active))
+ io_drain_req(req);
else
io_queue_iowq(req, NULL);
}
const struct io_uring_sqe *sqe)
__must_hold(&ctx->uring_lock)
{
- const struct io_op_def *def;
+ const struct io_issue_def *def;
unsigned int sqe_flags;
int personality;
u8 opcode;
req->opcode = 0;
return -EINVAL;
}
- def = &io_op_defs[opcode];
+ def = &io_issue_defs[opcode];
if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) {
/* enforce forwards compatibility on users */
if (sqe_flags & ~SQE_VALID_FLAGS)
* used, it's important that those reads are done through READ_ONCE() to
* prevent a re-load down the line.
*/
-static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
+static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
{
unsigned head, mask = ctx->sq_entries - 1;
unsigned sq_idx = ctx->cached_sq_head++ & mask;
/* double index for 128-byte SQEs, twice as long */
if (ctx->flags & IORING_SETUP_SQE128)
head <<= 1;
- return &ctx->sq_sqes[head];
+ *sqe = &ctx->sq_sqes[head];
+ return true;
}
/* drop invalid entries */
ctx->cq_extra--;
WRITE_ONCE(ctx->rings->sq_dropped,
READ_ONCE(ctx->rings->sq_dropped) + 1);
- return NULL;
+ return false;
}
int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
const struct io_uring_sqe *sqe;
struct io_kiocb *req;
- if (unlikely(!io_alloc_req_refill(ctx)))
+ if (unlikely(!io_alloc_req(ctx, &req)))
break;
- req = io_alloc_req(ctx);
- sqe = io_get_sqe(ctx);
- if (unlikely(!sqe)) {
+ if (unlikely(!io_get_sqe(ctx, &sqe))) {
io_req_add_to_cache(req, ctx);
break;
}
struct io_ring_ctx *ctx;
unsigned cq_tail;
unsigned nr_timeouts;
+ ktime_t timeout;
};
static inline bool io_has_work(struct io_ring_ctx *ctx)
{
return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
- ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
- !llist_empty(&ctx->work_llist));
+ !llist_empty(&ctx->work_llist);
}
static inline bool io_should_wake(struct io_wait_queue *iowq)
static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
int wake_flags, void *key)
{
- struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
- wq);
- struct io_ring_ctx *ctx = iowq->ctx;
+ struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue, wq);
/*
* Cannot safely flush overflowed CQEs from here, ensure we wake up
* the task, and the next invocation will do it.
*/
- if (io_should_wake(iowq) || io_has_work(ctx))
+ if (io_should_wake(iowq) || io_has_work(iowq->ctx))
return autoremove_wake_function(curr, mode, wake_flags, key);
return -1;
}
int io_run_task_work_sig(struct io_ring_ctx *ctx)
{
- if (io_run_task_work_ctx(ctx) > 0)
+ if (!llist_empty(&ctx->work_llist)) {
+ __set_current_state(TASK_RUNNING);
+ if (io_run_local_work(ctx) > 0)
+ return 1;
+ }
+ if (io_run_task_work() > 0)
return 1;
if (task_sigpending(current))
return -EINTR;
/* when returns >0, the caller should retry */
static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
- struct io_wait_queue *iowq,
- ktime_t *timeout)
+ struct io_wait_queue *iowq)
{
- int ret;
- unsigned long check_cq;
-
- /* make sure we run task_work before checking for signals */
- ret = io_run_task_work_sig(ctx);
- if (ret || io_should_wake(iowq))
- return ret;
-
- check_cq = READ_ONCE(ctx->check_cq);
- if (unlikely(check_cq)) {
- /* let the caller flush overflows, retry */
- if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
- return 1;
- if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
- return -EBADR;
- }
- if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
+ if (unlikely(READ_ONCE(ctx->check_cq)))
+ return 1;
+ if (unlikely(!llist_empty(&ctx->work_llist)))
+ return 1;
+ if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
+ return 1;
+ if (unlikely(task_sigpending(current)))
+ return -EINTR;
+ if (unlikely(io_should_wake(iowq)))
+ return 0;
+ if (iowq->timeout == KTIME_MAX)
+ schedule();
+ else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
return -ETIME;
-
- /*
- * Run task_work after scheduling. If we got woken because of
- * task_work being processed, run it now rather than let the caller
- * do another wait loop.
- */
- ret = io_run_task_work_sig(ctx);
- return ret < 0 ? ret : 1;
+ return 0;
}
/*
{
struct io_wait_queue iowq;
struct io_rings *rings = ctx->rings;
- ktime_t timeout = KTIME_MAX;
int ret;
if (!io_allowed_run_tw(ctx))
return -EEXIST;
-
- do {
- /* always run at least 1 task work to process local work */
- ret = io_run_task_work_ctx(ctx);
- if (ret < 0)
- return ret;
- io_cqring_overflow_flush(ctx);
-
- /* if user messes with these they will just get an early return */
- if (__io_cqring_events_user(ctx) >= min_events)
- return 0;
- } while (ret > 0);
+ if (!llist_empty(&ctx->work_llist))
+ io_run_local_work(ctx);
+ io_run_task_work();
+ io_cqring_overflow_flush(ctx);
+ /* if user messes with these they will just get an early return */
+ if (__io_cqring_events_user(ctx) >= min_events)
+ return 0;
if (sig) {
#ifdef CONFIG_COMPAT
return ret;
}
- if (uts) {
- struct timespec64 ts;
-
- if (get_timespec64(&ts, uts))
- return -EFAULT;
- timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
- }
-
init_waitqueue_func_entry(&iowq.wq, io_wake_function);
iowq.wq.private = current;
INIT_LIST_HEAD(&iowq.wq.entry);
iowq.ctx = ctx;
iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
+ iowq.timeout = KTIME_MAX;
+
+ if (uts) {
+ struct timespec64 ts;
+
+ if (get_timespec64(&ts, uts))
+ return -EFAULT;
+ iowq.timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+ }
trace_io_uring_cqring_wait(ctx, min_events);
do {
- if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
- finish_wait(&ctx->cq_wait, &iowq.wq);
- io_cqring_do_overflow_flush(ctx);
+ unsigned long check_cq;
+
+ if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
+ WRITE_ONCE(ctx->cq_waiting, 1);
+ set_current_state(TASK_INTERRUPTIBLE);
+ } else {
+ prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
+ TASK_INTERRUPTIBLE);
+ }
+
+ ret = io_cqring_wait_schedule(ctx, &iowq);
+ __set_current_state(TASK_RUNNING);
+ WRITE_ONCE(ctx->cq_waiting, 0);
+
+ if (ret < 0)
+ break;
+ /*
+ * Run task_work after scheduling and before io_should_wake().
+ * If we got woken because of task_work being processed, run it
+ * now rather than let the caller do another wait loop.
+ */
+ io_run_task_work();
+ if (!llist_empty(&ctx->work_llist))
+ io_run_local_work(ctx);
+
+ check_cq = READ_ONCE(ctx->check_cq);
+ if (unlikely(check_cq)) {
+ /* let the caller flush overflows, retry */
+ if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
+ io_cqring_do_overflow_flush(ctx);
+ if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)) {
+ ret = -EBADR;
+ break;
+ }
}
- prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
- TASK_INTERRUPTIBLE);
- ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
- if (__io_cqring_events_user(ctx) >= min_events)
+
+ if (io_should_wake(&iowq)) {
+ ret = 0;
break;
+ }
cond_resched();
- } while (ret > 0);
+ } while (1);
- finish_wait(&ctx->cq_wait, &iowq.wq);
+ if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+ finish_wait(&ctx->cq_wait, &iowq.wq);
restore_saved_sigmask_unless(ret == -EINTR);
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
static void io_req_caches_free(struct io_ring_ctx *ctx)
{
+ struct io_kiocb *req;
int nr = 0;
mutex_lock(&ctx->uring_lock);
io_flush_cached_locked_reqs(ctx, &ctx->submit_state);
while (!io_req_cache_empty(ctx)) {
- struct io_kiocb *req = io_alloc_req(ctx);
-
+ req = io_extract_req(ctx);
kmem_cache_free(req_cachep, req);
nr++;
}
kfree(ctx);
}
+static __cold void io_activate_pollwq_cb(struct callback_head *cb)
+{
+ struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
+ poll_wq_task_work);
+
+ mutex_lock(&ctx->uring_lock);
+ ctx->poll_activated = true;
+ mutex_unlock(&ctx->uring_lock);
+
+ /*
+ * Wake ups for some events between start of polling and activation
+ * might've been lost due to loose synchronisation.
+ */
+ wake_up_all(&ctx->poll_wq);
+ percpu_ref_put(&ctx->refs);
+}
+
+static __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
+{
+ spin_lock(&ctx->completion_lock);
+ /* already activated or in progress */
+ if (ctx->poll_activated || ctx->poll_wq_task_work.func)
+ goto out;
+ if (WARN_ON_ONCE(!ctx->task_complete))
+ goto out;
+ if (!ctx->submitter_task)
+ goto out;
+ /*
+ * with ->submitter_task only the submitter task completes requests, we
+ * only need to sync with it, which is done by injecting a tw
+ */
+ init_task_work(&ctx->poll_wq_task_work, io_activate_pollwq_cb);
+ percpu_ref_get(&ctx->refs);
+ if (task_work_add(ctx->submitter_task, &ctx->poll_wq_task_work, TWA_SIGNAL))
+ percpu_ref_put(&ctx->refs);
+out:
+ spin_unlock(&ctx->completion_lock);
+}
+
static __poll_t io_uring_poll(struct file *file, poll_table *wait)
{
struct io_ring_ctx *ctx = file->private_data;
__poll_t mask = 0;
- poll_wait(file, &ctx->cq_wait, wait);
+ if (unlikely(!ctx->poll_activated))
+ io_activate_pollwq(ctx);
+
+ poll_wait(file, &ctx->poll_wq, wait);
/*
* synchronizes with barrier from wq_has_sleeper call in
* io_commit_cqring
* pushes them to do the flush.
*/
- if (io_cqring_events(ctx) || io_has_work(ctx))
+ if (__io_cqring_events_user(ctx) || io_has_work(ctx))
mask |= EPOLLIN | EPOLLRDNORM;
return mask;
while (!wq_list_empty(&ctx->iopoll_list)) {
io_iopoll_try_reap_events(ctx);
ret = true;
+ cond_resched();
}
}
- if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+ if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
+ io_allowed_defer_tw_run(ctx))
ret |= io_run_local_work(ctx) > 0;
ret |= io_cancel_defer_files(ctx, task, cancel_all);
mutex_lock(&ctx->uring_lock);
static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
{
- return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -EINVAL;
+ return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
}
static unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
}
if (flags & IORING_ENTER_SQ_WAKEUP)
wake_up(&ctx->sq_data->wait);
- if (flags & IORING_ENTER_SQ_WAIT) {
- ret = io_sqpoll_wait_sq(ctx);
- if (ret)
- goto out;
- }
+ if (flags & IORING_ENTER_SQ_WAIT)
+ io_sqpoll_wait_sq(ctx);
+
ret = to_submit;
} else if (to_submit) {
ret = io_uring_add_tctx_node(ctx);
!(ctx->flags & IORING_SETUP_SQPOLL))
ctx->task_complete = true;
+ /*
+ * lazy poll_wq activation relies on ->task_complete for synchronisation
+ * purposes, see io_activate_pollwq()
+ */
+ if (!ctx->task_complete)
+ ctx->poll_activated = true;
+
/*
* When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
* space applications don't need to do io completion events
IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
- IORING_FEAT_LINKED_FILE;
+ IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
if (copy_to_user(params, p, sizeof(*p))) {
ret = -EFAULT;
if (ctx->flags & IORING_SETUP_SINGLE_ISSUER
&& !(ctx->flags & IORING_SETUP_R_DISABLED))
- ctx->submitter_task = get_task_struct(current);
+ WRITE_ONCE(ctx->submitter_task, get_task_struct(current));
file = io_uring_get_file(ctx);
if (IS_ERR(file)) {
for (i = 0; i < nr_args; i++) {
p->ops[i].op = i;
- if (!io_op_defs[i].not_supported)
+ if (!io_issue_defs[i].not_supported)
p->ops[i].flags = IO_URING_OP_SUPPORTED;
}
p->ops_len = i;
if (!(ctx->flags & IORING_SETUP_R_DISABLED))
return -EBADFD;
- if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && !ctx->submitter_task)
- ctx->submitter_task = get_task_struct(current);
+ if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && !ctx->submitter_task) {
+ WRITE_ONCE(ctx->submitter_task, get_task_struct(current));
+ /*
+ * Lazy activation attempts would fail if it was polled before
+ * submitter_task is set.
+ */
+ if (wq_has_sleeper(&ctx->poll_wq))
+ io_activate_pollwq(ctx);
+ }
if (ctx->restrictions.registered)
ctx->restricted = 1;
struct io_ring_ctx *ctx;
long ret = -EBADF;
struct fd f;
+ bool use_registered_ring;
+
+ use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
+ opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
if (opcode >= IORING_REGISTER_LAST)
return -EINVAL;
- f = fdget(fd);
- if (!f.file)
- return -EBADF;
+ if (use_registered_ring) {
+ /*
+ * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
+ * need only dereference our task private array to find it.
+ */
+ struct io_uring_task *tctx = current->io_uring;
- ret = -EOPNOTSUPP;
- if (!io_is_uring_fops(f.file))
- goto out_fput;
+ if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
+ return -EINVAL;
+ fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
+ f.file = tctx->registered_rings[fd];
+ f.flags = 0;
+ if (unlikely(!f.file))
+ return -EBADF;
+ } else {
+ f = fdget(fd);
+ if (unlikely(!f.file))
+ return -EBADF;
+ ret = -EOPNOTSUPP;
+ if (!io_is_uring_fops(f.file))
+ goto out_fput;
+ }
ctx = f.file->private_data;
int err;
/* Need to create a kthread, thus must support schedule */
- if (bpf_map_is_dev_bound(map)) {
+ if (bpf_map_is_offloaded(map)) {
return bpf_map_offload_update_elem(map, key, value, flags);
} else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
void *ptr;
int err;
- if (bpf_map_is_dev_bound(map))
+ if (bpf_map_is_offloaded(map))
return bpf_map_offload_lookup_elem(map, key, value);
bpf_disable_instrumentation();
* __GFP_RETRY_MAYFAIL to avoid such situations.
*/
- const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
+ gfp_t gfp = bpf_memcg_flags(__GFP_NOWARN | __GFP_ZERO);
unsigned int flags = 0;
unsigned long align = 1;
void *area;
return id > 0 ? 0 : id;
}
-void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
+void bpf_map_free_id(struct bpf_map *map)
{
unsigned long flags;
if (!map->id)
return;
- if (do_idr_lock)
- spin_lock_irqsave(&map_idr_lock, flags);
- else
- __acquire(&map_idr_lock);
+ spin_lock_irqsave(&map_idr_lock, flags);
idr_remove(&map_idr, map->id);
map->id = 0;
- if (do_idr_lock)
- spin_unlock_irqrestore(&map_idr_lock, flags);
- else
- __release(&map_idr_lock);
+ spin_unlock_irqrestore(&map_idr_lock, flags);
}
#ifdef CONFIG_MEMCG_KMEM
* So we have to check map->objcg for being NULL each time it's
* being used.
*/
- map->objcg = get_obj_cgroup_from_current();
+ if (memcg_bpf_enabled())
+ map->objcg = get_obj_cgroup_from_current();
}
static void bpf_map_release_memcg(struct bpf_map *map)
return ptr;
}
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+ gfp_t flags)
+{
+ struct mem_cgroup *memcg, *old_memcg;
+ void *ptr;
+
+ memcg = bpf_map_get_memcg(map);
+ old_memcg = set_active_memcg(memcg);
+ ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
+ set_active_memcg(old_memcg);
+ mem_cgroup_put(memcg);
+
+ return ptr;
+}
+
void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
size_t align, gfp_t flags)
{
return;
for (i = 0; i < rec->cnt; i++) {
switch (rec->fields[i].type) {
- case BPF_SPIN_LOCK:
- case BPF_TIMER:
- break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
if (rec->fields[i].kptr.module)
break;
case BPF_LIST_HEAD:
case BPF_LIST_NODE:
- /* Nothing to release for bpf_list_head */
+ case BPF_RB_ROOT:
+ case BPF_RB_NODE:
+ case BPF_SPIN_LOCK:
+ case BPF_TIMER:
+ /* Nothing to release */
break;
default:
WARN_ON_ONCE(1);
new_rec->cnt = 0;
for (i = 0; i < rec->cnt; i++) {
switch (fields[i].type) {
- case BPF_SPIN_LOCK:
- case BPF_TIMER:
- break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
btf_get(fields[i].kptr.btf);
break;
case BPF_LIST_HEAD:
case BPF_LIST_NODE:
- /* Nothing to acquire for bpf_list_head */
+ case BPF_RB_ROOT:
+ case BPF_RB_NODE:
+ case BPF_SPIN_LOCK:
+ case BPF_TIMER:
+ /* Nothing to acquire */
break;
default:
ret = -EFAULT;
continue;
bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
break;
+ case BPF_RB_ROOT:
+ if (WARN_ON_ONCE(rec->spin_lock_off < 0))
+ continue;
+ bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
+ break;
case BPF_LIST_NODE:
+ case BPF_RB_NODE:
break;
default:
WARN_ON_ONCE(1);
}
/* decrement map refcnt and schedule it for freeing via workqueue
- * (unrelying map implementation ops->map_free() might sleep)
+ * (underlying map implementation ops->map_free() might sleep)
*/
-static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
+void bpf_map_put(struct bpf_map *map)
{
if (atomic64_dec_and_test(&map->refcnt)) {
/* bpf_map_free_id() must be called first */
- bpf_map_free_id(map, do_idr_lock);
+ bpf_map_free_id(map);
btf_put(map->btf);
INIT_WORK(&map->work, bpf_map_free_deferred);
/* Avoid spawning kworkers, since they all might contend
queue_work(system_unbound_wq, &map->work);
}
}
-
-void bpf_map_put(struct bpf_map *map)
-{
- __bpf_map_put(map, true);
-}
EXPORT_SYMBOL_GPL(bpf_map_put);
void bpf_map_put_with_uref(struct bpf_map *map)
/* set default open/close callbacks */
vma->vm_ops = &bpf_map_default_vmops;
vma->vm_private_data = map;
- vma->vm_flags &= ~VM_MAYEXEC;
+ vm_flags_clear(vma, VM_MAYEXEC);
if (!(vma->vm_flags & VM_WRITE))
/* disallow re-mapping with PROT_WRITE */
- vma->vm_flags &= ~VM_MAYWRITE;
+ vm_flags_clear(vma, VM_MAYWRITE);
err = map->ops->map_mmap(map, vma);
if (err)
return -EINVAL;
map->record = btf_parse_fields(btf, value_type,
- BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD,
+ BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
+ BPF_RB_ROOT,
map->value_size);
if (!IS_ERR_OR_NULL(map->record)) {
int i;
}
break;
case BPF_LIST_HEAD:
+ case BPF_RB_ROOT:
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
goto err_put;
}
- if (bpf_map_is_dev_bound(map)) {
+ if (bpf_map_is_offloaded(map)) {
err = bpf_map_offload_delete_elem(map, key);
goto out;
} else if (IS_FD_PROG_ARRAY(map) ||
if (!next_key)
goto free_key;
- if (bpf_map_is_dev_bound(map)) {
+ if (bpf_map_is_offloaded(map)) {
err = bpf_map_offload_get_next_key(map, key, next_key);
goto out;
}
map->key_size))
break;
- if (bpf_map_is_dev_bound(map)) {
+ if (bpf_map_is_offloaded(map)) {
err = bpf_map_offload_delete_elem(map, key);
break;
}
map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
- if (!bpf_map_is_dev_bound(map)) {
+ if (!bpf_map_is_offloaded(map)) {
bpf_disable_instrumentation();
rcu_read_lock();
err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags);
if (!ops)
return -EINVAL;
- if (!bpf_prog_is_dev_bound(prog->aux))
+ if (!bpf_prog_is_offloaded(prog->aux))
prog->aux->ops = ops;
else
prog->aux->ops = &bpf_offload_prog_ops;
return;
if (audit_enabled == AUDIT_OFF)
return;
- if (op == BPF_AUDIT_LOAD)
+ if (!in_irq() && !irqs_disabled())
ctx = audit_context();
ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_BPF);
if (unlikely(!ab))
return id > 0 ? 0 : id;
}
-void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
+void bpf_prog_free_id(struct bpf_prog *prog)
{
unsigned long flags;
if (!prog->aux->id)
return;
- if (do_idr_lock)
- spin_lock_irqsave(&prog_idr_lock, flags);
- else
- __acquire(&prog_idr_lock);
-
+ spin_lock_irqsave(&prog_idr_lock, flags);
idr_remove(&prog_idr, prog->aux->id);
prog->aux->id = 0;
-
- if (do_idr_lock)
- spin_unlock_irqrestore(&prog_idr_lock, flags);
- else
- __release(&prog_idr_lock);
+ spin_unlock_irqrestore(&prog_idr_lock, flags);
}
static void __bpf_prog_put_rcu(struct rcu_head *rcu)
prog = aux->prog;
perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_UNLOAD, 0);
bpf_audit_prog(prog, BPF_AUDIT_UNLOAD);
+ bpf_prog_free_id(prog);
__bpf_prog_put_noref(prog, true);
}
-static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
+static void __bpf_prog_put(struct bpf_prog *prog)
{
struct bpf_prog_aux *aux = prog->aux;
if (atomic64_dec_and_test(&aux->refcnt)) {
- /* bpf_prog_free_id() must be called first */
- bpf_prog_free_id(prog, do_idr_lock);
-
if (in_irq() || irqs_disabled()) {
INIT_WORK(&aux->work, bpf_prog_put_deferred);
schedule_work(&aux->work);
void bpf_prog_put(struct bpf_prog *prog)
{
- __bpf_prog_put(prog, true);
+ __bpf_prog_put(prog);
}
EXPORT_SYMBOL_GPL(bpf_prog_put);
if (prog->type != *attach_type)
return false;
- if (bpf_prog_is_dev_bound(prog->aux) && !attach_drv)
+ if (bpf_prog_is_offloaded(prog->aux) && !attach_drv)
return false;
return true;
BPF_F_TEST_STATE_FREQ |
BPF_F_SLEEPABLE |
BPF_F_TEST_RND_HI32 |
- BPF_F_XDP_HAS_FRAGS))
+ BPF_F_XDP_HAS_FRAGS |
+ BPF_F_XDP_DEV_BOUND_ONLY))
return -EINVAL;
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
prog->aux->attach_btf = attach_btf;
prog->aux->attach_btf_id = attr->attach_btf_id;
prog->aux->dst_prog = dst_prog;
- prog->aux->offload_requested = !!attr->prog_ifindex;
+ prog->aux->dev_bound = !!attr->prog_ifindex;
prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
prog->gpl_compatible = is_gpl ? 1 : 0;
if (bpf_prog_is_dev_bound(prog->aux)) {
- err = bpf_prog_offload_init(prog, attr);
+ err = bpf_prog_dev_bound_init(prog, attr);
+ if (err)
+ goto free_prog_sec;
+ }
+
+ if (type == BPF_PROG_TYPE_EXT && dst_prog &&
+ bpf_prog_is_dev_bound(dst_prog->aux)) {
+ err = bpf_prog_dev_bound_inherit(prog, dst_prog);
if (err)
goto free_prog_sec;
}
return -EFAULT;
}
- if (bpf_prog_is_dev_bound(prog->aux)) {
+ if (bpf_prog_is_offloaded(prog->aux)) {
err = bpf_prog_offload_info_fill(&info, prog);
if (err)
return err;
}
info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
- if (bpf_map_is_dev_bound(map)) {
+ if (bpf_map_is_offloaded(map)) {
err = bpf_map_offload_info_fill(&info, map);
if (err)
return err;
{
.procname = "bpf_stats_enabled",
.data = &bpf_stats_enabled_key.key,
- .maxlen = sizeof(bpf_stats_enabled_key),
.mode = 0644,
.proc_handler = bpf_stats_handler,
},
cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
epc = &cpc->epc;
-
+ raw_spin_lock_irq(&ctx->lock);
if (!epc->ctx) {
atomic_set(&epc->refcount, 1);
epc->embedded = 1;
- raw_spin_lock_irq(&ctx->lock);
list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
epc->ctx = ctx;
- raw_spin_unlock_irq(&ctx->lock);
} else {
WARN_ON_ONCE(epc->ctx != ctx);
atomic_inc(&epc->refcount);
}
-
+ raw_spin_unlock_irq(&ctx->lock);
return epc;
}
static void put_pmu_ctx(struct perf_event_pmu_context *epc)
{
+ struct perf_event_context *ctx = epc->ctx;
unsigned long flags;
- if (!atomic_dec_and_test(&epc->refcount))
+ /*
+ * XXX
+ *
+ * lockdep_assert_held(&ctx->mutex);
+ *
+ * can't because of the call-site in _free_event()/put_event()
+ * which isn't always called under ctx->mutex.
+ */
+ if (!atomic_dec_and_raw_lock_irqsave(&epc->refcount, &ctx->lock, flags))
return;
- if (epc->ctx) {
- struct perf_event_context *ctx = epc->ctx;
+ WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
- /*
- * XXX
- *
- * lockdep_assert_held(&ctx->mutex);
- *
- * can't because of the call-site in _free_event()/put_event()
- * which isn't always called under ctx->mutex.
- */
-
- WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry));
- raw_spin_lock_irqsave(&ctx->lock, flags);
- list_del_init(&epc->pmu_ctx_entry);
- epc->ctx = NULL;
- raw_spin_unlock_irqrestore(&ctx->lock, flags);
- }
+ list_del_init(&epc->pmu_ctx_entry);
+ epc->ctx = NULL;
WARN_ON_ONCE(!list_empty(&epc->pinned_active));
WARN_ON_ONCE(!list_empty(&epc->flexible_active));
+ raw_spin_unlock_irqrestore(&ctx->lock, flags);
+
if (epc->embedded)
return;
* Since pinned accounting is per vm we cannot allow fork() to copy our
* vma.
*/
- vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &perf_mmap_vmops;
if (event->pmu->event_mapped)
ring_buffer_put(rb);
}
-static void __perf_event_header__init_id(struct perf_event_header *header,
- struct perf_sample_data *data,
+/*
+ * A set of common sample data types saved even for non-sample records
+ * when event->attr.sample_id_all is set.
+ */
+#define PERF_SAMPLE_ID_ALL (PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
+ PERF_SAMPLE_ID | PERF_SAMPLE_STREAM_ID | \
+ PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER)
+
+static void __perf_event_header__init_id(struct perf_sample_data *data,
struct perf_event *event,
u64 sample_type)
{
data->type = event->attr.sample_type;
- header->size += event->id_header_size;
+ data->sample_flags |= data->type & PERF_SAMPLE_ID_ALL;
if (sample_type & PERF_SAMPLE_TID) {
/* namespace issues */
struct perf_sample_data *data,
struct perf_event *event)
{
- if (event->attr.sample_id_all)
- __perf_event_header__init_id(header, data, event, event->attr.sample_type);
+ if (event->attr.sample_id_all) {
+ header->size += event->id_header_size;
+ __perf_event_header__init_id(data, event, event->attr.sample_type);
+ }
}
static void __perf_event__output_id_sample(struct perf_output_handle *handle,
}
if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
- if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
+ if (data->br_stack) {
size_t size;
size = data->br_stack->nr
return callchain ?: &__empty_callchain;
}
-void perf_prepare_sample(struct perf_event_header *header,
- struct perf_sample_data *data,
+static __always_inline u64 __cond_set(u64 flags, u64 s, u64 d)
+{
+ return d * !!(flags & s);
+}
+
+void perf_prepare_sample(struct perf_sample_data *data,
struct perf_event *event,
struct pt_regs *regs)
{
u64 sample_type = event->attr.sample_type;
u64 filtered_sample_type;
- header->type = PERF_RECORD_SAMPLE;
- header->size = sizeof(*header) + event->header_size;
-
- header->misc = 0;
- header->misc |= perf_misc_flags(regs);
-
/*
- * Clear the sample flags that have already been done by the
- * PMU driver.
+ * Add the sample flags that are dependent to others. And clear the
+ * sample flags that have already been done by the PMU driver.
*/
- filtered_sample_type = sample_type & ~data->sample_flags;
- __perf_event_header__init_id(header, data, event, filtered_sample_type);
-
- if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
- data->ip = perf_instruction_pointer(regs);
-
- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- int size = 1;
-
- if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
- data->callchain = perf_callchain(event, regs);
+ filtered_sample_type = sample_type;
+ filtered_sample_type |= __cond_set(sample_type, PERF_SAMPLE_CODE_PAGE_SIZE,
+ PERF_SAMPLE_IP);
+ filtered_sample_type |= __cond_set(sample_type, PERF_SAMPLE_DATA_PAGE_SIZE |
+ PERF_SAMPLE_PHYS_ADDR, PERF_SAMPLE_ADDR);
+ filtered_sample_type |= __cond_set(sample_type, PERF_SAMPLE_STACK_USER,
+ PERF_SAMPLE_REGS_USER);
+ filtered_sample_type &= ~data->sample_flags;
- size += data->callchain->nr;
-
- header->size += size * sizeof(u64);
+ if (filtered_sample_type == 0) {
+ /* Make sure it has the correct data->type for output */
+ data->type = event->attr.sample_type;
+ return;
}
- if (sample_type & PERF_SAMPLE_RAW) {
- struct perf_raw_record *raw = data->raw;
- int size;
+ __perf_event_header__init_id(data, event, filtered_sample_type);
- if (raw && (data->sample_flags & PERF_SAMPLE_RAW)) {
- struct perf_raw_frag *frag = &raw->frag;
- u32 sum = 0;
+ if (filtered_sample_type & PERF_SAMPLE_IP) {
+ data->ip = perf_instruction_pointer(regs);
+ data->sample_flags |= PERF_SAMPLE_IP;
+ }
- do {
- sum += frag->size;
- if (perf_raw_frag_last(frag))
- break;
- frag = frag->next;
- } while (1);
+ if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
+ perf_sample_save_callchain(data, event, regs);
- size = round_up(sum + sizeof(u32), sizeof(u64));
- raw->size = size - sizeof(u32);
- frag->pad = raw->size - sum;
- } else {
- size = sizeof(u64);
- data->raw = NULL;
- }
-
- header->size += size;
+ if (filtered_sample_type & PERF_SAMPLE_RAW) {
+ data->raw = NULL;
+ data->dyn_size += sizeof(u64);
+ data->sample_flags |= PERF_SAMPLE_RAW;
}
- if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
- int size = sizeof(u64); /* nr */
- if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
- if (branch_sample_hw_index(event))
- size += sizeof(u64);
-
- size += data->br_stack->nr
- * sizeof(struct perf_branch_entry);
- }
- header->size += size;
+ if (filtered_sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ data->br_stack = NULL;
+ data->dyn_size += sizeof(u64);
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
- if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER))
+ if (filtered_sample_type & PERF_SAMPLE_REGS_USER)
perf_sample_regs_user(&data->regs_user, regs);
- if (sample_type & PERF_SAMPLE_REGS_USER) {
+ /*
+ * It cannot use the filtered_sample_type here as REGS_USER can be set
+ * by STACK_USER (using __cond_set() above) and we don't want to update
+ * the dyn_size if it's not requested by users.
+ */
+ if ((sample_type & ~data->sample_flags) & PERF_SAMPLE_REGS_USER) {
/* regs dump ABI info */
int size = sizeof(u64);
size += hweight64(mask) * sizeof(u64);
}
- header->size += size;
+ data->dyn_size += size;
+ data->sample_flags |= PERF_SAMPLE_REGS_USER;
}
- if (sample_type & PERF_SAMPLE_STACK_USER) {
+ if (filtered_sample_type & PERF_SAMPLE_STACK_USER) {
/*
* Either we need PERF_SAMPLE_STACK_USER bit to be always
* processed as the last one or have additional check added
* up the rest of the sample size.
*/
u16 stack_size = event->attr.sample_stack_user;
+ u16 header_size = perf_sample_data_size(data, event);
u16 size = sizeof(u64);
- stack_size = perf_sample_ustack_size(stack_size, header->size,
+ stack_size = perf_sample_ustack_size(stack_size, header_size,
data->regs_user.regs);
/*
size += sizeof(u64) + stack_size;
data->stack_user_size = stack_size;
- header->size += size;
+ data->dyn_size += size;
+ data->sample_flags |= PERF_SAMPLE_STACK_USER;
}
- if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
+ if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE) {
data->weight.full = 0;
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
- if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
+ if (filtered_sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = PERF_MEM_NA;
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }
- if (filtered_sample_type & PERF_SAMPLE_TRANSACTION)
+ if (filtered_sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = 0;
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
- if (sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_DATA_PAGE_SIZE)) {
- if (filtered_sample_type & PERF_SAMPLE_ADDR)
- data->addr = 0;
+ if (filtered_sample_type & PERF_SAMPLE_ADDR) {
+ data->addr = 0;
+ data->sample_flags |= PERF_SAMPLE_ADDR;
}
- if (sample_type & PERF_SAMPLE_REGS_INTR) {
+ if (filtered_sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
size += hweight64(mask) * sizeof(u64);
}
- header->size += size;
+ data->dyn_size += size;
+ data->sample_flags |= PERF_SAMPLE_REGS_INTR;
}
- if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
- filtered_sample_type & PERF_SAMPLE_PHYS_ADDR)
+ if (filtered_sample_type & PERF_SAMPLE_PHYS_ADDR) {
data->phys_addr = perf_virt_to_phys(data->addr);
+ data->sample_flags |= PERF_SAMPLE_PHYS_ADDR;
+ }
#ifdef CONFIG_CGROUP_PERF
- if (sample_type & PERF_SAMPLE_CGROUP) {
+ if (filtered_sample_type & PERF_SAMPLE_CGROUP) {
struct cgroup *cgrp;
/* protected by RCU */
cgrp = task_css_check(current, perf_event_cgrp_id, 1)->cgroup;
data->cgroup = cgroup_id(cgrp);
+ data->sample_flags |= PERF_SAMPLE_CGROUP;
}
#endif
* require PERF_SAMPLE_ADDR, kernel implicitly retrieve the data->addr,
* but the value will not dump to the userspace.
*/
- if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
+ if (filtered_sample_type & PERF_SAMPLE_DATA_PAGE_SIZE) {
data->data_page_size = perf_get_page_size(data->addr);
+ data->sample_flags |= PERF_SAMPLE_DATA_PAGE_SIZE;
+ }
- if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
+ if (filtered_sample_type & PERF_SAMPLE_CODE_PAGE_SIZE) {
data->code_page_size = perf_get_page_size(data->ip);
+ data->sample_flags |= PERF_SAMPLE_CODE_PAGE_SIZE;
+ }
- if (sample_type & PERF_SAMPLE_AUX) {
+ if (filtered_sample_type & PERF_SAMPLE_AUX) {
u64 size;
+ u16 header_size = perf_sample_data_size(data, event);
- header->size += sizeof(u64); /* size */
+ header_size += sizeof(u64); /* size */
/*
* Given the 16bit nature of header::size, an AUX sample can
* Make sure this doesn't happen by using up to U16_MAX bytes
* per sample in total (rounded down to 8 byte boundary).
*/
- size = min_t(size_t, U16_MAX - header->size,
+ size = min_t(size_t, U16_MAX - header_size,
event->attr.aux_sample_size);
size = rounddown(size, 8);
size = perf_prepare_sample_aux(event, data, size);
- WARN_ON_ONCE(size + header->size > U16_MAX);
- header->size += size;
+ WARN_ON_ONCE(size + header_size > U16_MAX);
+ data->dyn_size += size + sizeof(u64); /* size above */
+ data->sample_flags |= PERF_SAMPLE_AUX;
}
+}
+
+void perf_prepare_header(struct perf_event_header *header,
+ struct perf_sample_data *data,
+ struct perf_event *event,
+ struct pt_regs *regs)
+{
+ header->type = PERF_RECORD_SAMPLE;
+ header->size = perf_sample_data_size(data, event);
+ header->misc = perf_misc_flags(regs);
+
/*
* If you're adding more sample types here, you likely need to do
* something about the overflowing header::size, like repurpose the
/* protect the callchain buffers */
rcu_read_lock();
- perf_prepare_sample(&header, data, event, regs);
+ perf_prepare_sample(data, event, regs);
+ perf_prepare_header(&header, data, event, regs);
err = output_begin(&handle, data, event, header.size);
if (err)
};
perf_sample_data_init(&data, 0, 0);
- data.raw = &raw;
- data.sample_flags |= PERF_SAMPLE_RAW;
+ perf_sample_save_raw_data(&data, &raw);
perf_trace_buf_update(record, event_type);
rcu_read_lock();
prog = READ_ONCE(event->prog);
if (prog) {
- if (prog->call_get_stack &&
- (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
- !(data->sample_flags & PERF_SAMPLE_CALLCHAIN)) {
- data->callchain = perf_callchain(event, regs);
- data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
- }
-
+ perf_prepare_sample(data, event, regs);
ret = bpf_prog_run(prog, &ctx);
}
rcu_read_unlock();
* orig->shared.rb may be modified concurrently, but the clone
* will be reinitialized.
*/
- *new = data_race(*orig);
+ data_race(memcpy(new, orig, sizeof(*new)));
INIT_LIST_HEAD(&new->anon_vma_chain);
dup_anon_vma_name(orig, new);
}
int retval;
unsigned long charge = 0;
LIST_HEAD(uf);
- MA_STATE(old_mas, &oldmm->mm_mt, 0, 0);
- MA_STATE(mas, &mm->mm_mt, 0, 0);
+ VMA_ITERATOR(old_vmi, oldmm, 0);
+ VMA_ITERATOR(vmi, mm, 0);
uprobe_start_dup_mmap();
if (mmap_write_lock_killable(oldmm)) {
goto out;
khugepaged_fork(mm, oldmm);
- retval = mas_expected_entries(&mas, oldmm->map_count);
+ retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
if (retval)
goto out;
- mas_for_each(&old_mas, mpnt, ULONG_MAX) {
+ for_each_vma(old_vmi, mpnt) {
struct file *file;
if (mpnt->vm_flags & VM_DONTCOPY) {
tmp->anon_vma = NULL;
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
- tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+ vm_flags_clear(tmp, VM_LOCKED_MASK);
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
hugetlb_dup_vma_private(tmp);
/* Link the vma into the MT */
- mas.index = tmp->vm_start;
- mas.last = tmp->vm_end - 1;
- mas_store(&mas, tmp);
- if (mas_is_err(&mas))
- goto fail_nomem_mas_store;
+ if (vma_iter_bulk_store(&vmi, tmp))
+ goto fail_nomem_vmi_store;
mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
/* a new mm has just been created */
retval = arch_dup_mmap(oldmm, mm);
loop_out:
- mas_destroy(&mas);
+ vma_iter_free(&vmi);
out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
uprobe_end_dup_mmap();
return retval;
- fail_nomem_mas_store:
+ fail_nomem_vmi_store:
unlink_anon_vmas(tmp);
fail_nomem_anon_vma_fork:
mpol_put(vma_policy(tmp));
#endif
#ifdef CONFIG_BLK_CGROUP
- tsk->throttle_queue = NULL;
+ tsk->throttle_disk = NULL;
tsk->use_memdelay = 0;
#endif
tsk->reported_split_lock = 0;
#endif
+#ifdef CONFIG_SCHED_MM_CID
+ tsk->mm_cid = -1;
+ tsk->mm_cid_active = 0;
+#endif
return tsk;
free_stack:
mm->user_ns = get_user_ns(user_ns);
lru_gen_init_mm(mm);
+ mm_init_cid(mm);
return mm;
fail_pcpu:
tsk->mm = mm;
tsk->active_mm = mm;
+ sched_mm_cid_fork(tsk);
return 0;
}
* dynamically sized based on the maximum CPU number this system
* can have, taking hotplug into account (nr_cpu_ids).
*/
- mm_size = sizeof(struct mm_struct) + cpumask_size();
+ mm_size = sizeof(struct mm_struct) + cpumask_size() + mm_cid_size();
mm_cachep = kmem_cache_create_usercopy("mm_struct",
mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
#include <linux/sched/task.h>
#include <linux/sched/signal.h>
#include <linux/idr.h>
+ #include "pid_sysctl.h"
static DEFINE_MUTEX(pid_caches_mutex);
static struct kmem_cache *pid_ns_cachep;
ns->ucounts = ucounts;
ns->pid_allocated = PIDNS_ADDING;
+ initialize_memfd_noexec_scope(ns);
+
return ns;
out_free_idr:
set_current_state(TASK_INTERRUPTIBLE);
if (pid_ns->pid_allocated == init_pids)
break;
+ /*
+ * Release tasks_rcu_exit_srcu to avoid following deadlock:
+ *
+ * 1) TASK A unshare(CLONE_NEWPID)
+ * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
+ * and TASK C
+ * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
+ * 4) TASK A calls synchronize_rcu_tasks()
+ * -> synchronize_srcu(tasks_rcu_exit_srcu)
+ * 5) *DEADLOCK*
+ *
+ * It is considered safe to release tasks_rcu_exit_srcu here
+ * because we assume the current task can not be concurrently
+ * reaped at this point.
+ */
+ exit_tasks_rcu_stop();
schedule();
+ exit_tasks_rcu_start();
}
__set_current_state(TASK_RUNNING);
#ifdef CONFIG_CHECKPOINT_RESTORE
register_sysctl_paths(kern_path, pid_ns_ctl_table);
#endif
+
+ register_pid_ns_sysctl_table_vm();
return 0;
}
return NULL;
}
-static inline struct sched_entity *parent_entity(struct sched_entity *se)
+static inline struct sched_entity *parent_entity(const struct sched_entity *se)
{
return se->parent;
}
return min_vruntime;
}
-static inline bool entity_before(struct sched_entity *a,
- struct sched_entity *b)
+static inline bool entity_before(const struct sched_entity *a,
+ const struct sched_entity *b)
{
return (s64)(a->vruntime - b->vruntime) < 0;
}
ns->nr_running += rq->cfs.h_nr_running;
ns->compute_capacity += capacity_of(cpu);
- if (find_idle && !rq->nr_running && idle_cpu(cpu)) {
+ if (find_idle && idle_core < 0 && !rq->nr_running && idle_cpu(cpu)) {
if (READ_ONCE(rq->numa_migrate_on) ||
!cpumask_test_cpu(cpu, env->p->cpus_ptr))
continue;
int start = env->dst_cpu;
/* Find alternative idle CPU. */
- for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start) {
+ for_each_cpu_wrap(cpu, cpumask_of_node(env->dst_nid), start + 1) {
if (cpu == env->best_cpu || !idle_cpu(cpu) ||
!cpumask_test_cpu(cpu, env->p->cpus_ptr)) {
continue;
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
u64 runtime = p->se.sum_exec_runtime;
- MA_STATE(mas, &mm->mm_mt, 0, 0);
struct vm_area_struct *vma;
unsigned long start, end;
unsigned long nr_pte_updates = 0;
long pages, virtpages;
+ struct vma_iterator vmi;
SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
if (!mmap_read_trylock(mm))
return;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_init(&vmi, mm, start);
+ vma = vma_next(&vmi);
if (!vma) {
reset_ptenuma_scan(p);
start = 0;
- mas_set(&mas, start);
- vma = mas_find(&mas, ULONG_MAX);
+ vma_iter_set(&vmi, start);
+ vma = vma_next(&vmi);
}
- for (; vma; vma = mas_find(&mas, ULONG_MAX)) {
+ do {
if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
continue;
cond_resched();
} while (end != vma->vm_end);
- }
+ } for_each_vma(vmi, vma);
out:
/*
*
* For uclamp_max, we can tolerate a drop in performance level as the
* goal is to cap the task. So it's okay if it's getting less.
- *
- * In case of capacity inversion we should honour the inverted capacity
- * for both uclamp_min and uclamp_max all the time.
*/
- capacity_orig = cpu_in_capacity_inversion(cpu);
- if (capacity_orig) {
- capacity_orig_thermal = capacity_orig;
- } else {
- capacity_orig = capacity_orig_of(cpu);
- capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
- }
+ capacity_orig = capacity_orig_of(cpu);
+ capacity_orig_thermal = capacity_orig - arch_scale_thermal_pressure(cpu);
/*
* We want to force a task to fit a cpu as implied by uclamp_max.
* handle the case uclamp_min > uclamp_max.
*/
uclamp_min = min(uclamp_min, uclamp_max);
- if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
- fits = fits && (uclamp_min <= capacity_orig_thermal);
+ if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
+ return -1;
return fits;
}
unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
unsigned long util = task_util_est(p);
- return util_fits_cpu(util, uclamp_min, uclamp_max, cpu);
+ /*
+ * Return true only if the cpu fully fits the task requirements, which
+ * include the utilization but also the performance hints.
+ */
+ return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0);
}
static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
u64 vruntime = cfs_rq->min_vruntime;
+ u64 sleep_time;
/*
* The 'current' period is already promised to the current tasks,
vruntime -= thresh;
}
- /* ensure we never gain time by being placed backwards. */
- se->vruntime = max_vruntime(se->vruntime, vruntime);
+ /*
+ * Pull vruntime of the entity being placed to the base level of
+ * cfs_rq, to prevent boosting it if placed backwards. If the entity
+ * slept for a long time, don't even try to compare its vruntime with
+ * the base as it may be too far off and the comparison may get
+ * inversed due to s64 overflow.
+ */
+ sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+ if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+ se->vruntime = vruntime;
+ else
+ se->vruntime = max_vruntime(se->vruntime, vruntime);
}
static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
struct sched_entity *se;
s64 delta;
- ideal_runtime = sched_slice(cfs_rq, curr);
+ /*
+ * When many tasks blow up the sched_period; it is possible that
+ * sched_slice() reports unusually large results (when many tasks are
+ * very light for example). Therefore impose a maximum.
+ */
+ ideal_runtime = min_t(u64, sched_slice(cfs_rq, curr), sysctl_sched_latency);
+
delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
if (delta_exec > ideal_runtime) {
resched_curr(rq_of(cfs_rq));
resched_curr(rq);
}
-static void distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
+#ifdef CONFIG_SMP
+static void __cfsb_csd_unthrottle(void *arg)
{
- struct cfs_rq *cfs_rq;
+ struct cfs_rq *cursor, *tmp;
+ struct rq *rq = arg;
+ struct rq_flags rf;
+
+ rq_lock(rq, &rf);
+
+ /*
+ * Since we hold rq lock we're safe from concurrent manipulation of
+ * the CSD list. However, this RCU critical section annotates the
+ * fact that we pair with sched_free_group_rcu(), so that we cannot
+ * race with group being freed in the window between removing it
+ * from the list and advancing to the next entry in the list.
+ */
+ rcu_read_lock();
+
+ list_for_each_entry_safe(cursor, tmp, &rq->cfsb_csd_list,
+ throttled_csd_list) {
+ list_del_init(&cursor->throttled_csd_list);
+
+ if (cfs_rq_throttled(cursor))
+ unthrottle_cfs_rq(cursor);
+ }
+
+ rcu_read_unlock();
+
+ rq_unlock(rq, &rf);
+}
+
+static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
+{
+ struct rq *rq = rq_of(cfs_rq);
+ bool first;
+
+ if (rq == this_rq()) {
+ unthrottle_cfs_rq(cfs_rq);
+ return;
+ }
+
+ /* Already enqueued */
+ if (SCHED_WARN_ON(!list_empty(&cfs_rq->throttled_csd_list)))
+ return;
+
+ first = list_empty(&rq->cfsb_csd_list);
+ list_add_tail(&cfs_rq->throttled_csd_list, &rq->cfsb_csd_list);
+ if (first)
+ smp_call_function_single_async(cpu_of(rq), &rq->cfsb_csd);
+}
+#else
+static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
+{
+ unthrottle_cfs_rq(cfs_rq);
+}
+#endif
+
+static void unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
+{
+ lockdep_assert_rq_held(rq_of(cfs_rq));
+
+ if (SCHED_WARN_ON(!cfs_rq_throttled(cfs_rq) ||
+ cfs_rq->runtime_remaining <= 0))
+ return;
+
+ __unthrottle_cfs_rq_async(cfs_rq);
+}
+
+static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
+{
+ struct cfs_rq *local_unthrottle = NULL;
+ int this_cpu = smp_processor_id();
u64 runtime, remaining = 1;
+ bool throttled = false;
+ struct cfs_rq *cfs_rq;
+ struct rq_flags rf;
+ struct rq *rq;
rcu_read_lock();
list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
throttled_list) {
- struct rq *rq = rq_of(cfs_rq);
- struct rq_flags rf;
+ rq = rq_of(cfs_rq);
+
+ if (!remaining) {
+ throttled = true;
+ break;
+ }
rq_lock_irqsave(rq, &rf);
if (!cfs_rq_throttled(cfs_rq))
goto next;
- /* By the above check, this should never be true */
+#ifdef CONFIG_SMP
+ /* Already queued for async unthrottle */
+ if (!list_empty(&cfs_rq->throttled_csd_list))
+ goto next;
+#endif
+
+ /* By the above checks, this should never be true */
SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
raw_spin_lock(&cfs_b->lock);
cfs_rq->runtime_remaining += runtime;
/* we check whether we're throttled above */
- if (cfs_rq->runtime_remaining > 0)
- unthrottle_cfs_rq(cfs_rq);
+ if (cfs_rq->runtime_remaining > 0) {
+ if (cpu_of(rq) != this_cpu ||
+ SCHED_WARN_ON(local_unthrottle))
+ unthrottle_cfs_rq_async(cfs_rq);
+ else
+ local_unthrottle = cfs_rq;
+ } else {
+ throttled = true;
+ }
next:
rq_unlock_irqrestore(rq, &rf);
-
- if (!remaining)
- break;
}
rcu_read_unlock();
+
+ if (local_unthrottle) {
+ rq = cpu_rq(this_cpu);
+ rq_lock_irqsave(rq, &rf);
+ if (cfs_rq_throttled(local_unthrottle))
+ unthrottle_cfs_rq(local_unthrottle);
+ rq_unlock_irqrestore(rq, &rf);
+ }
+
+ return throttled;
}
/*
while (throttled && cfs_b->runtime > 0) {
raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
/* we can't nest cfs_b->lock while distributing bandwidth */
- distribute_cfs_runtime(cfs_b);
+ throttled = distribute_cfs_runtime(cfs_b);
raw_spin_lock_irqsave(&cfs_b->lock, flags);
-
- throttled = !list_empty(&cfs_b->throttled_cfs_rq);
}
/*
{
cfs_rq->runtime_enabled = 0;
INIT_LIST_HEAD(&cfs_rq->throttled_list);
+#ifdef CONFIG_SMP
+ INIT_LIST_HEAD(&cfs_rq->throttled_csd_list);
+#endif
}
void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
{
+ int __maybe_unused i;
+
/* init_cfs_bandwidth() was not called */
if (!cfs_b->throttled_cfs_rq.next)
return;
hrtimer_cancel(&cfs_b->period_timer);
hrtimer_cancel(&cfs_b->slack_timer);
+
+ /*
+ * It is possible that we still have some cfs_rq's pending on a CSD
+ * list, though this race is very rare. In order for this to occur, we
+ * must have raced with the last task leaving the group while there
+ * exist throttled cfs_rq(s), and the period_timer must have queued the
+ * CSD item but the remote cpu has not yet processed it. To handle this,
+ * we can simply flush all pending CSD work inline here. We're
+ * guaranteed at this point that no additional cfs_rq of this group can
+ * join a CSD list.
+ */
+#ifdef CONFIG_SMP
+ for_each_possible_cpu(i) {
+ struct rq *rq = cpu_rq(i);
+ unsigned long flags;
+
+ if (list_empty(&rq->cfsb_csd_list))
+ continue;
+
+ local_irq_save(flags);
+ __cfsb_csd_unthrottle(rq);
+ local_irq_restore(flags);
+ }
+#endif
}
/*
unsigned long rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
unsigned long rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
+ /* Return true only if the utilization doesn't fit CPU's capacity */
return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
}
select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
{
unsigned long task_util, util_min, util_max, best_cap = 0;
+ int fits, best_fits = 0;
int cpu, best_cpu = -1;
struct cpumask *cpus;
util_min = uclamp_eff_value(p, UCLAMP_MIN);
util_max = uclamp_eff_value(p, UCLAMP_MAX);
- for_each_cpu_wrap(cpu, cpus, target) {
+ for_each_cpu_wrap(cpu, cpus, target + 1) {
unsigned long cpu_cap = capacity_of(cpu);
if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
continue;
- if (util_fits_cpu(task_util, util_min, util_max, cpu))
+
+ fits = util_fits_cpu(task_util, util_min, util_max, cpu);
+
+ /* This CPU fits with all requirements */
+ if (fits > 0)
return cpu;
+ /*
+ * Only the min performance hint (i.e. uclamp_min) doesn't fit.
+ * Look for the CPU with best capacity.
+ */
+ else if (fits < 0)
+ cpu_cap = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
- if (cpu_cap > best_cap) {
+ /*
+ * First, select CPU which fits better (-1 being better than 0).
+ * Then, select the one with best capacity at same level.
+ */
+ if ((fits < best_fits) ||
+ ((fits == best_fits) && (cpu_cap > best_cap))) {
best_cap = cpu_cap;
best_cpu = cpu;
+ best_fits = fits;
}
}
int cpu)
{
if (sched_asym_cpucap_active())
- return util_fits_cpu(util, util_min, util_max, cpu);
+ /*
+ * Return true only if the cpu fully fits the task requirements
+ * which include the utilization and the performance hints.
+ */
+ return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
return true;
}
unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
struct root_domain *rd = this_rq()->rd;
int cpu, best_energy_cpu, target = -1;
+ int prev_fits = -1, best_fits = -1;
+ unsigned long best_thermal_cap = 0;
+ unsigned long prev_thermal_cap = 0;
struct sched_domain *sd;
struct perf_domain *pd;
struct energy_env eenv;
eenv_task_busy_time(&eenv, p, prev_cpu);
for (; pd; pd = pd->next) {
+ unsigned long util_min = p_util_min, util_max = p_util_max;
unsigned long cpu_cap, cpu_thermal_cap, util;
unsigned long cur_delta, max_spare_cap = 0;
unsigned long rq_util_min, rq_util_max;
- unsigned long util_min, util_max;
unsigned long prev_spare_cap = 0;
int max_spare_cap_cpu = -1;
unsigned long base_energy;
+ int fits, max_fits = -1;
cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
eenv.pd_cap = 0;
for_each_cpu(cpu, cpus) {
+ struct rq *rq = cpu_rq(cpu);
+
eenv.pd_cap += cpu_thermal_cap;
if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
* much capacity we can get out of the CPU; this is
* aligned with sched_cpu_util().
*/
- if (uclamp_is_used()) {
- if (uclamp_rq_is_idle(cpu_rq(cpu))) {
- util_min = p_util_min;
- util_max = p_util_max;
- } else {
- /*
- * Open code uclamp_rq_util_with() except for
- * the clamp() part. Ie: apply max aggregation
- * only. util_fits_cpu() logic requires to
- * operate on non clamped util but must use the
- * max-aggregated uclamp_{min, max}.
- */
- rq_util_min = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
- rq_util_max = uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
-
- util_min = max(rq_util_min, p_util_min);
- util_max = max(rq_util_max, p_util_max);
- }
+ if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) {
+ /*
+ * Open code uclamp_rq_util_with() except for
+ * the clamp() part. Ie: apply max aggregation
+ * only. util_fits_cpu() logic requires to
+ * operate on non clamped util but must use the
+ * max-aggregated uclamp_{min, max}.
+ */
+ rq_util_min = uclamp_rq_get(rq, UCLAMP_MIN);
+ rq_util_max = uclamp_rq_get(rq, UCLAMP_MAX);
+
+ util_min = max(rq_util_min, p_util_min);
+ util_max = max(rq_util_max, p_util_max);
}
- if (!util_fits_cpu(util, util_min, util_max, cpu))
+
+ fits = util_fits_cpu(util, util_min, util_max, cpu);
+ if (!fits)
continue;
lsub_positive(&cpu_cap, util);
if (cpu == prev_cpu) {
/* Always use prev_cpu as a candidate. */
prev_spare_cap = cpu_cap;
- } else if (cpu_cap > max_spare_cap) {
+ prev_fits = fits;
+ } else if ((fits > max_fits) ||
+ ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
/*
* Find the CPU with the maximum spare capacity
* among the remaining CPUs in the performance
*/
max_spare_cap = cpu_cap;
max_spare_cap_cpu = cpu;
+ max_fits = fits;
}
}
if (prev_delta < base_energy)
goto unlock;
prev_delta -= base_energy;
+ prev_thermal_cap = cpu_thermal_cap;
best_delta = min(best_delta, prev_delta);
}
/* Evaluate the energy impact of using max_spare_cap_cpu. */
if (max_spare_cap_cpu >= 0 && max_spare_cap > prev_spare_cap) {
+ /* Current best energy cpu fits better */
+ if (max_fits < best_fits)
+ continue;
+
+ /*
+ * Both don't fit performance hint (i.e. uclamp_min)
+ * but best energy cpu has better capacity.
+ */
+ if ((max_fits < 0) &&
+ (cpu_thermal_cap <= best_thermal_cap))
+ continue;
+
cur_delta = compute_energy(&eenv, pd, cpus, p,
max_spare_cap_cpu);
/* CPU utilization has changed */
if (cur_delta < base_energy)
goto unlock;
cur_delta -= base_energy;
- if (cur_delta < best_delta) {
- best_delta = cur_delta;
- best_energy_cpu = max_spare_cap_cpu;
- }
+
+ /*
+ * Both fit for the task but best energy cpu has lower
+ * energy impact.
+ */
+ if ((max_fits > 0) && (best_fits > 0) &&
+ (cur_delta >= best_delta))
+ continue;
+
+ best_delta = cur_delta;
+ best_energy_cpu = max_spare_cap_cpu;
+ best_fits = max_fits;
+ best_thermal_cap = cpu_thermal_cap;
}
}
rcu_read_unlock();
- if (best_delta < prev_delta)
+ if ((best_fits > prev_fits) ||
+ ((best_fits > 0) && (best_delta < prev_delta)) ||
+ ((best_fits < 0) && (best_thermal_cap > prev_thermal_cap)))
target = best_energy_cpu;
return target;
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
- unsigned long capacity_orig = arch_scale_cpu_capacity(cpu);
unsigned long capacity = scale_rt_capacity(cpu);
struct sched_group *sdg = sd->groups;
- struct rq *rq = cpu_rq(cpu);
- rq->cpu_capacity_orig = capacity_orig;
+ cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(cpu);
if (!capacity)
capacity = 1;
- rq->cpu_capacity = capacity;
-
- /*
- * Detect if the performance domain is in capacity inversion state.
- *
- * Capacity inversion happens when another perf domain with equal or
- * lower capacity_orig_of() ends up having higher capacity than this
- * domain after subtracting thermal pressure.
- *
- * We only take into account thermal pressure in this detection as it's
- * the only metric that actually results in *real* reduction of
- * capacity due to performance points (OPPs) being dropped/become
- * unreachable due to thermal throttling.
- *
- * We assume:
- * * That all cpus in a perf domain have the same capacity_orig
- * (same uArch).
- * * Thermal pressure will impact all cpus in this perf domain
- * equally.
- */
- if (static_branch_unlikely(&sched_asym_cpucapacity)) {
- unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
- struct perf_domain *pd = rcu_dereference(rq->rd->pd);
-
- rq->cpu_capacity_inverted = 0;
-
- for (; pd; pd = pd->next) {
- struct cpumask *pd_span = perf_domain_span(pd);
- unsigned long pd_cap_orig, pd_cap;
-
- cpu = cpumask_any(pd_span);
- pd_cap_orig = arch_scale_cpu_capacity(cpu);
-
- if (capacity_orig < pd_cap_orig)
- continue;
-
- /*
- * handle the case of multiple perf domains have the
- * same capacity_orig but one of them is under higher
- * thermal pressure. We record it as capacity
- * inversion.
- */
- if (capacity_orig == pd_cap_orig) {
- pd_cap = pd_cap_orig - thermal_load_avg(cpu_rq(cpu));
-
- if (pd_cap > inv_cap) {
- rq->cpu_capacity_inverted = inv_cap;
- break;
- }
- } else if (pd_cap_orig > inv_cap) {
- rq->cpu_capacity_inverted = inv_cap;
- break;
- }
- }
- }
-
- trace_sched_cpu_capacity_tp(rq);
+ cpu_rq(cpu)->cpu_capacity = capacity;
+ trace_sched_cpu_capacity_tp(cpu_rq(cpu));
sdg->sgc->capacity = capacity;
sdg->sgc->min_capacity = capacity;
*/
update_sd_lb_stats(env, &sds);
- if (sched_energy_enabled()) {
- struct root_domain *rd = env->dst_rq->rd;
-
- if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
- goto out_balanced;
- }
-
- local = &sds.local_stat;
- busiest = &sds.busiest_stat;
-
/* There is no busy sibling group to pull tasks from */
if (!sds.busiest)
goto out_balanced;
+ busiest = &sds.busiest_stat;
+
/* Misfit tasks should be dealt with regardless of the avg load */
if (busiest->group_type == group_misfit_task)
goto force_balance;
+ if (sched_energy_enabled()) {
+ struct root_domain *rd = env->dst_rq->rd;
+
+ if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
+ goto out_balanced;
+ }
+
/* ASYM feature bypasses nice load balance check */
if (busiest->group_type == group_asym_packing)
goto force_balance;
if (busiest->group_type == group_imbalanced)
goto force_balance;
+ local = &sds.local_stat;
/*
* If the local group is busier than the selected busiest group
* don't try and pull any tasks.
/*
* se_fi_update - Update the cfs_rq->min_vruntime_fi in a CFS hierarchy if needed.
*/
-static void se_fi_update(struct sched_entity *se, unsigned int fi_seq, bool forceidle)
+static void se_fi_update(const struct sched_entity *se, unsigned int fi_seq,
+ bool forceidle)
{
for_each_sched_entity(se) {
struct cfs_rq *cfs_rq = cfs_rq_of(se);
se_fi_update(se, rq->core->core_forceidle_seq, in_fi);
}
-bool cfs_prio_less(struct task_struct *a, struct task_struct *b, bool in_fi)
+bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
+ bool in_fi)
{
struct rq *rq = task_rq(a);
- struct sched_entity *sea = &a->se;
- struct sched_entity *seb = &b->se;
+ const struct sched_entity *sea = &a->se;
+ const struct sched_entity *seb = &b->se;
struct cfs_rq *cfs_rqa;
struct cfs_rq *cfs_rqb;
s64 delta;
for_each_possible_cpu(i) {
zalloc_cpumask_var_node(&per_cpu(load_balance_mask, i), GFP_KERNEL, cpu_to_node(i));
zalloc_cpumask_var_node(&per_cpu(select_rq_mask, i), GFP_KERNEL, cpu_to_node(i));
+
+#ifdef CONFIG_CFS_BANDWIDTH
+ INIT_CSD(&cpu_rq(i)->cfsb_csd, __cfsb_csd_unthrottle, cpu_rq(i));
+ INIT_LIST_HEAD(&cpu_rq(i)->cfsb_csd_list);
+#endif
}
open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
if (resource >= RLIM_NLIMITS)
return -EINVAL;
+ resource = array_index_nospec(resource, RLIM_NLIMITS);
+
if (new_rlim) {
if (new_rlim->rlim_cur > new_rlim->rlim_max)
return -EINVAL;
}
#endif /* CONFIG_ANON_VMA_NAME */
+ static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
+ unsigned long arg4, unsigned long arg5)
+ {
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+
+ if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
+ return -EINVAL;
+
+ if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
+ set_bit(MMF_HAS_MDWE, ¤t->mm->flags);
+ else if (test_bit(MMF_HAS_MDWE, ¤t->mm->flags))
+ return -EPERM; /* Cannot unset the flag */
+
+ return 0;
+ }
+
+ static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
+ unsigned long arg4, unsigned long arg5)
+ {
+ if (arg2 || arg3 || arg4 || arg5)
+ return -EINVAL;
+
+ return test_bit(MMF_HAS_MDWE, ¤t->mm->flags) ?
+ PR_MDWE_REFUSE_EXEC_GAIN : 0;
+ }
+
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
error = sched_core_share_pid(arg2, arg3, arg4, arg5);
break;
#endif
+ case PR_SET_MDWE:
+ error = prctl_set_mdwe(arg2, arg3, arg4, arg5);
+ break;
+ case PR_GET_MDWE:
+ error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
+ break;
case PR_SET_VMA:
error = prctl_set_vma(arg2, arg3, arg4, arg5);
break;
btf_decl_tag) or not. Currently only clang compiler implements
these attributes, so make the config depend on CC_IS_CLANG.
+config PAHOLE_HAS_LANG_EXCLUDE
+ def_bool PAHOLE_VERSION >= 124
+ help
+ Support for the --lang_exclude flag which makes pahole exclude
+ compilation units from the supplied language. Used in Kbuild to
+ omit Rust CUs which are not supported in version 1.24 of pahole,
+ otherwise it would emit malformed kernel and module binaries when
+ using DEBUG_INFO_BTF_MODULES.
+
config DEBUG_INFO_BTF_MODULES
def_bool y
depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
visibility into the kernel memory shrinkers subsystem.
Disable it to avoid an extra memory footprint.
- config HAVE_DEBUG_KMEMLEAK
- bool
-
- config DEBUG_KMEMLEAK
- bool "Kernel memory leak detector"
- depends on DEBUG_KERNEL && HAVE_DEBUG_KMEMLEAK
- select DEBUG_FS
- select STACKTRACE if STACKTRACE_SUPPORT
- select KALLSYMS
- select CRC32
- select STACKDEPOT
- select STACKDEPOT_ALWAYS_INIT if !DEBUG_KMEMLEAK_DEFAULT_OFF
- help
- Say Y here if you want to enable the memory leak
- detector. The memory allocation/freeing is traced in a way
- similar to the Boehm's conservative garbage collector, the
- difference being that the orphan objects are not freed but
- only shown in /sys/kernel/debug/kmemleak. Enabling this
- feature will introduce an overhead to memory
- allocations. See Documentation/dev-tools/kmemleak.rst for more
- details.
-
- Enabling DEBUG_SLAB or SLUB_DEBUG may increase the chances
- of finding leaks due to the slab objects poisoning.
-
- In order to access the kmemleak file, debugfs needs to be
- mounted (usually at /sys/kernel/debug).
-
- config DEBUG_KMEMLEAK_MEM_POOL_SIZE
- int "Kmemleak memory pool size"
- depends on DEBUG_KMEMLEAK
- range 200 1000000
- default 16000
- help
- Kmemleak must track all the memory allocations to avoid
- reporting false positives. Since memory may be allocated or
- freed before kmemleak is fully initialised, use a static pool
- of metadata objects to track such callbacks. After kmemleak is
- fully initialised, this memory pool acts as an emergency one
- if slab allocations fail.
-
- config DEBUG_KMEMLEAK_TEST
- tristate "Simple test for the kernel memory leak detector"
- depends on DEBUG_KMEMLEAK && m
- help
- This option enables a module that explicitly leaks memory.
-
- If unsure, say N.
-
- config DEBUG_KMEMLEAK_DEFAULT_OFF
- bool "Default kmemleak to off"
- depends on DEBUG_KMEMLEAK
- help
- Say Y here to disable kmemleak by default. It can then be enabled
- on the command line via kmemleak=on.
-
- config DEBUG_KMEMLEAK_AUTO_SCAN
- bool "Enable kmemleak auto scan thread on boot up"
- default y
- depends on DEBUG_KMEMLEAK
- help
- Depending on the cpu, kmemleak scan may be cpu intensive and can
- stall user tasks at times. This option enables/disables automatic
- kmemleak scan at boot up.
-
- Say N here to disable kmemleak auto scan thread to stop automatic
- scanning. Disabling this option disables automatic reporting of
- memory leaks.
-
- If unsure, say Y.
-
config DEBUG_STACK_USAGE
bool "Stack utilization instrumentation"
depends on DEBUG_KERNEL && !IA64
depends on TRACE_IRQFLAGS
depends on TRACE_IRQFLAGS_NMI_SUPPORT
+config NMI_CHECK_CPU
+ bool "Debugging for CPUs failing to respond to backtrace requests"
+ depends on DEBUG_KERNEL
+ depends on X86
+ default n
+ help
+ Enables debug prints when a CPU fails to respond to a given
+ backtrace NMI. These prints provide some reasons why a CPU
+ might legitimately be failing to respond, for example, if it
+ is offline of if ignore_nmis is set.
+
config DEBUG_IRQFLAGS
bool "Debug IRQ flag manipulation"
help
help
Add fault injections into various functions that are annotated with
ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
- value of theses functions. This is useful to test error paths of code.
+ value of these functions. This is useful to test error paths of code.
If unsure, say N
If unsure, say N.
+config HASHTABLE_KUNIT_TEST
+ tristate "KUnit Test for Kernel Hashtable structures" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ default KUNIT_ALL_TESTS
+ help
+ This builds the hashtable KUnit test suite.
+ It tests the basic functionality of the API defined in
+ include/linux/hashtable.h. For more information on KUnit and
+ unit tests in general please refer to the KUnit documentation
+ in Documentation/dev-tools/kunit/.
+
+ If unsure, say N.
+
config LINEAR_RANGES_TEST
tristate "KUnit test for linear_ranges"
depends on KUNIT
If unsure, say N.
+config MEMCPY_SLOW_KUNIT_TEST
+ bool "Include exhaustive memcpy tests"
+ depends on MEMCPY_KUNIT_TEST
+ default y
+ help
+ Some memcpy tests are quite exhaustive in checking for overlaps
+ and bit ranges. These can be very slow, so they are split out
+ as a separate config, in case they need to be disabled.
+
config IS_SIGNED_TYPE_KUNIT_TEST
tristate "Test is_signed_type() macro" if !KUNIT_ALL_TESTS
depends on KUNIT
endmenu # "Rust"
-source "Documentation/Kconfig"
-
endmenu # Kernel hacking
return false;
}
- EXPORT_SYMBOL(PageMovable);
void __SetPageMovable(struct page *page, const struct movable_operations *mops)
{
locked = NULL;
}
- if (!isolate_movable_page(page, mode))
+ if (isolate_movable_page(page, mode))
goto isolate_success;
}
/*
* Avoid isolating too much unless this block is being
- * rescanned (e.g. dirty/writeback pages, parallel allocation)
+ * fully scanned (e.g. dirty/writeback pages, parallel allocation)
* or a lock is contended. For contention, isolate quickly to
* potentially remove one source of contention.
*/
if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
- !cc->rescan && !cc->contended) {
+ !cc->finish_pageblock && !cc->contended) {
++low_pfn;
break;
}
}
/*
- * Updated the cached scanner pfn once the pageblock has been scanned
+ * Update the cached scanner pfn once the pageblock has been scanned.
* Pages will either be migrated in which case there is no point
* scanning in the near future or migration failed in which case the
* failure reason may persist. The block is marked for skipping if
* there were no pages isolated in the block or if the block is
* rescanned twice in a row.
*/
- if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
+ if (low_pfn == end_pfn && (!nr_isolated || cc->finish_pageblock)) {
if (valid_page && !skip_updated)
set_pageblock_skip(valid_page);
update_cached_migrate(cc, low_pfn);
if (cc->ignore_skip_hint)
return pfn;
+ /*
+ * If the pageblock should be finished then do not select a different
+ * pageblock.
+ */
+ if (cc->finish_pageblock)
+ return pfn;
+
/*
* If the migrate_pfn is not at the start of a zone or the start
* of a pageblock then assume this is a continuation of a previous
pfn = cc->zone->zone_start_pfn;
cc->fast_search_fail = 0;
found_block = true;
+ set_pageblock_skip(freepage);
break;
}
}
struct zone *zone;
zone = &pgdat->node_zones[zoneid];
+ if (!populated_zone(zone))
+ continue;
score += fragmentation_score_zone_weighted(zone);
}
if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
return ret;
- /* huh, compaction_suitable is returning something unexpected */
- VM_BUG_ON(ret != COMPACT_CONTINUE);
-
/*
* Clear pageblock skip if there were failures recently and compaction
* is about to be retried after being deferred.
unsigned long iteration_start_pfn = cc->migrate_pfn;
/*
- * Avoid multiple rescans which can happen if a page cannot be
- * isolated (dirty/writeback in async mode) or if the migrated
- * pages are being allocated before the pageblock is cleared.
- * The first rescan will capture the entire pageblock for
- * migration. If it fails, it'll be marked skip and scanning
- * will proceed as normal.
+ * Avoid multiple rescans of the same pageblock which can
+ * happen if a page cannot be isolated (dirty/writeback in
+ * async mode) or if the migrated pages are being allocated
+ * before the pageblock is cleared. The first rescan will
+ * capture the entire pageblock for migration. If it fails,
+ * it'll be marked skip and scanning will proceed as normal.
*/
- cc->rescan = false;
+ cc->finish_pageblock = false;
if (pageblock_start_pfn(last_migrated_pfn) ==
pageblock_start_pfn(iteration_start_pfn)) {
- cc->rescan = true;
+ cc->finish_pageblock = true;
}
+ rescan:
switch (isolate_migratepages(cc)) {
case ISOLATE_ABORT:
ret = COMPACT_CONTENDED;
goto out;
}
/*
- * We failed to migrate at least one page in the current
- * order-aligned block, so skip the rest of it.
+ * If an ASYNC or SYNC_LIGHT fails to migrate a page
+ * within the current order-aligned block, scan the
+ * remainder of the pageblock. This will mark the
+ * pageblock "skip" to avoid rescanning in the near
+ * future. This will isolate more pages than necessary
+ * for the request but avoid loops due to
+ * fast_find_migrateblock revisiting blocks that were
+ * recently partially scanned.
*/
- if (cc->direct_compaction &&
- (cc->mode == MIGRATE_ASYNC)) {
- cc->migrate_pfn = block_end_pfn(
- cc->migrate_pfn - 1, cc->order);
- /* Draining pcplists is useless in this case */
- last_migrated_pfn = 0;
+ if (cc->direct_compaction && !cc->finish_pageblock &&
+ (cc->mode < MIGRATE_SYNC)) {
+ cc->finish_pageblock = true;
+
+ /*
+ * Draining pcplists does not help THP if
+ * any page failed to migrate. Even after
+ * drain, the pageblock will not be free.
+ */
+ if (cc->order == COMPACTION_HPAGE_ORDER)
+ last_migrated_pfn = 0;
+
+ goto rescan;
}
}
+ /* Stop if a page has been captured */
+ if (capc && capc->page) {
+ ret = COMPACT_SUCCESS;
+ break;
+ }
+
check_drain:
/*
* Has the migration scanner moved away from the previous
last_migrated_pfn = 0;
}
}
-
- /* Stop if a page has been captured */
- if (capc && capc->page) {
- ret = COMPACT_SUCCESS;
- break;
- }
}
out:
trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
+ VM_BUG_ON(!list_empty(&cc->freepages));
+ VM_BUG_ON(!list_empty(&cc->migratepages));
+
return ret;
}
ret = compact_zone(&cc, &capc);
- VM_BUG_ON(!list_empty(&cc.freepages));
- VM_BUG_ON(!list_empty(&cc.migratepages));
-
/*
* Make sure we hide capture control first before we read the captured
* page pointer, otherwise an interrupt could free and capture a page
compact_zone(&cc, NULL);
- VM_BUG_ON(!list_empty(&cc.freepages));
- VM_BUG_ON(!list_empty(&cc.migratepages));
+ count_compact_events(KCOMPACTD_MIGRATE_SCANNED,
+ cc.total_migrate_scanned);
+ count_compact_events(KCOMPACTD_FREE_SCANNED,
+ cc.total_free_scanned);
}
}
cc.zone = zone;
compact_zone(&cc, NULL);
-
- VM_BUG_ON(!list_empty(&cc.freepages));
- VM_BUG_ON(!list_empty(&cc.migratepages));
}
}
continue;
pgdat->proactive_compact_trigger = true;
+ trace_mm_compaction_wakeup_kcompactd(pgdat->node_id, -1,
+ pgdat->nr_zones - 1);
wake_up_interruptible(&pgdat->kcompactd_wait);
}
}
cc.total_migrate_scanned);
count_compact_events(KCOMPACTD_FREE_SCANNED,
cc.total_free_scanned);
-
- VM_BUG_ON(!list_empty(&cc.freepages));
- VM_BUG_ON(!list_empty(&cc.migratepages));
}
/*
#include <linux/ramfs.h>
#include <linux/page_idle.h>
#include <linux/migrate.h>
+#include <linux/pipe_fs_i.h>
+#include <linux/splice.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
* ->i_pages lock (__sync_single_inode)
*
* ->i_mmap_rwsem
- * ->anon_vma.lock (vma_adjust)
+ * ->anon_vma.lock (vma_merge)
*
* ->anon_vma.lock
* ->page_table_lock or pte_lock (anon_vma_prepare and various)
bool filemap_range_has_page(struct address_space *mapping,
loff_t start_byte, loff_t end_byte)
{
- struct page *page;
+ struct folio *folio;
XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
pgoff_t max = end_byte >> PAGE_SHIFT;
rcu_read_lock();
for (;;) {
- page = xas_find(&xas, max);
- if (xas_retry(&xas, page))
+ folio = xas_find(&xas, max);
+ if (xas_retry(&xas, folio))
continue;
/* Shadow entries don't count */
- if (xa_is_value(page))
+ if (xa_is_value(folio))
continue;
/*
* We don't need to try to pin this page; we're about to
}
rcu_read_unlock();
- return page != NULL;
+ return folio != NULL;
}
EXPORT_SYMBOL(filemap_range_has_page);
{
pgoff_t index = start_byte >> PAGE_SHIFT;
pgoff_t end = end_byte >> PAGE_SHIFT;
- struct pagevec pvec;
- int nr_pages;
+ struct folio_batch fbatch;
+ unsigned nr_folios;
+
+ folio_batch_init(&fbatch);
- pagevec_init(&pvec);
while (index <= end) {
unsigned i;
- nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index,
- end, PAGECACHE_TAG_WRITEBACK);
- if (!nr_pages)
+ nr_folios = filemap_get_folios_tag(mapping, &index, end,
+ PAGECACHE_TAG_WRITEBACK, &fbatch);
+
+ if (!nr_folios)
break;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
+ for (i = 0; i < nr_folios; i++) {
+ struct folio *folio = fbatch.folios[i];
- wait_on_page_writeback(page);
- ClearPageError(page);
+ folio_wait_writeback(folio);
+ folio_clear_error(folio);
}
- pagevec_release(&pvec);
+ folio_batch_release(&fbatch);
cond_resched();
}
}
EXPORT_SYMBOL(filemap_get_folios_contig);
/**
- * find_get_pages_range_tag - Find and return head pages matching @tag.
- * @mapping: the address_space to search
- * @index: the starting page index
- * @end: The final page index (inclusive)
- * @tag: the tag index
- * @nr_pages: the maximum number of pages
- * @pages: where the resulting pages are placed
+ * filemap_get_folios_tag - Get a batch of folios matching @tag
+ * @mapping: The address_space to search
+ * @start: The starting page index
+ * @end: The final page index (inclusive)
+ * @tag: The tag index
+ * @fbatch: The batch to fill
*
- * Like find_get_pages_range(), except we only return head pages which are
- * tagged with @tag. @index is updated to the index immediately after the
- * last page we return, ready for the next iteration.
+ * Same as filemap_get_folios(), but only returning folios tagged with @tag.
*
- * Return: the number of pages which were found.
+ * Return: The number of folios found.
+ * Also update @start to index the next folio for traversal.
*/
- unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
- pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
- struct page **pages)
+ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
+ pgoff_t end, xa_mark_t tag, struct folio_batch *fbatch)
{
- XA_STATE(xas, &mapping->i_pages, *index);
+ XA_STATE(xas, &mapping->i_pages, *start);
struct folio *folio;
- unsigned ret = 0;
-
- if (unlikely(!nr_pages))
- return 0;
rcu_read_lock();
- while ((folio = find_get_entry(&xas, end, tag))) {
+ while ((folio = find_get_entry(&xas, end, tag)) != NULL) {
/*
* Shadow entries should never be tagged, but this iteration
* is lockless so there is a window for page reclaim to evict
- * a page we saw tagged. Skip over it.
+ * a page we saw tagged. Skip over it.
*/
if (xa_is_value(folio))
continue;
+ if (!folio_batch_add(fbatch, folio)) {
+ unsigned long nr = folio_nr_pages(folio);
- pages[ret] = &folio->page;
- if (++ret == nr_pages) {
- *index = folio->index + folio_nr_pages(folio);
+ if (folio_test_hugetlb(folio))
+ nr = 1;
+ *start = folio->index + nr;
goto out;
}
}
-
/*
- * We come here when we got to @end. We take care to not overflow the
- * index @index as it confuses some of the callers. This breaks the
- * iteration when there is a page at index -1 but that is already
- * broken anyway.
+ * We come here when there is no page beyond @end. We take care to not
+ * overflow the index @start as it confuses some of the callers. This
+ * breaks the iteration when there is a page at index -1 but that is
+ * already broke anyway.
*/
if (end == (pgoff_t)-1)
- *index = (pgoff_t)-1;
+ *start = (pgoff_t)-1;
else
- *index = end + 1;
+ *start = end + 1;
out:
rcu_read_unlock();
- return ret;
+ return folio_batch_count(fbatch);
}
- EXPORT_SYMBOL(find_get_pages_range_tag);
+ EXPORT_SYMBOL(filemap_get_folios_tag);
/*
* CD/DVDs are error prone. When a medium error occurs, the driver may fail
}
static bool filemap_range_uptodate(struct address_space *mapping,
- loff_t pos, struct iov_iter *iter, struct folio *folio)
+ loff_t pos, size_t count, struct folio *folio,
+ bool need_uptodate)
{
- int count;
-
if (folio_test_uptodate(folio))
return true;
/* pipes can't handle partially uptodate pages */
- if (iov_iter_is_pipe(iter))
+ if (need_uptodate)
return false;
if (!mapping->a_ops->is_partially_uptodate)
return false;
if (mapping->host->i_blkbits >= folio_shift(folio))
return false;
- count = iter->count;
if (folio_pos(folio) > pos) {
count -= folio_pos(folio) - pos;
pos = 0;
}
static int filemap_update_page(struct kiocb *iocb,
- struct address_space *mapping, struct iov_iter *iter,
- struct folio *folio)
+ struct address_space *mapping, size_t count,
+ struct folio *folio, bool need_uptodate)
{
int error;
goto unlock;
error = 0;
- if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, folio))
+ if (filemap_range_uptodate(mapping, iocb->ki_pos, count, folio,
+ need_uptodate))
goto unlock;
error = -EAGAIN;
return 0;
}
-static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
- struct folio_batch *fbatch)
+static int filemap_get_pages(struct kiocb *iocb, size_t count,
+ struct folio_batch *fbatch, bool need_uptodate)
{
struct file *filp = iocb->ki_filp;
struct address_space *mapping = filp->f_mapping;
struct folio *folio;
int err = 0;
- last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
+ /* "last_index" is the index of the page beyond the end of the read */
+ last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
retry:
if (fatal_signal_pending(current))
return -EINTR;
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & IOCB_NOIO)
return -EAGAIN;
page_cache_sync_readahead(mapping, ra, filp, index,
last_index - index);
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
}
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
if ((iocb->ki_flags & IOCB_WAITQ) &&
folio_batch_count(fbatch) > 1)
iocb->ki_flags |= IOCB_NOWAIT;
- err = filemap_update_page(iocb, mapping, iter, folio);
+ err = filemap_update_page(iocb, mapping, count, folio,
+ need_uptodate);
if (err)
goto err;
}
if (unlikely(iocb->ki_pos >= i_size_read(inode)))
break;
- error = filemap_get_pages(iocb, iter, &fbatch);
+ error = filemap_get_pages(iocb, iter->count, &fbatch,
+ iov_iter_is_pipe(iter));
if (error < 0)
break;
}
EXPORT_SYMBOL(generic_file_read_iter);
+/*
+ * Splice subpages from a folio into a pipe.
+ */
+size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+ struct folio *folio, loff_t fpos, size_t size)
+{
+ struct page *page;
+ size_t spliced = 0, offset = offset_in_folio(folio, fpos);
+
+ page = folio_page(folio, offset / PAGE_SIZE);
+ size = min(size, folio_size(folio) - offset);
+ offset %= PAGE_SIZE;
+
+ while (spliced < size &&
+ !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+ struct pipe_buffer *buf = pipe_head_buf(pipe);
+ size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
+
+ *buf = (struct pipe_buffer) {
+ .ops = &page_cache_pipe_buf_ops,
+ .page = page,
+ .offset = offset,
+ .len = part,
+ };
+ folio_get(folio);
+ pipe->head++;
+ page++;
+ spliced += part;
+ offset = 0;
+ }
+
+ return spliced;
+}
+
+/*
+ * Splice folios from the pagecache of a buffered (ie. non-O_DIRECT) file into
+ * a pipe.
+ */
+ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
+ struct pipe_inode_info *pipe,
+ size_t len, unsigned int flags)
+{
+ struct folio_batch fbatch;
+ struct kiocb iocb;
+ size_t total_spliced = 0, used, npages;
+ loff_t isize, end_offset;
+ bool writably_mapped;
+ int i, error = 0;
+
+ init_sync_kiocb(&iocb, in);
+ iocb.ki_pos = *ppos;
+
+ /* Work out how much data we can actually add into the pipe */
+ used = pipe_occupancy(pipe->head, pipe->tail);
+ npages = max_t(ssize_t, pipe->max_usage - used, 0);
+ len = min_t(size_t, len, npages * PAGE_SIZE);
+
+ folio_batch_init(&fbatch);
+
+ do {
+ cond_resched();
+
+ if (*ppos >= i_size_read(file_inode(in)))
+ break;
+
+ iocb.ki_pos = *ppos;
+ error = filemap_get_pages(&iocb, len, &fbatch, true);
+ if (error < 0)
+ break;
+
+ /*
+ * i_size must be checked after we know the pages are Uptodate.
+ *
+ * Checking i_size after the check allows us to calculate
+ * the correct value for "nr", which means the zero-filled
+ * part of the page is not copied back to userspace (unless
+ * another truncate extends the file - this is desired though).
+ */
+ isize = i_size_read(file_inode(in));
+ if (unlikely(*ppos >= isize))
+ break;
+ end_offset = min_t(loff_t, isize, *ppos + len);
+
+ /*
+ * Once we start copying data, we don't want to be touching any
+ * cachelines that might be contended:
+ */
+ writably_mapped = mapping_writably_mapped(in->f_mapping);
+
+ for (i = 0; i < folio_batch_count(&fbatch); i++) {
+ struct folio *folio = fbatch.folios[i];
+ size_t n;
+
+ if (folio_pos(folio) >= end_offset)
+ goto out;
+ folio_mark_accessed(folio);
+
+ /*
+ * If users can be writing to this folio using arbitrary
+ * virtual addresses, take care of potential aliasing
+ * before reading the folio on the kernel side.
+ */
+ if (writably_mapped)
+ flush_dcache_folio(folio);
+
+ n = min_t(loff_t, len, isize - *ppos);
+ n = splice_folio_into_pipe(pipe, folio, *ppos, n);
+ if (!n)
+ goto out;
+ len -= n;
+ total_spliced += n;
+ *ppos += n;
+ in->f_ra.prev_pos = *ppos;
+ if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+ goto out;
+ }
+
+ folio_batch_release(&fbatch);
+ } while (len);
+
+out:
+ folio_batch_release(&fbatch);
+ file_accessed(in);
+
+ return total_spliced ? total_spliced : error;
+}
+EXPORT_SYMBOL(filemap_splice_read);
+
static inline loff_t folio_seek_hole_data(struct xa_state *xas,
struct address_space *mapping, struct folio *folio,
loff_t start, loff_t end, bool seek_data)
}
EXPORT_SYMBOL(filemap_fault);
- static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
+ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
+ pgoff_t start)
{
struct mm_struct *mm = vmf->vma->vm_mm;
/* Huge page is mapped? No need to proceed. */
if (pmd_trans_huge(*vmf->pmd)) {
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
return true;
}
- if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
+ if (pmd_none(*vmf->pmd) && folio_test_pmd_mappable(folio)) {
+ struct page *page = folio_file_page(folio, start);
vm_fault_t ret = do_set_pmd(vmf, page);
if (!ret) {
/* The page is mapped successfully, reference consumed. */
- unlock_page(page);
+ folio_unlock(folio);
return true;
}
}
/* See comment in handle_pte_fault() */
if (pmd_devmap_trans_unstable(vmf->pmd)) {
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
return true;
}
if (!folio)
goto out;
- if (filemap_map_pmd(vmf, &folio->page)) {
+ if (filemap_map_pmd(vmf, folio, start_pgoff)) {
ret = VM_FAULT_NOPAGE;
goto out;
}
}
EXPORT_SYMBOL(read_cache_folio);
+ /**
+ * mapping_read_folio_gfp - Read into page cache, using specified allocation flags.
+ * @mapping: The address_space for the folio.
+ * @index: The index that the allocated folio will contain.
+ * @gfp: The page allocator flags to use if allocating.
+ *
+ * This is the same as "read_cache_folio(mapping, index, NULL, NULL)", but with
+ * any new memory allocations done using the specified allocation flags.
+ *
+ * The most likely error from this function is EIO, but ENOMEM is
+ * possible and so is EINTR. If ->read_folio returns another error,
+ * that will be returned to the caller.
+ *
+ * The function expects mapping->invalidate_lock to be already held.
+ *
+ * Return: Uptodate folio on success, ERR_PTR() on failure.
+ */
+ struct folio *mapping_read_folio_gfp(struct address_space *mapping,
+ pgoff_t index, gfp_t gfp)
+ {
+ return do_read_cache_folio(mapping, index, NULL, NULL, gfp);
+ }
+ EXPORT_SYMBOL(mapping_read_folio_gfp);
+
static struct page *do_read_cache_page(struct address_space *mapping,
pgoff_t index, filler_t *filler, struct file *file, gfp_t gfp)
{
* own flags.
*/
if (!in_pf && shmem_file(vma->vm_file))
- return shmem_huge_enabled(vma, !enforce_sysfs);
+ return shmem_is_huge(file_inode(vma->vm_file), vma->vm_pgoff,
+ !enforce_sysfs, vma->vm_mm, vm_flags);
/* Enforce sysfs THP requirements as necessary */
if (enforce_sysfs &&
}
#ifdef CONFIG_MEMCG
- static inline struct deferred_split *get_deferred_split_queue(struct page *page)
+ static inline
+ struct deferred_split *get_deferred_split_queue(struct folio *folio)
{
- struct mem_cgroup *memcg = page_memcg(compound_head(page));
- struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));
+ struct mem_cgroup *memcg = folio_memcg(folio);
+ struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
if (memcg)
return &memcg->deferred_split_queue;
return &pgdat->deferred_split_queue;
}
#else
- static inline struct deferred_split *get_deferred_split_queue(struct page *page)
+ static inline
+ struct deferred_split *get_deferred_split_queue(struct folio *folio)
{
- struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));
+ struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
return &pgdat->deferred_split_queue;
}
void prep_transhuge_page(struct page *page)
{
- /*
- * we use page->mapping and page->index in second tail page
- * as list_head: assuming THP order >= 2
- */
+ struct folio *folio = (struct folio *)page;
- INIT_LIST_HEAD(page_deferred_list(page));
+ VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
+ INIT_LIST_HEAD(&folio->_deferred_list);
set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
}
static inline bool is_transparent_hugepage(struct page *page)
{
+ struct folio *folio;
+
if (!PageCompound(page))
return false;
- page = compound_head(page);
- return is_huge_zero_page(page) ||
- page[1].compound_dtor == TRANSHUGE_PAGE_DTOR;
+ folio = page_folio(page);
+ return is_huge_zero_page(&folio->page) ||
+ folio->_folio_dtor == TRANSHUGE_PAGE_DTOR;
}
static unsigned long __thp_get_unmapped_area(struct file *filp,
assert_spin_locked(pmd_lockptr(mm, pmd));
- /* FOLL_GET and FOLL_PIN are mutually exclusive. */
- if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
- (FOLL_PIN | FOLL_GET)))
- return NULL;
-
if (flags & FOLL_WRITE && !pmd_write(*pmd))
return NULL;
if (flags & FOLL_WRITE && !pud_write(*pud))
return NULL;
- /* FOLL_GET and FOLL_PIN are mutually exclusive. */
- if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
- (FOLL_PIN | FOLL_GET)))
- return NULL;
-
if (pud_present(*pud) && pud_devmap(*pud))
/* pass */;
else
{
spinlock_t *ptl;
pmd_t orig_pmd;
- struct page *page;
+ struct folio *folio;
struct mm_struct *mm = tlb->mm;
bool ret = false;
goto out;
}
- page = pmd_page(orig_pmd);
+ folio = pfn_folio(pmd_pfn(orig_pmd));
/*
- * If other processes are mapping this page, we couldn't discard
- * the page unless they all do MADV_FREE so let's skip the page.
+ * If other processes are mapping this folio, we couldn't discard
+ * the folio unless they all do MADV_FREE so let's skip the folio.
*/
- if (total_mapcount(page) != 1)
+ if (folio_mapcount(folio) != 1)
goto out;
- if (!trylock_page(page))
+ if (!folio_trylock(folio))
goto out;
/*
* will deactivate only them.
*/
if (next - addr != HPAGE_PMD_SIZE) {
- get_page(page);
+ folio_get(folio);
spin_unlock(ptl);
- split_huge_page(page);
- unlock_page(page);
- put_page(page);
+ split_folio(folio);
+ folio_unlock(folio);
+ folio_put(folio);
goto out_unlocked;
}
- if (PageDirty(page))
- ClearPageDirty(page);
- unlock_page(page);
+ if (folio_test_dirty(folio))
+ folio_clear_dirty(folio);
+ folio_unlock(folio);
if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
pmdp_invalidate(vma, addr, pmd);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
}
- mark_page_lazyfree(page);
+ folio_mark_lazyfree(folio);
ret = true;
out:
spin_unlock(ptl);
oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
entry = pmd_modify(oldpmd, newprot);
- if (uffd_wp) {
- entry = pmd_wrprotect(entry);
+ if (uffd_wp)
entry = pmd_mkuffd_wp(entry);
- } else if (uffd_wp_resolve) {
+ else if (uffd_wp_resolve)
/*
* Leave the write bit to be handled by PF interrupt
* handler, then things like COW could be properly
* handled.
*/
entry = pmd_clear_uffd_wp(entry);
- }
/* See change_pte_range(). */
if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
spinlock_t *ptl;
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
address & HPAGE_PUD_MASK,
(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
mmu_notifier_invalidate_range_start(&range);
spinlock_t *ptl;
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
address & HPAGE_PMD_MASK,
(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
* of swap cache pages that store the swp_entry_t in tail pages.
* Fix up and warn once if private is unexpectedly set.
*
- * What of 32-bit systems, on which head[1].compound_pincount overlays
+ * What of 32-bit systems, on which folio->_pincount overlays
* head[1].private? No problem: THP_SWAP is not enabled on 32-bit, and
- * compound_pincount must be 0 for folio_ref_freeze() to have succeeded.
+ * pincount must be 0 for folio_ref_freeze() to have succeeded.
*/
if (!folio_test_swapcache(page_folio(head))) {
VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct folio *folio = page_folio(page);
- struct deferred_split *ds_queue = get_deferred_split_queue(&folio->page);
+ struct deferred_split *ds_queue = get_deferred_split_queue(folio);
XA_STATE(xas, &folio->mapping->i_pages, folio->index);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
/* Prevent deferred_split_scan() touching ->_refcount */
spin_lock(&ds_queue->split_queue_lock);
if (folio_ref_freeze(folio, 1 + extra_pins)) {
- if (!list_empty(page_deferred_list(&folio->page))) {
+ if (!list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
- list_del(page_deferred_list(&folio->page));
+ list_del(&folio->_deferred_list);
}
spin_unlock(&ds_queue->split_queue_lock);
if (mapping) {
void free_transhuge_page(struct page *page)
{
- struct deferred_split *ds_queue = get_deferred_split_queue(page);
+ struct folio *folio = (struct folio *)page;
+ struct deferred_split *ds_queue = get_deferred_split_queue(folio);
unsigned long flags;
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
- if (!list_empty(page_deferred_list(page))) {
+ if (!list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
- list_del(page_deferred_list(page));
+ list_del(&folio->_deferred_list);
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
free_compound_page(page);
}
- void deferred_split_huge_page(struct page *page)
+ void deferred_split_folio(struct folio *folio)
{
- struct deferred_split *ds_queue = get_deferred_split_queue(page);
+ struct deferred_split *ds_queue = get_deferred_split_queue(folio);
#ifdef CONFIG_MEMCG
- struct mem_cgroup *memcg = page_memcg(compound_head(page));
+ struct mem_cgroup *memcg = folio_memcg(folio);
#endif
unsigned long flags;
- VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+ VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
/*
* The try_to_unmap() in page reclaim path might reach here too,
* this may cause a race condition to corrupt deferred split queue.
- * And, if page reclaim is already handling the same page, it is
+ * And, if page reclaim is already handling the same folio, it is
* unnecessary to handle it again in shrinker.
*
- * Check PageSwapCache to determine if the page is being
- * handled by page reclaim since THP swap would add the page into
+ * Check the swapcache flag to determine if the folio is being
+ * handled by page reclaim since THP swap would add the folio into
* swap cache before calling try_to_unmap().
*/
- if (PageSwapCache(page))
+ if (folio_test_swapcache(folio))
+ return;
+
+ if (!list_empty(&folio->_deferred_list))
return;
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
- if (list_empty(page_deferred_list(page))) {
+ if (list_empty(&folio->_deferred_list)) {
count_vm_event(THP_DEFERRED_SPLIT_PAGE);
- list_add_tail(page_deferred_list(page), &ds_queue->split_queue);
+ list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
ds_queue->split_queue_len++;
#ifdef CONFIG_MEMCG
if (memcg)
- set_shrinker_bit(memcg, page_to_nid(page),
+ set_shrinker_bit(memcg, folio_nid(folio),
deferred_split_shrinker.id);
#endif
}
struct pglist_data *pgdata = NODE_DATA(sc->nid);
struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
unsigned long flags;
- LIST_HEAD(list), *pos, *next;
- struct page *page;
+ LIST_HEAD(list);
+ struct folio *folio, *next;
int split = 0;
#ifdef CONFIG_MEMCG
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
/* Take pin on all head pages to avoid freeing them under us */
- list_for_each_safe(pos, next, &ds_queue->split_queue) {
- page = list_entry((void *)pos, struct page, deferred_list);
- page = compound_head(page);
- if (get_page_unless_zero(page)) {
- list_move(page_deferred_list(page), &list);
+ list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
+ _deferred_list) {
+ if (folio_try_get(folio)) {
+ list_move(&folio->_deferred_list, &list);
} else {
- /* We lost race with put_compound_page() */
- list_del_init(page_deferred_list(page));
+ /* We lost race with folio_put() */
+ list_del_init(&folio->_deferred_list);
ds_queue->split_queue_len--;
}
if (!--sc->nr_to_scan)
}
spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
- list_for_each_safe(pos, next, &list) {
- page = list_entry((void *)pos, struct page, deferred_list);
- if (!trylock_page(page))
+ list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+ if (!folio_trylock(folio))
goto next;
/* split_huge_page() removes page from list on success */
- if (!split_huge_page(page))
+ if (!split_folio(folio))
split++;
- unlock_page(page);
+ folio_unlock(folio);
next:
- put_page(page);
+ folio_put(folio);
}
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
{
struct zone *zone;
struct page *page;
+ struct folio *folio;
unsigned long pfn, max_zone_pfn;
unsigned long total = 0, split = 0;
int nr_pages;
page = pfn_to_online_page(pfn);
- if (!page || !get_page_unless_zero(page))
+ if (!page || PageTail(page))
+ continue;
+ folio = page_folio(page);
+ if (!folio_try_get(folio))
continue;
- if (zone != page_zone(page))
+ if (unlikely(page_folio(page) != folio))
goto next;
- if (!PageHead(page) || PageHuge(page) || !PageLRU(page))
+ if (zone != folio_zone(folio))
+ goto next;
+
+ if (!folio_test_large(folio)
+ || folio_test_hugetlb(folio)
+ || !folio_test_lru(folio))
goto next;
total++;
- lock_page(page);
- nr_pages = thp_nr_pages(page);
- if (!split_huge_page(page))
+ folio_lock(folio);
+ nr_pages = folio_nr_pages(folio);
+ if (!split_folio(folio))
split++;
pfn += nr_pages - 1;
- unlock_page(page);
+ folio_unlock(folio);
next:
- put_page(page);
+ folio_put(folio);
cond_resched();
}
}
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
- if (is_writable_migration_entry(entry))
- pmde = maybe_pmd_mkwrite(pmde, vma);
if (pmd_swp_uffd_wp(*pvmw->pmd))
- pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
+ pmde = pmd_mkuffd_wp(pmde);
if (!is_migration_entry_young(entry))
pmde = pmd_mkold(pmde);
/* NOTE: this may contain setting soft-dirty on some archs */
if (PageDirty(new) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);
+ if (is_writable_migration_entry(entry))
+ pmde = maybe_pmd_mkwrite(pmde, vma);
+ else
+ pmde = pmd_wrprotect(pmde);
if (PageAnon(new)) {
rmap_t rmap_flags = RMAP_COMPOUND;
#define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
- __GFP_ATOMIC|__GFP_NOLOCKDEP)
+ __GFP_NOLOCKDEP)
/* The GFP flags allowed during early boot */
#define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
void page_writeback_init(void);
+ /*
+ * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
+ * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit
+ * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
+ * leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
+ */
+ #define COMPOUND_MAPPED 0x800000
+ #define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
+
+ /*
+ * How many individual pages have an elevated _mapcount. Excludes
+ * the folio's entire_mapcount.
+ */
+ static inline int folio_nr_pages_mapped(struct folio *folio)
+ {
+ return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
+ }
+
static inline void *folio_raw_mapping(struct folio *folio)
{
unsigned long mapping = (unsigned long)folio->mapping;
return ret;
}
- static inline bool page_evictable(struct page *page)
- {
- bool ret;
-
- /* Prevent address_space of inode and swap cache from being freed */
- rcu_read_lock();
- ret = !mapping_unevictable(page_mapping(page)) && !PageMlocked(page);
- rcu_read_unlock();
- return ret;
- }
-
/*
* Turn a non-refcounted page (->_refcount == 0) into refcounted with
* a count of one.
/*
* in mm/vmscan.c:
*/
- int isolate_lru_page(struct page *page);
- int folio_isolate_lru(struct folio *folio);
+ bool isolate_lru_page(struct page *page);
+ bool folio_isolate_lru(struct folio *folio);
void putback_lru_page(struct page *page);
void folio_putback_lru(struct folio *folio);
extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
int split_free_page(struct page *free_page,
unsigned int order, unsigned long split_pfn_offset);
+ /*
+ * This will have no effect, other than possibly generating a warning, if the
+ * caller passes in a non-large folio.
+ */
+ static inline void folio_set_order(struct folio *folio, unsigned int order)
+ {
+ if (WARN_ON_ONCE(!folio_test_large(folio)))
+ return;
+
+ folio->_folio_order = order;
+ #ifdef CONFIG_64BIT
+ /*
+ * When hugetlb dissolves a folio, we need to clear the tail
+ * page, rather than setting nr_pages to 1.
+ */
+ folio->_folio_nr_pages = order ? 1U << order : 0;
+ #endif
+ }
+
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
/*
bool proactive_compaction; /* kcompactd proactive compaction */
bool whole_zone; /* Whole zone should/has been scanned */
bool contended; /* Signal lock contention */
- bool rescan; /* Rescanning the same pageblock */
+ bool finish_pageblock; /* Scan the remainder of a pageblock. Used
+ * when there are potentially transient
+ * isolation or migration failures to
+ * ensure forward progress.
+ */
bool alloc_contig; /* alloc_contig_range allocation */
};
extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
unsigned long len);
/*
- * mlock_vma_page() and munlock_vma_page():
+ * mlock_vma_folio() and munlock_vma_folio():
* should be called with vma's mmap_lock held for read or write,
* under page table lock for the pte/pmd being added or removed.
*
- * mlock is usually called at the end of page_add_*_rmap(),
- * munlock at the end of page_remove_rmap(); but new anon
- * pages are managed by lru_cache_add_inactive_or_unevictable()
- * calling mlock_new_page().
+ * mlock is usually called at the end of page_add_*_rmap(), munlock at
+ * the end of page_remove_rmap(); but new anon folios are managed by
+ * folio_add_lru_vma() calling mlock_new_folio().
*
* @compound is used to include pmd mappings of THPs, but filter out
* pte mappings of THPs, which cannot be consistently counted: a pte
mlock_folio(folio);
}
- static inline void mlock_vma_page(struct page *page,
- struct vm_area_struct *vma, bool compound)
- {
- mlock_vma_folio(page_folio(page), vma, compound);
- }
-
- void munlock_page(struct page *page);
- static inline void munlock_vma_page(struct page *page,
+ void munlock_folio(struct folio *folio);
+ static inline void munlock_vma_folio(struct folio *folio,
struct vm_area_struct *vma, bool compound)
{
if (unlikely(vma->vm_flags & VM_LOCKED) &&
- (compound || !PageTransCompound(page)))
- munlock_page(page);
+ (compound || !folio_test_large(folio)))
+ munlock_folio(folio);
}
- void mlock_new_page(struct page *page);
- bool need_mlock_page_drain(int cpu);
- void mlock_page_drain_local(void);
- void mlock_page_drain_remote(int cpu);
+
+ void mlock_new_folio(struct folio *folio);
+ bool need_mlock_drain(int cpu);
+ void mlock_drain_local(void);
+ void mlock_drain_remote(int cpu);
extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
}
#else /* !CONFIG_MMU */
static inline void unmap_mapping_folio(struct folio *folio) { }
- static inline void mlock_vma_page(struct page *page,
- struct vm_area_struct *vma, bool compound) { }
- static inline void munlock_vma_page(struct page *page,
- struct vm_area_struct *vma, bool compound) { }
- static inline void mlock_new_page(struct page *page) { }
- static inline bool need_mlock_page_drain(int cpu) { return false; }
- static inline void mlock_page_drain_local(void) { }
- static inline void mlock_page_drain_remote(int cpu) { }
+ static inline void mlock_new_folio(struct folio *folio) { }
+ static inline bool need_mlock_drain(int cpu) { return false; }
+ static inline void mlock_drain_local(void) { }
+ static inline void mlock_drain_remote(int cpu) { }
static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
{
}
#define ALLOC_OOM ALLOC_NO_WATERMARKS
#endif
- #define ALLOC_HARDER 0x10 /* try to alloc harder */
- #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */
+ #define ALLOC_NON_BLOCK 0x10 /* Caller cannot block. Allow access
+ * to 25% of the min watermark or
+ * 62.5% if __GFP_HIGH is set.
+ */
+ #define ALLOC_MIN_RESERVE 0x20 /* __GFP_HIGH set. Allow access to 50%
+ * of the min watermark.
+ */
#define ALLOC_CPUSET 0x40 /* check for correct cpuset */
#define ALLOC_CMA 0x80 /* allow allocations from CMA areas */
#ifdef CONFIG_ZONE_DMA32
#else
#define ALLOC_NOFRAGMENT 0x0
#endif
+ #define ALLOC_HIGHATOMIC 0x200 /* Allows access to MIGRATE_HIGHATOMIC */
#define ALLOC_KSWAPD 0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
+ /* Flags that allow allocations below the min watermark. */
+ #define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
enum ttu_flags;
struct tlbflush_unmap_batch;
gfp_t gfp_mask;
};
+/*
+ * mm/filemap.c
+ */
+size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+ struct folio *folio, loff_t fpos, size_t size);
+
/*
* mm/vmalloc.c
*/
* mm/gup.c
*/
struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
+ int __must_check try_grab_page(struct page *page, unsigned int flags);
+
+ enum {
+ /* mark page accessed */
+ FOLL_TOUCH = 1 << 16,
+ /* a retry, previous pass started an IO */
+ FOLL_TRIED = 1 << 17,
+ /* we are working on non-current tsk/mm */
+ FOLL_REMOTE = 1 << 18,
+ /* pages must be released via unpin_user_page */
+ FOLL_PIN = 1 << 19,
+ /* gup_fast: prevent fall-back to slow gup */
+ FOLL_FAST_ONLY = 1 << 20,
+ /* allow unlocking the mmap lock */
+ FOLL_UNLOCKABLE = 1 << 21,
+ };
+
+ /*
+ * Indicates for which pages that are write-protected in the page table,
+ * whether GUP has to trigger unsharing via FAULT_FLAG_UNSHARE such that the
+ * GUP pin will remain consistent with the pages mapped into the page tables
+ * of the MM.
+ *
+ * Temporary unmapping of PageAnonExclusive() pages or clearing of
+ * PageAnonExclusive() has to protect against concurrent GUP:
+ * * Ordinary GUP: Using the PT lock
+ * * GUP-fast and fork(): mm->write_protect_seq
+ * * GUP-fast and KSM or temporary unmapping (swap, migration): see
+ * page_try_share_anon_rmap()
+ *
+ * Must be called with the (sub)page that's actually referenced via the
+ * page table entry, which might not necessarily be the head page for a
+ * PTE-mapped THP.
+ *
+ * If the vma is NULL, we're coming from the GUP-fast path and might have
+ * to fallback to the slow path just to lookup the vma.
+ */
+ static inline bool gup_must_unshare(struct vm_area_struct *vma,
+ unsigned int flags, struct page *page)
+ {
+ /*
+ * FOLL_WRITE is implicitly handled correctly as the page table entry
+ * has to be writable -- and if it references (part of) an anonymous
+ * folio, that part is required to be marked exclusive.
+ */
+ if ((flags & (FOLL_WRITE | FOLL_PIN)) != FOLL_PIN)
+ return false;
+ /*
+ * Note: PageAnon(page) is stable until the page is actually getting
+ * freed.
+ */
+ if (!PageAnon(page)) {
+ /*
+ * We only care about R/O long-term pining: R/O short-term
+ * pinning does not have the semantics to observe successive
+ * changes through the process page tables.
+ */
+ if (!(flags & FOLL_LONGTERM))
+ return false;
+
+ /* We really need the vma ... */
+ if (!vma)
+ return true;
+
+ /*
+ * ... because we only care about writable private ("COW")
+ * mappings where we have to break COW early.
+ */
+ return is_cow_mapping(vma->vm_flags);
+ }
+
+ /* Paired with a memory barrier in page_try_share_anon_rmap(). */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_rmb();
+
+ /*
+ * Note that PageKsm() pages cannot be exclusive, and consequently,
+ * cannot get pinned.
+ */
+ return !PageAnonExclusive(page);
+ }
extern bool mirrored_kernelcore;
return !(vma->vm_flags & VM_SOFTDIRTY);
}
+ /*
+ * VMA Iterator functions shared between nommu and mmap
+ */
+ static inline int vma_iter_prealloc(struct vma_iterator *vmi)
+ {
+ return mas_preallocate(&vmi->mas, GFP_KERNEL);
+ }
+
+ static inline void vma_iter_clear(struct vma_iterator *vmi,
+ unsigned long start, unsigned long end)
+ {
+ mas_set_range(&vmi->mas, start, end - 1);
+ mas_store_prealloc(&vmi->mas, NULL);
+ }
+
+ static inline struct vm_area_struct *vma_iter_load(struct vma_iterator *vmi)
+ {
+ return mas_walk(&vmi->mas);
+ }
+
+ /* Store a VMA with preallocated memory */
+ static inline void vma_iter_store(struct vma_iterator *vmi,
+ struct vm_area_struct *vma)
+ {
+
+ #if defined(CONFIG_DEBUG_VM_MAPLE_TREE)
+ if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.index > vma->vm_start)) {
+ printk("%lu > %lu\n", vmi->mas.index, vma->vm_start);
+ printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+ printk("into slot %lu-%lu", vmi->mas.index, vmi->mas.last);
+ mt_dump(vmi->mas.tree);
+ }
+ if (WARN_ON(vmi->mas.node != MAS_START && vmi->mas.last < vma->vm_start)) {
+ printk("%lu < %lu\n", vmi->mas.last, vma->vm_start);
+ printk("store of vma %lu-%lu", vma->vm_start, vma->vm_end);
+ printk("into slot %lu-%lu", vmi->mas.index, vmi->mas.last);
+ mt_dump(vmi->mas.tree);
+ }
+ #endif
+
+ if (vmi->mas.node != MAS_START &&
+ ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+ vma_iter_invalidate(vmi);
+
+ vmi->mas.index = vma->vm_start;
+ vmi->mas.last = vma->vm_end - 1;
+ mas_store_prealloc(&vmi->mas, vma);
+ }
+
+ static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
+ struct vm_area_struct *vma, gfp_t gfp)
+ {
+ if (vmi->mas.node != MAS_START &&
+ ((vmi->mas.index > vma->vm_start) || (vmi->mas.last < vma->vm_start)))
+ vma_iter_invalidate(vmi);
+
+ vmi->mas.index = vma->vm_start;
+ vmi->mas.last = vma->vm_end - 1;
+ mas_store_gfp(&vmi->mas, vma, gfp);
+ if (unlikely(mas_is_err(&vmi->mas)))
+ return -ENOMEM;
+
+ return 0;
+ }
+
+ /*
+ * VMA lock generalization
+ */
+ struct vma_prepare {
+ struct vm_area_struct *vma;
+ struct vm_area_struct *adj_next;
+ struct file *file;
+ struct address_space *mapping;
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *insert;
+ struct vm_area_struct *remove;
+ struct vm_area_struct *remove2;
+ };
#endif /* __MM_INTERNAL_H */
extern enum kasan_mode kasan_mode __ro_after_init;
+ extern unsigned long kasan_page_alloc_sample;
+ extern unsigned int kasan_page_alloc_sample_order;
+ DECLARE_PER_CPU(long, kasan_page_alloc_skip);
+
static inline bool kasan_vmalloc_enabled(void)
{
return static_branch_likely(&kasan_flag_vmalloc);
return kasan_mode == KASAN_MODE_SYNC || kasan_mode == KASAN_MODE_ASYMM;
}
+ static inline bool kasan_sample_page_alloc(unsigned int order)
+ {
+ /* Fast-path for when sampling is disabled. */
+ if (kasan_page_alloc_sample == 1)
+ return true;
+
+ if (order < kasan_page_alloc_sample_order)
+ return true;
+
+ if (this_cpu_dec_return(kasan_page_alloc_skip) < 0) {
+ this_cpu_write(kasan_page_alloc_skip,
+ kasan_page_alloc_sample - 1);
+ return true;
+ }
+
+ return false;
+ }
+
#else /* CONFIG_KASAN_HW_TAGS */
static inline bool kasan_async_fault_possible(void)
return true;
}
+ static inline bool kasan_sample_page_alloc(unsigned int order)
+ {
+ return true;
+ }
+
#endif /* CONFIG_KASAN_HW_TAGS */
#ifdef CONFIG_KASAN_GENERIC
void *first_bad_addr;
struct kmem_cache *cache;
void *object;
+ size_t alloc_size;
/* Filled in by the mode-specific reporting code. */
const char *bug_type;
<< KASAN_SHADOW_SCALE_SHIFT);
}
- static inline bool addr_has_metadata(const void *addr)
+ static __always_inline bool addr_has_metadata(const void *addr)
{
return (kasan_reset_tag(addr) >=
kasan_shadow_to_mem((void *)KASAN_SHADOW_START));
#else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
- static inline bool addr_has_metadata(const void *addr)
+ static __always_inline bool addr_has_metadata(const void *addr)
{
return (is_vmalloc_addr(addr) || virt_addr_valid(addr));
}
#endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
void *kasan_find_first_bad_addr(void *addr, size_t size);
+ size_t kasan_get_alloc_size(void *object, struct kmem_cache *cache);
void kasan_complete_mode_report_info(struct kasan_report_info *info);
void kasan_metadata_fetch_row(char *buffer, void *row);
void __asan_set_shadow_f5(const void *addr, size_t size);
void __asan_set_shadow_f8(const void *addr, size_t size);
+void *__asan_memset(void *addr, int c, size_t len);
+void *__asan_memmove(void *dest, const void *src, size_t len);
+void *__asan_memcpy(void *dest, const void *src, size_t len);
+
void __hwasan_load1_noabort(unsigned long addr);
void __hwasan_store1_noabort(unsigned long addr);
void __hwasan_load2_noabort(unsigned long addr);
}
}
+ static void release_pte_folio(struct folio *folio)
+ {
+ node_stat_mod_folio(folio,
+ NR_ISOLATED_ANON + folio_is_file_lru(folio),
+ -folio_nr_pages(folio));
+ folio_unlock(folio);
+ folio_putback_lru(folio);
+ }
+
static void release_pte_page(struct page *page)
{
- mod_node_page_state(page_pgdat(page),
- NR_ISOLATED_ANON + page_is_file_lru(page),
- -compound_nr(page));
- unlock_page(page);
- putback_lru_page(page);
+ release_pte_folio(page_folio(page));
}
static void release_pte_pages(pte_t *pte, pte_t *_pte,
struct list_head *compound_pagelist)
{
- struct page *page, *tmp;
+ struct folio *folio, *tmp;
while (--_pte >= pte) {
pte_t pteval = *_pte;
+ unsigned long pfn;
- page = pte_page(pteval);
- if (!pte_none(pteval) && !is_zero_pfn(pte_pfn(pteval)) &&
- !PageCompound(page))
- release_pte_page(page);
+ if (pte_none(pteval))
+ continue;
+ pfn = pte_pfn(pteval);
+ if (is_zero_pfn(pfn))
+ continue;
+ folio = pfn_folio(pfn);
+ if (folio_test_large(folio))
+ continue;
+ release_pte_folio(folio);
}
- list_for_each_entry_safe(page, tmp, compound_pagelist, lru) {
- list_del(&page->lru);
- release_pte_page(page);
+ list_for_each_entry_safe(folio, tmp, compound_pagelist, lru) {
+ list_del(&folio->lru);
+ release_pte_folio(folio);
}
}
* Isolate the page to avoid collapsing an hugepage
* currently in use by the VM.
*/
- if (isolate_lru_page(page)) {
+ if (!isolate_lru_page(page)) {
unlock_page(page);
result = SCAN_DEL_PAGE_LRU;
goto out;
anon_vma_lock_write(vma->anon_vma);
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
- address, address + HPAGE_PMD_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
+ address + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
pte = pte_offset_map(pmd, address);
if (vma->anon_vma)
lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm, addr,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
addr + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
pmd = pmdp_collapse_flush(vma, addr, pmdp);
goto out_unlock;
}
- if (folio_isolate_lru(folio)) {
+ if (!folio_isolate_lru(folio)) {
result = SCAN_DEL_PAGE_LRU;
goto out_unlock;
}
case SCAN_CGROUP_CHARGE_FAIL:
return -EBUSY;
/* Resource temporary unavailable - trying again might succeed */
+ case SCAN_PAGE_COUNT:
case SCAN_PAGE_LOCK:
case SCAN_PAGE_LRU:
case SCAN_DEL_PAGE_LRU:
struct mm_struct *mm = vma->vm_mm;
int error;
pgoff_t pgoff;
+ VMA_ITERATOR(vmi, mm, start);
if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
*prev = vma;
}
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
- *prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
- vma->vm_file, pgoff, vma_policy(vma),
+ *prev = vma_merge(&vmi, mm, *prev, start, end, new_flags,
+ vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
vma->vm_userfaultfd_ctx, anon_name);
if (*prev) {
vma = *prev;
*prev = vma;
if (start != vma->vm_start) {
- if (unlikely(mm->map_count >= sysctl_max_map_count))
- return -ENOMEM;
- error = __split_vma(mm, vma, start, 1);
+ error = split_vma(&vmi, vma, start, 1);
if (error)
return error;
}
if (end != vma->vm_end) {
- if (unlikely(mm->map_count >= sysctl_max_map_count))
- return -ENOMEM;
- error = __split_vma(mm, vma, end, 0);
+ error = split_vma(&vmi, vma, end, 0);
if (error)
return error;
}
/*
* vm_flags is protected by the mmap_lock held in write mode.
*/
- vma->vm_flags = new_flags;
+ vm_flags_reset(vma, new_flags);
if (!vma->vm_file || vma_is_anon_shmem(vma)) {
error = replace_anon_vma_name(vma, anon_name);
if (error)
* otherwise we'd be including shared non-exclusive mappings, which
* opens a side channel.
*/
- return inode_owner_or_capable(&init_user_ns,
+ return inode_owner_or_capable(&nop_mnt_idmap,
file_inode(vma->vm_file)) ||
file_permission(vma->vm_file, MAY_WRITE) == 0;
}
struct vm_area_struct *vma = walk->vma;
pte_t *orig_pte, *pte, ptent;
spinlock_t *ptl;
- struct page *page = NULL;
- LIST_HEAD(page_list);
+ struct folio *folio = NULL;
+ LIST_HEAD(folio_list);
bool pageout_anon_only_filter;
if (fatal_signal_pending(current))
goto huge_unlock;
}
- page = pmd_page(orig_pmd);
+ folio = pfn_folio(pmd_pfn(orig_pmd));
- /* Do not interfere with other mappings of this page */
- if (page_mapcount(page) != 1)
+ /* Do not interfere with other mappings of this folio */
+ if (folio_mapcount(folio) != 1)
goto huge_unlock;
- if (pageout_anon_only_filter && !PageAnon(page))
+ if (pageout_anon_only_filter && !folio_test_anon(folio))
goto huge_unlock;
if (next - addr != HPAGE_PMD_SIZE) {
int err;
- get_page(page);
+ folio_get(folio);
spin_unlock(ptl);
- lock_page(page);
- err = split_huge_page(page);
- unlock_page(page);
- put_page(page);
+ folio_lock(folio);
+ err = split_folio(folio);
+ folio_unlock(folio);
+ folio_put(folio);
if (!err)
- goto regular_page;
+ goto regular_folio;
return 0;
}
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
}
- ClearPageReferenced(page);
- test_and_clear_page_young(page);
+ folio_clear_referenced(folio);
+ folio_test_clear_young(folio);
if (pageout) {
- if (!isolate_lru_page(page)) {
- if (PageUnevictable(page))
- putback_lru_page(page);
+ if (folio_isolate_lru(folio)) {
+ if (folio_test_unevictable(folio))
+ folio_putback_lru(folio);
else
- list_add(&page->lru, &page_list);
+ list_add(&folio->lru, &folio_list);
}
} else
- deactivate_page(page);
+ folio_deactivate(folio);
huge_unlock:
spin_unlock(ptl);
if (pageout)
- reclaim_pages(&page_list);
+ reclaim_pages(&folio_list);
return 0;
}
- regular_page:
+ regular_folio:
if (pmd_trans_unstable(pmd))
return 0;
#endif
if (!pte_present(ptent))
continue;
- page = vm_normal_page(vma, addr, ptent);
- if (!page || is_zone_device_page(page))
+ folio = vm_normal_folio(vma, addr, ptent);
+ if (!folio || folio_is_zone_device(folio))
continue;
/*
* Creating a THP page is expensive so split it only if we
* are sure it's worth. Split it if we are only owner.
*/
- if (PageTransCompound(page)) {
- if (page_mapcount(page) != 1)
+ if (folio_test_large(folio)) {
+ if (folio_mapcount(folio) != 1)
break;
- if (pageout_anon_only_filter && !PageAnon(page))
+ if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
- get_page(page);
- if (!trylock_page(page)) {
- put_page(page);
+ folio_get(folio);
+ if (!folio_trylock(folio)) {
+ folio_put(folio);
break;
}
pte_unmap_unlock(orig_pte, ptl);
- if (split_huge_page(page)) {
- unlock_page(page);
- put_page(page);
+ if (split_folio(folio)) {
+ folio_unlock(folio);
+ folio_put(folio);
orig_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
break;
}
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte--;
addr -= PAGE_SIZE;
}
/*
- * Do not interfere with other mappings of this page and
- * non-LRU page.
+ * Do not interfere with other mappings of this folio and
+ * non-LRU folio.
*/
- if (!PageLRU(page) || page_mapcount(page) != 1)
+ if (!folio_test_lru(folio) || folio_mapcount(folio) != 1)
continue;
- if (pageout_anon_only_filter && !PageAnon(page))
+ if (pageout_anon_only_filter && !folio_test_anon(folio))
continue;
- VM_BUG_ON_PAGE(PageTransCompound(page), page);
+ VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
if (pte_young(ptent)) {
ptent = ptep_get_and_clear_full(mm, addr, pte,
}
/*
- * We are deactivating a page for accelerating reclaiming.
- * VM couldn't reclaim the page unless we clear PG_young.
+ * We are deactivating a folio for accelerating reclaiming.
+ * VM couldn't reclaim the folio unless we clear PG_young.
* As a side effect, it makes confuse idle-page tracking
* because they will miss recent referenced history.
*/
- ClearPageReferenced(page);
- test_and_clear_page_young(page);
+ folio_clear_referenced(folio);
+ folio_test_clear_young(folio);
if (pageout) {
- if (!isolate_lru_page(page)) {
- if (PageUnevictable(page))
- putback_lru_page(page);
+ if (folio_isolate_lru(folio)) {
+ if (folio_test_unevictable(folio))
+ folio_putback_lru(folio);
else
- list_add(&page->lru, &page_list);
+ list_add(&folio->lru, &folio_list);
}
} else
- deactivate_page(page);
+ folio_deactivate(folio);
}
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(orig_pte, ptl);
if (pageout)
- reclaim_pages(&page_list);
+ reclaim_pages(&folio_list);
cond_resched();
return 0;
spinlock_t *ptl;
pte_t *orig_pte, *pte, ptent;
struct folio *folio;
- struct page *page;
int nr_swap = 0;
unsigned long next;
continue;
}
- page = vm_normal_page(vma, addr, ptent);
- if (!page || is_zone_device_page(page))
+ folio = vm_normal_folio(vma, addr, ptent);
+ if (!folio || folio_is_zone_device(folio))
continue;
- folio = page_folio(page);
/*
* If pmd isn't transhuge but the folio is large and
set_pte_at(mm, addr, pte, ptent);
tlb_remove_tlb_entry(tlb, pte, addr);
}
- mark_page_lazyfree(&folio->page);
+ folio_mark_lazyfree(folio);
}
out:
if (nr_swap) {
range.end = min(vma->vm_end, end_addr);
if (range.end <= vma->vm_start)
return -EINVAL;
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
range.start, range.end);
lru_add_drain();
/* Kernel memory accounting disabled? */
static bool cgroup_memory_nokmem __ro_after_init;
+/* BPF memory accounting disabled? */
+static bool cgroup_memory_nobpf __ro_after_init;
+
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
#endif
* conditional to this static branch, we'll have to allow modules that does
* kmem_cache_alloc and the such to see this symbol as well
*/
- DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
- EXPORT_SYMBOL(memcg_kmem_enabled_key);
+ DEFINE_STATIC_KEY_FALSE(memcg_kmem_online_key);
+ EXPORT_SYMBOL(memcg_kmem_online_key);
+
+DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
+EXPORT_SYMBOL(memcg_bpf_enabled_key);
#endif
/**
- * mem_cgroup_css_from_page - css of the memcg associated with a page
- * @page: page of interest
+ * mem_cgroup_css_from_folio - css of the memcg associated with a folio
+ * @folio: folio of interest
*
* If memcg is bound to the default hierarchy, css of the memcg associated
- * with @page is returned. The returned css remains associated with @page
+ * with @folio is returned. The returned css remains associated with @folio
* until it is released.
*
* If memcg is bound to a traditional hierarchy, the css of root_mem_cgroup
* is returned.
*/
- struct cgroup_subsys_state *mem_cgroup_css_from_page(struct page *page)
+ struct cgroup_subsys_state *mem_cgroup_css_from_folio(struct folio *folio)
{
- struct mem_cgroup *memcg;
-
- memcg = page_memcg(page);
+ struct mem_cgroup *memcg = folio_memcg(folio);
if (!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
memcg = root_mem_cgroup;
struct mem_cgroup_per_node *mz;
struct mem_cgroup_tree_per_node *mctz;
+ if (lru_gen_enabled()) {
+ if (soft_limit_excess(memcg))
+ lru_gen_soft_reclaim(&memcg->nodeinfo[nid]->lruvec);
+ return;
+ }
+
mctz = soft_limit_tree.rb_tree_per_node[nid];
if (!mctz)
return;
}
/*
- * page_memcg_check() is used here, because in theory we can encounter
+ * folio_memcg_check() is used here, because in theory we can encounter
* a folio where the slab flag has been cleared already, but
* slab->memcg_data has not been freed yet
- * page_memcg_check(page) will guarantee that a proper memory
+ * folio_memcg_check() will guarantee that a proper memory
* cgroup pointer or NULL will be returned.
*/
- return page_memcg_check(folio_page(folio, 0));
+ return folio_memcg_check(folio);
}
/*
{
struct obj_cgroup *objcg;
- if (!memcg_kmem_enabled())
+ if (!memcg_kmem_online())
return NULL;
if (PageMemcgKmem(page)) {
struct mem_cgroup_tree_per_node *mctz;
unsigned long excess;
+ if (lru_gen_enabled())
+ return 0;
+
if (order > 0)
return 0;
objcg->memcg = memcg;
rcu_assign_pointer(memcg->objcg, objcg);
- static_branch_enable(&memcg_kmem_enabled_key);
+ static_branch_enable(&memcg_kmem_online_key);
memcg->kmemcg_id = memcg->id.id;
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+ pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
+ "depend on this functionality.\n");
+
if (val & ~MOVE_MASK)
return -EINVAL;
if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
static_branch_inc(&memcg_sockets_enabled_key);
+#if defined(CONFIG_MEMCG_KMEM)
+ if (!cgroup_memory_nobpf)
+ static_branch_inc(&memcg_bpf_enabled_key);
+#endif
+
return &memcg->css;
}
if (unlikely(mem_cgroup_is_root(memcg)))
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
2UL*HZ);
+ lru_gen_online_memcg(memcg);
return 0;
offline_kmem:
memcg_offline_kmem(memcg);
memcg_offline_kmem(memcg);
reparent_shrinker_deferred(memcg);
wb_memcg_offline(memcg);
+ lru_gen_offline_memcg(memcg);
drain_all_stock(memcg);
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
invalidate_reclaim_iterators(memcg);
+ lru_gen_release_memcg(memcg);
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_active)
static_branch_dec(&memcg_sockets_enabled_key);
+#if defined(CONFIG_MEMCG_KMEM)
+ if (!cgroup_memory_nobpf)
+ static_branch_dec(&memcg_bpf_enabled_key);
+#endif
+
vmpressure_cleanup(&memcg->vmpressure);
cancel_work_sync(&memcg->high_work);
mem_cgroup_remove_from_trees(memcg);
* @from: mem_cgroup which the page is moved from.
* @to: mem_cgroup which the page is moved to. @from != @to.
*
- * The caller must make sure the page is not on LRU (isolate_page() is useful.)
+ * The page must be locked and not on the LRU.
*
* This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
* from old cgroup.
int nid, ret;
VM_BUG_ON(from == to);
+ VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
VM_BUG_ON(compound && !folio_test_large(folio));
- /*
- * Prevent mem_cgroup_migrate() from looking at
- * page's memory cgroup of its source page while we change it.
- */
- ret = -EBUSY;
- if (!folio_trylock(folio))
- goto out;
-
ret = -EINVAL;
if (folio_memcg(folio) != from)
- goto out_unlock;
+ goto out;
pgdat = folio_pgdat(folio);
from_vec = mem_cgroup_lruvec(from, pgdat);
mem_cgroup_charge_statistics(from, -nr_pages);
memcg_check_events(from, nid);
local_irq_enable();
- out_unlock:
- folio_unlock(folio);
out:
return ret;
}
else if (is_swap_pte(ptent))
page = mc_handle_swap_pte(vma, ptent, &ent);
+ if (target && page) {
+ if (!trylock_page(page)) {
+ put_page(page);
+ return ret;
+ }
+ /*
+ * page_mapped() must be stable during the move. This
+ * pte is locked, so if it's present, the page cannot
+ * become unmapped. If it isn't, we have only partial
+ * control over the mapped state: the page lock will
+ * prevent new faults against pagecache and swapcache,
+ * so an unmapped page cannot become mapped. However,
+ * if the page is already mapped elsewhere, it can
+ * unmap, and there is nothing we can do about it.
+ * Alas, skip moving the page in this case.
+ */
+ if (!pte_present(ptent) && page_mapped(page)) {
+ unlock_page(page);
+ put_page(page);
+ return ret;
+ }
+ }
+
if (!page && !ent.val)
return ret;
if (page) {
if (target)
target->page = page;
}
- if (!ret || !target)
+ if (!ret || !target) {
+ if (target)
+ unlock_page(page);
put_page(page);
+ }
}
/*
* There is a swap entry and a page doesn't exist or isn't charged.
ret = MC_TARGET_PAGE;
if (target) {
get_page(page);
+ if (!trylock_page(page)) {
+ put_page(page);
+ return MC_TARGET_NONE;
+ }
target->page = page;
}
}
target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
if (target_type == MC_TARGET_PAGE) {
page = target.page;
- if (!isolate_lru_page(page)) {
+ if (isolate_lru_page(page)) {
if (!mem_cgroup_move_account(page, true,
mc.from, mc.to)) {
mc.precharge -= HPAGE_PMD_NR;
}
putback_lru_page(page);
}
+ unlock_page(page);
put_page(page);
} else if (target_type == MC_TARGET_DEVICE) {
page = target.page;
mc.precharge -= HPAGE_PMD_NR;
mc.moved_charge += HPAGE_PMD_NR;
}
+ unlock_page(page);
put_page(page);
}
spin_unlock(ptl);
*/
if (PageTransCompound(page))
goto put;
- if (!device && isolate_lru_page(page))
+ if (!device && !isolate_lru_page(page))
goto put;
if (!mem_cgroup_move_account(page, false,
mc.from, mc.to)) {
}
if (!device)
putback_lru_page(page);
- put: /* get_mctgt_type() gets the page */
+ put: /* get_mctgt_type() gets & locks the page */
+ unlock_page(page);
put_page(page);
break;
case MC_TARGET_SWAP:
cgroup_memory_nosocket = true;
if (!strcmp(token, "nokmem"))
cgroup_memory_nokmem = true;
+ if (!strcmp(token, "nobpf"))
+ cgroup_memory_nobpf = true;
}
return 1;
}
#include "internal.h"
- int isolate_movable_page(struct page *page, isolate_mode_t mode)
+ bool isolate_movable_page(struct page *page, isolate_mode_t mode)
{
+ struct folio *folio = folio_get_nontail_page(page);
const struct movable_operations *mops;
/*
* the put_page() at the end of this block will take care of
* release this page, thus avoiding a nasty leakage.
*/
- if (unlikely(!get_page_unless_zero(page)))
+ if (!folio)
goto out;
- if (unlikely(PageSlab(page)))
- goto out_putpage;
+ if (unlikely(folio_test_slab(folio)))
+ goto out_putfolio;
/* Pairs with smp_wmb() in slab freeing, e.g. SLUB's __free_slab() */
smp_rmb();
/*
* we use non-atomic bitops on newly allocated page flags so
* unconditionally grabbing the lock ruins page's owner side.
*/
- if (unlikely(!__PageMovable(page)))
- goto out_putpage;
+ if (unlikely(!__folio_test_movable(folio)))
+ goto out_putfolio;
/* Pairs with smp_wmb() in slab allocation, e.g. SLUB's alloc_slab_page() */
smp_rmb();
- if (unlikely(PageSlab(page)))
- goto out_putpage;
+ if (unlikely(folio_test_slab(folio)))
+ goto out_putfolio;
/*
* As movable pages are not isolated from LRU lists, concurrent
* lets be sure we have the page lock
* before proceeding with the movable page isolation steps.
*/
- if (unlikely(!trylock_page(page)))
- goto out_putpage;
+ if (unlikely(!folio_trylock(folio)))
+ goto out_putfolio;
- if (!PageMovable(page) || PageIsolated(page))
+ if (!folio_test_movable(folio) || folio_test_isolated(folio))
goto out_no_isolated;
- mops = page_movable_ops(page);
- VM_BUG_ON_PAGE(!mops, page);
+ mops = folio_movable_ops(folio);
+ VM_BUG_ON_FOLIO(!mops, folio);
- if (!mops->isolate_page(page, mode))
+ if (!mops->isolate_page(&folio->page, mode))
goto out_no_isolated;
/* Driver shouldn't use PG_isolated bit of page->flags */
- WARN_ON_ONCE(PageIsolated(page));
- SetPageIsolated(page);
- unlock_page(page);
+ WARN_ON_ONCE(folio_test_isolated(folio));
+ folio_set_isolated(folio);
+ folio_unlock(folio);
- return 0;
+ return true;
out_no_isolated:
- unlock_page(page);
- out_putpage:
- put_page(page);
+ folio_unlock(folio);
+ out_putfolio:
+ folio_put(folio);
out:
- return -EBUSY;
+ return false;
}
- static void putback_movable_page(struct page *page)
+ static void putback_movable_folio(struct folio *folio)
{
- const struct movable_operations *mops = page_movable_ops(page);
+ const struct movable_operations *mops = folio_movable_ops(folio);
- mops->putback_page(page);
- ClearPageIsolated(page);
+ mops->putback_page(&folio->page);
+ folio_clear_isolated(folio);
}
/*
*/
void putback_movable_pages(struct list_head *l)
{
- struct page *page;
- struct page *page2;
+ struct folio *folio;
+ struct folio *folio2;
- list_for_each_entry_safe(page, page2, l, lru) {
- if (unlikely(PageHuge(page))) {
- putback_active_hugepage(page);
+ list_for_each_entry_safe(folio, folio2, l, lru) {
+ if (unlikely(folio_test_hugetlb(folio))) {
+ folio_putback_active_hugetlb(folio);
continue;
}
- list_del(&page->lru);
+ list_del(&folio->lru);
/*
- * We isolated non-lru movable page so here we can use
- * __PageMovable because LRU page's mapping cannot have
+ * We isolated non-lru movable folio so here we can use
+ * __PageMovable because LRU folio's mapping cannot have
* PAGE_MAPPING_MOVABLE.
*/
- if (unlikely(__PageMovable(page))) {
- VM_BUG_ON_PAGE(!PageIsolated(page), page);
- lock_page(page);
- if (PageMovable(page))
- putback_movable_page(page);
+ if (unlikely(__folio_test_movable(folio))) {
+ VM_BUG_ON_FOLIO(!folio_test_isolated(folio), folio);
+ folio_lock(folio);
+ if (folio_test_movable(folio))
+ putback_movable_folio(folio);
else
- ClearPageIsolated(page);
- unlock_page(page);
- put_page(page);
+ folio_clear_isolated(folio);
+ folio_unlock(folio);
+ folio_put(folio);
} else {
- mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
- page_is_file_lru(page), -thp_nr_pages(page));
- putback_lru_page(page);
+ node_stat_mod_folio(folio, NR_ISOLATED_ANON +
+ folio_is_file_lru(folio), -folio_nr_pages(folio));
+ folio_putback_lru(folio);
}
}
}
pte = maybe_mkwrite(pte, vma);
else if (pte_swp_uffd_wp(*pvmw.pte))
pte = pte_mkuffd_wp(pte);
+ else
+ pte = pte_wrprotect(pte);
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
}
if (vma->vm_flags & VM_LOCKED)
- mlock_page_drain_local();
+ mlock_drain_local();
trace_remove_migration_pte(pvmw.address, pte_val(pte),
compound_order(new));
}
#ifdef CONFIG_HUGETLB_PAGE
- void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
+ /*
+ * The vma read lock must be held upon entry. Holding that lock prevents either
+ * the pte or the ptl from being freed.
+ *
+ * This function will release the vma lock before returning.
+ */
+ void __migration_entry_wait_huge(struct vm_area_struct *vma,
+ pte_t *ptep, spinlock_t *ptl)
{
pte_t pte;
+ hugetlb_vma_assert_locked(vma);
spin_lock(ptl);
pte = huge_ptep_get(ptep);
- if (unlikely(!is_hugetlb_entry_migration(pte)))
+ if (unlikely(!is_hugetlb_entry_migration(pte))) {
spin_unlock(ptl);
- else
+ hugetlb_vma_unlock_read(vma);
+ } else {
+ /*
+ * If migration entry existed, safe to release vma lock
+ * here because the pgtable page won't be freed without the
+ * pgtable lock released. See comment right above pgtable
+ * lock release in migration_entry_wait_on_locked().
+ */
+ hugetlb_vma_unlock_read(vma);
migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
+ }
}
void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
{
spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
- __migration_entry_wait_huge(pte, ptl);
+ __migration_entry_wait_huge(vma, pte, ptl);
}
#endif
goto out;
}
- mops = page_movable_ops(&src->page);
+ mops = folio_movable_ops(src);
rc = mops->migrate_page(&dst->page, &src->page, mode);
WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
!folio_test_isolated(src));
return rc;
}
- static int __unmap_and_move(struct folio *src, struct folio *dst,
- int force, enum migrate_mode mode)
+ /*
+ * To record some information during migration, we use some unused
+ * fields (mapping and private) of struct folio of the newly allocated
+ * destination folio. This is safe because nobody is using them
+ * except us.
+ */
+ static void __migrate_folio_record(struct folio *dst,
+ unsigned long page_was_mapped,
+ struct anon_vma *anon_vma)
+ {
+ dst->mapping = (void *)anon_vma;
+ dst->private = (void *)page_was_mapped;
+ }
+
+ static void __migrate_folio_extract(struct folio *dst,
+ int *page_was_mappedp,
+ struct anon_vma **anon_vmap)
+ {
+ *anon_vmap = (void *)dst->mapping;
+ *page_was_mappedp = (unsigned long)dst->private;
+ dst->mapping = NULL;
+ dst->private = NULL;
+ }
+
+ /* Restore the source folio to the original state upon failure */
+ static void migrate_folio_undo_src(struct folio *src,
+ int page_was_mapped,
+ struct anon_vma *anon_vma,
+ bool locked,
+ struct list_head *ret)
+ {
+ if (page_was_mapped)
+ remove_migration_ptes(src, src, false);
+ /* Drop an anon_vma reference if we took one */
+ if (anon_vma)
+ put_anon_vma(anon_vma);
+ if (locked)
+ folio_unlock(src);
+ if (ret)
+ list_move_tail(&src->lru, ret);
+ }
+
+ /* Restore the destination folio to the original state upon failure */
+ static void migrate_folio_undo_dst(struct folio *dst,
+ bool locked,
+ free_page_t put_new_page,
+ unsigned long private)
+ {
+ if (locked)
+ folio_unlock(dst);
+ if (put_new_page)
+ put_new_page(&dst->page, private);
+ else
+ folio_put(dst);
+ }
+
+ /* Cleanup src folio upon migration success */
+ static void migrate_folio_done(struct folio *src,
+ enum migrate_reason reason)
+ {
+ /*
+ * Compaction can migrate also non-LRU pages which are
+ * not accounted to NR_ISOLATED_*. They can be recognized
+ * as __PageMovable
+ */
+ if (likely(!__folio_test_movable(src)))
+ mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
+ folio_is_file_lru(src), -folio_nr_pages(src));
+
+ if (reason != MR_MEMORY_FAILURE)
+ /* We release the page in page_handle_poison. */
+ folio_put(src);
+ }
+
+ /* Obtain the lock on page, remove all ptes. */
+ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
+ unsigned long private, struct folio *src,
+ struct folio **dstp, int force, bool avoid_force_lock,
+ enum migrate_mode mode, enum migrate_reason reason,
+ struct list_head *ret)
{
+ struct folio *dst;
int rc = -EAGAIN;
- bool page_was_mapped = false;
+ struct page *newpage = NULL;
+ int page_was_mapped = 0;
struct anon_vma *anon_vma = NULL;
bool is_lru = !__PageMovable(&src->page);
+ bool locked = false;
+ bool dst_locked = false;
+
+ if (folio_ref_count(src) == 1) {
+ /* Folio was freed from under us. So we are done. */
+ folio_clear_active(src);
+ folio_clear_unevictable(src);
+ /* free_pages_prepare() will clear PG_isolated. */
+ list_del(&src->lru);
+ migrate_folio_done(src, reason);
+ return MIGRATEPAGE_SUCCESS;
+ }
+
+ newpage = get_new_page(&src->page, private);
+ if (!newpage)
+ return -ENOMEM;
+ dst = page_folio(newpage);
+ *dstp = dst;
+
+ dst->private = NULL;
if (!folio_trylock(src)) {
if (!force || mode == MIGRATE_ASYNC)
if (current->flags & PF_MEMALLOC)
goto out;
+ /*
+ * We have locked some folios and are going to wait to lock
+ * this folio. To avoid a potential deadlock, let's bail
+ * out and not do that. The locked folios will be moved and
+ * unlocked, then we can wait to lock this folio.
+ */
+ if (avoid_force_lock) {
+ rc = -EDEADLOCK;
+ goto out;
+ }
+
folio_lock(src);
}
+ locked = true;
if (folio_test_writeback(src)) {
/*
break;
default:
rc = -EBUSY;
- goto out_unlock;
+ goto out;
}
if (!force)
- goto out_unlock;
+ goto out;
folio_wait_writeback(src);
}
* This is much like races on refcount of oldpage: just don't BUG().
*/
if (unlikely(!folio_trylock(dst)))
- goto out_unlock;
+ goto out;
+ dst_locked = true;
if (unlikely(!is_lru)) {
- rc = move_to_new_folio(dst, src, mode);
- goto out_unlock_both;
+ __migrate_folio_record(dst, page_was_mapped, anon_vma);
+ return MIGRATEPAGE_UNMAP;
}
/*
if (!src->mapping) {
if (folio_test_private(src)) {
try_to_free_buffers(src);
- goto out_unlock_both;
+ goto out;
}
} else if (folio_mapped(src)) {
/* Establish migration ptes */
VM_BUG_ON_FOLIO(folio_test_anon(src) &&
!folio_test_ksm(src) && !anon_vma, src);
- try_to_migrate(src, 0);
- page_was_mapped = true;
+ try_to_migrate(src, TTU_BATCH_FLUSH);
+ page_was_mapped = 1;
}
- if (!folio_mapped(src))
- rc = move_to_new_folio(dst, src, mode);
+ if (!folio_mapped(src)) {
+ __migrate_folio_record(dst, page_was_mapped, anon_vma);
+ return MIGRATEPAGE_UNMAP;
+ }
+
+ out:
+ /*
+ * A folio that has not been unmapped will be restored to
+ * right list unless we want to retry.
+ */
+ if (rc == -EAGAIN || rc == -EDEADLOCK)
+ ret = NULL;
+
+ migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
+ migrate_folio_undo_dst(dst, dst_locked, put_new_page, private);
+
+ return rc;
+ }
+
+ /* Migrate the folio to the newly allocated folio in dst. */
+ static int migrate_folio_move(free_page_t put_new_page, unsigned long private,
+ struct folio *src, struct folio *dst,
+ enum migrate_mode mode, enum migrate_reason reason,
+ struct list_head *ret)
+ {
+ int rc;
+ int page_was_mapped = 0;
+ struct anon_vma *anon_vma = NULL;
+ bool is_lru = !__PageMovable(&src->page);
+ struct list_head *prev;
+
+ __migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+ prev = dst->lru.prev;
+ list_del(&dst->lru);
+
+ rc = move_to_new_folio(dst, src, mode);
+ if (rc)
+ goto out;
+
+ if (unlikely(!is_lru))
+ goto out_unlock_both;
/*
* When successful, push dst to LRU immediately: so that if it
* unsuccessful, and other cases when a page has been temporarily
* isolated from the unevictable LRU: but this case is the easiest.
*/
- if (rc == MIGRATEPAGE_SUCCESS) {
- folio_add_lru(dst);
- if (page_was_mapped)
- lru_add_drain();
- }
+ folio_add_lru(dst);
+ if (page_was_mapped)
+ lru_add_drain();
if (page_was_mapped)
- remove_migration_ptes(src,
- rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
+ remove_migration_ptes(src, dst, false);
out_unlock_both:
folio_unlock(dst);
- out_unlock:
- /* Drop an anon_vma reference if we took one */
- if (anon_vma)
- put_anon_vma(anon_vma);
- folio_unlock(src);
- out:
+ set_page_owner_migrate_reason(&dst->page, reason);
/*
* If migration is successful, decrease refcount of dst,
* which will not free the page because new page owner increased
* refcounter.
*/
- if (rc == MIGRATEPAGE_SUCCESS)
- folio_put(dst);
+ folio_put(dst);
- return rc;
- }
-
- /*
- * Obtain the lock on folio, remove all ptes and migrate the folio
- * to the newly allocated folio in dst.
- */
- static int unmap_and_move(new_page_t get_new_page,
- free_page_t put_new_page,
- unsigned long private, struct folio *src,
- int force, enum migrate_mode mode,
- enum migrate_reason reason,
- struct list_head *ret)
- {
- struct folio *dst;
- int rc = MIGRATEPAGE_SUCCESS;
- struct page *newpage = NULL;
-
- if (!thp_migration_supported() && folio_test_transhuge(src))
- return -ENOSYS;
-
- if (folio_ref_count(src) == 1) {
- /* Folio was freed from under us. So we are done. */
- folio_clear_active(src);
- folio_clear_unevictable(src);
- /* free_pages_prepare() will clear PG_isolated. */
- goto out;
- }
-
- newpage = get_new_page(&src->page, private);
- if (!newpage)
- return -ENOMEM;
- dst = page_folio(newpage);
-
- dst->private = NULL;
- rc = __unmap_and_move(src, dst, force, mode);
- if (rc == MIGRATEPAGE_SUCCESS)
- set_page_owner_migrate_reason(&dst->page, reason);
+ /*
+ * A folio that has been migrated has all references removed
+ * and will be freed.
+ */
+ list_del(&src->lru);
+ /* Drop an anon_vma reference if we took one */
+ if (anon_vma)
+ put_anon_vma(anon_vma);
+ folio_unlock(src);
+ migrate_folio_done(src, reason);
+ return rc;
out:
- if (rc != -EAGAIN) {
- /*
- * A folio that has been migrated has all references
- * removed and will be freed. A folio that has not been
- * migrated will have kept its references and be restored.
- */
- list_del(&src->lru);
- }
-
/*
- * If migration is successful, releases reference grabbed during
- * isolation. Otherwise, restore the folio to right list unless
- * we want to retry.
+ * A folio that has not been migrated will be restored to
+ * right list unless we want to retry.
*/
- if (rc == MIGRATEPAGE_SUCCESS) {
- /*
- * Compaction can migrate also non-LRU folios which are
- * not accounted to NR_ISOLATED_*. They can be recognized
- * as __folio_test_movable
- */
- if (likely(!__folio_test_movable(src)))
- mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
- folio_is_file_lru(src), -folio_nr_pages(src));
-
- if (reason != MR_MEMORY_FAILURE)
- /*
- * We release the folio in page_handle_poison.
- */
- folio_put(src);
- } else {
- if (rc != -EAGAIN)
- list_add_tail(&src->lru, ret);
-
- if (put_new_page)
- put_new_page(&dst->page, private);
- else
- folio_put(dst);
+ if (rc == -EAGAIN) {
+ list_add(&dst->lru, prev);
+ __migrate_folio_record(dst, page_was_mapped, anon_vma);
+ return rc;
}
+ migrate_folio_undo_src(src, page_was_mapped, anon_vma, true, ret);
+ migrate_folio_undo_dst(dst, true, put_new_page, private);
+
return rc;
}
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
- /*
- * Migratability of hugepages depends on architectures and their size.
- * This check is necessary because some callers of hugepage migration
- * like soft offline and memory hotremove don't walk through page
- * tables or check whether the hugepage is pmd-based or not before
- * kicking migration.
- */
- if (!hugepage_migration_supported(page_hstate(hpage)))
- return -ENOSYS;
-
if (folio_ref_count(src) == 1) {
/* page was freed from under us. So we are done. */
- putback_active_hugepage(hpage);
+ folio_putback_active_hugetlb(src);
return MIGRATEPAGE_SUCCESS;
}
folio_unlock(src);
out:
if (rc == MIGRATEPAGE_SUCCESS)
- putback_active_hugepage(hpage);
+ folio_putback_active_hugetlb(src);
else if (rc != -EAGAIN)
list_move_tail(&src->lru, ret);
if (put_new_page)
put_new_page(new_hpage, private);
else
- putback_active_hugepage(new_hpage);
+ folio_putback_active_hugetlb(dst);
return rc;
}
return rc;
}
+ #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ #define NR_MAX_BATCHED_MIGRATION HPAGE_PMD_NR
+ #else
+ #define NR_MAX_BATCHED_MIGRATION 512
+ #endif
+ #define NR_MAX_MIGRATE_PAGES_RETRY 10
+
+ struct migrate_pages_stats {
+ int nr_succeeded; /* Normal and large folios migrated successfully, in
+ units of base pages */
+ int nr_failed_pages; /* Normal and large folios failed to be migrated, in
+ units of base pages. Untried folios aren't counted */
+ int nr_thp_succeeded; /* THP migrated successfully */
+ int nr_thp_failed; /* THP failed to be migrated */
+ int nr_thp_split; /* THP split before migrating */
+ };
+
/*
- * migrate_pages - migrate the folios specified in a list, to the free folios
- * supplied as the target for the page migration
- *
- * @from: The list of folios to be migrated.
- * @get_new_page: The function used to allocate free folios to be used
- * as the target of the folio migration.
- * @put_new_page: The function used to free target folios if migration
- * fails, or NULL if no special handling is necessary.
- * @private: Private data to be passed on to get_new_page()
- * @mode: The migration mode that specifies the constraints for
- * folio migration, if any.
- * @reason: The reason for folio migration.
- * @ret_succeeded: Set to the number of folios migrated successfully if
- * the caller passes a non-NULL pointer.
- *
- * The function returns after 10 attempts or if no folios are movable any more
- * because the list has become empty or no retryable folios exist any more.
- * It is caller's responsibility to call putback_movable_pages() to return folios
- * to the LRU or free list only if ret != 0.
- *
- * Returns the number of {normal folio, large folio, hugetlb} that were not
- * migrated, or an error code. The number of large folio splits will be
- * considered as the number of non-migrated large folio, no matter how many
- * split folios of the large folio are migrated successfully.
+ * Returns the number of hugetlb folios that were not migrated, or an error code
+ * after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no hugetlb folios are movable
+ * any more because the list has become empty or no retryable hugetlb folios
+ * exist any more. It is caller's responsibility to call putback_movable_pages()
+ * only if ret != 0.
*/
- int migrate_pages(struct list_head *from, new_page_t get_new_page,
- free_page_t put_new_page, unsigned long private,
- enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+ static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
+ free_page_t put_new_page, unsigned long private,
+ enum migrate_mode mode, int reason,
+ struct migrate_pages_stats *stats,
+ struct list_head *ret_folios)
{
int retry = 1;
+ int nr_failed = 0;
+ int nr_retry_pages = 0;
+ int pass = 0;
+ struct folio *folio, *folio2;
+ int rc, nr_pages;
+
+ for (pass = 0; pass < NR_MAX_MIGRATE_PAGES_RETRY && retry; pass++) {
+ retry = 0;
+ nr_retry_pages = 0;
+
+ list_for_each_entry_safe(folio, folio2, from, lru) {
+ if (!folio_test_hugetlb(folio))
+ continue;
+
+ nr_pages = folio_nr_pages(folio);
+
+ cond_resched();
+
+ /*
+ * Migratability of hugepages depends on architectures and
+ * their size. This check is necessary because some callers
+ * of hugepage migration like soft offline and memory
+ * hotremove don't walk through page tables or check whether
+ * the hugepage is pmd-based or not before kicking migration.
+ */
+ if (!hugepage_migration_supported(folio_hstate(folio))) {
+ nr_failed++;
+ stats->nr_failed_pages += nr_pages;
+ list_move_tail(&folio->lru, ret_folios);
+ continue;
+ }
+
+ rc = unmap_and_move_huge_page(get_new_page,
+ put_new_page, private,
+ &folio->page, pass > 2, mode,
+ reason, ret_folios);
+ /*
+ * The rules are:
+ * Success: hugetlb folio will be put back
+ * -EAGAIN: stay on the from list
+ * -ENOMEM: stay on the from list
+ * Other errno: put on ret_folios list
+ */
+ switch(rc) {
+ case -ENOMEM:
+ /*
+ * When memory is low, don't bother to try to migrate
+ * other folios, just exit.
+ */
+ stats->nr_failed_pages += nr_pages + nr_retry_pages;
+ return -ENOMEM;
+ case -EAGAIN:
+ retry++;
+ nr_retry_pages += nr_pages;
+ break;
+ case MIGRATEPAGE_SUCCESS:
+ stats->nr_succeeded += nr_pages;
+ break;
+ default:
+ /*
+ * Permanent failure (-EBUSY, etc.):
+ * unlike -EAGAIN case, the failed folio is
+ * removed from migration folio list and not
+ * retried in the next outer loop.
+ */
+ nr_failed++;
+ stats->nr_failed_pages += nr_pages;
+ break;
+ }
+ }
+ }
+ /*
+ * nr_failed is number of hugetlb folios failed to be migrated. After
+ * NR_MAX_MIGRATE_PAGES_RETRY attempts, give up and count retried hugetlb
+ * folios as failed.
+ */
+ nr_failed += retry;
+ stats->nr_failed_pages += nr_retry_pages;
+
+ return nr_failed;
+ }
+
+ /*
+ * migrate_pages_batch() first unmaps folios in the from list as many as
+ * possible, then move the unmapped folios.
+ */
+ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
+ free_page_t put_new_page, unsigned long private,
+ enum migrate_mode mode, int reason, struct list_head *ret_folios,
+ struct migrate_pages_stats *stats)
+ {
+ int retry;
int large_retry = 1;
int thp_retry = 1;
int nr_failed = 0;
- int nr_failed_pages = 0;
int nr_retry_pages = 0;
- int nr_succeeded = 0;
- int nr_thp_succeeded = 0;
int nr_large_failed = 0;
- int nr_thp_failed = 0;
- int nr_thp_split = 0;
int pass = 0;
bool is_large = false;
bool is_thp = false;
- struct folio *folio, *folio2;
- int rc, nr_pages;
- LIST_HEAD(ret_folios);
+ struct folio *folio, *folio2, *dst = NULL, *dst2;
+ int rc, rc_saved, nr_pages;
LIST_HEAD(split_folios);
+ LIST_HEAD(unmap_folios);
+ LIST_HEAD(dst_folios);
bool nosplit = (reason == MR_NUMA_MISPLACED);
bool no_split_folio_counting = false;
-
- trace_mm_migrate_pages_start(mode, reason);
-
- split_folio_migration:
- for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
+ bool avoid_force_lock;
+
+ retry:
+ rc_saved = 0;
+ avoid_force_lock = false;
+ retry = 1;
+ for (pass = 0;
+ pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
+ pass++) {
retry = 0;
large_retry = 0;
thp_retry = 0;
* folio. Capture required information that might get
* lost during migration.
*/
- is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
+ is_large = folio_test_large(folio);
is_thp = is_large && folio_test_pmd_mappable(folio);
nr_pages = folio_nr_pages(folio);
+
cond_resched();
- if (folio_test_hugetlb(folio))
- rc = unmap_and_move_huge_page(get_new_page,
- put_new_page, private,
- &folio->page, pass > 2, mode,
- reason,
- &ret_folios);
- else
- rc = unmap_and_move(get_new_page, put_new_page,
- private, folio, pass > 2, mode,
- reason, &ret_folios);
- /*
- * The rules are:
- * Success: non hugetlb folio will be freed, hugetlb
- * folio will be put back
- * -EAGAIN: stay on the from list
- * -ENOMEM: stay on the from list
- * -ENOSYS: stay on the from list
- * Other errno: put on ret_folios list then splice to
- * from list
- */
- switch(rc) {
/*
* Large folio migration might be unsupported or
- * the allocation could've failed so we should retry
+ * the allocation might be failed so we should retry
* on the same folio with the large folio split
* to normal folios.
*
* we will migrate them after the rest of the
* list is processed.
*/
- case -ENOSYS:
- /* Large folio migration is unsupported */
- if (is_large) {
- nr_large_failed++;
- nr_thp_failed += is_thp;
- if (!try_split_folio(folio, &split_folios)) {
- nr_thp_split += is_thp;
- break;
- }
- /* Hugetlb migration is unsupported */
- } else if (!no_split_folio_counting) {
- nr_failed++;
+ if (!thp_migration_supported() && is_thp) {
+ nr_large_failed++;
+ stats->nr_thp_failed++;
+ if (!try_split_folio(folio, &split_folios)) {
+ stats->nr_thp_split++;
+ continue;
}
+ stats->nr_failed_pages += nr_pages;
+ list_move_tail(&folio->lru, ret_folios);
+ continue;
+ }
- nr_failed_pages += nr_pages;
- list_move_tail(&folio->lru, &ret_folios);
- break;
+ rc = migrate_folio_unmap(get_new_page, put_new_page, private,
+ folio, &dst, pass > 2, avoid_force_lock,
+ mode, reason, ret_folios);
+ /*
+ * The rules are:
+ * Success: folio will be freed
+ * Unmap: folio will be put on unmap_folios list,
+ * dst folio put on dst_folios list
+ * -EAGAIN: stay on the from list
+ * -EDEADLOCK: stay on the from list
+ * -ENOMEM: stay on the from list
+ * Other errno: put on ret_folios list
+ */
+ switch(rc) {
case -ENOMEM:
/*
* When memory is low, don't bother to try to migrate
- * other folios, just exit.
+ * other folios, move unmapped folios, then exit.
*/
if (is_large) {
nr_large_failed++;
- nr_thp_failed += is_thp;
+ stats->nr_thp_failed += is_thp;
/* Large folio NUMA faulting doesn't split to retry. */
if (!nosplit) {
int ret = try_split_folio(folio, &split_folios);
if (!ret) {
- nr_thp_split += is_thp;
+ stats->nr_thp_split += is_thp;
break;
} else if (reason == MR_LONGTERM_PIN &&
ret == -EAGAIN) {
nr_failed++;
}
- nr_failed_pages += nr_pages + nr_retry_pages;
+ stats->nr_failed_pages += nr_pages + nr_retry_pages;
/*
* There might be some split folios of fail-to-migrate large
- * folios left in split_folios list. Move them back to migration
+ * folios left in split_folios list. Move them to ret_folios
* list so that they could be put back to the right list by
* the caller otherwise the folio refcnt will be leaked.
*/
- list_splice_init(&split_folios, from);
+ list_splice_init(&split_folios, ret_folios);
/* nr_failed isn't updated for not used */
nr_large_failed += large_retry;
- nr_thp_failed += thp_retry;
- goto out;
+ stats->nr_thp_failed += thp_retry;
+ rc_saved = rc;
+ if (list_empty(&unmap_folios))
+ goto out;
+ else
+ goto move;
+ case -EDEADLOCK:
+ /*
+ * The folio cannot be locked for potential deadlock.
+ * Go move (and unlock) all locked folios. Then we can
+ * try again.
+ */
+ rc_saved = rc;
+ goto move;
case -EAGAIN:
if (is_large) {
large_retry++;
nr_retry_pages += nr_pages;
break;
case MIGRATEPAGE_SUCCESS:
- nr_succeeded += nr_pages;
- nr_thp_succeeded += is_thp;
+ stats->nr_succeeded += nr_pages;
+ stats->nr_thp_succeeded += is_thp;
+ break;
+ case MIGRATEPAGE_UNMAP:
+ /*
+ * We have locked some folios, don't force lock
+ * to avoid deadlock.
+ */
+ avoid_force_lock = true;
+ list_move_tail(&folio->lru, &unmap_folios);
+ list_add_tail(&dst->lru, &dst_folios);
break;
default:
/*
*/
if (is_large) {
nr_large_failed++;
- nr_thp_failed += is_thp;
+ stats->nr_thp_failed += is_thp;
+ } else if (!no_split_folio_counting) {
+ nr_failed++;
+ }
+
+ stats->nr_failed_pages += nr_pages;
+ break;
+ }
+ }
+ }
+ nr_failed += retry;
+ nr_large_failed += large_retry;
+ stats->nr_thp_failed += thp_retry;
+ stats->nr_failed_pages += nr_retry_pages;
+ move:
+ /* Flush TLBs for all unmapped folios */
+ try_to_unmap_flush();
+
+ retry = 1;
+ for (pass = 0;
+ pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
+ pass++) {
+ retry = 0;
+ large_retry = 0;
+ thp_retry = 0;
+ nr_retry_pages = 0;
+
+ dst = list_first_entry(&dst_folios, struct folio, lru);
+ dst2 = list_next_entry(dst, lru);
+ list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+ is_large = folio_test_large(folio);
+ is_thp = is_large && folio_test_pmd_mappable(folio);
+ nr_pages = folio_nr_pages(folio);
+
+ cond_resched();
+
+ rc = migrate_folio_move(put_new_page, private,
+ folio, dst, mode,
+ reason, ret_folios);
+ /*
+ * The rules are:
+ * Success: folio will be freed
+ * -EAGAIN: stay on the unmap_folios list
+ * Other errno: put on ret_folios list
+ */
+ switch(rc) {
+ case -EAGAIN:
+ if (is_large) {
+ large_retry++;
+ thp_retry += is_thp;
+ } else if (!no_split_folio_counting) {
+ retry++;
+ }
+ nr_retry_pages += nr_pages;
+ break;
+ case MIGRATEPAGE_SUCCESS:
+ stats->nr_succeeded += nr_pages;
+ stats->nr_thp_succeeded += is_thp;
+ break;
+ default:
+ if (is_large) {
+ nr_large_failed++;
+ stats->nr_thp_failed += is_thp;
} else if (!no_split_folio_counting) {
nr_failed++;
}
- nr_failed_pages += nr_pages;
+ stats->nr_failed_pages += nr_pages;
break;
}
+ dst = dst2;
+ dst2 = list_next_entry(dst, lru);
}
}
nr_failed += retry;
nr_large_failed += large_retry;
- nr_thp_failed += thp_retry;
- nr_failed_pages += nr_retry_pages;
+ stats->nr_thp_failed += thp_retry;
+ stats->nr_failed_pages += nr_retry_pages;
+
+ if (rc_saved)
+ rc = rc_saved;
+ else
+ rc = nr_failed + nr_large_failed;
+ out:
+ /* Cleanup remaining folios */
+ dst = list_first_entry(&dst_folios, struct folio, lru);
+ dst2 = list_next_entry(dst, lru);
+ list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) {
+ int page_was_mapped = 0;
+ struct anon_vma *anon_vma = NULL;
+
+ __migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+ migrate_folio_undo_src(folio, page_was_mapped, anon_vma,
+ true, ret_folios);
+ list_del(&dst->lru);
+ migrate_folio_undo_dst(dst, true, put_new_page, private);
+ dst = dst2;
+ dst2 = list_next_entry(dst, lru);
+ }
+
/*
* Try to migrate split folios of fail-to-migrate large folios, no
* nr_failed counting in this round, since all split folios of a
* large folio is counted as 1 failure in the first round.
*/
- if (!list_empty(&split_folios)) {
+ if (rc >= 0 && !list_empty(&split_folios)) {
/*
- * Move non-migrated folios (after 10 retries) to ret_folios
- * to avoid migrating them again.
+ * Move non-migrated folios (after NR_MAX_MIGRATE_PAGES_RETRY
+ * retries) to ret_folios to avoid migrating them again.
*/
- list_splice_init(from, &ret_folios);
+ list_splice_init(from, ret_folios);
list_splice_init(&split_folios, from);
no_split_folio_counting = true;
- retry = 1;
- goto split_folio_migration;
+ goto retry;
}
- rc = nr_failed + nr_large_failed;
+ /*
+ * We have unlocked all locked folios, so we can force lock now, let's
+ * try again.
+ */
+ if (rc == -EDEADLOCK)
+ goto retry;
+
+ return rc;
+ }
+
+ /*
+ * migrate_pages - migrate the folios specified in a list, to the free folios
+ * supplied as the target for the page migration
+ *
+ * @from: The list of folios to be migrated.
+ * @get_new_page: The function used to allocate free folios to be used
+ * as the target of the folio migration.
+ * @put_new_page: The function used to free target folios if migration
+ * fails, or NULL if no special handling is necessary.
+ * @private: Private data to be passed on to get_new_page()
+ * @mode: The migration mode that specifies the constraints for
+ * folio migration, if any.
+ * @reason: The reason for folio migration.
+ * @ret_succeeded: Set to the number of folios migrated successfully if
+ * the caller passes a non-NULL pointer.
+ *
+ * The function returns after NR_MAX_MIGRATE_PAGES_RETRY attempts or if no folios
+ * are movable any more because the list has become empty or no retryable folios
+ * exist any more. It is caller's responsibility to call putback_movable_pages()
+ * only if ret != 0.
+ *
+ * Returns the number of {normal folio, large folio, hugetlb} that were not
+ * migrated, or an error code. The number of large folio splits will be
+ * considered as the number of non-migrated large folio, no matter how many
+ * split folios of the large folio are migrated successfully.
+ */
+ int migrate_pages(struct list_head *from, new_page_t get_new_page,
+ free_page_t put_new_page, unsigned long private,
+ enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+ {
+ int rc, rc_gather;
+ int nr_pages;
+ struct folio *folio, *folio2;
+ LIST_HEAD(folios);
+ LIST_HEAD(ret_folios);
+ struct migrate_pages_stats stats;
+
+ trace_mm_migrate_pages_start(mode, reason);
+
+ memset(&stats, 0, sizeof(stats));
+
+ rc_gather = migrate_hugetlbs(from, get_new_page, put_new_page, private,
+ mode, reason, &stats, &ret_folios);
+ if (rc_gather < 0)
+ goto out;
+ again:
+ nr_pages = 0;
+ list_for_each_entry_safe(folio, folio2, from, lru) {
+ /* Retried hugetlb folios will be kept in list */
+ if (folio_test_hugetlb(folio)) {
+ list_move_tail(&folio->lru, &ret_folios);
+ continue;
+ }
+
+ nr_pages += folio_nr_pages(folio);
+ if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+ break;
+ }
+ if (nr_pages > NR_MAX_BATCHED_MIGRATION)
+ list_cut_before(&folios, from, &folio->lru);
+ else
+ list_splice_init(from, &folios);
+ rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
+ mode, reason, &ret_folios, &stats);
+ list_splice_tail_init(&folios, &ret_folios);
+ if (rc < 0) {
+ rc_gather = rc;
+ goto out;
+ }
+ rc_gather += rc;
+ if (!list_empty(from))
+ goto again;
out:
/*
* Put the permanent failure folio back to migration list, they
* are migrated successfully.
*/
if (list_empty(from))
- rc = 0;
+ rc_gather = 0;
- count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
- count_vm_events(PGMIGRATE_FAIL, nr_failed_pages);
- count_vm_events(THP_MIGRATION_SUCCESS, nr_thp_succeeded);
- count_vm_events(THP_MIGRATION_FAIL, nr_thp_failed);
- count_vm_events(THP_MIGRATION_SPLIT, nr_thp_split);
- trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
- nr_thp_failed, nr_thp_split, mode, reason);
+ count_vm_events(PGMIGRATE_SUCCESS, stats.nr_succeeded);
+ count_vm_events(PGMIGRATE_FAIL, stats.nr_failed_pages);
+ count_vm_events(THP_MIGRATION_SUCCESS, stats.nr_thp_succeeded);
+ count_vm_events(THP_MIGRATION_FAIL, stats.nr_thp_failed);
+ count_vm_events(THP_MIGRATION_SPLIT, stats.nr_thp_split);
+ trace_mm_migrate_pages(stats.nr_succeeded, stats.nr_failed_pages,
+ stats.nr_thp_succeeded, stats.nr_thp_failed,
+ stats.nr_thp_split, mode, reason);
if (ret_succeeded)
- *ret_succeeded = nr_succeeded;
+ *ret_succeeded = stats.nr_succeeded;
- return rc;
+ return rc_gather;
}
struct page *alloc_migration_target(struct page *page, unsigned long private)
struct migration_target_control *mtc;
gfp_t gfp_mask;
unsigned int order = 0;
+ struct folio *hugetlb_folio = NULL;
struct folio *new_folio = NULL;
int nid;
int zidx;
struct hstate *h = folio_hstate(folio);
gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
- return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
+ hugetlb_folio = alloc_hugetlb_folio_nodemask(h, nid,
+ mtc->nmask, gfp_mask);
+ return &hugetlb_folio->page;
}
if (folio_test_large(folio)) {
struct vm_area_struct *vma;
struct page *page;
int err;
+ bool isolated;
mmap_read_lock(mm);
err = -EFAULT;
if (PageHuge(page)) {
if (PageHead(page)) {
- err = isolate_hugetlb(page, pagelist);
- if (!err)
- err = 1;
+ isolated = isolate_hugetlb(page_folio(page), pagelist);
+ err = isolated ? 1 : -EBUSY;
}
} else {
struct page *head;
head = compound_head(page);
- err = isolate_lru_page(head);
- if (err)
+ isolated = isolate_lru_page(head);
+ if (!isolated) {
+ err = -EBUSY;
goto out_putpage;
+ }
err = 1;
list_add_tail(&head->lru, pagelist);
return 0;
}
- if (isolate_lru_page(page))
+ if (!isolate_lru_page(page))
return 0;
mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
int page_group_by_mobility_disabled __read_mostly;
+ bool deferred_struct_pages __meminitdata;
+
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
* During boot we initialize deferred pages on-demand, as needed, but once
return static_branch_unlikely(&deferred_pages);
}
- /* Returns true if the struct page for the pfn is uninitialised */
- static inline bool __meminit early_page_uninitialised(unsigned long pfn)
+ /* Returns true if the struct page for the pfn is initialised */
+ static inline bool __meminit early_page_initialised(unsigned long pfn)
{
int nid = early_pfn_to_nid(pfn);
if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn)
- return true;
+ return false;
- return false;
+ return true;
}
/*
return false;
}
- static inline bool early_page_uninitialised(unsigned long pfn)
+ static inline bool early_page_initialised(unsigned long pfn)
{
- return false;
+ return true;
}
static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
static void prep_compound_head(struct page *page, unsigned int order)
{
+ struct folio *folio = (struct folio *)page;
+
set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
set_compound_order(page, order);
- atomic_set(compound_mapcount_ptr(page), -1);
- atomic_set(subpages_mapcount_ptr(page), 0);
- atomic_set(compound_pincount_ptr(page), 0);
+ atomic_set(&folio->_entire_mapcount, -1);
+ atomic_set(&folio->_nr_pages_mapped, 0);
+ atomic_set(&folio->_pincount, 0);
}
static void prep_compound_tail(struct page *head, int tail_idx)
void destroy_large_folio(struct folio *folio)
{
- enum compound_dtor_id dtor = folio_page(folio, 1)->compound_dtor;
+ enum compound_dtor_id dtor = folio->_folio_dtor;
VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
compound_page_dtors[dtor](&folio->page);
static int free_tail_pages_check(struct page *head_page, struct page *page)
{
+ struct folio *folio = (struct folio *)head_page;
int ret = 1;
/*
switch (page - head_page) {
case 1:
/* the first tail page: these may be in place of ->mapping */
- if (unlikely(head_compound_mapcount(head_page))) {
- bad_page(page, "nonzero compound_mapcount");
+ if (unlikely(folio_entire_mapcount(folio))) {
+ bad_page(page, "nonzero entire_mapcount");
goto out;
}
- if (unlikely(atomic_read(subpages_mapcount_ptr(head_page)))) {
- bad_page(page, "nonzero subpages_mapcount");
+ if (unlikely(atomic_read(&folio->_nr_pages_mapped))) {
+ bad_page(page, "nonzero nr_pages_mapped");
goto out;
}
- if (unlikely(head_compound_pincount(head_page))) {
- bad_page(page, "nonzero compound_pincount");
+ if (unlikely(atomic_read(&folio->_pincount))) {
+ bad_page(page, "nonzero pincount");
goto out;
}
break;
* see the comment next to it.
* 3. Skipping poisoning is requested via __GFP_SKIP_KASAN_POISON,
* see the comment next to it.
+ * 4. The allocation is excluded from being checked due to sampling,
+ * see the call to kasan_unpoison_pages.
*
* Poisoning pages during deferred memory init will greatly lengthen the
* process and cause problem in large memory systems as the deferred pages
* Do not let hwpoison pages hit pcplists/buddy
* Untie memcg state and reset page's owner
*/
- if (memcg_kmem_enabled() && PageMemcgKmem(page))
+ if (memcg_kmem_online() && PageMemcgKmem(page))
__memcg_kmem_uncharge_page(page, order);
reset_page_owner(page, order);
page_table_check_free(page, order);
}
if (PageMappingFlags(page))
page->mapping = NULL;
- if (memcg_kmem_enabled() && PageMemcgKmem(page))
+ if (memcg_kmem_online() && PageMemcgKmem(page))
__memcg_kmem_uncharge_page(page, order);
if (check_free && free_page_is_bad(page))
bad++;
pg_data_t *pgdat;
int nid, zid;
- if (!early_page_uninitialised(pfn))
+ if (early_page_initialised(pfn))
return;
nid = early_pfn_to_nid(pfn);
void __init memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order)
{
- if (early_page_uninitialised(pfn))
+ if (!early_page_initialised(pfn))
return;
if (!kmsan_memblock_free_pages(page, order)) {
/* KMSAN will take care of these pages. */
{
bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
!should_skip_init(gfp_flags);
- bool init_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+ bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+ bool reset_tags = true;
int i;
set_page_private(page, 0);
*/
/*
- * If memory tags should be zeroed (which happens only when memory
- * should be initialized as well).
+ * If memory tags should be zeroed
+ * (which happens only when memory should be initialized as well).
*/
- if (init_tags) {
- /* Initialize both memory and tags. */
+ if (zero_tags) {
+ /* Initialize both memory and memory tags. */
for (i = 0; i != 1 << order; ++i)
tag_clear_highpage(page + i);
- /* Note that memory is already initialized by the loop above. */
+ /* Take note that memory was initialized by the loop above. */
init = false;
}
if (!should_skip_kasan_unpoison(gfp_flags)) {
- /* Unpoison shadow memory or set memory tags. */
- kasan_unpoison_pages(page, order, init);
-
- /* Note that memory is already initialized by KASAN. */
- if (kasan_has_integrated_init())
- init = false;
- } else {
- /* Ensure page_address() dereferencing does not fault. */
+ /* Try unpoisoning (or setting tags) and initializing memory. */
+ if (kasan_unpoison_pages(page, order, init)) {
+ /* Take note that memory was initialized by KASAN. */
+ if (kasan_has_integrated_init())
+ init = false;
+ /* Take note that memory tags were set by KASAN. */
+ reset_tags = false;
+ } else {
+ /*
+ * KASAN decided to exclude this allocation from being
+ * (un)poisoned due to sampling. Make KASAN skip
+ * poisoning when the allocation is freed.
+ */
+ SetPageSkipKASanPoison(page);
+ }
+ }
+ /*
+ * If memory tags have not been set by KASAN, reset the page tags to
+ * ensure page_address() dereferencing does not fault.
+ */
+ if (reset_tags) {
for (i = 0; i != 1 << order; ++i)
page_kasan_tag_reset(page + i);
}
- /* If memory is still not initialized, do it now. */
+ /* If memory is still not initialized, initialize it now. */
if (init)
kernel_init_pages(page, 1 << order);
/* Propagate __GFP_SKIP_KASAN_POISON to page flags. */
*
* The other migratetypes do not have fallbacks.
*/
- static int fallbacks[MIGRATE_TYPES][3] = {
- [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
- [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
- [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
+ static int fallbacks[MIGRATE_TYPES][MIGRATE_PCPTYPES - 1] = {
+ [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE },
+ [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
+ [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE },
};
#ifdef CONFIG_CMA
return -1;
*can_steal = false;
- for (i = 0;; i++) {
+ for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
fallback_mt = fallbacks[migratetype][i];
- if (fallback_mt == MIGRATE_TYPES)
- break;
-
if (free_area_empty(area, fallback_mt))
continue;
* reserved for high-order atomic allocation, so order-0
* request should skip it.
*/
- if (order > 0 && alloc_flags & ALLOC_HARDER)
+ if (alloc_flags & ALLOC_HIGHATOMIC)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
page = __rmqueue(zone, order, migratetype, alloc_flags);
+
+ /*
+ * If the allocation fails, allow OOM handling access
+ * to HIGHATOMIC reserves as failing now is worse than
+ * failing a high-order atomic allocation in the
+ * future.
+ */
+ if (!page && (alloc_flags & ALLOC_OOM))
+ page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+
if (!page) {
spin_unlock_irqrestore(&zone->lock, flags);
return NULL;
static inline long __zone_watermark_unusable_free(struct zone *z,
unsigned int order, unsigned int alloc_flags)
{
- const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
long unusable_free = (1 << order) - 1;
/*
- * If the caller does not have rights to ALLOC_HARDER then subtract
- * the high-atomic reserves. This will over-estimate the size of the
- * atomic reserve but it avoids a search.
+ * If the caller does not have rights to reserves below the min
+ * watermark then subtract the high-atomic reserves. This will
+ * over-estimate the size of the atomic reserve but it avoids a search.
*/
- if (likely(!alloc_harder))
+ if (likely(!(alloc_flags & ALLOC_RESERVES)))
unusable_free += z->nr_reserved_highatomic;
#ifdef CONFIG_CMA
{
long min = mark;
int o;
- const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
/* free_pages may go negative - that's OK */
free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
- if (alloc_flags & ALLOC_HIGH)
- min -= min / 2;
+ if (unlikely(alloc_flags & ALLOC_RESERVES)) {
+ /*
+ * __GFP_HIGH allows access to 50% of the min reserve as well
+ * as OOM.
+ */
+ if (alloc_flags & ALLOC_MIN_RESERVE) {
+ min -= min / 2;
+
+ /*
+ * Non-blocking allocations (e.g. GFP_ATOMIC) can
+ * access more reserves than just __GFP_HIGH. Other
+ * non-blocking allocations requests such as GFP_NOWAIT
+ * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
+ * access to the min reserve.
+ */
+ if (alloc_flags & ALLOC_NON_BLOCK)
+ min -= min / 4;
+ }
- if (unlikely(alloc_harder)) {
/*
- * OOM victims can try even harder than normal ALLOC_HARDER
+ * OOM victims can try even harder than the normal reserve
* users on the grounds that it's definitely going to be in
* the exit path shortly and free memory. Any allocation it
* makes during the free path will be small and short-lived.
*/
if (alloc_flags & ALLOC_OOM)
min -= min / 2;
- else
- min -= min / 4;
}
/*
return true;
}
#endif
- if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+ if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
+ !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
return true;
+ }
}
return false;
}
if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
free_pages))
return true;
+
/*
- * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+ * Ignore watermark boosting for __GFP_HIGH order-0 allocations
* when checking the min watermark. The min watermark is the
* point where boosting is ignored so that kswapd is woken up
* when below the low watermark.
*/
- if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+ if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
mark = z->_watermark[WMARK_MIN];
return __zone_watermark_ok(z, order, mark, highest_zoneidx,
* Watermark failed for this zone, but see if we can
* grow this zone if it contains deferred pages.
*/
- if (static_branch_unlikely(&deferred_pages)) {
+ if (deferred_pages_enabled()) {
if (_deferred_grow_zone(zone, order))
goto try_this_zone;
}
* If this is a high-order atomic allocation then check
* if the pageblock should be reserved for the future
*/
- if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+ if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
reserve_highatomic_pageblock(page, zone, order);
return page;
} else {
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/* Try again if zone has deferred pages */
- if (static_branch_unlikely(&deferred_pages)) {
+ if (deferred_pages_enabled()) {
if (_deferred_grow_zone(zone, order))
goto try_this_zone;
}
}
static inline unsigned int
- gfp_to_alloc_flags(gfp_t gfp_mask)
+ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
{
unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
/*
- * __GFP_HIGH is assumed to be the same as ALLOC_HIGH
+ * __GFP_HIGH is assumed to be the same as ALLOC_MIN_RESERVE
* and __GFP_KSWAPD_RECLAIM is assumed to be the same as ALLOC_KSWAPD
* to save two branches.
*/
- BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+ BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_MIN_RESERVE);
BUILD_BUG_ON(__GFP_KSWAPD_RECLAIM != (__force gfp_t) ALLOC_KSWAPD);
/*
* The caller may dip into page reserves a bit more if the caller
* cannot run direct reclaim, or if the caller has realtime scheduling
* policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
- * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
+ * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
*/
alloc_flags |= (__force int)
(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
- if (gfp_mask & __GFP_ATOMIC) {
+ if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
/*
* Not worth trying to allocate harder for __GFP_NOMEMALLOC even
* if it can't schedule.
*/
- if (!(gfp_mask & __GFP_NOMEMALLOC))
- alloc_flags |= ALLOC_HARDER;
+ if (!(gfp_mask & __GFP_NOMEMALLOC)) {
+ alloc_flags |= ALLOC_NON_BLOCK;
+
+ if (order > 0)
+ alloc_flags |= ALLOC_HIGHATOMIC;
+ }
+
/*
- * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
- * comment for __cpuset_node_allowed().
+ * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+ * GFP_ATOMIC) rather than fail, see the comment for
+ * __cpuset_node_allowed().
*/
- alloc_flags &= ~ALLOC_CPUSET;
+ if (alloc_flags & ALLOC_MIN_RESERVE)
+ alloc_flags &= ~ALLOC_CPUSET;
} else if (unlikely(rt_task(current)) && in_task())
- alloc_flags |= ALLOC_HARDER;
+ alloc_flags |= ALLOC_MIN_RESERVE;
alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
unsigned int zonelist_iter_cookie;
int reserve_flags;
- /*
- * We also sanity check to catch abuse of atomic reserves being used by
- * callers that are not in atomic context.
- */
- if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
- (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
- gfp_mask &= ~__GFP_ATOMIC;
-
restart:
compaction_retries = 0;
no_progress_loops = 0;
* kswapd needs to be woken up, and to avoid the cost of setting up
* alloc_flags precisely. So we do that now.
*/
- alloc_flags = gfp_to_alloc_flags(gfp_mask);
+ alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
/*
* We need to recalculate the starting point for the zonelist iterator
WARN_ON_ONCE_GFP(costly_order, gfp_mask);
/*
- * Help non-failing allocations by giving them access to memory
- * reserves but do not use ALLOC_NO_WATERMARKS because this
+ * Help non-failing allocations by giving some access to memory
+ * reserves normally used for high priority non-blocking
+ * allocations but do not use ALLOC_NO_WATERMARKS because this
* could deplete whole memory reserves which would just make
- * the situation worse
+ * the situation worse.
*/
- page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+ page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
if (page)
goto got_pg;
goto out;
/* Bulk allocator does not support memcg accounting. */
- if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT))
+ if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT))
goto failed;
/* Use the single page allocator for one page. */
page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
out:
- if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
+ if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
__free_pages(page, order);
page = NULL;
*/
void __free_pages(struct page *page, unsigned int order)
{
+ /* get PageHead before we drop reference */
+ int head = PageHead(page);
+
if (put_page_testzero(page))
free_the_page(page, order);
- else if (!PageHead(page))
+ else if (!head)
while (order-- > 0)
free_the_page(page + (1 << order), order);
}
if (context == MEMINIT_EARLY) {
if (overlap_memmap_init(zone, &pfn))
continue;
- if (defer_init(nid, pfn, zone_end_pfn))
+ if (defer_init(nid, pfn, zone_end_pfn)) {
+ deferred_struct_pages = true;
break;
+ }
}
page = pfn_to_page(pfn);
pgdat_set_deferred_range(pgdat);
free_area_init_core(pgdat);
+ lru_gen_init_pgdat(pgdat);
}
static void __init free_area_init_memoryless_node(int nid)
/* Allocator not initialized yet */
pgdat = arch_alloc_nodedata(nid);
- if (!pgdat) {
- pr_err("Cannot allocate %zuB for node %d.\n",
- sizeof(*pgdat), nid);
- continue;
- }
+ if (!pgdat)
+ panic("Cannot allocate %zuB for node %d.\n",
+ sizeof(*pgdat), nid);
arch_refresh_nodedata(nid, pgdat);
free_area_init_memoryless_node(nid);
struct zone *zone;
lru_add_drain_cpu(cpu);
- mlock_page_drain_remote(cpu);
+ mlock_drain_remote(cpu);
drain_pages(cpu);
/*
#include <linux/swap.h>
#include <linux/bio.h>
#include <linux/swapops.h>
- #include <linux/buffer_head.h>
#include <linux/writeback.h>
#include <linux/frontswap.h>
#include <linux/blkdev.h>
#include <linux/delayacct.h>
#include "swap.h"
- static void end_swap_bio_write(struct bio *bio)
+ static void __end_swap_bio_write(struct bio *bio)
{
struct page *page = bio_first_page_all(bio);
ClearPageReclaim(page);
}
end_page_writeback(page);
+ }
+
+ static void end_swap_bio_write(struct bio *bio)
+ {
+ __end_swap_bio_write(bio);
bio_put(bio);
}
- static void end_swap_bio_read(struct bio *bio)
+ static void __end_swap_bio_read(struct bio *bio)
{
struct page *page = bio_first_page_all(bio);
- struct task_struct *waiter = bio->bi_private;
if (bio->bi_status) {
SetPageError(page);
pr_alert_ratelimited("Read-error on swap-device (%u:%u:%llu)\n",
MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)),
(unsigned long long)bio->bi_iter.bi_sector);
- goto out;
+ } else {
+ SetPageUptodate(page);
}
-
- SetPageUptodate(page);
- out:
unlock_page(page);
- WRITE_ONCE(bio->bi_private, NULL);
+ }
+
+ static void end_swap_bio_read(struct bio *bio)
+ {
+ __end_swap_bio_read(bio);
bio_put(bio);
- if (waiter) {
- blk_wake_io_task(waiter);
- put_task_struct(waiter);
- }
}
int generic_swapfile_activate(struct swap_info_struct *sis,
int swap_writepage(struct page *page, struct writeback_control *wbc)
{
struct folio *folio = page_folio(page);
- int ret = 0;
+ int ret;
if (folio_free_swap(folio)) {
folio_unlock(folio);
- goto out;
+ return 0;
}
/*
* Arch code may have to preserve more data than just the page
if (ret) {
folio_mark_dirty(folio);
folio_unlock(folio);
- goto out;
+ return ret;
}
if (frontswap_store(&folio->page) == 0) {
folio_start_writeback(folio);
folio_unlock(folio);
folio_end_writeback(folio);
- goto out;
+ return 0;
}
- ret = __swap_writepage(&folio->page, wbc);
- out:
- return ret;
+ __swap_writepage(&folio->page, wbc);
+ return 0;
}
static inline void count_swpout_vm_event(struct page *page)
mempool_free(sio, sio_pool);
}
- static int swap_writepage_fs(struct page *page, struct writeback_control *wbc)
+ static void swap_writepage_fs(struct page *page, struct writeback_control *wbc)
{
struct swap_iocb *sio = NULL;
struct swap_info_struct *sis = page_swap_info(page);
sio->pages = 0;
sio->len = 0;
}
- sio->bvec[sio->pages].bv_page = page;
- sio->bvec[sio->pages].bv_len = thp_size(page);
- sio->bvec[sio->pages].bv_offset = 0;
+ bvec_set_page(&sio->bvec[sio->pages], page, thp_size(page), 0);
sio->len += thp_size(page);
sio->pages += 1;
if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->swap_plug) {
}
if (wbc->swap_plug)
*wbc->swap_plug = sio;
-
- return 0;
}
- int __swap_writepage(struct page *page, struct writeback_control *wbc)
+ static void swap_writepage_bdev_sync(struct page *page,
+ struct writeback_control *wbc, struct swap_info_struct *sis)
{
- struct bio *bio;
- int ret;
- struct swap_info_struct *sis = page_swap_info(page);
+ struct bio_vec bv;
+ struct bio bio;
- VM_BUG_ON_PAGE(!PageSwapCache(page), page);
- /*
- * ->flags can be updated non-atomicially (scan_swap_map_slots),
- * but that will never affect SWP_FS_OPS, so the data_race
- * is safe.
- */
- if (data_race(sis->flags & SWP_FS_OPS))
- return swap_writepage_fs(page, wbc);
+ bio_init(&bio, sis->bdev, &bv, 1,
+ REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc));
+ bio.bi_iter.bi_sector = swap_page_sector(page);
+ bio_add_page(&bio, page, thp_size(page), 0);
- ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
- if (!ret) {
- count_swpout_vm_event(page);
- return 0;
- }
+ bio_associate_blkg_from_page(&bio, page);
+ count_swpout_vm_event(page);
+
+ set_page_writeback(page);
+ unlock_page(page);
+
+ submit_bio_wait(&bio);
+ __end_swap_bio_write(&bio);
+ }
+
+ static void swap_writepage_bdev_async(struct page *page,
+ struct writeback_control *wbc, struct swap_info_struct *sis)
+ {
+ struct bio *bio;
bio = bio_alloc(sis->bdev, 1,
REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc),
set_page_writeback(page);
unlock_page(page);
submit_bio(bio);
+ }
- return 0;
+ void __swap_writepage(struct page *page, struct writeback_control *wbc)
+ {
+ struct swap_info_struct *sis = page_swap_info(page);
+
+ VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+ /*
+ * ->flags can be updated non-atomicially (scan_swap_map_slots),
+ * but that will never affect SWP_FS_OPS, so the data_race
+ * is safe.
+ */
+ if (data_race(sis->flags & SWP_FS_OPS))
+ swap_writepage_fs(page, wbc);
+ else if (sis->flags & SWP_SYNCHRONOUS_IO)
+ swap_writepage_bdev_sync(page, wbc, sis);
+ else
+ swap_writepage_bdev_async(page, wbc, sis);
}
void swap_write_unplug(struct swap_iocb *sio)
sio->pages = 0;
sio->len = 0;
}
- sio->bvec[sio->pages].bv_page = page;
- sio->bvec[sio->pages].bv_len = thp_size(page);
- sio->bvec[sio->pages].bv_offset = 0;
+ bvec_set_page(&sio->bvec[sio->pages], page, thp_size(page), 0);
sio->len += thp_size(page);
sio->pages += 1;
if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) {
*plug = sio;
}
- int swap_readpage(struct page *page, bool synchronous,
- struct swap_iocb **plug)
+ static void swap_readpage_bdev_sync(struct page *page,
+ struct swap_info_struct *sis)
+ {
+ struct bio_vec bv;
+ struct bio bio;
+
+ bio_init(&bio, sis->bdev, &bv, 1, REQ_OP_READ);
+ bio.bi_iter.bi_sector = swap_page_sector(page);
+ bio_add_page(&bio, page, thp_size(page), 0);
+ /*
+ * Keep this task valid during swap readpage because the oom killer may
+ * attempt to access it in the page fault retry time check.
+ */
+ get_task_struct(current);
+ count_vm_event(PSWPIN);
+ submit_bio_wait(&bio);
+ __end_swap_bio_read(&bio);
+ put_task_struct(current);
+ }
+
+ static void swap_readpage_bdev_async(struct page *page,
+ struct swap_info_struct *sis)
{
struct bio *bio;
- int ret = 0;
+
+ bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
+ bio->bi_iter.bi_sector = swap_page_sector(page);
+ bio->bi_end_io = end_swap_bio_read;
+ bio_add_page(bio, page, thp_size(page), 0);
+ count_vm_event(PSWPIN);
+ submit_bio(bio);
+ }
+
+ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug)
+ {
struct swap_info_struct *sis = page_swap_info(page);
bool workingset = PageWorkingset(page);
unsigned long pflags;
if (frontswap_load(page) == 0) {
SetPageUptodate(page);
unlock_page(page);
- goto out;
- }
-
- if (data_race(sis->flags & SWP_FS_OPS)) {
+ } else if (data_race(sis->flags & SWP_FS_OPS)) {
swap_readpage_fs(page, plug);
- goto out;
- }
-
- if (sis->flags & SWP_SYNCHRONOUS_IO) {
- ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
- if (!ret) {
- count_vm_event(PSWPIN);
- goto out;
- }
- }
-
- ret = 0;
- bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
- bio->bi_iter.bi_sector = swap_page_sector(page);
- bio->bi_end_io = end_swap_bio_read;
- bio_add_page(bio, page, thp_size(page), 0);
- /*
- * Keep this task valid during swap readpage because the oom killer may
- * attempt to access it in the page fault retry time check.
- */
- if (synchronous) {
- get_task_struct(current);
- bio->bi_private = current;
- }
- count_vm_event(PSWPIN);
- bio_get(bio);
- submit_bio(bio);
- while (synchronous) {
- set_current_state(TASK_UNINTERRUPTIBLE);
- if (!READ_ONCE(bio->bi_private))
- break;
-
- blk_io_schedule();
+ } else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) {
+ swap_readpage_bdev_sync(page, sis);
+ } else {
+ swap_readpage_bdev_async(page, sis);
}
- __set_current_state(TASK_RUNNING);
- bio_put(bio);
- out:
if (workingset) {
delayacct_thrashing_end(&in_thrashing);
psi_memstall_leave(&pflags);
}
delayacct_swapin_end();
- return ret;
}
void __swap_read_unplug(struct swap_iocb *sio)
if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
return -EAGAIN;
- vma->vm_flags |= VM_LOCKED | VM_DONTDUMP;
+ vm_flags_set(vma, VM_LOCKED | VM_DONTDUMP);
vma->vm_ops = &secretmem_vm_ops;
return 0;
.migrate_folio = secretmem_migrate_folio,
};
-static int secretmem_setattr(struct user_namespace *mnt_userns,
+static int secretmem_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *iattr)
{
struct inode *inode = d_inode(dentry);
if ((ia_valid & ATTR_SIZE) && inode->i_size)
ret = -EINVAL;
else
- ret = simple_setattr(mnt_userns, dentry, iattr);
+ ret = simple_setattr(idmap, dentry, iattr);
filemap_invalidate_unlock(mapping);
static struct file *secretmem_file_create(unsigned long flags)
{
- struct file *file = ERR_PTR(-ENOMEM);
+ struct file *file;
struct inode *inode;
const char *anon_name = "[secretmem]";
const struct qstr qname = QSTR_INIT(anon_name, strlen(anon_name));
#include <linux/random.h>
#include <linux/sched/signal.h>
#include <linux/export.h>
+ #include <linux/shmem_fs.h>
#include <linux/swap.h>
#include <linux/uio.h>
#include <linux/hugetlb.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/backing-dev.h>
- #include <linux/shmem_fs.h>
#include <linux/writeback.h>
#include <linux/pagevec.h>
#include <linux/percpu_counter.h>
static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
- bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
- pgoff_t index, bool shmem_huge_force)
+ bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
+ struct mm_struct *mm, unsigned long vm_flags)
{
loff_t i_size;
if (!S_ISREG(inode->i_mode))
return false;
- if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
+ if (mm && ((vm_flags & VM_NOHUGEPAGE) || test_bit(MMF_DISABLE_THP, &mm->flags)))
return false;
if (shmem_huge == SHMEM_HUGE_DENY)
return false;
return true;
fallthrough;
case SHMEM_HUGE_ADVISE:
- if (vma && (vma->vm_flags & VM_HUGEPAGE))
+ if (mm && (vm_flags & VM_HUGEPAGE))
return true;
fallthrough;
default:
#define shmem_huge SHMEM_HUGE_DENY
- bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode,
- pgoff_t index, bool shmem_huge_force)
+ bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
+ struct mm_struct *mm, unsigned long vm_flags)
{
return false;
}
}
EXPORT_SYMBOL_GPL(shmem_truncate_range);
-static int shmem_getattr(struct user_namespace *mnt_userns,
+static int shmem_getattr(struct mnt_idmap *idmap,
const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
{
stat->attributes_mask |= (STATX_ATTR_APPEND |
STATX_ATTR_IMMUTABLE |
STATX_ATTR_NODUMP);
- generic_fillattr(&init_user_ns, inode, stat);
+ generic_fillattr(idmap, inode, stat);
- if (shmem_is_huge(NULL, inode, 0, false))
+ if (shmem_is_huge(inode, 0, false, NULL, 0))
stat->blksize = HPAGE_PMD_SIZE;
if (request_mask & STATX_BTIME) {
return 0;
}
-static int shmem_setattr(struct user_namespace *mnt_userns,
+static int shmem_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = d_inode(dentry);
bool update_mtime = false;
bool update_ctime = true;
- error = setattr_prepare(&init_user_ns, dentry, attr);
+ error = setattr_prepare(idmap, dentry, attr);
if (error)
return error;
+ if ((info->seals & F_SEAL_EXEC) && (attr->ia_valid & ATTR_MODE)) {
+ if ((inode->i_mode ^ attr->ia_mode) & 0111) {
+ return -EPERM;
+ }
+ }
+
if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
loff_t oldsize = inode->i_size;
loff_t newsize = attr->ia_size;
}
}
- setattr_copy(&init_user_ns, inode, attr);
+ setattr_copy(idmap, inode, attr);
if (attr->ia_valid & ATTR_MODE)
- error = posix_acl_chmod(&init_user_ns, dentry, inode->i_mode);
+ error = posix_acl_chmod(idmap, dentry, inode->i_mode);
if (!error && update_ctime) {
inode->i_ctime = current_time(inode);
if (update_mtime)
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
struct mm_struct *charge_mm = vma ? vma->vm_mm : NULL;
+ struct swap_info_struct *si;
struct folio *folio = NULL;
swp_entry_t swap;
int error;
if (is_swapin_error_entry(swap))
return -EIO;
+ si = get_swap_device(swap);
+ if (!si) {
+ if (!shmem_confirm_swap(mapping, index, swap))
+ return -EEXIST;
+ else
+ return -EINVAL;
+ }
+
/* Look it up and read it in.. */
folio = swap_cache_get_folio(swap, NULL, 0);
if (!folio) {
delete_from_swap_cache(folio);
folio_mark_dirty(folio);
swap_free(swap);
+ put_swap_device(si);
*foliop = folio;
return 0;
folio_unlock(folio);
folio_put(folio);
}
+ put_swap_device(si);
return error;
}
return 0;
}
- if (!shmem_is_huge(vma, inode, index, false))
+ if (!shmem_is_huge(inode, index, false,
+ vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
goto alloc_nohuge;
huge_gfp = vma_thp_gfp_mask(vma);
return ret;
/* arm64 - allow memory tagging on RAM-based files */
- vma->vm_flags |= VM_MTE_ALLOWED;
+ vm_flags_set(vma, VM_MTE_ALLOWED);
file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
#define shmem_initxattrs NULL
#endif
-static struct inode *shmem_get_inode(struct super_block *sb, struct inode *dir,
- umode_t mode, dev_t dev, unsigned long flags)
+static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block *sb,
+ struct inode *dir, umode_t mode, dev_t dev,
+ unsigned long flags)
{
struct inode *inode;
struct shmem_inode_info *info;
inode = new_inode(sb);
if (inode) {
inode->i_ino = ino;
- inode_init_owner(&init_user_ns, inode, dir, mode);
+ inode_init_owner(idmap, inode, dir, mode);
inode->i_blocks = 0;
inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
inode->i_generation = get_random_u32();
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
{
+ struct folio *folio = page_folio(page);
struct inode *inode = mapping->host;
if (pos + copied > inode->i_size)
i_size_write(inode, pos + copied);
- if (!PageUptodate(page)) {
- struct page *head = compound_head(page);
- if (PageTransCompound(page)) {
- int i;
-
- for (i = 0; i < HPAGE_PMD_NR; i++) {
- if (head + i == page)
- continue;
- clear_highpage(head + i);
- flush_dcache_page(head + i);
- }
- }
- if (copied < PAGE_SIZE) {
- unsigned from = pos & (PAGE_SIZE - 1);
- zero_user_segments(page, 0, from,
- from + copied, PAGE_SIZE);
+ if (!folio_test_uptodate(folio)) {
+ if (copied < folio_size(folio)) {
+ size_t from = offset_in_folio(folio, pos);
+ folio_zero_segments(folio, 0, from,
+ from + copied, folio_size(folio));
}
- SetPageUptodate(head);
+ folio_mark_uptodate(folio);
}
- set_page_dirty(page);
- unlock_page(page);
- put_page(page);
+ folio_mark_dirty(folio);
+ folio_unlock(folio);
+ folio_put(folio);
return copied;
}
* File creation. Allocate an inode, and we're done..
*/
static int
-shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir,
+shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode, dev_t dev)
{
struct inode *inode;
int error = -ENOSPC;
- inode = shmem_get_inode(dir->i_sb, dir, mode, dev, VM_NORESERVE);
+ inode = shmem_get_inode(idmap, dir->i_sb, dir, mode, dev, VM_NORESERVE);
if (inode) {
error = simple_acl_create(dir, inode);
if (error)
}
static int
-shmem_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+shmem_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
struct file *file, umode_t mode)
{
struct inode *inode;
int error = -ENOSPC;
- inode = shmem_get_inode(dir->i_sb, dir, mode, 0, VM_NORESERVE);
+ inode = shmem_get_inode(idmap, dir->i_sb, dir, mode, 0, VM_NORESERVE);
if (inode) {
error = security_inode_init_security(inode, dir,
NULL,
return error;
}
-static int shmem_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
+static int shmem_mkdir(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode)
{
int error;
- if ((error = shmem_mknod(&init_user_ns, dir, dentry,
- mode | S_IFDIR, 0)))
+ error = shmem_mknod(idmap, dir, dentry, mode | S_IFDIR, 0);
+ if (error)
return error;
inc_nlink(dir);
return 0;
}
-static int shmem_create(struct user_namespace *mnt_userns, struct inode *dir,
+static int shmem_create(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode, bool excl)
{
- return shmem_mknod(&init_user_ns, dir, dentry, mode | S_IFREG, 0);
+ return shmem_mknod(idmap, dir, dentry, mode | S_IFREG, 0);
}
/*
return shmem_unlink(dir, dentry);
}
-static int shmem_whiteout(struct user_namespace *mnt_userns,
+static int shmem_whiteout(struct mnt_idmap *idmap,
struct inode *old_dir, struct dentry *old_dentry)
{
struct dentry *whiteout;
if (!whiteout)
return -ENOMEM;
- error = shmem_mknod(&init_user_ns, old_dir, whiteout,
+ error = shmem_mknod(idmap, old_dir, whiteout,
S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
dput(whiteout);
if (error)
* it exists so that the VFS layer correctly free's it when it
* gets overwritten.
*/
-static int shmem_rename2(struct user_namespace *mnt_userns,
+static int shmem_rename2(struct mnt_idmap *idmap,
struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
if (flags & RENAME_WHITEOUT) {
int error;
- error = shmem_whiteout(&init_user_ns, old_dir, old_dentry);
+ error = shmem_whiteout(idmap, old_dir, old_dentry);
if (error)
return error;
}
return 0;
}
-static int shmem_symlink(struct user_namespace *mnt_userns, struct inode *dir,
+static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, const char *symname)
{
int error;
if (len > PAGE_SIZE)
return -ENAMETOOLONG;
- inode = shmem_get_inode(dir->i_sb, dir, S_IFLNK | 0777, 0,
+ inode = shmem_get_inode(idmap, dir->i_sb, dir, S_IFLNK | 0777, 0,
VM_NORESERVE);
if (!inode)
return -ENOSPC;
return 0;
}
-static int shmem_fileattr_set(struct user_namespace *mnt_userns,
+static int shmem_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa)
{
struct inode *inode = d_inode(dentry);
}
static int shmem_xattr_handler_set(const struct xattr_handler *handler,
- struct user_namespace *mnt_userns,
+ struct mnt_idmap *idmap,
struct dentry *unused, struct inode *inode,
const char *name, const void *value,
size_t size, int flags)
#endif
uuid_gen(&sb->s_uuid);
- inode = shmem_get_inode(sb, NULL, S_IFDIR | sbinfo->mode, 0, VM_NORESERVE);
+ inode = shmem_get_inode(&nop_mnt_idmap, sb, NULL, S_IFDIR | sbinfo->mode, 0,
+ VM_NORESERVE);
if (!inode)
goto failed;
inode->i_uid = sbinfo->uid;
.parameters = shmem_fs_parameters,
#endif
.kill_sb = kill_litter_super,
+#ifdef CONFIG_SHMEM
+ .fs_flags = FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
+#else
.fs_flags = FS_USERNS_MOUNT,
+#endif
};
void __init shmem_init(void)
#define shmem_vm_ops generic_file_vm_ops
#define shmem_anon_vm_ops generic_file_vm_ops
#define shmem_file_operations ramfs_file_operations
-#define shmem_get_inode(sb, dir, mode, dev, flags) ramfs_get_inode(sb, dir, mode, dev)
+#define shmem_get_inode(idmap, sb, dir, mode, dev, flags) ramfs_get_inode(sb, dir, mode, dev)
#define shmem_acct_size(flags, size) 0
#define shmem_unacct_size(flags, size) do {} while (0)
if (shmem_acct_size(flags, size))
return ERR_PTR(-ENOMEM);
- inode = shmem_get_inode(mnt->mnt_sb, NULL, S_IFREG | S_IRWXUGO, 0,
- flags);
+ if (is_idmapped_mnt(mnt))
+ return ERR_PTR(-EINVAL);
+
+ inode = shmem_get_inode(&nop_mnt_idmap, mnt->mnt_sb, NULL,
+ S_IFREG | S_IRWXUGO, 0, flags);
if (unlikely(!inode)) {
shmem_unacct_size(flags, size);
return ERR_PTR(-ENOSPC);
}
/**
- * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
- * @mapping: the page's address_space
- * @index: the page index
+ * shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
+ * @mapping: the folio's address_space
+ * @index: the folio index
* @gfp: the page allocator flags to use if allocating
*
* This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
* i915_gem_object_get_pages_gtt() mixes __GFP_NORETRY | __GFP_NOWARN in
* with the mapping_gfp_mask(), to avoid OOMing the machine unnecessarily.
*/
- struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp)
+ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
+ pgoff_t index, gfp_t gfp)
{
#ifdef CONFIG_SHMEM
struct inode *inode = mapping->host;
struct folio *folio;
- struct page *page;
int error;
BUG_ON(!shmem_mapping(mapping));
return ERR_PTR(error);
folio_unlock(folio);
+ return folio;
+ #else
+ /*
+ * The tiny !SHMEM case uses ramfs without swap
+ */
+ return mapping_read_folio_gfp(mapping, index, gfp);
+ #endif
+ }
+ EXPORT_SYMBOL_GPL(shmem_read_folio_gfp);
+
+ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
+ pgoff_t index, gfp_t gfp)
+ {
+ struct folio *folio = shmem_read_folio_gfp(mapping, index, gfp);
+ struct page *page;
+
+ if (IS_ERR(folio))
+ return &folio->page;
+
page = folio_file_page(folio, index);
if (PageHWPoison(page)) {
folio_put(folio);
}
return page;
- #else
- /*
- * The tiny !SHMEM case uses ramfs without swap
- */
- return read_cache_page_gfp(mapping, index, gfp);
- #endif
}
EXPORT_SYMBOL_GPL(shmem_read_mapping_page_gfp);
static inline void fixup_slab_list(struct kmem_cache *cachep,
struct kmem_cache_node *n, struct slab *slab,
void **list);
-static int slab_early_init = 1;
#define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node))
slab_state = PARTIAL_NODE;
setup_kmalloc_cache_index_table();
- slab_early_init = 0;
-
/* 5) Replace the bootstrap kmem_cache_node */
{
int nid;
/* Make the flag visible before any changes to folio->mapping */
smp_wmb();
/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
- if (sk_memalloc_socks() && page_is_pfmemalloc(folio_page(folio, 0)))
+ if (sk_memalloc_socks() && folio_is_pfmemalloc(folio))
slab_set_pfmemalloc(slab);
return slab;
BUG_ON(!folio_test_slab(folio));
__slab_clear_pfmemalloc(slab);
- page_mapcount_reset(folio_page(folio, 0));
+ page_mapcount_reset(&folio->page);
folio->mapping = NULL;
/* Make the mapping reset visible before clearing the flag */
smp_wmb();
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += 1 << order;
unaccount_slab(slab, order, cachep);
- __free_pages(folio_page(folio, 0), order);
+ __free_pages(&folio->page, order);
}
static void kmem_rcu_free(struct rcu_head *head)
}
#if DEBUG
-static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
+static inline bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
{
- if (debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
- (cachep->size % PAGE_SIZE) == 0)
- return true;
-
- return false;
+ return debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
+ ((cachep->size % PAGE_SIZE) == 0);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
raw_spin_unlock_irq(&n->list_lock);
slab_destroy(cache, slab);
nr_freed++;
+
+ cond_resched();
}
out:
return nr_freed;
int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
void **p)
{
- size_t i;
struct obj_cgroup *objcg = NULL;
+ unsigned long irqflags;
+ size_t i;
s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (!s)
return 0;
- local_irq_disable();
+ local_irq_save(irqflags);
for (i = 0; i < size; i++) {
void *objp = kfence_alloc(s, s->object_size, flags) ?:
__do_cache_alloc(s, flags, NUMA_NO_NODE);
goto error;
p[i] = objp;
}
- local_irq_enable();
+ local_irq_restore(irqflags);
cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
/* FIXME: Trace call missing. Christoph would like a bulk variant */
return size;
error:
- local_irq_enable();
+ local_irq_restore(irqflags);
cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
kmem_cache_free_bulk(s, i, p);
void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
{
+ unsigned long flags;
- local_irq_disable();
+ local_irq_save(flags);
for (int i = 0; i < size; i++) {
void *objp = p[i];
struct kmem_cache *s;
/* called via kfree_bulk */
if (!folio_test_slab(folio)) {
- local_irq_enable();
+ local_irq_restore(flags);
free_large_kmalloc(folio, objp);
- local_irq_disable();
+ local_irq_save(flags);
continue;
}
s = folio_slab(folio)->slab_cache;
__cache_free(s, objp, _RET_IP_);
}
- local_irq_enable();
+ local_irq_restore(flags);
/* FIXME: add tracing */
}
} else {
slab_list_specified = true;
if (flags & SLAB_STORE_USER)
- stack_depot_want_early_init();
+ stack_depot_request_early_init();
}
}
out:
slub_debug = global_flags;
if (slub_debug & SLAB_STORE_USER)
- stack_depot_want_early_init();
+ stack_depot_request_early_init();
if (slub_debug != 0 || slub_debug_string)
static_branch_enable(&slub_debug_enabled);
else
__folio_set_slab(folio);
/* Make the flag visible before any changes to folio->mapping */
smp_wmb();
- if (page_is_pfmemalloc(folio_page(folio, 0)))
+ if (folio_is_pfmemalloc(folio))
slab_set_pfmemalloc(slab);
return slab;
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += pages;
unaccount_slab(slab, order, s);
- __free_pages(folio_page(folio, 0), order);
+ __free_pages(&folio->page, order);
}
static void rcu_free_slab(struct rcu_head *h)
size_t size, void **p, struct obj_cgroup *objcg)
{
struct kmem_cache_cpu *c;
+ unsigned long irqflags;
int i;
/*
* handlers invoking normal fastpath.
*/
c = slub_get_cpu_ptr(s->cpu_slab);
- local_lock_irq(&s->cpu_slab->lock);
+ local_lock_irqsave(&s->cpu_slab->lock, irqflags);
for (i = 0; i < size; i++) {
void *object = kfence_alloc(s, s->object_size, flags);
*/
c->tid = next_tid(c->tid);
- local_unlock_irq(&s->cpu_slab->lock);
+ local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
/*
* Invoking slow path likely have side-effect
c = this_cpu_ptr(s->cpu_slab);
maybe_wipe_obj_freeptr(s, p[i]);
- local_lock_irq(&s->cpu_slab->lock);
+ local_lock_irqsave(&s->cpu_slab->lock, irqflags);
continue; /* goto for-loop */
}
maybe_wipe_obj_freeptr(s, p[i]);
}
c->tid = next_tid(c->tid);
- local_unlock_irq(&s->cpu_slab->lock);
+ local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
slub_put_cpu_ptr(s->cpu_slab);
return i;
void debugfs_slab_release(struct kmem_cache *s)
{
- debugfs_remove_recursive(debugfs_lookup(s->name, slab_debugfs_root));
+ debugfs_lookup_and_remove(s->name, slab_debugfs_root);
}
static int __init slab_debugfs_init(void)
}
EXPORT_SYMBOL(put_pages_list);
-/*
- * get_kernel_pages() - pin kernel pages in memory
- * @kiov: An array of struct kvec structures
- * @nr_segs: number of segments to pin
- * @write: pinning for read/write, currently ignored
- * @pages: array that receives pointers to the pages pinned.
- * Should be at least nr_segs long.
- *
- * Returns number of pages pinned. This may be fewer than the number requested.
- * If nr_segs is 0 or negative, returns 0. If no pages were pinned, returns 0.
- * Each page returned must be released with a put_page() call when it is
- * finished with.
- */
-int get_kernel_pages(const struct kvec *kiov, int nr_segs, int write,
- struct page **pages)
-{
- int seg;
-
- for (seg = 0; seg < nr_segs; seg++) {
- if (WARN_ON(kiov[seg].iov_len != PAGE_SIZE))
- return seg;
-
- pages[seg] = kmap_to_page(kiov[seg].iov_base);
- get_page(pages[seg]);
- }
-
- return seg;
-}
-EXPORT_SYMBOL_GPL(get_kernel_pages);
-
typedef void (*move_fn_t)(struct lruvec *lruvec, struct folio *folio);
static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
* Is an smp_mb__after_atomic() still required here, before
* folio_evictable() tests the mlocked flag, to rule out the possibility
* of stranding an evictable folio on an unevictable LRU? I think
- * not, because __munlock_page() only clears the mlocked flag
+ * not, because __munlock_folio() only clears the mlocked flag
* while the LRU lock is held.
*
* (That is not true of __page_cache_release(), and not necessarily
folio_set_unevictable(folio);
/*
* folio->mlock_count = !!folio_test_mlocked(folio)?
- * But that leaves __mlock_page() in doubt whether another
+ * But that leaves __mlock_folio() in doubt whether another
* actor has already counted the mlock or not. Err on the
* safe side, underestimate, let page reclaim fix it, rather
* than leaving a page on the unevictable LRU indefinitely.
VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
if (unlikely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED))
- mlock_new_page(&folio->page);
+ mlock_new_folio(folio);
else
folio_add_lru(folio);
}
}
/*
- * deactivate_page - deactivate a page
- * @page: page to deactivate
+ * folio_deactivate - deactivate a folio
+ * @folio: folio to deactivate
*
- * deactivate_page() moves @page to the inactive list if @page was on the active
- * list and was not an unevictable page. This is done to accelerate the reclaim
- * of @page.
+ * folio_deactivate() moves @folio to the inactive list if @folio was on the
+ * active list and was not unevictable. This is done to accelerate the
+ * reclaim of @folio.
*/
- void deactivate_page(struct page *page)
+ void folio_deactivate(struct folio *folio)
{
- struct folio *folio = page_folio(page);
-
if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
(folio_test_active(folio) || lru_gen_enabled())) {
struct folio_batch *fbatch;
}
/**
- * mark_page_lazyfree - make an anon page lazyfree
- * @page: page to deactivate
+ * folio_mark_lazyfree - make an anon folio lazyfree
+ * @folio: folio to deactivate
*
- * mark_page_lazyfree() moves @page to the inactive file list.
- * This is done to accelerate the reclaim of @page.
+ * folio_mark_lazyfree() moves @folio to the inactive file list.
+ * This is done to accelerate the reclaim of @folio.
*/
- void mark_page_lazyfree(struct page *page)
+ void folio_mark_lazyfree(struct folio *folio)
{
- struct folio *folio = page_folio(page);
-
if (folio_test_lru(folio) && folio_test_anon(folio) &&
folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
!folio_test_unevictable(folio)) {
local_lock(&cpu_fbatches.lock);
lru_add_drain_cpu(smp_processor_id());
local_unlock(&cpu_fbatches.lock);
- mlock_page_drain_local();
+ mlock_drain_local();
}
/*
lru_add_drain_cpu(smp_processor_id());
local_unlock(&cpu_fbatches.lock);
invalidate_bh_lrus_cpu();
- mlock_page_drain_local();
+ mlock_drain_local();
}
void lru_add_drain_cpu_zone(struct zone *zone)
lru_add_drain_cpu(smp_processor_id());
drain_local_pages(zone);
local_unlock(&cpu_fbatches.lock);
- mlock_page_drain_local();
+ mlock_drain_local();
}
#ifdef CONFIG_SMP
folio_batch_count(&fbatches->lru_deactivate) ||
folio_batch_count(&fbatches->lru_lazyfree) ||
folio_batch_count(&fbatches->activate) ||
- need_mlock_page_drain(cpu) ||
+ need_mlock_drain(cpu) ||
has_bh_in_lru(cpu, NULL);
}
fbatch->nr = j;
}
- unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
- struct address_space *mapping, pgoff_t *index, pgoff_t end,
- xa_mark_t tag)
- {
- pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
- PAGEVEC_SIZE, pvec->pages);
- return pagevec_count(pvec);
- }
- EXPORT_SYMBOL(pagevec_lookup_range_tag);
-
/*
* Perform any setup for the swap system
*/
spin_unlock(&si->lock);
if (n_ret || size == SWAPFILE_CLUSTER)
goto check_out;
- pr_debug("scan_swap_map of si %d failed to find offset\n",
- si->type);
cond_resched();
spin_lock(&swap_avail_lock);
pte_t *pte;
struct swap_info_struct *si;
int ret = 0;
- volatile unsigned char *swap_map;
si = swap_info[type];
pte = pte_offset_map(pmd, addr);
do {
struct folio *folio;
unsigned long offset;
+ unsigned char swp_count;
if (!is_swap_pte(*pte))
continue;
offset = swp_offset(entry);
pte_unmap(pte);
- swap_map = &si->swap_map[offset];
folio = swap_cache_get_folio(entry, vma, addr);
if (!folio) {
struct page *page;
folio = page_folio(page);
}
if (!folio) {
- if (*swap_map == 0 || *swap_map == SWAP_MAP_BAD)
+ swp_count = READ_ONCE(si->swap_map[offset]);
+ if (swp_count == 0 || swp_count == SWAP_MAP_BAD)
goto try_next;
+
return -ENOMEM;
}
if (p->bdev && bdev_stable_writes(p->bdev))
p->flags |= SWP_STABLE_WRITES;
- if (p->bdev && p->bdev->bd_disk->fops->rw_page)
+ if (p->bdev && bdev_synchronous(p->bdev))
p->flags |= SWP_SYNCHRONOUS_IO;
if (p->bdev && bdev_nonrot(p->bdev)) {
* We've already scheduled a throttle, avoid taking the global swap
* lock.
*/
- if (current->throttle_queue)
+ if (current->throttle_disk)
return;
spin_lock(&swap_avail_lock);
};
static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);
- static void __vunmap(const void *, int);
-
- static void free_work(struct work_struct *w)
- {
- struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
- struct llist_node *t, *llnode;
-
- llist_for_each_safe(llnode, t, llist_del_all(&p->list))
- __vunmap((void *)llnode, 1);
- }
-
/*** Page table manipulation functions ***/
static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
#endif
return is_vmalloc_addr(x);
}
+EXPORT_SYMBOL_GPL(is_vmalloc_or_module_addr);
/*
* Walk a vmap address to the struct page it maps. Huge vmap mappings will
static struct vmap_area *alloc_vmap_area(unsigned long size,
unsigned long align,
unsigned long vstart, unsigned long vend,
- int node, gfp_t gfp_mask)
+ int node, gfp_t gfp_mask,
+ unsigned long va_flags)
{
struct vmap_area *va;
unsigned long freed;
int purged = 0;
int ret;
- BUG_ON(!size);
- BUG_ON(offset_in_page(size));
- BUG_ON(!is_power_of_2(align));
+ if (unlikely(!size || offset_in_page(size) || !is_power_of_2(align)))
+ return ERR_PTR(-EINVAL);
if (unlikely(!vmap_initialized))
return ERR_PTR(-EBUSY);
va->va_start = addr;
va->va_end = addr + size;
va->vm = NULL;
+ va->flags = va_flags;
spin_lock(&vmap_area_lock);
insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
}
/*
- * Free a vmap area, caller ensuring that the area has been unmapped
- * and flush_cache_vunmap had been called for the correct range
- * previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped,
+ * unlinked and flush_cache_vunmap had been called for the correct
+ * range previously.
*/
static void free_vmap_area_noflush(struct vmap_area *va)
{
unsigned long va_start = va->va_start;
unsigned long nr_lazy;
- spin_lock(&vmap_area_lock);
- unlink_va(va, &vmap_area_root);
- spin_unlock(&vmap_area_lock);
+ if (WARN_ON_ONCE(!list_empty(&va->list)))
+ return;
nr_lazy = atomic_long_add_return((va->va_end - va->va_start) >>
PAGE_SHIFT, &vmap_lazy_nr);
return va;
}
+ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
+ {
+ struct vmap_area *va;
+
+ spin_lock(&vmap_area_lock);
+ va = __find_vmap_area(addr, &vmap_area_root);
+ if (va)
+ unlink_va(va, &vmap_area_root);
+ spin_unlock(&vmap_area_lock);
+
+ return va;
+ }
+
/*** Per cpu kva allocator ***/
/*
#define VMAP_BLOCK_SIZE (VMAP_BBMAP_BITS * PAGE_SIZE)
+ #define VMAP_RAM 0x1 /* indicates vm_map_ram area*/
+ #define VMAP_BLOCK 0x2 /* mark out the vmap_block sub-type*/
+ #define VMAP_FLAGS_MASK 0x3
+
struct vmap_block_queue {
spinlock_t lock;
struct list_head free;
spinlock_t lock;
struct vmap_area *va;
unsigned long free, dirty;
+ DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
unsigned long dirty_min, dirty_max; /*< dirty range */
struct list_head free_list;
struct rcu_head rcu_head;
va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
VMALLOC_START, VMALLOC_END,
- node, gfp_mask);
+ node, gfp_mask,
+ VMAP_RAM|VMAP_BLOCK);
if (IS_ERR(va)) {
kfree(vb);
return ERR_CAST(va);
vb->va = va;
/* At least something should be left free */
BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
+ bitmap_zero(vb->used_map, VMAP_BBMAP_BITS);
vb->free = VMAP_BBMAP_BITS - (1UL << order);
vb->dirty = 0;
vb->dirty_min = VMAP_BBMAP_BITS;
vb->dirty_max = 0;
+ bitmap_set(vb->used_map, 0, (1UL << order));
INIT_LIST_HEAD(&vb->free_list);
vb_idx = addr_to_vb_idx(va->va_start);
tmp = xa_erase(&vmap_blocks, addr_to_vb_idx(vb->va->va_start));
BUG_ON(tmp != vb);
+ spin_lock(&vmap_area_lock);
+ unlink_va(vb->va, &vmap_area_root);
+ spin_unlock(&vmap_area_lock);
+
free_vmap_area_noflush(vb->va);
kfree_rcu(vb, rcu_head);
}
pages_off = VMAP_BBMAP_BITS - vb->free;
vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
vb->free -= 1UL << order;
+ bitmap_set(vb->used_map, pages_off, (1UL << order));
if (vb->free == 0) {
spin_lock(&vbq->lock);
list_del_rcu(&vb->free_list);
order = get_order(size);
offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
+ spin_lock(&vb->lock);
+ bitmap_clear(vb->used_map, offset, (1UL << order));
+ spin_unlock(&vb->lock);
vunmap_range_noflush(addr, addr + size);
return;
}
- va = find_vmap_area(addr);
- BUG_ON(!va);
+ va = find_unlink_vmap_area(addr);
+ if (WARN_ON_ONCE(!va))
+ return;
+
debug_check_no_locks_freed((void *)va->va_start,
(va->va_end - va->va_start));
free_unmap_vmap_area(va);
} else {
struct vmap_area *va;
va = alloc_vmap_area(size, PAGE_SIZE,
- VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
+ VMALLOC_START, VMALLOC_END,
+ node, GFP_KERNEL, VMAP_RAM);
if (IS_ERR(va))
return NULL;
}
}
- void __init vmalloc_init(void)
- {
- struct vmap_area *va;
- struct vm_struct *tmp;
- int i;
-
- /*
- * Create the cache for vmap_area objects.
- */
- vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
-
- for_each_possible_cpu(i) {
- struct vmap_block_queue *vbq;
- struct vfree_deferred *p;
-
- vbq = &per_cpu(vmap_block_queue, i);
- spin_lock_init(&vbq->lock);
- INIT_LIST_HEAD(&vbq->free);
- p = &per_cpu(vfree_deferred, i);
- init_llist_head(&p->list);
- INIT_WORK(&p->wq, free_work);
- }
-
- /* Import existing vmlist entries. */
- for (tmp = vmlist; tmp; tmp = tmp->next) {
- va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
- if (WARN_ON_ONCE(!va))
- continue;
-
- va->va_start = (unsigned long)tmp->addr;
- va->va_end = va->va_start + tmp->size;
- va->vm = tmp;
- insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
- }
-
- /*
- * Now we can initialize a free vmap space.
- */
- vmap_init_free_space();
- vmap_initialized = true;
- }
-
static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
struct vmap_area *va, unsigned long flags, const void *caller)
{
if (!(flags & VM_NO_GUARD))
size += PAGE_SIZE;
- va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
+ va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
if (IS_ERR(va)) {
kfree(area);
return NULL;
struct vm_struct *remove_vm_area(const void *addr)
{
struct vmap_area *va;
+ struct vm_struct *vm;
might_sleep();
- spin_lock(&vmap_area_lock);
- va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
- if (va && va->vm) {
- struct vm_struct *vm = va->vm;
-
- va->vm = NULL;
- spin_unlock(&vmap_area_lock);
+ if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
+ addr))
+ return NULL;
- kasan_free_module_shadow(vm);
- free_unmap_vmap_area(va);
+ va = find_unlink_vmap_area((unsigned long)addr);
+ if (!va || !va->vm)
+ return NULL;
+ vm = va->vm;
- return vm;
- }
+ debug_check_no_locks_freed(vm->addr, get_vm_area_size(vm));
+ debug_check_no_obj_freed(vm->addr, get_vm_area_size(vm));
+ kasan_free_module_shadow(vm);
+ kasan_poison_vmalloc(vm->addr, get_vm_area_size(vm));
- spin_unlock(&vmap_area_lock);
- return NULL;
+ free_unmap_vmap_area(va);
+ return vm;
}
static inline void set_area_direct_map(const struct vm_struct *area,
set_direct_map(area->pages[i]);
}
- /* Handle removing and resetting vm mappings related to the vm_struct. */
- static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
+ /*
+ * Flush the vm mapping and reset the direct map.
+ */
+ static void vm_reset_perms(struct vm_struct *area)
{
unsigned long start = ULONG_MAX, end = 0;
unsigned int page_order = vm_area_page_order(area);
- int flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
int flush_dmap = 0;
int i;
- remove_vm_area(area->addr);
-
- /* If this is not VM_FLUSH_RESET_PERMS memory, no need for the below. */
- if (!flush_reset)
- return;
-
/*
- * If not deallocating pages, just do the flush of the VM area and
- * return.
- */
- if (!deallocate_pages) {
- vm_unmap_aliases();
- return;
- }
-
- /*
- * If execution gets here, flush the vm mapping and reset the direct
- * map. Find the start and end range of the direct mappings to make sure
+ * Find the start and end range of the direct mappings to make sure that
* the vm_unmap_aliases() flush includes the direct map.
*/
for (i = 0; i < area->nr_pages; i += 1U << page_order) {
unsigned long addr = (unsigned long)page_address(area->pages[i]);
+
if (addr) {
unsigned long page_size;
set_area_direct_map(area, set_direct_map_default_noflush);
}
- static void __vunmap(const void *addr, int deallocate_pages)
+ static void delayed_vfree_work(struct work_struct *w)
{
- struct vm_struct *area;
-
- if (!addr)
- return;
-
- if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
- addr))
- return;
-
- area = find_vm_area(addr);
- if (unlikely(!area)) {
- WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
- addr);
- return;
- }
-
- debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
- debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
-
- kasan_poison_vmalloc(area->addr, get_vm_area_size(area));
-
- vm_remove_mappings(area, deallocate_pages);
-
- if (deallocate_pages) {
- int i;
-
- for (i = 0; i < area->nr_pages; i++) {
- struct page *page = area->pages[i];
-
- BUG_ON(!page);
- mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
- /*
- * High-order allocs for huge vmallocs are split, so
- * can be freed as an array of order-0 allocations
- */
- __free_pages(page, 0);
- cond_resched();
- }
- atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
-
- kvfree(area->pages);
- }
-
- kfree(area);
- }
-
- static inline void __vfree_deferred(const void *addr)
- {
- /*
- * Use raw_cpu_ptr() because this can be called from preemptible
- * context. Preemption is absolutely fine here, because the llist_add()
- * implementation is lockless, so it works even if we are adding to
- * another cpu's list. schedule_work() should be fine with this too.
- */
- struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);
+ struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
+ struct llist_node *t, *llnode;
- if (llist_add((struct llist_node *)addr, &p->list))
- schedule_work(&p->wq);
+ llist_for_each_safe(llnode, t, llist_del_all(&p->list))
+ vfree(llnode);
}
/**
*/
void vfree_atomic(const void *addr)
{
- BUG_ON(in_nmi());
+ struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);
+ BUG_ON(in_nmi());
kmemleak_free(addr);
- if (!addr)
- return;
- __vfree_deferred(addr);
- }
-
- static void __vfree(const void *addr)
- {
- if (unlikely(in_interrupt()))
- __vfree_deferred(addr);
- else
- __vunmap(addr, 1);
+ /*
+ * Use raw_cpu_ptr() because this can be called from preemptible
+ * context. Preemption is absolutely fine here, because the llist_add()
+ * implementation is lockless, so it works even if we are adding to
+ * another cpu's list. schedule_work() should be fine with this too.
+ */
+ if (addr && llist_add((struct llist_node *)addr, &p->list))
+ schedule_work(&p->wq);
}
/**
*/
void vfree(const void *addr)
{
- BUG_ON(in_nmi());
+ struct vm_struct *vm;
+ int i;
- kmemleak_free(addr);
+ if (unlikely(in_interrupt())) {
+ vfree_atomic(addr);
+ return;
+ }
- might_sleep_if(!in_interrupt());
+ BUG_ON(in_nmi());
+ kmemleak_free(addr);
+ might_sleep();
if (!addr)
return;
- __vfree(addr);
+ vm = remove_vm_area(addr);
+ if (unlikely(!vm)) {
+ WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
+ addr);
+ return;
+ }
+
+ if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
+ vm_reset_perms(vm);
+ for (i = 0; i < vm->nr_pages; i++) {
+ struct page *page = vm->pages[i];
+
+ BUG_ON(!page);
+ mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+ /*
+ * High-order allocs for huge vmallocs are split, so
+ * can be freed as an array of order-0 allocations
+ */
+ __free_pages(page, 0);
+ cond_resched();
+ }
+ atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+ kvfree(vm->pages);
+ kfree(vm);
}
EXPORT_SYMBOL(vfree);
*/
void vunmap(const void *addr)
{
+ struct vm_struct *vm;
+
BUG_ON(in_interrupt());
might_sleep();
- if (addr)
- __vunmap(addr, 0);
+
+ if (!addr)
+ return;
+ vm = remove_vm_area(addr);
+ if (unlikely(!vm)) {
+ WARN(1, KERN_ERR "Trying to vunmap() nonexistent vm area (%p)\n",
+ addr);
+ return;
+ }
+ kfree(vm);
}
EXPORT_SYMBOL(vunmap);
might_sleep();
+ if (WARN_ON_ONCE(flags & VM_FLUSH_RESET_PERMS))
+ return NULL;
+
/*
* Your top guard is someone else's bottom guard. Not having a top
* guard compromises someone else's mappings too.
int ret;
array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
- gfp_mask |= __GFP_NOWARN;
+
if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
gfp_mask |= __GFP_HIGHMEM;
/*
* If not enough pages were obtained to accomplish an
- * allocation request, free them via __vfree() if any.
+ * allocation request, free them via vfree() if any.
*/
if (area->nr_pages != nr_small_pages) {
warn_alloc(gfp_mask, NULL,
return area->addr;
fail:
- __vfree(area->addr);
+ vfree(area->addr);
return NULL;
}
return copied;
}
+ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
+ {
+ char *start;
+ struct vmap_block *vb;
+ unsigned long offset;
+ unsigned int rs, re, n;
+
+ /*
+ * If it's area created by vm_map_ram() interface directly, but
+ * not further subdividing and delegating management to vmap_block,
+ * handle it here.
+ */
+ if (!(flags & VMAP_BLOCK)) {
+ aligned_vread(buf, addr, count);
+ return;
+ }
+
+ /*
+ * Area is split into regions and tracked with vmap_block, read out
+ * each region and zero fill the hole between regions.
+ */
+ vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
+ if (!vb)
+ goto finished;
+
+ spin_lock(&vb->lock);
+ if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
+ spin_unlock(&vb->lock);
+ goto finished;
+ }
+ for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
+ if (!count)
+ break;
+ start = vmap_block_vaddr(vb->va->va_start, rs);
+ while (addr < start) {
+ if (count == 0)
+ goto unlock;
+ *buf = '\0';
+ buf++;
+ addr++;
+ count--;
+ }
+ /*it could start reading from the middle of used region*/
+ offset = offset_in_page(addr);
+ n = ((re - rs + 1) << PAGE_SHIFT) - offset;
+ if (n > count)
+ n = count;
+ aligned_vread(buf, start+offset, n);
+
+ buf += n;
+ addr += n;
+ count -= n;
+ }
+ unlock:
+ spin_unlock(&vb->lock);
+
+ finished:
+ /* zero-fill the left dirty or free regions */
+ if (count)
+ memset(buf, 0, count);
+ }
+
/**
* vread() - read vmalloc area in a safe way.
* @buf: buffer for reading data
struct vm_struct *vm;
char *vaddr, *buf_start = buf;
unsigned long buflen = count;
- unsigned long n;
+ unsigned long n, size, flags;
addr = kasan_reset_tag(addr);
if (!count)
break;
- if (!va->vm)
+ vm = va->vm;
+ flags = va->flags & VMAP_FLAGS_MASK;
+ /*
+ * VMAP_BLOCK indicates a sub-type of vm_map_ram area, need
+ * be set together with VMAP_RAM.
+ */
+ WARN_ON(flags == VMAP_BLOCK);
+
+ if (!vm && !flags)
continue;
- vm = va->vm;
- vaddr = (char *) vm->addr;
- if (addr >= vaddr + get_vm_area_size(vm))
+ if (vm && (vm->flags & VM_UNINITIALIZED))
+ continue;
+ /* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
+ smp_rmb();
+
+ vaddr = (char *) va->va_start;
+ size = vm ? get_vm_area_size(vm) : va_size(va);
+
+ if (addr >= vaddr + size)
continue;
while (addr < vaddr) {
if (count == 0)
addr++;
count--;
}
- n = vaddr + get_vm_area_size(vm) - addr;
+ n = vaddr + size - addr;
if (n > count)
n = count;
- if (!(vm->flags & VM_IOREMAP))
+
+ if (flags & VMAP_RAM)
+ vmap_ram_vread(buf, addr, n, flags);
+ else if (!(vm->flags & VM_IOREMAP))
aligned_vread(buf, addr, n);
else /* IOREMAP area is treated as memory hole */
memset(buf, 0, n);
size -= PAGE_SIZE;
} while (size > 0);
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}
va = list_entry(p, struct vmap_area, list);
- /*
- * s_show can encounter race with remove_vm_area, !vm on behalf
- * of vmap area is being tear down or vm_map_ram allocation.
- */
if (!va->vm) {
- seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
- (void *)va->va_start, (void *)va->va_end,
- va->va_end - va->va_start);
+ if (va->flags & VMAP_RAM)
+ seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
+ (void *)va->va_start, (void *)va->va_end,
+ va->va_end - va->va_start);
goto final;
}
module_init(proc_vmalloc_init);
#endif
+
+ void __init vmalloc_init(void)
+ {
+ struct vmap_area *va;
+ struct vm_struct *tmp;
+ int i;
+
+ /*
+ * Create the cache for vmap_area objects.
+ */
+ vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
+
+ for_each_possible_cpu(i) {
+ struct vmap_block_queue *vbq;
+ struct vfree_deferred *p;
+
+ vbq = &per_cpu(vmap_block_queue, i);
+ spin_lock_init(&vbq->lock);
+ INIT_LIST_HEAD(&vbq->free);
+ p = &per_cpu(vfree_deferred, i);
+ init_llist_head(&p->list);
+ INIT_WORK(&p->wq, delayed_vfree_work);
+ }
+
+ /* Import existing vmlist entries. */
+ for (tmp = vmlist; tmp; tmp = tmp->next) {
+ va = kmem_cache_zalloc(vmap_area_cachep, GFP_NOWAIT);
+ if (WARN_ON_ONCE(!va))
+ continue;
+
+ va->va_start = (unsigned long)tmp->addr;
+ va->va_end = va->va_start + tmp->size;
+ va->vm = tmp;
+ insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
+ }
+
+ /*
+ * Now we can initialize a free vmap space.
+ */
+ vmap_init_free_space();
+ vmap_initialized = true;
+ }
/* There's a bubble in the pipe until at least the first ACK. */
tp->app_limited = ~0U;
+ tp->rate_app_limited = 1;
/* See draft-stevens-tcpca-spec-01 for discussion of the
* initialization of these values.
{
if (vma->vm_flags & (VM_WRITE | VM_EXEC))
return -EPERM;
- vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+ vm_flags_clear(vma, VM_MAYWRITE | VM_MAYEXEC);
/* Instruct vm_insert_page() to not mmap_read_lock(mm) */
- vma->vm_flags |= VM_MIXEDMAP;
+ vm_flags_set(vma, VM_MIXEDMAP);
vma->vm_ops = &tcp_vm_ops;
return 0;
maybe_zap_len = total_bytes_to_map - /* All bytes to map */
*length + /* Mapped or pending */
(pages_remaining * PAGE_SIZE); /* Failed map. */
- zap_page_range(vma, *address, maybe_zap_len);
+ zap_page_range_single(vma, *address, maybe_zap_len, NULL);
err = 0;
}
unsigned long leftover_pages = pages_remaining;
int bytes_mapped;
- /* We called zap_page_range, try to reinsert. */
+ /* We called zap_page_range_single, try to reinsert. */
err = vm_insert_pages(vma, *address,
pending_pages,
&pages_remaining);
total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
if (total_bytes_to_map) {
if (!(zc->flags & TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT))
- zap_page_range(vma, address, total_bytes_to_map);
+ zap_page_range_single(vma, address, total_bytes_to_map,
+ NULL);
zc->length = total_bytes_to_map;
zc->recv_skip_hint = 0;
} else {
tp->plb_rehash = 0;
/* There's a bubble in the pipe until at least the first ACK. */
tp->app_limited = ~0U;
+ tp->rate_app_limited = 1;
tp->rack.mstamp = 0;
tp->rack.advanced = 0;
tp->rack.reo_wnd_steps = 1;
"snp_abort",
"stop_this_cpu",
"usercopy_abort",
+ "xen_cpu_bringup_again",
"xen_start_kernel",
};
if (!strcmp(sec->name, ".noinstr.text") ||
!strcmp(sec->name, ".entry.text") ||
+ !strcmp(sec->name, ".cpuidle.text") ||
!strncmp(sec->name, ".text.__x86.", 12))
sec->noinstr = true;
"__tsan_atomic64_compare_exchange_val",
"__tsan_atomic_thread_fence",
"__tsan_atomic_signal_fence",
+ "__tsan_unaligned_read16",
+ "__tsan_unaligned_write16",
/* KCOV */
"write_comp_data",
"check_kcov_mode",
"__ubsan_handle_type_mismatch",
"__ubsan_handle_type_mismatch_v1",
"__ubsan_handle_shift_out_of_bounds",
+ "__ubsan_handle_load_invalid_value",
/* misc */
"csum_partial_copy_generic",
"copy_mc_fragile",
if (func->sec->noinstr)
return true;
+ /*
+ * If the symbol is a static_call trampoline, we can't tell.
+ */
+ if (func->static_call_tramp)
+ return true;
+
/*
* The __ubsan_handle_*() calls are like WARN(), they only happen when
* something 'BAD' happened. At the risk of taking the machine down,
warnings += validate_unwind_hints(file, sec);
}
+ sec = find_section_by_name(file->elf, ".cpuidle.text");
+ if (sec) {
+ warnings += validate_section(file, sec);
+ warnings += validate_unwind_hints(file, sec);
+ }
+
return warnings;
}
TARGETS += ftrace
TARGETS += futex
TARGETS += gpio
+TARGETS += hid
TARGETS += intel_pstate
TARGETS += iommu
TARGETS += ipc
TARGETS += tpm2
TARGETS += user
TARGETS += vDSO
- TARGETS += vm
+ TARGETS += mm
TARGETS += x86
TARGETS += zram
#Please keep the TARGETS list alphabetically sorted
@# included in the generated runlist.
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
- [ ! -d $(INSTALL_PATH)/$$TARGET ] && echo "Skipping non-existent dir: $$TARGET" && continue; \
- echo -ne "Emit Tests for $$TARGET\n"; \
+ [ ! -d $(INSTALL_PATH)/$$TARGET ] && printf "Skipping non-existent dir: $$TARGET\n" && continue; \
+ printf "Emit Tests for $$TARGET\n"; \
$(MAKE) -s --no-print-directory OUTPUT=$$BUILD_TARGET COLLECTION=$$TARGET \
-C $$TARGET emit_tests >> $(TEST_LIST); \
done;
--- /dev/null
-CFLAGS = -Wall -I $(top_srcdir) -I $(top_srcdir)/usr/include $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+ # SPDX-License-Identifier: GPL-2.0
+ # Makefile for mm selftests
+
+ LOCAL_HDRS += $(selfdir)/mm/local_config.h $(top_srcdir)/mm/gup_test.h
+
+ include local_config.mk
+
+ ifeq ($(CROSS_COMPILE),)
+ uname_M := $(shell uname -m 2>/dev/null || echo not)
+ else
+ uname_M := $(shell echo $(CROSS_COMPILE) | grep -o '^[a-z0-9]\+')
+ endif
+ MACHINE ?= $(shell echo $(uname_M) | sed -e 's/aarch64.*/arm64/' -e 's/ppc64.*/ppc64/')
+
+ # Without this, failed build products remain, with up-to-date timestamps,
+ # thus tricking Make (and you!) into believing that All Is Well, in subsequent
+ # make invocations:
+ .DELETE_ON_ERROR:
+
+ # Avoid accidental wrong builds, due to built-in rules working just a little
+ # bit too well--but not quite as well as required for our situation here.
+ #
+ # In other words, "make userfaultfd" is supposed to fail to build at all,
+ # because this Makefile only supports either "make" (all), or "make /full/path".
+ # However, the built-in rules, if not suppressed, will pick up CFLAGS and the
+ # initial LDLIBS (but not the target-specific LDLIBS, because those are only
+ # set for the full path target!). This causes it to get pretty far into building
+ # things despite using incorrect values such as an *occasionally* incomplete
+ # LDLIBS.
+ MAKEFLAGS += --no-builtin-rules
+
++CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES)
+ LDLIBS = -lrt -lpthread
+ TEST_GEN_FILES = cow
+ TEST_GEN_FILES += compaction_test
+ TEST_GEN_FILES += gup_test
+ TEST_GEN_FILES += hmm-tests
+ TEST_GEN_FILES += hugetlb-madvise
+ TEST_GEN_FILES += hugepage-mmap
+ TEST_GEN_FILES += hugepage-mremap
+ TEST_GEN_FILES += hugepage-shm
+ TEST_GEN_FILES += hugepage-vmemmap
+ TEST_GEN_FILES += khugepaged
+ TEST_GEN_PROGS = madv_populate
+ TEST_GEN_FILES += map_fixed_noreplace
+ TEST_GEN_FILES += map_hugetlb
+ TEST_GEN_FILES += map_populate
+ TEST_GEN_FILES += memfd_secret
+ TEST_GEN_FILES += migration
+ TEST_GEN_FILES += mlock-random-test
+ TEST_GEN_FILES += mlock2-tests
+ TEST_GEN_FILES += mrelease_test
+ TEST_GEN_FILES += mremap_dontunmap
+ TEST_GEN_FILES += mremap_test
+ TEST_GEN_FILES += on-fault-limit
+ TEST_GEN_FILES += thuge-gen
+ TEST_GEN_FILES += transhuge-stress
+ TEST_GEN_FILES += userfaultfd
+ TEST_GEN_PROGS += soft-dirty
+ TEST_GEN_PROGS += split_huge_page_test
+ TEST_GEN_FILES += ksm_tests
+ TEST_GEN_PROGS += ksm_functional_tests
+ TEST_GEN_PROGS += mdwe_test
+
+ ifeq ($(MACHINE),x86_64)
+ CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
+ CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
+ CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
+
+ VMTARGETS := protection_keys
+ BINARIES_32 := $(VMTARGETS:%=%_32)
+ BINARIES_64 := $(VMTARGETS:%=%_64)
+
+ ifeq ($(CAN_BUILD_WITH_NOPIE),1)
+ CFLAGS += -no-pie
+ endif
+
+ ifeq ($(CAN_BUILD_I386),1)
+ TEST_GEN_FILES += $(BINARIES_32)
+ endif
+
+ ifeq ($(CAN_BUILD_X86_64),1)
+ TEST_GEN_FILES += $(BINARIES_64)
+ endif
+ else
+
+ ifneq (,$(findstring $(MACHINE),ppc64))
+ TEST_GEN_FILES += protection_keys
+ endif
+
+ endif
+
+ ifneq (,$(filter $(MACHINE),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64))
+ TEST_GEN_FILES += va_128TBswitch
+ TEST_GEN_FILES += virtual_address_range
+ TEST_GEN_FILES += write_to_hugetlbfs
+ endif
+
+ TEST_PROGS := run_vmtests.sh
+
+ TEST_FILES := test_vmalloc.sh
+ TEST_FILES += test_hmm.sh
+ TEST_FILES += va_128TBswitch.sh
+
+ include ../lib.mk
+
+ $(OUTPUT)/cow: vm_util.c
+ $(OUTPUT)/khugepaged: vm_util.c
+ $(OUTPUT)/ksm_functional_tests: vm_util.c
+ $(OUTPUT)/madv_populate: vm_util.c
+ $(OUTPUT)/soft-dirty: vm_util.c
+ $(OUTPUT)/split_huge_page_test: vm_util.c
+ $(OUTPUT)/userfaultfd: vm_util.c
+
+ ifeq ($(MACHINE),x86_64)
+ BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
+ BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
+
+ define gen-target-rule-32
+ $(1) $(1)_32: $(OUTPUT)/$(1)_32
+ .PHONY: $(1) $(1)_32
+ endef
+
+ define gen-target-rule-64
+ $(1) $(1)_64: $(OUTPUT)/$(1)_64
+ .PHONY: $(1) $(1)_64
+ endef
+
+ ifeq ($(CAN_BUILD_I386),1)
+ $(BINARIES_32): CFLAGS += -m32 -mxsave
+ $(BINARIES_32): LDLIBS += -lrt -ldl -lm
+ $(BINARIES_32): $(OUTPUT)/%_32: %.c
+ $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+ $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-32,$(t))))
+ endif
+
+ ifeq ($(CAN_BUILD_X86_64),1)
+ $(BINARIES_64): CFLAGS += -m64 -mxsave
+ $(BINARIES_64): LDLIBS += -lrt -ldl
+ $(BINARIES_64): $(OUTPUT)/%_64: %.c
+ $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+ $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-64,$(t))))
+ endif
+
+ # x86_64 users should be encouraged to install 32-bit libraries
+ ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
+ all: warn_32bit_failure
+
+ warn_32bit_failure:
+ @echo "Warning: you seem to have a broken 32-bit build" 2>&1; \
+ echo "environment. This will reduce test coverage of 64-bit" 2>&1; \
+ echo "kernels. If you are using a Debian-like distribution," 2>&1; \
+ echo "try:"; 2>&1; \
+ echo ""; \
+ echo " apt-get install gcc-multilib libc6-i386 libc6-dev-i386"; \
+ echo ""; \
+ echo "If you are using a Fedora-like distribution, try:"; \
+ echo ""; \
+ echo " yum install glibc-devel.*i686"; \
+ exit 0;
+ endif
+ endif
+
+ # cow_EXTRA_LIBS may get set in local_config.mk, or it may be left empty.
+ $(OUTPUT)/cow: LDLIBS += $(COW_EXTRA_LIBS)
+
+ $(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap
+
+ $(OUTPUT)/ksm_tests: LDLIBS += -lnuma
+
+ $(OUTPUT)/migration: LDLIBS += -lnuma
+
+ local_config.mk local_config.h: check_config.sh
+ /bin/sh ./check_config.sh $(CC)
+
+ EXTRA_CLEAN += local_config.mk local_config.h
+
+ ifeq ($(COW_EXTRA_LIBS),)
+ all: warn_missing_liburing
+
+ warn_missing_liburing:
+ @echo ; \
+ echo "Warning: missing liburing support. Some COW tests will be skipped." ; \
+ echo
+ endif