Commit | Line | Data |
---|---|---|
d27554d8 | 1 | |
2 | PAT (Page Attribute Table) | |
3 | ||
4 | x86 Page Attribute Table (PAT) allows setting the memory attribute at page | |
5 | granularity. PAT is complementary to the MTRR settings, which allow memory | |
6 | types to be set over physical address ranges. However, PAT is | |
7 | more flexible than MTRR because it can set attributes at the page level | |
8 | and because there is no hardware limit on the number of | |
9 | such attribute settings allowed. The added flexibility comes with a guideline: | |
10 | avoid memory type aliasing, that is, mapping the same physical memory with | |
11 | different memory types through multiple virtual addresses. | |
12 | ||
13 | PAT allows for different types of memory attributes. The most commonly used | |
14 | ones that will be supported at this time are Write-back, Uncached, | |
15 | Write-combined and Uncached Minus. | |
16 | ||
59dfc3f8 | 17 | |
18 | PAT APIs | |
19 | -------- | |
20 | ||
d27554d8 | 21 | There are many different APIs in the kernel that allow setting of memory |
22 | attributes at the page level. In order to avoid aliasing, these interfaces | |
23 | should be used thoughtfully. Below is a table of interfaces available, | |
24 | their intended usage and their memory attribute relationships. Internally, | |
25 | these APIs use a reserve_memtype()/free_memtype() interface on the physical | |
26 | address range to avoid any aliasing. | |
27 | ||
28 | ||
29 | ------------------------------------------------------------------- | |
30 | API | RAM | ACPI,... | Reserved/Holes | | |
31 | -----------------------|----------|------------|------------------| | |
32 | | | | | | |
59dfc3f8 | 33 | ioremap | -- | UC- | UC- | |
d27554d8 | 34 | | | | | |
35 | ioremap_cache | -- | WB | WB | | |
36 | | | | | | |
59dfc3f8 | 37 | ioremap_nocache | -- | UC- | UC- | |
d27554d8 | 38 | | | | | |
39 | ioremap_wc | -- | -- | WC | | |
40 | | | | | | |
59dfc3f8 | 41 | set_memory_uc | UC- | -- | -- | |
d27554d8 | 42 | set_memory_wb | | | | |
43 | | | | | | |
44 | set_memory_wc | WC | -- | -- | | |
45 | set_memory_wb | | | | | |
46 | | | | | | |
59dfc3f8 | 47 | pci sysfs resource | -- | -- | UC- | |
d27554d8 | 48 | | | | | |
49 | pci sysfs resource_wc | -- | -- | WC | | |
50 | is IORESOURCE_PREFETCH| | | | | |
51 | | | | | | |
59dfc3f8 | 52 | pci proc | -- | -- | UC- | |
d27554d8 | 53 | !PCIIOC_WRITE_COMBINE | | | | |
54 | | | | | | |
55 | pci proc | -- | -- | WC | | |
56 | PCIIOC_WRITE_COMBINE | | | | | |
57 | | | | | | |
59dfc3f8 | 58 | /dev/mem | -- | WB/WC/UC- | WB/WC/UC- | |
d27554d8 | 59 | read-write | | | | |
60 | | | | | | |
59dfc3f8 | 61 | /dev/mem | -- | UC- | UC- | |
d27554d8 | 62 | mmap SYNC flag | | | | |
63 | | | | | | |
59dfc3f8 | 64 | /dev/mem | -- | WB/WC/UC- | WB/WC/UC- | |
d27554d8 | 65 | mmap !SYNC flag | |(from exist-| (from exist- | |
66 | and | | ing alias)| ing alias) | | |
67 | any alias to this area| | | | | |
68 | | | | | | |
69 | /dev/mem | -- | WB | WB | | |
70 | mmap !SYNC flag | | | | | |
71 | no alias to this area | | | | | |
72 | and | | | | | |
73 | MTRR says WB | | | | | |
74 | | | | | | |
59dfc3f8 | 75 | /dev/mem | -- | -- | UC- | |
d27554d8 | 76 | mmap !SYNC flag | | | | |
77 | no alias to this area | | | | | |
78 | and | | | | | |
79 | MTRR says !WB | | | | | |
80 | | | | | | |
81 | ------------------------------------------------------------------- | |
82 | ||
a2ced6e1 | 83 | Advanced APIs for drivers |
84 | ------------------------- | |
67bac792 | 85 | A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range, |
a2ced6e1 | 86 | vm_insert_pfn |
87 | ||
67bac792 | 88 | Drivers wanting to export some pages to userspace do so by using the mmap |
a2ced6e1 | 89 | interface and a combination of |
90 | 1) pgprot_noncached() | |
91 | 2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn() | |
92 | ||
67bac792 | 93 | With PAT support, a new API pgprot_writecombine is being added. So, drivers can |
a2ced6e1 | 94 | continue to use the above sequence, with either pgprot_noncached() or |
95 | pgprot_writecombine() in step 1, followed by step 2. | |
96 | ||
97 | In addition, step 2 internally tracks the region as UC or WC in the memtype | |
98 | list to ensure that there is no conflicting mapping. | |
99 | ||
67bac792 | 100 | Note that this set of APIs only works with IO (non RAM) regions. If a driver |
101 | wants to export a RAM region, it has to call set_memory_uc() or set_memory_wc() | |
a2ced6e1 | 102 | as step 0 above, track the usage of those pages, and call set_memory_wb() |
103 | before the pages are freed to the free pool. | |
104 | ||
105 | ||
106 | ||
d27554d8 | 107 | Notes: |
108 | ||
109 | -- in the above table means "Not suggested usage for the API". Some of the --'s | |
110 | are strictly enforced by the kernel. Some others are not really enforced | |
111 | today, but may be enforced in the future. | |
112 | ||
113 | For ioremap and pci access through /sys or /proc, the actual type returned | |
114 | can be more restrictive if there is an existing alias for that address. | |
115 | For example: If there is an existing uncached mapping, a new ioremap_wc can | |
116 | return an uncached mapping in place of the write-combine mapping requested. | |
117 | ||
118 | set_memory_[uc|wc] and set_memory_wb should be used in pairs, where the driver | |
119 | first makes a region uc or wc and switches it back to wb after use. | |
120 | ||
121 | Over time writes to /proc/mtrr will be deprecated in favor of using PAT based | |
122 | interfaces. Users writing to /proc/mtrr are encouraged to use the above interfaces. | |
123 | ||
124 | Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access | |
125 | types. | |
126 | ||
127 | Drivers should use set_memory_[uc|wc] to set access type for RAM ranges. | |
128 | ||
59dfc3f8 | 129 | |
130 | PAT debugging | |
131 | ------------- | |
132 | ||
133 | With CONFIG_DEBUG_FS enabled, the PAT memtype list can be examined by | |
134 | ||
135 | # mount -t debugfs debugfs /sys/kernel/debug | |
136 | # cat /sys/kernel/debug/x86/pat_memtype_list | |
137 | PAT memtype list: | |
138 | uncached-minus @ 0x7fadf000-0x7fae0000 | |
139 | uncached-minus @ 0x7fb19000-0x7fb1a000 | |
140 | uncached-minus @ 0x7fb1a000-0x7fb1b000 | |
141 | uncached-minus @ 0x7fb1b000-0x7fb1c000 | |
142 | uncached-minus @ 0x7fb1c000-0x7fb1d000 | |
143 | uncached-minus @ 0x7fb1d000-0x7fb1e000 | |
144 | uncached-minus @ 0x7fb1e000-0x7fb25000 | |
145 | uncached-minus @ 0x7fb25000-0x7fb26000 | |
146 | uncached-minus @ 0x7fb26000-0x7fb27000 | |
147 | uncached-minus @ 0x7fb27000-0x7fb28000 | |
148 | uncached-minus @ 0x7fb28000-0x7fb2e000 | |
149 | uncached-minus @ 0x7fb2e000-0x7fb2f000 | |
150 | uncached-minus @ 0x7fb2f000-0x7fb30000 | |
151 | uncached-minus @ 0x7fb31000-0x7fb32000 | |
152 | uncached-minus @ 0x80000000-0x90000000 | |
153 | ||
154 | This list shows physical address ranges and various PAT settings used to | |
155 | access those physical address ranges. | |
156 | ||
157 | Another, more verbose way of getting PAT related debug messages is with the | |
158 | "debugpat" boot parameter. With this parameter, various debug messages are | |
159 | printed to the dmesg log. | |
160 |
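
As an illustration of the pat_memtype_list output shown in the "PAT debugging" section, the following userspace sketch parses lines of the form `uncached-minus @ 0x7fadf000-0x7fae0000` and sums the bytes tracked under a given memory type. It is not part of the kernel: the struct and the function names (`memtype_entry`, `parse_memtype_line`, `memtype_size`, `total_bytes_of_type`) are hypothetical, and the end address is assumed to be exclusive because adjacent entries in the sample listing share a boundary.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of the memtype list. */
struct memtype_entry {
    char type[32];   /* e.g. "uncached-minus" */
    uint64_t start;  /* first physical address of the range */
    uint64_t end;    /* assumed exclusive end of the range */
};

/*
 * Parse one line such as "uncached-minus @ 0x7fadf000-0x7fae0000".
 * Returns 0 on success and -1 if the line does not match that shape
 * (for example the "PAT memtype list:" header).
 */
int parse_memtype_line(const char *line, struct memtype_entry *e)
{
    int fields = sscanf(line, " %31s @ 0x%" SCNx64 "-0x%" SCNx64,
                        e->type, &e->start, &e->end);

    if (fields != 3)
        return -1;
    if (e->end < e->start)
        return -1;
    return 0;
}

/* Size of a parsed range in bytes, under the exclusive-end assumption. */
uint64_t memtype_size(const struct memtype_entry *e)
{
    assert(e->end >= e->start);
    return e->end - e->start;
}

/* Sum the sizes of all lines whose type string equals `type`. */
uint64_t total_bytes_of_type(const char *const *lines, size_t n,
                             const char *type)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        struct memtype_entry e;

        /* Lines that do not parse (such as the header) are skipped. */
        if (parse_memtype_line(lines[i], &e) == 0 &&
            strcmp(e.type, type) == 0)
            total += memtype_size(&e);
    }
    return total;
}
```

A caller could read /sys/kernel/debug/x86/pat_memtype_list line by line with fgets() and pass each line to parse_memtype_line(). Reading that file typically requires root and a mounted debugfs, as described above.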