Commit | Line | Data |
---|---|---|
9d3a4736 AK |
1 | The memory API |
2 | ============== | |
3 | ||
4 | The memory API models the memory and I/O buses and controllers of a QEMU | |
5 | machine. It attempts to allow modelling of: | |
6 | ||
7 | - ordinary RAM | |
8 | - memory-mapped I/O (MMIO) | |
9 | - memory controllers that can dynamically reroute physical memory regions | |
69ddaf66 | 10 | to different destinations |
9d3a4736 AK |
11 | |
12 | The memory model provides support for | |
13 | ||
14 | - tracking RAM changes by the guest | |
15 | - setting up coalesced memory for kvm | |
16 | - setting up ioeventfd regions for kvm | |
17 | ||
7075ba30 | 18 | Memory is modelled as a tree (really an acyclic graph) of MemoryRegion objects. |
9d3a4736 AK |
19 | The root of the tree is memory as seen from the CPU's viewpoint (the system |
20 | bus). Nodes in the tree represent other buses, memory controllers, and | |
21 | memory regions that have been rerouted. Leaves are RAM and MMIO regions. | |
22 | ||
23 | Types of regions | |
24 | ---------------- | |
25 | ||
26 | There are four types of memory regions (all represented by a single C type | |
27 | MemoryRegion): | |
28 | ||
29 | - RAM: a RAM region is simply a range of host memory that can be made available | |
30 | to the guest. | |
31 | ||
32 | - MMIO: a range of guest memory that is implemented by host callbacks; | |
33 | each read or write causes a callback to be called on the host. | |
34 | ||
35 | - container: a container simply includes other memory regions, each at | |
36 | a different offset. Containers are useful for grouping several regions | |
37 | into one unit. For example, a PCI BAR may be composed of a RAM region | |
38 | and an MMIO region. | |
39 | ||
40 | A container's subregions are usually non-overlapping. In some cases it is | |
41 | useful to have overlapping regions; for example a memory controller that | |
42 | can overlay a subregion of RAM with MMIO or ROM, or a PCI controller | |
43 | that does not prevent cards from claiming overlapping BARs. | |
44 | ||
45 | - alias: a subsection of another region. Aliases allow a region to be | |
46 | split apart into discontiguous regions. Examples of uses are memory banks | |
47 | used when the guest address space is smaller than the amount of RAM | |
48 | addressed, or a memory controller that splits main memory to expose a "PCI | |
49 | hole". Aliases may point to any type of region, including other aliases, | |
50 | but an alias may not point back to itself, directly or indirectly. | |
51 | ||
52 | ||
53 | Region names | |
54 | ------------ | |
55 | ||
56 | Regions are assigned names by the constructor. For most regions these are | |
57 | only used for debugging purposes, but RAM regions also use the name to identify | |
58 | live migration sections. This means that RAM region names need to have ABI | |
59 | stability. | |
60 | ||
61 | Region lifecycle | |
62 | ---------------- | |
63 | ||
64 | A region is created by one of the constructor functions (memory_region_init*()) | |
65 | and destroyed by the destructor (memory_region_destroy()). In between, | |
66 | a region can be added to an address space by using memory_region_add_subregion() | |
67 | and removed using memory_region_del_subregion(). Region attributes may be | |
68 | changed at any point; they take effect once the region becomes exposed to the | |
69 | guest. | |
70 | ||
71 | Overlapping regions and priority | |
72 | -------------------------------- | |
73 | Usually, regions may not overlap each other; a memory address decodes into | |
74 | exactly one target. In some cases it is useful to allow regions to overlap, | |
75 | and sometimes to control which of the overlapping regions is visible to the | |
76 | guest. This is done with memory_region_add_subregion_overlap(), which | |
77 | allows the region to overlap any other region in the same container, and | |
78 | specifies a priority that allows the core to decide which of two regions at | |
79 | the same address is visible (the highest priority wins). | |
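
The priority rule can be illustrated with a small stand-alone model. This is only a sketch: ToyRegion, toy_pick_visible() and toy_example are hypothetical names invented for this illustration and are not part of the memory API, and the tie-breaking behaviour for equal priorities (first match wins) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a subregion; this is not QEMU's MemoryRegion. */
typedef struct {
    const char *name;
    uint64_t offset;    /* start address inside the container */
    uint64_t size;      /* length in bytes */
    int priority;       /* higher value wins when regions overlap */
} ToyRegion;

/*
 * Return the highest-priority region that covers addr, or NULL if no
 * region covers it.  On equal priority the earlier array entry is kept
 * (an assumption of this sketch).
 */
static const ToyRegion *toy_pick_visible(const ToyRegion *regions,
                                         size_t count, uint64_t addr)
{
    const ToyRegion *best = NULL;

    assert(regions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const ToyRegion *r = &regions[i];

        if (addr < r->offset || addr - r->offset >= r->size) {
            continue;   /* addr lies outside this region */
        }
        if (best == NULL || r->priority > best->priority) {
            best = r;
        }
    }
    return best;
}

/* 1MB of RAM at priority 0 with a 128K window overlaid at priority 1. */
static const ToyRegion toy_example[] = {
    { "ram",        0x00000, 0x100000, 0 },
    { "vga-window", 0xa0000, 0x20000,  1 },
};
```

With this table, an address inside the window resolves to "vga-window", while addresses on either side of it fall through to "ram".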
80 | ||
81 | Visibility | |
82 | ---------- | |
83 | The memory core uses the following rules to select a memory region when the | |
84 | guest accesses an address: | |
85 | ||
86 | - all direct subregions of the root region are matched against the address, in | |
87 | descending priority order | |
88 | - if the address lies outside the region offset/size, the subregion is | |
89 | discarded | |
7075ba30 | 90 | - if the subregion is a leaf (RAM or MMIO), the search terminates |
9d3a4736 AK |
91 | - if the subregion is a container, the same algorithm is used within the |
92 | subregion (after the address is adjusted by the subregion offset) | |
93 | - if the subregion is an alias, the search continues at the alias target | |
94 | (after the address is adjusted by the subregion offset and alias offset) | |
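
The rules above can be sketched as a recursive walk over a toy tree. Everything here (ToyNode, toy_resolve() and the toy_* objects) is hypothetical and written only for illustration; it is not the memory core's implementation. The sketch assumes that each container's children are already sorted by descending priority, and that the search falls through to the next sibling when a container or alias yields no match, which the rules above do not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_LEAF, TOY_CONTAINER, TOY_ALIAS } ToyKind;

/* Hypothetical region node; this is not QEMU's MemoryRegion. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    const char *name;
    ToyKind kind;
    uint64_t offset;             /* position inside the parent container */
    uint64_t size;
    int priority;                /* informational; subs[] is pre-sorted */
    const ToyNode *const *subs;  /* container: children, highest priority first */
    size_t nsubs;
    const ToyNode *target;       /* alias: the region being aliased */
    uint64_t alias_offset;       /* alias: offset into the target */
};

/*
 * Resolve addr (relative to the start of node) to a leaf region.
 * On success, *leaf_off receives the offset inside that leaf.
 */
static const ToyNode *toy_resolve(const ToyNode *node, uint64_t addr,
                                  uint64_t *leaf_off)
{
    assert(node != NULL && leaf_off != NULL);
    switch (node->kind) {
    case TOY_LEAF:
        *leaf_off = addr;        /* the search terminates at a leaf */
        return node;
    case TOY_ALIAS:
        /* continue at the target, adjusted by the alias offset */
        return toy_resolve(node->target, addr + node->alias_offset, leaf_off);
    case TOY_CONTAINER:
        for (size_t i = 0; i < node->nsubs; i++) {
            const ToyNode *sub = node->subs[i];
            const ToyNode *hit;

            if (addr < sub->offset || addr - sub->offset >= sub->size) {
                continue;        /* outside offset/size: discard */
            }
            hit = toy_resolve(sub, addr - sub->offset, leaf_off);
            if (hit != NULL) {
                return hit;
            }
            /* assumption: no match below, try the next sibling */
        }
        return NULL;
    }
    return NULL;
}

/* A cut-down tree: RAM, a PCI space with one VGA bank, and two aliases. */
static const ToyNode toy_ram =
    { "ram", TOY_LEAF, 0, 0x100000000ULL, 0, NULL, 0, NULL, 0 };
static const ToyNode toy_vram =
    { "vram", TOY_LEAF, 0xe1000000, 0x1000000, 0, NULL, 0, NULL, 0 };
static const ToyNode toy_bank0 =
    { "bank0", TOY_ALIAS, 0x0, 0x8000, 0, NULL, 0, &toy_vram, 0x10000 };
static const ToyNode *const toy_vga_subs[] = { &toy_bank0 };
static const ToyNode toy_vga_area =
    { "vga-area", TOY_CONTAINER, 0xa0000, 0x20000, 0, toy_vga_subs, 1, NULL, 0 };
static const ToyNode *const toy_pci_subs[] = { &toy_vga_area, &toy_vram };
static const ToyNode toy_pci =
    { "pci", TOY_CONTAINER, 0, 0x100000000ULL, 0, toy_pci_subs, 2, NULL, 0 };
static const ToyNode toy_lomem =
    { "lomem", TOY_ALIAS, 0, 0xe0000000, 0, NULL, 0, &toy_ram, 0 };
static const ToyNode toy_vga_window =
    { "vga-window", TOY_ALIAS, 0xa0000, 0x20000, 1, NULL, 0, &toy_pci, 0xa0000 };
static const ToyNode *const toy_sys_subs[] = { &toy_vga_window, &toy_lomem };
static const ToyNode toy_system =
    { "system_memory", TOY_CONTAINER, 0, 1ULL << 48, 0, toy_sys_subs, 2, NULL, 0 };
```

Calling toy_resolve(&toy_system, 0xa0010, &off) walks vga-window, pci, vga-area and bank0 before ending at the vram leaf; an address such as 0x1000 resolves through lomem to ram. The sketch does not check that an alias stays within the bounds of its target.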
95 | ||
96 | Example memory map | |
97 | ------------------ | |
98 | ||
99 | system_memory: container@0-2^48-1 | |
100 | | | |
101 | +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff) | |
102 | | | |
103 | +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff) | |
104 | | | |
105 | +---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff) | |
106 | | (prio 1) | |
107 | | | |
108 | +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff) | |
109 | ||
110 | pci (0-2^32-1) | |
111 | | | |
112 | +--- vga-area: container@0xa0000-0xbffff | |
113 | | | | |
114 | | +--- alias@0x00000-0x7fff ---> #vram (0x010000-0x017fff) | |
115 | | | | |
116 | | +--- alias@0x08000-0xffff ---> #vram (0x020000-0x027fff) | |
117 | | | |
118 | +---- vram: ram@0xe1000000-0xe1ffffff | |
119 | | | |
120 | +---- vga-mmio: mmio@0xe2000000-0xe200ffff | |
121 | ||
122 | ram: ram@0x00000000-0xffffffff | |
123 | ||
69ddaf66 | 124 | This is a (simplified) PC memory map. The 4GB RAM block is mapped into the |
9d3a4736 AK |
125 | system address space via two aliases: "lomem" is a 1:1 mapping of the first |
126 | 3.5GB; "himem" maps the last 0.5GB at address 4GB. This leaves 0.5GB for the | |
127 | so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with | |
128 | 4GB of memory. | |
129 | ||
130 | The memory controller diverts addresses in the range 640K-768K to the PCI | |
7075ba30 | 131 | address space. This is modelled using the "vga-window" alias, mapped at a |
9d3a4736 AK |
132 | higher priority so it obscures the RAM at the same addresses. The vga window |
133 | can be removed by programming the memory controller; this is modelled by | |
134 | removing the alias and exposing the RAM underneath. | |
135 | ||
136 | The pci address space is not a direct child of the system address space, since | |
137 | we only want parts of it to be visible (we accomplish this using aliases). | |
138 | It has two subregions: vga-area models the legacy vga window and is occupied | |
139 | by two 32K memory banks pointing at two sections of the framebuffer. | |
140 | In addition the vram is mapped as a BAR at address 0xe1000000, and an additional | |
141 | BAR containing MMIO registers is mapped after it. | |
142 | ||
143 | Note that if the guest maps a BAR outside the PCI hole, it will not be | |
144 | visible, because the pci-hole alias clips it to a 0.5GB range. | |
145 | ||
146 | Attributes | |
147 | ---------- | |
148 | ||
149 | Various region attributes (read-only, dirty logging, coalesced mmio, ioeventfd) | |
150 | can be changed during the region lifecycle. They take effect once the region | |
151 | is made visible (which can be immediately, later, or never). | |
152 | ||
153 | MMIO Operations | |
154 | --------------- | |
155 | ||
156 | MMIO regions are provided with ->read() and ->write() callbacks; in addition | |
157 | various constraints can be supplied to control how these callbacks are called: | |
158 | ||
159 | - .valid.min_access_size, .valid.max_access_size define the access sizes | |
160 | (in bytes) which the device accepts; accesses outside this range will | |
161 | have device and bus specific behaviour (ignored, or machine check) | |
162 | - .valid.aligned specifies that the device only accepts naturally aligned | |
163 | accesses. Unaligned accesses invoke device and bus specific behaviour. | |
164 | - .impl.min_access_size, .impl.max_access_size define the access sizes | |
165 | (in bytes) supported by the *implementation*; other access sizes will be | |
166 | emulated using the ones available. For example a 4-byte write will be | |
69ddaf66 | 167 | emulated using four 1-byte writes, if .impl.max_access_size = 1. |
9d3a4736 AK |
168 | - .impl.unaligned specifies that the *implementation* supports unaligned |
169 | accesses; otherwise unaligned accesses will be emulated by two aligned accesses. | |
170 | - .old_portio and .old_mmio can be used to ease porting from code using | |
171 | cpu_register_io_memory() and register_ioport(). They should not be used | |
172 | in new code. |