QEMU Virtual NVDIMM
===================

This document explains how to use the virtual NVDIMM (vNVDIMM) feature,
which has been available since QEMU v2.6.0.

QEMU currently implements only the persistent memory mode of vNVDIMM
devices, not the block window mode.

Basic Usage
-----------

The storage of a vNVDIMM device in QEMU is provided by a memory
backend (i.e. memory-backend-file or memory-backend-ram). A simple
way to create a vNVDIMM device at startup is to use the following
command line options:

    -machine pc,nvdimm
    -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
    -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
    -device nvdimm,id=nvdimm1,memdev=mem1

Where,

 - the "nvdimm" machine option enables the vNVDIMM feature.

 - "slots=$N" should be equal to or larger than the total number of
   normal RAM devices and vNVDIMM devices, so here $N should be >= 2.

 - "maxmem=$MAX_SIZE" should be equal to or larger than the total size
   of normal RAM devices and vNVDIMM devices, so here $MAX_SIZE should
   be >= $RAM_SIZE + $NVDIMM_SIZE.

 - "-object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
   creates backend storage of size $NVDIMM_SIZE in the file $PATH. All
   accesses to the virtual NVDIMM device go to the file $PATH.

   "share=on/off" controls the visibility of guest writes. If
   "share=on", guest writes are applied to the backend file, and if
   another guest uses the same backend file with "share=on", those
   writes are visible to it as well. If "share=off", guest writes are
   not applied to the backend file and are thus invisible to other
   guests.

 - "-device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
   device whose storage is provided by the above memory backend.
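
The effect of "share" can be reproduced with plain mmap(2). The sketch
below assumes that share=on corresponds to a MAP_SHARED mapping of the
backend file and share=off to a MAP_PRIVATE (copy-on-write) mapping;
write_through_mapping() is a hypothetical helper. It stores one byte
through a mapping, then reads the file itself with pread(2), the way
another user of the same file would see it.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` into byte 0 of `path` through a
 * shared (share=on) or private (share=off) mapping, then return the byte
 * actually stored in the file, or -1 on error. The file must be at least
 * one page long.
 */
int write_through_mapping(const char *path, bool share, char value)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }

    /* MAP_SHARED: stores land in the file's page cache.
     * MAP_PRIVATE: stores land in a private copy of the page. */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   share ? MAP_SHARED : MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = value;
    munmap(p, (size_t)page);

    /* Read through the file descriptor, not through the mapping. */
    char seen;
    ssize_t n = pread(fd, &seen, 1, 0);
    close(fd);
    return n == 1 ? (unsigned char)seen : -1;
}
```

With a file whose first byte is 'a', a share=off write of 'x' leaves
the file reading 'a', while a share=on write makes it read 'x'.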

Multiple vNVDIMM devices can be created if multiple pairs of "-object"
and "-device" are provided.
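
With several devices, the lower bounds for "slots" and "maxmem" follow
mechanically from the rules listed above: one slot per normal RAM
device and per vNVDIMM device, and maxmem at least the sum of their
sizes. The sketch below encodes only those two rules; struct mem_limits
and min_mem_limits() are hypothetical names, and other constraints QEMU
may apply are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result: the smallest values satisfying the rules above. */
struct mem_limits {
    unsigned slots;   /* minimum for "slots=$N" */
    uint64_t maxmem;  /* minimum for "maxmem=$MAX_SIZE", in bytes */
};

/*
 * ram_sizes:    sizes of the normal RAM devices ($RAM_SIZE counts as one).
 * nvdimm_sizes: "size=" of each vNVDIMM memory backend.
 */
struct mem_limits min_mem_limits(const uint64_t *ram_sizes, size_t n_ram,
                                 const uint64_t *nvdimm_sizes, size_t n_nvdimm)
{
    struct mem_limits lim = { .slots = (unsigned)(n_ram + n_nvdimm),
                              .maxmem = 0 };

    for (size_t i = 0; i < n_ram; i++) {
        lim.maxmem += ram_sizes[i];
    }
    for (size_t i = 0; i < n_nvdimm; i++) {
        lim.maxmem += nvdimm_sizes[i];
    }
    return lim;
}
```

For the single-device example at the top of this section, this gives
slots=2 and maxmem=$RAM_SIZE + $NVDIMM_SIZE.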

With the above command line options, if the guest OS has the proper
NVDIMM driver (e.g. "CONFIG_ACPI_NFIT=y" under Linux), it should be
able to detect an NVDIMM device that is in persistent memory mode and
whose size is $NVDIMM_SIZE.

Note:

1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
   backend file size is not equal to the size given by the "size"
   option, QEMU truncates the backend file with ftruncate(2), which
   corrupts the existing data in the backend file, especially when the
   file is shrunk.

   QEMU v2.8.0 and later check the backend file size against the
   "size" option. If they do not match, QEMU reports an error and
   aborts in order to avoid data corruption.

2. QEMU v2.6.0 only imposes a basic alignment requirement on the
   "size" option of memory-backend-file, e.g. 4KB alignment on x86.
   However, QEMU v2.7.0 imposes an additional alignment requirement,
   which may be larger than the basic one, e.g. 2MB on x86. This
   change breaks memory-backend-file configurations that satisfy only
   the basic alignment.

   QEMU v2.8.0 and later remove the additional alignment requirement
   on non-s390x architectures, so such configurations work again.
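
Both notes come down to checks on the "size" option before the
backend file is used. The sketch below encodes them as the notes
describe them (it is not QEMU's code): the alignment is passed in by
the caller (e.g. 4096, or 2 MiB), and a size mismatch is reported
instead of being "fixed" with ftruncate(2). check_backend_file() and
its result codes are hypothetical.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Results of the hypothetical check_backend_file() helper. */
enum backend_check {
    BACKEND_OK = 0,
    BACKEND_STAT_FAILED,    /* stat(2) failed; errno is set */
    BACKEND_MISALIGNED,     /* "size" is not a multiple of the alignment */
    BACKEND_SIZE_MISMATCH,  /* file size differs from "size" (note 1) */
};

enum backend_check check_backend_file(const char *path, uint64_t size,
                                      uint64_t align)
{
    struct stat st;

    /* Note 2: reject a "size" that breaks the alignment requirement. */
    if (align == 0 || size % align != 0) {
        return BACKEND_MISALIGNED;
    }
    if (stat(path, &st) != 0) {
        return BACKEND_STAT_FAILED;
    }
    /* Note 1: report a mismatch rather than truncating the file. */
    if ((uint64_t)st.st_size != size) {
        return BACKEND_SIZE_MISMATCH;
    }
    return BACKEND_OK;
}
```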

Label
-----

QEMU v2.7.0 and later implement label support for vNVDIMM devices.
To enable labels on a vNVDIMM device, add the "label-size=$SZ" option
to "-device nvdimm", e.g.

    -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:

1. The minimum label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of the backend
   storage. If a memory backend file that was previously used as the
   backend of a vNVDIMM device without labels is now used for a
   vNVDIMM device with labels, the data in the label area at the end
   of the file will be inaccessible to the guest. If any useful data
   (e.g. file system metadata) was stored there, the latter usage may
   result in guest data corruption (e.g. a broken guest file system).
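
Note 2 is easiest to see as offsets into the backend. The sketch below
assumes the simplest layout consistent with the notes: the last
label-size bytes of the backend hold the labels and everything before
them is exposed to the guest as persistent memory. Any rounding of the
guest-visible size to the backend alignment is not modeled, and
nvdimm_layout() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define NVDIMM_MIN_LABEL_SIZE (128ULL * 1024)  /* note 1: at least 128KB */

/* Hypothetical description of how a backend is split. */
struct nvdimm_layout {
    uint64_t pmem_size;     /* bytes the guest sees as persistent memory */
    uint64_t label_offset;  /* start of the label area in the backend */
};

/* Returns false if the label size is too small or leaves no pmem. */
bool nvdimm_layout(uint64_t backend_size, uint64_t label_size,
                   struct nvdimm_layout *out)
{
    if (label_size == 0) {  /* no "label-size": no label area */
        out->pmem_size = backend_size;
        out->label_offset = backend_size;
        return true;
    }
    if (label_size < NVDIMM_MIN_LABEL_SIZE || label_size >= backend_size) {
        return false;
    }
    out->pmem_size = backend_size - label_size;
    out->label_offset = out->pmem_size;  /* labels occupy the tail */
    return true;
}
```

With a 4G file and label-size=128K, the guest sees 4G minus 128K of
persistent memory, so the last 128K written by an earlier label-less
configuration is no longer reachable from the guest.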

Hotplug
-------

QEMU v2.8.0 and later implement hotplug support for vNVDIMM devices.
As with RAM hotplug, vNVDIMM hotplug is accomplished with two monitor
commands, "object_add" and "device_add".

For example, the following commands add another 4GB vNVDIMM device to
the guest:

    (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
    (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2

Note:

1. Each hotplugged vNVDIMM device consumes one memory slot. Users
   should always ensure that the memory option "-m ...,slots=N"
   specifies enough slots, i.e.

       N >= number of RAM devices +
            number of statically plugged vNVDIMM devices +
            number of hotplugged vNVDIMM devices

2. The same applies to the memory option "-m ...,maxmem=M", i.e.

       M >= size of RAM devices +
            size of statically plugged vNVDIMM devices +
            size of hotplugged vNVDIMM devices
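
Before issuing "object_add" and "device_add", the two inequalities can
be checked against what is already plugged. can_hotplug_nvdimm() is a
hypothetical helper that applies exactly the rules in the notes above;
the caller is assumed to track the slots and bytes already in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the "-m ...,slots=N,maxmem=M" limits. */
struct mem_budget {
    unsigned slots;       /* N from "-m ...,slots=N" */
    uint64_t maxmem;      /* M from "-m ...,maxmem=M", in bytes */
    unsigned used_slots;  /* RAM devices + vNVDIMMs plugged so far */
    uint64_t used_bytes;  /* total size of those devices */
};

/* True if one more vNVDIMM of `size` bytes still satisfies both notes. */
bool can_hotplug_nvdimm(const struct mem_budget *b, uint64_t size)
{
    if (b->used_slots >= b->slots) {
        return false;  /* note 1: no free slot left */
    }
    /* note 2: written as a subtraction so the sum cannot overflow */
    if (b->used_bytes > b->maxmem || size > b->maxmem - b->used_bytes) {
        return false;
    }
    return true;
}
```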

Alignment
---------

QEMU uses mmap(2) to map vNVDIMM backends and aligns the mapping
address to the page size (getpagesize(2)) by default. However, some
types of backends may require an alignment different from the page
size. In that case, QEMU v2.12.0 and later provide the 'align' option
of memory-backend-file to allow users to specify the proper alignment.

For example, device DAX requires 2 MB alignment, so the following QEMU
command line options can be used to make /dev/dax0.0 the backend of a
vNVDIMM device:

    -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
    -device nvdimm,id=nvdimm1,memdev=mem1

Guest Data Persistence
----------------------

Though QEMU supports multiple types of vNVDIMM backends on Linux,
the only backends that can guarantee guest write persistence are:

A. DAX devices (e.g., /dev/dax0.0), or
B. DAX files (on a file system mounted with the dax option)

When using B (a file supporting direct mapping of persistent memory)
as a backend, write persistence is guaranteed if the host kernel
supports the MAP_SYNC flag in the mmap system call (available since
Linux 4.15 and in certain distro kernels) and both the 'pmem' and
'share' flags are set to 'on' on the backend.

If these conditions are not satisfied, i.e. if either 'pmem' or
'share' is not set, if the backend file does not support DAX, or if
MAP_SYNC is not supported by the host kernel, write persistence is
not guaranteed after a system crash. For compatibility reasons, QEMU
does not fail when these conditions are not met, and it currently
provides no way to test for them. For more details, please refer to
the mmap(2) man page:
http://man7.org/linux/man-pages/man2/mmap.2.html.
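
At the mmap(2) level, MAP_SYNC is only honored together with
MAP_SHARED_VALIDATE, and the kernel rejects it with EOPNOTSUPP for
files that do not support DAX (older kernels without
MAP_SHARED_VALIDATE fail with EINVAL instead). The sketch below uses
this to try a synchronous mapping first and fall back to a plain
shared mapping, reporting which one it got. map_backend() is a
hypothetical helper illustrating the man page semantics, not QEMU's
code, and the fallback flag values are the generic Linux ones, which
is an assumption for other architectures.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older C libraries may lack these; values are the generic Linux ones. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Map `len` bytes of `path` shared and read/write. *synced is set to 1
 * if the kernel accepted MAP_SYNC, or 0 if only a plain MAP_SHARED
 * mapping (without the persistence guarantee) was made.
 */
void *map_backend(const char *path, size_t len, int *synced)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        *synced = 0;
        return MAP_FAILED;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    *synced = (p != MAP_FAILED);
    if (p == MAP_FAILED && (errno == EOPNOTSUPP || errno == EINVAL)) {
        /* Not a DAX file, or a kernel without MAP_SHARED_VALIDATE. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    close(fd);  /* an established mapping stays valid after close() */
    return p;
}
```

On a regular (non-DAX) file, the call is expected to succeed with
*synced set to 0, which is exactly the case where persistence is not
guaranteed.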

When using other types of backends, it's suggested to set the
'unarmed' option of '-device nvdimm' to 'on', which sets the unarmed
flag of the guest NVDIMM region mapping structure. This flag indicates
to guest software that this vNVDIMM device contains a region that
cannot accept persistent writes. As a result, the guest Linux NVDIMM
driver, for example, marks such a vNVDIMM device as read-only.
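
The unarmed flag is one bit in the NVDIMM State Flags field of that
mapping structure. The sketch below assumes bit 3 ("not armed"), as
defined in the ACPI NFIT specification and the Linux ACPI headers
(ACPI_NFIT_MEM_NOT_ARMED), and ignores all other state bits. Both
helper names are hypothetical; they only mimic the host-side setting
and the guest-side decision described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* NVDIMM State Flags bit 3: not armed, cannot accept persistent writes. */
#define NFIT_MEM_NOT_ARMED (1u << 3)

/* Host side: what "unarmed=on" contributes to the state flags. */
uint16_t nfit_state_flags(bool unarmed)
{
    return unarmed ? NFIT_MEM_NOT_ARMED : 0;
}

/* Guest side: a driver honoring the flag exposes the region read-only. */
bool region_is_read_only(uint16_t state_flags)
{
    return (state_flags & NFIT_MEM_NOT_ARMED) != 0;
}
```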

NVDIMM Persistence
------------------

ACPI 6.2 Errata A added support for a new Platform Capabilities
Structure, which allows the platform to communicate what features it
supports related to NVDIMM data persistence. Users can provide a
persistence value to a guest via the optional "nvdimm-persistence"
machine command line option:

    -machine pc,accel=kvm,nvdimm,nvdimm-persistence=cpu

There are currently two valid values for this option:

"mem-ctrl" - The platform supports flushing dirty data from the memory
             controller to the NVDIMMs in the event of power loss.

"cpu"      - The platform supports flushing dirty data from the CPU
             cache to the NVDIMMs in the event of power loss. This
             implies that the platform also supports flushing dirty
             data through the memory controller on power loss.
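
The two values can be pictured as bits in the Capabilities field of
that structure. The sketch below assumes the ACPI 6.2 Errata A bit
assignment (bit 0: CPU cache flush on power loss, bit 1: memory
controller flush on power loss); persistence_to_caps() is a
hypothetical helper, not QEMU's function.

```c
#include <stdint.h>
#include <string.h>

/* Capabilities bits of the Platform Capabilities Structure (ACPI 6.2A). */
#define PCAP_CPU_CACHE_FLUSH (1u << 0)  /* CPU caches flushed on power loss */
#define PCAP_MEM_CTRL_FLUSH  (1u << 1)  /* memory controller flushed on power loss */

/* Returns 0 and stores the bits in *caps, or -1 for an unknown value. */
int persistence_to_caps(const char *value, uint32_t *caps)
{
    if (strcmp(value, "mem-ctrl") == 0) {
        *caps = PCAP_MEM_CTRL_FLUSH;
    } else if (strcmp(value, "cpu") == 0) {
        /* "cpu" implies the memory controller path is flushed too. */
        *caps = PCAP_CPU_CACHE_FLUSH | PCAP_MEM_CTRL_FLUSH;
    } else {
        return -1;
    }
    return 0;
}
```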

If the vNVDIMM backend is in host persistent memory that can be
accessed through the SNIA NVM Programming Model [1] (e.g., Intel
NVDIMM), it's suggested to set the 'pmem' option of
memory-backend-file to 'on'. When 'pmem' is 'on' and QEMU is built
with libpmem [2] support (configured with --enable-libpmem), QEMU
takes the necessary steps to guarantee the persistence of its own
writes to the vNVDIMM backend (e.g., in vNVDIMM label emulation and
live migration). If 'pmem' is 'on' but QEMU lacks libpmem support,
QEMU exits and reports a "lack of libpmem support" message, so that
it never runs without the requested persistence guarantee. For
example, to ensure persistence for a backend file, use the QEMU
command line:

    -object memory-backend-file,id=nv_mem,mem-path=/XXX/yyy,size=4G,pmem=on

References
----------

[1] NVM Programming Model (NPM), Version 1.2
    https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
[2] Persistent Memory Development Kit (PMDK), formerly known as the NVML project, home page:
    http://pmem.io/pmdk/