Commit | Line | Data |
---|---|---|
1 | qcow2 L2/refcount cache configuration | |
2 | ===================================== | |
3 | Copyright (C) 2015, 2018 Igalia, S.L. | |
4 | Author: Alberto Garcia <[email protected]> | |
5 | ||
6 | This work is licensed under the terms of the GNU GPL, version 2 or | |
7 | later. See the COPYING file in the top-level directory. | |
8 | ||
9 | Introduction | |
10 | ------------ | |
11 | The QEMU qcow2 driver has two caches that can improve the I/O | |
12 | performance significantly. However, setting the right cache sizes is | |
13 | not a straightforward operation. | |
14 | ||
15 | This document attempts to give an overview of the L2 and refcount | |
16 | caches, and how to configure them. | |
17 | ||
18 | Please refer to the docs/interop/qcow2.txt file for an in-depth | |
19 | technical description of the qcow2 file format. | |
20 | ||
21 | ||
22 | Clusters | |
23 | -------- | |
24 | A qcow2 file is organized in units of constant size called clusters. | |
25 | ||
26 | The cluster size is configurable, but it must be a power of two and | |
27 | at least 512 bytes. QEMU currently defaults to 64 KB clusters, and | |
28 | it does not support sizes larger than 2 MB. | |
29 | ||
30 | The 'qemu-img create' command supports specifying the size using the | |
31 | cluster_size option: | |
32 | ||
33 | qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G | |
34 | ||
35 | ||
36 | The L2 tables | |
37 | ------------- | |
38 | The qcow2 format uses a two-level structure to map the virtual disk as | |
39 | seen by the guest to the disk image in the host. These structures are | |
40 | called the L1 and L2 tables. | |
41 | ||
42 | There is one single L1 table per disk image. The table is small and is | |
43 | always kept in memory. | |
44 | ||
45 | There can be many L2 tables, depending on how much space has been | |
46 | allocated in the image. Each table is one cluster in size. In order to | |
47 | read or write data from the virtual disk, QEMU needs to read its | |
48 | corresponding L2 table to find out where that data is located. Since | |
49 | reading the table for each I/O operation can be expensive, QEMU keeps | |
50 | an L2 cache in memory to speed up disk access. | |
51 | ||
52 | The size of the L2 cache can be configured, and setting the right | |
53 | value can improve the I/O performance significantly. | |
54 | ||
55 | ||
56 | The refcount blocks | |
57 | ------------------- | |
58 | The qcow2 format also maintains a reference count for each cluster. | |
59 | Reference counts are used for cluster allocation and internal | |
60 | snapshots. The data is stored in a two-level structure similar to the | |
61 | L1/L2 tables described above. | |
62 | ||
63 | The second-level structures are called refcount blocks. Like L2 | |
64 | tables, each one is one cluster in size, and their number varies | |
65 | with the amount of allocated space. | |
66 | ||
67 | Each block contains a number of refcount entries. Their size (in bits) | |
68 | is a power of two and must not be higher than 64. It defaults to 16 | |
69 | bits, but a different value can be set using the refcount_bits option: | |
70 | ||
71 | qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G | |
72 | ||
73 | QEMU keeps a refcount cache to speed up I/O much like the | |
74 | aforementioned L2 cache, and its size can also be configured. | |
75 | ||
76 | ||
77 | Choosing the right cache sizes | |
78 | ------------------------------ | |
79 | In order to choose the cache sizes we need to know how they relate to | |
80 | the amount of allocated space. | |
81 | ||
82 | The part of the virtual disk that can be mapped by the L2 and refcount | |
83 | caches (in bytes) is: | |
84 | ||
85 | disk_size = l2_cache_size * cluster_size / 8 | |
86 | disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits | |
87 | ||
88 | With the default values for cluster_size (64 KB) and refcount_bits | |
89 | (16), this becomes: | |
90 | ||
91 | disk_size = l2_cache_size * 8192 | |
92 | disk_size = refcount_cache_size * 32768 | |
93 | ||
94 | So in order to cover disk_size_GB gigabytes of disk space with the | |
95 | default values we need (in bytes): | |
96 | ||
97 | l2_cache_size = disk_size_GB * 131072 | |
98 | refcount_cache_size = disk_size_GB * 32768 | |
99 | ||
100 | For example, 1 MB of L2 cache is needed to cover every 8 GB of the virtual | |
101 | image size (given that the default cluster size is used): | |
102 | ||
103 | 8 GB / 8192 = 1 MB | |
104 | ||
105 | The refcount cache is 4 times the cluster size by default. With the default | |
106 | cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for | |
107 | 8 GB of image size: | |
108 | ||
109 | 262144 * 32768 = 8 GB | |
110 | ||
111 | ||
112 | How to configure the cache sizes | |
113 | -------------------------------- | |
114 | Cache sizes can be configured using the -drive option on the | |
115 | command line, or the 'blockdev-add' QMP command. | |
116 | ||
117 | There are three options available, and all of them take a size in bytes: | |
118 | ||
119 | "l2-cache-size": maximum size of the L2 table cache | |
120 | "refcount-cache-size": maximum size of the refcount block cache | |
121 | "cache-size": maximum size of both caches combined | |
122 | ||
123 | There are a few things that need to be taken into account: | |
124 | ||
125 | - Both caches must have a size that is a multiple of the cluster size | |
126 | (or the cache entry size: see "Using smaller cache entries" below). | |
127 | ||
128 | - The maximum L2 cache size is 1 MB by default (enough for full coverage | |
129 | of 8 GB images, with the default cluster size). This value can be | |
130 | modified using the "l2-cache-size" option. QEMU will not use more memory | |
131 | than needed to hold all of the image's L2 tables, regardless of this | |
132 | maximum. The minimum L2 cache size is 2 clusters (or 2 cache entries, | |
133 | see below). | |
134 | ||
135 | - The default (and minimum) refcount cache size is 4 clusters. | |
136 | ||
137 | - If only "cache-size" is specified then QEMU will assign as much | |
138 | memory as possible to the L2 cache before increasing the refcount | |
139 | cache size. | |
140 | ||
141 | - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size" | |
142 | can be set simultaneously. | |
143 | ||
144 | Unlike L2 tables, refcount blocks are not used during normal I/O but | |
145 | only during allocations and internal snapshots. In most cases they are | |
146 | accessed sequentially (even during random guest I/O) so increasing the | |
147 | refcount cache size won't have any measurable effect on performance | |
148 | (this can change if you are using internal snapshots, so you may want | |
149 | to think about increasing the cache size if you use them heavily). | |
150 | ||
151 | Before QEMU 2.12 the refcount cache had a default size of 1/4 of the | |
152 | L2 cache size. This resulted in unnecessarily large caches, so now the | |
153 | refcount cache is as small as possible unless overridden by the user. | |
154 | ||
155 | ||
156 | Using smaller cache entries | |
157 | --------------------------- | |
158 | The qcow2 L2 cache stores complete tables by default. This means that | |
159 | if QEMU needs an entry from an L2 table then the whole table is read | |
160 | from disk and is kept in the cache. If the cache is full then a | |
161 | complete table needs to be evicted first. | |
162 | ||
163 | This can be inefficient with large cluster sizes since it results in | |
164 | more disk I/O and wastes more cache memory. | |
165 | ||
166 | Since QEMU 2.12 you can change the size of the L2 cache entry and make | |
167 | it smaller than the cluster size. This can be configured using the | |
168 | "l2-cache-entry-size" parameter: | |
169 | ||
170 | -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096 | |
171 | ||
172 | Some things to take into account: | |
173 | ||
174 | - The L2 cache entry size has the same restrictions as the cluster | |
175 | size (power of two, at least 512 bytes). | |
176 | ||
177 | - Smaller entry sizes generally improve the cache efficiency and make | |
178 | disk I/O faster. This is particularly true with solid state drives | |
179 | so it's a good idea to reduce the entry size in those cases. With | |
180 | rotating hard drives the situation is a bit more complicated so you | |
181 | should test it first and stay with the default size if unsure. | |
182 | ||
183 | - Try different entry sizes to see which one gives the best performance | |
184 | in your case. The block size of the host filesystem is generally a | |
185 | good default (usually 4096 bytes in the case of ext4). | |
186 | ||
187 | - Only the L2 cache can be configured this way. The refcount cache | |
188 | always uses the cluster size as the entry size. | |
189 | ||
190 | - If the L2 cache is big enough to hold all of the image's L2 tables | |
191 | (as explained in the "Choosing the right cache sizes" and "How to | |
192 | configure the cache sizes" sections in this document) then none of | |
193 | this is necessary and you can omit the "l2-cache-entry-size" | |
194 | parameter altogether. | |
195 | ||
196 | ||
197 | Reducing the memory usage | |
198 | ------------------------- | |
199 | It is possible to clean unused cache entries in order to reduce the | |
200 | memory usage during periods of low I/O activity. | |
201 | ||
202 | The parameter "cache-clean-interval" defines an interval (in seconds). | |
203 | All cache entries that haven't been accessed during that interval are | |
204 | removed from memory. | |
205 | ||
206 | This example removes all unused cache entries every 15 minutes: | |
207 | ||
208 | -drive file=hd.qcow2,cache-clean-interval=900 | |
209 | ||
210 | If unset, this parameter defaults to 0, which disables this | |
211 | feature. | |
212 | ||
213 | Note that this functionality currently relies on the MADV_DONTNEED | |
214 | argument for madvise() to actually free the memory. This is a | |
215 | Linux-specific feature, so cache-clean-interval is not supported on | |
216 | other systems. |
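
As a worked illustration of the formulas in "Choosing the right cache sizes", the following Python sketch computes the cache sizes needed to cover a given amount of virtual disk space. It is not part of QEMU: the function names (ceil_div, round_up, l2_cache_size, refcount_cache_size) are hypothetical helpers chosen for this example. The sketch assumes the rules stated in this document: cache sizes are a multiple of the cluster size, the L2 cache is at least 2 clusters, and the refcount cache is at least 4 clusters. It does not model "l2-cache-entry-size".

```python
KIB = 1024
GIB = 1024 ** 3

# Divisor 8 in the L2 formula: each L2 entry occupies 8 bytes.
L2_ENTRY_SIZE = 8


def ceil_div(a, b):
    """Integer division of a by b, rounding up."""
    return -(-a // b)


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return ceil_div(value, multiple) * multiple


def l2_cache_size(disk_size, cluster_size=64 * KIB):
    """L2 cache size in bytes needed to map disk_size bytes of virtual disk.

    Inverts: disk_size = l2_cache_size * cluster_size / 8
    """
    needed = ceil_div(disk_size * L2_ENTRY_SIZE, cluster_size)
    # Assumed from this document: multiple of the cluster size, minimum 2 clusters.
    return max(round_up(needed, cluster_size), 2 * cluster_size)


def refcount_cache_size(disk_size, cluster_size=64 * KIB, refcount_bits=16):
    """Refcount cache size in bytes needed to cover disk_size bytes.

    Inverts: disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
    """
    needed = ceil_div(disk_size * refcount_bits, 8 * cluster_size)
    # Assumed from this document: multiple of the cluster size, minimum 4 clusters.
    return max(round_up(needed, cluster_size), 4 * cluster_size)


if __name__ == "__main__":
    size = 20 * GIB  # example virtual disk size
    print(f"l2-cache-size={l2_cache_size(size)}")
    print(f"refcount-cache-size={refcount_cache_size(size)}")
```

The printed values are in bytes and can be passed to the "l2-cache-size" and "refcount-cache-size" options. As noted earlier, a refcount cache that covers the whole image is rarely needed, so the second value is an upper bound and not a recommendation.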