qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015, 2018 Igalia, S.L.
Author: Alberto Garcia <[email protected]>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
at least 512 bytes. QEMU currently defaults to 64 KB clusters, and it
does not support sizes larger than 2 MB.

The 'qemu-img create' command supports specifying the size using the
cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G

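The size restrictions above can be checked with a few lines of POSIX
shell. This is only an illustrative sketch: is_valid_cluster_size is a
hypothetical helper (not part of QEMU), and the 2 MB upper bound is the
limit stated in this document.

```shell
# Hypothetical helper, not part of QEMU.
# Succeeds if $1 (in bytes) is a power of two between 512 and 2 MB.
is_valid_cluster_size() {
    [ "$1" -ge 512 ] && [ "$1" -le 2097152 ] || return 1
    # A power of two has a single bit set, so n & (n - 1) is zero.
    [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for size in 512 65536 98304 4194304; do
    if is_valid_cluster_size "$size"; then
        echo "$size: valid"
    else
        echo "$size: invalid"
    fi
done
```

Here 98304 is rejected because it is not a power of two, and 4194304
because it exceeds 2 MB.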

The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is one single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read or write data from the virtual disk, QEMU needs to read its
corresponding L2 table to find out where that data is located. Since
reading the table for each I/O operation can be expensive, QEMU keeps
an L2 cache in memory to speed up disk access.

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.

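To get a feeling for how much data a single L2 table maps, the
following sketch does the arithmetic in POSIX shell. It assumes that
each L2 entry is 8 bytes long and maps one cluster; the function names
are made up for this example.

```shell
# Hypothetical helpers for illustration only.
# Number of entries in one L2 table; $1 = cluster size in bytes.
# Assumes 8-byte L2 entries.
l2_entries_per_table() {
    echo $(( $1 / 8 ))
}

# Bytes of the virtual disk mapped by one L2 table:
# one cluster per entry.
l2_table_coverage() {
    echo $(( $1 / 8 * $1 ))
}

l2_entries_per_table 65536   # prints 8192
l2_table_coverage 65536      # prints 536870912 (512 MB)
```

With 64 KB clusters, every L2 table kept in the cache therefore covers
512 MB of the virtual disk.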

The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Like L2
tables, each one is one cluster in size, and their number varies with
the amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.

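The effect of refcount_bits on the number of entries per block can be
sketched as follows. The helper names are hypothetical, and the sketch
assumes that every refcount entry tracks exactly one cluster.

```shell
# Hypothetical helpers for illustration only.
# Refcount entries in one refcount block.
# $1 = cluster size in bytes, $2 = refcount_bits
refcount_entries_per_block() {
    echo $(( $1 * 8 / $2 ))
}

# Bytes of image space whose clusters are tracked by one refcount block.
refcount_block_coverage() {
    echo $(( $1 * 8 / $2 * $1 ))
}

refcount_entries_per_block 65536 16   # prints 32768
refcount_block_coverage 65536 16      # prints 2147483648 (2 GB)
refcount_block_coverage 65536 8       # prints 4294967296 (4 GB)
```

Halving refcount_bits doubles the number of clusters that one block
can track, at the cost of a lower maximum reference count.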

Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits

These follow from each L2 entry being 8 bytes long and mapping one
cluster, and from each refcount entry being refcount_bits bits long
and tracking one cluster.

With the default values for cluster_size (64 KB) and refcount_bits
(16), this becomes:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768

So in order to cover a disk of disk_size_GB gigabytes with the default
values we need (in bytes):

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768

For example, 1 MB of L2 cache is needed to cover every 8 GB of the
virtual image size (given that the default cluster size is used):

   8 GB / 8192 = 1 MB

The refcount cache is 4 times the cluster size by default. With the
default cluster size of 64 KB, it is 256 KB (262144 bytes). This is
sufficient for 8 GB of image size:

   262144 * 32768 = 8 GB

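The formulas in this section can be turned into a small POSIX shell
sketch that works for any cluster size and refcount width. The
function names are invented for this example; only the arithmetic
comes from the formulas above.

```shell
# Hypothetical helpers for illustration only. All sizes are in bytes.

# Cache size needed to cover a virtual disk completely.
# $1 = disk size, $2 = cluster size
l2_cache_needed() {
    echo $(( $1 * 8 / $2 ))
}

# $1 = disk size, $2 = cluster size, $3 = refcount_bits
refcount_cache_needed() {
    echo $(( $1 * $3 / ($2 * 8) ))
}

# Disk space covered by a cache of a given size (the inverse).
# $1 = cache size, $2 = cluster size
l2_cache_coverage() {
    echo $(( $1 * $2 / 8 ))
}

# $1 = cache size, $2 = cluster size, $3 = refcount_bits
refcount_cache_coverage() {
    echo $(( $1 * $2 * 8 / $3 ))
}

GB=$(( 1024 * 1024 * 1024 ))
l2_cache_needed $(( 20 * GB )) 65536            # prints 2621440
refcount_cache_needed $(( 20 * GB )) 65536 16   # prints 655360
l2_cache_coverage 1048576 65536                 # prints 8589934592 (8 GB)
refcount_cache_coverage 262144 65536 16         # prints 8589934592 (8 GB)
```

The last two calls reproduce the 8 GB figures given above for a 1 MB
L2 cache and a 256 KB refcount cache.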

How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option on the
command line, or the 'blockdev-add' QMP command.

There are three options available, and all of them take bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are a few things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster size
   (or the cache entry size: see "Using smaller cache entries" below).

 - The maximum L2 cache size is 1 MB by default (enough for full coverage
   of 8 GB images, with the default cluster size). This value can be
   modified using the "l2-cache-size" option. QEMU will not use more memory
   than needed to hold all of the image's L2 tables, regardless of this
   maximum value. The minimum L2 cache size is 2 clusters (or 2 cache
   entries, see below).

 - The default (and minimum) refcount cache size is 4 clusters.

 - If only "cache-size" is specified then QEMU will assign as much
   memory as possible to the L2 cache before increasing the refcount
   cache size.

 - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
   can be set simultaneously.

Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots. In most cases they are
accessed sequentially (even during random guest I/O), so increasing the
refcount cache size won't have any measurable effect on performance.
This can change if you use internal snapshots heavily, in which case
you may want to increase the refcount cache size.

Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
L2 cache size. This resulted in unnecessarily large caches, so now the
refcount cache is as small as possible unless overridden by the user.

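Putting the rules of this section together, the sketch below builds a
-drive argument whose L2 cache covers a whole image. It is a minimal
example under stated assumptions: round_up and qcow2_drive_arg are
hypothetical helpers, the result is rounded up to a multiple of the
cluster size, and the refcount cache is left at its default.

```shell
# Hypothetical helpers for illustration only.
# Round $1 up to the next multiple of $2.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Print a -drive argument whose L2 cache covers the whole image.
# $1 = image file, $2 = virtual disk size in bytes,
# $3 = cluster size in bytes
qcow2_drive_arg() {
    # 8 bytes of L2 cache per cluster of disk, rounded up.
    needed=$(( ($2 * 8 + $3 - 1) / $3 ))
    l2=$(round_up "$needed" "$3")
    # Respect the minimum L2 cache size of 2 clusters.
    min=$(( 2 * $3 ))
    if [ "$l2" -lt "$min" ]; then
        l2=$min
    fi
    echo "file=$1,l2-cache-size=$l2"
}

qcow2_drive_arg hd.qcow2 $(( 20 * 1024 * 1024 * 1024 )) 65536
# prints file=hd.qcow2,l2-cache-size=2621440
```

The printed string can be passed to QEMU after the -drive option.
Since QEMU does not use more memory than needed to hold all of the
image's L2 tables, rounding up is harmless.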

Using smaller cache entries
---------------------------
The qcow2 L2 cache stores complete tables by default. This means that
if QEMU needs an entry from an L2 table then the whole table is read
from disk and is kept in the cache. If the cache is full then a
complete table needs to be evicted first.

This can be inefficient with large cluster sizes since it results in
more disk I/O and wastes more cache memory.

Since QEMU 2.12 you can change the size of the L2 cache entry and make
it smaller than the cluster size. This can be configured using the
"l2-cache-entry-size" parameter:

   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096

Some things to take into account:

 - The L2 cache entry size has the same restrictions as the cluster
   size (power of two, at least 512 bytes).

 - Smaller entry sizes generally improve the cache efficiency and make
   disk I/O faster. This is particularly true with solid state drives
   so it's a good idea to reduce the entry size in those cases. With
   rotating hard drives the situation is a bit more complicated so you
   should test it first and stay with the default size if unsure.

 - Try different entry sizes to see which one gives the best
   performance in your case. The block size of the host filesystem is
   generally a good default (usually 4096 bytes in the case of ext4).

 - Only the L2 cache can be configured this way. The refcount cache
   always uses the cluster size as the entry size.

 - If the L2 cache is big enough to hold all of the image's L2 tables
   (as explained in the "Choosing the right cache sizes" and "How to
   configure the cache sizes" sections in this document) then none of
   this is necessary and you can omit the "l2-cache-entry-size"
   parameter altogether.

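The following sketch shows what the example above amounts to: how many
entries the cache holds and how much of the virtual disk each entry
maps. The helper names are hypothetical. The check that the entry size
does not exceed the cluster size is inferred from the text of this
section, and the coverage figure assumes 8-byte L2 entries.

```shell
# Hypothetical helpers for illustration only.
# Succeeds if $1 is a usable L2 cache entry size for cluster size $2:
# a power of two, at least 512 bytes, and not larger than the cluster.
is_valid_l2_entry_size() {
    [ "$1" -ge 512 ] && [ "$1" -le "$2" ] || return 1
    [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Number of entries a cache of $1 bytes holds with entries of $2 bytes.
l2_cache_entry_count() {
    echo $(( $1 / $2 ))
}

# Bytes of the virtual disk mapped by one cache entry.
# $1 = entry size in bytes, $2 = cluster size in bytes
l2_cache_entry_coverage() {
    echo $(( $1 / 8 * $2 ))
}

l2_cache_entry_count 2097152 4096     # prints 512
l2_cache_entry_coverage 4096 65536    # prints 33554432 (32 MB)
```

With 64 KB clusters, a 2 MB cache split into 4096-byte entries holds
512 independently evictable pieces of 32 MB coverage each, instead of
32 whole tables of 512 MB each.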

Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds).
All cache entries that haven't been accessed during that interval are
removed from memory.

This example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900

If unset, this parameter defaults to 0, which disables this feature.

Note that this functionality currently relies on the MADV_DONTNEED
argument for madvise() to actually free the memory. This is a
Linux-specific feature, so cache-clean-interval is not supported on
other systems.
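
A launch script that has to run on several host systems could add the
option only on Linux. The sketch below is one possible way to do that;
cache_clean_opt is a hypothetical helper, and the host check based on
'uname -s' is an assumption of this example, not something QEMU does.

```shell
# Hypothetical helper for illustration only.
# Print the cache-clean-interval suffix for a -drive argument, or
# nothing if the host is not Linux or the interval is zero.
# $1 = interval in minutes, $2 = kernel name (default: uname -s)
cache_clean_opt() {
    os=${2:-$(uname -s)}
    if [ "$os" = "Linux" ] && [ "$1" -gt 0 ]; then
        echo ",cache-clean-interval=$(( $1 * 60 ))"
    fi
}

echo "file=hd.qcow2$(cache_clean_opt 15 Linux)"
# prints file=hd.qcow2,cache-clean-interval=900
```

On any other host the helper prints nothing, so the resulting -drive
argument simply leaves the feature disabled.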