qcow2 L2/refcount cache configuration
=====================================
-Copyright (C) 2015 Igalia, S.L.
+Copyright (C) 2015, 2018 Igalia, S.L.
This work is licensed under the terms of the GNU GPL, version 2 or
This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.
-Please refer to the docs/specs/qcow2.txt file for an in-depth
+Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.
The refcount blocks
-------------------
-The qcow2 format also mantains a reference count for each cluster.
+The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.
-The amount of virtual disk that can be mapped by the L2 and refcount
+The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:
disk_size = l2_cache_size * cluster_size / 8
disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
With the default values for cluster_size (64KB) and refcount_bits
-(16), that is
+(16), this becomes:
disk_size = l2_cache_size * 8192
disk_size = refcount_cache_size * 32768
l2_cache_size = disk_size_GB * 131072
refcount_cache_size = disk_size_GB * 32768
-QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
-cache of 256KB (262144 bytes), so using the formulas we've just seen
-we have
+For example, 1MB of L2 cache is needed to cover every 8 GB of the virtual
+image size (given that the default cluster size is used):
- 1048576 / 131072 = 8 GB of virtual disk covered by that cache
- 262144 / 32768 = 8 GB
+ 8 GB / 8192 = 1 MB
+
+The refcount cache is 4 times the cluster size by default. With the default
+cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for
+8 GB of image size:
+
+ 262144 * 32768 = 8 GB
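
The formulas above can be turned into a small calculator. The following
Python sketch is only an illustration: the function names are made up for
this example and are not part of QEMU. It assumes all sizes are in bytes,
and it rounds each result up to a whole number of clusters, which is a
choice made here for convenience.

```python
# Illustrative helpers only; these names are not part of QEMU.

GB = 1024 ** 3


def ceil_div(a, b):
    """Integer division that rounds up instead of down."""
    return -(-a // b)


def round_up(value, multiple):
    """Round value up to the next multiple (an assumption of this sketch)."""
    return ceil_div(value, multiple) * multiple


def l2_cache_size_for(disk_size, cluster_size=65536):
    """Bytes of L2 cache needed to map disk_size bytes of virtual disk."""
    # From disk_size = l2_cache_size * cluster_size / 8
    needed = ceil_div(disk_size * 8, cluster_size)
    return round_up(needed, cluster_size)


def refcount_cache_size_for(disk_size, cluster_size=65536, refcount_bits=16):
    """Bytes of refcount cache needed to cover disk_size bytes."""
    # From disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
    needed = ceil_div(disk_size * refcount_bits, cluster_size * 8)
    return round_up(needed, cluster_size)


print(l2_cache_size_for(8 * GB))        # 1048576
print(refcount_cache_size_for(8 * GB))  # 262144
```

With the default cluster size and refcount_bits, the two printed values
match the 1 MB and 256 KB figures used in the examples above.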
How to configure the cache sizes
"refcount-cache-size": maximum size of the refcount block cache
"cache-size": maximum size of both caches combined
-There are two things that need to be taken into account:
+There are a few things that need to be taken into account:
+
+ - Both caches must have a size that is a multiple of the cluster size
+ (or the cache entry size: see "Using smaller cache entries" below).
+
+ - The maximum L2 cache size is 32 MB by default on Linux platforms (enough
+ for full coverage of 256 GB images, with the default cluster size). This
+ value can be modified using the "l2-cache-size" option. QEMU will not use
+ more memory than needed to hold all of the image's L2 tables, regardless
+ of this maximum value.
+ On non-Linux platforms the default maximum is smaller (8 MB). The
+ difference stems from the fact that on Linux the cache can be cleared
+ periodically if needed, using the "cache-clean-interval" option (see below).
+ The minimum L2 cache size is 2 clusters (or 2 cache entries, see below).
+
+ - The default (and minimum) refcount cache size is 4 clusters.
+
+ - If only "cache-size" is specified then QEMU will assign as much
+ memory as possible to the L2 cache before increasing the refcount
+ cache size.
+
+ - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
+ can be set simultaneously.
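
The rules in this list can be expressed as a few simple checks. The Python
sketch below is hypothetical: the function and its parameter names are
invented for illustration, and it only mirrors the rules as stated in this
document. It does not describe how QEMU itself reacts to such values (for
example whether it reports an error or adjusts them).

```python
# Hypothetical checker; it mirrors the rules listed above, not QEMU's code.

def check_cache_options(cluster_size, l2_cache_size=None,
                        refcount_cache_size=None, cache_size=None,
                        l2_entry_size=None):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    # The L2 cache is sized in units of its entry size, which defaults
    # to the cluster size in this sketch.
    entry = l2_entry_size or cluster_size

    given = [v for v in (l2_cache_size, refcount_cache_size, cache_size)
             if v is not None]
    if len(given) > 2:
        problems.append("at most two of the three size options may be set")

    if l2_cache_size is not None:
        if l2_cache_size % entry:
            problems.append("l2-cache-size is not a multiple of the entry size")
        if l2_cache_size < 2 * entry:
            problems.append("l2-cache-size is below the minimum of 2 entries")

    if refcount_cache_size is not None:
        if refcount_cache_size % cluster_size:
            problems.append("refcount-cache-size is not a multiple of the "
                            "cluster size")
        if refcount_cache_size < 4 * cluster_size:
            problems.append("refcount-cache-size is below the minimum of "
                            "4 clusters")

    return problems


print(check_cache_options(65536, l2_cache_size=2097152,
                          refcount_cache_size=524288))
```

An empty list means that none of the rules above were violated.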
+
+Unlike L2 tables, refcount blocks are not used during normal I/O but
+only during allocations and internal snapshots. In most cases they are
+accessed sequentially (even during random guest I/O) so increasing the
+refcount cache size won't have any measurable effect on performance
+(this can change if you are using internal snapshots, so you may want
+to think about increasing the cache size if you use them heavily).
+
+Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
+L2 cache size. This resulted in unnecessarily large caches, so now the
+refcount cache is as small as possible unless overridden by the user.
+
+
+Using smaller cache entries
+---------------------------
+The qcow2 L2 cache can store complete tables. This means that if QEMU
+needs an entry from an L2 table then the whole table is read from disk
+and is kept in the cache. If the cache is full then a complete table
+needs to be evicted first.
+
+This can be inefficient with large cluster sizes since it results in
+more disk I/O and wastes more cache memory.
+
+Since QEMU 2.12 you can change the size of the L2 cache entry and make
+it smaller than the cluster size. This can be configured using the
+"l2-cache-entry-size" parameter:
+
+ -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096
+
+Since QEMU 4.0 the value of l2-cache-entry-size defaults to 4KB (or
+the cluster size if it's smaller).
- - Both caches must have a size that is a multiple of the cluster
- size.
+Some things to take into account:
- - If you only set one of the options above, QEMU will automatically
- adjust the others so that the L2 cache is 4 times bigger than the
- refcount cache.
+ - The L2 cache entry size has the same restrictions as the cluster
+ size (power of two, at least 512 bytes).
-This means that these options are equivalent:
+ - Smaller entry sizes generally improve the cache efficiency and make
+ disk I/O faster. This is particularly true with solid state drives
+ so it's a good idea to reduce the entry size in those cases. With
+ rotating hard drives the situation is a bit more complicated so you
+ should test it first and stay with the default size if unsure.
- -drive file=hd.qcow2,l2-cache-size=2097152
- -drive file=hd.qcow2,refcount-cache-size=524288
- -drive file=hd.qcow2,cache-size=2621440
+ - Try different entry sizes to see which one performs best
+ in your case. The block size of the host filesystem is generally a
+ good default (usually 4096 bytes in the case of ext4, hence the
+ default).
-The reason for this 1/4 ratio is to ensure that both caches cover the
-same amount of disk space. Note however that this is only valid with
-the default value of refcount_bits (16). If you are using a different
-value you might want to calculate both cache sizes yourself since QEMU
-will always use the same 1/4 ratio.
+ - Only the L2 cache can be configured this way. The refcount cache
+ always uses the cluster size as the entry size.
-It's also worth mentioning that there's no strict need for both caches
-to cover the same amount of disk space. The refcount cache is used
-much less often than the L2 cache, so it's perfectly reasonable to
-keep it small.
+ - If the L2 cache is big enough to hold all of the image's L2 tables
+ (as explained in the "Choosing the right cache sizes" and "How to
+ configure the cache sizes" sections in this document) then none of
+ this is necessary and you can omit the "l2-cache-entry-size"
+ parameter altogether. In this case QEMU makes the entry size
+ equal to the cluster size by default.
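
To get a feeling for what a given entry size means, the arithmetic can be
worked out for the -drive example earlier in this section
(l2-cache-size=2097152, l2-cache-entry-size=4096). This is a rough Python
sketch; the function name is invented for illustration, and it assumes the
default 64 KB cluster size and the 8-byte L2 entries implied by the
formulas in this document.

```python
# Illustrative arithmetic only; the function name is not part of QEMU.

L2_ENTRY_BYTES = 8  # size of one L2 table entry, as in the formulas above


def l2_cache_layout(l2_cache_size, l2_cache_entry_size, cluster_size=65536):
    """Return (cache entries, bytes mapped per entry, total bytes mapped)."""
    # Number of cache entries that fit in the L2 cache.
    slots = l2_cache_size // l2_cache_entry_size
    # Each cache entry holds this many 8-byte L2 entries, one per cluster.
    clusters_per_slot = l2_cache_entry_size // L2_ENTRY_BYTES
    bytes_per_slot = clusters_per_slot * cluster_size
    return slots, bytes_per_slot, slots * bytes_per_slot


slots, per_slot, total = l2_cache_layout(2097152, 4096)
print(slots)     # 512
print(per_slot)  # 33554432 (32 MB)
print(total)     # 17179869184 (16 GB)
```

The total coverage is the same as with the formula l2_cache_size * 8192;
only the granularity at which data is read into and evicted from the cache
changes.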
Reducing the memory usage
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.
-The parameter "cache-clean-interval" defines an interval (in seconds).
-All cache entries that haven't been accessed during that interval are
-removed from memory.
+The parameter "cache-clean-interval" defines an interval (in seconds),
+after which all the cache entries that haven't been accessed during the
+interval are removed from memory. Setting this parameter to 0 disables this
+feature.
-This example removes all unused cache entries every 15 minutes:
+The following example removes all unused cache entries every 15 minutes:
-drive file=hd.qcow2,cache-clean-interval=900
-If unset, the default value for this parameter is 0 and it disables
-this feature.
+If unset, the default value for this parameter is 600 on platforms which
+support this functionality, and is 0 (disabled) on other platforms.
-Note that this functionality currently relies on the MADV_DONTNEED
-argument for madvise() to actually free the memory, so it is not
-useful in systems that don't follow that behavior.
+This functionality currently relies on the MADV_DONTNEED argument for
+madvise() to actually free the memory. This is a Linux-specific feature,
+so cache-clean-interval is not supported on other systems.