Commit | Line | Data |
---|---|---|
1ffad77c AG |
1 | The QEMU throttling infrastructure |
2 | ================================== | |
3 | Copyright (C) 2016 Igalia, S.L. | |
4 | Author: Alberto Garcia <[email protected]> | |
5 | ||
6 | This work is licensed under the terms of the GNU GPL, version 2 or | |
7 | later. See the COPYING file in the top-level directory. | |
8 | ||
9 | Introduction | |
10 | ------------ | |
11 | QEMU includes a throttling module that can be used to set limits on | |
12 | I/O operations. The code itself is generic and independent of the I/O | |
cb8d4c8f | 13 | units, but it is currently used to limit the number of bytes per second |
1ffad77c AG |
14 | and operations per second (IOPS) when performing disk I/O. |
15 | ||
16 | This document explains how to use the throttling code in QEMU, and how | |
17 | it works internally. The implementation is in throttle.c. | |
18 | ||
19 | ||
20 | Using throttling to limit disk I/O | |
21 | ---------------------------------- | |
22 | Two aspects of the disk I/O can be limited: the number of bytes per | |
23 | second and the number of operations per second (IOPS). For each one of | |
24 | them the user can set a global limit or separate limits for read and | |
25 | write operations. This gives us a total of six different parameters. | |
26 | ||
27 | I/O limits can be set using the throttling.* parameters of -drive, or | |
28 | using the QMP 'block_set_io_throttle' command. These are the names of | |
29 | the parameters for both cases: | |
30 | ||
31 | |-----------------------+-----------------------| | |
32 | | -drive | block_set_io_throttle | | |
33 | |-----------------------+-----------------------| | |
34 | | throttling.iops-total | iops | | |
35 | | throttling.iops-read | iops_rd | | |
36 | | throttling.iops-write | iops_wr | | |
37 | | throttling.bps-total | bps | | |
38 | | throttling.bps-read | bps_rd | | |
39 | | throttling.bps-write | bps_wr | | |
40 | |-----------------------+-----------------------| | |
41 | ||
0bab0ebb | 42 | It is possible to set limits for both IOPS and bps at the same time, |
1ffad77c AG |
43 | and for each case we can decide whether to have separate read and |
44 | write limits or not, but note that if iops-total is set then neither | |
45 | iops-read nor iops-write can be set. The same applies to bps-total and | |
46 | bps-read/write. | |
47 | ||
48 | The default value of these parameters is 0, and it means 'unlimited'. | |
49 | ||
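The rule that a total limit cannot be combined with a read or write limit of the same kind can be sketched as a small check. This is only an illustration: `check_limits` and the dictionary layout are hypothetical names chosen here, not part of QEMU, and the sketch assumes that 0 (or a missing key) means 'unlimited' as described above.

```python
def check_limits(limits):
    """Return a list of conflicts found in a dict of throttling limits.

    Keys follow the -drive naming without the 'throttling.' prefix
    (for example 'iops-total'). A value of 0 or a missing key means
    'unlimited'. This helper is hypothetical, not a QEMU function.
    """
    conflicts = []
    for kind in ("iops", "bps"):
        total = limits.get(f"{kind}-total", 0)
        # Collect the read/write limits of this kind that are actually set.
        split = [k for k in (f"{kind}-read", f"{kind}-write") if limits.get(k, 0)]
        if total and split:
            conflicts.append(
                f"{kind}-total cannot be combined with {', '.join(split)}"
            )
    return conflicts


# IOPS and bps limits may be mixed freely.
ok = check_limits({"iops-total": 100, "bps-read": 1048576})
# A total limit together with a read limit of the same kind is rejected.
bad = check_limits({"iops-total": 100, "iops-read": 50})
```

Here `ok` is an empty list, while `bad` contains one message describing the conflict.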
50 | In its most basic usage, the user can add a drive to QEMU with a limit | |
51 | of 100 IOPS with the following -drive line: | |
52 | ||
53 | -drive file=hd0.qcow2,throttling.iops-total=100 | |
54 | ||
55 | We can do the same using QMP. In this case all six parameters are | |
56 | mandatory, so the ones that we don't want to limit must be set to 0: | |
57 | ||
58 | { "execute": "block_set_io_throttle", | |
59 | "arguments": { | |
60 | "device": "virtio0", | |
61 | "iops": 100, | |
62 | "iops_rd": 0, | |
63 | "iops_wr": 0, | |
64 | "bps": 0, | |
65 | "bps_rd": 0, | |
66 | "bps_wr": 0 | |
67 | } | |
68 | } | |
69 | ||
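Since all six basic parameters must be present in the QMP command, a client may prefer to generate it rather than write it by hand. The following is a minimal sketch under that assumption: `throttle_command` is a hypothetical helper, it only builds the JSON message shown above and does not talk to a QMP socket.

```python
import json

# The six basic limits accepted by block_set_io_throttle, as listed above.
LIMIT_KEYS = ("iops", "iops_rd", "iops_wr", "bps", "bps_rd", "bps_wr")


def throttle_command(device, **limits):
    """Build a block_set_io_throttle message (hypothetical helper).

    Any basic limit that is not given is filled in with 0 ('unlimited').
    """
    unknown = set(limits) - set(LIMIT_KEYS)
    if unknown:
        raise ValueError(f"unknown limits: {sorted(unknown)}")
    arguments = {"device": device}
    for key in LIMIT_KEYS:
        arguments[key] = limits.get(key, 0)
    return {"execute": "block_set_io_throttle", "arguments": arguments}


cmd = throttle_command("virtio0", iops=100)
print(json.dumps(cmd, indent=2))
```

The resulting message is equivalent to the hand-written one above: 'iops' is 100 and the other five limits are 0.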
70 | ||
71 | I/O bursts | |
72 | ---------- | |
73 | In addition to the basic limits we have just seen, QEMU allows the | |
74 | user to do bursts of I/O for a configurable amount of time. A burst is | |
75 | an amount of I/O that can exceed the basic limit. Bursts are useful to | |
76 | allow better performance when there are peaks of activity (the OS | |
77 | boots, a service needs to be restarted) while keeping the average | |
78 | limits lower the rest of the time. | |
79 | ||
80 | Two parameters control bursts: their length and the maximum amount of | |
81 | I/O they allow. These two can be configured separately for each one of | |
82 | the six basic parameters described in the previous section, but in | |
83 | this section we'll use 'iops-total' as an example. | |
84 | ||
85 | The I/O limit during bursts is set using 'iops-total-max', and the | |
86 | maximum length (in seconds) is set with 'iops-total-max-length'. So if | |
87 | we want to configure a drive with a basic limit of 100 IOPS and allow | |
88 | bursts of 2000 IOPS for 60 seconds, we would do it like this (the line | |
89 | is split for clarity): | |
90 | ||
91 | -drive file=hd0.qcow2, | |
92 | throttling.iops-total=100, | |
93 | throttling.iops-total-max=2000, | |
94 | throttling.iops-total-max-length=60 | |
95 | ||
96 | Or, with QMP: | |
97 | ||
98 | { "execute": "block_set_io_throttle", | |
99 | "arguments": { | |
100 | "device": "virtio0", | |
101 | "iops": 100, | |
102 | "iops_rd": 0, | |
103 | "iops_wr": 0, | |
104 | "bps": 0, | |
105 | "bps_rd": 0, | |
106 | "bps_wr": 0, | |
107 | "iops_max": 2000, | |
108 | "iops_max_length": 60 | |
109 | } | |
110 | } | |
111 | ||
112 | With this, the user can perform I/O on hd0.qcow2 at a rate of 2000 | |
113 | IOPS for 1 minute before it's throttled down to 100 IOPS. | |
114 | ||
115 | The user will be able to do bursts again if there's a sufficiently | |
116 | long period of time with unused I/O (see below for details). | |
117 | ||
118 | The default value for 'iops-total-max' is 0 and it means that bursts | |
119 | are not allowed. 'iops-total-max-length' can only be set if | |
120 | 'iops-total-max' is set as well, and its default value is 1 second. | |
121 | ||
122 | Here's the complete list of parameters for configuring bursts: | |
123 | ||
124 | |----------------------------------+-----------------------| | |
125 | | -drive | block_set_io_throttle | | |
126 | |----------------------------------+-----------------------| | |
127 | | throttling.iops-total-max | iops_max | | |
128 | | throttling.iops-total-max-length | iops_max_length | | |
129 | | throttling.iops-read-max | iops_rd_max | | |
130 | | throttling.iops-read-max-length | iops_rd_max_length | | |
131 | | throttling.iops-write-max | iops_wr_max | | |
132 | | throttling.iops-write-max-length | iops_wr_max_length | | |
133 | | throttling.bps-total-max | bps_max | | |
134 | | throttling.bps-total-max-length | bps_max_length | | |
135 | | throttling.bps-read-max | bps_rd_max | | |
136 | | throttling.bps-read-max-length | bps_rd_max_length | | |
137 | | throttling.bps-write-max | bps_wr_max | | |
138 | | throttling.bps-write-max-length | bps_wr_max_length | | |
139 | |----------------------------------+-----------------------| | |
140 | ||
141 | ||
142 | Controlling the size of I/O operations | |
143 | -------------------------------------- | |
144 | When applying IOPS limits all I/O operations are treated equally | |
145 | regardless of their size. This means that the user can take advantage | |
146 | of this in order to circumvent the limits and submit one huge I/O | |
147 | request instead of several smaller ones. | |
148 | ||
149 | QEMU provides a setting called throttling.iops-size to prevent this | |
150 | from happening. This setting specifies the size (in bytes) of an I/O | |
151 | request for accounting purposes. Larger requests will be counted | |
152 | proportionally to this size. | |
153 | ||
154 | For example, if iops-size is set to 4096 then an 8KB request will be | |
155 | counted as two, and a 6KB request will be counted as one and a | |
156 | half. This only applies to requests larger than iops-size: smaller | |
157 | requests will always be counted as one, no matter their size. | |
158 | ||
159 | The default value of iops-size is 0 and it means that the size of the | |
160 | requests is never taken into account when applying IOPS limits. | |
161 | ||
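The accounting rule described above can be written down in a few lines. This is a sketch of the rule as stated in this section, not the code in throttle.c; `accounted_ops` is a hypothetical name.

```python
def accounted_ops(request_bytes, iops_size=0):
    """Number of operations a request counts as for IOPS accounting.

    iops_size == 0 means the request size is never taken into account.
    Requests no larger than iops_size always count as one operation.
    Hypothetical helper that mirrors the rule described in the text.
    """
    if iops_size == 0 or request_bytes <= iops_size:
        return 1.0
    return request_bytes / iops_size


two = accounted_ops(8 * 1024, iops_size=4096)        # 8KB request
one_and_half = accounted_ops(6 * 1024, iops_size=4096)  # 6KB request
small = accounted_ops(512, iops_size=4096)            # below iops-size
ignored = accounted_ops(1024 * 1024)                  # iops-size unset
```

With iops-size set to 4096, the 8KB request counts as 2 operations and the 6KB request as 1.5, while the 512-byte request and the request made with iops-size unset each count as 1.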
162 | ||
163 | Applying I/O limits to groups of disks | |
164 | -------------------------------------- | |
165 | In all the examples so far we have seen how to apply limits to the I/O | |
166 | performed on individual drives, but QEMU allows grouping drives so | |
167 | they all share the same limits. | |
168 | ||
169 | The way it works is that each drive with I/O limits is assigned to a | |
170 | group named using the throttling.group parameter. If this parameter is | |
171 | not specified, then the device name (e.g. 'virtio0', 'ide0-hd0') will | |
172 | be used as the group name. | |
173 | ||
174 | Limits set using the throttling.* parameters discussed earlier in this | |
175 | document apply to the combined I/O of all members of a group. | |
176 | ||
177 | Consider this example: | |
178 | ||
179 | -drive file=hd1.qcow2,throttling.iops-total=6000,throttling.group=foo | |
180 | -drive file=hd2.qcow2,throttling.iops-total=6000,throttling.group=foo | |
181 | -drive file=hd3.qcow2,throttling.iops-total=3000,throttling.group=bar | |
182 | -drive file=hd4.qcow2,throttling.iops-total=6000,throttling.group=foo | |
183 | -drive file=hd5.qcow2,throttling.iops-total=3000,throttling.group=bar | |
184 | -drive file=hd6.qcow2,throttling.iops-total=5000 | |
185 | ||
186 | Here hd1, hd2 and hd4 are all members of a group named 'foo' with a | |
187 | combined IOPS limit of 6000, and hd3 and hd5 are members of 'bar'. hd6 | |
188 | is left alone (technically it is part of a 1-member group). | |
189 | ||
190 | Limits are applied in a round-robin fashion so if there are concurrent | |
191 | I/O requests on several drives of the same group they will be | |
192 | distributed evenly. | |
193 | ||
194 | When I/O limits are applied to an existing drive using the QMP command | |
195 | 'block_set_io_throttle', the following things need to be taken into | |
196 | account: | |
197 | ||
198 | - I/O limits are shared within the same group, so new values will | |
199 | affect all members and overwrite the previous settings. In other | |
200 | words: if different limits are applied to members of the same | |
201 | group, the last one wins. | |
202 | ||
203 | - If 'group' is unset it is assumed to be the current group of that | |
204 | drive. If the drive is not in a group yet, it will be added to a | |
205 | group named after the device name. | |
206 | ||
207 | - If 'group' is set then the drive will be moved to that group if | |
208 | it was a member of a different one. In this case the limits | |
209 | specified in the parameters will be applied to the new group | |
210 | only. | |
211 | ||
212 | - I/O limits can be disabled by setting all of them to 0. In this | |
213 | case the device will be removed from its group and the rest of | |
214 | its members will not be affected. The 'group' parameter is | |
215 | ignored. | |
216 | ||
217 | ||
218 | The Leaky Bucket algorithm | |
219 | -------------------------- | |
220 | I/O limits in QEMU are implemented using the leaky bucket algorithm | |
221 | (specifically the "Leaky bucket as a meter" variant). | |
222 | ||
223 | This algorithm uses the analogy of a bucket that leaks water | |
224 | constantly. The water that gets into the bucket represents the I/O | |
225 | that has been performed, and no more I/O is allowed once the bucket is | |
226 | full. | |
227 | ||
228 | To see the way this corresponds to the throttling parameters in QEMU, | |
229 | consider the following values: | |
230 | ||
231 | iops-total=100 | |
232 | iops-total-max=2000 | |
233 | iops-total-max-length=60 | |
234 | ||
235 | - Water leaks from the bucket at a rate of 100 IOPS. | |
236 | - Water can be added to the bucket at a rate of 2000 IOPS. | |
237 | - The size of the bucket is 2000 x 60 = 120000.
37e3645a AG |
238 | - If 'iops-total-max-length' is unset then it defaults to 1 and the |
239 | size of the bucket is 2000. | |
240 | - If 'iops-total-max' is unset then 'iops-total-max-length' must be | |
241 | unset as well. In this case the bucket size is 100. | |
1ffad77c AG |
242 | |
243 | The bucket is initially empty, therefore water can be added until it's | |
244 | full at a rate of 2000 IOPS (the burst rate). Once the bucket is full | |
245 | we can only add as much water as it leaks, therefore the I/O rate is | |
246 | reduced to 100 IOPS. If we add less water than it leaks then the | |
247 | bucket will start to empty, allowing for bursts again. | |
248 | ||
249 | Note that since water is leaking from the bucket even during bursts, | |
250 | it will take a bit more than 60 seconds at 2000 IOPS to fill it | |
251 | up. After those 60 seconds the bucket will have leaked 60 x 100 = | |
252 | 6000, allowing for 3 more seconds of I/O at 2000 IOPS. | |
253 | ||
254 | Also, due to the way the algorithm works, longer bursts can be done at | |
255 | a lower I/O rate, e.g. 1000 IOPS during 120 seconds. |
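
The arithmetic in this section can be checked with a short sketch. It assumes a simplified continuous model in which water enters the bucket at a constant request rate and leaks at the basic limit; the function names are hypothetical and this is not the code in throttle.c.

```python
def bucket_size(avg, burst=0, burst_length=1):
    """Bucket size following the rules listed in this section.

    avg          -- basic limit (e.g. iops-total)
    burst        -- burst limit (e.g. iops-total-max), 0 if unset
    burst_length -- burst length in seconds, defaults to 1
    """
    if burst == 0:
        # No burst configured: the bucket holds one second of the basic limit.
        return avg
    return burst * burst_length


def seconds_until_full(avg, size, rate):
    """Time needed to fill an empty bucket when issuing I/O at 'rate'.

    The bucket fills at (rate - avg) because it leaks at 'avg' meanwhile.
    """
    if rate <= avg:
        return float("inf")  # the bucket never fills
    return size / (rate - avg)


size = bucket_size(100, burst=2000, burst_length=60)
t_fast = seconds_until_full(100, size, rate=2000)
t_slow = seconds_until_full(100, size, rate=1000)
```

Under this model `size` is 120000, `t_fast` is roughly 63 seconds (the 60 seconds plus about 3 more mentioned above), and `t_slow` is roughly 133 seconds, which illustrates why a lower rate allows a longer burst.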