@c Blame: commit e8412576 (SH)
@node Security
@chapter Security

@section Overview

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

@section Security Requirements

QEMU supports many different use cases, some of which have stricter security
requirements than others. The community has agreed on the overall security
requirements that users may depend on. These requirements define what is
considered supported from a security perspective.

@subsection Virtualization Use Case

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization. These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

@itemize
@item Guest
@item User-facing interfaces (e.g. VNC, SPICE, WebSocket)
@item Network protocols (e.g. NBD, live migration)
@item User-supplied files (e.g. disk images, kernels, device trees)
@item Passthrough devices (e.g. PCI, USB)
@end itemize

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and are treated as security bugs if they can.

@subsection Non-virtualization Use Case

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG). In principle, the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case. However, for historical reasons, much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time. Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

@section Architecture

This section describes the design principles that ensure the security
requirements are met.

@subsection Guest Isolation

Guest isolation is the confinement of guest code to the virtual machine. When
guest code gains control of execution on the host, this is called escaping the
virtual machine. Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network. Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU. Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU. At this point, the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them. A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

@subsection Principle of Least Privilege

The principle of least privilege states that each component only has access to
the privileges necessary for its function. In the case of QEMU, this means that
each QEMU process only has access to resources belonging to its guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest. This way, the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements. For example, guest A only has access to its own disk image file
@code{a.img} and not guest B's disk image file @code{b.img}.
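
As a minimal sketch of how a management tool might check this property before
launching a guest, the Python function below tests whether a disk image is a
regular file owned by the expected user with no group or other permission
bits set. It assumes that each guest runs under its own UNIX user, which this
chapter does not require. The name @code{image_is_private} is hypothetical,
and access granted through ACLs or SELinux labels is not examined.

```python
import os
import stat


def image_is_private(path, owner_uid):
    """Return True if `path` is a regular file owned by `owner_uid`
    with no permission bits set for group or others.

    Hypothetical helper for illustration; not part of QEMU or libvirt.
    """
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid != owner_uid:
        return False
    # 0o077 masks the group and other read/write/execute bits.
    return stat.S_IMODE(st.st_mode) & 0o077 == 0
```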

In reality, certain resources are inaccessible to the guest but must be
available to QEMU to perform its function. For example, host system calls are
necessary for QEMU but are not exposed to guests. A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

@subsection Isolation Mechanisms

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege. With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt. They are also platform-specific, so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users. It can seem more convenient to launch QEMU as root to give
it access to host devices (e.g. @code{/dev/net/tun}), but this poses a serious
security risk. File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes.
Some Linux distributions already ship with UNIX groups for these devices by
default.
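
File descriptor passing can be illustrated with a short Python sketch. It
shows the general UNIX mechanism (an @code{SCM_RIGHTS} control message sent
over a UNIX domain socket), not how any particular management tool or QEMU
itself implements it. The function names @code{send_fd}, @code{recv_fd}, and
@code{demo_pass_fd} are hypothetical, and an ordinary file stands in for a
host device node. In a real deployment the sender would be a privileged
helper and the receiver an unprivileged process.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a UNIX domain socket."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    # At least one byte of ordinary data must accompany the control message.
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo_pass_fd(path):
    """Open `path` on one end of a socket pair and read it from the other."""
    helper, worker = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with helper, worker:
        fd = os.open(path, os.O_RDONLY)
        try:
            send_fd(helper, fd)
        finally:
            # The sender's copy can be closed once the message is queued.
            os.close(fd)
        received = recv_fd(worker)
        with os.fdopen(received, "rb") as handle:
            return handle.read()
```

The receiving side never needs permission to open @code{path} itself. It only
uses the descriptor that was handed to it, which is what allows the receiving
process to stay unprivileged.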

@itemize
@item SELinux and AppArmor make it possible to confine processes beyond the
traditional UNIX process and file permissions model. They restrict the QEMU
process from accessing processes and files on the host system that are not
needed by QEMU.

@item Resource limits and cgroup controllers provide throughput and utilization
limits on key resources such as CPU time, memory, and I/O bandwidth.

@item Linux namespaces can be used to make process, file system, and other system
resources unavailable to QEMU. A namespaced QEMU process is restricted to only
those resources that were granted to it.

@item Linux seccomp is available via the QEMU @option{--sandbox} option. It disables
system calls that are not needed by QEMU, thereby reducing the host kernel
attack surface.
@end itemize
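
The following Python sketch shows how a launcher script might combine two of
these points: refusing to start QEMU as root and enabling the seccomp sandbox
on the command line. It is a sketch under stated assumptions, not a
recommended or complete invocation. The helper names, the binary name
@code{qemu-system-x86_64}, and the drive settings are illustrative choices.
The sandbox sub-options shown are taken from the QEMU option documentation as
best understood here, so check them against the QEMU version in use.

```python
def check_unprivileged(euid):
    """Raise if the given effective UID is root (hypothetical helper)."""
    if euid == 0:
        raise PermissionError("refusing to launch QEMU as root")


def build_qemu_argv(image_path, binary="qemu-system-x86_64"):
    """Build an argument list that enables the seccomp sandbox.

    Assumption: the sub-options below exist in the installed QEMU and
    QEMU was built with seccomp support.
    """
    sandbox = ("on,obsolete=deny,elevateprivileges=deny,"
               "spawn=deny,resourcecontrol=deny")
    # QEMU option strings use "," as a separator, so a literal comma
    # in a file name has to be doubled.
    escaped = image_path.replace(",", ",,")
    drive = "file={},format=raw,if=virtio".format(escaped)
    return [binary, "--sandbox", sandbox, "-drive", drive]
```

A caller would typically run @code{check_unprivileged(os.geteuid())} first and
then pass the list to @code{subprocess.run} without a shell.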
@c Blame: commit 4f244308 (DB)

@section Sensitive Configurations

Some aspects of QEMU have security implications that users and management
applications must be aware of.

@subsection Monitor Console (QMP and HMP)

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the @code{migrate} command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream. The
@code{blockdev-add} command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is also important to consider the security of the character device backend
over which the monitor console is exposed. It needs to have protection against
malicious third parties that might try to make unauthorized connections or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP-based
character device backend is inappropriate unless configured to use both TLS
encryption and an authorization control policy on client connections.
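
For illustration, a trusted management script could talk to a QMP monitor
over a UNIX domain socket roughly as follows. This is a minimal sketch that
assumes QEMU was started with a QMP monitor listening on a UNIX socket at
@code{sock_path}, and that replies arrive as one JSON object per line. The
function name @code{qmp_command} is hypothetical, and error handling is
reduced to the bare minimum. Access to the socket is governed by the file
system permissions on the socket and its directory, so those should be
restricted to the management user.

```python
import json
import socket


def qmp_command(sock_path, command, arguments=None):
    """Run one QMP command over a UNIX domain socket and return its result."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        with sock.makefile("r", encoding="utf-8") as reader:
            # The server speaks first with a greeting object.
            greeting = json.loads(reader.readline())
            if "QMP" not in greeting:
                raise RuntimeError("unexpected greeting from monitor")

            def call(name, args=None):
                message = {"execute": name}
                if args:
                    message["arguments"] = args
                sock.sendall(json.dumps(message).encode("utf-8") + b"\r\n")
                while True:
                    line = reader.readline()
                    if not line:
                        raise ConnectionError("monitor closed the connection")
                    reply = json.loads(line)
                    if "event" in reply:
                        continue  # skip asynchronous events
                    if "error" in reply:
                        raise RuntimeError(reply["error"].get("desc", "QMP error"))
                    return reply["return"]

            # Capabilities negotiation must precede any other command.
            call("qmp_capabilities")
            return call(command, arguments)
```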

In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.