Commit | Line | Data |
---|---|---|
c02c112a PM |
1 | Security |
2 | ======== | |
3 | ||
4 | Overview | |
5 | -------- | |
6 | ||
7 | This chapter explains the security requirements that QEMU is designed to meet | |
8 | and principles for securely deploying QEMU. | |
9 | ||
10 | Security Requirements | |
11 | --------------------- | |
12 | ||
13 | QEMU supports many different use cases, some of which have stricter security | |
14 | requirements than others. The community has agreed on the overall security | |
15 | requirements that users may depend on. These requirements define what is | |
16 | considered supported from a security perspective. | |
17 | ||
18 | Virtualization Use Case | |
19 | ''''''''''''''''''''''' | |
20 | ||
21 | The virtualization use case covers cloud and virtual private server (VPS) | |
22 | hosting, as well as traditional data center and desktop virtualization. These | |
23 | use cases rely on hardware virtualization extensions to execute guest code | |
24 | safely on the physical CPU at close-to-native speed. | |
25 | ||
26 | The following entities are untrusted, meaning that they may be buggy or | |
27 | malicious: | |
28 | ||
29 | - Guest | |
30 | - User-facing interfaces (e.g. VNC, SPICE, WebSocket) | |
31 | - Network protocols (e.g. NBD, live migration) | |
32 | - User-supplied files (e.g. disk images, kernels, device trees) | |
33 | - Passthrough devices (e.g. PCI, USB) | |
34 | ||
35 | Bugs affecting these entities are evaluated on whether they can cause damage in | |
36 | real-world use cases and are treated as security bugs if they can. | |
37 | ||
38 | Non-virtualization Use Case | |
39 | ''''''''''''''''''''''''''' | |
40 | ||
41 | The non-virtualization use case covers emulation using the Tiny Code Generator | |
42 | (TCG). In principle the TCG and device emulation code used in conjunction with | |
43 | the non-virtualization use case should meet the same security requirements as | |
44 | the virtualization use case. However, for historical reasons much of the | |
45 | non-virtualization use case code was not written with these security | |
46 | requirements in mind. | |
47 | ||
48 | Bugs affecting the non-virtualization use case are not considered security | |
49 | bugs at this time. Users with non-virtualization use cases must not rely on | |
50 | QEMU to provide guest isolation or any security guarantees. | |
51 | ||
52 | Architecture | |
53 | ------------ | |
54 | ||
55 | This section describes the design principles that ensure the security | |
56 | requirements are met. | |
57 | ||
58 | Guest Isolation | |
59 | ''''''''''''''' | |
60 | ||
61 | Guest isolation is the confinement of guest code to the virtual machine. When | |
62 | guest code gains control of execution on the host, this is called escaping the | |
63 | virtual machine. Isolation also includes resource limits such as throttling of | |
64 | CPU, memory, disk, or network. Guests must be unable to exceed their resource | |
65 | limits. | |
66 | ||
67 | QEMU presents an attack surface to the guest in the form of emulated devices. | |
68 | The guest must not be able to gain control of QEMU. Bugs in emulated devices | |
69 | could allow malicious guests to gain code execution in QEMU. At this point the | |
70 | guest has escaped the virtual machine and is able to act in the context of the | |
71 | QEMU process on the host. | |
72 | ||
73 | Guests often interact with other guests and share resources with them. A | |
74 | malicious guest must not gain control of other guests or access their data. | |
75 | Disk image files and network traffic must be protected from other guests unless | |
76 | explicitly shared between them by the user. | |
77 | ||
78 | Principle of Least Privilege | |
79 | '''''''''''''''''''''''''''' | |
80 | ||
81 | The principle of least privilege states that each component only has access to | |
82 | the privileges necessary for its function. In the case of QEMU this means that | |
83 | each process only has access to resources belonging to the guest. | |
84 | ||
85 | The QEMU process should not have access to any resources that are inaccessible | |
86 | to the guest. This way the guest does not gain anything by escaping into the | |
87 | QEMU process since it already has access to those same resources from within | |
88 | the guest. | |
89 | ||
90 | Following the principle of least privilege immediately fulfills guest isolation | |
91 | requirements. For example, guest A only has access to its own disk image file | |
92 | ``a.img`` and not guest B's disk image file ``b.img``. | |
93 | ||
94 | In reality certain resources are inaccessible to the guest but must be | |
95 | available to QEMU to perform its function. For example, host system calls are | |
96 | necessary for QEMU but are not exposed to guests. A guest that escapes into | |
97 | the QEMU process can then begin invoking host system calls. | |
98 | ||
99 | New features must be designed to follow the principle of least privilege. | |
100 | Should this not be possible for technical reasons, the security risk must be | |
101 | clearly documented so users are aware of the trade-off of enabling the feature. | |
102 | ||
103 | Isolation mechanisms | |
104 | '''''''''''''''''''' | |
105 | ||
106 | Several isolation mechanisms are available to realize this architecture of | |
107 | guest isolation and the principle of least privilege. With the exception of | |
108 | Linux seccomp, these mechanisms are all deployed by management tools that | |
109 | launch QEMU, such as libvirt. They are also platform-specific so they are only | |
110 | described briefly for Linux here. | |
111 | ||
112 | The fundamental isolation mechanism is that QEMU processes must run as | |
113 | unprivileged users. Sometimes it seems more convenient to launch QEMU as | |
114 | root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a | |
115 | huge security risk. File descriptor passing can be used to give an otherwise | |
116 | unprivileged QEMU process access to host devices without running QEMU as root. | |
117 | It is also possible to launch QEMU as a non-root user and configure UNIX groups | |
118 | for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes. | |
119 | Some Linux distributions already ship with UNIX groups for these devices by default. | |
120 | ||
121 | - SELinux and AppArmor make it possible to confine processes beyond the | |
122 | traditional UNIX process and file permissions model. They restrict the QEMU | |
123 | process from accessing processes and files on the host system that are not | |
124 | needed by QEMU. | |
125 | ||
126 | - Resource limits and cgroup controllers provide throughput and utilization | |
127 | limits on key resources such as CPU time, memory, and I/O bandwidth. | |
128 | ||
129 | - Linux namespaces can be used to make process, file system, and other system | |
130 | resources unavailable to QEMU. A namespaced QEMU process is restricted to only | |
131 | those resources that were granted to it. | |
132 | ||
133 | - Linux seccomp is available via the QEMU ``--sandbox`` option. It disables | |
134 | system calls that are not needed by QEMU, thereby reducing the host kernel | |
135 | attack surface. | |
136 | ||
137 | Sensitive configurations | |
138 | ------------------------ | |
139 | ||
140 | There are aspects of QEMU that can have security implications which users and | |
141 | management applications must be aware of. | |
142 | ||
143 | Monitor console (QMP and HMP) | |
144 | ''''''''''''''''''''''''''''' | |
145 | ||
146 | The monitor console (whether used with QMP or HMP) provides an interface | |
147 | to dynamically control many aspects of QEMU's runtime operation. Many of the | |
148 | commands exposed will instruct QEMU to access content on the host file system | |
149 | and/or trigger spawning of external processes. | |
150 | ||
151 | For example, the ``migrate`` command allows for the spawning of arbitrary | |
152 | processes for the purpose of tunnelling the migration data stream. The | |
153 | ``blockdev-add`` command instructs QEMU to open arbitrary files, exposing | |
154 | their content to the guest as a virtual disk. | |
155 | ||
156 | Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor, | |
157 | or Linux namespaces, the monitor console should be considered to have privileges | |
158 | equivalent to those of the user account QEMU is running under. | |
159 | ||
160 | It is also important to consider the security of the character device backend | |
161 | over which the monitor console is exposed. It needs to have protection against | |
162 | malicious third parties which might try to make unauthorized connections, or | |
163 | perform man-in-the-middle attacks. Many of the character device backends do not | |
164 | satisfy this requirement and so must not be used for the monitor console. | |
165 | ||
166 | The general recommendation is that the monitor console should be exposed over | |
167 | a UNIX domain socket backend to the local host only. Use of the TCP-based | |
168 | character device backend is inappropriate unless configured to use both TLS | |
169 | encryption and an authorization control policy on client connections. | |
170 | ||
171 | In summary, the monitor console is considered a privileged control interface to | |
172 | QEMU and as such should only be made accessible to a trusted management | |
173 | application or user. |
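
The recommendation to expose the monitor over a UNIX domain socket can be illustrated with a small client. What follows is a minimal sketch, not code from QEMU, and it assumes QEMU was started with a QMP monitor listening on a UNIX socket, for example ``-qmp unix:/run/vm0/qmp.sock,server=on,wait=off``. The path and the helper names ``connect_qmp`` and ``qmp_negotiate`` are hypothetical. The sketch only performs the QMP handshake: it reads the greeting and then issues ``qmp_capabilities``.

```python
import json
import socket


def qmp_negotiate(sock: socket.socket) -> dict:
    """Complete the QMP handshake on an already connected socket.

    QEMU sends a greeting object first; the client must then issue
    qmp_capabilities before any other command is accepted.
    """
    reader = sock.makefile("r", encoding="utf-8")
    greeting = json.loads(reader.readline())
    if "QMP" not in greeting:
        raise RuntimeError("peer did not send a QMP greeting")

    request = json.dumps({"execute": "qmp_capabilities"})
    sock.sendall(request.encode("utf-8") + b"\r\n")
    reply = json.loads(reader.readline())
    if "return" not in reply:
        raise RuntimeError(f"capability negotiation failed: {reply!r}")
    return greeting["QMP"]


def connect_qmp(path: str) -> socket.socket:
    """Connect to a monitor socket that is reachable only on the local host.

    The path is an assumption for illustration; use whatever path was
    given to QEMU when the monitor was configured.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock
```

With a UNIX socket, who may connect is decided by the file system permissions on the socket path, plus any SELinux or AppArmor policy in force. Placing the socket in a directory that only the management application can enter is therefore one way to keep this privileged interface away from untrusted local users.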