Security
========

Overview
--------

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

Security Requirements
---------------------

QEMU supports many different use cases, some of which have stricter security
requirements than others.  The community has agreed on the overall security
requirements that users may depend on.  These requirements define what is
considered supported from a security perspective.

Virtualization Use Case
'''''''''''''''''''''''

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization.  These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

- Guest
- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- Network protocols (e.g. NBD, live migration)
- User-supplied files (e.g. disk images, kernels, device trees)
- Passthrough devices (e.g. PCI, USB)

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and are treated as security bugs if they can.

Non-virtualization Use Case
'''''''''''''''''''''''''''

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG).  In principle the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case.  However, for historical reasons much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time.  Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

Architecture
------------

This section describes the design principles that ensure the security
requirements are met.

Guest Isolation
'''''''''''''''

Guest isolation is the confinement of guest code to the virtual machine.  When
guest code gains control of execution on the host, this is called escaping the
virtual machine.  Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network.  Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU.  Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU.  At this point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them.  A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

Principle of Least Privilege
''''''''''''''''''''''''''''

The principle of least privilege states that each component has access only to
the privileges necessary for its function.  In the case of QEMU this means that
each process has access only to resources belonging to its guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest.  This way the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements.  For example, guest A only has access to its own disk image file
``a.img`` and not guest B's disk image file ``b.img``.
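
A management tool could verify this property before launching a guest.  The
following is a minimal sketch, assuming a deployment in which each guest's QEMU
process runs under its own UNIX user and disk images are protected by ordinary
file permissions.  The function name ``is_private_to_owner`` is hypothetical
and is not part of QEMU.

.. code-block:: python

   import os
   import stat


   def is_private_to_owner(path, expected_uid):
       """Return True if path is owned by expected_uid and grants no
       permissions at all to the group or to other users."""
       st = os.stat(path)
       mode = stat.S_IMODE(st.st_mode)
       # 0o077 covers every group and "other" permission bit.
       return st.st_uid == expected_uid and (mode & 0o077) == 0

With such a check, ``a.img`` owned by guest A's user with mode ``0600`` passes,
while the same file with mode ``0644`` fails because guest B's user could read
it.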

In reality, certain resources are inaccessible to the guest but must be
available to QEMU to perform its function.  For example, host system calls are
necessary for QEMU but are not exposed to guests.  A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

Isolation mechanisms
''''''''''''''''''''

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege.  With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt.  They are also platform-specific so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users.  Launching QEMU as root can seem more convenient because it
gives QEMU access to host devices (e.g. ``/dev/net/tun``), but this poses a
serious security risk.  File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.  Some
Linux distributions already ship with UNIX groups for these devices by default.
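
File descriptor passing by inheritance can be sketched as follows.  This is an
illustration only: the child process is a small Python stand-in rather than
QEMU, and the helper name ``read_via_inherited_fd`` is hypothetical.  The point
is that the launcher opens the resource and the child receives only a
descriptor number, in the same way that a descriptor number can be handed to
QEMU options that accept one (for example the ``fd=`` parameter of
``-netdev tap``).

.. code-block:: python

   import os
   import subprocess
   import sys

   # Stand-in for the unprivileged process: it never opens a path, it only
   # reads from the descriptor number given on its command line.
   CHILD_CODE = (
       "import os, sys; "
       "sys.stdout.buffer.write(os.read(int(sys.argv[1]), 4096)); "
       "sys.stdout.buffer.flush()"
   )


   def read_via_inherited_fd(path):
       """Open path in the launcher and let a child read it through the
       inherited descriptor only."""
       fd = os.open(path, os.O_RDONLY)
       try:
           result = subprocess.run(
               [sys.executable, "-c", CHILD_CODE, str(fd)],
               pass_fds=(fd,),        # keep this descriptor open in the child
               check=True,
               capture_output=True,
           )
           return result.stdout
       finally:
           os.close(fd)

In a real deployment the launcher would hold the privilege needed to open the
device node, and the child would be started as an unprivileged user.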

- SELinux and AppArmor make it possible to confine processes beyond the
  traditional UNIX process and file permissions model.  They restrict the QEMU
  process from accessing processes and files on the host system that are not
  needed by QEMU.

- Resource limits and cgroup controllers provide throughput and utilization
  limits on key resources such as CPU time, memory, and I/O bandwidth.

- Linux namespaces can be used to make process, file system, and other system
  resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
  those resources that were granted to it.

- Linux seccomp is available via the QEMU ``--sandbox`` option.  It disables
  system calls that are not needed by QEMU, thereby reducing the host kernel
  attack surface.
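
As a rough illustration of the last item, a launcher might enable the seccomp
sandbox with its stricter suboptions when building the QEMU command line.  This
is a sketch under assumptions: the binary name and the helper
``qemu_argv_with_sandbox`` are illustrative, and the suboptions shown should be
checked against the ``--sandbox`` documentation of the QEMU version in use.

.. code-block:: python

   def qemu_argv_with_sandbox(extra_args):
       """Build a QEMU command line with the seccomp sandbox enabled."""
       sandbox = ",".join([
           "on",
           "obsolete=deny",           # block obsolete system calls
           "elevateprivileges=deny",  # block set*uid/set*gid style calls
           "spawn=deny",              # block fork and execve
           "resourcecontrol=deny",    # block process affinity and scheduler changes
       ])
       return ["qemu-system-x86_64", "--sandbox", sandbox, *extra_args]


   argv = qemu_argv_with_sandbox(["-m", "1024"])

Denying ``spawn`` also prevents features that rely on launching helper
processes, so the set of suboptions has to match the features a deployment
needs.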

Sensitive configurations
------------------------

Some aspects of QEMU have security implications that users and management
applications must be aware of.

Monitor console (QMP and HMP)
'''''''''''''''''''''''''''''

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the ``migrate`` command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream. The
``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.
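
To make this concrete, the sketch below builds the QMP messages that a monitor
client could send for these two commands.  The helper ``qmp_message``, the
shell command, the node name, and the file path are illustrative assumptions;
only the general message shape (``execute`` plus ``arguments``) follows the QMP
wire format.

.. code-block:: python

   import json


   def qmp_message(command, arguments=None):
       """Serialize a QMP command as a JSON string."""
       message = {"execute": command}
       if arguments is not None:
           message["arguments"] = arguments
       return json.dumps(message)


   # Asks QEMU to spawn a shell pipeline and stream migration data into it.
   migrate_via_exec = qmp_message(
       "migrate", {"uri": "exec:gzip -c > /tmp/guest-state.gz"})

   # Asks QEMU to open a host file of the client's choosing as a block node.
   open_host_file = qmp_message(
       "blockdev-add",
       {"driver": "file", "node-name": "disk1", "filename": "/path/on/host.img"})

Whoever can send such messages decides which processes QEMU spawns and which
host files it opens, within the limits of QEMU's own privileges.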

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is also important to consider the security of the character device backend
over which the monitor console is exposed. It must be protected against
malicious third parties that might try to make unauthorized connections or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP-based
character device backend is inappropriate unless it is configured to use both
TLS encryption and an authorization control policy on client connections.
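
A trusted management application talking to a QMP monitor on a UNIX domain
socket might begin as in the sketch below.  It assumes QEMU was started with a
QMP monitor listening on a socket path chosen by the administrator, inside a
directory that only the management user can access.  The function names are
hypothetical, and a real client would also need to handle asynchronous events
and errors.

.. code-block:: python

   import json
   import socket


   def connect_monitor(path):
       """Connect to a monitor listening on a UNIX domain socket."""
       sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
       sock.connect(path)
       return sock


   def qmp_negotiate(sock):
       """Read the QMP greeting, leave capabilities negotiation mode, and
       return the (greeting, reply) pair."""
       reader = sock.makefile("r", encoding="utf-8")
       greeting = json.loads(reader.readline())
       request = json.dumps({"execute": "qmp_capabilities"})
       sock.sendall(request.encode("utf-8") + b"\r\n")
       reply = json.loads(reader.readline())
       return greeting, reply

Access to the socket is then governed by file system permissions on the host,
which is what keeps the interface limited to the local, trusted user.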

In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.