Commit | Line | Data |
---|---|---|
8e1c5a40 KW |
1 | /* |
2 | * VFIO Mediated devices | |
3 | * | |
4 | * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved. | |
5 | * Author: Neo Jia <[email protected]> | |
6 | * Kirti Wankhede <[email protected]> | |
7 | * | |
8 | * This program is free software; you can redistribute it and/or modify | |
9 | * it under the terms of the GNU General Public License version 2 as | |
10 | * published by the Free Software Foundation. | |
11 | */ | |
12 | ||
13 | Virtual Function I/O (VFIO) Mediated devices[1] | |
14 | =============================================== | |
15 | ||
16 | The number of use cases for virtualizing DMA devices that do not have built-in | |
17 | SR-IOV capability is increasing. Previously, to virtualize such devices, | |
18 | developers had to create their own management interfaces and APIs, and then | |
19 | integrate them with user space software. To simplify integration with user space | |
20 | software, we have identified common requirements and a unified management | |
21 | interface for such devices. | |
22 | ||
23 | The VFIO driver framework provides unified APIs for direct device access. It is | |
24 | an IOMMU/device-agnostic framework for exposing direct device access to user | |
25 | space in a secure, IOMMU-protected environment. This framework is used for | |
26 | multiple devices, such as GPUs, network adapters, and compute accelerators. With | |
27 | direct device access, virtual machines or user space applications have direct | |
28 | access to the physical device. This framework is reused for mediated devices. | |
29 | ||
30 | The mediated core driver provides a common interface for mediated device | |
31 | management that can be used by drivers of different devices. This module | |
32 | provides a generic interface to perform these operations: | |
33 | ||
34 | * Create and destroy a mediated device | |
35 | * Add a mediated device to and remove it from a mediated bus driver | |
36 | * Add a mediated device to and remove it from an IOMMU group | |
37 | ||
38 | The mediated core driver also provides an interface to register a bus driver. | |
39 | For example, the mediated VFIO mdev driver is designed for mediated devices and | |
40 | supports VFIO APIs. The mediated bus driver adds a mediated device to and | |
41 | removes it from a VFIO group. | |
42 | ||
43 | The following high-level block diagram shows the main components and interfaces | |
44 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM | |
45 | devices as examples, as these devices are the first devices to use this module. | |
46 | ||
47 | +---------------+ | |
48 | | | | |
49 | | +-----------+ | mdev_register_driver() +--------------+ | |
50 | | | | +<------------------------+ | | |
51 | | | mdev | | | | | |
52 | | | bus | +------------------------>+ vfio_mdev.ko |<-> VFIO user | |
53 | | | driver | | probe()/remove() | | APIs | |
54 | | | | | +--------------+ | |
55 | | +-----------+ | | |
56 | | | | |
57 | | MDEV CORE | | |
58 | | MODULE | | |
59 | | mdev.ko | | |
60 | | +-----------+ | mdev_register_device() +--------------+ | |
61 | | | | +<------------------------+ | | |
62 | | | | | | nvidia.ko |<-> physical | |
63 | | | | +------------------------>+ | device | |
64 | | | | | callbacks +--------------+ | |
65 | | | Physical | | | |
66 | | | device | | mdev_register_device() +--------------+ | |
67 | | | interface | |<------------------------+ | | |
68 | | | | | | i915.ko |<-> physical | |
69 | | | | +------------------------>+ | device | |
70 | | | | | callbacks +--------------+ | |
71 | | | | | | |
72 | | | | | mdev_register_device() +--------------+ | |
73 | | | | +<------------------------+ | | |
74 | | | | | | ccw_device.ko|<-> physical | |
75 | | | | +------------------------>+ | device | |
76 | | | | | callbacks +--------------+ | |
77 | | +-----------+ | | |
78 | +---------------+ | |
79 | ||
80 | ||
81 | Registration Interfaces | |
82 | ======================= | |
83 | ||
84 | The mediated core driver provides the following types of registration | |
85 | interfaces: | |
86 | ||
87 | * Registration interface for a mediated bus driver | |
88 | * Physical device driver interface | |
89 | ||
90 | Registration Interface for a Mediated Bus Driver | |
91 | ------------------------------------------------ | |
92 | ||
93 | The registration interface for a mediated bus driver provides the following | |
94 | structure to represent a mediated device's driver: | |
95 | ||
96 | /* | |
97 | * struct mdev_driver [2] - Mediated device's driver | |
98 | * @name: driver name | |
99 | * @probe: called when new device created | |
100 | * @remove: called when device removed | |
101 | * @driver: device driver structure | |
102 | */ | |
103 | struct mdev_driver { | |
104 | const char *name; | |
105 | int (*probe) (struct device *dev); | |
106 | void (*remove) (struct device *dev); | |
107 | struct device_driver driver; | |
108 | }; | |
109 | ||
110 | A mediated bus driver for mdev should use this structure in the function calls | |
111 | to register and unregister itself with the core driver: | |
112 | ||
113 | * Register: | |
114 | ||
115 | extern int mdev_register_driver(struct mdev_driver *drv, | |
116 | struct module *owner); | |
117 | ||
118 | * Unregister: | |
119 | ||
120 | extern void mdev_unregister_driver(struct mdev_driver *drv); | |
121 | ||
122 | The mediated bus driver is responsible for adding mediated devices to the VFIO | |
123 | group when devices are bound to the driver and removing mediated devices from | |
124 | the VFIO group when devices are unbound from the driver. | |
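
The bind and unbind flow described above can be modelled in user space. The
sketch below is a toy, not kernel code: struct device, struct device_driver,
struct module, and the bodies of mdev_register_driver() and
mdev_unregister_driver() are stand-ins written for this example so that it
builds outside the kernel, and toy_bind(), toy_unbind(), and the sample_* names
are hypothetical. Only the layout of struct mdev_driver and the two prototypes
are taken from the text above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for kernel types; NOT the real definitions. */
struct module;
struct device { const char *name; };
struct device_driver { const char *name; };

/* Same members as the structure quoted in this section. */
struct mdev_driver {
	const char *name;
	int  (*probe)(struct device *dev);
	void (*remove)(struct device *dev);
	struct device_driver driver;
};

/* Toy replacement for the mdev core: it remembers one bus driver. */
static struct mdev_driver *registered_drv;

int mdev_register_driver(struct mdev_driver *drv, struct module *owner)
{
	(void)owner;			/* unused in this toy model */
	if (!drv || !drv->name || !drv->probe)
		return -EINVAL;
	if (registered_drv)
		return -EBUSY;
	drv->driver.name = drv->name;
	registered_drv = drv;
	return 0;
}

void mdev_unregister_driver(struct mdev_driver *drv)
{
	if (registered_drv == drv)
		registered_drv = NULL;
}

/* Hypothetical bus driver: counts the devices it has bound. */
static int bound_devices;

static int sample_probe(struct device *dev)
{
	printf("probe: add %s to its VFIO group\n", dev->name);
	bound_devices++;
	return 0;
}

static void sample_remove(struct device *dev)
{
	printf("remove: take %s out of its VFIO group\n", dev->name);
	bound_devices--;
}

static struct mdev_driver sample_driver = {
	.name   = "sample_mdev",
	.probe  = sample_probe,
	.remove = sample_remove,
};

/* Toy model of the core binding a device to the registered driver. */
int toy_bind(struct device *dev)
{
	return registered_drv ? registered_drv->probe(dev) : -ENODEV;
}

/* Toy model of the core unbinding a device from the registered driver. */
void toy_unbind(struct device *dev)
{
	if (registered_drv && registered_drv->remove)
		registered_drv->remove(dev);
}
```

In a real module, the same structure would be passed to
mdev_register_driver() from the module init function with THIS_MODULE as the
owner, and to mdev_unregister_driver() from the module exit function.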
125 | ||
126 | ||
127 | Physical Device Driver Interface | |
128 | -------------------------------- | |
129 | ||
42930553 AW |
130 | The physical device driver interface provides the mdev_parent_ops[3] structure |
131 | to define the APIs to manage work in the mediated core driver that is related | |
132 | to the physical device. | |
8e1c5a40 | 133 | |
42930553 | 134 | The structures in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
135 | |
136 | * dev_attr_groups: attributes of the parent device | |
137 | * mdev_attr_groups: attributes of the mediated device | |
138 | * supported_config: attributes to define supported configurations | |
139 | ||
42930553 | 140 | The functions in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
141 | |
142 | * create: allocate basic resources in a driver for a mediated device | |
143 | * remove: free resources in a driver when a mediated device is destroyed | |
144 | ||
42930553 | 145 | The callbacks in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
146 | |
147 | * open: open callback of mediated device | |
148 | * close: close callback of mediated device | |
149 | * ioctl: ioctl callback of mediated device | |
150 | * read: read emulation callback | |
151 | * write: write emulation callback | |
152 | * mmap: mmap emulation callback | |
153 | ||
42930553 AW |
154 | A driver should use the mdev_parent_ops structure in the function call to |
155 | register itself with the mdev core driver: | |
8e1c5a40 KW |
156 | |
157 | extern int mdev_register_device(struct device *dev, | |
42930553 | 158 | const struct mdev_parent_ops *ops); |
8e1c5a40 | 159 | |
42930553 AW |
160 | However, the mdev_parent_ops structure is not required in the function call |
161 | that a driver should use to unregister itself with the mdev core driver: | |
8e1c5a40 KW |
162 | |
163 | extern void mdev_unregister_device(struct device *dev); | |
164 | ||
165 | ||
166 | Mediated Device Management Interface Through sysfs | |
167 | ================================================== | |
168 | ||
169 | The management interface through sysfs enables user space software, such as | |
170 | libvirt, to query and configure mediated devices in a hardware-agnostic fashion. | |
171 | This management interface provides flexibility to the underlying physical | |
172 | device's driver to support features such as: | |
173 | ||
174 | * Mediated device hot plug | |
175 | * Multiple mediated devices in a single virtual machine | |
176 | * Multiple mediated devices from different physical devices | |
177 | ||
178 | Links in the mdev_bus Class Directory | |
179 | ------------------------------------- | |
180 | The /sys/class/mdev_bus/ directory contains links to devices that are registered | |
181 | with the mdev core driver. | |
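
User space can discover registered parent devices by listing that directory.
The following is a minimal sketch; the function name list_mdev_parents() is
hypothetical, and the directory is passed in as a parameter so that the code can
be tried against any directory, not only /sys/class/mdev_bus.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print the name of every entry in class_dir (for example
 * "/sys/class/mdev_bus") and return the number of entries found,
 * or -1 if the directory cannot be opened.
 */
int list_mdev_parents(const char *class_dir)
{
	DIR *dir = opendir(class_dir);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* Skip the directory's own "." and ".." entries. */
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		printf("%s\n", ent->d_name);
		count++;
	}
	closedir(dir);
	return count;
}
```

A return value of 0 means that the directory exists but no physical device has
registered with the mdev core driver yet.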
182 | ||
183 | Directories and Files Under the sysfs for Each Physical Device | |
184 | -------------------------------------------------------------- | |
185 | ||
186 | |- [parent physical device] | |
187 | |--- Vendor-specific-attributes [optional] | |
188 | |--- [mdev_supported_types] | |
189 | | |--- [<type-id>] | |
190 | | | |--- create | |
191 | | | |--- name | |
192 | | | |--- available_instances | |
193 | | | |--- device_api | |
194 | | | |--- description | |
195 | | | |--- [devices] | |
196 | | |--- [<type-id>] | |
197 | | | |--- create | |
198 | | | |--- name | |
199 | | | |--- available_instances | |
200 | | | |--- device_api | |
201 | | | |--- description | |
202 | | | |--- [devices] | |
203 | | |--- [<type-id>] | |
204 | | |--- create | |
205 | | |--- name | |
206 | | |--- available_instances | |
207 | | |--- device_api | |
208 | | |--- description | |
209 | | |--- [devices] | |
210 | ||
211 | * [mdev_supported_types] | |
212 | ||
213 | The list of currently supported mediated device types and their details. | |
214 | ||
215 | [<type-id>], device_api, and available_instances are mandatory attributes | |
216 | that should be provided by the vendor driver. | |
217 | ||
218 | * [<type-id>] | |
219 | ||
220 | The [<type-id>] name is created by adding the device driver string as a | |
221 | prefix to the string provided by the vendor driver. The format of this name | |
222 | is as follows: | |
223 | ||
224 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); | |
225 | ||
9372e6fe AW |
226 | (or using mdev_parent_dev(mdev) to arrive at the parent device outside |
227 | of the core mdev code) | |
228 | ||
8e1c5a40 KW |
229 | * device_api |
230 | ||
231 | This attribute should show which device API is being created, for example, | |
232 | "vfio-pci" for a PCI device. | |
233 | ||
234 | * available_instances | |
235 | ||
236 | This attribute should show the number of devices of type <type-id> that can be | |
237 | created. | |
238 | ||
239 | * [devices] | |
240 | ||
241 | This directory contains links to the devices of type <type-id> that have been | |
242 | created. | |
243 | ||
244 | * name | |
245 | ||
246 | This attribute should show a human-readable name. This attribute is optional. | |
247 | ||
248 | * description | |
249 | ||
250 | This attribute should show a brief description of the type's features. This | |
251 | attribute is optional. | |
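
The naming rule and the available_instances attribute can be exercised from
user space. The sketch below makes two assumptions: the helper names
format_type_id() and read_available_instances() are hypothetical, and the
strings "mtty" and "2" in the usage note are example inputs chosen so that the
result matches the mtty-2 type directory shown for the sample driver.

```c
#include <stdio.h>

/*
 * Build "<driver>-<group>" in buf, using the same "%s-%s" format as the
 * core. Returns 0 on success or -1 if the result does not fit in size bytes.
 */
int format_type_id(char *buf, size_t size, const char *driver_name,
		   const char *group_name)
{
	int n = snprintf(buf, size, "%s-%s", driver_name, group_name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Read <parent_dir>/mdev_supported_types/<type_id>/available_instances.
 * Returns the number of instances, or -1 on any error.
 */
int read_available_instances(const char *parent_dir, const char *type_id)
{
	char path[512];
	FILE *f;
	int n, value;

	n = snprintf(path, sizeof(path),
		     "%s/mdev_supported_types/%s/available_instances",
		     parent_dir, type_id);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	/* The attribute holds a single non-negative decimal number. */
	if (fscanf(f, "%d", &value) != 1 || value < 0)
		value = -1;
	fclose(f);
	return value;
}
```

For example, format_type_id(id, sizeof(id), "mtty", "2") stores "mtty-2" in id,
and read_available_instances("/sys/devices/virtual/mtty/mtty", id) then reports
how many more devices of that type can be created.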
252 | ||
253 | Directories and Files Under the sysfs for Each mdev Device | |
254 | ---------------------------------------------------------- | |
255 | ||
256 | |- [parent physical device] | |
257 | |--- [$MDEV_UUID] | |
258 | |--- remove | |
259 | |--- mdev_type {link to its type} | |
260 | |--- vendor-specific-attributes [optional] | |
261 | ||
262 | * remove (write only) | |
263 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can | |
264 | fail the remove() callback if that device is active and the vendor driver | |
265 | doesn't support hot unplug. | |
266 | ||
267 | Example: | |
268 | # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove | |
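
A management tool can perform the same write from C. The following is a minimal
sketch; mdev_remove() is a hypothetical helper name, and the caller passes the
device directory (for example /sys/bus/mdev/devices/<UUID>). Because the vendor
driver can refuse the removal, the helper also checks the result of fclose(),
which is where a buffered write that the kernel rejected is reported.

```c
#include <stdio.h>

/*
 * Write "1" to <mdev_dir>/remove.
 * Returns 0 on success or -1 if the path is too long, the file cannot be
 * opened, or the write is rejected.
 */
int mdev_remove(const char *mdev_dir)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/remove", mdev_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs("1", f) >= 0;
	/* stdio buffers the data, so a failed write shows up on close. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```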
269 | ||
270 | Mediated Device Hot Plug | |
271 | ------------------------ | |
272 | ||
273 | Mediated devices can be created and assigned at runtime. The procedure to hot | |
274 | plug a mediated device is the same as the procedure to hot plug a PCI device. | |
275 | ||
276 | Translation APIs for Mediated Devices | |
277 | ===================================== | |
278 | ||
279 | The following APIs are provided for translating user pfn to host pfn in a VFIO | |
280 | driver: | |
281 | ||
282 | extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, | |
283 | int npage, int prot, unsigned long *phys_pfn); | |
284 | ||
285 | extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, | |
286 | int npage); | |
287 | ||
288 | These functions call back into the back-end IOMMU module by using the pin_pages | |
289 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently | |
290 | these callbacks are supported in the TYPE1 IOMMU module. To enable them for | |
291 | other IOMMU backend modules, such as the PPC64 sPAPR module, those modules must provide | |
292 | these two callback functions. | |
293 | ||
9d1a546c KW |
294 | Using the Sample Code |
295 | ===================== | |
296 | ||
297 | mtty.c in the samples/vfio-mdev/ directory is a sample driver program to | |
298 | demonstrate how to use the mediated device framework. | |
299 | ||
300 | The sample driver creates an mdev device that simulates a serial port over a PCI | |
301 | card. | |
302 | ||
303 | 1. Build and load the mtty.ko module. | |
304 | ||
305 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ | |
306 | ||
307 | Files in this device directory in sysfs are similar to the following: | |
308 | ||
309 | # tree /sys/devices/virtual/mtty/mtty/ | |
310 | /sys/devices/virtual/mtty/mtty/ | |
311 | |-- mdev_supported_types | |
312 | | |-- mtty-1 | |
313 | | | |-- available_instances | |
314 | | | |-- create | |
315 | | | |-- device_api | |
316 | | | |-- devices | |
317 | | | `-- name | |
318 | | `-- mtty-2 | |
319 | | |-- available_instances | |
320 | | |-- create | |
321 | | |-- device_api | |
322 | | |-- devices | |
323 | | `-- name | |
324 | |-- mtty_dev | |
325 | | `-- sample_mtty_dev | |
326 | |-- power | |
327 | | |-- autosuspend_delay_ms | |
328 | | |-- control | |
329 | | |-- runtime_active_time | |
330 | | |-- runtime_status | |
331 | | `-- runtime_suspended_time | |
332 | |-- subsystem -> ../../../../class/mtty | |
333 | `-- uevent | |
334 | ||
335 | 2. Create a mediated device by using the dummy device that you created in the | |
336 | previous step. | |
337 | ||
338 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ | |
339 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create | |
340 | ||
341 | 3. Add parameters to qemu-kvm. | |
342 | ||
343 | -device vfio-pci,\ | |
344 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 | |
345 | ||
346 | 4. Boot the VM. | |
347 | ||
348 | In the Linux guest VM, with no hardware on the host, the device appears | |
349 | as follows: | |
350 | ||
351 | # lspci -s 00:05.0 -xxvv | |
352 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) | |
353 | Subsystem: Device 4348:3253 | |
354 | Physical Slot: 5 | |
355 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- | |
356 | Stepping- SERR- FastB2B- DisINTx- | |
357 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- | |
358 | <TAbort- <MAbort- >SERR- <PERR- INTx- | |
359 | Interrupt: pin A routed to IRQ 10 | |
360 | Region 0: I/O ports at c150 [size=8] | |
361 | Region 1: I/O ports at c158 [size=8] | |
362 | Kernel driver in use: serial | |
363 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 | |
364 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 | |
365 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 | |
366 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 | |
367 | ||
368 | In the Linux guest VM, dmesg output for the device is as follows: | |
369 | ||
370 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ | |
371 | 10 | |
372 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A | |
373 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A | |
374 | ||
375 | ||
376 | 5. In the Linux guest VM, check the serial ports. | |
377 | ||
378 | # setserial -g /dev/ttyS* | |
379 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 | |
380 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 | |
381 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 | |
382 | ||
383 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or | |
384 | /dev/ttyS2 with hardware flow control disabled. | |
385 | ||
386 | 7. Type data on the minicom terminal or send data to the terminal emulation | |
387 | program and read the data. | |
388 | ||
389 | Data is looped back from the host's mtty driver. | |
390 | ||
391 | 8. Destroy the mediated device that you created. | |
392 | ||
393 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove | |
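
Steps 2 and 3 above can also be scripted from C. The following is a sketch under
stated assumptions: is_uuid(), mdev_create(), and qemu_device_arg() are
hypothetical helper names, and the UUID layout check is a local sanity test
added for this example, not a statement of what the kernel accepts.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if s has the 8-4-4-4-12 hexadecimal UUID layout, else 0. */
int is_uuid(const char *s)
{
	size_t i;

	if (strlen(s) != 36)
		return 0;
	for (i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return 0;
		} else if (!isxdigit((unsigned char)s[i])) {
			return 0;
		}
	}
	return 1;
}

/* Step 2: write uuid to <type_dir>/create. Returns 0 or -1. */
int mdev_create(const char *type_dir, const char *uuid)
{
	char path[512];
	FILE *f;
	int n, ok;

	if (!is_uuid(uuid))
		return -1;
	n = snprintf(path, sizeof(path), "%s/create", type_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(uuid, f) >= 0;
	/* A write rejected by the kernel is reported when the buffer flushes. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Step 3: build the value for the -device option. Returns 0 or -1. */
int qemu_device_arg(char *buf, size_t size, const char *uuid)
{
	int n;

	if (!is_uuid(uuid))
		return -1;
	n = snprintf(buf, size,
		     "vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s", uuid);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the values from the walkthrough, mdev_create() is called with
"/sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2" and
"83b8f4f2-509f-382f-3c1e-e6bfe0fa1001", and qemu_device_arg() then produces
the string that follows -device on the qemu-kvm command line.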
394 | ||
8e1c5a40 | 395 | References |
9d1a546c | 396 | ========== |
8e1c5a40 KW |
397 | |
398 | [1] See Documentation/vfio.txt for more information on VFIO. | |
399 | [2] struct mdev_driver in include/linux/mdev.h | |
42930553 | 400 | [3] struct mdev_parent_ops in include/linux/mdev.h |
8e1c5a40 | 401 | [4] struct vfio_iommu_driver_ops in include/linux/vfio.h |