PCI EXPRESS GUIDELINES
======================

1. Introduction
================
The doc proposes best practices on how to use PCI Express (PCIe) / PCI
devices in PCI Express based machines and explains the reasoning behind
them.

Note that the PCIe features are available only when using the 'q35'
machine type on x86 architecture and the 'virt' machine type on AArch64.
Other machine types do not use PCIe at this time.

The following presentations accompany this document:
 (1) Q35 overview.
     https://wiki.qemu.org/images/4/4e/Q35.pdf
 (2) A comparison between PCI and PCI Express technologies.
     https://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf

Note: The usage examples are not intended to replace the full
documentation; please use QEMU help to retrieve all options.

2. Device placement strategy
============================
QEMU does not have a clear socket-device matching mechanism
and allows any PCI/PCI Express device to be plugged into any
PCI/PCI Express slot.
Plugging a PCI device into a PCI Express slot might not always work and
is unusual anyway, since it cannot be done on "bare metal".
Plugging a PCI Express device into a PCI slot hides the Extended
Configuration Space, so it is also not recommended.

The recommendation is to separate the PCI Express and PCI hierarchies.
PCI Express devices should be plugged only into PCI Express Root Ports and
PCI Express Downstream Ports.

2.1 Root Bus (pcie.0)
=====================
Place only the following kinds of devices directly on the Root Complex:
    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
        not controllers. Place only legacy PCI devices on
        the Root Complex. These will be considered Integrated Endpoints.
        Note: Integrated Endpoints are not hot-pluggable.

        Although the PCI Express spec does not forbid PCI Express devices as
        Integrated Endpoints, existing hardware mostly integrates legacy PCI
        devices with the Root Complex. Guest OSes are suspected to behave
        strangely when PCI Express devices are integrated
        with the Root Complex.

    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
        hierarchies.

    (3) PCI Express to PCI Bridge (pcie-pci-bridge), for starting legacy PCI
        hierarchies.

    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
        are needed.

   pcie.0 bus
   ----------------------------------------------------------------------------
        |                |                     |                  |
   -----------   ------------------   -------------------   --------------
   | PCI Dev |   | PCIe Root Port |   | PCIe-PCI Bridge |   |  pxb-pcie  |
   -----------   ------------------   -------------------   --------------

2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
          -device <dev>[,bus=pcie.0]
2.1.2 To expose a new PCI Express Root Bus use:
          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
      PCI Express Root Ports and PCI Express to PCI bridges can be
      connected to the pcie.1 bus:
          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
          -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1


2.2 PCI Express only hierarchy
==============================
Always use PCI Express Root Ports to start PCI Express hierarchies.

A PCI Express Root bus supports up to 32 devices. Since each
PCI Express Root Port is a function and a multi-function
device may support up to 8 functions, the maximum possible
number of PCI Express Root Ports per PCI Express Root Bus is 256.
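The 32 x 8 arithmetic can be made concrete with a small Python sketch that maps a running Root Port index to the 'slot.function' value used in the addr= option. The helper name root_port_addr and the packing order (eight functions per slot, starting at a caller-chosen slot) are illustrative assumptions; in practice some pcie.0 slots are already used by devices the machine creates itself, hence the first_slot parameter.

```python
# Illustrative sketch: one root bus has 32 slots (devices), and each
# multi-function slot can hold 8 functions, i.e. 8 Root Ports.
SLOTS_PER_BUS = 32
FUNCS_PER_SLOT = 8
MAX_ROOT_PORTS = SLOTS_PER_BUS * FUNCS_PER_SLOT  # 32 * 8 = 256


def root_port_addr(index: int, first_slot: int = 0) -> str:
    """Return a hypothetical 'slot.function' addr for the index-th Root Port,
    packing 8 Root Ports (functions) into each slot from first_slot onwards."""
    if index < 0:
        raise ValueError("index must be non-negative")
    slot, func = divmod(index, FUNCS_PER_SLOT)
    slot += first_slot
    if slot >= SLOTS_PER_BUS:
        raise ValueError("no free slot left on this root bus")
    return f"{slot:#x}.{func}"


if __name__ == "__main__":
    print(MAX_ROOT_PORTS)        # 256
    print(root_port_addr(0))     # 0x0.0
    print(root_port_addr(9, 2))  # 0x3.1
```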

Prefer grouping PCI Express Root Ports into multi-function devices
to keep a simple flat hierarchy that is enough for most scenarios.
Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
if there is no more room for PCI Express Root Ports.
See section 4 for further justification.

Plug only PCI Express devices into PCI Express Ports.


   pcie.0 bus
   -------------------------------------------------------------------------------------
         |                |                                    |
   -------------    -------------                        -------------
   | Root Port |    | Root Port |                        | Root Port |
   -------------    -------------                        -------------
         |                            -------------------------|------------------------
    ------------                      |                -----------------               |
    | PCIe Dev |                      |    PCI Express | Upstream Port |               |
    ------------                      |      Switch    -----------------               |
                                      |                  |            |                |
                                      |    -------------------    -------------------  |
                                      |    | Downstream Port |    | Downstream Port |  |
                                      |    -------------------    -------------------  |
                                      --------------|----------------------|------------
                                              ------------
                                              | PCIe Dev |
                                              ------------

2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
      -device <dev>,bus=root_port1
2.2.2 Using multi-function PCI Express Root Ports:
      -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
      -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
      -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0]
2.2.3 Plugging a PCI Express device into a Switch:
      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1] \
      -device <dev>,bus=downstream_port1

Notes:
  - The (slot, chassis) pair is mandatory and must be unique for each
    PCI Express Root Port. 'slot' defaults to 0 when not specified.
  - The 'addr' parameter can be 0 for all the examples above.
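Writing 2.2.2 by hand gets tedious for many ports, so here is a Python sketch that generates the -device arguments for N multi-function Root Ports. It is a hypothetical helper, not a QEMU tool: it assumes multifunction=on is set on function 0 of each slot as in the example, leaves 'slot' at its default and gives every port its own chassis number so each (slot, chassis) pair stays unique, and takes the first free pcie.0 slot from the caller.

```python
def root_port_args(count: int, first_slot: int, first_chassis: int = 1,
                   bus: str = "pcie.0") -> list[str]:
    """Build '-device ioh3420,...' arguments for `count` Root Ports packed
    as multi-function devices, 8 functions per slot (hypothetical helper)."""
    args: list[str] = []
    for i in range(count):
        slot, func = divmod(i, 8)
        slot += first_slot
        if slot > 31:
            raise ValueError("too many Root Ports for one root bus")
        opts = [
            "ioh3420",
            f"id=root_port{i + 1}",
            f"bus={bus}",
            # A distinct chassis per port keeps every (slot, chassis)
            # pair unique while slot is left at its default of 0.
            f"chassis={first_chassis + i}",
            f"addr={slot:#x}.{func}",
        ]
        if func == 0:
            # Function 0 of each slot carries multifunction=on.
            opts.insert(1, "multifunction=on")
        args += ["-device", ",".join(opts)]
    return args


if __name__ == "__main__":
    cmd = root_port_args(3, first_slot=2)
    for opt, value in zip(cmd[::2], cmd[1::2]):
        print(opt, value)
```

Returning a list rather than one string lets the result be passed straight to a process launcher without shell quoting.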


2.3 PCI only hierarchy
======================
Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
but, as mentioned in section 5, doing so means the legacy PCI
device in question will be incapable of hot-unplugging.
Otherwise, use PCI Express to PCI Bridges (pcie-pci-bridge) in
combination with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.

Prefer flat hierarchies. For most scenarios a single PCI Express to PCI Bridge
(having 32 slots) and several PCI-PCI Bridges attached to it
(each also supporting 32 slots) will support hundreds of legacy devices.
The recommendation is to populate one PCI-PCI Bridge under the
PCI Express to PCI Bridge until it is full, and then plug in a new PCI-PCI Bridge.

   pcie.0 bus
   ----------------------------------------------
        |                            |
   -----------               -------------------
   | PCI Dev |               | PCIe-PCI Bridge |
   -----------               -------------------
                               |            |
                  ------------------    ------------------
                  | PCI-PCI Bridge |    | PCI-PCI Bridge |
                  ------------------    ------------------
                                         |           |
                                  -----------     -----------
                                  | PCI Dev |     | PCI Dev |
                                  -----------     -----------

2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
      -device <dev>[,bus=pcie.0]
2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
      -device pcie-pci-bridge,id=pcie_pci_bridge1[,bus=pcie.0] \
      -device pci-bridge,id=pci_bridge1,bus=pcie_pci_bridge1[,chassis_nr=x][,addr=y] \
      -device <dev>,bus=pci_bridge1[,addr=x]
      Note that 'addr' cannot be 0 unless the shpc=off parameter is passed to
      the PCI Bridge/PCI Express to PCI Bridge.
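The "fill one PCI-PCI Bridge before adding the next" advice and the addr rule can be combined into a slot allocator. This Python sketch assumes every bridge keeps the default shpc=on (so addr 0 is skipped and each bridge offers addr 1 to 31), reuses the hypothetical pci_bridgeN naming from 2.3.2, and uses e1000 only as an example legacy PCI device model; the bridges themselves still have to be created with their own -device pci-bridge options.

```python
from collections.abc import Iterator


def pci_slots(shpc: bool = True) -> Iterator[tuple[str, int]]:
    """Yield (bridge_id, addr) pairs, filling pci_bridge1 completely
    before moving on to pci_bridge2, and so on (hypothetical ids)."""
    first_addr = 1 if shpc else 0  # addr 0 is unusable while shpc=on
    bridge = 1
    while True:
        for addr in range(first_addr, 32):  # a PCI bus has 32 slots
            yield f"pci_bridge{bridge}", addr
        bridge += 1


def device_args(devices: list[str], shpc: bool = True) -> list[str]:
    """Place each legacy PCI device model on the next free bridge slot."""
    args: list[str] = []
    for dev, (bridge, addr) in zip(devices, pci_slots(shpc)):
        args += ["-device", f"{dev},bus={bridge},addr={addr:#x}"]
    return args


if __name__ == "__main__":
    cmd = device_args(["e1000"] * 3)
    for opt, value in zip(cmd[::2], cmd[1::2]):
        print(opt, value)
```

With shpc=on the 32nd device is the first to land on pci_bridge2, which is the point at which the recommendation says to add a new bridge.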

3. IO space issues
===================
The PCI Express Root Ports and PCI Express Downstream Ports are seen by
Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, each
such Port should have a 4K IO range reserved for it, even though only one
(multifunction) device can be plugged into each Port. This results in
poor IO space utilization.

The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
by not allocating IO space for each PCI Express Root / PCI Express
Downstream Port if:
    (1) the port is empty, or
    (2) the device behind the port has no IO BARs.

The IO space is very limited (65536 byte-wide IO ports) and may even be
fragmented by fixed IO ports owned by platform devices, resulting in at most
10 PCI Express Root Ports or PCI Express Downstream Ports per system
if devices with IO BARs are used in the PCI Express hierarchy. Using the
proposed device placement strategy solves this issue by using only
PCI Express devices within the PCI Express hierarchy.
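The arithmetic behind that limit can be sketched in Python: 64K of IO space holds at most 16 aligned 4K bridge windows, and every window that overlaps a fixed platform IO range is lost. The reserved ranges in the example are hypothetical placeholders, not the IO map of any real machine; the more fixed ranges a platform has, the closer the count drops toward the figure quoted above.

```python
IO_SPACE_SIZE = 0x10000    # 65536 byte-wide IO ports
BRIDGE_IO_WINDOW = 0x1000  # 4K reserved per Root Port / Downstream Port


def free_bridge_windows(reserved: list[tuple[int, int]]) -> int:
    """Count 4K-aligned IO windows that overlap none of the reserved
    (start, end) ranges; both ends are inclusive."""
    free = 0
    for base in range(0, IO_SPACE_SIZE, BRIDGE_IO_WINDOW):
        limit = base + BRIDGE_IO_WINDOW - 1
        if all(end < base or start > limit for start, end in reserved):
            free += 1
    return free


if __name__ == "__main__":
    # Hypothetical fixed ranges, for illustration only.
    reserved = [(0x0000, 0x0FFF), (0xB000, 0xB0FF), (0xC000, 0xC0FF)]
    print(IO_SPACE_SIZE // BRIDGE_IO_WINDOW)  # 16
    print(free_bridge_windows(reserved))      # 13
```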

The PCI Express spec requires that PCI Express devices work properly
without using IO ports. The PCI hierarchy has no such limitations.


4. Bus numbers issues
======================
Each PCI domain can have only up to 256 buses, and the QEMU PCI Express
machines do not support multiple PCI domains even if extra Root
Complexes (pxb-pcie) are used.

Each element of the PCI Express hierarchy (Root Complexes,
PCI Express Root Ports, PCI Express Downstream/Upstream Ports)
uses one bus number. Since only one (multifunction) device
can be attached to a PCI Express Root Port or PCI Express Downstream
Port, it is advised to plan in advance for the expected number of
devices to prevent bus number starvation.

Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
Express hierarchy) saves the bus numbers that their Upstream Ports
would otherwise consume.
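Counting bus numbers makes the trade-off concrete. This Python sketch applies the rule above (each Root Port, Upstream Port and Downstream Port uses one bus number) to n devices placed either behind n Root Ports or behind one Switch hanging off a single Root Port. The function names are hypothetical, and the root bus itself is left out because both layouts share it.

```python
def buses_flat(devices: int) -> int:
    """Each device sits behind its own Root Port: one bus per device."""
    return devices


def buses_switch(devices: int) -> int:
    """All devices behind one Switch: the Root Port, the Upstream Port and
    one Downstream Port per device each use a bus number."""
    return 1 + 1 + devices


if __name__ == "__main__":
    for n in (1, 4, 8):
        # In this model the Switch always costs two extra bus numbers.
        print(n, buses_flat(n), buses_switch(n))
```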

The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space. All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.
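The partitioning rule can be expressed as a short Python sketch: given the bus_nr of each pxb-pcie device (pcie.0 implicitly starting at bus 0), it returns the range of bus numbers each root bus and everything recursively behind it may use. The helper name and the ids and bus_nr values in the example are hypothetical.

```python
def bus_ranges(pxb_bus_nr: dict[str, int]) -> dict[str, range]:
    """Map each root bus id to the bus numbers available to its hierarchy.

    pcie.0 starts at bus 0; each pxb-pcie root bus starts at its bus_nr
    and ends just below the next higher bus_nr (or at 255 inclusive).
    """
    starts = sorted([("pcie.0", 0), *pxb_bus_nr.items()], key=lambda kv: kv[1])
    ranges: dict[str, range] = {}
    for i, (bus_id, start) in enumerate(starts):
        end = starts[i + 1][1] if i + 1 < len(starts) else 256
        if not 0 <= start < end <= 256:
            raise ValueError(f"overlapping or invalid bus_nr near {bus_id}")
        ranges[bus_id] = range(start, end)
    return ranges


if __name__ == "__main__":
    for bus_id, r in bus_ranges({"pcie.1": 128, "pcie.2": 192}).items():
        print(bus_id, r.start, r.stop - 1, len(r))
```

A hierarchy that needs more buses than its range holds has to be given a lower bus_nr for its own pxb-pcie device or a higher one for the next.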


5. Hot-plug
============
The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
do not support hot-plug, so any devices plugged into Root Complexes
cannot be hot-plugged/hot-unplugged:
    (1) PCI Express Integrated Endpoints
    (2) PCI Express Root Ports
    (3) PCI Express to PCI Bridges
    (4) pxb-pcie

Be aware that PCI Express Downstream Ports can't be hot-plugged into
an existing PCI Express Upstream Port.

PCI devices can be hot-plugged into PCI Express to PCI and PCI-PCI Bridges.
The PCI hot-plug into PCI-PCI bridge is ACPI based, whereas hot-plug into
PCI Express to PCI bridges is SHPC-based. They both can work side by side with
the PCI Express native hot-plug.

PCI Express devices can be natively hot-plugged/hot-unplugged into/from
PCI Express Root Ports (and PCI Express Downstream Ports).

5.1 Planning for hot-plug:
    (1) PCI hierarchy
        Leave enough PCI-PCI Bridge slots empty or add one
        or more empty PCI-PCI Bridges to the PCI Express to PCI Bridge.

        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
        4K IO space and 2M MMIO range to be used for all devices behind it.
        An appropriate PCI capability is designed for this; see pcie_pci_bridge.txt.

        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
        per system, don't use more than 9 PCI-PCI Bridges, leaving 4K for the
        Integrated Endpoints. (The PCI Express Hierarchy needs no IO space).

    (2) PCI Express hierarchy:
        Leave enough PCI Express Root Ports empty. Use multifunction
        PCI Express Root Ports (up to 8 ports per pcie.0 slot)
        on the Root Complex(es) to keep the
        hierarchy as flat as possible, thereby saving PCI bus numbers.
        Don't use PCI Express Switches if you don't have
        to; each one of those uses an extra PCI bus (for its Upstream Port)
        that could be put to better use with another Root Port or Downstream
        Port, which may come in handy for hot-plugging another device.
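The IO budget in item (1) is plain arithmetic; this Python sketch checks a planned number of PCI-PCI Bridges against it. The ~40K ceiling and the 4K kept for Integrated Endpoints come from the text above, while the constant and function names are illustrative assumptions.

```python
KIB = 1024
BRIDGE_IO = 4 * KIB         # IO window reserved per PCI-PCI Bridge
IO_BUDGET = 10 * BRIDGE_IO  # ~40K of usable IO space, per the text above
ENDPOINT_IO = 4 * KIB       # left over for the Integrated Endpoints


def max_pci_bridges(budget: int = IO_BUDGET, endpoint_io: int = ENDPOINT_IO) -> int:
    """How many PCI-PCI Bridges fit once the endpoints' share is set aside."""
    return (budget - endpoint_io) // BRIDGE_IO


def check_plan(bridges: int) -> None:
    """Raise if the planned bridge count would exhaust the IO budget."""
    if bridges > max_pci_bridges():
        raise ValueError(f"{bridges} PCI-PCI Bridges exceed the IO budget")


if __name__ == "__main__":
    print(max_pci_bridges())  # 9
    check_plan(9)             # fits; check_plan(10) would raise
```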


5.2 Hot-plug example:
Using HMP: (add -monitor stdio to QEMU command line)
  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id>

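The same hot-plug can be requested over QMP, where device_add takes the device properties as JSON arguments and device_del removes the device by id. This Python sketch only builds the command messages; the device model (virtio-net-pci), the id net1 and the bus root_port1 are example values, and opening the QMP socket and sending the usual qmp_capabilities handshake first are left out.

```python
import json


def qmp_device_add(driver: str, dev_id: str, bus: str, **props: str) -> str:
    """Build a QMP device_add command (equivalent to the HMP line above)."""
    args = {"driver": driver, "id": dev_id, "bus": bus, **props}
    return json.dumps({"execute": "device_add", "arguments": args})


def qmp_device_del(dev_id: str) -> str:
    """Build the matching QMP device_del command for hot-unplug."""
    return json.dumps({"execute": "device_del", "arguments": {"id": dev_id}})


if __name__ == "__main__":
    # Hot-plug a PCI Express device into an empty Root Port, then remove it.
    print(qmp_device_add("virtio-net-pci", "net1", "root_port1"))
    print(qmp_device_del("net1"))
```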

6. Device assignment
====================
Host devices are mostly PCI Express and should be plugged only into
PCI Express Root Ports or PCI Express Downstream Ports.
PCI-PCI Bridge slots can be used for legacy PCI host devices.

6.1 How to detect if a device is PCI Express:
  > lspci -s 03:00.0 -v (as root)

    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
    Flags: bus master, fast devsel, latency 0, IRQ 50
    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
    Capabilities: [14c] Latency Tolerance Reporting
    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

If you can see the "Express Endpoint" capability in the
output, then the device is indeed PCI Express.
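When many host devices need checking, the same test can be scripted. This Python sketch scans lspci -v text for a Capabilities line whose name begins with "Express"; the pattern also accepts the versioned form lspci may print, such as "Express (v2) Endpoint". It relies only on the output format shown above, the helper names are hypothetical, and without root lspci may not show the capability list at all.

```python
import re
import subprocess

# Matches e.g. "Capabilities: [40] Express Endpoint, MSI 00" and
# "Capabilities: [a0] Express (v2) Endpoint, MSI 00".
_EXPRESS_CAP = re.compile(r"^\s*Capabilities: \[[0-9a-f]+\] Express\b", re.MULTILINE)


def is_pci_express(lspci_output: str) -> bool:
    """True if `lspci -v` output lists a PCI Express capability."""
    return _EXPRESS_CAP.search(lspci_output) is not None


def check_device(bdf: str) -> bool:
    """Run `lspci -s <bdf> -v` for one device (run as root)."""
    out = subprocess.run(["lspci", "-s", bdf, "-v"],
                         capture_output=True, text=True, check=True).stdout
    return is_pci_express(out)


if __name__ == "__main__":
    print(check_device("03:00.0"))
```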


7. Virtio devices
=================
Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
will remain PCI and have transitional behaviour by default.
Transitional virtio devices work in both IO and MMIO modes depending on
the guest support. The Guest firmware will assign both IO and MMIO resources
to transitional virtio devices.

Virtio devices plugged into PCI Express ports are PCI Express devices and
have "1.0" behavior by default without IO support.
In both cases disable-legacy and disable-modern properties can be used
to override the behaviour.

Note that setting disable-legacy=off will enable legacy mode
for PCI Express virtio devices, causing them to
require IO space, which, given the limited available IO space, may quickly
lead to resource exhaustion, and is therefore strongly discouraged.
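The defaults and overrides described in this section can be summarised as a small decision function. This Python sketch is a reading of the text, not QEMU's actual property handling: disable-legacy and disable-modern are modelled as optional booleans (None meaning "left at the default"), the names are hypothetical, and in this model only the legacy interface needs IO space.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtioMode:
    legacy: bool  # legacy (transitional) interface enabled
    modern: bool  # virtio 1.0 interface enabled

    @property
    def needs_io(self) -> bool:
        # In this model only the legacy interface requires IO space.
        return self.legacy


def virtio_mode(on_pcie_port: bool,
                disable_legacy: Optional[bool] = None,
                disable_modern: Optional[bool] = None) -> VirtioMode:
    """Defaults: PCI Express port -> modern only; PCI -> transitional."""
    legacy = (not on_pcie_port) if disable_legacy is None else (not disable_legacy)
    modern = True if disable_modern is None else (not disable_modern)
    return VirtioMode(legacy=legacy, modern=modern)


if __name__ == "__main__":
    print(virtio_mode(on_pcie_port=True))                        # modern only
    print(virtio_mode(on_pcie_port=True, disable_legacy=False))  # needs IO
```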


8. Conclusion
==============
The proposal offers a usage model that is easy to understand and follow
and at the same time overcomes the PCI Express architecture limitations.