Commit | Line | Data |
---|---|---|
453ac883 MA |
1 | PCI EXPRESS GUIDELINES |
2 | ====================== | |
3 | ||
4 | 1. Introduction | |
5 | ================ | |
6 | This document proposes best practices for using PCI Express and PCI devices | |
7 | in PCI Express based machines and explains the reasoning behind them. | |
8 | ||
9 | The following presentations accompany this document: | |
10 | (1) Q35 overview. | |
11 | http://wiki.qemu.org/images/4/4e/Q35.pdf | |
12 | (2) A comparison between PCI and PCI Express technologies. | |
13 | http://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf | |
14 | ||
15 | Note: The usage examples are not intended to replace the full | |
16 | documentation; please use QEMU help to retrieve all options. | |
17 | ||
18 | 2. Device placement strategy | |
19 | ============================ | |
20 | QEMU does not have a clear socket-device matching mechanism | |
21 | and allows any PCI/PCI Express device to be plugged into any | |
22 | PCI/PCI Express slot. | |
23 | Plugging a PCI device into a PCI Express slot might not always work and | |
24 | is not possible on "bare metal" anyway. | |
25 | Plugging a PCI Express device into a PCI slot hides the Extended | |
26 | Configuration Space and is therefore also not recommended. | |
27 | ||
28 | The recommendation is to separate the PCI Express and PCI hierarchies. | |
29 | PCI Express devices should be plugged only into PCI Express Root Ports and | |
30 | PCI Express Downstream ports. | |
31 | ||
32 | 2.1 Root Bus (pcie.0) | |
33 | ===================== | |
34 | Place only the following kinds of devices directly on the Root Complex: | |
35 | (1) PCI Devices (e.g. network card, graphics card, IDE controller), | |
36 | not controllers. Place only legacy PCI devices on | |
37 | the Root Complex. These will be considered Integrated Endpoints. | |
38 | Note: Integrated Endpoints are not hot-pluggable. | |
39 | ||
40 | Although the PCI Express spec does not forbid PCI Express devices as | |
41 | Integrated Endpoints, existing hardware mostly integrates legacy PCI | |
42 | devices with the Root Complex. Guest OSes are suspected to behave | |
43 | strangely when PCI Express devices are integrated | |
44 | with the Root Complex. | |
45 | ||
46 | (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express | |
47 | hierarchies. | |
48 | ||
49 | (3) DMI-PCI Bridges (i82801b11-bridge), for starting legacy PCI | |
50 | hierarchies. | |
51 | ||
52 | (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses | |
53 | are needed. | |
54 | ||
55 | pcie.0 bus | |
56 | ---------------------------------------------------------------------------- | |
57 | | | | | | |
58 | ----------- ------------------ ------------------ -------------- | |
59 | | PCI Dev | | PCIe Root Port | | DMI-PCI Bridge | | pxb-pcie | | |
60 | ----------- ------------------ ------------------ -------------- | |
61 | ||
62 | 2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use: | |
63 | -device <dev>[,bus=pcie.0] | |
64 | 2.1.2 To expose a new PCI Express Root Bus use: | |
65 | -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z] | |
66 | Only PCI Express Root Ports and DMI-PCI bridges can be connected | |
67 | to the pcie.1 bus: | |
68 | -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \ | |
69 | -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1 | |
70 | ||
71 | ||
72 | 2.2 PCI Express only hierarchy | |
73 | ============================== | |
74 | Always use PCI Express Root Ports to start PCI Express hierarchies. | |
75 | ||
76 | A PCI Express Root bus supports up to 32 devices. Since each | |
77 | PCI Express Root Port is a function and a multi-function | |
78 | device may support up to 8 functions, the maximum possible | |
79 | number of PCI Express Root Ports per PCI Express Root Bus is 256. | |
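
The arithmetic above can be checked with a short Python sketch. The constant and function names are illustrative only and are not QEMU identifiers; the figures are the ones stated in the text.

```python
# Figures taken from the text above; the names are illustrative only.
DEVICES_PER_BUS = 32       # devices supported by one PCI Express Root Bus
FUNCTIONS_PER_DEVICE = 8   # functions in one multi-function device


def max_root_ports(root_buses: int = 1) -> int:
    """Upper bound on Root Ports if every function on every bus is a Root Port."""
    return root_buses * DEVICES_PER_BUS * FUNCTIONS_PER_DEVICE


print(max_root_ports())  # 32 * 8 = 256
```

This is an upper bound: any other device placed directly on the Root Bus takes a slot away from the Root Ports.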
80 | ||
81 | Prefer grouping PCI Express Root Ports into multi-function devices | |
82 | to keep a simple flat hierarchy that is enough for most scenarios. | |
83 | Only use PCI Express Switches (x3130-upstream, xio3130-downstream) | |
84 | if there is no more room for PCI Express Root Ports. | |
85 | See section 4 for further justification. | |
86 | ||
87 | Plug only PCI Express devices into PCI Express Ports. | |
88 | ||
89 | ||
90 | pcie.0 bus | |
91 | ---------------------------------------------------------------------------------- | |
92 | | | | | |
93 | ------------- ------------- ------------- | |
94 | | Root Port | | Root Port | | Root Port | | |
95 | ------------ ------------- ------------- | |
96 | | -------------------------|------------------------ | |
97 | ------------ | ----------------- | | |
98 | | PCIe Dev | | PCI Express | Upstream Port | | | |
99 | ------------ | Switch ----------------- | | |
100 | | | | | | |
101 | | ------------------- ------------------- | | |
102 | | | Downstream Port | | Downstream Port | | | |
103 | | ------------------- ------------------- | | |
104 | -------------|-----------------------|------------ | |
105 | ------------ | |
106 | | PCIe Dev | | |
107 | ------------ | |
108 | ||
109 | 2.2.1 Plugging a PCI Express device into a PCI Express Root Port: | |
110 | -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \ | |
111 | -device <dev>,bus=root_port1 | |
112 | 2.2.2 Using multi-function PCI Express Root Ports: | |
2e41dfe1 C |
113 | -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \ |
114 | -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \ | |
115 | -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] | |
116 | 2.2.3 Plugging a PCI Express device into a Switch: | |
453ac883 MA |
117 | -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \ |
118 | -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \ | |
119 | -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1] \ | |
120 | -device <dev>,bus=downstream_port1 | |
121 | ||
122 | Notes: | |
2e41dfe1 C |
123 | - (slot, chassis) pair is mandatory and must be unique for each |
124 | PCI Express Root Port. slot defaults to 0 when not specified. | |
453ac883 MA |
125 | - 'addr' parameter can be 0 for all the examples above. |
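
As a rough illustration of 2.2.2, the following Python sketch generates the -device arguments for several Root Ports, packing eight functions into each pcie.0 slot. The helper name, the ids, the starting slot, and the choice of giving every port its own chassis and slot value are assumptions made for this example; writing 'addr' in hexadecimal is also an assumption about how QEMU parses that option.

```python
def root_port_args(count, first_slot=1, bus="pcie.0"):
    """Return -device arguments for `count` ioh3420 Root Ports,
    grouped eight functions per slot on `bus` (hypothetical helper)."""
    args = []
    for i in range(count):
        slot_on_bus, function = divmod(i, 8)
        opts = [
            "ioh3420",
            f"id=root_port{i + 1}",
            f"bus={bus}",
            # One distinct value per port keeps every (slot, chassis) pair unique.
            f"chassis={i + 1}",
            f"slot={i + 1}",
            # Assumption: addr is given as hexadecimal <slot>.<function>.
            f"addr={first_slot + slot_on_bus:x}.{function}",
        ]
        if function == 0:
            # As in 2.2.2, only function 0 carries multifunction=on.
            opts.append("multifunction=on")
        args += ["-device", ",".join(opts)]
    return args


for arg in root_port_args(3):
    print(arg)
```

The result is a flat argument list that can be appended to an existing QEMU command line; devices are then attached with bus=root_portN as in 2.2.1.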
126 | ||
127 | ||
128 | 2.3 PCI only hierarchy | |
129 | ====================== | |
130 | Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints, | |
131 | but, as mentioned in section 5, doing so means the legacy PCI | |
132 | device in question cannot be hot-unplugged. | |
133 | Otherwise, use DMI-PCI Bridges (i82801b11-bridge) in combination | |
134 | with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies. | |
135 | ||
136 | Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge | |
137 | (having 32 slots) and several PCI-PCI Bridges attached to it | |
138 | (each also supporting 32 slots) will support hundreds of legacy devices. | |
139 | The recommendation is to populate one PCI-PCI Bridge under the DMI-PCI Bridge | |
140 | until it is full, and then plug in a new PCI-PCI Bridge. | |
141 | ||
142 | pcie.0 bus | |
143 | ---------------------------------------------- | |
144 | | | | |
145 | ----------- ------------------ | |
146 | | PCI Dev | | DMI-PCI BRIDGE | | |
147 | ---------- ------------------ | |
148 | | | | |
149 | ------------------ ------------------ | |
150 | | PCI-PCI Bridge | | PCI-PCI Bridge | ... | |
151 | ------------------ ------------------ | |
152 | | | | |
153 | ----------- ----------- | |
154 | | PCI Dev | | PCI Dev | | |
155 | ----------- ----------- | |
156 | ||
157 | 2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use: | |
158 | -device <dev>[,bus=pcie.0] | |
159 | 2.3.2 Plugging a PCI device into a PCI-PCI Bridge: | |
160 | -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0] \ | |
161 | -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y] \ | |
162 | -device <dev>,bus=pci_bridge1[,addr=x] | |
163 | Note that 'addr' cannot be 0 unless shpc=off parameter is passed to | |
164 | the PCI Bridge. | |
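
The fill-one-bridge-then-add-another recommendation can be sketched in Python as follows. The function names and bridge ids are hypothetical; the 32 slots per bridge and the restriction on addr 0 come from the text above, and writing 'addr' in hexadecimal is an assumption about QEMU's option parsing.

```python
def place_pci_devices(devices, slots_per_bridge=32, shpc=True):
    """Assign each legacy PCI device a (bridge id, addr) pair, filling one
    PCI-PCI Bridge completely before starting the next (hypothetical helper)."""
    first_addr = 1 if shpc else 0          # addr 0 is unavailable unless shpc=off
    usable = slots_per_bridge - first_addr
    placement = []
    for i, dev in enumerate(devices):
        bridge_index, offset = divmod(i, usable)
        placement.append((dev, f"pci_bridge{bridge_index + 1}", first_addr + offset))
    return placement


def to_args(placement):
    """Render a placement as -device arguments (the bridges themselves are not emitted)."""
    return [arg
            for dev, bridge, addr in placement
            for arg in ("-device", f"{dev},bus={bridge},addr={addr:x}")]


print(to_args(place_pci_devices(["<dev>"])))
```

With the default shpc setting each bridge takes 31 devices, so a second bridge is only needed for the 32nd device.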
165 | ||
166 | 3. IO space issues | |
167 | =================== | |
168 | The PCI Express Root Ports and PCI Express Downstream ports are seen by | |
169 | Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, a 4K | |
170 | IO range should be reserved for each such Port, even though only one | |
171 | (multifunction) device can be plugged into each Port. This results in | |
172 | poor IO space utilization. | |
173 | ||
174 | The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations | |
175 | by not allocating IO space for each PCI Express Root / PCI Express | |
176 | Downstream port if: | |
177 | (1) the port is empty, or | |
178 | (2) the device behind the port has no IO BARs. | |
179 | ||
180 | The IO space is limited to 65536 byte-wide IO ports and may also be | |
181 | fragmented by fixed IO ports owned by platform devices. This results in at most | |
182 | 10 PCI Express Root Ports or PCI Express Downstream Ports per system | |
183 | if devices with IO BARs are used in the PCI Express hierarchy. Using the | |
184 | proposed device placing strategy solves this issue by using only | |
185 | PCI Express devices within the PCI Express hierarchy. | |
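
The numbers above can be put side by side with a small Python sketch. It only restates the figures from the text (65536 IO ports, 4K per port); the names are illustrative.

```python
IO_SPACE_BYTES = 65536      # total IO port space, from the text
WINDOW_BYTES = 4 * 1024     # IO range reserved per Root Port / Downstream Port


def io_windows():
    """Theoretical number of 4K IO windows in a completely unfragmented IO space."""
    return IO_SPACE_BYTES // WINDOW_BYTES


def io_needed(ports):
    """IO space, in bytes, consumed by `ports` ports that each get a 4K window."""
    return ports * WINDOW_BYTES


print(io_windows())          # 16
print(io_needed(10) // 1024) # 40 (KiB)
```

The theoretical ceiling is 16 windows; the limit of 10 quoted above is lower because fixed platform IO ports fragment the space, and 10 ports already consume 40K of it.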
186 | ||
187 | The PCI Express spec requires that PCI Express devices work properly | |
188 | without using IO ports. The PCI hierarchy has no such limitations. | |
189 | ||
190 | ||
191 | 4. Bus numbers issues | |
192 | ====================== | |
193 | Each PCI domain can have at most 256 buses, and the QEMU PCI Express | |
194 | machines do not support multiple PCI domains even if extra Root | |
195 | Complexes (pxb-pcie) are used. | |
196 | ||
197 | Each element of the PCI Express hierarchy (Root Complexes, | |
198 | PCI Express Root Ports, PCI Express Downstream/Upstream ports) | |
199 | uses one bus number. Since only one (multifunction) device | |
200 | can be attached to a PCI Express Root Port or PCI Express Downstream | |
201 | Port, it is advised to plan in advance for the expected number of | |
202 | devices to prevent bus number starvation. | |
203 | ||
204 | Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI | |
205 | Express hierarchy) avoids spending bus numbers on | |
206 | Upstream Ports. | |
207 | ||
208 | The bus_nr properties of the pxb-pcie devices partition the 0..255 bus | |
209 | number space. All bus numbers assigned to the buses recursively behind a | |
210 | given pxb-pcie device's root bus must fit between the bus_nr property of | |
211 | that pxb-pcie device, and the lowest of the higher bus_nr properties | |
212 | that the command line sets for other pxb-pcie devices. | |
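
The partitioning rule can be illustrated with a short Python sketch that turns a list of bus_nr values into the bus number range available behind each root bus. The function name is hypothetical, and the sketch assumes that pcie.0 itself starts at bus number 0.

```python
def bus_ranges(pxb_bus_nrs, last_bus=255):
    """Map each root bus start (0 for pcie.0, bus_nr for each pxb-pcie) to the
    inclusive range of bus numbers available to the hierarchy behind it."""
    if len(set(pxb_bus_nrs)) != len(pxb_bus_nrs):
        raise ValueError("bus_nr values must be unique")
    if any(not 0 < nr <= last_bus for nr in pxb_bus_nrs):
        raise ValueError(f"bus_nr values must be in 1..{last_bus}")
    starts = sorted({0, *pxb_bus_nrs})
    # Each range ends just below the next higher bus_nr (or at last_bus).
    ends = [nxt - 1 for nxt in starts[1:]] + [last_bus]
    return dict(zip(starts, zip(starts, ends)))


print(bus_ranges([128, 64]))
# {0: (0, 63), 64: (64, 127), 128: (128, 255)}
```

In this example a pxb-pcie with bus_nr=64 leaves 64 bus numbers (64..127) for its root bus and everything behind it.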
213 | ||
214 | ||
215 | 5. Hot-plug | |
216 | ============ | |
217 | The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices) | |
218 | do not support hot-plug, so any devices plugged into Root Complexes | |
219 | cannot be hot-plugged/hot-unplugged: | |
220 | (1) PCI Express Integrated Endpoints | |
221 | (2) PCI Express Root Ports | |
222 | (3) DMI-PCI Bridges | |
223 | (4) pxb-pcie | |
224 | ||
225 | Be aware that PCI Express Downstream Ports can't be hot-plugged into | |
226 | an existing PCI Express Upstream Port. | |
227 | ||
228 | PCI devices can be hot-plugged into PCI-PCI Bridges. The PCI hot-plug is ACPI | |
229 | based and can work side by side with the PCI Express native hot-plug. | |
230 | ||
231 | PCI Express devices can be natively hot-plugged/hot-unplugged into/from | |
232 | PCI Express Root Ports (and PCI Express Downstream Ports). | |
233 | ||
234 | 5.1 Planning for hot-plug: | |
235 | (1) PCI hierarchy | |
236 | Leave enough PCI-PCI Bridge slots empty or add one | |
237 | or more empty PCI-PCI Bridges to the DMI-PCI Bridge. | |
238 | ||
239 | For each such PCI-PCI Bridge the Guest Firmware is expected to reserve | |
240 | 4K IO space and 2M MMIO range to be used for all devices behind it. | |
241 | ||
242 | Because of the hard IO limit of around 10 PCI Bridges (~ 40K space) | |
243 | per system, don't use more than 9 PCI-PCI Bridges, leaving 4K for the | |
244 | Integrated Endpoints. (The PCI Express Hierarchy needs no IO space.) | |
245 | ||
246 | (2) PCI Express hierarchy: | |
247 | Leave enough PCI Express Root Ports empty. Use multifunction | |
248 | PCI Express Root Ports (up to 8 ports per pcie.0 slot) | |
249 | on the Root Complex(es), for keeping the | |
250 | hierarchy as flat as possible, thereby saving PCI bus numbers. | |
251 | Don't use PCI Express Switches if you don't have | |
252 | to: each one uses an extra PCI bus (for its Upstream Port) | |
253 | that could be put to better use with another Root Port or Downstream | |
254 | Port, which may come in handy for hot-plugging another device. | |
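
The bus number cost of a Switch can be made concrete with a minimal Python sketch, based on the rule from section 4 that every Root Port, Upstream Port and Downstream Port uses one bus number. The function names are illustrative, and the root bus itself is not counted.

```python
def buses_flat(devices):
    """Flat hierarchy: one Root Port, and therefore one bus number, per device."""
    return devices


def buses_with_switch(devices):
    """Single Switch: one Root Port, one Upstream Port, one Downstream Port per device."""
    return 1 + 1 + devices


for n in (1, 4, 8):
    print(n, buses_flat(n), buses_with_switch(n))
```

For any number of devices, the Switch layout costs two more bus numbers than plugging the same devices into Root Ports directly.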
255 | ||
256 | ||
257 | 5.2 Hot-plug example: | |
258 | Using HMP: (add -monitor stdio to the QEMU command line) | |
259 | device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id> | |
260 | ||
261 | ||
262 | 6. Device assignment | |
263 | ==================== | |
264 | Host devices are mostly PCI Express and should be plugged only into | |
265 | PCI Express Root Ports or PCI Express Downstream Ports. | |
266 | PCI-PCI Bridge slots can be used for legacy PCI host devices. | |
267 | ||
268 | 6.1 How to detect if a device is PCI Express: | |
269 | > lspci -s 03:00.0 -v (as root) | |
270 | ||
271 | 03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83) | |
272 | Subsystem: Intel Corporation Dual Band Wireless-AC 7260 | |
273 | Flags: bus master, fast devsel, latency 0, IRQ 50 | |
274 | Memory at f0400000 (64-bit, non-prefetchable) [size=8K] | |
275 | Capabilities: [c8] Power Management version 3 | |
276 | Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ | |
277 | Capabilities: [40] Express Endpoint, MSI 00 | |
278 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
279 | Capabilities: [100] Advanced Error Reporting | |
280 | Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20 | |
281 | Capabilities: [14c] Latency Tolerance Reporting | |
282 | Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 | |
283 | ||
284 | If you can see the "Express Endpoint" capability in the | |
285 | output, then the device is indeed PCI Express. | |
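
The same check can be scripted. The following Python sketch scans the text printed by 'lspci -v' for a capability line that starts with "Express"; the function name is hypothetical, and the pattern is an assumption based on the sample output above.

```python
import re

# Matches lines such as "Capabilities: [40] Express Endpoint, MSI 00".
EXPRESS_CAP = re.compile(r"^\s*Capabilities: \[[0-9a-f]+\] Express\b", re.MULTILINE)


def is_pci_express(lspci_verbose_output: str) -> bool:
    """Return True if the lspci -v output lists a PCI Express capability."""
    return bool(EXPRESS_CAP.search(lspci_verbose_output))


sample = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Capabilities: [c8] Power Management version 3
    Capabilities: [40] Express Endpoint, MSI 00
"""
print(is_pci_express(sample))  # True
```

The output has to be collected as root, as noted above, so that the capability list is available to parse.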
286 | ||
287 | ||
288 | 7. Virtio devices | |
289 | ================= | |
290 | Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints | |
291 | will remain PCI and have transitional behaviour as default. | |
292 | Transitional virtio devices work in both IO and MMIO modes depending on | |
293 | the guest support. The Guest firmware will assign both IO and MMIO resources | |
294 | to transitional virtio devices. | |
295 | ||
296 | Virtio devices plugged into PCI Express ports are PCI Express devices and | |
297 | have "1.0" behaviour by default, without IO support. | |
298 | In both cases, the disable-legacy and disable-modern properties can be used | |
299 | to override the behaviour. | |
300 | ||
301 | Note that setting disable-legacy=off enables legacy mode for | |
302 | PCI Express virtio devices, causing them to | |
303 | require IO space, which, given the limited available IO space, may quickly | |
304 | lead to resource exhaustion, and is therefore strongly discouraged. | |
305 | ||
306 | ||
307 | 8. Conclusion | |
308 | ============== | |
309 | The proposal offers a usage model that is easy to understand and follow | |
310 | and at the same time overcomes the PCI Express architecture limitations. |